ZooKeeper

“Because coordinating distributed systems is a Zoo”

Background

ZooKeeper in the Hadoop ecosystem

image

Motivation

ZooKeeper: designed to relieve developers from writing coordination logic code.

A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

Highly Available

Tolerates the loss of a minority of ensemble members and still function.

One more way of looking at this odd number majority scheme is that, when there is a partition, there can’t be more than one partition with majority of servers in it.

High Performance

Strictly ordered access

Basic Cluster Interactions

image

When one server goes down, clients will see a disconnect event and client will re-connect themselves to another member of the quorum.

One more thing with 2 * n + 1 servers is that, any two majorities will have atleast one overlap server. Because there are atleast n + 1 in the majority, there is intersection with atleast one server from the previous majority.

Zookeeper data structure

image

File system analogy

image

image

image

image

image

Watches

image

The leader executes all write requests forwarded by followers. The leader then broadcasts the changes. image

image

API

Creation API

Get/Watch API

Other API

Implementation Details

image

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.

Uses of Zookeeper

ZooKeeper was not implemented to be a large datastore.

Discovery of hosts

A typical use case for ephemeral nodes is when using ZooKeeper for discovery of hosts in your distributed system. Each server can then publish its IP address in an ephemeral node, and should a server loose connectivity with ZooKeeper and fail to reconnect within the session timeout, then its information is deleted.

Leader election

An easy way of doing leader election with ZooKeeper is to let every server publish its information in a zNode that is both sequential and ephemeral.

Then, whichever server has the lowest sequential zNode is the leader. If the leader or any other server for that matter, goes offline, its session dies and its ephemeral node is removed, and all other servers can observe who is the new leader.

If we use write for leader instead of lowest sequential zNode, then Zookeeper will send the notification to all servers and all servers will try to write to the zookeeper to become a new leader at the same time creating a herd effect.

Message queue

With the use of watchers one can implement a message queue by letting all clients interested in a certain topic register a watcher on a zNode for that topic, and messages regarding that topic can be broadcast to all the clients by writing to that zNode.

Ref: