ZooKeeper is a coordination service that allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers; these data registers are called znodes.
ZooKeeper was designed to store coordination data such as status information, configuration, location information, and so on.
ZooKeeper was a sub-project of Hadoop but is now a top-level project in its own right.
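To make the idea of a hierarchical name space of znodes more concrete, here is a minimal sketch in Python. It is a toy in-memory model and not the ZooKeeper client API: the class name `ZNodeTree`, its methods, and the example paths are all assumptions made for illustration.

```python
class ZNodeTree:
    """Toy in-memory model of a hierarchical name space (not the ZooKeeper API)."""

    def __init__(self):
        # Every znode is addressed by a slash-separated path and holds a small payload.
        self._data = {"/": b""}

    def create(self, path, data=b""):
        # A znode can only be created under a parent that already exists.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._data:
            raise FileExistsError(path)
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def children(self, path):
        # Return the names of the direct children of `path`, sorted.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/config")                          # hypothetical path
tree.create("/config/db_url", b"db.example")    # hypothetical payload
print(tree.children("/config"))                 # prints ['db_url']
```

The point of the sketch is only the shape of the data: paths behave like file names, and each node carries a small piece of coordination data such as configuration or status.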
Following are a few definitions from different sources:
Wikipedia: Apache ZooKeeper is a software project of the Apache Software Foundation, providing an open source distributed configuration service, synchronization service, and naming registry for large distributed systems.
zookeeper.apache.org / Hadoop Wiki: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Example Scenario
Let us look at a quick example scenario where ZooKeeper can be used.
Consider a master-worker application. The master process keeps track of the available workers and tasks, and assigns tasks to workers. Three key problems can arise: the master may crash, workers may crash, and communication failures may occur.
ZooKeeper can handle the coordination here: it provides primitives that the application can use to keep track of workers, tasks, and task assignments, while hiding the implementation details of those primitives from the application developer.
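The bookkeeping in this scenario can be sketched as follows. This is only an illustration under stated assumptions: the namespace is modeled as a plain Python dictionary rather than a real ZooKeeper ensemble, the paths `/workers`, `/tasks`, and `/assign` are a hypothetical layout, and the round-robin policy in `assign_tasks` is just one possible choice.

```python
from itertools import cycle


def children(znodes, parent):
    """Names of the direct children of `parent` in the toy namespace."""
    prefix = parent + "/"
    return sorted(
        p[len(prefix):]
        for p in znodes
        if p.startswith(prefix) and "/" not in p[len(prefix):]
    )


def assign_tasks(znodes):
    """Move every pending task under /tasks to a worker's node under /assign."""
    workers = children(znodes, "/workers")
    if not workers:
        return []  # nothing can be assigned until a worker registers
    assigned = []
    rotation = cycle(workers)  # hypothetical policy: simple round-robin
    for task in children(znodes, "/tasks"):
        worker = next(rotation)
        znodes.setdefault(f"/assign/{worker}", b"")
        znodes[f"/assign/{worker}/{task}"] = znodes.pop(f"/tasks/{task}")
        assigned.append((task, worker))
    return assigned


# Hypothetical state: two registered workers and three pending tasks.
znodes = {
    "/workers": b"", "/tasks": b"", "/assign": b"",
    "/workers/w1": b"host-a", "/workers/w2": b"host-b",
    "/tasks/t1": b"cmd-1", "/tasks/t2": b"cmd-2", "/tasks/t3": b"cmd-3",
}
print(assign_tasks(znodes))  # prints [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1')]
```

In a real deployment the master would also have to react to workers and tasks appearing or disappearing, which this sketch leaves out.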
Coordination Tasks
As we saw from the definitions, ZooKeeper allows developers to implement common coordination tasks such as electing a master server, managing group membership, and managing metadata.
Coordination tasks are of two types: those that enable cooperation and those that regulate contention.
-
Cooperation means that the processes need to do something together, and processes take action to enable other processes to make progress. In the master-worker scenario, a worker may update the master about its availability.
-
Contention refers to situations in which two processes cannot make progress concurrently, so one must wait for the other. In the master-worker scenario, if several processes want to become the master, they contend for that role in a way similar to how threads acquire locks in multithreading.
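The two kinds of coordination can be illustrated with a small Python sketch. It uses threads and a lock as stand-ins for distributed processes and a coordination service, so it is an analogy rather than ZooKeeper code. The class `ToyCoordinator`, the helper `race_for_master`, and the paths `/master` and `/workers/...` are hypothetical names chosen for this example.

```python
import threading


class ToyCoordinator:
    """Toy stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._znodes = {}

    def report_available(self, worker_id):
        # Cooperation: a worker publishes its status so the master can make progress.
        with self._lock:
            self._znodes[f"/workers/{worker_id}"] = "available"

    def available_workers(self):
        with self._lock:
            return sorted(
                path.rsplit("/", 1)[1]
                for path, status in self._znodes.items()
                if path.startswith("/workers/") and status == "available"
            )

    def try_become_master(self, process_id):
        # Contention: create-if-absent, so only the first caller succeeds.
        with self._lock:
            if "/master" in self._znodes:
                return False
            self._znodes["/master"] = process_id
            return True

    def master(self):
        with self._lock:
            return self._znodes.get("/master")


def race_for_master(coordinator, process_ids):
    """Run one thread per candidate and return the ids that won the race."""
    winners = []

    def candidate(process_id):
        if coordinator.try_become_master(process_id):
            winners.append(process_id)

    threads = [threading.Thread(target=candidate, args=(pid,)) for pid in process_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners
```

Calling `race_for_master(ToyCoordinator(), ["p1", "p2", "p3"])` returns a list with exactly one id. Which candidate wins depends on thread scheduling, and the losers must wait or take another role.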
Popular ZooKeeper Applications
Looking at some of the applications that use ZooKeeper gives a better perspective on what ZooKeeper is all about.
-
Apache Kafka: Kafka uses ZooKeeper to detect crashes, to implement topic discovery, and to maintain production and consumption state for topics.
-
Apache HBase: In HBase, ZooKeeper is used to elect a cluster master, to keep track of available servers, and to keep cluster metadata.
-
Apache Solr: Solr uses ZooKeeper to store metadata about the cluster and coordinate the updates to this metadata.
-
Facebook Messages: Facebook Messages uses ZooKeeper as a controller for implementing sharding and failover, and also for service discovery.
What ZooKeeper does not do
ZooKeeper implements a core set of operations that enable the implementation of tasks common to many distributed applications. To avoid confusion, it is also worth noting what ZooKeeper does not do.
-
ZooKeeper does not implement the tasks for you. It does not elect a master or track live processes for the application out of the box. Instead, it provides the tools for implementing such tasks. The developer decides what coordination tasks to implement.
-
ZooKeeper does not make the problems with distributed systems disappear or render them completely transparent to applications, but it does make the problems more tractable.
References:
ZooKeeper by Flavio Junqueira and Benjamin Reed.
https://en.wikipedia.org/wiki/Apache_ZooKeeper
https://wiki.apache.org/hadoop/ZooKeeper
https://zookeeper.apache.org/
HBase is a data store typically used alongside Hadoop.
Kafka is a pub-sub messaging system.
Solr is an enterprise search platform; its distributed form is called SolrCloud.
Facebook Messages is a Facebook application that integrates communication channels: email, SMS, Facebook Chat, and the existing Facebook Inbox.
- heartin's blog