Neo4j HA can be set up to accommodate differing requirements for load, fault tolerance and available hardware.
Within a cluster, Neo4j HA uses Apache ZooKeeper [2] for master election and propagation of general cluster and machine status information.
ZooKeeper can be seen as a distributed coordination service.
Neo4j HA requires a coordinator service for the initial master election, for electing a new master when the current master fails, and for publishing general status information about the current Neo4j HA cluster (for example when a server joins or leaves the cluster).
Read operations through the GraphDatabaseService API will always work, and even writes can survive coordinator failures if a master is present.
ZooKeeper requires a majority of the ZooKeeper instances to be available in order to operate properly. This means that the number of ZooKeeper instances should always be odd, since that makes the best use of available hardware: a three-instance ensemble tolerates one instance failing, while a four-instance ensemble still only tolerates one, because three instances are needed for a majority.
To further clarify the fault tolerance characteristics of Neo4j HA here are a few example setups:
This setup is conservative in its use of hardware while being able to handle a moderate read load. It can fully operate when at least 2 of the coordinator instances are running. Since the coordinator and Neo4j HA run together on each machine, this will in most scenarios mean that only one server is allowed to go down.
This setup may mean that two different machine setups have to be managed (some machines run both a coordinator and Neo4j HA). The fault tolerance will depend on how many machines are running coordinators. With 3 coordinators the cluster can survive one coordinator going down, with 5 it can survive 2, and with 7 it can handle 3 coordinators failing. The number of Neo4j HA instances that can fail during normal operation is theoretically all but 1 (but for each required master election the coordinator service must be available).
In this setup all coordinators run on separate machines as a dedicated service. The dedicated coordinator cluster can tolerate fewer than half of its instances failing, since a majority must remain available. The Neo4j HA cluster will be able to operate with as little as a single live machine. Adding more Neo4j HA instances is very easy in this setup, since the coordinator cluster runs as a separate service.
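As a rough illustration of the dedicated setup, the coordinator instances simply form an ordinary ZooKeeper ensemble, so their configuration follows the standard ZooKeeper layout. The file layout, directory and hostnames below are placeholders for illustration; adapt them to the coordinator configuration bundled with your installation.

```
# Ensemble configuration shared by three dedicated coordinator machines
# (standard ZooKeeper settings; paths and hosts are illustrative).
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data/coordinator
clientPort=2181

# One entry per coordinator machine: host:peer-port:leader-election-port.
# The same three entries appear in the configuration on every coordinator.
server.1=coord1.example.com:2888:3888
server.2=coord2.example.com:2888:3888
server.3=coord3.example.com:2888:3888
```

Each coordinator also needs a myid file in its data directory containing its own number (1, 2 or 3), which is how ZooKeeper identifies the local instance within the ensemble.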
For installation instructions for a High Availability cluster, see Section 22.4, “High Availability setup tutorial”.
Note that while the HighlyAvailableGraphDatabase supports the same API as the EmbeddedGraphDatabase, it does have additional configuration parameters.
HighlyAvailableGraphDatabase configuration parameters
| Parameter Name | Value | Example value | Required? |
|---|---|---|---|
| | integer >= 0 | | yes |
| | (auto-discovered) host & port to bind when acting as master | | no |
| | comma delimited coordinator connections | | yes |
| | interval for polling master from a slave, in seconds | | no |
| | creates a slave-only instance that will never become a master | | no |
| | how long a slave will wait for response from master before giving up (default 20) | | no |
| | how long a slave lock acquisition request will wait for response from master before giving up (defaults to what ha.read_timeout is, or its default if absent) | | no |
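As a minimal sketch of how such parameters are applied in embedded mode, the example below constructs a HighlyAvailableGraphDatabase from a plain configuration map and then writes through the ordinary GraphDatabaseService API. The parameter names used (ha.server_id, ha.coordinators, ha.pull_interval) and the (String, Map) constructor are assumptions based on common usage for this Neo4j version, not taken from the table above; verify them against the configuration reference of your exact release.

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.HighlyAvailableGraphDatabase;

public class HaExample
{
    public static void main( String[] args )
    {
        // Parameter names below are assumed (ha.server_id, ha.coordinators,
        // ha.pull_interval); check them against your version's reference.
        Map<String, String> config = new HashMap<String, String>();
        config.put( "ha.server_id", "1" );          // unique id for this cluster member
        config.put( "ha.coordinators",              // comma delimited coordinator connections
                "coord1:2181,coord2:2181,coord3:2181" );
        config.put( "ha.pull_interval", "30" );     // seconds between update pulls from the master

        // Same API as EmbeddedGraphDatabase, but cluster-aware.
        GraphDatabaseService graphDb =
                new HighlyAvailableGraphDatabase( "/var/neo4j/graph.db", config );

        Transaction tx = graphDb.beginTx();
        try
        {
            // Writes succeed on any member as long as a master is present.
            Node node = graphDb.createNode();
            node.setProperty( "name", "example" );
            tx.success();
        }
        finally
        {
            tx.finish();
        }

        graphDb.shutdown();
    }
}
```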
Caution
Neo4j’s HA setup depends on ZooKeeper, a.k.a. the Coordinator, which makes certain assumptions about the state of the underlying operating system. In particular, ZooKeeper expects the system time on each machine to be set correctly and synchronized across the cluster. If this is not the case, Neo4j HA will appear to misbehave, caused by seemingly random ZooKeeper hiccups.
Caution
Neo4j uses the Coordinator cluster to store information representative of the deployment’s state, including key fields of the database itself. Since that information is permanently stored in the cluster, you cannot reuse it for multiple deployments of different databases. In particular, removing the Neo4j servers, replacing the database and restarting them against the same coordinator instances will result in errors mentioning the existing HA deployment. To reset the Coordinator cluster to a clean state, you have to shut down all instances, remove the Coordinator data directory on each of them, and restart the coordinators.