|
What is the maximum number of nodes supported? What is the maximum number of edges supported per node?
|
|
The current limits are 34.4 billion nodes, 34.4 billion relationships, and 68.7 billion properties in total.
|
|
What is the largest complete connected graph supported (i.e. every node is connected to all other nodes)?
|
|
Theoretical limits can be derived from the numbers above: a complete graph on n nodes has n(n-1)/2 relationships, so the 34.4 billion relationship limit allows a complete graph of 262,144 nodes and 34,359,607,296 relationships. We have never seen this use case, though.
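For reference, a minimal Java sketch of that arithmetic, assuming the 34.4 billion figure corresponds to a 2^35 relationship-record limit:

```java
// Derive the largest complete graph that fits under the relationship limit.
// Assumption: the 34.4 billion figure is 2^35 relationship records.
public class CompleteGraphLimit {
    public static void main(String[] args) {
        long maxRels = 1L << 35; // 34,359,738,368
        // Largest integer n satisfying n(n-1)/2 <= maxRels.
        long n = (1 + (long) Math.sqrt(1.0 + 8.0 * maxRels)) / 2;
        System.out.println(n);               // 262144 nodes
        System.out.println(n * (n - 1) / 2); // 34359607296 relationships
    }
}
```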
|
|
Does read/write performance depend on the number of nodes/edges in the DB?
|
|
This question can mean a couple of different things. The performance of a single read/write operation does not depend on the size of the DB: whether the graph has 10 nodes or 10 million nodes does not matter.
There is, however, another facet: if your graph is big on disk, you may not be able to fit it all into the cache in RAM, and you may therefore end up hitting disk more often. Most customers don't have graphs of this size, but some do. If you happen to reach these sizes, we have approaches for scaling out across multiple machines to mitigate the performance impact by increasing the cache "surface area" across machines.
|
|
How many concurrent read/write requests are supported?
|
|
There is no limit on the number of concurrent requests. The number of requests we can serve per second depends very much on the operation performed (heavy write operation, simple read, complex traversal, etc.) and the hardware used. A rough estimate is 1,000 hops per millisecond when traversing the graph in the simplest way possible. After a discussion about the specific use case, we would be able to give a better idea of the performance one can expect.
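To illustrate what "traversing the graph in the simplest way possible" means, here is a minimal sketch against the embedded Java API of that era; the start node id is an assumption for illustration:

```java
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

// Each step from a node across a relationship record to its neighbor
// is one "hop" in the estimate above.
long countHops(GraphDatabaseService db, long startNodeId) {
    try (Transaction tx = db.beginTx()) {
        Node start = db.getNodeById(startNodeId); // startNodeId is hypothetical
        long hops = 0;
        for (Relationship rel : start.getRelationships(Direction.OUTGOING)) {
            rel.getEndNode(); // follow the relationship record to the neighbor
            hops++;
        }
        tx.success();
        return hops;
    }
}
```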
|
|
How is data consistency maintained in a cluster environment?
|
|
Master-slave replication: slaves pull changes from the master. The pull interval can be configured per slave, from sub-second to minutes, as necessary. HA can also write through slaves. When that happens, the slave being written through first catches up with the master, and then the write is made durable on both the slave and the master. The other slaves then catch up as normal.
|
|
What is the latency in updating all the servers when one of them updates the DB?
|
|
The pull interval can be configured per slave, from sub-second to minutes, as necessary. When writing through a slave, the slave is immediately synchronized with the master before the write is committed on the slave and the master. In general, read/write load does not affect slaves syncing up. A heavy write load will, however, put pressure on the filesystem of the master, which must also serve the changes that slaves read. In practice, we have not seen this become a notable issue.
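As an illustration, the pull interval is a per-instance setting. A minimal sketch of configuring an embedded HA member, assuming the 2.x-era HighlyAvailableGraphDatabaseFactory API (the server id, hosts, and store path are hypothetical):

```java
import org.neo4j.cluster.ClusterSettings;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.HighlyAvailableGraphDatabaseFactory;
import org.neo4j.kernel.ha.HaSettings;

public class HaMember {
    public static void main(String[] args) {
        // Each cluster member gets its own server id; the pull interval
        // controls how often this slave polls the master for updates.
        GraphDatabaseService db = new HighlyAvailableGraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder("/var/lib/neo4j/graph.db") // hypothetical path
                .setConfig(ClusterSettings.server_id, "2")
                .setConfig(ClusterSettings.initial_hosts, "10.0.0.1:5001,10.0.0.2:5001")
                .setConfig(HaSettings.pull_interval, "500ms") // sub-second pulls
                .newGraphDatabase();
        Runtime.getRuntime().addShutdownHook(new Thread(db::shutdown));
    }
}
```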
|
|
Will the latency increase in proportion to the number of servers in the cluster?
|
|
When scaling beyond tens of slaves in a cluster, we anticipate that the number of pull requests coming from slaves will reduce the performance of the master. Only write performance on the cluster would be affected; read performance would continue to scale linearly.
|
|
Is online expansion supported? In other words, do we need to bring down all the servers and the DB if we want to add new servers to the cluster?
|
|
New slaves can be added to an existing cluster without having to stop and start the whole cluster. Our HA protocol will bring a newly added slave up to date. Slaves can also be removed simply by shutting them down.
|
|
How long will it take for the newly joined servers to sync up?
|
|
We recommend providing a new slave with a recent snapshot of the database before bringing it online. This is typically done from a backup. The slave will then only need to synchronize the most recent updates, which will typically be a matter of seconds.
|
|
How long does it take to reboot?
|
|
If by reboot you mean taking the cluster down and bringing it up again, it is pretty much dependent on how fast you can type, so it could be under 10 seconds. The Neo4j caches will, however, not warm up automatically, but the OS filesystem cache will retain its data.
|
|
Are there any backup and restore/recovery mechanisms?
|
|
Neo4j Enterprise Edition provides an online backup feature for full and incremental backups during operation.
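For illustration, a minimal sketch using the embedded OnlineBackup API from Neo4j Enterprise of that era; the host and target directory are hypothetical:

```java
import org.neo4j.backup.OnlineBackup;

public class BackupJob {
    public static void main(String[] args) {
        // Connect to a running instance's backup service (hypothetical host).
        OnlineBackup backup = OnlineBackup.from("10.0.0.1");
        backup.full("/mnt/backup/neo4j-backup");        // first, a full backup
        backup.incremental("/mnt/backup/neo4j-backup"); // later runs ship only new transactions
    }
}
```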
|
|
Is cross-continental clustering supported? Say, can servers in the cluster be located on different continents, provided that inter-continental communication is much less frequent than intra-continental communication?
|
|
We have customers who have tested multi-region deployments in AWS. Cross-continental latencies will, however, have an impact on the efficiency of the cluster management and synchronization protocols; large latencies in cluster management can trigger frequent master re-elections, which will slow down the cluster. Feature support in this area will improve over time.
|
|
Is there any special handling/policy for this kind of setup?
|
|
We’d have to have a more in-depth discussion about the requirements pertaining to this specific deployment.
|
|
Is writing to the DB thread-safe? Or is it up to the application logic to protect writes to the same nodes/edges?
|
|
Whether in single-instance or HA mode, the database provides thread safety by locking nodes and relationships upon modification.
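A minimal sketch of how this looks in the embedded Java API (the node id, property name, and amounts are hypothetical): modifications take write locks implicitly, and a lock can also be taken explicitly to make a read-modify-write atomic.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// Locks taken in a transaction are held until it commits or rolls back.
void deposit(GraphDatabaseService db, long accountNodeId, long amount) {
    try (Transaction tx = db.beginTx()) {
        Node account = db.getNodeById(accountNodeId); // hypothetical id
        // Explicit write lock: serializes this read-modify-write against
        // concurrent writers. setProperty alone would also lock the node,
        // but only at the moment of the write, not during the read.
        tx.acquireWriteLock(account);
        long balance = (Long) account.getProperty("balance", 0L);
        account.setProperty("balance", balance + amount);
        tx.success();
    }
}
```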
|
|
What is the best strategy for reading back your writes on HA?
|
|
- Sticky sessions.
- Send back data in the response, removing the need to read back in a separate request.
- Force a pull of updates from the master when required by the operation.
|
|
What is the best strategy for get-or-create semantics?
|
|
- Single thread.
- If it does not exist, pessimistically lock on a common node (or set of common nodes); see the sketch below.
- If it does not exist, optimistically create, and then double-check afterwards.
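A minimal sketch of the pessimistic variant, assuming the 2.x-era embedded Java API; the lock node, the User label, and the findUser lookup helper are hypothetical:

```java
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// Get-or-create with a pessimistic lock on a shared lock node: all creators
// of "User" nodes serialize on it, so at most one node per username is created.
Node getOrCreateUser(GraphDatabaseService db, Node lockNode, String username) {
    try (Transaction tx = db.beginTx()) {
        Node user = findUser(db, username); // hypothetical index lookup
        if (user == null) {
            // Serialize with other creators, then check again under the lock.
            tx.acquireWriteLock(lockNode);
            user = findUser(db, username);
            if (user == null) {
                user = db.createNode(DynamicLabel.label("User"));
                user.setProperty("username", username);
            }
        }
        tx.success();
        return user;
    }
}
```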
|