ArangoDB Performance Peculiarities
Why ArangoDB?
At work, we are looking into incorporating a graph database into our next-generation stack. A graph database fits our business domain and business model very nicely, and a lot of our queries involve some form of traversal of relationships.
I have previous experience with Neo4j and think it is very good at its job. On top of that, I think Cypher, Neo4j's query language, is well designed and intuitive. However, due to licensing issues, the higher-ups do not approve of using Neo4j. Then my immediate manager found out about ArangoDB, so off I went with my teammate to evaluate it.
What Is ArangoDB?
ArangoDB is a multi-model database. It supports document, key/value, and graph data models. Quite honestly, when I first read about the multi-model approach, I felt skeptical. Supporting several data models could force compromises, which might make ArangoDB a "jack of all trades, master of none".
Evaluation
I wrote this post before ArangoDB 3.0.0 came out, so the following evaluation uses version 2.8.9. The results for 3.0.0 will be posted shortly. The evaluation was done on a 15-inch MacBook Pro (mid-2014 model). The test measures how long ArangoDB takes to create 50,000 vertices and 49,999 edges. The test is very simple, but it uncovered something peculiar about ArangoDB. The evaluation code, written in Scala, can be found in this gist.
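The gist is not reproduced here, so the following is only a hypothetical sketch of how such a measurement can be taken: a small helper that runs a block of work and prints its wall-clock duration in the same "Elapsed time: Nms" format as the results. The names Timing and timed are illustrative and not taken from the original code; the body that would actually call the ArangoDB driver is left to the caller.

```scala
// Hypothetical timing helper; not the author's gist.
object Timing {
  // Runs `body`, prints a header and the elapsed wall-clock time in the
  // "Elapsed time: Nms" format, and returns (elapsed ms, body's result).
  def timed[A](label: String)(body: => A): (Long, A) = {
    println(s"--> $label")
    val start = System.nanoTime()
    val result = body // the work being measured, e.g. 50,000 inserts
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"Elapsed time: ${elapsedMs}ms")
    (elapsedMs, result)
  }
}
```

A real run would wrap the driver calls, for example Timing.timed("Creating 50000 vertices - createDocument") { ... }. Measuring with System.nanoTime rather than the wall clock keeps the result unaffected by system clock adjustments during the run.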
The Results
Below are the results for ArangoDB 2.8.9:
--> Creating 50000 vertices - graphCreateVertex
Elapsed time: 56748ms
--> Creating 49999 edges - graphCreateEdge
Elapsed time: 57497ms
--> Creating 50000 vertices - createDocument
Elapsed time: 9987ms
--> Creating 49999 edges - createEdge
Elapsed time: 9283ms
--> Creating 50000 vertices - createDocument (batched 100)
Elapsed time: 3520ms
--> Creating 49999 edges - createEdge (batched 100)
Elapsed time: 3942ms
As you can see, switching from graphCreateVertex and graphCreateEdge to createDocument and createEdge cuts the creation times from around 57 seconds to around 10 seconds. Batching the calls in groups of 100 brings the numbers down even further. All in all, the times drop from around 57s to about 4s, roughly a 15x speed-up; that is a big jump!
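Batching means sending many documents per request instead of one, which presumably saves most of the per-request overhead. Below is a minimal sketch of just the chunking step, assuming batches of 100 as in the results. The case classes are hypothetical, the edges are assumed (purely for illustration) to chain vertex i to vertex i + 1, and the driver call that would submit each batch is left out.

```scala
// Hypothetical sketch of splitting the workload into batches of 100.
object Batching {
  final case class VertexDoc(key: String)
  final case class EdgeDoc(from: String, to: String)

  // 50,000 vertex documents with illustrative keys "0" .. "49999".
  val vertices: Vector[VertexDoc] =
    Vector.tabulate(50000)(i => VertexDoc(i.toString))

  // 49,999 edges, assumed here to chain vertex i to vertex i + 1.
  val edges: Vector[EdgeDoc] =
    Vector.tabulate(49999)(i => EdgeDoc(s"vertices/$i", s"vertices/${i + 1}"))

  // `grouped` yields consecutive chunks of `size`; the last may be smaller.
  def batches[A](items: Vector[A], size: Int = 100): Vector[Vector[A]] =
    items.grouped(size).toVector
}
```

With batches of 100, the 50,000 vertices become 500 requests instead of 50,000, and the 49,999 edges also become 500 requests, the last one carrying 99 edges.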
What is surprising is the time difference between the different API calls, namely graphCreateVertex vs. createDocument and graphCreateEdge vs. createEdge.
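To put the gap in per-call terms, the reported timings can be divided by the number of items created. The sketch below only does arithmetic on the numbers in the results; the object and method names are illustrative.

```scala
// Per-item cost derived from the reported ArangoDB 2.8.9 timings.
object PerItemCost {
  // (API call, elapsed milliseconds, number of items created)
  val runs: Seq[(String, Long, Int)] = Seq(
    ("graphCreateVertex", 56748L, 50000),
    ("graphCreateEdge", 57497L, 49999),
    ("createDocument", 9987L, 50000),
    ("createEdge", 9283L, 49999),
    ("createDocument (batched 100)", 3520L, 50000),
    ("createEdge (batched 100)", 3942L, 49999)
  )

  // Average milliseconds spent per created vertex or edge.
  def msPerItem(elapsedMs: Long, items: Int): Double =
    elapsedMs.toDouble / items

  def main(args: Array[String]): Unit =
    runs.foreach { case (label, ms, n) =>
      println(f"$label%-30s ${msPerItem(ms, n)}%.3f ms/item")
    }
}
```

The graph calls cost roughly 1.1 ms per item, the plain calls about 0.2 ms, and the batched calls under 0.1 ms. That makes graphCreateVertex about 5.7 times slower than createDocument, and graphCreateEdge about 6.2 times slower than createEdge.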
As far as I can tell, the difference between graphCreateEdge and createEdge is that the former performs validation: it makes sure the two vertices are present. The latter performs no such validation.
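The sketch below is a hypothetical in-memory model of the validation difference, not ArangoDB's implementation. A validating insert looks up both endpoints before storing the edge, while an unchecked insert just stores it. The Store class, its method names, and the lookup counter are all made up for illustration.

```scala
// Hypothetical model contrasting a validating edge insert (as described for
// graphCreateEdge) with an unchecked one (as described for createEdge).
object EdgeInsert {
  final case class Edge(from: String, to: String)

  final class Store {
    val vertices = scala.collection.mutable.Set.empty[String]
    val edges = scala.collection.mutable.ArrayBuffer.empty[Edge]
    var lookups = 0 // number of vertex-existence checks performed

    private def exists(id: String): Boolean = {
      lookups += 1
      vertices.contains(id)
    }

    // Validating insert: rejects the edge unless both endpoints exist.
    def createEdgeValidated(from: String, to: String): Either[String, Edge] =
      if (!exists(from)) Left(s"vertex $from not found")
      else if (!exists(to)) Left(s"vertex $to not found")
      else { val e = Edge(from, to); edges += e; Right(e) }

    // Unchecked insert: stores the edge without looking at its endpoints.
    def createEdgeUnchecked(from: String, to: String): Edge = {
      val e = Edge(from, to)
      edges += e
      e
    }
  }
}
```

In a real database, each existence check is an extra read, which may account for part of the gap between the two edge calls. This is only a guess: the benchmark does not isolate the cost of validation, and graphCreateVertex is also much slower than createDocument even though no endpoints are involved.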
However, I cannot really tell the difference between graphCreateVertex and createDocument.
At first glance, one may say graphCreateVertex is for creating a vertex, and createDocument is for creating a document. However, as far as I understand, a vertex in ArangoDB is just a document. Moreover, the graph created by createDocument and createEdge works just fine with AQL, ArangoDB's query language. The outcome seems to be the same, but createDocument is significantly faster than graphCreateVertex.
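The following hypothetical in-memory sketch shows why a graph built from plain documents can still be traversed. In ArangoDB, every document carries _key and _id attributes. An edge document additionally carries _from and _to attributes that hold the _id values of its two endpoints. The helper names (DocsAsGraph, vertex, edge, outbound) are made up for this example and are not driver calls.

```scala
// Hypothetical model: a vertex is an ordinary document, and an edge is an
// ordinary document that also carries _from and _to references.
object DocsAsGraph {
  type Doc = Map[String, String]

  // A vertex: nothing but a document with _id = "collection/key".
  def vertex(collection: String, key: String): Doc =
    Map("_id" -> s"$collection/$key", "_key" -> key)

  // An edge: a document whose _from/_to point at the endpoints' _id values.
  def edge(collection: String, key: String, from: Doc, to: Doc): Doc =
    Map(
      "_id" -> s"$collection/$key",
      "_key" -> key,
      "_from" -> from("_id"),
      "_to" -> to("_id")
    )

  // One-hop outbound neighbours, found purely by matching _from values.
  def outbound(start: String, edges: Seq[Doc]): Seq[String] =
    edges.filter(_("_from") == start).map(_("_to"))
}
```

Because traversal only depends on the _from and _to attributes, the result is the same whichever API inserted the documents. This is consistent with the observation that the graph built with createDocument and createEdge works with AQL.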
Closing Thoughts
Overall, ArangoDB is quite capable. Aside from the few gotchas I mentioned in the previous section, everything functions as advertised, and the performance is acceptable. Supporting different data models in one database can be convenient, although I am only concerned with the graph capabilities. In this post, I only evaluated graph creation time. Another metric I care about is traversal time. I have code that measures it as well, but I will leave that for a later post.