Why ArangoDB?

At work, we are looking into incorporating a graph database into our next-generation stack. A graph database fits our business domain and business model very nicely, and many of our queries involve some form of relationship traversal.

I have previous experience with Neo4j and think it is very good at its job. On top of that, I think Cypher, Neo4j’s query language, is well designed and intuitive. However, due to licensing issues, the higher-ups did not approve of using Neo4j. Then my immediate manager found out about ArangoDB, so off I went with my teammate to evaluate it.

What is ArangoDB?

ArangoDB is a multi-model database. It supports document, key/value, and graph data models. Quite honestly, when I first read about the multi-model approach, I felt skeptical. Supporting several models could lead to compromises, which may mean ArangoDB is a "jack of all trades, master of none".

Evaluation

I wrote this post before ArangoDB 3.0.0 came out. For the following evaluation, I am using version 2.8.9. The results for 3.0.0 will be posted shortly. The evaluation was done on a 15-inch MacBook Pro (mid-2014 model). The test measures how long ArangoDB takes to create 50,000 vertices and 49,999 edges. The test is very simple, but it revealed something peculiar about ArangoDB. The evaluation code, written in Scala, can be found in this gist.
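For readers who do not want to open the gist, here is a minimal sketch of how such an elapsed-time measurement could be taken in Scala. The `Timing` object and the `timed` helper are names I made up for illustration, and the actual code in the gist may be structured differently.

```scala
object Timing {
  // Hypothetical helper: runs `block` once and returns its result together
  // with the elapsed wall-clock time in milliseconds.
  def timed[A](label: String)(block: => A): (A, Long) = {
    println(s"--> $label")
    val start = System.nanoTime()
    val result = block // the work being measured, e.g. a loop of insert calls
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"Elapsed time: ${elapsedMs}ms")
    (result, elapsedMs)
  }
}
```

A call such as `Timing.timed("Creating 50000 vertices - createDocument") { ... }` would wrap the insert loop. This is a single-run measurement with no warm-up, so the numbers should be read as rough indications rather than a rigorous benchmark.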

The Results

Below are the results for ArangoDB 2.8.9:

--> Creating 50000 vertices - graphCreateVertex
Elapsed time: 56748ms
--> Creating 49999 edges - graphCreateEdge
Elapsed time: 57497ms
--> Creating 50000 vertices - createDocument
Elapsed time: 9987ms
--> Creating 49999 edges - createEdge
Elapsed time: 9283ms
--> Creating 50000 vertices - createDocument (batched 100)
Elapsed time: 3520ms
--> Creating 49999 edges - createEdge (batched 100)
Elapsed time: 3942ms

As you can see, switching from graphCreateVertex and graphCreateEdge to createDocument and createEdge brings the creation times for the vertices and edges down from around 57 seconds to around 10 seconds. With batching enabled, the numbers go down even further. All in all, the times go down from around 57s to just 4s. That is a big jump! What is surprising is the time difference between the different API calls, namely, graphCreateVertex vs. createDocument and graphCreateEdge vs. createEdge.
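To picture why batching helps, here is a rough sketch that assumes "batched 100" means requests are grouped 100 at a time, so that each group costs one round trip instead of 100. The `Batching` object, `inBatches`, and the `send` callback are hypothetical stand-ins and not part of any ArangoDB driver.

```scala
object Batching {
  // Splits `items` into chunks of at most `batchSize` and hands each chunk
  // to `send`, which stands in for a single round trip to the database.
  // Returns the number of round trips that were made.
  def inBatches[A](items: Seq[A], batchSize: Int)(send: Seq[A] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var roundTrips = 0
    items.grouped(batchSize).foreach { chunk =>
      send(chunk)
      roundTrips += 1
    }
    roundTrips
  }
}
```

Under that assumption, both the 50,000 vertices and the 49,999 edges need 500 round trips at a batch size of 100, since the last edge batch simply holds 99 items.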

As far as I can tell, the difference between graphCreateEdge and createEdge is that the former performs validation: it makes sure the two vertices are present. The latter does not perform such validation.
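The following toy model illustrates that distinction. It is an in-memory stand-in that I wrote for this explanation, not ArangoDB's implementation or its driver API, and `TinyGraph`, `addEdgeChecked`, and `addEdgeUnchecked` are made-up names.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a graph store.
final case class Edge(from: String, to: String)

class TinyGraph {
  private val vertices = mutable.Set.empty[String]
  private val edges = mutable.ArrayBuffer.empty[Edge]

  def addVertex(key: String): Unit = {
    vertices += key
    ()
  }

  // Unchecked insert: stores the edge without looking at the vertices,
  // so an edge pointing at a missing vertex is accepted.
  def addEdgeUnchecked(from: String, to: String): Unit = {
    edges += Edge(from, to)
    ()
  }

  // Checked insert: two extra lookups before the edge is stored.
  def addEdgeChecked(from: String, to: String): Either[String, Unit] =
    if (!vertices.contains(from)) Left(s"missing vertex: $from")
    else if (!vertices.contains(to)) Left(s"missing vertex: $to")
    else {
      edges += Edge(from, to)
      Right(())
    }

  def edgeCount: Int = edges.size
}
```

In this toy version the extra cost is two set lookups per edge. In a real database each lookup could mean additional work on the server, which may account for part of the gap, although I have not verified that this is what happens inside ArangoDB.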

However, I cannot really tell the difference between graphCreateVertex and createDocument. At first glance, one may say graphCreateVertex is for creating a vertex, and createDocument is for creating a document. However, as far as I understand, a vertex in ArangoDB is just a document. Moreover, the graph created by createDocument and createEdge works just fine with AQL, ArangoDB’s query language. The outcome seems to be the same, but createDocument is significantly faster than graphCreateVertex.

Closing Thoughts

Overall, ArangoDB is quite capable. Aside from the few gotchas I mentioned in the previous section, everything functions as advertised, and the performance is acceptable. Being able to support different data models in ArangoDB can be convenient, though I am only concerned with the graph capabilities. In this post, I only evaluated the graph creation time. Another metric I care about is the traversal time. I have code that measures the traversal time as well, but I will leave that for a later post.