
[_] ArangoDB anyone?

Russ Topia russf at topia.com
Wed May 18 21:37:54 BST 2016

Without knowing all the details, I’ll just say that graph databases are GREAT when you have a growing set of different types of entity, and the relationships are flexible and evolve at runtime. Of course you can do the equivalent in relational, but you need a relationship join table that names the type of each relation and its end points. In a graph database you just add edges between nodes, name the type of the edge, and sometimes add attributes to the edge. (So you could have a married relation, with start/end dates.) Neo4j and Titan can be used schemaless, where your latest code might simply start adding a new type of edge/node, and doing a new type of query. You can also use model layers to impose schema. Often these model layers tolerate attributes that they don’t recognise, which is helpful for evolving applications.
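
To make the "typed edges with attributes" idea concrete, here is a minimal in-memory sketch in Python. It is not the API of Neo4j, Titan or ArangoDB; the `Graph` and `Edge` classes, the node names and the dates are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Edge:
    start: str                      # key of the node the edge leaves
    end: str                        # key of the node the edge arrives at
    type: str                       # name of the relationship, e.g. "married"
    attrs: Dict[str, Any] = field(default_factory=dict)


class Graph:
    """Toy property graph: no schema, nodes and edges carry free-form attributes."""

    def __init__(self):
        self.nodes = {}             # node key -> attribute dict
        self.edges: List[Edge] = []

    def add_node(self, key, **attrs):
        self.nodes.setdefault(key, {}).update(attrs)

    def add_edge(self, start, end, type, **attrs):
        # Unknown end points are created on the fly, as in a schemaless store.
        self.add_node(start)
        self.add_node(end)
        self.edges.append(Edge(start, end, type, attrs))

    def neighbours(self, key, type):
        return [e.end for e in self.edges if e.start == key and e.type == type]


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
# An edge with its own attributes (hypothetical dates).
g.add_edge("alice", "bob", "married", start_date="2001-06-02", end_date="2010-01-15")
# A brand-new edge type and node appear without any migration step.
g.add_edge("alice", "acme", "works_for")
```

In a relational design, the `married` and `works_for` lines would each need a row in a join table whose columns had been agreed in advance; here the new edge type simply shows up in the data.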

These things come into their own when, for example, you are scraping, or getting stuff back from API calls. In SQL, you’d worry about whether or not to add columns to an entity, or you’d have to decide whether a relationship was important enough to implement. In a schemaless graph, if you start finding new relationships or attributes offered in your data, you simply slam them in the DB, and decide later if you want to exploit them or not. It’s a fundamentally different approach.
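
A rough sketch of that "store it now, decide later" style of ingestion, again in plain Python rather than any real database API. The `ingest` function, the `_id` naming convention for references and the sample records are assumptions made purely for illustration.

```python
def ingest(store, record, id_field="id", ref_suffix="_id"):
    """Store every field of a record without checking it against a schema.

    Assumption: fields whose name ends in `ref_suffix` point at other
    entities and become edges; everything else becomes a node attribute.
    """
    key = record[id_field]
    node = store["nodes"].setdefault(key, {})
    for name, value in record.items():
        if name == id_field:
            continue
        if name.endswith(ref_suffix):
            edge_type = name[: -len(ref_suffix)]      # "author_id" -> "author"
            store["edges"].append((key, edge_type, value))
        else:
            node[name] = value


store = {"nodes": {}, "edges": []}

# First record seen from a hypothetical API.
ingest(store, {"id": "doc1", "title": "Q1 outlook"})
# A later record offers a new attribute and a new relationship; both are kept.
ingest(store, {"id": "doc2", "title": "Rates note", "author_id": "p7", "sentiment": 0.4})
```

Nothing here had to know about `sentiment` or `author` ahead of time; whether they are worth querying can be decided once enough of them have accumulated.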

Another feature is that you can have processes crawling over the graph, increasing the value of the data by adding higher-level derived information, pruning, modifying, etc. There are strategies to ensure consistency and reliability, but you don’t get those for free ;)

With Titan, it’s convenient to use the graph to find keys for much bigger data rows in other keyspaces. So you can use the graph to hold the index-like information/metadata/inferences on data that comes from another part of the database in a more column/table model.
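
The "graph as an index into bigger rows" pattern can be sketched like this. The two dictionaries stand in for a graph and for a separate wide-row keyspace; none of this is Titan's actual API, and the keys and values are invented.

```python
# Stand-in for wide rows held in another keyspace (hypothetical data).
rows = {
    "dl:1001": {"doc": "Q1 outlook", "bytes": 48213, "ts": "2016-05-01T09:12:00"},
    "dl:1002": {"doc": "Rates note", "bytes": 10877, "ts": "2016-05-02T14:40:00"},
    "dl:1003": {"doc": "Q1 outlook", "bytes": 48213, "ts": "2016-05-03T08:05:00"},
}

# Stand-in for the graph: light edges whose end point is just a row key.
edges = [
    ("reader:ann", "downloaded", "dl:1001"),
    ("reader:ann", "downloaded", "dl:1002"),
    ("reader:raj", "downloaded", "dl:1003"),
]


def downloads_for(reader):
    """Walk the graph to collect row keys, then fetch the heavy rows by key."""
    keys = [dst for src, etype, dst in edges
            if src == reader and etype == "downloaded"]
    return [rows[k] for k in keys if k in rows]
```

The graph stays small because it only holds keys and relationships, while the bulky payload is fetched by key from wherever it lives.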

Anyway, I think I crossed the line into selling, somewhere back there.

Clarification: I don’t sell technology, but I do consult to help people figure out their needs. Conversations like this are dangerous, since one needs to know more context before any recommendations are really valid. I’m providing the above just to level the playing field so you can see if this is worth deeper consideration or not.

Best,

—r


> On 18 May 2016, at 20:58, James Geldart <james at nuvola.co.uk> wrote:
> 
> Russ, Adam
> 
> Thanks for the replies (and the interesting related responses on the social media platform question). I am pretty much a relational db guy so this is all new to me. The main thing we're trying to do is put something together which is very searchable and on which some logic can be applied that can easily create semantic links between different datasets and within them in some form of automated way. No they aren't particularly well defined at this stage!
> 
> This is going to be replacing an existing platform that is just a bunch of pretty slow reports on a large MySQL database of document downloads (client is a financial publishing company). We want to increase the performance and use related datasets to inject some intelligence/insights. MarkLogic looks pretty awesome but as you say expensive and it looks like we'd be leaning on their professional services a lot, so even more expensive, so I'm looking at alternatives.
> 
> Cheers
> James
> 
> On 18/05/2016 17:47, Adam Retter wrote:
>>> Does anyone here have any experience using ArangoDB?
>> Nope. But I met the core developer at a conference in Amsterdam and he
>> seemed like a nice guy who knows what he is talking about (disclaimer
>> - I build database engines). The most interesting thing about Arango
>> perhaps is their query language, which is in many ways similar to
>> XQuery yet not XQuery.
>> 
>>> I am looking at it as an option for the back end for a data crunching web
>>> and mobile app for one of my clients. The brief is broadly to combine a load
>>> of readership stats with contact and other data to produce insights to my
>>> client's clients. ArangoDB seems to compare favourably against things like
>>> Neo4j, MarkLogic and Elasticsearch and is completely open source as far as I
>>> can see.
>> Those are 4 very different databases with very different data models.
>> I can only guess that maybe you haven't investigated that too closely.
>> Your best bet is to choose a database type that matches the shape of
>> your data, and then start narrowing down the choices based on query
>> language, features and performance etc.
>> 
>> To clarify:
>> 
>> Neo4j is a graph store. I would like to say that it's a triple store,
>> but it's more than that.
>> MarkLogic is a native XML database (i.e. it's a tree store). It is
>> also insanely expensive and requires a lot of expertise.
>> Elasticsearch is a fancy search engine based on Apache Lucene... and
>> is *kinda* a JSON document store.
>> ArangoDB is a poly-store, currently supporting key/value, document and
>> graph data models.
>> 
>> 
>> ...So what shape is your data?  ...also if it looks like tabular data,
>> then probably you should just use PostgreSQL which is excellent and is
>> a well trodden path.
>> 
>> 
> 
> -- 
> *James Geldart*
> Nuvola Ltd
> 07968 210725
> uk.linkedin.com/in/jamesgeldart <https://uk.linkedin.com/in/jamesgeldart>
> -- 
> underscore_ list info/archive -> http://www.under-score.org.uk/mailman/listinfo/underscore
Russ Ferriday -- Software Product Architect, Developer, Mentor
Founder & CTO Topia Systems Ltd.
russf at topia.com  --  +44 7429 518822