The most important feature of a graph database is the management of relationships. Many users come to OrientDB from MongoDB or other document databases because they lack efficient support of relationships.
The relational model (and RDBMS - relational database management systems) has long been thought to be the best way to handle relationships. Graph databases suggest a more modern approach to this topic.
Most database developers are familiar with the relational model given it's 30+ years of dominance, spreading over generations of developers. Let's review how these systems manage relationships. As an example, we will use the relationships between the Customer and Address tables.
RDBMSs store the value of the target record in the "address" column of the Customer table. This is called a foreign key. The foreign key points to the primary key of the related record in the Address table:
To retrieve the address pointed to by customer "Luca", the query in a RDBMS would be:
SELECT B.location FROM Customer A, Address B WHERE A.name = 'Luca' AND A.address = B.id
This is a JOIN! A JOIN is executed at run-time every time you retrieve a relationship.
Since RDBMS have no concept of collections the Customer table cannot have multiple foreign keys. The way to manage a 1-to-Many relationship is by moving the foreign key to the Address table.
To extract all addresses of Customer 'Luca', the query in RDBMS reads:
SELECT B.location FROM Customer A, Address B WHERE A.name = 'Luca' AND B.customer = A.id
The most complex case is the Many-to-Many relationship. To handle this type of association, RDBMSs need a separate, intermediary table that matches both Customer and Addresses in all required combinations. This results in a double JOIN per record at runtime;
To extract all addresses of Customer 'Luca's the query in RDBMS becomes:
SELECT B.location FROM Customer A, Address B, CustomerAddress C WHERE A.name = 'Luca' AND B.id = A.id AND B.address = C.id
With document and relational DBMS, the more data you have, the slower the database will perform. Joins have heavy runtime costs. In comparison, OrientDB handles relationships as physical links to the records, assigned only once when the edge is created O(1). Compare this to an RDBMS that “computes“ the relationship every single time you query a database O(LogN). With OrientDB, speed of traversal is not affected by the database size. It is always constant regardless if it has one record or 100 billion records. This is critical in the age of Big Data.
Searching for an ID at runtime each time you execute a query, for every record could be very expensive! The first optimization with RDMS is using indexes. Indexes speed up searches but they slow down INSERT
, UPDATE
and DELETE
operations. In addition, they occupy substantial space on disk and in memory. You also need to qualify - are you sure the lookup into an index is actually fast? Let's try to understand how indexes work.
The database industry has plenty of indexing algorithms. The most common in both Relational and NoSQL DBMS is the B+Tree. All balanced trees work in similar ways. Here is and example of how it would work when you're looking for "Luca": after only 5 hops the record is found.
But what if there were millions or billions of records? There would be many, many more hops. And this operation is executed on every JOIN per record! Imagine joining 4 tables with thousands of records: the number of JOINS could be in the millions!
OrientDB doesn't use JOINs. Instead it uses LINKs. A LINK is a relationship managed by storing the target RID in the source record. It's much like storing a pointer between 2 objects in memory. When you have Invoice -> Customer, then you have a pointer to Customer inside Invoice as an attribute. It's exactly the same. In this way it's like your database was in memory, a memory of several exabytes.
What about 1-to-N relationships? These relationships are handled as a collection of RIDs, like you would manage objects in memory. OrientDB supports different kinds of relationships:
Map<?,Record>
.