OrientDB Manual 1.7.8

Import from RDBMS to Graph Model

This guide is to import an exported relational database into OrientDB using the Graph Model. If you're using the Document Model, look at Import into Document Model.

OrientDB supports a subset of SQL, so importing a database created as "Relational" is straightforward. You can import a database using the API, the OrientDB Studio visual tool or the Console. In this guide the console is used.

For the sake of simplicity consider your Relational database having just these two tables:

  • POST
  • COMMENT

Where the relationship is between Post and comment as One-2-Many.

TABLE POST:
+----+----------------+
| id | title          |
+----+----------------+
| 10 | NoSQL movement |
| 20 | New OrientDB   |
+----+----------------+

TABLE COMMENT:
+----+--------+--------------+
| id | postId | text         |
+----+--------+--------------+
|  0 |   10   | First        |
|  1 |   10   | Second       |
| 21 |   10   | Another      |
| 41 |   20   | First again  |
| 82 |   20   | Second Again |
+----+--------+--------------+

Since the Relational Model hasn't Object Oriented concepts you can create a class per table in OrientDB. Furthermore in the RDBMS references One-2-Many are inverted from the target table to the source one. In OrientDB the Object Oriented model is respected and you've a collection of links from POST to COMMENT instances. In a RDBMS you have:

Table POST    <- (foreign key) Table COMMENT

In OrientDB the Graph model uses Edges to manage relationships:

Class POST (extends V) <----(collection of edges)----> Class COMMENT (extends V)

Export your Relational Database

Most of Relational DBMSs provide a way to export a database in SQL format. What you need is a text file containing the SQL INSERT commands to recreate the database from scratch. Take a look to the documentation of your RDBMS provider. Below the link to the export utilities for the most common RDBMSs:

At this point you should have a .sql file containing the Relational database exported in SQL format like this:

DROP TABLE IF EXISTS post;
CREATE TABLE post (
id int(11) NOT NULL auto_increment,
title varchar(128),
PRIMARY KEY (id)
);

DROP TABLE IF EXISTS comment;
CREATE TABLE comment (
id int(11) NOT NULL auto_increment,
postId int(11),
text text,
PRIMARY KEY (id),
CONSTRAINT `fk_comments`
    FOREIGN KEY (`postId` )
    REFERENCES `post` (`id` )
);

INSERT INTO POST (id, title) VALUES( 10, 'NoSQL movement' );
INSERT INTO POST (id, title) VALUES( 20, 'New OrientDB' );

INSERT INTO COMMENT (id, postId, text) VALUES( 0, 10, 'First' );
INSERT INTO COMMENT (id, postId, text) VALUES( 1, 10, 'Second' );
INSERT INTO COMMENT (id, postId, text) VALUES( 21, 10, 'Another' );
INSERT INTO COMMENT (id, postId, text) VALUES( 41, 20, 'First again' );
INSERT INTO COMMENT (id, postId, text) VALUES( 82, 20, 'Second Again' );

Modify the SQL script

What we're going to do is to change the generated SQL file to be imported into a OrientDB database. Don't execute the following commands but include them into the SQL file to be executed in batch mode by the OrientDB Console.

What database to use?

Before to import the database you need an open connection to a OrientDB database. You can create a brand new database or use an existent one. You can use a volatile in-memory only database or a persistent disk-based one.

For persistent databases you can choose to create it in a remote server or locally using the "plocal" mode avoiding the server at all. This is suggested to have better performance on massive insertion.

Create a new database

Use the embedded mode

create database plocal:/tmp/db/blog admin admin plocal document

This will create a new database under the directory "/tmp/db/blog".

Use the remote mode

Or start a OrientDB server and create a database using the "remote" protocol in the connection URL. Example:

create database remote:localhost/blog root dkdf383dhdsj plocal document

NOTE: When you create a remote database you need the server's credentials to do it. Use the user "root" and the password stored in config/orientdb-server-config.xml file.

Use an existent database

Use the embedded mode

If you already have a database where to import, just open it:

connect plocal:/tmp/db/blog admin admin

Use the remote mode

connect remote:localhost/blog admin admin

Declare the 'massive insert' intent

In order to obtain the maximum of performance you can tell to OrientDB what you're going to do. These are called "Intents". The "Massive Insert" intent will auto tune the OrientDB engine for fast insertion.

Add the following line:

DECLARE INTENT massiveinsert

Create the classes, one for tables

Since the Relational Model hasn't Object Oriented concepts you can create a class per table. Change the CREATE TABLE ... statements with CREATE CLASS. Since you're using the Graph Model, let all the classes to extend the base Vertex class "V". Example:

CREATE CLASS POST EXTENDS V
CREATE CLASS COMMENT EXTENDS V

Pseudo Object Oriented database

This is the case when your Relational database was created using a OR-Mapping tool like Hibernate or Data Nucleus (JDO).

In this case you have to re-build the original Object Oriented structure directly in OrientDB using the Object Oriented capabilities of OrientDB.

Convert INSERT INTO into CREATE VERTEX

By using the Graph Model the best way to create vertices is using the SQL Create Vertex command. In this case it's easier to leave the original exported file as much as we can, so the INSERT INTO statements will be ok. Leave unaltered all the INSERT INTOstatements.

Create edges between vertices

At this point you need to create edges as relationships between vertices. The Create Edge command creates edges between two or more vertices.

In the Relational world relationships are resolved as foreign keys. Using OrientDB, instead, you have direct connections. So the navigation is from Post to Comment and not viceversa as for Relational model. For this reason you need to create edges as INVERSE.

Execute (this command is still in draft):

CREATE EDGE e FROM comment JOIN post ON postId = id INVERSE

Remove old constraints

This is an optional step. Now you've direct links the field 'postId' has no more sense, so remove it:

UPDATE comment REMOVE postId

Expected output

After these steps the expected output should be similar to that below:

CONNECT plocal:/tmp/db/blog admin admin

DECLARE INTENT massiveinsert

CREATE CLASS POST EXTENDS V
CREATE CLASS COMMENT EXTENDS V

INSERT INTO POST (id, title) VALUES ( 10, 'NoSQL movement' );
INSERT INTO POST (id, title) VALUES ( 20, 'New OrientDB' );

INSERT INTO COMMENT (id, postId, text) VALUES ( 0, 10, 'First' );
INSERT INTO COMMENT (id, postId, text) VALUES ( 1, 10, 'Second' );
INSERT INTO COMMENT (id, postId, text) VALUES ( 21, 10, 'Another' );
INSERT INTO COMMENT (id, postId, text) VALUES ( 41, 20, 'First again' );
INSERT INTO COMMENT (id, postId, text) VALUES ( 82, 20, 'Second Again' );

CREATE EDGE E TRANSFORM comment.postId TO post.id INVERSE
UPDATE comment REMOVE postId

Import the records

Now you have modified the SQL script execute it by invoking the console tool in batch mode (text file just created as first argument). Example of importing the file called "database.sql":

    $ console.sh database.sql

Enjoy

That's all. Now you've a OrientDB database where relationships are direct without JOINS.

Now enjoy with your new document-graph database and the following queries:

Select all the post with comments:

orientdb> select * from post where comments.size() > 0

Select all the posts where comments contain the word 'flame' in the text property (before as column):

orientdb> select * from post where comments contains ( text like '%flame%' )

Select all the posts commented today. In this case we're assuming a property "date" is present in Comment class:

orientdb> select * from post where comments contains ( date > '2011-04-14 00:00:00' )

To know more about other SQL commands look at SQL commands.

This is a command of the Orient console. To know all the commands go to Console-Commands.