The OrientDB-ETL module is a tool for moving data to and from OrientDB by executing an ETL (Extract, Transform, Load) process. It is easy to use. OrientDB ETL is based on the following principle:
EXTRACTOR => TRANSFORMERS[] => LOADER
Example of a process that extracts records from a CSV file, applies some changes, looks up whether each record has already been created, and then stores the record as a document in an OrientDB database:
+-----------+-----------------------+-----------+
|           |              PIPELINE             |
+ EXTRACTOR +-----------------------+-----------+
|           |     TRANSFORMERS      |  LOADER   |
+-----------+-----------------------+-----------+
|   FILE   ==>  CSV->FIELD->MERGE  ==> OrientDB |
+-----------+-----------------------+-----------+
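The flow in the diagram above can be sketched in plain Python to show how the three stages fit together. This is only a conceptual illustration, not OrientDB ETL code: every name here (`extract`, `csv_transformer`, `field_transformer`, `merge_transformer`, `make_loader`, `run_pipeline`) is hypothetical, and a dictionary stands in for the OrientDB database.

```python
import csv
import io


def extract(source):
    """EXTRACTOR (hypothetical): yield one raw text line at a time."""
    for line in source:
        yield line.rstrip("\n")


def csv_transformer(header):
    """TRANSFORMER: parse a CSV line into a dict keyed by the given header."""
    def transform(line):
        values = next(csv.reader([line]))
        return dict(zip(header, values))
    return transform


def field_transformer(name, func):
    """TRANSFORMER: set the field `name` to the value of func(record)."""
    def transform(record):
        updated = dict(record)
        updated[name] = func(record)
        return updated
    return transform


def merge_transformer(store, key):
    """TRANSFORMER: look up an already stored record by `key` and merge
    the incoming fields over it (incoming values win)."""
    def transform(record):
        existing = store.get(record[key], {})
        return {**existing, **record}
    return transform


def make_loader(store, key):
    """LOADER: save the record into `store`, a dict standing in for the database."""
    def load(record):
        store[record[key]] = record
    return load


def run_pipeline(source, transformers, loader):
    """EXTRACTOR => TRANSFORMERS[] => LOADER, one record at a time."""
    for item in extract(source):
        for transform in transformers:
            item = transform(item)
        loader(item)


# One record with id "1" already exists; the CSV updates it and adds id "2".
database = {"1": {"id": "1", "name": "jay", "city": "Rome"}}
source = io.StringIO("1,Jay\n2,Luca\n")

run_pipeline(
    source,
    [
        csv_transformer(["id", "name"]),
        field_transformer("name", lambda r: r["name"].upper()),
        merge_transformer(database, "id"),
    ],
    make_loader(database, "id"),
)
print(database)
```

After the run, the record with id "1" keeps its existing `city` field while its `name` is replaced by the transformed value, and the record with id "2" is created from scratch.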
The pipeline, made up of the transformation and loading phases, can run in parallel by setting {"parallel":true} in the configuration.
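As a rough illustration of what running the pipeline in parallel can mean, the Python sketch below keeps extraction sequential and hands each extracted record to a thread pool that runs the transformers and the loader. It is an assumption-laden model, not the OrientDB ETL implementation: the functions `extract`, `run_pipeline`, and `run_etl` are hypothetical, and the `parallel` argument merely mirrors the configuration flag described above.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(rows):
    """EXTRACTOR (hypothetical): runs sequentially, one record at a time."""
    yield from rows


def run_pipeline(record, transformers, loader):
    """PIPELINE: apply every transformer in order, then load the result."""
    for transform in transformers:
        record = transform(record)
    loader(record)


def run_etl(rows, transformers, loader, parallel=False):
    """Run the pipeline for each extracted record, optionally in parallel."""
    if not parallel:
        for record in extract(rows):
            run_pipeline(record, transformers, loader)
        return
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(run_pipeline, record, transformers, loader)
            for record in extract(rows)
        ]
        for future in futures:
            future.result()  # re-raise any error from a worker thread


loaded = []  # stands in for the target database
run_etl(
    [{"n": 1}, {"n": 2}, {"n": 3}],
    [lambda r: {**r, "double": r["n"] * 2}],
    loaded.append,
    parallel=True,
)
# Load order is not guaranteed in parallel mode, so sort before printing.
print(sorted(r["double"] for r in loaded))
```

Because records may be loaded in any order in this model, anything that depends on ordering between records (for example, a lookup that expects an earlier record to be stored already) would need extra care.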
Starting from OrientDB v2.0, the ETL module will be bundled with the official release. If you want to use it, follow these steps:
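1. Clone the repository: git clone https://github.com/orientechnologies/orientdb-etl.git
2. Build the module: mvn clean install
3. Copy script/oetl.sh (or .bat under Windows) to $ORIENTDB_HOME/bin
4. Copy target/orientdb-etl-2.0-SNAPSHOT.jar to $ORIENTDB_HOME/lib

To run an ETL process, pass a configuration file to the script:
$ cd $ORIENTDB_HOME/bin
$ ./oetl.sh config-dbpedia.json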
Examples:
Roadmap: The project is in beta status, but will be final with OrientDB 2.0, when we'll provide more components.