OrientDB Manual 1.7.8

Sharding

Many NoSQL products shard against a hashed key (see MongoDB or any DHT system like DynamoDB). This does not work well in a Graph Database, where traversing relationships means jumping from one node to another with very high probability.

So the sharding strategy is left to the developer/DBA. How? By defining multiple clusters per class. Example of splitting the class “Client” into 3 clusters:

Class Client -> Clusters [ client_0, client_1, client_2 ]

This means that OrientDB will consider any record/document/graph element stored in any of these clusters as a “Client” (the Client class relies on these clusters). In the Distributed-Architecture, each cluster can be assigned to one or more server nodes.
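
If the clusters do not exist yet, they can be created up front with standard SQL commands, for instance issued through the Java Document API. The following is only a sketch: the connection URL and the "admin" credentials are placeholders, and it assumes that ALTER CLASS ... ADDCLUSTER creates a missing cluster (otherwise create it first with CREATE CLUSTER):

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.sql.OCommandSQL;

public class CreateClientShards {
  public static void main(String[] args) {
    // Placeholder URL and credentials: adjust them to your own deployment
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // Create the class, then add one cluster per shard
      db.command(new OCommandSQL("CREATE CLASS Client")).execute();
      db.command(new OCommandSQL("ALTER CLASS Client ADDCLUSTER client_0")).execute();
      db.command(new OCommandSQL("ALTER CLASS Client ADDCLUSTER client_1")).execute();
      db.command(new OCommandSQL("ALTER CLASS Client ADDCLUSTER client_2")).execute();
    } finally {
      db.close();
    }
  }
}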

Shards, being based on clusters, work against both indexed and non-indexed classes/clusters.

Multiple servers per cluster

You can assign each cluster to one or more servers. If more than one server is listed, the records will be copied to all of those servers. This is similar to what RAID does for disks.

This is an example of configuration where the Client class has been split into the 3 clusters client_0, client_1 and client_2, each one with a different configuration:

  • client_0, will write to “europe” and “usa” nodes
  • client_1, only to “usa” node
  • client_2, to all the nodes (it would be equivalent to writing “”, see the cluster ‘*’, the default one)

Configuration

In order to keep things simple, the entire OrientDB Distributed Configuration is stored in a single JSON file. Example of a distributed database configuration for the “Multiple servers per cluster” use case described above:

{
  "autoDeploy": true,
  "hotAlignment": false,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "client_0": {
      "servers" : [ "europe", "usa" ]
    },
    "client_1": {
      "servers" : [ "usa" ]
    },
    "client_2": {
      "servers" : [ "asia", "europe", "usa" ]
    },
    "*": {
      "servers" : [ "<NEW_NODE>" ]
    }
  }
}

Autosharding

To enable autosharding, create one cluster per server for all the classes you want to shard automatically. OrientDB can do this for you every time a new class is created, once you execute this command:

alter database minimumclusters 4

This will create 4 clusters for every newly created class, so the following command will create the "Garden" class with 4 clusters (garden_0, garden_1, garden_2, garden_3):

create class Garden
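
The same two commands can be scripted through the Java Document API, and the resulting clusters inspected through the schema. This is only a sketch; the connection URL and the "admin" credentials are placeholders:

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.sql.OCommandSQL;

public class AutoShardingExample {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // From now on every new class gets at least 4 clusters
      db.command(new OCommandSQL("alter database minimumclusters 4")).execute();
      db.command(new OCommandSQL("create class Garden")).execute();

      // Inspect the clusters backing the new class (on a remote connection
      // the local schema cache may need to be reloaded first)
      OClass garden = db.getMetadata().getSchema().getClass("Garden");
      for (int clusterId : garden.getClusterIds())
        System.out.println(db.getClusterNameById(clusterId));
    } finally {
      db.close();
    }
  }
}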

By default each class uses the ROUND_ROBIN strategy, where the clusters are used in turn on insertion. This generally works well when you pre-create multiple clusters, but if you add further clusters to an existing class, they will contain 0 records at the beginning. In this case you can use the BALANCED strategy, which favors the emptiest cluster so that it fills faster than the others.
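
Assuming the CLUSTERSELECTION attribute of the ALTER CLASS command is available in this release (with the strategy names accepted in lower case), the strategy could be switched with a short sketch like the following; the connection URL and credentials are placeholders:

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.sql.OCommandSQL;

public class BalancedStrategyExample {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // Prefer the emptiest cluster when inserting new Client records
      db.command(new OCommandSQL("alter class Client clusterselection balanced")).execute();
    } finally {
      db.close();
    }
  }
}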

Define the target cluster/shard

The application can decide where to insert a new Client. To insert it into the first cluster, specify the cluster name.

SQL

INSERT INTO Client CLUSTER:client_0 SET name = 'Jay'

Java Graph API

OrientVertex v = graph.addVertex("class:Client,cluster:client_0");
v.setProperty("name", "Jay");

Java Document API

ODocument doc = new ODocument("Client");
doc.field("name", "Jay");
doc.save( "client_0" );

In all the cases above, OrientDB will send the record to both the “europe” and “usa” nodes, because the cluster client_0 has been configured with 2 servers: [ "europe", "usa" ].

Retrieve records

Against a shard

Queries work by aggregating the result sets from all the involved nodes. Example of a query against a particular cluster:

select from cluster:client_0

This query will be executed only on the “europe” or “usa” node, because those are the servers hosting the cluster client_0.

Against all the shards

This query, instead, targets the whole class:

select from Client

It will be executed against all 3 clusters that make up the Client class, and therefore against all the involved nodes.
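
Both kinds of query can also be issued from the Java Document API. A minimal sketch, again with placeholder connection settings:

import java.util.List;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

public class ShardQueryExample {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // Query a single shard: only the servers hosting client_0 are involved
      List<ODocument> oneShard = db.query(new OSQLSynchQuery<ODocument>("select from cluster:client_0"));

      // Query the class: all the clusters, and therefore all the nodes, are involved
      List<ODocument> allShards = db.query(new OSQLSynchQuery<ODocument>("select from Client"));

      System.out.println(oneShard.size() + " records in client_0, " + allShards.size() + " in total");
    } finally {
      db.close();
    }
  }
}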

Map/Reduce

OrientDB supports Map/Reduce through OrientDB SQL. The Map/Reduce operation is totally transparent to the developer. When a query involves multiple shards (clusters), OrientDB executes the query against all the involved server nodes (Map operation) and then merges the results (Reduce operation). Example:

select max(amount), count(*), sum(amount) from Client

In this case the query is executed across all 3 nodes and the partial results are then merged on the node where the query was started.
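
The same aggregate query can be run from the Java Document API; the merged result comes back as a single document. This sketch uses placeholder connection settings and assumes the default projection aliases "max", "count" and "sum":

import java.util.List;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

public class MapReduceExample {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // Executed on every involved node (Map), merged on this node (Reduce)
      List<ODocument> result = db.query(new OSQLSynchQuery<ODocument>(
          "select max(amount), count(*), sum(amount) from Client"));

      ODocument row = result.get(0);
      System.out.println("max=" + row.field("max")
          + " count=" + row.field("count")
          + " sum=" + row.field("sum"));
    } finally {
      db.close();
    }
  }
}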

Limitations

  1. Hot changes of the distributed configuration are not available. This will be introduced in release 2.0, via the command line and visually in the Workbench of the Enterprise Edition (commercially licensed).
  2. Merging of results is not complete for all the projections. Some functions, like AVG(), do not work on map/reduce.

Indexes

All the indexes are managed locally on each server. This means that if a class is spanned across 3 clusters on 3 different servers, each server will have its own local indexes. When executing a distributed (Map/Reduce-like) query, each server uses its own indexes.
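
For instance, an index created with a plain CREATE INDEX statement is built by each server over its own local clusters, and a filtered query is then answered on every node using that node's local index before the results are merged. A minimal sketch with placeholder connection settings:

import java.util.List;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

public class LocalIndexExample {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      // Each server maintains its own copy of this index on its local clusters
      db.command(new OCommandSQL("create index Client.name on Client (name) notunique")).execute();

      // Each node answers its part of the query using its own local index
      List<ODocument> jays = db.query(new OSQLSynchQuery<ODocument>("select from Client where name = 'Jay'"));
      System.out.println(jays.size() + " clients named Jay");
    } finally {
      db.close();
    }
  }
}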

Hot management of distributed configuration

With the Community Edition, the distributed configuration cannot be changed at run-time; you have to stop and restart all the nodes. The Enterprise Edition allows you to create and drop new shards without stopping the distributed cluster.

By using the Enterprise Edition and the Workbench, you can deploy the database to the new server and define the clusters to assign to it. In this example a new server "usa2" is created, and only the cluster "client_1" is copied to it. After the deployment, the cluster "client_1" will be replicated across the nodes "usa" and "usa2".
