OrientDB Manual 1.7.8

Clusters

We've already talked about classes. A class is a logical concept in OrientDB. Clusters are the other important concept: they are where records (documents or vertices) are physically stored.

What is a cluster?

A cluster is a place where a group of records is stored. Perhaps the closest equivalent in the relational world is a table. By default, OrientDB creates one cluster per class, and all the records of a class are stored in that cluster, which has the same name as the class.

Understanding how classes and clusters relate lets you take advantage of clusters when designing a new database.

Although by default each class maps to a single cluster, a class can rely on multiple clusters: you create several clusters and spread the class's records physically across them. For example:

[Figure: Class-Cluster]

The class "Customer" relies on two clusters:

  • USA_customers, containing all USA customers. This is the default cluster, marked with a red star in the figure.
  • China_customers, containing all Chinese customers.

The default cluster (in this case, USA_customers) is the one used whenever the generic class "Customer" is the target, for example when a record is saved to the class without naming a cluster. Example:

[Figure: Class-Cluster]
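The routing rule just described can be sketched with a toy in-memory model. ClassDef, add_cluster and insert are hypothetical names, not OrientDB's API; the sketch only assumes what the text states: a class owns one or more clusters, one of them is the default, and a record saved to the generic class lands in that default cluster unless a cluster is named explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Toy model of a class backed by one or more clusters (hypothetical, not OrientDB's API)."""
    name: str
    clusters: dict[str, list[dict]] = field(default_factory=dict)
    default_cluster: str = ""

    def add_cluster(self, cluster_name: str, default: bool = False) -> None:
        """Attach an empty cluster; the first one (or one flagged default) becomes the default."""
        self.clusters[cluster_name] = []
        if default or not self.default_cluster:
            self.default_cluster = cluster_name

    def insert(self, record: dict, cluster: str | None = None) -> str:
        """Store a record and return the name of the cluster that received it."""
        # No explicit cluster: the record goes to the class's default cluster.
        target = cluster or self.default_cluster
        self.clusters[target].append(record)
        return target


customer = ClassDef("Customer")
customer.add_cluster("USA_customers", default=True)
customer.add_cluster("China_customers")

print(customer.insert({"name": "Jay"}))                    # USA_customers
print(customer.insert({"name": "Li"}, "China_customers"))  # China_customers
```

The first cluster attached becomes the default automatically, which mirrors the out-of-the-box behavior of one cluster per class.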

When querying the "Customer" class, all the involved clusters are scanned:

[Figure: Class-Cluster]

If you know the location of the customer you're looking for, you can query the target cluster directly. This avoids scanning the other clusters and speeds up the query:

[Figure: Class-Cluster]
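The difference between querying the class and querying one cluster can be made concrete by counting scanned records. This is a toy sketch with hypothetical data and function names; in OrientDB SQL the two forms correspond roughly to selecting from the class (SELECT FROM Customer) and selecting from a single cluster with the cluster: prefix.

```python
# Hypothetical toy data: cluster name -> list of records.
clusters = {
    "USA_customers": [{"name": "Jay", "country": "USA"}, {"name": "Ann", "country": "USA"}],
    "China_customers": [{"name": "Li", "country": "China"}],
}
# The class "Customer" relies on both clusters.
class_clusters = {"Customer": ["USA_customers", "China_customers"]}


def query_class(class_name, predicate):
    """Scan every cluster the class relies on; return (matches, records_scanned)."""
    matches, scanned = [], 0
    for cluster_name in class_clusters[class_name]:
        for record in clusters[cluster_name]:
            scanned += 1
            if predicate(record):
                matches.append(record)
    return matches, scanned


def query_cluster(cluster_name, predicate):
    """Scan only the target cluster; return (matches, records_scanned)."""
    records = clusters[cluster_name]
    return [r for r in records if predicate(r)], len(records)


def is_li(record):
    return record["name"] == "Li"


print(query_class("Customer", is_li))           # ([{'name': 'Li', 'country': 'China'}], 3)
print(query_cluster("China_customers", is_li))  # ([{'name': 'Li', 'country': 'China'}], 1)
```

Both calls find the same record, but the direct cluster query touches one record instead of three; with real data sets the saving grows with the size of the clusters that are skipped.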

The benefits of using different physical places to store records are:

  • faster queries, because only a subset of the class's clusters must be searched
  • good partitioning can reduce or remove the need for indexes
  • parallel queries when the clusters are on multiple disks
  • sharding large data sets across multiple disks or server instances
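The parallel-query benefit can be illustrated by scanning each cluster in its own worker and merging the partial results. This is a hypothetical sketch, not how OrientDB schedules queries internally: the cluster contents and function names are invented, and threads stand in for per-disk I/O (in CPython they overlap I/O-bound work, not CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clusters; imagine each one stored on a different disk.
clusters = {
    "USA_customers": [{"name": "Jay", "age": 41}, {"name": "Ann", "age": 29}],
    "China_customers": [{"name": "Li", "age": 35}, {"name": "Wei", "age": 52}],
}


def scan(cluster_name, min_age):
    """Scan one cluster and return the records matching the filter."""
    return [r for r in clusters[cluster_name] if r["age"] >= min_age]


def parallel_query(cluster_names, min_age):
    """Scan each cluster in its own worker thread and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(cluster_names)) as pool:
        partials = pool.map(scan, cluster_names, [min_age] * len(cluster_names))
        return [r for part in partials for r in part]


result = parallel_query(list(clusters), 40)
print(sorted(r["name"] for r in result))  # ['Jay', 'Wei']
```

Because each cluster is scanned independently, the merge step is a simple concatenation; the same split is what makes sharding across disks or servers possible.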

There are two types of clusters:

  • Physical clusters (known as "local"), which are persistent because they write directly to the file system
  • Memory clusters, where everything is volatile and is lost when the process terminates, or when the server shuts down if the database is remote

In most cases physical clusters are preferred, because the database must be persistent. OrientDB creates physical clusters by default, so you don't need to worry about this for now.

To view all clusters, run the clusters command from the console:

orientdb> clusters

CLUSTERS:
----------------------------------------------+------+---------------------+-----------+
 NAME                                         |  ID  | TYPE                | RECORDS   |
----------------------------------------------+------+---------------------+-----------+
 account                                      |    11| PHYSICAL            |      1107 |
 actor                                        |    91| PHYSICAL            |         3 |
 address                                      |    19| PHYSICAL            |       166 |
 animal                                       |    17| PHYSICAL            |         0 |
 animalrace                                   |    16| PHYSICAL            |         2 |
 ....                                         |  ....| ....                |      .... |
----------------------------------------------+------+---------------------+-----------+
 TOTAL                                                                           23481 |
---------------------------------------------------------------------------------------+

Since by default each class has its own cluster, we can query the database's users by class or by cluster:

orientdb> browse cluster OUser

---+---------+--------------------+--------------------+--------------------+--------------------
  #| RID     |name                |password            |status              |roles
---+---------+--------------------+--------------------+--------------------+--------------------
  0|     #5:0|admin               |{SHA-256}8C6976E5B5410415BDE908BD4DEE15DFB167A9C873FC4BB8A81F6F2AB448A918|ACTIVE              |[1]
  1|     #5:1|reader              |{SHA-256}3D0941964AA3EBDCB00CCEF58B1BB399F9F898465E9886D5AEC7F31090A0FB30|ACTIVE              |[1]
  2|     #5:2|writer              |{SHA-256}B93006774CBDD4B299389A03AC3D88C3A76B460D538795BC12718011A909FBA5|ACTIVE              |[1]
---+---------+--------------------+--------------------+--------------------+--------------------

The result is identical to that of browse class OUser, executed in the classes section, because in this example the OUser class has only one cluster.
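The RID column in the output ties each record to its cluster: an OrientDB record ID has the form #<cluster-id>:<cluster-position>, so #5:0, #5:1 and #5:2 are the first three records of cluster 5, the OUser cluster here. The parser below is a small illustrative sketch; parse_rid and RecordId are hypothetical helper names, not part of any OrientDB driver.

```python
import re
from typing import NamedTuple

# '#<cluster-id>:<cluster-position>'; negative values allowed for temporary RIDs.
RID_PATTERN = re.compile(r"#(-?\d+):(-?\d+)")


class RecordId(NamedTuple):
    cluster_id: int        # which cluster holds the record
    cluster_position: int  # position of the record inside that cluster


def parse_rid(text: str) -> RecordId:
    """Split an OrientDB-style RID such as '#5:0' into its two parts."""
    match = RID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a RID: {text!r}")
    return RecordId(int(match.group(1)), int(match.group(2)))


rids = ["#5:0", "#5:1", "#5:2"]  # RIDs from the browse output above
print({parse_rid(r).cluster_id for r in rids})  # {5}
```

All three users share cluster ID 5, which is why browsing the OUser cluster and browsing the OUser class return the same rows.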