2. Getting to know RethinkDB

Let’s warm up with some RethinkDB concepts, ideas and tools. In this chapter, things may be a bit confusing, because sometimes to understand concept A you need to understand B; to understand B you need C, which is based on A. So please follow your intuition, and don’t hesitate to do a quick lookup in the official docs to clear things up.

From now on, we will use the term ReQL to mean anything related to the RethinkDB query language, or query API.

Getting Started

It’s not uncommon to see an interactive shell written in the browser for evaluation purposes, such as mongly or Try Redis. That isn’t needed for RethinkDB, because it comes with an excellent editor where you can type code and run it.

Install RethinkDB by downloading the package for your platform from http://rethinkdb.com/docs/install/, then run it.

The Ports

By default, RethinkDB listens on three ports:

8080
this is the web user interface of RethinkDB, or the dashboard. You can query data and check performance and server status from that UI.
28015
this is the client driver port. All client drivers connect to RethinkDB through this port; a connection sketch follows this list. If you remember, in the previous chapter we used a tcpdump command to listen on this port and capture the data sent over it.
29015
this is the intracluster port; different RethinkDB nodes in a cluster communicate with each other via this port.
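
To see the client driver port in action outside the browser, here is a minimal sketch using the official JavaScript driver (the rethinkdb package on npm); the host and port are the defaults described above:

var r = require('rethinkdb');

// Connect to the client driver port (28015 by default).
r.connect({host: '127.0.0.1', port: 28015}, function(err, conn) {
  if (err) throw err;
  // Run a trivial query to confirm the connection works.
  r.dbList().run(conn, function(err, result) {
    if (err) throw err;
    console.log(result); // e.g. [ 'rethinkdb', 'test' ]
    conn.close();
  });
});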

The dashboard

Open your browser at http://127.0.0.1:8080 and say hello to RethinkDB. You can play around to see what you have.

Navigate to the Data Explorer tab, where you can type commands. Let’s start with

r.dbList()

Run it and you will see a list of databases.
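
If you want to poke around a bit more, here are a couple of other commands worth trying in the Data Explorer; both are read-only, so they are safe to run:

// list the tables inside the test database
r.db('test').tableList()

// the special rethinkdb database exposes system tables, e.g. server status
r.db('rethinkdb').table('server_status')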

RethinkDB object

Similar to traditional database systems, we also have databases in RethinkDB. A database contains many tables. Each table contains your JSON documents. Those JSON documents can contain any fields; a table doesn’t force a schema on them.

A JSON document is similar to a row in MySQL, and each field in the document is similar to a column in MySQL. When I say JSON document, I mean a JSON object with fields, not a single number, an array or a string. However, each field can contain any JSON data type.

More than that, the same field can accept any data type: in the same table, two documents can contain different data types in the same field.
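
To make that concrete, here is a sketch you can run in the Data Explorer; the fruits table is a made-up example (create it first with r.db('test').tableCreate('fruits')). Note how the price field holds a number in one document and a string in the other:

r.db('test').table('fruits').insert([
  {name: 'apple',  price: 2.5},          // price is a number here...
  {name: 'durian', price: 'negotiable'}  // ...and a string here, same field
])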

Durability

You will see an option/argument called durability appear in many ReQL options. Because it’s so common and very important, I want to address it here. Durability accepts a value of ‘soft’ or ‘hard’.

soft
means the writes will be acknowledged by the server immediately and the data will be flushed to disk in the background.
hard
The opposite of soft, and the default behaviour: writes are acknowledged only after the data is written to disk. Therefore, when you can afford to lose some data, such as when writing a cache or an unimportant log, you should set durability to soft to increase write speed, as in the sketch below.
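
For example, insert accepts durability as an optional argument. A minimal sketch, assuming a made-up logs table:

// Soft durability: the server acknowledges immediately and
// flushes to disk in the background.
r.db('test').table('logs').insert(
  {event: 'page_view', at: r.now()},
  {durability: 'soft'}
)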

Atomicity

According to the RethinkDB docs [^atomic], write atomicity is supported on a per-document basis: when you write to a single document, either the write succeeds completely or nothing happens at all, instead of a couple of fields being updated and your data left in a bad shape. Furthermore, RethinkDB guarantees that any combination of operations on a single document will be written atomically.

However, this does come with a limit. To quote the RethinkDB docs, operations that cannot be proven deterministic cannot update the document in an atomic way. In other words, unpredictable values won’t be written atomically: for example, random values, operations run as JavaScript expressions rather than ReQL, or values fetched from somewhere else. RethinkDB will throw an error instead of silently executing or ignoring such a write, though you can choose to set a flag to write the data in a non-atomic way.

Writes that span multiple documents are not atomic.
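
To illustrate the difference, consider a made-up counters table holding a document with primary key 'page_views'. Incrementing a field with pure ReQL is deterministic, so it is applied atomically; an r.js() expression cannot be proven deterministic, so RethinkDB rejects it unless you pass the nonAtomic flag:

// Deterministic ReQL: updated atomically within the document.
r.db('test').table('counters').get('page_views').update(
  {value: r.row('value').add(1)}
)

// r.js() is not provably deterministic; without nonAtomic: true
// this update throws an error instead of running.
r.db('test').table('counters').get('page_views').update(
  {value: r.js('Math.random()')},
  {nonAtomic: true}
)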

[^atomic] http://www.rethinkdb.com/docs/architecture/#how-does-the-atomicity-model-work

Command line tool

Besides the dashboard, RethinkDB gives us some command line utilities to interact with it. Some of them are:

  • import
  • export
  • dump
  • restore

import

In the same spirit as giving users the dashboard, RethinkDB also gives us some sample data. You can download the data files input_polls and county_stats at https://github.com/rethinkdb/rethinkdb/tree/next/demos/election and import them into the test database:

rethinkdb import -c localhost:28015 --table test.input_polls --pkey uuid -f input_polls.json --format json
rethinkdb import -c localhost:28015 --table test.county_stats --pkey uuid -f county_stats.json --format json

Notice the --table argument: we pass the table name in the format database_name.table_name. In our case, we import the data into two tables, input_polls and county_stats, inside the database test.

Basically, you can easily import any file that contains valid JSON documents, as shown below.
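
For instance, assuming a file users.json that holds an array of JSON objects, a one-line import into a new table could look like this:

rethinkdb import -c localhost:28015 -f users.json --format json --table test.users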

export

export exports your database into many JSON files, one file per table. The JSON files can be imported again using the import command above.
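
A sketch of exporting a single table; the -e flag restricts the export to one database or table, and -d sets the output directory (the directory name here is made up):

rethinkdb export -c 127.0.0.1:28015 -e test.input_polls -d my_export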

dump

dump exports the whole data of a cluster. It’s similar to an export command followed by gzip to compress all the JSON output files. The syntax is as easy as:

rethinkdb dump -c 127.0.0.1:28015

Here is an example output when I run this command:

rethinkdb dump -c 127.0.0.1:28015
NOTE: 'rethinkdb-dump' saves data and secondary indexes, but does *not* save
 cluster metadata.  You will need to recreate your cluster setup yourself after
 you run 'rethinkdb-restore'.
Exporting to directory...
[========================================] 100%
764509 rows exported from 9 tables, with 21 secondary indexes
  Done (157 seconds)
Zipping export directory...
  Done (5 seconds)

The dump result is a gzip file whose name has the format rethinkdb_dump_{timestamp}.tar.gz. It’s very useful when you want to try something out and then get back your original data. Make a note of this command, because you will need it later.

restore

Once we have the dump file from the dump command, we can restore it with:

rethinkdb restore -c 127.0.0.1:28015 rethinkdb_dump_DATE_TIME.tar.gz

Import sample data

It’s much nicer to work with real, fun data than with boring data. I found a very useful dataset called FooDB1. It’s data about food constituents, chemistry and biology. To quote their about page:

What is FooDB? FooDB is the world’s largest and most comprehensive resource on food constituents, chemistry and biology. It provides information on both macronutrients and micronutrients, including many of the constituents that give foods their flavor, color, taste, texture and aroma.

I imported their data into RethinkDB and generated some sample tables, such as a users table. At the end, I used the dump command to generate sample data, which you can download using the link below2:

https://www.dropbox.com/s/dy48el02j9p4b2g/simplyrethink_dump_2015-08-11T22%3A15%3A51.tar.gz?dl=0. Once you download it, you can restore this sample dataset:

rethinkdb restore -c 127.0.0.1:28015 simplyrethink_dump_2015-08-11T22:15:51.tar.gz

The output looks like this:

Unzipping archive file...
  Done (1 seconds)
Importing from directory...
[                                        ]   0%
[                                        ]   0%
[                                        ]   0%
[                                        ]   0%
[========================================] 100%
764509 rows imported in 9 tables
  Done (2166 seconds)

Once this process is done, you should have a database called foodb which contains the data we will play with throughout the book. At any point, if you mess up the data, you can always restore from this sample dump. I also encourage you to back up your data if you build interesting datasets of your own to experiment with.

  1. http://foodb.ca/about
  2. https://www.dropbox.com/s/dy48el02j9p4b2g/simplyrethink_dump_2015-08-11T22%3A15%3A51.tar.gz?dl=0