Selecting data
In this section, we will learn how to get data out of RethinkDB. Most of the
time, we will choose a db to work with, and chain into command table.
Select the whole table
Let’s find all foods. This is same as SELECT * FROM foods in SQL.
1 r.db('foodb').table('foods')
2 //=>
3
4 [{
5 "created_at": Wed Feb 09 2011 00: 37: 17 GMT - 08: 00,
6 "creator_id": null,
7 "description": null,
8 "food_group": "Herbs and Spices",
9 "food_subgroup": "Spices",
10 "food_type": "Type 1",
11 "id": 43,
12 "itis_id": "29610",
13 "legacy_id": 46,
14 "name": "Caraway",
15 "name_scientific": "Carum carvi",
16 "picture_content_type": "image/jpeg",
17 "picture_file_name": "43.jpg",
18 "picture_file_size": 59897,
19 "picture_updated_at": Fri Apr 20 2012 09: 38: 36 GMT - 07: 00,
20 "updated_at": Fri Apr 20 2012 16: 38: 37 GMT - 07: 00,
21 "updater_id": null,
22 "wikipedia_id": null
23 }, {
24 "created_at": Wed Feb 09 2011 00: 37: 18 GMT - 08: 00,
25 "creator_id": null,
26 "description": null,
27 "food_group": "Herbs and Spices",
28 "food_subgroup": "Spices",
29 "food_type": "Type 1",
30 "id": 67,
31 "itis_id": "501839",
32 "legacy_id": 73,
33 "name": "Cumin",
34 "name_scientific": "Cuminum cyminum",
35 "picture_content_type": "image/jpeg",
36 "picture_file_name": "67.jpg",
37 "picture_file_size": 73485,
38 "picture_updated_at": Fri Apr 20 2012 09: 32: 32 GMT - 07: 00,
39 "updated_at": Fri Apr 20 2012 16: 32: 33 GMT - 07: 00,
40 "updater_id": null,
41 "wikipedia_id": null
42 },
43 ...
44 ]
You should get back an array of JSON object. By default, the data explorer will automatically paginate it and display a part of data.
Typing r.db(db_name) all the time is insane. We can drop it to use r.table()
without calling r.db() if the table is in current selected database. Without
any indication, the default database is test. On Data Exploer, without a
r.db command, RethinkDB will use test as default database. Unfortunately we
cannot set a default database with data exploer1
Counting
We can also count the table or any sequence by calling count command.
1 r.db('foodb').table('foods').count()
2 //=>
3 863
Select a single document by its primary key
To select a single element, we call get on a table, and passing its primary
key value.
1 r.db('foodb').table('foods')
2 .get(108)
3 //=>
4 {
5 "created_at": Wed Feb 09 2011 00: 37: 20 GMT - 08: 00,
6 "creator_id": null,
7 "description": null,
8 "food_group": "Herbs and Spices",
9 "food_subgroup": "Herbs",
10 "food_type": "Type 1",
11 "id": 108,
12 "itis_id": "32565",
13 "legacy_id": 115,
14 "name": "Lemon balm",
15 "name_scientific": "Melissa officinalis",
16 "picture_content_type": "image/jpeg",
17 "picture_file_name": "108.jpg",
18 "picture_file_size": 30057,
19 "picture_updated_at": Fri Apr 20 2012 09: 33: 54 GMT - 07: 00,
20 "updated_at": Fri Apr 20 2012 16: 33: 54 GMT - 07: 00,
21 "updater_id": null,
22 "wikipedia_id": null
23 }
Every document in RethinkDB includes a primary key field, its value is
unique across cluster and is used to identify the document. The name of primary
field is id by default. However, when you create a table, you have an option
to change name of primary field. We will learn more about it later. Just keep a
note here.
In RethinkDB, using of incremental primary key isn’t recommended because that’s hard in a cluster environment. To make sure the uniqueness of the new value, We have to check in every clusters somehow. RethinkDB team decides2 to use an universal unique3 id instead of an incremental value.
get command returns the whole document. What if we get a single field? Such
as we only care about name? RethinkDB has a command call bracket for that
purpose. In Ruby it’s [], and in JavaScript it’s ().
We can do this in JavaScript:
1 r.db('foodb').table('foods')
2 .get(108)("name")
3 //=>
4 "Lemon balm"
Or in Ruby
1 r.connect.repl
2 r.db('foodb').table('foods').get(108)[:name].run
What special about bracket is that it return a single value of the field.
The type of value is same type of value, not a subset of document. We can verify
that with typeOf command:
1 r.db('foodb').table('foods')
2 .get(108)
3 ("name")
4 .typeOf()
5 //=>
6 "STRING"
You can even get nested field with bracket:
1 r.db('foodb').table('test')
2 .get(108)("address")("country")
with assumption that the document has address field is an object contains a field name country.
If you don’t like the using of bracket, you can use getField(JavaScript) or
get_field(Ruby) which have same effect:
1 r.db('foodb').table('foods')
2 .get(108)
3 .getField('name')
4 //=>
5 "Lemon balm"
How about getting a sub set of document, we can use pluck like this:
1 r.db('foodb').table('foods')
2 .get(108)
3 .pluck(get"name", "id")
4 //=>
5 {
6 "id": 108 ,
7 "name": "Lemon balm"
8 }
pluck probably existed in many standard library of your favourite language.
This example shows you how friendly ReQL is.
Select many documents by value of fields
To select many document based on value of field, We used filter method, and
passing an object with expected value.
Let’s find all food that were inserted into database on 2011, the year I come
to the US.
1 r.db('foodb').table('foods')
2 .filter(r.row('created_at').year().eq(2011))
3 //=>Executed in 59ms. 40 rows returned, 40 displayed, more available
4 [{
5 "created_at": Wed Feb 09 2011 00: 37: 17 GMT - 08: 00,
6 "creator_id": null,
7 "description": null,
8 "food_group": "Herbs and Spices",
9 "food_subgroup": "Spices",
10 "food_type": "Type 1",
11 "id": 43,
12 "itis_id": "29610",
13 "legacy_id": 46,
14 "name": "Caraway",
15 "name_scientific": "Carum carvi",
16 "picture_content_type": "image/jpeg",
17 "picture_file_name": "43.jpg",
18 "picture_file_size": 59897,
19 "picture_updated_at": Fri Apr 20 2012 09: 38: 36 GMT - 07: 00,
20 "updated_at": Fri Apr 20 2012 16: 38: 37 GMT - 07: 00,
21 "updater_id": null,
22 "wikipedia_id": null
23 }
24 ...
25 ]
r.row is new to you, but no worry, it just means current document. We used
r.row('created_at') to get value of created_at field, similar with how we
use bracket on get command to get a single value. Because created_at
is a datetime value, I get its year with, well, year command, then using eq
to do an equal compare with 2011. Sound a lot, but above query is really simple
and exlain itself. Sometimes I feel redundant to explain query but I have to
write this book anyway.
We can also pass an filter object to do matching filter:
1 r.db('foodb').table('foods')
2 .filter({
3 food_type: 'Type 1',
4 food_group: 'Fruits'
5 })
6 //=>
7 [
8 {
9 "created_at": Wed Feb 09 2011 00:37:15 GMT-08:00 ,
10 "creator_id": null ,
11 "description": null ,
12 "food_group": "Fruits" ,
13 "food_subgroup": "Tropical fruits" ,
14 "food_type": "Type 1" ,
15 "id": 14 ,
16 "itis_id": "18099" ,
17 "legacy_id": 14 ,
18 "name": "Custard apple" ,
19 "name_scientific": "Annona reticulata" ,
20 "picture_content_type": "image/jpeg" ,
21 "picture_file_name": "14.jpg" ,
22 "picture_file_size": 29242 ,
23 "picture_updated_at": Fri Apr 20 2012 09:30:49 GMT-07:00 ,
24 "updated_at": Fri Apr 20 2012 16:30:49 GMT-07:00 ,
25 "updater_id": null ,
26 "wikipedia_id": null
27 },...
28 ]
Passing an object will match exactly document with those field and value. In
other words, passing an object is equal to passing multiple eq command and and
command. Above query can re-write using expression:
1 r.db('foodb').table('foods')
2 .filter(
3 r.and(
4 r.row('food_type').eq('Type 1'),
5 r.row('food_group').eq('Fruits')
6 )
7 )
The object notation is much cleaner in this case.
From a selection of document, We can use pluck to get a subset of documents’s
field instead of returning the whole document. Similarly to how we use bracket
to get a particular field
1 r.db('foodb').table('foods')
2 .filter({
3 food_type: 'Type 1',
4 food_group: 'Fruits'
5 })
6 .pluck('id', 'name', 'food_subgroup')
7 //=>Executed in 70ms. 40 rows returned, 40 displayed, more available
8 [
9 {
10 "food_subgroup": "Berries" ,
11 "id": 75 ,
12 "name": "Black crowberry"
13 }, {
14 "food_subgroup": "Tropical fruits" ,
15 "id": 150 ,
16 "name": "Guava"
17 }, {
18 "food_subgroup": "Tropical fruits" ,
19 "id": 151 ,
20 "name": "Pomegranate"
21 }, ...
22 ]
By passing a list of field to pluck, we can get only those field.
Opposite of pluck is without. We passed a list of fields, and it removes
those fiels from document.
1 r.db('foodb').table('foods')
2 .filter({
3 food_type: 'Type 1',
4 food_group: 'Fruits'
5 })
6 .without("created_at", "picture_content_type", 'picture_file_name', 'picture_f\
7 ile_size', 'picture_updated_at')
8 //=> Executed in 52ms. 40 rows returned, 40 displayed, more available
9 [
10 {
11 "creator_id": null ,
12 "description": null ,
13 "food_group": "Fruits" ,
14 "food_subgroup": "Berries" ,
15 "food_type": "Type 1" ,
16 "id": 75 ,
17 "itis_id": "23743" ,
18 "legacy_id": 81 ,
19 "name": "Black crowberry" ,
20 "name_scientific": "Empetrum nigrum" ,
21 "updated_at": Fri Apr 20 2012 16:29:43 GMT-07:00 ,
22 "updater_id": null ,
23 "wikipedia_id": null
24 },...
25 ]
With simple filterting, we can easily pass an filter object as above. But what
up with complex searching? Such as finding all foods whose name starts with
character N. As you see at the beginning, we used r.row command to do a
bit complex query.
1 r.db('foodb').table('foods')
2 .filter(r.row('created_at').year().eq(2011))
Let’s dive more into it.
r.row
r.row is our swiss army knife. It refers to current visited document.
Literally, it’s the document at which RethinkDB is accessing. You can think of
it like this in a JavScript callback/iterator. Or think of it like current
element in an iterator loop. It’s very handy because we can call other ReQL
command on it to achieve our filtering.
It somehow feel like jQuery filtering command. For an instance, we write this
in JavaScript to filter all DOM element whose data-type value is anime.
1 $('.db').find('.table').filter(function() {
2 return $(this).data('type')=='anime'
3 })
In ReQL, using filter with filter object:
1 r.db('foodb').table('foods').filter({food_group: 'Fruits'})
We can re-write it with r.row
1 r.db('foodb').table('foods').filter(r.row('food_group').eq('Fruits'))
Breaking it down we have:
- r.row ⇒ current document
- (‘type’) ⇒ get value of field type
- .eq(‘anime’) ⇒ return true if the value is equal to the argument, anime in this case
r.row a RethinkDB object, which we can continue call many method to filter or
manipulation it. The expression that we pass into filter is a normal ReQL
expression but evaluate to a boolean result. RethinkDB runs it and if the returned value is true, the
document is included into result set. Ideally, any function that returns boolean
result can used with filter. Note that the evaluation of filter expression run
on RethinkDB server, therefore they has to be a valid ReQL expression, they
cannot be any arbitrary language expression. You cannot write:
1 r.db('db').table('table').filter(r.row('type') == 'anime')
In manner of filter action, we usually execute comparison or some condition to
be matched, RethinkDB gives us some kind of those method. You should refer to
its API for extensive command. Usually, we can use r.row in combine with
pluck or without or bracket command to narrow down data before
comparing. Below are some function for that purpose:
-
eq(value) check equal to value. similar to
==. -
ne(value) check not equal to value. similar to
!=. -
ge(value) check greater than or equal value. similar to
>=. -
gt(value) check greater than value. similar to
>. -
le(value) check less than or equal value. similar to
<=. -
lt(value) check less than value. similar to
<. - add(value) Sum two numbers, concatenate two strings, or concatenate 2 arrays.
- sub() Subtract two numbers.
Each of above command can be call on different data type. Eg, when you call add
on an array, it will append the element to array. when you call on a string, it
concat parameter to the original string. Or calling on a number and they just
do arithmetic operation.
Run those command in data explorer:
1 r.expr(["foo", "bar"]).add(['forbar'])
2 //=>
3 [
4 "foo" ,
5 "bar" ,
6 "forbar"
7 ]
8
9 r.expr(2).add(10)
10 //=>
11 12
12
13 r.expr('foo').add("bar")
14 //=>
15 "foobar"
You can find more about those document in RethinkDB doc, in group Math and logic4.
Let’s apply what we learn, by finding al food where its name starts with
character R and is a tropical fruits.
1 r.db("foodb").table("foods")
2 .filter(
3 r.row("name").match("^R")
4 .and(
5 r.row("food_subgroup").eq('Tropical fruits')
6 )
7 )
8 //=>
9 {
10 "created_at": Wed Feb 09 2011 00:37:27 GMT-08:00 ,
11 "creator_id": null ,
12 "description": null ,
13 "food_group": "Fruits" ,
14 "food_subgroup": "Tropical fruits" ,
15 "food_type": "Type 1" ,
16 "id": 234 ,
17 "itis_id": "506073" ,
18 "legacy_id": 249 ,
19 "name": "Rambutan" ,
20 "name_scientific": "Nephelium lappaceum" ,
21 "picture_content_type": "image/jpeg" ,
22 "picture_file_name": "234.jpg" ,
23 "picture_file_size": 71055 ,
24 "picture_updated_at": Fri Apr 20 2012 09:43:04 GMT-07:00 ,
25 "updated_at": Fri Apr 20 2012 16:43:05 GMT-07:00 ,
26 "updater_id": null ,
27 "wikipedia_id": null
28 }
Here we are usinbg match with an regular expression ^R means any name starts
with R, and using and to do an and operator with other boolean. Other
boolean is result of getting field food_subgroup and compare with tropical
fruits.
filter seems handy but it’s actually limited. filter didn’t leverage index.
It scan and hold all data in memory. Of course, this isn’t scale infinite. Only
100,000 records can be filter. For anything large than that, we have to use
getAll or between which we will learn in chapter 5.
Now, let’s try to find all foods which has more than 10 foods document in its
group. We probably think of a simple solution like this: for each of document,
we get its food_group and count how many items has that same food group, if
the result is greater than 10, we return true, so that it will be included in
filter result. We may have duplicate result but let’s try this naieve
soltuion:
1 r.db('foodb').table('foods')
2 .filter(
3 r.db('foodb').table('foods')
4 .filter(
5 {food_group: r.row("food_group")}
6 )
7 .count()
8 .gt(10)
9 )
Query looks good but when we run, we get this:
1 RqlCompileError: Cannot use r.row in nested queries. Use functions instead in:
2 r.db("foodb").table("foods").filter(r.db("foodb").table("foods").filter({food_gr\
3 oup:
4 r.row("food_group")}).count().gt(10))
Basically, we have nested query here, and RethinkDB doesn’t know which query
r.row should belong to, is it parent query, or the sub query? In those case,
we have to use filter with function. Let’s move to next chapter.
Filter with function
Beside passing an ReQL expression, we can also use a function which return true
or false to filter.
Let’s try previous example.
1 r.db('foodb').table('foods')
2 .filter(function (food) {
3 return r.db('foodb').table('foods').filter({food_group: food("food_group")}).\
4 count().gt(10)
5 })
Now, we no longer using r.row, we pass an anonymous function with a single
parameter(which we can name whatever), when itereating over the table, RethinkDB
call this function, and passing current document as its first argument. By using
function, we can still access current document, without using r.row, and
clearly bind current document to a variable, so that we can access its value and
avoid conflicting. Here, we name our argument food, instead of writing:
1 filter({food_group: r.row("food_group")})
We will write:
1 filter({filter_group: food("food_group")})
And we using boolean value, count().gt(10) here, as result of function.
Filter with function helps us write query with complex logic.
Pagination data
We rarely want a whole sequence of document, usually we care about a subset of data such as pagination data. In this section, we go over commands: order, limit and skip.
Order data
So far, we only select data and accept default ordering. Let’s control how they appear:
1 r.db('foodb').table('foods')
2 .filter(function (food) {
3 return r.db('foodb').table('foods').filter({food_group:
food(“food_group”)}).count().gt(10) }) .orderBy(“name”) //⇒Executed in 5.69s. 821 rows returned [ { “created_at”: { “$reql_type$”: “TIME”, “epoch_time”: 1297240650, “timezone”: “-08:00” }, “creator_id”: null, “description”: null, “food_group”: “Aquatic foods”, “food_subgroup”: “Mollusks”, “food_type”: “Type 1”, “id”: 280, “itis_id”: “69493”, “legacy_id”: 307, “name”: “Abalone”, “name_scientific”: “Haliotis”, “picture_content_type”: “image/jpeg”, “picture_file_name”: “280.jpg”, “picture_file_size”: 99231, “picture_updated_at”: { “$reql_type$”: “TIME”, “epoch_time”: 1334940073, “timezone”: “-07:00” }, “updated_at”: { “$reql_type$”: “TIME”, “epoch_time”: 1334965273, “timezone”: “-07:00” }, “updater_id”: null, “wikipedia_id”: null }, … ]
We re-used above filter query, but append orderBy("name"). If you notice,
the above command run quite long Executed in 5.56s. 821 rows returned and
all rows are returned instead of a streams as usual. When we are calling
orderBy without specifing an index, it load all data into memory to sort,
which is both of slow and in-efficient. We will learn more about ordering with
index in chapter 5. For now, let’s continue with this method because, well, they
are easy to use, at first :D
We can reverse order by applying r.desc command:
1 r.db('foodb').table('foods')
2 .filter(function (food) {
3 return r.db('foodb').table('foods').filter({food_group: food("food_group")})\
4 .count().gt(10)
5 })
6 .orderBy(r.desc("name"))
We can order on table too, not just filter:
1 r.db('foodb').table('foods')
2 .orderBy(r.desc("name"))
We can order by multiple field, at a time
1 r.db('foodb').table('foods')
2 .orderBy(r.desc("name"), r.asc("created_at"))
We order by descending order on field name and ascending on field
created_at.
We can combine some document commands with orderBy too. Such as pluck only
an useful set of fields:
1 r.db('foodb').table('foods')
2 .pluck("id", "name", "food_group")
3 .orderBy(r.desc("name"), r.asc("created_at"))
4 //=>Executed in 122ms. 863 rows returned
5 [
6 {
7 "food_group": "Milk and milk products",
8 "id": 634,
9 "name": "Yogurt"
10 },
11 {
12 "food_group": "Milk and milk products",
13 "id": 656,
14 "name": "Ymer"
15 },
16 {
17 "food_group": "Aquatic foods",
18 "id": 523,
19 "name": "Yellowtail amberjack"
20 },
21 ...
22 ]
Limiting data
Once we have an ordering sequence, we usually want to select a limit number of
document instead of the whole sequence. We use command limit(n) for this
purpose. It get n elements from the sequence or array.
1 r.db('foodb').table('foods')
2 .pluck("id", "name", "food_group")
3 .orderBy(r.desc("name"), r.asc("created_at"))
4 .limit(4)
5 //=>Executed in 107ms. 2 rows returned
6 [{
7 "food_group": "Milk and milk products",
8 "id": 634,
9 "name": "Yogurt"
10 }, {
11 "food_group": "Milk and milk products",
12 "id": 656,
13 "name": "Ymer"
14 }, {
15 "food_group": "Aquatic foods",
16 "id": 523,
17 "name": "Yellowtail amberjack"
18 }, {
19 "food_group": "Aquatic foods",
20 "id": 522,
21 "name": "Yellowfin tuna"
22 }]
limit get us a number of document that we want, but it always start from the
beginning of sequence. To start selecting data starts from a position, we used
skip.
Skip
As its name, skip(n) ignore a number of element from the head of sequence.
1 r.db('foodb').table('foods')
2 .pluck("id", "name", "food_group")
3 .orderBy(r.desc("name"), r.asc("created_at"))
4 .skip(2)
5 .limit(2)
6 //=> Executed in 97ms. 2 rows returned
7 [{
8 "food_group": "Aquatic foods",
9 "id": 523,
10 "name": "Yellowtail amberjack"
11 }, {
12 "food_group": "Aquatic foods",
13 "id": 522,
14 "name": "Yellowfin tuna"
15 }]
Access Nested field
As you know, RethinkDB document is a JSON object. Very likely we have two or more level of data structure. So how we can access those nested field, or to drill down the fields.
Let’s beging this chapter by creating some sample data. Just copy and paste, ignore the syntax for now because we save them for chapter 4.
First, create table on test db.
1 r.tableCreate("books")
Then, insert sample data:
1 r.table("books")
2 .insert([
3 {
4 id: 1,
5 name: "Simply RethinkDB",
6 address: {
7 country: {
8 code: "USA",
9 name: "The United State of America"
10 }
11 },
12 contact: {
13 phone: {
14 work: "408-555-1212",
15 home: "408-555-1213",
16 cell: "408-555-1214"
17 },
18 email: {
19 work: "bob@smith.com",
20 home: "bobsmith@gmail.com",
21 other: "bobbys@moosecall.net"
22 },
23 im: {
24 skype: "Bob Smith",
25 aim: "bobmoose",
26 icq: "nobodyremembersicqnumbers"
27 }
28 }
29 },
30 {
31 id: 2,
32 name: "TKKG",
33 address: {
34 country: {
35 code: "GER",
36 name: "Germany"
37 }
38 },
39 contact: {
40 phone: {
41 work: "408-111-1212",
42 home: "408-111-1213",
43 cell: "408-111-1214"
44 },
45 email: {
46 work: "bob@gmail.com",
47 home: "bobsmith@axcoto.com",
48 other: "bobbys@axcoto.com"
49 },
50 im: {
51 skype: "Jon",
52 aim: "Jon",
53 icq: "nooneremembersicqnumbers"
54 }
55 }
56 }
57 ])
Depend on your language, you will usually have some way to access nested field, by following the nested path. In above example, let’s say we want to access *skype im, the path is:
contact -> im -> skype
Using JavaScript driver, we will use bracket to access field and sub field.
1 r.table('books').get(1)('contact')('im')('skype')
2 //=>
3 "Bob Smith"
While as, in Ruby driver, bracket notation is [field]
1 r.table('books').get(1)['contact']['im']['skype']
We can keep calling bracket to get the final nested field follow the path. Not just a single document, we can use bracket on table level too:
1 r.table('books')('address')('country')
2 [
3 {
4 "code": "GER" ,
5 "name": "Germany"
6 }, {
7 "code": "USA" ,
8 "name": "The United State of America"
9 }
10 ]
Or using in combination with filter, on selection:
1 r.table('books')
2 .filter({id: 1})('address')('country')('name')
3 //=>
4 "The United State of America"
Beside using bracket command, we can also using getField if that feel more
nature:
1 r.table('books')
2 .getField('contact')('email')
3 //=>
4 [
5 {
6 "home": bobsmith@axcoto.com, »
7 "other": bobbys@axcoto.com, »
8 "work": bob@gmail.com, »
9 }, {
10 "home": bobsmith@gmail.com, »
11 "other": bobbys@moosecall.net, »
12 "work": bob@smith.com, »
13 }]
At the end of the day all you have to remember is to drill down the path with a chain of bracket command.
- https://github.com/rethinkdb/rethinkdb/issues/829↩
- http://stackoverflow.com/questions/21020823/unique-integer-counter-in-rethinkdb↩
- http://en.wikipedia.org/wiki/Universally_unique_identifier↩
- http://rethinkdb.com/api/javascript/#mod↩