Leanpub: Publish Early, Publish Often

Appendix A. A Short Introduction to Ruby

The aim of this section is to gently introduce you to Ruby in enough detail that you can read the recipes without getting completely bogged down in the syntax. You can safely skip this if you have any Ruby experience.

I’m not going to attempt to cover Ruby in great detail. If you’re interested in learning more, there are many great books out there on Ruby. One particularly good one is The Ruby Programming Language by David Flanagan and Yukihiro Matsumoto.

An Example Program

Here is an example program that really doesn’t do anything but illustrate some syntax:

Example A.1. example.rb

    #!/usr/bin/env ruby

    # Comments begin with the pound sign.  They can also go at the end of lines
    puts "the example program" # puts sends output to STDOUT and appends a newline

    # assignment
    number = 2
    s = "this is a string"
    number = "two"

    # arrays
    an_array = [1,2,3,4,5]
    # A more complicated array, containing integers, a string and another array
    an_array = [1,2, 'three', [4,5,6]]

    # hashes
    a_hash = {1 => 'one', 2 => 'two', 3 => 'three', 4 => 'four'}
    # A more complicated hash, using symbols as keys
    a_hash = {:one => 'one', :two => 2, :three_and_four => [3,4]}

    # A simple iterator
    an_array.each do |element|
      puts "#{element}"
    end

Depending on your coding background, this code might look pretty normal or just plain crazy. I’ve been coding in dynamic languages for too long to know intuitively which parts might make you go “whoa!”, but here are some pointers that might help:

Variables are not declared, and can change type

Notice the two lines

    number = 2

and

    number = "two"

This, while bad programming practice, won’t raise any errors. At the end of the program, number is a string with the value “two”. Note that trying to read a variable that has never been assigned will raise an error:

    $> irb
    >> puts undeclared_variable
    NameError: undefined local variable or method `undeclared_variable' for main:Object
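
If you want to watch the type change happen, you can ask the variable for its class at each step. This is only a small sketch: the class names in the comments are what a recent Ruby reports (older interpreters report Fixnum rather than Integer), and defined? is shown as one way to check a name before reading it.

```ruby
number = 2
first_class = number.class   # Integer on Ruby 2.4 and later

number = "two"
second_class = number.class  # String

puts "#{first_class} then #{second_class}"

# defined? returns nil for a name that has never been assigned,
# so it is a safe way to check before reading a variable.
puts defined?(undeclared_variable).inspect   # nil
```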

Everything is an object

In Ruby, everything is an object. This has many consequences, most of them good. One consequence is illustrated in the example code here:

    an_array = [1,2, 'three', [4,5,6]]

An array is an object, and contains any number of other objects. There’s no need to jump through any hoops if you want to make an array of arrays or have different object types in a single array.
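
A quick way to convince yourself of this is to call methods on things that would be primitives in other languages. The snippet below is an illustrative sketch that reuses the array from the example program.

```ruby
# Even literals respond to methods, because they are objects too.
puts 3.class          # Integer (Fixnum on Ruby versions before 2.4)
puts nil.class        # NilClass
puts "three".length   # 5

# One array holding integers, a string and another array.
an_array = [1, 2, 'three', [4, 5, 6]]
an_array.each { |element| puts element.class }

# Nested arrays are indexed one level at a time.
puts an_array[3][1]   # 5
```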

Data types

The code in example.rb shows the instantiation of String, Integer, Array and Hash objects. Symbols are also flirted with briefly. These and a few other data types are discussed further below.

Iterators

Notice the way the array is looped through in the last bit of the example code:

    an_array.each do |element|
      puts "#{element}"
    end

This is an example of a Ruby iterator. Ruby does have things like for and while loops, but they’re rarely used. I think I used a while loop once in the code for this book, and only because I couldn’t figure out a way not to. Instead, blocks and iterators are used. They’re explained in more detail below.
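
For comparison, here is the same loop written both ways. It is a sketch rather than code from the recipes; each_with_index is included because needing the position is the usual reason people reach for a while loop.

```ruby
an_array = [1, 2, 'three', [4, 5, 6]]

# With a while loop you manage the index yourself.
index = 0
while index < an_array.length
  puts an_array[index].inspect
  index += 1
end

# With an iterator the array hands each element to the block.
an_array.each do |element|
  puts element.inspect
end

# each_with_index covers the case where the position is needed too.
an_array.each_with_index do |element, i|
  puts "#{i}: #{element.inspect}"
end
```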

Ruby Data Types

Numbers

Numbers can be integers or floating point numbers. One fun thing about numbers in Ruby is that they have methods too:

    12.333.round # 12
    -13.abs # 13
    9.lcm(12) # The Lowest Common Multiple of 9 and 12 is 36

Strings

Strings are what you’d expect from other languages: an ordered sequence of characters. One difference from a few other languages is that strings are mutable:

    name = 'Scott Patton'
    name[10] = 'e' # name is now 'Scott Patten'

Arrays

An array is an ordered collection of objects. The objects in an array are called elements. An array is instantiated like this:

    some_array = [object1, object2, object3, ...]
    some_array = [1,2,3,7,9]

You access an element of an array by giving its index (counting from zero) between square brackets:

    ordinal_numbers = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth']
    ordinal_numbers[3] # 'third'

For the case above, where you are instantiating an array of strings, you will often see the following syntax, which is equivalent but more concise and readable:

    ordinal_numbers = %w(zeroth first second third fourth fifth)

Hashes

A hash is a collection of key-value pairs. Each entry in the hash has a key (used to access the value) and a value. Hashes were unordered in Ruby 1.8; since Ruby 1.9 they remember the order in which entries were inserted. Hashes are instantiated like this:

    some_hash = {key1 => value1, key2 => value2, key3 => value3}

Keys can be any object, but it’s usually a bad idea to use something mutable like an array or hash as a key. It’s also more idiomatic (and memory efficient) to use a symbol rather than a string as a key. Values can be any object as well. You access a value by putting its key between square brackets, like this:

    params = {:track_response => true, :default_response => 'Yes', :acceptable_responses => ['Yes', 'No', 'Maybe']}
    params[:default_response] # 'Yes'
    params[:acceptable_responses][0] # 'Yes'
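
Two other things you will run into with hashes are missing keys and iteration. The following sketch uses a trimmed-down version of the params hash; the :missing key and the 'fallback' value are made up for the illustration.

```ruby
params = {:track_response => true, :default_response => 'Yes'}

puts params[:default_response]            # Yes
puts params[:missing].inspect             # nil: no error for an unknown key
puts params.fetch(:missing, 'fallback')   # fallback

# each yields the key and the value to the block.
params.each do |key, value|
  puts "#{key} is #{value}"
end
```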

Symbols

Symbols take a little getting used to (at least they did for me when I started using Ruby). Symbols look like this: :some_symbol, a colon followed by a string of characters. At first, you can think of them as immutable strings. Symbols of the same name are initialized and exist in memory only once (as opposed to strings, which will be different objects in memory even if their contents are the same). So, if you use a symbol multiple times, you are using the same object and save some memory. For a nice explanation of symbols, see http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols. One of the comments on that article, by Ahmad Alhashemi, gives another useful way of thinking about symbols: symbols “…are literal values, just like the number 3 and the string ‘red’. you can say: v = :s, but you can’t say: :s = v” Nicely put, Ahmad.
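
The “only once in memory” point can be checked with object_id, which identifies a particular object. This is a sketch; the :status symbol is an arbitrary example, and dup is used so that the two strings are guaranteed to be separate objects.

```ruby
# Two mentions of the same symbol refer to one object.
same_symbol = :status.object_id == :status.object_id

# dup forces a fresh copy, so these are two distinct String objects
# even though their contents are equal.
first = 'status'.dup
second = 'status'.dup
same_string = first.object_id == second.object_id

puts same_symbol                 # true
puts same_string                 # false
puts first == second             # true: equal contents, different objects

# Converting between the two is straightforward.
puts :status.to_s                # status
puts 'status'.to_sym.inspect     # :status
```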

Ranges

A range is instantiated in one of two ways:

    (start_value .. end_value)

or

    (start_value ... end_value)

They do what you think they should: give you a range of values, increasing by a single step for every value. The two dot version includes the end value in the range, the three dot version doesn’t. If you want to see what’s inside a range, use the to_a (to array) method on it:

    >> (1 .. 5).to_a # [1, 2, 3, 4, 5]
    >> (1 ... 5).to_a # [1, 2, 3, 4]

To use a range, you typically iterate over it:

    (1 .. 5).each {|n| puts n*n}
    1
    4
    9
    16
    25

Notice that I didn’t say that ranges increase by one on every step. I was intentionally vague about what the amount they increase is. Ranges can be built of any object that implements a succ method (to give the next value in the range) and the <=> method (to compare two values in the range). This is true for classes you create yourself as well.

    >> ('a' .. 'g').to_a # ["a", "b", "c", "d", "e", "f", "g"]
    >> require 'date'
    >> start_date = Date.new(2010, 3, 12)
    >> end_date = Date.new(2010, 3, 21)
    >> vancouver_olympics_dates = (start_date .. end_date)
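
To make the “classes you create yourself” part concrete, here is a minimal sketch. The Version class is hypothetical and exists only for this illustration; all it needs to take part in a range is succ and <=>.

```ruby
# Hypothetical class for illustration: a version number with a single
# integer component.
class Version
  include Comparable          # builds <, >, == and friends on top of <=>

  attr_reader :number

  def initialize(number)
    @number = number
  end

  # The next value in the range.
  def succ
    Version.new(number + 1)
  end

  # How two values compare; Range uses this to know when to stop.
  def <=>(other)
    number <=> other.number
  end

  def to_s
    "v#{number}"
  end
end

versions = (Version.new(1)..Version.new(4))
puts versions.map { |v| v.to_s }.join(', ')   # v1, v2, v3, v4
```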

Output

You print things to the standard output (typically the terminal you’re working in) using puts or print. puts appends a newline to the output, print doesn’t.

Strings enclosed in double quotes do variable interpolation. Anything inside of a #{} is evaluated as Ruby code, and the result is converted to a string and inserted in its place.

    n = 32
    "#{n} squared = #{n * n}" # "32 squared = 1024"

Strings in single quotes do not do variable interpolation:

    '#{n} squared = #{n * n}' # "\#{n} squared = \#{n * n}"

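Every object implements two methods of converting itself to a string: to_s and inspect. to_s is meant for display, while inspect is meant for debugging and, in many objects, provides more readable output. The results below are what Ruby 1.8 prints; from Ruby 1.9 on, to_s on an array or hash returns the same string as inspect, and the exact inspect format for hashes varies between Ruby versions.

    a = [1,2,3,4]
    a.to_s # "1234" on Ruby 1.8; "[1, 2, 3, 4]" on Ruby 1.9 and later
    a.inspect # "[1, 2, 3, 4]"
    h = {:one => 1, :two => 2, :three => 3}
    h.to_s # "one1two2three3" on Ruby 1.8; same as h.inspect on Ruby 1.9 and later
    h.inspect # "{:one=>1, :two=>2, :three=>3}"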

I use inspect a lot in the recipes to show results.

You can, of course, define to_s and inspect on any classes you create.
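
As a sketch of what that looks like, here is a small class that defines both. The Track class and its output formats are invented for this example.

```ruby
# Hypothetical class for illustration.
class Track
  def initialize(title, seconds)
    @title = title
    @seconds = seconds
  end

  # Used by puts and by string interpolation.
  def to_s
    "#{@title} (#{@seconds}s)"
  end

  # Used by p and by irb when it echoes a result.
  def inspect
    "#<Track title=#{@title.inspect} seconds=#{@seconds}>"
  end
end

track = Track.new('Intro', 42)
puts track            # Intro (42s)
puts "Now: #{track}"  # Now: Intro (42s)
p track               # #<Track title="Intro" seconds=42>
```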

Control Statements

Ruby has the standard control statements. An if statement looks like

    if age > 30
      puts "don't trust him!"
    end

You can also add elsif and else clauses to your if statements (note the spelling: elsif, not elseif):

    if x > 30
      puts "Don't trust him!"
    elsif x < 3
      puts "Likely innocent"
    else
      puts "Still don't trust him!"
    end

One thing that often trips up newcomers to Ruby is how true and false are defined. In many languages, where 0 and the empty string count as false, the following code would result in no output:

Example A.2. truth_test.rb

    x = 0
    string = ""

    if x
      puts "x is true"
    end

    if string
      puts "string is true"
    end

In Ruby, the only things that are false are false itself, and nil. The code above will print both statements:

    $> ruby truth_test.rb
    x is true
    string is true
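
The same rule can be checked across a handful of values at once. The list of candidates below is just an illustrative sample; the last two lines show the explicit tests you would use when you really do mean “empty” or “zero”.

```ruby
# Everything except false and nil counts as true in a condition.
candidates = [0, "", [], {}, 0.0, :sym, nil, false]

truthy = candidates.select { |value| value }
falsy  = candidates.reject { |value| value }

puts truthy.length   # 6
puts falsy.inspect   # [nil, false]

# To test for "empty" or "zero", ask the object explicitly.
puts "".empty?       # true
puts 0.zero?         # true
```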

Blocks and iterators

One of the defining characteristics of Ruby is the use of blocks and, especially, using blocks to iterate. The most basic iterator is the each iterator. For an array, each goes through each element of the array, passing the element’s value to the block. So, the following code:

    [1,2,3,4].each {|element| puts "#{element}:#{element * element}" }

will result in the following output:

    1:1
    2:4
    3:9
    4:16

The Enumerable mixin provides a host of other iterators. All iterators work the same way: you pass a block to the iterator, and it will execute that block for each element in the collection you are iterating over. The return value you get depends on which iterator you use. The iterators provided by Enumerable include

collect and map

collect and map return an array containing the value returned by the block for each element in the collection:

    >> [1,2,3,4].collect {|n| n*n} # [1, 4, 9, 16]
    >> %w(monday tuesday wednesday thursday friday saturday).collect {|day| day.capitalize}
    => ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

select

select returns all elements in the collection for which the block returns true. The example uses prime?, which Ruby’s standard prime library adds to integers:

    >> require 'prime'
    >> (1 .. 30).select {|n| n.prime?}
    => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

reject

reject is the opposite of select: it returns all elements in the collection for which the block returns false.

    >> (1 .. 30).reject {|n| n.even?}
    => [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]

detect or find

detect and find return the first value in the collection for which the block returns true.
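
A short sketch, reusing the list of day names from the other iterator examples:

```ruby
days = %w(monday tuesday wednesday thursday friday saturday sunday)

# The first day whose name starts with "t".
puts days.detect { |day| day.start_with?('t') }        # tuesday

# find is the same method under another name.
puts days.find { |day| day.length > 7 }                # wednesday

# When nothing matches, the result is nil.
puts days.find { |day| day.start_with?('x') }.inspect  # nil
```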

all?

all? returns true if the block returns true for all elements in the collection:

    >> %w(monday tuesday wednesday thursday friday saturday sunday).all? do |day|
      day =~ /y\Z/
    end
    => true

There are a few more iterators in Enumerable, but the list above shows the ones used in the book.

Methods

In Ruby, what you might think of as functions or subroutines are called methods. They are defined like this:

    def method_name(arguments)
      ... body of method goes here ...
    end

The body of the method can be anything you want. A method always returns a value, even if it’s just nil. If you don’t use the return keyword, the method will return the value of the last expression evaluated.

    def double(n)
      n * 2
    end
    double(2) # 4
    double('two') # 'twotwo'
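
Both styles of returning a value can be seen side by side in the sketch below. The describe and shout methods are made up for the illustration.

```ruby
# Explicit return leaves the method early.
def describe(n)
  return 'negative' if n < 0
  return 'zero' if n == 0
  'positive'   # last expression evaluated, so this is the return value
end

# A method whose last statement is puts returns nil, because puts does.
def shout(message)
  puts message.upcase
end

puts describe(-5)          # negative
puts describe(7)           # positive
puts shout('hi').inspect   # prints HI, then nil
```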

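A method always belongs to a class. If you define it outside of a class, it becomes a private instance method of the Object class. Since almost everything inherits from Object, you can call it from anywhere without naming a receiver:

    def test
      puts "hi, I'm a test method"
    end
    test
    # hi, I'm a test method
    Object.new.send(:test) # send bypasses the private check
    # hi, I'm a test method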

Arguments are passed to methods as a comma delimited list of values. There are a bunch of different exceptions to this, but two that you’ll see throughout the book are default values for parameters and parameter hashes.

Default values for arguments are given like this:

    def day_of_week(date, capitalize = true, short_form = false)
      ...
    end

In this case, the capitalize argument has a default value of true and short_form has a default value of false. If you only pass one argument when you call day_of_week, then capitalize will be set to true and short_form to false.

There are a few rules regarding this. First, since arguments with default values don’t have to be supplied, put them after all the arguments without default values. (Ruby 1.8 requires this; later versions are more permissive, but the convention is still worth following.)

Second, if you want to set the second default argument to a non-default value, then you have to set the first default argument’s value as well. You can’t get around this by naming a positional argument in the call like you can in, for example, Python.
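
Here is a runnable sketch of those rules. The method body is my own guess at a plausible implementation, since the listing above elides it; it leans on Date#strftime for the day name.

```ruby
require 'date'

# Hypothetical body for the day_of_week signature; the original
# implementation is not shown in the text.
def day_of_week(date, capitalize = true, short_form = false)
  name = date.strftime(short_form ? '%a' : '%A')   # "Fri" or "Friday"
  capitalize ? name : name.downcase
end

date = Date.new(2010, 3, 12)

puts day_of_week(date)               # Friday
puts day_of_week(date, false)        # friday
# To reach short_form, capitalize has to be supplied as well.
puts day_of_week(date, true, true)   # Fri
```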

If the last argument to a method is a hash, then it can be passed in without the curly braces. The defaults below are applied with fetch rather than ||, because params[:capitalize] || true would turn an explicit false back into true. This is often used to mimic the ability to pass in parameters without worrying about ordering:

    def day_of_week(date, params = {})
      capitalize = params.fetch(:capitalize, true)
      short_form = params.fetch(:short_form, false)
      ....
    end

    day_of_week(some_date, :short_form => true) # capitalize = true, short_form = true
    day_of_week(some_date) # capitalize = true, short_form = false
    day_of_week(some_date, :capitalize => false) # capitalize = false, short_form = false

Another, more scalable, way of setting the default values is:

    def day_of_week(date, params = {})
      defaults = {:capitalize => true, :short_form => false}
      params = defaults.merge(params)
      ....
    end
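
To see what merge does with the defaults, it helps to pull that step out on its own. The resolve_options helper below is hypothetical and only returns the merged hash; note that an explicit false survives the merge.

```ruby
# Hypothetical helper that only resolves the options, so the effect of
# merge is easy to see.
def resolve_options(params = {})
  defaults = {:capitalize => true, :short_form => false}
  defaults.merge(params)   # values in params win over the defaults
end

puts resolve_options()[:capitalize]                       # true
puts resolve_options(:capitalize => false)[:capitalize]   # false
puts resolve_options(:short_form => true)[:short_form]    # true
puts resolve_options(:short_form => true)[:capitalize]    # true
```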

Classes

A class in Ruby is defined like this:

    class Bucket

        .... contents of class ....

    end

If you want to subclass another class, then:

    class Bucket < S3SuperClass

        .... contents of class ....

    end

This is read as “Bucket extends S3SuperClass”.

Class names must start with a capital letter. By convention, they are camel-case (CamelCaseMeansWordsBeginWithCapitalLetters).

To create a new instance of a class, you use ClassName.new. This creates a new instance, calls the initialize method on it with the arguments you passed to new, and returns the instance.

    class Bucket

      def initialize(arg1, arg2, arg3, ...)
        .... some initialization code ....
      end

    end

    # Create a new bucket
    b = Bucket.new(arg1, arg2, arg3, ...)

Classes have both class methods and instance methods. An instance method acts on an instance of a class. A class method, called a static method in some languages, doesn’t require an instance to act on. Instance methods are defined like

    def some_instance_method(arg1, arg2, arg3, ...)
      ... contents of method ...
    end

Class methods are defined as either

    def self.class_method_name
      ... contents of method ...
    end

or

    def ClassName.class_method_name
      ... contents of method ...
    end

and called as ClassName.class_method_name
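
Put together, a class with one method of each kind might look like the sketch below. This Bucket is hypothetical: the naming rule and the example.com URL are simplifications invented for the illustration, not taken from any S3 library.

```ruby
# Hypothetical Bucket class; names and behaviour are illustrative only.
class Bucket
  def initialize(name)
    @name = name
  end

  # Instance method: needs a particular bucket to act on.
  def url
    "http://#{@name}.example.com/"
  end

  # Class method: called on Bucket itself, no instance required.
  # The rule here is a simplified stand-in for a real naming policy.
  def self.valid_name?(name)
    name =~ /\A[a-z0-9.-]{3,63}\z/ ? true : false
  end
end

puts Bucket.valid_name?('my-bucket')   # true
puts Bucket.valid_name?('No')          # false
puts Bucket.new('my-bucket').url       # http://my-bucket.example.com/
```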

Variables and Scope

In the interest of brevity and readability, I’m going to be a bit sloppy here. The aim, again, is to give you enough information to read the recipes, not to give a thorough introduction to variables and scope in Ruby.

In Ruby, there are five kinds of variables: global variables, class variables, instance variables, local variables and constants. They are differentiated by their names, as shown below in Table A.1, “Ruby Variable Types”.

Table A.1. Ruby Variable Types

| Type | Naming convention | Examples | Scope | Notes |
| --- | --- | --- | --- | --- |
| Global | Start with a dollar sign ($) | $global, $pi | Everywhere | Global variables are rarely used. I don’t think there is one in the book. |
| Class | Start with two at signs (@@) | @@class_variable, @@current_number_of_users | Within their class | Class variables are rarely used as well, but are more common than global variables. |
| Instance | Start with a single at sign (@) | @instance_variable, @price | Within a single instance of a class | Instance variables are used widely. They are in scope throughout a single instance. See the discussion below on creating getters and setters for instance variables. |
| Local | Start with a lowercase letter or underscore (_) | some_variable, n | Within the block they are defined in, and that block’s children | Easily the most common variable type. |
| Constants | Start with a capital letter | SOME_CONSTANT, PI | Everywhere | By convention, constants are all caps with underscores separating words. If you’re being pedantic, constants aren’t strictly variables: they will raise a warning if you try to change their value. |
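
The sketch below uses one variable of each kind in a single small class, so you can see the naming conventions and scopes from the table at work. The Counter class and all of its names are invented for this example.

```ruby
$request_count = 0            # global: visible everywhere

class Counter
  LIMIT = 3                   # constant: starts with a capital letter
  @@instances = 0             # class variable: shared by all Counters

  def initialize
    @count = 0                # instance variable: one per Counter
    @@instances += 1
  end

  def increment
    step = 1                  # local variable: gone when the method returns
    @count += step if @count < LIMIT
    $request_count += 1
    @count
  end

  def self.instances
    @@instances
  end
end

a = Counter.new
b = Counter.new
4.times { a.increment }

puts a.increment         # 3: capped by LIMIT
puts b.increment         # 1: b has its own @count
puts Counter.instances   # 2
puts $request_count      # 6
puts Counter::LIMIT      # 3
```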

Getters and Setters for Instance Variables

None of the variables inside of an instance are available outside of that instance. In many classes, you will want to create getter and setter methods for instance variables. You might be tempted to write something like this:

Example A.3. dog.rb

    class Dog

      def initialize(name, age, breed)
        @name = name
        @age = age
        @breed = breed
      end

      def name
        @name
      end

      def name=(new_name)
        @name = new_name
      end

      def age
        @age
      end

      def age=(new_age)
        @age = new_age
      end

      def breed
        @breed
      end

      def breed=(new_breed)
        @breed = new_breed
      end

    end

This will work. You can use it like this:

    $> irb -r dog
    >> d = Dog.new('frank', 13, 'Heinz 57')
    => #<Dog:0x35a110 @name="frank", @breed="Heinz 57", @age=13>
    >> d.breed
    => "Heinz 57"
    >> d.name
    => "frank"
    >> d.name = "Frank the Great, Esquire"
    => "Frank the Great, Esquire"
    >> d.name
    => "Frank the Great, Esquire"

Creating getters and setters for instance variables is such a common task, however, that there’s a much easier way to do it. The attr_accessor macro will create a getter and setter for each symbol listed after it. The following code is equivalent to the more verbose version of dog.rb listed above:

Example A.4. dog.rb

    class Dog

      attr_accessor :name, :age, :breed

      def initialize(name, age, breed)
        @name = name
        @age = age
        @breed = breed
      end

    end

If you only want to create a getter method, use the attr_reader macro. If you want only the setter method, use attr_writer.
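
Here is a sketch that mixes all three macros in one class. This variation on Dog is hypothetical: the choice of which attribute gets which macro, and the puppy? method, are made up to show the difference.

```ruby
class Dog
  attr_accessor :name      # getter and setter
  attr_reader   :breed     # getter only
  attr_writer   :age       # setter only

  def initialize(name, age, breed)
    @name = name
    @age = age
    @breed = breed
  end

  # The instance variable is still readable from inside the class.
  def puppy?
    @age < 2
  end
end

d = Dog.new('frank', 13, 'Heinz 57')
d.name = 'Frank'
d.age = 1
puts d.name                    # Frank
puts d.breed                   # Heinz 57
puts d.puppy?                  # true
puts d.respond_to?(:breed=)    # false: no setter was generated
puts d.respond_to?(:age)       # false: no getter was generated
```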