Scala Is Not A Language
If you’re new to Scala, the chances are you’ll have some complaints. The language seems bewilderingly large and complex; things can and do go wrong in unexpected ways. One moment it’s implicit function arguments that are the source of the frustration, the next it’s something called the “cake pattern”. Those who’ve been working with the language a little longer are likely to smile benevolently and humour your objections with an infuriating air of enlightened tolerance. “Just stick with it”, they may say. “I felt exactly the same six months ago, and then…”
And then what? Not everybody will make their peace with Scala: some people will always dislike the language, and some of them will have good reasons for doing so. But the transition from wary discomfort to actual enjoyment of Scala programming is a common one, and seems to take place for many people after somewhere around six months of exposure to it. What changes for them? Is it just a matter of increased familiarity, or is there some particular realisation that makes all the difference?
A colleague of mine, Ian, described his own discovery of the true nature of Scala in the following terms. On first acquaintance, he had thought of it as a language like Java or Python, only bigger and hairier. His own preference for compact, single-idiom languages that are easy to fit into your brain made him rebel against Scala’s seemingly profligate abundance of features and supported programming styles.
According to Larry Wall, famously, in Perl “there’s more than one way to do it” - which makes Perl a great tool for hackers who enjoy creatively flexing the language, but a many-tentacled nightmare for those who prefer Python’s more prescriptive rubric: “there should be one - and preferably only one - obvious way to do it”. Scala seemed to Ian to favour the solo hacker (or freewheeling show-off) over the team-player who wrote software for others to read and understand.
Indeed, there was a general feeling in our team at the time that the only safe way to use Scala was to restrain the show-offs, and stick to a subset of the language that everyone felt comfortable with. The trouble with that approach was that whatever features we decided to rope off as unsafe, we’d soon find third-party libraries forcing us to use anyway, because for those libraries they appeared to be essential. The complexity seemed unavoidable.
However, my colleague came to realise that Scala was not a language in the sense that Java or Python are languages, but rather a toolkit out of which a family of related mini-languages could be put together. Each of these mini-languages was compact and consistent: the art of Scala programming was knowing their different idioms, and deciding which to use for a given task. Once you understood this, the different pieces of the picture started to separate out. Instead of a confusing whirl of novelty, Scala started to seem like a collection of well-designed parts - parts that actually hung together surprisingly well.
Of course, it’s an exaggeration to say that Scala “isn’t a language”. What Scala isn’t, is a single-idiom language. Scala is both “multi-paradigm”, notably in the sense that it supports both object-oriented and functional programming styles, and multi-idiom, in the sense that it supports the creation of mini-languages tailored for specific purposes. If you’re having trouble finding out what “idiomatic Scala” is supposed to look like, it may be because there are multiple idioms to choose from.
The purpose of this book is to explore some of these idioms, and to show how a DSL (Domain Specific Language) like ScalaQuery, an advanced typeclass library like Scalaz and a distributed programming framework like Akka can coexist within the same language, each defining its own particular patterns of usage. We’ll be looking at code examples, getting to know the techniques and language features which support each idiom, and critically comparing them. My ambition is not to answer the question “when should I use X?” for you, but to enable you to answer that question for yourself, on the basis of a widely-informed understanding of the options and possibilities that Scala provides.
Rudiments
Styles of composition
In Scala, we are most often programming with expressions rather than with statements. What’s the difference? A statement prescribes an action, something that the program will do while executing. For example, take the following method for deleting a file:
def delete(file: java.io.File): Unit = {
  println("Deleting a file")
  file.delete()
  println("Deleted the file")
}
This method executes a sequence of actions: write to the console, delete the file, write to the console again. The return type of the method as a whole is Unit, which indicates that it returns no value that we care about: the meaning of the method lies in the relationship between its inputs and what it makes happen. Now consider this method for computing the sum of three integers:
def sum3(i1: Int, i2: Int, i3: Int): Int = i1 + i2 + i3
The expression i1 + i2 + i3 describes how to obtain a new value by combining existing values. It has no side-effects (such as deleting a file), but it defines a computation. The meaning of the method lies in the relationship between its inputs and the result of this computation.
In practice, things are often more muddled than this. A method that reads a line from a file and returns it has both side-effects (it moves the file-pointer on) and a non-Unit return value, one that we presumably care about. A method that writes a line to a file may return Unit, but might also throw an exception if the write-operation fails. Scala isn’t a “pure” language, so it doesn’t enforce (like Haskell) a strict separation between side-effecting and non-side-effecting code. It also permits code that deals with failures by throwing exceptions to be blended with code that discriminates between success and failure in its return values.
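To make this concrete, here is a minimal sketch of my own (not the book’s example) showing the two failure-handling styles side by side: one method signals failure by throwing, the other reports it in its return value using scala.util.Try. Both also have the side-effect of opening the file.

import scala.io.Source
import scala.util.Try

// Side-effecting and partial: throws (e.g. FileNotFoundException) on failure.
// Resource handling is omitted for brevity.
def firstLine(path: String): String =
  Source.fromFile(path).getLines().next()

// Same side-effect, but failure is reported in the return value rather than thrown
def firstLineSafe(path: String): Try[String] =
  Try(Source.fromFile(path).getLines().next())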
It makes sense, therefore, to think of a “statement” in Scala as a peculiar type of “expression” - one that is side-effecting, and which may have a Unit return type. Statements are usually composed into a sequence, as part of the body of a method: in order to do X, first do this, then that, then that. But expressions in general can be composed in a variety of ways, three of which we’ll quickly consider here.
“Fluent” method chaining
You will often see Scala code that looks like this:
people.sortBy(_.name)
  .zipWithIndex
  .map { case (person, i) => "%d: %s".format(i + 1, person.name) }
  .foreach { println(_) }
Here, the return value of each method call becomes the target of the next. Sometimes (but not always) it can be clarifying to pull out and name some of the intermediate values (I’ve also added their types, although it’s usually unnecessary to state these explicitly):
val peopleByName: List[Person] = people.sortBy(_.name)
val peopleWithRank: List[(Person, Int)] = peopleByName.zipWithIndex
val toString = peopleWithRank.map { case (person, i) => "%d: %s".format(i + 1, person.name) }
toString.foreach { println(_) }
One advantage of this more explicit style is that it makes it clear where intermediate values, especially collections, are being constructed. (We’ll discuss later the use of view to make chained collection-transforming methods more efficient).
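As a brief preview of that later discussion (this sketch is mine, not the chapter’s example): calling view first means the intermediate collections are never built, and the whole pipeline runs in a single pass when the result is forced.

val result = List(1, 2, 3, 4).view // defer: no intermediate List is built for the map below
  .map(_ * 2)
  .filter(_ > 4)
  .toList // forcing the view does the work in one pass
//> result: List[Int] = List(6, 8)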
Infix combinators
Infix combinators enable functions to be chained together; the most commonplace is andThen:
val sortByName: List[Person] => List[Person] = _.sortBy(_.name)
val addRank: List[Person] => List[(Person, Int)] = _.zipWithIndex
val makeString: List[(Person, Int)] => List[String] = _.map { case (person, i) => "%d: %s".format(i + 1, person.name) }
val printOut: List[String] => Unit = _.foreach { println(_) }

(sortByName andThen addRank andThen makeString andThen printOut)(people)
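As an aside of my own: besides andThen, the standard library also provides compose on functions, which chains in the opposite direction, so the following is equivalent to addRank andThen makeString.

// compose applies its argument first, then the receiver
val rankAndFormat: List[Person] => List[String] = makeString compose addRank
rankAndFormat(people) // the same strings that makeString(addRank(people)) would produce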
Scalaz defines quite a few more, to do with such category-theoretic exotica as applicative functors and Kleisli arrows. We’ll cover these in due course.
for-comprehensions
These deserve a longer treatment (and will get one, later on). For now, let’s just note that
(for {
  byName <- persons.sortBy(_.name).point[Identity]
  withRank <- byName.zipWithIndex.point[Identity]
  toString <- withRank.map { case (person, i) => "%d: %s".format(i + 1, person.name) }.point[Identity]
  _ <- toString.foreach { println(_) }.point[Identity]
} yield Unit).value
translates the code in the previous section into the Identity monad, and that for-comprehensions provide an extremely powerful idiom for composing computations of many different kinds.
If you already know about monads, that last sentence won’t seem to say very much. If you don’t, it will probably seem to say even less. For the moment, I just want to list the for-comprehension syntax among the styles of composition mentioned here. Monads are important even in “vanilla” Scala (that is, Scala without the Scalaz library); they’re also idiomatic, to the point where a programmer who neither knows nor cares what a monad is will soon find themselves using a handful of common monads comfortably, and entirely unawares.
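As a small foretaste (my own example, not the book’s), here is a for-comprehension over Option, one of those common “vanilla” monads: the comprehension only reaches the yield if every step produced a Some.

// hypothetical lookup functions, for illustration only
def findUser(id: Int): Option[String] =
  if (id == 1) Some("Maggie") else None

def findEmail(name: String): Option[String] =
  Some(name.toLowerCase + "@example.com")

val maggiesEmail: Option[String] = for {
  name  <- findUser(1)
  email <- findEmail(name)
} yield email
//> maggiesEmail: Option[String] = Some(maggie@example.com)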
Querying and transforming data: map, flatMap, filter and fold
In this section, we’ll look at one of the simplest and most fundamental sub-languages of Scala, the data query and transformation language provided by the collections library. We’ll see how the logic of relational queries over database tables, expressed in SQL, translates into the logic of functional queries over in-memory collections, expressed in native Scala.
By “functional queries” I mean simply queries expressed using the higher-order functions map, flatMap, filter and fold (or its variants: foldLeft, foldRight and reduce). In Scala these are implemented as methods on collection types (e.g. List) which take a function and return the method’s target transformed using that function. Let’s start with an example of one such query. Consider the following snippet of code, which takes a list of three numbers, multiplies each of them by 2, throws away any that are not greater than 2, and adds the remaining numbers together:
List(1, 2, 3).map(_ * 2)
  .filter(_ > 2)
  .foldLeft(0)(_ + _)
//> res1: Int = 10
Syntactically, this has the look of a “fluent API” in which method calls are chained and laid out across several lines. Each method call returns a new value - the initial List(1, 2, 3) is unmodified - so that a sequence of intermediate values is threaded from call to call. If we pull out and name the intermediate values, it looks like this:
val initialList = List(1, 2, 3)              //> initialList: List[Int] = [1, 2, 3]
val doubles = initialList.map(_ * 2)         //> doubles: List[Int] = [2, 4, 6]
val greaterThan2 = doubles.filter(_ > 2)     //> greaterThan2: List[Int] = [4, 6]
val summed = greaterThan2.foldLeft(0)(_ + _) //> summed: Int = 10
The map method applies the function _ * 2 to each member of initialList, returning a list of doubles, while the filter method creates a copy of doubles which contains only the values for which the function _ > 2 returns true. Finally, the foldLeft method takes an initial value of 0 and adds each of the values in greaterThan2 to it in turn, obtaining the sum of the values in the list. (In fact, for lists of numeric values there’s a sum method that does exactly the same.)
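For example, the pipeline above can end with sum in place of the explicit fold and produce the same result:

List(1, 2, 3).map(_ * 2)
  .filter(_ > 2)
  .sum
//> res2: Int = 10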
A great many operations which involve selecting some subset of a dataset, transforming it and presenting some summary of the result can be expressed using this language, and it is common in Scala to use this idiom to express algorithms which can be broken down into steps of these kinds. (We’ll consider some of the shortcomings of this idiom, and discuss an alternative approach, later on).
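For instance (my example, not the chapter’s), selecting a subset of a dataset, transforming it and summarising it might look like this: take the over-40s from a list of name/age pairs and report their average age.

val ages = List(("Molly", 45), ("May", 41), ("Millie", 24))
val over40 = ages.filter(_._2 > 40).map(_._2)      // select the rows we want, keep just the ages
val averageAge = over40.sum.toDouble / over40.size // summarise
//> averageAge: Double = 43.0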
Relational queries
A relational query is a query which is expressed in terms of some variation of relational algebra. The database query language SQL may be considered a somewhat low-fidelity implementation of this idea. Among the operations it supports are projection, which picks out the columns to return, and selection, which restricts the returned rows to those matching a particular condition.
Confusingly, the keyword SQL uses for projection is SELECT:
SELECT name, age
FROM Person
Here we have a table, Person, and a query which projects the name and age columns from this table, returning a result set containing the name and age of every person in the table. For selection, we use the WHERE clause:
SELECT name, age
FROM Person
WHERE age > 40
In this case, we are only interested in persons over 40 - the condition (or “predicate”) age > 40 is applied as a test to each person in the table, and the projection is only applied to those which meet the condition.
Projection and selection in Scala
Let’s see if we can define a similar query in Scala. Firstly, we define a Person using a case class:
case class Person(name: String, age: Int, favouriteColour: String)
Secondly, we create a data structure - in this case, a plain List - containing several Persons:
val maggie = Person("Maggie", 32, "Red")
val millie = Person("Millie", 24, "Blue")
val molly = Person("Molly", 45, "Green")
val may = Person("May", 41, "Yellow")
val persons = List(maggie, millie, molly, may)
Finally, we can express our query over this structure using map and filter:
persons.map(person => (person.name, person.age))
  .filter(_._2 > 40)
//> res1: List[(String, Int)] = [("Molly", 45), ("May", 41)]
The map performs the “projection” part of the SQL query (the SELECT clause), projecting a tuple of name and age out of each Person. The filter then applies the “selection” (the WHERE clause).
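Note that we could equally apply the selection before the projection, which reads a little closer to the SQL and avoids building tuples for rows that are immediately discarded (my variation on the query above):

persons.filter(_.age > 40)
  .map(person => (person.name, person.age))
//> List((Molly,45), (May,41))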
Joining tables
Now for a more complicated example. Here’s a straightforward join in SQL, expressed by taking the Cartesian product of two tables and selecting only those rows in the product where the row in the second table is related to the row in the first:
SELECT p.name, p.age, a.postcode
FROM Person p, Address a
WHERE a.occupant_id = p.id
This will give us the name, age and postcode of every person who has an address, for every address that they have (some people might have more than one).
Now here’s some Scala that will (rather inefficiently) do the same:
case class Address(postcode: String, occupant: Person)

val addresses = List(
  Address("AB1 2CD", maggie),
  Address("VB6 5UX", maggie),
  Address("HR9 5BH", may)
)

addresses.map(address =>
  persons.filter(address.occupant == _)
    .map(person => (person.name, person.age, address.postcode)))
//> res1: List[List[(String, Int, String)]] = List(List((Maggie,32,AB1 2CD)), List((Maggie,32,VB6 5UX)), List((May,41,HR9 5BH)))
Because we have two nested maps here, we end up with a List of Lists. We can “flatten” this into a single List by using flatMap in place of the first map:
addresses.flatMap(address =>
  persons.filter(address.occupant == _)
    .map(person => (person.name, person.age, address.postcode)))
//> res1: List[(String, Int, String)] = List((Maggie,32,AB1 2CD), (Maggie,32,VB6 5UX), (May,41,HR9 5BH))
The combined use of flatMap and map here gives us a clue that this code could be written more concisely and pleasingly in for-comprehension syntax (if you don’t know why this is, don’t worry - we’ll come back to it):
for {
  person <- persons
  address <- addresses if address.occupant == person
} yield (person.name, person.age, address.postcode)
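Roughly speaking (a sketch of the desugaring, which we’ll treat properly later), the compiler rewrites that comprehension into nested calls much like the ones we wrote by hand:

persons.flatMap(person =>
  addresses.withFilter(address => address.occupant == person)
    .map(address => (person.name, person.age, address.postcode)))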
However, as we said, this code is rather inefficient - it has to do the equivalent of a “table scan” across the addresses collection for each person in the persons collection. If we had many more persons and addresses, then it might be better to construct an index:
val addressesByOccupant = addresses.groupBy(_.occupant)

for {
  person <- persons
  address <- addressesByOccupant.getOrElse(person, List.empty)
} yield (person.name, person.age, address.postcode)
What does groupBy do? It maps over the collection, using the supplied function to extract a key for each value, and building up a list of items which have the same key. The resulting set of key/list-of-items pairs is returned as a Map[Key, List[Item]]. We can then use this to look up all the addresses having a given occupant (returning an empty List if the person has no addresses).
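Concretely, with the addresses defined earlier (my illustration; output shown roughly as a worksheet would print it):

addressesByOccupant(maggie)
//> List(Address(AB1 2CD,Person(Maggie,32,Red)), Address(VB6 5UX,Person(Maggie,32,Red)))
addressesByOccupant.get(millie)
//> None - Millie has no addresses, hence the getOrElse with an empty List above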
A good SQL query optimiser will determine when it is more efficient to construct an index to compute a join, and try to ensure that the “query plan” into which the SQL statement is compiled implements a suitable strategy for retrieving and combining the required data. In this Scala code we are working at the level of the query execution engine which actually carries out the query plan, rather than defining a query in a declarative style which can then be optimised. Later on we’ll see how ScalaQuery makes it possible to define declarative queries, which are converted to SQL and passed to the database engine to optimise, using native Scala syntax.
All you need is fold
We could write a groupBy function explicitly in terms of map and foldRight as follows:
def groupBy[A, B](iterable: Iterable[A], key: A => B): Map[B, List[A]] =
  iterable.map(item => (key(item), item))
    .foldRight(Map.empty[B, List[A]]) { (pair, map) =>
      map + (pair._1 -> (pair._2 :: map.getOrElse(pair._1, List.empty[A])))
    }
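As a quick sanity check of my own, the hand-rolled version groups the earlier address data the same way the built-in method does:

val grouped = groupBy(addresses, (a: Address) => a.occupant)
grouped(maggie).map(_.postcode) //> List(AB1 2CD, VB6 5UX)
grouped(may).map(_.postcode)    //> List(HR9 5BH)
// the same grouping that addresses.groupBy(_.occupant) produces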