
2. For Comprehensions

Scala’s for comprehension is the ideal FP abstraction for sequential programs that interact with the world. Since we’ll be using it a lot, we’re going to relearn the principles of for and how scalaz can help us to write cleaner code.

This chapter doesn’t try to write pure programs, and the techniques are applicable to non-FP codebases.

2.1 Syntax Sugar

Scala’s for is just a simple rewrite rule, also called syntax sugar, that doesn’t have any contextual information.

To see what a for comprehension is doing, we use the show and reify features in the REPL to print out what the code looks like after type inference.

  scala> import scala.reflect.runtime.universe._
  scala> val a, b, c = Option(1)
  scala> show { reify {
           for { i <- a ; j <- b ; k <- c } yield (i + j + k)
         } }
  
  res:
  $read.a.flatMap(
    ((i) => $read.b.flatMap(
      ((j) => $read.c.map(
        ((k) => i.$plus(j).$plus(k)))))))

There is a lot of noise due to additional sugarings (e.g. + is rewritten as $plus). We’ll skip the show and reify for brevity when the REPL line is reify>, and manually clean up the generated code so that it doesn’t become a distraction.

  reify> for { i <- a ; j <- b ; k <- c } yield (i + j + k)
  
  a.flatMap {
    i => b.flatMap {
      j => c.map {
        k => i + j + k }}}

The rule of thumb is that every <- (called a generator) is a nested flatMap call, with the final generator a map containing the yield body.
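To check the rule of thumb outside the REPL, here is a small sketch that compares a for comprehension with a hand-written nesting of flatMap and map. The object name Desugar is only illustrative.

```scala
object Desugar {
  val a, b, c = Option(1)

  // What we write...
  val sugared: Option[Int] =
    for { i <- a; j <- b; k <- c } yield i + j + k

  // ...and roughly what the compiler rewrites it to: one flatMap per
  // generator, with the last generator becoming a map around the yield body.
  val desugared: Option[Int] =
    a.flatMap(i => b.flatMap(j => c.map(k => i + j + k)))

  def main(args: Array[String]): Unit =
    println(s"$sugared $desugared")
}
```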

2.1.1 Assignment

We can assign values inline like ij = i + j (a val keyword is not needed).

  reify> for {
           i <- a
           j <- b
           ij = i + j
           k <- c
         } yield (ij + k)
  
  a.flatMap {
    i => b.map { j => (j, i + j) }.flatMap {
      case (j, ij) => c.map {
        k => ij + k }}}

A map over b introduces ij, which is flat-mapped along with j, followed by the final map for the code in the yield.
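As a sketch, we can write the assignment version both ways and confirm they agree. The hand-written form mirrors the cleaned-up reify output above; the real compiler output differs in naming details.

```scala
object AssignmentDesugar {
  val a, b, c = Option(1)

  val sugared: Option[Int] =
    for {
      i <- a
      j <- b
      ij = i + j
      k <- c
    } yield ij + k

  // Hand-written equivalent: the assignment becomes a map that pairs j
  // with the new value, and the tuple is destructured in the next flatMap.
  val desugared: Option[Int] =
    a.flatMap { i =>
      b.map(j => (j, i + j)).flatMap { case (j, ij) =>
        c.map(k => ij + k)
      }
    }
}
```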

Unfortunately we cannot assign before any generators. It has been requested as a language feature but has not been implemented: https://github.com/scala/bug/issues/907

  scala> for {
           initial = getDefault
           i <- a
         } yield initial + i
  <console>:1: error: '<-' expected but '=' found.

We can work around the limitation by defining a val outside the for

  scala> val initial = getDefault
  scala> for { i <- a } yield initial + i

or create an Option out of the initial assignment

  scala> for {
           initial <- Option(getDefault)
           i <- a
         } yield initial + i

val doesn’t have to assign to a single value; it can be anything that works as a case in a pattern match.

  scala> val (first, second) = ("hello", "world")
  first: String = hello
  second: String = world
  
  scala> val list: List[Int] = ...
  scala> val head :: tail = list
  head: Int = 1
  tail: List[Int] = List(2, 3)

The same is true for assignment in for comprehensions

  scala> val maybe = Option(("hello", "world"))
  scala> for {
           entry <- maybe
           (first, _) = entry
         } yield first
  res: Some(hello)

But be careful that you don’t miss any cases or you’ll get a runtime exception (a totality failure).

  scala> val a :: tail = list
  caught scala.MatchError: List()
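A minimal sketch of both sides of this, assuming Scala 2 semantics: an irrefutable tuple pattern in a for assignment always succeeds, while a refutable pattern in a val compiles but throws MatchError at runtime. We wrap the failing case in Try only so the failure can be inspected; headOf is a hypothetical helper.

```scala
import scala.util.Try

object AssignmentPatterns {
  val maybe: Option[(String, String)] = Option(("hello", "world"))

  // An irrefutable tuple pattern in an assignment always matches.
  val first: Option[String] =
    for {
      entry  <- maybe
      (f, _) = entry
    } yield f

  // A refutable pattern compiles, but throws scala.MatchError at runtime
  // when the list is empty. Try captures the exception as a Failure.
  def headOf(list: List[Int]): Try[Int] = Try {
    val h :: _ = list
    h
  }
}
```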

2.1.2 Filter

It is possible to put if statements after a generator to filter values by a predicate

  reify> for {
           i  <- a
           j  <- b
           if i > j
           k  <- c
         } yield (i + j + k)
  
  a.flatMap {
    i => b.withFilter {
      j => i > j }.flatMap {
        j => c.map {
          k => i + j + k }}}

Older versions of Scala used filter, but Traversable.filter creates a new collection for every predicate, so withFilter was introduced as the more performant alternative.
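One way to see the difference is that withFilter defers the predicate until the following map (or flatMap / foreach) runs, instead of eagerly producing a filtered collection. This sketch counts predicate calls with a hypothetical isEven helper.

```scala
object WithFilterDemo {
  // Counts how many times the predicate has been evaluated.
  private var checks = 0
  private def isEven(i: Int): Boolean = { checks += 1; i % 2 == 0 }

  // Returns (checks before map, mapped result, checks after map).
  def run(): (Int, List[Int], Int) = {
    checks = 0
    val filtered = List(1, 2, 3, 4).withFilter(isEven) // predicate not evaluated yet
    val before   = checks
    val doubled  = filtered.map(_ * 2)                  // predicate evaluated here
    (before, doubled, checks)
  }
}
```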

We can accidentally trigger a withFilter by providing type information: it is actually interpreted as a pattern match.

  reify> for { i: Int <- a } yield i
  
  a.withFilter {
    case i: Int => true
    case _      => false
  }.map { case i: Int => i }

Like in assignment, a generator can use a pattern match on the left hand side. But unlike assignment (which throws MatchError on failure), generators are filtered and will not fail at runtime. However, there is an inefficient double application of the pattern.
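To make the silent filtering concrete, here is a sketch assuming the standard Scala 2 desugaring (without better-monadic-for): both a literal pattern and a type pattern in a generator drop non-matching elements instead of throwing.

```scala
object GeneratorPatterns {
  val pairs: List[(String, Int)] = List(("a", 1), ("b", 2), ("c", 1))

  // A refutable pattern in a generator becomes a withFilter:
  // elements that don't match are dropped, with no MatchError.
  val ones: List[String] = for { (k, 1) <- pairs } yield k

  // A type pattern behaves the same way on a heterogeneous list.
  val mixed: List[Any] = List(1, "two", 3)
  val ints: List[Int]  = for { i: Int <- mixed } yield i
}
```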

The compiler plugin better-monadic-for produces better alternative desugarings than the Scala compiler. This example is interpreted as:

  reify> for { i: Int <- a } yield i
  
  a.map { (i: Int) => i}

instead of inefficient double matching (in the best case) and silent filtering at runtime (in the worst case). Highly recommended.

2.1.3 For Each

Finally, if there is no yield, the compiler will use foreach instead of flatMap, which is only useful for side-effects.

  reify> for { i <- a ; j <- b } println(s"$i $j")
  
  a.foreach { i => b.foreach { j => println(s"$i $j") } }
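A small sketch of the foreach form: with no yield, the for comprehension evaluates to Unit, so the only way to observe it is through a side effect, here appending to a ListBuffer.

```scala
import scala.collection.mutable.ListBuffer

object ForEachDemo {
  val a = List(1, 2)
  val b = List("x", "y")

  def run(): List[String] = {
    val out = ListBuffer.empty[String]
    // No yield, so this is a.foreach(i => b.foreach(j => out += s"$i $j"))
    // and the for expression itself has type Unit.
    val done: Unit = for { i <- a; j <- b } out += s"$i $j"
    out.toList
  }
}
```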

2.1.4 Summary

The full set of methods supported by for comprehensions does not share a common super type; each generated snippet is independently compiled. If there were a trait, it would roughly look like:

  trait ForComprehensible[C[_]] {
    def map[A, B](f: A => B): C[B]
    def flatMap[A, B](f: A => C[B]): C[B]
    def withFilter[A](p: A => Boolean): C[A]
    def foreach[A](f: A => Unit): Unit
  }

If the context (C[_]) of a for comprehension doesn’t provide its own map and flatMap, all is not lost. If an implicit scalaz.Bind[T] is available for T, it will provide map and flatMap.
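Because the rewrite is purely syntactic, any type that happens to define map and flatMap can be used in a for comprehension, with no shared trait involved. Box below is a hypothetical container for illustration. Since it has no withFilter, an if guard or refutable pattern in this comprehension would not compile.

```scala
// A hypothetical container with no common supertype: it simply defines
// map and flatMap, which is all the rewrite rule needs.
final case class Box[A](value: A) {
  def map[B](f: A => B): Box[B]          = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
}

object BoxDemo {
  val sum: Box[Int] = for {
    i <- Box(1)
    j <- Box(2)
  } yield i + j
}
```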

It often surprises developers when inline Future calculations in a for comprehension do not run in parallel:

  import scala.concurrent._
  import ExecutionContext.Implicits.global
  
  for {
    i <- Future { expensiveCalc() }
    j <- Future { anotherExpensiveCalc() }
  } yield (i + j)

This is because the flatMap spawning anotherExpensiveCalc is strictly after expensiveCalc. To ensure that two Future calculations begin in parallel, start them outside the for comprehension.

  val a = Future { expensiveCalc() }
  val b = Future { anotherExpensiveCalc() }
  for { i <- a ; j <- b } yield (i + j)
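Timing-based demonstrations are flaky, so this sketch replaces expensiveCalc with a Future that we control (for example one backed by a Promise) and uses a CountDownLatch to record when the stand-in for anotherExpensiveCalc starts. The method names are hypothetical.

```scala
import java.util.concurrent.CountDownLatch
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureSequencing {
  // Stand-in for anotherExpensiveCalc(): signals the latch when it starts.
  def second(started: CountDownLatch): Future[Int] =
    Future { started.countDown(); 2 }

  // Inline: the second Future is only created inside flatMap,
  // i.e. after `first` has completed.
  def startedInside(first: Future[Int], started: CountDownLatch): Future[Int] =
    for {
      i <- first
      j <- second(started)
    } yield i + j

  // Outside: the second Future is created, and scheduled, before the for.
  def startedOutside(first: Future[Int], started: CountDownLatch): Future[Int] = {
    val b = second(started)
    for { i <- first; j <- b } yield i + j
  }
}
```

While first is still pending, the latch stays closed in startedInside but opens in startedOutside.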

for comprehensions are fundamentally for defining sequential programs. We will show a far superior way of defining parallel computations in a later chapter. Spoiler: don’t use Future.

2.2 Unhappy path

So far we’ve only looked at the rewrite rules, not what is happening in map and flatMap. Let’s consider what happens when the for context decides that it can’t proceed any further.

In the Option example, the yield is only called when i, j and k are all defined.

  for {
    i <- a
    j <- b
    k <- c
  } yield (i + j + k)

If any of a, b or c is None, the comprehension short-circuits with None, but it doesn’t tell us what went wrong.
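A quick sketch with a hypothetical parse helper shows the problem: a bad input anywhere produces None, and the result alone cannot tell us which argument was at fault.

```scala
import scala.util.Try

object OptionShortCircuit {
  // Hypothetical parser: None for anything that isn't an integer.
  def parse(s: String): Option[Int] = Try(s.toInt).toOption

  def sum3(x: String, y: String, z: String): Option[Int] =
    for {
      i <- parse(x)
      j <- parse(y)
      k <- parse(z)
    } yield i + j + k
}
```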

How often have you seen a function that takes Option parameters but requires them all to exist? An alternative to throwing a runtime exception is to use a for comprehension, giving us totality (a return value for every input):

  def namedThings(
    someName  : Option[String],
    someNumber: Option[Int]
  ): Option[String] = for {
    name   <- someName
    number <- someNumber
  } yield s"$number ${name}s"

but this is verbose, clunky and bad style. If a function requires every input then it should make its requirement explicit, pushing the responsibility of dealing with optional parameters to its caller — don’t use for unless you need to.

  def namedThings(name: String, num: Int) = s"$num ${name}s"

If we use Either, then a Left will cause the for comprehension to short circuit with extra information, much better than Option for error reporting:

  scala> val a = Right(1)
  scala> val b = Right(2)
  scala> val c: Either[String, Int] = Left("sorry, no c")
  scala> for { i <- a ; j <- b ; k <- c } yield (i + j + k)
  
  Left(sorry, no c)
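Rewriting the earlier parsing sketch with Either (assuming Scala 2.12 or later, where Either is right-biased) keeps the reason for the failure. Only the first Left is reported, because the remaining generators never run.

```scala
object EitherShortCircuit {
  // Hypothetical parser that explains what went wrong.
  def parse(s: String): Either[String, Int] =
    scala.util.Try(s.toInt).toOption.toRight(s"'$s' is not a number")

  def sum3(x: String, y: String, z: String): Either[String, Int] =
    for {
      i <- parse(x)
      j <- parse(y)
      k <- parse(z)
    } yield i + j + k
}
```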

And lastly, let’s see what happens with a Future that fails:

  scala> import scala.concurrent._
  scala> import ExecutionContext.Implicits.global
  scala> val f = for {
                   i <- Future.failed[Int](new Throwable)
                   j <- Future { println("hello") ; 1 }
                 } yield (i + j)
  scala> Await.result(f, duration.Duration.Inf)
  caught java.lang.Throwable

The Future that prints to the terminal is never run because, like Option and Either, the for comprehension short-circuits.

Short circuiting for the unhappy path is a common and important theme. for comprehensions cannot express resource cleanup: there is no way to try / finally. This is good: in FP it puts clear ownership of responsibility for unexpected error recovery and resource cleanup onto the context (which is usually a Monad, as we’ll see later), not the business logic.

2.3 Gymnastics

Although it is easy to rewrite simple sequential code as a for comprehension, sometimes we’ll want to do something that appears to require mental somersaults. This section collects some practical examples and how to deal with them.

2.3.1 Fallback Logic

Let’s say we are calling out to a method that returns an Option and, if it is not successful, we want to fall back to another method (and so on), like when we’re using a cache:

  def getFromRedis(s: String): Option[String]
  def getFromSql(s: String): Option[String]
  
  getFromRedis(key) orElse getFromSql(key)

If we have to do this for an asynchronous version of the same API

  def getFromRedis(s: String): Future[Option[String]]
  def getFromSql(s: String): Future[Option[String]]

then we have to be careful not to do extra work because

  for {
    cache <- getFromRedis(key)
    sql   <- getFromSql(key)
  } yield cache orElse sql

will run both queries. We can pattern match on the first result but the type is wrong

  for {
    cache <- getFromRedis(key)
    res   <- cache match {
               case Some(_) => cache !!! wrong type !!!
               case None    => getFromSql(key)
             }
  } yield res

We need to create a Future from the cache

  for {
    cache <- getFromRedis(key)
    res   <- cache match {
               case Some(_) => Future.successful(cache)
               case None    => getFromSql(key)
             }
  } yield res

Future.successful creates a new Future, much like an Option or List constructor.
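To confirm that the match version avoids extra work, this sketch backs the two lookups with hypothetical in-memory stand-ins and counts the SQL calls. A cache hit never reaches getFromSql.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Fallback {
  // Hypothetical stand-ins for the Redis and SQL lookups.
  val sqlCalls = new AtomicInteger(0)
  val redis: Map[String, String] = Map("hot" -> "from-redis")

  def getFromRedis(s: String): Future[Option[String]] = Future(redis.get(s))
  def getFromSql(s: String): Future[Option[String]] = Future {
    sqlCalls.incrementAndGet()
    Some(s"from-sql:$s")
  }

  // Only falls back to SQL when the cache misses.
  def lookup(key: String): Future[Option[String]] =
    for {
      cache <- getFromRedis(key)
      res   <- cache match {
                 case Some(_) => Future.successful(cache)
                 case None    => getFromSql(key)
               }
    } yield res
}
```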

If functional programming were like this all the time, it’d be a nightmare. Thankfully these tricky situations are the corner cases.

2.3.2 Early Exit

Let’s say we have some condition that should exit early with a successful value.

If we want to exit early with an error, it is standard practice in OOP to throw an exception

  def getA: Int = ...
  
  val a = getA
  require(a > 0, s"$a must be positive")
  a * 10

which can be rewritten asynchronously

  def getA: Future[Int] = ...
  def error(msg: String): Future[Nothing] =
    Future.failed(new RuntimeException(msg))
  
  for {
    a <- getA
    b <- if (a <= 0) error(s"$a must be positive")
         else Future.successful(a)
  } yield b * 10

But if we want to exit early with a successful return value, the simple synchronous code:

  def getB: Int = ...
  
  val a = getA
  if (a <= 0) 0
  else a * getB

translates into a nested for comprehension when our dependencies are asynchronous:

  def getB: Future[Int] = ...
  
  for {
    a <- getA
    c <- if (a <= 0) Future.successful(0)
         else for { b <- getB } yield a * b
  } yield c
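As a check that the early exit really skips the second dependency, this sketch counts calls to a hypothetical getB. getA takes its result as a parameter purely so that both branches can be exercised.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object EarlyExit {
  // Counts how often the (hypothetical) getB dependency is called.
  val bCalls = new AtomicInteger(0)

  // Hypothetical asynchronous dependencies.
  def getA(n: Int): Future[Int] = Future.successful(n)
  def getB: Future[Int]         = Future { bCalls.incrementAndGet(); 7 }

  def compute(n: Int): Future[Int] =
    for {
      a <- getA(n)
      c <- if (a <= 0) Future.successful(0)
           else for { b <- getB } yield a * b
    } yield c
}
```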

If there is an implicit Monad[T] for T[_] (i.e. T is monadic) then scalaz lets us create a T[A] from a value a: A by calling a.pure[T].

Scalaz provides Monad[Future], and .pure[Future] calls Future.successful. Besides pure being slightly shorter to type, it is a general concept that works beyond Future, and is therefore recommended.

  for {
    a <- getA
    c <- if (a <= 0) 0.pure[Future]
         else for { b <- getB } yield a * b
  } yield c
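To give a feel for how a.pure[T] can work for any T, here is a deliberately stripped-down, hypothetical Pure type class with a syntax extension. It is not scalaz’s definition (in scalaz, pure comes from Applicative, which Monad extends); it only illustrates the mechanism of looking up an implicit instance for the requested context.

```scala
import scala.concurrent.Future

// Hypothetical minimal type class: "put a plain value into context F".
trait Pure[F[_]] {
  def pure[A](a: A): F[A]
}

object Pure {
  implicit val option: Pure[Option] = new Pure[Option] {
    def pure[A](a: A): Option[A] = Some(a)
  }
  implicit val list: Pure[List] = new Pure[List] {
    def pure[A](a: A): List[A] = List(a)
  }
  implicit val future: Pure[Future] = new Pure[Future] {
    def pure[A](a: A): Future[A] = Future.successful(a)
  }

  // Syntax so that we can write 0.pure[Future].
  implicit class PureOps[A](private val a: A) extends AnyVal {
    def pure[F[_]](implicit F: Pure[F]): F[A] = F.pure(a)
  }
}

object PureDemo {
  import Pure._
  val o: Option[Int] = 0.pure[Option]
  val l: List[Int]   = 0.pure[List]
  val f: Future[Int] = 0.pure[Future]
}
```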

2.4 Incomprehensible

The context we’re comprehending over must stay the same: we can’t mix contexts.

  scala> def option: Option[Int] = ...
  scala> def future: Future[Int] = ...
  scala> for {
           a <- option
           b <- future
         } yield a * b
  <console>:23: error: type mismatch;
   found   : Future[Int]
   required: Option[?]
           b <- future
                ^

Nothing can help us mix arbitrary contexts in a for comprehension because the meaning is not well defined.

But when we have nested contexts the intention is usually obvious yet the compiler still doesn’t accept our code.

  scala> def getA: Future[Option[Int]] = ...
  scala> def getB: Future[Option[Int]] = ...
  scala> for {
           a <- getA
           b <- getB
         } yield a * b
  <console>:30: error: value * is not a member of Option[Int]
         } yield a * b
                   ^

Here we want for to take care of the outer context and let us write our code on the inner Option. Hiding the outer context is exactly what a monad transformer does, and scalaz provides implementations for Option and Either named OptionT and EitherT respectively.

The outer context can be anything that normally works in a for comprehension, but it needs to stay the same throughout.

We create an OptionT from each method call. This changes the context of the for from Future[Option[_]] to OptionT[Future, _].

  scala> val result = for {
           a <- OptionT(getA)
           b <- OptionT(getB)
         } yield a * b
  result: OptionT[Future, Int] = OptionT(Future(<not completed>))

.run returns us to the original context

  scala> result.run
  res: Future[Option[Int]] = Future(<not completed>)
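To demystify what the transformer does, here is a hypothetical, Future-only sketch called FutureOption. scalaz’s OptionT is generic in the outer context and far richer; this version only shows how map and flatMap can operate on the inner Option while the outer Future is threaded through. liftF plays the role that .liftM[OptionT] plays in scalaz.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch of an Option transformer specialised to Future.
final case class FutureOption[A](run: Future[Option[A]]) {
  def map[B](f: A => B)(implicit ec: ExecutionContext): FutureOption[B] =
    FutureOption(run.map(_.map(f)))

  // Continue only when the inner Option is defined; otherwise stay None.
  def flatMap[B](f: A => FutureOption[B])(implicit ec: ExecutionContext): FutureOption[B] =
    FutureOption(run.flatMap {
      case Some(a) => f(a).run
      case None    => Future.successful(Option.empty[B])
    })

  def getOrElse(default: => A)(implicit ec: ExecutionContext): Future[A] =
    run.map(_.getOrElse(default))
}

object FutureOption {
  // A plain Future becomes a present value inside the transformer.
  def liftF[A](fa: Future[A])(implicit ec: ExecutionContext): FutureOption[A] =
    FutureOption(fa.map(Some(_)))
}

object FutureOptionDemo {
  import scala.concurrent.ExecutionContext.Implicits.global

  def getA: Future[Option[Int]]    = Future.successful(Some(6))
  def getB: Future[Option[Int]]    = Future.successful(Some(7))
  def getC: Future[Int]            = Future.successful(2)
  def missing: Future[Option[Int]] = Future.successful(None)

  val result: FutureOption[Int] = for {
    a <- FutureOption(getA)
    b <- FutureOption(getB)
    c <- FutureOption.liftF(getC)
  } yield a * b / c

  val shortCircuited: FutureOption[Int] = for {
    a <- FutureOption(getA)
    m <- FutureOption(missing)
  } yield a + m

  val fallback: Future[Int] = shortCircuited.getOrElse(0)
}
```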

Alternatively, OptionT[Future, Int] has getOrElse and getOrElseF methods, taking Int and Future[Int] respectively, returning a Future[Int].

The monad transformer also allows us to mix Future[Option[_]] calls with methods that just return plain Future via .liftM[OptionT] (provided by scalaz):

  scala> def getC: Future[Int] = ...
  scala> val result = for {
           a <- OptionT(getA)
           b <- OptionT(getB)
           c <- getC.liftM[OptionT]
         } yield a * b / c
  result: OptionT[Future, Int] = OptionT(Future(<not completed>))

and we can mix with methods that return plain Option by wrapping them in Future.successful (.pure[Future]) followed by OptionT

  scala> def getD: Option[Int] = ...
  scala> val result = for {
           a <- OptionT(getA)
           b <- OptionT(getB)
           c <- getC.liftM[OptionT]
           d <- OptionT(getD.pure[Future])
         } yield (a * b) / (c * d)
  result: OptionT[Future, Int] = OptionT(Future(<not completed>))

It is messy again, but it is better than writing nested flatMap and map by hand. We can clean it up with a DSL that handles all the required conversions into OptionT[Future, _]

  def liftFutureOption[A](f: Future[Option[A]]) = OptionT(f)
  def liftFuture[A](f: Future[A]) = f.liftM[OptionT]
  def liftOption[A](o: Option[A]) = OptionT(o.pure[Future])
  def lift[A](a: A)               = liftOption(Option(a))

combined with the |> operator, which applies the function on the right to the value on the left, to visually separate the logic from the transformers

  scala> val result = for {
           a <- getA       |> liftFutureOption
           b <- getB       |> liftFutureOption
           c <- getC       |> liftFuture
           d <- getD       |> liftOption
           e <- 10         |> lift
         } yield e * (a * b) / (c * d)
  result: OptionT[Future, Int] = OptionT(Future(<not completed>))
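The |> operator itself is tiny. As an illustration only, here is a hypothetical minimal version (scalaz ships its own, and Scala 2.13 offers a similar pipe method in scala.util.chaining). Because |> does not end in a colon, chains associate to the left.

```scala
object Pipe {
  // Hypothetical minimal |> : apply the function on the right to the value on the left.
  implicit class PipeOps[A](private val a: A) extends AnyVal {
    def |>[B](f: A => B): B = f(a)
  }
}

object PipeDemo {
  import Pipe._

  def double(i: Int): Int  = i * 2
  def show(i: Int): String = s"<$i>"

  // Parsed as (10 |> double) |> show
  val out: String = 10 |> double |> show
}
```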

This approach also works for EitherT (and others) as the inner context, but their lifting methods are more complex and require parameters. Scalaz provides monad transformers for a lot of its own types, so it is worth checking if one is available.

Implementing a monad transformer is an advanced topic. Although ListT exists, it should be avoided because it can unintentionally reorder flatMap calls (see https://github.com/scalaz/scalaz/issues/921). A better alternative is StreamT, which we will visit later.
