4. Data and Functionality

From OOP we are used to thinking about data and functionality together: class hierarchies carry methods, and traits can demand that data fields exist. Runtime polymorphism of an object is in terms of “is a” relationships, requiring classes to inherit from common interfaces. This can get messy as a codebase grows. Simple data types become obscured by hundreds of lines of methods, trait mixins suffer from initialisation order errors, and testing / mocking of highly coupled components becomes a chore.

FP takes a different approach, defining data and functionality separately. In this chapter, we will cover the basics of data types and the advantages of constraining ourselves to a subset of the Scala language. We will also discover typeclasses as a way to achieve compiletime polymorphism: thinking about functionality of a data structure in terms of “has a” rather than “is a” relationships.

4.1 Data

The fundamental building blocks of data types are

  • final case class also known as products
  • sealed abstract class also known as coproducts
  • case object and Int, Double, String (etc) values

with no methods or fields other than the constructor parameters. We prefer abstract class to trait in order to get better binary compatibility and to discourage trait mixing.

The collective name for products, coproducts and values is Algebraic Data Type (ADT).

We compose data types from the AND and XOR (exclusive OR) Boolean algebra: a product contains every type that it is composed of, but a coproduct can be only one. For example

  • product: ABC = a AND b AND c
  • coproduct: XYZ = x XOR y XOR z

written in Scala

  // values
  case object A
  type B = String
  type C = Int
  
  // product
  final case class ABC(a: A.type, b: B, c: C)
  
  // coproduct
  sealed abstract class XYZ
  case object X extends XYZ
  case object Y extends XYZ
  final case class Z(b: B) extends XYZ

4.1.1 Generalised ADTs

When we introduce a type parameter into an ADT, we call it a Generalised Algebraic Data Type (GADT).

scalaz.IList, a safe alternative to the stdlib List, is a GADT:

  sealed abstract class IList[A]
  final case class INil[A]() extends IList[A]
  final case class ICons[A](head: A, tail: IList[A]) extends IList[A]

If an ADT refers to itself, we call it a recursive type. IList is recursive because ICons contains a reference to IList.
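
For example (a small sketch, with illustrative values), the list containing 1, 2 and 3 is built by nesting ICons until we terminate with INil:

  val ints: IList[Int] = ICons(1, ICons(2, ICons(3, INil[Int]())))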

4.1.2 Functions on ADTs

ADTs can contain pure functions

  final case class UserConfiguration(accepts: Int => Boolean)

But ADTs that contain functions come with some caveats as they don’t translate perfectly onto the JVM. For example, legacy Serializable, hashCode, equals and toString do not behave as one might reasonably expect.
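
For example (REPL output is indicative), two UserConfiguration instances built from identical-looking lambdas are not equal, because functions are compared by reference:

  scala> val a = UserConfiguration(_ > 0)
         val b = UserConfiguration(_ > 0)
  
  scala> a == b
  res: Boolean = false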

Unfortunately, Serializable is used by popular frameworks, despite far superior alternatives. A common pitfall is forgetting that Serializable may attempt to serialise the entire closure of a function, which can crash production servers. A similar caveat applies to legacy Java classes such as Throwable, which can carry references to arbitrary objects. This is one of the reasons why we restrict what can live on an ADT.

We will explore alternatives to the legacy methods when we discuss the scalaz library in the next chapter, at the cost of losing interoperability with some legacy Java and Scala code.

4.1.3 Exhaustivity

It is important that we use sealed abstract class, not just abstract class, when defining a data type. Sealing a class means that all subtypes must be defined in the same file, allowing the compiler to know about them in pattern match exhaustivity checks and in macros that eliminate boilerplate. e.g.

  scala> sealed abstract class Foo
         final case class Bar(flag: Boolean) extends Foo
         final case object Baz extends Foo
  
  scala> def thing(foo: Foo) = foo match {
           case Bar(_) => true
         }
  <console>:14: error: match may not be exhaustive.
  It would fail on the following input: Baz
         def thing(foo: Foo) = foo match {
                               ^

This shows the developer what they have broken when they add a new product to the codebase. We’re using -Xfatal-warnings, otherwise this is just a warning.

However, the compiler will not perform exhaustivity checking if the class is not sealed or if there are guards, e.g.

  scala> def thing(foo: Foo) = foo match {
           case Bar(flag) if flag => true
         }
  
  scala> thing(Baz)
  scala.MatchError: Baz (of class Baz$)
    at .thing(<console>:15)

To remain safe, don’t use guards on sealed types.

The -Xstrict-patmat-analysis flag has been proposed as a language improvement to perform additional pattern matcher checks.

4.1.4 Alternative Products and Coproducts

Another form of product is a tuple, which is like an unlabelled final case class.

(A.type, B, C) is equivalent to ABC in the above example but it is best to use final case class when part of an ADT because the lack of names is awkward to deal with.
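
For example (values chosen for illustration), compare access on the tuple with access on the labelled final case class:

  val abc1 = (A, "hello", 13)       // (A.type, String, Int)
  val abc2 = ABC(A, "hello", 13)
  
  abc1._2   // access by position
  abc2.b    // access by name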

Another form of coproduct is when we nest Either types. e.g.

  Either[X.type, Either[Y.type, Z]]

equivalent to the XYZ sealed abstract class. A cleaner syntax to define nested Either types is to create an alias type ending with a colon, allowing infix notation with association from the right:

  type |:[L,R] = Either[L, R]
  
  X.type |: Y.type |: Z

This is useful to create anonymous coproducts when you can’t put all the implementations into the same source file.

  type Accepted = String |: Long |: Boolean
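
Under the alias this is just nested Either values, so constructing an Accepted means choosing a side at each level (a sketch):

  // Accepted = Either[String, Either[Long, Boolean]]
  val s: Accepted = Left("hello")
  val l: Accepted = Right(Left(13L))
  val b: Accepted = Right(Right(true))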

Yet another alternative coproduct is to create a custom sealed abstract class with final case class definitions that simply wrap the desired type:

  sealed abstract class Accepted
  final case class AcceptedString(value: String) extends Accepted
  final case class AcceptedLong(value: Long) extends Accepted
  final case class AcceptedBoolean(value: Boolean) extends Accepted

Pattern matching on these forms of coproduct can be tedious, which is why Union Types are being explored in the Dotty next-generation Scala compiler. Macros such as totalitarian and iotaz exist as alternative ways of encoding anonymous coproducts.

4.1.5 Convey Information

Besides being a container for necessary business information, data types can be used to encode constraints. For example,

  final case class NonEmptyList[A](head: A, tail: IList[A])

can never be empty. This makes scalaz.NonEmptyList a useful data type despite containing the same information as List.

Product types often contain fields whose types are far more general than the business domain allows. In traditional OOP this would be handled with input validation through assertions:

  final case class Person(name: String, age: Int) {
    require(name.nonEmpty && age > 0) // breaks Totality, don't do this!
  }

Instead, we can use the Either data type to provide Right[Person] for valid instances and protect invalid instances from propagating. Note that the constructor is private:

  final case class Person private(name: String, age: Int)
  object Person {
    def apply(name: String, age: Int): Either[String, Person] = {
      if (name.nonEmpty && age > 0) Right(new Person(name, age))
      else Left(s"bad input: $name, $age")
    }
  }
  
  def welcome(person: Person): String =
    s"${person.name} you look wonderful at ${person.age}!"
  
  for {
    person <- Person("", -1)
  } yield welcome(person)

4.1.5.1 Refined Data Types

A clean way to restrict the values of a general type is with the refined library, providing a suite of restrictions to the contents of data. To install refined, add the following to build.sbt

  libraryDependencies += "eu.timepit" %% "refined-scalaz" % "0.9.2"

and the following imports

  import eu.timepit.refined
  import refined.api.Refined

Refined allows us to define Person using adhoc refined types to capture requirements exactly (typically written A Refined B rather than Refined[A, B])

  import refined.numeric.Positive
  import refined.collection.NonEmpty
  
  final case class Person(
    name: String Refined NonEmpty,
    age: Int Refined Positive
  )

A Refined B can be read as “an A that meets the requirements defined in B”. The underlying value can be obtained with .value. We can construct a value at runtime using .refineV

  scala> import refined.refineV
  scala> refineV[NonEmpty]("")
  Left(Predicate isEmpty() did not fail.)
  
  scala> refineV[NonEmpty]("Sam")
  Right(Sam)

And if we add the following import

  import refined.auto._

we can construct valid values at compiletime and get a compile error if the provided value does not meet the requirements

  scala> val sam: String Refined NonEmpty = "Sam"
  Sam
  
  scala> val empty: String Refined NonEmpty = ""
  <console>:21: error: Predicate isEmpty() did not fail.

More complex requirements can be captured; for example, we can use the built-in rule MaxSize with the following imports

  import refined.W
  import refined.boolean.And
  import refined.collection.MaxSize

capturing the requirement that the String must be both non-empty and have a maximum size of 10 characters:

  type Name = NonEmpty And MaxSize[W.`10`.T]
  
  final case class Person(
    name: String Refined Name,
    age: Int Refined Positive
  )

It is easy to define custom rules when refined does not cover a requirement. For example, a rule that a String contains a valid java.net.URL is as simple as the following (this is how the refined library defines its own string.Url rule):

  object string {
    final case class Url()
    object Url {
      implicit def urlValidate: refined.Validate.Plain[String, Url] =
        Validate.fromPartial(new java.net.URL(_), "Url", Url())
    }
  }
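
With that Validate in implicit scope, the rule behaves like any other (a sketch; REPL output is indicative):

  scala> import refined.string.Url
  scala> refineV[Url]("http://fommil.com")
  Right(http://fommil.com)
  
  scala> refineV[Url]("foo")
  Left(Url predicate failed: no protocol: foo)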

4.1.6 Simple to Share

By not providing any functionality, ADTs can have a minimal set of dependencies. This makes them easy to publish and share with other developers. By using a simple data modelling language, we make it possible to interact with cross-discipline teams, such as DBAs, UI developers and business analysts, using the actual code instead of a hand-written document as the source of truth.

Furthermore, tooling can be more easily written to produce or consume schemas from other programming languages and wire protocols.

4.1.7 Counting Complexity

The complexity of a data type is the number of instances that can exist. A good data type has the least amount of complexity it needs to hold the information it conveys, and no more.

Values have a built-in complexity:

  • Unit has one instance (why it is called “unit”)
  • Boolean has two instances
  • Int has 4,294,967,296 instances
  • String has effectively infinite instances

To find the complexity of a product, we multiply the complexity of each part.

  • (Boolean, Boolean) has 4 instances (2*2)
  • (Boolean, Boolean, Boolean) has 8 instances (2*2*2)

To find the complexity of a coproduct, we add the complexity of each part.

  • (Boolean |: Boolean) has 4 instances (2+2)
  • (Boolean |: Boolean |: Boolean) has 6 instances (2+2+2)

To find the complexity of a GADT, multiply each part by the complexity of the type parameter:

  • Option[Boolean] has 3 instances, Some[Boolean] and None (2+1)

In FP, functions are total and must return an instance for every input, no Exception. Minimising the complexity of inputs and outputs is the best way to achieve totality. As a rule of thumb, it is a sign of a badly designed function when the complexity of a function’s return value is larger than the product of its inputs: it is a source of entropy.

The complexity of a total function itself is the number of possible functions that can satisfy the type signature: the output to the power of the input.

  • Unit => Boolean has complexity 2
  • Boolean => Boolean has complexity 4
  • Option[Boolean] => Option[Boolean] has complexity 27
  • Boolean => Int is a mere quintillion going on a sextillion.
  • Int => Boolean is so big that if all implementations were assigned a unique number, each number would require 512 megabytes to represent.

In reality, Int => Boolean will be something simple like isOdd, isEven or a sparse BitSet. This function, when used in an ADT, could be better replaced with a coproduct labelling the limited set of functions that are relevant.
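
A sketch of such a coproduct (the names are illustrative), together with its interpretation back into a function, with a complexity of just a handful of cases rather than the astronomical complexity of a raw Int => Boolean:

  sealed abstract class Predicate
  case object IsOdd                   extends Predicate
  case object IsEven                  extends Predicate
  final case class InSet(s: Set[Int]) extends Predicate
  
  def run(p: Predicate): Int => Boolean = p match {
    case IsOdd    => i => i % 2 != 0
    case IsEven   => i => i % 2 == 0
    case InSet(s) => i => s.contains(i)
  }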

When our complexity is “infinity in, infinity out” we should introduce restrictive data types and validation closer to the point of input with Refined from the previous section.

The ability to count the complexity of a type signature has one other practical application: we can find simpler type signatures with High School algebra! To go from a type signature to its algebra of complexity, simply replace

  • Either[A, B] with a + b
  • (A, B) with a * b
  • A => B with b ^ a

do some rearranging, and convert back. For example, say we’ve designed a framework based on callbacks and we’ve managed to work ourselves into the situation where we have created this type signature:

  (A => C) => ((B => C) => C)

We can convert and rearrange

  (c ^ (c ^ b)) ^ (c ^ a)
  = c ^ ((c ^ b) * (c ^ a))
  = c ^ (c ^ (a + b))

then convert back to types and get

  (Either[A, B] => C) => C

which is much simpler: we only need to ask the users of our framework to provide a Either[A, B] => C.

The same line of reasoning can be used to prove that

  A => B => C

is equivalent to

  (A, B) => C

also known as Currying.
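
In Scala the two forms are interconvertible, e.g. with .curried and Function.uncurried (a small sketch):

  val curried: Int => Int => Int = a => b => a + b
  val tupled: (Int, Int) => Int  = Function.uncurried(curried)
  
  curried(1)(2)   // 3
  tupled(1, 2)    // 3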

4.1.8 Prefer Coproduct over Product

An archetypal modelling problem that comes up a lot is when there are mutually exclusive configuration parameters a, b and c. The product (a: Boolean, b: Boolean, c: Boolean) has complexity 8 whereas the coproduct

  sealed abstract class Config
  object Config {
    case object A extends Config
    case object B extends Config
    case object C extends Config
  }

has a complexity of 3. It is better to model these configuration parameters as a coproduct rather than allowing 5 invalid states to exist.

The complexity of a data type also has implications on testing. It is practically impossible to test every possible input to a function, but it is easy to test a sample of values with the scalacheck property testing framework. If a random sample of a data type has a low probability of being valid, it is a sign that the data is modelled incorrectly.
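
A minimal sketch with scalacheck, assuming the Config coproduct above: every generated value is legal by construction, so no samples are wasted on impossible states.

  import org.scalacheck._
  
  // every sample is a valid configuration
  val genConfig: Gen[Config] = Gen.oneOf[Config](Config.A, Config.B, Config.C)
  
  val prop: Prop = Prop.forAll(genConfig) {
    case Config.A | Config.B | Config.C => true
  }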

4.1.9 Optimisations

A big advantage of using a simplified subset of the Scala language to represent data types is that tooling can optimise the JVM bytecode representation.

For example, we can pack Boolean and Option fields into an Array[Byte], cache instances, memoise hashCode, optimise equals, use @switch statements when pattern matching, and much more.

These optimisations are not applicable to OOP class hierarchies that may be managing state, throwing exceptions, or providing adhoc method implementations.

4.2 Functionality

Pure functions are typically defined as methods on an object.

  package object math {
    def sin(x: Double): Double = java.lang.Math.sin(x)
    ...
  }
  
  math.sin(1.0)

However, it can be clunky to use object methods since it reads inside-out, not left to right. In addition, a function on an object steals the namespace. If we were to define sin(t: T) somewhere else we get ambiguous reference errors. This is the same problem as Java’s static methods vs class methods.

With the implicit class language feature (also known as extension methodology or syntax), and a little boilerplate, we can get the familiar style:

  scala> implicit class DoubleOps(x: Double) {
           def sin: Double = math.sin(x)
         }
  
  scala> 1.0.sin
  res: Double = 0.8414709848078965

Often it is best to just skip the object definition and go straight for an implicit class, keeping boilerplate to a minimum:

  implicit class DoubleOps(x: Double) {
    def sin: Double = java.lang.Math.sin(x)
  }

4.2.1 Polymorphic Functions

The more common kind of function is a polymorphic function, which lives in a typeclass. A typeclass is a trait that:

  • holds no state
  • has a type parameter
  • has at least one abstract method (primitive combinators)
  • may contain generalised methods (derived combinators)
  • may extend other typeclasses

There can only be one implementation of a typeclass for any given type parameter, a property known as typeclass coherence. Typeclasses look superficially similar to algebraic interfaces from the previous chapter, but algebras do not have to be coherent.

Typeclasses are used in the Scala stdlib. We’ll explore a simplified version of scala.math.Numeric to demonstrate the principle:

  trait Ordering[T] {
    def compare(x: T, y: T): Int
  
    def lt(x: T, y: T): Boolean = compare(x, y) < 0
    def gt(x: T, y: T): Boolean = compare(x, y) > 0
  }
  
  trait Numeric[T] extends Ordering[T] {
    def plus(x: T, y: T): T
    def times(x: T, y: T): T
    def negate(x: T): T
    def zero: T
  
    def abs(x: T): T = if (lt(x, zero)) negate(x) else x
  }

We can see all the key features of a typeclass in action:

  • there is no state
  • Ordering and Numeric have type parameter T
  • Ordering has abstract compare and Numeric has abstract plus, times, negate and zero
  • Ordering defines generalised lt and gt based on compare, Numeric defines abs in terms of lt, negate and zero.
  • Numeric extends Ordering

We can now write functions for types that “have a” Numeric typeclass:

  def signOfTheTimes[T](t: T)(implicit N: Numeric[T]): T = {
    import N._
    times(negate(abs(t)), t)
  }

We are no longer dependent on the OOP hierarchy of our input types, i.e. we don’t demand that our input “is a” Numeric, which is vitally important if we want to support a third party class that we cannot redefine.

Another advantage of typeclasses is that the association of functionality to data is at compiletime, as opposed to OOP runtime dynamic dispatch.

For example, whereas the List class can only have one implementation of a method, a typeclass method allows us to have a different implementation depending on the List contents and therefore offload work to compiletime instead of leaving it to runtime.
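
To make this concrete, here is a hypothetical Summable typeclass (not from any library): the implementation that runs is chosen at compiletime from the element type of the List, not by runtime dispatch on the List itself.

  trait Summable[A] { def sum(as: List[A]): A }
  object Summable {
    implicit val int: Summable[Int]       = as => as.foldLeft(0)(_ + _)
    implicit val string: Summable[String] = as => as.foldLeft("")(_ + _)
  }
  
  def total[A](as: List[A])(implicit S: Summable[A]): A = S.sum(as)
  
  total(List(1, 2, 3))        // 6, uses Summable.int
  total(List("a", "b", "c"))  // "abc", uses Summable.string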

4.2.2 Syntax

The syntax for writing signOfTheTimes is clunky, but there are some things we can do to clean it up.

Downstream users will prefer to see our method use context bounds, since the signature reads cleanly as "takes a T that has a Numeric":

  def signOfTheTimes[T: Numeric](t: T): T = ...

but now we have to use implicitly[Numeric[T]] everywhere. By defining boilerplate on the companion of the typeclass

  object Numeric {
    def apply[T](implicit numeric: Numeric[T]): Numeric[T] = numeric
  }

we can obtain the implicit with less noise

  def signOfTheTimes[T: Numeric](t: T): T = {
    val N = Numeric[T]
    import N._
    times(negate(abs(t)), t)
  }

But it is still worse for us as the implementors. We have the syntactic problem of inside-out static methods vs class methods. We deal with this by introducing ops on the typeclass companion:

  object Numeric {
    def apply[T](implicit numeric: Numeric[T]): Numeric[T] = numeric
  
    object ops {
      implicit class NumericOps[T](t: T)(implicit N: Numeric[T]) {
        def +(o: T): T = N.plus(t, o)
        def *(o: T): T = N.times(t, o)
        def unary_- : T = N.negate(t)
        def abs: T = N.abs(t)
  
        // duplicated from Ordering.ops
        def <(o: T): Boolean = N.lt(t, o)
        def >(o: T): Boolean = N.gt(t, o)
      }
    }
  }

Note that -x is expanded into x.unary_- by the compiler’s syntax sugar, which is why we define unary_- as an extension method. We can now write the much cleaner:

  import Numeric.ops._
  def signOfTheTimes[T: Numeric](t: T): T = -(t.abs) * t

The good news is that we never need to write this boilerplate because Simulacrum provides a @typeclass macro annotation to have the companion apply and ops automatically generated. It even allows us to define alternative (usually symbolic) names for common methods. In full:

  import simulacrum._
  
  @typeclass trait Ordering[T] {
    def compare(x: T, y: T): Int
    @op("<") def lt(x: T, y: T): Boolean = compare(x, y) < 0
    @op(">") def gt(x: T, y: T): Boolean = compare(x, y) > 0
  }
  
  @typeclass trait Numeric[T] extends Ordering[T] {
    @op("+") def plus(x: T, y: T): T
    @op("*") def times(x: T, y: T): T
    @op("unary_-") def negate(x: T): T
    def zero: T
    def abs(x: T): T = if (lt(x, zero)) negate(x) else x
  }
  
  import Numeric.ops._
  def signOfTheTimes[T: Numeric](t: T): T = -(t.abs) * t

4.2.3 Instances

Instances of Numeric (which are also instances of Ordering) are defined as an implicit val that extends the typeclass, and can provide optimised implementations for the generalised methods:

  implicit val NumericDouble: Numeric[Double] = new Numeric[Double] {
    def plus(x: Double, y: Double): Double = x + y
    def times(x: Double, y: Double): Double = x * y
    def negate(x: Double): Double = -x
    def zero: Double = 0.0
    def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)
  
    // optimised
    override def lt(x: Double, y: Double): Boolean = x < y
    override def gt(x: Double, y: Double): Boolean = x > y
    override def abs(x: Double): Double = java.lang.Math.abs(x)
  }

Although we are using +, *, unary_-, < and > here, which are the ops (and could be an infinite loop!), these methods exist already on Double. Class methods are always used in preference to extension methods. Indeed, the scala compiler performs special handling of primitives and converts these method calls into raw dadd, dmul, dcmpl and dcmpg bytecode instructions, respectively.

We can also implement Numeric for Java’s BigDecimal class (avoid scala.BigDecimal, it is fundamentally broken)

  import java.math.{ BigDecimal => BD }
  
  implicit val NumericBD: Numeric[BD] = new Numeric[BD] {
    def plus(x: BD, y: BD): BD = x.add(y)
    def times(x: BD, y: BD): BD = x.multiply(y)
    def negate(x: BD): BD = x.negate
    def zero: BD = BD.ZERO
    def compare(x: BD, y: BD): Int = x.compareTo(y)
  }

We could even take some liberties and create our own data structure for complex numbers:

  final case class Complex[T](r: T, i: T)

And derive a Numeric[Complex[T]] if Numeric[T] exists. Since this instance depends on the type parameter, it is a def, not a val.

  implicit def numericComplex[T: Numeric]: Numeric[Complex[T]] =
    new Numeric[Complex[T]] {
      type CT = Complex[T]
      def plus(x: CT, y: CT): CT = Complex(x.r + y.r, x.i + y.i)
      def times(x: CT, y: CT): CT =
        Complex(x.r * y.r + (-x.i * y.i), x.r * y.i + x.i * y.r)
      def negate(x: CT): CT = Complex(-x.r, -x.i)
      def zero: CT = Complex(Numeric[T].zero, Numeric[T].zero)
      def compare(x: CT, y: CT): Int = {
        val real = (Numeric[T].compare(x.r, y.r))
        if (real != 0) real
        else Numeric[T].compare(x.i, y.i)
      }
    }

The observant reader may notice that abs is not at all what a mathematician would expect. The correct return value for abs should be T, not Complex[T].

scala.math.Numeric tries to do too much and does not generalise beyond real numbers. This is a good lesson that smaller, well defined, typeclasses are often better than a monolithic collection of overly specific features.

If you need to write generic code that works for a wide range of number types, prefer spire to the stdlib. Indeed, in the next chapter we will see that concepts such as having a zero element, or adding two values, are worthy of their own typeclass.

4.2.4 Implicit Resolution

We’ve discussed implicits a lot: this section is to clarify what implicits are and how they work.

Implicit parameters are when a method requests that a unique instance of a particular type is in the implicit scope of the caller, with special syntax for typeclass instances. Implicit parameters are a clean way to thread configuration through an application.

In this example, foo requires that typeclass instances of Numeric and Typeable are available for A, as well as an implicit Handler object that takes two type parameters

  def foo[A: Numeric: Typeable](implicit A: Handler[String, A]) = ...

Implicit conversion is when an implicit def exists. One such use of implicit conversions is to enable extension methodology. When the compiler is resolving a call to a method, it first checks if the method exists on the type, then its ancestors (Java-like rules). If it fails to find a match, it will search the implicit scope for conversions to other types, then search for methods on those types.

Another use for implicit conversion is typeclass derivation. In the previous section we wrote an implicit def that derived a Numeric[Complex[T]] if a Numeric[T] is in the implicit scope. It is possible to chain together many implicit def (including recursively) which is the basis of typeful programming, allowing for computations to be performed at compiletime rather than runtime.
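
As a small demonstration (assuming the Numeric instances from earlier in this chapter are in scope), the compiler chains numericComplex as many times as necessary:

  // numericComplex is applied twice, NumericDouble terminates the chain
  val nested: Numeric[Complex[Complex[Double]]] = implicitly
  
  nested.plus(Complex(Complex(1.0, 2.0), Complex(3.0, 4.0)),
              Complex(Complex(5.0, 6.0), Complex(7.0, 8.0)))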

The glue that combines implicit parameters (receivers) with implicit conversion (providers) is implicit resolution.

First, the normal variable scope is searched for implicits, in order:

  • local scope, including scoped imports (e.g. the block or method)
  • outer scope, including scoped imports (e.g. members in the class)
  • ancestors (e.g. members in the super class)
  • the current package object
  • ancestor package objects (only when using nested packages)
  • the file’s imports

If that fails to find a match, the special scope is searched, which looks for implicit instances inside a type’s companion, its package object, outer objects (if nested), and then repeated for ancestors. This is performed, in order, for the:

  • given parameter type
  • expected parameter type
  • type parameter (if there is one)

If two matching implicits are found in the same phase of implicit resolution, an ambiguous implicit error is raised.

Implicits are often defined on a trait, which is then extended by an object. This is to try and control the priority of an implicit relative to another more specific one, to avoid ambiguous implicits.
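
A sketch of the usual encoding, with a hypothetical Stringy typeclass: the generic fallback lives in a parent trait, so the more specific instance on the companion wins without causing an ambiguity.

  trait Stringy[A] { def stringy(a: A): String }
  
  trait StringyLowPriority {
    // generic fallback, used only if nothing more specific is found
    implicit def fallback[A]: Stringy[A] = a => a.toString
  }
  object Stringy extends StringyLowPriority {
    // specific instance, preferred over the inherited fallback
    implicit val int: Stringy[Int] = i => s"Int($i)"
  }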

The Scala Language Specification is rather vague for corner cases, and the compiler implementation is the de facto standard. There are some rules of thumb that we will use throughout this book, e.g. prefer implicit val over implicit object despite the temptation of less typing. It is a quirk of implicit resolution that implicit object on companion objects are not treated the same as implicit val.

Implicit resolution falls short when there is a hierarchy of typeclasses, like Ordering and Numeric. If we write a function that takes an implicit Ordering, and we call it for a type which has an instance of Numeric defined on the Numeric companion, the compiler will fail to find it. A workaround is to add implicit conversions to the companion of Ordering that up-cast more specific instances. This is fixed in Dotty.

Implicit resolution is particularly hit-or-miss if type aliases are used where the shape of the implicit parameters are changed. For example an implicit parameter using an alias such as type Values[A] = List[Option[A]] will probably fail to find implicits defined as raw List[Option[A]] because the shape is changed from a thing of things of A to a thing of A.

4.3 Modelling OAuth2

We will finish this chapter with a practical example of data modelling and typeclass derivation, combined with algebra / module design from the previous chapter.

In our drone-dynamic-agents application, we must communicate with Drone and Google Cloud using JSON over REST. Both services use OAuth2 for authentication. There are many ways to interpret OAuth2, but we will focus on the version that works for Google Cloud (the Drone version is even simpler).

4.3.1 Description

Every Google Cloud application needs to have an OAuth 2.0 Client Key set up at

  https://console.developers.google.com/apis/credentials?project={PROJECT_ID}

You will be provided with a Client ID and a Client secret.

The application can then obtain a one time code by making the user perform an Authorization Request in their browser (yes, really, in their browser). We need to make this page open in the browser:

  https://accounts.google.com/o/oauth2/v2/auth?\
    redirect_uri={CALLBACK_URI}&\
    prompt=consent&\
    response_type=code&\
    scope={SCOPE}&\
    access_type=offline&\
    client_id={CLIENT_ID}

The code is delivered to the {CALLBACK_URI} in a GET request. To capture it in our application, we need to have a web server listening on localhost.

Once we have the code, we can perform an Access Token Request:

  POST /oauth2/v4/token HTTP/1.1
  Host: www.googleapis.com
  Content-length: {CONTENT_LENGTH}
  content-type: application/x-www-form-urlencoded
  user-agent: google-oauth-playground
  code={CODE}&\
    redirect_uri={CALLBACK_URI}&\
    client_id={CLIENT_ID}&\
    client_secret={CLIENT_SECRET}&\
    scope={SCOPE}&\
    grant_type=authorization_code

which gives a JSON response payload

  {
    "access_token": "BEARER_TOKEN",
    "token_type": "Bearer",
    "expires_in": 3600,
    "refresh_token": "REFRESH_TOKEN"
  }

Bearer tokens typically expire after an hour, and can be refreshed by sending an HTTP request with any valid refresh token:

  POST /oauth2/v4/token HTTP/1.1
  Host: www.googleapis.com
  Content-length: {CONTENT_LENGTH}
  content-type: application/x-www-form-urlencoded
  user-agent: google-oauth-playground
  client_secret={CLIENT_SECRET}&
    grant_type=refresh_token&
    refresh_token={REFRESH_TOKEN}&
    client_id={CLIENT_ID}

responding with

  {
    "access_token": "BEARER_TOKEN",
    "token_type": "Bearer",
    "expires_in": 3600
  }

All userland requests to the server should include the header

  Authorization: Bearer BEARER_TOKEN

after substituting the actual BEARER_TOKEN.

Google expires all but the most recent 50 bearer tokens, so the expiry times are just guidance. The refresh tokens persist between sessions and can be expired manually by the user. We can therefore have a one-time setup application to obtain the refresh token and then include the refresh token as configuration for the user’s install of the headless server.

Drone doesn’t implement the /auth endpoint, or the refresh, and simply provides a BEARER_TOKEN through their user interface.

4.3.2 Data

The first step is to model the data needed for OAuth2. We create an ADT with fields having exactly the same name as required by the OAuth2 server. We will use String and Long for brevity, but we could use refined types if they leak into our business models.

  package http.oauth2.client.api
  
  import refined.api.Refined
  import refined.string.Url
  
  final case class AuthRequest(
    redirect_uri: String Refined Url,
    scope: String,
    client_id: String,
    prompt: String = "consent",
    response_type: String = "code",
    access_type: String = "offline"
  )
  final case class AccessRequest(
    code: String,
    redirect_uri: String Refined Url,
    client_id: String,
    client_secret: String,
    scope: String = "",
    grant_type: String = "authorization_code"
  )
  final case class AccessResponse(
    access_token: String,
    token_type: String,
    expires_in: Long,
    refresh_token: String
  )
  final case class RefreshRequest(
    client_secret: String,
    refresh_token: String,
    client_id: String,
    grant_type: String = "refresh_token"
  )
  final case class RefreshResponse(
    access_token: String,
    token_type: String,
    expires_in: Long
  )

4.3.3 Functionality

We need to marshal the data classes we defined in the previous section into JSON, URLs and POST-encoded forms. Since this requires polymorphism, we will need typeclasses.

jsonformat is a simple JSON library that we will study in more detail in a later chapter, as it has been written with principled FP and ease of readability as its primary design objectives. It consists of a JSON AST and encoder / decoder typeclasses:

  package jsonformat
  
  sealed abstract class JsValue
  final case object JsNull                                    extends JsValue
  final case class JsObject(fields: IList[(String, JsValue)]) extends JsValue
  final case class JsArray(elements: IList[JsValue])          extends JsValue
  final case class JsBoolean(value: Boolean)                  extends JsValue
  final case class JsString(value: String)                    extends JsValue
  final case class JsDouble(value: Double)                    extends JsValue
  final case class JsInteger(value: Long)                     extends JsValue
  
  @typeclass trait JsEncoder[A] {
    def toJson(obj: A): JsValue
  }
  
  @typeclass trait JsDecoder[A] {
    def fromJson(json: JsValue): String \/ A
  }

\/ is scalaz’s Either and has a .flatMap. We can use it in for comprehensions, whereas stdlib Either does not support .flatMap prior to Scala 2.12.

We need instances of JsDecoder[AccessResponse] and JsDecoder[RefreshResponse]. We can do this by making use of a helper function:

  implicit class JsValueOps(j: JsValue) {
    def getAs[A: JsDecoder](key: String): String \/ A = ...
  }

We put the instances on the companions of our data types, so that they are always in the implicit scope:

  import jsonformat._, JsDecoder.ops._
  
  object AccessResponse {
    implicit val json: JsDecoder[AccessResponse] = j =>
      for {
        acc <- j.getAs[String]("access_token")
        tpe <- j.getAs[String]("token_type")
        exp <- j.getAs[Long]("expires_in")
        ref <- j.getAs[String]("refresh_token")
      } yield AccessResponse(acc, tpe, exp, ref)
  }
  
  object RefreshResponse {
    implicit val json: JsDecoder[RefreshResponse] = j =>
      for {
        acc <- j.getAs[String]("access_token")
        tpe <- j.getAs[String]("token_type")
        exp <- j.getAs[Long]("expires_in")
      } yield RefreshResponse(acc, tpe, exp)
  }

We can then parse a string into an AccessResponse or a RefreshResponse

  scala> import jsonformat._, JsDecoder.ops._
  scala> val json = JsParser("""
                       {
                         "access_token": "BEARER_TOKEN",
                         "token_type": "Bearer",
                         "expires_in": 3600,
                         "refresh_token": "REFRESH_TOKEN"
                       }
                       """)
  
  scala> json.map(_.as[AccessResponse])
  AccessResponse(BEARER_TOKEN,Bearer,3600,REFRESH_TOKEN)

We need to write our own typeclasses for URL and POST encoding. The following is a reasonable design:

  package http.encoding
  
  // URL query key=value pairs, in unencoded form.
  final case class UrlQuery(params: List[(String, String)])
  
  @typeclass trait UrlQueryWriter[A] {
    def toUrlQuery(a: A): UrlQuery
  }
  
  @typeclass trait UrlEncodedWriter[A] {
    def toUrlEncoded(a: A): String Refined UrlEncoded
  }

We need to provide typeclass instances for basic types:

  object UrlEncodedWriter {
    implicit val encoded: UrlEncodedWriter[String Refined UrlEncoded] = identity
  
    implicit val string: UrlEncodedWriter[String] =
      (s => Refined.unsafeApply(URLEncoder.encode(s, "UTF-8")))
  
    implicit val long: UrlEncodedWriter[Long] =
      (s => Refined.unsafeApply(s.toString))
  
    implicit def ilist[K: UrlEncodedWriter, V: UrlEncodedWriter]
      : UrlEncodedWriter[IList[(K, V)]] = { m =>
      val raw = m.map {
        case (k, v) => k.toUrlEncoded.value + "=" + v.toUrlEncoded.value
      }.intercalate("&")
      Refined.unsafeApply(raw) // by deduction
    }
  
  }

We use Refined.unsafeApply when we can logically deduce that the contents of the string are already url encoded, bypassing any further checks.

ilist is an example of simple typeclass derivation, much as we derived Numeric[Complex] from the underlying numeric representation. The .intercalate method is like .mkString but more general.
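
For example (a sketch, assuming scalaz’s Monoid[String] and Foldable syntax are in scope; REPL output is indicative):

  scala> IList("a", "b", "c").intercalate(",")
  res: String = a,b,c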

In a dedicated chapter on Typeclass Derivation we will calculate instances of UrlQueryWriter and UrlEncodedWriter automatically, as well as clean up what we have already written, but for now we will write the boilerplate for the types we wish to convert:

  import UrlEncodedWriter.ops._
  object AuthRequest {
    private def stringify[T: UrlEncodedWriter](t: T) =
      java.net.URLDecoder.decode(t.toUrlEncoded.value, "UTF-8")
  
    implicit val query: UrlQueryWriter[AuthRequest] = { a =>
      UrlQuery(List(
        ("redirect_uri"  -> stringify(a.redirect_uri)),
        ("scope"         -> stringify(a.scope)),
        ("client_id"     -> stringify(a.client_id)),
        ("prompt"        -> stringify(a.prompt)),
        ("response_type" -> stringify(a.response_type)),
        ("access_type"   -> stringify(a.access_type)))
    }
  }
  object AccessRequest {
    implicit val encoded: UrlEncodedWriter[AccessRequest] = { a =>
      Seq(
        "code"          -> a.code.toUrlEncoded,
        "redirect_uri"  -> a.redirect_uri.toUrlEncoded,
        "client_id"     -> a.client_id.toUrlEncoded,
        "client_secret" -> a.client_secret.toUrlEncoded,
        "scope"         -> a.scope.toUrlEncoded,
        "grant_type"    -> a.grant_type.toUrlEncoded
      ).toUrlEncoded
    }
  }
  object RefreshRequest {
    implicit val encoded: UrlEncodedWriter[RefreshRequest] = { r =>
      Seq(
        "client_secret" -> r.client_secret.toUrlEncoded,
        "refresh_token" -> r.refresh_token.toUrlEncoded,
        "client_id"     -> r.client_id.toUrlEncoded,
        "grant_type"    -> r.grant_type.toUrlEncoded
      ).toUrlEncoded
    }
  }

4.3.4 Module

That concludes the data and functionality modelling required to implement OAuth2. Recall from the previous chapter that we define mockable components that need to interact with the world as algebras, and we define pure business logic in a module.

We define our dependency algebras, and use context bounds to show that our responses must have a JsDecoder and our POST payload must have a UrlEncodedWriter:

  package http.client.algebra
  
  trait JsonClient[F[_]] {
    def get[A: JsDecoder](
      uri: String Refined Url,
      headers: IList[HttpHeader] = IList.empty
    ): F[A]
  
    def postUrlencoded[P: UrlEncodedWriter, A: JsDecoder](
      uri: String Refined Url,
      payload: P,
      headers: IList[HttpHeader] = IList.empty
    ): F[A]
  }

Note that we only define the happy path in the JsonClient API. We will get around to error handling in a later chapter.

  package http.oauth2.client.algebra
  
  final case class CodeToken(token: String, redirect_uri: String Refined Url)
  
  trait UserInteraction[F[_]] {
    /** returns the URL of the local server */
    def start: F[String Refined Url]
  
    /** prompts the user to open this URL */
    def open(uri: String Refined Url): F[Unit]
  
    /** recover the code from the callback */
    def stop: F[CodeToken]
  }
  
  trait LocalClock[F[_]] {
    def now: F[Epoch]
  }

some convenient data classes

  final case class ServerConfig(
    auth: String Refined Url,
    access: String Refined Url,
    refresh: String Refined Url,
    scope: String,
    clientId: String,
    clientSecret: String
  )
  final case class RefreshToken(token: String)
  final case class BearerToken(token: String, expires: Epoch)

and then write an OAuth2 client:

  import http.encoding.UrlQueryWriter.ops._
  
  class OAuth2Client[F[_]: Monad](
    config: ServerConfig
  )(
    implicit
    user: UserInteraction[F],
    client: JsonClient[F],
    clock: LocalClock[F]
  ) {
    def authenticate: F[CodeToken] =
      for {
        callback <- user.start
        params   = AuthRequest(callback, config.scope, config.clientId)
        _        <- user.open(params.toUrlQuery.forUrl(config.auth))
        code     <- user.stop
      } yield code
  
    def access(code: CodeToken): F[(RefreshToken, BearerToken)] =
      for {
        request <- AccessRequest(code.token,
                                 code.redirect_uri,
                                 config.clientId,
                                 config.clientSecret).pure[F]
        msg     <- client.postUrlencoded[AccessRequest, AccessResponse](
                     config.access, request)
        time    <- clock.now
        expires = time + msg.expires_in.seconds
        refresh = RefreshToken(msg.refresh_token)
        bearer  = BearerToken(msg.access_token, expires)
      } yield (refresh, bearer)
  
    def bearer(refresh: RefreshToken): F[BearerToken] =
      for {
        request <- RefreshRequest(config.clientSecret,
                                  refresh.token,
                                  config.clientId).pure[F]
        msg     <- client.postUrlencoded[RefreshRequest, RefreshResponse](
                     config.refresh, request)
        time    <- clock.now
        expires = time + msg.expires_in.seconds
        bearer  = BearerToken(msg.access_token, expires)
      } yield bearer
  }

4.4 Summary

  • data types are defined as products (final case class) and coproducts (sealed abstract class).
  • Refined types can enforce constraints on values
  • specific functions are defined on implicit class
  • polymorphic functions are defined as typeclasses. Functionality is provided via “has a” context bounds, rather than “is a” class hierarchies.
  • typeclass instances are implementations of the typeclass.
  • @simulacrum.typeclass generates .ops on the companion, providing convenient syntax for types that have a typeclass instance.
  • typeclass derivation is compiletime composition of typeclass instances.
  • generic instances automatically derive instances for your data types.