4. Data and Functionality
From OOP we are used to thinking about data and functionality together: class hierarchies carry methods, and traits can demand that data fields exist. Runtime polymorphism of an object is in terms of “is a” relationships, requiring classes to inherit from common interfaces. This can get messy as a codebase grows. Simple data types become obscured by hundreds of lines of methods, trait mixins suffer from initialisation order errors, and testing / mocking of highly coupled components becomes a chore.
FP takes a different approach, defining data and functionality separately. In this chapter, we will cover the basics of data types and the advantages of constraining ourselves to a subset of the Scala language. We will also discover typeclasses as a way to achieve compiletime polymorphism: thinking about functionality of a data structure in terms of “has a” rather than “is a” relationships.
4.1 Data
The fundamental building blocks of data types are
- final case class, also known as products
- sealed abstract class, also known as coproducts
- case object and Int, Double, String (etc) values
with no methods or fields other than the constructor parameters. We prefer
abstract class to trait in order to get better binary compatibility and to
discourage trait mixing.
The collective name for products, coproducts and values is Algebraic Data Type (ADT).
We compose data types from the AND and XOR (exclusive OR)
Boolean algebra: a product contains every type that it is composed of,
but a coproduct can be only one. For example
- product: ABC = a AND b AND c
- coproduct: XYZ = x XOR y XOR z
written in Scala
// values
case object A
type B = String
type C = Int
// product
final case class ABC(a: A.type, b: B, c: C)
// coproduct
sealed abstract class XYZ
case object X extends XYZ
case object Y extends XYZ
final case class Z(b: B) extends XYZ
4.1.1 Generalised ADTs
When we introduce a type parameter into an ADT, we call it a Generalised Algebraic Data Type (GADT).
scalaz.IList, a safe alternative to the stdlib List, is a GADT:
sealed abstract class IList[A]
final case class INil[A]() extends IList[A]
final case class ICons[A](head: A, tail: IList[A]) extends IList[A]
If an ADT refers to itself, we call it a recursive type. IList is
recursive because ICons contains a reference to IList.
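For example, a three element list is built by nesting ICons cells, terminated by an INil:
val list: IList[Int] = ICons(1, ICons(2, ICons(3, INil())))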
4.1.2 Functions on ADTs
ADTs can contain pure functions
final case class UserConfiguration(accepts: Int => Boolean)
But ADTs that contain functions come with some caveats as they don’t
translate perfectly onto the JVM. For example, legacy Serializable,
hashCode, equals and toString do not behave as one might
reasonably expect.
Unfortunately, Serializable is used by popular frameworks, despite
far superior alternatives. A common pitfall is forgetting that
Serializable may attempt to serialise the entire closure of a
function, which can crash production servers. A similar caveat applies
to legacy Java classes such as Throwable, which can carry references
to arbitrary objects. This is one of the reasons why we restrict what
can live on an ADT.
We will explore alternatives to the legacy methods when we discuss the scalaz library in the next chapter, at the cost of losing interoperability with some legacy Java and Scala code.
4.1.3 Exhaustivity
It is important that we use sealed abstract class, not just
abstract class, when defining a data type. Sealing a class means
that all subtypes must be defined in the same file, allowing the
compiler to know about them in pattern match exhaustivity checks and
in macros that eliminate boilerplate. e.g.
scala> sealed abstract class Foo
final case class Bar(flag: Boolean) extends Foo
final case object Baz extends Foo
scala> def thing(foo: Foo) = foo match {
case Bar(_) => true
}
<console>:14: error: match may not be exhaustive.
It would fail on the following input: Baz
def thing(foo: Foo) = foo match {
^
This shows the developer what they have broken when they add a new
product to the codebase. We’re using -Xfatal-warnings, otherwise
this is just a warning.
However, the compiler will not perform exhaustivity checking if the
class is not sealed or if there are guards, e.g.
scala> def thing(foo: Foo) = foo match {
case Bar(flag) if flag => true
}
scala> thing(Baz)
scala.MatchError: Baz (of class Baz$)
at .thing(<console>:15)
To remain safe, don’t use guards on sealed types.
The -Xstrict-patmat-analysis flag has been proposed as a language
improvement to perform additional pattern matcher checks.
4.1.4 Alternative Products and Coproducts
Another form of product is a tuple, which is like an unlabelled final
case class.
(A.type, B, C) is equivalent to ABC in the above example but it is
best to use final case class when part of an ADT because the lack of
names is awkward to deal with.
Another form of coproduct is when we nest Either types. e.g.
Either[X.type, Either[Y.type, Z]]
equivalent to the XYZ sealed abstract class. A cleaner syntax to define
nested Either types is to create an alias type ending with a colon,
allowing infix notation with association from the right:
type |:[L,R] = Either[L, R]
X.type |: Y.type |: Z
This is useful to create anonymous coproducts when you can’t put all the implementations into the same source file.
type Accepted = String |: Long |: Boolean
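For example, values of Accepted are created by nesting the stdlib Either constructors:
val s: Accepted = Left("hello")
val n: Accepted = Right(Left(13L))
val b: Accepted = Right(Right(true))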
Yet another alternative coproduct is to create a custom sealed abstract class
with final case class definitions that simply wrap the desired type:
sealed abstract class Accepted
final case class AcceptedString(value: String) extends Accepted
final case class AcceptedLong(value: Long) extends Accepted
final case class AcceptedBoolean(value: Boolean) extends Accepted
Pattern matching on these forms of coproduct can be tedious, which is why Union Types are being explored in the Dotty next-generation scala compiler. Macros such as totalitarian and iotaz exist as alternative ways of encoding anonymous coproducts.
4.1.5 Convey Information
Besides being a container for necessary business information, data types can be used to encode constraints. For example,
final case class NonEmptyList[A](head: A, tail: IList[A])
can never be empty. This makes scalaz.NonEmptyList a useful data type despite
containing the same information as List.
Product types often contain fields whose types are far more general than the business domain allows. In traditional OOP this would be handled with input validation through assertions:
final case class Person(name: String, age: Int) {
require(name.nonEmpty && age > 0) // breaks Totality, don't do this!
}
Instead, we can use the Either data type to provide Right[Person] for valid
instances and protect invalid instances from propagating. Note that the
constructor is private:
final case class Person private(name: String, age: Int)
object Person {
def apply(name: String, age: Int): Either[String, Person] = {
if (name.nonEmpty && age > 0) Right(new Person(name, age))
else Left(s"bad input: $name, $age")
}
}
def welcome(person: Person): String =
s"${person.name} you look wonderful at ${person.age}!"
for {
person <- Person("", -1)
} yield welcome(person)
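On Scala 2.12+, where the stdlib Either is right-biased, the Left short-circuits the comprehension and welcome is never called:
scala> Person("", -1)
Left(bad input: , -1)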
4.1.5.1 Refined Data Types
A clean way to restrict the values of a general type is with the refined
library, providing a suite of restrictions to the contents of data. To install
refined, add the following to build.sbt
libraryDependencies += "eu.timepit" %% "refined-scalaz" % "0.9.2"
and the following imports
import eu.timepit.refined
import refined.api.Refined
Refined allows us to define Person using adhoc refined types to capture
requirements exactly (typically written A Refined B rather than Refined[A,
B])
import refined.numeric.Positive
import refined.collection.NonEmpty
final case class Person(
name: String Refined NonEmpty,
age: Int Refined Positive
)
A Refined B can be read as “an A that meets the requirements defined in
B”. The underlying value can be obtained with .value. We can construct a
value at runtime using refineV
scala> import refined.refineV
scala> refineV[NonEmpty]("")
Left(Predicate isEmpty() did not fail.)
scala> refineV[NonEmpty]("Sam")
Right(Sam)
And if we add the following import
import refined.auto._
we can construct valid values at compiletime and get a compile error if the provided value does not meet the requirements
scala> val sam: String Refined NonEmpty = "Sam"
Sam
scala> val empty: String Refined NonEmpty = ""
<console>:21: error: Predicate isEmpty() did not fail.
More complex requirements can be captured, for example we can use the built-in
rule MaxSize with the following imports
import refined.W
import refined.boolean.And
import refined.collection.MaxSize
capturing the requirement that the String must be both non-empty and have a
maximum size of 10 characters:
type Name = NonEmpty And MaxSize[W.`10`.T]
final case class Person(
name: String Refined Name,
age: Int Refined Positive
)
It is easy to define custom requirements that are not covered by the refined
library. For example, the requirement that a String contains a valid
java.net.URL is as simple as the following, from the refined library:
object string {
final case class Url()
object Url {
implicit def urlValidate: refined.Validate.Plain[String, Url] =
Validate.fromPartial(new java.net.URL(_), "Url", Url())
}
}
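Since this mirrors how the real refined.string.Url predicate is defined, we can exercise it directly at the REPL (output abbreviated):
scala> import refined.string.Url
scala> refineV[Url]("https://www.example.com")
Right(https://www.example.com)
scala> refineV[Url]("foo").isLeft
true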
4.1.6 Simple to Share
By not providing any functionality, ADTs can have a minimal set of dependencies. This makes them easy to publish and share with other developers. By using a simple data modelling language, it makes it possible to interact with cross-discipline teams, such as DBAs, UI developers and business analysts, using the actual code instead of a hand written document as the source of truth.
Furthermore, tooling can be more easily written to produce or consume schemas from other programming languages and wire protocols.
4.1.7 Counting Complexity
The complexity of a data type is the number of instances that can exist. A good data type has the least amount of complexity it needs to hold the information it conveys, and no more.
Values have a built-in complexity:
- Unit has one instance (which is why it is called “unit”)
- Boolean has two instances
- Int has 4,294,967,296 instances
- String has effectively infinite instances
To find the complexity of a product, we multiply the complexity of each part.
- (Boolean, Boolean) has 4 instances (2*2)
- (Boolean, Boolean, Boolean) has 8 instances (2*2*2)
To find the complexity of a coproduct, we add the complexity of each part.
- (Boolean |: Boolean) has 4 instances (2+2)
- (Boolean |: Boolean |: Boolean) has 6 instances (2+2+2)
To find the complexity of a GADT, multiply each part by the complexity of the type parameter:
- Option[Boolean] has 3 instances, Some[Boolean] and None (2+1)
In FP, functions are total and must return an instance for every
input, no Exception. Minimising the complexity of inputs and outputs
is the best way to achieve totality. As a rule of thumb, it is a sign
of a badly designed function when the complexity of a function’s
return value is larger than the product of its inputs: it is a source
of entropy.
The complexity of a total function itself is the number of possible functions that can satisfy the type signature: the output to the power of the input.
- Unit => Boolean has complexity 2
- Boolean => Boolean has complexity 4 (enumerated below)
- Option[Boolean] => Option[Boolean] has complexity 27
- Boolean => Int is a mere quintillion going on a sextillion.
- Int => Boolean is so big that if all implementations were assigned a unique number, each number would require 4 gigabits (over 500 megabytes) to represent.
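For intuition, the four implementations of Boolean => Boolean can be written out explicitly:
val same: Boolean => Boolean = b => b
val flip: Boolean => Boolean = b => !b
val yes:  Boolean => Boolean = _ => true
val no:   Boolean => Boolean = _ => false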
In reality, Int => Boolean will be something simple like isOdd, isEven or
a sparse BitSet. This function, when used in an ADT, could be better replaced
with a coproduct labelling the limited set of functions that are relevant.
When our complexity is “infinity in, infinity out” we should introduce
restrictive data types and validation closer to the point of input with
Refined from the previous section.
The ability to count the complexity of a type signature has one other practical application: we can find simpler type signatures with High School algebra! To go from a type signature to its algebra of complexity, simply replace
- Either[A, B] with a + b
- (A, B) with a * b
- A => B with b ^ a
do some rearranging, and convert back. For example, say we’ve designed a framework based on callbacks and we’ve managed to work ourselves into the situation where we have created this type signature:
(A => C) => ((B => C) => C)
We can convert and rearrange
(c ^ (c ^ b)) ^ (c ^ a)
= c ^ ((c ^ b) * (c ^ a))
= c ^ (c ^ (a + b))
then convert back to types and get
(Either[A, B] => C) => C
which is much simpler: we only need to ask the users of our framework to provide
a Either[A, B] => C.
The same line of reasoning can be used to prove that
A => B => C
is equivalent to
(A, B) => C
also known as Currying.
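We can check this in Scala with the stdlib curried / uncurried conversions:
val add: (Int, Int) => Int = (a, b) => a + b
val addCurried: Int => Int => Int = add.curried

addCurried(1)(2)                      // 3
Function.uncurried(addCurried)(1, 2)  // 3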
4.1.8 Prefer Coproduct over Product
An archetypal modelling problem that comes up a lot is when there are
mutually exclusive configuration parameters a, b and c. The
product (a: Boolean, b: Boolean, c: Boolean) has complexity 8
whereas the coproduct
sealed abstract class Config
object Config {
case object A extends Config
case object B extends Config
case object C extends Config
}
has a complexity of 3. It is better to model these configuration parameters as a coproduct rather than allowing 5 invalid states to exist.
The complexity of a data type also has implications on testing. It is practically impossible to test every possible input to a function, but it is easy to test a sample of values with the scalacheck property testing framework. If a random sample of a data type has a low probability of being valid, it is a sign that the data is modelled incorrectly.
4.1.9 Optimisations
A big advantage of using a simplified subset of the Scala language to represent data types is that tooling can optimise the JVM bytecode representation.
For example, we can pack Boolean and Option fields into an Array[Byte],
cache instances, memoise hashCode, optimise equals, use @switch statements
when pattern matching, and much more.
These optimisations are not applicable to OOP class hierarchies that
may be managing state, throwing exceptions, or providing adhoc method
implementations.
4.2 Functionality
Pure functions are typically defined as methods on an object.
package object math {
def sin(x: Double): Double = java.lang.Math.sin(x)
...
}
math.sin(1.0)
However, it can be clunky to use object methods since it reads
inside-out, not left to right. In addition, a function on an object
steals the namespace. If we were to define sin(t: T) somewhere else
we get ambiguous reference errors. This is the same problem as
Java’s static methods vs class methods.
With the implicit class language feature (also known as extension
methodology or syntax), and a little boilerplate, we can get the
familiar style:
scala> implicit class DoubleOps(x: Double) {
def sin: Double = math.sin(x)
}
scala> 1.0.sin
res: Double = 0.8414709848078965
Often it is best to just skip the object definition and go straight
for an implicit class, keeping boilerplate to a minimum:
implicit class DoubleOps(x: Double) {
def sin: Double = java.lang.Math.sin(x)
}
4.2.1 Polymorphic Functions
The more common kind of function is a polymorphic function, which lives in a typeclass. A typeclass is a trait that:
- holds no state
- has a type parameter
- has at least one abstract method (primitive combinators)
- may contain generalised methods (derived combinators)
- may extend other typeclasses
There can only be one implementation of a typeclass for any given type parameter, a property known as typeclass coherence. Typeclasses look superficially similar to algebraic interfaces from the previous chapter, but algebras do not have to be coherent.
Typeclasses are used in the Scala stdlib. We’ll explore a simplified
version of scala.math.Numeric to demonstrate the principle:
trait Ordering[T] {
def compare(x: T, y: T): Int
def lt(x: T, y: T): Boolean = compare(x, y) < 0
def gt(x: T, y: T): Boolean = compare(x, y) > 0
}
trait Numeric[T] extends Ordering[T] {
def plus(x: T, y: T): T
def times(x: T, y: T): T
def negate(x: T): T
def zero: T
def abs(x: T): T = if (lt(x, zero)) negate(x) else x
}
We can see all the key features of a typeclass in action:
- there is no state
- Ordering and Numeric have type parameter T
- Ordering has abstract compare and Numeric has abstract plus, times, negate and zero
- Ordering defines generalised lt and gt based on compare, Numeric defines abs in terms of lt, negate and zero
- Numeric extends Ordering
We can now write functions for types that “have a” Numeric
typeclass:
def signOfTheTimes[T](t: T)(implicit N: Numeric[T]): T = {
import N._
times(negate(abs(t)), t)
}
We are no longer dependent on the OOP hierarchy of our input types,
i.e. we don’t demand that our input “is a” Numeric, which is vitally
important if we want to support a third party class that we cannot
redefine.
Another advantage of typeclasses is that the association of functionality to data is at compiletime, as opposed to OOP runtime dynamic dispatch.
For example, whereas the List class can only have one implementation
of a method, a typeclass method allows us to have a different
implementation depending on the List contents and therefore offload
work to compiletime instead of leaving it to runtime.
4.2.2 Syntax
The syntax for writing signOfTheTimes is clunky, but there are some
things we can do to clean it up.
Downstream users will prefer to see our method use context bounds,
since the signature reads cleanly as “takes a T that has a
Numeric”
def signOfTheTimes[T: Numeric](t: T): T = ...
but now we have to use implicitly[Numeric[T]] everywhere. By
defining boilerplate on the companion of the typeclass
object Numeric {
def apply[T](implicit numeric: Numeric[T]): Numeric[T] = numeric
}
we can obtain the implicit with less noise
def signOfTheTimes[T: Numeric](t: T): T = {
val N = Numeric[T]
import N._
times(negate(abs(t)), t)
}
But it is still worse for us as the implementors. We have the
syntactic problem of inside-out static methods vs class methods. We
deal with this by introducing ops on the typeclass companion:
object Numeric {
def apply[T](implicit numeric: Numeric[T]): Numeric[T] = numeric
object ops {
implicit class NumericOps[T](t: T)(implicit N: Numeric[T]) {
def +(o: T): T = N.plus(t, o)
def *(o: T): T = N.times(t, o)
def unary_- : T = N.negate(t)
def abs: T = N.abs(t)
// duplicated from Ordering.ops
def <(o: T): Boolean = N.lt(t, o)
def >(o: T): Boolean = N.gt(t, o)
}
}
}
Note that -x is expanded into x.unary_- by the compiler’s syntax
sugar, which is why we define unary_- as an extension method. We can
now write the much cleaner:
import Numeric.ops._
def signOfTheTimes[T: Numeric](t: T): T = -(t.abs) * t
The good news is that we never need to write this boilerplate because
Simulacrum provides a @typeclass macro annotation to have the
companion apply and ops automatically generated. It even allows us
to define alternative (usually symbolic) names for common methods. In
full:
import simulacrum._
@typeclass trait Ordering[T] {
def compare(x: T, y: T): Int
@op("<") def lt(x: T, y: T): Boolean = compare(x, y) < 0
@op(">") def gt(x: T, y: T): Boolean = compare(x, y) > 0
}
@typeclass trait Numeric[T] extends Ordering[T] {
@op("+") def plus(x: T, y: T): T
@op("*") def times(x: T, y: T): T
@op("unary_-") def negate(x: T): T
def zero: T
def abs(x: T): T = if (lt(x, zero)) negate(x) else x
}
import Numeric.ops._
def signOfTheTimes[T: Numeric](t: T): T = -(t.abs) * t
4.2.3 Instances
Instances of Numeric (which are also instances of Ordering) are
defined as an implicit val that extends the typeclass, and can
provide optimised implementations for the generalised methods:
implicit val NumericDouble: Numeric[Double] = new Numeric[Double] {
def plus(x: Double, y: Double): Double = x + y
def times(x: Double, y: Double): Double = x * y
def negate(x: Double): Double = -x
def zero: Double = 0.0
def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)
// optimised
override def lt(x: Double, y: Double): Boolean = x < y
override def gt(x: Double, y: Double): Boolean = x > y
override def abs(x: Double): Double = java.lang.Math.abs(x)
}
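With this instance in implicit scope, signOfTheTimes from earlier resolves Numeric[Double] automatically:
scala> signOfTheTimes(2.0)
res: Double = -4.0
scala> signOfTheTimes(-5.0)
res: Double = 25.0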
Although we are using +, *, unary_-, < and > here, which are
the ops (and could be an infinite loop!), these methods exist already
on Double. Class methods are always used in preference to extension
methods. Indeed, the scala compiler performs special handling of
primitives and converts these method calls into raw dadd, dmul,
dcmpl and dcmpg bytecode instructions, respectively.
We can also implement Numeric for Java’s BigDecimal class (avoid
scala.BigDecimal, it is fundamentally broken)
import java.math.{ BigDecimal => BD }
implicit val NumericBD: Numeric[BD] = new Numeric[BD] {
def plus(x: BD, y: BD): BD = x.add(y)
def times(x: BD, y: BD): BD = x.multiply(y)
def negate(x: BD): BD = x.negate
def zero: BD = BD.ZERO
def compare(x: BD, y: BD): Int = x.compareTo(y)
}
We could even take some liberties and create our own data structure for complex numbers:
final case class Complex[T](r: T, i: T)
And derive a Numeric[Complex[T]] if Numeric[T] exists. Since these
instances depend on the type parameter, it is a def, not a val.
implicit def numericComplex[T: Numeric]: Numeric[Complex[T]] =
new Numeric[Complex[T]] {
type CT = Complex[T]
def plus(x: CT, y: CT): CT = Complex(x.r + y.r, x.i + y.i)
def times(x: CT, y: CT): CT =
Complex(x.r * y.r + (-x.i * y.i), x.r * y.i + x.i * y.r)
def negate(x: CT): CT = Complex(-x.r, -x.i)
def zero: CT = Complex(Numeric[T].zero, Numeric[T].zero)
def compare(x: CT, y: CT): Int = {
val real = (Numeric[T].compare(x.r, y.r))
if (real != 0) real
else Numeric[T].compare(x.i, y.i)
}
}
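We can sanity check the derived instance against the usual (a + bi)(c + di) multiplication rule, assuming NumericDouble from above is in implicit scope:
scala> Numeric[Complex[Double]].times(Complex(1.0, 2.0), Complex(3.0, 4.0))
res: Complex[Double] = Complex(-5.0,10.0)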
The observant reader may notice that abs is not at all what a
mathematician would expect. The correct return value for abs should
be T, not Complex[T].
scala.math.Numeric tries to do too much and does not generalise
beyond real numbers. This is a good lesson that smaller, well defined,
typeclasses are often better than a monolithic collection of overly
specific features.
If you need to write generic code that works for a wide range of number types, prefer spire to the stdlib. Indeed, in the next chapter we will see that concepts such as having a zero element, or adding two values, are worthy of their own typeclass.
4.2.4 Implicit Resolution
We’ve discussed implicits a lot: this section is to clarify what implicits are and how they work.
Implicit parameters are when a method requests that a unique instance of a particular type is in the implicit scope of the caller, with special syntax for typeclass instances. Implicit parameters are a clean way to thread configuration through an application.
In this example, foo requires that typeclasses instances of Numeric and
Typeable are available for A, as well as an implicit Handler object that
takes two type parameters
def foo[A: Numeric: Typeable](implicit A: Handler[String, A]) = ...
Implicit conversion is when an implicit def exists. One such use
of implicit conversions is to enable extension methodology. When the
compiler is resolving a call to a method, it first checks if the
method exists on the type, then its ancestors (Java-like rules). If it
fails to find a match, it will search the implicit scope for
conversions to other types, then search for methods on those types.
Another use for implicit conversion is typeclass derivation. In the
previous section we wrote an implicit def that derived a
Numeric[Complex[T]] if a Numeric[T] is in the implicit scope. It
is possible to chain together many implicit def (including
recursively) which is the basis of typeful programming, allowing for
computations to be performed at compiletime rather than runtime.
The glue that combines implicit parameters (receivers) with implicit conversion (providers) is implicit resolution.
First, the normal variable scope is searched for implicits, in order:
- local scope, including scoped imports (e.g. the block or method)
- outer scope, including scoped imports (e.g. members in the class)
- ancestors (e.g. members in the super class)
- the current package object
- ancestor package objects (only when using nested packages)
- the file’s imports
If that fails to find a match, the special scope is searched, which looks for implicit instances inside a type’s companion, its package object, outer objects (if nested), and then repeated for ancestors. This is performed, in order, for the:
- given parameter type
- expected parameter type
- type parameter (if there is one)
If two matching implicits are found in the same phase of implicit resolution, an ambiguous implicit error is raised.
Implicits are often defined on a trait, which is then extended by an
object. This is to try and control the priority of an implicit
relative to another more specific one, to avoid ambiguous implicits.
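For example, a minimal sketch of this pattern with a hypothetical Pretty typeclass (not from any library) looks like:
trait Pretty[A] { def pretty(a: A): String }

trait LowPriorityPretty {
  // fallback instance, only found if nothing more specific applies
  implicit def anything[A]: Pretty[A] = (a: A) => a.toString
}
object Pretty extends LowPriorityPretty {
  // more specific instance, wins over the inherited fallback
  implicit val string: Pretty[String] = (s: String) => "\"" + s + "\""
}

def render[A](a: A)(implicit P: Pretty[A]): String = P.pretty(a)

render("hello") // "hello" (quoted, via Pretty.string)
render(42)      // 42 (via the low priority fallback)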
The Scala Language Specification is rather vague for corner cases, and
the compiler implementation is the de facto standard. There are some
rules of thumb that we will use throughout this book, e.g. prefer
implicit val over implicit object despite the temptation of less
typing. It is a quirk of implicit resolution that an implicit object on a
companion is not treated the same as an implicit val.
Implicit resolution falls short when there is a hierarchy of
typeclasses, like Ordering and Numeric. If we write a function
that takes an implicit Ordering, and we call it for a type which has
an instance of Numeric defined on the Numeric companion, the
compiler will fail to find it. A workaround is to add implicit
conversions to the companion of Ordering that up-cast more specific
instances. Fixed In Dotty.
Implicit resolution is particularly hit-or-miss if type aliases are used where
the shape of the implicit parameters are changed. For example an implicit
parameter using an alias such as type Values[A] = List[Option[A]] will
probably fail to find implicits defined as raw List[Option[A]] because the
shape is changed from a thing of things of A to a thing of A.
4.3 Modelling OAuth2
We will finish this chapter with a practical example of data modelling and typeclass derivation, combined with algebra / module design from the previous chapter.
In our drone-dynamic-agents application, we must communicate with Drone and
Google Cloud using JSON over REST. Both services use OAuth2 for authentication.
There are many ways to interpret OAuth2, but we will focus on the version that
works for Google Cloud (the Drone version is even simpler).
4.3.1 Description
Every Google Cloud application needs to have an OAuth 2.0 Client Key set up at
https://console.developers.google.com/apis/credentials?project={PROJECT_ID}
You will be provided with a Client ID and a Client secret.
The application can then obtain a one time code by making the user perform an Authorization Request in their browser (yes, really, in their browser). We need to make this page open in the browser:
https://accounts.google.com/o/oauth2/v2/auth?\
redirect_uri={CALLBACK_URI}&\
prompt=consent&\
response_type=code&\
scope={SCOPE}&\
access_type=offline&\
client_id={CLIENT_ID}
The code is delivered to the {CALLBACK_URI} in a GET request. To
capture it in our application, we need to have a web server listening
on localhost.
Once we have the code, we can perform an Access Token Request:
POST /oauth2/v4/token HTTP/1.1
Host: www.googleapis.com
Content-length: {CONTENT_LENGTH}
content-type: application/x-www-form-urlencoded
user-agent: google-oauth-playground
code={CODE}&\
redirect_uri={CALLBACK_URI}&\
client_id={CLIENT_ID}&\
client_secret={CLIENT_SECRET}&\
scope={SCOPE}&\
grant_type=authorization_code
which gives a JSON response payload
{
"access_token": "BEARER_TOKEN",
"token_type": "Bearer",
"expires_in": 3600,
"refresh_token": "REFRESH_TOKEN"
}
Bearer tokens typically expire after an hour, and can be refreshed by sending an HTTP request with any valid refresh token:
POST /oauth2/v4/token HTTP/1.1
Host: www.googleapis.com
Content-length: {CONTENT_LENGTH}
content-type: application/x-www-form-urlencoded
user-agent: google-oauth-playground
client_secret={CLIENT_SECRET}&
grant_type=refresh_token&
refresh_token={REFRESH_TOKEN}&
client_id={CLIENT_ID}
responding with
{
"access_token": "BEARER_TOKEN",
"token_type": "Bearer",
"expires_in": 3600
}
All userland requests to the server should include the header
Authorization: Bearer BEARER_TOKEN
after substituting the actual BEARER_TOKEN.
Google expires all but the most recent 50 bearer tokens, so the expiry times are just guidance. The refresh tokens persist between sessions and can be expired manually by the user. We can therefore have a one-time setup application to obtain the refresh token and then include the refresh token as configuration for the user’s install of the headless server.
Drone doesn’t implement the /auth endpoint, or the refresh, and simply
provides a BEARER_TOKEN through their user interface.
4.3.2 Data
The first step is to model the data needed for OAuth2. We create an ADT with
fields having exactly the same name as required by the OAuth2 server. We will
use String and Long for brevity, but we could use refined types if they leak
into our business models.
package http.oauth2.client.api
import refined.api.Refined
import refined.string.Url
final case class AuthRequest(
redirect_uri: String Refined Url,
scope: String,
client_id: String,
prompt: String = "consent",
response_type: String = "code",
access_type: String = "offline"
)
final case class AccessRequest(
code: String,
redirect_uri: String Refined Url,
client_id: String,
client_secret: String,
scope: String = "",
grant_type: String = "authorization_code"
)
final case class AccessResponse(
access_token: String,
token_type: String,
expires_in: Long,
refresh_token: String
)
final case class RefreshRequest(
client_secret: String,
refresh_token: String,
client_id: String,
grant_type: String = "refresh_token"
)
final case class RefreshResponse(
access_token: String,
token_type: String,
expires_in: Long
)
4.3.3 Functionality
We need to marshal the data classes we defined in the previous section into JSON, URLs and POST-encoded forms. Since this requires polymorphism, we will need typeclasses.
jsonformat is a simple JSON library that we will study in more detail in a
later chapter, as it has been written with principled FP and ease of readability
as its primary design objectives. It consists of a JSON AST and encoder /
decoder typeclasses:
package jsonformat
sealed abstract class JsValue
final case object JsNull extends JsValue
final case class JsObject(fields: IList[(String, JsValue)]) extends JsValue
final case class JsArray(elements: IList[JsValue]) extends JsValue
final case class JsBoolean(value: Boolean) extends JsValue
final case class JsString(value: String) extends JsValue
final case class JsDouble(value: Double) extends JsValue
final case class JsInteger(value: Long) extends JsValue
@typeclass trait JsEncoder[A] {
def toJson(obj: A): JsValue
}
@typeclass trait JsDecoder[A] {
def fromJson(json: JsValue): String \/ A
}
\/ is scalaz’s Either and has a .flatMap. We can use it in for
comprehensions, whereas stdlib Either does not support .flatMap prior to
Scala 2.12.
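For example (a quick sketch, assuming import scalaz._, Scalaz._ for the syntax):
scala> for {
         a <- 1.right[String]
         b <- 2.right[String]
       } yield a + b
res: String \/ Int = \/-(3)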
We need instances of JsDecoder[AccessResponse] and JsDecoder[RefreshResponse].
We can do this by making use of a helper function:
implicit class JsValueOps(j: JsValue) {
def getAs[A: JsDecoder](key: String): String \/ A = ...
}
We put the instances on the companions of our data types, so that they are always in the implicit scope:
import jsonformat._, JsDecoder.ops._
object AccessResponse {
implicit val json: JsDecoder[AccessResponse] = j =>
for {
acc <- j.getAs[String]("access_token")
tpe <- j.getAs[String]("token_type")
exp <- j.getAs[Long]("expires_in")
ref <- j.getAs[String]("refresh_token")
} yield AccessResponse(acc, tpe, exp, ref)
}
object RefreshResponse {
implicit val json: JsDecoder[RefreshResponse] = j =>
for {
acc <- j.getAs[String]("access_token")
tpe <- j.getAs[String]("token_type")
exp <- j.getAs[Long]("expires_in")
} yield RefreshResponse(acc, tpe, exp)
}
We can then parse a string into an AccessResponse or a RefreshResponse
scala> import jsonformat._, JsDecoder.ops._
scala> val json = JsParser("""
{
"access_token": "BEARER_TOKEN",
"token_type": "Bearer",
"expires_in": 3600,
"refresh_token": "REFRESH_TOKEN"
}
""")
scala> json.map(_.as[AccessResponse])
AccessResponse(BEARER_TOKEN,Bearer,3600,REFRESH_TOKEN)
We need to write our own typeclasses for URL and POST encoding. The following is a reasonable design:
package http.encoding
// URL query key=value pairs, in unencoded form.
final case class UrlQuery(params: List[(String, String)])
@typeclass trait UrlQueryWriter[A] {
def toUrlQuery(a: A): UrlQuery
}
@typeclass trait UrlEncodedWriter[A] {
def toUrlEncoded(a: A): String Refined UrlEncoded
}
We need to provide typeclass instances for basic types:
object UrlEncodedWriter {
implicit val encoded: UrlEncodedWriter[String Refined UrlEncoded] = identity
implicit val string: UrlEncodedWriter[String] =
(s => Refined.unsafeApply(URLEncoder.encode(s, "UTF-8")))
implicit val long: UrlEncodedWriter[Long] =
(s => Refined.unsafeApply(s.toString))
implicit def ilist[K: UrlEncodedWriter, V: UrlEncodedWriter]
: UrlEncodedWriter[IList[(K, V)]] = { m =>
val raw = m.map {
case (k, v) => k.toUrlEncoded.value + "=" + v.toUrlEncoded.value
}.intercalate("&")
Refined.unsafeApply(raw) // by deduction
}
}
We use Refined.unsafeApply when we can logically deduce that the contents of
the string are already url encoded, bypassing any further checks.
ilist is an example of simple typeclass derivation, much as we derived
Numeric[Complex] from the underlying numeric representation. The
.intercalate method is like .mkString but more general.
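For example (assuming the scalaz syntax imports are in scope):
scala> IList("a", "b", "c").intercalate("&")
res: String = a&b&c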
In a dedicated chapter on Typeclass Derivation we will calculate instances of
UrlQueryWriter and UrlEncodedWriter automatically, as well as clean up what
we have already written, but for now we will write the boilerplate for the types
we wish to convert:
import UrlEncodedWriter.ops._
object AuthRequest {
private def stringify[T: UrlEncodedWriter](t: T) =
java.net.URLDecoder.decode(t.toUrlEncoded.value, "UTF-8")
implicit val query: UrlQueryWriter[AuthRequest] = { a =>
UrlQuery(List(
("redirect_uri" -> stringify(a.redirect_uri)),
("scope" -> stringify(a.scope)),
("client_id" -> stringify(a.client_id)),
("prompt" -> stringify(a.prompt)),
("response_type" -> stringify(a.response_type)),
("access_type" -> stringify(a.access_type)))
}
}
object AccessRequest {
implicit val encoded: UrlEncodedWriter[AccessRequest] = { a =>
IList(
"code" -> a.code.toUrlEncoded,
"redirect_uri" -> a.redirect_uri.toUrlEncoded,
"client_id" -> a.client_id.toUrlEncoded,
"client_secret" -> a.client_secret.toUrlEncoded,
"scope" -> a.scope.toUrlEncoded,
"grant_type" -> a.grant_type.toUrlEncoded
).toUrlEncoded
}
}
object RefreshRequest {
implicit val encoded: UrlEncodedWriter[RefreshRequest] = { r =>
IList(
"client_secret" -> r.client_secret.toUrlEncoded,
"refresh_token" -> r.refresh_token.toUrlEncoded,
"client_id" -> r.client_id.toUrlEncoded,
"grant_type" -> r.grant_type.toUrlEncoded
).toUrlEncoded
}
}
4.3.4 Module
That concludes the data and functionality modelling required to implement OAuth2. Recall from the previous chapter that we define mockable components that need to interact with the world as algebras, and we define pure business logic in a module.
We define our dependency algebras, and use context bounds to show that our
responses must have a JsDecoder and our POST payload must have a
UrlEncodedWriter:
package http.client.algebra
trait JsonClient[F[_]] {
def get[A: JsDecoder](
uri: String Refined Url,
headers: IList[HttpHeader]
): F[A]
def postUrlencoded[P: UrlEncodedWriter, A: JsDecoder](
uri: String Refined Url,
payload: P,
headers: IList[HttpHeader]
): F[A]
}
Note that we only define the happy path in the JsonClient API. We will get around to error handling in a later chapter.
package http.oauth2.client.algebra
final case class CodeToken(token: String, redirect_uri: String Refined Url)
trait UserInteraction[F[_]] {
/** returns the URL of the local server */
def start: F[String Refined Url]
/** prompts the user to open this URL */
def open(uri: String Refined Url): F[Unit]
/** recover the code from the callback */
def stop: F[CodeToken]
}
trait LocalClock[F[_]] {
def now: F[Epoch]
}
some convenient data classes
final case class ServerConfig(
auth: String Refined Url,
access: String Refined Url,
refresh: String Refined Url,
scope: String,
clientId: String,
clientSecret: String
)
final case class RefreshToken(token: String)
final case class BearerToken(token: String, expires: Epoch)
and then write an OAuth2 client:
import http.encoding.UrlQueryWriter.ops._
class OAuth2Client[F[_]: Monad](
config: ServerConfig
)(
implicit
user: UserInteraction[F],
client: JsonClient[F],
clock: LocalClock[F]
) {
def authenticate: F[CodeToken] =
for {
callback <- user.start
params = AuthRequest(callback, config.scope, config.clientId)
_ <- user.open(params.toUrlQuery.forUrl(config.auth))
code <- user.stop
} yield code
def access(code: CodeToken): F[(RefreshToken, BearerToken)] =
for {
request <- AccessRequest(code.token,
code.redirect_uri,
config.clientId,
config.clientSecret).pure[F]
msg <- client.postUrlencoded[AccessRequest, AccessResponse](
config.access, request)
time <- clock.now
expires = time + msg.expires_in.seconds
refresh = RefreshToken(msg.refresh_token)
bearer = BearerToken(msg.access_token, expires)
} yield (refresh, bearer)
def bearer(refresh: RefreshToken): F[BearerToken] =
for {
request <- RefreshRequest(config.clientSecret,
refresh.token,
config.clientId).pure[F]
msg <- client.postUrlencoded[RefreshRequest, RefreshResponse](
config.refresh, request)
time <- clock.now
expires = time + msg.expires_in.seconds
bearer = BearerToken(msg.access_token, expires)
} yield bearer
}
4.4 Summary
- data types are defined as products (final case class) and coproducts (sealed abstract class).
- Refined types can enforce constraints on values
- specific functions are defined on implicit class
- polymorphic functions are defined as typeclasses. Functionality is provided via “has a” context bounds, rather than “is a” class hierarchies.
- typeclass instances are implementations of the typeclass.
- @simulacrum.typeclass generates .ops on the companion, providing convenient syntax for types that have a typeclass instance.
- typeclass derivation is compiletime composition of typeclass instances.
- generic instances automatically derive instances for your data types.