Leanpub: Publish Early, Publish Often

Haskell

Scalaz documentation often cites libraries or papers written in the Haskell programming language. In this short chapter, we will learn enough Haskell to be able to understand the source material, and to attend Haskell talks at functional programming conferences.

Data

Haskell has a very clean syntax for ADTs. This is a linked list structure:

  data List a = Nil | Cons a (List a)

List is a type constructor, a is the type parameter, | separates the data constructors, which are: Nil the empty list and a Cons cell. Cons takes two parameters, which are separated by whitespace: no commas and no parameter brackets.

There is no subtyping in Haskell, so there is no such thing as the Nil type or the Cons type: both construct a List.

Roughly translated to Scala:

  sealed abstract class List[A]
  object Nil {
    def apply[A]: List[A] = ...
    def unapply[A](as: List[A]): Option[Unit] = ...
  }
  object Cons {
    def apply[A](head: A, tail: List[A]): List[A] = ...
    def unapply[A](as: List[A]): Option[(A, List[A])] = ...
  }

i.e. the type constructor is like sealed abstract class, and each data constructor is .apply / .unapply. Note that Scala does not perform exhaustive pattern matches on this encoding, which is why Scalaz does not use it.

We can use infix, a nicer definition might use the symbol :. instead of Cons

  data List t = Nil | t :. List t
  infixr 5 :.

where we specify a fixity, which can be infix, infixl or infixr for no, left, and right associativity, respectively. A number from 0 (loose) to 9 (tight) specifies precedence. We can now create a list of integers by typing

  1 :. 2 :. Nil

Haskell already comes with a linked list, which is so fundamental to functional programming that it gets language-level square bracket syntax [a]

  data [] a = [] | a : [a]
  infixr 5 :

and a convenient multi-argument value constructor: [1, 2, 3] instead of 1 : 2 : 3 : [].

Ultimately our ADTs need to hold primitive values. The most common primitive data types are:

Char a unicode character
Text for blocks of unicode text
Int a machine dependent, fixed precision signed integer
Word an unsigned Int, and fixed size Word8 / Word16 / Word32 / Word64
Float / Double IEEE single and double precision numbers
Integer / Natural arbitrary precision signed / non-negative integers
(,) tuples, from 0 (also known as unit) to 62 fields
IO the inspiration for Scalaz’s IO, implemented in the runtime.

with honorary mentions for

  data Bool       = True | False
  data Maybe a    = Nothing | Just a
  data Either a b = Left a  | Right b
  data Ordering   = LT | EQ | GT

Like Scala, Haskell has type aliases: an alias or its expanded form can be used interchangeably. For legacy reasons, String is defined as a linked list of Char

  type String = [Char]

which is very inefficient and we always want to use Text instead.

Finally we can define field names on ADTs using record syntax, which means we contain the data constructors in curly brackets and use double colon type annotations to indicate the types

  -- raw ADT
  data Resource = Human Int String
  data Company  = Company String [Resource]
  
  -- with record syntax
  data Resource = Human
                  { serial    :: Int
                  , humanName :: String
                  }
  data Company  = Company
                  { companyName :: String
                  , employees   :: [Resource]
                  }

Note that the Human data constructor and Resource type do not have the same name. Record syntax generates the equivalent of a field accessor and a copy method.

  -- construct
  adam = Human 0 Adam
  -- field access
  serial adam
  -- copy
  eve = adam { humanName = "Eve" }

A more efficient alternative to single field data definitions is to use a newtype, which has no runtime overhead:

  newtype Alpha = Alpha { underlying :: Double }

equivalent to extends AnyVal but without the caveats.

A limitation of Haskell’s record syntax is that a field name cannot be used in different data types. However, we can workaround this by enabling a LANGUAGE extension, allowing us to use name in both Human and Company:

  {-# LANGUAGE DuplicateRecordFields #-}
  
  data Resource = Human
                  { serial :: Int
                  , name   :: String
                  }
  data Company  = Company
                  { name      :: String
                  , employees :: [Resource]
                  }

There are a lot of language extensions and it is not uncommon to have 20 or more in a small project. Haskell is extremely conservative and new language features are opt-in for a long period of time before they can be accepted into the language standard.

Functions

Although not necessary, it is good practice to explicitly write the type signature of a function: its name followed by its type. For example foldl specialised for a linked list

  foldl :: (b -> a -> b) -> b -> [a] -> b

All functions are curried in Haskell, each parameter is separated by a -> and the final type is the return type. This is equivalent to the following Scala signature:

  def foldLeft[A, B](f: (B, A) => B)(b: B)(as: List[A]): B

Some observations:

there is no keyword
there is no need to declare the types that are introduced
there is no need to name the parameters

which makes for terse code.

Infix functions are defined in parentheses and need a fixity definition:

  (++) :: [a] -> [a] -> [a]
  infixr 5 ++

Regular functions can be called in infix position by surrounding their name with backticks. The following are equivalent:

  a `foo` b
  foo a b

An infix function can be called like a regular function if we keep it surrounded by brackets, and can be curried on either the left or the right, often giving different semantics:

  invert = (1.0 /)
  half   = (/ 2.0)

Functions are typically written with the most general parameter first, to enable maximum reuse of the curried forms.

The definition of a function may use pattern matching, with one line per case. This is where we may name the parameters, using the data constructors to extract parameters much like a Scala case clause:

  fmap :: (a -> b) -> Maybe a -> Maybe b
  fmap f (Just a) = Just (f a)
  fmap _ Nothing  = Nothing

Underscores are a placeholder for ignored parameters and function names can be in infix position:

  (<+>) :: Maybe a -> Maybe a -> Maybe a
  Just a <+> _      = Just a
  Empty  <+> Just a = Just a
  Empty  <+> Empty  = Empty

We can define anonymous lambda functions with a backslash, which looks like the Greek letter 位. The following are equivalent:

  (*)
  (\a1 -> \a2 -> a1 * a2)
  (\a1 a2     -> a1 * a2)

Pattern matched Haskell functions are just syntax sugar for nested lambda functions. Consider a simple function that creates a tuple when given three inputs:

  tuple :: a -> b -> c -> (a, b, c)

The implementation

  tuple a b c = (a, b, c)

desugars into

  tuple = \a -> \b -> \c -> (a, b, c)

In the body of a function we can create local value bindings with let or where clauses. The following are equivalent definitions of map for a linked list (an apostrophe is a valid identifier name):

  map :: (a -> b) -> [a] -> [b]
  
  -- explicit
  map f as = foldr map' [] as
             where map' a bs = f a : bs
  
  -- terser, making use of currying
  map f    = foldr map' []
             where map' a = (f a :)
  
  -- let binding
  map f    = let map' a = (f a :)
             in foldr map' []
  
  -- actual implementation
  map _ []       = []
  map f (x : xs) = f x : map f xs

if / then / else are keywords for conditional statements:

  filter :: (a -> Bool) -> [a] -> [a]
  filter _ [] = []
  filter f (head : tail) = if f head
                           then head : filter f tail
                           else filter f tail

An alternative style is to use case guards

  filter f (head : tail) | f head    = head : filter f tail
                         | otherwise = filter f tail

Pattern matching on any term is with case ... of

  unfoldr :: (a -> Maybe (b, a)) -> a -> [b]
  unfoldr f b = case f b of
                  Just (b', a') -> b' : unfoldr f a'
                  Nothing       -> []

Guards can be used within matches. For example, say we want to special case zeros:

  unfoldrInt :: (a -> Maybe (Int, a)) -> a -> [Int]
  unfoldrInt f b = case f b of
                     Just (i, a') | i == 0    -> unfoldrInt f a'
                                  | otherwise -> i : unfoldrInt f a'
                     Nothing                  -> []

Finally, two functions that are worth noting are ($) and (.)

  -- application operator
  ($) :: (a -> b) -> a -> b
  infixr 0
  
  -- function composition
  (.) :: (b -> c) -> (a -> b) -> a -> c
  infixr 9

Both of these functions are stylistic alternatives to nested parentheses.

The following are equivalent:

  Just (f a)
  Just $ f a

as are

  putStrLn (show (1 + 1))
  putStrLn $ show $ 1 + 1

There is a tendency to prefer function composition with . instead of multiple $

  (putStrLn . show) $ 1 + 1

Typeclasses

To define a typeclass we use the class keyword, followed by the name of the typeclass, its type parameter, then the required members in a where clause.

If there are dependencies between typeclasses, i.e. Applicative requires a Functor to exist, we call this a constraint and use => notation:

  class Functor f where
    (<$>) :: (a -> b) -> f a -> f b
    infixl 4 <$>
  
  class Functor f => Applicative f where
    pure  :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b
    infixl 4 <*>
  
  class Applicative f => Monad f where
    (=<<) :: (a -> f b) -> f a -> f b
    infixr 1 =<<

We provide an implementation of a typeclass with the instance keyword. If we wish to repeat the type signature on instance functions, useful for clarity, we must enable the InstanceSigs language extension.

  {-# LANGUAGE InstanceSigs #-}
  
  data List a = Nil | a :. List a
  
  -- defined elsewhere
  (++) :: List a -> List a -> List a
  map :: (a -> b) -> List a -> List b
  flatMap :: (a -> List b) -> List a -> List b
  foldLeft :: (b -> a -> b) -> b -> List a -> b
  
  instance Functor List where
    (<$>) :: (a -> b) -> List a -> List b
    f <$> as = map f as
  
  instance Applicative List where
    pure a = a :. Nil
  
    Nil <*> _  = Nil
    fs  <*> as = foldLeft (++) Nil $ (<$> as) <$> fs
  
  instance Monad List where
    f =<< list = flatMap f list

If we have a typeclass constraint in a function, we use the same => notation. For example we can define something similar to Scalaz’s Apply.apply2

  apply2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
  apply2 f fa fb = f <$> fa <*> fb

Since we have introduced Monad, it is a good time to introduce do notation, which was the inspiration for Scala’s for comprehensions:

  do
    a <- f
    b <- g
    c <- h
    pure (a, b, c)

desugars to

  f >>= \a ->
    g >>= \b ->
      h >>= \c ->
        pure (a, b, c)

where >>= is =<< with parameters flipped

  (>>=) :: Monad f => f a -> (a -> f b) -> f b
  (>>=) = flip (=<<)
  infixl 1 >>=
  
  -- from the stdlib
  flip :: (a -> b -> c) -> b -> a -> c

Unlike Scala, we do not need to bind unit values, or provide a yield if we are returning (). For example

  for {
    _ <- putStr("hello")
    _ <- putStr(" world")
  } yield ()

translates to

  do putStr "hello"
     putStr " world"

Non-monadic values can be bound with the let keyword:

  nameReturn :: IO String
  nameReturn = do putStr "What is your first name? "
                  first <- getLine
                  putStr "And your last name? "
                  last  <- getLine
                  let full = first ++ " " ++ last
                  putStrLn ("Pleased to meet you, " ++ full ++ "!")
                  pure full

Finally, Haskell has typeclass derivation with the deriving keyword, the inspiration for @scalaz.deriving. Defining the derivation rules is an advanced topic, but it is easy to derive a typeclass for an ADT:

  data List a = Nil | a :. List a
                deriving (Eq, Ord)

Algebras

In Scala, typeclasses and algebras are both defined as a trait interface. Typeclasses are injected by the implicit feature and algebras are passed as explicit parameters. There is no language-level support in Haskell for algebras: they are just data!

Consider the simple Console algebra from the introduction. We can rewrite it into Haskell as a record of functions:

  data Console m = Console
                    { println :: Text -> m ()
                    , readln  :: m Text
                    }

with business logic using a Monad constraint

  echo :: (Monad m) => Console m -> m ()
  echo c = do line <- readln c
              println c line

A production implementation of Console would likely have type Console IO. The Scalaz liftIO function is inspired by a Haskell function of the same name and can lift Console IO into any Advanced Monad stack.

Two additional language extensions make the business logic even cleaner. For example, RecordWildCards allows us to import all the fields of a data type by using {..}:

  echo :: (Monad m) => Console m -> m ()
  echo Console{..} = do line <- readln
                        println line

NamedFieldPuns requires each imported field to be listed explicitly, which is more boilerplate but makes the code easier to read:

  echo :: (Monad m) => Console m -> m ()
  echo Console{readln, println} = do line <- readln
                                     println line

Whereas in Scala this encoding may be called Finally Tagless, in Haskell it is known as MTL style. Without going into details, some Scala developers didn’t understand a research paper about the performance benefits of Generalised ADTs in Haskell.

An alternative to MTL style are Extensible Effects, also known as Free Monad style.

Modules

Haskell source code is arranged into hierarchical modules with the restriction that all contents of a module must live in a single file. The top of a file declares the module name

  module Silly.Tree where

A convention is to use directories on disk to organise the code, so this file would go into Silly/Tree.hs.

By default all symbols in the file are exported but we can choose to export specific members, for example the Tree type and data constructors, and a fringe function, omitting sapling:

  module Silly.Tree (Tree(..), fringe) where
  
  data Tree a = Leaf a | Branch (Tree a) (Tree a)
  
  fringe :: Tree a -> [a]
  fringe (Leaf x)            = [x]
  fringe (Branch left right) = fringe left ++ fringe right
  
  sapling :: Tree String
  sapling = Leaf ""

Interestingly, we can export symbols that are imported into the module, allowing library authors to package up their entire API into a single module, regardless of how it is implemented.

In a different file we can import all the exported members from Silly.Tree

  import Silly.Tree

which is roughly equivalent to Scala’s import silly.tree._ syntax. If we want to restrict the symbols that we import we can provide an explicit list in parentheses after the import

  import Silly.Tree (Tree, fringe)

Here we only import the Tree type constructor (not the data constructors) and the fringe function. If we want to import all the data constructors (and pattern matchers) we can use Tree(..). If we only want to import the Branch constructor we can list it explicitly:

  import Silly.Tree (Tree(Branch), fringe)

If we have a name collision on a symbol we can use a qualified import, with an optional list of symbols to import

  import qualified Silly.Tree (fringe)

and now to call the fringe function we have to type Silly.Tree.fringe instead of just fringe. We can change the name of the module when importing it

  import qualified Silly.Tree as T

The fringe function is now accessed by T.fringe.

Alternatively, rather than select what we want to import, we can choose what not to import

  import Silly.Tree hiding (fringe)

By default the Prelude module is implicitly imported but if we add an explicit import from the Prelude module, only our version is used. We can use this technique to hide unsafe legacy functions

  import Prelude hiding ((!!), head)

or use a custom prelude and disable the default prelude with the NoImplicitPrelude language extension.

Evaluation

Haskell compiles to native code, there is no virtual machine, but there is a garbage collector. A fundamental aspect of the runtime is that all parameters are lazily evaluated by default. Haskell treats all terms as a promise to provide a value when needed, called a thunk. Thunks get reduced only as much as necessary to proceed, no more.

A huge advantage of lazy evaluation is that it is much harder to trigger a stack overflow! A disadvantage is that there is an overhead compared to strict evaluation, which is why Haskell allows us to opt in to strict evaluation on a per parameter basis.

Haskell is also nuanced about what strict evaluation means: a term is said to be in weak head normal-form (WHNF) if the outermost code blocks cannot be reduced further, and normal form if the term is fully evaluated. Scala’s default evaluation strategy roughly corresponds to normal form.

For example, these terms are normal form:

  42
  (2, "foo")
  \x -> x + 1

whereas these are not in normal form (they can be reduced further):

  1 + 2            -- reduces to 3
  (\x -> x + 1) 2  -- reduces to 3
  "foo" ++ "bar"   -- reduces to "foobar"
  (1 + 1, "foo")   -- reduces to (2, "foo")

The following terms are in WHNF because the outer code cannot be reduced further (even though the inner parts can be):

  (1 + 1, "foo")
  \x -> 2 + 2
  'f' : ("oo" ++ "bar")

and the following are not in WHNF

  1 + 1              -- reduces to 2
  (\x y -> x + y) 2  -- reduces to \y -> 2 + y
  "foo" ++ "bar"     -- reduces to "foobar"

The default evaluation strategy is to perform no reductions when passing a term as a parameter. Language level support allows us to request WHNF for any term with ($!)

  -- evaluates `a` to WHNF, then calls the function with that value
  ($!) :: (a -> b) -> a -> b
  infixr 0

We can use an exclamation mark ! on data parameters

  data StrictList t = StrictNil | !t :. !(StrictList t)
  
  data Employee = Employee
                    { name :: !Text
                    , age :: !Int
                    }

The StrictData language extension enables strict parameters for all data in the module.

Another extension, BangPatterns, allows ! to be used on the arguments of functions. The Strict language extension makes all functions and data parameters in the module strict by default.

Going to the extreme we can use ($!!) and the NFData typeclass for normal form evaluation:

  class NFData a where
    rnf :: a -> ()
  
  ($!!) :: (NFData a) => (a -> b) -> a -> b

which is subject to the availability of an NFData instance.

The cost of strictness is that Haskell behaves like any other strict language and may perform unnecessary work. Opting in to strictness must therefore be done with great care, and only for measured performance improvements. If in doubt, be lazy and stick with the defaults.

Next Steps

Haskell is a faster, safer and simpler language than Scala and has proven itself in industry. Consider taking the data61 course on functional programming, and ask questions in the #qfpl chat room on freenode.net.

Some additional learning materials are:

Programming in Haskell to learn Haskell from first principles.
Parallel and Concurrent Programming in Haskell and What I Wish I Knew When Learning Haskell for intermediate wisdom.
Glasgow Haskell Compiler User Guide and HaskellWiki for the cold hard facts.
Eta, i.e. Haskell for the JVM.

If you enjoy using Haskell and understand the value that it would bring to your business, then tell your managers! That way, the small percentage of managers who commission Haskell projects will be able to attract functional programming talent from the many teams who do not, and everybody will be happy.

Up next

Third Party Licenses