A Not Very Gentle Introduction to Haskell

This short note of Haskell is based on A Gentle Introduction to Haskell Version 98, which contains most essential syntax and idioms of Haskell, if you have experience with other functional programming languages, this document might appropriate for you.

1. Types

To declare type for expression:

5 :: Integer
'a' :: Char
inc :: Integer -> Integer
[1,2,3] :: [Integer]
('b', 4) :: (Char, Integer)

Functions can defined by a series of equations. For example:

inc :: Integer -> Integer
inc n = n + 1

Note: if using GHCi, should need a let to define it, like

let inc n = n + 1

List

: is the infix operator to add first argument to the second argument (a list), and : is right associate, so we can write

1 : 2 : []

Ploymorphic Type

Using pattern matching.

length :: [a] -> Integer
length [] = 0
length (x : xs) = 1 + length xs

head :: [a] -> a
head (x : xs) = x

tail :: [a] -> [a]
tail (x : xs) = xs

User-Defined Type

data Bool = False | True

data Color = Red | Green | Blue | Indigo | Voilet

Bool is a nullary type constructor, and False and Ture are (also nullary) data constructor. Bool and Color are called (disjoint) union type or sum type.

data Point a = Pt a a
Pt 2.0 3.0 :: Point Float
Pt 'a' 'b' :: Point Char
Pt True False :: Point Bool

Point is called a tuple type, since it is just a catesian product of other types. It’s also a unary type constructor, since from the type t it constructs a new type Point t.

[] is also a type constructor, given any type t we can derive a new type [t]. -> is also a type constructor, given two types t and u, t->u is the type of functions mapping elements of type t to type u.

Different between applying a data constructor to yield a value, and applying a type constructor to yield a type: former happens in run-time, and latter happens at compile-time which is the part of type system’s process to ensuring type safty.

Type constructor and data constructor are in separate namespaces, so we can write

data Point a = Point a a

Recursive Types

Binary tree:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

Tree is a type constructor, where Branch and Leaf are data constructors. So they have following types:

Branch :: Tree a -> Tree a -> Tree a
Leaf   :: a -> Tree a

Define a function fringe that returns a list of all elements in the leaves of a tree from left to right:

fringe :: Tree a -> [a]
fringe (Leaf x) = [x]
fringe (Branch leaf right) = fringe left ++ fringe right

++ is the infix operator that concatenates two lists to one.

Type Synonyms

Type synonyms do not define new types, it just gives new names for existing types.

type String = [Char]
type Person = (Name, Address)
type Name = String
type Address = None | Addr String
type AssocList a b = [(a, b)]

Built-in Types

Built-in types are nothing special with with user-defined types. For example, Char type is just an enumerated type consisting of a large number of nullary constructors, smilarily, with Int and Integer.

Differents between list and tuple:

For $(e_1, e_2, \cdots e_n)$, $n \ge 2$, if $t_i$ is the type of $e_i$, then the type of the tuple is $(t_1, t_2, \cdots t_n)$.

For $[e_1, e_2, \cdots e_n]$, $n \ge 0$, each $e_i$ must have the same type $t$, and the type of the list is $[t]$.

List Comprehensions

[f x | x <- xs]

This expression can be read as “the list of all f x such that x is drawn from xs“. The phrase x <- xs is called a generator.

[(x, y) | x <- xs, y <- ys]

This list comprehension forms the cartesian product of the two list xs and ys.

Example of quick sort:

quicksort [] = []
quicksort (x : xs) = quicksort [y | y <- xs, y < x] ++ [x] ++ quicksort [y | y <- xs, y >= x]

Arithmetic Sequences

[1 .. 10]    -- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 3 .. 10] -- [1, 3, 5, 7, 9]
[1, 3 ..]    -- [1, 3, 5, 7, ... ] infinite sequence

String

String type is a predefined type synonym. "hello" is actually shorthand of the list of characters ['h', 'e', 'l', 'l', 'o'].

type String = [Char]

2. Functions

Define a function that add two arguments:

add :: Integer -> Integer -> Integer
add x y = x + y

It’s a curied function. Function application associates to the left, -> associates to the right. We can define inc in a different way:

inc = add + 1

This is an example of the partial application of a curried function.

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs

map (add 1) [1, 2, 3] -- [2, 3, 4]

Function application has higher precedence than any infix operator.

Lambda

inc = \x -> x + 1
add = \x -> \y -> x + y
add = \x y -> x + y

Infix Operator

Infix operator are just functions.

(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

Haskell has no prefix operator, with the exception of minus -, which is both infix and prefix.

Another example, function composition:

(.) :: (b -> c) -> (a -> b) -> (a -> c)
f . g = \x -> f (g x)

Sections

In Haskell, partial application of an infix operator is called a section. The parentheses are mandatory.

(x+) -- \y -> x + y
(+y) -- \x -> x + y
(+)  -- \x y -> x + y

map (+) [1, 2, 3] -- returns a list of functions

We can coerce an infix operator into a functional value.

inc = (+ 1)
add = (+)

And we can also do it inversely.

x `add` y

Fixity Declarations

A fixity declaration can be given for any infix operator or constructor, with a precedence level from 0 to 9.

infixr 5 ++
infixr 9 .

These are all right-associativity, and we can specify left-associativity by infixl, and non-associativity by infix.

Functions are Non-Strict

Abstractly, we denote the value of a non-terminating experssion as $\bot$. Expressions that result in some kind of run-time error, such as 1/0 also have this value. Such an error is not recoverable, while some error such as IO error are recoverable.

A function f is strict if when applied to a non-terminating expression, it also fails to terminate. In other word, f is strict if and only if the value of f bot is $\bot$.

Haskell are non-strict, or called lazily, or by need. For example

const1 x = 1

If we apply bot to const1, it does not need to know the value when applying, rather than, it need the value when the function really need it.

Haskell using definitions rather than assignments found in traditional languages.

Infinite Data Structures

ones = 1 : ones
numsFrom n = n : numsFrom (n + 1)
squares = map (^2) (numsFrom 0)

take 5 squares -- [0, 1, 4, 9, 16]

ones above is an example of circular list, another example is fib

fib = 1 : 1 : [ a + b | (a, b) <- zip fib (tail fib)]

where zip is a prelude function that returns the pairwise interleaving of two list arguments:

zip (x : xs) (y : ys) = (x, y) : zip xs ys
zip xs ys = []

Error Function

Haskell has a built-in function called error whose type is String->a. There is a value shared by all types: $\bot$. Semantically it is exactly what value that returned by error, since all error have value $\bot$.

3. Case Expressions and Pattern Matching

We have already seen examples of data constructor patterns, nesting of patterns are also permitted. Patterns are not first-class.

Formal parameters are also patterns, they never fail to match a value. Patterns like formal parameters that never fail to match are said to be irrefutable, in contrast to refutable patterns which may fail to match. There are three kinds of irrefutable pattern, except formal parameters, they are

As-patterns

f (x:xs) = x : x : xs
f s@(x:xs) = x : s

Wild-card

head(x:_) = x
tail(_:xs) = xs

Pattern-Matching Semantics

Pattern matching can either fail, succeed, or diverge. A successful match binds the formal parameters. Divergence occurs when a value needed by the pattern contains an error $\bot$. The matching process itself occurs top-down, left-to-right. Failure of a pattern anywhere in one equation results in failure of the whole equation, and the next equation is then tried. If all equations are failed, the value of the function application is $\bot$, and results in a run-time error.

Top-level patterns may also have a boolean guard:

sign x | x > 0  =  1
       | x == 0 =  0
       | x < 0  = -1

Examples

The order of patterns matter. Consider two versions of take

take 0 _ = []
take _ [] = []
take n (x:xs) = x : take (n-1)  xs

-- --

take1 _ [] = []
take1 0 _ = []
take1 n (x:xs) = x : take (n-1)  xs

When we apply with $\bot$,

take  0 bot -- []
take1 0 bot -- bot

take  bot [] -- bot
take1 bot [] -- []

Case Expressions

A function definition of the form

f p11 ... p1k = e1
...
f pn1 ... pnk = en

where each pij is a pattern, is semantically equivalent to

f x1 x2 ... xk = case (x1, ... xk) of (p11, ... p1k) -> e1
                                      ...
                                      (pn1, ... pnk) -> en

For example, the definition of take is equivalent to

take m ys = case (m, ys) of
              (0, _) -> []
              (_, []) -> []
              (n, x:xs) -> x : take (n-1) xs

Note the type of right hand they must have the same type.

The conditional expression is

if e1 then e2 else e3

which is really short-hand for:

case e1 of True  -> e2
           False -> e3

if-then-else can be viewd as a function with type Bool -> a -> a -> a.

Lazy Patterns

Haskell has lazy pattern, with the form ~pat. Lazy patterns are irrefutable, they will always succeeds, the pattern is later used on the right hand side.

It’s useful when dealing with infinite data structures. For a client example, we may write as follow

client init ~(resp:resps) = init : client (next resp) resps

For another example, consider the fib we wrote before,

fib = 1 : 1 : [ a + b | (a, b) <- zip fib (tail fib)]

We can rewrite it using an as-pattern

fib@(1:tfib) = 1 : 1 : [ a + b | (a, b) <- zip fib tfib]

Lexical Scoping and Nested Forms

Let

let y = a * b
    f x = (x + y) / y
in f c + f d

The set of bindings created by a let expression is mutally recursive, and pattern bindings are treated as lazy patterns(carry an implicit ~).

Where Clauses

where can scope bindings over several guarded equations.

f x y | y > z = ...
      | y == z = ...
      | y < z = ...
      where z = x * x

This cannot be done with let expression, which only scopes over the expression which it encloses. A where clause is only allowed at the top level of a set of equations or case expression.

Remember: let is an expression, while where is not, it’s the part of the syntax of function declaration and case expressions.

Layout

Haskell uses a two-dimensional syntax called layout that essentially relies on declarations being “lined up in columns”.

First, the next character following any of the keywords where, let, or of is what determines the starting column for the declarations in the where, let or case expression being written.

Second, make sure the starting column is further to the right than the starting column assocaited with the immediately surrounding caluse.

Layout is actually shorthand for an explicit grouping mechanism. The let example is equivalent to

let { y = a * b
    ; f x = (x + y) / y
    }
in f c + f d

4. Type Classes and Overloading

There are two kinds of polymorphism, the first one is called parametric polymorphism which we already have seen before, the second one is called ad hoc polymorphism, better known as overloading. In Haskell, type classes provide a structured way to control ad hoc polymorphism.

Type classes allow us to declare which types are instances of which class, and to provide definitions of the overloaded operations associated with a class. For example, we define a type class containing an equality operator:

class Eq a where
  (==) :: a -> a -> Bool

This declaration can be read as “a type a is an instance of the class Eq if there is an operation ==, of the appropriate type, defined on it”. Eq a is not a type expression, but rather it expresses a constraint on a type, and is called context. Contexts are placed at the front of type expressions.

(==) :: (Eq a) => a -> a -> Bool

This should be read, for every type a that is an instance of the class Eq, == has type a->a->Bool.

Instance Declaration

We can use instance declaration to specify which type are instance of which class. For example

instance Eq Integer where
  x == y = x `integerEq` y

The definition of == is called a method. It is saying that the type Integer is an instance of the class Eq, and here is the definition of the method corresponding to the operation ==.

Recursive types such as Tree can also be handled.

instance (Eq a) => Eq (Tree a) where
  Leaf a == Leaf b                 = a == b
  (Branch l1 r1) == (Branch l2 r2) = (l1 == l2) && (r1 == r2)
  _ == _                           = False

Note that context Eq a is necessary.

In Haskell, there are many useful type classes in Prelude, for example, the Eq class

class  Eq a  where
   (==), (/=) :: a -> a -> Bool
   -- Minimal complete definition:
   -- (==) or (/=)
   x /= y     =  not (x == y)

The above example also demonstrates default method. If a method is omitted in an instance declaration, then the default one defined in the class will be use instaed.

Class Extension

Haskell also supports class extension. For example, Ord may inherits all of operations in Eq, but in addition has a set of comparison operations and minimum and maximum functions:

class (Eq a) => Ord a where
  (<), (<=), (>=), (>) :: a -> a -> Bool
  max, min             :: a -> a -> Bool

We say that Eq is the superclass of Ord, and Ord is the subclass of Eq, any instance of Ord must also be the instance of Eq. An exmaple of using Ord is the quicksort,

quicksort :: (Ord a) => [a] -> [a]

Multiple Inheritance

Haskell also permits multiple inheritance.

class (Eq a, Show a) => C a where ...

Class methods are treated as top level declarations, they share same namespace as ordinary varibales, a name cannot be used to denote both a class method and a variable or method in different class.

Functor

Now consider a Functor class

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Here f is a type variable, and fmap function generalizs the map function. An instance of Functor for type Tree would be:

instance Function Tree where
  fmap f (Leaf x)       = Leaf (f x)
  fmap f (Branch t1 t2) = Branch (f t1) (f t2)

kind

Haskell has a second type system which ensures the correctness of types, each type has an associated kind. Type expressions are classified into different kinds which take one of two possible forms:

  • The symbol $\star$ represents the kind of type associated with concrete data objects. That is, if the value v has type t, the kind of v must be $\star$.

  • If $k_1$ and $k_2$ are kinds, then $k_1 \rightarrow k_2$ is the kind of types that take a type of kind $k_1$ and return a type of kind $k_2$.

The type constructor Tree has kind $\star \rightarrow \star$, and the type Tree Int has kind $\star$. Members of Functor class must all have the kind $\star \rightarrow \star$.

Other useful document: Haskell Classes and Types

5. Types, Again

We will now discuss some advanced aspects of types.

Newtype

newtype declaration creates a new type from an existing one, but with a separate identity in the type system. The newtype declaration uses the same syntax as a data. For example,

newtype Natural = MakeNatural Integer

It creates an new type Natural, whose only constructor contains a single Integer.

toNatural :: Integer -> Natural
toNatural x | x < 0 = error "Can't create negative naturals"
            | otherwise = MakeNatural x

fromNatural :: Natural -> Integer
fromNatural (MakeNatural i) = i

The following instance declaration admits Natural to the Num class:

instance Num Natural where
  fromInteger = toNatural
  x + y       = toNatural (fromNatural x + fromNatural y)
  x - y       = let r = fromNatural x - fromNatural y in
                  if r < 0 then error "Unnatural subtraction"
                           else toNatural r
  x * y       = toNatural (fromNatural x * fromNatural y)

Field Labels

The fields of data type can be accessed either by position or by field label. For exmaple to define a Point:

data Point = Pt { pointx, pointy :: Float}

It also defines two field names, pointx and pointy, which we can used as selector functions to extract a component from a structure.

pointx :: Point -> Float
pointy :: Point -> Float

-- a example using these selectors

absPoint :: Point -> Float
absPoint p = sqrt (pointx p * pointx p + pointy p * pointy p)

Field labels can also be used to construct new values or used in pattern matching:

Pt {pointx = 1, pointy = 2}

absPoint (Pt {pointx = x, pointy = y}) = sqrt(x*x + y*y)

We can also update some filed, by create a value and replace the corresponding field. If p is a Point, then p {pointx=2} creates a new Point with same pointy but different pointx.

Field labels share the top level namespace with ordinary variables and class methods, so a field name cannot be used in more than one data type. But, within a data type, the same field name can be used more than once, as long as they have same type in all cases. For exmaple

data T = C1 {f :: Int, g :: Float}
       | C2 {f :: Int, h :: Bool}

Strict Data Constructors

In Haskell, each field of a lazy data object is wrapped in a structure called thunk that encapsulates the computation defining the field value. Thunks can contain errors $\bot$ do not affect other elements of a data structure.

But we can use a strictness flat ! that allows specific fields of a constructor to be evaluated immediately, selectively suppressing laziness.

data RealFloat a => Complex a = !a :+ !a

Note :+ is infix constructor.

6. Input and Output

Haskell’s I/O system is built on monad, which we will discuss deeply later.

In imperative languages, programs proceed via actions which examine and modify the current state of the world. In Haskell, actions are defined rather than invoked whithin the expression languages of Haskell. Evaluating the definition of actions doesn’t actually cause the action to happen. Actions are either atomic.

Basic I/O Operations

Every I/O action returns a value, which is tagged with IO type. For example, the type of function getChar and function putChar are

getChar :: IO Char
putChar :: Char -> IO ()

Actions are sequenced using an operator >>= (or bind), here we will firstly use its syntactic sugar, the do notation. The do notation can be trivially expanded to >>=.

The do notation introduces a sequence of statements which are executed in order. A statement is either (1) an action, (2) a pattern bound to the result of an action using <-, or (3) a set of local definitions introduced by let. The do notation needs proper layout.

For example of a simple read and print char program:

main :: IO ()
main = do c <- getChar
          putChar c

main is defined to be the entry point of a Haskell program, and must have an IO type, usually IO (). The variable defined by <- are only in scope in the following statements.

To return some value in the do notation, we need return function:

return :: a -> IO a

Consider a getLine function:

getLine :: IO String
getLine = do c <- getChar
             if c == '\n'
                  then return ""
                  else do l <- getLine
                          return (c:l)

The return function admits an ordinary value such as a boolean to the realm of I/O actions. But we can not do it inversely.

Programming with Actions

I/O actions are just ordinary Haskell values.

todoList :: [IO ()]
todoList = [putChar 'a',
            do putChar 'b'
               putChar 'c',
            do c <- getChar
               putChar c]

This list doesn’t actually invoke any actions, it just holds them. To join these actions into a single actions, we can use a sequence_ function:

sequence_ :: [IO ()] -> IO ()
sequence_ []     = return ()
sequence_ (a:as) = do a sequence_ as

This can be simplified by do x;y is expanded to x >> y. We can use foldr function to define a better sequence_:

sequence_ :: [IO ()] -> IO ()
sequence_ = foldr (>>) (return ())

The sequence_ function can be used to construct putStr function:

putStr :: String -> IO ()
putStr s = sequence_ (map putChar s)

Exception

Errors are encoded using a special data type, IOError, this is an abstract type: no constructors for IOError are available for users. We may have some functions to determine the actual type of error:

isEOFError :: IOError -> Bool

An exception handler has type IOError -> IO a, the catch function associates an exception handler with an action or set of actions:

catch :: IO a -> (IOError -> IO a) -> IO a

The arguments of catch are an action and a handler. If the action succeeds, its result is returned without invoking handler, otherwise, it is passed to the handler as a value of type IOError and the handler is then invoked. For example, a newline is returned when an error is encountered:

getChar' :: IO Char
getChar' = getChar `catch` (\e -> return '\n')

Another example is dealing with end-of-file error:

getChar' :: IO Char
getChar' = getChar `catch` eofHandler where
    eofHandler e = if isEOFError e then return '\n' else ioError e

The ioError function used here to throws an exception on to the next exception handler. The type of ioError is

ioError :: IOError -> IO a

Nested calls to catch care permitted, and produce nested exception handlers:

getLine' :: IO String
getLine' = catch getLine'' (\e -> return ("Error: " + show e))
    where getLine'' = do c <- getChar'
                      if c == '\n' then return ""
                                   else do l <- getLine'
                                           return (c:l)

Files, Channels, and Handles

Opening a file creates a handle (of type Handle) for use in I/O transactions. Closing the handle close the associated file:

type FilePath = String
openFile :: FilePath -> IOMode -> IO Handle
hClose :: Handle -> IO ()
data IOMode = ReadMode | WriteMpde | AppendMode | ReadWriteMode

Handles can aslo be associated with channels: communication ports not directly attached to files, such as stdin, stdout and stderr. Character level I/O operations include hGetChar and hPutChar which takes a handle as argument.

getChar = hGetChar stdin

Haskell also allows getting entrie contents of file or channel to be returned as a string, lazily.

getContents :: Handle -> IO String

Finanly, an example of copying one file to anther:

main = do fromHandle <- getAndOpenFile "Copy from: " ReadMode
          toHandle   <- getAndOpenFile "Copy to: " WriteMode
          contents   <- hGetContents fromHandle
          hPutStr toHandle contents
          hClose toHandle
          putStr "Done."

getAndOpenFile :: String -> IOMode -> IO Handle
getAndOpenFile prompt mode =
    do putStr prompt
       name <- getLine
       catch (openFile name mode)
             (\_ -> do putStrLn ("Cannot open " ++ name ++ "\n")
                       getAndOpenFile prompt mode)

7. Standard Haskell Classes

Haskell Standard Classes

Equality and Ordered Classes

The classes Eq and Ord are defined in Prelude. In particular, note the compare method:

data Ordering = EQ | LT | GT
compare :: Ord a => a -> a -> Ordering

Enumeration Classes

Class Enum has a set of operations that underlie the syntactic sugar of arithmetic sequences we have seen before. For example, [1,3..] stands for enumFromThen 1 3. Similarly for Char and ['a' .. 'z'].

Read and Show Classes

The instances of class Show are types that can be converted to strings. The instances of class Read provides operations for parsing character string to obtain the values they may represent.

show

The type of function show is:

show :: (Show a) => a -> String

Now consider a function to show a binary tree:

showTree :: (Show a) => Tree a -> String
showTree (Leaf x)     = show x
showTree (Branch l r) = "<" ++ showTree l ++ "|" ++ showTree r ++ ">"

Since ++ has linear time complexity, so the showTree function is potentially quadratic in size of the tree. To restore the linear complexity, the function shows is provided:

shows :: (Show a) => a -> String -> String

shows takes two arguments, and will concatenate the string representation of first argument to the second one. Acutally, the show function is defined as

show x = shows x ""

We can use shows to define a more efficient version of showTree:

showsTree :: (Show a) => Tree a -> String -> String
showsTree (Leaf x) s     = shows x s
showsTree (Branch l r) s = '<' : showsTree l ('|' : showsTree r ('>' : s))

This solves the efficiency problem, but the presentation of function can be improved. First, create a type synonym:

type ShowS = String -> String

Then we can avoid carrying accumulators and avoid parentheses by using functional composition. Recall that function composition: f . g = \x -> f (g x).

showsTree :: (Show a) => Tree a -> ShowS
showsTree (Leaf x)     = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)

read

We can also trun string to Tree structure. The basic idea is a paser for a type a, which is a function that takes a string and returns a list of (a, String) pairs. The Prelude provides a type synonym for such functions:

type ReadS a = String -> [(a, String)]

The standard function reads is a parser for any instance of Read:

reads :: (Read a) => ReadS a

We can use this function to define a parsing function for the string we generate by showsTree. List comprehensions give us a convenient idiom to do this:

readsTree :: (Read a) => ReadS (Tree a)
readsTree ('<':s) = [(Branch l r, u) | (l, '|':t) <- readsTree s,
                                       (r, '>':u) <- readsTree t ]
readsTree s       = [(Leaf x, t) | (x, t)]

We can verify using following evaluations:

-- should be [(Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3))), ""]
readsTree "<1|<2|3>>" :: [(Tree Int, String)] 

-- should be []
readsTree "<1|2"

There are some problems for our readsTree, the first one that the parser can not deal with whitespace; the other is that the way we parse our punctuation symbols is quite different from the way we parse leaf values and subtrees, the lack of uniformity making the function harder to read. We can address these problems by using lexical analyzer provided by the Prelude:

lex :: ReadS String

lex returns a list containing a pair of strings.

readsTree :: (Read a) => ReadS (Tree a)
readsTree s = [(Branch l r, x) | ("<", t) <- lex s
                                 (l,   u) <- readsTree t,
                                 ("|", v) <- lex u,
                                 (r,   w) <- readsTree w,
                                 (">", x) <- lex w ]
              ++
              [(Leaf x, t) | (x, t) <- reads s]

The Show and Read instances for Tree are:

instance Show a => Show (Tree a) where
    showsPrec _ x = showsTree x

instance Read a => Read (Tree a) where
    readsPrec _ x = readsTree x

The showsPrec and readsPrec methods are parameterized versions of shows and reads. Alternatively, the Show instance could be defined in terms of showTree:

instance Show a => Show (Tree a) where
    show t = showTree t

We can test Read and Show instances by applying (read . show) which should be identity to some values, where read is a specialization of reads:

read :: (Read a) => String -> a

Derived Instances

For the previous == method for Tree type, actually, the Eq instance can be derived automatically from the data declaration if we specify:

data Tree a = Leaf a
            | Branch (Tree a) (Tree a)
            deriving Eq

Instances of Ord, Enum, Ix, Read, and Show can also be generate by deriving clasue.

The derived Ord instance is:

instance (Ord a) => Ord (Tree a) where
    (Leaf _) <= (Branch _)         = True
    (Leaf x) <= (Leaf y)           = x <= y
    (Branch _) <= (Leaf _)         = False
    (Branch l r) <= (Branch l' r') = l == l && r <= r' || l <= l'

This is lexicographic order: constructors are ordered by the order of their apperance in the data declaration, and the arguments are compared from left to right.

Another example of deriving Enum, Show and Read instance:

data Day = Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
         deriving (Enum, Show, Read)
--
[Wednesday .. Friday]  -- [Wednesday, Thursday, Friday]
[Monday, Wednesday ..] -- [Monday, Wednesday, Friday]
-- 
show [Wednesday .. Friday]         -- "[Wednesday, Thursday, Friday]"
read "Monday" :: Day               -- Monday
read "[Wednesday,Monday]" :: [Day] -- [Wednesday, Monday]

In practice, Eq and Ord instance are almost always derived.

8. Monads

A monad is a monoid in the category of endofunctors, what’s the problem?
— Philip Wadler

Monadic Classes

A monad is constructed on top of a polymorphic type such as IO. The monad itself is defined by instance declarations associating the type with the some or all of the monadic classes, Functor, Monad, and MonadPlus. In addition to IO, two other types in Prelude are members of monadic classes: list [] and Maybe.

Mathematically, monads are governed by set of laws that should hold for monadic operations.

Functor

The Functor class, defines a single operation fmap. It applies an operation to the objects inside a container, return a container with the same shape. These laws apply to fmap of Functor:

fmap id = id
fmap (f . g) = fmap f . fmap g

Monad

The Monad class defines two basic operations, >>= (bind) and return:

infixl 1 >>, >>=
class Monad m where
    (>>=)   :: m a -> (a -> m b) -> m b
    (>>)    :: m a -> m b -> m b
    return  :: a -> m a
    fail    :: String -> m a
    m >> k  =  m >>= \_ -> k

The bind operation >> and >>= combine two monadic values, and return operation injects a value into a monad (container).

ma >>= \v -> mb combines a monadic value ma containing values of type a, and a function which takes a argument v of type a, returning the monadic value mb. The result is to combine ma and mb into a monadic value containing b. The >> function is used when the function does not need the value produced by the first monadic operator (v).

For example, in the IO monad, x >>= y performs two actions sequentially, passing the result of the first into the second. The do syntax provides a shorthand for chains of monadic operations.

do e1 ; e2     = e1 >> e2
do p <- e1; e2 = e1 >>= \p -> e2

When the pattern is this second form of do is refutable, pattern match failure calls the fail operation. This may raise an error or return a zero (as in the list monad). Thus the more complex translation is:

do p <- e1 ; e2 = e1 >>= (\v -> case v of p -> e2; _ -> fail "s")

Where "s" is a string identifying the location of do statement for possible use in an error message. For example, in the IO monad, an action such as 'a' <- getChar will call fail if the input character is not 'a'.

The laws which govern >>= and return are:

return a >>= k          = k a
m >>= return            = m
xs >>= return . f       = fmap f xs
m >>= (\x -> k x >>= h) = (m >>= k) >>= h

MonadPlus

The class MonadPlus is used for monads that have a zero element and a plus operations:

class (Monad m) => MonadPlus m where
    mzero :: m a
    mplus :: m a -> m a -> m a

The zero element obeys:

m >>= \x -> mzero = mzero
mzero >>= m       = mzero

For example, in list the zero value is []. The IO Monad has no zero element and is not member of MonadPlus.

The mplus operation follows:

m `mplus` mzero = m
mzero `mplus` m = m

Built-in Monads

We have already seen the IO Monad, then we will examine two other built-in monads.

List

For lists, monadic binding involves joining together a set of calculations for each value in the list. The signature >>= becomes:

(>>=) :: [a] -> (a -> [b]) -> [b]

Given a list of a and a function that maps a value of type a to a list of type b, binding applies this function to each values of type a and return all of the values with type b into a list. The return function creates a singleton list.

List comprehensions can be easily be expressed using these monadic operations. For examples, these three expressions are all same thing:

[(x, y) | x <- [1, 2, 3], y <- [1, 2, 3], x /= y]
----
do x <- [1, 2, 3]
   y <- [1, 2, 3]
   True <- return (x /= y)
   return (x, y)
----
[1, 2, 3] >>= (\x -> [1, 2, 3] >>= (\y -> return (x/=y) >>=
    (\r -> case r of True -> return (x, y)
                     _    -> fail "")))

The list monad can be thought of as describing functions of multi-valued arguments. For example:

mvLift2 :: (a -> b -> c) -> [a] -> [b] -> [c]
mvLift2 f x y = do x' <- x
                   y' <- y
                   return (f x' y')
----
mvLift2 (+) [1, 3] [10, 20, 30]    -- [11, 21, 31, 13, 23, 33]
mvLift2 (\a b -> [a, b]) "ab" "cd" -- ["ac", "ad", "bc", "bd"]
mvLift2 (*) [1, 2, 3] []           -- []

mvLift2 truns an ordinary function of two arguments into a function over multiple values, returning a value for each possible combination of the two input arguments.

Maybe

The Maybe monad is similar to list monad, the value Nothing serves as [] and Just x as [x].

Using Monads

The advantage of using monad is modularity, which by defining an operation monadically, we can hide underlying machinery in a way that allows new features to be incorporated into the monad transparently.

Walder’s paper Monads for Functional Programming extensively discussed this topic.

I may would like to write another blog to talk about monads:)

9. Numbers

The standard number types includ fixed- and abitrary-precision integers, ratios formed from each integer type, and single- and double-precision real and complex floatting-point.

Numeric Class Structure

Num is a subclass of Eq but not of Ord, since the order predicates do not apply to complex numbers. The subclass Real of Num, is a subclass of Ord.

The Num class provides several operations:

(+), (-), (*) :: (Num a) => a -> a -> a
negate, abs   :: (Num a) => a -> a

Two different kinds of divison operators are provided in two non-overlapping subclass of Num. The class Integral provides whole-number divison and remainder operations, the standard instance of Integral are Integer and Int. The Integer are unbounded integers, or “bignums”; while Int are machine integers with a range equivalent to at least 29-bit signed binary. Note that Integral is a subclass of Real.

All other numeric types fall in the class Fractional which provides the ordinary division operator /. Floating contains trigonometric, logarithmic, and exponential operations.

The RealFrac subclass of Fractional and Real provides a function properFraction, which decomposes a number into its whole and fractional parts, and a collection of functions that round to integral values by differing rules:

properFraction :: (Fractional a, Integral b) => a -> (b, a)
truncate, round, floor, ceiling :: (Fractional a, Integral b) => a -> b

Constructed Numbers

Int, Integer, Float and Double and primitive, others are made by constructors.

Complex is a type constructor that makes a complex type in class Floating from a RealFloat type:

data (ReadFloat a) => Complex a = !a :+ !a deriving (Eq, Text)

:+ is the data constructor, and we can use in pattern matching.

conjugate :: (RealFloat a) => Complex a -> Complex a 
conjugate (x :+ y) = x :+ (-y)

Similarly, the type constructor Ratio amkes a rational type in class RealFrac from an instance of Integral. Rationals use the % function to form a ratio from two integers.

(%) :: (Integral a) => a -> a -> Ratio a
numerator, denominator :: (Integral a) -> Ratio a -> a

Numeric Coercions and Overloaded Literals

The standard library provides several overloaded functions as explicity coercions:

fromInteger     :: (Num a) => Integer -> a
fromRational    :: (Fractional a) => Rational -> a
toInteger       :: (Integral a) => a -> Integer
toRational      :: (RealFrac a) => a -> Rational
fromIntegral    :: (Integral a, Num b) => a -> b
fromRealFrac    :: (RealFrac a, Fractional b) => a -> b

fromIntegral    = fromInteger . toInteger
fromRealFrac    = fromRational . toRational

An integer numeral is actually equivalent to an application of fromInteger to the value of the numeral as an Integer. Thus, 7 has the type (Num a) -> a.

Similarly, a float numeral is regarded as an application of fromRational to the value of numeral as Rational, thus, 8.4 has the type (Fractional a) => a.

This means we can use numeric literals in generic numeric functions:

halve :: (Fractional a) => a -> a
halve x = x * 0.5

Default Numeric Types

Consider the following function:

rms :: (Floating a) => a -> a -> a
rms x y = sqrt ((x^2 + y^2) * 0.5)

The exponentiation function ^ has type (Num a, Integral b) => a -> b -> a, and since 2 has type (Num a) => a, so the type of x^2 is (Num a, Integral b) => a. But, there is no way to resolve the overloading associated with the type b, in other words, b should be Int or Integer? We may cloud fix this by explicitly denote the type:

rms x y = sqrt ((x^(2::Integer) + y^(2::Integer)) * 0.5)

But this way cloud soon grow tiresome, and not elegant at all.

To solve the overloading ambiguity, Haskell provides a way that is restricted to numbers: each module may contain a default declaration, consisting of the keyword default followed by a parenthesized list of numeric monotypes. When encountering an ambiguous type, if at least one of its classes is numeric and all of its classes are standard, the default list is consulted, and the first type from the list that will satisfy the context will be used.

For example, default (Int, Float), then the ambiguous exponent above will be resolved as int.

10. Modules

A Haskell program consists of a collection of modules, a module serves for controlling namespaces and creating abstract data types.

Haskell’s namespace of modules is completely flat, and modules is not first-class. More than one module could in a single file, and one module may span several files.

For example, a module of Tree will be:

module Tree ( Tree(Leaf, Branch), fringe ) where
data Tree a = Leaf a | Branch (Tree a) (Tree a)

fringe :: Tree a -> [a]
fringe (Leaf x)     = [x]
fringe (Branch l r) = fringe l ++ fringe r

This module explicityle exports Tree, Leaf, Branch and fringe. If the export list is empty, all of the names bound at the top level in the module will be exported. Note that the name of data constructors have be grouped together : Tree(Leaf, Branch), we can also write Tree(..) instead. Exporting a subset of constructors is possible.

To import the Tree module in other module:

module Main (main) where
import Tree ( Tree(Leaf, Branch), fringe )
main = print (fringe (Brach (Leaf 1) (Leaf 2)))

The items being imported are called entities. Note that the import list can be omitted, which would import all entities from Tree module.

Qualified Names

Quanlifiers are used to resolve conflicts between different entities which have the same names.

In the following example, we will show the Fringe as a qualified name:

module Fringe(fringe) where
import Tree (Tree(..))

fringe :: Tree a -> [a]
fringe (Leaf x) = [x]
fringe (Branch l r) = fringe l ++ fringe r
----
module Main where
import Tree (Tree(Leaf, Branch))
import quanlified Fringe (fringe)

main = print (Fringe.fringe (Branch (Leaf 1) (Leaf 2)))

Abstract Data Types

Modules provide the only way to build abstract data types in Haskell. The representation type of ADT is hidden, all operations on the ADT are done at an abstract level which does not depend on the representation. For the Tree example:

module TreeADT (Tree, leaf, branch, 
                cell, left, right, 
                isLeaf) where
data Tree a = Leaf a | Branch (Tree a) (Tree a)
leaf               = Leaf
branch             = Branch
cell (Leaf a)      = a
left (Branch l r)  = l
right (Branch l r) = r
isLeaf (Leaf _)    = True
isLeaf _           = False

Note that the Leaf and Branch data constructor are not exported.

11. Arrays

Note: this part are mainly based on Haskell 98, nowadays the GHC has 9 types of array constructors. To know about the updated arrays please see Haskell Wiki - Arrays.

In practice, we want to access array elements efficiently. In Haskell, array is not general functions, but abstract data types with a subscript operation.

There are two main approaches to functional arrays implementation: incremental and monolithic.

In the incremental case, we have a function that produces an empty array of a given size, and another function takes an array, an index and a value, producing a new array that differs from the old one only at the given index. Obviously, a naive implementaion of this way would be very inefficient, either requiring copying when creating new array, or taking linear time for array lookup; so a serious attempts of this approach employ sophisticated static analysis and clever run-time devices to avoid excessive copying.

The monolithic approach, constructs an array all at once, without reference to intermediate array values.

Any module using arrays must import the Data.Array module.

Index types

The Ix library defines a type class of array indices:

class (Ord a) => Ix a where
    range   :: (a, a) -> [a]
    index   :: (a, a) a -> Int
    inRange :: (a, a) -> a -> Bool

Instances declarations are provided for Int, Integer, Char Bool and tuples of Ix types up to length 5; instances can also be derived fro enumerated and tuple types. The first argument of each of these operations of class Ix is a pair of indices, they are tpically the bounds of an array.

The range operation takes a bounds pair and produces the list of indices lying between the bounds, in index order:

range (0, 4)           -- [0, 1, 2, 3, 4]
range ((0, 0), (1, 2)) -- [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]

The inRange predicate determines whether an index lies between a given pair of bounds.And the index operation allows a particular element of an array to be addressed: given a bounds pair and an in-range index, the operations yields the zero-origin ordinal of the index within the range. For example:

index (1, 9) 2                -- 1
index ((0, 0), (0, 1)) (1, 1) -- 4

Array Creation

Haskell’s monolithic array creation function from a pair of bounds and a list of index-value pairs:

array :: (Ix a) => (a,a) -> [(a,b)] -> Array a b

For example, a definition of an array of the squares of numbers from 1 to 100:

squares = array (1, 100) [(i, i*i) | i <- [1..100]]

Array subscripting is performed with the infix operator !, and the bounds of an array can be extracted with the function bounds:

squares!7      -- 49
bounds squares -- (1,100)

We could generalize this example by parameterizing the bounds and the function to be applied each index:

mkArray :: (Ix a) => (a -> b) -> (a, a) -> Array a b
mkArray f bnds = array bnds [(i, f i) | i <- range bnds]

And for another Fibonacci example:

fibs :: Int -> Array Int Int 
fibs n = a where a = array (0, n) ([(0,1), (1,1)] ++ 
                                   [(i, a!(i-2) + a!(i-1)) | i <- [2..n]])

Accumulation

By relax the restriction that an index appear at most once in the association list by specifying how to combine multiple values associated with a single index, we can obtain accumulated array:

accumArray :: (Ix a) => (b -> c -> b) -> b -> (a, a) -> [Assoc a c] -> Array a b

The first argument is the accumulating function, the second is an initial value, and the remaining arguments are bounds and an association list, as with the array function. Usually, the accumulating function is (+), the initial value 0. For example,

hist :: (Ix a, Integral b) => (a, a) -> [a] 0-> Array a b
hist bnds is = accumArray (+) 0 bnds [(i, 1) | i <- is, inRange bnds i]

hist takes a pair of bounds and a list of values, and yields a histogram, that is a table of the number of occurrences of each value within the bounds.

Suppose we have a collection of measurements on the interval [a, b), and we want to divide the interval into decades and count the number of measurements within each:

decades :: (RealFrac a) => a -> a -> [a] -> Array Int Int
decades a b = hist (0, 9) . map decade
              where decade x = floor ((x-a) * s)
                    s = 10 / (b - a)

Incremental Updates

Haskell also has an incremental array update function //, to update $i$th element in array a with value v, we can write a // [(i, v)].

(//) :: (Ix a) => Array a b -> [(a, b)] -> Array a b

For another example, to interchange two rows in a matrix:

swapRows :: (Ix a, Ix b, Enum b) => a -> a -> Array (a, b) c -> Array (a, b) c
swapRows i i' a = a // ([((i, j), a!(i',j)) | j <- [jLo..jHi]] ++
                        [((i',j), a!(i, j)) | j <- [jLo..jHi]])
                  where ((iLo,jLo), (iHi,jHi)) = bounds a

The concatenation of two list comprehensions is not very efficient, we can perform an equivalent version with loop fusion optimization:

swapRows i i' a = a // [assoc | j <- [jLo..jHi],
                                assoc <- [((i, j), a!(i',j)),
                                          ((i',j), a!(i, j))]]
                  where ((iLo,jLo), (iHi,jHi)) = bounds a

12. Typing Pitfalls

This section will talk about some problems that novices run into with Haskell’s type system.

Let-Bound Polymorphism

Any language using Hindley-Milner type system has let-bound polymorphism. Because identifiers not bound using a let or where clause are limited with respect to their polymorphism. For example:

let f g = (g [], g 'a')
in f (\x -> x)

is ill-typed, because of g is a anonymous function with type a -> a, which is used within f in two different ways: [a] -> [a] and Char -> Char.

Numeric Overloading

Numerals are overloaded, and not implicitly coerced to the various numeric types.

For example:

average xs = sum xs / length xs

is ill-typed. Because / requires fractional arguments, but length result is Int. We can fix it with an explicit coercion:

average :: (Fractional a) => [a] -> a
average xs = sum xs / fromInteger (length xs)

Monomorphism Restriction

Haskell’s type system contains a restriction related to type classes that is not found in ordinary HM type system: the monomorphism restriction. It syas that any identifier bound by a pattern binding, and having no explicit type signature, must be monomorphic. An identifier is monomorphic if either not overloaded, or is overloaded but is used in at most one specific overloading and is not exported.

If violate this restriction, there will be a static type error.

For example, consider the following definition:

sum = folfl (+) 0

This would cause a static type error. We can fix it by adding type signature.

sum :: (Num a) => [a] -> a
sum = folfl (+) 0

Or we can also written not using pattern binings:

sum xs = foldl (+) 0 xs