Small-Step CEK Machine in OCaml

CEK Machine is a kind of abstract machine for describing and executing functional languages. The CEK stands for three components of a machine state: Control, Environment and Continuation. Originally, Landin proposed the first abstract machine SECD machine for evaluating lambda calculus in 1960s, and later Felleisen and Friedman obtained CEK machine as a modern version of SECD machine. Comparing with SECD machine, CEK machine supports tail-call optimization which has a constant size of continuation when dealing with tail calls. In this post, I will talk about an OCaml implementation of CEK machine.

We start by defining the language accepted by this CEK machine. Basically, it is lambda calculus with integers as primitive type, as well as operations on integers such as plus and multiplication.

module CEKMachine = struct
  type var = string

  type expr = 
    | Int of int
    | Var of var
    | Lam of string * expr
    | App of expr * expr
    | Plus of expr * expr
    | Mult of expr * expr

The type expression are reducible term in out abstract machine, while the type value are irreducible term and only has two forms: an integer number, or a closure. Since the restriction of OCaml, we have to define type value and type env mutual recursively. The environment is a map from variable to value. Here we implement the environment structure as an associated list, each element in the list is a pair of variable and value. We also define two auxiliary function ext and lookup to add a new pair to the environment or look up a value given a variable name.

  type value = 
    | Int of int
    | Clo of expr * env
  and env = (var * value) list

  let ext var value env = (var, value)::env

  exception UnboundedVariable of string

  let rec lookup var = function
    | [] -> raise (UnboundedVariable var)
    | (x, v)::env -> if x = var then v else lookup var env

The continuation is an interesting part of CEK machine, it represents the next computation we need to do when we done the current. Continuations are chained, which means each continuation is associated with another continuation. The Mt continuation denotes halting, which has nothing left to do. The Fn and Ar continuation are used for handling function application. According to left-to-right call-by-value semantics, we need to evaluate the first position of an application expression until have a lambda there, then evaluate the second position until have a value there, finally we go into the body of the lambda term with an extended environment. So the Fn continuation denotes we hold a lambda term comes from the first posistion, and the Ar continuation represents we hold the expression comes from the second position and waits for evaluation.

Without loosing generality, we use only one continuation Op to represent the continuations of different primitive operators. It consists of four components: 1) an op denotes the operator, 2) a list contains values of computed arguments, 3) a list of closure that represents remaining computation, and 4) the next continuation. The type op is an type alias of a function type int -> int -> int.

  type op = int -> int -> int

  let plusOp = fun x y -> x + y
  let multOp = fun x y -> x * y

  type cont = 
    | Mt
    | Fn of expr * env * cont
    | Ar of expr * env * cont
    | Op of op * value list * (expr * env) list * cont

Now the state of machine can be easily represented as a product type of control, env and cont. For readability, we rename expr as control.

  type control = expr
  type state = control * env * cont

The following function defines the single-step transition rules. The first rule is saying that given a variable at control position, we need to lookup its corresponding value from current environment. If the value is a closure, then we take the environment from the closure as the environment of the new state, and use the lambda term as control of the new state. If the value is an integer, then we just use it as control of the next state.

The second and third rules are handling primitive operations. We use term at the first position as control of next state, and generate a new continuation which records the operation, the second argument and current continuation. The fourth rule happens when we have an integer as control, and have an Op continuation holds the second argument. Then we transfer the control to the second argument and save the integer of first arugment. The fifth rule is just doing the arithmetic when we have both integers of the first and second arugments.

Handling application applies the rules 6, 7, 8 and 9. As we said before, the rule 6 generates a new continuation Ar which means it holds the second term of application expression. In rule 7, when we have a lambda term as control, then we may start evaluate the second term saved in Ar, as well as save the lambda term in Fn. Rules 8 and 9 are different cases of evaluation of the second term, in either case, we transfer the body of lambda and extend the environment with x maps to the value we got where the x is the variable name of lambda term. And the value is a closure if we have a lambda term comes from the second position, or the value is just an integer.

The states except we described above are all illegal state, we raise an exception.

  exception IllegalState

  let step state = match state with
    | (Var x, env, k) -> (match lookup x env with
       | Clo (lam, env') -> (lam, env', k)
       | Int x -> (Int x, env, k))
    | (Plus (e1, e2), env, k) 
      -> (e1, env, Op (OpPlus, [], [(e2, env)], k))
    | (Mult (e1, e2), env, k) 
      -> (e1, env, Op (OpMult, [], [(e2, env)], k))
    | (Int x, env, Op (op, [], [(e2,env')], k)) 
      -> (e2, env', Op (op, [(Int x)], [], k))
    | (Int x, env, Op (op, [(Int y)], [], k)) 
      -> (Int (op x y), [], k)
    | (App (e1, e2), env, k) 
      -> (e1, env, Ar (e2, env, k))
    | (Lam (x, body), env, Ar (e2, env', k)) 
      -> (e2, env', Fn (Lam (x, body), env, k))
    | (Lam (_,_) as lam, env, Fn (Lam (x, body), env', k)) 
      -> (body, ext x (Clo (lam, env)) env', k)
    | (Int i, env, Fn (Lam (x, body), env', k))
      -> (body, ext x (Int i) env', k)
    | s -> raise IllegalState

Helper function to run the CEK machine are defined. Function inject returns an initial machine state given an expression. Funtion run takes an expression and turns it to a machine state, then start evaluation from this state according to the single-step transition rules. The machine terminates if it has a state which the control is a value and the continuation is Mt; otherwise the machine uses step to obtain the next state and do the matchin recursively.

  let inject e = (e, [], Mt)

  let run e =
    let rec aux_run state = 
      match step state with
      | (Int i, e, Mt) -> string_of_int i
      | (Lam (_,_), e, Mt) -> "closure"
      | (c, e, k) as state' -> aux_run state';
    in aux_run (inject e)
end

Here are some test. The first example ex1 is just a (+ 3 4), and the second one is (+ ((lambda (x) (+ x 4)) 3) ((lambda (z) (+ z 5)) 6)). And note that the language and CEK machine is turing complete, which means it allows non-terminating computation such as ((lambda (x) (x x)) (lambda (x) (x x))), as the third example.

open CEKMachine

let ex1 = Plus (Int 3, Int 4);;

let ex2 = Plus (App (Lam ("x", Plus (Var "x", Int 4)), Int 3), 
                App (Lam ("z", Plus (Var "z", Int 5)), Int 6));;

let nonter = App ((Lam ("x", App (Var "x", Var "x"))),
                  (Lam ("x", App (Var "x", Var "x"))));;

print_string (run ex2);;