A 0-CFA Implementation in Scala

Control flow analysis is a static program analysis technique for determining the control flow of a program. In functional programming languages, the widely used control flow analysis algorithm is $k$-CFA.

$k$-CFA is a family of algorithms parameterized by $k$, which is a number represents the sensitivity of context. The sensitivity here denotes the precision of analysis, or intuitively speaking, how the analysis result approximates the actual result. Specifically, $k$-CFA over-approximately computes which functions are applied at a given call site by $k$ context sensitivity. If $k$ is larger, then the analysis is more precise. For example, 1-CFA analysis have ability to distinguish the flow given by 1 recent different calling context, we also use polyvariant analysis for context-sensitive analysis. To know more about control flow analysis, Jan Midtgaard’s survey paper Control-flow analysis of functional programs is a comprehensive reference.

In this post, I present a Scala implementation of 0-CFA(also called monovariant analysis), which is the most imprecise but efficent analysis in $k$-CFA family. The result of analysis is a map from variable to a set of lambda terms, tells that for a given variable x, which functions can be applied at call site of x during all executions of the program. By setting the $k$ to be $0$, 0-CFA does not distinguish different contexts, and merge all lambda term flows of the same variable. We build a purely functional implementation to analyze a simple language in continuation-passing style (see On Understanding CPS).

Note: This is a simple implementation, and although all different kinds of control flow analyses can be understood in term of abstract interpretation, but the formulation presented here comes from the paper Abstract Compilation, Boucher and Feeley and is a little bit different from the very recent research in control flow analysis, such as Abstracting Abstract Machine methodology which I will try to have another post to explain it.

The Analyzed Language

The first step is to define the language structure. Since the language we are analyzing is in CPS form, so the essential structures are Lambda and App for representing lambda term and function application respectively. A Lambda has a list of variables and and a body. An App has a expression at the operator position, and a list of expression as arguments. Another thing interesting is Letrec which used for creating recursive bindings. Here a binding is a variable name with an expression. Other things like literal values, primitive operators (such as + or if), variable names should be pretty straightforward.

case class Binding(name: String, value: Expr)

abstract class Expr
case class Lit(n: Int) extends Expr
case class Op(rator: String) extends Expr
case class Var(name: String) extends Expr
case class App(f: Expr, args: List[Expr]) extends Expr
case class Lambda(vars: List[String], body: Expr) extends Expr
case class Letrec(bindings: List[Binding], body: Expr) extends Expr

Here is an example of our analyzed program:

((lambda (apply k1)
      (apply (lambda (x1 k2) (+ x1 1 k2))
             (lambda (t2) (apply t2 (lambda (t3) (t3 2 k1))))))
    (lambda (f k3) (k3 (lambda (x2 k4) (f x2 k4))))
    (lambda (x) (halt x)))

Store

Then we define the Store, a mapping from variable names to set of lambda expression. Basically, our implementation of Store is a wrapper of immutable map from Scala. The two basic operations of store are lookup and update.

What lookup does is given a variable name, it tells you what lambda terms may flow to it according to the current store. For convenience, we also support use literal value Lit and lambda term Lambda as argument of lookup: for a literal value we return an empty set; and for a lambda term we return a singleton set just contains itself.

Methods update takes key(s) and value(s) and returns a new Store that contains these new key/value pairs. Because of the range of map is a set of expressions (Set[Expr]), if there exists a such key in the store we are updating, we merge the two sets as the new range rather than simply replace the old set. In abstract interpretation, this is a join operation on lattice. update(addr: String, v: Expr): Store and update(addrs: List[String], vs: List[Set[Expr]]): Store are helper methods when we would like to update a variable with a single lambda, or update a list of variables with a list of set of values respectively.

case class Store(map: Map[String, Set[Expr]]) {
  case class UndefinedAddrException(addr: Expr) extends RuntimeException

  def lookup(addr: String): Set[Expr] = map.get(addr) match {
    case Some(s: Set[Expr]) => s
    case None => Set()
  }

  def lookup(addr: Expr): Set[Expr] = { 
    addr match {
      case Lit(_) => Set()
      case Var(name) => lookup(name)
      case Lambda(_, _) => Set(addr)
      case _ => throw UndefinedAddrException(addr)
    }
  }

  def update(addr: String, vals: Set[Expr]): Store = {
    map.get(addr) match {
      case Some(oldvals) => Store(map ++ Map(addr -> (oldvals ++ vals)))
      case None => Store(map ++ Map(addr -> vals))
    }
  }

  def update(addr: String, v: Expr): Store = update(addr, Set(v))

  def update(addrs: List[String], vs: List[Set[Expr]]): Store = {
    (addrs zip vs).foldLeft(this) {
      case (store, (addr, v)) => store.update(addr, v)
    }
  }
}

object Store {
  def mtStore = Store(Map())
}

The Analysis

The entrance of the algorithm is method analysis, which is a fixed-point algorithm that iteratively applies analysisProgram with current store until the store not changes any more. At the very beginning, we provide an empty store Store.mtStore.

object ZeroCFA {
  def analysis(prog: Expr): Store = {
    def iter(store: Store): Store = {
      val newStore = analysisProgram(prog, store)
      if (store == newStore) store else iter(newStore)
    }
    iter(Store.mtStore)
  }

The method analysisProgram just immediately calls analysisCall, recall that we are analyzing a CPS program, so the a program can be only a function call at the top level. Then for analysisCall we do a case analysis for Letrec and App. If the program is a letrec, we collect all the bindings and use them to create a new store, then use analysisArgs to analyze all the right-hand side values in the letrec and obtain newNewStore. We may think that a letrec is a special form of application, but all the arguments can be mutual recrusivly defined. So finally we just call analysisCall. The way we trate App is similar and simpler.

  def analysisProgram(prog: Expr, store: Store): Store = analysisCall(prog, store)

  def analysisCall(call: Expr, store: Store): Store = {
    call match {
      case Letrec(bds, body) =>
        val newStore = store.update(bds.map(_.name), bds.map((b: Binding) => Set(b.value)))
        val newNewStore = analysisArgs(bds.map(_.value), newStore)
        analysisCall(body, newNewStore)
      case App(f, args) => analysisApp(f, args, analysisArgs(args, store))
    }
  }

The method analysisApp interprets function application. There are three cases can be happened at operator position:

  • it can be a variable, then we lookup the current enviornment (store) to see what functions can be used for this variable, and abstractly interprets these functions by analysisAbsApp
  • it can be a primitive operator, in this case we just need analyze arguments, since we don’t need to go into the implementation of operator
  • it can be just a lambda, then we update the store with its arguments and go back to analysisCall since the body of lambda is also a call

The method analysisAbsApp actually does something so called abstract interpretation. It tries every possible lambda can be applied non-determinstically, prepares the arguments for these functions. And analysisArgs tries to analyze every argument which is also a function call in continuation-passing style.

Recalls that the algorithm is a fixed-point algorithm, and we may have many iterations to find the fixed point store. So we are not like normal interpretation which uses a depth-first evaluation strategy if we imagine the program as a state-transition graph. One variable not exists in current iteration will interrupt some evaluation, but no worries, since the variable may exists in next iteration. At the next iteration, the evluation can be continued.

  def analysisApp(f: Expr, args: List[Expr], store: Store): Store = {
    f match {
      case Var(name) => analysisAbsApp(store.lookup(name), args, store)
      case Op(_) => analysisArgs(args, store)
      case Lambda(vars, body) => 
        val newArgs = args.map(store.lookup(_))
        val newStore = store.update(vars, newArgs)
        analysisCall(body, newStore)
    }
  }

  def analysisAbsApp(fs: Set[Expr], args: List[Expr], store: Store): Store = {
    fs match {
      case SetExtractor() => store
      case SetExtractor(f, rest@_*) => 
        val newArgs = args.map(store.lookup(_))
        val newStore = store.update(f.asInstanceOf[Lambda].vars, newArgs)
        analysisAbsApp(rest.toSet, args, newStore)
    }
  }

  def analysisArgs(args: List[Expr], store: Store): Store = {
    args match {
      case Nil => store
      case arg::rest => 
        val newStore = arg match {
          case Lambda(vars, body) => analysisCall(body, store)
          case _ => store
        }
        analysisArgs(rest, newStore)
    }
  }
}

Besides, we have an auxiliary calss SetExtractor used for matching sets in analysisAbsApp.

object SetExtractor {
  def unapplySeq[T](s: Set[T]): Option[Seq[T]] = Some(s.toSeq)
}

Test

Here we demonstrate the analysis with a simple program:

/*
  ((lambda (x k) (k (lambda (a) (halt a)))) 
   3 
   (lambda (z) (halt z)))
*/
def example = App(Lambda(List("x", "k"),
                App(Var("k"), List(Lambda(List("a"), App(Var("halt"), List(Var("a"))))))),
                List(Lit(3), Lambda(List("z"), App(Var("halt"), List(Var("z"))))))

The result of analysis tells us that what lambda can be appeared for each variable:

  x  |-->  empty set
  z  |-->  (lambda (a) (halt a))
  k  |-->  (lambda (z) (halt z))