Building an Abstract Interpreter in Scala 3

Jan 11, 2023

Building an Abstract Interpreter in Scala 3

In this tutorial, I would like to show how to build a static analyzer for a simple imperative programming language following the principle of abstract interpretation.

The analyzer computes a lower bound and an upper bound for all numerical variables in the program without running the program using actual inputs. This analysis can be used to detect various errors in programs, for example, out-of-bound array accesses.

We will build the analyzer step-by-step, starting from a concrete interpreter, then implement abstract domains, and finally construct the abstract interpreter. We will not focus on the theory of abstract interpretation, but on the actual construction of an executable analyzer, which nevertheless quite faithfully reflects the theory beind abstract interpretation. The analyzer is written in Scala 3 and we will use a few advanced features of Scala 3 to make the construction quite modular and concise.

The code presented in this post was originally a baseline analyzer I wrote for a course project. Later, I realized it is probably a nice example to demonstrate how to build abstract interpreters. If you have any comments or find mistakes here, please kindly let me know.

1 The While Language

1.1 Syntax

1.2 An Example

2 Concrete Interpreter

2.1 Representing Values

2.2 Store

2.3 Evaluation of Expressions

2.4 Execution of Statements

2.4.1 Alternative Loop Semantics

2.5 Running the Concrete Interpreter

3 Abstract Domains

3.1 Lattices

3.1.1 Example: Doubles as Lattice

3.1.2 Lifting Lattices: Cartesian Products

3.1.3 Lifting Lattices: Map

3.2 Interval

3.2.1 Definition

3.2.2 Intervals as Lattice

3.2.3 Interval Arithmetic

3.3 Numerical Abstract Domain

4 Abstract Interpreter

4.1 Abstract Values and Stores

4.2 Abstract Evaluation of Expressions

4.3 Abstract Execution of Statements

4.4 Fixed-Point Computation

4.5 Alternative Fixed-Point Computation

5 Running the Abstract Interpreter

5.1 Power Function

5.2 Conditional

5.3 Non-Terminating Programs

5.4 Why Widening is Necessary?

6 What’s Next?

Bibliography

1 The While Language

Let us start from a tiny imperative programming language with while loops and if-then-else control structures. The basic constructs in this language are expressions and statements. The only possible data type is integers, and there are binary arithmetic operations over integers. We do not have functions or memory operations in this language, although they can be added when modeling more realistic languages.

1.1 Syntax

We use algebraic data types🔗 to define the abstract syntax of the language. The abstract syntax of expressions is the followings: an expression is either a literal number Lit, a variable Var, or a binary operation BinOp. The two operands of BinOp are also Expr, recursively.

enum Expr:
  case Lit(i: Int)
  case Var(x: String)
  case BinOp(op: String, e1: Expr, e2: Expr)

The abstract syntax of statements is as follows: a statement is either a no-op, an assignment, a sequence of two statements, an if-then-else statement, or a while loop.

enum Stmt:
  case Skip
  case Assign(x: String, e: Expr)
  case Seq(s1: Stmt, s2: Stmt)
  case Cond(e: Expr, s1: Stmt, s2: Stmt)
  case While(e: Expr, s: Stmt)

In the rest of this tutorial (a literate Scala file), we import all definitions from Expr and Stmt to the current (top-level) scope:

import Expr._
import Stmt._

1.2 An Example

We show a program snippet that iteratively computes the power of numbers b^x:

// x and b are inputs
res = 1
while (0 < x) {
  res = b * res
  x = x - 1
}

Its abstract syntax can be defined as:

val powerWhile =
  Seq(Assign("res", Lit(1)),
    While(BinOp("<", Lit(0), Var("x")),
      Seq(Assign("res", BinOp("*", Var("b"), Var("res"))),
        Assign("x", BinOp("-", Var("x"), Lit(1))))))

2 Concrete Interpreter

With the defined abstract syntax of the While language, we build a concrete interpreter, which serves as the semantic ground truth for our abstract interpreter.

2.1 Representing Values

Since the only possible data in this pedagogical language are integers, we can simply use a case class NumV to represent those values. We also set a type alias from Value to NumV. If the language is extended, we could use proper algebraic data types to represent (tag) different variants of values.

case class NumV(i: Int)
type Value = NumV

NumV is really just a shallow wrapper of Scala’s Int type. For convenience, we define an implicit conversion🔗 from Int to Value, so that anywhere expecting a value of type Value we can simply provide an integer, and the Scala compiler will automatically insert the conversion we defined.

given Conversion[Int, Value] with
  def apply(i: Int): Value = NumV(i)

2.2 Store

To evaluate expressions, we need to keep track of the values of variables. This is achieved by using a store, i.e. a mapping from variable names to their values. Variables names are simply represented as strings.

type Store = Map[String, Value]

2.3 Evaluation of Expressions

Next, we define the evaluation of expressions. The syntax defines three possible cases of expressions: variables, literals, and binary operators.

For variables, we can readily look up the store to retrieve its corresponding value. For literals, we map the syntactic numeric value into our semantic domain (i.e. NumV). For binary operators, we define a primitive function evalOp to evaluate the operator and the two value operands. Note that evalOp takes two values that are already evaluated by recursive calls to eval.

def eval(s: Expr, σ: Store): Value =
  s match
    case Var(x) => σ(x)
    case Lit(i) => NumV(i)
    case BinOp(op, e1, e2) => evalOp(op, eval(e1, σ), eval(e2, σ))

It is worth noting that eval is a partial function: if a variable x is not defined in store σ, an error will be raised (as a Scala’s runtime exception).

The remaining job is to define evalOp. Given the straightforward value representation NumV, it is not hard to define a function such as the following to evaluate against the operator op:

def evalOp(op: String, v1: NumV, v2: NumV): NumV

However, we decide to generalize the function a bit, which works on any type T that has implemented common arithmetic operations.

def evalOp[T: Arith](op: String, v1: T, v2: T): T

As we will see later, when defining the abstract interpreter (Abstract Evaluation of Expressions), this generalization is beneficial and enables more code reuse.

So, let us define a type class🔗 Arith that captures common arithmetic operations. We additionally define comparison operators < and ≡, where the “boolean” results of these comparisons are encoded as members of type A (e.g. if T is Int, then 0 for false and 1 for true). The type class itself is a trait with a type parameter. Those arithmetic operations are declared as extension methods🔗 of type A in Scala 3, which can be used conveniently with infix notations.

trait Arith[A]:
  extension (x: A)
    def +(y: A): A
    def -(y: A): A
    def *(y: A): A
    def /(y: A): A
    def <(y: A): A
    def ≡(y: A): A

Then, it is easy to show that NumV is an instance of Arith. After all, NumV is just a shallow wrapper of Int.

given NumVArith: Arith[NumV] with
  extension (x: NumV)
    def +(y: NumV): NumV = x.i + y.i
    def -(y: NumV): NumV = x.i - y.i
    def *(y: NumV): NumV = x.i * y.i
    def /(y: NumV): NumV = x.i / y.i
    def <(y: NumV): NumV = if (x.i < y.i) 1 else 0
    def ≡(y: NumV): NumV = if (x.i == y.i) 1 else 0

When defining this given instance, we have implicitly used the conversion we defined previously in Representing Values. For example, note that x.i and y.i are of type Int, but we can still write x.i + y.i of type Int, where type NumV is expected. The integer value is converted to NumV automatically.

Finally, evalOp can be defined generically over any type T with an ad-hoc requirement of T being an instance of Arith. For each case of op, we dispatch it to the extension methods defined in Arith to evaluate the operator and operands.

def evalOp[T: Arith](op: String, v1: T, v2: T): T =
  op match
    case "+" => v1 + v2
    case "-" => v1 - v2
    case "*" => v1 * v2
    case "/" => v1 / v2
    case "<" => v1 < v2
    case ">" => v2 < v1
    case "=" => v1 ≡ v2
    case _ => throw new RuntimeException("Unsupported operation")

2.4 Execution of Statements

Evaluating expressions is a pure procedure. However, statements have side effects (i.e. mutating values in the store) and control the flow of the program (i.e. conditional and loop statements). We now define the meaning of statements as a transformation of stores, where the store captures the effect of statements.

The execution function of statements takes a syntactic statement and a store as arguments, and it returns a store:

def exec(s: Stmt, σ: Store): Store

If we tweak a bit the definition of exec by currying the arguments, i.e. def exec(s: Stmt)(σ: Store): Store, it becomes more evident that given a statement s, exec(s): Store => Store is a transformation of stores.

def exec(s: Stmt, σ: Store): Store =
  s match
    case Skip => σ
    case Assign(x, e) => σ + (x -> eval(e, σ))
    case Seq(s1, s2) => exec(s2, exec(s1, σ))
    case Cond(e, s1, s2) =>
      val NumV(c) = eval(e, σ)
      if (c == 1) exec(s1, σ) else exec(s2, σ)
    case While(e, s) =>
      val NumV(c) = eval(e, σ)
      if (c == 1) exec(While(e, s), exec(s, σ))
      else σ

There are five cases of statements: Skip, Assign, Seq, Cond, and While. For Skip, the store is unchanged and returned directly. For Assign, we update the store with a new mapping from variable x to the evaluated value of expression e under store σ. If there this already a mapping of x in the store, it is overwritten. For statement sequential composition Seq(s1, s2), we first execute s1 and then execute s2 with the resulting store from s1. For conditional statements Cond(e, s1, s2), we evaluate the condition expression e and extract its integer value. If the value is 1 we execute the “then” branch and “else” branch otherwise (note that the condition is any value other than 1 in the otherwise case).

The While case is similar to the conditional case in that we need to evaluate the condition first. Then if the condition is true, we unfold the syntax of While(e, s) into Seq(s, While(e, s)) and then execute the new statement, which is equivalent to exec(While(e, s), exec(s, σ)).

It is worth noting that we can easily construct non-terminating programs that stuck within the While case. This is the reason that we need to build abstract interpreters that analyze programs in a computable way (thus always terminate).

2.4.1 Alternative Loop Semantics

In addition, there are different ways to define the semantics of loops. What we have just seen is to assign the meaning by operationally manipulating the syntactic structure and resort to recursive calls of exec.

One alternative is to use “recursion” from the meta-language (Scala) to define a store transformation function loop. The loop function then calls itself recursively if the condition holds.

case While(e, s) =>
  def loop(σ: Store): Store = {
    val NumV(c) = eval(e, σ)
    if (c == 1) loop(exec(s, σ)) else σ
  }
  loop(σ)

Then to compute the store changes of While(e, s), we call loop(σ) with the current store.

Or, the loop function does not have to be (closed-)recursive. Instead, it can be written as an open-recursive function, and we use a fixed-point combinator🔗 to provide the self-reference and compute the fixed-point of the function. The following snippet shows the idea.

case While(e, s) =>
  def loop(rec: Store => Store)(σ: Store): Store = {
    val NumV(c) = eval(e, σ)
    if (c == 1) rec(exec(s, σ)) else σ
  }
  fix(loop)(σ)

The function loop does not directly call itself via its name, instead, it takes an argument rec of type Store => Store representing the capability to call itself. Then we can use the fixed-point combinator fix to compute the fixed-point of the function.

def fix[A, B](f: (A => B) => A => B)(x: A): B = f(fix(f))(x)

The fixed-point combinator is written as a recursive function that takes an another function argument of type (A => B) => A => B. The first part (A => B) corresponds to what loop takes as its argument rec. In the body of fix, we call fix(f) yielding a function value of type A => B, which can be fed into f. This term f(fix(f)) again has type A => B, so we apply it with the argument x of type A. This is the common way to define fixed-point combinators in call-by-value languages.

After defining this loop, calling fix(loop) returns a function Store => Store, then we can provide the current store σ. In fact, this is the classic denotational semantics where the meaning of a statement s is mapped to a continuous function🔗 of type Store => Store.

2.5 Running the Concrete Interpreter

Let again use the power program (An Example) to show how to run the concrete interpreter. For example, to compute 2^{10}, we execute the program with an initial store \{ x \mapsto 10, b \mapsto 2 \}:

exec(powerWhile, Map("x" -> NumV(10), "b" -> NumV(2)))

The execution produces a final store where the expected result res is mapped to NumV(1024):

Map(x -> NumV(0), b -> NumV(2), res -> NumV(1024))

3 Abstract Domains

Abstract domains are basic building blocks of abstract interpreters. The chosen abstract domain represents the space of possible values/behaviors of the program in a computable way, and the abstract interpreter computes conservative approximations within that abstract domain.

In this section, we will construct the basic algebraic structure of abstract domains and then give an instance of numerical abstract domains that tracks lower/upper bounds. Again, the abtract domain structure will be implemented using Scala 3’s type classes mechanism🔗.

3.1 Lattices

Abstract domains are first of all lattices🔗 endowing an ordering relation. We define lattices using type classes in Scala 3, that is, a trait with a type parameter. In the following, we define the structure of lattices, which consists of a bottom element bot, a top element top, a binary operator ⊑ for deciding the ordering between two elements, a binary operator join ⊔ for computing the least upper bound of two elements, and a binary operator meet ⊓ for computing the greatest lower bound of two elements. Similar to how we define the type class of arithmetic operations, those binary operators are implemented as “extension methods”, enabling convenient use as infix notations.

trait Lattice[T]:
  def bot: T
  def top: T
  extension (x: T)
    def ⊑(y: T): Boolean
    def ⊔(y: T): T
    def ⊓(y: T): T

In abstract interpretation, lattices with infinite height (such as intervals) usually do not guarantee termination via join-induced fixed-point iteration. The non-termination is possible since we cannot reach the least fixed-point by repeatedly applying the join operator — there are infinite elements between the bottom (or any point below the least fixed-point) and the least fixed-point. See definitions of widening/narrowing. Therefore, it is necessary to use a “fancier join” operator that is called “widening” to guarantee termination. By using a widening operator ▽, we can reach an overapproximation of the least fixed-point. After that, a narrowing operator can be used to further refine the precision of the overapproximated result (although it is not always necessary to use a narrowing operator).

With the additional widening and narrowing operator, let us call the new structure a proper “abstract domain”. An abstract domain is a lattice, thus is implemented as a sub-trait of Lattice:

trait AbsDomain[T] extends Lattice[T]:
  extension (x: T)
    def ▽(y: T): T // widening
    def △(y: T): T // narrowing

In a later section (Running the Abstract Interpreter), we will also make an experiment that shows widening is necessary to converge.

3.1.1 Example: Doubles as Lattice

To give a concrete idea of lattices, we use floating-point numbers (Double in Scala) as an example. Double numbers are lattices, where the bottom element is the negative infinity and the top element is the positive infinity, and the join and meet operators are mathematical max and min.

given DoubleLattice: Lattice[Double] with
  def bot: Double = Double.NegativeInfinity
  def top: Double = Double.PositiveInfinity
  extension (x: Double)
    def ⊑(y: Double): Boolean = x <= y
    def ⊔(y: Double): Double = math.max(x, y)
    def ⊓(y: Double): Double = math.min(x, y)

This defined instance Lattice[Double] will be used later when defining the lattice for intervals, which is a structure containing two double numbers.

3.1.2 Lifting Lattices: Cartesian Products

If two types (or sets) A and B are lattices, their Cartesian product (A, B) is also a lattice. To show that in Scala, we first summon🔗 the instances of Lattice[A] and Lattice[B], which provide the top/bottom elements of A and B to construct the top/bottom element of (A, B). Since the components of (A, B) are already lattices, to define the ordering/join/meet operator, we can directly use the infix notation of components (e.g. x._1 ⊔ y._1). We basically lift those lattice-theoretical operations of A and B to (A, B).

The following code constructs the evidence of Lattice[(A, B)]:

given ProductLattice[A: Lattice, B: Lattice]: Lattice[(A, B)] with
  val la: Lattice[A] = summon[Lattice[A]]
  val lb: Lattice[B] = summon[Lattice[B]]
  def bot: (A, B) = (la.bot, lb.bot)
  def top: (A, B) = (la.top, lb.top)
  extension (x: (A, B))
    def ⊑(y: (A, B)): Boolean = x._1 ⊑ y._1 && x._2 ⊑ y._2
    def ⊔(y: (A, B)): (A, B) = (x._1 ⊔ y._1, x._2 ⊔ y._2)
    def ⊓(y: (A, B)): (A, B) = (x._1 ⊓ y._1, x._2 ⊓ y._2)

Similarly, if two types A and B are abstract domains, their product (A, B) is also an abstract domain. The widening/narrowing operators are defined component-wise.

given ProductAbsDomain[A: AbsDomain, B: AbsDomain]: AbsDomain[(A, B)] with
  val pl: ProductLattice[A, B] = summon[ProductLattice[A, B]]
  export pl.*
  extension (x: (A, B))
    def ▽(y: (A, B)): (A, B) = (x._1 ▽ y._1, x._2 ▽ y._2)
    def △(y: (A, B)): (A, B) = (x._1 △ y._1, x._2 △ y._2)

Since AbsDomain is a sub-trait of Lattice, to define AbsDomain[(A, B)] we need to provide those definitions that are required from Lattice. But we do not want to repeat the code, so we first summon an instance of ProductLattice[A, B] and use the export clause from Scala 3🔗 to export the existing field/method definitions from ProductLattice to ProductAbsDomain. Without the export, the Scala compiler would complaint missing definitions.

3.1.3 Lifting Lattices: Map

Another derived lattice structure is map. If the values (of type V) of a map are lattices, then the map from K to V itself is a lattice. The bottom element is the empty map. The top element is not defined with our Scala Map representation, because we cannot construct the map with all keys. (However, with a functional representation of maps, it is possible to define a default map that maps all keys to the top element of V.)

The rest operators worth a bit further explanation. The ordering test m1 ⊑ m2 checks if every value from m1 is less than the corresponding value from m2. If the key is not defined in m2, we use the V’s bottom for comparison. The join m1 ⊔ m2 merges the two maps by taking the join of values for shared keys, and preserving those values of disjoint keys. The meet m1 ⊓ m2 returns a map that contains only the shared keys, whose values are the meet of the corresponding values in m1 and m2.

given MapLattice[K, V: Lattice]: Lattice[Map[K, V]] with
  val lv: Lattice[V] = summon[Lattice[V]]
  def bot: Map[K, V] = Map[K, V]()
  def top: Map[K, V] = throw new RuntimeException("No representation of top map")
  extension (m1: Map[K, V])
    def ⊑(m2: Map[K, V]): Boolean =
      m1.forall { case (k, v) => v ⊑ m2.getOrElse(k, lv.bot) }
    def ⊔(m2: Map[K, V]): Map[K, V] =
      m1.foldLeft(m2) { case (m, (k, v)) => m + (k -> v ⊔ m.getOrElse(k, lv.bot)) }
    def ⊓(m2: Map[K, V]): Map[K, V] =
      m1.keySet.intersect(m2.keySet).foldLeft(Map[K,V]()) {
        case (m, k) => m + (k -> m1(k) ⊓ m2(k))
      }

Similarly, if the type V is an abstract domain, then a map from K to V is also an abstract domain. Again, to avoid repeating the code, we summon the instance of MapLattice[K, V] and use the export keyword to export the base definitions of lattices and only provide definitions of widening and narrowing. The widening/narrowing follow the same idea of the definition of join/meet, but only replace V’s join/meet with V’s widening/narrowing operator.

given MapAbsDomain[K, V: AbsDomain]: AbsDomain[Map[K, V]] with
  val ml: MapLattice[K, V] = summon[MapLattice[K, V]]
  export ml.*
  extension (m1: Map[K, V])
    def ▽(m2: Map[K, V]): Map[K, V] =
      m1.foldLeft(m2) { case (m, (k, v)) => m + (k -> v ▽ m.getOrElse(k, lv.bot)) }
    def △(m2: Map[K, V]): Map[K, V] =
      m1.keySet.intersect(m2.keySet).foldLeft(Map[K,V]()) {
        case (m, k) => m + (k -> m1(k) △ m2(k))
      }

Later, we will use maps as the abstract domain to represent the abstract store of program execution.

3.2 Interval

The interval abstract domain is a simple example of numeric abstract domains and we will be using it as abstract values in our abstract interpreter. We will first present its definition, and then show that it is a lattice, an arithmetic domain, and an abstract domain.

3.2.1 Definition

Intervals is defined as a case class that contains two Double numbers, respectively representing the lower bound and the upper bound. The intervals we defined here are inclusive on both ends.

case class Interval private (lb: Double, ub: Double):
  def toConst: Option[Int] = if (lb == ub) Some(lb.toInt) else None

The Interval class also has a toConst helper method, which returns a number when the interval represents a single constant number, and returns None otherwise. This can be considered a limited concretization function and will be useful later to gain more precision for some cases.

The default constructor of the case class is declared as private, so we define a companion object for Interval containing some smart constructors.

object Interval:
  def make(lb: Double, ub: Double): Interval =
    if (ub < lb) Interval(Double.PositiveInfinity, Double.NegativeInfinity)
    else Interval(lb, ub)
  def from(is: Int*): Interval = Interval(is.min.toDouble, is.max.toDouble)

The make constructor performs an additional check and returns the bottom element if the lower bound is greater than the upper bound. This is necessary since the case class constructor can be used to build ill-formed intervals, such as Interval(2, 1), which will be normalized to the unique representation of the bottom element [+\infty, -\infty] by the smart constructor.

The from constructor can be seen as an “abstraction” function that lifts a sequence of concrete numbers into the interval domain. For example, we can write Interval.from(3) to build singleton interval [3, 3], and Interval.from(5,4,3,10) to build the interval [3, 10].

3.2.2 Intervals as Lattice

In the abstract interpreter, we will use intervals to represent values of programs, i.e., those values stored in the abstract store. To make the store a lattice, its range (value type) must be a lattice (recall Lifting Lattices: Map). Therefore, now we need to show that intervals also induce a lattice structure.

Since the interval consists of two double numbers, where Double is already a lattice (shown in Example: Doubles as Lattice), we can make use of that fact and Double’s lattice operators to define the lattice of intervals.

The top element of intervals ranges from negative infinity to positive infinity. We also designate a canonical representation of the bot element, which inverses the bounds of top to represent the empty set of values, i.e. [\infty, -\infty].

The ordering relation of intervals compares if the second operand spans a larger inclusive range than the first operand. To join two intervals, we use the smaller lower bound and the larger upper bound to construct the resulting interval, thus including the values in both intervals. To meet two intervals, we use the larger lower bound and the smaller upper bound to construct the resulting interval, thus only including those values that are in both intervals.

given IntervalLattice: Lattice[Interval] with
  val dl = summon[Lattice[Double]]
  def bot = Interval.make(dl.top, dl.bot)
  def top = Interval.make(dl.bot, dl.top)
  extension (l1: Interval)
    def ⊑(l2: Interval) = l2.lb ⊑ l1.lb && l1.ub ⊑ l2.ub
    def ⊔(l2: Interval) = Interval.make(l1.lb ⊓ l2.lb, l1.ub ⊔ l2.ub)
    def ⊓(l2: Interval) = Interval.make(l1.lb ⊔ l2.lb, l1.ub ⊓ l2.ub)

Next, we define the widening and narrowing operator for intervals. The intuition is to use widening to accelerate and ensure convergence of analysis, whereas the narrowing is to refine the result.

given IntervalAbsDomain: AbsDomain[Interval] with
  val il: Lattice[Interval] = summon[Lattice[Interval]]
  export il.*
  extension (l1: Interval)
    def ▽(l2: Interval): Interval =
      if (l1 == bot) l2
      else if (l2 == bot) l1
      else Interval.make(if (l1.lb ⊑ l2.lb) l1.lb else Double.NegativeInfinity,
                         if (l2.ub ⊑ l1.ub) l1.ub else Double.PositiveInfinity)
    def △(l2: Interval): Interval =
      if (l1 == bot) bot
      else if (l2 == bot) bot
      else Interval.make(if (l1.lb == Double.NegativeInfinity) l2.lb else l1.lb,
                         if (l1.ub == Double.PositiveInfinity) l2.ub else l1.ub)

For the widening operator, we widen the lower bound to -\infty if l2’s lower bound is lower than l1’s lower bound, and widen the upper bound to \infty if l2’s upper bound is higher than l1’s upper bound. By widening in this fashion, we skip some elements that are higher than the “join” of l1 and l2 and directly reach the end of the lattice, thus ensuring convergence.

3.2.3 Interval Arithmetic

Intervals are used to abstractly model number values in programs, thus it is required to also model arithmetic operations for intervals🔗. Interval arithmetic has been well studied and used in error analysis and other engineering disciplines.

We can define the instance showing that Interval is an Arith, which is defined in Evaluation of Expressions. The addition and subtraction operations are straightforward, for example, adding two intervals is to add the lower bounds and the upper bounds of the two intervals (if they are not bottom), e.g. [1, 5] + [2, 3] = [3, 8].

The multiplication and division operations are a bit tricky. But I won’t go into the details here. A detail explanation of arithmetics on intervals can be found at Wikipedia🔗.

given IntervalArith: Arith[Interval] with
  extension (x: Interval)
    def +(y: Interval): Interval =
      if (x == bot || y == bot) bot
      else Interval.make(x.lb + y.lb, x.ub + y.ub)
    def -(y: Interval): Interval =
      if (x == bot || y == bot) bot
      else Interval.make(x.lb - y.ub, x.ub - y.lb)
    def *(y: Interval): Interval =
      if (x == bot || y == bot) bot
      else {
        val Interval(lb1, ub1) = x
        val Interval(lb2, ub2) = y
        val arr = List[Double](lb1 * lb2, lb1 * ub2, ub1 * lb2, ub1 * ub2)
        Interval.make(arr.min, arr.max)
      }
    def /(y: Interval): Interval =
      if (x == bot || y == bot) bot
      else if (y.toConst == Some(0)) bot
      else {
        val rhs = y match
          case Interval(lb2, ub2) if !(lb2 <= 0 && 0 <= ub2) =>
            Interval.make(1/ub2, 1/lb2)
          case Interval(lb2, 0) =>
            Interval.make(Double.NegativeInfinity, 1/lb2)
          case Interval(0, ub2) =>
            Interval.make(1/ub2, Double.PositiveInfinity)
          case  _ => Interval.make(Double.NegativeInfinity, Double.PositiveInfinity)
        x * rhs
      }
    def <(y: Interval): Interval =
      if (x == bot || y == bot) bot
      else {
        val Interval(lb1, ub1) = x
        val Interval(lb2, ub2) = y
        if (ub1 < lb2) Interval.from(1)
        else if (lb1 > ub2) Interval.from(0)
        else Interval.from(0, 1)
      }
    def ≡(y: Interval): Interval = (x.toConst, y.toConst) match
      case (Some(v1), Some(v2)) =>
        if (v1 == v2) Interval.from(1) else Interval.from(0)
      case (_, _) => Interval.from(0, 1)

For comparison operators, we inspect the interval bounds and try to determine precise result when possible. For instance, if the upper bound of the first interval is smaller than the lower bound of the second interval, then all values represented by the first interval definitely is smaller than those represented by the second interval. In this case, we can therefore directly return a constant interval [1, 1] representing true. Similarly, for deciding equality, we can directly return [1, 1] if the two intervals are the same constant number. For the cases that neither true nor false can be uniquely determined, we return an interval [0, 1] representing the possibility of being either true or false. Although it is doable that we can refine this analysis for better precision, what we have here is enough to demonstrate the idea.

3.3 Numerical Abstract Domain

Finally, let’s compose the Arith type class with abstract domains, which yields numerical abstract domains.

trait NumAbsDomain[T] extends AbsDomain[T] with Arith[T]

The instance that shows Interval is a numerical abstract domain can be obtained directly by composition and export:

given IntervalNumAbsDomain: NumAbsDomain[Interval] with
  export IntervalArith.{+, -, ‘*‘, /, <, ≡}
  export IntervalAbsDomain.{bot, top, ⊑, ⊔, ⊓, ▽, △}

4 Abstract Interpreter

So far, we have defined all necessary infrastructural components for building a simple abstract interpreter. Let us now proceed!

4.1 Abstract Values and Stores

We first define type aliases that Interval is used as abstract values AbsVal, and Map[String, AbsVal] is used as abstract stores AbsStore.

type AbsVal = Interval
type AbsStore = Map[String, AbsVal]

It is obvious that AbsVal is an instance of NumAbsDomain. Further more, since AbsVal as an abstract domain is used as the range of the store mapping, AbsStore is also an abstract domain, hence a lattice too (recall Lifting Lattices: Map).

4.2 Abstract Evaluation of Expressions

Similar to how we define the concrete evaluation of expressions (Evaluation of Expressions), we define the abstract evaluation by performing a case analysis over the syntax of Expr. In fact, it follows the same structure of the concrete evaluation function where each case we introduce new behaviors.

For Var, we only lookup the variable if the variable is defined in σ. If not, we return the bottom value of AbsVal, in contrast to the concrete evaluation where an exception is raised. By performing this check, absEval is made to be a total function and our abstract interpreter can detect “undefined variable” errors. For Lit, we lift the syntactic number into the abstract domain, resulting in a constant interval. For BinOp, we apply the abstract evaluator for binary operators absEvalOp.

def absEval(e: Expr, σ: AbsStore): AbsVal =
  e match
    case Var(x) => if (σ.contains(x)) σ(x) else summon[Lattice[AbsVal]].bot
    case Lit(i) => Interval.from(i)
    case BinOp(op, e1, e2) => absEvalOp(op, absEval(e1, σ), absEval(e2, σ))

In Evaluation of Expressions, we have already defined a generic evaluation function for primitive binary operations. Therefore, we can reuse the definition of concrete evaluation function evalOp without any modification. Just note that the instantiation of T here is Interval instead of concrete NumV, which can be often omitted and inferred by Scala’s type checker.

def absEvalOp[T: Arith](op: String, i1: T, i2: T): T = evalOp[T](op, i1, i2)

4.3 Abstract Execution of Statements

Again, similar to the concrete execution statements (Execution of Statements), the abstract execution of statements is function that transforms abstract stores.

We proceed with a case analysis of the syntax of Stmt. For the Skip case, we simply return the input store, which is the the same action as in concrete execution. For the Assign case, we construct a new store by joining the existing store and the new assignment. This join of stores is monotonic and preserves existing values: if x is already defined in σ, the new store has x mapped to σ(x) ⊔ absEval(e, σ). For the Seq case, we compose the execution of two statements, which is the the same action as in concrete execution too.

def absExec(s: Stmt, σ: AbsStore): AbsStore =
  s match
    case Skip => σ
    case Assign(x, e) => σ ⊔ Map(x -> absEval(e, σ))
    case Seq(s1, s2) => absExec(s2, absExec(s1, σ))
    case Cond(e, s1, s2) =>
      val c = absEval(e, σ)
      val thn = if (Interval.from(1) ⊑ c) absExec(s1, σ)
                else summon[Lattice[AbsStore]].bot
      val els = if (!(c ⊑ Interval.from(1)))
                absExec(s2, σ) else summon[Lattice[AbsStore]].bot
      thn ⊔ els
    case While(e, s) =>
      def loop(σ: AbsStore): AbsStore = {
        if (Interval.from(1) ⊑ absEval(e, σ)) absExec(s, σ)
        else σ
      }
      kleene(loop)(σ)

For the Cond case, we first evaluate the condition e to an interval c, which represents an over-approximation of the possible condition value. Then we check (1) if 1 is possibly captured by c, then the “then” branch will be executed, otherwise the empty abstract store is used for the “then” branch result; and (2) if any value other 1 is represented by c, then the “else“ branch will be executed, otherwise the empty abstract store is used for the “else” branch result. !(c ⊑ Interval.from(1)) is the condition of executing s2, which might look counter-intuitive: it is derived from checking [1,1] \not\sqsubseteq c, which is equivalent to \neg (c \sqsubseteq [1,1]). It is possible that condition c is the bottom value, in which case, both “then” and “else” branches are not executed. Finally, we join the two branches thn ⊔ els as the result of this conditional.

For the While case, we define a function loop that expresses a single iteration of while’s body statement. The body of our loop function checks if Interval.from(1) is included in the condition interval. If so it executes the body statement abstractly, otherwise it returns the current store. loop is defined in a way that is similar to the first alternative we discussed in Alternative Loop Semantics, i.e. it takes a single abstract store argument and returns an abstract store.

Next, we are going to use the Kleene iteration🔗 to compute the fixed-point σ of this loop function such that loop(σ) = σ. Once we reach this fixed-point, it means that we cannot gain any more information by repeating more times of the loop body, thus the abstract execution can terminate.

4.4 Fixed-Point Computation

The Kleene iteration computing the fixed-point of a function is defined as a higher-order function that takes a function f: T => T and returns a function of type T => T.

def kleene[T: AbsDomain](f: T => T)(t: T): T =
  val next = f(t)
  if (next ⊑ t) t else kleene(f)(t ▽ next)

In the above definition, t is the value from the last iteration, and f(t) computes the value of the next iteration. If next is less then t, then we have reached a fixed-point since we are not going up in the lattice. However, if next is greater than t, then we continue the iteration by recursively calling kleene with the new value t ▽ next that widens t with next. In this way, we effectively compute through a chain such as t_0, t_1, \dots, t_{n-1}, t_n where we stop at t_n \sqsubseteq t_{n-1}.

When computing the abstract store of While statements, we call kleene with the function loop and the initial store kleene(loop)(σ), where σ serves as the very first t argument to kleene.

Note that since the use of the widening operator, kleene may not compute the least fixed-point (but still a fixed-point). With the interval domain (which is a dcpo🔗) and the widening operation, the Kleene iteration guarantees termination and reaching a fixed-point even for lattices with infinite height.

4.5 Alternative Fixed-Point Computation

As we discussed in Alternative Loop Semantics, there are different ways to define the meaning of loops. Notably, we can define the meaning of loops as a open-recursive function. Similarly here for the abstract interpreter, we can do the same.

case While(e, s) =>
  def loop(rec: AbsStore => AbsStore)(σ: AbsStore): AbsStore = {
    if (Interval.from(1) ⊑ absEval(e, σ)) rec(absExec(s, σ))
    else σ
  }
  kleene(loop)(σ)

The difference lies in that we call rec if the condition is true, which is arguably more natural serving as the meaning of loops, in contrast to only expressing one-iteration execution of the loop. Now, the form and how we define loop follows the same pattern as the second alternative (Alternative Loop Semantics) we did for the concrete loop execution.

But how should we ensure its termination? The style of open-recursion provides a way to instrument the function with a termination check; this is done by tweaking how we compute the Kleene iteration:

def kleene[T: AbsDomain](f: (T => T) => T => T)(t: T): T =
  var acc: T = summon[AbsDomain[T]].bot
  def iter(t: T): T =
    if (t ⊑ acc) acc
    else {
      acc = acc ▽ t
      f(iter)(acc)
    }
  iter(t)

We first define a mutable variable acc to store the current value as climbing the chain, which is initially the bottom value. Then we define a local function iter of type T => T that instruments the function f with a termination check. During the iteration, we first check if the current value acc is already greater than the argument t, indicating the termination. Otherwise, we widen acc with t as the new acc value, then we call f with the instrumented iter function and the current acc in the chain, where the iter is the capability to make recursive call. The passed function f itself would make a recursive call via rec when needed, which has been instrumented with termination check. And each recursive call made via rec will update the accumulated value acc.

5 Running the Abstract Interpreter

In this section, we will play with the abstract interpreter to see how it works.

5.1 Power Function

Let us revisit the power function we showed in An Example and use the abstract interpreter to analyze the program.

val powerWhile =
  Seq(Assign("res", Lit(1)),
    While(BinOp("<", Lit(0), Var("x")),
      Seq(Assign("res", BinOp("*", Var("b"), Var("res"))),
        Assign("x", BinOp("-", Var("x"), Lit(1))))))

We lift the values of the initial store to the interval domain, which are constant intervals for b and x:

absExec(powerWhile, Map("x" -> Interval.from(10), "b" -> Interval.from(2)))

The abstract interpreter produces the following result that captures all possible values during all time of execution:

Map(x -> Interval(-Infinity,10.0),
    res -> Interval(1.0,Infinity),
    b -> Interval(2.0,2.0))

which is a faithful over-approximation of the resulting concrete store:

Map(x -> NumV(0),
    res -> NumV(1024),
    b -> NumV(2))

In the resulting store, the abstract value of x ends up being [-\infty, 10], since initially it is indeed 10, and later it is decreased by 1 in a loop. But the abstract interpreter could not determine a precise lower bound, or realize the decrease stops at 0. The lower bound -\infty is introduced by the widening operator. This is of course not a super precise result, but can be useful in many cases.

5.2 Conditional

It is also interesting to see how imprecision/over-approximation is introduced when a condition can be either true or false. We shall consider a simple program

if (x == 1) { y := 2 } else { y := 10 }

whose abstract syntax tree is the following:

val condProg =
  Cond(BinOp("=", Var("x"), Lit(1)),
    Assign("y", Lit(2)),
    Assign("y", Lit(10)))

We then run that program abstractly where initially x could be any value:

absExec(condProg, Map("x" -> summon[Lattice[Interval]].top))

The resulting store produced shows that y could be any value in between 2 and 10, which captures the execution of both branches:

Map(y -> Interval(2.0,10.0), x -> Interval(-Infinity,Infinity))

5.3 Non-Terminating Programs

Another simple program is a non-terminating program

while (1) { x = x + 1 }

whose abstract syntax tree is the following:

val nonTerm = While(Lit(1), Assign("x", BinOp("+", Var("x"), Lit(1))))

The program keeps incrementing the variable x by 1. Let us analyze the program with an initial store where x is 100:

absExec(nonTerm, Map("x" -> Interval.from(100)))

The analyzer correctly produces the result: the range of x is [100, +inf] since the loop is unbounded. The analyzer itself is able to realize that without running into an infinite loop.

Map(x -> Interval(100.0, Infinity))

We could also assume that the input value x is the top interval (i.e. arbitrary):

absExec(nonTerm, Map("x" -> summon[Lattice[Interval]].top))

Then the analysis result tells us that x indeed could be any value during execution:

Map(x -> Interval(-Infinity, Infinity))

5.4 Why Widening is Necessary?

Recall that when defining the Fixed-Point Computation, we use the widening operator ▽ to combine the state at the current iteration and the state of the next iteration. We could also make an experiment that replaces the widening operator with the join operator ⊔:

def kleene[T: AbsDomain](f: T => T)(t: T): T =
  val next = f(t)
  if (next ⊑ t) t else kleene(f)(t ⊔ next) // replace ▽ with ⊔

Then we run the simple non-terminating program from last section:

absExec(nonTerm, Map("x" -> Interval.from(100)))

Surprisingly, the abstract interpreter itself does not even terminate. We get stuck in the Kleene fixed-point iteration. If we further debug what’s going on by printing the variable next of each iteration, we see the upper bound of x keeps increasing but is never stabilized:

Map(x -> Interval(100.0,101.0))
Map(x -> Interval(100.0,102.0))
Map(x -> Interval(100.0,103.0))
Map(x -> Interval(100.0,104.0))
Map(x -> Interval(100.0,105.0))
...

However, with the widening operation, the iteration converges quickly after a few iterations, where the upper bound of x is Infinity:

Map(x -> Interval(100.0,101.0))
Map(x -> Interval(100.0,Infinity))

6 What’s Next?

Now, we have build an abstract interpreter to decide the lower and upper bound of values in the While language. So what’s next?

Source Code. The complete source code can be found here🔗.

Abstract Domains. We defined a general interface of numerical abstract domains, thus the use of Interval can be replaced with more powerful abstract domains. It is also worth noting that intervals are an example of non-relational abstract domains, which do not track the relations between variables. It is possible to use relational abstract domains for higher precision (which can track relations such as x < y), including zones (Miné 2001), octagons (Miné 2006), and polyhedron (Cousot and Halbwachs 1978). The idea of abstract interpretation and using a numerical abstract domain to analyze low-level system programs has many real-world applications, e.g. in analyzing the eBPF programs running in the Linux kernel (Gershuni et al. 2019).

Analyzing Richer Languages. The While language is a small toy language for demonstration. It lacks many features of real programming languages, such as first-class functions and various control mechanisms (e.g. break, exception, call/cc etc.). In those languages, the control flow of programs is conflated with data flow. To analyze modern functional languages, we need to analyze how functions are flowed in programs, which is traditionally formulated as control-flow analysis (Shivers 1991) (Horn and Might 2010).

Further Abstractions. We have already noticed the structural similarity between the concrete interpreter and the abstract interpreter. Indeed they can be further abstracted so that we define a generic interpreter, which can be instantiated with either concrete or abstract semantics. In this way, we obtain abstracting definitional interpreters (Darais et al. 2017). My own work further abstracts over another dimension, binding-times of the semantics-agnostic interpreter, thus the interpreter can be used as a compiler to generate efficient code (Wei et al. 2019) (Wei et al. 2020). This is an effective way to eliminate the interpretation overhead of analysis and to improve the performance of static analyses.

Acknowledgement

This article is written with the assistant from Github Copilot.

This work is licensed under a Creative Commons Attribution (BY) 4.0 International License.

Bibliography

Patrick Cousot and Nicolas Halbwachs. Automatic Discovery of Linear Restraints Among Variables of a Program. In Proc. POPL, 1978.

David Darais, Nicholas Labich, Phuc C. Nguyen, and David Van Horn. Abstracting definitional interpreters (functional pearl). Proc. ACMProgram. Lang. 1(ICFP), 2017.

Elazar Gershuni, Nadav Amit, Arie Gurfinkel, Nina Narodytska, Jorge A. Navas, Noam Rinetzky, Leonid Ryzhyk, and Mooly Sagiv. Simple and precise static analysis of untrusted Linux kernel extensions. In Proc. PLDI, 2019.

David Van Horn and Matthew Might. Abstracting abstract machines. In Proc. ICFP, 2010.

Antoine Miné. A New Numerical Abstract Domain Based on Difference-Bound Matrices. In Proc. PADO, 2001.

Antoine Miné. The Octagon Abstract Domain. High. Order Symb. Comput. 19(1), pp. 31–100, 2006.

Olin Shivers. Control-flow analysis of higher-order languages or taming lambda. 1991.

Guannan Wei, Oliver Bracevac, Shangyin Tan, and Tiark Rompf. Compiling symbolic execution with staging and algebraic effects. Proc. ACMProgram. Lang. 4(OOPSLA), 2020.

Guannan Wei, Yuxuan Chen, and Tiark Rompf. Staged abstract interpreters: fast and modular whole-program analysis via meta-programming. Proc. ACMProgram. Lang. 3(OOPSLA), 2019.

contents ← prev up next →

	Simulating Quantum Circuits in Feynman-Style using Higher-Order Functions
	译西西弗斯的神话
	Building an Abstract Interpreter in Scala 3
	存在主义笔记
	Notes on Algebraic Structures
	Notes on Universal Algebra
	文学理论入门
	Notes on Topology
	A Soundness Proof of STLC by Definitional Interpreters in Agda
	Strong Normalization of STLC
	A Regular Expression Matcher in CPS
	Notes on Abstract Interpretation

1	The While Language
2	Concrete Interpreter
3	Abstract Domains
4	Abstract Interpreter
5	Running the Abstract Interpreter
6	What’s Next?
	Bibliography