On this page:
1 Motivation
2 A Simple Language
2.1 Syntax
2.2 Trace Semantics
3 Theory of Approximation
3.1 Order, Lattices, and Complete Lattices
3.2 Collecting Semantics
4 Galois Connections
4.1 Examples
4.2 Properties
5 Computing Abstract Semanitcs
5.1 Values and Environments
5.2 Expressions
5.3 Traces
5.4 States
6 Combination of Abstractions
6.1 Products
6.2 Sets
6.3 Functions with Fixed Points
6.4 Higher-order Abstract Interpretation
7 Composition of Abstractions
7.1 Reduced Cardinal Product
7.2 Reduced Cardinal Power
8 Loops
8.1 Fixed-points
8.2 CPO
8.3 Fixed-point Collecting Semantics
8.4 Abstraction of Fixed-points
8.5 Widening
8.6 Narrowing
10/22/2019

Notes on Abstract Interpretation

Guannan Wei <guannanwei@purdue.edu>

This is my notes on abstract interpretation based on Bruno Blanchet’s Introduction to Abstract Interpretation, Matt Might’s blog post Order theory for computer scientists, Alexandru Salcianu’s Notes on Abstract Interpretation, and Stephen Chong’s CS252r course material.

1 Motivation

Most of the interesting properties of programs are undecidable since they can be reduced to the Halting Problem. The proof of undecidability of Halting problem is as follows.

Proof: By contradiction. First assume that there existing an analyzer Halt?(P, E) returning true when program P terminates on input E and false otherwise. Consider a program Q(P) = if Halt?(P, P) then loop else halt. Now apply Q(Q):
  • If Halt?(Q, Q) is true, then Q(Q) loops. Contradiction.

  • If Halt?(Q, Q) is false, then Q(Q) halts. Contradiction. □

To analyze the behavior of programs, we introduce approximations. The approximations must be sound: if the system gives a definite answer, then this answer is true. Sometimes the system answers "maybe", then the system is not complete.

2 A Simple Language

In this simple programming language, a program is a function from integers (PC) to commands, indicating the command at each PC in the program.

2.1 Syntax

We define the grammar of expressions:

 

Exp

 ::= 

var

 

  |  

num

 

  |  

Exp Aop Exp

 

  |  

Exp Bop Exp

 

Aop

 ::= 

+

 

  |  

-

 

  |  

*

 

  |  

/

 

Bop

 ::= 

>=

 

  |  

...

and the grammar of commands:

 

Cmd

 ::= 

var := Exp

 

  |  

if Exp goto PC

 

  |  

input var

 

  |  

print Exp

 

  |  

abort

2.2 Trace Semantics

The state of programs consists of 1) a program counter, which indicates the next command to execute, 2) an environment, which contains a partial map from variables to values. We ignore overflows. Formally,

pc \in PC := \mathbb{N} \rho \in Env := Var \rightarrow \mathbb{Z} s \in State := PC \times Env

A sequence of states is a trace. A program may have several traces, depending on the user input; so the semantics of a program is a set of traces.

3 Theory of Approximation

We want to prove properties like "all traces of the program satisfy a given condition". To do this, we over-approximate the set of traces, and obtain a superset of the set of concrete traces. An analysis is more precise when it yields a small superset of the set of concrete traces. All correct approximations are the ones are greater than the smallest correct one, namely, the set of concrete traces.

3.1 Order, Lattices, and Complete Lattices

Definition(Partially-ordered set): A partially-ordered set (poset for short) is a set S equipped with a binary relation \leq which is reflexive, anti-symmetric and transitive. A binary relation R is anti-symmetric if a R b and b R a implies a = b.

The ordering relation can be partial, i.e., there exists incomparable elements.

Definition(Lattice): A lattice is a poset (L, \leq) s.t. for all a, b \in L, a and b have a least upper bound a \sqcup b and a greatest lower bound a \sqcap b. We denote the lattice by (L, \leq, \sqcup, \sqcap).

All finite sets in a lattice have least upper bounds and greatest lower bounds, but not necessarily for infinite sets.

Definition(Complete lattice): A complete lattice is a poset (L, \leq) s.t. every subset X of L has a least upper bound \sqcup X and a greatest lower bound \sqcap X. In particular, L has a least element \bot = \sqcup \varnothing and a greatest element \top = \sqcap L. We denote the complete lattice by (L, \leq, \bot, \top, \sqcup, \sqcap).

Proposition: If S is a set, then (\mathcal{P}(S), \subseteq, \varnothing, S, \cup, \cap) is a complete lattice.

Intuitively, a \leq b means that a is more precise than b.

3.2 Collecting Semantics

Collecting semantics computes the set of possible traces, formally:

T_r = \{ s_1 \rightarrow \cdots \rightarrow s_n | s_1 \text{ is the initial state,}~ s_i \rightarrow s_{i+1} ~\text{is an allowed transition} \}

The set of reachable states is defined by:

S_r = \{ s | s_1 \rightarrow \cdots \rightarrow s \in T_r \}

The set of all traces, of all states, are ordered by inclusion. They are complete lattice (in this case, they are concrete lattice.)

4 Galois Connections

Definition(Galois connection): Let (L_1, \leq_1) and (L_2, \leq_2) be posets. (\alpha, \gamma) is a Galois connection (or pair of adjoined functions) between L_1 and L_2 iff \alpha : L_1 \to L_2 and \gamma : L_2 \to L_1, and \forall x \in L_1, \forall y \in L_2, \alpha(x) \leq_2 y \iff x \leq_1 \gamma(y)

4.1 Examples
4.2 Properties

Proposition: (\alpha, \gamma) is a Glaois connection iff \alpha and \gamma are monotone, (\alpha \circ \gamma)(y) \leq_2 y and x \leq_1 (\gamma \circ \alpha)(x).

Proof:
  • First assume that (\alpha, \gamma) is a Galois connection.
    • By definition \alpha(x) \leq_2 y, and by reflexivity \gamma(y) \leq_1 \gamma(y), thus (\alpha \circ \gamma)(y) \leq_2 y.

    • By definition x \leq_1 \gamma(y), and by reflexivity \alpha(x) \leq_2 \alpha(x), thus x \leq_1 (\gamma \circ \alpha)(y).

    • Let x \leq_1 x{}’. We know that x{}’ \leq_1 (\gamma \circ \alpha)(x{}’), then by transitivity, x \leq_1 (\gamma \circ \alpha)(x{}’). Thus by definition, \alpha(x) \leq_2 \alpha(x{}’), and \alpha is monotone.

    • Let y \leq_2 y{}’. We know that (\alpha \circ \gamma)(y) \leq_2 y, then by transitivity, (\alpha \circ \gamma)(y) \leq_2 y{}’. Thus by definition, \gamma(y) \leq_2 \gamma(y{}’), and \gamma is monotone.

  • Then assume \alpha and \gamma are monotone, (\alpha \circ \gamma)(y) \leq_2 y and x \leq_1 (\gamma \circ \alpha)(x).
    • Assume that \alpha(x) \leq_2 y, since \gamma is monotone, then \gamma(\alpha(x)) \leq_1 \gamma(y). By assumption, x \leq_1 \gamma(\alpha(x)); by transitivity, x \leq_1 \gamma(y).

    • Assume that x \leq_1 \gamma(y), since \alpha is monotone, then \alpha(x) \leq_2 \alpha(\gamma(y)). By assumption, \alpha(\gamma(y)) \leq_2 y; by transitivity, \alpha(x) \leq_2 y. □

Proposition: Let (L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1) and (L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2) be complete lattices, and (\alpha, \gamma) be a Galois connection between L_1 and L_2. Then
  • Each function in the pair (\alpha, \gamma) uniquely determines the other: \alpha(x) = \sqcap_2 \{ y \in L_2 | x \leq_1 \gamma(y) \} \\ \gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y\}

  • \alpha is a complete join-morphism (i.e., additive: \alpha(\sqcup S) = \sqcup \{ \alpha(x) | x \in S \}), \alpha(\bot_1) = \bot_2. \gamma is a complete meet-morphism (i.e., \gamma(\sqcap S) = \sqcap \{ \gamma(x) | x \in S \}), \gamma(\top_2) = \top_1.

Proof: TODO □

Proposition: Let (L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1) and (L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2) be complete lattices. Then
  • If \alpha : L_1 \to L_2 is a complete join-morphism and \gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y \}, then (\alpha, \gamma) is a Galois connection.

  • If \gamma : L_2 \to L_1 is a complete meet-morphism and \alpha(x) = \sqcup_2 \{ y \in L_2 | x \leq_1 \gamma(y) \}, then (\alpha, \gamma) is a Galois connection.

Proof: TODO □

Proposition: \alpha is surjective iff \gamma is bijective iff \alpha \circ \gamma = \lambda y.y.

Proof: TODO □

Remark: Let \sigma(y) = \sqcap \{ y{}’ | \gamma(y{}’) = \gamma(y)\}, then \alpha(L_1) = \sigma(L_2).

Proof: TODO □

5 Computing Abstract Semanitcs

The approach: start from the concrete semantics, lift it to sets to obtain the collecting semantics, and then abstract the collecting semantics.

5.1 Values and Environments

The correctness of the abstraction \hat{x} of a value x is \alpha(x) \leq \hat{x}. For an operator, such as +, is a correct abstraction iff \alpha(a) \leq \hat{a} and \alpha(b) \leq \hat{b}, then \alpha(a+b) \leq \hat{a} \hat{+} \hat{b}.

The abstraction of environments can be defined in two steps:

Finally, the Galois connection of environments \mathcal{P}(Var \to \mathbb{Z}) \underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}} Var \to L is obtained by composing the two Galois connections: \alpha_R = \alpha_{R_2} \circ \alpha_{R_1} and \gamma_R = \gamma_{R_1} \circ \gamma_{R_2}.

To summarize:

 

Semantics

 

Collecting Semantics

 

Abstraction

 

Abstract Semantics

Values

 

\mathbb{Z}

 

(\mathcal{P}(\mathbb{Z}), \subseteq)

 

\underset{\gamma}{\overset{\alpha}{\rightleftharpoons}}

 

(L, \leq)

Operators

 

+ : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}

 

+ : \mathcal{P}(\mathbb{Z}) \times \mathcal{P}(\mathbb{Z}) \to \mathcal{P}(\mathbb{Z})

 

 

\hat{a} ~\hat{+}~ \hat{b} = \alpha(\gamma(\hat{a}) + \gamma(\hat{b}))

Environments

 

\rho : Var \to \mathbb{Z}

 

(R : \mathcal{P}(Var \to \mathbb{Z}), \subseteq)

 

\underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}}

 

\hat{R} : Var \to (L, \leq)

5.2 Expressions

The expressions can be abstracted in a similar way:

Semantics

 

Collecting Semantics

 

Abstract Semantics

\llbracket E \rrbracket \rho \in \mathbb{Z}

 

\llbracket E \rrbracket R \in \mathcal{P}(\mathbb{Z})

 

\widehat{\llbracket E \rrbracket} \hat{R} \in L

\llbracket x \rrbracket \rho = \rho(x)

 

 

\widehat{\llbracket x \rrbracket} \hat{R} = \hat{R}(x)

\llbracket E_1 + E_2 \rrbracket \rho = \llbracket E_1 \rrbracket \rho + \llbracket E_2 \rrbracket \rho

 

 

\widehat{\llbracket E_1 + E_2 \rrbracket} \hat{R} = \widehat{\llbracket E_1 \rrbracket}\hat{R} ~\hat{+}~ \widehat{\llbracket E_2 \rrbracket}\hat{R}

The correctness of the abstract semantics of expressions: if \alpha_R(R) \leq \hat{R} then \alpha(\llbracket E \rrbracket R) \leq \widehat{\llbracket E \rrbracket} \hat{R}.

Proof: TODO □

5.3 Traces

Given a program P, the abstract trace semantics is defined as follows:

The transition for conditionals is non-deterministic. Sometimes we can have a better analysis by refining the values from Exp.

The correctness of the abstrace trace semantics is expressed by: if \langle pc_1, \rho_1 \to pc_2, \rho_2 \rangle, and \alpha_R(\{ \rho_1 \}) \le \hat{R_1}, then there exists \hat{R_2} s.t. \langle pc_1, \hat{R_1} \rangle \to \langle pc_2, \hat{R_2} \rangle and \alpha_R(\{ \rho_2 \}) \leq \hat{R_2}.

5.4 States

The next part is the abstract semantics of set of states.

6 Combination of Abstractions

6.1 Products
6.2 Sets
6.3 Functions with Fixed Points
6.4 Higher-order Abstract Interpretation

7 Composition of Abstractions

7.1 Reduced Cardinal Product
7.2 Reduced Cardinal Power

8 Loops

8.1 Fixed-points
8.2 CPO
8.3 Fixed-point Collecting Semantics
8.4 Abstraction of Fixed-points
8.5 Widening

f_\triangledown^n = \begin{cases} \bot & \text{if } n = 0 \\ f_{n-1}^\triangledown & \text{if } n > 0 \wedge f(f_{n-1}^\triangledown) \leq f_{n-1}^\triangledown \\ f_{n-1}^\triangledown ~ \triangledown ~ f(f_{n-1}^\triangledown) & \text{otherwise} \end{cases}

8.6 Narrowing