10/22/2019

### Notes on Abstract Interpretation

This is my notes on abstract interpretation based on Bruno Blanchet’s Introduction to Abstract Interpretation, Matt Might’s blog post Order theory for computer scientists, Alexandru Salcianu’s Notes on Abstract Interpretation, and Stephen Chong’s CS252r course material.

#### 1Motivation

Most of the interesting properties of programs are undecidable since they can be reduced to the Halting Problem. The proof of undecidability of Halting problem is as follows.

Proof: By contradiction. First assume that there existing an analyzer Halt?(P, E) returning true when program P terminates on input E and false otherwise. Consider a program Q(P) = if Halt?(P, P) then loop else halt. Now apply Q(Q):
• If Halt?(Q, Q) is true, then Q(Q) loops. Contradiction.

• If Halt?(Q, Q) is false, then Q(Q) halts. Contradiction. □

To analyze the behavior of programs, we introduce approximations. The approximations must be sound: if the system gives a definite answer, then this answer is true. Sometimes the system answers "maybe", then the system is not complete.

#### 2A Simple Language

In this simple programming language, a program is a function from integers (PC) to commands, indicating the command at each PC in the program.

##### 2.1Syntax

We define the grammar of expressions:

 ‹Exp› ::= ‹var› | ‹num› | ‹Exp› ‹Aop› ‹Exp› | ‹Exp› ‹Bop› ‹Exp› ‹Aop› ::= + | - | * | / ‹Bop› ::= >= | ...

and the grammar of commands:

 ‹Cmd› ::= ‹var› := ‹Exp› | if ‹Exp› goto ‹PC› | input ‹var› | print ‹Exp› | abort

##### 2.2Trace Semantics

The state of programs consists of 1) a program counter, which indicates the next command to execute, 2) an environment, which contains a partial map from variables to values. We ignore overflows. Formally,

pc \in PC := \mathbb{N} \rho \in Env := Var \rightarrow \mathbb{Z} s \in State := PC \times Env

• The semantics of expressions is a function \llbracket Exp \rrbracket \rho \in \mathbb{Z}. In case of errors (e.g., division by 0), \llbracket Exp \rrbracket \rho is undefined. We may use \bot to represent that.

• The semantics of commands is modeled as a transition system: \langle pc, \rho \rangle \rightarrow \langle pc{}’, \rho{}’ \rangle. Now given a program P, and current state \langle pc, \rho \rangle,
• If P(pc) = x := E, then \langle pc, \rho \rangle \rightarrow \langle pc+1, \rho[ x \mapsto \llbracket E \rrbracket \rho ] \rangle, if \llbracket E \rrbracket \rho is defined

• If P(pc) = if E goto n, then \langle pc, \rho \rangle \rightarrow \langle n, \rho \rangle if \llbracket E \rrbracket \rho \neq 0; \langle pc, \rho \rangle \rightarrow \langle pc+1, \rho \rangle otherwise.

• If P(pc) = input x, then \langle pc, \rho \rangle \rightarrow \langle pc+1, \rho[ x \mapsto v ] \rangle for some v \in \mathbb{Z}.

• If P(pc) = print E, then \langle pc, \rho \rangle \rightarrow \langle pc+1, \rho \rangle if \llbracket E \rrbracket \rho is defined.

A sequence of states is a trace. A program may have several traces, depending on the user input; so the semantics of a program is a set of traces.

#### 3Theory of Approximation

We want to prove properties like "all traces of the program satisfy a given condition". To do this, we over-approximate the set of traces, and obtain a superset of the set of concrete traces. An analysis is more precise when it yields a small superset of the set of concrete traces. All correct approximations are the ones are greater than the smallest correct one, namely, the set of concrete traces.

##### 3.1Order, Lattices, and Complete Lattices

Definition(Partially-ordered set): A partially-ordered set (poset for short) is a set S equipped with a binary relation \leq which is reflexive, anti-symmetric and transitive. A binary relation R is anti-symmetric if a R b and b R a implies a = b.

• c \in S is an upper bound of X \subseteq S iff \forall c{}’ \in X, c{}’ \leq c.

• c \in S is a lower bound of X \subseteq S iff \forall c{}’ \in X, c \leq c{}’.

• c \in S is the least upper bound of X \subseteq S iff \forall c{}’ \in X, c{}’ \leq c and \forall c{}’{}’ \in S s.t. \forall c{}’ \in X, c{}’ \leq c{}’{}’, we have c \leq c{}’{}’.

• c \in S is the greatest lower bound of X \subseteq S iff \forall c{}’ \in X, c \leq c{}’ and \forall c{}’{}’ \in S s.t. \forall c{}’ \in X, c{}’{}’ \leq c{}’, we have c{}’{}’ \leq c.

The ordering relation can be partial, i.e., there exists incomparable elements.

Definition(Lattice): A lattice is a poset (L, \leq) s.t. for all a, b \in L, a and b have a least upper bound a \sqcup b and a greatest lower bound a \sqcap b. We denote the lattice by (L, \leq, \sqcup, \sqcap).

All finite sets in a lattice have least upper bounds and greatest lower bounds, but not necessarily for infinite sets.

Definition(Complete lattice): A complete lattice is a poset (L, \leq) s.t. every subset X of L has a least upper bound \sqcup X and a greatest lower bound \sqcap X. In particular, L has a least element \bot = \sqcup \varnothing and a greatest element \top = \sqcap L. We denote the complete lattice by (L, \leq, \bot, \top, \sqcup, \sqcap).

Proposition: If S is a set, then (\mathcal{P}(S), \subseteq, \varnothing, S, \cup, \cap) is a complete lattice.

Intuitively, a \leq b means that a is more precise than b.

##### 3.2Collecting Semantics

Collecting semantics computes the set of possible traces, formally:

T_r = \{ s_1 \rightarrow \cdots \rightarrow s_n | s_1 \text{ is the initial state,}~ s_i \rightarrow s_{i+1} ~\text{is an allowed transition} \}

The set of reachable states is defined by:

S_r = \{ s | s_1 \rightarrow \cdots \rightarrow s \in T_r \}

The set of all traces, of all states, are ordered by inclusion. They are complete lattice (in this case, they are concrete lattice.)

#### 4Galois Connections

Definition(Galois connection): Let (L_1, \leq_1) and (L_2, \leq_2) be posets. (\alpha, \gamma) is a Galois connection (or pair of adjoined functions) between L_1 and L_2 iff \alpha : L_1 \to L_2 and \gamma : L_2 \to L_1, and \forall x \in L_1, \forall y \in L_2, \alpha(x) \leq_2 y \iff x \leq_1 \gamma(y)

##### 4.1Examples
• The concrete lattice is (\mathcal{P}(S), \subseteq) for some set S. The abstract lattice contains elements \bot, c, \top; for each c \in S, ordered by \bot \leq c \leq \top. The abstraction is defined by \alpha(\varnothing) = \bot, \alpha(\{c\}) = c, \alpha(\_) = \top. The concretization is defined by \gamma(\bot) = \varnothing, \gamma(c) = \{c\}, \gamma(\top) = S.

• Sign. The concrete lattice is \mathcal{P}(\mathbb{Z}). The abstract lattice is \mathcal{P}(\{-, 0, +\}), ordered by inclusion. The abstraction is defined by: \alpha(S) = \begin{cases} \{+\}, & \text{if }S \cap [0, +\infty] \neq \varnothing \\ \{0\}, & \text{if }0 \in S \\ \{-\}, & \text{if }S \cap [-\infty, 0] \neq \varnothing \end{cases} The concretization is defined by: \gamma(\hat{S}) = \begin{cases} [-\infty, 0], & \text{if } - \in \hat{S} \\ \{0\} , & \text{if } 0 \in \hat{S} \\ [0, +\infty], & \text{if } + \in \hat{S} \end{cases}

• Interval. The concrete lattice is \mathcal{P}(\mathbb{Z}). The abstract lattice is \{ \varnothing \} \cup \{ [a, b] | a \leq b, a \in \mathbb{Z} \cup \{- \infty\}, b \in \mathbb{Z} \cup \{+\infty\} \} The ordering is [a,b] \leq [a{}’,b{}’] iff a{}’ \leq a and b \leq b{}’; additionally, \varnothing \leq [a, b]. The abstraction is defined by: \alpha(S) = \begin{cases} \varnothing, & \text{if } S = \varnothing \\ [min(S), max(S)], & \text{otherwise} \end{cases} The concretization is defined by: \gamma(\hat{S}) = \begin{cases} \varnothing, & \text{if } \hat{S} = \varnothing \\ \{a, \dots, b\}, & \text{if } \hat{S} = [a,b] \end{cases}

##### 4.2Properties

Proposition: (\alpha, \gamma) is a Glaois connection iff \alpha and \gamma are monotone, (\alpha \circ \gamma)(y) \leq_2 y and x \leq_1 (\gamma \circ \alpha)(x).

Proof:
• First assume that (\alpha, \gamma) is a Galois connection.
• By definition \alpha(x) \leq_2 y, and by reflexivity \gamma(y) \leq_1 \gamma(y), thus (\alpha \circ \gamma)(y) \leq_2 y.

• By definition x \leq_1 \gamma(y), and by reflexivity \alpha(x) \leq_2 \alpha(x), thus x \leq_1 (\gamma \circ \alpha)(y).

• Let x \leq_1 x{}’. We know that x{}’ \leq_1 (\gamma \circ \alpha)(x{}’), then by transitivity, x \leq_1 (\gamma \circ \alpha)(x{}’). Thus by definition, \alpha(x) \leq_2 \alpha(x{}’), and \alpha is monotone.

• Let y \leq_2 y{}’. We know that (\alpha \circ \gamma)(y) \leq_2 y, then by transitivity, (\alpha \circ \gamma)(y) \leq_2 y{}’. Thus by definition, \gamma(y) \leq_2 \gamma(y{}’), and \gamma is monotone.

• Then assume \alpha and \gamma are monotone, (\alpha \circ \gamma)(y) \leq_2 y and x \leq_1 (\gamma \circ \alpha)(x).
• Assume that \alpha(x) \leq_2 y, since \gamma is monotone, then \gamma(\alpha(x)) \leq_1 \gamma(y). By assumption, x \leq_1 \gamma(\alpha(x)); by transitivity, x \leq_1 \gamma(y).

• Assume that x \leq_1 \gamma(y), since \alpha is monotone, then \alpha(x) \leq_2 \alpha(\gamma(y)). By assumption, \alpha(\gamma(y)) \leq_2 y; by transitivity, \alpha(x) \leq_2 y. □

Proposition: Let (L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1) and (L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2) be complete lattices, and (\alpha, \gamma) be a Galois connection between L_1 and L_2. Then
• Each function in the pair (\alpha, \gamma) uniquely determines the other: \alpha(x) = \sqcap_2 \{ y \in L_2 | x \leq_1 \gamma(y) \} \\ \gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y\}

• \alpha is a complete join-morphism (i.e., additive: \alpha(\sqcup S) = \sqcup \{ \alpha(x) | x \in S \}), \alpha(\bot_1) = \bot_2. \gamma is a complete meet-morphism (i.e., \gamma(\sqcap S) = \sqcap \{ \gamma(x) | x \in S \}), \gamma(\top_2) = \top_1.

Proof: TODO □

Proposition: Let (L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1) and (L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2) be complete lattices. Then
• If \alpha : L_1 \to L_2 is a complete join-morphism and \gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y \}, then (\alpha, \gamma) is a Galois connection.

• If \gamma : L_2 \to L_1 is a complete meet-morphism and \alpha(x) = \sqcup_2 \{ y \in L_2 | x \leq_1 \gamma(y) \}, then (\alpha, \gamma) is a Galois connection.

Proof: TODO □

Proposition: \alpha is surjective iff \gamma is bijective iff \alpha \circ \gamma = \lambda y.y.

Proof: TODO □

Remark: Let \sigma(y) = \sqcap \{ y{}’ | \gamma(y{}’) = \gamma(y)\}, then \alpha(L_1) = \sigma(L_2).

Proof: TODO □

#### 5Computing Abstract Semanitcs

The approach: start from the concrete semantics, lift it to sets to obtain the collecting semantics, and then abstract the collecting semantics.

##### 5.1Values and Environments

The correctness of the abstraction \hat{x} of a value x is \alpha(x) \leq \hat{x}. For an operator, such as +, is a correct abstraction iff \alpha(a) \leq \hat{a} and \alpha(b) \leq \hat{b}, then \alpha(a+b) \leq \hat{a} \hat{+} \hat{b}.

The abstraction of environments can be defined in two steps:

• First, we abstract sets of environments to mappings from variables to sets of integers: \mathcal{P}(Var \to \mathbb{Z}) \underset{\gamma_{R1}}{\overset{\alpha_{R1}}{\rightleftharpoons}} Var \to \mathcal{P}(\mathbb{Z})

The abstraction is defined by \alpha_{R1} (R_1) = \lambda x . \{ \rho(x) | \rho \in R_1 \} The concretization is defined by \gamma_{R1}(\hat{R_1}) = \{ \lambda x . y | y \in \hat{R_1}(x) \}

• Then we abstract each result into L by (\alpha, \gamma): Var \to \mathcal{P}(\mathbb{Z}) \underset{\gamma_{R2}}{\overset{\alpha_{R2}}{\rightleftharpoons}} Var \to L The abstraction is defined by \alpha_{R2}(R_2) = \lambda x . \alpha(R_2(x)) The concretization is defined by \gamma_{R2}(\hat{R_2}) = \lambda x . \gamma(\hat{R_2}(x))

Finally, the Galois connection of environments \mathcal{P}(Var \to \mathbb{Z}) \underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}} Var \to L is obtained by composing the two Galois connections: \alpha_R = \alpha_{R_2} \circ \alpha_{R_1} and \gamma_R = \gamma_{R_1} \circ \gamma_{R_2}.

To summarize:

 Semantics Collecting Semantics Abstraction Abstract Semantics Values \mathbb{Z} (\mathcal{P}(\mathbb{Z}), \subseteq) \underset{\gamma}{\overset{\alpha}{\rightleftharpoons}} (L, \leq) Operators + : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z} + : \mathcal{P}(\mathbb{Z}) \times \mathcal{P}(\mathbb{Z}) \to \mathcal{P}(\mathbb{Z}) \hat{a} ~\hat{+}~ \hat{b} = \alpha(\gamma(\hat{a}) + \gamma(\hat{b})) Environments \rho : Var \to \mathbb{Z} (R : \mathcal{P}(Var \to \mathbb{Z}), \subseteq) \underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}} \hat{R} : Var \to (L, \leq)

##### 5.2Expressions

The expressions can be abstracted in a similar way:

 Semantics Collecting Semantics Abstract Semantics \llbracket E \rrbracket \rho \in \mathbb{Z} \llbracket E \rrbracket R \in \mathcal{P}(\mathbb{Z}) \widehat{\llbracket E \rrbracket} \hat{R} \in L \llbracket x \rrbracket \rho = \rho(x) \widehat{\llbracket x \rrbracket} \hat{R} = \hat{R}(x) \llbracket E_1 + E_2 \rrbracket \rho = \llbracket E_1 \rrbracket \rho + \llbracket E_2 \rrbracket \rho \widehat{\llbracket E_1 + E_2 \rrbracket} \hat{R} = \widehat{\llbracket E_1 \rrbracket}\hat{R} ~\hat{+}~ \widehat{\llbracket E_2 \rrbracket}\hat{R}

The correctness of the abstract semantics of expressions: if \alpha_R(R) \leq \hat{R} then \alpha(\llbracket E \rrbracket R) \leq \widehat{\llbracket E \rrbracket} \hat{R}.

Proof: TODO □

##### 5.3Traces

Given a program P, the abstract trace semantics is defined as follows:

• If P(pc) = x := E, then \langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R}[x \mapsto \widehat{\llbracket E \rrbracket} \hat{R} ]\rangle.

• If P(pc) = input x, then \langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R}[x \mapsto \top] \rangle.

• If P(pc) = print x, then \langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R} \rangle.

• If P(pc) = if E goto n, then
• \langle pc, \hat{R} \rangle \to \langle n, \hat{R} \rangle

• \langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R} \rangle

The transition for conditionals is non-deterministic. Sometimes we can have a better analysis by refining the values from Exp.

The correctness of the abstrace trace semantics is expressed by: if \langle pc_1, \rho_1 \to pc_2, \rho_2 \rangle, and \alpha_R(\{ \rho_1 \}) \le \hat{R_1}, then there exists \hat{R_2} s.t. \langle pc_1, \hat{R_1} \rangle \to \langle pc_2, \hat{R_2} \rangle and \alpha_R(\{ \rho_2 \}) \leq \hat{R_2}.

##### 5.4States

The next part is the abstract semantics of set of states.

#### 8Loops

##### 8.5Widening

f_\triangledown^n = \begin{cases} \bot & \text{if } n = 0 \\ f_{n-1}^\triangledown & \text{if } n > 0 \wedge f(f_{n-1}^\triangledown) \leq f_{n-1}^\triangledown \\ f_{n-1}^\triangledown ~ \triangledown ~ f(f_{n-1}^\triangledown) & \text{otherwise} \end{cases}