# Notes on Abstract Interpretation

These are my notes on abstract interpretation, based on Bruno Blanchet’s Introduction to Abstract Interpretation, Matt Might’s blog post Order theory for computer scientists, Alexandru Salcianu’s Notes on Abstract Interpretation, and Stephen Chong’s CS252r course material.

### Motivation

Most interesting properties of programs are undecidable: a decision procedure for any of them could be used to decide the halting problem, which is itself undecidable.
Proof (undecidability of the halting problem). By contradiction. Assume that there exists an analyzer Halt(P,E) that returns true when P terminates on input E, and false otherwise. Consider the program P'(P) = if Halt(P, P) then loop else halt. Now apply P' to itself, i.e., consider P'(P').

• If Halt(P',P') is true, then P'(P') loops. Contradiction.
• If Halt(P',P') is false, then P'(P') halts. Contradiction. $\square$

So we introduce approximations. The approximations must be sound (correct): if the system gives a definite answer, then this answer is true. Sometimes the system can only answer “maybe”; in that sense the system is not complete.

Abstract interpretation is a theory of approximation. We first formalize the semantics of programs, then the approximations of that semantics.

### A Simple Language

#### Syntax

Expressions:

E ::= variables
| numbers
| E + E
| E - E
| E * E
| E / E
| E >= E


Commands:

C ::= end
| x := E
| if E goto n
| input x
| print E


A program is a function Prog from integers to commands, indicating which command is at each address in the program.

#### Trace Semantics

The state of a program consists of 1) the program counter, which indicates the next command to execute, and 2) an environment, which is a partial map from variables to values. We ignore overflows.

Formally:
$$\rho \in Env = Var \rightarrow \mathbb{Z} \\ pc \in PC \\ s \in State = PC \times Env$$

• Semantics of expressions: $\llbracket E \rrbracket \rho \in \mathbb{Z}$. In case of errors (e.g., division by 0), $\llbracket E \rrbracket \rho$ is undefined. We may use $\bot$ to represent that.

$$\llbracket x \rrbracket \rho = \rho(x) \\ \llbracket E_1 + E_2 \rrbracket \rho = \llbracket E_1 \rrbracket \rho + \llbracket E_2 \rrbracket \rho \\ \cdots$$

• Semantics of commands is modeled as a transition system: $\langle pc, \rho \rangle \rightarrow \langle pc', \rho' \rangle$.
• If Prog(pc) = x := E, then $\langle pc, \rho \rangle \rightarrow \langle pc+1, \rho[ x \mapsto \llbracket E \rrbracket \rho ] \rangle$, if $\llbracket E \rrbracket \rho$ is defined.
• If Prog(pc) = input x, then $\langle pc, \rho \rangle \rightarrow \langle pc+1, \rho[ x \mapsto v ] \rangle$ for some $v \in \mathbb{Z}$.
• If Prog(pc) = print E, then $\langle pc, \rho \rangle \rightarrow \langle pc+1, \rho \rangle$ if $\llbracket E \rrbracket \rho$ is defined.
• If Prog(pc) = if E goto n, then $\langle pc, \rho \rangle \rightarrow \langle n, \rho \rangle$ if $\llbracket E \rrbracket \rho \neq 0$; $\langle pc, \rho \rangle \rightarrow \langle pc+1, \rho \rangle$ otherwise.

A sequence of states is a trace. A program may have several traces, depending on the user input, so the semantics of a program is a set of traces.
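The transition rules above can be sketched as a small interpreter. This is only an illustration under assumptions that are not in the notes: expressions and commands are encoded as Python tuples, execution starts at address 0 with an empty environment, `/` is floor division, `>=` yields 1 or 0, an undefined condition blocks the transition, and the helper names (`eval_expr`, `step`, `run`, `PROG`) are hypothetical. Inputs are supplied as a list, so each list selects one trace.

```python
from typing import Dict, List, Optional, Tuple

Env = Dict[str, int]
State = Tuple[int, Env]

def eval_expr(e, rho: Env) -> Optional[int]:
    """Evaluate e in rho; None stands for an undefined result (bottom)."""
    if isinstance(e, int):          # number
        return e
    if isinstance(e, str):          # variable; unbound variables are undefined
        return rho.get(e)
    op, left, right = e
    a, b = eval_expr(left, rho), eval_expr(right, rho)
    if a is None or b is None:
        return None
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":                   # assumption: floor division, undefined on 0
        return None if b == 0 else a // b
    if op == ">=":                  # assumption: comparisons yield 1 or 0
        return 1 if a >= b else 0
    raise ValueError(f"unknown operator {op!r}")

def step(prog, state: State, inputs: List[int]) -> Optional[State]:
    """One transition <pc, rho> -> <pc', rho'>, or None if there is none."""
    pc, rho = state
    cmd = prog[pc]
    kind = cmd[0]
    if kind == "end":
        return None
    if kind == "assign":
        _, x, e = cmd
        v = eval_expr(e, rho)
        return None if v is None else (pc + 1, {**rho, x: v})
    if kind == "input":
        return (pc + 1, {**rho, cmd[1]: inputs.pop(0)}) if inputs else None
    if kind == "print":
        return None if eval_expr(cmd[1], rho) is None else (pc + 1, rho)
    if kind == "if":
        _, e, n = cmd
        v = eval_expr(e, rho)
        if v is None:
            return None
        return (n, rho) if v != 0 else (pc + 1, rho)
    raise ValueError(f"unknown command {kind!r}")

def run(prog, inputs, max_steps=1000) -> List[State]:
    """Return the trace obtained for one particular sequence of inputs."""
    remaining = list(inputs)
    state: State = (0, {})
    trace = [state]
    for _ in range(max_steps):
        nxt = step(prog, state, remaining)
        if nxt is None:
            break
        trace.append(nxt)
        state = nxt
    return trace

# Hypothetical example program: read x, decrement it while x >= 1, print it.
PROG = {
    0: ("input", "x"),
    1: ("assign", "x", ("-", "x", 1)),
    2: ("if", (">=", "x", 1), 1),
    3: ("print", "x"),
    4: ("end",),
}
```

With this encoding, `run(PROG, [2])` visits the addresses 0, 1, 2, 1, 2, 3, 4, while `run(PROG, [1])` goes through the loop body only once; each input gives its own trace.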

### Approximation

We want to prove properties like “all traces of the program satisfy a given condition”. To do this, we over-approximate the set of traces, and obtain a superset of the set of concrete traces. An analysis is more precise when it yields a smaller superset of the set of concrete traces. The correct approximations are exactly those that are greater than the smallest correct one, namely, the set of concrete traces itself.

#### Order, Lattices, Complete lattices

Definition (Partially ordered set): A partially ordered set is a set $S$ equipped with a binary relation $\leq$ such that 1) $\leq$ is reflexive, 2) $\leq$ is antisymmetric, and 3) $\leq$ is transitive. A binary relation $R$ is antisymmetric if $a R b$ and $b R a$ implies $a = b$.

• $c \in S$ is an upper bound of $X \subseteq S$ iff $\forall c' \in X, c' \leq c$.
• $c \in S$ is a lower bound of $X \subseteq S$ iff $\forall c' \in X, c \leq c'$.
• $c \in S$ is the least upper bound of $X \subseteq S$ iff $\forall c' \in X, c' \leq c$ and $\forall c'' \in S$ s.t. $\forall c' \in X, c' \leq c''$, we have $c \leq c''$.
• $c \in S$ is the greatest lower bound of $X \subseteq S$ iff $\forall c' \in X, c \leq c'$ and $\forall c'' \in S$ s.t. $\forall c' \in X, c'' \leq c'$, we have $c'' \leq c$.

The ordering can be partial, i.e., there may exist incomparable elements.

Definition (Lattice): A lattice is a poset $(L, \leq)$ s.t. for all $a, b \in L$, $a$ and $b$ have a least upper bound $a \sqcup b$ and a greatest lower bound $a \sqcap b$. We denote the lattice by $(L, \leq, \sqcup, \sqcap)$.

All non-empty finite subsets of a lattice have least upper bounds and greatest lower bounds, but infinite subsets do not necessarily have them.

Definition (Complete lattice): A complete lattice is a poset $(L, \leq)$ s.t. every subset $X$ of $L$ has a least upper bound $\sqcup X$ and a greatest lower bound $\sqcap X$. In particular, $L$ has a least element $\bot = \sqcup \varnothing = \sqcap L$ and a greatest element $\top = \sqcap \varnothing = \sqcup L$. We denote the lattice by $(L, \leq, \bot, \top, \sqcup, \sqcap)$.

Proposition: If $S$ is a set, $(\mathcal{P}(S), \subseteq, \varnothing, S, \cup, \cap)$ is a complete lattice.
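The proposition can be checked by brute force on a small set. The sketch below assumes $S = \{1, 2, 3\}$ and uses hypothetical helper names (`powerset`, `join`, `meet`, `is_lub`, `is_glb`); it verifies that union and intersection are the least upper bound and greatest lower bound of every family of subsets, including the empty family.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def join(family):
    """Candidate least upper bound in (P(S), subset): the union."""
    return frozenset().union(*family)

def meet(family, universe):
    """Candidate greatest lower bound in (P(S), subset): the intersection."""
    result = frozenset(universe)
    for x in family:
        result &= x
    return result

def is_lub(c, family, lattice):
    """c is an upper bound of family and below every other upper bound."""
    uppers = [u for u in lattice if all(x <= u for x in family)]
    return c in uppers and all(c <= u for u in uppers)

def is_glb(c, family, lattice):
    """c is a lower bound of family and above every other lower bound."""
    lowers = [low for low in lattice if all(low <= x for x in family)]
    return c in lowers and all(low <= c for low in lowers)

S = {1, 2, 3}
LATTICE = powerset(S)          # the 8 elements of P(S)
FAMILIES = powerset(LATTICE)   # all 256 subsets of P(S)

ALL_OK = all(is_lub(join(f), f, LATTICE) and is_glb(meet(f, S), f, LATTICE)
             for f in FAMILIES)
```

For the empty family, `join([])` is the empty set (the least element) and `meet([], S)` is `S` (the greatest element).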

Intuitively, $a \leq b$ means $a$ is more precise than $b$.

#### Collecting semantics

Collecting semantics computes the set of possible traces, formally:

$$T_r = \{ s_1 \rightarrow \cdots \rightarrow s_n | s_1 ~\text{ is the initial state,}~ s_i \rightarrow s_{i+1} ~\text{is an allowed transition} \}$$

The set of reachable states is defined by:

$$S_r = \{ s | s_1 \rightarrow \cdots \rightarrow s \in T_r \}$$
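As an illustration of $T_r$ and $S_r$, here is a sketch on a hypothetical four-state transition system given as a successor table. The names `GRAPH`, `traces`, and `reachable` are assumptions made for this example; traces are enumerated only up to a bounded length, since $T_r$ can be infinite in general.

```python
def traces(initial, successors, max_len):
    """All traces s_1 -> ... -> s_n with n <= max_len starting at initial."""
    result, stack = [], [(initial,)]
    while stack:
        t = stack.pop()
        result.append(t)
        if len(t) < max_len:
            for nxt in successors(t[-1]):
                stack.append(t + (nxt,))
    return result

def reachable(initial, successors):
    """All states that appear at the end of some trace."""
    seen, frontier = {initial}, [initial]
    while frontier:
        s = frontier.pop()
        for nxt in successors(s):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Hypothetical transition system: state 0 can step to 1 or 2, both step to 3.
GRAPH = {0: [1, 2], 1: [3], 2: [3], 3: []}

def succ(s):
    return GRAPH[s]

T = traces(0, succ, max_len=3)
S_R = reachable(0, succ)
```

On this system the last states of the enumerated traces are exactly the reachable states, matching the definition of $S_r$.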

Sets of traces and sets of states are ordered by inclusion. Each forms a complete lattice. (Here, these are the concrete lattices.)

#### Galois connections

Definition (Galois connection): Let $(L_1, \leq_1)$ and $(L_2, \leq_2)$ be posets. $(\alpha, \gamma)$ is a Galois connection (or pair of adjoint functions) between $L_1$ and $L_2$ iff $\alpha : L_1 \to L_2$ and $\gamma : L_2 \to L_1$, and $$\forall x \in L_1, \forall y \in L_2, \alpha(x) \leq_2 y \iff x \leq_1 \gamma(y)$$

This is denoted by $(L_1, \leq_1) \underset{\gamma}{\overset{\alpha}{\rightleftharpoons}} (L_2, \leq_2)$, where $(L_1, \leq_1)$ is the concrete lattice, $(L_2, \leq_2)$ is the abstract lattice, $\alpha$ is the abstraction and $\gamma$ is the concretization. We say that $\alpha(x)$ is the most precise approximation of $x \in L_1$ in $L_2$; $\gamma(y)$ is the least precise element of $L_1$ which can be correctly approximated by $y \in L_2$.

###### Examples of Galois connections
• The concrete lattice is $(\mathcal{P}(S), \subseteq)$ for some set $S$. The abstract lattice contains $\bot$, $c$, $\top$, for each $c \in S$, ordered by $\bot \leq c \leq \top$.
The abstraction is defined by $\alpha(\varnothing) = \bot$, $\alpha(\{c\}) = c$, $\alpha(\_) = \top$.
The concretization is defined by $\gamma(\bot) = \varnothing$, $\gamma(c) = \{c\}$, $\gamma(\top) = S$.

• Sign. The concrete lattice is $\mathcal{P}(\mathbb{Z})$. The abstract lattice is $\mathcal{P}(\{-, 0, +\})$, ordered by inclusion. Let $\mathrm{sign}(n)$ be $-$ if $n < 0$, $0$ if $n = 0$, and $+$ if $n > 0$.
The abstraction collects the signs of all elements of $S$:
$$\alpha(S) = \{ \mathrm{sign}(n) | n \in S \}$$ The concretization collects all integers whose sign is allowed by $\hat{S}$:
$$\gamma(\hat{S}) = \{ n \in \mathbb{Z} | \mathrm{sign}(n) \in \hat{S} \}$$

• Interval. The concrete lattice is $\mathcal{P}(\mathbb{Z})$. The abstract lattice is $$\{ \varnothing \} \cup \{ [a, b] | a \leq b, a \in \mathbb{Z} \cup \{- \infty\}, b \in \mathbb{Z} \cup \{+\infty\} \}$$ The ordering is $[a,b] \leq [a',b']$ iff $a' \leq a$ and $b \leq b'$; additionally, $\varnothing \leq [a, b]$.
The abstraction is defined by (taking $\min S = -\infty$ when $S$ is unbounded below and $\max S = +\infty$ when $S$ is unbounded above):
$$\alpha(S) = \begin{cases} \varnothing, & \text{if } S = \varnothing \\ [\min S, \max S], & \text{otherwise} \end{cases}$$ The concretization is defined by:
$$\gamma(\hat{S}) = \begin{cases} \varnothing, & \text{if } \hat{S} = \varnothing \\ \{ n \in \mathbb{Z} | a \leq n \leq b \}, & \text{if } \hat{S} = [a,b] \end{cases}$$
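The interval example can be sketched in code and the Galois condition checked exhaustively on a small universe. The sketch makes several assumptions that are not in the notes: only finite bounds are represented, the universe is restricted to $\{0, \dots, 3\}$ so that every subset can be enumerated, `None` plays the role of the empty interval, and the names (`alpha`, `gamma`, `leq`, `GALOIS`) are chosen for illustration.

```python
from itertools import combinations

BOT = None  # the empty interval

def alpha(s):
    """Abstraction: the smallest interval containing the finite set s."""
    return BOT if not s else (min(s), max(s))

def gamma(i):
    """Concretization: all integers between the bounds of the interval."""
    return frozenset() if i is BOT else frozenset(range(i[0], i[1] + 1))

def leq(i, j):
    """Interval ordering: [a,b] <= [a',b'] iff a' <= a and b <= b'."""
    if i is BOT:
        return True
    if j is BOT:
        return False
    return j[0] <= i[0] and i[1] <= j[1]

UNIVERSE = range(0, 4)
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]
INTERVALS = [BOT] + [(a, b) for a in UNIVERSE for b in UNIVERSE if a <= b]

# alpha(s) <= i  iff  s is a subset of gamma(i), for every pair (s, i).
GALOIS = all(leq(alpha(s), i) == (s <= gamma(i))
             for s in SUBSETS for i in INTERVALS)
```

The loss of precision is visible on `{1, 3}`: its abstraction is the interval `(1, 3)`, whose concretization also contains 2.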

###### Properties

Proposition: $(\alpha, \gamma)$ is a Galois connection iff $\alpha$ and $\gamma$ are monotone, $(\alpha \circ \gamma)(y) \leq_2 y$ for all $y \in L_2$, and $x \leq_1 (\gamma \circ \alpha)(x)$ for all $x \in L_1$.
Proof.

• First assume that $(\alpha, \gamma)$ is a Galois connection.
  • By reflexivity, $\gamma(y) \leq_1 \gamma(y)$. Applying the definition with $x = \gamma(y)$ gives $(\alpha \circ \gamma)(y) \leq_2 y$.
  • By reflexivity, $\alpha(x) \leq_2 \alpha(x)$. Applying the definition with $y = \alpha(x)$ gives $x \leq_1 (\gamma \circ \alpha)(x)$.
  • Let $x \leq_1 x'$. We know $x' \leq_1 (\gamma \circ \alpha)(x')$, then by transitivity, $x \leq_1 (\gamma \circ \alpha)(x')$. Thus by definition, $\alpha(x) \leq_2 \alpha(x')$ and $\alpha$ is monotone.
  • Let $y \leq_2 y'$. We know $(\alpha \circ \gamma)(y) \leq_2 y$, then by transitivity, $(\alpha \circ \gamma)(y) \leq_2 y'$. Thus by definition, $\gamma(y) \leq_1 \gamma(y')$, and $\gamma$ is monotone.
• Then assume $\alpha$ and $\gamma$ are monotone, $(\alpha \circ \gamma)(y) \leq_2 y$ and $x \leq_1 (\gamma \circ \alpha)(x)$.
  • Assume that $\alpha(x) \leq_2 y$. Since $\gamma$ is monotone, $\gamma(\alpha(x)) \leq_1 \gamma(y)$. By assumption, $x \leq_1 \gamma(\alpha(x))$, so by transitivity, $x \leq_1 \gamma(y)$.
  • Assume that $x \leq_1 \gamma(y)$. Since $\alpha$ is monotone, $\alpha(x) \leq_2 \alpha(\gamma(y))$. By assumption, $\alpha(\gamma(y)) \leq_2 y$, so by transitivity, $\alpha(x) \leq_2 y$. $\square$
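Both sides of this equivalence can be tested on a concrete instance. The sketch below uses a sign abstraction in which $\alpha$ collects the signs of the elements of a set; it assumes a finite stand-in $\{-2, \dots, 2\}$ for $\mathbb{Z}$ so that all subsets can be enumerated, and the helper names are hypothetical. It is a sanity check on one example, not a proof.

```python
from itertools import combinations

def subsets(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def sign(n):
    return "-" if n < 0 else "0" if n == 0 else "+"

UNIVERSE = range(-2, 3)   # assumption: finite stand-in for the integers

def alpha(s):
    """Signs of all elements of s."""
    return frozenset(sign(n) for n in s)

def gamma(a):
    """All integers of the universe whose sign belongs to a."""
    return frozenset(n for n in UNIVERSE if sign(n) in a)

CONCRETE = subsets(UNIVERSE)   # 32 sets of integers
ABSTRACT = subsets("-0+")      # 8 sets of signs

def monotone(f, domain):
    return all(f(x) <= f(y) for x in domain for y in domain if x <= y)

# Left-hand side of the proposition: the defining condition.
IS_GALOIS = all((alpha(x) <= y) == (x <= gamma(y))
                for x in CONCRETE for y in ABSTRACT)

# Right-hand side: monotonicity plus the two composition inequalities.
CHARACTERIZATION = (monotone(alpha, CONCRETE) and monotone(gamma, ABSTRACT)
                    and all(alpha(gamma(y)) <= y for y in ABSTRACT)
                    and all(x <= gamma(alpha(x)) for x in CONCRETE))
```

On this instance both `IS_GALOIS` and `CHARACTERIZATION` hold, as the proposition predicts.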

Proposition: Let $(L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1)$ and $(L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2)$ be complete lattices, and $(\alpha, \gamma)$ be a Galois connection between $L_1$ and $L_2$.
(1) Each function in the pair $(\alpha, \gamma)$ uniquely determines the other: $$\alpha(x) = \sqcap_2 \{ y \in L_2 | x \leq_1 \gamma(y) \} \\ \gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y\}$$ (2) $\alpha$ is a complete join-morphism (i.e., is additive: $\alpha(\sqcup S) = \sqcup \{ \alpha(x) | x \in S \}$), $\alpha(\bot_1) = \bot_2$. $\gamma$ is a complete meet-morphism (i.e., $\gamma(\sqcap S) = \sqcap \{ \gamma(x) | x \in S \}$), $\gamma(\top_2) = \top_1$.
Proof. TODO $\square$

Proposition: Let $(L_1, \leq_1, \bot_1, \top_1, \sqcup_1, \sqcap_1)$ and $(L_2, \leq_2, \bot_2, \top_2, \sqcup_2, \sqcap_2)$ be complete lattices.
(1) If $\alpha : L_1 \to L_2$ is a complete join-morphism and $\gamma(y) = \sqcup_1 \{ x \in L_1 | \alpha(x) \leq_2 y \}$, then $(\alpha, \gamma)$ is a Galois connection.
(2) If $\gamma : L_2 \to L_1$ is a complete meet-morphism and $\alpha(x) = \sqcap_2 \{ y \in L_2 | x \leq_1 \gamma(y) \}$, then $(\alpha, \gamma)$ is a Galois connection.
Proof. TODO $\square$

Proposition: $\alpha$ is surjective iff $\gamma$ is injective iff $\alpha \circ \gamma = \lambda y.y$.
Proof. TODO $\square$

Remark: Let $\sigma(y) = \sqcap \{ y' | \gamma(y') = \gamma(y)\}$, then $\alpha(L_1) = \sigma(L_2)$.
Proof. TODO $\square$

#### Computing abstract semantics

The approach: start from the concrete semantics, lift it to sets to obtain the collecting semantics, and abstract the collecting semantics.

###### Values and Environments

An abstraction $\hat{x}$ of a value $x$ is correct iff $\alpha(x) \leq \hat{x}$. An abstract operator, such as $\hat{+}$ for $+$, is a correct abstraction iff whenever $\alpha(a) \leq \hat{a}$ and $\alpha(b) \leq \hat{b}$, we have $\alpha(a+b) \leq \hat{a} ~\hat{+}~ \hat{b}$.
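To make the correctness condition for operators concrete, the following sketch checks it for addition on intervals. It rests on assumptions of this example only: intervals have finite bounds, `None` is the empty interval, the check runs over a small universe $\{-1, \dots, 2\}$, and all names are hypothetical. It also compares the abstract addition with $\alpha(\gamma(\hat{a}) + \gamma(\hat{b}))$.

```python
from itertools import combinations

def alpha(s):
    return None if not s else (min(s), max(s))

def gamma(i):
    return frozenset() if i is None else frozenset(range(i[0], i[1] + 1))

def leq(i, j):
    if i is None:
        return True
    if j is None:
        return False
    return j[0] <= i[0] and i[1] <= j[1]

def add_sets(s, t):
    """Collecting semantics of +: every sum of an element of s and one of t."""
    return frozenset(a + b for a in s for b in t)

def add_intervals(i, j):
    """Abstract +: add the bounds; the empty interval is absorbing."""
    if i is None or j is None:
        return None
    return (i[0] + j[0], i[1] + j[1])

UNIVERSE = range(-1, 3)
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]
INTERVALS = [None] + [(a, b) for a in UNIVERSE for b in UNIVERSE if a <= b]

# Correctness: alpha(s) <= i and alpha(t) <= j imply alpha(s + t) <= i + j.
SOUND = all(leq(alpha(add_sets(s, t)), add_intervals(i, j))
            for s in SUBSETS for t in SUBSETS
            for i in INTERVALS if leq(alpha(s), i)
            for j in INTERVALS if leq(alpha(t), j))

# On these intervals the abstract + equals alpha(gamma(i) + gamma(j)).
BEST = all(add_intervals(i, j) == alpha(add_sets(gamma(i), gamma(j)))
           for i in INTERVALS for j in INTERVALS)
```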

The abstraction of environments can be defined in two steps:

• Abstract sets of environments to mappings from variables to sets of integers: $$\mathcal{P}(Var \to \mathbb{Z}) \underset{\gamma_{R1}}{\overset{\alpha_{R1}}{\rightleftharpoons}} Var \to \mathcal{P}(\mathbb{Z})$$ The abstraction is defined by $\alpha_{R1} (R_1) = \lambda x . \{ \rho(x) | \rho \in R_1 \}$.
The concretization is defined by $\gamma_{R1}(\hat{R_1}) = \{ \rho | \forall x \in Var, \rho(x) \in \hat{R_1}(x) \}$.

• Then abstract each result into $L$ by $(\alpha, \gamma)$: $$Var \to \mathcal{P}(\mathbb{Z}) \underset{\gamma_{R2}}{\overset{\alpha_{R2}}{\rightleftharpoons}} Var \to L$$ The abstraction is defined by $\alpha_{R2}(R_2) = \lambda x . \alpha(R_2(x))$.
The concretization is defined by $\gamma_{R2}(\hat{R_2}) = \lambda x . \gamma(\hat{R_2}(x))$.

• Finally, the Galois connection of environments $$\mathcal{P}(Var \to \mathbb{Z}) \underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}} Var \to L$$ is obtained by composing the two Galois connections: $\alpha_R = \alpha_{R2} \circ \alpha_{R1}$ and $\gamma_R = \gamma_{R1} \circ \gamma_{R2}$.
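The two-step abstraction of environments can be sketched as follows. The sketch assumes that environments are total Python dicts over a fixed list of variables, that $L$ is an interval domain with finite bounds, and that $\gamma_{R1}$ returns every environment that picks one allowed value per variable; the function names are hypothetical.

```python
from itertools import product

def alpha_r1(envs, variables):
    """P(Var -> Z) to Var -> P(Z): collect the values taken by each variable."""
    return {x: frozenset(rho[x] for rho in envs) for x in variables}

def gamma_r1(abs_env):
    """Var -> P(Z) to P(Var -> Z): every combination of allowed values."""
    names = sorted(abs_env)
    return [dict(zip(names, values))
            for values in product(*(sorted(abs_env[x]) for x in names))]

def alpha_interval(s):
    """Value abstraction into intervals; None is the empty interval."""
    return None if not s else (min(s), max(s))

def alpha_r2(abs_env):
    """Var -> P(Z) to Var -> L: abstract each set of values separately."""
    return {x: alpha_interval(values) for x, values in abs_env.items()}

def alpha_r(envs, variables):
    """Composition alpha_R2 after alpha_R1."""
    return alpha_r2(alpha_r1(envs, variables))

# Two environments in which x and y are always equal.
R = [{"x": 0, "y": 0}, {"x": 2, "y": 2}]
R1 = alpha_r1(R, ["x", "y"])
BACK = gamma_r1(R1)
R_HAT = alpha_r(R, ["x", "y"])
```

Here `BACK` contains four environments, including `{"x": 0, "y": 2}`: the first step already forgets the relation between `x` and `y`, and the second step additionally forgets that the value 1 never occurs.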

To summarize:

| | Semantics | Collecting Semantics | Abstraction | Abstract Semantics |
| --- | --- | --- | --- | --- |
| Values | $\mathbb{Z}$ | $(\mathcal{P}(\mathbb{Z}), \subseteq)$ | $\underset{\gamma}{\overset{\alpha}{\rightleftharpoons}}$ | $(L, \leq)$ |
| Operators | $+ : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ | $+ : \mathcal{P}(\mathbb{Z}) \times \mathcal{P}(\mathbb{Z}) \to \mathcal{P}(\mathbb{Z})$ | | $\hat{a} ~\hat{+}~ \hat{b} = \alpha(\gamma(\hat{a}) + \gamma(\hat{b}))$ |
| Environments | $\rho : Var \to \mathbb{Z}$ | $(R : \mathcal{P}(Var \to \mathbb{Z}), \subseteq)$ | $\underset{\gamma_R}{\overset{\alpha_R}{\rightleftharpoons}}$ | $\hat{R} : Var \to (L, \leq)$ |

###### Expressions

| Semantics | Collecting Semantics | Abstract Semantics |
| --- | --- | --- |
| $\llbracket E \rrbracket \rho \in \mathbb{Z}$ | $\llbracket E \rrbracket R \in \mathcal{P}(\mathbb{Z})$ | $\widehat{\llbracket E \rrbracket} \hat{R} \in L$ |
| $\llbracket x \rrbracket \rho = \rho(x)$ | | $\widehat{\llbracket x \rrbracket} \hat{R} = \hat{R}(x)$ |
| $\llbracket E_1 + E_2 \rrbracket \rho = \llbracket E_1 \rrbracket \rho + \llbracket E_2 \rrbracket \rho$ | | $\widehat{\llbracket E_1 + E_2 \rrbracket} \hat{R} = \widehat{\llbracket E_1 \rrbracket}\hat{R} ~\hat{+}~ \widehat{\llbracket E_2 \rrbracket}\hat{R}$ |

The correctness of the abstract semantics of expressions: if $\alpha_R(R) \leq \hat{R}$ then $\alpha(\llbracket E \rrbracket R) \leq \widehat{\llbracket E \rrbracket} \hat{R}$.
Proof: By induction on expression. TODO $\square$
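The abstract semantics of expressions can be sketched for an interval domain and its correctness checked on an example. Everything specific here is an assumption of the sketch: expressions are encoded as tuples, only `+`, `-` and `*` are handled, intervals have finite bounds with `None` as the empty interval, and the expression `E` and the abstract environment `R_HAT` are made up for illustration.

```python
def abs_eval(e, r_hat):
    """Abstract value of e in the abstract environment r_hat (Var -> interval)."""
    if isinstance(e, int):
        return (e, e)
    if isinstance(e, str):
        return r_hat[e]
    op, left, right = e
    i, j = abs_eval(left, r_hat), abs_eval(right, r_hat)
    if i is None or j is None:
        return None
    if op == "+":
        return (i[0] + j[0], i[1] + j[1])
    if op == "-":
        return (i[0] - j[1], i[1] - j[0])
    if op == "*":
        products = [a * b for a in i for b in j]
        return (min(products), max(products))
    raise ValueError(f"unsupported operator {op!r}")

def eval_expr(e, rho):
    """Concrete value of e in the environment rho."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return rho[e]
    op, left, right = e
    a, b = eval_expr(left, rho), eval_expr(right, rho)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

# Hypothetical example: E = x * x - y with x in [-2, 3] and y in [0, 1].
E = ("-", ("*", "x", "x"), "y")
R_HAT = {"x": (-2, 3), "y": (0, 1)}
CONCRETE = {eval_expr(E, {"x": x, "y": y})
            for x in range(-2, 4) for y in range(0, 2)}
LOW, HIGH = abs_eval(E, R_HAT)
```

The abstract result covers every concrete value, but it is not tight: the concrete values range from -1 to 9, while the abstract evaluation returns the interval from -7 to 9, because the two occurrences of `x` in `x * x` are abstracted independently.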

###### Traces

The abstract trace semantics is defined as follows:

• If Prog(pc) = x := E, then $\langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R}[x \mapsto \widehat{\llbracket E \rrbracket} \hat{R} ]\rangle$.
• If Prog(pc) = input x, then $\langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R}[x \mapsto \top] \rangle$.
• If Prog(pc) = print E, then $\langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R} \rangle$.
• If Prog(pc) = if E goto n, then both of the following transitions are possible:
  • $\langle pc, \hat{R} \rangle \to \langle n, \hat{R} \rangle$
  • $\langle pc, \hat{R} \rangle \to \langle pc+1, \hat{R} \rangle$

The transition for a conditional is non-deterministic: both branches are taken, whatever the abstract value of E. Sometimes we can obtain a more precise analysis by using E to refine the abstract values on each branch.
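The abstract transition rules can be sketched with a sign domain, where an abstract value is a set of signs and $\top$ is the set of all three signs. The tuple encoding of programs, the restriction of expressions to numbers, variables and `+`, the start state at address 0 with an empty abstract environment, and the names `abs_step`, `abs_reachable` and `PROG` are assumptions of this example. The exploration terminates because this domain has finitely many abstract states.

```python
TOP = frozenset("-0+")

def sign(n):
    return "-" if n < 0 else "0" if n == 0 else "+"

def add_signs(a, b):
    """Possible signs of x + y when x has sign a and y has sign b."""
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    return {a} if a == b else set(TOP)

def abs_eval(e, r_hat):
    """Abstract value (a set of signs) of e: numbers, variables and + only."""
    if isinstance(e, int):
        return frozenset(sign(e))
    if isinstance(e, str):
        return r_hat[e]
    op, left, right = e
    if op != "+":
        raise ValueError(f"unsupported operator {op!r}")
    l, r = abs_eval(left, r_hat), abs_eval(right, r_hat)
    return frozenset(s for a in l for b in r for s in add_signs(a, b))

def abs_step(prog, pc, r_hat):
    """All abstract successors of <pc, r_hat>."""
    cmd = prog[pc]
    kind = cmd[0]
    if kind == "end":
        return []
    if kind == "assign":
        _, x, e = cmd
        return [(pc + 1, {**r_hat, x: abs_eval(e, r_hat)})]
    if kind == "input":
        return [(pc + 1, {**r_hat, cmd[1]: TOP})]
    if kind == "print":
        return [(pc + 1, r_hat)]
    if kind == "if":                 # both branches, condition not evaluated
        return [(cmd[2], r_hat), (pc + 1, r_hat)]
    raise ValueError(f"unknown command {kind!r}")

def abs_reachable(prog):
    """Abstract states reachable from <0, empty environment>."""
    start = (0, {})
    seen, work, result = {(0, ())}, [start], [start]
    while work:
        pc, r_hat = work.pop()
        for npc, nxt in abs_step(prog, pc, r_hat):
            key = (npc, tuple(sorted(nxt.items())))
            if key not in seen:
                seen.add(key)
                work.append((npc, nxt))
                result.append((npc, nxt))
    return result

# Hypothetical program: read n, set i to 1, increment i while n >= i.
PROG = {
    0: ("input", "n"),
    1: ("assign", "i", 1),
    2: ("assign", "i", ("+", "i", 1)),
    3: ("if", (">=", "n", "i"), 2),
    4: ("print", "i"),
    5: ("end",),
}
STATES = abs_reachable(PROG)
```

On this program the exploration finds one abstract state per address, and at address 4 the variable `i` is known to be positive even though the loop may run any number of times.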

The correctness of the abstract trace semantics is expressed by: if $\langle pc_1, \rho_1 \rangle \to \langle pc_2, \rho_2 \rangle$, and $\alpha_R(\{ \rho_1 \}) \leq \hat{R_1}$, then there exists $\hat{R_2}$ s.t. $\langle pc_1, \hat{R_1} \rangle \to \langle pc_2, \hat{R_2} \rangle$ and $\alpha_R(\{ \rho_2 \}) \leq \hat{R_2}$.

###### States

The next part is the abstract semantics of sets of states.

| | Semantics | Collecting Semantics | Abstraction | Abstract Semantics |
| --- | --- | --- | --- | --- |