Friday, March 24, 2017

Consolidation of the Denotational Semantics and an Application to Compiler Correctness

This is a two part post. The second part depends on the first.

Part 1. Consolidation of the Denotational Semantics

As a matter of expediency, I've been working with two different versions of the intersection type system upon which the denotational semantics is based, one version with subsumption and one without. I had used the one with subsumption to prove completeness with respect to the reduction semantics whereas I had used the one without subsumption to prove soundness (for both whole programs and parts of programs, that is, contextual equivalence). The two versions of the intersection type system are equivalent. However, it would be nice to simplify the story and just have one version. Also, while the correspondence to intersection types has been enormously helpful in working out the theory, it would be nice to have a presentation of the semantics that doesn't talk about them and instead talks about functions as tables.

Towards these goals, I went back to the proof of completeness with respect to the reduction semantics and swapped in the "take 3" semantics. While working on that, I realized that the subsumption rule was almost admissible in the "take 3" semantics; only the variable and application equations needed more uses of \(\sqsubseteq\). With those changes in place, the proof of completeness went through without a hitch. So here's the updated definition of the denotational semantics of the untyped lambda calculus.

The definition of values remains the same as last time: \[ \begin{array}{lrcl} \text{function tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] as does the \(\sqsubseteq\) operator. \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} For the denotation function \(E\), we add uses of \(\sqsubseteq\) to the equations for variables (\(v \sqsubseteq \rho(x)\)) and function application (\(v_3 \sqsubseteq v_3'\)). (I've also added the conditional expression \(\mathbf{if}\,e_1\,e_2\,e_3\) and primitive operations on numbers \(f(e_1,e_2)\), where \(f\) ranges over binary functions on numbers.) \begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ v \mid v \sqsubseteq \rho(x) \} \\ E[\!| \lambda x.\, e |\!](\rho) &= \left\{ T \middle| \begin{array}{l} \forall v_1 v_2'. \, v_1\mapsto v_2' \in T \Rightarrow\\ \exists v_2.\, v_2 \in E[\!| e |\!](\rho(x{:=}v_1)) \land v_2' \sqsubseteq v_2 \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists T v_2 v_2' v_3'.\, T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, v'_2\mapsto v_3' \in T \land v'_2 \sqsubseteq v_2 \land v_3 \sqsubseteq v_3' \end{array} \right\} \\ E[\!| f(e_1, e_2) |\!](\rho) &= \{ f(n_1,n_2) \mid \exists n_1 n_2.\, n_1 \in E[\!| e_1 |\!](\rho) \land n_2 \in E[\!| e_2 |\!](\rho) \} \\ E[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](\rho) &= \left\{ v \, \middle| \begin{array}{l} \exists n.\, n \in E[\!| e_1 |\!](\rho) \land {} \\ \quad v \in E[\!| e_2 |\!](\rho) \quad \text{if } n \neq 0 \\ \quad v \in E[\!| e_3 |\!](\rho) \quad \text{if } n = 0 \end{array} \right\} \end{align*}
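To make these equations concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the Isabelle development: values are Python `int`s or `frozenset`s of `(argument, result)` pairs, expressions are tuples such as `('lam', 'x', body)` or `('prim', f, e1, e2)`, and the names `leq`, `denote`, and `U` are hypothetical. Because \(E[\!| e |\!](\rho)\) is an infinite set, `denote` only returns members of a finite list `U` of candidate values, and the existential quantifiers also range over `U`, so in general the result under-approximates \(E[\!| e |\!](\rho) \cap U\).

```python
from typing import FrozenSet, Tuple, Union

# Hypothetical encoding: a value is a number or a finite table of (input, output) pairs.
Value = Union[int, FrozenSet[Tuple["Value", "Value"]]]


def leq(v1: Value, v2: Value) -> bool:
    """v1 ⊑ v2: equality on numbers, subset on function tables."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 == v2
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 <= v2
    return False


def denote(e, rho, U):
    """Members of U satisfying the equations for E[e](rho), with witnesses drawn from U."""
    tag = e[0]
    if tag == 'num':
        return {v for v in U if isinstance(v, int) and v == e[1]}
    if tag == 'var':
        return {v for v in U if leq(v, rho[e[1]])}
    if tag == 'lam':
        x, body = e[1], e[2]

        def ok(T):
            # every entry v1 -> v2' needs some v2 in the body's meaning with v2' ⊑ v2
            return all(any(leq(v2p, v2) for v2 in denote(body, {**rho, x: v1}, U))
                       for (v1, v2p) in T)
        return {T for T in U if isinstance(T, frozenset) and ok(T)}
    if tag == 'app':
        funs, args = denote(e[1], rho, U), denote(e[2], rho, U)
        return {v3 for v3 in U
                if any(leq(v2p, v2) and leq(v3, v3p)
                       for T in funs if isinstance(T, frozenset)
                       for (v2p, v3p) in T
                       for v2 in args)}
    if tag == 'prim':
        f = e[1]
        n1s = [n for n in denote(e[2], rho, U) if isinstance(n, int)]
        n2s = [n for n in denote(e[3], rho, U) if isinstance(n, int)]
        results = {f(a, b) for a in n1s for b in n2s}
        return {v for v in U if isinstance(v, int) and v in results}
    if tag == 'if':
        out = set()
        for n in denote(e[1], rho, U):
            if isinstance(n, int):
                out |= denote(e[2] if n != 0 else e[3], rho, U)
        return out
    raise ValueError(f"unknown expression: {e!r}")


# A tiny universe that contains every entry of its own tables.
U = [0, 1, frozenset(), frozenset({(0, 0)}), frozenset({(0, 1)}),
     frozenset({(0, 0), (1, 1)})]
identity = ('lam', 'x', ('var', 'x'))
print(denote(identity, {}, U))                       # the tables in U consistent with x |-> x
print(denote(('app', identity, ('num', 1)), {}, U))  # {1}
```

Note how the table `{0 ↦ 1}` is rejected for the identity function, while the empty table is accepted vacuously.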

Here are the highlights of the results for this definition.

Proposition (Admissibility of Subsumption)
If \(v \in E[\!| e |\!] \) and \(v' \sqsubseteq v\), then \(v' \in E[\!| e |\!] \).

Theorem (Reduction implies Denotational Equality)

  1. If \(e \longrightarrow e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\).
  2. If \(e \longrightarrow^{*} e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\).

Theorem (Whole-program Soundness and Completeness)

  1. If \(v' \in E[\!| e |\!](\emptyset)\), then \(e \longrightarrow^{*} v\) and \(v' \in E[\!| v |\!](\emptyset)\) for some \(v\).
  2. If \(e \longrightarrow^{*} v\), then \(v' \in E[\!| e |\!](\emptyset) \) and \(v' \in E[\!| v |\!](\emptyset) \) for some \(v'\).

Proposition (Denotational Equality is a Congruence)
For any context \(C\), if \(E[\!| e |\!] = E[\!| e' |\!]\), then \(E[\!| C[e] |\!] = E[\!| C[e'] |\!]\).

Theorem (Soundness wrt. Contextual Equivalence)
If \(E[\!| e |\!] = E[\!| e' |\!]\), then \(e \simeq e'\).

Part 2. An Application to Compiler Correctness

Towards finding out how useful this denotational semantics is, I've begun looking at using it to prove compiler correctness. I'm not sure yet exactly which compiler I want to target, but as a first step, I wrote a simple source-to-source optimizer \(\mathcal{O}\) for the lambda calculus. It performs inlining and constant folding, and it simplifies conditionals. The optimizer is parameterized over the inlining depth \(k\) to ensure termination. After inlining, we optimize the resulting body again, so this is a polyvariant optimizer. Here's the definition. \begin{align*} \mathcal{O}[\!| x |\!](k) &= x \\ \mathcal{O}[\!| n |\!](k) &= n \\ \mathcal{O}[\!| \lambda x.\, e |\!](k) &= \lambda x.\, \mathcal{O}[\!| e |\!](k) \\ \mathcal{O}[\!| e_1\,e_2 |\!](k) &= \begin{array}{l} \begin{cases} \mathcal{O}[\!| [x{:=}e_2'] e |\!] (k{-}1) & \text{if } k \geq 1 \text{ and } e_1' = \lambda x.\, e \\ & \text{and } e_2' \text{ is a value} \\ e_1' \, e_2' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| f(e_1,e_2) |\!](k) &= \begin{array}{l} \begin{cases} f(n_1,n_2) & \text{if } e_1' = n_1 \text{ and } e_2' = n_2 \\ f(e_1',e_2') & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](k) &= \begin{array}{l} \begin{cases} e_2' & \text{if } e_1' = n \text{ and } n \neq 0 \\ e_3' & \text{if } e_1' = n \text{ and } n = 0 \\ \mathbf{if}\,e_1'\, e_2'\,e_3' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k)\\ \text{ and } e_3' = \mathcal{O}[\!|e_3 |\!](k) \end{array} \end{align*}
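The optimizer is small enough to transcribe directly. The following Python sketch is a hypothetical transcription, not the Isabelle definition: terms are tuples (`('var', x)`, `('num', n)`, `('lam', x, body)`, `('app', e1, e2)`, `('prim', f, e1, e2)`, `('if', e1, e2, e3)`), and because the post leaves substitution implicit, `subst` is an assumed capture-avoiding substitution that renames a binder to a generated name of the form `y_<counter>` (assumed not to clash with names already in the program).

```python
import itertools

_counter = itertools.count()  # source of fresh suffixes for renamed binders


def free_vars(e):
    tag = e[0]
    if tag == 'var':
        return {e[1]}
    if tag == 'num':
        return set()
    if tag == 'lam':
        return free_vars(e[2]) - {e[1]}
    # app / prim / if: collect from subexpressions (skipping prim's Python function)
    return set().union(*(free_vars(s) for s in e[1:] if isinstance(s, tuple)))


def subst(x, v, e):
    """Capture-avoiding [x := v] e."""
    tag = e[0]
    if tag == 'var':
        return v if e[1] == x else e
    if tag == 'num':
        return e
    if tag == 'lam':
        y, body = e[1], e[2]
        if y == x:
            return e  # x is shadowed by this binder
        if y in free_vars(v):  # rename y so it cannot capture v's free variables
            z = f"{y}_{next(_counter)}"
            body, y = subst(y, ('var', z), body), z
        return ('lam', y, subst(x, v, body))
    if tag == 'prim':
        return ('prim', e[1], subst(x, v, e[2]), subst(x, v, e[3]))
    return (tag,) + tuple(subst(x, v, s) for s in e[1:])  # app, if


def is_value(e):
    return e[0] in ('num', 'lam')


def optimize(e, k):
    """O[e](k): inlining up to depth k, constant folding, and if-simplification."""
    tag = e[0]
    if tag in ('var', 'num'):
        return e
    if tag == 'lam':
        return ('lam', e[1], optimize(e[2], k))
    if tag == 'app':
        e1, e2 = optimize(e[1], k), optimize(e[2], k)
        if k >= 1 and e1[0] == 'lam' and is_value(e2):
            return optimize(subst(e1[1], e2, e1[2]), k - 1)  # inline, then re-optimize
        return ('app', e1, e2)
    if tag == 'prim':
        f, e1, e2 = e[1], optimize(e[2], k), optimize(e[3], k)
        if e1[0] == 'num' and e2[0] == 'num':
            return ('num', f(e1[1], e2[1]))  # constant folding
        return ('prim', f, e1, e2)
    if tag == 'if':
        e1, e2, e3 = (optimize(s, k) for s in e[1:])
        if e1[0] == 'num':
            return e2 if e1[1] != 0 else e3
        return ('if', e1, e2, e3)
    raise ValueError(f"unknown expression: {e!r}")
```

For example, with `plus = lambda a, b: a + b`, the program `('app', ('lam', 'x', ('prim', plus, ('var', 'x'), ('num', 1))), ('num', 41))` optimizes to `('num', 42)` at depth `k = 1`, whereas at `k = 0` the application is left unchanged.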

I've proved that this optimizer is correct. The first step was proving that it preserves denotational equality.

Lemma (Optimizer Preserves Denotations)
\(E[\!| \mathcal{O}[\!| e|\!](k) |\!] = E[\!|e|\!] \)
Proof
The proof is by induction on the termination metric for \(\mathcal{O}\), which is the lexicographic ordering of \(k\) then the size of \(e\). All the cases are straightforward to prove because Reduction implies Denotational Equality and because Denotational Equality is a Congruence. QED

Theorem (Correctness of the Optimizer)
\(\mathcal{O}[\!| e|\!](k) \simeq e\)
Proof
The proof is a direct result of the above Lemma and Soundness wrt. Contextual Equivalence. QED

Of course, all of this is proved in Isabelle. Here is the tar ball. I was surprised that this proof of correctness for the optimizer was about the same length as the definition of the optimizer!

Friday, March 10, 2017

The Take 3 Semantics, Revisited

In my post about intersection types as denotations, I conjectured that the simple "take 3" denotational semantics is equivalent to an intersection type system. I haven't settled that question per se, but I've done something just as good, which is to show that everything that I've done with the intersection type system can also be done with the "take 3" semantics (with a minor modification).

Recall that the main difference between the "take 3" semantics and the intersection type system is how subsumption of functions is handled. The "take 3" semantics defined function application as follows, using the subset operator \(\sqsubseteq\) to require the argument \(v_2\) to include all the entries in the parameter \(v'_2\), while allowing \(v_2\) to have more entries. \begin{align*} E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists v_1 v_2 v'_2.\, v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, \{ v'_2\mapsto v_3 \} \sqsubseteq v_1 \land v'_2 \sqsubseteq v_2 \end{array} \right\} \end{align*} Values are either numbers or functions. Functions are represented as finite tables mapping values to values. \[ \begin{array}{lrcl} \text{tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] and \(\sqsubseteq\) is defined as equality on numbers and subset for function tables: \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} Recall that \(\subseteq\) is defined in terms of equality on elements.

In an intersection type system (without subsumption), function application uses subtyping. Here's one way to formulate the typing rule for application: \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A' \to B \quad A <: A'} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Types are defined as follows \[ \begin{array}{lrcl} \text{types} & A,B,C & ::= & n \mid A \to B \mid A \land B \mid \top \end{array} \] and the subtyping relation is given below. \begin{gather*} \frac{}{n <: n}(a) \quad \frac{}{\top <: \top}(b) \quad \frac{}{A \to B <: \top}(c) \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(d) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B}(e) \quad \frac{}{A \wedge B <: A}(f) \quad \frac{}{A \wedge B <: B}(g) \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)}(h) \end{gather*} Recall that values and types are isomorphic (and dual) to each other in this setting. Here are the functions \(\mathcal{T}\) and \(\mathcal{V}\) that map back and forth between values and types. \begin{align*} \mathcal{T}(n) &= n \\ \mathcal{T}( \{ v_1 \mapsto v'_1, \ldots, v_n \mapsto v'_n \} ) &= \mathcal{T}(v_1) {\to} \mathcal{T}(v'_1) \land \cdots \land \mathcal{T}(v_n) {\to} \mathcal{T}(v'_n) \\[2ex] \mathcal{V}(n) &= n \\ \mathcal{V}(A \to B) &= \{ \mathcal{V}(A)\mapsto\mathcal{V}(B) \} \\ \mathcal{V}(A \land B) &= \mathcal{V}(A) \cup \mathcal{V}(B)\\ \mathcal{V}(\top) &= \emptyset \end{align*}
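The two translations are easy to transcribe. In this hypothetical Python sketch, a number and the corresponding singleton type are both a Python `int`, a table is a `frozenset` of pairs, and the types \(A \to B\), \(A \land B\), and \(\top\) are encoded as `('fun', A, B)`, `('and', A, B)`, and `'top'`. The definition above does not say what \(\mathcal{T}\) does with the empty table; mapping it to \(\top\) is an assumption that matches \(\mathcal{V}(\top) = \emptyset\). The round trip \(\mathcal{V}(\mathcal{T}(v)) = v\) holds, whereas \(\mathcal{T}(\mathcal{V}(A))\) need not be syntactically equal to \(A\) (for instance, \(\top \land \top\)).

```python
from functools import reduce


def to_type(v):
    """T(v): numbers to singleton types, tables to intersections of arrows."""
    if isinstance(v, int):
        return v
    arrows = [('fun', to_type(a), to_type(b)) for (a, b) in v]
    if not arrows:
        return 'top'  # assumption: the empty table corresponds to ⊤
    return reduce(lambda A, B: ('and', A, B), arrows)


def to_value(A):
    """V(A): singleton types to numbers; arrows, intersections, and ⊤ to tables."""
    if isinstance(A, int):
        return A
    if A == 'top':
        return frozenset()
    if A[0] == 'fun':
        return frozenset({(to_value(A[1]), to_value(A[2]))})
    if A[0] == 'and':
        left, right = to_value(A[1]), to_value(A[2])
        if not (isinstance(left, frozenset) and isinstance(right, frozenset)):
            raise ValueError("V(A ∧ B) is only defined when A and B denote tables")
        return left | right
    raise ValueError(f"unknown type: {A!r}")
```

The explicit check in the `'and'` case matters in Python: without it, two singleton types would be combined with the integer bitwise-or operator instead of raising an error.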

Given that values and types are really the same, the typing rule for application is almost the same as the equation for \(E[\!| e_1\;e_2 |\!](\rho)\). The only real difference is the use of \(<:\) versus \(\sqsubseteq\). However, subtyping is a larger relation than \(\sqsubseteq\), that is, \(v_1 \sqsubseteq v_2\) implies \(\mathcal{T}(v_1) <: \mathcal{T}(v_2)\), but it is not the case that \(A <: B\) implies \(\mathcal{V}(A) \sqsubseteq \mathcal{V}(B)\). Subtyping is larger because of rules \((d)\) and \((h)\). The other rules just express the dual of \(\subseteq\).
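For a concrete instance of the gap, here are two subtyping facts, worked out by hand from the rules and the definition of \(\mathcal{V}\) above, whose images under \(\mathcal{V}\) are not related by \(\sqsubseteq\). The first uses rule \((d)\) with \((2 \to 3) \land (4 \to 5) <: 2 \to 3\) (rule \((f)\)) in the contravariant position and \(1 <: 1\) (rule \((a)\)) in the result; the second is an instance of rule \((h)\) with \(C = 1\), \(A = 2 \to 3\), and \(B = 4 \to 5\). Each \(<:\) line is followed by the images of its two sides under \(\mathcal{V}\). \begin{align*} (2 \to 3) \to 1 \;&<:\; ((2 \to 3) \land (4 \to 5)) \to 1 \\ \{ \{2\mapsto 3\} \mapsto 1 \} \;&\not\sqsubseteq\; \{ \{2\mapsto 3,\, 4\mapsto 5\} \mapsto 1 \} \\[2ex] (1 \to (2 \to 3)) \land (1 \to (4 \to 5)) \;&<:\; 1 \to ((2 \to 3) \land (4 \to 5)) \\ \{ 1 \mapsto \{2\mapsto 3\},\, 1\mapsto \{4\mapsto 5\} \} \;&\not\sqsubseteq\; \{ 1 \mapsto \{2\mapsto 3,\, 4\mapsto 5\} \} \end{align*} In both cases the tables differ in an entry whose argument or result is itself a table, and since \(\subseteq\) compares entries by equality, neither table is a subset of the other.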

So the natural question is whether subtyping needs to be bigger than \(\sqsubseteq\), or would we get by with just \(\sqsubseteq\)? In my last post, I mentioned that rule \((h)\) was not necessary. Indeed, I removed it from the Isabelle formalization without disturbing the proofs of whole-program soundness and completeness wrt. operational semantics, and was able to carry on and prove soundness wrt. contextual equivalence. This morning I also replaced rule \((d)\) with a rule that only allows equal function types to be subtypes. \[ \frac{}{A \to B <: A \to B}(d') \] The proofs went through again! Though I did have to make two minor changes in the type system without subsumption to ensure that it stays equivalent to the version of the type system with subsumption. I used the rule given above for function application instead of \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A \to B} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Also, I had to change the typing rule for \(\lambda\) to use subtyping to relate the body's type to the return type. \[ \frac{\Gamma,x:A \vdash e : B' \qquad B' <: B} {\Gamma \vdash \lambda x.\, e : A \to B} \] Transposing this back into the land of denotational semantics and values, we get the following equation for the meaning of \(\lambda\), in which everything in the return specification \(v_2\) must be contained in the value \(v'_2\) produced by the body. \[ E[\!| \lambda x.\; e |\!] (\rho) = \left\{ v \middle| \begin{array}{l}\forall v_1 v_2. \{v_1\mapsto v_2\} \sqsubseteq v \implies \\ \exists v_2'.\; v'_2 \in E[\!| e |\!] (\rho(x{:=}v_1)) \,\land\, v_2 \sqsubseteq v'_2 \end{array} \right\} \]

So with this little change, the "take 3" semantics is a great semantics for the call-by-value untyped lambda calculus! For whole programs, it's sound and complete with respect to the standard operational semantics, and it is also sound with respect to contextual equivalence.

Wednesday, March 08, 2017

Sound wrt. Contextual Equivalence

The ICFP paper submission deadline kept me busy for much of February, but now I'm back to thinking about the simple denotational semantics of the lambda calculus. In previous posts I showed that this semantics is equivalent to standard operational semantics when considering the behavior of whole programs. However, sometimes it is necessary to reason about the behavior of program fragments and we would like to use the denotational semantics for this as well. For example, an optimizing compiler might want to exchange one expression for another less-costly expression that does the same job.

The formal notion of two such "exchangeable" expressions is contextual equivalence (Morris 1968). It says that two expressions are equivalent if plugging them into an arbitrary context produces programs that behave the same.

Definition (Contextual Equivalence)
Two expressions \(e_1\) and \(e_2\) are contextually equivalent, written \(e_1 \simeq e_2\), iff for any closing context \(C\), \[ \mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]). \]

We would like to know that when two expressions are denotationally equal, then they are also contextually equivalent.

Theorem (Sound wrt. Contextual Equivalence)
If \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\), then \(e_1 \simeq e_2\).

The rest of the blog post gives an overview of the proof (except for the discussion of related work at the very end). The details of the proof are in the Isabelle mechanization. But first we need to define the terms used in the above statements.

Definitions

Recall that our denotational semantics is defined in terms of an intersection type system. The meaning of an expression is the set of all types assigned to it by the type system. \[ E[e]\Gamma \equiv \{ A \mid \Gamma \vdash_2 e : A \} \] Recall that the types include singletons, functions, intersections, and a top type: \[ A,B,C ::= n \mid A \to B \mid A \land B \mid \top \] I prefer to think of these types as values, where the function, intersection, and top types are used to represent finite tables that record the input-output values of a function.

The intersection type system that we use here differs from the one in the previous post in that we remove the subsumption rule and sprinkle uses of subtyping elsewhere in a standard fashion (Pierce 2002).

\begin{gather*} \frac{}{\Gamma \vdash_2 n : n} \\[2ex] \frac{} {\Gamma \vdash_2 \lambda x.\, e : \top} \quad \frac{\Gamma \vdash_2 \lambda x.\, e : A \quad \Gamma \vdash_2 \lambda x.\, e : B} {\Gamma \vdash_2 \lambda x.\, e : A \wedge B} \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash_2 x : A} \quad \frac{\Gamma,x:A \vdash_2 e : B} {\Gamma \vdash_2 \lambda x.\, e : A \to B} \\[2ex] \frac{\Gamma \vdash_2 e_1: C \quad C <: A \to B \quad \Gamma \vdash_2 e_2 : A} {\Gamma \vdash_2 e_1 \; e_2 : B} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash_2 e_1 : A \quad A <: n_1 \\ \Gamma \vdash_2 e_2 : B \quad B <: n_2 \end{array} \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash_2 \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: 0 \quad \Gamma \vdash_2 e_3 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: n \quad n \neq 0 \quad \Gamma \vdash_2 e_2 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \end{gather*}

Regarding subtyping, we make a minor change and leave out the rule \[ \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \] because I had a hunch that it wasn't needed to prove Completeness with respect to the small step semantics, and indeed it was not. So the subtyping relation is defined as follows.

\begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \end{gather*}

This type system is equivalent to the one with subsumption in the following sense.

Theorem (Equivalent Type Systems)

  1. If \(\Gamma \vdash e : A\), then \(\Gamma \vdash_2 e : A'\) and \(A' <: A\) for some \(A'\).
  2. If \(\Gamma \vdash_2 e : A\), then \(\Gamma \vdash e : A\).
Proof
The proofs of the two parts are straightforward inductions on the derivations of the typing judgments. QED

This type system satisfies the usual progress and preservation properties.

Theorem (Preservation)
If \(\Gamma \vdash_2 e : A\) and \(e \longrightarrow e'\), then \(\Gamma \vdash_2 e' : A'\) and \(A' <: A\) for some \(A'\).
Proof
The proof of preservation is by induction on the derivation of the reduction. The case for \(\beta\) reduction relies on lemmas about substitution and type environments. QED

Theorem (Progress)
If \(\emptyset \vdash_2 e : A\) and \(\mathrm{FV}(e) = \emptyset\), then \(e\) is a value or \(e \longrightarrow e'\) for some \(e'\).
Proof
The proof of progress is by induction on the typing derivation. As usual it relies on a canonical forms lemma. QED

Lemma (Canonical forms)
Suppose \(\emptyset \vdash_2 v : A\).

  1. If \(A <: n\), then \(v = n\).
  2. If \(A <: B \to C\), then \(v = \lambda x.\, e\) for some \(x,e\).

Next we turn to the definition of \(\mathit{eval}\). As usual, we shall define the behavior of a program in terms of the operational (small-step) semantics and an \(\mathit{observe}\) function. \begin{align*} \mathit{eval}(e) &= \begin{cases} \mathit{observe}(v) & \text{if } e \longrightarrow^{*} v \\ \mathtt{bad} & \text{otherwise} \end{cases}\\ \mathit{observe}(n) &= n \\ \mathit{observe}(\lambda x.\, e) &= \mathtt{fun} \end{align*} In the above we categorize programs as \(\mathtt{bad}\) if they do not produce a value. Thus, we are glossing over the distinction between programs that diverge and programs that go wrong (e.g., segmentation fault). We do this because our denotational semantics does not make such a distinction. However, I plan to circle back to this issue in the future and develop a version of the semantics that does.
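Here is a hypothetical Python sketch of \(\mathit{eval}\) and \(\mathit{observe}\) for closed programs, using a tuple encoding of terms (`('num', n)`, `('var', x)`, `('lam', x, body)`, `('app', e1, e2)`, `('prim', f, e1, e2)`, `('if', e1, e2, e3)`). Two assumptions are worth flagging. First, when a closed program is evaluated call-by-value, only closed values get substituted, so a substitution that merely respects shadowing suffices. Second, divergence cannot be detected, so `evaluate` gives up after `fuel` steps and reports `'bad'`, which only approximates the definition above.

```python
def subst(x, v, e):
    """[x := v] e for a closed value v, so no variable capture can occur."""
    tag = e[0]
    if tag == 'var':
        return v if e[1] == x else e
    if tag == 'num':
        return e
    if tag == 'lam':
        return e if e[1] == x else ('lam', e[1], subst(x, v, e[2]))
    if tag == 'prim':
        return ('prim', e[1], subst(x, v, e[2]), subst(x, v, e[3]))
    return (tag,) + tuple(subst(x, v, s) for s in e[1:])  # app, if


def is_value(e):
    return e[0] in ('num', 'lam')


def step(e):
    """One call-by-value reduction step, or None if e is a value or is stuck."""
    tag = e[0]
    if tag == 'app':
        e1, e2 = e[1], e[2]
        if not is_value(e1):
            s = step(e1)
            return None if s is None else ('app', s, e2)
        if not is_value(e2):
            s = step(e2)
            return None if s is None else ('app', e1, s)
        return subst(e1[1], e2, e1[2]) if e1[0] == 'lam' else None  # beta, or stuck
    if tag == 'prim':
        f, e1, e2 = e[1], e[2], e[3]
        if not is_value(e1):
            s = step(e1)
            return None if s is None else ('prim', f, s, e2)
        if not is_value(e2):
            s = step(e2)
            return None if s is None else ('prim', f, e1, s)
        if e1[0] == 'num' and e2[0] == 'num':
            return ('num', f(e1[1], e2[1]))
        return None  # stuck: a primitive applied to a function
    if tag == 'if':
        if not is_value(e[1]):
            s = step(e[1])
            return None if s is None else ('if', s, e[2], e[3])
        if e[1][0] == 'num':
            return e[2] if e[1][1] != 0 else e[3]
        return None  # stuck: branching on a function
    return None  # values, and free variables (stuck)


def observe(v):
    return v[1] if v[0] == 'num' else 'fun'


def evaluate(e, fuel=10_000):
    """eval(e): observe(v) if e reduces to a value v within `fuel` steps, else 'bad'."""
    for _ in range(fuel):
        if is_value(e):
            return observe(e)
        e = step(e)
        if e is None:
            return 'bad'  # stuck
    return observe(e) if is_value(e) else 'bad'  # out of fuel: treated as divergence
```

Such an evaluator can be used to test, but not prove, contextual equivalence, by comparing `evaluate` on \(C[e_1]\) and \(C[e_2]\) for a handful of sample closing contexts \(C\).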

Soundness wrt. Contextual Equivalence

We assume that \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\) and need to show that \(e_1 \simeq e_2\). That is, we need to show that \(\mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]) \) for any closing context \(C\). We shall prove Congruence, which lets us lift the denotational equality of \(e_1\) and \(e_2\) through any context, so we have \begin{equation} E[C[e_1]]\emptyset = E[C[e_2]]\emptyset \qquad\qquad (1) \end{equation} Now let us consider the cases for \(\mathsf{eval}(C[e_1])\).

  • Case \(\mathsf{eval}(C[e_1]) = \mathit{observe}(v)\) and \(C[e_1] \longrightarrow^{*} v\):
    By Completeness of the intersection type system we have \(\emptyset \vdash_2 C[e_1] : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\). Then with (1) we have \begin{equation} \emptyset \vdash_2 C[e_2] : A \qquad\qquad (2) \end{equation} The type system is sound wrt. the big-step semantics, so \(\emptyset \vdash C[e_2] \Downarrow v'\) for some \(v'\). Therefore \(C[e_2] \longrightarrow^{*} v''\) because the big-step semantics is sound wrt. the small-step semantics. It remains to show that \(\mathit{observe}(v'') = \mathit{observe}(v)\). From (2) we have \(\emptyset \vdash_2 v'' : A''\) for some \(A''\) where \(A'' <: A\), by Preservation. Noting that we already have \(\emptyset \vdash_2 v : A'\), \(\emptyset \vdash_2 v'' : A''\), \(A' <: A\), and \(A'' <: A\), we conclude that \(\mathit{observe}(v) = \mathit{observe}(v'')\) by the Lemma Observing values of subtypes.
  • Case \(\mathsf{eval}(C[e_1]) = \mathtt{bad}\):
    So \(C[e_1]\) either diverges or gets stuck. In either case, we have \(E[C[e_1]]\emptyset = \emptyset \) (Lemmas Diverging programs have no meaning and Programs that get stuck have no meaning). So by (1) we have \(E[C[e_2]]\emptyset = \emptyset\). We conclude that \(C[e_2]\) either diverges or gets stuck by Lemma (Programs with no meaning diverge or get stuck). Thus, \(\mathsf{eval}(C[e_2]) = \mathtt{bad}\).
QED

Lemma (Congruence)
Let \(C\) be an arbitrary context. If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[C[e_1]]\Gamma = E[C[e_2]]\Gamma\).
Proof
We prove congruence by structural induction on the context \(C\), using the induction hypothesis and the appropriate Compatibility lemma for each kind of expression. QED

Most of the Compatibility lemmas are straightforward, though the one for abstraction is worth discussing.

Lemma (Compatibility for abstraction)
If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[\lambda x.\, e_1]\Gamma = E[\lambda x.\, e_2]\Gamma\).
Proof
To prove compatibility for abstractions, we first prove that

If \(\Gamma' \vdash_2 e_1 : B\) implies \(\Gamma' \vdash_2 e_2 : B\) for any \(\Gamma',B\), then \(\Gamma \vdash_2 \lambda x.\, e_1 : C\) implies \(\Gamma \vdash_2 \lambda x.\, e_2 : C\).
This is a straightforward induction on the type \(C\). Compatibility follows by two uses of this fact. QED

Theorem (Completeness wrt. small-step semantics)
If \(e \longrightarrow^{*} v\), then \(\emptyset \vdash_2 e : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\).
Proof
We have \(\emptyset \vdash e : B\) and \(\emptyset \vdash v : B\) by Completeness of the type system with subsumption. Therefore \(\emptyset \vdash_2 e : A\) and \(A <: B\) by Theorem Equivalent Type Systems. By preservation we conclude that \(\emptyset \vdash_2 v : A'\) and \(A' <: A\). QED

In a previous blog post, we proved soundness with respect to big-step semantics for a slightly different denotational semantics. So we update that proof for the denotational semantics defined above. We shall make use of the following logical relation \(\mathcal{G}\) in this proof. \begin{align*} G[n] &= \{ n \} \\ G[A \to B] &= \{ \langle \lambda x.\, e, \rho \rangle \mid \forall v \in G[A]. \; \rho(x{:=}v) \vdash e \Downarrow v' \text{ and } v' \in G[B] \} \\ G[A \land B] &= G[A] \cap G[B] \\ G[\top] &= \{ v \mid v \in \mathrm{Values} \} \\ \\ G[\emptyset] &= \{ \emptyset \} \\ G[\Gamma,x:A] &= \{ \rho(x{:=}v) \mid v \in G[A] \text{ and } \rho \in G[\Gamma] \} \end{align*}

We shall need two lemmas about this logical relation.

Lemma (Lookup in \(\mathcal{G}\))
If \(x:A \in \Gamma\) and \(\rho \in G[\Gamma]\), then \(\rho(x) = v\) and \(v \in G[A]\).

Lemma (\(\mathcal{G}\) preserves subtyping)
If \(A <: B\) and \(v \in G[A]\), then \(v \in G[B]\).

Theorem (Soundness wrt. big-step semantics)
If \(\Gamma \vdash_2 e : A\) and \(\rho \in G[\Gamma]\), then \(\rho \vdash e \Downarrow v\) and \(v \in G[A]\).
Proof
The proof is by induction on the typing derivation. The case for variables uses the Lookup Lemma and all of the elimination forms use the above Subtyping Lemma (because their typing rules use subtyping). QED

Lemma (Observing values of subtypes)
If \(\emptyset \vdash_2 v : A\), \(\emptyset \vdash_2 v' : B\), \(A <: C\), and \(B <: C\), then \(\mathit{observe}(v) = \mathit{observe}(v')\).
Proof
The proof is by cases on \(v\) and \(v'\). We use lemmas about the symmetry of subtyping for singletons, an inversion lemma for functions, and the fact that subtyping preserves function types. QED

Lemma (Subtyping symmetry for singletons)
If \(n <: A\), then \(A <: n\).

For the next lemma we need to characterize the types for functions. \begin{gather*} \frac{}{\mathit{fun}(A \to B)} \quad \frac{\mathit{fun}(A) \qquad \mathit{fun}(B)} {\mathit{fun}(A \land B)} \quad \frac{}{\mathit{fun}(\top)} \end{gather*}
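The \(\mathit{fun}\) predicate is a simple structural check. Here is a Python sketch, using a hypothetical encoding in which a singleton type \(n\) is an `int` and the other types are `('fun', A, B)`, `('and', A, B)`, and `'top'`.

```python
def is_fun(A) -> bool:
    """fun(A): A is built only from arrows, intersections, and ⊤."""
    if A == 'top':
        return True
    if isinstance(A, tuple) and A[0] == 'fun':
        return True  # any arrow, regardless of its argument and result types
    if isinstance(A, tuple) and A[0] == 'and':
        return is_fun(A[1]) and is_fun(A[2])
    return False  # singleton types n
```

For example, \(\mathit{fun}((1 \to 2) \land \top)\) holds, but \(\mathit{fun}(1 \land (1 \to 2))\) does not, because a singleton appears at the top level of the intersection.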

Lemma (Inversion on Functions)
If \(\Gamma \vdash_2 \lambda x.\, e : A\), then \(\mathit{fun}(A)\).

Lemma (Subtyping preserves functions)
If \(A <: B\) and \(\mathit{fun}(A)\), then \(\mathit{fun}(B)\).

Lemma (Diverging Programs have no meaning)
If \(e\) diverges, then \(E[e]\emptyset = \emptyset\).
Proof
Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Then by soundness wrt. big-step semantics, we have \(\emptyset \vdash e \Downarrow v\) and so also \(e \longrightarrow^{*} v'\). But this contradicts the premise that \(e\) diverges. QED

Lemma (Programs that get stuck have no meaning)
Suppose that \(e \longrightarrow^{*} e'\) and \(e'\) is stuck (and not a value). Then \(E[e]\emptyset = \emptyset\).
Proof
Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Therefore \(\emptyset \vdash_2 e' : A'\) for some \(A' <: A\). By Progress, either \(e'\) is a value or it can take a step. But that contradicts the premise. QED

Lemma (Programs with no meaning diverge or get stuck)
If \(E[e]\emptyset = \emptyset\), then \(e\) diverges or reduces to a stuck non-value.
Proof
Towards a contradiction, suppose that \(e\) does not diverge and does not reduce to a stuck non-value. So \(e \longrightarrow^{*} v\) for some \(v\). But then by Completeness wrt. the small-step semantics, we have \(\emptyset \vdash_2 e : A\) for some \(A\), which contradicts the premise \(E[e]\emptyset = \emptyset\). QED

Related Work

The proof method used here, of proving Compatibility and Congruence lemmas to show soundness wrt. contextual equivalence, is adapted from Gunter's book (1992), where he proves that the standard model for PCF (CPOs and continuous functions) is sound. This approach is also commonly used to show that logical relations are sound wrt. contextual equivalence (Pitts 2005).

The problem of full abstraction is to show that denotational equivalence is both sound (aka. correct): \[ E[e_1] = E[e_2] \qquad \text{implies} \qquad e_1 \simeq e_2 \] and complete: \[ e_1 \simeq e_2 \qquad \text{implies} \qquad E[e_1] = E[e_2] \] with respect to contextual equivalence (Milner 1975). Here we showed that the simple denotational semantics is sound. I do not know whether it is complete wrt. contextual equivalence.

There are famous examples of denotational semantics that are not complete. For example, the standard model for PCF is not complete. There are two expressions in PCF that are contextually equivalent but not denotationally equivalent (Plotkin 1977). The idea behind the counter-example is that parallel-or cannot be defined in PCF, but it can be expressed in the standard model. The two expressions are higher-order functions constructed to behave differently only when applied to parallel-or.

Rocca and Paolini (2004) define a filter model \(\mathcal{V}\) for the call-by-value lambda calculus, similar to our simple denotational semantics, and prove that it is sound wrt. contextual equivalence (Theorem 12.1.18). Their type system and subtyping relation differ from ours in several ways. Their \(\land\,\mathrm{intro}\) rule is not restricted to \(\lambda\), they include subsumption, their \(\top\) type is a super-type of all types (not just function types), they include the distributivity rule discussed at the beginning of this post, and they include a couple of other rules (labeled \((g)\) and \((v)\) in Fig. 12.1). I'm not sure whether any of these differences really matter; the two systems might be equivalent. Their proof is quite different from ours and more involved; it is based on the notion of approximants. They also show that \(\mathcal{V}\) is incomplete wrt. contextual equivalence, but go on to create another model based on \(\mathcal{V}\) that is complete. The fact that \(\mathcal{V}\) is incomplete leads me to suspect that our semantics \(E\) is also incomplete. This is certainly worth looking into.

Abramsky (1990) introduced a domain logic whose formulas are intersection types: \[ \phi ::= \top \mid \phi \land \phi \mid \phi \to \phi \] and whose proof theory is an intersection type system designed to capture the semantics of the lazy lambda calculus. Abramsky proves that it is sound with respect to contextual equivalence. As far as I can tell, the proof is different from the approach used here: it shows that the domain logic is sound with respect to a denotational semantics that solves the domain equation \(D = (D \to D)_\bot\), and then shows that this denotational semantics is sound wrt. contextual equivalence. (See also Alan Jeffrey (1994).)

Sunday, February 05, 2017

On the Meaning of Casts and Blame for Gradual Typing

Gradually typed languages enable programmers to choose which parts of their programs are statically typed and which parts are dynamically typed. Thus, gradually typed languages perform some type checking at compile time and some type checking at run time. When specifying the semantics of a gradually typed language, we usually express the run time checking in terms of casts. Thus, the semantics of a gradually typed language depends crucially on the semantics of casts. This blog post tries to answer the question: "What is a cast?"

Syntax and Static Semantics of Casts

Syntactically, a cast is an expression of the form \[ e : A \Rightarrow^{\ell} B \] where \(e\) is a subexpression; \(A\) and \(B\) are the source and target types, respectively. The \(\ell\) is what we call a blame label, which records the location of the cast (e.g. line number and character position).

Regarding the static semantics (compile-time type checking), a cast enables the sub-expression \(e\) of static type \(A\) to be used in a context expecting a different type \(B\). \[ \frac{\Gamma \vdash e : A} {\Gamma \vdash (e : A \Rightarrow^{\ell} B) : B} \] In gradual typing, \(A\) and \(B\) typically differ in how "dynamic" they are but are otherwise similar to each other. So we often restrict the typing rule for casts to only allow source and target types that have some values in common, that is, when \(A\) and \(B\) are consistent. \[ \frac{\Gamma \vdash e : A \quad A \sim B} {\Gamma \vdash (e : A \Rightarrow^{\ell} B) : B} \] For example, if we let \(\star\) be the unknown type (aka. \(\mathtt{dynamic}\)), then we have \(\mathtt{Int} \sim \star\) and \(\star \sim \mathtt{Int}\) but \(\mathtt{Int} \not\sim \mathtt{Int}\to\mathtt{Int}\). Here are the rules for consistency with integers, functions, and the dynamic type. \begin{gather*} \mathtt{Int} \sim \mathtt{Int} \qquad \frac{A \sim B \qquad A' \sim B'} {A \to A' \sim B \to B'} \qquad A \sim \star \qquad \star \sim B \end{gather*}
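The consistency rules translate directly into a recursive check. In this Python sketch, the encoding of types as `'Int'`, `'Star'` (for \(\star\)), and `('fun', A, B)` is a hypothetical choice made for illustration.

```python
def consistent(A, B) -> bool:
    """A ~ B for types 'Int', 'Star' (the unknown type), and ('fun', A, B)."""
    if A == 'Star' or B == 'Star':
        return True  # A ~ ⋆ and ⋆ ~ B
    if A == 'Int' and B == 'Int':
        return True
    if isinstance(A, tuple) and isinstance(B, tuple):  # both function types
        return consistent(A[1], B[1]) and consistent(A[2], B[2])
    return False
```

One property worth noticing is that consistency is not transitive: \(\mathtt{Int} \sim \star\) and \(\star \sim \mathtt{Int}\to\mathtt{Int}\), yet \(\mathtt{Int} \not\sim \mathtt{Int}\to\mathtt{Int}\).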

Dynamic Semantics of Casts

The dynamic semantics of a cast is to check whether the value produced by subexpression \(e\) is of the target type \(B\) and if so, return the value; otherwise signal an error. The following is a strawman denotational semantics that expresses this basic intuition about casts. Suppose we have already defined the meaning of types, so \(\mathcal{T}[\!| A |\!]\) is the set of values of type \(A\). The meaning function \(\mathcal{E}[\!| e |\!]\) maps an expression to a result (either a value \(v\) or error \(\mathsf{blame}\,\ell\)). \begin{align*} \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!] &= \begin{cases} v & \text{if } v \in \mathcal{T}[\!| B |\!] \\ \mathsf{blame}\,\ell & \text{if } v \notin \mathcal{T}[\!| B |\!] \end{cases} \\ & \text{where } v = \mathcal{E} [\!| e |\!] \end{align*}
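For first-order types, the strawman can be run as written. In this Python sketch (all names hypothetical), a value of type \(\mathtt{Int}\) is a Python `int`, every value belongs to \(\star\), and \(\mathsf{blame}\,\ell\) is an exception carrying the label. Membership in a function type is deliberately left out of this first-order strawman.

```python
class Blame(Exception):
    """The error result `blame ℓ`, carrying the blame label ℓ."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def in_type(v, B) -> bool:
    """v ∈ T[B], for the first-order fragment only."""
    if B == 'Int':
        return isinstance(v, int) and not isinstance(v, bool)
    if B == 'Star':
        return True
    raise NotImplementedError(f"membership in {B!r} is not handled by this strawman")


def cast(v, A, B, label):
    """Strawman cast: return v if it belongs to the target type B, else blame."""
    # The source type A plays no role here, just as in the equation above.
    if in_type(v, B):
        return v
    raise Blame(label)
```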

If we restrict ourselves to first-order types such as \(\mathtt{Int}\), it is straightforward to define \(\mathcal{T}\) and check whether a value is in the set. \begin{align*} \mathcal{T}[\!| \mathtt{Int} |\!] &= \mathbb{Z} \end{align*} The story for function types, that is, for \(A \to B\), is more complicated. In a denotational setting, it traditionally takes sophisticated mathematics to come up with mathematical entities that can serve as function values when the \(\star\) type is involved (Scott 1970, 1976). The primary challenge is that one cannot simply use the usual notion of a mathematical function to represent function values because of a cardinality problem. Suppose that \(D\) is the set of all values. The set of mathematical functions whose domain and codomain are \(D\) is necessarily larger than \(D\) (by Cantor's theorem, as long as \(D\) has at least two elements), so the mathematical functions cannot fit into the set of all values. There is nothing wrong with sophisticated mathematics per se, but when it comes to using a specification for communication (e.g. between language designers and compiler writers), it is less desirable to require readers of the specification to fully understand a large number of auxiliary definitions and decide whether those definitions match their intuitions.

Competing Operational Semantics for Casts

We'll come back to denotational semantics in a bit, but first let's turn to operational semantics, in particular reduction semantics, which is what the recent literature uses to explain casts and the type \(\star\) (Gronski 2006, Siek 2007, Wadler 2009). In a reduction semantics, we give rewrite rules to say what happens when a syntactic value flows into a cast, that is, we say what expression the cast reduces to. Recall that a syntactic value is just an expression that cannot be further reduced. We can proceed by cases on the consistency of the source type \(A\) and target type \(B\).

  • Case \((v : \mathtt{Int} \Rightarrow^{\ell} \mathtt{Int})\). This one is easy: the static type system ensures that \(v\) has type \(\mathtt{Int}\), so there is nothing to check and we can rewrite to \(v\). \[ v : \mathtt{Int} \Rightarrow^{\ell} \mathtt{Int} \longrightarrow v \]
  • Case \((v : \star \Rightarrow^{\ell} \star)\). This one is also easy. \[ v : \star \Rightarrow^{\ell} \star \longrightarrow v \]
  • Case \((v : A \to A' \Rightarrow^{\ell} B \to B')\). This one is more complicated. We'd like to check that the function \(v\) has type \(B \to B'\). Suppose \(B'=\mathtt{Int}\). How can we determine whether a function returns an integer? In general, that's just as hard as the halting problem, which is undecidable. So instead of checking now, we'll delay the checking until the function is called. We can accomplish this by rewriting to a lambda expression that casts the input, calls \(v\), and then casts the output. \[ v : A \to A' \Rightarrow^{\ell} B \to B' \longrightarrow \lambda x{:}B. (v \; (x : B \Rightarrow^{\ell} A)) : A' \Rightarrow^{\ell} B' \] Here we see the importance of attaching blame labels to casts. Because of the delayed checking, the point of error can be far removed from the original source code location, but thanks to the blame label we can point back to the source location of the cast that ultimately failed (Findler and Felleisen 2002).
  • Case \((v : A \Rightarrow^{\ell} \star)\). For this one there are multiple options in the literature. One option is to declare this a syntactic value (Siek 2009), so no rewrite rule is necessary. Another option is to factor all casts to \(\star\) through the ground types \(G\): \[ G ::= \mathtt{Int} \mid \star \to \star \] Then we expand the cast from \(A\) to \(\star\) into two casts that go through the unique ground type for \(A\). \begin{align*} v : A \Rightarrow^{\ell} \star &\longrightarrow (v : A \Rightarrow^{\ell} G) : G \Rightarrow^{\ell} \star\\ & \text{where } A \sim G, A \neq G, A \neq \star \end{align*} and then declare that expressions of the form \((v : G \Rightarrow^{\ell} \star)\) are values (Wadler 2009).
  • Case \((v : \star \Rightarrow^{\ell} B)\). There are multiple options here as well, but the choice is linked to the above choice regarding casting from \(A\) to \(\star\). If \(v = (v' : A \Rightarrow^{\ell'} \star)\), then we need the following rewrite rules \begin{align*} (v' : A \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} B &\longrightarrow v' : A \Rightarrow^{\ell} B & \text{if } A \sim B \\[2ex] (v' : A \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} B &\longrightarrow \mathsf{blame}\,\ell & \text{if } A \not\sim B \end{align*} On the other hand, if we want to factor through the ground types, we have the following reduction rules. \begin{align*} v : \star \Rightarrow^{\ell} B &\longrightarrow v : \star \Rightarrow^{\ell} G \Rightarrow^{\ell} B \\ & \text{if } B \sim G, B \neq G, B \neq \star \\[2ex] (v : G \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} G &\longrightarrow v \\[2ex] (v : G \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} G' &\longrightarrow \mathsf{blame}\,\ell\\ & \text{if } G \neq G' \end{align*}
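The reduction rules above lean on two auxiliary notions: consistency \(A \sim B\) (defined in the Appendix) and the unique ground type of a type other than \(\star\). The following is a minimal Python sketch of both, together with the step that factors a cast to \(\star\) through a ground type. The class and function names (IntT, Star, Fun, consistent, ground, factor_to_star) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class IntT:
    """The base type Int."""


@dataclass(frozen=True)
class Star:
    """The unknown type *."""


@dataclass(frozen=True)
class Fun:
    """The function type dom -> cod."""
    dom: "Type"
    cod: "Type"


Type = Union[IntT, Star, Fun]


def consistent(a: Type, b: Type) -> bool:
    """A ~ B, following the Consistency rules in the Appendix."""
    if isinstance(a, Star) or isinstance(b, Star):
        return True
    if isinstance(a, IntT) and isinstance(b, IntT):
        return True
    if isinstance(a, Fun) and isinstance(b, Fun):
        return consistent(a.dom, b.dom) and consistent(a.cod, b.cod)
    return False


def ground(a: Type) -> Type:
    """The unique ground type G (Int or * -> *) with A ~ G, for A other than *."""
    if isinstance(a, IntT):
        return IntT()
    if isinstance(a, Fun):
        return Fun(Star(), Star())
    raise ValueError("* has no unique ground type")


def factor_to_star(a: Type) -> Optional[List[Tuple[Type, Type]]]:
    """Split a cast A => * into A => G and G => * when A != G and A != *.

    Returns None when the factoring rewrite rule does not apply.
    """
    if isinstance(a, Star) or a == ground(a):
        return None
    g = ground(a)
    return [(a, g), (g, Star())]
```

For example, factor_to_star(Fun(IntT(), IntT())) returns the two casts \(\mathtt{Int}\to\mathtt{Int} \Rightarrow \star\to\star\) and \(\star\to\star \Rightarrow \star\), while factor_to_star(IntT()) returns None because \(\mathtt{Int}\) is already a ground type.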

Given that we have multiple options regarding the reduction semantics, an immediate question is whether it matters, that is, can we actually observe different behaviors for some program? Yes, in the following example we cast the identity function on integers to an incorrect type. \begin{equation} \begin{array}{l} \mathtt{let}\, id = (\lambda x{:}\mathtt{Int}. x)\, \mathtt{in}\\ \mathtt{let}\, f = (id : \mathtt{Int}\to \mathtt{Int} \Rightarrow^{\ell_1} \star) \, \mathtt{in} \\ \mathtt{let}\, g = (f : \star \Rightarrow^{\ell_2} (\mathtt{Int}\to \mathtt{Int}) \to \mathtt{Int})\,\mathtt{in} \\ \quad g \; id \end{array} \tag{P0}\label{P0} \end{equation} If we choose the semantics that factors through ground types, the above program reduces to \(\mathsf{blame}\, \ell_1\). If we choose the other semantics, the above program reduces to \(\mathsf{blame}\, \ell_2\). Ever since around 2008 I've been wondering which of these is correct, though for the purposes of full disclosure, I've always felt that \(\mathsf{blame}\,\ell_2\) was the better choice for this program. I've also been thinking for a long time that it would be nice to have some alternative, hopefully more intuitive, way to specify the semantics of casts, with which we could then compare the above two alternatives.

A Denotational Semantics of Functions and Casts

I've recently found out that there is a simple way to represent function values in a denotational semantics. The intuition is that, although a function may be able to deal with an infinite number of different inputs, the function only has to deal with a finite number of inputs on any one execution of the program. Thus, we can represent functions with finite tables of input-output pairs. An empty table is written \(\emptyset\); a single-entry table has the form \(v \mapsto v'\), where \(v\) is the input and \(v'\) is the corresponding output. We build a larger table out of two smaller tables \(v_1\) and \(v_2\) with the notation \(v_1 \sqcup v_2\). So, with the addition of integer values \(n \in \mathbb{Z}\), the following grammar specifies the values. \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \]

Of course, we can't use just one fixed-size table as the denotation of a lambda expression. Depending on the context of the lambda, we may need a bigger table that handles more inputs. Therefore we map each lambda expression to the set of all finite tables that jive with that lambda. To be more precise, we shall define a meaning function \(\mathcal{E}\) that maps an expression and an environment to a set of values, and an auxiliary function \(\mathcal{F}\) that determines whether a table jives with a lambda expression in a given environment. Here's a first try at defining \(\mathcal{F}\). \begin{align*} \mathcal{F}(n, \lambda x{:}A. e, \rho) &= \mathsf{false} \\ \mathcal{F}(\emptyset, \lambda x{:}A. e, \rho) &= \mathsf{true} \\ \mathcal{F}(v \mapsto v', \lambda x{:}A. e, \rho) &= \mathcal{T}(A,v) \text{ and } v' \in \mathcal{E}[\!| e |\!]\rho(x{:=}v) \\ \mathcal{F}(v_1 \sqcup v_2, \lambda x{:}A. e, \rho) &= \mathcal{F}(v_1, \lambda x{:}A. e, \rho) \text{ and } \mathcal{F}(v_2, \lambda x{:}A. e, \rho) \end{align*} (We shall define \(\mathcal{T}(A,v)\) shortly.) We then define the semantics of a lambda-expression in terms of \(\mathcal{F}\). \[ \mathcal{E}[\!| \lambda x{:}A.\, e|\!]\rho = \{ v \mid \mathcal{F}(v, \lambda x{:}A. e, \rho) \} \] The semantics of function application is essentially that of table lookup. We write \((v_2 \mapsto v) \sqsubseteq v_1\) to say, roughly, that \(v_2 \mapsto v\) is an entry in the table \(v_1\). (We give the full definition of \(\sqsubseteq\) in the Appendix.) \[ \mathcal{E}[\!| e_1 \, e_2 |\!]\rho = \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \text{ and } v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \text{ and } (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \] Finally, to give meaning to lambda-bound variables, we simply look them up in the environment. \[ \mathcal{E}[\!| x |\!]\rho = \{ \rho(x) \} \]
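Table lookup is the least obvious part of this definition, so here is a Python sketch that decides \(\sqsubseteq\) on blame-free values. It rests on two assumptions that go beyond the text: that \(\sqsubseteq\) is reflexive and transitive (the Appendix lists only the generating rules), and that the usual characterization for orderings with a distributivity rule applies, namely that \(a \mapsto b\) is below a table exactly when at least one entry of the table has an input below \(a\) and \(b\) is below the join of the outputs of all such entries. All names (Empty, Entry, Join, below, lookup) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """The empty table."""


@dataclass(frozen=True)
class Entry:
    """A single-entry table: inp |-> out."""
    inp: "Value"
    out: "Value"


@dataclass(frozen=True)
class Join:
    """The table built from two smaller tables: left joined with right."""
    left: "Value"
    right: "Value"


Value = Union[int, Empty, Entry, Join]


def pieces(v: Value) -> List[Value]:
    """Flatten nested joins into their non-join parts."""
    if isinstance(v, Join):
        return pieces(v.left) + pieces(v.right)
    return [v]


def join_all(vs: List[Value]) -> Value:
    """Join a non-empty list of values."""
    result = vs[0]
    for w in vs[1:]:
        result = Join(result, w)
    return result


def below(v: Value, w: Value) -> bool:
    """Decide v below w, assuming the ordering is reflexive and transitive."""
    if isinstance(v, Join):
        return below(v.left, w) and below(v.right, w)
    ws = pieces(w)
    if isinstance(v, int):
        return any(isinstance(p, int) and p == v for p in ws)
    if isinstance(v, Empty):
        return any(isinstance(p, (Empty, Entry)) for p in ws)
    # v is an entry a |-> b: collect outputs of entries whose input is below a.
    outs = [p.out for p in ws if isinstance(p, Entry) and below(p.inp, v.inp)]
    return bool(outs) and below(v.out, join_all(outs))


def lookup(table: Value, arg: Value, result: Value) -> bool:
    """The side condition (arg |-> result) below table, used by application."""
    return below(Entry(arg, result), table)
```

With t = Join(Entry(1, 2), Entry(3, 4)), lookup(t, 1, 2) holds while lookup(t, 1, 4) and lookup(t, 5, 2) do not. Because inputs are compared with below as well, a table that expects a small function as its argument also accepts a larger one.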

Now that we have a good representation for function values, we can talk about giving meaning to higher-order casts, that is, casts from one function type to another. Recall that in our strawman semantics, we got stuck when trying to define the meaning of types in the form of a map \(\mathcal{T}\) from a type to a set of values. Now we can proceed based on the above definition of values \(v\). (To make the termination of \(\mathcal{T}\) more obvious, we'll instead define \(\mathcal{T}\) as a map from a type and a value to a Boolean. The measure is a lexicographic ordering on the size of the type and then the size of the value.) \begin{align*} \mathcal{T}(\mathtt{Int}, v) &= (\exists n. \; v = n) \\ \mathcal{T}(\star, v) &= \mathsf{true} \\ \mathcal{T}(A \to B, n) &= \mathsf{false} \\ \mathcal{T}(A \to B, \emptyset) &= \mathsf{true} \\ \mathcal{T}(A \to B, v \mapsto v') &= \mathcal{T}(A, v) \text{ and } \mathcal{T}(B, v') \\ \mathcal{T}(A \to B, v_1 \sqcup v_2) &= \mathcal{T}(A \to B, v_1) \text{ and } \mathcal{T}(A \to B, v_2) \end{align*} With \(\mathcal{T}\) defined, we define the meaning of casts as follows. \begin{align*} \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!]\rho &= \{ v \mid v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \mathcal{T}(B, v) \}\\ & \quad\; \cup \left\{ \mathsf{blame}\,\ell \middle| \begin{array}{l} \exists v.\; v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \neg \mathcal{T}(B, v)\\ \text{and } (\forall \ell'. v \neq \mathsf{blame}\,\ell') \end{array}\right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in \mathcal{E} [\!| e |\!]\rho \} \end{align*} This version says that the result of the cast should be only those values of \(e\) that also have type \(B\). It also says that we signal an error when a value of \(e\) does not have type \(B\). Finally, if there was an error in \(e\), then we propagate it.
The really interesting thing about this semantics is that, unlike the reduction semantics, we actually check functions at the moment they go through the cast, instead of delaying the check to when they are called. We immediately determine whether the function is of the target type. If the function is not of the target type, we can immediately attribute blame to this cast, so there is no need for complex blame tracking rules.

Of course, we need to extend values to include blame: \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \mid \mathsf{blame}\,\ell \] and augment \(\mathcal{T}\) and \(\mathcal{F}\) to handle \(\mathsf{blame}\,\ell\). \begin{align*} \mathcal{T}(A\to B, \mathsf{blame}\,\ell) &= \mathsf{false} \\ \mathcal{F}(\mathsf{blame}\,\ell, \lambda x{:}A.\, e, \rho) &= \mathsf{false} \end{align*} To propagate errors to the meaning of the entire program, we augment the meaning of the other language forms, such as function application, to pass along blame. \begin{align*} \mathcal{E}[\!| e_1 \, e_2 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \text{ and } v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \text{and } (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\, \ell \mid \mathsf{blame}\, \ell \in \mathcal{E}[\!| e_1 |\!]\rho \text{ or } \mathsf{blame}\,\ell \in \mathcal{E}[\!| e_2 |\!]\rho\} \end{align*}
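The type predicate \(\mathcal{T}\) and the three clauses of the cast equation translate almost line for line into Python. Since \(\mathcal{E}[\!| e |\!]\rho\) is usually an infinite set, this sketch assumes it is handed a finite sample of that set and returns what the cast equation says about the sample. All names (has_type, cast, Blame, and the type and value classes) are hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class IntT:
    """The base type Int."""


@dataclass(frozen=True)
class Star:
    """The unknown type *."""


@dataclass(frozen=True)
class Fun:
    """The function type dom -> cod."""
    dom: "Type"
    cod: "Type"


Type = Union[IntT, Star, Fun]


@dataclass(frozen=True)
class Empty:
    """The empty table."""


@dataclass(frozen=True)
class Entry:
    """A single-entry table: inp |-> out."""
    inp: "Value"
    out: "Value"


@dataclass(frozen=True)
class Join:
    """The join of two tables."""
    left: "Value"
    right: "Value"


@dataclass(frozen=True)
class Blame:
    """The error result blame(label)."""
    label: str


Value = Union[int, Empty, Entry, Join, Blame]


def has_type(a: Type, v: Value) -> bool:
    """T(A, v): does the value v have type A?"""
    if isinstance(a, IntT):
        return isinstance(v, int)
    if isinstance(a, Star):
        return True
    # a is a function type: only tables qualify, checked entry by entry
    if isinstance(v, Empty):
        return True
    if isinstance(v, Entry):
        return has_type(a.dom, v.inp) and has_type(a.cod, v.out)
    if isinstance(v, Join):
        return has_type(a, v.left) and has_type(a, v.right)
    return False  # numbers and blame are not functions


def cast(sample: Set[Value], target: Type, label: str) -> Set[Value]:
    """Meaning of (e : A =>^label B), given a finite sample of E[e]."""
    result: Set[Value] = set()
    for v in sample:
        if isinstance(v, Blame):
            result.add(v)             # propagate blame raised inside e
        elif has_type(target, v):
            result.add(v)             # keep values of e that have type B
        else:
            result.add(Blame(label))  # some value of e is not of type B
    return result
```

For instance, casting the sample {Entry(1, 0), Entry(0, Empty())} to Fun(IntT(), IntT()) with label "l3" yields {Entry(1, 0), Blame("l3")}: the first table has type \(\mathtt{Int}\to\mathtt{Int}\), while the second maps \(0\) to a table rather than an integer.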

Two Examples

Let us consider the ramifications of this semantics. The following example program creates a function \(f\) that returns \(0\) on non-zero input and returns the identity function when applied to \(0\). We cast this function to the type \(\mathtt{Int}\to\mathtt{Int}\) on two separate occasions, cast \(\ell_3\) and cast \(\ell_4\), to create \(g\) and \(h\). We apply \(g\) to \(1\) and \(h\) to its result. \[ \begin{array}{l} \mathtt{let}\,f = \left(\lambda x:\mathtt{Int}.\; \begin{array}{l} \mathtt{if}\, x \,\mathtt{then}\, (0: \mathtt{Int}\Rightarrow^{\ell_1}\,\star)\\ \mathtt{else}\, ((\lambda y:\mathtt{Int}.\; y) : \mathtt{Int}\to\mathtt{Int}\Rightarrow^{\ell_2} \, \star) \end{array} \right) \; \mathtt{in} \\ \mathtt{let}\,g = (f : \mathtt{Int}\to\star \Rightarrow^{\ell_3} \mathtt{Int}\to\mathtt{Int})\, \mathtt{in} \\ \mathtt{let}\,h = (f : \mathtt{Int}\to\star \Rightarrow^{\ell_4} \mathtt{Int}\to\mathtt{Int})\, \mathtt{in} \\ \mathtt{let}\,z = (g \; 1)\, \mathtt{in} \\ \quad (h\; z) \end{array} \] The meaning of this program is \(\{ \mathsf{blame}\,\ell_3, \mathsf{blame}\,\ell_4\}\). To understand this outcome, we can analyze the meaning of the various parts of the program. (The semantics is compositional!) Toward writing down the denotation of \(f\), let's define auxiliary functions \(id\) and \(F\). \begin{align*} id(n) &= \mathsf{false} \\ id(\emptyset) &= \mathsf{true} \\ id(v \mapsto v') &= (v = v') \\ id(v_1 \sqcup v_2) &= id(v_1) \text{ and } id(v_2) \\ id(\mathsf{blame}\,\ell) &= \mathsf{false} \\ \\ F(n) &= \mathsf{false} \\ F(\emptyset) &= \mathsf{true} \\ F(0 \mapsto v) &= \mathit{id}(v) \\ F(n \mapsto 0) &= (n \neq 0)\\ F(v_1 \sqcup v_2) &= F(v_1) \text{ and } F(v_2) \\ F(\mathsf{blame}\,\ell) &= \mathsf{false} \end{align*} The denotation of \(f\) is \[ \mathcal{E}[\!| f |\!] = \{ v \mid F(v) \} \]

To express the denotation of \(g\), we define \(G\): \begin{align*} G(n) &= \mathsf{false} \\ G(\emptyset) &= \mathsf{true} \\ G(n \mapsto 0) &= (n \neq 0) \\ G(v_1 \sqcup v_2) &= G(v_1) \text{ and } G(v_2) \\ G(\mathsf{blame}\,\ell) &= \mathsf{false} \end{align*} The meaning of \(g\) consists of all the values that satisfy \(G\), together with \(\mathsf{blame}\,\ell_3\). \[ \mathcal{E}[\!| g |\!] = \{ v \mid G(v) \} \cup \{ \mathsf{blame}\, \ell_3 \} \] The meaning of \(h\) is similar, but with different blame. \[ \mathcal{E}[\!| h |\!] = \{ v \mid G(v) \} \cup \{ \mathsf{blame}\, \ell_4 \} \] The function \(g\) applied to \(1\) produces \(\{ 0, \mathsf{blame}\, \ell_3\}\), whereas \(h\) applied to \(0\) produces \(\{ \mathsf{blame}\, \ell_4\}\). Thus, the meaning of the whole program is \(\{ \mathsf{blame}\,\ell_3, \mathsf{blame}\,\ell_4\}\).
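The claim that the blame-free part of \(\mathcal{E}[\!| g |\!]\) is exactly \(\{ v \mid G(v) \}\) can be spot-checked mechanically. The Python sketch below transcribes \(id\), \(F\), and \(G\), together with \(\mathcal{T}(\mathtt{Int}\to\mathtt{Int}, \cdot)\), and checks on a small hand-picked sample that \(G(v)\) holds exactly when both \(F(v)\) and \(\mathcal{T}(\mathtt{Int}\to\mathtt{Int}, v)\) do. It assumes that cases not listed in the definitions of \(F\) and \(G\) are false; the sample and all names are hypothetical, and agreement on a finite sample is evidence rather than proof.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The empty table."""


@dataclass(frozen=True)
class Entry:
    """A single-entry table: inp |-> out."""
    inp: "Value"
    out: "Value"


@dataclass(frozen=True)
class Join:
    """The join of two tables."""
    left: "Value"
    right: "Value"


Value = Union[int, Empty, Entry, Join]


def is_id(v: Value) -> bool:
    """id(v): every entry of the table maps its input to itself."""
    if isinstance(v, Empty):
        return True
    if isinstance(v, Entry):
        return v.inp == v.out
    if isinstance(v, Join):
        return is_id(v.left) and is_id(v.right)
    return False


def F(v: Value) -> bool:
    """Tables for f: 0 maps to an identity table, n != 0 maps to 0."""
    if isinstance(v, Empty):
        return True
    if isinstance(v, Entry):
        if v.inp == 0:
            return is_id(v.out)
        return isinstance(v.inp, int) and v.out == 0
    if isinstance(v, Join):
        return F(v.left) and F(v.right)
    return False


def G(v: Value) -> bool:
    """Tables for g: only entries n |-> 0 with n != 0."""
    if isinstance(v, Empty):
        return True
    if isinstance(v, Entry):
        return isinstance(v.inp, int) and v.inp != 0 and v.out == 0
    if isinstance(v, Join):
        return G(v.left) and G(v.right)
    return False


def int_to_int(v: Value) -> bool:
    """T(Int -> Int, v) for blame-free values."""
    if isinstance(v, Empty):
        return True
    if isinstance(v, Entry):
        return isinstance(v.inp, int) and isinstance(v.out, int)
    if isinstance(v, Join):
        return int_to_int(v.left) and int_to_int(v.right)
    return False


SAMPLE = [
    Empty(), 7, Entry(1, 0), Entry(0, Empty()), Entry(0, Entry(5, 5)),
    Entry(0, 0), Entry(2, 7), Join(Entry(1, 0), Entry(3, 0)),
    Join(Entry(1, 0), Entry(0, Entry(2, 2))),
]

# G(v) should coincide with "F(v) and T(Int -> Int, v)" on every sampled value.
agree = all(G(v) == (F(v) and int_to_int(v)) for v in SAMPLE)
```

On this sample, agree is True: for example, \(0 \mapsto \emptyset\) satisfies \(F\) but is filtered out by the cast, and \(1 \mapsto 0\) survives it.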

Because cast \(\ell_3\) signals an error, one might be tempted to have the meaning of \(g\) be just \(\{ \mathsf{blame}\,\ell_3\}\). However, we want to allow implementations of this language that do not blame \(\ell_3\) (\(g\) is never applied to \(0\) after all, so its guilt was not directly observable) and instead blame \(\ell_4\), who was caught red-handed. So it is important for the meaning of \(g\) to include the subset of values from \(f\) that have type \(\mathtt{Int}\to\mathtt{Int}\) so that we can carry on and find other errors as well. We shall expect implementations of this language to be sound with respect to blame, that is, if execution results in blame, it should blame one of the labels that is in the denotation of the program (and not some other innocent cast).

Let us return to the example (P0). The denotation of that program is \(\{\mathsf{blame}\,\ell_2\}\) because the cast at \(\ell_2\) is a cast to \((\mathtt{Int}\to \mathtt{Int}) \to \mathtt{Int}\) and the identity function is not of that type. The other cast, at \(\ell_1\), is innocent because it is a cast to \(\star\) and all values are of that type, including the identity function.

Discussion

By giving the cast calculus a denotational semantics in terms of finite function tables, it became straightforward to define whether a function value is of a given type. This in turn made it easy to define the meaning of casts, even casts at function type. A cast succeeds if the input value is of the target type and it fails otherwise. With this semantics we assign blame to a cast in an eager fashion, without the need for the blame tracking machinery that is present in the operational semantics.

We saw an example program where the reduction semantics that factors through ground types attributes blame to a cast that the denotational semantics says is innocent. This is some evidence that the ground-type semantics is the less desirable of the two.

I plan to investigate whether the alternative reduction semantics is sound with respect to the denotational semantics in the sense that the reduction semantics only blames a cast if the denotational semantics says it is guilty.

Appendix

We give the full definition of the cast calculus here in the appendix. The relation \(\sqsubseteq\) that we used to define table lookup is the dual of the subtyping relation for intersection types. The denotational semantics is a mild reformulation of the intersection type system that I discussed in previous blog posts.

Syntax \[ \begin{array}{lcl} A &::=& \mathtt{Int} \mid A \to B \mid \star \\ e &::= &n \mid \mathit{op}(e,e) \mid \mathtt{if}\, e\, \mathtt{then}\, e \,\mathtt{else}\, e \mid x \mid \lambda x{:}A.\, e \mid e \; e \mid e : A \Rightarrow^\ell B \end{array} \]

Consistency \begin{gather*} \mathtt{Int} \sim \mathtt{Int} \qquad \frac{A \sim B \qquad A' \sim B'} {A \to A' \sim B \to B'} \qquad A \sim \star \qquad \star \sim B \end{gather*}

Type System \begin{gather*} \frac{}{\Gamma \vdash n : \mathtt{Int}} \quad \frac{\Gamma \vdash e_1 : \mathtt{Int} \quad \Gamma \vdash e_2 : \mathtt{Int}} {\Gamma \vdash \mathit{op}(e_1,e_2) : \mathtt{Int}} \\[2ex] \frac{\Gamma \vdash e_1 : \mathtt{Int} \quad \Gamma \vdash e_2 : A \quad \Gamma \vdash e_3 : A} {\Gamma \vdash \mathtt{if}\, e_1\, \mathtt{then}\, e_2 \,\mathtt{else}\, e_3 : A} \\[2ex] \frac{x{:}A \in \Gamma}{\Gamma \vdash x : A} \quad \frac{\Gamma,x{:}A \vdash e : B}{\Gamma \vdash \lambda x{:}A.\; e : A \to B} \quad \frac{\Gamma \vdash e_1 : A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B} \\[2ex] \frac{\Gamma \vdash e : A \quad A \sim B} {\Gamma \vdash (e : A \Rightarrow^\ell B) : B} \end{gather*}

Values \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \mid \mathsf{blame}\,\ell \]

Table Lookup (Value Information Ordering) \begin{gather*} \frac{}{n \sqsubseteq n} \quad \frac{v'_1 \sqsubseteq v_1 \quad v_2 \sqsubseteq v'_2} {v_1 \mapsto v_2 \sqsubseteq v'_1 \mapsto v'_2} \quad \frac{}{\mathsf{blame}\,\ell \sqsubseteq \mathsf{blame}\,\ell} \\[2ex] \frac{}{v_1 \sqsubseteq v_1 \sqcup v_2} \quad \frac{}{v_2 \sqsubseteq v_1 \sqcup v_2} \quad \frac{v_1 \sqsubseteq v_3 \quad v_2 \sqsubseteq v_3} {v_1 \sqcup v_2 \sqsubseteq v_3} \\[2ex] \frac{}{v_1 \mapsto (v_2 \sqcup v_3) \sqsubseteq (v_1 \mapsto v_2) \sqcup (v_1 \mapsto v_3)} \quad \frac{}{\emptyset \sqsubseteq v_1 \mapsto v_2} \quad \frac{}{\emptyset \sqsubseteq \emptyset} \end{gather*}

Semantics of Types \begin{align*} \mathcal{T}(\mathtt{Int}, v) &= (\exists n. \; v = n) \\ \mathcal{T}(\star, v) &= \mathsf{true} \\ \mathcal{T}(A \to B, n) &= \mathsf{false} \\ \mathcal{T}(A \to B, \emptyset) &= \mathsf{true} \\ \mathcal{T}(A \to B, v \mapsto v') &= \mathcal{T}(A, v) \text{ and } \mathcal{T}(B, v') \\ \mathcal{T}(A \to B, v_1 \sqcup v_2) &= \mathcal{T}(A \to B, v_1) \text{ and } \mathcal{T}(A \to B, v_2) \\ \mathcal{T}(A\to B, \mathsf{blame}\,\ell) &= \mathsf{false} \end{align*}

Denotational Semantics \begin{align*} \mathcal{E}[\!| n |\!]\rho &= \{ n \}\\ \mathcal{E}[\!| \mathit{op}(e_1,e_2) |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2 n_1 n_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \land\; n_1 \sqsubseteq v_1 \land n_2 \sqsubseteq v_2 \land v = [\!| \mathit{op} |\!](n_1,n_2) \end{array} \right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho) \} \\ \mathcal{E}[\!| \mathtt{if}\, e_1\, \mathtt{then}\, e_2 \,\mathtt{else}\, e_3 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 n.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land n \sqsubseteq v_1 \\ \land\; (n = 0 \Longrightarrow v \in \mathcal{E}[\!| e_3 |\!]\rho) \\ \land\; (n \neq 0 \Longrightarrow v \in \mathcal{E}[\!| e_2 |\!]\rho) \end{array} \right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho \cup \mathcal{E} [\!| e_3 |\!]\rho ) \} \\ \mathcal{E}[\!| x |\!]\rho &= \{ \rho(x) \}\\ \mathcal{E}[\!| \lambda x{:}A.\, e|\!]\rho &= \{ v \mid \mathcal{F}(v, \lambda x{:}A.\, e, \rho) \} \\ \mathcal{E}[\!| e_1 \, e_2 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \land\; (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho) \} \\ \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!]\rho &= \{ v \mid v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \mathcal{T}(B, v) \}\\ & \quad\; \cup \left\{ \mathsf{blame}\,\ell \middle| \begin{array}{l}\exists v.\; v \in \mathcal{E} [\!| e |\!] \rho \text{ and } \neg \mathcal{T}(B, v)\\ \text{and } (\forall \ell'. v \neq \mathsf{blame}\,\ell') \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in \mathcal{E} [\!| e |\!]\rho \} \\ \mathcal{F}(n, \lambda x{:}A.\, e, \rho) &= \mathsf{false} \\ \mathcal{F}(\emptyset, \lambda x{:}A.\, e, \rho) &= \mathsf{true} \\ \mathcal{F}(v \mapsto v', \lambda x{:}A.\, e, \rho) &= \mathcal{T}(A,v) \text{ and } v' \in \mathcal{E}[\!| e |\!]\rho(x{:=}v) \\ \mathcal{F}(v_1 \sqcup v_2, \lambda x{:}A.\, e, \rho) &= \mathcal{F}(v_1, \lambda x{:}A.\, e, \rho) \text{ and } \mathcal{F}(v_2, \lambda x{:}A.\, e, \rho) \\ \mathcal{F}(\mathsf{blame}\,\ell, \lambda x{:}A.\, e, \rho) &= \mathsf{false} \end{align*}

References

  • (Findler 2002) Contracts for higher-order functions. R. B. Findler and M. Felleisen. International Conference on Functional Programming, 2002.
  • (Gronski 2006) Sage: Hybrid Checking for Flexible Specifications. Jessica Gronski, Kenneth Knowles, Aaron Tomb, Stephen N. Freund, and Cormac Flanagan. Scheme and Functional Programming Workshop, 2006.
  • (Scott 1970) Outline of a Mathematical Theory of Computation. Dana Scott. Technical Report PRG-2, Oxford University, 1970.
  • (Scott 1976) Data Types as Lattices. Dana Scott. SIAM Journal on Computing, Volume 5, Number 3, 1976.
  • (Siek 2009) Exploring the Design Space of Higher-Order Casts. Jeremy G. Siek, Ronald Garcia, and Walid Taha. European Symposium on Programming, 2009.
  • (Wadler 2009) Well-typed programs can't be blamed. Philip Wadler and Robert Bruce Findler. European Symposium on Programming, 2009.

Monday, January 30, 2017

Completeness of Intersection Types wrt. an Applied CBV Lambda Calculus

I'm still quite excited about the simple denotational semantics and looking forward to applying it to the semantics of gradually typed languages. However, before building on it I'd like to make sure it's correct. Recall that I proved soundness of the simple semantics with respect to a standard big-step operational semantics, but I did not prove completeness. Completeness says that if the operational semantics says that the program reduces to a particular value, then the denotational semantics does too. Recall that the first version of the simple semantics that I gave was not complete! It couldn't handle applying a function to itself, which is needed for the \(Y\) combinator and recursion. I've written down a fix, but the question remains whether the fix is good enough, that is, whether we can prove completeness. In the meantime, I learned that the simple semantics is closely related to filter models based on type systems with intersection types. This is quite helpful because that literature includes many completeness results for pure lambda calculi; see, for example, Intersection Types and Computational Rules by Alessi, Barbanera, and Dezani-Ciancaglini (2003).

In this blog post I prove completeness for an intersection type system with respect to a call-by-value lambda calculus augmented with numbers, primitive operators (addition, multiplication, etc.), and a conditional if-expression. The main outline of the proof is adapted from the above-cited paper, in which completeness is proved with respect to small-step operational semantics, though you'll find more details (i.e. lemmas) here because I've mechanized the proof in Isabelle and can't help but share in my suffering ;) (Lambda.thy, SmallStepLam.thy, IntersectComplete.thy) Ultimately I would like to prove completeness for the simple denotational semantics, but a good first step is doing the proof for a system that is in between the simple semantics and the intersection type systems in the literature.

The intersection type system I use here differs from ones in the literature in that I restrict the \(\wedge\) introduction rule to \(\lambda\)'s instead of applying it to any expression, as shown below. I recently realized that this change does not disturb the proof of Completeness because we're dealing with a call-by-value language. \[ \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\, e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B}(\wedge\,\mathrm{intro}) \] I would like to remove the subsumption rule \[ \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \] but doing so was increasing the complexity of the proof of Completeness. Instead I plan to separately prove that the version without subsumption is equivalent to the version with subsumption. One might also consider doing the same regarding our above change to the \(\wedge\) introduction rule. I have also been working on that approach, but proving the admissibility of the standard \(\wedge\) introduction rule has turned out to be rather difficult (but interesting!).

Definition of an Applied CBV Lambda Calculus

Let us dive into the formalities and define the language that we're interested in. Here are the types, which include function types, intersection types, the top function type (written \(\top\)), and singleton numbers. Our \(\top\) corresponds to the type \(\nu\) from Egidi, Honsell, and Rocca (1992). See also Alessi et al. (2003). \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \mid n \] and here are the expressions: \[ e ::= n \mid \mathit{op}(e,e) \mid \mathrm{if}\,e\,\mathrm{then}\,e\,\mathrm{else}\,e \mid x \mid \lambda x.\, e \mid e\,e \] where \(n\) ranges over numbers and \(\mathit{op}\) ranges over arithmetic operators such as addition.

We define type environments as an association list mapping variables to types. \[ \Gamma ::= \emptyset \mid \Gamma,x:A \]

The type system, defined below, is unusual in that it is highly precise. Note that the rule for arithmetic operators produces a precise singleton result and that the rules for if-expressions require the condition to be a singleton number (zero or non-zero) so that it knows which branch is taken. Thus, this type system is really a kind of dynamic semantics.

\begin{gather*} \frac{}{\Gamma \vdash n : n} \\[2ex] \frac{} {\Gamma \vdash \lambda x.\, e : \top}(\top\,\mathrm{intro}) \quad \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\, e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B}(\wedge\,\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash x : A} \quad \frac{\Gamma,x:A \vdash e : B} {\Gamma \vdash \lambda x.\, e : A \to B}(\to\mathrm{intro}) \quad \frac{\Gamma \vdash e_1: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B}(\to\mathrm{elim}) \\[2ex] \frac{\Gamma \vdash e_1 : n_1 \quad \Gamma \vdash e_2 : n_2 \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash e_1 : 0 \quad \Gamma \vdash e_3 : B} {\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \quad \frac{\Gamma \vdash e_1 : n \quad n \neq 0 \quad \Gamma \vdash e_2 : A} {\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : A} \end{gather*}

The rules for subtyping come from the literature.

\begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(<:\to) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \end{gather*}

We shall be working with values that are well typed in an empty type environment. This usually implies that the values have no free variables. However, that is not true of the current type system because of the \(\top\) introduction rule. So we add a side condition for \(\lambda\) in our definition of values. (In retrospect, I should have instead included a statement about free variables in the main Completeness theorem and then propagated that information to where it is needed.) \[ v ::= n \mid \lambda x.\, e \quad \text{where } FV(e) \subseteq \{x\} \]

We use a naive notion of substitution (not capture avoiding) because the \(v\)'s have no free variables to capture. \begin{align*} [x:=v] y &= \begin{cases} v & \text{if } x = y \\ y & \text{if } x \neq y \end{cases} \\ [x:=v] n &= n \\ [x:=v] (\lambda y.\, e) &= \begin{cases} \lambda y.\, e & \text{if } x = y \\ \lambda y.\, [x:=v] e & \text{if } x \neq y \end{cases} \\ [x:=v](e_1\, e_2) &= ([x:=v]e_1\, [x:=v]e_2) \\ [x:=v]\mathit{op}(e_1, e_2) &= \mathit{op}([x:=v]e_1, [x:=v]e_2) \\ [x:=v](\mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3) &= \mathrm{if}\,[x:=v]e_1\,\mathrm{then}\,[x:=v]e_2\,\mathrm{else}\,[x:=v]e_3 \end{align*}

The small-step operational semantics is defined by the following reduction rules. I'm not sure why I chose to use SOS-style rules instead of evaluation contexts. \begin{gather*} \frac{}{(\lambda x.\,e) \; v \longrightarrow [x:=v]e} \quad \frac{e_1 \longrightarrow e'_1}{e_1\,e_2 \longrightarrow e'_1 \, e_2} \quad \frac{e_2 \longrightarrow e'_2}{e_1\,e_2 \longrightarrow e_1 \, e'_2} \\[2ex] \frac{}{\mathit{op}(n_1,n_2) \longrightarrow [\!|\mathit{op}|\!](n_1,n_2)} \quad \frac{e_1 \longrightarrow e'_1} {\mathit{op}(e_1,e_2) \longrightarrow \mathit{op}(e'_1,e_2)} \quad \frac{e_2 \longrightarrow e'_2} {\mathit{op}(e_1,e_2) \longrightarrow \mathit{op}(e_1,e'_2)} \\[2ex] \frac{}{\mathrm{if}\,0\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow e_3} \quad \frac{n \neq 0} {\mathrm{if}\,n\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow e_2} \\[2ex] \frac{e_1 \longrightarrow e'_1} {\mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow \mathrm{if}\,e'_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3} \end{gather*} \[ \frac{}{e \longrightarrow^{*} e} \qquad \frac{e_1 \longrightarrow e_2 \quad e_2 \longrightarrow^{*} e_3} {e_1 \longrightarrow^{*} e_3} \]
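The substitution function and the reduction rules are small enough to transcribe directly. The Python sketch below uses a hypothetical tuple encoding of expressions and the same naive substitution. The rules above are nondeterministic for applications and primitive operations (either operand may step); the sketch fixes one strategy, stepping the left operand to a value before the right one, so every step it takes is an instance of one of the rules. Only \(+\) and \(\times\) are supplied as example operators, and the closedness side condition on values is not checked.

```python
import operator
from typing import Optional, Tuple

# Hypothetical tuple encoding of expressions:
#   ("num", n) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
#   ("op", name, e1, e2) | ("if", e1, e2, e3)
Expr = Tuple

OPS = {"+": operator.add, "*": operator.mul}  # example operators only


def is_value(e: Expr) -> bool:
    """Numbers and lambdas are values."""
    return e[0] in ("num", "lam")


def subst(x: str, v: Expr, e: Expr) -> Expr:
    """Naive [x:=v]e, safe only because v has no free variables."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "num":
        return e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(x, v, e[2]))
    if tag == "app":
        return ("app", subst(x, v, e[1]), subst(x, v, e[2]))
    if tag == "op":
        return ("op", e[1], subst(x, v, e[2]), subst(x, v, e[3]))
    if tag == "if":
        return ("if", subst(x, v, e[1]), subst(x, v, e[2]), subst(x, v, e[3]))
    raise ValueError(f"unknown expression: {e!r}")


def step(e: Expr) -> Optional[Expr]:
    """One reduction step, or None if no rule applies."""
    tag = e[0]
    if tag in ("app", "op"):
        # The operands sit at e[1], e[2] for app and at e[2], e[3] for op.
        i = 1 if tag == "app" else 2
        left, right = e[i], e[i + 1]
        if not is_value(left):
            s = step(left)
            return None if s is None else e[:i] + (s, right)
        if not is_value(right):
            s = step(right)
            return None if s is None else e[:i] + (left, s)
        if tag == "app" and left[0] == "lam":
            return subst(left[1], right, left[2])         # beta for values
        if tag == "op" and left[0] == "num" and right[0] == "num":
            return ("num", OPS[e[1]](left[1], right[1]))  # primitive op
        return None
    if tag == "if":
        cond = e[1]
        if cond[0] == "num":
            return e[3] if cond[1] == 0 else e[2]
        s = step(cond)
        return None if s is None else ("if", s, e[2], e[3])
    return None  # numbers, lambdas, and free variables do not step


def evaluate(e: Expr, fuel: int = 10_000) -> Expr:
    """Take steps until no rule applies (or the fuel runs out)."""
    for _ in range(fuel):
        s = step(e)
        if s is None:
            return e
        e = s
    raise RuntimeError("out of fuel")
```

For example, evaluate(("app", inc, ("num", 41))), where inc encodes \(\lambda x.\, {+}(x,1)\) as ("lam", "x", ("op", "+", ("var", "x"), ("num", 1))), returns ("num", 42).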

Proof of Completeness

The theorem that we aim to prove is that if the operational semantics says that a program reduces to a value, then the program is typable in the intersection type system and that the result type precisely describes the result value. I'm going to present the proof in a top-down style, so the proof of each lemma that I use is found further along in this blog post.

Theorem (Completeness)
If \(e \longrightarrow^{*} v\), then \(\emptyset \vdash e : A\) and \(\emptyset \vdash v : A\) for some type \(A\).
Proof
Every value is typable (use the \(\top\) introduction rule for \(\lambda\)), so we have some \(A\) such that \(\emptyset \vdash v : A\). We shall show that typing is preserved by reverse reduction, which will give us \(\emptyset \vdash e : A\). QED

Lemma (Reverse Multi-Step Preserves Types)
If \(e \longrightarrow^{*} e'\) and \(\emptyset \vdash e' : A\), then \(\emptyset \vdash e : A\).
Proof
The proof is by induction on the derivation of \(e \longrightarrow^{*} e'\). The base case is trivial. The induction case requires that typing be preserved across a single step of reduction, which we prove next. QED

Lemma (Reverse Single-Step Preserves Types)
If \(e \longrightarrow e'\) and \(\emptyset \vdash e' : A\), then \(\emptyset \vdash e : A\).
Proof
The proof is by induction on the derivation of \(e \longrightarrow e'\). The most important case is for function application: \[ (\lambda x.\,e) \; v \longrightarrow [x:=v]e \] We have that \(\emptyset \vdash [x:=v]e : A\) and need to show that \(\emptyset \vdash (\lambda x.\,e) \; v : A\). That is, we need to show that call-by-value \(\beta\)-expansion preserves types. So we need \(x:B \vdash e : A\) and \(\emptyset \vdash v : B\) for some type \(B\). The proof of this was the crux and required some generalization; it was difficult to find the right statement of the lemma. It is proved below under the name Reverse Substitution Preserves Types. The other cases of this proof are straightforward except for one hiccup. They all require inversion lemmas (a.k.a. generation lemmas) to unpack the information from \(\emptyset \vdash e' : A\). However, as is usual for languages with subsumption, the inversion lemmas are not simply proved by case analysis on typing rules, but must instead be proved by induction on the typing derivations. QED

Lemma (Inversion)

  1. If \(\Gamma \vdash n : A \), then \(n <: A\).
  2. If \(\Gamma \vdash e_1\,e_2 : A \), then \( \Gamma \vdash e_1 : B \to A' \), \( A' <: A \), and \( \Gamma \vdash e_2 : B \) for some \(A'\) and \(B\).
  3. If \(\Gamma \vdash \mathit{op}(e_1,e_2) : A\), then \(\Gamma \vdash e_1 : n_1\), \(\Gamma \vdash e_2 : n_2 \), \(\Gamma \vdash \mathit{op}(e_1,e_2) : [\!|\mathit{op}|\!](n_1,n_2) \), and \([\!|\mathit{op}|\!](n_1,n_2) <: A\) for some \(n_1\) and \(n_2\).
  4. If \(\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : A\), then either
    • \(\Gamma \vdash e_1 : 0\), \(\Gamma\vdash e_3 : B\), and \(B <: A\), for some \(B\).
    • \(\Gamma \vdash e_1 : n\), \(n \neq 0\), \(\Gamma\vdash e_2 : A'\), and \(A' <: A\), for some \(A'\).
Proof The proofs are by induction on the derivation of typing. QED

To state the reverse substitution lemma in a way that provides a useful induction hypothesis in the case for \(\lambda\), we introduce a notion of equivalence of type environments: \[ \Gamma \approx \Gamma' = (x : A \in \Gamma \text{ iff } x : A \in \Gamma') \] The reverse substitution lemma will show that if \([y:=v]e\) is well typed, then so is \(e\) in the environment extended with \(y:B\), for some appropriate choice of \(B\). Now, the value \(v\) may appear in multiple places within \([y:=v]e\), and in each place \(v\) may have been assigned a different type. For example, \(v\) could be \(\lambda x.\, {+}(x,1)\) and it could have the type \(0\to 1\) in one place and \(1\to 2\) in another place. However, we must choose a single type \(B\) for \(y\). Thanks to intersection types, we can choose \(B\) to be the intersection of all the types assigned to \(v\).
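To see the intersection at work, consider (as an illustrative choice of ours, not an example taken from the mechanization) \(e = y\,(y\,0)\) with \(v = \lambda x.\, {+}(x,1)\), so that \([y:=v]e = v\,(v\,0)\). The inner copy of \(v\) is used at \(0 \to 1\) and the outer copy at \(1 \to 2\). Taking \(B = (0\to 1) \wedge (1 \to 2)\) and \(\Gamma' = y:B\), the (Sub) and \((\to\mathrm{elim})\) rules recover a typing of \(e\):

\begin{align*} \Gamma' &\vdash y : 0 \to 1 && \text{by (Sub), since } (0\to 1)\wedge(1\to 2) <: 0 \to 1 \\ \Gamma' &\vdash y\;0 : 1 && \text{by } (\to\mathrm{elim}) \text{ with } \Gamma' \vdash 0 : 0 \\ \Gamma' &\vdash y : 1 \to 2 && \text{by (Sub), since } (0\to 1)\wedge(1\to 2) <: 1 \to 2 \\ \Gamma' &\vdash y\;(y\;0) : 2 && \text{by } (\to\mathrm{elim}) \end{align*}

Meanwhile \(\emptyset \vdash v : B\) follows from \(\emptyset \vdash v : 0 \to 1\) and \(\emptyset \vdash v : 1 \to 2\) by \((\wedge\,\mathrm{intro})\).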

Lemma (Reverse Substitution Preserves Types)
If \(\Gamma \vdash [y:=v]e : A\) and \(y \notin \mathrm{dom}(\Gamma)\), then \( \emptyset \vdash v : B \), \( \Gamma' \vdash e : A\), and \(\Gamma' \approx \Gamma,y:B\) for some \(\Gamma'\) and \(B\).
Proof The proof is by induction on the derivation of \(\Gamma \vdash [y:=v]e : A\). (I wonder if the proof would have been easier if done by induction on \(e\).) The proof is rather long, so I'll just highlight the lemmas that were needed here. The full details are in the Isabelle mechanization.

  • The cases for variables and numbers are relatively straightforward.
  • The case for \(\lambda\) requires lemmas regarding Environment Strengthening and Environment Lowering and their corollaries.
  • The case for subsumption is relatively easy.
  • The case for function application is interesting. We have \( (e_1 \, e_2) = [y:=v]e \), so \(e = e'_1 \, e'_2\) where \(e_1 = [y:=v]e'_1\) and \(e_2 = [y:=v]e'_2\). From the induction hypotheses for \(e_1\) and \(e_2\), we have \(\emptyset \vdash v : B_1\) and \(\emptyset \vdash v : B_2\). The lemma Combine Values gives us some \(B_3\) such that \(\emptyset \vdash v : B_3\) and \(B_3 <: B_1\) and \(B_3 <: B_2\). We choose \(\Gamma' = \Gamma,y:B_3\). To show \(\Gamma' \vdash e'_1\, e'_2 : A\) we use the induction hypotheses for \(e_1\) and \(e_2\), along with the lemmas Equivalent Environments and Environment Lowering.
  • The case for \(\top\) introduction is straightforward.
  • The case for \(\wedge\) introduction uses the lemmas Well-typed with No Free Variables, Environment Strengthening, Combine Values, Equivalent Environments, and Environment Lowering.
  • The cases for arithmetic operators and if-expressions follow a pattern similar to that of function application.
QED

Lemma (Environment Strengthening)
If \(\Gamma \vdash e : A\) and, for every free variable \(x\) in \(e\) and every type \(B\), \( x:B \in \Gamma \text{ iff } x:B \in \Gamma' \), then \(\Gamma' \vdash e : A\).
Proof The proof is by induction on the derivation of \(\Gamma \vdash e : A\). QED

Corollary (Well-typed with No Free Variables)
If \(\Gamma \vdash e : A\) and \(\mathit{FV}(e) = \emptyset\), then \(\emptyset \vdash e : A\).

We define the following ordering relation on environments: \[ \Gamma \sqsupseteq \Gamma' = (x:A \in \Gamma \Longrightarrow x:A' \in \Gamma' \text{ and } A' <: A \text{ for some } A') \]

Lemma (Environment Lowering)
If \(\Gamma \vdash e : A\) and \(\Gamma \sqsupseteq \Gamma'\), then \(\Gamma' \vdash e : A\).
Proof The proof is by induction on the derivation of \(\Gamma \vdash e : A\). QED

Corollary (Equivalent Environments)
If \(\Gamma \vdash e : A\) and \(\Gamma \approx \Gamma'\), then \(\Gamma' \vdash e : A\).
Proof If \(\Gamma \approx \Gamma'\), then we also have \(\Gamma \sqsupseteq \Gamma'\) (take \(A' = A\), since \(<:\) is reflexive), so we conclude by applying Environment Lowering. QED

Lemma (Combine Values)
If \(\Gamma \vdash v : B_1\) and \(\Gamma \vdash v : B_2\), then \(\Gamma \vdash v : B_3\), \(B_3 <: B_1 \wedge B_2\), and \(B_1 \wedge B_2 <: B_3\) for some \(B_3\).
Proof The proof is by cases on \(v\). It uses the Inversion lemma for numbers and the \(\wedge\) introduction rule for \(\lambda\)'s. QED

Saturday, January 14, 2017

Intersection Types as Denotations

In my previous post I described a simple denotational semantics for the CBV lambda calculus in which the meaning of a \(\lambda\) function is a set of tables. For example, here is a glimpse at some of the tables in the meaning of \(\lambda x. x+2\).

\[ E[\!| (\lambda x. x+2) |\!](\emptyset) = \left\{ \begin{array}{l} \emptyset, \\ \{ 5\mapsto 7 \},\\ \{ 0\mapsto 2, 1 \mapsto 3 \},\\ \{ 0\mapsto 2, 1\mapsto 3, 5 \mapsto 7 \}, \\ \vdots \end{array} \right\} \]
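To make "a set of tables" concrete, here is a small membership test for this particular meaning. It is a sketch under our own encoding assumptions (numbers as Python ints, a table as a frozenset of (input, output) pairs; the name `in_meaning_plus2` is hypothetical), and it assumes, as in the table semantics, that \(x+2\) produces a result only when \(x\) is bound to a number.

```python
def in_meaning_plus2(table: frozenset) -> bool:
    """Is `table` one of the tables in E[[λx. x+2]](∅)?

    Assumes + yields a result only for numeric arguments, so every
    entry must map some number n to n + 2. The empty table passes
    vacuously, matching the ∅ in the set above.
    """
    return all(isinstance(n, int) and out == n + 2 for (n, out) in table)
```

So \(\emptyset\), \(\{5\mapsto 7\}\), and \(\{0\mapsto 2, 1\mapsto 3, 5\mapsto 7\}\) are accepted, while \(\{0 \mapsto 3\}\) is not.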

Since then I've been reading the literature starting from an observation by Alan Jeffrey that this semantics seems similar to the domain logic in Abramsky's Ph.D. thesis (1987). That in turn pointed me to the early literature on intersection types, which were invented in the late 1970's by Coppo, Dezani-Ciancaglini, Salle, and Pottinger. It turns out that one of the motivations for intersection types was to create a denotational semantics for the lambda calculus. Furthermore, it seems that intersection types are closely related to my simple denotational semantics!

The intersection types for the pure lambda calculus included function types, intersections, and a top type: \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \] For our purposes we shall also add singleton types for numbers. \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \mid n \] So the number \(2\) has the singleton type \(2\) and any function that maps \(0\) to \(2\) will have the type \(0 \to 2\). Any function that maps \(0\) to \(2\) and also maps \(1\) to \(3\) has the intersection type \[ (0 \to 2) \wedge (1 \to 3) \] These types are starting to look a lot like the tables above! Indeed, even the empty table \(\emptyset\) corresponds to the top type \(\top\): both can be associated with any \(\lambda\) function.

The addition of the singleton number types introduces a choice regarding the top type \(\top\). Does it include the numbers and functions or just functions? We shall go with the latter, which corresponds to the \(\nu\) type in the literature (Egidi, Honsell, Rocca 1992).

Now that we have glimpsed the correspondence between tables and intersection types, let's review the typing rules for the implicitly typed lambda calculus with singletons, intersections, and \(\top\).

\begin{gather*} \frac{}{\Gamma \vdash n : n} \\[2ex] \frac{}{\Gamma \vdash \lambda x.\,e : \top}(\top\,\mathrm{intro}) \quad \frac{\Gamma \vdash e : A \quad \Gamma \vdash e : B} {\Gamma \vdash e : A \wedge B}(\wedge\,\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \quad \frac{x:A \in \Gamma}{\Gamma \vdash x : A} \\[2ex] \frac{\Gamma,x:A \vdash e : B} {\Gamma \vdash \lambda x.\, e : A \to B} \quad \frac{\Gamma \vdash e_1: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B}(\to\mathrm{elim}) \end{gather*} where subtyping is defined as follows \begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \end{gather*}

With intersection types, one can write the same type in many different ways. For example, the type \(5\) is the same as \(5 \wedge 5\). One common way to define such equalities is in terms of subtyping: \(A = B\) iff \(A <: B\) and \(B <: A\).
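A minimal sketch of the subtyping rules, assuming a tiny Python encoding of the types (the classes `Num`, `Top`, `Arrow`, `Inter` and the functions `sub` and `equiv` are hypothetical, not from the original). It reads the rules above literally: each rule either matches syntactically or recurses on smaller types. Since the rules as listed contain no transitivity rule, this decides exactly those rules; a system that also had transitivity could accept more pairs.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:          # singleton number type n
    n: int

@dataclass(frozen=True)
class Top:          # the ν-style top: above functions only
    pass

@dataclass(frozen=True)
class Arrow:        # A -> B
    dom: "Ty"
    cod: "Ty"

@dataclass(frozen=True)
class Inter:        # A ∧ B
    left: "Ty"
    right: "Ty"

Ty = Union[Num, Top, Arrow, Inter]

def sub(s: Ty, t: Ty) -> bool:
    """Is s <: t derivable from the listed rules (read literally)?"""
    # n <: n
    if isinstance(s, Num) and isinstance(t, Num) and s.n == t.n:
        return True
    # ⊤ <: ⊤ and A -> B <: ⊤ (numbers are not below ⊤)
    if isinstance(t, Top) and isinstance(s, (Top, Arrow)):
        return True
    # A -> B <: A' -> B' if A' <: A and B <: B'
    if isinstance(s, Arrow) and isinstance(t, Arrow):
        if sub(t.dom, s.dom) and sub(s.cod, t.cod):
            return True
    # C <: A ∧ B if C <: A and C <: B
    if isinstance(t, Inter) and sub(s, t.left) and sub(s, t.right):
        return True
    # A ∧ B <: A and A ∧ B <: B (axioms, so a syntactic match)
    if isinstance(s, Inter) and t in (s.left, s.right):
        return True
    # (C -> A) ∧ (C -> B) <: C -> (A ∧ B)
    if (isinstance(s, Inter) and isinstance(s.left, Arrow)
            and isinstance(s.right, Arrow) and isinstance(t, Arrow)
            and isinstance(t.cod, Inter)
            and s.left.dom == s.right.dom == t.dom
            and s.left.cod == t.cod.left and s.right.cod == t.cod.right):
        return True
    return False

def equiv(a: Ty, b: Ty) -> bool:
    """A = B iff A <: B and B <: A."""
    return sub(a, b) and sub(b, a)
```

For instance, `equiv(Num(5), Inter(Num(5), Num(5)))` holds, while `sub(Num(5), Top())` does not, matching the choice of \(\top\) made above.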

So how does one define a semantics using intersection types? Barendregt, Coppo, Dezani-Ciancaglini (1983) (BCD) define the meaning of an expression \(e\) to be the set of types for which it is typable, something like \[ [\!| e |\!](\Gamma) = \{ A \mid \Gamma \vdash e : A \} \] For a simple type system (without intersection), such a semantics would not be useful. Any term with self application (needed for recursion) would not type check and therefore its meaning would be the empty set. But with intersection types, the semantics gives a non-empty meaning to all terminating programs!

The next question is, how does the BCD semantics relate to my simple table-based semantics? One difference is that the intersection type system has two rules that are not syntax directed: \((\wedge\,\mathrm{intro})\) and (Sub). However, we can get rid of these rules. The \((\wedge\,\mathrm{intro})\) rule is not needed for numbers, only for functions. So one should be able to move all uses of the \((\wedge\,\mathrm{intro})\) rule to \(\lambda\)'s. \[ \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\, e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B} \] To get rid of (Sub), we need to modify \((\to\mathrm{elim})\) to allow for the possibility that \(e_1\) is not literally of function type. \[ \frac{\Gamma \vdash e_1 : C \quad C <: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B} \]

All of the rules are now syntax directed. There are now three rules for \(\lambda\), but they handle the three different possible types for a \(\lambda\) function: \(A \to B\), \(A \wedge B\), and \(\top\). Next we observe that a relation is isomorphic to a function that produces a set. So we change from \(\Gamma \vdash e : A\) to \(E[\!| e |\!](\Gamma) = \mathcal{A}\) where \(\mathcal{A}\) ranges over sets of types, i.e., \(\mathcal{A} \in \mathcal{P}(A)\). We make use of an auxiliary function \(F\) to define the meaning of \(\lambda\) functions. \begin{align*} E[\!| n |\!](\Gamma) & = \{ n \} \\ E[\!| x |\!](\Gamma) & = \{ \Gamma(x) \} \\ E[\!| \lambda x.\, e |\!](\Gamma) & = \{ A \mid F(A,x,e,\Gamma) \} \\ E[\!| e_1 \; e_2 |\!](\Gamma) & = \left\{ B \middle| \begin{array}{l} C \in E[\!| e_1 |\!](\Gamma) \\ \land\; A \in E[\!| e_2 |\!](\Gamma) \\ \land\; C <: A \to B \end{array} \right\} \\ \\ F(A \to B, x,e,\Gamma) &= B \in E[\!| e |\!](\Gamma(x:=A)) \\ F(A \wedge B, x,e,\Gamma) &= F(A, x,e,\Gamma) \land F(B, x,e,\Gamma) \\ F(\top, x,e,\Gamma) &= \mathrm{true} \end{align*}

I conjecture that this semantics is equivalent to the "take 3" semantics. There are a couple of remaining differences, and here's why I don't think they matter. Regarding the case for \(\lambda\) in \(E\), the type \(A\) can be viewed as an alternative representation for a table. The function \(F\) essentially checks that all entries in the table jibe with the meaning of the \(\lambda\)'s body, which is what the clause for \(\lambda\) does in the "take 3" semantics. Regarding the case for application in \(E\), the \(C\) is a table and \(C <: A \to B\) means that there is some entry \(A' \to B'\) in the table \(C\) such that \(A' \to B' <: A \to B\), which means \(A <: A'\) and \(B' <: B\). The \(A <: A'\) corresponds to our use of \(\sqsubseteq\) in the "take 3" semantics. The \(B' <: B\) doesn't matter.

There's an interesting duality and change of viewpoint going on here between the table-based semantics and the intersection types. The table-based semantics is concerned with what values are produced by a program whereas the intersection type system is concerned with specifying what kind of values are allowed, but the types are so precise that it becomes dual in a strong sense to the values themselves. To make this precise, we can talk about tables in terms of their finite graphs (sets of pairs), and create them using \(\emptyset\), union, and a singleton input-output pair \(\{(v_1,v_2)\}\). With this formulation, tables are literally dual to types, with \(\{(v_1,v_2)\}\) corresponding to \(v_1 \to v_2\), union corresponding to intersection, empty set corresponding to \(\top\), and \(T_1 \subseteq T_2\) corresponding to \(T_2 <: T_1\).
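The duality can be sketched in a few lines, under our own encoding assumptions: numbers are Python ints, a table is a frozenset of (input, output) pairs, and types are rendered as strings (the names `to_type` and `arrows` are hypothetical). Each entry becomes an arrow, union becomes intersection, and the empty table becomes \(\top\); the conjuncts are sorted only to make the rendering deterministic.

```python
def arrows(table: frozenset) -> set:
    """The conjuncts of the type dual to `table`: one arrow per entry."""
    return {f"({to_type(a)} → {to_type(b)})" for (a, b) in table}

def to_type(v) -> str:
    """Render a table-semantics value as its dual intersection type."""
    if isinstance(v, int):
        return str(v)                       # a number n has singleton type n
    if not v:
        return "⊤"                          # empty table  <->  top
    return " ∧ ".join(sorted(arrows(v)))    # union  <->  intersection
```

If \(T_1 \subseteq T_2\), then every conjunct of the type of \(T_1\) also appears in the type of \(T_2\), which is why the type of \(T_2\) is a subtype of the type of \(T_1\).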

Wednesday, December 21, 2016

Take 3: Application with Subsumption for Den. Semantics of Lambda Calculus

Alan Jeffrey tweeted the following in reply to the previous post:

@jeremysiek wouldn't it be easier to change the defn of application to be
⟦MN⟧σ = { W | T ∈ ⟦M⟧σ, V ∈ ⟦N⟧σ, (V′,W) ∈ T, V′ ⊆ V }?

The idea is that, for higher-order functions, if the function \(M\) is expecting to ask all the questions in the table \(V'\), then it is OK to apply \(M\) to a table \(V\) that answers more questions than \(V'\). This idea is quite natural; it is like Liskov's substitution principle but for functions instead of objects. If this change can help us with the self application problem, then it will be preferable to the graph-o-tables approach described in the previous post because it retains the simple inductive definition of values. So let's see where this takes us!

We have the original definition of values

\[ \begin{array}{lrcl} \text{values} & v & ::= & n \mid \{ (v_1,v'_1),\ldots,(v_n,v'_n) \} \end{array} \]

and here is the denotational semantics, updated with Alan's suggestion to include the clause \(v'_2 \sqsubseteq v_2\) in the case for application.

\begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \{ T \mid \forall v v'. (v,v') \in T \Rightarrow v' \in E[\!|e|\!](\rho(x:=v)) \} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v \middle| \begin{array}{l} \exists T v_2 v'_2. T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land v'_2 \sqsubseteq v_2 \land (v'_2,v) {\in} T \end{array} \right\} \end{align*}

The ordering on values \(\sqsubseteq\) used above is just equality on numbers and subset on function tables.

The first thing to check is whether this semantics can handle self application at all, such as \[ (\lambda f. f \; f) \; (\lambda g. \; 42) \]

Example 1. \( 42 \in E[\!| (\lambda f. f \; f) \; (\lambda g. \; 42) |\!](\emptyset) \)
The main work is figuring out witnesses for the function tables. We're going to need the following tables: \begin{align*} T_0 &= \emptyset \\ T_1 &= \{ (\emptyset, 42)\} \\ T_2 &= \{ (T_1, 42) \} \end{align*} Here's the proof, working top-down, or goal driven. The important use of subsumption is the \( \emptyset \sqsubseteq T_1 \) below.

  • \( T_2 \in E[\!| (\lambda f. f \; f)|\!](\emptyset)\)
    So we need to show: \( 42 \in E[\!| f \; f|\!](f:=T_1) \)
    • \( T_1 \in E[\!| f |\!](f:=T_1) \)
    • \( T_1 \in E[\!| f |\!](f:=T_1) \)
    • \( \emptyset \sqsubseteq T_1 \)
    • \( (\emptyset, 42) \in T_1 \)
  • \( T_1 \in E[\!| (\lambda g. \; 42) |\!](\emptyset)\)
    So we need to show \( 42 \in E[\!| 42 |\!](g:=\emptyset)\), which is immediate.
  • \( T_1 \sqsubseteq T_1 \)
  • \( (T_1,42) \in T_2 \)
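The finite side conditions in this proof are easy to check mechanically. The sketch below uses our own encoding (numbers as ints, tables as frozensets of pairs; the names `sqsubseteq` and `app_witness_ok` are hypothetical) and checks, for each application, that the chosen witness \(v'_2\) satisfies \(v'_2 \sqsubseteq v_2\) and \((v'_2, v) \in T\). It does not check the \(\lambda\)-body memberships, which are the other bullets above.

```python
def sqsubseteq(v1, v2) -> bool:
    """v1 ⊑ v2: equal numbers, or both tables with v1 ⊆ v2."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 == v2
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 <= v2
    return False

def app_witness_ok(T, v2, v2p, result) -> bool:
    """Side conditions of the application clause for chosen witnesses:
    v2' ⊑ v2 and (v2', result) ∈ T."""
    return sqsubseteq(v2p, v2) and (v2p, result) in T

T0 = frozenset()
T1 = frozenset({(T0, 42)})
T2 = frozenset({(T1, 42)})

# Inner application f f with f := T1: operator T1, argument T1, witness ∅.
inner = app_witness_ok(T1, T1, T0, 42)
# Outer application: operator T2, argument T1, witness T1.
outer = app_witness_ok(T2, T1, T1, 42)
```

Both checks succeed, whereas using the argument \(T_1\) itself as the witness for the inner application fails, since \((T_1, 42) \notin T_1\); that is exactly where subsumption is needed.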

Good, so this semantics can handle a simple use of self application. How about factorial? Instead of considering factorial of 3, as in the previous post, we'll go further this time and consider factorial of an arbitrary number \(n\).

Example 2. We shall compute the factorial of \(n\) using the strict version of the Y combinator, that is, the Z combinator. \begin{align*} M & \equiv \lambda x. f \; (\lambda v. (x\; x) \; v) \\ Z & \equiv \lambda f. M \; M \\ F & \equiv \lambda n. \mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)\\ H & \equiv \lambda r. F \\ \mathit{fact} & \equiv Z\, H \end{align*} We shall show that \[ n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \] For this example we need very many tables, but fortunately there are just a few patterns. To capture these patterns, we define the following table-producing functions. \begin{align*} T_F(n) &= \{ (n,n!) \} \\ T_H(n) &= \{ (\emptyset,T_F(0)), (T_F(0), T_F(1)), \ldots ,(T_F(n-1), T_F(n)) \} \\ T_M(n) &= \begin{cases} \emptyset & \text{if } n = 0 \\ \{ (T_M(n'), T_F(n')) \} \cup T_M(n') & \text{if } n = 1+n' \end{cases} \\ T_Z(n) &= \{ (T_H(n), T_F(n) )\} \end{align*} \(T_F(n)\) is a fragment of the factorial function, for the one input \(n\). \(T_H(n)\) maps \(\emptyset\) to \(T_F(0)\) and each \(T_F(i)\) to \(T_F(i+1)\) for every \(i\) with \(i+1 \le n\). \(T_M(n)\) is the heart of the matter: it is what makes the self application work. It maps successively larger versions of itself to fragments of the factorial function, that is \[ T_M(n) = \left\{ \begin{array}{l} T_M(0) \mapsto T_F(0) \\ T_M(1) \mapsto T_F(1) \\ \vdots \\ T_M(n-1) \mapsto T_F(n-1) \end{array} \right\} \] For example, here is \(T_M(4)\): \[ T_M(4) = \left\{ \begin{array}{l} \emptyset \mapsto \{ 0 \mapsto 1 \} \\ \{ \emptyset \mapsto \{ 0 \mapsto 1 \} \} \mapsto \{ 1 \mapsto 1 \} \\ T_M(2) \mapsto \{ 2 \mapsto 2 \} \\ T_M(3) \mapsto \{ 3 \mapsto 6 \} \end{array} \right\} \]

The tables \( T_M \) enable self application because we have the following two crucial properties:
  1. \( T_M(n) \sqsubseteq T_M(1+n) \)
  2. \( (T_M(n), T_F(n)) \in T_M(1+n) \)
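As a sanity check, here is a Python transcription of Example 2 under the same kind of encoding assumptions (ints for numbers, frozensets of pairs for tables). The Python names mirror the math, but the code is our own sketch, not the Isabelle development. Python is call-by-value, so \(Z\) terminates thanks to the \(\eta\)-expansion \(\lambda v. (x\,x)\,v\). The hypothetical helper `crucial_properties_hold` tests the two properties above for a given \(n\); it is a finite check, not a proof.

```python
from math import factorial

# --- The program of Example 2, transcribed into (call-by-value) Python. ---
def Z(f):
    # M ≡ λx. f (λv. (x x) v); the η-expansion delays the self application.
    M = lambda x: f(lambda v: x(x)(v))
    return M(M)  # Z ≡ λf. M M

# H ≡ λr. F, where F ≡ λn. if n = 0 then 1 else n × r(n − 1)
H = lambda r: lambda n: 1 if n == 0 else n * r(n - 1)
fact = Z(H)

# --- The table-producing functions; a table is a frozenset of pairs. ---
def T_F(n):
    return frozenset({(n, factorial(n))})

def T_H(n):
    return frozenset({(frozenset(), T_F(0))} | {(T_F(i), T_F(i + 1)) for i in range(n)})

def T_M(n):
    if n == 0:
        return frozenset()
    prev = T_M(n - 1)
    return frozenset({(prev, T_F(n - 1))}) | prev

def T_Z(n):
    return frozenset({(T_H(n), T_F(n))})

def crucial_properties_hold(n):
    """Property 1: T_M(n) ⊑ T_M(1+n); property 2: (T_M(n), T_F(n)) ∈ T_M(1+n)."""
    bigger = T_M(n + 1)
    return T_M(n) <= bigger and (T_M(n), T_F(n)) in bigger
```

The same code confirms that \(T_M(4)\) consists of exactly the four entries \(T_M(i) \mapsto T_F(i)\) for \(i < 4\).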
The main lemmas that we prove are the following.

Lemma 1. If \(n \le k\), then \(T_M(1+n) \in E[\!| M |\!](f:=T_H(k)) \).

Lemma 2. \( T_Z(n) \in E[\!| Z |\!](\emptyset) \)

If you're curious about the details for the complete proof of \( n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \) you can take a look at the proof in Isabelle that I've written here.

This is all quite promising! Next we look at the proof of soundness with respect to the big step semantics.

Soundness with Respect to the Big-Step Semantics

The proof of soundness is quite similar to that of the first version, as the relation \(\approx\) between the denotational and big-step values remains the same. However, the following two technical lemmas are needed to handle subsumption.

Lemma (Related Table Subsumption) If \(T' \subseteq T\) and \(T \approx \langle \lambda x.e, \rho \rangle\), then \(T' \approx \langle \lambda x.e, \rho \rangle\).
The proof is by induction on \(T'\).

Lemma (Related Value Subsumption) If \(v_1 \approx v'\) and \(v_2 \sqsubseteq v_1\), then \(v_2 \approx v'\).
The proof is by case analysis, using the previous lemma when the values are function tables.

Theorem (Soundness).
If \(v \in E[\!| e |\!](\rho) \) and \( \rho \approx \rho' \), then \( \rho' \vdash e \Downarrow v' \) and \(v \approx v'\) for some \(v'\).

The mechanization of soundness in Isabelle is here.