
16 DeBruijn

\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)

We have arrived at the jewel-like heart of the second half of PLFA. Here all the work we did in the previous chapter pays off. That work, thanks to Agda, was still less than I’ve seen with other approaches. But now it gets even easier, thanks to a couple of ideas that are naturally motivated by what we’ve just seen.

The first idea is inherently-typed terms. Previously, we created one inductive datatype to represent terms, and another one to represent a type judgment on terms. But the expressions representing type judgments look a lot like the terms they are typing. Why not just represent a term by its type judgment? Previously, a type judgment was a ternary relation \(\Gamma\vdash t : T\); now a term will be a binary relation \(\Gamma\vdash T\). That is, the inductive datatype for terms will be indexed by a context and a type.
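As a taste, here is a fragment of the resulting datatype from PLFA’s DeBruijn chapter (fixity declarations and the constructors for naturals and \(\mu\) elided); \(\Gamma\ni A\) is the lookup judgment for variables, whose definition is discussed below.

data _⊢_ : Context → Type → Set where

  `_ : ∀ {Γ A}       -- variable: a lookup judgment, defined below
    → Γ ∋ A
      -----
    → Γ ⊢ A

  ƛ_ : ∀ {Γ A B}     -- abstraction: note there is no variable name
    → Γ , A ⊢ B
      ---------
    → Γ ⊢ A ⇒ B

  _·_ : ∀ {Γ A B}    -- application
    → Γ ⊢ A ⇒ B
    → Γ ⊢ A
      ---------
    → Γ ⊢ B

Each constructor is exactly a typing rule, so a value of type Γ ⊢ A is at once a term and its own typing derivation.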

This has the advantage that it is impossible to write a "nonsense" term that cannot be typed. One disadvantage is that signatures get a little harder to read, but this is just a matter of gaining familiarity. The idea is usually attributed to Altenkirch and Reus’s 1999 paper "Monadic presentation of lambda terms using generalized inductive types" (don’t try reading this unless you know some category theory) but there is a lot of prior work. Church’s original presentation of the simply-typed lambda calculus incorporated types in such a way that only typable terms were well-formed; the separate style used earlier here (and just about everywhere else nowadays) is due to Curry.

There are two places where the type judgments we have seen don’t quite look like the terms they are typing. One is when a binding construct is used in the term. The variable name appears in the term, but it does not appear in the type judgment, because we can get it from the term.

The other place is when a variable is used in a term. Its string name (like "x") appears in the term, but what appears in the type judgment is a number of steps back to where the variable appears in the context, with string inequality proofs to ensure that we refer to the most recent binding of that variable name. We gave "step back" the evocative name of S and gave "we’ve arrived" the name of Z to make it clear that we’re just counting back from the right end of the context.

If we abandon the idea of using names for variables, we can also abandon the string inequality proofs (they will not be missed) and the use of a variable (in a term represented by a type judgment) just becomes a number counting back to the type of that variable in the context. Since exactly one item is added to a context for each binder passed under, the number can also be seen as counting the number of enclosing binders to skip over to get to the one being referenced.
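In Agda, following PLFA, this is just an inductive definition: a variable is a proof that the context (a snoc-list of types) contains the given type, and that proof is literally the numeral written in S/Z form.

data Type : Set where
  _⇒_ : Type → Type → Type
  `ℕ  : Type

data Context : Set where
  ∅   : Context
  _,_ : Context → Type → Context

data _∋_ : Context → Type → Set where

  Z : ∀ {Γ A}        -- the most recent binding
      ---------
    → Γ , A ∋ A

  S_ : ∀ {Γ A B}     -- step back past one binding
    → Γ ∋ A
      ---------
    → Γ , B ∋ A

For example, S (S Z) picks out the third type from the right end of the context.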

This idea is called de Bruijn indices. It was introduced by Nicolaas de Bruijn for use in his pioneering Automath system in 1970. Automath was a system for expressing machine-checkable formal proofs. In designing Automath, de Bruijn independently described many of the ideas of dependent types (Howard, of the Curry-Howard isomorphism, being the other person usually credited). De Bruijn indices are very useful in proof formalization (his original 1972 paper showed how a major result could be simplified using them), but they are also used in compilers and interpreters for functional programming languages. As we’ll see below, they are really not human-readable, which is why we continue to use names in programming.

(Speaking of names, if you say "d’Brown", you get a reasonable approximation to the Dutch pronunciation of de Bruijn’s name.)

Here are some examples. The identity function, \(\la x . x\), becomes \(\la . 0\).
\(\la x . \la y. y\) becomes \(\la . \la . 0\).
\(\la x . \la y . x\) becomes \(\la . \la . 1\).
Next, two examples that appear in PLFA, but rendered here in Church notation. The Church numeral for two, \(\la s . \la z . s~(s~z)\), becomes \(\la . \la . 1~(1~0)\). The addition function: \[\mu + . \la m . \la n . ~\mathit{case}~m~[~\mathit{zero} \Ra n ~|~ \mathit{suc}~m \Ra \mathit{suc}~(+~m~n)]\]

becomes \(\mu . \la . \la . ~\mathit{case}~1~[~\mathit{zero} \Ra 0 ~|~ \mathit{suc} \Ra \mathit{suc}~(3~0~1)]\). Notice what happens in the \(\mathit{case}\) expression. It’s the pattern in the successor clause that acts as a binder. So in the zero clause, the variable \(n\) becomes \(0\), but in the body of the successor clause, the variable \(n\) becomes \(1\). In the named-variable version, if we wanted to refer to the outer binding of \(m\) in the body of the successor clause, we couldn’t, because it’s shadowed by the \(m\) in the pattern \(\mathit{suc}~m\). But in the de Bruijn indexed version, we could; it would be \(2\).

You should now be able to see the advantage of using de Bruijn indices. Shadowing is not possible, so substitution doesn’t have to worry about variable capture, and proofs don’t have to demonstrate that it isn’t happening. We don’t have to restrict ourselves to closed terms any more, because the complications that led us to avoid terms with free variables are gone. How do we represent a free variable? References to bound variables are always less than the number of enclosing binders, so all the rest of the natural numbers can be used for free variables. The excess (after being diminished by the number of enclosing binders) becomes an index into the context. This is exactly the interpretation we need to handle the building up of a context as we descend into a term in the typechecking process.

Since we are not using Church notation, but representing terms as proofs, we can’t just use built-in Agda numbers and expect them to work. Instead we write a function # which maps numbers to lookup judgments expressed using S and Z, and then we can say # 2 instead of S (S Z).
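For example, here are the two examples from above in PLFA’s notation (in PLFA, # n is in fact a full term: the lookup judgment wrapped in the variable constructor `_; I have also expanded PLFA’s Ch abbreviation for the Church-numeral type).

twoᶜ : ∀ {Γ A} → Γ ⊢ (A ⇒ A) ⇒ A ⇒ A
twoᶜ = ƛ ƛ (# 1 · (# 1 · # 0))

plus : ∀ {Γ} → Γ ⊢ `ℕ ⇒ `ℕ ⇒ `ℕ
plus = μ ƛ ƛ (case (# 1) (# 0) (`suc (# 3 · # 0 · # 1)))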

The big disadvantage of de Bruijn indices is the lack of human readability. They’ve been described as "Martian". But they’re a perfectly fine internal format (heavily used as such in interpreters and compilers for functional programming languages), and it’s not hard to translate human-readable syntax into this format, and back again (if a list of names for indices is provided).

A conventional presentation (such as in TaPL, which uses de Bruijn indices in all its OCaml code) focusses next on substitution as needed to define reduction. But this introduces new complications. It’s natural to define substitution by structural recursion on the term being substituted into, and that’s what we did before. As we pass under a binder in that recursion, and put the associated type into the context, the index of the variable being substituted for goes up by one (it starts at zero at the beginning of the operation), and we have to keep track of that (before, it was just a name that didn’t change). But also the indices of the free variables in the term being substituted have to go up by one, to point correctly into the augmented context. That requires another structural recursion on the term being substituted. And here’s the problem: that second recursion passes under binders in the term being substituted. So again a threshold must be maintained, because indices below that threshold (which represent bound variables in the original term being substituted, but look free because binders have been removed) shouldn’t be adjusted.
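To make the bookkeeping concrete, here is a sketch of the conventional shift-with-cutoff operation (in the style of TaPL’s ↑), written on raw, untyped de Bruijn terms; the datatype Term and the function shift are my names for illustration, not PLFA’s or TaPL’s.

open import Data.Nat using (ℕ; suc; _≤?_)
open import Relation.Nullary using (yes; no)

data Term : Set where       -- raw de Bruijn terms, no typing
  var : ℕ → Term
  lam : Term → Term
  app : Term → Term → Term

-- shift c t: add one to every variable in t whose index is at least
-- the cutoff c; indices below c are bound within t and left alone.
shift : ℕ → Term → Term
shift c (var n) with c ≤? n
... | yes _ = var (suc n)
... | no  _ = var n
shift c (lam t)   = lam (shift (suc c) t)   -- raise the cutoff under a binder
shift c (app t u) = app (shift c t) (shift c u)

Substitution then has to thread this shift through a second recursion; the cutoff c is exactly the threshold described above.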

This sounds messy, because I’m explaining it quickly. It’s not that bad, and you might enjoy working out the details. However, PLFA brings in another clever idea at this point to avoid this complication. It’s due to Conor McBride, from the brief unpublished draft paper "Type-preserving renaming and substitution" (several published followup papers use the idea, with credit). McBride noticed the similarity between renaming and a more general version of substitution. What we did before, now called single substitution in the named case, consumes a term to be substituted into, a variable name, and a term to be substituted, and produces a term. Now single substitution consumes a term (that is, a proof judgment of type \(\Gamma,B\vdash A\)) to be substituted into, a term (a proof judgment which must have type \(\Gamma\vdash B\)) to be substituted for the variable 0 (that is, lookup judgment Z), and produces a term (a proof judgment which must have type \(\Gamma\vdash A\)).
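In PLFA’s notation, single substitution now has this type (so N [ M ] substitutes M for variable Z in N); its definition, shown later, falls out of the general operation.

_[_] : ∀ {Γ A B}
  → Γ , B ⊢ A
  → Γ ⊢ B
    ---------
  → Γ ⊢ A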

To make substitution look like renaming, we generalize it to substitute for more than one variable. We replace the term to be substituted for the variable zero with a map from lookup judgments over \(\Gamma\) (that is, many variables) to terms (proof judgments over another context \(\Delta\)) to be substituted for them. If we then move the term being substituted into to be the last argument, and notice that we can parenthesize this and the result type (because \(\ra\) is right-associative), the result type becomes a map from proof judgments in context \(\Gamma\) (that is, terms) to proof judgments in context \(\Delta\) (also terms).

Renaming consumes a map from lookup judgments in context \(\Gamma\) (now variables) to lookup judgments in context \(\Delta\), and produces a map from proof judgments in context \(\Gamma\) (now terms) to proof judgments in context \(\Delta\). That is, it lifts a map (from variables to variables) to a map from terms to terms. Our general substitution lifts a map (from variables to terms) to a map from terms to terms. The similarity is probably more obvious looking at the type signatures of the functions in the Agda code than in what I have just said.

rename : ∀ {Γ Δ}
  → (∀ {A} → Γ ∋ A → Δ ∋ A)
    ------------------------
  → (∀ {A} → Γ ⊢ A → Δ ⊢ A)

subst : ∀ {Γ Δ}
  → (∀ {A} → Γ ∋ A → Δ ⊢ A)
    ------------------------
  → (∀ {A} → Γ ⊢ A → Δ ⊢ A)

McBride’s paper abstracts away the differences between renaming and general substitution to create a single operation of which these are special cases. PLFA instead implements rename followed by general substitution as two separate functions, though it does point out the similarities, which are pretty obvious in the clean and elegant implementations. Both the paper and PLFA use renaming as a helper in general substitution (it is used when passing under a binder, because the context changes, so terms in the old context have to be lifted to the new context). Single substitution is the special case where the map is the identity on everything but variable 0, and the two contexts are \(\Gamma,B\) and \(\Gamma\). The next chapter introduces binding constructs that bind two variables at once, and this is also easy to define cleanly using general substitution.
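Concretely, here are PLFA’s definitions: exts is the helper that pushes a substitution map under a binder (note the use of rename), and _[_] recovers single substitution by building a map σ that sends variable Z to M and every other variable to itself.

exts : ∀ {Γ Δ}
  → (∀ {A} → Γ ∋ A → Δ ⊢ A)
    -----------------------------------
  → (∀ {A B} → Γ , B ∋ A → Δ , B ⊢ A)
exts σ Z      =  ` Z                 -- the new variable maps to itself
exts σ (S x)  =  rename S_ (σ x)     -- old results lifted into the new context

_[_] : ∀ {Γ A B}
  → Γ , B ⊢ A
  → Γ ⊢ B
    ---------
  → Γ ⊢ A
_[_] {Γ} {A} {B} N M = subst {Γ , B} {Γ} σ {A} N
  where
  σ : ∀ {A} → Γ , B ∋ A → Γ ⊢ A
  σ Z      =  M        -- variable 0 becomes M
  σ (S x)  =  ` x      -- every other variable is untouched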

One consequence of the new approach is that we don’t have to prove that substitution preserves types. It is inherent in the definition of substitution itself. Similarly, the rules for reduction incorporate preservation of types, so the need for a separately proved preservation theorem evaporates.
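For instance, here is the β rule for functions from PLFA’s reduction relation (the ξ congruence rules and the other β rules are elided, and Value is PLFA’s value predicate). Because _—→_ is indexed so that both sides inhabit Γ ⊢ A, a type-breaking reduction rule could not even be written down.

data _—→_ : ∀ {Γ A} → (Γ ⊢ A) → (Γ ⊢ A) → Set where

  -- ξ rules and the rules for naturals and μ elided

  β-ƛ : ∀ {Γ A B} {N : Γ , A ⊢ B} {W : Γ ⊢ A}
    → Value W
      --------------------
    → (ƛ N) · W —→ N [ W ]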

We still have to prove progress (and it doesn’t look much different), and iterating it gets us an evaluator. Determinism also still requires a proof, but PLFA skips it, as it isn’t much different from what we did earlier. So once we get past substitution (which is clearly an improvement), everything is about the same in terms of design, and either easier or about the same in terms of difficulty. The net saving, both in lines of code and readability, is significant. This chapter alone may be the most compelling argument for the existence of this course.

There are no exercises assigned for this chapter. We will make up for it in the next chapter.