14 Lambda
This chapter is the first one in the second part of PLFA. Before we dive into it, it’s worth pausing to review what we have learned, and where we might go with it.
Agda provides a flexible mechanism for defining datatypes, and for implementing functions via definitional equations and pattern matching. The type system is powerful enough to express logical statements and their proofs. Since proofs use the same mechanisms, we can shift perspective between proof and computation.
We’ve seen how to define useful computations in Agda and prove things about them. But there is a lot of software out there not written in Agda, and one wouldn’t expect everyone to adopt it for their varied purposes. They may use computational models that look quite different. Agda is purely functional; all functions are total and must obviously terminate. Other languages have mutable state, partial functions, infinite loops, and exceptions. What can Agda say about these?
We can model other systems in Agda in order to verify their properties. We’ll need datatypes to represent expressions, statements, and programs in the other language. We’ll also need representations of computation, perhaps some sort of state, with transitions among states driven by a program. We can then make statements of the form "When started in this state, running this program arrives at that state", and work on proving them.
The question is where to begin. One early model that is still commonly taught is Hoare logic, which uses a simple imperative language with mutable variables, loops, and conditionals. But it is difficult to add functions, especially higher-order functions. In contrast, it is easier to start with higher-order functions and add mutable state. Much current research in programming languages uses this latter approach, and as I described in the Preface of these notes, this is the line of development that led to PLFA and this course.
The idea of starting with functions may seem recent, but it has a rich history. Alonzo Church’s lambda calculus was the first formal model of general computation defined, and enabled the first proof of uncomputability in 1936. Functional programming languages can often be viewed as elaborations of the lambda calculus. As mentioned above, the lambda calculus is also a good basis for formal definition of the semantics of imperative programming languages such as Java, and for proving properties of such languages.
Agda is an example of a language based on the lambda calculus. We’ve seen that computation in Agda takes the form of rewriting using definitional equations. Each intermediate expression is valid in Agda. That is, the state space is just programs themselves, and transitions among states are determined by rewriting. Looking more closely at rewriting, it is based on substitution, where a function application (matching the left-hand side of a definitional equation) is replaced with an expression (where pattern variables on the right-hand side are replaced by function arguments).
Not surprisingly, there are many variations on the basic model. Usually variables, abstraction (lambda), and application are included, but even the first two can be dispensed with (as in combinatory calculus). Typed or untyped? The first choice makes general recursion difficult; should something more be added to enable it? Should it be possible to form a term and then determine if it has a type, or should it be impossible to create an untypable term? What base types should be included? What rules should govern computation?
PLFA makes a particular set of initial choices, and then both modifies them and adds to them in subsequent chapters. The initial choices are close to an influential set called PCF, designed by Gordon Plotkin in 1977, based on unpublished work by Dana Scott. Plotkin’s PCF included Boolean constructors (true/false) and an eliminator (if-then-else). PLFA leaves these out, probably because natural number constructors (zero/suc) and an eliminator (a case expression with zero/suc branches) are included. Once one understands how to handle naturals, Booleans are an easy exercise. Including them would just add to the length of proofs (by increasing the number of cases) without teaching us anything new.
In addition to the features already mentioned (variables, abstraction, application, and naturals), PCF and PLFA include a general recursion construct, using the Greek letter \(\mu\) (mu). With this construct, one can easily and naturally write recursive functions. Of course, the syntax of \(\mu\) doesn’t give it the ability to implement recursion. That happens in the associated reduction rule.
We want to view an expression of the form μ f ⇒ M as a recursive function. That means that every occurrence of f in the body M should be viewed as a synonym for the whole function, that is, for μ f ⇒ M. The reduction rule for μ f ⇒ M steps to a modified version of M, in which μ f ⇒ M is substituted for every occurrence of f, using the complex but correct definition of substitution developed in this chapter. This has the effect of implementing recursion as we understand it.
With general recursion, one can write infinite loops. Removing it from the above list of features, one can show that every computation is finite (this is called "strong normalization").
The choice to include \(\mu\) makes it harder for us to write an evaluator for the chosen language, since Agda functions must always terminate. We don’t absolutely need an evaluator, but it is convenient to make sure our model and our programs are working as expected. The solution is to add a parameter to the evaluator which is the maximum number of steps that it can take. Thus the computation always terminates, but sometimes without producing a final result. (The undecidability of the halting problem, as demonstrated by the work of Alonzo Church, and contemporaneously Alan Turing in 1936, means we can’t algorithmically handle most questions involving termination.)
14.1 Syntax of terms
We immediately run into the issue of the surface syntax for terms in our chosen source language. Agda has already taken the most natural forms, and we can’t overload them. So PLFA uses slight variants: a lambda with a stroke through it (\Gl- in Emacs), an explicit middle dot (\cdot) for application instead of just juxtaposition, and backquotes (\‘) for variables (also added to terms like zero and suc, but directly instead of with a space in between).
The idea is to define a datatype Term by using Agda’s parser to make source language terms look like Agda terms, while actually building a tree representation that works well for computation. The result can be a little awkward to read and write, so I will use Church’s original notation as a "human-readable" alternative at times. (It’s useful to know this for looking at other references and research papers also.) The identity function, which would be ƛ "x" ⇒ "x" in PLFA notation, is \(\la x . x\) in Church’s notation. Application is juxtaposition (as in Agda), and a low dot is used instead of a double arrow to separate the variable of a lambda from its body.
14.2 Example terms
Church originally developed the untyped lambda calculus, and added types several years later (published in 1940). He used only variables, abstraction, and application. In order to argue that this was a general-purpose model of computation, he needed to encode natural numbers and show how to do arithmetic. The encoding is now called "Church numerals". Although we don’t need them, since Term has representations for zero and successor, they are a good source of nontrivial examples that use higher-order functions. This is potentially confusing, having two languages (three if you include Church’s notation) and three representations of natural numbers; hopefully you can keep them all straight.
In order to form the Church numeral for two, we take the Agda representation for two (suc (suc zero)), and abbreviate suc by \(s\) and zero by \(z\), to get \(s~ (s~ z)\). We then let the user supply whatever interpretation of \(s\) and \(z\) they want, which results in \(\la s . \la z.~s~ (s~ z)\). Another way to think of this is as an iterator, which applies a function \(s\) twice to a base value \(z\). Things get interesting when one or both of these are themselves Church numerals.
We’d like to define arithmetic with Church numerals, though technically it’s too soon, since we don’t have a definition of computation for Term. What PLFA does is work with the intuitive idea of what computation should be, followed later by verification that things really do work as we expect. The exercises at this point are like writing things on paper. There’s no way for you to verify them using Agda (and I have to mark them by hand). PLFA defines a function \(\mathtt{suc}^c\), which is somewhat misnamed. It consumes a natural number in the source language and produces another one. One might use this name for a function that consumes a Church numeral and produces a Church numeral one larger, but because of this decision of PLFA, we can call this second function csuc, and you should be able to write it based on the intuitive idea of computation.
14.3 Bound and free variables
A number of concepts are introduced in the Web version of PLFA that don’t show up directly in the Agda code. They’re necessary in order to understand the code, though, and they occur frequently in references and research papers. We already discussed bound and free variables in the Quantifiers chapter. Here we expand the scope of use of these terms.
A programming language typically has a number of binding constructs (or just binders), which give new meanings to the names of variables. \(\lambda\) is the most common one (sometimes disguised as a function declaration) but here we also have \(\mu\) for recursion, and the suc clause of case. At the top level, in practical terms, every variable has to be bound, or there is no way to interpret it. But as we consider subterms, we may remove the binder, in which case an occurrence of a variable could be free in a subterm. A term with no free variables is closed.
PLFA is clear that the choice to use named variables complicates the code in this chapter and the next, before a different choice is made in the DeBruijn chapter. The issue is with the reuse of names, which can be reduced but not avoided. This is also a problem with for-all and there-exists, which are also binding constructs. I have seen this issue affect students doing their first induction proofs (for natural numbers) on paper, when they insist on using \(n\) for every quantified variable.
14.4 Values
The point of defining values is to give a syntactic characterization of an expression that cannot be further reduced (as opposed to the semantic characterization "no step is possible"). The semantic characterization is also called "normal form", which is why the Agda command to reduce an expression is called "normalize".
14.5 Substitution
PLFA (like SF and TAPL before it) does make some decisions at this point that make our lives easier. Where possible, it considers only closed terms, and defines evaluation so that it uses call-by-value (arguments are evaluated to values before being substituted). This avoids the messiest case in substitution, when a free variable is in danger of being "captured" by being substituted under a binder. What remains is still more complicated than one might expect.
Here is the example given in PLFA of the problem being avoided, but stated using Church notation. If we have the term \(\la x . x~y\) and we wish to substitute \(x~\mathit{zero}\) for \(y\), doing it blindly gives \(\la x . x ~ (x~\mathit{zero})\). The reason this isn’t correct is that the \(x\) in \(x~\mathit{zero}\) has been bound, when we might expect it to remain free. (If this particular substitution is being done in a subexpression, then the full expression might contain the actual binder for what at this point is a free occurrence of \(x\)).
Doing the substitution properly requires renaming the bound variable in the term being substituted into so that it looks something like \(\la z . z~y\), and then the substitution can be completed, giving \(\la z . z~(x~\mathit{zero})\). Coming up with that name \(z\) that avoids a conflict (we couldn’t choose \(y\), for example) is more work (in Agda and in most other systems) than we care to do at this point. Once we learn how to avoid the work (by avoiding names), we will be able to handle situations like this.
14.6 Reduction
The infix single-step relation symbol is formed by a Unicode em-dash (\em in Emacs) and a Unicode single right arrow.
The term "beta reduction" is usually used for the rule called β-ƛ in the definition of single-step reduction, which dates back to Church’s first work on the lambda calculus. By extension, the other rules that involve a substitution at the top level of an expression (for case and \(\mu\)) are also given β names here. Letting subexpressions be rewritten is sometimes called the "compatible closure"; PLFA calls the associated rules "compatibility rules" and uses ξ (xi) to name them, which is also done in several other sources, though these names are not as common as "beta reduction". There is no rule allowing reduction in the body of a binding construct, again to avoid dealing with non-closed terms.
We have seen instances before where PLFA uses ∀ without type annotations on the variable, but starting with the definition of single-step reduction, they are used more frequently. Agda has type inference: if type annotations are left off, the compiler tries to figure out what is appropriate, and complains if it cannot. Both the reader and Agda can figure out the type of Γ, not from the name, but from the way it is used in a typing rule. Annotations should still be used when they help readability. Agda is already doing a fair amount of work for us with implicit arguments and pattern matching, and now is shouldering even more of the burden.
The traditional presentation of the simply-typed lambda calculus also uses type annotations on the variables of lambdas, but PLFA’s Term datatype does not, another decision that makes our life easier in certain ways. The later Inference chapter briefly discusses this, and shows some simple ways in which types can be inferred for our simple language. The situation with Agda is much more complex, but it is based on the same ideas.
If we had actual in-person lectures in this course, I would go over the Quiz items in this and subsequent chapters, asking students to hum if they believed each item to hold, and then for volunteers to provide counterexamples. You won’t get the benefit of the group setting, but you’ll have as much time to think about the answers as you like, and you can check the answers with fellow students or with me if you wish.
14.7 Reflexive and transitive closure
This is the mathematically precise way of extending a single-step relation to "take zero or more steps". There are two different definitions given here, and you can probably think of a third. They are all equivalent, and sometimes one variant will fit the style of a given proof better than another one. So it’s good to remember that generic equivalence theorems can be used to change perspective as needed.
14.8 Confluence
It may seem strange for PLFA to introduce confluence here, after specifically choosing to make computation deterministic. We tend to think of computation as deterministic, but in concurrent or parallel computation, different interleavings of operations are possible. Even in sequential computation, for example, when reducing arithmetic expressions on paper, we can have choices as to which subexpression to reduce first. There we take confluence for granted, and the property that easily follows from it, that normal forms are unique. But there are computations for which we may not have it, and it may not be needed. We may be satisfied with some general property of the result and not with specifics (e.g. all tasks get done but we don’t care which processor did what).
The original lambda calculus was not deterministic, and the proof of confluence took some work. A modern, mechanized version takes up a whole later chapter of PLFA.
14.9 Examples of reductions
Having defined reduction, we can verify that our Church numeral definitions, such as \(\mathtt{plus}^c\), really work as intended. In the example reductions given, Agda names are used as shorthand in some cases in order to make the expressions more readable. Where PLFA writes \(\mathtt{two}^c~\cdot~ \mathtt{suc}^c ~\cdot~ \mathtt{zero}\), the full expansion in Church notation would be \((\la s . \la z.~s~ (s~ z))~ (\la n .~\mathtt{suc}~n)~ (\la s . \la z . z)\). The convention PLFA is using in reduction sequences is to keep names around until they need to be applied with a beta rule, at which point it replaces a name by its meaning, and then does the substitution of the argument value. When we define an evaluator, it won’t follow this convention; it will use full expansions all the way through.
The long reduction sequences given here were generated by the evaluator we will write later, with editing by hand to apply the naming shorthand described above. You shouldn’t be intimidated by them, and you don’t need to follow them step by step all the way through. But it is worth testing your intuitive understanding of the reduction rules by looking at a random line without looking at the one below. See if you can identify the point at which the reduction takes place, and the effect of the beta-rule applied.
14.10 Typing
To introduce types to our source language, we need a way to talk about them (this is the datatype Type), and a way to specify evidence that a term has a given type. Although our terms are closed at the top level, as we descend into them recursively and pass under binders, we encounter subexpressions that have free variables. A context is a way of remembering information about a variable that was present at the moment it became free. It is essentially a list of (name, type) pairs, but we create a new notation for it that matches common practice in many references. We then need to define a lookup judgment that is essentially the same idea as list membership. It helps to think of a (name, type) pair as a binding of the name to a type.
The lookup judgment has two constructors, Z which can be thought of as "here", and S which can be thought of as "not here". An inner lambda can use the same variable name as an outer lambda, so to implement scope correctly, we must always use the most recent binding of a name in the context. The S constructor must thus include a proof that the name being looked for is not the most recent addition (and thus we must look deeper in the context). Since names are strings, this is a string inequality proof. We can use the proof by reflection idea from the Decidable chapter to hide that proof. This is done with the "smart constructor" S′, which we can then use in place of S in the lookup judgments we construct. This saves us only a bit of time and space now, but will be much more helpful in the Inference chapter.
14.11 Typing judgment
That gets us set up for the type judgment _⊢_⦂_, again using standard notation with the "turnstile" symbol (\vdash in Emacs). The colon in the Agda file is a Unicode variant (\: in Emacs, make sure you choose the right one}. We can read \(\Gamma\vdash t : T\) as "Gamma types \(t\) as \(T\)". There is one rule for each way of forming a term (we are not always this fortunate; sometimes there needs to be more than one) and the rules should be fairly intuitive.
The chapter ends with a few examples of type judgments, and short proofs that specific small terms cannot be typed. At this point, multistep reductions and types must be computed by hand, even though we know algorithms must exist (the Agda compiler does these things for a more complicated language). We’ll resist the urge to write those algorithms immediately, and instead prove some properties of reduction and types. As a by-product, we will get an evaluator, which needs only to be supplied with a type judgment for the initial term. Later on, we’ll see how to compute the type of a term.
Here are a couple of asciicasts of interactive development of typing. Much of it can be done with refining, and that which cannot is simple enough that one wishes for automation.
Click to see asciicast of typing development.
Click to see another asciicast of typing development.
PLFA does mention one option for automation, an Agda feature called "Agsy", invoked with C-c C-a. This is fairly simple-minded; it does a low-depth search of the tree of possibilities. It wasn’t mentioned before because it’s too much of a temptation for the beginner, and would get in the way of learning. Now you can use it to do things like fill in the name of something in the context whose type matches the goal. I don’t think it’s been developed much recently, and Agda has moved ahead in features, so don’t count on it to work miracles.