Notes on PLFA
\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)
(Prabhakar Ragde, 2019)
These notes provide additional commentary on the textbook "Programming Language Foundations in Agda" by Wen Kokke and Philip Wadler, in the context of the offering of CS 842 at the University of Waterloo in Fall 2019. They will change with time.
1 Preface
My motivations for wanting to teach from PLFA using Agda are similar to those described by Philip Wadler in the preface to PLFA (which you should read now, if you haven’t already). When I first started teaching the theory of programming languages, I used the textbook Types and Programming Languages (TaPL) by Benjamin Pierce, and I have done so five times now, in CS 442. The proofs there are on paper, written by humans for humans, and it is sometimes difficult to get students to engage with them seriously. Pierce and co-authors went on to develop the online textbook Software Foundations (SF), which uses the Coq proof assistant to formalize some of the proofs in TaPL. I have taught using SF three times at the graduate level in CS 798, as well as using ideas from it twice at the undergraduate level in CS 245E. (The preface to SF is an excellent overview of the motivations for CS 842, and you should read that as well.)
SF is a great resource, and student reaction to it (and courses I have taught using it) has always been positive. However, I found myself subverting its decision to work nearly always at the top level of abstraction in Coq (proof scripts written using tactics). My subversion consisted of an early exposition of the correspondence between propositions and types (which SF leaves to a late, optional chapter, and which PLFA discusses in the first paragraph of the Preface) and of exposing the proof objects (dependently-typed code) that the scripts were manipulating and constructing. In Agda, one works directly with proof objects, so there’s no need for subversion.
Agda and Coq are both dependently-typed programming languages, and have much in common, including some early developers. Coq has a longer history. It is implemented in OCaml, and its syntax resembles OCaml. Agda is implemented in Haskell, and its syntax resembles Haskell. The current version, Agda 2, is largely due to Ulf Norell, and his PhD thesis from 2007 is a good source of information if you really want to know how things work. But both Coq and Agda are under active development. Even from Agda 2.5.4.2 (which was used for the first release of PLFA) to Agda 2.6.0, proofs can turn out differently when developed interactively with identical commands.
In a type system which supports dependent types, the result type of a function can depend on the value of its parameter. It’s not immediately clear why this would be valuable, because most type systems don’t offer this possibility. One big motivation is that logical statements behave this way. A theorem like "For all natural numbers \(m\), \(m + 0 = m\)" is used by instantiating it for a particular value of m. For example, we can conclude "\(4 + 0 = 4\)". The theorem can be viewed as a function that consumes a natural number \(m\) and produces a proof that "\(m + 0 = m\)" for that specific value.
In a dependently-typed programming language, types are expressive enough to encode logical statements. The proof of such a statement is a value of that type. This is what Wadler refers to as "propositions as types" in the preface (he has written an expository paper with that title, from which PLFA sometimes quotes). It is also known as the Curry-Howard correspondence between logic and type theory.
This is quite appealing, as it means that to prove something, one gets to write a program. But there are costs. In order to keep the logical system consistent, all functions must be total (no partial functions, no exceptions, no infinite loops) and termination should be relatively obvious (because Agda’s termination checker is fairly simple-minded). Writing code that typechecks becomes harder, so both Coq and Agda offer interactive development, where code is refined with the help of information from the compiler in a very tight feedback loop. This is independently quite appealing.
SF has had more than a decade of development and has been used at many institutions. Despite my considerable respect for it, I still have differences with its approach. PLFA was first released in early 2019, and has been used in only a few courses. Both books are under active development, just like the languages they use. I will do some things differently from PLFA; these notes should help explain how and why.
2 Getting Started
Perhaps the greatest barrier to taking this course is setting up the required tools, if you want to use your own machine. I have only done this on Mac OS X, and I did run into some issues (which, fortunately, I could solve). If you are using Windows or Linux, there is a limit to how much help I can give. The tools are installed on the graduate computing environment (linux.cs.uwaterloo.ca) and you can always use them there.
Agda interaction is best managed with the Emacs text editor; the alternatives are less well supported and buggier. This means you need to install Haskell (8.6.x), Emacs (26.x), Agda (2.6.x), and the Agda standard library (1.0).
On OS X, you can try to do everything through Homebrew, or you can try installing the various components separately. You can’t mix and match, because there seems to be no way to tell Homebrew that you’ve already installed part of what it needs. I installed Emacs through Homebrew, Haskell using the Haskell Platform, and Agda from source. It takes a long time to build it from source.
Emacs: https://www.gnu.org/software/emacs/
Haskell: https://www.haskell.org
Agda: https://agda.readthedocs.io/
You shouldn’t need to host a local copy of PLFA webpages, as described at the end of Getting Started. But you should install the Agda standard library, set it up, and set up PLFA as a library as well. To do that, you’ll need to grab a copy of the PLFA source, as described in Getting Started. You can do this via git, or you can download individual source files.
You will need to configure Emacs to automatically use Agda mode when a file with the .agda (or .lagda for "literate" files) extension is loaded. This is typically done as part of Agda installation but sometimes it doesn’t work properly and you have to edit your .emacs initialization file.
You don’t need to know a lot of Emacs to work with Agda, but you need to know the basics. You can use the mouse (unless you’re remotely logged into a server). Arrow keys should work. But learning some basic keyboard commands will speed things up for you. There is a built-in tutorial, which you can run by typing C-h t. (That’s control and h together, followed by t.) Take notes, especially on how to exit Emacs. A lot of what’s in the tutorial isn’t necessary (all the C-u stuff) but you should learn basic navigation, cut-and-paste, and search. Here are some commands mentioned in the tutorial, separated here by semicolons, that I find useful and recommend to you.
C-x C-c; C-x k; C-g; C-v; M-v; C-l; C-p; C-n; C-b; C-f; M-f; M-b; C-a; C-e; M-<; M->; C-x 1; C-<SPC>; C-w; C-y; M-y; C-/; C-x C-f; C-x C-s; C-x C-b; C-x b; C-s; C-r
3 Naturals
Inference rules are introduced in the definition of natural numbers. As PLFA points out, the hypotheses are above the horizontal line and the conclusion is below. This is a standard way of presenting "if/then" rules in a concise (though two-dimensional) format. In this case, the naturals are built from constructors zero and suc, but we can’t use them arbitrarily; the inference rules give restrictions on how they can be used.
PLFA then moves from the inference rules to the datatype definition. In this case, the datatype definition is simple enough that we probably don’t need the inference rules presentation. But the idea of moving from inference rules to datatype definition will be useful later on when the inference rules are talking about more complicated judgments.
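For reference, here is the datatype as it appears in PLFA (the BUILTIN pragma lets Agda accept decimal literals such as 2 as shorthand for the corresponding constructor expressions):

data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

{-# BUILTIN NATURAL ℕ #-}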
Unicode symbols are starting to show up here, and it’s not obvious how to type them in Emacs. Fortunately, when editing an Agda file, a special input method is set up (if you have properly configured your Emacs to use Agda mode). Backslash (\) is the escape key that starts to build a special character. If you really want a backslash, just type it twice. But otherwise what you type next determines the special character. Typing \to gives the small right arrow →. Each PLFA chapter has a list of new Unicode characters and their shortcuts, and I have done this in my versions as well. Note that these shortcuts don’t work outside of Agda mode (without further configuration of Emacs).
It’s difficult to create a Web page or other static presentation of a process that is interactive. A Coq proof script can be replayed one command at a time by anyone. This is not the case for Agda; the actual interactions are not explicitly present in the finished code. Fortunately, there are not that many ways of interacting, and with short pieces of code, it’s usually possible to start at the beginning and figure out how they were made. I have created starter files giving the beginning points for livecoding I plan to do in lecture, and completed files where only the homework exercises remain to be done. I hope this will let you bridge the gap, perhaps with the aid of my annotations or notes made during lecture.
The type signature for the addition function is ℕ → ℕ → ℕ. The two arrows are important: it’s not ℕ ℕ → ℕ or (ℕ,ℕ) → ℕ or anything that might group the two parameters together. In fact, every function in Agda has exactly one parameter. The → type constructor is right-associative, so the type signature ℕ → ℕ → ℕ is read as ℕ → (ℕ → ℕ). That is, addition consumes one natural number and produces a function that consumes another natural number and produces the sum of the two. Since function application is left-associative, we can write _+_ 3 4 and it means (_+_ 3) 4, or we can write 3 + 4 to mean the same thing. This is a common decision among statically-typed functional programming languages, and has a number of advantages, but if you’re more used to conventional imperative languages, this may take some adjustment.
Pattern matching is available in most functional programming languages, but Agda’s pattern matching is particularly sophisticated, as it needs to deal with dependent types in a way that helps the user. We will use it a lot, and gradually learn about the pattern language. Patterns are matched against values. For now, note that a pattern like m means "match anything, and call the result m", and a pattern using a constructor, like suc m, means "match anything whose outer constructor is suc, and let m be the thing that suc is applied to."
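The PLFA definition of addition illustrates both points: the curried type signature, and constructor patterns on the left-hand sides of the equations.

_+_ : ℕ → ℕ → ℕ
zero  + n = n
suc m + n = suc (m + n)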
In a conventional static type system, a file is completed by the user without using the compiler. The compiler then typechecks the code without running it. Finally, it creates an executable version which can then be run. There is consequently a clean separation between terms (expressions in the language) and types. In fact, one goal for a compiler is to erase types after checking, so they incur no run-time overhead.
But in a type system supporting dependent types, this clean separation breaks down. We want to be able to prove things about our code, which means that expressions involving it (applications of functions, etc.) are going to appear in our types. If we want to prove "my-function 1 = 3" by means of typechecking, then there has to be some evaluation of "my-function 1" at typechecking time.
There are points in Agda’s typechecking where, internally, it must check whether two types are equal. This is obvious in the case above, but there are other points as well. What it does is this: it evaluates each of the types as much as it can, and then it checks equality. To make this clearer, we need to distinguish several kinds of equality.
Structural equality is when exactly the same constructors are used in exactly the same ways to create the two types. For example, suc (suc zero) and suc (suc zero) are structurally equal. The kind of equality I talked about in the previous paragraph is definitional equality. For example, 1 + 3 and 4 are definitionally equal, because 1 + 3 can be evaluated to 4. (I am using shorthand here; 1 really means suc zero, and so on.) Finally, propositional equality is a way of talking about definitional equality in Agda itself. For example, I could say "1 + 3 ≡ 4". But I could also say "1 + 3 ≡ 5". The former has a proof, while the latter does not.
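For example, using the definitions above (and _≡_ with its constructor refl from the standard library), the first of these typechecks and the second does not:

open import Relation.Binary.PropositionalEquality using (_≡_; refl)

_ : 1 + 3 ≡ 4
_ = refl            -- accepted: both sides evaluate to suc (suc (suc (suc zero)))

-- _ : 1 + 3 ≡ 5
-- _ = refl         -- rejected: 4 and 5 are not definitionally equal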
The main takeaway from this discussion is that Agda needs to do some evaluation during typechecking. And PLFA tries to show how this would work for our definition of +. Each step is an application of one of the equational rules used to define +. PLFA displays these using syntax imported from the standard library to display equational reasoning.
But it’s important to understand the limitations of such a display. It isn’t really checking that the next expression follows from the previous one by "one step of computation". You can see this by moving the terms around, or removing some; the result will still typecheck. All the syntax accomplishes is to take expressions \(e_1, e_2, \dots e_n\), where \(e_1 = e_2\), \(e_2 = e_3\), and so on, and let us chain them into the equivalent of \(e_1 = e_2 = e_3 \dots = e_n\). We do this all the time in mathematics, but the rules of Agda are somewhat more rigid, and once we define ≡ as relating two expressions, we cannot then use it to link more than two expressions into one. Hence the additional syntax.
It is possible to create an executable from Agda code and run it, but we will not do so in this course. That may seem strange, to not even compile our programs to native code, let alone run them. But, as explained above and made clear in this chapter, evaluation is taking place, and we can test our functions as part of typechecking. You understand programming from your previous exposure to computation. The goal of this course is to improve your understanding of proof in a computational setting. For that, typechecking suffices.
We see from the example functions in this chapter that a function definition is a series of equations, with patterns on the left-hand sides. Those patterns have to cover all cases, or the compiler will not accept the definition. This is one of the ways that Agda ensures that all functions are total. Two cases may overlap, in the sense that there is a value that matches both left-hand-side patterns. This is permitted, but Agda mode will slightly highlight the second equation as a warning. For a value falling into the overlap, the second equation will not be used, and this could cause problems during interaction when we are trying to get types to match up. The moral of the story is that sometimes it is better to write a bit more code to avoid overlap. If you use C-c C-c (case split) to develop patterns as we did in lecture, this will come naturally.
You might think that propositional equality (≡) is built in, as is its proof refl. But, in fact, both are defined in library code, using a generalization of the same mechanism as we used to build ℕ. refl is the single constructor of the datatype for _≡_. It may seem odd that refl can construct a value in the two different types 3 ≡ 3 and 4 ≡ 4. But we will see that refl has a hidden argument. Further details show up in the Equality chapter.
There is an exercise on binary representation at the end of PLFA Naturals, and also in 842Naturals. But I have changed the notation from PLFA, making the digit constructors postfix instead of the default prefix. A binary number is then written left to right with the most significant bit first, instead of the PLFA representation, which puts the most significant bit on the right, just before nil. This makes the Agda notation look more like conventional on-paper binary notation. The idea is due to David Darais. Further work on binary representation appears in several more chapters, so it is worth paying attention early and getting it right.
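For concreteness, here is one way such a postfix-constructor datatype can look; the constructor names here are illustrative, and 842Naturals may use different ones.

data Bin : Set where
  ⟨⟩ : Bin
  _O : Bin → Bin
  _I : Bin → Bin

-- eleven (binary 1011) now reads left to right as in conventional notation
eleven : Bin
eleven = ⟨⟩ I O I I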
4 Induction
The big advantage of equational reasoning is that it is readable. This, I believe, is the main reason it was chosen for PLFA. But it is less natural to develop interactively, and it is definitely more verbose, meaning it takes longer to type. For these reasons, I will not be using it in lecture. Whether or not you use it in assignments is up to you. There are times when it may help you to carry out a development. And if you are not sure how a terser development works, you can try rewriting it equationally, at least when equational reasoning suits the situation.
My quicker, shorter code is also going to be less readable. It will use rewrite more, and when reading the completed code, you mostly can’t see what the state of the goal was when the rewrite was done, the equation that rewrite is using, or the result of the rewrite. You can only see these things if you erase the rewrite and then redo it interactively. That’s the current tradeoff. Perhaps future versions of Agda, or some other proof assistant, will let us have more of the advantages of both styles.
As hinted at above, a for-all type is a dependently-typed function. The ∀ character in something like ∀ (n : ℕ) can actually be omitted; it is the name n, annotated by its type ℕ, that signifies the use of a dependent type. We’ll keep the ∀s for readability.
Just as we derived code in 842Naturals using case-split, we can derive proofs, since a proof is nothing more than code. And just as we used recursion in earlier code, we can use recursion in a proof. This is commonly called induction. If you have encountered induction in mathematics, it probably focussed on the natural numbers, as we are doing here. But induction can be done on any inductive datatype, as we will see.
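As a small example of both points (and of the rewrite style mentioned above), here is one way to prove the identity from the Preface, assuming the definitions of ℕ, _+_, and _≡_ used earlier:

+-identityʳ : ∀ (m : ℕ) → m + zero ≡ m
+-identityʳ zero = refl
+-identityʳ (suc m) rewrite +-identityʳ m = refl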
This chapter of PLFA introduces the most useful interactive commands added by Agda mode. We’ve already used some of these in our coverage of the previous chapter.
C-c C-f and C-c C-b (forward to next hole, back to previous hole)
C-c C-, (information about goal and context when in a hole)
C-c C-c (case-split in a hole)
C-c C-r (refine in a hole)
C-c C-SPC (completely fill a hole)
These cover the bulk of what we need to develop code. We will learn nuances of these, and a few more commands, as we go. A complete list can be found in the Agda documentation or by getting Emacs to describe Agda mode. To be honest, I am not sure what some of the available commands do.
When defining addition, there were many wrong choices we could have made in filling in the right-hand side of equations that would have typechecked. We had to be careful to define exactly the function we wanted. But in proving things about addition, we can be a bit more careless. Anything that typechecks will do. Make sure you are conscious of which mode you are operating in and whether it is the right choice.
At the end of the file, we have the second appearance of binary notation, since now we can prove some things about it. I have added a bit more structure than provided in PLFA, and I will reschedule where later work is done. I like these exercises a lot, and I hope you will also.
5 Relations
So far we have created a datatype (ℕ), and proved theorems about it (with the help of things we don’t quite understand yet, like refl). But each theorem is just code whose type matches what we want to prove. That is, the proof is a value with a given type. We can create such values with datatypes as well. But theorems are usually about something. So we need to create datatypes indexed by what they are talking about. We will start with the example of relations over the natural numbers. We’re already working with one such relation (≡), but it is easier to see the details of other relations first. So we start with ≤. This will be indexed by the two natural numbers it relates.
It helps to view each constructor of the datatype as an inference rule. We want a small set of inference rules that suffice to give evidence for all cases of ≤. There could be many different choices, but we only need one set.
PLFA gives one set of inference rules for ≤, followed immediately by the corresponding datatype, written in such a way that it looks like the inference rules. This is done a lot in what follows, so hopefully it doesn’t take much adjustment on your part.
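Here is the definition, as in PLFA and the standard library (assuming ℕ, zero, and suc are in scope):

data _≤_ : ℕ → ℕ → Set where
  z≤n : ∀ {n : ℕ} → zero ≤ n
  s≤s : ∀ {m n : ℕ} → m ≤ n → suc m ≤ suc n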
The definition of ≤ in PLFA matches that in the Agda standard library. Interestingly enough, the definition in SF is different, to match the Coq standard library. The two definitions are equivalent. As a thought exercise, perhaps you can figure out another definition, and prove the equivalence of the two definitions.
PLFA presents the proof of 2 ≤ 4 in a stack by identifying the hypotheses of one rule with the conclusion of another. In general this will not result in a stack, but a tree, as there could be several hypotheses (but only one conclusion). The constructor-based proof can be viewed as a concise way of representing this tree. Proof trees are likely to show up in most conventional treatments of logic, so don’t be surprised if you see them elsewhere.
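In constructor form, the PLFA proof of 2 ≤ 4 is the following; each application of s≤s corresponds to one use of the second inference rule, with z≤n at the leaf.

_ : 2 ≤ 4
_ = s≤s (s≤s z≤n)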
Implicit arguments are helpful in improving expressivity by reducing the amount of information the programmer has to provide to the typechecker. But sometimes the typechecker cannot infer the arguments, or we need to provide different ones than it might expect, so we need ways of providing them. PLFA demonstrates several, some of which show up during interaction, so now you will recognize them when you see them.
In this chapter, we start to see that pattern matching in Agda is sophisticated. It can detect cases that cannot happen and don’t need to be included. Refinement and case-split work well with it, generating patterns that are forced by others. We may sometimes need to edit those in subsequent developments, for example, to name an argument that was previously unused and replaced by an underscore.
This chapter introduces the with clause, a concise way of doing case analysis on an arbitrary expression. As shown in PLFA, it is equivalent to the use of a helper function, which can be defined locally using where (this has the advantage of being able to use local context). A where block can be attached to any equation in a function definition. Later, we will see that rewrite is equivalent to a particular use of with.
The ellipsis (...) in a with clause is replacing the pattern up to the keyword with. Sometimes refinement will fill the pattern back in. There are reasons for doing this in complicated circumstances but for us this is mostly a nuisance. Once our development is done, we can go back and replace the ellipses, and we should, to improve readability.
One way to work interactively with ’with’ clauses is to write out the expression following it, then write the next line as:
... | r = ?
Here r stands for ’result’, and will be in scope in the hole, at which point one can split on it. This is the technique I will use in livecoding.
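Here is a small self-contained example of what the finished form looks like; this is not from PLFA, and the Boolean comparison is defined inline just for illustration.

open import Data.Nat using (ℕ; zero; suc)
open import Data.Bool using (Bool; true; false)

-- a Boolean test, defined here to keep the example self-contained
_≤b_ : ℕ → ℕ → Bool
zero  ≤b _     = true
suc _ ≤b zero  = false
suc m ≤b suc n = m ≤b n

-- with lets us do case analysis on the result of an arbitrary expression
max : ℕ → ℕ → ℕ
max m n with m ≤b n
... | true  = n
... | false = m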
Just as rewrite can use multiple equations separated by vertical bars, with can split on multiple expressions separated by vertical bars. We’ll see examples of this later.
6 Equality
This chapter is a peek under the hood. We see that the definition of propositional equality (≡) is just another inductive datatype. refl is the unique constructor of this datatype. An equivalent definition is used in Coq, but it is not the only possible definition of equality. Homotopy type theory (which people work on in both Agda and Coq) uses a more general idea.
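For reference, here is the definition from this chapter (the standard library version generalizes it to arbitrary universe levels):

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

This also explains the hidden argument mentioned back in Naturals: the parameters A and x become implicit arguments of refl, so the refl that proves 3 ≡ 3 is really refl {ℕ} {3}.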
The material here on Leibniz equality and universe polymorphism is not needed in what follows, and is somewhat specialized. I am leaving it as optional reading. The main reason to do this reading is that the ideas may show up if you look at Agda source code written by experts (including the Agda standard library) and a little preparation will avoid you getting confused later.
The description of Leibniz equality introduces the need to have a hierarchy of sets. This goes back to the first attempts by Frege to formalize logic, and Russell’s paradox which poked a hole in it. Russell’s paradox can be described as follows: if we form the set of all sets that do not contain themselves, does that set contain itself or not? Either possibility leads to a contradiction.
The problem lies in permitting the construction "a set that contains itself". Russell’s stratified type theory, created to avoid the paradox, does not permit this. In Agda, the rule is that if something is formed by quantifying over Set, then the result cannot be in Set; it has to be in Set₁. This is the reason for the type signature of _≐_.
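The signature in question, as given in PLFA:

_≐_ : ∀ {A : Set} (x y : A) → Set₁
_≐_ {A} x y = ∀ (P : A → Set) → P x → P y

Because the right-hand side quantifies over P : A → Set, the result cannot itself live in Set.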
Here is an exposition of how allowing Set : Set (which can be done with a compiler flag or pragma in Agda) lets one prove the False proposition in Agda. I’ve put the link here for reference, but you’ll need to read through the Quantifiers chapter to be able to understand it, and it’s not necessary for you to read it.
http://liamoc.net/posts/2015-09-10-girards-paradox.html
The point is that there needs to be an infinite hierarchy of Set instances in Agda. (It isn’t absolutely necessary for it to be infinite, but cutting it off at some finite level ultimately limits what can be done.) It then follows that one might like to generalize ideas, such as equality, to arbitrary levels of the Set hierarchy. That is universe polymorphism, as discussed at the end of this chapter.
I have assigned no exercises for this chapter.
7 Isomorphism
This chapter starts off with some useful Agda features. If you’ve had experience with functional programming, you’ve seen lambdas before. If not, you will quickly grasp the basics (a way of creating a function on the fly without giving it a name) but wonder how useful it can be. The answer is "incredibly useful", though in the context of Agda "incredibly" becomes "moderately", at least as far as surface syntax is concerned. Many Agda features are just syntactic sugar over lambda, and this is true for other functional languages as well. In the second half of the course, we will look at proving things about models of computation based on lambdas, which is pretty much the basis for modern theories of programming languages.
You may have already seen lambdas if you’ve tried to refine an empty hole which has a function type. This will produce something like λ x → ?, that is, it uses the "pattern is a variable" syntax without curly braces. This is the way lambdas started out (and the way we will study them later, because we’re not going to model pattern matching). But the version PLFA introduces first, with curly braces, allows patterns and is more expressive. Unfortunately, refining will not give us a curly-brace lambda. We have to add the curly braces ourselves (which is worth doing, as it allows us to case-split on the variable).
As with lambdas, using the function composition operator ∘ may not come naturally at first. But used judiciously, it can improve expressivity. You may find that you only see a composition emerging during interaction, and it needs to be edited in later.
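For reference, the definition from PLFA:

_∘_ : ∀ {A B C : Set} → (B → C) → (A → B) → (A → C)
(g ∘ f) x = g (f x)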
Extensionality is an axiom that means "pointwise equality of functions implies propositional equality". It cannot be proved within intuitionistic logic, but it can be added without resulting in inconsistency, and it is sometimes useful. Adding it with a postulate means it cannot be used in computation, only in typechecking. This will not be a problem in what we are doing.
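The postulate, as stated in PLFA:

postulate
  extensionality : ∀ {A B : Set} {f g : A → B}
    → (∀ (x : A) → f x ≡ g x)
    → f ≡ g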
The remainder of this chapter introduces the concept of isomorphism. In math classes you may have seen isomorphism between sets A and B as the existence of a one-to-one and onto function from A to B. Pictorially this looks like blobs for A and B, with dots for individual elements, and a line from each dot in A to a dot in B representing the function. "One-to-one" means no two lines from A go to the same dot in B, and "onto" means every dot in B has a line to it.
Intuitionistically, this is best represented by two functions, a "to" function from A to B, and a "from" function from B to A. The "to" function follows lines from A to B, and the "from" function follows lines from B to A. The "one-to-one" and "onto" conditions above are handled by saying that from (to a) = a for all a in A, and to (from b) = b for all b in B.
This totals four pieces of evidence needed to establish isomorphism, and we pack these into an Agda record (which resembles records or structures in other programming languages, except for the possibility of dependent types). This is common practice when representing mathematical structures with multiple components and laws in Agda. While we will use isomorphism in this chapter and the rest of the first half of PLFA, it will not play a significant role in the second half. Nonetheless, it provides good exercise at this point, and the potential to establish mathematical structure that may be useful or of intrinsic interest.
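The record, as defined in PLFA (opening it brings the field names into scope as projection functions):

record _≃_ (A B : Set) : Set where
  field
    to      : A → B
    from    : B → A
    from∘to : ∀ (x : A) → from (to x) ≡ x
    to∘from : ∀ (y : B) → to (from y) ≡ y
open _≃_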
You should understand record syntax as presented in PLFA. But we will take a step beyond PLFA and build our records using copatterns. A pattern deconstructs a value; a copattern is used to construct a value. (Usually it is used to construct a value of a coinductive datatype, which is a way of representing "infinite" lists, trees, and other data structures. But we will not be looking at these in this course.)
If we are defining name = ? where the hole has a record type, and we refine the empty hole, we get record syntax as presented in PLFA, with holes for the field values. But if we instead case-split on the empty hole, it offers us a choice of typing in one or more variables to split on, or typing nothing and splitting on the result. If we make the latter choice (split on empty), we get copattern syntax. This consists of an equation for each record field name, followed by an argument which is the name we provided. There will be holes on the right-hand side of each equation. This is not only more readable, but we can proceed to define each left-hand side as we are accustomed to, using case-split to generate multiple equations. This is much easier to demonstrate than to describe, and I hope you have been attending lecture to see this. You can see the results in the completed file, and that should suffice to let you replicate their creation.
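For example, here is one way reflexivity of isomorphism can end up looking when built with copatterns, using the _≃_ record above; the exact clauses you get depend on how you split and fill the holes.

≃-refl : ∀ {A : Set} → A ≃ A
to      ≃-refl x = x
from    ≃-refl x = x
from∘to ≃-refl x = refl
to∘from ≃-refl y = refl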
PLFA shows that isomorphism is an equivalence relation (it is reflexive, symmetric, and transitive) and shows how to set up tabular reasoning, if you want to use that. This forms a more general notion of equality than structural equality, and many results in mathematics are stated "up to isomorphism".
PLFA goes on to describe two weaker notions than isomorphism. An embedding of A into B allows B to be larger (it need not be). In mathematical terms, an embedding is one-to-one, but not necessarily onto. Each element of B is touched by only one line, but an element of A can have several lines. In practical terms, this drops the last requirement of an isomorphism, the "to (from b) = b for all b in B". Since several lines can touch an element of A, but only one of them can be used by the "to" function, a round trip from B is not guaranteed to end up at the same place.
An even weaker notion drops the third requirement of an isomorphism, "from (to a) = a for all a in A". What is left is the "to" function from A to B, and the "from" function from B to A. Interpreted logically, these functions show A implies B and B implies A, which means A if and only if ("iff") B, or propositional equivalence. Of course, this works for sets A and B which perhaps cannot be interpreted as logical propositions. PLFA (following the Agda standard library) defines embedding and propositional equivalence using records.
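Their shapes are as follows, with field names as in PLFA (⇔ is introduced there as an exercise):

record _≲_ (A B : Set) : Set where
  field
    to      : A → B
    from    : B → A
    from∘to : ∀ (x : A) → from (to x) ≡ x

record _⇔_ (A B : Set) : Set where
  field
    to   : A → B
    from : B → A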
These ideas are used to explore the strength of logical identities in the next chapter, so I have moved the conclusion of the binary notation exercise to the end of 842Isomorphism. Here I define a notion of a canonical binary embedding (as suggested in PLFA Relations), ask you to prove some theorems about it, and then finally show how to collect up the canonical binary embeddings into a set. I ask you to show that set is isomorphic to ℕ.
Of course, both sets are countable, so there are many isomorphisms. But I want one that preserves the arithmetic operations we have defined on both unary and binary numbers, so I want you to use tob and fromb to help demonstrate the isomorphism. PLFA completes this exercise at the end of the Quantifiers chapter, but I have given you enough structure to let you complete it now.
You will notice that the structure of a proof that a number has a canonical binary encoding is almost identical to the encoding itself. This idea will come up in the second half of the course, in a different and more general context.
8 Connectives
PLFA explores various laws in propositional logic using the tools of the previous chapter: isomorphism, embedding, and propositional equivalence. In SF this is done using implication and sometimes "iff", and (following the Coq standard library) the connectives look like they do in mathematical logic. In Agda they look more like set connectives, which makes sense given the level we are working at. We will sometimes use the logical names and sometimes the set-operator names.
Logical AND (conjunction) is Cartesian product. The operator is _×_ (in Agda mode, \x) and I will usually pronounce it as "and" or "times", and sometimes "cross" (though it is not multiplication and it is not vector cross-product).
We define propositional "true" as ⊤ (in Agda mode, \top), the set with one element (constructed with tt). It is the left and right identity of conjunction up to isomorphism. I will pronounce it "true" or "top", and sometimes "unit".
Logical OR (disjunction) is disjoint union. The operator is _⊎_ (in Agda mode, \u+) and I will usually pronounce it as "or" or "sum", and sometimes "union" (though it is not set union).
We define propositional "false" as ⊥ (in Agda mode, \bot), the set with no elements (no constructors). It is the left and right identity of disjunction up to isomorphism. I will pronounce it "false" or "bottom", and sometimes "empty". Working with it takes some getting used to, but it is used in defining negation (in the next chapter), so it’s worth the effort.
We’ve already been using the function type operator for implication, but in set theory, the set of all functions of type A → B is written Bᴬ and can be pronounced "B to the power A" and called "exponentiation". Some programming languages papers and references use this terminology, so it’s worth remembering.
The proofs we’ll do in lecture (which are in the completed file) are a little more readable than those in PLFA thanks to the use of copatterns. But you are free to use record syntax and lambdas if you prefer!
9 Negation
Working with negation definitely takes some getting used to, but you’ll soon get the hang of it. Mostly you have to grasp what can’t be done intuitionistically. Defining ¬ A as (A → ⊥) corresponds to the following pattern of reasoning: "Assume A, use that in reasoning, get a contradiction, conclude ¬ A." But you don’t get the similar classical pattern that goes like this: "Assume ¬ A, use that in reasoning, get a contradiction, conclude A." That pattern, when formalized, is double negation elimination (¬ ¬ A → A) and it cannot be proved in intuitionistic logic.
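The definition:

¬_ : Set → Set
¬ A = A → ⊥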
It may be less of an issue when using Agda as compared to Coq, but you might need to know about some other patterns that won’t work. In this file we see a proof of contraposition, (A → B) → (¬ B → ¬ A). But the converse, (¬ B → ¬ A) → (A → B), cannot be proved. Later in the file, we see that one of deMorgan’s laws cannot be proved (though others can). Actually proving (metamathematically) that these cannot be proved is a fascinating exercise, but beyond the scope of this course. One can sometimes get a sense of why, particularly when disjunction is involved, since proving an intuitionistic disjunction requires definitely injecting a value from one side or the other. One cannot say, "Oh, there’s a value in one of these two, I just don’t know which."
The most significant absence is the law of the excluded middle (LEM): ∀ {A : Set} → A ⊎ ¬ A. (On Twitter, someone I follow expressed disdain for the name of this law, and I suggested as an alternative "There is no try.") It’s important to realize that it may hold for specific A, or if A is replaced by some more complicated function of A. But in full generality, it cannot be proved.
All is not lost, however; these laws can be added as axioms without rendering Agda inconsistent. For LEM, this can be shown with a short proof (called em-irrefutable in 842Negation). I almost wish this had not been done in PLFA, because the proof feels impossible at first, and like a magic trick when completed. I have used it as an assignment question in an undergraduate class.
But an even better exercise is left to us at the end of the file. The "Classical" exercise (I call it "Classical five") gives five laws that hold in classical logic but not in intuitionistic logic. The exercise is to show them all equivalent, that is, each one implies all of the others. This would be twenty implications, but by invoking transitivity, it is possible to do it in as few as five proofs. This exercise appeared in the "Coq’Art" book, formally titled "Interactive Theorem Proving and Program Development", by Bertot and Casteran (2004), though it is much older than that. It also appears in SF.
I have solved the Classical five exercise twice in Coq (two years apart) and once in Agda. It’s definitely easier and more fun in Agda. I have not assigned it as an exercise, to give you the pleasure of solving it without being required to solve it. (Also it is a pain to mark.) I managed (by reusing earlier theorems) to refine my Agda solution to five proofs, each of which has one type signature line and one equational line. But you don’t have to go this far. Six, or seven, or ten proofs are fine, and use as many equational lines as you wish. Just relax and enjoy yourself.
Since we can add any of these as an axiom, and prove more things, why not add them? The price we would pay is that there are no computational interpretations of axioms. In fact, the second half of PLFA uses this lack of computational interpretation as a way to signal an error (we may try to do it a different way). There is a fairly simple translation involving double negations that will convert any statement provable in classical logic into one provable in intuitionistic logic. Sticking with intuitionistic logic also lets us discern finer metamathematical structure, just as we did with isomorphisms and embeddings.
10 Quantifiers
We have been using for-all since 842Induction, so here we mostly see how this quantifier interacts with the other logical connectives. It’s important to realize that implication (non-dependent function) is just a special case of for-all (dependent function).
Existential quantification is more interesting. It doesn’t need to be built in; it can be derived like the other connectives. Evidence of an existential is a pair consisting of a witness and a proof that the witness has the desired property. A bit of special syntax helps make it more readable, but that syntax doesn’t always get used during interaction, unfortunately. There’s nothing new here as far as Agda features or interaction is concerned.
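For reference, the PLFA definitions (the ∃[ x ] B form is declared as syntactic sugar on top of these):

data Σ (A : Set) (B : A → Set) : Set where
  ⟨_,_⟩ : (x : A) → B x → Σ A B

∃ : ∀ {A : Set} (B : A → Set) → Set
∃ {A} B = Σ A B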
What we studied in the previous chapter was propositional logic (for-all is used only to quantify variables at the top level of a formula). Adding quantifiers that can be used at any level gives us predicate logic. There is no difference between intuitionistic and classical logic on the quantifier level; the only difference lies in the absence or presence of something like LEM. If we view for-all as generalized conjunction, and there-exists as generalized disjunction, then we can see why some logical statements involving quantifiers might not be provable in intuitionistic logic (for example, the generalization of the unprovable propositional version of one of deMorgan’s laws).
PLFA finishes off the Bin-isomorphism exercise here, by asking you to show an isomorphism between ℕ and ∃[ x ](Can x). But this is essentially what you already did at the end of 842Isomorphism.
11 Decidable
Given a property or relation, we can compute whether it holds using Booleans, or we can produce inductively-defined evidence that it holds (or does not hold). There are pros and cons for each approach. The advantage of Booleans is that we’re all used to computing with them. But a Boolean value doesn’t tell us why it was computed that way, or what the implications are. The colloquial term for this is "Boolean blindness". It’s not as easy to compute with evidence, but it’s a lot easier to reason with it.
It’s an intermediate-level Agda technique to go back and forth between one approach and the other as needed. This is made a lot easier by decorating Booleans with that missing justification. That is, instead of a value that is "true" or "false", we have an indication of "true" together with a proof of it, or an indication of "false" with a proof of it. This is the basis of the Dec inductive datatype.
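The definition, as in PLFA and the standard library:

data Dec (A : Set) : Set where
  yes :   A → Dec A
  no  : ¬ A → Dec A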
This file explores the use of Dec, and shows that it should be preferred to the use of Booleans. It will be used in the second half of PLFA. I have deviated from the stable release to include some material from a future release of PLFA, because it will make our lives easier when working with the second half of the text.
This new material deals with proofs that can be computed by Agda at compile time (during typechecking) and then hidden in implicit arguments, which streamlines the proofs and lets them take on forms familiar to us. The key to the technique (called proof by reflection) is Agda’s treatment of implicit arguments. In the case where the implicit argument is a record, Agda will provide it if all the fields can be provided. Since ⊤ is implemented as a record with no fields in the standard library, Agda can always provide it as an implicit argument. The technique hides proofs for decidable properties P by computing a yes instance of Dec P, erasing it to a Boolean, and then using T to map it to ⊤. (A proof for a no instance cannot be provided by this technique, but if those are what we wish to erase, we can negate the property and then use the technique.)
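The two small functions at the heart of the technique, as defined in PLFA: ⌊_⌋ erases the evidence from a Dec value, and T maps the resulting Boolean to a type that Agda can fill in implicitly exactly when it is ⊤.

⌊_⌋ : ∀ {A : Set} → Dec A → Bool
⌊ yes x ⌋ = true
⌊ no ¬x ⌋ = false

T : Bool → Set
T true  = ⊤
T false = ⊥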
A simple example of hiding a proof of the less-than-or-equal-to relation for a guarded version of the monus function is given in the local versions (starter and completed) of this file. We will return to the idea in the Lambda chapter.
I believe this file is the first place where a difference between Agda 2.5.4.2 (used to develop PLFA) and Agda 2.6 becomes evident. In Agda 2.6, a definitional equation that contains an absurd pattern can be omitted as long as there is at least one equation that does not. The completeness checker can deduce and verify the omitted line, and interactive development will not generate it. This means that one of the lines in T→≡ and ≤ᵇ→≤ will not be generated if we start from scratch, as we do with the starter file. It doesn’t hurt to include the lines, so the code in PLFA is not incorrect, and a case can be made that including them improves readability. We’ll see this more and more as we continue with PLFA, so please don’t be alarmed by this minor divergence.
12 Lists
This material could have been done earlier (as it is in SF), but doing it late allows PLFA to use more abstraction and ask more interesting questions. I have left the All and Any exercises as optional, as I think by this point the ideas should be pretty clear and the solutions not difficult (except for the exercises All-∀ and Any-∃, which appear to require a version of extensionality that works with dependent functions).
The map, filter, foldl, and foldr functions are standard in most functional programming languages.
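For reference, foldr as defined in PLFA:

foldr : ∀ {A B : Set} → (A → B → B) → B → List A → B
foldr _⊗_ e []       = e
foldr _⊗_ e (x ∷ xs) = x ⊗ foldr _⊗_ e xs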
This wraps up the first half of PLFA. In the second half, we change gears, and start studying the foundations of programming languages, using the Agda tools and concepts that we have learned and developed.
13 Lambda
Alonzo Church’s lambda calculus was the first formal model of general computation defined, and enabled the first proof of uncomputability in 1936. Functional programming languages can often be viewed as elaborations of the lambda calculus. Surprisingly, the lambda calculus is also a good basis for formal definition of the semantics of imperative programming languages such as Java, and for proving properties of such languages.
Not surprisingly, there are many variations. Usually variables, abstraction (lambda), and application are included, but even the first two can be dispensed with (see combinatory calculus). Typed or untyped? The first choice makes general recursion difficult; should something more be added to enable it? Should it be possible to form a term and then determine if it has a type, or should it be impossible to create an untypable term? What base types should be included? What rules should govern computation?
PLFA makes a particular set of initial choices, and then both modifies them and adds to them in subsequent chapters. The initial choices are close to an influential set of choices known as PCF, introduced by Gordon Plotkin in 1977 and based on unpublished work by Dana Scott. Plotkin’s PCF included Boolean constructors (true/false) and an eliminator (if-then-else). PLFA leaves these out, probably because natural number constructors (zero/suc) and an eliminator (a case expression with zero/suc branches) are included. Once one understands how to handle naturals, Booleans are an easy exercise. Including them would just add to the length of proofs (by increasing the number of cases) without teaching us anything new.
In addition to the features already mentioned (variables, abstraction, application, and naturals), PCF and PLFA include a general recursion construct, using the Greek letter \(\mu\) (mu). With this construct, one can not only easily and naturally write recursive functions, one can also write infinite loops. Without it, one can show that every computation is finite (this is called "strong normalization"). The choice to include \(\mu\) makes things difficult in Agda, since we would like to write an evaluator for the chosen language, and Agda functions must always terminate. The solution is to add a parameter to the evaluator that bounds the number of steps it may take. The evaluator then always terminates, but sometimes without producing a final result.
We immediately run into the issue of the surface syntax for terms in our chosen source language. Agda has already taken the most natural forms and we can’t overload them. So PLFA uses slight variants: a lambda with a stroke through it (\Gl- in Emacs), an explicit middle dot (\cdot) for application instead of just juxtaposition, and backquotes (\‘) for variables (also added to terms like zero and suc, but directly instead of with a space in between). The idea is to define a datatype Term by using Agda’s parser to make source language terms look like Agda terms, while actually building a tree representation that works well for computation. The result can be a little awkward to read and write, so I will use Church’s original notation here and on the whiteboard. (It’s useful to know this for looking at other references and research papers also.) The identity function, which would be ƛ "x" ⇒ ` "x" in PLFA notation, is \(\la x . x\) in Church’s notation. Application is juxtaposition (as in Agda), and a low dot is used instead of a double arrow to separate the variable of a lambda from its body.
Church originally developed the untyped lambda calculus, and added types several years later. He used only variables, abstraction, and application. In order to argue that this was a general-purpose model of computation, he needed to encode natural numbers and show how to do arithmetic. The encoding is now called "Church numerals". Although we don’t need them, since Term has representations for zero and successor, they are a good source of nontrivial examples that use higher-order functions. This is potentially confusing, having two languages (three if you include Church’s notation) and three representations of natural numbers; hopefully you can keep them all straight.
In order to form the Church numeral for two, we take the Agda representation for two (suc (suc zero)), and abbreviate suc by \(s\) and zero by \(z\), to get \(s~ (s~ z)\). We then let the user supply whatever interpretation of \(s\) and \(z\) they want, which results in \(\la s . \la z.~s~ (s~ z)\). Another way to think of this is as an iterator, which applies a function \(s\) twice to a base value \(z\). Things get interesting when one or both of these are themselves Church numerals.
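In PLFA’s Term notation (using the constructors described above), this Church numeral is:

twoᶜ : Term
twoᶜ = ƛ "s" ⇒ ƛ "z" ⇒ ` "s" · (` "s" · ` "z")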
We’d like to define arithmetic with Church numerals, though technically it’s too soon, since we don’t have a definition of computation for Term. What PLFA does is work with the intuitive idea of what computation should be, followed later by verification that things really do work as we expect. The exercises at this point are like writing things on paper. There’s no way for you to verify them. PLFA defines a function \(\mathtt{suc}^c\), which is somewhat misnamed. It consumes a natural number in the source language and produces another one. One might use this name for a function that consumes a Church numeral and produces a Church numeral one larger, but because of this decision of PLFA, we can call this second function csuc, and you should be able to write it based on the intuitive idea of computation.
A number of concepts are introduced in the Web version of PLFA that don’t show up directly in the Agda code. They’re necessary in order to understand the code, though, and they show up all the time in references and research papers. A programming language typically has a number of binding constructs (or just binders), which give new meanings to the names of variables. \(\la\) is the most common one (sometimes disguised as a function declaration) but here we also have \(\mu\) for recursion, and the suc clause of case. At the top level, in practical terms, every variable has to be bound, or there is no way to interpret it. But as we consider subterms, we may remove the binder, in which case an occurrence of a variable could be free. A term with no free variables is closed.
PLFA is clear that the choice to use named variables complicates the code in this chapter and the next, before a different choice is made in the DeBruijn chapter. The issue is with the reuse of names, which can be reduced but not avoided. This is also a problem with for-all and there-exists, which are also binding constructs. I have seen this issue affect students doing their first induction proofs on paper, when they insist on using \(n\) for every quantified variable.
PLFA (like SF and TaPL before it) does make some decisions at this point that make our lives easier. Where possible, it considers only closed terms, and defines evaluation so that it uses call-by-value (arguments are evaluated to values before being substituted). This avoids the messiest case in substitution, when a free variable is in danger of being "captured" by being substituted under a binder. What remains is still more complicated than one might expect.
Here is the example given in PLFA of the problem being avoided, but stated using Church notation. If we have the term \(\la x . x~y\) and we wish to substitute \(x~\mathit{zero}\) for \(y\), doing it blindly gives \(\la x . x ~ (x~\mathit{zero})\). The reason this isn’t correct is that the \(x\) in \(x~\mathit{zero}\) has been bound, when we might expect it to remain free. (If this particular substitution is being done in a subexpression, then the full expression might contain the actual binder for what at this point is a free occurrence of \(x\)).
Doing the substitution properly requires renaming the bound variable in the term being substituted into so that it looks something like \(\la z . z~y\), and then the substitution can be completed, giving \(\la z . z~(x~\mathit{zero})\). Coming up with that name \(z\) that avoids a conflict (we couldn’t choose \(y\), for example) is more work than we care to do at this point. Once we learn how to avoid the work (by avoiding names), we will be able to handle situations like this.
The term "beta reduction" is usually used for the rule called β-ƛ in the definition of single-step reduction, which dates back to Church’s first work on the lambda calculus. By extension, the other rules that involve a substitution at the top level of an expression (for case and \(\mu\)) are also given β names here. Letting subexpressions be rewritten is sometimes called the "compatible closure"; PLFA calls the associated rules "compatibility rules" and uses ξ (xi) to name them, which is also done in several other sources, though these names are not as common as "beta reduction". There is no rule allowing reduction in the body of a binding construct, again to avoid dealing with non-closed terms.
We have seen instances before where PLFA uses ∀ without type annotations on the variable, but starting with the definition of single-step reduction, they are used more frequently. Agda has type inference: if type annotations are left off, the compiler tries to figure out what is appropriate, and complains if it cannot. Both the reader and Agda can figure out the type of Γ, not from the name, but from the way it is used in a typing rule. Annotations should still be used when they help readability. Agda is already doing a fair amount of work for us with implicit arguments and pattern matching, and now is shouldering even more of the burden.
The traditional presentation of the simply-typed lambda calculus also uses type annotations on the variables of lambdas, but PLFA’s Term datatype does not, another decision that makes our life easier in certain ways. The later Inference chapter briefly discusses this, and shows some simple ways in which types can be inferred for our simple language. The situation with Agda is much more complex, but it is based on the same ideas.
Having defined reduction, we can verify that our Church numeral definitions, such as \(\mathtt{plus}^c\), really work as intended. In the example reductions given, Agda names are used as shorthand in some cases in order to make the expressions more readable. Where PLFA writes \(\mathtt{two}^c~\cdot~ \mathtt{suc}^c ~\cdot~ \mathtt{zero}\), the full expansion in Church notation would be \((\la s . \la z.~s~ (s~ z))~ (\la n .~\mathtt{suc}~n)~ (\la s . \la z . z)\). The convention PLFA is using in reduction sequences is to keep names around until they need to be applied with a beta rule, at which point it replaces a name by its meaning, and then does the substitution of the argument value. When we define an evaluator, it won’t follow this convention; it will use full expansions all the way through.
The long reduction sequences given here were generated by the evaluator we will write later, with editing by hand to apply the naming shorthand described above. You shouldn’t be intimidated by them, and you don’t need to follow them step by step all the way through. But it is worth testing your intuitive understanding of the reduction rules by looking at a random line without looking at the one below. See if you can identify the point at which the reduction takes place, and the effect of the beta-rule applied.
To introduce types to our source language, we need a way to talk about them (this is the datatype Type), and a way to specify evidence that a term has a given type. Although our terms are closed at the top level, as we descend into them recursively and pass under binders, we encounter subexpressions that have free variables. A context is a way of remembering information about a variable that was present at the moment it became free. It is essentially a list of (name, type) pairs, but we create a new notation for it that matches common practice in many references. We then need to define a lookup judgment that is essentially the same idea as list membership. It helps to think of a (name, type) pair as a binding of the name to a type.
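The context datatype from PLFA (Id and Type are the identifier and type datatypes of this chapter):

data Context : Set where
  ∅     : Context
  _,_⦂_ : Context → Id → Type → Context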
I have made one small improvement here which will show up in PLFA in the future. The lookup judgment has two constructors, Z which can be thought of as "here", and S which can be thought of as "not here". An inner lambda can use the same variable name as an outer lambda, so to implement scope correctly, we must always use the most recent binding of a name in the context. The S constructor must thus include a proof that the name being looked for is not the most recent addition (and thus we must look deeper in the context). Since names are strings, this is a string inequality proof. We can use the proof by reflection idea from 842Decidable to hide that proof. This is done with the "smart constructor" S′, which we can then use in place of S in the lookup judgments we construct. This saves us only a bit of time and space now, but will be much more helpful in 842Inference.
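The shape of the lookup judgment with the inequality premise described above is roughly the following (the 842 files may differ in details such as the use of S′):

data _∋_⦂_ : Context → Id → Type → Set where

  Z : ∀ {Γ x A}
    → Γ , x ⦂ A ∋ x ⦂ A

  S : ∀ {Γ x y A B}
    → x ≢ y
    → Γ ∋ x ⦂ A
    → Γ , y ⦂ B ∋ x ⦂ A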
That gets us set up for the type judgment _⊢_⦂_ (again using standard notation with the "turnstile" or \vdash symbol). We can read \(\Gamma\vdash t : T\) as "Gamma types \(t\) as \(T\)". There is one rule for each way of forming a term (we are not always this fortunate; sometimes there needs to be more than one) and the rules should be fairly intuitive.
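Here is an excerpt of the typing judgment, roughly as it appears in the Lambda chapter, using the term constructors introduced earlier; the rules for the naturals and fixpoint are omitted.

    data _⊢_⦂_ : Context → Term → Type → Set where

      ⊢` : ∀ {Γ x A}
        → Γ ∋ x ⦂ A
        → Γ ⊢ ` x ⦂ A

      ⊢ƛ : ∀ {Γ x N A B}
        → Γ , x ⦂ A ⊢ N ⦂ B
        → Γ ⊢ ƛ x ⇒ N ⦂ A ⇒ B

      _·_ : ∀ {Γ L M A B}
        → Γ ⊢ L ⦂ A ⇒ B
        → Γ ⊢ M ⦂ A
        → Γ ⊢ L · M ⦂ B

      -- ... plus one rule each for zero, suc, case, and mu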
The file ends with a few examples of type judgments, and short proofs that specific small terms cannot be typed. At this point, multistep reductions and types must be computed by hand, even though we know algorithms must exist (the Agda compiler does these things for a more complicated language). We’ll resist the urge to write those algorithms, and instead prove some properties of reduction and types. As a by-product, we will get an evaluator, which needs only to be supplied with a type judgment for the initial term. Later on, we’ll see how to compute the type of a term.
14 Properties
The important properties of the simply-typed lambda calculus that are proved in this chapter show that types behave in a useful way with respect to evaluation. The progress theorem states that a closed, well-typed term is either a value or can take a step. The preservation theorem states that in the latter case, the type before a step is the same as the type after the step. Together, these theorems show that "well-typed terms do not get stuck"; if evaluation of a term terminates, it terminates in a value of the same type. At the end of the file, we put these two theorems together to provide an evaluator.
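Stated as Agda signatures, essentially as in PLFA's Properties chapter, the two theorems look like this; Progress M is a wrapper datatype explained below.

    progress : ∀ {M A}
      → ∅ ⊢ M ⦂ A
      → Progress M

    preserve : ∀ {M N A}
      → ∅ ⊢ M ⦂ A
      → M —→ N
      → ∅ ⊢ N ⦂ A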
These proofs are larger and more intricate than the ones we’ve seen so far, though amazingly enough, we will be able to develop them by livecoding in lecture, knowing where we are at all times, with readable proofs. Unfortunately, Agda (like Coq) requires all helper functions to be at least declared before proofs that use them. So we start with helper functions whose utility is not yet apparent. To deal with this inversion, let’s start by sketching the main results and how they might be proved, and see what issues might give rise to helpers. It’s important to keep the big picture in mind at all times, and perhaps this sketch will help.
The premise of the progress theorem is that we have a term that can be typed in the empty context (thus it is closed). To make the proof more readable, PLFA introduces the inductive datatype Progress, which wraps up either a proof that a term is a value or a proof that it can step. The conclusion of the theorem is that we have Progress for this term. The proof is by induction on the typing derivation of the term (which is like induction on the term structure itself, but with more helpful information). The cases where the top-level typing constructor is ⊢ƛ or ⊢zero are easiest, as the terms involved are clearly values. For an application L · M, we need to recursively apply the theorem (use induction) on L and possibly M as well. If one of these steps, then by the corresponding xi-rule, the application steps. If both are values, then the typing rule tells us that L has a function type; we need a helper to conclude that it’s a lambda, after which we can show that the application steps using a beta-rule.
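The wrapper datatype, as defined in PLFA:

    data Progress (M : Term) : Set where

      step : ∀ {N}
        → M —→ N
        → Progress M

      done :
          Value M
        → Progress M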
This motivates the main helper for the progress theorem, called the canonical lemma, which characterizes values and their types. The other cases in the progress theorem use the same ideas.
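Roughly as stated in PLFA, the canonical forms lemma and its associated datatype look like this:

    data Canonical_⦂_ : Term → Type → Set where

      C-ƛ : ∀ {x A N B}
        → ∅ , x ⦂ A ⊢ N ⦂ B
        → Canonical (ƛ x ⇒ N) ⦂ (A ⇒ B)

      C-zero :
          Canonical `zero ⦂ `ℕ

      C-suc : ∀ {V}
        → Canonical V ⦂ `ℕ
        → Canonical (`suc V) ⦂ `ℕ

    canonical : ∀ {V A}
      → ∅ ⊢ V ⦂ A
      → Value V
      → Canonical V ⦂ A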
The premises of the preservation theorem are that we have a closed, well-typed term, and a step that it takes. We want to show that the result of the step has the same type. The two premises are not independent; the top-level typing rule used will restrict the stepping rule, and vice-versa. The two categories of stepping rules are xi-rules and beta-rules. If a xi-rule is used, we can use recursion/induction on the subterm that is rewritten; if a beta-rule is used, then substitution takes place in the step (except for β-zero).
This motivates the substitution lemma, which says that substitution preserves types. What does this mean, exactly? If we want to do the substitution N [ x := V ] in a beta-rule, then x may appear free in N, and V can be typed as some type A in the empty context. N can be typed as some type B in a context that includes a type for x (which should be A). Since substitution is recursively defined, and several binders may have been pulled off, there could be other free variables in N. So we generalize to an arbitrary context Γ: the premise is that N can be typed as B in context Γ, x:A, and the conclusion of the substitution lemma is that N [ x := V ] can be typed as B in context Γ.
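In Agda, the statement is essentially as in PLFA:

    subst : ∀ {Γ x N V A B}
      → ∅ ⊢ V ⦂ A
      → Γ , x ⦂ A ⊢ N ⦂ B
      → Γ ⊢ N [ x := V ] ⦂ B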
Substitution does no recursion in the cases of zero and variables. When the variable we arrive at is the same as the one being substituted for, it is replaced by V, which we must now type as A in the current context, knowing only that it can be typed as A in the empty context. This is called "weakening", and is a helper.
When substitution is recursive, the substitution lemma is also recursive. Substitution involves terms and not types, but in the substitution lemma, there is an associated type judgment which involves a context. When the recursion passes over a binder, a variable-type binding is added to the context, which complicates the proof. Consider the lambda case (case and mu are similar).
If the variable of the lambda is the same as the one being substituted, then substitution stops at this point. But the body of the lambda is typed using a new type for the variable, added to the context and shadowing the old type. That old type needs to be removed in order to be able to build the judgment needed for the lemma recursion. So a "drop" helper is needed, which says that a shadowed variable can be removed.
If the variable of the lambda is different than the one being substituted for, then substitution continues into the body. But the lambda variable and its type are added to the context in front of the binding for the variable being substituted for, which needs to be at the end to build the judgment for the lemma recursion. So a "swap" helper is needed, which says that two different variables in a context can be swapped.
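The statements of the three helpers, essentially as they appear in PLFA:

    weaken : ∀ {Γ M A}
      → ∅ ⊢ M ⦂ A
      → Γ ⊢ M ⦂ A

    drop : ∀ {Γ x M A B C}
      → Γ , x ⦂ A , x ⦂ B ⊢ M ⦂ C
      → Γ , x ⦂ B ⊢ M ⦂ C

    swap : ∀ {Γ x y M A B C}
      → x ≢ y
      → Γ , y ⦂ B , x ⦂ A ⊢ M ⦂ C
      → Γ , x ⦂ A , y ⦂ B ⊢ M ⦂ C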
We could prove each of these three lemmas (weakening, dropping, swapping) separately, but we would discover that the proofs look very similar. (TaPL has them as separate lemmas, but the proofs shown are little more than the statement "by induction".) It’s better to do one proof (as done in SF; called "rename" in PLFA) and then derive these as corollaries. Each of the lemmas wants to take a type judgment in a particular context and extend it to a slightly different context where the differences (entries not used, entries in a different location, shadowed variables) are irrelevant to the particular judgment. Here’s the general framework into which these can be fit. If we have two contexts Γ and Δ such that every successful lookup in Γ can be mapped (constructively) to a successful lookup in Δ with the same result, then type judgments can also be mapped. The lemmas are then special cases with particular mappings.
The general rename lemma is proved by induction on the type judgment being remapped, but again, in the case of binders, the context changes. A variable binding added to Γ needs to be added to Δ in order to construct the mapping of the type judgment. But in order to use recursion/induction in the proof of the renaming lemma, we need to be able to lift the lookup mapping from Γ to Δ by adding the new variable binding to both. This is the last helper, the extension lemma ext (which of course shows up early in the file’s approach to preservation).
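Their signatures, essentially as in PLFA, make the pattern clear: ext lifts a lookup mapping past one new binding, and rename lifts a lookup mapping to a mapping on type judgments.

    ext : ∀ {Γ Δ}
      → (∀ {x A} → Γ ∋ x ⦂ A → Δ ∋ x ⦂ A)
      → (∀ {x y A B} → Γ , y ⦂ B ∋ x ⦂ A → Δ , y ⦂ B ∋ x ⦂ A)

    rename : ∀ {Γ Δ}
      → (∀ {x A} → Γ ∋ x ⦂ A → Δ ∋ x ⦂ A)
      → (∀ {M A} → Γ ⊢ M ⦂ A → Δ ⊢ M ⦂ A)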
Having proved the renaming lemma, the three corollaries we need follow by constructing the appropriate mapping. Here the Emacs command C-c C-h is useful. If the name of a helper function and possibly some arguments are typed into a hole, followed by this command, Agda mode responds with a type signature, and copies it into the Emacs kill ring and system clipboard so that it can be yanked/pasted.
This command (like others in Agda mode) can occasionally do the wrong thing in a complex situation. This will happen in the swap lemma, where there is an additional argument (the proof that the two variables are different). The helper command generates implicit arguments that shadow the variables and make the proof useless. Some manual editing is needed to get things back on track.
15 DeBruijn
We have arrived at the jewel-like heart of the second half of PLFA. Here all the work we did in the previous chapter pays off. That work, thanks to Agda, was still less than I’ve seen with other approaches. But now it gets even easier, thanks to a couple of ideas that are naturally motivated by what we’ve just seen.
The first idea is inherently-typed terms. Previously, we created one inductive datatype to represent terms, and another one to represent a type judgment on terms. But the expressions representing type judgments look a lot like the terms they are typing. Why not just represent a term by its type judgment? Previously, a type judgment was a ternary relation \(\Gamma\vdash t : T\); now a term will be a binary relation \(\Gamma\vdash T\). That is, the inductive datatype for terms will be indexed by a context and a type.
This has the advantage that it is impossible to write a "nonsense" term that cannot be typed. One disadvantage is that signatures get a little harder to read, but this is just a matter of gaining familiarity. The idea is usually attributed to Altenkirch and Reus’s 1999 paper "Monadic presentation of lambda terms using generalized inductive types" (don’t try reading this unless you know some category theory) but there is a lot of prior work. Church’s original presentation of the simply-typed lambda calculus incorporated types in such a way that only typable terms were well-formed; the separate style used earlier here (and just about everywhere else nowadays) is due to Curry.
There are two places where the type judgments we have seen don’t quite look like the terms they are typing. One is when a binding construct is used in the term. The variable name appears in the term, but it does not appear in the type judgment, because we can get it from the term.
The other place is when a variable is used in a term. Its string name (like "x") appears in the term, but what appears in the type judgment is a number of steps back to where the variable appears in the context, with string inequality proofs to ensure that we refer to the most recent binding of that variable name. We gave "step back" the evocative name of S and "we’ve arrived" the name of Z to make it clear that we’re just counting back from the right end of the context.
If we abandon the idea of using names for variables, we can also abandon the string inequality proofs, and a variable use (in a term represented by a type judgment) just becomes a number counting back to the type of that variable in the context. Since exactly one item is added to a context for each binder passed under, the number can also be seen as counting the number of enclosing binders to skip over to get to the one being referenced.
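An excerpt of the DeBruijn-chapter definitions (omitting the constructors for the naturals and fixpoint) shows how variables and terms merge with their typing evidence:

    data Context : Set where
      ∅   : Context
      _,_ : Context → Type → Context

    data _∋_ : Context → Type → Set where

      Z : ∀ {Γ A}
        → Γ , A ∋ A

      S_ : ∀ {Γ A B}
        → Γ ∋ A
        → Γ , B ∋ A

    data _⊢_ : Context → Type → Set where

      `_ : ∀ {Γ A}
        → Γ ∋ A
        → Γ ⊢ A

      ƛ_ : ∀ {Γ A B}
        → Γ , A ⊢ B
        → Γ ⊢ A ⇒ B

      _·_ : ∀ {Γ A B}
        → Γ ⊢ A ⇒ B
        → Γ ⊢ A
        → Γ ⊢ B

      -- ... plus constructors for zero, suc, case, and mu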
This idea is called de Bruijn indices. It was introduced by Nicolaas de Bruijn for use in his pioneering Automath system in 1970. Automath was a system for expressing machine-checkable formal proofs. In designing Automath, de Bruijn independently described many ideas in dependent types (Howard of the Curry-Howard isomorphism being the other person credited). de Bruijn indices are very useful in proof formalization (his original 1972 paper showed how a major result could be simplified using them), but they also are used in compilers and interpreters for functional programming languages. As we’ll see below, they are really not human-readable, which is why we continue to use names in programming.
Here are some examples. The identity function, \(\la x . x\), becomes \(\la . 0\). \(\la x . \la y. y\) becomes \(\la . \la . 0\). \(\la x . \la y . x\) becomes \(\la . \la . 1\). Next, two examples that appear in PLFA, but rendered here in Church notation. The Church numeral for two, \(\la s . \la z . s~(s~z)\), becomes \(\la . \la . 1~(1~0)\). The addition function: \[\mu + . \la m . \la n . ~\mathit{case}~m~[~\mathit{zero} \Ra n ~|~ \mathit{suc}~m \Ra \mathit{suc}~(+~m~n)]\]
becomes \(\mu . \la . \la . ~\mathit{case}~1~[~\mathit{zero} \Ra 0 ~|~ \mathit{suc} \Ra \mathit{suc}~(3~0~1)]\). Notice what happens in the \(\mathit{case}\) expression. It’s the pattern in the successor clause that acts as a binder (and it no longer names its variable). So in the zero clause, the variable \(n\) becomes \(0\), but in the body of the successor clause, the variable \(n\) becomes \(1\). In the named-variable version, if we wanted to refer to the outer binding of \(m\) in the body of the successor clause, we couldn’t, because it’s shadowed by the \(m\) in the pattern \(\mathit{suc}~m\). But in the de Bruijn indexed version, we could; it would be \(2\).
You should now be able to see the advantage of using de Bruijn indices. Shadowing is not possible, so substitution doesn’t have to worry about variable capture, and proofs don’t have to demonstrate that it isn’t happening. We don’t have to restrict ourselves to closed terms any more. How do we represent a free variable? References to bound variables are always less than the number of enclosing binders, so all the rest of the natural numbers can be used for free variables. The excess (after being diminished by the number of enclosing binders) becomes an index into the context. This is exactly the interpretation we need to handle the building up of a context as we descend into a term in the typechecking process.
Since we are not using Church notation, but representing terms as proofs, we can’t just use built-in Agda numbers and expect them to work. Instead we write a function # which maps numbers to lookup judgments expressed using S and Z, and then we can say # 2 instead of S (S Z).
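For example, the Church numeral for two is rendered roughly as follows in the DeBruijn chapter, where Ch A abbreviates its Church-numeral type:

    Ch : Type → Type
    Ch A = (A ⇒ A) ⇒ A ⇒ A

    twoᶜ : ∀ {Γ A} → Γ ⊢ Ch A
    twoᶜ = ƛ ƛ (# 1 · (# 1 · # 0))    -- λ . λ . 1 (1 0)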
A conventional presentation (such as in TaPL, which uses de Bruijn indices in all its OCaml code) focusses next on substitution as needed to define reduction. But this introduces new complications. It’s natural to define substitution by structural recursion on the term being substituted into, and that’s what we did before. As we pass under a binder in that recursion, the index of the variable being substituted for goes up by one (it starts at zero at the beginning of the operation), and we have to keep track of that (before, it was just a name that didn’t change). But also the indices of the free variables in the term being substituted have to go up by one, to point correctly over that binder. That requires another structural recursion on the term being substituted. And here’s the problem: that second recursion passes under binders in the term being substituted. So again a threshold must be maintained, because indices below that threshold (which represent bound variables in the original term being substituted, but look free because binders have been removed) shouldn’t be adjusted.
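Here is a small hypothetical sketch of that conventional approach (this is not PLFA's code; it uses plain untyped terms just to show where the cutoff appears):

    open import Data.Nat using (ℕ; zero; suc)

    data Tm : Set where
      var : ℕ → Tm
      lam : Tm → Tm
      app : Tm → Tm → Tm

    -- increment an index only if it is at least the cutoff
    bump : ℕ → ℕ → ℕ
    bump zero    n       = suc n
    bump (suc c) zero    = zero
    bump (suc c) (suc n) = suc (bump c n)

    -- shift the free variables of a term up by one; indices below the
    -- cutoff c are bound within the term and must be left alone
    shift : ℕ → Tm → Tm
    shift c (var n)   = var (bump c n)
    shift c (lam t)   = lam (shift (suc c) t)    -- cutoff grows under a binder
    shift c (app t u) = app (shift c t) (shift c u)

Substitution then needs a threshold of its own, and calls shift each time it passes under a binder, which is exactly the bookkeeping described above.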
This sounds messy, because I’m explaining it quickly. It’s not that bad, and you might enjoy working out the details. However, PLFA brings in another clever idea at this point to avoid this complication. It’s due to Conor McBride, from the brief unpublished draft paper "Type-preserving renaming and substitution" (several published followup papers use the idea, with credit). McBride noticed the similarity between renaming and a more general version of substitution. What we did before, now called single substitution in the named case, consumes a term to be substituted into, a variable name, and a term to be substituted, and produces a term. In the new setting, single substitution consumes a term (that is, a proof judgment of type \(\Gamma,B\vdash A\)) to be substituted into, a term (a proof judgment which must have type \(\Gamma\vdash B\)) to be substituted for the variable 0 (that is, lookup judgment Z), and produces a term (a proof judgment which must have type \(\Gamma\vdash A\)).
To make substitution look like renaming, we generalize it to substitute for more than one variable. We replace the term to be substituted for the variable zero with a map from lookup judgements over \(\Gamma\) (that is, many variables) to terms (proof judgements over another context \(\Delta\)) to be substituted for them. If we then move the term being substituted into to be the last argument, and notice that we can parenthesize this and the result type (because \(\ra\) is right-associative), the result type becomes a map from proof judgements in context \(\Gamma\) (that is, terms) to proof judgments in context \(\Delta\) (also terms).
Renaming consumes a map from lookup judgements in context \(\Gamma\) (now variables) to lookup judgments in context \(\Delta\), and produces a map from proof judgments in context \(\Gamma\) (now terms) to proof judgments in context \(\Delta\). That is, it lifts a map (from variables to variables) to a map from variables to terms. Our general substitution lifts a map (from variables to terms) to a map from terms to terms. The similarity is probably more obvious looking at the type signatures of the functions in the Agda code than in what I have just said.
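Those signatures, essentially as they appear in the DeBruijn chapter, with single substitution (discussed next) included for comparison:

    rename : ∀ {Γ Δ}
      → (∀ {A} → Γ ∋ A → Δ ∋ A)
      → (∀ {A} → Γ ⊢ A → Δ ⊢ A)

    subst : ∀ {Γ Δ}
      → (∀ {A} → Γ ∋ A → Δ ⊢ A)
      → (∀ {A} → Γ ⊢ A → Δ ⊢ A)

    _[_] : ∀ {Γ A B}
      → Γ , B ⊢ A
      → Γ ⊢ B
      → Γ ⊢ A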
McBride’s paper abstracts away the differences between renaming and general substitution to create a single operation of which these are special cases. PLFA instead implements rename followed by general substitution, though it does point out the similarity, which is pretty obvious in the implementations. Both the paper and PLFA use renaming as a helper in general substitution (it is used when passing under a binder, because the context changes, so terms in the old context have to be lifted to the new context). Single substitution is the special case where the map is the identity on everything but variable 0, and the two contexts are \(\Gamma,B\) and \(\Gamma\). The next chapter introduces binding constructs that bind two variables at once, and this is also easy to define cleanly using general substitution.
One consequence of the new approach is that we don’t have to prove that substitution preserves types. It is inherent in the definition of substitution itself. Similarly, the rules for reduction incorporate preservation of types, so the need for a separately proved preservation theorem evaporates. We still have to prove progress (and it doesn’t look much different), and iterating it gets us an evaluator. Determinism also still requires a proof, but PLFA skips it, as it isn’t much different from what we did earlier. So once we get past substitution, everything is about the same in terms of design, and either easier or about the same in terms of difficulty. The net saving, both in lines of code and readability, is significant.
There are no exercises assigned for this chapter. We will make up for it in the next chapter.
16 More
In discussing fairly easy extensions to the basic systems described in the previous three chapters, PLFA makes the decision to largely abandon English prose. Informal descriptions are given by grammars and inference rules; formal descriptions by code one is expected to read. This is a reasonable decision at this point. One can always delete sections of the code and attempt to rederive it in order to gain further insight. There are corresponding chapters in both TaPL and SF, but both use more English prose.
I have assigned the recommended exercises in this chapter, and left the others as optional work. The alternate formulation of products using a case eliminator requires double substitution. There is a stretch exercise at the end that asks you to show that double substitution is equivalent to two single substitutions. This is rather difficult, and best handled in the context of a general algebraic theory of explicit substitutions.
You will find, when doing the assigned exercises, that Agda will help locate places where you need to do work. Some will be type errors, and some will be missing cases corresponding to added constructors. In other words: don’t try to do all the work and then see if it typechecks. Do a bit of it, and let the typechecker show you what to do next.
There is a cryptic comment in the Values section of the implementation: "Implicit arguments need to be supplied when they are not fixed by the given arguments." I paid no attention to this, and ran into trouble when I attempted to solve the exercises that I have set you. I found with both new features that Agda complained about not being able to instantiate an implicit argument, highlighting it in yellow. It turned out that that argument could be "anything", and to solve the problem, I had to implicitly quantify over "anything" to give it a name, and then provide that name in braces as the implicit argument Agda was complaining about. Just be warned that this may happen to you.
17 Bisimulation
We will not be covering this chapter to any significant extent in lecture. The idea of bisimulation is important, though. It arises when considering new language features implemented entirely in terms of old ones, such as let using lambda from the More chapter. If one can show that every computational sequence in the new language has a corresponding sequence in the old language, one can lift theorems proved for the old language to the new language. More generally, compilers do a lot of internal transformations (for the purpose of optimization) between parsing and native code generation. One would generally like to know that these transformations preserve the semantics of programs, and bisimulation is one way of demonstrating this.
18 Inference
So far in PLFA we have essentially computed types by hand. It’s natural to ask if this can be automated. It cannot be, at least not without some help: the identity function \(\la x . x\) does not have a unique type in the type system we have been working with, so an algorithm must either be told the intended type or be given annotations from which to compute it.
To motivate the topic, PLFA starts with a minimal version of STLC (cut back from the presentation in Lambda to just variables, abstractions, and applications) in which the variable on each lambda is annotated with a type. (This is how STLC is first presented in TaPL.) This suffices to determine the type of any term (or to conclude that it cannot be typed). It’s not hard to figure out the typing rules, and they suggest an algorithm that is structurally recursive in the term being typed.
This algorithm infers the type of a term. Without the annotations, our previous algorithms checked the provided type of a term. Can we make do with fewer annotations, perhaps with a mix of these two strategies? That is the subject of this chapter. The approach has become known as bidirectional typing, after the two directional arrows used to differentiate the two possibilities above. It arose informally in practice, and only later did researchers provide theoretical underpinnings. A detailed overview can be found in the recent survey paper by Dunfield and Krishnaswami (draft available on arXiv).
PLFA uses an up arrow for inference or synthesis, and a down arrow for checking or inheritance. This is consistent with earlier papers, but more recent work uses right and left arrows, which are easier to remember. One possible mnemonic is that synthesis is a loftier, more aspirational goal than inheritance.
Going back to the minimal STLC, it seems clear that the type of a lambda cannot be synthesized; it must be inherited. On the other hand, the type of a variable can be synthesized, because we can just look it up in the context kept during typechecking. What about an application? If we synthesize the type of the term in function position, we can then confirm that the term in argument position inherits the domain of that function type, and together these synthesize the type of the application (the range of the function type).
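Using PLFA's up arrow for synthesis and down arrow for inheritance, the three rules just described come out as follows (an informal rendering; the rule for type annotations is discussed below):
\[
\ir{}{\Ga \ni x : A}{\Ga \vdash x \uparrow A}
\qquad
\ir{}{\Ga , x : A \vdash N \downarrow B}{\Ga \vdash \la x . N \downarrow A \ra B}
\qquad
\ir{}{\Ga \vdash L \uparrow A \ra B \qquad \Ga \vdash M \downarrow A}{\Ga \vdash L \cdot M \uparrow B}
\]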
We could also do it the other way around, which highlights an issue with formalizing bidirectional typing: there are a lot of possible choices. PLFA follows a recipe developed by Dunfield and Pfenning to regularize the process and improve desirable characteristics of the result. In this recipe, constructors are typed using inheritance (as with lambda) and the principal term of a destructor is typed using synthesis (as with the function term of an application) while side terms are typed using inheritance (as with the argument term of an application).
The recipe makes deriving the rest of the rules (for naturals and fixpoint) straightforward. In putting a term together, there may be points where there is a mismatch: a subterm needs to be synthesized, but it requires inheritance, or the other way around. In the first case, a type annotation is required; in the second case, since synthesis is a stronger requirement than inheritance, we can simply forget that we have more than we need.
The resulting rules may seem a little odd in places. The constructor zero is typed using inheritance, but we know perfectly well how to synthesize its type! In practice this is not an issue, since a top-level annotation can usually provide what is needed for subterms. A term which is a destructor applied directly to a constructor cannot be typed without inserting an annotation, but such a term can be simplified. There are legitimate reasons to write such a term: the implementation of let using lambda in More, for example. This can be handled by leaving let in place during typechecking, but rewriting it for evaluation, since types play no role in evaluation.
When we formalize bidirectional typing in Agda, we want to write a function that, say, consumes a term and produces its synthesized type. But this function is partial. The way PLFA deals with this is to abandon a single Term datatype and instead introduce two datatypes Term⁺ (synthesized) and Term⁻ (inherited). The synthesize function is still not total, since a value of type Term⁺ could be well-formed but ill-typed. But we can write a decision procedure and have something valid to provide for "no" instances.
The alternative is to write a Synth datatype indexed by Term that provides evidence that a term meets the criteria for synthesis (though it may still be ill-typed). I have added some material to the end of 842Inference.agda that explores this option. The mutually-recursive definitions of Term⁺ and Term⁻ include a down arrow for annotation and an up arrow for "forgetting synthesis". In the examples in this chapter, you will see that the up arrows are more intrusive than one might like, providing further motivation for the alternative representation.
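The two datatypes, roughly as defined in the Inference chapter (exact constructor names may differ slightly); note the annotation constructor _↓_ in Term⁺ and the "forgetting" constructor _↑ in Term⁻:

    data Term⁺ : Set
    data Term⁻ : Set

    data Term⁺ where
      `_    : Id → Term⁺
      _·_   : Term⁺ → Term⁻ → Term⁺
      _↓_   : Term⁻ → Type → Term⁺

    data Term⁻ where
      ƛ_⇒_                   : Id → Term⁻ → Term⁻
      `zero                  : Term⁻
      `suc_                  : Term⁻ → Term⁻
      case_[zero⇒_|suc_⇒_]   : Term⁺ → Term⁻ → Id → Term⁻ → Term⁻
      μ_⇒_                   : Id → Term⁻ → Term⁻
      _↑                     : Term⁺ → Term⁻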
The design of the datatypes Term⁺ and Term⁻ follows the recipe described above and in the informal first section to the Inference chapter. It uses named variables as in Lambda, rather than inherently-typed terms as in DeBruijn. Given the recipe and informal description, the typing rules are mostly a straightforward adaptation of the ones in Lambda. There is one major change from PLFA in the local file: I have included from 842Lambda the "smart constructor" S′ for lookup judgments that hides string inequality proofs using the proof reflection technique from 842Decidable. Previously S′ was used in type judgments that we wrote by hand, so it could just be a function that used S in its implementation. But we want Agda to use it in computing types here, so it has to be an actual constructor. I have put in the smart definition and dropped the prime on the name.
The resulting saving in 842Lambda was minor; here it is considerable. The reason is that Agda’s decision procedure for strings converts the string to a list of characters, then compares the characters pointwise. Character comparison is done by converting the character to a natural number and then using the decision procedure for natural numbers. You can see how long a "no" answer gets, even for the single-character identifiers we are using, by evaluating "x" ≟ "y". If we are converting to the inherently-typed representation in DeBruijn, as is done later in Inference, these parts of the proof go away. But we still have to look at them for debugging and unit testing.
The goal of the chapter is the mutually-recursive decision procedures synthesize and inherit. inherit consumes a Context, a Term⁻, and a Type, and produces either a typing judgment or a reason why the term cannot have that type. The helper functions that precede these two in the code are all needed in the construction of evidence for "no" instances. synthesize consumes a Context and a Term⁺, and produces either a type and associated typing judgement (these are wrapped up using ∃ syntax) or a reason why the type cannot be synthesized. The code for both of these is structurally recursive on the term.
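Their signatures, essentially as in PLFA (Dec and the ∃ syntax are from the standard library):

    synthesize : ∀ (Γ : Context) (M : Term⁺)
      → Dec (∃[ A ]( Γ ⊢ M ↑ A ))

    inherit : ∀ (Γ : Context) (M : Term⁻) (A : Type)
      → Dec (Γ ⊢ M ↓ A)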
The exercises I have added at the end bring the Agda development closer to the literature, by using plain terms in the style of Lambda together with type annotation, and Synth/Inherit datatypes as described earlier. You have to write decision functions for these, decorate functions which take plain terms together with appropriate evidence to Term⁺ and Term⁻ respectively, and finally toTerm functions which compute and hide the evidence using the proof by reflection ideas. We could keep going, rewrite the synthesize and inherit functions to work on plain terms, and get rid of Term⁺ and Term⁻ entirely. This will be future work for me. It’s possible that the resulting code will be harder to derive and read.
To be continued.