The lost preludes of FICS
1 Introduction (Agda, 2020)
1.1 Natural numbers from scratch
1.2 Proofs within Agda
1.3 Other forms of data
2 Introduction (Haskell, 2014)
2.1 Natural numbers
2.2 Proofs
2.3 Recursion
2.4 The standard representation
2.5 Tuples
2.6 Sequences
7.6

The lost preludes of FICS

\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)

From 2008 to 2014, I designed and taught CS 145, an enriched first course in computer science open to all students in the Faculty of Mathematics at the University of Waterloo. I had previously designed and taught CS 135, a first course using Racket, which became the standard introduction. I didn’t want the enriched course to just go faster, and I wanted to keep it accessible to good math students without computing experience. I hit on the idea of using Haskell as functional pseudocode, and starting with the construction of the Peano numbers. Students would still program in Racket, but they would need to read and understand simple Haskell code.

I later took ideas from CS 135 and CS 145 to create a course (Math 642) for students (primarily high school math teachers) in the Master of Mathematics in Teaching (MMT) program. For that course, I took the Haskell out. I plan to use that design for the first part of my flânerie A Functional Introduction to Computer Science (FICS), which will not mention Haskell either. But in recent years I have learned more about proof assistants, and they are central to my other flânerie Logic and Computation Intertwined (LACI). LACI contains a lengthy introduction to the dependently-typed language Agda, which is implemented in Haskell and has similar syntax.

It occurred to me that Agda would make for even better functional pseudocode, since I typically discuss reasoning about code and proving things about code, and Agda’s type system can handle that also. Someone else teaches CS 145 now, so there’s no danger of this actually coming to pass. But as a thought experiment, I wrote the following. With Haskell, we are stuck with the built-in syntax for many features. Things are more fluid in Agda which has some lightweight syntax extensions.

At the bottom of this page is the original lecture summary from 2014, using Haskell, as posted immediately after class. But above it is a hasty, unpolished adaptation using Agda. Let me know what you think!

It’ll never happen, but picture this: a room full of students excited for their first university course in computer science. What they get is...

1 Introduction (Agda, 2020)

Lecture module 01 of this fantasy version of CS 145 Fall 2020 introduces course concepts using the programming language Agda. CS 145 students will not be required to program in Agda. We will be using Racket for programming, introduced in the next lecture module. However, Agda is a highly expressive functional programming language, and works well in explaining concepts. Although you will not be required to program in Agda, you will be required to understand some programs written using it, and we encourage you to experiment with programming in it.

1.1 Natural numbers from scratch

We will build our own representation of the natural numbers, which we will call Nat, and we will define some elementary arithmetic operations on this representation. We represent zero (0) by Z. You may be used to having the natural numbers start at one, but starting at zero makes more sense, and this is common in computer science and formal mathematics.

The successor operation (adding one) will be represented by S. Thus S Z is the representation of the natural number one (1). Note that S does not actually add one to anything; this is just the interpretation we give to it when it appears in an expression like S Z.

S (S Z) is the representation of the natural number two (2). The parentheses here are necessary, because the expression S S Z is ambiguous; does it mean (S S) Z or S (S Z)? Here the parentheses indicate order of operation, as they do in mathematics. In our interpretation, (S S) Z would not make sense, because the second S does not represent a number and we cannot speak of adding one to it. But S (S Z) makes sense, because we have already established that S Z is a number, and so we can apply S to S Z.

We denote by Nat the set of objects representable in this fashion. A Nat is either Z or S applied to something that has already been established to be a Nat. The following Agda datatype definition implements this:

data Nat : Set where
  Z : Nat
  S : Nat  Nat

We see the two constructors Z and S, and their type signatures, which say what is described informally above: Z by itself is a Nat, and S has to be applied to a Nat to produce a Nat (also, it cannot be applied to anything else).

It may surprise you to learn that natural numbers in Agda are not built in, but defined in library code, and the library definition is close to the one above. (Only the names are different.) This definition may seem inconvenient compared to regular decimal notation which we use to express numbers like 27. There are ways to use decimal notation as shorthand in Agda, but this is essentially the underlying definition.

The definition may also seem inefficient. Agda also has ways to use the more efficient underlying representations and operations provided by computer hardware. But Agda is not so much about efficiently processing large volumes of data as it is about specifying and reasoning about computation. That’s something computer scientists have to do all the time, which is why we are starting with it. Over this course and the next, we will discuss the efficient representations used by hardware and their effect on how we specify practical computation.

Our immediate goal is to define a function that performs addition for our representation of natural numbers, such that the following Agda expression will add our representations of two and one

plus (S (S Z)) (S Z)

The Agda library definition of addition actually uses +, and it’s used as in mathematics, as an infix operator, appearing between its two operands. The simplest way to define one’s own functions in Agda uses prefix notation, where the name of the function appears before its operands or arguments. As we will see, Racket does not have infix operators; it uses prefix notation. You are probably familiar with prefix notation for functions from mathematics; for example, the trigonometric function \(\sin\) is applied in a prefix fashion, as in the equation \(\sin (\pi / 2) = 1\).

Our implementation of plus starts with its type signature.

plus : Nat  Nat  Nat

Like the type signatures for the constructors for Nat, we specify what plus must be applied to (two expressions of type Nat), and what it produces (a Nat). Later, with Racket, we will write something similar for our functions, but we will call it a contract.

Let’s look at that use of plus again.

plus (S (S Z) (S Z))

The expression consists of plus applied to two arguments, namely the Nats S (S Z) and S Z. Once again, we have to add parentheses to eliminate ambiguity, because an expression which starts plus S (S Z) ... will be treated as (plus S) (S Z) ..., which does not make sense in our interpretation.

The two arguments are separated from each other and from the name of the function by spaces, with no additional punctuation needed. The value of the above expression should be S (S (S Z)) if we define plus properly. We can make a start on this by writing down a rule in Agda in the form of a definitional equation.

plus (S (S Z)) (S Z) = S (S (S Z))

This is one part of a possible definition of the function plus, which has two parameters (because it will be applied to two numbers). We have not defined it for all possible parameter values, only for two specific values. Here are some more rules to continue the possible definition.

plus Z Z = Z

plus Z (S Z) = S Z

plus (S Z) (S Z) = S (S Z)

Although the above lines are legal Agda, we are not going to define plus this way, because we would have to write an infinite number of such lines. Since we do not have infinite time (and this summary does not have infinite space), we throw out everything but the type signature, and start again.

We want a general way of defining plus x w for all Nats x and w. We know that x is either Z or (S y), where y is some Nat. This suggests the following two rules, to be completed by filling in the question marks:

plus Z w = ?

plus (S y) w = ?

We need to fill in the right-hand sides, but first a bit of explanation about the left-hand sides. The rules we wrote down earlier described what should happen when plus is applied to two specific values. These rules are going to be a bit more general.

The first rule will be relevant for applications of plus when the first argument is Z. In this case, the second argument can be anything. We give it a name w, and then we can use w on the right-hand side of the first rule. Here, w on the left-hand side is a pattern, that matches any value and gives a name to it. Z on the left-hand side is also a pattern, but it only matches the value Z. Both patterns have to match for the rule to be used. So in the expressions plus Z Z and plus Z (S Z), the first rule will be used (once we complete all rules by filling in the question marks).

The second rule has a pattern w for the second argument, but a more complicated pattern S y for the first argument. This pattern does not match any value; it fails to match Z. But it matches every other Nat, because they are all formed with the S data constructor. In the expression plus (S Z) (S Z), the first rule will not be used because the first argument is not Z. But the second rule will be used, because the second pattern w matches anything and so matches the second argument S Z, and the first pattern S y matches the first argument S Z. In this case, any use of y on the right-hand side will mean Z.

Now let’s complete the rules.

plus Z w = ?

plus (S y) w = ?

The first one is fairly easy, but the second one is not so obvious.

plus Z w = w

plus (S y) w = ?

The right-hand side of the second rule seems to depend on the values of both y and w. We need to combine these in some sensible way.

Let’s try to systematically write down what y could be in the second rule and see if we can express the left-hand side in terms of w, using our knowledge of how we expect addition to work. These are not more rules, but specific examples to get us ready to propose a single rule. The simplest thing y could be is z, and plus (S Z) w is (intuitively) adding one to w, and so is S w. So we can write this down.

plus (S Z) w = S w

Don’t think of this as a rule or definitional equation. It is a more conventional mathematical equation. Continuing for a bit, we will start to see a pattern. I’ll try to line things up to illustrate.

plus         (S Z) w =       S w

plus     (S (S Z)) w =    S (S w)

plus (S (S (S Z))) w = S (S (S w))

I hope the pattern is clear. Each subsequent line has an S added to the pattern for the first argument on the left-hand side, and an S added to the result on the right-hand side. For an arbitrary pair of lines, we can summarize the pattern like this.

plus     y w =    S (... (S w) ...)

plus (S y) w = S (S (... (S w) ...)

Those uses of ... are standing in for a certain number of uses of S and parentheses, and we can’t have anything like that in the rule we’re trying to derive, but the point is that the right-hand side of the second line is the same as the right-hand side of the first line but with an extra S on the outside. Let’s call the right-hand side of the first line v to make this clearer.

plus     y w =   v

plus (S y) w = S v

And now we can see what the rule for plus (S y) w should be.

plus (S y) w = S (plus y w)

You might think, "Is that permitted? plus appears on the right-hand side!" But this is not a problem. It is expressing a more complex computation (an application of plus where the first argument is of the form S y) in terms of a simpler one (an application of plus where the first argument is y, the previous first argument with one S removed).

Here’s the complete definition of plus in Agda.

plus : Nat  Nat  Nat
plus Z w = w
plus (S y) w = S (plus y w)

Once again, it is very close to the definition in the standard library (which uses infix + instead of prefix plus). The definition is recursive, or self-referential. This should not be surprising, as the definition of Nat is, also.

Let’s take a look at an example of computation, which consists of repeated applications of the rules. We will write this using chained equations as in mathematics. What is below is not legal Agda code to be interpreted by a machine, but an explanation of the result of applying the legal definition of plus above. The first line is the legal Agda expression that we are evaluating, and each subsequent line follows by the left-to-right application of one of the rules in the definition of plus.

          plus (S (S Z)) (S Z)

          

          = S (plus (S Z) (S Z))

          

          = S (S (plus Z (S Z)))

          

          = S (S (S Z))

In the above trace, the second and third lines use the second rule in the definition of plus, and the fourth line uses the first rule.

Stepping back a moment, we have a remarkably simple definition of addition, and this is the consequence of adopting a simple definition of Nat. Think about the method for addition in your head, operating on numbers written in standard decimal notation. There are many ways to describe that method, but none of them are going to be as simple as our two-line definition of plus.

Agda provides features for developing code like this interactively (it actually works with the question marks in our incomplete code above). You may be interested in a "movie" that shows plus being developed this way.

Click to view asciicast of interactive development of plus.

We see that plus operates as expected on a concrete example. We could do similar traces for the other examples we gave above. But how do we know that this code is correct for all possible arguments?

1.2 Proofs within Agda

How do we know our definition of plus corresponds to what we think of as addition? It’s hard to prove this without a formal definition of addition, and in fact our definition of plus is one of the best formal definitions of addition around. We can gain more confidence in our definition by verifying that plus has the same properties that we expect of addition.

For example, we could try to verify the following claim, which intuitively says "adding two to a value is the same as applying S twice to that value".

Claim: For all Nats x, plus (S (S Z)) x = S (S x).

Note that this is a statement about an infinite number of Nats. But we want a finite proof, using only the rules that define plus. Here is one.

Proof:

          plus (S (S Z)) x

          

           = S (plus (S Z) x)

          

           = S (S (plus Z x))

          

           = S (S x)

That concludes the proof. It’s different from the earlier concrete examples, because here x is symbolic. But each line is a valid use of one of the rules (the second and third lines are applications of the second rule, and the fourth line is an application of the first rule). If Agda is convinced that x is a Nat, and one asks the system to compute plus (S (S Z)) x, it will respond with S (S x).

Agda’s type system is rich enough that we can state this as a theorem in code. We need a way of talking about equality in Agda. The equal sign drawn with two parallel lines (=) is reserved for definitional equalities, so Agda uses for logical or propositional equality. Once again, this is not built-in, but defined in library code. This time, we will import it.

open import Relation.Binary.PropositionalEquality using (_≡_; refl)

The second import, refl, is the constructor for the datatype. Here’s an example of using it in Agda. We’ll state a theorem and provide its proof. (The name of the theorem is arbitrary.)

zero-equals-zero : Z ≡ Z
zero-equals-zero = refl

Here zero-equals-zero is the name of a value of type Z ≡ Z, and the next line provides its definition. refl is short for reflexivity, which is the statement that anything equals itself. The definition of the datatype we imported has a single constructor, refl, which can only be used to show that something is equal to itself. Here’s another example.

one-equals-one : S Z ≡ S Z
one-equals-one = refl

Here the left- and right-hand sides of the stated equality are structurally identical. But Agda allows more general statements. If there are two expressions e₁ and e₂ such that computation (as seen above) reduces e₁ to some expression e, and also reduces e₂ to e, then refl will work to construct a value of type e₁ ≡ e₂, that is, to prove the theorem e₁ ≡ e₂.

two-plus-two-is-four : plus (S (S Z)) (S (S Z)) ≡ S (S (S (S Z)))
two-plus-two-is-four = refl

Let’s go back to our claim about adding two to any number. It looked like this:

Claim: For all Nats x, plus (S (S Z)) x = S (S x).

Our reasoning about why this was true involved a computation. So Agda should be able to handle this with refl. Here’s how it goes.

two-plus :  (n : Nat)  plus (S (S Z)) n ≡ S (S n)
two-plus n = refl

The upside-down A, , is standard mathematical notation for "for all". The arrow , which we previously saw in the type signature of the S constructor and the plus function, is standard mathematical notation for "implies" or "it follows that". These aren’t two different meanings; they’re the same meaning. two-plus is a function that, when applied to an expression of type Nat, produces a proof specialized to that expression. two-plus (S Z) has type plus (S (S Z)) (S Z) ≡ S (S (S Z)), which is a theorem we previously proved informally by reasoning about the Agda computation. This ability to talk about both computation and proof in the same language is one of the strengths of Agda.

Agda’s computation using the rules we give it can only take us so far. Here’s a claim that is more difficult to prove. It says that addition where zero is the second argument just produces the first argument.

Claim: For all Nats x, plus x Z = x.

We can try a case analysis as before. If we consider plus x Z, we know that x is either Z or S y. In the first case, plus Z Z = Z, and the claim holds. But what about the second case?

          plus (S y) Z = S (plus y Z)

          

                       = ?

We have no good way of proceeding. Let’s try some small examples to gain intuition.

          plus Z Z = Z

          

          plus (S Z) Z = S (plus Z Z)

          

                       = S Z

          

          plus (S (S Z)) Z = S (plus (S Z) Z)

          

                           = S (S (plus Z Z))

          

                           = S (S Z)

Something interesting is happening here. The proof of plus x Z = x for x = S (S Z) contains a complete copy of the proof for x = S Z. You can see this by taking the right hand sides of the last three lines and stripping away the outermost S (...) from each line.

If we consider the next case, the same phenomenon occurs.

          plus (S (S (S Z))) Z = S (plus (S (S Z)) Z)

          

                               = S (S (plus (S Z) Z))

          

                               = S (S (S (plus Z Z)))

          

                               = S (S (S Z))

The proof of plus x Z = x for x = S (S (S Z)) contains a complete copy of the proof for x = S (S Z).

This should remind you of what happened when we developed the definition of plus. There, we first wrote down some specific rules for small examples, and then we threw those away and wrote two general rules. Here we are going to throw away the above proofs and write a more general proof. But what have we learned from these examples? What is the general pattern?

Just as we did for the second rule for plus, we see that for any specific y, if we have a proof of plus Z y = y, then we can construct a proof that plus Z (S y) = (S y). Here’s how it goes:

          plus Z (S y)     [LHS]

          

            = S (plus Z y) [rule 2 of plus]

          

            = S y          [because plus Z y = y]

Agda’s computation with the rules of plus will get us the second line, but not the third. But the third line should remind you of something. Here’s the definition of plus again.

plus : Nat  Nat  Nat
plus Z w = w
plus (S y) w = S (plus y w)

The last line expresses the computation of plus (S y) w in terms of the computation of plus y w. That’s exactly the idea behind the third line of reasoning about plus Z. Just as the computation of plus needed to be recursive, this line of reasoning about plus needs to be recursive.

Agda is up to the task. Let’s state and prove the theorem we want in Agda.

plus-zero :  (n : Nat)  plus n Z ≡ n
plus-zero n = ?

As we did with plus, we do a case analysis, this time on the natural number argument n. It is either Z or of the form S m for some natural number m.

plus-zero :  (n : Nat)  plus n Z ≡ n
plus-zero Z = ?
plus-zero (S m) = ?

In the first case, what we have to prove is plus Z Z ≡ Z. Agda will reduce the left-hand side to Z, so we just have to prove Z ≡ Z.

plus-zero Z = refl

In the second case, we have to prove plus (S m) Z ≡ S m. Agda will reduce the left-hand side to S (plus m Z), so we have to prove S (plus m Z) ≡ S m. The recursive application of the theorem plus-zero m will prove plus m Z ≡ m. What do we need to do with these?

One of the things we’re used to doing in algebra is taking an equation of the form \(x = y\), and replacing an occurrence of \(x\) in some other equation with \(y\). Here \(x\) and \(y\) can be complex expressions, not just variables.

In our particular situation, we need to use the equation we can prove by recursion, namely plus m Z ≡ m, to substitute in what we are trying to prove, namely S (plus m Z) ≡ S m. The Agda feature that does this is called rewrite.

plus-zero (S m) rewrite plus-zero m = ?

The new "what we are trying to prove", that is, the type of the expression we need to replace the ? with, is now S m = S m, because of the rewrite. And that can be proved with refl. Here’s the complete implementation.

plus-zero :  (n : Nat)  plus n Z ≡ n
plus-zero Z = refl
plus-zero (S m) rewrite plus-zero m = refl

This method of completing a proof using recursion is often called induction. As you will learn in Math 135/145 and subsequent courses, it is the central proof technique in mathematics.

Because Agda accepts this definition as valid (as can be seen in the recording below) we have a machine-checked proof of our theorem.

Click to view asciicast of interactive development of plus-zero.

We could go on to define other arithmetic operations such as multiplication (it looks similar to addition) and prove other theorems about them. But computation isn’t just about natural numbers, and I want to give you a taste of what lies ahead.

1.3 Other forms of data

In the remainder of this lecture module, we’ll take a very quick tour through some other forms of data, using Agda. Don’t worry if the ideas are coming fast and seem a bit unclear at this point; we will return for more careful study of the ideas shortly, using Racket.

I’m going to start a new Agda file, and import the library version of Nat (which is called ) as well as infix addition and multiplication, so that we have something to put in our data structures and familiar operations to use.

open import Relation.Binary.PropositionalEquality using (_≡_; refl)
open import Data.Nat using (ℕ; _+_; _*_)

Each parameter in a function is one value, and the result of the function is also one value. We may wish to group two or more values for conceptual or computational purposes. For example, the mathematical notation \((4,3)\) might represent a vector in two-dimensional space, and we might wish to write a function that scales such vectors. Scaling our example by a factor of 2 gives the vector \((8,6)\).

Here is an Agda definition for ordered pairs of arbitrary types.

data Pair (A B : Set) : Set where
  _,_ : A  B  Pair A B

I’m going to use this in the special case where both elements of a pair are natural numbers, so I’ll define a more readable synonym for that situation, and then proceed to define the scaling function and demonstrate its use.

NatPair = Pair ℕ ℕ

np-ex : NatPair
np-ex = (4 , 3)

scale-pair : NatPair  NatPair
scale-pair s (x , y) = ((s * x) , (s * y))

scale-ex : scale-pair 2 np-ex ≡ (8 , 6)
scale-ex = refl

The function scale-pair works only with pairs or 2-tuples. We can use the same idea to create tuples of any length, but we would have to write separate functions for each tuple-length. How can we manipulate sequences of arbitrary length, such as 4, 3, 5, 1, 4, 2?

We combine the ideas for Nat and Pair, and call the result List. With Nat, we had a special data constructor Z for zero. The corresponding data constructor for List is empty, representing the list (sequence) of length zero, or the empty list.

Nat had the data constructor S to add one to an existing Nat. List has the data constructor cons to add one value to an existing List. (These constructor names happen to match what Racket uses.)

data List (A : Set) : Set where
 empty : List A
 cons : A  List A  List A

NatList = List ℕ

list-ex : NatList
list-ex = cons 4 (cons 2 empty)

Just as we did a case analysis on a Nat argument to implement plus, we can do a case analysis on a NatList argument. Here’s a scaling function for lists, and an example of its use.

scale-list : NatList  NatList
scale-list s empty = empty
scale-list s (cons n rest)
  = cons (n * s) (scale-list s rest)

scale-list-ex : scale-list 2 (cons 3 (cons 1 (cons 4 empty)))(cons 6 (cons 2 (cons 8 empty)))
scale-list-ex = refl

Here’s a possible trace of the computation.

          scale-list 2 (cons 3 (cons 1 (cons 4 empty)))

          

          = cons 6 (scale-list 2 (cons 1 (cons 4 empty)))

          

          = cons 6 (cons 2 (scale-list 2 (cons 4 empty)))

          

          = cons 6 (cons 2 (cons 8 (scale-list 2 empty)))

          

          = cons 6 (cons 2 (cons 8 empty))

You shouldn’t take this as a literal representation of what Agda does, because the details of multiplication (which in turn uses addition) aren’t present, and the order of operations is unclear. Later, with Racket, we will be more precise about these matters, and you will be able to trust traces like this.

Lists are an important data structure in functional programming languages such as Racket and Agda. We often want to apply an operation such as addition to every element.

sum-list : NatList  ℕ
sum-list empty = 0
sum-list (cons n rest) = n + sum-list rest

sum-list-ex : sum-list (cons 4 (cons 3 (cons 5 empty)))12
sum-list-ex = refl

And here’s a possible trace of the computation. Once again, the ordering of actual operations is unclear, and this is more for illumination than definitive proof.

          sum-list (cons 4 (cons 3 (cons 5 empty)))

          

          = 4 + sum-list (cons 3 (cons 5 empty))

          

          = 4 + 3 + sum-list (cons 5 empty)

          

          = 4 + 3 + 5 + sum-list empty

          

          = 4 + 3 + 5 + 0

          

          = 12

We do have Agda to implement proofs, so let’s prove one last theorem about these functions. If we scale a list and then add up the elements, that should be the same as first adding and then scaling the result. We’ll need one of the distributive laws (multiplication distributes over addition) which we can import from the standard library.

open import Data.Nat.Properties using (*-distribʳ-+)

sum-scale-list :  (s :) (xs : NatList) 
                 sum-list (scale-list s xs)(sum-list xs) * s

We’ll do a case analysis on the argument xs.

sum-scale-list s empty = ?
sum-scale-list s (cons x xs) = ?

In the first case, we need to show that sum-list (scale-list s empty) ≡ (sum-list empty) * s. By looking at the implementations of scale-list and sum-list, we can see that Agda will reduce scale-list s empty on the left-hand side to empty, and then sum-list empty to 0. On the right-hand side, sum-list empty will reduce to 0, and then 0 * s will reduce to 0. So refl is all we need for this case.

sum-scale-list s empty = refl

The second case is more complicated. sum-scale-list s (cons x xs) requires us to prove sum-list (scale-list s (cons x xs)) ≡ (sum-list (cons x xs)) * s.

On the left-hand side, Agda will reduce scale-list s (cons x xs) to cons (x * s) (scale-list s xs). It will then reduce sum-list (cons (x * s) (scale-list s xs)) to (x * s) + (sum-list (scale-list s xs)).

On the right hand side, Agda will reduce sum-list (cons x xs) to x + sum-list xs, so the right-hand side becomes (x + sum-list xs) * s.

So we’ll be left with proving (x * s) + (sum-list (scale-list s xs)) ≡ (x + sum-list xs) * s.

The recursive application of the theorem sum-scale-list s xs proves sum-list (scale-list s xs) ≡ (sum-list xs) * s. If we rewrite what we have to prove by this, we have (x * s) + (sum-list xs) * s ≡ (x + sum-list xs) * s. The distributive law theorem we imported can prove (x + sum-list xs) * s ≡ x * s + sum-list xs * s. This is backwards! We can either import and use a theorem that equality is symmetric, or we can rewrite using the distributive law, which reduces both sides to x * s + sum-list xs * s. The second choice is slightly easier, so let’s do that. Here’s the full implementation.

sum-scale-list :  (s :) (xs : NatList) 
                 sum-list (scale-list s xs)(sum-list xs) * s
sum-scale-list s empty = refl
sum-scale-list s (cons x xs)
  rewrite sum-scale-list s xs
  | *-distribʳ-+ s x (sum-list xs)
  = refl

Click here to see interactive development of sum-scale-list.

Racket cannot handle proofs directly, and it is not as concise and expressive as Agda. However, it is much easier to program in for the beginner, and easier to explain rigourously. We will learn about Racket in the next lecture module, and continue to use Agda for high-level explanations.

2 Introduction (Haskell, 2014)

Later lecture summaries are likely to be more terse and less complete than this one.

Lecture module 01 for CS 145 Fall 2014 introduces course concepts using the programming language Haskell. CS 145 students will not be required to program in Haskell (we will be using Racket for programming, introduced in the next lecture module). However, Haskell is a highly-expressive functional programming language, and works well in explaining concepts. Although you will not be required to program in Haskell, you will be required to understand programs written using it, and we encourage you to experiment with programming in it.

2.1 Natural numbers

You may be used to having the natural numbers start at one, but starting at zero makes more sense, and this is common in computer science and formal mathematics.

We will build our own representation of the natural numbers, which we will call Nat, and we will define some elementary arithmetic operations on this representation. We represent zero (0) by Z. The successor operation (adding one) will be represented by S. Thus S Z is the representation of one (1). Note that S does not actually add one to anything; this is just the interpretation we give to it when it appears in an expression like S Z.

S (S Z) is the representation of two (2). The parentheses here are necessary, because the expression S S Z is ambiguous; does it mean (S S) Z or S (S Z)? Here the parentheses indicate order of operation, as they do in mathematics. In our interpretation, (S S) Z would not make sense, because the second S does not represent a number and we cannot speak of adding one to it. But S (S Z) makes sense, because we have already established that S Z is a number, and so we can apply S to S Z.

We denote by Nat the set of objects representable in this fashion. A Nat is either Z or S applied to something that has already been established to be a Nat. This definition is implemented in Haskell as follows:

          data Nat = Z | S Nat

A vertical bar | is commonly used in computer science to mean "or" in some fashion. Here it is used to create two variants of the type Nat. Each of the two variants uses a data constructor. The two data constructors are Z and S. Z is not applied to anything in order to construct a Nat, but S has to be applied to another Nat (and cannot be applied to anything else).

Haskell does not use the representation of natural numbers we have just described. It has its own built-in representation of integers, using notation we are more familiar with: 1 + 2 is an expression whose value is 3. We are not using Haskell’s built-in representation of integers in this section, in order to examine more closely what we mean when we do arithmetic. Instead, we are using Haskell’s facilities for defining new forms of data in order to create a new representation of natural numbers.

The + operator in Haskell (and in mathematics) is an infix operator, because it appears between its two operands. The simplest way to define one’s own functions in Haskell uses prefix notation, where the name of the function appears before its operands or arguments. As we will see, Racket does not have infix operators; it uses prefix notation. You are probably familiar with prefix notation from mathematics; for example, the trigonometric function \(\sin\) is applied in a prefix fashion, as in the equation \(\sin (\pi / 2) = 1\).

This is similar to what is done with S above, but S is a data constructor which is a special kind of function. It constructs a value, a new piece of data, by applying a data constructor around zero or more other values. In general, a function can be applied to a fixed number of values (one or more) and will produce a new value via an arbitrarily complex computation.

Our immediate goal is to define a function that performs addition for our representation of natural numbers, such that the following Haskell expression will add our representations of one and two:

          plus (S Z) (S (S Z))

This expression consists of plus applied to two arguments, namely the Nats S Z and S (S Z). Once again, we have to add parentheses to eliminate ambiguity, because an expression which starts plus S Z ... will be treated as (plus S) Z ..., which does not make sense in our interpretation. The value of the above expression should be S (S (S Z)) if we define plus properly. We can make a start on this by writing down a rule in Haskell in the form of an equation.

          plus (S Z) (S (S Z)) = S (S (S Z))

This is one part of a possible definition of the function plus, which has two parameters (because it will be applied to two numbers). We have not defined it for all possible parameter values, only for two specific values. Here are some more rules to continue the possible definition.

          plus Z Z = Z

          

          plus Z (S Z) = S Z

          

          plus (S Z) (S Z) = S (S Z)

Although the above lines are legal Haskell, we are not going to define plus this way, because we would have to write an infinite number of such lines. Since we do not have infinite time (and this summary does not have infinite space), we throw out the above lines and start again.

We want a general way of defining plus x w for all Nats x and w. We know that w is either Z or (S y), where y is some Nat. This suggests the following two rules, to be completed by filling in the question marks:

          plus x Z = ?

          

          plus x (S y) = ?

We need to fill in the right-hand sides, but first a bit of explanation about the left-hand sides. The rules we wrote down earlier described what should happen when plus is applied to two specific values. These rules are going to be a bit more general. The first rule will be relevant for applications of plus when the second argument is Z. In this case, the first argument can be anything. We give it a name x, and then we can use x on the right-hand side of the first rule. Here, x on the left-hand side is a pattern, that matches any value and gives a name to it. Z on the left-hand side is also a pattern, but it only matches the value Z. Both patterns have to match for the rule to be used. So in the expressions plus Z Z and plus (S Z) Z, the first rule will be used (once we complete all rules by filling in the question marks).

The second rule has a pattern x for the first argument, but a more complicated pattern S y for the second argument. This pattern does not match any value; it fails to match Z. But it matches every other Nat, because they are all formed with the S data constructor. In the expression plus (S Z) (S Z), the first rule will not be used because the second argument is not Z. But the second rule will be used, because the pattern x matches anything and so matches the first argument S Z, and the second pattern S y matches the second argument S Z. In this case, any use of y on the right-hand side will mean Z.

Now let’s complete the rules.

          plus x Z = ?

          

          plus x (S y) = ?

The first one is fairly easy, but the second one is not so obvious.

          plus x Z = x

          

          plus x (S y) = ?

The right-hand side of the second rule seems to depend on the values of both x and y. We need to combine these in some sensible way. Here is one more useful piece of information: it’s okay if plus appears on the right-hand side of a rule also. After a bit of thought, we can complete the definition.

          plus x Z = x

          

          plus x (S y) = S (plus x y)

This is a complete definition of plus in Haskell. Let’s take a look at an example of computation, which consists of repeated applications of the rules. We will write this using chained equations as in mathematics. What is below is not legal Haskell code to be interpreted by a machine, but an explanation of the result of applying the legal definition of plus above. The first line is the legal Haskell expression that we are evaluating, and each subsequent line follows by the left-to-right application of one of the rules in the definition of plus.

          plus (S Z) (S (S Z))

          

          = S (plus (S Z) (S Z))

          

          = S (S (plus (S Z) Z))

          

          = S (S (S Z))

In the above trace, the second and third lines use the second rule in the definition of plus, and the fourth line uses the first rule. We see that plus operates as expected on a concrete example. We could do similar traces for the other examples we gave above. But how do we know that this code is correct for all possible arguments?

2.2 Proofs

The ideas in this section are due to Hermann Grassman (1861), later taken up by Richard Dedekind and Giuseppe Peano based on the logical work of Gottlob Frege.

How do we know our definition of plus corresponds to what we think of as addition? It’s hard to prove this without a formal definition of addition, and in fact our definition of plus is one of the best formal definitions of addition around. We can gain more confidence in our definition by verifying that plus has the same properties that we expect of addition.

For example, we could try to verify the following claim, which intuitively says "adding a value and two is the same as applying S twice to that value".

Claim: For all Nats x, plus x (S (S Z)) = S (S x).

Note that this is a statement about an infinite number of Nats. But we want a finite proof, using only the rules that define plus. Here is one.

Proof:

          plus x (S (S Z))

          

           = S (plus x (S Z))

          

           = S (S (plus x Z))

          

           = S (S x)

That concludes the proof.

Why is this proof convincing? If we substitute in a specific value for x (say S Z), and do this everywhere in the proof, we get a legitimate computation trace. In fact, for this particular choice, we get the trace displayed above. But any choice of value for x results in a legitimate trace. No matter what the value of x is, the second rule will be used on the expression in the first line, and it will result in the expression in the second line. The body of the proof is not a specific computation trace, but a template for creating one, into which any specific value for x can be placed.

Here we have used a method of proving a “for all Nats x” claim that we might call the anonymous method, where we treat x as a complete unknown Nat (and thus "anonymous"). If, while treating x in this way, we can prove some claim mentioning x, then it must be true no matter what the value of x is, that is, it must be true for all Nats x.

Note that we have to agree that this is a legitimate method of proof, that is, it must be part of our proof system. The anonymous method is quite common in mathematics, though it’s usually not called that; in the study of formal logic, it is known as "for-all introduction". At the risk of being obvious, here is the entire statement of the claim and proof again, using the style we expect of you in this course.

Claim: For all Nats x, plus x (S (S Z)) = S (S x).

Proof: By the anonymous method applied to x.

          plus x (S (S Z))    [LHS]

          

           = S (plus x (S Z)) [rule 2 of plus]

          

           = S (S (plus x Z)) [rule 2 of plus]

          

           = S (S x)          [rule 1 of plus, RHS]

QED (which is an abbreviation for a Latin phrase that means, roughly, "what we had to show", and can be used as a brief way of marking the end of proofs).

Note that we have specified what method we are using, and annotated the trace template so the reader can follow along. Please do this in any written proofs on assignments or exams.

We will be using the anonymous a lot, and you should get used to noticing and naming it. The anonymous method is limited in its application, however; it doesn’t always work to prove what we want. Consider the following claim, which intuitively says "Adding zero and any value produces that value".

Claim: For all Nats x, plus Z x = x.

A straightforward application of the anonymous method fails. If we consider plus Z x, we know that x is either Z or S y. In the first case, plus Z Z = Z, and the claim holds. But what about the second case?

          plus Z (S y) = S (plus Z y)

          

                       = ?

We have no good way of proceeding. Let’s try some small cases to gain intuition.

          plus Z Z = Z

          

          plus Z (S Z) = S (plus Z Z)

          

                       = S Z

          

          plus Z (S (S Z)) = S (plus Z (S Z))

          

                           = S (S (plus Z Z))

          

                           = S (S Z)

Something interesting is happening here. The proof of plus Z x = x for x = S (S Z) contains a complete copy of the proof for x = S Z. You can see this by taking the last three lines and stripping away the first expression and then the outermost S (...) from each line.

If we consider the next case, the same phenomenon occurs.

          plus Z (S (S (S Z))) = S (plus Z (S (S Z)))

          

                               = S (S (plus Z (S Z)))

          

                               = S (S (S (plus Z Z)))

          

                               = S (S (S Z))

The proof of plus Z x = x for x = S (S (S Z)) contains a complete copy of the proof for x = S (S Z).

All of this has been gaining intuition. In our definition of plus, we first wrote down some specific rules for small examples, and then we threw those away and wrote two general rules. Here we are going to throw away the above proofs and write a more general proof. But what have we learned from these examples? What is the general pattern?

More generally, we see that for any specific y, if we have a proof of plus Z y = y, then we can construct a proof that plus Z (S y) = (S y). Here’s how it goes:

          plus Z (S y)     [LHS]

          

            = S (plus Z y) [rule 2 of plus]

          

            = S y          [because plus Z y = y]

This is not a trace, but a template to produce a trace, just as we used in the proof using the anonymous method. The second line follows from the first one by the application of a single rule from the definition of plus, but to get the full trace, we would have to take the trace from the proof of plus Z y = y and put in between lines two and three, applying S to each expression.

Here we know we can get from the second line to the third line because we have a proof that plus Z y = y.

We have a proof of plus Z x = x for x = Z, and we know that if we have a proof of plus Z x = x for x = y, where y is some Nat, then we can construct a proof of plus Z (S y) = (S y). These two statements should let us conclude that plus Z x = x is true for all Nats x, because any Nat can be constructed by starting from Z and applying S some number of times.

We have discovered another way of proving a "for all Nats x" claim, which we will call the structural method.

To apply the structural method, we express the claim we wish to prove as a property P[x]. For example, P[x]  =   "plus Z x = x".

To prove "For all Nats x, P[x]" holds:
  1. Prove P[Z] holds.

  2. Prove that "For all Nats y, if P[y] then P[S y]" holds.

Intuitively, this makes sense. The two parts of the method mirror the two parts of the data definition of Nats. The first part shows that the claim holds for the simplest Nat, namely Z, and the second part shows that when we construct a new Nat from an old one, if the claim held for the old Nat, it holds for the new Nat. Since this is the only way to construct a Nat, the claim must hold for all Nats. However, as with the anonymous method, we still have to accept the structural method as a valid form of proof before using it.

Notice also that the second step asks us to prove yet another "for all" statement, and it looks more complicated than the one we set out to prove. However, it may not be. The new "for all" statement is of the form "If Q, then R". We prove such an "if-then" statement by assuming Q and using it to prove R. The hope is that this may be simpler, because Q and R look similar in this case (they are P[y] and P[S y] respectively). So the assumption that Q holds might take us most of the way towards a proof that R holds.

For clarity, here’s a complete statement and proof of the previous claim using the structural method.

Claim: For all Nats x, plus Z x = x.

Proof: By the structural method applied to x. Let P[x]   =   "plus Z x = x".

We first prove that P[Z] holds. This says that plus Z Z = Z. But this statement holds by the application of the first rule in the definition of plus to the left-hand side.

Next we prove "for all Nats y, if P[y] holds, then P[S y] holds". This says "for all Nats y, if plus Z y = y, then plus z (S y) = (S y)". We prove this by the anonymous method applied to y, which means we have to prove "if plus Z y = y, then plus z (S y) = (S y)". To prove this statement, we assume that "plus Z y = y" holds and use that to show that "plus z (S y) = (S y)" holds. The following argument does this:

          plus Z (S y)

          

            = S (plus Z y) [rule 2 of plus]

          

            = S y          [by the assumption of P[y]]

The structural method lets us conclude that "For all Nats x, P[x]" holds, as required. QED.

You will be asked to do proofs like this on assignments and exams, and marks will be deducted for poor style, so study this example closely.

There are some stylistic points you should take note of here. The just-completed proof cites each method used and what variable it applies to (e.g. "the anonymous method applied to y"). The proof also states the property P[x] explicitly, rather than leaving the reader to figure it out. There are many different logical statements used in the course of the proof, and the proof is careful to specify them exactly and to be clear which one is being worked on. When multi-line equational reasoning is used, the proof annotates each line with the reason why it is true, putting the reason in square brackets. Finally, the proof is also careful to use fresh variables when needed. For example, y was chosen because x was already in use. These touches improve clarity, and they should be part of your proofs as well.

2.3 Recursion

Sometimes structural recursion is not suited to a problem, or gives an inefficient solution, but when it works, it is preferred. Students seem to have a lot of difficulty distinguishing structural recursion from non-structural recursion, so pay careful attention to the examples in this section.

Our definition of plus happens to be well-suited to proofs using the structural method, because it uses a technique known as structural recursion. Recursion is simply the use of the function being defined (plus in this example) on both the left-hand side and the right-hand side of one or more rules in the definition. The recursion is structural because it mirrors the structure of the data definition (in this case, the definition of Nat). This data definition is also recursive, since Nat appears on both the left-hand side and the right-hand side of at least one rule.

To see why plus uses structural recursion, let’s look at the definition of Nat. It has two parts: a Nat is either Z, or it is S applied to a Nat. The definition of plus has two parts, corresponding to the two parts in the definition of Nat. The first part of the definition of Nat is that Z is a Nat. The first part of the definition of plus uses the pattern Z to see if the second argument has the value Z. So the function mirrors the data definition in this case.

The second part of the definition of Nat is that it is S applied to a Nat. The second part of the definition of plus uses the pattern (S y) to see if the second argument is S applied to something, and gives that something the name "y". So far, this is like the mirroring in the first case. But there is an additional aspect of structural recursion here: because y is a Nat, structural recursion means that when plus is applied on the right-hand side, it is applied to y.

But of course plus has to be applied to two arguments, and the first argument is also a Nat. But the pattern x in the definition of plus just gives a name to the argument, which is used as is in the application of plus on the right-hand side. This is another possibility in structural recursion.

This reasoning applies in general, not just to plus. Suppose we had a function mystery with one Nat parameter. A definition of mystery that used structural recursion would look like this:

          mystery Z = ?

          

          mystery (S x) = ... mystery x ...

For a function enigma with two Nat parameters, there are more possibilities. Here is one:

          enigma x Z = ... x ...

          

          enigma x (S y) = ... enigma x y ...

Here, there is structural recursion on the second parameter, but not the first. Our definition of plus looks like this possibility.

          plus x Z = x

          

          plus x (S y) = S (plus x y)

Here is a definition of multiplication that is also structurally recursive, using the same idea:

          times x Z = Z

          

          times x (S y)

          

            = plus x (times x y)

But when there are two Nat parameters, since structural recursion permits each argument to be either unchanged or handled according to the recursive definition of Nat, there are other possibilities. We could have structural recursion on the first parameter, but not the second:

          enigma Z y = ... y ...

          

          enigma (S x) y = ... enigma x y ...

And we could have structural recursion on both parameters:

          enigma x Z = ... x ...

          

          enigma Z y  = ... y ...

          

          enigma (S x) (S y) = ... enigma x y ...

                               ... enigma (S x) y ...

                               ... enigma x (S y)

Not all three of those possibilities on the last right-hand side might be necessary. The point is that the recursive applications of enigma have as arguments either the original unchanged arguments on the left-hand side or the values named in the structural pattern on the left-hand side.

This is perhaps made more clear by examples of recursion that are not structural. Here is a non-structural alternate definition of addition.

          add x Z = x

          

          add x (S y) = add (S x) y

This fails to be structural recursion because the first parameter x is not treated in a structural fashion. It is not unchanged on the right-hand side, but it is not treated according to the definition of Nat, like the second parameter is.

Here’s an example of a computation using this new definition.

          add (S Z) (S (S Z))

          

            = add (S (S Z)) (S Z)

          

            = add (S (S (S Z))) Z

          

            = S (S (S Z))

The definition of add looks perfectly reasonable, and it seems to work. Why should we prefer plus to add? The difficulty with add is that it is harder to prove things about it than it is for plus. Suppose, as before, we try to prove that for all Nats x, add Z x = x.

          add Z Z = Z

          

          add Z (S y)

          

            = add (S Z) y

          

            = ?

It’s not clear how to proceed. In fact, it is possible to prove the claim using the structural method, but it takes more work.

Here is another example of non-structural recursion, a definition of the function idiv2 implementing integer division by 2, rounded down (so the result of applying idiv2 to the representation of 5 would be the representation of 2).

          idiv2 Z         = Z

          

          idiv2 (S Z)     = Z

          

          idiv2 (S (S y)) = S (idiv2 y)

Again, this seems like a reasonable definition, and it appears to work (it is, in fact, correct). This is not structural recursion because there is an extra case (the second one) in the definition for when the argument is our representation of one, and because the third case uses the pattern (S (S y)) instead of (S y).

Programming assignment submissions in this course are machine-tested, and so submissions that are correct but do not use structural recursion will earn marks. However, marks will be deducted when hand-marking is used, such as on exams. Since developing correct structurally recursive code is usually easier, you should get into the habit of trying it before resorting to non-structural recursion.

Sometimes, non-structural recursion is necessary. But whenever possible, we should use structural recursion. Not only is it harder to formally prove properties of non-structurally-recursive code, it is also harder to informally reason about it, meaning that such code is more likely to contain programming errors. For that reason, assignments in this course will often require you to use structural recursion (we will discuss later how to make sure that you are using structural recursion in Racket).

We will see other recursive definitions soon, and the reasoning we used above with respect to determining whether a function definition is structurally recursive will apply to those definitions as well.

The structural method for proving properties of programs that perform computation on Nats has at its core the requirement to prove a statement of the form "For all Nats y, if P[y] holds, then P[S y] holds." To prove an "if-then" statement, we assume the "if" part, and try to reason that the "then" part holds. When developing code using structural recursion, it helps to keep this in mind, even if you don’t ever prove anything about the code. Structural recursion means that the form of any recursive application of the function you are writing is limited to a few possibilities. Just as the structural method lets you assume that the property holds for the "smaller" instance P[y], in developing code using structural recursion, you should assume that the recursive applications work as desired. Just as the structural method has you try to reason that the "then" part holds, in developing the code, you should try to figure out how the (presumed correct) result of the recursive application can be used to figure out the result for the original argument.

2.4 The standard representation

As briefly mentioned above, Haskell has its own representation of integers (which include all the natural numbers), with familiar notation (such as infix +). But we can transfer the idea of structural recursion for computation with Nats over to computation on natural numbers using Haskell’s representation.

A Nat is either Z or S applied to a Nat. By analogy, a natural number is either 0 or \(n+1\), where \(n\) is a natural number. Here is an example of structural recursion on natural numbers, converting Haskell’s representation to ours.

          toNat 0 = Z

          

          toNat (n+1) = S (toNat n)

Haskell’s built-in representation of natural numbers, unlike our definition of Nat, can handle negative numbers as well; that is, it is really a representation of integers. (We will discuss how to extend Nat to handle negative numbers later in the course.) In mathematics, we represent the natural numbers by \(\mathbb{N}\) and the integers by \(\mathbb{Z}\). toNat is a function which consumes a natural number and produces a Nat, so mathematically it would be described as \(\mathtt{toNat}:\mathbb{N} \rightarrow \mathtt{Nat}\). In Haskell, it is described using the following syntax:

          toNat :: Integer -> Nat

Type signatures are optional in Haskell, but if they are present, they are part of the program. From now on, we will add them for clarity. As we will see in lecture module 02, contracts are not part of the program in Racket, but we will also make a habit of adding them.

This description is known as a type signature, but we will often call it a contract.

The conversion in the other direction uses structural recursion on Nats.

          fromNat :: Nat -> Integer

          

          fromNat Z = 0

          

          fromNat (S x) = 1 + fromNat x

What is the contract for add? It consumes two Nats and produces a Nat. Mathematically, we would describe it as \(\mathtt{add}:\mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}\). Here is the Haskell contract:

          add :: Nat -> Nat -> Nat

Here’s a hint on the "technical reason": in reality, every Haskell function has exactly one parameter!

An arrow is placed between each argument as well. There is a technical reason for this, which we will discover in lecture module 04.

The anonymous method and the structural method work for natural numbers. To prove "For all natural numbers \(m\), P[\(m\)]" holds:
  1. Prove P[0] holds.

  2. Prove that "For all natural numbers \(n\), if P[\(n\)] then P[\(n+1\)]" holds.

The stylistic guidelines we described above for induction on Nats should be applied to induction on natural numbers, and to other forms of induction that we discuss in the course.

This is also known as mathematical induction. It is a central proof technique in mathematics, and is exercised extensively in Math 135/145. In fact, what we have called the structural method is usually called structural induction.

From a mathematical point of view, our definition of Nat and associated computations is perfect. In fact, it is the standard definition of natural numbers used in formal logic and the foundations of mathematics. But it is not a perfectly satisfactory definition from a computer science point of view. A full explanation needs to be deferred until much later (perhaps CS 146), but we can get an intuitive idea now of why it is not a perfectly satisfactory definition. Consider the following computation.

          m = toNat 1000000

          

          times m m = ?

Think about what m would look like printed out. It would take up a lot of space, much more space than the familiar number 1000000. This is an indication of an inefficient use of space within the computer in which the representation is to be used. We will not talk much about use of space in computation until CS 146, but it will hover in the background during CS 145.

Now think about how much time the multiplication would take, compared to the pencil-and-paper multiplication of 1000000 times 1000000. This is an indication of an inefficient use of time in the computation. Our representation and our algorithms for computation using that representation are inefficient. We will see, later in CS 145, how to be much more efficient, using tools and techniques we’ve already discussed.

2.5 Tuples

In the remainder of this lecture module, we’ll take a very quick tour through some other features of Haskell. Don’t worry if the ideas are coming fast and seem a bit unclear at this point; we will return for more careful study of the ideas shortly.

Each parameter in a function is one value, and the result of the function is also one value. We may wish to group two or more values for conceptual or computational purposes. For example, the mathematical notation \((4,3)\) might represent a vector in two-dimensional space, and we might wish to write a function that scales such vectors. Scaling our example by a factor of 2 gives the vector \((8,6)\). We can do this grouping with the following Haskell data definition and function definition:

          data Pair = P Integer Integer

          

          scale :: Integer -> Pair -> Pair

          

          scale s (P x y) =  P (s*x) (s*y)

The data definition provides one variant with a data constructor P that must be applied to exactly two Integers.

As an example of computation, the expression scale 2 (P 4 3) will produce the value P 8 6.

We can generalize the concept to not just pairs (or 2-tuples) of integers, but pairs containing two different data types.

          data Pair t1 t2 = P t1 t2

Here t1 and t2 are Haskell type variables, that is, they represent unknown types. The same definition of scale will still work, but now it works not only on pairs of integers, but on pairs containing any types for which the * operation is defined. (The contract is more complicated and we defer discussion of this until later.)

The notation P 4 3 is not as readable to mathematicians as (4,3), and it should not surprise you that the latter (mathematical) tuple notation is built into Haskell. Here is a redefinition of scale and a sample computation using built-in tuples.

          scale :: Integer -> (Integer, Integer)

                           -> (Integer, Integer)

          

          scale s (x,y) = (s*x, s*y)

An example of computation with the new definition would be that the expression scale 2 (4,3) evaluates to the value (8,6).

2.6 Sequences

The function scale worked only with pairs or 2-tuples. We can use the same idea to create tuples of any length, but we would have to write separate functions for each tuple-length. How can we manipulate sequences of arbitrary length, such as 4, 3, 5, 1, 4, 2?

We combine the ideas for Nat and Pair, and call the result List. With Nat, we had a special data constructor Z for zero. The corresponding data constructor for List is E, representing the list (sequence) of length zero, or the empty list.

Nat had the data constructor S to add one to an existing Nat. List has the data constructor C to add one value to an existing List. Better yet, we can use a type variable, so that we can form not only Lists of numbers, but Lists containing values of an arbitrary type.

          data List t = E | C t (List t)

The list containing only 2 is represented by C 2 E. The list 4, 2 is represented by
C 4 (C 2 E). Here the values being added are Integers, so the type of the lists being constructed is List Integer.

Just as our definition of Nat was recursive, our definition of List is recursive. A List of type t is either E, or it is C applied to an element of type t and a List of type t.

Since we have a recursive definition for List, we can speak of structural recursion on Lists. As an example, we can generalize our earlier definition of scale to scale the values in a List, if it contains integers. Scaling the list C 4 (C 2 E) by a factor of 3 should result in C 12 (C 6 E). Here’s the code, which is structurally recursive:

          scale :: Integer -> List Integer -> List Integer

          

          scale x E = E

          

          scale x (C y ys) =  C (x*y) (scale x ys)

And here’s a sample computation trace:

          scale 2 (C 4 (C 3 (C 5 E)))

          

          = C 8 (scale 2 (C 3 (C 5 E)))

          

          = C 8 (C 6 (scale 2 (C 5 E)))

          

          = C 8 (C 6 (C 10 (scale 2 E)))

          

          = C 8 (C 6 (C 10 E))

Here’s structurally recursive code to add up the numbers in a List:

          sumList :: List Integer -> Integer

          

          sumList E = 0

          

          sumList (C x xs) =  x + sumList xs

And a sample computation trace:

          sumList (C 4 (C 3 (C 5 E)))

          

          = 4 + sumList (C 3 (C 5 E))

          

          = 4 + 3 + sumlist (C 5 E)

          

          = 4 + 3 + 5 + sumList E

          

          = 4 + 3 + 5 + 0

          

          = 12

We’ve seen a function that consumes a List and produces a List of the same length (scale), and a function that consumes a List and produces an Integer. Here’s a function that consumes a List and produces a List of a different length. The idea is to produce a list with all occurrences of a given value removed from a given list. For example, to produce the list resulting from removing the value 4 from the list 4,3,4, we would use the expression remove 4 (C 4 (C 3 (C 4 E))). The result should be
C 3 E, representing the list containing only 3.

We use the structural recursion template to write the code.

          remove :: Integer -> List Integer -> List Integer

          

          remove x E = E

          

          remove x (C y ys) = ?

What to replace the ? with depends on the values of x and y. If they are equal, then the value y should not appear in the result, otherwise it should. This is expressed in Haskell as follows:

          remove x E = E

          

          remove x (C y ys) | (x == y)  = remove x ys

                            | otherwise = C y (remove x ys)

The vertical lines here introduce guards, which are additional conditions on a rule after the pattern match succeeds. Here, the first condition checks whether the values of x and y are equal (note the use of == as an infix operator for the equality test). If this check fails, the right-hand side is not used. The guard otherwise always succeeds. Here is a sample computation trace.

          remove 4 (C 4 (C 3 (C 4 E)))

          

          = remove 4 (C 3 (C 4 E))

          

          = C 3 (remove 4 (C 4 E))

          

          = C 3 (remove 4 E)

          

          = C 3 E

Once again, Haskell has built-in, more convenient notation for lists, just as it does for tuples. The empty list is represented by [], and the C data constructor (called cons in both Haskell and Racket) is the infix operator :. Here is the empty list, the list containing just 4, and the list 4,3,5.

          []

          

          4 : []

          

          4 : (3 : (5 : []))

In Haskell, the : operator associates to the right, so we can take the brackets off the last line. But there is even more concise notation to represent these values: [4] and [4,3,5] for the last two lines.

Here is sumList written using the new notation:

          sumList :: [Integer] -> Integer

          

          sumList [] = 0

          

          sumList (x:xs) =  x + sumList xs

This use of x for an element and xs for a list is idiomatic Haskell. We will use more descriptive identifiers in Racket.

Notice the new pattern (x:xs). It matches a nonempty list, giving the name "x" to the first element of the list, and the name "xs" (think: "the rest of the x’s") to the rest of the list.

There are similar mechanisms in Racket, which is not as concise and expressive as Haskell, but easier to program in for the beginner, and easier to explain rigourously. We will learn about Racket in the next lecture module.