On this page:
5.1 Sequences
5.2 Lists in Racket
5.3 Sets
5.4 Representing sets using ordered lists
5.5 List abbreviations
8.1

5 Lists

\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)

A structure lets us group, in a Racket program, a fixed number of values (as many as there are fields in the structure). But we have to choose that fixed number while writing the program, and changing it later can be a lot of work. It would be nice to have a single datatype with the ability to represent groups of values of varying size (different sizes in the same program, or sizes chosen dynamically as the program is running).

We managed to represent unbounded natural numbers with two types of structures, a Z structure (with no fields) representing zero and an S structure (with only one field) representing successor. What made it work was our ability to nest structures. We can use the same idea to represent a sequence of unbounded length. Essentially, we are adding a second field to the successor structure which can hold one datum (value). But we will change the names of the structures to be more suitable.

5.1 Sequences

First, let’s work with a mathematical definition. We would like to represent a sequence of values, such as \(3,5,7,5,4\) (this will be our running example). In this example and ones to follow, the values are all natural numbers for simplicity, but in general we can mix different types of values. We adapt the definition of natural number: either 0 or \(n+1\), where \(n\) is a natural number. A sequence is either the empty sequence, or it is a value followed by a sequence.

There is no agreed-upon mathematical symbol to represent the empty sequence. We will use \(\epsilon\), for empty. Adding this to our definition, we say that a sequence \(S\) is either \(\epsilon\) or it is a value \(v\) followed by a sequence \(S'\). In the case where \(S'\) is \(\epsilon\), we write \(S\) as \(v\). Otherwise, we write \(S\) as \(v,S'\).

Our example \(3,5,7,5,4\) satisfies this definition:

Here are the Racket structure definitions we will use to represent sequences.

(define-struct Empty ())
(define-struct Cons (fst rst))

The name Empty is an obvious choice, but Cons isn’t. It stands for "constructor", and is the traditional name used when talking about sequences in functional programming languages. I’ll discuss the reason for the name shortly.

Data definition: An S-list is either a (make-Empty) or it is (make-Cons v slist), where v is a Racket value and slist is an S-list.

The S-list make-Empty represents the sequence \(\epsilon\). The S-list (make-Cons v slist) represents the sequence \(v,S\) where slist represents \(S\). For example, the sequence \(4\) is represented by (make-Cons 4 (make-Empty)). The sequence \(5,4\) is represented by (make-Cons 5 (make-Cons 4 (make-Empty))).

The S-list representation of the sequence \(3,5,7,5,4\) is

(make-Cons 3
  (make-Cons 5
    (make-Cons 7
      (make-Cons 5
        (make-Cons 4
          (make-Empty))))))

If we give this value a name, we can work with it using the accessor functions for the structures we have defined.

(define my-slst (make-Cons 3 (make-Cons 5 ...)))

(Cons-fst my-slst) \(\Rightarrow\) 3

(Cons-rst my-slst) \(\Rightarrow\)
(make-Cons 5
  (make-Cons 7
    (make-Cons 5
      (make-Cons 4
        (make-Empty)))))

5.2 Lists in Racket

Having defined the S-list representation of sequences, we now abandon it, because Racket has built-in support for a very similar representation called lists. Where we have used make-Empty, Racket uses the value '(). This is a special case of a more general mechanism for shortening list constants. The predefined constant empty is bound to this value, and I will use empty for readability (until we learn about the more general mechanism).

Where we have used make-Cons, Racket uses cons. The name cons was originally used in LISP (1959), and is used in many functional program languages based on LISP, such as Scheme and Racket. Lists were the only structure-like datatype in the original LISP implementation (the language name is a shortened version of "List Processing"). Symbols also come from LISP.

Data definition: A list is either empty or it is (cons fst rst), where fst is a Racket value and rst is a list.

The Racket teaching languages (but not full Racket) enforce the restriction that the second argument to cons must be a list. A list may contain values of different types (though in our examples we will stick with natural numbers for now).

The list representation of the sequence \(3,5,7,5,4\) is

(cons 3
  (cons 5
    (cons 7
      (cons 5
        (cons 4 empty)))))

To create a list in a more concise fashion, we can use the list function.

(list 3 5 7 5 4)

Beginners often confuse cons and list, but they have different results and different purposes. We use list to construct a list of fixed size (whose length is known when we write the program). We use cons to construct a list from one new element (the first) and a list of arbitrary size (whose length is known only when the second argument to cons is evaluated during the running of the program).

We access the first element of a nonempty S-list with Cons-fst. To access the first element of a nonempty list, we can use first.

We access the rest of a nonempty S-list with Cons-rst. To access the rest of a nonempty list, we can use rest.

The list equivalent of Empty? is empty?, and the list equivalent of Cons? is cons?. Racket also provides the predicate list?.

We can develop a template from the data definition for lists in the same way that we did for Nats and natural numbers. A list lst is either empty or it is (cons fst rst), where fst is a Racket value and rst is a list. Given a nonempty list lst, fst is (first lst), and rst is (rest lst).

(define (my-list-fn lst)
  (cond
    [(empty? lst) ...]
    [(cons? lst) ... (first lst) ...
                 ... (my-list-fn (rest lst)) ...]))

As an example of using this template, let’s write the function len that computes the length of a list. First, we replace the generic function name.

(define (len lst)
  (cond
    [(empty? lst) ...]
    [(cons? lst) ... (first lst) ...
                 ... (len (rest lst)) ...]))

The length of an empty list is 0, so that’s the first answer in the cond. What about the second answer? The template suggests two quantities to work with, (first lst) and the recursive application (len (rest lst)). Of these, (first lst) isn’t going to do us much good, since the length of a list has nothing to do with any of its values, only their count. So we won’t use that.

But we can use (len (rest lst)), since this is the length of the rest of lst. The length of all of lst is one more than this, and that gives us everything we need to complete the code.

(define (len lst)
  (cond
    [(empty? lst) 0]
    [(cons? lst) (+ 1 (len (rest lst)))]))

The predefined Racket function length does the same computation. It is good practice to work out the implementation of predefined functions where possible, so that they don’t seem magical.

Here’s a condensed trace of our len function on our example list.

(len (cons 3 (cons 5 (cons 7 (cons 5 (cons 4 empty))))))
\(\Rightarrow^*\) (+ 1 (len (cons 5 (cons 7 (cons 5 (cons 4 empty))))))
\(\Rightarrow^*\) (+ 1 (+ 1 (len (cons 7 (cons 5 (cons 4 empty))))))
\(\Rightarrow^*\) (+ 1 (+ 1 (+ 1 (len (cons 5 (cons 4 empty))))))
\(\Rightarrow^*\) (+ 1 (+ 1 (+ 1 (+ 1 (len (cons 4 empty))))))
\(\Rightarrow^*\) (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (len empty))))))
\(\Rightarrow^*\) (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 0)))))
\(\Rightarrow^*\) 5

Lists can also be produced by functions. The function sqr-all consumes a list of numbers and produces a list of the same length, with each element being the square of the corresponding element of the argument list.

(sqr-all (cons 4 (cons -2 (cons 3 empty))))
\(\Rightarrow\) (cons 16 (cons 4 (cons 9 empty)))

We can use the list template to develop this function.

(define (my-list-fn lst)
  (cond
    [(empty? lst) ...]
    [(cons? lst) ... (first lst) ...
                 ... (my-list-fn (rest lst)) ...]))

(define (sqr-all lst)
  (cond
    [(empty? lst) empty]
    [(cons? lst) ... (first lst) ...
                 ... (sqr-all (rest lst)) ...]))

The recursive application sqr-all (rest lst) will square the rest of the argument list. How do we produce the list that is the square of the full argument list? We square the first element of the argument list, and cons that onto the result of the recursive application. (It’s common and handy to use cons as a verb.)

(define (sqr-all lst)
  (cond
    [(empty? lst) empty]
    [(cons? lst) (cons (sqr (first lst))
                       (sqr-all (rest lst)))]))

Here’s a condensed trace of the sqr-all function.

(sqr-all (cons 4 (cons -2 (cons 3 empty))))
\(\Rightarrow^*\) (cons 16 (sqr-all (cons -2 (cons 3 empty))))
\(\Rightarrow^*\) (cons 16 (cons 4 (sqr-all (cons 3 empty))))
\(\Rightarrow^*\) (cons 16 (cons 4 (cons 9 (sqr-all empty))))
\(\Rightarrow^*\) (cons 16 (cons 4 (cons 9 empty)))

How would we write the contract for len? len consumes a list of elements of any type and produces a number.

We could use the following notation:
; len: (Listof Any) -> Number

In general, for any type T, a list of elements of type T would have type (Listof T). Here’s the contract for sqr-all.

; sqr-all: (Listof Number) -> (Listof Number)

For another example of a function producing a list, the pos-elts function consumes a list of numbers and produces a list of the positive elements of the argument list in the same order.

(pos-elts (cons 4 (cons -2 (cons 3 empty))))
\(\Rightarrow\) (cons 4 (cons 3 empty))

Once again, we use the template to guide development.

(define (pos-elts lst)
  (cond
    [(empty? lst) ...]
    [(cons? lst) ... (first lst) ...
                 ... (pos-elts (rest lst)) ...]))

All of the elements in what the recursive application (pos-elts (rest lst)) produces are in the full result. What about (first lst)? It is only in the full result if it is positive, so we need to test for this.

(define (pos-elts lst)
  (cond
    [(empty? lst) empty]
    [(cons? lst)
       (cond
         [(positive? (first lst))
             ... (pos-elts (rest lst)) ...]
         [else
             ... (pos-elts (rest lst)) ...])]))

In the first case, the recursive application is the rest of the result, and (first lst) is the first of the result. To construct the first result, we simply use cons. In the second case, the recursive application produces the full result.

(define (pos-elts lst)
  (cond
    [(empty? lst) empty]
    [(cons? lst)
       (cond
         [(positive? (first lst))
            (cons (first lst) (pos-elts (rest lst)))]
         [else
             (pos-elts (rest lst))])]))

Here’s a condensed trace of an application of the pos-elts function.

(pos-elts (cons 4 (cons -2 (cons 3 empty))))
\(\Rightarrow^*\) (cons 4 (pos-elts (cons -2 (cons 3 empty))))
\(\Rightarrow^*\) (cons 4 (pos-elts (cons 3 empty)))
\(\Rightarrow^*\) (cons 4 (cons 3 (pos-elts empty)))
\(\Rightarrow^*\) (cons 4 (cons 3 empty))

We saw how to add a second parameter to the template for Nats. We can do the same thing for lists.

(define (my-list-fn lst1 lst2)
  (cond
    [(empty? lst1) ... lst2 ...]
    [(cons? lst1) ... (first lst1) ... lst2 ...
                 ... (my-list-fn (rest lst1) lst2) ...]))

The first example we will work out is the app function, which consumes two lists and produces the list with all of the elements of the first list followed by all of the elements of the second list.

(app (cons 3 (cons 5 empty))
     (cons 7 (cons 5 (cons 4 empty))))
\(\Rightarrow\)
(cons 3 (cons 5 (cons 7 (cons 5 (cons 4 empty)))))

When we start to work with the template, the empty? case is straightforward.

(define (app lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1) ... (first lst1) ... lst2 ...
                  ... (app (rest lst1) lst2) ...]))

The recursive application (app (rest lst1) lst2) produces the rest of the full result. We simply have to cons (first lst) onto it.

(define (app lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1)
       (cons (first lst1) (app (rest lst1) lst2))]))

Racket’s predefined append function does the same computation.

Here’s a condensed trace of our example application of app.

(app (cons 3 (cons 5 empty))
     (cons 7 (cons 5 (cons 4 empty))))
\(\Rightarrow\)
(cons 3
     (app (cons 5 empty)
          (cons 7 (cons 5 (cons 4 empty)))))
\(\Rightarrow^*\)
(cons 3
      (cons 5 (app empty
                   (cons 7 (cons 5 (cons 4 empty))))))
\(\Rightarrow^*\)
(cons 3 (cons 5 (cons 7 (cons 5 (cons 4 empty)))))

In developing this function, we did not need to deconstruct the second list parameter in any way. It was "along for the ride". But for a different function consuming two lists, we might choose to do structural recursion on the second parameter, or on both parameters.

Exercise 13: Write the Racket function substitute, which consumes two Racket values x and y and a list lst, and produces a version of lst with every occurrence of x replaced with y. Your function should work for arbitrary values that can be compared for equality (not just numbers). As an example, (substitute 3 "three" (list "four" 3 4 "three" 3)) should produce (list "four" "three" 4 "three" "three"). \(\blacksquare\)

Exercise 14: Write the Racket function unique-left, which consumes a list of elements and produces a list which has all but the leftmost occurrence of each value removed. For example, (unique-left (list 1 4 2 1 5 4)), should evaluate to (list 1 4 2 5). The list may contain different types of values comparable for equality, not just numbers.

Use structural recursion on lists for both functions and for all recursive helper functions. Use only function definition, function application, cond, empty?, cons, cons?, first, rest, and equal? to answer this question. In fact, if you use the template for structural recursion on lists, which includes one cond, one empty?, and one cons?, you do not need additional uses of these. \(\blacksquare\)

Exercise 15: Write the Racket function unique-right, which consumes a list of elements and produces a list which has all but the rightmost occurrence of each value removed. That is, the result of applying it to (list 1 4 2 1 5 4) would be (list 2 1 5 4). Your code for unique-right should not use unique-left from the exercise above; treat these as independent problems (though they may share helper functions). \(\blacksquare\)

Exercise 16: Write the function day-within-year that consumes three numbers year, month, and day, and produces the day of the year that the date (assumed to be a valid date) falls on in our modern Gregorian calendar. For example, September 11, 2001 was the 254th day of 2001 (which was not a leap year), so (day-within-year 2001 9 11) evaluates to 254.

In a leap year, the second month has 29 days instead of 28. Years divisible by 4 are leap years, but years divisible by 100 are not, but years divisible by 400 are. If you use recursion, use structural recursion.

You may find the following definition useful. Using it can considerably shorten the amount of code that you have to write.

(define days-in-month (list 31 28 31 30 31 30 31 31 30 31 30 31))
\(\blacksquare\)

5.3 Sets

For an extended example, we’ll use lists to represent sets. Sets resemble sequences, but no repetition of values (elements) is allowed, and the order in which values appear is unimportant. For example, \(\{4,2,3\}\) is the same set as \(\{2,3,4\}\). We often write sets with their values in order, but this is just for the convenience of the reader. Once again, I’ll use natural number values for convenience, but in general sets can contain values of different types.

Let’s briefly review some concepts and notation for sets. The statement "4 is an element of \(\{2,3,4\}\)" is written \(4 \in \{2,3,4\}\). \(5 \not\in \{2,3,4\}\) means "5 is not an element of \(\{2,3,4\}\)". A set \(S\) is a subset of a set \(T\) if every element of \(S\) is an element of \(T\). We write \(S \subseteq T\) to express this. For example, \(\{2,4\} \subseteq \{2,3,4\}\) \(\{3,5\} \not\subseteq \{2,3,4\}\) means "\(\{3,5\}\) is not a subset of \(\{2,3,4\}\)".

We will implement a number of useful operations on sets. The union of two sets \(S\) and \(T,\) written \(S\cup T\), contains all elements that are in either set.

\(S\cup T = \{e \mid e \in S \mathrm{~or~} e \in T\}\)

For example, \(\{3,5\} \cup \{2,5\} = \{2,3,5\}\).

The intersection of two sets \(S\) and \(T\), written \(S\cap T\), contains all elements that are in both sets.

\(S\cap T = \{e \mid e \in S \mathrm{~and~} e \in T\}\)

For example, \(\{3,5\} \cap \{2,5\} = \{5\}\).

We will discuss two different representations of sets using lists. Our first representation will store the elements of the set in a list, in an arbitrary order. For example, the set \(\{2,3,4\}\) might be represented by the list (cons 4 (cons 2 (cons 3 empty))). The representation is not unique. The same set could be represented by the list (cons 2 (cons 3 (cons 4 empty))).

We can develop Racket functions implementing the various set predicates and operations we have discussed, using the template for list functions.

The elem-of function consumes a value v and a list lst representing a set, and produces #true if the value is in the set; otherwise, it produces #false.

We start with the list template and fill it in.

(define (my-list-fn lst)
  (cond
    [(empty? lst) ...]
    [(cons? lst) ... (first lst) ...
                 ... (my-list-fn (rest lst)) ...]))

(define (elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst) ... (first lst) ... v
                 ... (elem-of v (rest lst)) ...]))

In this case, we don’t always need to do the recursive application. If (first lst) is equal to v, then we immediately know that the answer is #true. Only if these are not equal do we need to do the recursive application, in which case what it produces is our final answer.

(define (elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (cond
         [(equal? (first lst) v) true]
         [else (elem-of v (rest lst))])]))

We can simplify the function in various ways.

(define (elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (or (equal? (first lst) v)
           (elem-of v (rest lst)))]))

(define (elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (or (equal? (first lst) v)
           (elem-of v (rest lst)))]))

(define (elem-of v lst)
  (and (not (empty? lst))
       (or
         (equal? (first lst) v)
         (elem-of v (rest lst)))))

Preferring one of these is really a matter of taste. Personally, I find the original version that doesn’t use Boolean functions to be more readable. But there are times when the use of Boolean functions can improve readability.

The subset function consumes two lists lst1 and lst2 representing sets, and produces true if and only if the first set is a subset of the second set. Once again, we start with the template for structural recursion on the first of two list parameters.

(define (subset lst1 lst2)
  (cond
    [(empty? lst1) ... lst2 ..]
    [(cons? lst1) ... (first lst1) ... lst2
                  ... (subset (rest lst1) lst2) ...]))

The empty set is a subset of every other set, which gives us the answer in the empty? case.

(define (subset lst1 lst2)
  (cond
    [(empty? lst1) true]
    [(cons? lst1) ... (first lst1) ... lst2
                  ... (subset (rest lst1) lst2) ...]))

Is there a situation where we don’t have to do the recursive application? If (first lst1) is not an element of lst2, then we know that the full result is false. But if it is, then the full result is what the recursive application produces.

(define (subset lst1 lst2)
  (cond
    [(empty? lst1) true]
    [(cons? lst1)
       (cond
         [(elem-of (first lst1) lst2)
            (subset (rest lst1) lst2)]
         [else false])]))

Once again, various simplifications are possible.

(define (subset lst1 lst2)
  (cond
    [(empty? lst1) true]
    [(cons? lst1)
       (cond
         [(elem-of (first lst1) lst2)
             (subset (rest lst1) lst2)]
         [else false])]))

(define (subset lst1 lst2)
  (cond
    [(empty? lst1) true]
    [(elem-of (first lst1) lst2)
       (subset (rest lst1) lst2)]
    [else false]))

The union function consumes two lists lst1 and lst2 representing sets, and produces a list representing the union of the two sets. This time, I’ll let you fill in the reasoning between each successive refinement of the template.

(define (union lst1 lst2)
  (cond
    [(empty? lst1) ... lst2 ...]
    [(cons? lst1) ... (first lst1) ... lst2
                  ... (union (rest lst1) lst2) ...]))
 
(define (union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1)
       (cond
         [(elem-of (first lst1) lst2)
            (union (rest lst1) lst2)]
         [else ...])]))

(define (union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1)
       (cond
         [(elem-of (first lst1) lst2)
            (union (rest lst1) lst2)]
         [else (cons (first lst1)
                     (union (rest lst1) lst2))])]))

The implementation we have developed seems inefficient. Consider the computation of the elem-of function when the value does not appear in the set.

(define (elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (or (equal? (first lst) v)
           (elem-of v (rest lst)))]))

(elem-of 4 (list 1 2 3)) \(\Rightarrow^*\) false

The number of recursive applications of elem-of is equal to the size of the set.

Now consider the computation of the union function when the two sets do not share any values.

(define (union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1)
       (cond
         [(elem-of (first lst1) lst2) (union (rest lst1) lst2)]
         [else
           (cons (first lst1) (union (rest lst1) lst2))])]))

(union (list 1 2 3) (list 4 5 6 7))
\(\Rightarrow^*\) (list 1 2 3 4 5 6 7)

The number of recursive applications of union is the size of the set represented by lst1. Each application of elem-of in union results in a number of recursive applications of elem-of that is the size of the set represented by lst2. Thus the work done is at least the product of the sizes of the two sets. Can we do better?

5.4 Representing sets using ordered lists

It is more natural for humans to see \(\{2,3,4\}\) instead of \(\{4,2,3\}\). Suppose we decide that the set \(\{4,2,3\}\) must be represented by (list 2 3 4) and not (list 4 2 3). It doesn’t matter to the computer, but can this representation be more efficient? It certainly is for human beings.

We start, as before, with the elem-of function. To avoid confusion, we will prefix the names of functions using the new representation with o-, as in o-elem-of. Once again, the empty? case is straightforward.

(define (o-elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst) ... (first lst) ... v ...
                 ... (o-elem-of v (rest lst)) ...]))

Here’s where we need to use the idea that the elements of the set can be ordered. For the time being, we’ll assume the elements are numbers, and use numerical order. Later, we’ll discuss what to do about other kinds of elements. We have two numerical values, (first lst) and v, and three possible orderings.

(define (o-elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (cond
         [(< (first lst) v)
            ... (o-elem-of v (rest lst)) ...]
         [(= (first lst) v)
            ... (o-elem-of v (rest lst)) ...]
         [(> (first lst) v)
            ... (o-elem-of v (rest lst)) ...])]))

Once again, we might not need to do the recursive application. In the first case, we do, and the full result is what the recursive application produces. In the second case, we have an answer, since we have found an occurrence of v in the list. What about the third case? If v is less than the first element in the list, then it is less than all the elements in the list, since the list is ordered, and thus v cannot occur in the list. So once again we have an answer.

(define (o-elem-of v lst)
  (cond
    [(empty? lst) false]
    [(cons? lst)
       (cond
         [(< (first lst) v) (o-elem-of v (rest lst))]
         [(= (first lst) v) true]
         [(> (first lst) v) false])]))

Are ordered lists a more efficient set representation? Consider the computations
(elem-of 1 (list 2 3 4 5 6 7 8 9))
(o-elem-of 1 (list 2 3 4 5 6 7 8 9))

The first one recursively applies elem-of nine times, but the second one does no recursive applications. On the other hand, we can describe two other computations such that each function does nine recursive applications.

(elem-of 10 (list 2 3 4 5 6 7 8 9))
(o-elem-of 10 (list 2 3 4 5 6 7 8 9))

So there is a possible saving in this particular operation, but it is not guaranteed. However, for some of the other operations, we are guaranteed to do much better. This will happen for the union operation.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) ... lst2 ...]
    [(cons? lst1) ... (first lst1) ...  lst2 ...
                  ... (o-union (rest lst1) lst2) ...]))

If lst1 is empty, then the union is lst2.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(cons? lst1) ... (first lst1)
                  ... (first lst2) ... (rest lst2) ...
                  ... (o-union (rest lst1) lst2) ...]))

Previously, when writing union for unordered lists, we used elem-of to check whether (first lst1) was in lst2. But we’ve seen that o-elem-of isn’t necessary more efficient than elem-of. So this might not be fruitful. Instead, let’s consider all the possible structural recursions, that is, possibly making the second argument lst2 smaller in the recursive application.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(empty? lst2) lst1]
    [(cons? lst1) ... (first lst1) ...  (first lst2) ...
                  ... (o-union (rest lst1) lst2) ...
                  ... (o-union lst1 (rest lst2)) ...
                  ... (o-union (rest lst1) (rest lst2)) ...]))

We may not use all of these. As before, we have two numerical values, (first lst1) and (first lst2), and three orderings.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(empty? lst2) lst1]
    [(cons? lst1)
       (cond
         [(< (first lst1) (first lst2))
            ... (o-union (rest lst1) lst2) ...
            ... (o-union lst1 (rest lst2)) ...
            ... (o-union (rest lst1) (rest lst2)) ...]
         [(= (first lst1) (first lst2)) ...]
         [(> (first lst1) (first lst2)) ...])]))

We have to remember that the result must be an ordered list. In the first case, (first lst1) must be in the result, and it must be the first element in the result, since it is smaller than everything else in lst1 and smaller than everything in lst2. What is the rest of the result? It is the union of the rest of lst1 with all of lst2. That is, of the three possible recursive applications, we only need the first one.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(empty? lst2) lst1]
    [(cons? lst1)
       (cond
         [(< (first lst1) (first lst2))
            (cons (first lst1) (o-union (rest lst1) lst2))]
         [(= (first lst1) (first lst2)) ...]
         [(> (first lst1) (first lst2)) ...])]))

The reasoning in the other two cases is similar. In the equality case, we only need one copy of the value in the output, but we remove the value from each argument list, so the recursion is on (rest lst1) and (rest lst2). The third case is symmetric to the first case.

(define (o-union lst1 lst2)
  (cond
    [(empty? lst1) lst2]
    [(empty? lst2) lst1]
    [(cons? lst1)
       (cond
         [(< (first lst1) (first lst2))
            (cons (first lst1) (o-union (rest lst1) lst2))]
         [(= (first lst1) (first lst2))
            (cons (first lst1) (o-union (rest lst1) (rest lst2)))]
         [(> (first lst1) (first lst2))
            (cons (first lst2) (o-union lst1 (rest lst2)))])]))

Is this more efficient? In each recursive application, at least one of the lists is smaller. Thus the total number of recursive applications is bounded above by the sum of the sizes of the sets, rather than the product of the sizes. Using ordered lists to represent sets improves the efficiency of the union operation.

Here’s the full template for structural recursion on two lists.

(define (my-list-fn lst1 lst2)
  (cond
    [(empty? lst1) ... lst2 ...]
    [(empty? lst2) ... lst1 ...]
    [(cons? lst1)
        ... (first lst1) ... (first lst2) ...
        ... (my-list-fn (rest lst1) lst2) ...
        ... (my-list-fn lst1 (rest lst2)) ...
        ... (my-list-fn (rest lst1) (rest lst2)) ...]))

We can develop similar templates for functions that do structural recursion on a list and a natural number, or on two natural numbers.

What if we want to represent sets of points? We need to define an ordering on points in order to use ordered lists. Is \((3,4) < (1,6)\)? There is no natural ordering on points. The ordering doesn’t have to make sense; we only use it for computational advantage.

For this purpose, we can use lexicographic order. "Lex" means word in Latin, and this is also known as "dictionary order". For points, lexicographic order means \((x_1,y_1) < (x_2, y_2)\) if and only if \(x_1 < x_2\), or \(x_1=x_2\) and \(y_1 < y_2\). This is easy to compute and gives us an efficient representation of sets of points.

We can extend this idea to other structures, lists, and mixtures of types. My flânerie Functional Data Structures (FDS) discusses even more efficient set representations. Here I have avoided the use of order notation or \(O\)-notation, but it is common in discussing efficiency, and in FDS I reluctantly adopt it, after a lengthy examination of its shortcomings.

Exercise 17: Develop a general template for structural recursion on one list parameter and one natural number parameter. Use it to write the Racket function keep-first, which consumes a list lst and a natural number n and produces the list consisting of the first n elements of lst. If n is greater than the length of lst, then all of lst is produced. \(\blacksquare\)

Exercise 18: Write the Racket function drop-first, which consumes a list lst and a natural number n and produces the list consisting of all elements of lst except for the first n elements. If n is greater than the length of lst, then the empty list is produced. Use the template developed in the previous exercise. \(\blacksquare\)

Exercise 19: Write the Racket function sublist, which consumes a list lst and two natural numbers n and m, and produces the list consisting of the elements of index greater than or equal to n but less than m (counting from zero). Do not use recursion; instead, use keep-first and drop-first from the previous two exercises as helper functions. \(\blacksquare\)

Exercise 20: Write the Racket functions intersection and subset for sets of numbers represented by ordered lists. These should show the same efficiency gains that we saw for union, relative to the unordered list representation. \(\blacksquare\)

Exercise 21: Write the Racket function to-ordered, which consumes a not-necessarily-ordered list and produces an ordered list containing the same values. Use structural recursion for this function and any helper functions that you might write. \(\blacksquare\)

5.5 List abbreviations

It’s time to move up to the next language level. Beginning Student With List Abbreviations (BSL+) provides concise alternatives for working with lists.

Beginning Student already provides list and a few other shortcuts.

(cons 1 (cons 2 (cons 3 empty))) is abbreviated by (list 1 2 3).

(second my-list) is an abbreviation for (first (rest my-list)).

third, fourth, and so on up to eighth are also defined.

Use these sparingly to improve readability.

Earlier we saw that putting a quote in front of something that looks like an identifier makes it into a symbol: red becomes 'red. Numbers quote to themselves: '4 is just 4. In BSL+, putting a quote in front of several items enclosed in parentheses makes a list of each item quoted.

'(red 4 blue) is the same as (list 'red 4 'blue).

Now the value representing the empty list, '(), makes more sense in context.

These three two-element lists are all the same. Each element is itself a two-element list.

(cons (cons 1 (cons 2 empty))
      (cons (cons 3 (cons 4 empty))
            empty))

(list (list 1 2) (list 3 4))

'((1 2) (3 4))

Quote notation really starts to pay off when dealing with lists containing other lists.

Exercise 22: For each of the following, write out an answer on paper, and then check it using the Interactions window in BSL+.

  • What is (first '((1 2) (3 4)))?

  • What is (rest '((1 2) (3 4)))?

  • What is the difference, if any, between '(1 (+ 1 2) 5) and (list 1 (+ 1 2) 5)?

\(\blacksquare\)

The following three exercises are poster children for the design recipe and the use of structural recursion. If you do not use these techniques, the exercises become much harder. Resist the temptation to look up solutions on the Internet; what you will find will probably be designed for languages that do not have the advantages that Racket does, and inevitably the algorithms will be messier, harder to understand, and difficult or impossible to adapt to the Racket teaching languages. These are among my favourite exercises in functional programming, and I highly recommend them.

Exercise 23: Write the Racket function subsets that consumes a list lst representing a set of distinct elements (not necessarily numbers) and produces a list of lists representing the subsets of lst, where the elements of each subset are in the same order that they appear in lst.

For example, (subsets '(3 1)) might produce '(() (3) (1) (3 1)), or it might produce '((3 1) (1) () (3)). Either answer is considered correct; the order of the subset-lists is not important.

Use structural recursion on lists for subsets and any helper functions that you may write. Use only cond, cons, first, rest, empty?, empty, and append. Do not use Racket numbers (except maybe as elements in your test cases) or arithmetic. \(\blacksquare\)

Exercise 24: Write the Racket function comb (for combinations) that consumes a list lst representing a set using the ordered list representation, and a natural number n, and produces a list of lists representing the size-\(n\) subsets of lst. For example, when comb is applied to '(1 2 3) and 2, it might produce '((1 2) (1 3) (2 3)). As in the previous exercise, the order of the sub-lists is not important.

Use structural recursion on one list parameter and one natural number parameter; think carefully about the base cases, including their ordering. Do not use your solution to the previous exercise; you should not generate all subsets. Further, you should not generate any subsets of size greater than \(n\), you should not use the predefined length function or write your own, and you should not need any arithmetic except for what is used in the template for structural recursion on natural numbers. \(\blacksquare\)

Exercise 25: Write a function perms that consumes a list of distinct values and produces a list of all permutations of those values. For example, perms applied to '(1 2 3 4) could produce:

'((1 2 3 4) (2 1 3 4) (2 3 1 4) (2 3 4 1)
  (1 3 2 4) (3 1 2 4) (3 2 1 4) (3 2 4 1)
  (1 3 4 2) (3 1 4 2) (3 4 1 2) (3 4 2 1)
  (1 2 4 3) (2 1 4 3) (2 4 1 3) (2 4 3 1)
  (1 4 2 3) (4 1 2 3) (4 2 1 3) (4 2 3 1)
  (1 4 3 2) (4 1 3 2) (4 3 1 2) (4 3 2 1))

As in the previous two exercises, the order of the sublists in the result is not important.

Use structural recursion on a single list parameter for perms and all recursive helper functions (these may have additional parameters "along for the ride"). Use only function definition, function application, cond, empty?, cons, cons?, first, and rest to answer this question. In fact, if you use the template for structural recursion on lists, which includes one cond, one empty?, and one cons?, you do not need additional uses of these. Do not use numbers and arithmetic. The example above uses integer elements, but the argument list can contain values of many different types. Think carefully about the base case for each function you write, and do not add additional base cases beyond the empty-list case. \(\blacksquare\)