4 Advanced Racket
I said briefly in the Basics chapter that Racket was a family of languages. Many other languages have a small core that is used to implement a larger set of features. This is also true of Racket, but Racket goes much further, making the ability to rewrite syntax through macros available to the programmer. Most Lisp-like languages offer this, and it seems easy to implement given the similarity of quoted list notation to Racket code, but it turns out to be difficult to implement properly (without introducing complex and subtle problems).
Racket’s macro system is state-of-the-art, but it is harder to learn and use than what we’ve seen of Racket so far. The macro system makes it possible to use Racket to design another language, which may be small and simple, or large and complex. A number of such languages are included in the Racket distribution, which is why it is a family of languages. But you can write your own.
At the end of this chapter, I discuss macros quickly, and then give overviews of two major Racket languages, Typed Racket (the name is self-explanatory) and Scribble (Racket’s documentation language, in which I have written all my flâneries, including this one). But before getting to those sections, you should start thinking of new features as being implemented by macros rewriting them into parts of the language you already know. This is even true for earlier, more elementary material; it’s what I called "syntactic sugar". For example, cond is a macro that produces nested if expressions.
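To make the cond remark concrete, here is a minimal sketch of a cond expression next to a hand-written nested if that behaves the same way. The function names sign-cond and sign-if are made up for illustration, and the nested if is written by hand; it is not the literal output of Racket's expander, which is more involved.

```racket
#lang racket

;; A three-way classification written with cond.
(define (sign-cond n)
  (cond [(< n 0) 'negative]
        [(= n 0) 'zero]
        [else 'positive]))

;; The same logic written as nested if expressions, the kind of code
;; that a cond can be thought of as being rewritten into.
(define (sign-if n)
  (if (< n 0)
      'negative
      (if (= n 0)
          'zero
          'positive)))

(map sign-cond '(-2 0 3)) ; '(negative zero positive)
(map sign-if '(-2 0 3))   ; '(negative zero positive)
```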
My apologies if some of the sections below are empty or incomplete; they are works in progress. Even when complete, they are more like advertisements than full explanations, and as always, I refer you to The Racket Guide and The Racket Reference for proper coverage.
4.1 Iterations and Comprehensions
Imperative languages often get things done with for and while loops. Tail recursion is more general, and higher-order functions such as map abstract common patterns. Racket’s iterations and comprehensions serve a similar purpose. The following example is equivalent to (map sqr '(1 2 3 4 5)).
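A minimal sketch of such an example, assuming it has the same shape as the for/list examples later in this section (the name squares is made up for illustration):

```racket
#lang racket

;; Visit each element of the list in order, binding it to i, and
;; collect the value of the body for each element into a new list.
(define squares
  (for/list ([i (in-list '(1 2 3 4 5))])
    (sqr i)))

squares                 ; '(1 4 9 16 25)
(map sqr '(1 2 3 4 5))  ; '(1 4 9 16 25)
```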
in-list is a sequence constructor. Racket provides them for ranges of integers, strings, vectors, hash tables, and other sources of data that can be viewed as sequences. Their primary purpose is efficient implementation. The mechanism is extensible; you can define your own.
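As a rough illustration of a few of the other sequence constructors, here is a sketch using in-range, in-string, in-vector, and in-hash. The names bound by define are made up for the example. The hash table is only summed, because the order in which in-hash visits entries is not specified.

```racket
#lang racket

;; Integers 0 up to, but not including, 5.
(define from-range (for/list ([i (in-range 5)]) i))                       ; '(0 1 2 3 4)

;; The characters of a string, one at a time.
(define from-string (for/list ([c (in-string "abc")]) (char-upcase c)))   ; '(#\A #\B #\C)

;; The elements of a vector.
(define from-vector (for/list ([x (in-vector (vector 10 20 30))]) (add1 x))) ; '(11 21 31)

;; in-hash supplies two values per step: a key and its value.
(define from-hash (for/sum ([(k v) (in-hash (hash 'a 1 'b 2))]) v))       ; 3
```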
The next two examples are alternatives for (map sqr (filter odd? '(1 2 3 4 5))) and (foldr + 0 '(1 2 3 4 5)) respectively.
> (for/list ([i (in-list '(1 2 3 4 5))] #:when (odd? i)) (sqr i))
'(1 9 25)
> (for/sum ([i (in-list '(1 2 3 4 5))]) i)
15
In these cases, the expressions using higher-order functions are shorter, but the for/X mechanism is more general, and stays more readable when examples get only slightly more complex. We’ve seen versions that accumulate iteration results into a list and that sum them up; Racket also provides variations for product, logical AND and OR, vectors, hash tables, and general folds in the style of foldl and foldr. There is also a for form that does not produce any values, but gets work done through side effects, like an imperative for-loop.
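The variations mentioned above can be sketched roughly as follows. The names bound by define are made up for illustration; the for/X forms themselves are the ones Racket provides.

```racket
#lang racket

(define product (for/product ([i (in-range 1 6)]) i))           ; 1*2*3*4*5 = 120
(define all-odd? (for/and ([i (in-list '(1 3 5))]) (odd? i)))   ; #t
(define any-even? (for/or ([i (in-list '(1 3 5))]) (even? i)))  ; #f
(define squares-vec (for/vector ([i (in-range 4)]) (sqr i)))    ; a vector holding 0 1 4 9

;; The body of for/hash produces two values: a key and its value.
(define squares-hash
  (for/hash ([i (in-range 3)])
    (values i (sqr i))))                                        ; maps 0 to 0, 1 to 1, 2 to 4

;; for/fold threads an explicit accumulator through the iteration,
;; much as foldl does.
(define reversed
  (for/fold ([acc '()]) ([i (in-list '(1 2 3))])
    (cons i acc)))                                              ; '(3 2 1)

;; Plain for produces no useful value; it only performs the side
;; effect, here printing one line per iteration.
(for ([i (in-range 3)])
  (printf "i = ~a\n" i))
```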
4.2 Regular Expressions
Regular expressions (REs) started out as a small algebra to describe useful sets of strings in the context of formal language theory. A character denoted the set containing just the string containing that character, and the operations were set union (denoted by the standard mathematical set-union symbol ∪), concatenation (denoted by an infix dot operator or just omitted, as with multiplication), and Kleene star (denoted by postfix or superscript *). Concatenation forms the set of strings that consist of a string from the set denoted by the first expression followed by a string from the set denoted by the second expression. Kleene star forms the set of all concatenations of zero or more strings from the set denoted by the expression to which the star is added.
REs proved useful for describing word-like substrings (consecutive sequences of characters) in program text (for example, those denoting identifiers or keywords) and in specifying acceptable matches in searches on large files. Many new useful operations were proposed, for example, to denote a range of characters, such as the digits [0-9] or the letters [a-z]. REs with an extended toolkit of operations show up in certain Unix command-line utilities, searches in most modern text editors, and with either library or direct support in most modern programming languages.
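A small sketch of these ideas in Racket's own RE syntax, where union is written with a vertical bar, + means "one or more", and ^ and $ anchor a match to the start and end of the string. The functions regexp-match* (all matches) and regexp-match? (a yes/no answer) are standard Racket functions; the sample strings and the names bound by define are made up for illustration.

```racket
#lang racket

(define sample "let x1 = foo + 42")

;; Maximal runs of lower-case letters.
(define words (regexp-match* #rx"[a-z]+" sample))    ; '("let" "x" "foo")

;; Maximal runs of digits.
(define numbers (regexp-match* #rx"[0-9]+" sample))  ; '("1" "42")

;; Union, concatenation, and Kleene star together: zero or more
;; repetitions of either "ab" or "c", covering the whole string.
(define accepted (regexp-match? #rx"^(ab|c)*$" "abcab")) ; #t
(define rejected (regexp-match? #rx"^(ab|c)*$" "abca"))  ; #f
```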
In these contexts, the typographical distinction between the language of regular expressions and the characters in the strings they are discussing is lost. We need a way of saying "I mean the character asterisk, not the Kleene star operation", and this is done by using backslash (\) as an escape. \* means the character asterisk. This is further complicated by the fact that backslash is also the string escape character in Racket. For example, the Racket string containing just a single double-quote character is "\"". Since Racket uses strings to build REs, the RE escape backslash must itself be string-escaped. The Racket RE that denotes the set containing the string containing a single asterisk is "\\*", or #rx"\\*". You can tell it’s going to be easy to get these wrong.
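The double escaping can be checked directly. This is a minimal sketch; the names bound by define and the input string "2*3" are made up for illustration.

```racket
#lang racket

(define quote-len (string-length "\""))  ; 1: just the double-quote character
(define star-len (string-length "\\*"))  ; 2: a backslash, then an asterisk

;; Escaped: the RE matches a literal asterisk.
(define literal-star (regexp-match #rx"\\*" "2*3"))  ; '("*")

;; Unescaped: * is the Kleene star applied to a, which can match the
;; empty string at the very start of the input.
(define kleene-star (regexp-match #rx"a*" "2*3"))    ; '("")
```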
The Racket RE literal prefix #rx isn’t necessary (you can just use the string that follows it), but it does help to make it clear what you’re doing. These REs use the standard syntax of POSIX regular expressions. Racket also supports a different syntax used in the programming language Perl.
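A short sketch of the three ways of writing a pattern: a plain string, an #rx literal, and a #px literal, which is Racket's prefix for the Perl-style syntax. The \d shorthand for a digit belongs to the Perl-style syntax. The names bound by define and the input string are made up for illustration.

```racket
#lang racket

;; A plain string is accepted wherever an RE is expected.
(define with-string (regexp-match "[0-9]+" "room 101"))  ; '("101")

;; The same pattern as an RE literal.
(define with-rx (regexp-match #rx"[0-9]+" "room 101"))   ; '("101")

;; Perl-style syntax: \d means a digit, and the backslash still has
;; to be string-escaped.
(define with-px (regexp-match #px"\\d+" "room 101"))     ; '("101")
```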
I’m not giving many examples here because they’re pretty hard to read, especially if you’re not familiar with the syntax yet. It’s best to work with the ideas on paper in a more mathematical and user-friendly notation and then see how to translate them into the chosen syntax. There is a terse summary of the full set of operators with examples in the Racket Reference (Regexp Syntax) and an example with more explanation in the Racket Guide (An Extended Example), preceded by a lengthier and more careful introduction than I’m giving.
Once we have built an RE, what can we do with it? regexp-match consumes an RE and a string and produces either #f, if there is no match, or a list whose first element is the first substring that matches the RE. It is possible to mark subpatterns within the RE with parentheses, and the substrings matched by those subpatterns make up the rest of the list. But you won’t know where they are in the original string. regexp-match-positions instead produces a list of pairs of start-end indices. Some common post-processing tasks are also addressed. regexp-split will treat the RE as a description of delimiters used to split the string argument, and produce a list of the resulting substrings once the delimiters are removed. regexp-replace will replace the first match with a provided replacement or transformation, and regexp-replace* will replace all matches.
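Here is a minimal sketch of these functions in use. The pattern year-month-rx, the other names bound by define, and the input strings are made up for illustration.

```racket
#lang racket

;; Two parenthesized subpatterns: digits, a hyphen, more digits.
(define year-month-rx #rx"([0-9]+)-([0-9]+)")
(define text "from 2019-07 on")

;; The whole match first, then one substring per subpattern.
(define matched (regexp-match year-month-rx text))
;; '("2019-07" "2019" "07")

;; The same information as (start . end) index pairs into text.
(define positions (regexp-match-positions year-month-rx text))
;; '((5 . 12) (5 . 9) (10 . 12))

;; No match at all produces #f rather than a list.
(define no-match (regexp-match year-month-rx "no digits here"))  ; #f

;; Delimiter: a comma followed by any number of spaces.
(define pieces (regexp-split #rx", *" "a, b,c"))                 ; '("a" "b" "c")

(define first-only (regexp-replace #rx"[0-9]+" "v1 and v2" "N"))   ; "vN and v2"
(define every-one (regexp-replace* #rx"[0-9]+" "v1 and v2" "N"))   ; "vN and vN"
```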
Some of the additional operators increase the power of REs, but there are still many situations for which REs are unsuitable, for example, arithmetic expressions with arbitrary parenthesis nesting depth, HTML, or XML. For these and other complex parsing requirements, look at the various parsing libraries provided by the main Racket distribution or in user-contributed packages.