CS 842: Functional Parsing

For Prospective Students

Parsing is the process of transforming a sequence of characters (or, more generally, tokens consisting of multiple characters) into a more complex data structure that represents deep structure. The example many will be familiar with is the parsing of a program written in a specific programming language into an abstract syntax tree that is suitable for interpretation or for generating compiled code. Parsers can also be used to manage new data formats or small domain-specific languages.

The most common way of approaching this task is to use a parser generator (typically some variant of Unix yacc or bison) which takes as input a grammar for the language with attached semantic actions and produces code that can be compiled and linked with other code that makes use of the parser. This approach (specifically, the technique of LALR parsing used in the mentioned tools) is not well understood by the majority of users. It does not work with all grammars, and it is somewhat of a black art to massage a problematic grammar. The approach is decades old, developed at a time when computers were slower and memory was much more limited.

Functional programming languages treat functions as first-class values that can be passed as parameters, stored in data structures, and created at run time. They usually minimize or eliminate mutation (changing the value of a variable), which often results in clearer code that is easier to reason about, and has benefits in concurrent and parallel environments.

In this course, we will examine the implications of the functional approach for parsing, with emphasis on alternatives to the LALR approach. If functions can be created at run time, so can parsers. When the use of recursion is common, the use of recursive descent for parsing is more natural. When functions can consume and produce other functions, complex parsers can be built up from basic ones with the aid of a suitable combinator library. We will also look at some older techniques such as Earley parsers, and some newer ones such as packrat parsing.

This is a seminar course. Students are expected to read papers from the literature, to present one or more papers to the rest of the class, and to do a final project. More details are on the Logistics page.

Students should have prior experience with at least one functional programming language (Lisp, Scheme, Racket, SML, OCaml, Haskell, Erlang, Scala, Clojure, etc.) or be willing to pick up the fundamentals very quickly. There will be some programming assignments using Racket and Haskell.