CS 860: Patterns in Words

University of Waterloo: Winter 2019

Time: Meets Tuesday and Thursday, 2:30-3:50 PM in DC 2568


Jeffrey Shallit, DC 3134, x 34804, shallit at uwaterloo.ca. Office hours: Wednesdays, 10 - 11 AM, or by appointment, or when my office door is open.

Course Description

An aperiodic pattern avoiding overlaps in 2 dimensions
This is a grad course on patterns in words (strings), suitable for graduate students and bright undergraduates in mathematics and computer science. The only prerequisite is a certain degree of mathematical sophistication. You don't need to know much, but the basics of combinatorics will help a little, as will the basics of automata theory, algorithm analysis, and complexity theory -- but these are definitely not essential!

Goals of the course

We are concerned with patterns in words. Examples of the patterns we will study include and many more. Then, given a pattern P or a set of patterns S, the kinds of questions we might ask include the following:

  1. Are there are any finite words over a finite alphabet matching or avoiding P (resp., S)? Are there finitely many? Or are there infinitely many? If infinitely many, what is the smallest alphabet size that works?

  2. Are there any infinite words avoiding P (resp., S)? One-sided, two-sided? Aperiodic words? Fixed points of uniform morphisms? Arbitrary morphisms? Automatic words? Morphic words? What is the smallest alphabet size that works? Given various classes of patterns, what is the largest class that can be avoided over an alphabet of size k?

  3. If there is an infinite word avoiding P (resp., S) over a k-letter alphabet, what is the lexicographically least such word? The lexicographically greatest? The lexicographically least or greatest starting with a given prefix?

  4. How many words avoid P (resp., S)? If only finitely many, bound the length or list them all. Otherwise, measure cardinality as a function of the length of the word. Does it grow polynomially? Exponentially? Something else? How can we find good upper and lower bounds on the growth rate? For infinite words, are there finitely many? Or countably many? Or uncountably many?

  5. What is the largest or smallest number of occurrences of the pattern P (resp., S) that can occur in a word of length n? Distinct occurrences? "Primitive" occurrences?

  6. What is the shortest word containing all possible occurrences of the pattern P (resp., S) of length n?

  7. Can we construct an infinite word containing occurrences of P (resp., S) beginning at every position? Infinitely many such occurrences?

  8. Are there morphisms on words or languages that preserve the property of avoiding P (resp., S)? If so, does the morphism have a finite test set?

  9. What are properties of the language L = { x : x avoids P} (resp., { x : x avoids S})? Is the language regular? Context-free?

  10. Algorithms: how quickly can we decide whether a given word w matches or avoids P (resp., S)? Is the problem polynomial-time solvable? NP-complete? PSPACE-complete?

  11. Given a language L specified in some way (for example, by a deterministic finite automaton or pushdown automaton M), can we decide if every element of L avoids P (resp., S)? If at least one word avoids P (resp., S)? If there is at least one such word, what is the shortest such (measured, for example, as a function of the size of M)? Can we count how many words of L avoid P (resp., S)?

  12. Given an infinite word specified in some way (for example, as the fixed point of a morphism, or fixed point of a transducer), can we compute the (limiting) frequency of occurrences of the individual letters? Words? Can we decide if it contains some factor (subword) that matches (or fails to match) P? If it contains infinitely many? How many such factors of length n are there? Do the factors that occur, occur infinitely often? Do the occurrences of factors appear with bounded gaps?

  13. What are the topological properties of the set of infinite words avoiding P (resp., S)? Is it of measure 0? Is it perfect?

  14. Generalizations of all these things: to trees, graphs, uncountable domains, and so forth.

Course textbook

Narad Rampersad and Jeffrey Shallit are writing a book on the topic of this course. We'll provide parts of the book from time to time on LEARN.


We'll use Piazza for course announcements.

Lecture summaries and notes

See here for lecture summaries and notes.

More stuff


General mathematical sophistication is the most important prerequisite. You should be very comfortable with formal mathematical proofs, especially proofs by induction. A knowledge of basic automata theory at the level of Waterloo's CS 360 or 365 would be helpful, but not essential. Knowledge of basic algebra (groups, rings, fields) at the level of Waterloo's PMATH 347/348 would be helpful, but not essential. Knowledge of basic combinatorics (generating functions, etc.) at the level of Waterloo's MATH 239/249 and CO 330 would be helpful, but not essential. The ability to program in a language like C, C++, Java, Matlab, Maple, Mathematica, or Python would also be very helpful.


There will be 3-5 problem sets, with problems of varying difficulty. Some of the problems will involve experimentation and computation. These will be worth 60% of the course mark.

There will be a final course project. This can be

  1. reading a paper or papers from the literature, writing a 5-15 page report on them, and presenting them in class in a 15-30 minute presentation; or
  2. working on an open research problem or problems, writing a 5-15 page report, and presenting your results in class in a 15-30 minute presentation; or
  3. writing or re-writing Wikipedia pages on various topics discussed in class (a list will be provided). This option is available for those who are a bit presentation-phobic.
  4. doing a programming project where you implement something generally useful to the material of the course and make it publicly available.
The project should be handed in by April 9, and will be worth 40% of the course mark. You should hand in a 1-page proposal on February 7 2019.

There will be no midterm or final.

This guide by Ian Parberry gives some tips on making a good presentation. If you are using a computer, you may want to use the 'beamer' package of latex and you should send Prof. Shallit the pdf file of your presentation by the day before the day of your presentation.

If you are very presentation-phobic, you can gain equivalent credit by creating and/or editing some Wikipedia articles on the topic of the course.

For complete details about the course project, see here.

Problem Sets

Open problems

During the course, I will mention many interesting open problems. Solve any of them for bonus marks in the course.


A basic tool that everybody in the course should know about is the On-Line Encyclopedia of Integer Sequences, available at https://oeis.org. If you explore any sequence that does not appear there, please submit it. I will give 1 bonus mark in this course for each sequence you submit to the OEIS on the subject of this course that I consider to be good (my decision is final!). Let me know the sequence number(s) by email when they appear.

Some papers arising from previous graduate courses