CS 860: Patterns in Words

University of Waterloo: Winter 2019

Time: Meets Tuesday and Thursday, 2:30-3:50 PM in DC 2568

Instructor

Jeffrey Shallit, DC 3134, x 34804, shallit at uwaterloo.ca. Office hours: TBA

Course Description

An aperiodic pattern avoiding overlaps in 2 dimensions
This is a grad course on patterns in words (strings), suitable for graduate students and bright undergraduates in mathematics and computer science. The only prerequisite is a certain degree of mathematical sophistication. You don't need to know much, but the basics of combinatorics will help a little, as will the basics of automata theory, algorithm analysis, and complexity theory -- but these are definitely not essential!

Goals of the course

We are concerned with patterns in words. Examples of the patterns we will study include and many more. Then, given a pattern P or a set of patterns S, the kinds of questions we might ask include the following:

  1. Are there are any finite words over a finite alphabet matching or avoiding P (resp., S)? Are there finitely many? Or are there infinitely many? If infinitely many, what is the smallest alphabet size that works?

  2. Are there any infinite words avoiding P (resp., S)? One-sided, two-sided? Aperiodic words? Fixed points of uniform morphisms? Arbitrary morphisms? Automatic words? Morphic words? What is the smallest alphabet size that works? Given various classes of patterns, what is the largest class that can be avoided over an alphabet of size k?

  3. If there is an infinite word avoiding P (resp., S) over a k-letter alphabet, what is the lexicographically least such word? The lexicographically greatest? The lexicographically least or greatest starting with a given prefix?

  4. How many words avoid P (resp., S)? If only finitely many, bound the length or list them all. Otherwise, measure cardinality as a function of the length of the word. Does it grow polynomially? Exponentially? How can we find good upper and lower bounds? For infinite words, are there finitely many? Or countably many? Or uncountably many?

  5. What is the largest or smallest number of occurrences of the pattern P (resp., S) that can occur in a word of length n? Distinct occurrences? "Primitive" occurrences?

  6. What is the shortest word containing all possible occurrences of the pattern P (resp., S) of length n?

  7. Can we construct an infinite word containing occurrences of P (resp., S) beginning at every position? Infinitely many such occurrences?

  8. Are there morphisms on words or languages that preserve the property of avoiding P (resp., S)? If so, does the morphism have a finite test set?

  9. What are properties of the language L = { x : x avoids P} (resp., { x : x avoids S})? Is the language regular? Context-free?

  10. How quickly can we decide whether a given word w matches or avoids P (resp., S)? Is the problem polynomial-time solvable? NP-complete? PSPACE-complete?

  11. Given a language L specified in some way (for example, by a deterministic finite automaton or pushdown automaton M), can we decide if every element of L avoids P (resp., S)? If at least one word avoids P (resp., S)? If there is at least one such word, what is the shortest such (measured, for example, as a function of the size of M)? Can we count how many words of L avoid P (resp., S)?

  12. Given an infinite word specified in some way (for example, as the fixed point of a morphism, or fixed point of a transducer), can we compute the frequency of occurrences of letters? Can we decide if it contains some factor (subword) that matches (or fails to match) P? If it contains infinitely many? How many such factors of length n are there? Do the factors that occur, occur infinitely often? Do the occurrences of factors appear with bounded gaps?

  13. What are the topological properties of the set of infinite words avoiding P (resp., S)? Is it of measure 0? Is it perfect?

  14. Generalizations of all these things: to trees, graphs, uncountable domains, and so forth.

Course textbook

Narad Rampersad and Jeffrey Shallit are writing a book on the topic of this course. We'll provide parts of the book from time to time in a password-protected directory.

More stuff

Prerequisites

General mathematical sophistication is the most important prerequisite. You should be very comfortable with formal mathematical proofs, especially proofs by induction. A knowledge of basic automata theory at the level of Waterloo's CS 360 or 365 would be helpful, but not essential. Knowledge of basic algebra (groups, rings, fields) at the level of Waterloo's PMATH 347/348 would be helpful, but not essential. Knowledge of basic combinatorics (generating functions, etc.) at the level of Waterloo's MATH 239/249 and CO 330 would be helpful, but not essential.

Evaluation

There will be 3 problem sets, with problems of varying difficulty.

There will be a final course project. This can be

  1. reading a paper or papers from the literature, writing a 5-15 page report on them, and presenting them in class in a 15-30 minute presentation; or
  2. working on an open research problem or problems, writing a 5-15 page report, and presenting your results in class in a 15-30 minute presentation; or
  3. writing or re-writing Wikipedia pages on various topics discussed in class (a list will be provided). This option is available for those who are a bit presentation-phobic.
The project should be handed in on March 30.

There will be no midterm or final.

This guide by Ian Parberry gives some tips on making a good presentation. If you are using a computer, you may want to use the 'beamer' package of latex and you should send Prof. Shallit the pdf file of your presentation by the day before the day of your presentation.

If you are very presentation-phobic, you can gain equivalent credit by creating and/or editing some Wikipedia articles on the topic of the course. We're looking for articles on the following topics:

Dejean's conjecture (you will have to create it)
Combinatorics on words -- the current article is a real mess and needs some drastic rewriting
Squarefree word
Sturmian word
Lyndon word
Fibonacci word
Rich word
Partial word
Periodicity
Quasiperiodicity
Primitive word
Lovász local lemma

Open problems

During the course, I will mention many interesting open problems. Solve any of them for bonus marks in the course.

Sequences

A basic tool that everybody in the course should know about is the On-Line Encyclopedia of Integer Sequences, available at https://oeis.org. If you explore a sequence that does not appear there, please submit it.

Some papers arising from previous graduate courses