CS 860: Patterns in Words
University of Waterloo: Winter 2019
Time: Meets Tuesday and Thursday, 2:303:50 PM in DC 2568
Instructor
Jeffrey Shallit, DC 3134, x 34804, shallit at uwaterloo.ca. Office
hours: Wednesdays, 10  11 AM, or by appointment, or when my office door is open.
Course Description

An aperiodic pattern avoiding overlaps in 2 dimensions 
This is a grad course on patterns in words (strings), suitable for
graduate students and bright undergraduates
in mathematics and computer science. The only
prerequisite is a certain degree of mathematical sophistication.
You don't need
to know much, but the basics of combinatorics will help a little, as
will the basics of automata theory, algorithm analysis, and complexity theory
 but these are definitely not essential!
Goals of the course
We are concerned with patterns in words. Examples of the patterns we will study
include
 squares: nonempty words of the form xx
 overlaps: words of the form axaxa, where a is a single
letter and x is a possibly empty word
 antisquares: binary words of the form xx', where x' is
the complement of x
 cubes: nonempty words of the form xxx
 arbitrary powers: nonempty words of the form x^{α}
for a rational number α
 nonpowers: the primitive words
 bordered and unbordered words
 abelian squares: nonempty words of the form x x' where
x' is a permutation of x
 abelian powers
 additive squares: nonempty words of the form x x' where
x = x' and ∑ x = ∑ x'
 additive cubes and higher powers
 palindromes (nonempty words that equal their reverse)
 antipalindromes
 quasiperiodic words
 kabelian repetitions
 balanced words
 Lyndon words
 privileged words
 closed words
 rich words
 trapezoidal words
 antipowers: t consecutive distinct identicallength
blocks
and many more.
Then, given a pattern P or a set of patterns S, the
kinds of questions we might ask include the following:
 Are there are any finite words over a finite alphabet matching or
avoiding P (resp., S)? Are there finitely many? Or are there
infinitely many? If infinitely many, what is the smallest alphabet
size that works?
 Are there any infinite words avoiding P (resp., S)?
Onesided,
twosided? Aperiodic words? Fixed points of uniform morphisms?
Arbitrary morphisms? Automatic words? Morphic words? What is the
smallest alphabet size that works? Given various classes of patterns,
what is the largest class that can be avoided over an alphabet of size k?
 If there is an infinite word avoiding P (resp., S)
over a kletter
alphabet, what is the lexicographically least such word? The
lexicographically greatest? The lexicographically least or greatest
starting with a given prefix?
 How many words avoid P (resp., S)?
If only finitely many, bound
the length or list them all. Otherwise, measure cardinality as a
function of the length of the word. Does it grow polynomially?
Exponentially? Something else?
How can we find good upper and lower bounds on the growth rate? For
infinite words, are there finitely many? Or countably many? Or
uncountably many?
 What is the largest or smallest number of occurrences of the
pattern P (resp., S) that can occur in a word of length
n? Distinct occurrences? "Primitive" occurrences?
 What is the shortest word containing all possible occurrences of
the pattern P (resp., S) of length n?
 Can we construct an infinite word containing occurrences of P
(resp., S) beginning at every position? Infinitely many such
occurrences?
 Are there morphisms on words or languages that preserve the
property of avoiding P (resp., S)? If so, does the morphism have
a finite test set?
 What are properties of the language
L = { x : x avoids P} (resp.,
{ x : x avoids S})?
Is the language regular? Contextfree?
 Algorithms: how quickly can we decide whether a given word w
matches or avoids
P (resp., S)? Is the problem polynomialtime solvable?
NPcomplete? PSPACEcomplete?
 Given a language L specified in some way (for example, by a
deterministic finite automaton or pushdown automaton M), can we decide
if every element of L
avoids P (resp., S)? If at least one word
avoids P (resp., S)? If there is at least one such word,
what is the shortest such (measured, for example, as a function of the
size of M)? Can we count how many words of L
avoid P (resp., S)?
 Given an infinite word specified in some way (for example, as the
fixed point of a morphism, or fixed point of a transducer), can we
compute the (limiting) frequency of occurrences of the
individual letters? Words? Can we decide if it
contains some factor (subword) that matches (or fails to match) P? If
it contains infinitely many? How many such factors of length n are
there? Do the factors that occur, occur infinitely often? Do the
occurrences of factors appear with bounded gaps?
 What are the topological properties of the set of infinite words
avoiding P (resp., S)? Is it of measure 0? Is it perfect?
 Generalizations of all these things: to trees, graphs,
uncountable domains, and so forth.
Course textbook
Narad Rampersad and Jeffrey Shallit are writing a book on the topic
of this course. We'll provide parts of the book from time to time
on LEARN.
Piazza
We'll use
Piazza for course announcements.
Lecture summaries and notes
See here for lecture summaries and notes.
More stuff
Prerequisites
General mathematical sophistication is the most important prerequisite.
You should be very comfortable with formal mathematical proofs,
especially proofs by induction. A knowledge of basic automata theory
at the level of Waterloo's CS 360 or 365 would be helpful, but not
essential. Knowledge of basic algebra (groups, rings, fields) at
the level of Waterloo's PMATH 347/348 would be helpful, but not
essential. Knowledge of basic combinatorics (generating functions, etc.)
at the level of
Waterloo's MATH 239/249 and CO 330 would be helpful, but not essential.
The ability to program in a language like C, C++, Java, Matlab, Maple,
Mathematica, or Python would
also be very helpful.
Evaluation
There will be 35 problem sets, with problems of varying difficulty.
Some of the problems will involve experimentation and computation.
These will be worth 60% of the course mark.
There will be a final course project. This can be
 reading a paper or papers from the literature, writing a 515
page report on them, and presenting them in class in a 1530 minute
presentation; or
 working on an open research problem or problems, writing a 515
page report, and presenting your results in class in a 1530 minute
presentation; or
 writing or rewriting Wikipedia pages on various topics discussed
in class (a list will be provided). This option is available for
those who are a bit presentationphobic.
 doing a programming project where you implement something generally
useful to the material of the course and make it publicly available.
The project should be handed in by April 9, and will be worth 40%
of the course mark. You should hand in a 1page proposal on February 7
2019.
There will be no midterm or final.
This guide by Ian
Parberry gives some tips on making a good presentation. If you
are using a computer, you may want to use the 'beamer' package of latex
and you should send Prof. Shallit the pdf file of your presentation by the
day before the day of your presentation.
If you are very presentationphobic, you
can gain equivalent credit by creating and/or editing some Wikipedia
articles on the topic of the course.
For complete details about the course project, see
here.
Problem Sets
 Problem Set 1, distributed Thursday January 17; due Tuesday January 29,
printed copy, in class.
 Problem Set 2, distributed Wednesday February 6; due Tuesday February 26,
printed copy, in class.
 Problem Set 3, distributed Thursday March 7; due Thursday March 21,
printed copy, in class.
Open problems
During the course, I will mention many interesting
open problems.
Solve any of them for bonus marks in the course.
Sequences
A basic tool that everybody in the course should know about is the
OnLine Encyclopedia of Integer Sequences, available at
https://oeis.org. If you explore any sequence
that does not appear there, please submit it. I will give 1 bonus mark
in this course
for each sequence you submit to the OEIS on the subject of this course
that I consider to be good (my decision is final!). Let me know
the sequence number(s) by email when they appear.
Some papers arising from previous graduate courses
 Soroosh Yazdani,
Multiplicative functions and kautomatic sequences,
Journal de théorie des nombres de Bordeaux 13 (2001)
651658.
 E. Grant, J. Shallit, and T. Stoll,
Bounds for the discrete correlation of infinite sequences on k
symbols and generalized RudinShapiro sequences.
Acta Arith. 140 (2009), 345368.
 Mathieu GuayPaquet and Jeffrey Shallit,
Avoiding squares and overlaps over the natural numbers.
Discrete Mathematics 309 (2009), 62456254.
[link]
 YuHin Au, Aaron Robertson, and Jeffrey Shallit,
Van der Waerden's theorem and avoidability in words,
Integers 11 (2011), 6176.
[link]
 Chen Fei Du, Jeffrey Shallit, and Arseny Shur,
Optimal bounds for the similarity density of the ThueMorse word
with overlapfree and 7/3powerfree infinite binary words,
Internat. J. Found. Comput. Sci. 26 (2015), 11471165.
 YuHin Au, Generalized de Bruijn words for primitive words and powers,
Discrete Math. 338 (2015), 23202331.
 Michael Forsyth, Amlesh Jayakumar, Jarkko Peltomäki, and
Jeffrey Shallit, Remarks on privileged words,
Internat. J. Found. Comput. Sci. 27 (2016), 431442.
 Chuan Guo, Jeffrey Shallit, Arseny M. Shur,
Palindromic rich words and runlength encodings,
Information Processing Letters 116 (2016) 735738.
 Yu Hin (Gary) Au, Christopher DrexlerLemire, and Jeffrey Shallit,
"Notes and note pairs in Norgard's infinity series",
J. of Mathematics and Music 11 (1) (2017) 119.
DOI: here.
 Samin Riasat, Infinite products involving binary digit sums,
in D. Marc Kilgour, Herb Kunze, Roman Makarov, Roderick Melnik, Xu Wang,
eds., Recent Advances in Mathematical and Statistical Methods: IV AMMCS,
Springer, 2018, pp. 5968.