Please note: This master’s thesis presentation will be given online.
Egill Gudmundsson, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Olga Vechtomova
Neural networks are a popular choice of model for text generation. Variational autoencoders have been shown to be good at reconstructing text and generating novel text. However, controlling certain aspects of the generated text (e.g., length, semantics, cadence) has proven more difficult. Disentanglement and controlled text generation have thus become areas of interest, with approaches varying according to the aspects to be controlled.
In this work we study controllable generation of lyric text based on semantic and phonetic criteria. The phonetic information takes the form of generalized phonetic patterns. A Bag-of-Words Variational Autoencoder (VAE) extracts and models the semantic information, while a phonetic pattern VAE handles the phonetic information. Each VAE uses several regularization techniques for its latent space. The information from both is fed to a lyrics decoder, which generates novel lyric lines intended to satisfy both the Bag-of-Words and phonetic constraints.
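As a rough illustration of the two-encoder arrangement described above, the following Python sketch shows a Bag-of-Words count vector, the standard VAE sampling step, a KL term of the kind commonly used to regularize a latent space, and one possible way to combine two latent codes for a decoder. It is not the thesis implementation. The function names, the diagonal Gaussian posterior with a standard normal prior, and the use of concatenation to join the semantic and phonetic codes are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random
from collections import Counter


def bag_of_words(line, vocab):
    """Count vector over a fixed vocabulary (hypothetical preprocessing)."""
    counts = Counter(line.lower().split())
    return [counts[word] for word in vocab]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions.

    Assumes a diagonal Gaussian posterior, which is the usual VAE choice
    but is not stated in the abstract.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def decoder_input(z_semantic, z_phonetic):
    """Join the two latent codes by concatenation (an assumed design)."""
    return list(z_semantic) + list(z_phonetic)
```

In a full model, the means and log-variances would come from trained encoders, and the concatenated vector would condition a sequence decoder that emits the lyric line.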
The experiments show that our model can learn to reconstruct phonetic patterns extracted from text and use them with the Bag-of-Words representations to reconstruct the original lyric lines. Together, the learned representations of phonetic patterns and Bag-of-Words constraints can be used to generate new lyrics.