Publishing Notes from Markdown

2021/08/13

Early in 2020, I started writing course notes for CS 346, a new course that I’m developing. My notes from CS 349 were a collection of PowerPoint and Keynote slides, some text documents, and a bunch of HTML pages that I’d hand-written. It worked, but it was a cumbersome way to organize course notes. I spent far more time fidgeting with formatting issues than actually creating content.

I thought this might be a good opportunity to rethink how I was organizing my notes. I wanted something that would let me focus on writing and spend less time getting distracted by formatting.[1]

Goals

I had the following goals in mind:

• My system had to be optimized for writing notes, not slides. It’s much easier to generate meaningful slides from a good set of notes than the other way around.
• Notes should support standard text formatting features: bold, italics, bullets, lists, tables.
• Notes should handle JPG and PNG images. I should be able to resize and position them.
• I need to include formatted source code (Kotlin, shell scripts).
• I want to organize my writing by chapter, and then combine everything into a PDF or HTML that can be hosted on the course website.
• I need to (eventually) be able to produce slides from these notes.

Analysis

There were really only three options that I considered: Keynote/PowerPoint, LaTeX, and Markdown. Here are my thoughts on each:

Keynote/PowerPoint

This was the most natural thing to try, since my secondary output was the slides that I would show during class. I created over 100 slides, with detailed supporting notes. Both tools let me organize content by chapter (one file per chapter) and handled text and image formatting easily. Source code was an issue, but I found that I could copy-paste from a text editor and Keynote would preserve the pasted formatting (e.g. I could copy-paste Kotlin code from Sublime Text and Keynote would retain the coloured syntax highlighting).

• Pro: Handled images and text well. Relatively easy to turn notes into slides. Can generate PDF, one file at a time.

• Con: Each chapter was a separate file, with no way to combine them (and a single file for 300+ pages was too difficult to navigate). No built-in syntax highlighting, only whatever survived a copy-paste.

LaTeX

LaTeX was more appealing in some ways, since I was used to writing lengthy text (e.g. papers) in LaTeX. I created about 100 pages of text with images, split across chapters (with a makefile to combine the chapters into a single PDF). LaTeX was fantastic for working with images and generated nice-looking output, but it was fairly time-intensive: I spent considerable time formatting images, setting up bulleted lists, and tinkering with styles. Source code wasn’t handled well, so I had to take screenshots and insert source code as images.

• Pro: Output formatted well. Images handled very well. Easy to split into chapters and combine into a single document. Can generate PDF.
• Con: More time spent on formatting. Doesn’t handle source code well.

Markdown

I converted the LaTeX notes into Markdown and wrote a couple hundred more pages. Markdown provided most of the benefits of LaTeX in a format that is much easier to read (i.e. optimized to be human-readable), trading away formatting precision for a much more relaxed syntax. It also let me write source code directly in the notes, in fenced code blocks that are syntax-highlighted when tagged with a language.

• Pro: Easiest to write. Handles images, source code. Can split chapters into separate files and combine into a single document. Can generate PDF, HTML.
• Con: The loss of formatting precision didn’t matter in most cases, except that I lost the ability to resize images.

Solution

I settled on Markdown, with Pandoc converting the Markdown files to PDF. Markdown lets you essentially ignore formatting while writing, which made the process so much easier. The only negative was that Markdown doesn’t really handle images very well: you can insert them, but you cannot resize or justify them, and by default images are stretched to the width of the page. Most of the time this isn’t an issue, but occasionally I’d need to reformat images manually.[2]

The process of generating output from Markdown is shown below. The critical software is Pandoc: an open-source document conversion tool that can convert Markdown (and many other input types) to different output formats. For my notes, Pandoc converts the Markdown to LaTeX as an intermediate format, then runs a LaTeX engine over it to produce the PDF. This provides an unexpected benefit: any LaTeX included in the Markdown is passed through untouched into the intermediate LaTeX, where the LaTeX engine processes it along with everything else.
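To make that pass-through concrete, here is a minimal Python sketch (used only to generate text) of a hypothetical helper, latex_figure, that emits a raw LaTeX figure which could be pasted into a chapter to control an image's size. It assumes the graphicx package is loaded by the LaTeX template or header; the raw LaTeX only has an effect when the output goes through LaTeX (i.e. PDF).

```python
def latex_figure(path: str, width: float, caption: str) -> str:
    """Return a raw LaTeX figure block for Pandoc to pass through to the PDF.

    width is a fraction of the text width (0.5 = half the page).
    The caption is inserted verbatim, so LaTeX special characters such as
    % or & must already be escaped. Assumes graphicx is loaded by the
    LaTeX template or header.
    """
    if not 0 < width <= 1:
        raise ValueError(r"width must be a fraction of \textwidth in (0, 1]")
    return "\n".join([
        r"\begin{figure}[h]",
        r"\centering",
        # {{ and }} are literal braces inside the f-string
        rf"\includegraphics[width={width}\textwidth]{{{path}}}",
        rf"\caption{{{caption}}}",
        r"\end{figure}",
    ])


print(latex_figure("assets/course_sequence.png", 0.5, "Courses in this sequence"))
```

The printed block would sit in the Markdown file in place of a plain image link; if the same chapter is later converted to HTML, Pandoc drops it, which is the trade-off described in footnote 2.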

Setup

You will want an editor that supports Markdown, GNU make to build from my template below, Pandoc (which just needs to be installed and on your path), and a LaTeX distribution that provides xelatex, since the makefile uses it as the PDF engine.

Markdown is just text, and your favourite editor probably already supports it. My current favourite is Typora, which has an editable preview mode! It’s hands-down the most full-featured Markdown editor I’ve found.

Project Structure

Here’s the project structure:

.
├── 01.chapter01.md
├── 02.chapter02.md
├── 03.chapter03.md
├── assets/
├── lib/
├── makefile
├── meta/
└── out/

• 01.chapter01.md, 02.chapter02.md, etc. are Markdown source files, one per chapter. They use standard GitHub Flavored Markdown (GFM) with no special formatting.
• assets contains images that have been resized to a reasonable size. The build stretches images to fit the page width, which works for most of them, but occasionally I had to trim the canvas (or otherwise edit the image) to make it fit.
• lib contains the LaTeX template that I use when generating the PDF. The template is passed to Pandoc as a command-line parameter in the makefile.
• meta contains LaTeX that the makefile passes into Pandoc. It’s primarily used to support things that Markdown doesn’t handle natively (e.g. \newpage at the end of every chapter, which is appended to the chapters before they are converted to PDF).
• out is the output directory.
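Since chapter order is encoded in the numeric filename prefix, the ordered list of inputs (each chapter followed by a page break, except the last) can be derived rather than typed by hand. A minimal Python sketch, assuming every chapter follows the NN.name.md pattern shown in the tree; chapter_inputs and CHAPTER_RE are hypothetical names, not part of the template:

```python
import re
from typing import Iterable

# Assumed naming convention, taken from the project tree: NN.name.md
CHAPTER_RE = re.compile(r"^(\d+)\.[^/]+\.md$")


def chapter_inputs(filenames: Iterable[str], newpage: str = "meta/newpage.md") -> list[str]:
    """Order chapter files by numeric prefix, with a page-break file between them.

    Produces: chapter, newpage, chapter, newpage, ..., last chapter.
    Files that don't match the NN.name.md pattern (makefile, assets/, ...) are ignored.
    """
    chapters = sorted(
        (name for name in filenames if CHAPTER_RE.match(name)),
        key=lambda name: int(CHAPTER_RE.match(name).group(1)),  # numeric, so 10 sorts after 9
    )
    inputs: list[str] = []
    for index, name in enumerate(chapters):
        if index > 0:
            inputs.append(newpage)  # page break before every chapter except the first
        inputs.append(name)
    return inputs
```

For example, chapter_inputs(p.name for p in Path(".").iterdir()) (with pathlib's Path) would list the chapters in the project root in build order.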

Makefile

The makefile’s primary purpose is to call pandoc with the correct options. Here’s a portion of the makefile that generates a single PDF with all of the chapters:

common=--data-dir=lib --resource-path=assets
pdf_options=--template eisvogel -H meta/notes.sty -V titlepage:true --toc -N -V colorlinks=true -V linkcolor=blue -V urlcolor=blue -V toccolor=blue --toc-depth=3 --number-sections --pdf-engine=xelatex
newpage=meta/newpage.md

chapters=\
    01.chapter01.md ${newpage} \
    02.chapter02.md ${newpage} \
    03.chapter03.md

# recipe lines must be indented with a tab, not spaces
.PHONY: pdf
pdf:
	@mkdir -p out
	pandoc -f markdown -t pdf ${common} ${pdf_options} -o out/notes.pdf \
		${pdf_header} ${newpage} ${chapters}
	open out/notes.pdf  # macOS only; use xdg-open on Linux


Most of these options are pandoc options and should not be changed without great care!

• meta/pdf_header contains a YAML block with document metadata (e.g. author name, date published).
• --template eisvogel selects Eisvogel, a LaTeX template that gives the notes a more modern “feel”.
• --pdf-engine=xelatex uses an alternative engine that handles UTF-8 properly. Without this option, you end up with some garbled output (mainly in the sample output, where some of our programs print unusual characters).
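If you would rather script the build than use make, the same call can be driven from Python. This is a minimal sketch with hypothetical function names; it assumes pandoc and xelatex are on the PATH and that the Eisvogel template sits under lib/templates/. Arguments are passed as a list, so no shell quoting is involved.

```python
import subprocess
from pathlib import Path
from typing import Sequence

# Options copied from the makefile's `common` and `pdf_options` variables.
COMMON = ["--data-dir=lib", "--resource-path=assets"]
PDF_OPTIONS = [
    "--template", "eisvogel",   # looked up under lib/templates/ (as eisvogel.latex)
    "-H", "meta/notes.sty",     # extra LaTeX added to the preamble
    "-V", "titlepage:true",
    "--toc", "--toc-depth=3",
    "--number-sections",        # -N is the short form, so it is passed only once
    "-V", "colorlinks=true",
    "-V", "linkcolor=blue",
    "-V", "urlcolor=blue",
    "-V", "toccolor=blue",
    "--pdf-engine=xelatex",
]


def pandoc_pdf_command(inputs: Sequence[str], output: str = "out/notes.pdf") -> list[str]:
    """Build a pandoc invocation equivalent to the makefile's `pdf` target.

    The .pdf extension on the output file tells pandoc to produce a PDF
    (via LaTeX), so no -t flag is passed here.
    """
    return ["pandoc", "-f", "markdown", *COMMON, *PDF_OPTIONS, "-o", output, *inputs]


def build_pdf(inputs: Sequence[str], output: str = "out/notes.pdf") -> None:
    """Run pandoc, raising CalledProcessError if the conversion fails."""
    Path(output).parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p out`
    subprocess.run(pandoc_pdf_command(inputs, output), check=True)
```

The inputs are given in the same order as in the makefile: the metadata header, a page break, then the chapters with page breaks between them.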

Markdown Files

Here’s a sample markdown file. Note that there is no special formatting in these files (metadata required by pandoc is handled by the makefile).

01.chapter01.md

# Course Syllabus

This course explores the knowledge, skills and strategies required to build complete full-stack applications. Using an iterative development methodology, students will work in project teams to design, develop, and test applications and services. Standard development tools and approaches will be used to ensure code quality and performance at every step of the development cycle.

- Course Credit Weight: 0.50.
- Structure: 3 × 50-min lectures, 2 × 50-min labs (optional, drop-in).
- Prerequisite Courses: [CS 246](https://student.cs.uwaterloo.ca/~cs246/S21/index.shtml). Computer Science Majors only.

![Courses in this sequence](assets/course_sequence.png)
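Because the chapters carry no metadata, the title-page details live in the YAML header under meta/ that the makefile hands to Pandoc first; Pandoc reads a block delimited by --- lines at the top of an input as document metadata. A minimal Python sketch for writing such a block, assuming only title, author and date are needed (yaml_header and the sample title are hypothetical):

```python
import json
from datetime import date


def yaml_header(title: str, author: str, published: date) -> str:
    """Return a Pandoc YAML metadata block (title-page fields only).

    JSON string literals are also valid YAML double-quoted strings, so
    json.dumps takes care of quoting colons, quotes and backslashes.
    """
    fields = {
        "title": title,
        "author": author,
        "date": published.isoformat(),
    }
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


print(yaml_header("CS 346 Course Notes", "Jeff", date(2021, 8, 13)), end="")
```

Quoting every value keeps titles such as “CS 346: Notes” from being misread, since an unquoted colon followed by a space has special meaning in YAML.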


How do I use this?

Here’s a ZIP file containing my entire directory structure, with an empty (sample) document. Type make to build the entire thing, or make <chapter name> to build a single chapter for testing.

Happy writing!

Jeff

1. Anyone who has ever written in MS Word should understand the challenge of trying to stay focused while constantly fixing formatting issues.

2. There’s actually a workaround: Pandoc will “pass through” any text that it doesn’t recognize into the intermediate format. This means that you can use LaTeX image formatting commands in Markdown, and as long as you’re converting to PDF, they will work! However, this doesn’t work if you output to a different format. I chose to manually resize images to keep flexibility in output formats.