2. Syntax of Propositional Logic

In our analysis of logical arguments, we will not consider propositions written in English. Instead, to avoid possible ambiguities, we will only consider propositions written in a formal propositional logic language that we will now define.

A Propositional Logic Language

Our language will be built on the infinite alphabet \(\sigma\) that includes the following symbols:

  - the propositional variables \(p, q, r, \ldots\) and \(p_1, p_2, p_3, \ldots\);
  - the connective symbols \(\neg\), \(\wedge\), \(\vee\), \(\to\), and \(\leftrightarrow\);
  - the left and right parentheses \((\) and \()\).

For each \(n \ge 0\), we let \(\sigma^n\) denote the set of strings of length \(n\) whose symbols are all in \(\sigma\). The set of all finite-length strings over \(\sigma\) is denoted by \(\sigma^* = \bigcup_{n \ge 0} \sigma^n\). Note that each propositional variable represents a single symbol in \(\sigma\) and there are no space symbols in the alphabet. So the string \((p_{42} \vee p_3)\), for example, has length \(5\).
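To see why \((p_{42} \vee p_3)\) has length \(5\), it can help to split a string over \(\sigma\) into its individual symbols. Here is a minimal tokenizer sketch in Python (the names `TOKEN` and `symbols` are our own, not from the text), writing variables as `p42` and `p3`:

```python
import re

# Each propositional variable such as "p42" counts as a *single* symbol of
# the alphabet sigma, so "(p42∨p3)" is a string of length 5 over sigma.
TOKEN = re.compile(r"p\d+|[()¬∧∨→↔]")

def symbols(s: str) -> list[str]:
    """Split a string over sigma into its individual symbols."""
    tokens = TOKEN.findall(s)
    # Sanity check: the tokens must account for every character of s.
    assert "".join(tokens) == s, "input contains characters outside sigma"
    return tokens

print(symbols("(p42∨p3)"))       # ['(', 'p42', '∨', 'p3', ')']
print(len(symbols("(p42∨p3)")))  # 5
```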

We only want to consider some of the strings in \(\sigma^*\).

Definition. The language \(L^p(\sigma)\) is the set of strings in \(\sigma^*\) that can be obtained by a finite number of applications of the following three rules:

  1. Every length-one string that consists of a single propositional variable is in \(L^p(\sigma)\).
  2. For every \(\phi \in L^p(\sigma)\), the string \(\neg \phi\) is also in \(L^p(\sigma)\).
  3. For every \(\phi, \psi \in L^p(\sigma)\), the strings \((\phi \wedge \psi)\), \((\phi \vee \psi)\), \((\phi \to \psi)\), and \((\phi \leftrightarrow \psi)\) are all in \(L^p(\sigma)\).

A formula is a string in \(L^p(\sigma)\).
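The three formation rules can be mirrored directly as string-building functions. The helpers below are a sketch of our own (not part of the formal definition): any string they return is a formula by construction, since each function applies exactly one rule.

```python
def neg(phi: str) -> str:
    """Rule 2: from a formula phi, build the formula ¬phi."""
    return "¬" + phi

def binop(phi: str, op: str, psi: str) -> str:
    """Rule 3: from formulas phi, psi, build (phi op psi); parentheses mandatory."""
    assert op in "∧∨→↔"
    return f"({phi}{op}{psi})"

# Rule 1 says "p", "q", "r" are formulas on their own; build ((p∧q)∨(p∧¬r))
# from the inside out:
formula = binop(binop("p", "∧", "q"), "∨", binop("p", "∧", neg("r")))
print(formula)  # ((p∧q)∨(p∧¬r))
```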

Examples of formulas in \(L^p(\sigma)\):

All of the formulas above are distinct: while we might already have noticed that the propositions encoded in the 3rd and 4th examples above are “equivalent” in some sense, we have not yet assigned any meaning to the language, so for now two formulas \(\phi\) and \(\psi\) are equal to each other if and only if they are the exact same string.

Examples of strings that are not formulas in \(L^p(\sigma)\):

The second and third examples might look like reasonable formulas (at least once we establish the distributivity of the \(\wedge\) operator for the third example), but it’s important to emphasize that they are not formulas in the language \(L^p(\sigma)\) that we defined because they don’t satisfy the parenthesization rules in the definition.

The last example is also worth a short discussion: we will use \(\phi\) to represent a generic proposition. (In fact, we already did this in the definition above.) But while we can use \(\phi\) to represent a formula, the string of length one that consists of the symbol \(\phi\) is not itself a formula because it is not a string over the alphabet \(\sigma\).

Note that the language \(L^p(\sigma)\) represents only one of many possible ways that we could define a formal language to study propositional logic. We could make different choices in the set of symbols we use, the set of connectives we allow, the way we use parentheses, etc.

Parsing Formulas

A parse tree gives a visual representation of a formula. As an example, this is a parse tree for the formula \(((p \wedge q) \vee (p \wedge \neg r))\):

Example of a parse tree

A parse tree is a rooted tree where the root of the tree (drawn at the top) is labelled with the formula being parsed. Each node labelled with a formula of the form \(( \phi \,\square\, \psi)\) for some binary connective \(\square \in \{\wedge, \vee, \to, \leftrightarrow\}\) has two children, labelled with \(\phi\) and \(\psi\), respectively. A node labelled with \(\neg \phi\) has a single child labelled with \(\phi\). And the leaves of the tree are labelled with propositional variables.
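The recursive structure of parse trees can be made concrete with a small parser sketch. This is our own illustration (assuming, for simplicity, single-character variable names such as `p`, `q`, `r`); the tree is a nested tuple in which a leaf is a variable name, `('¬', t)` is a negation node, and `(op, left, right)` is a binary connective node:

```python
def parse(s: str):
    """Parse a complete formula into its parse tree."""
    tree, rest = parse_prefix(s)
    assert rest == "", f"trailing symbols: {rest!r}"
    return tree

def parse_prefix(s: str):
    """Parse one formula from the front of s; return (tree, remaining string)."""
    if s[0] == "¬":                       # rule 2: ¬phi
        tree, rest = parse_prefix(s[1:])
        return ("¬", tree), rest
    if s[0] == "(":                       # rule 3: (phi op psi)
        left, rest = parse_prefix(s[1:])
        op, rest = rest[0], rest[1:]
        assert op in "∧∨→↔"
        right, rest = parse_prefix(rest)
        assert rest[0] == ")"
        return (op, left, right), rest[1:]
    return s[0], s[1:]                    # rule 1: a propositional variable

print(parse("((p∧q)∨(p∧¬r))"))
# ('∨', ('∧', 'p', 'q'), ('∧', 'p', ('¬', 'r')))
```

Note that the parser simply follows the three formation rules backwards; the theorem below is what guarantees that doing so can never produce two different trees for the same formula.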

We can begin our formal study of propositional logic by asking a deceptively non-trivial question: can we represent all formulas with a parse tree? And, importantly, is every formula representable by a unique parse tree? The answer to both questions is “yes”, but proving that this answer is correct requires a bit more work. One way to establish these properties is to start by proving that every formula in \(L^p(\sigma)\) satisfies another, more basic, structural property:

Lemma 1. Every formula \(\phi \in L^p(\sigma)\) contains the same number of left parentheses and right parentheses.

Proof. By induction on the length \(n\) of \(\phi\).

In the base case where \(n=1\), the formula \(\phi\) must consist of a single propositional variable and the claim trivially holds since it then contains zero left and right parentheses.

In the induction step, the induction hypothesis is that every formula of length \(< n\) has the same number of left and right parentheses. There are two cases to consider.

When \(\phi = \neg \psi\) for some formula \(\psi\), the formula \(\psi\) has length \(n - 1 < n\), so by the induction hypothesis it has the same number of left and right parentheses. Since the \(\neg\) symbol contributes no parentheses, the same is true of \(\phi\).

When \(\phi = (\psi \,\square\, \theta)\) for some formulas \(\psi, \theta\) and binary connective \(\square\), both \(\psi\) and \(\theta\) have length \(< n\), so by the induction hypothesis each has an equal number of left and right parentheses. The formula \(\phi\) contains one more left parenthesis than \(\psi\) and \(\theta\) combined, and one more right parenthesis as well, so its two counts are again equal.
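Lemma 1 is easy to check empirically. The snippet below is an illustration of the statement (not a proof, of course): it counts the two kinds of parentheses in a few formulas.

```python
def balanced(phi: str) -> bool:
    """Check Lemma 1's conclusion: equally many '(' and ')' in phi."""
    return phi.count("(") == phi.count(")")

# Every formula passes the check ...
for phi in ["p", "¬p", "(p∧q)", "((p∧q)∨(p∧¬r))", "¬(p→(q↔r))"]:
    assert balanced(phi)

# ... but the converse of Lemma 1 fails: ")p(" is balanced yet is
# certainly not a formula, while "((p∧q)" fails the count check.
assert not balanced("((p∧q)")
```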

We can build on this simple observation.

Lemma 2. Every formula \(\phi = \phi_1\phi_2\cdots\phi_n \in L^p(\sigma)\) satisfies:

  1. For every binary connective symbol at index \(i\) in \(\phi\), the prefix \(\phi_1\cdots\phi_i\) contains at least one more left parenthesis than right parentheses.
  2. If \(\phi = (\psi \,\square\, \theta)\) with the displayed connective \(\square\) at index \(k\), then \(\phi_1\cdots\phi_k\) contains exactly one more left parenthesis than right parentheses, and the prefix ending at every other binary connective in \(\phi\) contains at least two more.

Proof. We again proceed by induction on the length \(n\) of \(\phi\).

In the base case where \(n=1\), the claim is trivially true because all formulas of length one consist of a single propositional variable and therefore do not contain any binary connectives.

In the induction step, there are two possibilities. When \(\phi = \neg \psi\), the induction hypothesis applies to \(\psi\); since prepending the \(\neg\) symbol shifts every prefix by one position without changing any parenthesis counts, the conclusion of the lemma holds for \(\phi\) as well.

Consider now the second possibility, where \(\phi = (\psi \,\square\, \theta)\) for some formulas \(\psi\) and \(\theta\) and binary connective symbol \(\square\). Let \(k\) denote the index of the \(\square\) symbol in this decomposition of \(\phi\).

By Lemma 1, \(\psi = \phi_2\cdots\phi_{k-1}\) has the same number of left and right parentheses. So there is exactly one more left parenthesis than right parentheses in \(\phi_1\cdots \phi_k\).
And by the induction hypothesis on \(\psi\), every binary connective at index \(i < k\) in \(\phi\) has at least one more left than right parenthesis in the string \(\phi_2\cdots\phi_i\), so it has at least two more left parentheses than right parentheses in the prefix \(\phi_1\cdots\phi_i\). Similarly, by the induction hypothesis on \(\theta\), for every binary connective at index \(j > k\) in \(\phi\) there are at least two more left parentheses than right parentheses in the prefix \(\phi_1 \cdots \phi_j\). So the prefix ending at the binary connective at index \(k\) contains exactly one more left parenthesis than right parentheses, the prefix ending at any other binary connective contains at least two more, and the induction is complete.
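Lemma 2 can be watched in action on a concrete formula. The sketch below (our own illustration) records, for each binary connective, the number of left minus right parentheses in the prefix ending at that connective:

```python
def connective_balances(phi: str) -> list[tuple[int, str, int]]:
    """For each binary connective in phi, return (index, symbol, balance),
    where balance = #'(' minus #')' in the prefix ending at that symbol."""
    balances, depth = [], 0
    for i, c in enumerate(phi, start=1):  # 1-based indices, as in the text
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c in "∧∨→↔":
            balances.append((i, c, depth))
    return balances

print(connective_balances("((p∧q)∨(p∧¬r))"))
# [(4, '∧', 2), (7, '∨', 1), (10, '∧', 2)]
```

Exactly one connective, the \(\vee\) at index 7, has balance exactly \(1\): it is the main connective of the formula, and every other connective has balance at least \(2\), just as the lemma predicts.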

Unique Parsing Theorem

With the two structural lemmas above, we can now prove that every formula is represented by exactly one parse tree.

Theorem. (Unique Parsing Theorem for Propositional Logic) Every formula \(\phi \in L^p(\sigma)\) is of exactly one of the following forms:

  1. A single propositional variable.
  2. \(\neg \psi\) for some formula \(\psi \in L^p(\sigma)\).
  3. \((\psi \,\square\, \theta)\) for some formulas \(\psi, \theta \in L^p(\sigma)\) and binary connective \(\square \in \{\wedge, \vee, \to, \leftrightarrow\}\).

Furthermore, in the third case, there is a unique such decomposition.

Proof. When the first symbol of \(\phi\) is a propositional variable, that string must consist only of this symbol and therefore be in the first form since no rule in the definition of \(L^p(\sigma)\) lets us construct a formula of length greater than 1 that begins with a propositional variable.

Similarly, when the first symbol of \(\phi\) is a negation symbol \(\neg\), it must be of the second form.

When the first symbol of \(\phi\) is a left parenthesis, it must be of the third form in the theorem statement, so we can write \(\phi = ( \psi \,\square\, \theta)\). Consider any such decomposition, and let \(k\) denote the index in \(\phi\) of the binary connective \(\square\) in this decomposition. By Lemma 1, the number of left parentheses in \(\phi_1 \cdots \phi_k\) must be exactly one more than the number of right parentheses in the same string. But by Lemma 2, only one binary connective in \(\phi\) can satisfy this condition. Therefore, the decomposition is unique.
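The proof is constructive: it tells us how to find the unique decomposition by scanning for the one binary connective whose prefix has exactly one more left parenthesis than right parentheses. Here is a sketch of that procedure (our own illustration, again assuming single-character variable names):

```python
def decompose(phi: str):
    """Return the unique top-level decomposition of a formula phi."""
    if phi[0] == "¬":                      # second form: ¬psi
        return ("¬", phi[1:])
    if phi[0] != "(":                      # first form: a propositional variable
        return ("var", phi)
    depth = 0                              # third form: (psi op theta)
    for i, c in enumerate(phi):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c in "∧∨→↔" and depth == 1:   # exactly one unmatched "(" so far
            return (c, phi[1:i], phi[i + 1:-1])
    raise ValueError("not a formula")

print(decompose("((p∧q)∨(p∧¬r))"))  # ('∨', '(p∧q)', '(p∧¬r)')
```

Applying `decompose` recursively to the pieces it returns rebuilds the full parse tree, and the theorem guarantees that this is the only tree the formula admits.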

The Unique Parsing Theorem will earn its keep as soon as the next lecture, when we assign meaning to formulas by showing how to evaluate them by “working up” their parse trees. In that setting, uniqueness saves us from having to worry that different parse trees for the same formula might give contradictory results.