3. Context-Free Grammars

Context-Free Grammars

G=(N,Σ,P,S) G = (N, \Sigma, P, S)

Where:

Productions

A production, (A,ω)P(A, \omega) \in P is written as AωA \rightarrow \omega.

Multiple productions written as Aω1ωnA \rightarrow \omega_1 | \dots | \omega_n.

The RHS can be empty: in this case, the RHS is ϵ\epsilon.

The LHS is always one non-terminal but the RHS can be any length and be made up of terminals and non-terminals.

In E+FG1E+NE+F \Rightarrow_G^1 E + N, GG denotes the grammar being used and the 11 denotes that the LHS can be transformed to the RHS in one step.

Derivability

On a relation (NΣ){(N \cup \Sigma)}^*:

Context-Free Languages

Building Blocks

Leftmost and Rightmost

Leftmost: if there are multiple non-terminals, always replace the left-most one first. The choice of production will be determined by the sentence you are trying to get.

If there is more than one leftmost derivation, the sentence is ambiguous. The CFG that generates it is also ambiguous.

The CFL is inherently ambiguous if every CFG generating it is ambiguous e.g. {aibjcki=j or j=k}\{ a^i b^j c^k \mid i = j \text{ or } j = k \}.

Regular Grammar

G=(N,Σ,P,S) G = (N, \Sigma, P, S)

For each AωPA \rightarrow \omega \in P:

Every regular language is generated by a regular grammar:

Every regular language is accepted by a DFA, NFA (with or without ϵ\epsilon transitions), regular expression, regular grammar.

Regular languages are a strict subset of CFLs.

Left/Right Regular Grammars

If ωP\omega \in P is the set of productions, then:

$ \omega \in \Sigma N\cup N\Sigma \cup { \epsilon }$ and ωNNΣ\omega \in NN \cup \Sigma do NOT yield regular languages.

Chomsky Normal Form

CFG G=(N,Σ,P,S)G = (N, \Sigma, P, S) if every production’s RHS is made up of:

Every CFG can be transformed into Chomsky normal form (unless ϵL(G)\epsilon \in L(G)) by eliminating:

  1. ϵ\epsilon-productions of the form AϵA \rightarrow \epsilon
  1. Unit-productions of the form ABA \rightarrow B where bwb \rightarrow w exists
  1. Non-generating non-terminals
  1. Non-reachable non-terminals
  1. Productions with RHS length 2\ge 2 that contain terminals
  1. Productions with RHS length 3\ge 3 (should only be left with productions with only non-terminals of this length)

If ϵ\epsilon is needed:

Cocke-Younger-Kasami Algorithm

Given a string ωΣ\omega \in \Sigma ^*, CFA AA, check if ωA\omega \in A

Chomsky normal form means that each non-terminal produces at least one terminal. Therefore, there is an upper bound on the length of derivations that need to be checked (exponential).

Use dynamic programming to improve speeds.

If the original problem is SwS \Rightarrow ^* w, solve NwijN \Rightarrow ^* w_{ij} for all non-terminals and for all substrings of ww, where ii and jj are the lengths of the strings e.g. abcde13abcde_{13} returns bcbc

Hence, the algorithm is O(n3)O(n^3).

If SN0,len(w)S \in N_{0, \text{len}(w)}, the string ww is in the language of the grammar.

Pushdown Automata

Uses a stack. Transitions can depend on symbols at the top of the stack, and modify symbols at the top of the stack. The stack and transition alphabet can be different.

M=(Q,Σ,Γ,δ,q0,F) M = (Q, \Sigma, \Gamma, \delta, q_0, F)

Where:

For a transition to take place, there must be a normal transition, but the character on the top of the stack must be the same one as specified in the transition (unless it is ϵ\epsilon, in which case nothing needs to be read).

Upon reading, the element is removed from the stack, and an element is added to the stack. The syntax is character, stack symbol to read/pop stack symbol to push e.g. a,ϵ/1a, \epsilon /1. Strings can be pushed and popped as well (when pushing, the first character of the string is pushed last).

For a string to be read, it must be on an accept state AND the stack must be empty.

Configuration

(q,x,α)Q×Σ×Γ (q, x, \alpha) \in Q \times \Sigma ^* \times \Gamma^*

Where:

The configuration is a snapshot of the current state of MM. The next state is:

(p,ax,αΓ)(q,x,BΓ) if (q,B)δ(p,a,α) (p, ax, \alpha \Gamma) \rightarrow (q, x, \Beta \Gamma) \text{ if } (q, \Beta) \in \delta (p, a, \alpha)

\rightarrow^* means the configuration is reachable in zero or more transitions.

The language accepted by MM:

L(M)={xΣ(q0,x,ϵ)(q,ϵ,ϵ),qF} L(M) = \{ x \in \Sigma^* \mid (q_0, x, \epsilon) \rightarrow^* (q, \epsilon, \epsilon), q \in F \}

Converting CFGs to PDAs

The resultant PDA will have two states: a start state q0q_0 and accept state q1q_1.

A transition from q0q_0 to q1q_1, ϵ,ϵ/S\epsilon , \epsilon / S, pushes the start state onto the stack.

q1q_1 will have transitions to itself for each production: for each production and RHS, an epsilon transition reading nothing and pushing the RHS of the production onto the stack should be made: ϵ,LHSproduction,RHSproduction\epsilon, \text{LHS}_\text{production}, \text{RHS}_\text{production}.

For each terminal α\alpha, there should also be a read transition taking the terminal as input and popping that terminal from the stack: α,α/ϵ\alpha,\alpha/ \epsilon.