2. Regular Expressions

Patterns

Atomic

Compound

Language Definition

L(r)L(r) is the language generated by the regular expression rr.

Proving Every Language Generated by a Regular Expression is an NFA

Where aΣa \in \Sigma:

∅:
---> ◯      ⭘


a:                     ε:
        a                      ε
---> ◯ ---> ⭘          ---> ◯ ---> ⭘


p|q:
                ε             ε
               ---> ◯  p   ◯ ---
        ε    /                   \
---> ◯ ---> ◯                     ⭘
             \  ε             ε  /
               ---> ◯  q   ◯ ---


pq:
        ε             ε              ε
---> ◯ ---> ◯  p  ◯ -----> ◯  q   ◯ ---> ⭘ 


p*:
        ε             ε
---> ◯ ---> ◯  p   ◯ ---> ⭘ 
     |      ^      |      ^
     |      --------      |
     ----------------------

Additionally:

Finite Automata to RegExp

Every language accepted by an NFA is generated by a RegExp. We can successively remove states and replace them with longer and longer RegExps as transitions. To remove qjq_j, for all qiq_i and qjq_j where iji \neq j and kjk \neq j:

regexp-state-removal

To remove q_j, for every qi -> qj -> qk, where qj not qi or qj:

      rij           rjk
qi ---------> qj ---------> qk
|            ^  |           ^
|            |__|           |
|            rjj            |
|___________________________|
             rik

    rik | (rij (rjj)* rjk)
qi ----------------------> qk


For two-state loops:

      rij           rjk
qi ---------> qj ---------> qk
 ^    rji     |
 |------------|

Remove q_j: i -> j -> k, and i -> j -> i

        r_ij r_jk
     qi ---------> qk
    ^  |
    |__|
   rij rji


      (rij rji)* r_ij r_jk
qi -----------------------> qk

Minimization of DFA

Equivalence of states pp and qq:

δ^(p,w)Fδ^(q,w)F for each wΣ \hat{\delta}(p, w) \in F \Leftrightarrow \hat{\delta}(q, w) \in F \text{ for each } w \in \Sigma^*

i.e. states are equivalent if, for all strings, if you start at pp or qq and both reach an accept state (or if both are rejected).

If states are not equivalent, they are distinguishable:

Algorithm to find distinguishable states

pqp \sim q if two states are equivalent. The $ \sim $ relation is:

Thus it is an equivalence relation   A×A\sim \; \subseteq A \times A:

Minimization Algorithm

Notes:

Decision Problems

Languages AA ,BB, string xx:

Membership

If the string xx is accepted by the language.

Run MM with input xx, check if it’s accepting. Linear time in length xx and size of MM.

Emptiness

If no string is accepted by the language.

Reachability to an accepting state. Use BFS or DFS.

Running time is linear in the size of the input MM: O(Q+QΣ)O(|Q|+|Q\|\Sigma|).

Finiteness

If the number of strings accepted by the language is finite.

Running time is linear in the size of MM.

Universality

If the language accepts all possible strings.

True if and only if L(M)=\overline{L(M)} = \emptyset. Construct the complement DFA by inverting end states; check emptiness.

Running time is linear.

Intersection Emptiness

True if L(M)L(M)=L(M) \cap L(M') = \emptyset.

Reduce the problem to the emptiness problem by constructing the product automaton.

Inclusion

If L(M)L(M)L(M) \subseteq L(M' ). L(M)L(M)=L(M) \cap \overline{L(M')} = \emptyset if this is the case.

Hence, take the complement of the MM' and use the intersection emptiness property. Running time is quadratic.

Equivalence

Use inclusion property going both ways.

L(M)=L(M)L(M) = L(M') if and only if L(M)L(M)L(M) \subseteq L(M') and L(M)L(M)L(M' ) \subseteq L(M').

Running time is quadratic.

Non-Regular Languages

Not all languages are regular. Counterexample: A={anbnnN}A = \{ a^n b^n \mid n \in N \}.

Assume AA is a DFA with kk states:

Pumping Lemma

If AA is regular, there is a number nn such that every wAw \in A, wn|w| \ge n can be expressed as w=xyzw = xyz where:

Three parts: getting to cycle, looping through the cycle ii times (or zero times), getting to end state.

To show A={anbnnN}A = \{ a^n b^n \mid n \in N \} is not regular, first assume it is regular:

Set w=anbnw = a^n b^n. It is clearly longer than nn, so w=xyzw = xyz where yϵy \neq \epsilon and xyn|xy| \le n.

Therefore y=ajy = a^j, where 1jn1 \le j \le n. Let i=2i = 2:

Hence, xyizAxy^i z \in A by the pumping lemma, but xy2z=xyyz=anybn=an+jbnAxy^2 z = xyyz = a^n yb^n = a^{n+j} b^n \notin A.

Modelling Independent Processes

Shuffling:

Shuffle of the same language: NFA with Σ2|\Sigma|^2 states, with transitions going horizontally or vertically, but not diagonally: one symbol can only move it in one of the DFAs. Both are the same, so there is symmetry; the upper or lower triangle can be cut off. The accepts states are the intersection.

Product Automaton

Product of two automaton accepts the intersection of the two languages - strings that satisfy conditions for both languages. The languages must have the same alphabet.

Create by drawing a matrix of states, each axis representing states from one automaton (useful to have different naming schemes for each automaton such as 1, 2, 3 etc. for one and a, b, c etc. for the other).

To be in a state, it is in both states simultaneously. To determine transitions, determine the state transitions individually, and find the ‘coordinates’ for the product by concatenating the two. e.g. if you are in state a3, to determine the transition for symbol 0, let transition(dfa1, a, 0) = a and transition(dfa2, 3, 0) = 4, then draw a transition to the resultant state a4. Accept states are states with are accepting in both automata.