A Short Course Compiler Grammar Tutorial Study Notes with Examples

What is Compiler Grammar

In 1956, Noam Chomsky gave a model of grammar. A grammar is defined as 4-tuples (V_N, Ʃ, P, S),

Where, V_N is finite non-empty set of non-terminals.

Ʃ is finite non-empty set of input terminals.

P is finite set of product rules.

S is the start symbol.

What is Chomsky Hierarchy

The Chomsky hierarchy is a containment hierarchy of classes of formal grammars that generate formal languages. The formal grammars consist of a finite set of terminal symbols, a finite set of non-terminal symbols, a set of production rules with a left and a right hand side consisting of a word of these symbols and a start symbol.

The Chomsky hierarchy consists of following four levels.

Type 0 Grammar (Unrestricted Grammar)

These are unrestricted grammars which include all formal grammars. These grammars generate exactly all language that can be recognized by a Turing machine.

Rules are of the form α → β.

Where, α and β are arbitrary strings over a vocabulary V and α ≠ ^ (null).

Type 1 Grammar (Context Sensitive Grammar)

Language defined by type-1 grammars are accepted by linear bounded automata.

Rules are of the form α A β → α B β

Where, A ϵ V_N

α , β, B ϵ (V_NU Ʃ)

B ≠ ^

Type 2 Grammar (Context-free Grammar)

Language defined by type-2 grammars are accepted by push-down automata.

Rules are of the form A → α

Where, A ϵ V_N

α ϵ (V_NU Ʃ)^*

Type 3 Grammar (Regular Grammar)

Language defined by type-3 grammars are accepted by finite state automata.

Rules are of the form A → ^

A → α

A → α β

Where, A, B ϵ V_Nand a ϵ Ʃ

A Short Course Compiler Grammar Tutorial Study Notes with Examples

Regular Expression

Regular expression mean to represent certain sets of strings in some algebraic fashion.

A regular expression over the alphabet E is defined as follows

ɸ is a regular expression corresponding to the empty language ɸ.
^ is a regular expression corresponding to the language { ^}.
For each symbol a ϵ E, a is a regular expression corresponding to the language {a}.

Regular Language

The Language accepted by FA are regular language and these language are easily described by simple expressions called regular expressions.

For any regular expression r and s over E, Corresponding to the language L_rand L_S respectively, each of the following is a regular expression corresponding to the language indicated.

(r s) corresponding to the language L_rL
(r + S) corresponding to the language L_rU L
r* corresponding to the language L_r.

Some examples of regular expression are

L (01) = {0,1}

L (01 +0) = {01,0}

L(0(1+0))={01,00}

L(0^*) = {Ɛ, 0, 00, 000, …..}

L((0 +10)^* (^ + 1) = all string of 0’s and 1’s without two consecutive 1’s.

Key Points

If L₁and L₂ are regular language in E^* , then L₁U L_2,L₁∩ L_2,L₁– L₂and L₁^’ (complement of L₁), are all regular languages.
Pumping lemma is a useful tool to prove that a certain language is not regular.
The Myhill-Nerode theorem tells that the given language might be regular or not regular.

Regular Set

A set represented by a regular expression is called regular set.

E.g. If Ʃ = {a, b} is an alphabet, then

Different Regular Sets and their Expressions

Sets	Regular Expression
{}	ɸ
{a}	a
{a ,b}	a+b
{ ɛ, a, aa, aaa,….}	a*
{a, aa, aaa,….}	*aa = a⁺**

Identities for Regular Expressions

The following points are the some identities for regular expressions.

ɸ + L = L + ɸ = L ɸ L = L ɸ = ɸ

^R = R^ = R ^^* = ^ and ɸ^*= Ɛ

R+ R = R, where R is the regular expression RR* = R*R = R^*

(R*)* = R* R*R* = R*

(P + Q)^* = (P^*Q^*)^*(P^* + Q^*)^*, where P and Q are regular expressions.

R (P + Q) = RP + RQ and (P + Q) R = PR + QR

P (QP)^*= (PQ)^*P

Regular Expression and Finite Automata

NFA with ^-moves Let r₁ and r₂be the two regular expressions over Ʃ₁ Ʃ₂ and E₁ E2 are two NFA for r₁ and r₂ respectively.

Regular Expression to ^-NFA (union)

For E₁ U E₂

Regular Expression to ^-NFA (Concatenation)

For E₁ E₂

Regular Expression to ^-NFA (Clousre)

For E^*

Construction of Regular Expression from DFA

If a DFA recognizes the regular language L₁ then there exists regular expression, which describes L.

Arden’s Theorem

If P and Q are two expressions over an alphabet Ʃ such that P does not contain ^, then the following equation R = Q + RP.
The above equation has a unique solution i.e. , R=QP^*. Arden’s theorem is used to determine the regular expression represented diagrams.

The following points are assumed regarding position diagrams.

The transition system does not have any ^.
It has only one initial (starting) state.

Properties of Regular Languages

Regular Language are closed under following properties.

Union
Concatenation

Kleene closure

Complementation
Transpose
Intersection

If A is polynomial time reducible to B (i.e., A ≤ B) and B is in P, then Also in P.
If A is NP-complete problem, then it is member of P if and only if P = NP.
The complexity class NP-complete is the intersection of NP and NP-hard classes.

Key Points

All NP problems can be solved, if an NP problem can be reduced to an NP- hard problem before solving it in polynomial time.
If A is NP- complete and there is a polynomial time reduction of A to B, then B is NP-complete.
If there is a language L such that L ϵ P, then complement of L is also in P.
P complexity class is closed under union, intersection, complementation, kleene closure and concatenation.
NP complexity class is closed under union, intersection, concatenation, and Kleene closure.

Design of Automaton

The design can be illustrated by under following three categories

Sequence of string is fixed

If the sequence of string is fixed that suppose an automaton is designed to accepted a string with abbacb.

The value of string is changed by changing input value

Suppose, an automaton is designed to accepted the string abba or bbba.

When input values are repeated

Suppose an automaton is to be designed for accepting all string generated by input a. That means it has to accept a, aa, aaa, ……

This will be designed as

Now, if we have to design a machine which accepts all inputs containing string ab

Context Free Language Tutorial Notes Study Material with Examples

Learn Context Free Language in Handbook Series: Click here

A Short Course Compiler Grammar Tutorial Study Notes with Examples

A Short Course Compiler Grammar Tutorial Study Notes with Examples

What is Compiler Grammar

What is Chomsky Hierarchy

Type 0 Grammar (Unrestricted Grammar)

Type 1 Grammar (Context Sensitive Grammar)

Type 2 Grammar (Context-free Grammar)

Type 3 Grammar (Regular Grammar)

Regular Expression

Regular Language

Key Points

Regular Set

Different Regular Sets and their Expressions

Identities for Regular Expressions

Regular Expression and Finite Automata

Regular Expression to ^-NFA (union)

Regular Expression to ^-NFA (Concatenation)

Regular Expression to ^-NFA (Clousre)

Arden’s Theorem

Properties of Regular Languages

Key Points

Design of Automaton

Sequence of string is fixed

When input values are repeated

Context Free Language Tutorial Notes Study Material with Examples

Leave a Reply Cancel reply