Definition of a grammar
A grammar G is a quadruple G = (V, T, S, P) where
- V is a finite set of (meta)symbols, or variables.
- T is a finite set of terminal symbols.
- S
V is
a distinguished element of V called the start symbol.
- P is a finite set of productions (or rules).
A production has the form X
Y
where
- X
(V
T)
:
X is a member of the set of strings composed of
any mixture of variables and terminal symbols,
but X is not the empty string.
- Y
(V
T)
:
Y is a member of the set of strings composed of
any mixture of variables and terminal symbols;
Y is allowed to be the empty string.
Derivations
Productions are rules that can be used to define the
strings belonging to the language generated by a grammar.
Given a grammar G = (V, T, S, P),
you can find a string belonging to this language as follows:
- Start with a string w consisting of only the start symbol S.
- Find a substring x of w that matches the left-hand
side of some production p in P.
- Replace the substring x of w with the right-hand side
of production p.
- Repeat steps 2 and 3 until the string consists entirely of symbols
of T (that is, it doesn't contain any variables).
- The final string belongs to the language L.
Each application of step 3 above is called a derivation step.
Suppose w is a string that can be written as uxv, where
- u and v are elements of
(V
T)
- x is an element of
(V
T)
- there is a production x
y
Then we can write
uxv
uyv
and we say that uxv directly derives uyv.
Notation:
- S
T :
S derives T (or T is derived from S) in exactly one step.
- S
T :
S derives T in zero or more steps.
Now the language L(G) generated by G is the set of all strings
w
T
with S
w.
Unrestricted grammars
Theorem: A language is generated by a grammar if and only if it
is recursive enumerable. That is: if and only if it is recognized by a
Turing machine. Deterministic and nondeterministic Turing machines are
equally expressive.
Context-free grammars and languages
A grammar G = (V, T, S, P) is a
context free grammar (CFG) if
all productions in P have the form
A
x
where
A
V, and
x
(V
T)*.
A language is called context-free if it is generated by a
context-free grammar.
Theorem: A language is context-free if and only if it is
recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions
are all of the form
xAy
xvy
where A
V and
x, v, y
(V
T)*.
A context-sensitive language is a language generated by a
context-sensitive grammar.
The name "context-sensitive" comes from the fact that the actual
string modification is given by A
v,
while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape.
A linear-bounded automaton (LBA) is a Turing machine whose tape
is only kn squares long, where n is the length of the input string and
k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that
is occupied by the input; that is, k=1 in all cases. That definition
leads to an equivalent class of machines, because we can compensate for
the shorter tape by having a larger tape alphabet.
Theorem: A language is context-sensitive if and only if it is
recognized by a nondeterministic linear bounded automaton.
Deterministic linear bounded automata are somewhat less expressive.
Most of this material I found on the
webpage of David Matuszek.