-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Automata and Formal Languages
- finite automata
5-tuple (Q, Z, o, q0, F)
q is set of states, q0 is start state, F is set of final states
Z is finite input alphabet
o is transition function from QxZ->Q
that is o(q,a) returns a state for current state q and input symbol a from Z
1.) DFA: deterministic finite automata
2.) NFA: nondeterminisic finite automata
3.) NFA w/e: nondeterminisic finite automata with epsilon transitions
4.) regular expressions
these four have equivalent power to handle regular languages
terms
language L
L accepted by an automata
L denoted by an expression
two-way finite automata
can move head as part of o(q,a)
no more powerful
moore vs mealy machines
application of FA with output
moore - output depends on current state
mealy - output depends on current state and input symbol
moore and mealy are equivalent
pumping lemma
used to show languages nonregular
Let L be a regular set. Then there is a constant n such that if z is
any word in L, and |z| >= n, we may write z=uvw in such a way that
|uv|<=n, |v|>=1, and for all i>=0, uv^iw is in L. Furthermore, n is
no greater than the number of states of the samllest FA accepting L.
regular sets are closed under
union, concatenation, kleene closure
complement
intersection
substitution
homomorphisms and inverse homomorphisms
quotient with arbitrary sets
the set of sentences accepted by an FA M with n states is:
nonempty iff the FA accepts a sentence of length less than n
infinite iff the FA accepts a sentence of length l where n<=l<2n
there is an algorithm to determine if two FA are equal
M1 == M2?
M3 = L1-L2 union L2-L1
if M3 accepts a word, then L1!=L2
Myhill-Nerod theorm
the following three statements are equivalent
1.) The set L(=Z* is accepted by some FA
2.) L is the union of some of the equivalence classes of a right
invariant equivalence relation of finite index.
3.) Let equivalence relation Rl be defined by: xRly iff for all z in
Z*, xz is in L exactly when yz is in L. Then Rl is of finite
index.
this is used to prove
The minimum state automaton accepting a set L is unique up to an
isomorphism (i.e. a renaming of the states) and is given by M' in the
proof of Myhill-Nerod
- context free grammars (aka CFG)
G = (V, T, P, S)
V variables, T terminals
V and T disjoint
P productions
A -> a, A is variable, a is string from (V union T)*
S is start symbol
terminology
language L generated by grammar G
derivations
derivation trees
leftmost and rightmost derivations
leftmost = at each step in a derivation a production is applied to the leftmost variable
ambiguity
a CFG G such that some word has two parse trees is said to be ambiguous
or: some word that has more that one leftmost (or rightmost) derivation
a CFL for which every CFG is ambiguous is said to be inherently ambiguous
the CFL ( {aNbNcMdM N>=1 M>=1} UNION {aNbMcMdN N>=1 M>=1} ) is inherently ambiguous
useless
Every nonempty CFL is generaged by a CFG with no useless symbols
If L=L(G) for some CFG G, then L-{e} is L(G') for a CFF G' with no useless symbols or e-productions (A->epsilon)
every CFL with e is defined by a grammar with no useless symbols, e-productions, or unit productions (A->B, but not A->b)
Chomsky Normal Form (CNG)
any CFL w/o e is generated by a grammar in which all productions are of the form A->BC or A->a
Greibach Normal Form (GNF)
any CFL w/oe can be generated from a grammar for which every
production is f the form A->aAlpha where Alpha is a possible empty string of variables
- Pushdown Automata (for CFLs)
M = (Q, Z, L, o, q0, Z0, F)
Q states
Z input alphabet
L stack alphabet
q0 in Q initial state
Z0 in L start symbol for stack
F (= Q of final states
o maps from Qx(Z union {e}) to finite subests of QxL* : o(q1,a) => (q2, 0...)
acceptance
by final state
by empty stack
deterministic PDA != non-deterministic PDA
deterministic PDA cannot recognize wwR (concat w (reverse w))
non-deterministic PDA <=> CFG
- CFL properties
pumping lemma (used to show languages like aIbIcI non-CF)
Let L be any CFL. Then there is a constant n, depending only on L,
such that if Z is in L and |z| >=n, then we may write z=uvwxy such
that:
1.) |vx|>=1
2.) |vwx| <=n
3.) for all i>0 uvIwxIy is in L
ogden's lemma (used to show languages like aIbJcKdL (i=0 or j=k=l non-CF)
Stronger than plain pumping lemma
Let L be any CFL. Then there is a constant n such that if Z is in
any word in L and we mark any N or more positions of z as
"distinguished", then we can write z=uvwxy such that:
1.) v and x together have at least one distinguished position
2.) vwx has at most n distinguished positions
3.) for all i>0 uvIwxIy is in L
CFLs are closed under:
union, concatenation, and Kleene closure
substitution
homomorphism and inverse homomorphism
NOT under intersection
NOT under complementation
If L is CFL and R is regular set, then L intersect R is a CFL
decidability:
there are algorithms to determine if a CFL is
empty
finite
infinite
there are algorithms to determine if x of L* is part of CFL G
- Turing machines
M=(Q, Z, L, o, q0, B, F)
Q is finite set of states
L is finite setof allowable tape symbols
B, a symbol of L, is the blank
Z = L-B, the input symbols
o is the next move function
o(QxL) -> (QxLx{L,R})
o may be undefined for some arguments
q0 in Q is the start state
F (= Q is the set of final states
views of TM's
class of languages it defines called the recursively enumerable sets
recursively enumerable sets, TM will halt if W in L, but may not otherwise halt
recursive sets, subset of r.e. sets, TM will halt for all input
class of integer functions it computes called the partial recursive functions
enumerator: generating device with output tape (see below)
extensions (which don't add power)
adding finite state to controller
multiple tracks (one single tape)
two way infinite tape (instead of one way)
multiple tapes
nondeterminstic
multidimensional tape
multihead
off-line turing machine (read only input)
restriced machines that are turing complete
a two stack deterministic PDA is equivalent to a turing machine
a two-counter machine
unlimited alphabeta, but only 3 states and one tape
if L (= (0+1)* and L is r.e., then L is accepted by a one-tape TM with tape alphabet {0,1,B}
still true if we remove restriction on L alphabet Z, by using binary encoding of L's Z
in fact, every TM can be simulated by an off-line TM having
one storage take with two symbols, 0 (blank) and 1. The TM can
print a 0 or 1 over a 0, but cannot print a 0 over a 1.
common tricks to build machines
checking off symbols (on extra track)
shifting over
parameterless, nonrecursive subroutines
church's hypothesis (aka church-turing thesis)
the assumption that the intuitive notion of "computable functions"
can be indentified with the class of partial recursive functions
lambda calculus
Post systems
general recursive functions
random access machines
enumerator: computing output
L is an r.e. (recursively enumerable) set iff L is G(M1) for some TM M1
L is recursive iff L is generated in canonical order (e, 0, 1, 00, 01, 10, 11, 000, 001,...)
- undecidability
a problem whose language is recursive is said to be decidable, otherwise undecidable.
a problem is undecidable if there is no algorithm that takes as
an input an instance of the problem and determines whether the
answer to that instance is "yes" or "no".
properties of recursive and recursively enumerable languagse
the complement of a recursive languages is recursive:
the union of recursive languages is recursive.
the union of two recursively enumerable langauges is recursively enumerable.
if L and its complement are both recursively enumerable => L and it's complement are recursive
from these, given L and it's complement
1.) either both are recursive
2.) neither are recursive
3.) one is r.e. but not recursive, the other is not r.e
univeral turing machine
(can be minimalistic: one tape, 5 states, 7 tape symbols see 173)
restrict input TM's to M=(Q, {0,1}, {0,1,B}, o, q1, B, {q2}
encode in binary format
we can show there is a non-r.e. language using a diagonalization proof
we can show that the languages accepted by a TM are r.e. but not recursive
Rice's Theorm
any nontrival property SL of the r.e. languages is undecidable
trivial means it's either true of no r.e. or all r.e. languages
examples of nontrival
emptiness
finiteness
regularity
context-freedom
LofSL is r.e. iff
1.) if L is in SL and L (= L', then L' is in SL
2.) if L is an infinite set ifn SL, then there is some finite subset L' of L that is in SL.
3.) the set of finite languages in SL is enumerable
Post Correspondence Problem is undecidable
It is undecidable whether an arbitrary CFG is ambiguous
It is undecidable for arbitrary CFG's G1 and G2 whether L(G1) intersected with L(G2) is empty
It is undecidable for arbitrary CFG G whether L(G) = Z*
Let G1 and G2 be arbitrary CFG's and R an arbitrary regular set. The following are undecidable:
1.) L(G1) = L(G2)
2.) L(G2) (= L(G1)
3.) L(G1) = R
4.) R (= L(G1)
Let G1 and G2 be arbitrary CFG's. The following are undecidable:
1.) the compliment of L(G1) is a CFL
2.) L(G1) intersected with L(G2) is a CFL
Greibach's theorm (analog of Rice's theorm for CFL)
Let C be a class of languages that is effecively closed under
concatenation with regular sets and union, and for which "=Z*" is
undecidable for any sufficiently large fixed size Z. Let P be any
nontrivial property that is true for all regular sets and that is
preserved under /a, where a is a single symbol. (that is if L has P,
do does L/a = {w|wa is in L}) Then P is undecidable for C.
applications
Let F be an arbitrary CFG. The is undecidable wether L(G) is regular.
Inherently ambiguity for CFL's is undecidable
recrusive function theory
Smn-theorm
given a partial recursive function g(x,y), there is an algorithm
one can use to construct from a TM for g and a value for x,
another TM which with input y computes g(x,y)
the recursion theorm
every total recursive function o mapping indices (integers
denoting TMs) of partial recursive functions has a fixed point xo
such that Fxo(y) = Fo(xo)(y)
app
- chomsky hierarchy
regular grammars
CFG with prodcutions of form
A->wB or A->w (w is possibly empty) = right linear
A->Bw or A->w (w is possibly empty) = left linear
both right and left linear grammars are regular grammars
If L has a regular grammar, then L is a regular set
If L is regular set, then it is generated by some left linear G and some right linear G
unrestricted grammars (aka semi-Thue or type 0 or phrase structure)
productions of form alpha->beta where alpha and beta are arbitrary, alpha != epsilon
if L is L(G) for unrestricted grammar G = (V,T,P,S), then L is an r.e. language
converse holds
if L is an r.e. language, then L is L(G) for unrestricted grammar G
context sensivitve grammars (and context sensivitve languages)
productions of form alpha->beta where beta is at least as long as alpha
almost any language one can think of is a CSL; all proofs depend on diagonalization
linear bounded automata (LBA) match CSLs
a nondeterminstic TM with
1.) input alpha contains ^ and $ the left and right endmarkers
2.) the LBA has no moves left of ^ or right of $ (and it cant print over markers)
unknown if deterministic LBA = or != to nondeterminstic LBA
if L is CSL, the L is accepted by some LBA
if L = L(M) for LBA M, then L-{e} is a CSL
every CSL is recursive but
there is a recursive language that is not CSL
the chomsky hierarchy
the regular sets are properlycontained in the CFLs
the CFLs-{e} are properly contrained in the CSLs
the CSLs are properly contained in the r.e. sets
- deterministic CFLs (accepted by deterministic PDAs)
syntax of many programming languages
LR-grammars generate exactly the DCFLs
DCFLs are closed under
complementation
quotient with regular language ( L/R is a DCFL {x|there exists w in R such that xw is in in L})
MIN(L) = {x|x is in L and no w in L is a proper prefix of x}
MAX(L) = {x|x is in L and x is not proper prefix of any word in L}
inverse homomorphism
intersection with regular set
not true for general CFLs
DCFL are NOT closed under
homomorphism
union
concatenation
Kleene closure
undeciable problems of DCFLs L and L'
1.) is L intersected with L' empty?
2.) is L a subset of L'?
3.) is L intersected with L' a DCFL?
4.) is L intersected with L' a CFL?
5.) is L unioned with L' a DCFL?
Let L be arbitrary CFL, it is undeciable whether L is a DCFL
LR(0) grammars
left-to-right scan producing rightmost derivation using 0 look ahead symbols
definition of LR(0)
1.) its start symbol does not appear on the right side of any
production
2.) for every variable prefix y of G, whenever "A -> alpha . " is
a complete item valid for y, then no other complete item with a
terminal to the right of the dot is valid for y.
If L is L(G) for an LR(0) G, then L is N(M) for a DPDA M
Every LR(0) G is unambiguous
If M is a DPDA, then G of M is an LR(0) grammar
If L has an LR(0) grammer iff L is a DCFL with the prefix property
LR(k) grammars
left-to-right scan producing rightmost derivation using k look ahead symbols
definition of LR(1)
1.) its start symbol does not appear on the right side of any
production
2.) whenever the set of items I valid for some viable preix
includes some complete item "A -> alpha ., {a1, a2, ..., aN}",
then:
i.) no aI appears immediately to the right of the dot in any
item of I and
ii.) if B->beta., {n1,..., bk} is another complete item in I, then
ai!=bj for any 1<=i<=n and 1<=j<=k
- closure properities of families of languages
familty of languages - a collection of L's with at least one nonempty L
trio - a family of Ls closed under
intersection with a regular set
inverse homomorphism
e-free (forward) homomorphism
closed under substitution by e-free regular sets
examples
CSLs
recursive sets
e-free regular sets (the smallest trio)
full trio - a trio closed under all homomorphism
closed under quotient with a regular set
closed under substitution by regular sets
examples
regular sets (the smallest full trio)
CFLs
r.e. sets
abstract family of languages (AFL)
trio and closed under union, concatenation, positive closure
closed under substitution into regular sets
examples
e-free regular sets
CSL
full AFL
full trio and closed under union, concatenation, kleene closure
examples
regular sets (smallest)
CFLs
r.e. sets
AFL requires six closure properties, but they are not independant
concatenation is dependant
union is dependant
intersection with R is dependant
SUMMARY OF CLOSURE PROPERTIES page 280
SUMMARY OF DECISION PROPERTIES page 281
- computation complexity theory
defintions
S(n) spaced bounded by n cell
T(n) time bounded by n steps
DTIME DSPACE
NTIME NSPACE
- intractable problems
P = languages recognizable in deterministic polynomial time (DTIME)
NP = languages recognizable in nondeterministic polynomial time (NTIME)
reducability
Let L' be polynomial-time reducible to L, then
a.) L' is in NP if L is in NP
b.) L' is in P if L is in P
If L' is log-space reducible to L then:
a.) L' is in P if L is in P
b.) L' is in NSPACE(log^k n) if L is in NSPACE(log^k n) if L
c.) L' is in DSPACE(log^k n) if L is in DSPACE(log^k n) if L
complete vs hard
L is complete for C with respect to polynomial-time (resp log-space) reductions if
*L is in C, and
every language in C is polynomial-time (resp log-space) reducible to L
L is hard for C with respect to polynomial-time (resp log-space) reductions if
every language in C is polynomial-time (resp log-space) reducible to L
*but L is not necessarily in C
np-complete problems
sat = satisfiability (all NP problems reduce to sat)
c-sat = satisfiability of CNF expressions (all sat reduce to csat)
3-sat = satisfiability of 3-CNF expressions (all csat reduce to sat)
2-sat
vertex cover problem (all vertex cover problems reduce to 3-sat)
directed hamiltonian circuit problem
hamiltonian circuit problem (all h problems reduce to directed h)
integer linear programming (3-sat reduces to ILP)
the chromatic number problem
the traveling salesman problem (can reduce hamiltonian to this)
the exact cover problem (node cover?)
the partition problem (clique?)
np-hard problems
if we can reduce np-complete to a problem, but not show it is NP
NP is closed under complementation iff some NP-complete problem is in NP
primality
nonprimeness
- no known polynomial-time algorithm
- not known to be NP-complete
but the complementary primeness, is in NP
suggests that there may be sets in the intersection of NP and co-NP that are not in P
number theory ((N+*=<01) is undecidable (Godel's incompleteness theorem)
theory of reals with addition IS DECIDABLE
presburger arithmetic IS DECIDABLE (Z+=<01)
- highlights of other important language classes
auxillary PDA (deterministic or nondeterministic)
read only input tape with w
FSM
r/w tape of length of input string w
a stack
stack automaton (SA) is a PDA plus
input is two-way read only with end markers
the stack head can traverse the stack in read only mode
- P decidable vs decidable but not P (NP-complete vs PSPACE) vs undecidable