-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Automata and Formal Languages - finite automata 5-tuple (Q, Z, o, q0, F) q is set of states, q0 is start state, F is set of final states Z is finite input alphabet o is transition function from QxZ->Q that is o(q,a) returns a state for current state q and input symbol a from Z 1.) DFA: deterministic finite automata 2.) NFA: nondeterminisic finite automata 3.) NFA w/e: nondeterminisic finite automata with epsilon transitions 4.) regular expressions these four have equivalent power to handle regular languages terms language L L accepted by an automata L denoted by an expression two-way finite automata can move head as part of o(q,a) no more powerful moore vs mealy machines application of FA with output moore - output depends on current state mealy - output depends on current state and input symbol moore and mealy are equivalent pumping lemma used to show languages nonregular Let L be a regular set. Then there is a constant n such that if z is any word in L, and |z| >= n, we may write z=uvw in such a way that |uv|<=n, |v|>=1, and for all i>=0, uv^iw is in L. Furthermore, n is no greater than the number of states of the samllest FA accepting L. regular sets are closed under union, concatenation, kleene closure complement intersection substitution homomorphisms and inverse homomorphisms quotient with arbitrary sets the set of sentences accepted by an FA M with n states is: nonempty iff the FA accepts a sentence of length less than n infinite iff the FA accepts a sentence of length l where n<=l<2n there is an algorithm to determine if two FA are equal M1 == M2? M3 = L1-L2 union L2-L1 if M3 accepts a word, then L1!=L2 Myhill-Nerod theorm the following three statements are equivalent 1.) The set L(=Z* is accepted by some FA 2.) L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index. 3.) Let equivalence relation Rl be defined by: xRly iff for all z in Z*, xz is in L exactly when yz is in L. Then Rl is of finite index. this is used to prove The minimum state automaton accepting a set L is unique up to an isomorphism (i.e. a renaming of the states) and is given by M' in the proof of Myhill-Nerod - context free grammars (aka CFG) G = (V, T, P, S) V variables, T terminals V and T disjoint P productions A -> a, A is variable, a is string from (V union T)* S is start symbol terminology language L generated by grammar G derivations derivation trees leftmost and rightmost derivations leftmost = at each step in a derivation a production is applied to the leftmost variable ambiguity a CFG G such that some word has two parse trees is said to be ambiguous or: some word that has more that one leftmost (or rightmost) derivation a CFL for which every CFG is ambiguous is said to be inherently ambiguous the CFL ( {aNbNcMdM N>=1 M>=1} UNION {aNbMcMdN N>=1 M>=1} ) is inherently ambiguous useless Every nonempty CFL is generaged by a CFG with no useless symbols If L=L(G) for some CFG G, then L-{e} is L(G') for a CFF G' with no useless symbols or e-productions (A->epsilon) every CFL with e is defined by a grammar with no useless symbols, e-productions, or unit productions (A->B, but not A->b) Chomsky Normal Form (CNG) any CFL w/o e is generated by a grammar in which all productions are of the form A->BC or A->a Greibach Normal Form (GNF) any CFL w/oe can be generated from a grammar for which every production is f the form A->aAlpha where Alpha is a possible empty string of variables - Pushdown Automata (for CFLs) M = (Q, Z, L, o, q0, Z0, F) Q states Z input alphabet L stack alphabet q0 in Q initial state Z0 in L start symbol for stack F (= Q of final states o maps from Qx(Z union {e}) to finite subests of QxL* : o(q1,a) => (q2, 0...) acceptance by final state by empty stack deterministic PDA != non-deterministic PDA deterministic PDA cannot recognize wwR (concat w (reverse w)) non-deterministic PDA <=> CFG - CFL properties pumping lemma (used to show languages like aIbIcI non-CF) Let L be any CFL. Then there is a constant n, depending only on L, such that if Z is in L and |z| >=n, then we may write z=uvwxy such that: 1.) |vx|>=1 2.) |vwx| <=n 3.) for all i>0 uvIwxIy is in L ogden's lemma (used to show languages like aIbJcKdL (i=0 or j=k=l non-CF) Stronger than plain pumping lemma Let L be any CFL. Then there is a constant n such that if Z is in any word in L and we mark any N or more positions of z as "distinguished", then we can write z=uvwxy such that: 1.) v and x together have at least one distinguished position 2.) vwx has at most n distinguished positions 3.) for all i>0 uvIwxIy is in L CFLs are closed under: union, concatenation, and Kleene closure substitution homomorphism and inverse homomorphism NOT under intersection NOT under complementation If L is CFL and R is regular set, then L intersect R is a CFL decidability: there are algorithms to determine if a CFL is empty finite infinite there are algorithms to determine if x of L* is part of CFL G - Turing machines M=(Q, Z, L, o, q0, B, F) Q is finite set of states L is finite setof allowable tape symbols B, a symbol of L, is the blank Z = L-B, the input symbols o is the next move function o(QxL) -> (QxLx{L,R}) o may be undefined for some arguments q0 in Q is the start state F (= Q is the set of final states views of TM's class of languages it defines called the recursively enumerable sets recursively enumerable sets, TM will halt if W in L, but may not otherwise halt recursive sets, subset of r.e. sets, TM will halt for all input class of integer functions it computes called the partial recursive functions enumerator: generating device with output tape (see below) extensions (which don't add power) adding finite state to controller multiple tracks (one single tape) two way infinite tape (instead of one way) multiple tapes nondeterminstic multidimensional tape multihead off-line turing machine (read only input) restriced machines that are turing complete a two stack deterministic PDA is equivalent to a turing machine a two-counter machine unlimited alphabeta, but only 3 states and one tape if L (= (0+1)* and L is r.e., then L is accepted by a one-tape TM with tape alphabet {0,1,B} still true if we remove restriction on L alphabet Z, by using binary encoding of L's Z in fact, every TM can be simulated by an off-line TM having one storage take with two symbols, 0 (blank) and 1. The TM can print a 0 or 1 over a 0, but cannot print a 0 over a 1. common tricks to build machines checking off symbols (on extra track) shifting over parameterless, nonrecursive subroutines church's hypothesis (aka church-turing thesis) the assumption that the intuitive notion of "computable functions" can be indentified with the class of partial recursive functions lambda calculus Post systems general recursive functions random access machines enumerator: computing output L is an r.e. (recursively enumerable) set iff L is G(M1) for some TM M1 L is recursive iff L is generated in canonical order (e, 0, 1, 00, 01, 10, 11, 000, 001,...) - undecidability a problem whose language is recursive is said to be decidable, otherwise undecidable. a problem is undecidable if there is no algorithm that takes as an input an instance of the problem and determines whether the answer to that instance is "yes" or "no". properties of recursive and recursively enumerable languagse the complement of a recursive languages is recursive: the union of recursive languages is recursive. the union of two recursively enumerable langauges is recursively enumerable. if L and its complement are both recursively enumerable => L and it's complement are recursive from these, given L and it's complement 1.) either both are recursive 2.) neither are recursive 3.) one is r.e. but not recursive, the other is not r.e univeral turing machine (can be minimalistic: one tape, 5 states, 7 tape symbols see 173) restrict input TM's to M=(Q, {0,1}, {0,1,B}, o, q1, B, {q2} encode in binary format we can show there is a non-r.e. language using a diagonalization proof we can show that the languages accepted by a TM are r.e. but not recursive Rice's Theorm any nontrival property SL of the r.e. languages is undecidable trivial means it's either true of no r.e. or all r.e. languages examples of nontrival emptiness finiteness regularity context-freedom LofSL is r.e. iff 1.) if L is in SL and L (= L', then L' is in SL 2.) if L is an infinite set ifn SL, then there is some finite subset L' of L that is in SL. 3.) the set of finite languages in SL is enumerable Post Correspondence Problem is undecidable It is undecidable whether an arbitrary CFG is ambiguous It is undecidable for arbitrary CFG's G1 and G2 whether L(G1) intersected with L(G2) is empty It is undecidable for arbitrary CFG G whether L(G) = Z* Let G1 and G2 be arbitrary CFG's and R an arbitrary regular set. The following are undecidable: 1.) L(G1) = L(G2) 2.) L(G2) (= L(G1) 3.) L(G1) = R 4.) R (= L(G1) Let G1 and G2 be arbitrary CFG's. The following are undecidable: 1.) the compliment of L(G1) is a CFL 2.) L(G1) intersected with L(G2) is a CFL Greibach's theorm (analog of Rice's theorm for CFL) Let C be a class of languages that is effecively closed under concatenation with regular sets and union, and for which "=Z*" is undecidable for any sufficiently large fixed size Z. Let P be any nontrivial property that is true for all regular sets and that is preserved under /a, where a is a single symbol. (that is if L has P, do does L/a = {w|wa is in L}) Then P is undecidable for C. applications Let F be an arbitrary CFG. The is undecidable wether L(G) is regular. Inherently ambiguity for CFL's is undecidable recrusive function theory Smn-theorm given a partial recursive function g(x,y), there is an algorithm one can use to construct from a TM for g and a value for x, another TM which with input y computes g(x,y) the recursion theorm every total recursive function o mapping indices (integers denoting TMs) of partial recursive functions has a fixed point xo such that Fxo(y) = Fo(xo)(y) app - chomsky hierarchy regular grammars CFG with prodcutions of form A->wB or A->w (w is possibly empty) = right linear A->Bw or A->w (w is possibly empty) = left linear both right and left linear grammars are regular grammars If L has a regular grammar, then L is a regular set If L is regular set, then it is generated by some left linear G and some right linear G unrestricted grammars (aka semi-Thue or type 0 or phrase structure) productions of form alpha->beta where alpha and beta are arbitrary, alpha != epsilon if L is L(G) for unrestricted grammar G = (V,T,P,S), then L is an r.e. language converse holds if L is an r.e. language, then L is L(G) for unrestricted grammar G context sensivitve grammars (and context sensivitve languages) productions of form alpha->beta where beta is at least as long as alpha almost any language one can think of is a CSL; all proofs depend on diagonalization linear bounded automata (LBA) match CSLs a nondeterminstic TM with 1.) input alpha contains ^ and $ the left and right endmarkers 2.) the LBA has no moves left of ^ or right of $ (and it cant print over markers) unknown if deterministic LBA = or != to nondeterminstic LBA if L is CSL, the L is accepted by some LBA if L = L(M) for LBA M, then L-{e} is a CSL every CSL is recursive but there is a recursive language that is not CSL the chomsky hierarchy the regular sets are properlycontained in the CFLs the CFLs-{e} are properly contrained in the CSLs the CSLs are properly contained in the r.e. sets - deterministic CFLs (accepted by deterministic PDAs) syntax of many programming languages LR-grammars generate exactly the DCFLs DCFLs are closed under complementation quotient with regular language ( L/R is a DCFL {x|there exists w in R such that xw is in in L}) MIN(L) = {x|x is in L and no w in L is a proper prefix of x} MAX(L) = {x|x is in L and x is not proper prefix of any word in L} inverse homomorphism intersection with regular set not true for general CFLs DCFL are NOT closed under homomorphism union concatenation Kleene closure undeciable problems of DCFLs L and L' 1.) is L intersected with L' empty? 2.) is L a subset of L'? 3.) is L intersected with L' a DCFL? 4.) is L intersected with L' a CFL? 5.) is L unioned with L' a DCFL? Let L be arbitrary CFL, it is undeciable whether L is a DCFL LR(0) grammars left-to-right scan producing rightmost derivation using 0 look ahead symbols definition of LR(0) 1.) its start symbol does not appear on the right side of any production 2.) for every variable prefix y of G, whenever "A -> alpha . " is a complete item valid for y, then no other complete item with a terminal to the right of the dot is valid for y. If L is L(G) for an LR(0) G, then L is N(M) for a DPDA M Every LR(0) G is unambiguous If M is a DPDA, then G of M is an LR(0) grammar If L has an LR(0) grammer iff L is a DCFL with the prefix property LR(k) grammars left-to-right scan producing rightmost derivation using k look ahead symbols definition of LR(1) 1.) its start symbol does not appear on the right side of any production 2.) whenever the set of items I valid for some viable preix includes some complete item "A -> alpha ., {a1, a2, ..., aN}", then: i.) no aI appears immediately to the right of the dot in any item of I and ii.) if B->beta., {n1,..., bk} is another complete item in I, then ai!=bj for any 1<=i<=n and 1<=j<=k - closure properities of families of languages familty of languages - a collection of L's with at least one nonempty L trio - a family of Ls closed under intersection with a regular set inverse homomorphism e-free (forward) homomorphism closed under substitution by e-free regular sets examples CSLs recursive sets e-free regular sets (the smallest trio) full trio - a trio closed under all homomorphism closed under quotient with a regular set closed under substitution by regular sets examples regular sets (the smallest full trio) CFLs r.e. sets abstract family of languages (AFL) trio and closed under union, concatenation, positive closure closed under substitution into regular sets examples e-free regular sets CSL full AFL full trio and closed under union, concatenation, kleene closure examples regular sets (smallest) CFLs r.e. sets AFL requires six closure properties, but they are not independant concatenation is dependant union is dependant intersection with R is dependant SUMMARY OF CLOSURE PROPERTIES page 280 SUMMARY OF DECISION PROPERTIES page 281 - computation complexity theory defintions S(n) spaced bounded by n cell T(n) time bounded by n steps DTIME DSPACE NTIME NSPACE - intractable problems P = languages recognizable in deterministic polynomial time (DTIME) NP = languages recognizable in nondeterministic polynomial time (NTIME) reducability Let L' be polynomial-time reducible to L, then a.) L' is in NP if L is in NP b.) L' is in P if L is in P If L' is log-space reducible to L then: a.) L' is in P if L is in P b.) L' is in NSPACE(log^k n) if L is in NSPACE(log^k n) if L c.) L' is in DSPACE(log^k n) if L is in DSPACE(log^k n) if L complete vs hard L is complete for C with respect to polynomial-time (resp log-space) reductions if *L is in C, and every language in C is polynomial-time (resp log-space) reducible to L L is hard for C with respect to polynomial-time (resp log-space) reductions if every language in C is polynomial-time (resp log-space) reducible to L *but L is not necessarily in C np-complete problems sat = satisfiability (all NP problems reduce to sat) c-sat = satisfiability of CNF expressions (all sat reduce to csat) 3-sat = satisfiability of 3-CNF expressions (all csat reduce to sat) 2-sat vertex cover problem (all vertex cover problems reduce to 3-sat) directed hamiltonian circuit problem hamiltonian circuit problem (all h problems reduce to directed h) integer linear programming (3-sat reduces to ILP) the chromatic number problem the traveling salesman problem (can reduce hamiltonian to this) the exact cover problem (node cover?) the partition problem (clique?) np-hard problems if we can reduce np-complete to a problem, but not show it is NP NP is closed under complementation iff some NP-complete problem is in NP primality nonprimeness - no known polynomial-time algorithm - not known to be NP-complete but the complementary primeness, is in NP suggests that there may be sets in the intersection of NP and co-NP that are not in P number theory ((N+*=<01) is undecidable (Godel's incompleteness theorem) theory of reals with addition IS DECIDABLE presburger arithmetic IS DECIDABLE (Z+=<01) - highlights of other important language classes auxillary PDA (deterministic or nondeterministic) read only input tape with w FSM r/w tape of length of input string w a stack stack automaton (SA) is a PDA plus input is two-way read only with end markers the stack head can traverse the stack in read only mode - P decidable vs decidable but not P (NP-complete vs PSPACE) vs undecidable