-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Automata and Formal Languages
- finite automata
  5-tuple (Q, Z, o, q0, F)
  q is set of states, q0 is start state, F is set of final states
  Z is finite input alphabet
  o is transition function from QxZ->Q
    that is o(q,a) returns a state for current state q and input symbol a from Z
  1.) DFA: deterministic finite automata 
  2.) NFA: nondeterminisic finite automata
  3.) NFA w/e: nondeterminisic finite automata with epsilon transitions
  4.) regular expressions
  these four have equivalent power to handle regular languages
  terms
    language L
    L accepted by an automata
    L denoted by an expression
  two-way finite automata
    can move head as part of o(q,a)
    no more powerful
  moore vs mealy machines
    application of FA with output
    moore - output depends on current state
    mealy - output depends on current state and input symbol
    moore and mealy are equivalent
  pumping lemma
    used to show languages nonregular
      Let L be a regular set. Then there is a constant n such that if z is
      any word in L, and |z| >= n, we may write z=uvw in such a way that
      |uv|<=n, |v|>=1, and for all i>=0, uv^iw is in L. Furthermore, n is
      no greater than the number of states of the samllest FA accepting L.
  regular sets are closed under
    union, concatenation, kleene closure
    complement
    intersection
    substitution
    homomorphisms and inverse homomorphisms
    quotient with arbitrary sets
  the set of sentences accepted by an FA M with n states is:
    nonempty iff the FA accepts a sentence of length less than n
    infinite iff the FA accepts a sentence of length l where n<=l<2n
  there is an algorithm to determine if two FA are equal
    M1 == M2? 
    M3 = L1-L2 union L2-L1
    if M3 accepts a word, then L1!=L2
  Myhill-Nerod theorm
    the following three statements are equivalent
    1.) The set L(=Z* is accepted by some FA
    2.) L is the union of some of the equivalence classes of a right
        invariant equivalence relation of finite index.
    3.) Let equivalence relation Rl be defined by: xRly iff for all z in
        Z*, xz is in L exactly when yz is in L. Then Rl is of finite
        index.
    this is used to prove
      The minimum state automaton accepting a set L is unique up to an
      isomorphism (i.e. a renaming of the states) and is given by M' in the
      proof of Myhill-Nerod
- context free grammars (aka CFG)
  G = (V, T, P, S)
    V variables, T terminals 
      V and T disjoint
    P productions
      A -> a, A is variable, a is string from (V union T)*
    S is start symbol
  terminology
    language L generated by grammar G
  derivations
    derivation trees
    leftmost and rightmost derivations
      leftmost = at each step in a derivation a production is applied to the leftmost variable
  ambiguity
    a CFG G such that some word has two parse trees is said to be ambiguous
      or: some word that has more that one leftmost (or rightmost) derivation
    a CFL for which every CFG is ambiguous is said to be inherently ambiguous
      the CFL ( {aNbNcMdM N>=1 M>=1} UNION {aNbMcMdN N>=1 M>=1} ) is inherently ambiguous
  useless
    Every nonempty CFL is generaged by a CFG with no useless symbols
    If L=L(G) for some CFG G, then L-{e} is L(G') for a CFF G' with no useless symbols or e-productions (A->epsilon)
    every CFL with e is defined by a grammar with no useless symbols, e-productions, or unit productions (A->B, but not A->b)
  Chomsky Normal Form (CNG)
    any CFL w/o e is generated  by a grammar in which all productions are of the form A->BC or A->a
  Greibach Normal Form (GNF)
    any CFL w/oe can be generated from a grammar for which every
    production is f the form A->aAlpha where Alpha is a possible empty string of variables
- Pushdown Automata (for CFLs)
  M = (Q, Z, L, o, q0, Z0, F)
    Q states
    Z input alphabet
    L stack alphabet
    q0 in Q initial state
    Z0 in L start symbol for stack
    F (= Q of final states
    o maps from Qx(Z union {e}) to finite subests of QxL* : o(q1,a) => (q2, 0...)
  acceptance
    by final state
    by empty stack
  deterministic PDA != non-deterministic PDA
    deterministic PDA cannot recognize wwR (concat w (reverse w))    
  non-deterministic PDA <=> CFG
- CFL properties
  pumping lemma (used to show languages like aIbIcI non-CF)
    Let L be any CFL. Then there is a constant n, depending only on L,
    such that if Z is in L and |z| >=n, then we may write z=uvwxy such
    that:
    1.) |vx|>=1
    2.) |vwx| <=n
    3.) for all i>0 uvIwxIy is in L
  ogden's lemma (used to show languages like aIbJcKdL (i=0 or j=k=l non-CF)
   Stronger than plain pumping lemma
    Let L be any CFL. Then there is a constant n such that if Z is in
    any word in L and we mark any N or more positions of z as
    "distinguished", then we can write z=uvwxy such that:
    1.) v and x together have at least one distinguished position
    2.) vwx has at most n distinguished positions
    3.) for all i>0 uvIwxIy is in L
  CFLs are closed under:
    union, concatenation, and Kleene closure
    substitution
    homomorphism and inverse homomorphism
    NOT under intersection
    NOT under complementation
    If L is CFL and R is regular set, then L intersect R is a CFL
  decidability: 
    there are algorithms to determine if a CFL is 
      empty
      finite
      infinite
    there are algorithms to determine if x of L* is part of CFL G
- Turing machines
  M=(Q, Z, L, o, q0, B, F)
    Q is finite set of states
    L is finite setof allowable tape symbols
    B, a symbol of L, is the blank 
    Z = L-B, the input symbols
    o is the next move function
      o(QxL) -> (QxLx{L,R})
      o may be undefined for some arguments
    q0 in Q is the start state
    F (= Q is the set of final states
  views of TM's
    class of languages it defines called the recursively enumerable sets  
      recursively enumerable sets, TM will halt if W in L, but may not otherwise halt
        recursive sets, subset of r.e. sets, TM will halt for all input
    class of integer functions it computes called the partial recursive functions
    enumerator: generating device with output tape (see below)
  extensions (which don't add power)
    adding finite state to controller
    multiple tracks (one single tape)
    two way infinite tape (instead of one way)
    multiple tapes
    nondeterminstic
    multidimensional tape
    multihead
    off-line turing machine (read only input)
  restriced machines that are turing complete
    a two stack deterministic PDA is equivalent to a turing machine
    a two-counter machine
    unlimited alphabeta, but only 3 states and one tape
    if L (= (0+1)* and L is r.e., then L is accepted by a one-tape TM with tape alphabet {0,1,B}
      still true if we remove restriction on L alphabet Z, by using binary encoding of L's Z
        in fact, every TM can be simulated by an off-line TM having
        one storage take with two symbols, 0 (blank) and 1. The TM can
        print a 0 or 1 over a 0, but cannot print a 0 over a 1.
  common tricks to build machines
    checking off symbols (on extra track)
    shifting over
    parameterless, nonrecursive subroutines
  church's hypothesis (aka church-turing thesis)
    the assumption that the intuitive notion of "computable functions"
    can be indentified with the class of partial recursive functions
      lambda calculus
      Post systems
      general recursive functions
      random access machines
  enumerator: computing output
    L is an r.e. (recursively enumerable) set iff L is G(M1) for some TM M1
    L is recursive iff L is generated in canonical order (e, 0, 1, 00, 01, 10, 11, 000, 001,...)
- undecidability
    a problem whose language is recursive is said to be decidable, otherwise undecidable.
      a problem is undecidable if there is no algorithm that takes as
      an input an instance of the problem and determines whether the
      answer to that instance is "yes" or "no".
  properties of recursive and recursively enumerable languagse
    the complement of a recursive languages is recursive:
    the union of recursive languages is recursive.
      the union of two recursively enumerable langauges is recursively enumerable.
    if L and its complement are both recursively enumerable => L and it's complement are recursive
    from these, given L and it's complement
      1.) either both are recursive
      2.) neither are recursive
      3.) one is r.e. but not recursive, the other is not r.e
  univeral turing machine
    (can be minimalistic: one tape, 5 states, 7 tape symbols see 173)
    restrict input TM's to M=(Q, {0,1}, {0,1,B}, o, q1, B, {q2}
      encode in binary format
    we can show there is a non-r.e. language using a diagonalization proof
    we can show that the languages accepted by a TM are r.e. but not recursive
  Rice's Theorm
    any nontrival property SL of the r.e. languages is undecidable
      trivial means it's either true of no r.e. or all r.e. languages
      examples of nontrival
	emptiness
	finiteness
	regularity
	context-freedom
    LofSL is r.e. iff
      1.) if L is in SL and L (= L', then L' is in SL
      2.) if L is an infinite set ifn SL, then there is some finite subset L' of L that is in SL.
      3.) the set of finite languages in SL is enumerable
    Post Correspondence Problem is undecidable
    It is undecidable whether an arbitrary CFG is ambiguous
    It is undecidable for arbitrary CFG's G1 and G2 whether L(G1) intersected with L(G2) is empty
    It is undecidable for arbitrary CFG G whether L(G) = Z*
    Let G1 and G2 be arbitrary CFG's and R an arbitrary regular set. The following are undecidable:
      1.) L(G1) = L(G2)
      2.) L(G2) (= L(G1)
      3.) L(G1) = R
      4.) R (= L(G1)
    Let G1 and G2 be arbitrary CFG's. The following are undecidable:
      1.) the compliment of L(G1) is a CFL
      2.) L(G1) intersected with L(G2) is a CFL
  Greibach's theorm (analog of Rice's theorm for CFL)
    Let C be a class of languages that is effecively closed under
    concatenation with regular sets and union, and for which "=Z*" is
    undecidable for any sufficiently large fixed size Z. Let P be any
    nontrivial property that is true for all regular sets and that is
    preserved under /a, where a is a single symbol. (that is if L has P,
    do does L/a = {w|wa is in L}) Then P is undecidable for C.
      applications
        Let F be an arbitrary CFG. The is undecidable wether L(G) is regular.
        Inherently ambiguity  for CFL's is undecidable
  recrusive function theory
    Smn-theorm
      given a partial recursive function g(x,y), there is an algorithm
      one can use to construct from a TM for g and a value for x,
      another TM which with input y computes g(x,y)
    the recursion theorm
      every total recursive function o mapping indices (integers
      denoting TMs) of partial recursive functions has a fixed point xo
      such that Fxo(y) = Fo(xo)(y)
        app
- chomsky hierarchy
  regular grammars
    CFG with prodcutions of form 
      A->wB or A->w (w is possibly empty) = right linear
      A->Bw or A->w (w is possibly empty) = left linear
        both right and left linear grammars are regular grammars
    If L has a regular grammar, then L is a regular set
    If L is regular set, then it is generated by some left linear G and some right linear G
  unrestricted grammars (aka semi-Thue or type 0 or phrase structure)
    productions of form alpha->beta where alpha and beta are arbitrary, alpha != epsilon
    if L is L(G) for unrestricted grammar G = (V,T,P,S), then L is an r.e. language
      converse holds
        if L is an r.e. language, then L is L(G) for unrestricted grammar G
  context sensivitve grammars (and context sensivitve languages)
    productions of form alpha->beta where beta is at least as long as alpha
      almost any language one can think of is a CSL; all proofs depend on diagonalization
    linear bounded automata (LBA) match CSLs
      a nondeterminstic TM  with
        1.) input alpha contains ^ and $ the left and right endmarkers
        2.) the LBA has no moves left of ^ or right of $ (and it cant print over markers)
          unknown if deterministic LBA = or != to nondeterminstic LBA
    if L is CSL, the L is accepted by some LBA
      if L = L(M) for LBA M, then L-{e} is a CSL
    every CSL is recursive but
      there is a recursive language that is not CSL
   the chomsky hierarchy
     the regular sets are properlycontained in the CFLs
     the CFLs-{e} are properly contrained in the CSLs
     the CSLs are properly contained in the r.e. sets
- deterministic CFLs (accepted by deterministic PDAs)
    syntax of many programming languages
    LR-grammars generate exactly the DCFLs
    DCFLs are closed under
      complementation
      quotient with regular language ( L/R is a DCFL {x|there exists w in R such that xw is in in L})
      MIN(L) = {x|x is in L and no w in L is a proper prefix of x} 
      MAX(L) = {x|x is in L and x is not proper prefix of any word in L} 
      inverse homomorphism
      intersection with regular set
        not true for general CFLs
    DCFL are NOT closed under
      homomorphism
      union
      concatenation
      Kleene closure
    undeciable problems of DCFLs L and L'
      1.) is L intersected with L' empty?
      2.) is L a subset of L'?
      3.) is L intersected with L' a DCFL?
      4.) is L intersected with L' a CFL?
      5.) is L unioned with L' a DCFL?
    Let L be arbitrary CFL, it is undeciable whether L is a DCFL
  LR(0) grammars
    left-to-right scan producing rightmost derivation using 0 look ahead symbols
    definition of LR(0)
      1.) its start symbol does not appear on the right side of any
          production
      2.) for every variable prefix y of G, whenever "A -> alpha . " is
          a complete item valid for y, then no other complete item with a
          terminal to the right of the dot is valid for y.
    If L is L(G) for an LR(0) G, then L is N(M) for a DPDA M
      Every LR(0) G is unambiguous
    If M is a DPDA, then G of M is an LR(0) grammar
    If L has an LR(0) grammer iff L is a DCFL with the prefix property
  LR(k) grammars
    left-to-right scan producing rightmost derivation using k look ahead symbols
    definition of LR(1)
      1.) its start symbol does not appear on the right side of any
          production
      2.) whenever the set of items I valid for some viable preix
          includes some complete item "A -> alpha ., {a1, a2, ..., aN}",
          then:
          i.) no aI appears immediately to the right of the dot in any
              item of I and
          ii.) if B->beta., {n1,..., bk} is another complete item in I, then
               ai!=bj for any 1<=i<=n and 1<=j<=k
- closure properities of families of languages
  familty of languages - a collection of L's with at least one nonempty L
  trio - a family of Ls closed under
           intersection with a regular set 
           inverse homomorphism
           e-free (forward) homomorphism
    closed under substitution by e-free regular sets
    examples
      CSLs
      recursive sets
      e-free regular sets (the smallest trio)
  full trio - a trio closed under all homomorphism
    closed under quotient with a regular set
    closed under substitution by regular sets
    examples
      regular sets (the smallest full trio)
      CFLs
      r.e. sets
  abstract family of languages (AFL) 
    trio and closed under union, concatenation, positive closure
    closed under substitution into regular sets
    examples
      e-free regular sets
      CSL
  full AFL
    full trio and closed under union, concatenation, kleene closure
    examples
      regular sets (smallest)
      CFLs
      r.e. sets
  AFL requires six closure properties, but they are not independant
    concatenation is dependant
    union is dependant
    intersection with R is dependant
  SUMMARY OF CLOSURE PROPERTIES page 280
  SUMMARY OF DECISION PROPERTIES page 281
- computation complexity theory
  defintions
    S(n) spaced bounded by n cell
    T(n) time bounded by n steps
    DTIME DSPACE
    NTIME NSPACE
- intractable problems
  P  = languages recognizable in deterministic polynomial time (DTIME)
  NP = languages recognizable in nondeterministic polynomial time (NTIME)
  reducability
    Let L' be polynomial-time reducible to L, then
      a.) L' is in NP if L is in NP
      b.) L' is in P if L is in P
    If L' is log-space reducible to L then:
      a.) L' is in P if L is in P
      b.) L' is in NSPACE(log^k n) if L is in NSPACE(log^k n) if L
      c.) L' is in DSPACE(log^k n) if L is in DSPACE(log^k n) if L
  complete vs hard
    L is complete for C with respect to polynomial-time (resp log-space) reductions if
      *L is in C, and
      every language in C is polynomial-time (resp log-space) reducible to L
    L is hard for C with respect to polynomial-time (resp log-space) reductions if
      every language in C is polynomial-time (resp log-space) reducible to L
      *but L is not necessarily in C
  np-complete problems
    sat = satisfiability (all NP problems reduce to sat)
    c-sat = satisfiability of CNF expressions (all sat reduce to csat)
    3-sat = satisfiability of 3-CNF expressions (all csat reduce to sat)
    2-sat
    vertex cover problem (all vertex cover problems reduce to 3-sat)
    directed hamiltonian circuit problem
    hamiltonian circuit problem (all h problems reduce to directed h)
    integer linear programming (3-sat reduces to ILP)
    the chromatic number problem
    the traveling salesman problem (can reduce hamiltonian to this)
    the exact cover problem (node cover?)
    the partition problem (clique?)
  np-hard problems
    if we can reduce np-complete to a problem, but not show it is NP
  NP is closed under complementation iff some NP-complete problem is in NP
  primality
    nonprimeness
    - no known polynomial-time algorithm 
    - not known to be NP-complete
    but the complementary primeness, is in NP
      suggests that there may be sets in the intersection of NP and co-NP that are not in P

  number theory ((N+*=<01) is undecidable (Godel's incompleteness theorem)
    theory of reals with addition IS DECIDABLE
    presburger arithmetic IS DECIDABLE (Z+=<01)
- highlights of other important language classes
  auxillary PDA (deterministic or nondeterministic)
    read only input tape with w
    FSM
    r/w tape of length of input string w
    a stack
  stack automaton (SA) is a PDA plus
    input is two-way read only with end markers
    the stack head can traverse the stack in read only mode
- P decidable vs decidable but not P (NP-complete vs PSPACE) vs undecidable