-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programming Languages
- basic computability (overlap with automata theory)
- good language design
  motivating applications
    Lisp: symbolic computation, logic, experimental programming
    C: Unix operating system
    Simula: simulation
    PL/1: tried to solve all problems; not successful or influential
  abstract machine (clear program execution model)
    Lisp
      expression to be evaluated
      continuation representing the remaining program
      association list (a-list) for variables
      heap of cons cells
    Fortran
      flat register machine
      no stacks, no recursion
      memory as a linear array
    Algol family
      stack of activation records
      heap storage
    Smalltalk
      objects, communicating by messages
  theoretical foundations
    Lisp: partial recursive functions on symbolic expressions, based on lambda calculus
- Lisp
  atoms, s-exps, lists (cons cells)
  functions vs special forms
    special forms do not eval their arguments
  evaluation of expressions
    eval the arguments, including the function position, then apply the function to the arguments
  static vs dynamic scope
    dynamic scope traditionally regarded as a bug; Scheme and Common Lisp are statically scoped
  expression vs statement vs declaration
    an expression can be side-effect free; may not terminate
    a statement always has a side effect
    declaration
  conditional expressions
  structured control
    structured control structures vs goto
  programs as data
  macros
  function expressions (lambda ...)
  recursion
    allow forward reference for recursive function calls
  higher-order functions
    a function that takes a function as an argument, returns a function, or both
  garbage collection
    mark and sweep
  pure Lisp vs Lisp with side effects
    rplaca rplacd set-car! set-cdr! setq set!
- fundamentals
  compilers and syntax (duplicated below in compilers)
    source program input
    lexical analyzer: break into tokens
    syntax analyzer: grammatical parsing
    semantic analyzer: type checking, operator annotation
    intermediate code generation (RTL)
    code optimizer (see below)
    code generator: machine code
    target program: Hello, world!
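The Lisp evaluation model above (special forms do not evaluate their arguments; otherwise eval every position including the function position, then apply; variables looked up in an a-list) can be sketched in Python. This is a minimal illustrative sketch, not a real Lisp: the tuple encoding of expressions and the helper names `lookup` and `eval_expr` are assumptions made for this example.

```python
# Minimal sketch of a Lisp-style evaluator: expressions are nested tuples,
# and variables are looked up in an association list of (name, value) pairs.

def lookup(name, alist):
    """Find a variable's value in an a-list, front to back."""
    for key, value in alist:
        if key == name:
            return value
    raise NameError(name)

def eval_expr(expr, alist):
    if isinstance(expr, str):              # variable reference
        return lookup(expr, alist)
    if not isinstance(expr, tuple):        # self-evaluating constant
        return expr
    op, *args = expr
    if op == 'quote':                      # special form: do NOT eval the argument
        return args[0]
    if op == 'if':                         # special form: eval only the chosen branch
        cond, then_e, else_e = args
        return eval_expr(then_e if eval_expr(cond, alist) else else_e, alist)
    # ordinary application: eval the function position and all arguments, then apply
    fn = eval_expr(op, alist)
    return fn(*[eval_expr(a, alist) for a in args])

env = [('x', 3), ('+', lambda a, b: a + b)]
print(eval_expr(('if', ('quote', True), ('+', 'x', 4), 0), env))  # → 7
```

Note how `quote` and `if` must be special forms: evaluating their arguments eagerly would defeat quoting and would evaluate both branches of a conditional.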
  code optimizer
    common subexpression elimination
    copy propagation
    dead-code elimination
    loop optimization: move code to before the loop, unrolling, ...
    in-lining function calls
    strength reduction
    ...
  grammars (overlap with DCFLs of automata theory)
    BNF = Backus-Naur form
    derivations
    parse trees and ambiguity
      if-then-else example
    parsing and precedence
- lambda calculus
  functions and function expressions
  syntax
    the grammar is simple: M ::= x | M M | lambda x . M
  variable binding
    alpha-equivalent if the only difference between the expressions is the bound variables
      lambda x . x is alpha-equivalent to lambda y . y
    scope: in lambda x . M, M is the scope of x
  equivalence and substitution (overlap with AI and logic)
    lambda x . M = lambda y . [y/x] M (alpha-equivalence, stated more formally)
    (lambda x . M) N = [N/x] M (beta-equivalence: function application)
      beta-equivalence is the Algol 60 copy rule for evaluating function calls
      also related to macro expansion and in-line substitution of functions
      always remember to rename bound variables in M that might be the same as free variables in N
  functions of several arguments
    f(g, x) = g(x): two arguments
    f-curry = lambda g . lambda x . g x: one argument, returns a function of one argument
    f-curry g x = g x = f(g, x)
    currying works in this case because of the equations above
  declarations
    let x = M in N
    (let ((x M)) N) in Lisp/Scheme
    (lambda x . N) M in lambda calculus
    syntactic sugar: derived, sweeter syntax; declarations are a good example
  recursion and fixed points
    a fixed point of a function G is a value f such that f = G(f)
    one lambda calculus fixed-point operator:
      Y = lambda f . (lambda x . f (x x)) (lambda x . f (x x))
      Y f = (lambda x . f (x x)) (lambda x . f (x x))
          = f ((lambda x . f (x x)) (lambda x . f (x x)))
          = f (Y f)
    example of using Y with factorial
      function f(n) { if n=0 then 1 else n*f(n-1) };
      f(n) = if n=0 then 1 else n*f(n-1)
      f = lambda n . if n=0 then 1 else n*f(n-1)
      G = lambda f . lambda n . if n=0 then 1 else n*f(n-1)
      f = G(f), so fact = Y G
      fact 2 = (Y G) 2
             = G (Y G) 2
             = (lambda f . lambda n . if n=0 then 1 else n*f(n-1)) (Y G) 2
             = (lambda n . if n=0 then 1 else n*((Y G)(n-1))) 2
             = if 2=0 then 1 else 2*((Y G)(2-1))
             = 2*((Y G) 1)
             = 2*(G (Y G) 1)
             = 2*((lambda f . lambda n . if n=0 then 1 else n*f(n-1)) (Y G) 1)
             = 2*((lambda n . if n=0 then 1 else n*((Y G)(n-1))) 1)
             = 2*(if 1=0 then 1 else 1*((Y G)(1-1)))
             = 2*1*((Y G) 0)
             = 2*1*(G (Y G) 0)
             = 2*1*((lambda f . lambda n . if n=0 then 1 else n*f(n-1)) (Y G) 0)
             = 2*1*((lambda n . if n=0 then 1 else n*((Y G)(n-1))) 0)
             = 2*1*(if 0=0 then 1 else 0*((Y G)(0-1)))
             = 2*1*1
             = 2
  reduction, confluence, normal forms
    if we change equivalence to reduction: (lambda x . M) N -> [N/x] M (beta-reduction)
      then we have a direction in which to evaluate
    normal forms
      apply reductions until you cannot proceed; then you are in normal form
    confluence
      evaluation order does not affect the final value
      this implies there is at most one normal form
  important properties of lambda calculus
    every computable function can be expressed in pure lambda calculus
      no need for side effects
      numbers can be expressed with Church numerals
      recursion via the Y operator
    evaluation is order independent because of confluence
  denotational semantics
    the meaning of a program is a function from states to states
    good for mathematical-style proofs and analysis
    often bad for time and space analysis
    compositionality
      the meaning of a program comes from the meaning of its parts,
      not from syntactic operations like substitution
  operational semantics
    step-by-step state transitions (e.g., lambda calculus reduction)
    better for time and space analysis
  functional and imperative languages
    functional: Lisp and ML
    pure functional: no side effects
      can always replace common subexpressions
      referential transparency
        if we say that a variable refers to its location in memory,
        then imperative languages are referentially transparent;
        not so if a variable refers to the value stored in that location
    concurrency
      pure functional: can evaluate in parallel
      imperative: must evaluate in order
  practical functional programming
    Backus's FP
    Haskell (lazy)
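Returning to the fixed-point operator above: the derivation fact = Y G can be checked by running it. A hedged sketch in Python: because Python evaluates arguments eagerly (call-by-value), the plain Y above would loop forever, so this uses the eta-expanded call-by-value variant, often called Z; the names `Z`, `G`, and `fact` are local choices for this example.

```python
# Call-by-value fixed-point combinator Z (an eta-expanded Y).
# The inner lambda v delays (x x) until the recursive call actually happens.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# G is the factorial "generator": given any approximation of factorial,
# it returns a better one. fact = Z(G) is its fixed point, so fact = G(fact).
G = lambda fact: lambda n: 1 if n == 0 else n * fact(n - 1)

fact = Z(G)
print(fact(2))   # → 2, matching the hand reduction of (Y G) 2 above
print(fact(5))   # → 120
```

The eta-expansion is exactly the confluence caveat in practice: with eager evaluation, the reduction order does matter for termination, even though the final value (when reached) is the same.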
    ML
- the Algol family and ML
  Algol family
    Algol 60, Pascal, C
    also Algol 58, Algol 68, Algol W, Euclid, EL1, Mesa, Modula-2, Oberon, Modula-3
  Algol 60
    the committee included Backus (of Fortran), McCarthy (of Lisp), Alan Perlis
    important features
      simple statement-oriented syntax with semicolons
      blocks with begin and end (C's {})
      recursive functions and stack storage allocation
      few ad hoc restrictions
        general expressions in array indices
        procedures that could be called with procedure parameters
      a primitive static type system
    the bad
      the type of a procedure parameter doesn't include its argument types
      array parameters without bounds
      pass-by-value and pass-by-name
        pass-by-name interacts badly with side effects
        pass-by-value is expensive for arrays
      control flow
        problems with memory management when jumping out of nested blocks
  Algol 68
    second-system effect: some features made compilation inefficient
    improved on the primitive type system
      primitive modes: int, real, char, bool, string, complex, bits, bytes, semaphore, format, file
      compound modes: array, structure, procedure, set, pointer
      unrestricted combinations, such as array of pointers to procedures
    memory
      stack for local variables
      heap for non-locals, but garbage collected
    calling convention
      pass-by-value
      pass-by-reference with pointers
      basically C's model
  Pascal
    simple one-pass compiler
    type system: Algol 60 < Pascal < Algol 68
      records, variant records, subranges
  Modula
    Pascal + module system
  C
    arrays = pointers
    no nested function declarations
    C++ has stronger type-checking rules
  ML constructs (more details below)
    patterns
    declarations
    functions
    polymorphism (later chapter)
    overloading
    type declarations
    reference cells
    exceptions (later chapter)
      type correctness: exceptions are type safe
    repl style
    expressions vs declarations
    div vs /
      no auto casting of integers to/from reals
    identifiers vs variables
      declarations introduce constants, not variables
      for variables, declare a mutable cell (see reference cells below)
    types
      unit, bool, int, string, real
      tuples: #1(2, 4) => 2
      records: #First_name({First_name="Donald", Last_name="Knuth"})
      lists: 1::2::3::nil = [1,2,3]
    value declarations can destructure structured types
      val (x,y,z) = (1,2,3);
      this is how function declaration sugar works: fun f(x,y) = x+y;
      but more generally:
        fun length(nil) = 0
          | length (x::xs) = 1 + length(xs);
        pattern matching happens in order of declaration
    datatypes
      enumeration style: datatype color = Red | Blue | Green
      union style: datatype student = BS of name | MS of name*school | PhD of name*faculty;
        pattern match on the BS/MS/PhD tags
      recursive types: datatype tree = LEAF of int | NODE of (tree*tree);
      polymorphic: datatype 'a tree = LEAF of 'a | NODE of ('a tree * 'a tree);
    reference cells
      ref v creates a ref cell holding v
      !r returns the value in cell r
      r := v places v in ref cell r
      e.g. val x = ref 0; x := (3 * !x) + 5; !x => 5
    expression sugar
      e1;e2 = (fn x => e2) e1
- Type systems and type inference
  type errors
    hardware: x() if x is not a function
    unintended semantics: int_add(3, 4.5)
  types and optimization
    typing can lead to efficient machine code, for example for record/struct access
  type safety examples
    not safe: C/C++ (type casts, pointer arithmetic)
    almost safe: Pascal (explicit deallocation, dangling pointers)
    safe: Lisp, ML, Smalltalk, Java (complete type checking)
  compile-time vs run-time checking
    compile-time checking reduces the need for run-time checks, benefiting performance (Lisp vs ML)
    compile-time checking tends to be sound and conservative
      sound if no programs with errors are considered correct
      conservative if some programs without errors are still considered to have errors
      tied to the halting problem: if we assume all things halt,
        we find compile-time errors that might not happen at run time
  type inference (overlap with AI and logic)
    examples page 137
    1. assign a type to the expression and each subexpression; for any compound
       expression or variable, use a type variable; for known operations or
       constants such as + or 3, use the type known to match the symbol
    2. generate a set of constraints on types, using the parse tree of the
       expression; these constraints reflect the fact that if a function is
       applied to an argument, for example, then the type of the argument must
       equal the type of the domain of the function
       function application: if the type of f is a, the type of e is b, and the
         type of f e is c, then we must have a = b -> c
       lambda abstraction (function expression): if the type of x is a and the
         type of e is b, then the type of lambda x . e is a -> b
    3. solve these constraints by means of unification, which is a
       substitution-based algorithm for solving systems of equations
  polymorphism
    forms
      parametric polymorphism: type variables (C++ templates)
      ad hoc polymorphism: overloading
      subtype polymorphism: inheritance
    implementation
      C++ polymorphism is implemented at link time: faster, but code bloat
      ML has a uniform data representation, so no bloat, but potentially less efficient
    polymorphism vs overloading
      parametric polymorphism: the same algorithm on different types
      ad hoc polymorphism: different algorithms for different types (resolved at compile time)
  type declarations
    transparent: an alternative name only; can be mixed (aka structural type equality)
      type Celsius = real;
    opaque: a new type, not equal to any previous type (aka name type equality)
      datatype (above) and abstract types
- scope, functions, and storage management
  block structure
    blocks often denote scope
  variable classes
    local variables (stack allocated in the activation record)
    parameters (also stack allocated in the activation record)
    global variables (or simply "free variables", since they could just be from outside the local scope)
  activation records (aka stack frames)
    needed for blocks even without function calls
      referenced by the environment pointer (aka stack pointer)
      space for local variables
      space for intermediate results
      control link (aka dynamic link) to the top of the previous activation record
    additional contents for function calls
      space for actual parameters (like local variables)
      return address
      return-result address
    addition for static scoping
      access link (aka static link)
  parameter passing
    main distinctions
      the time at which the actual parameter is evaluated
      the location used to store the parameter value
    mechanisms
      pass-by-reference: pass the l-value (address) of the actual parameter
      pass-by-value: pass the r-value (contents of the address) of the actual parameter
      pass-by-name: Algol 60, beta-reduction, C macros
      pass-by-value-result (aka call-by-value-result or copy-in/copy-out)
    issues: side effects, aliasing, efficiency
  scope vs lifetime
    scope: a region of text in which a declaration is visible (compile time)
    lifetime: the duration, during a run of a program, during which a location
      is allocated as the result of a specific declaration (run time)
    scope can differ from lifetime when nested declarations hide outer declarations
  static scope vs dynamic scope
    static: a global id refers to the id with that name declared in the closest
      enclosing scope of the program text
      examples: newer Lisps, Scheme, Algol, Pascal, C, ML, most current languages
    dynamic: a global id refers to the id associated with the most recent activation record
      examples: older Lisps, TeX/LaTeX, exceptions, macros
  tail recursion
    can be used to express iteration
    optimization: reuse activation records; necessary for iteration to use constant space
  first-class values (functions in this case, but the notion applies to any type)
    may be declared within any scope
    passed as arguments to functions
    returned as results of functions
    downward funarg problem
      closures are used for maintaining static scope when functions are passed
      to functions or returned as results
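A small Python sketch of the closure idea just described: the returned function carries a link to its defining environment, so its free variable resolves under static scope even after the defining call has returned (the upward-funarg case, where activation records cannot live on a stack). The name `make_counter` is an illustrative choice, not from the notes.

```python
def make_counter(start):
    count = start              # lives in make_counter's activation record

    def tick():
        nonlocal count         # free variable, resolved through the closure's
        count += 1             # access link to the defining environment
        return count

    return tick                # the frame outlives the call: no stack discipline

c1 = make_counter(10)
c2 = make_counter(0)           # each closure links to its own defining frame
print(c1(), c1(), c2())        # → 11 12 1
```

Under dynamic scope, `count` would instead be looked up in the caller's environment at each call, which is exactly why closures are the standard mechanism for keeping static scope correct when functions are first-class.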
      a closure contains a link to the defining activation record and a pointer to the code
      when a closure is invoked
        allocate a new activation record as usual
        set the access link to the closure's defining activation record
    upward funarg problem (aka upward fun-result problem)
      closures to the rescue again
      when functions are returned from nested scopes (with free variables),
        activation records do not obey a stack discipline
      can use garbage collection to recover frames
- control in sequential languages
  structured control
    goto vs if/then/else/end, while/do/end, for {}, case ...
  exceptions
    effects
      jump out of a block or function invocation
      pass data as part of the jump
      return to a program point that was set up to continue the computation
    terms and syntax
      declaring (ML syntax: exception name of type)
      raising (aka throwing) (ML: raise name arguments)
      handling (aka catching) (ML: handle ... => ...)
    uses
      for error conditions
      for efficiency
    handlers are typically searched via dynamic scope (dynamic link / control link)
    C++ exceptions
      throw/try/catch (no finally ...)
      problems with deallocation can lead to leaks if not careful
  continuations
    a function representing "the rest of the program"
    continuation-passing form (or CPS, continuation-passing style)
      fun fact(n) = if n=0 then 1 else n*fact(n-1)
      fun factk(n) =
        let fun f(n, k) = if n=0 then k(1) else f(n-1, fn x => k(n*x))
        in f(n, fn x => x) end
    compiling with continuations (Steele and Sussman, MIT AI Memo 474)
      1. lexical analysis
      2. translation to lambda calculus form
      3. conversion to CPS
      4. optimization of the CPS
      5. closure conversion: eliminate free variables
      6. elimination of nested scopes
      7. register spilling: no expression with more than n free variables
      8. generation of the target assembly language program
      9. assembly into the production target-machine program
      compiles calls directly to jumps (lambda: the ultimate goto)
  functions and evaluation order
    delay and force (macros/special forms)
      Delay(e) == fn () => e
      Force(e) == e()
- data abstraction and modularity
  structured programming
    interface: a description of the parts of a component that are visible to other program components
    specification: a description of the behavior of the component, as observable through its interface
    client vs implementation
      client: a user of the component's interface
      implementation: the code that concretely defines the program component
  abstraction
    procedural abstraction
      well-defined interface
      information hiding of local variables (reduces namespace clutter)
      reusability
    data abstraction
      identify the interface to a data structure
      information hiding of implementation details
      reuse of data structures
  abstract data types (from Clu clusters)
    a type with a set of operations
      abstype cmplx = C of real*real with
        fun cmplx(x,y:real) = C(x,y)
        fun x_coord(C(x,y)) = x
        fun y_coord(C(x,y)) = y
        fun add(C(x1,y1), C(x2,y2)) = C(x1+x2, y1+y2)
      end
    abstype can include type variables, useful to define, say, "set"
  data-type induction
    constructors: build elements of the type
    operators: map elements of the type definable with constructors alone
      to other elements definable with constructors alone
    observers: operations that return a result of another type
    example with set
      constructors: empty and insert
      operator: union
      observer: isMember
    induction: because all elements of a given abstract type are produced by a
      sequence of constructor operations, we may prove properties of all
      elements of an abstract type by induction on the number of constructors
      needed to produce a given element
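The set example above can be sketched in Python with the constructor/operator/observer split made explicit. This is a sketch only: Python has no abstype, so the hidden representation (a `frozenset` here, my assumption) is hidden by convention rather than enforced, and the function names mirror the notes.

```python
# Constructors: the only ways to build a set value.
def empty():
    return frozenset()              # hidden representation (an assumption of this sketch)

def insert(x, s):
    return s | {x}

# Operator: maps constructor-built sets to constructor-built sets.
def union(s, t):
    return s | t

# Observer: returns a result of another type (bool).
def isMember(x, s):
    return x in s

s = insert(1, insert(2, empty()))
t = union(s, insert(3, empty()))
print(isMember(3, t), isMember(4, t))  # → True False
```

Data-type induction then reads: any property preserved by `empty` and by `insert` holds of every set value, since every set is produced by some finite sequence of those two constructors (e.g. "`union(s, t)` contains exactly the members of `s` and `t`" can be proved by induction on the constructors of `s`).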
- modules (from Modula)
  a module simply allows a number of declarations to be grouped together
  may also allow the programmer to control the visibility of items in a module
  may also support parameterized modules
  Modula terms
    the interface is called a definition module
    the implementation is called an implementation module
  Ada packages
  ML modules
    structures, signatures, functors
      signatures = interfaces
      structures = modules
        not first class; can only be bound to identifiers or passed to functors
      functors = functions from structures to structures
        used to define generic modules
        not higher-order, so no need for functor signatures
- object oriented design
  objects
    interaction by messages or member-function calls
  4 key features
    dynamic lookup: the object chooses how to respond to a message (at run time,
      vs overloading, which is resolved at compile time)
    abstraction: implementation details are hidden
    subtyping: if object a has all the functionality of b, then we may use a in
      any context expecting b
    inheritance: reuse the definition of one kind of object to define another kind of object
      inheritance breaks some abstraction
      inheritance is not subtyping: subtyping is about interfaces, inheritance
        is about implementation
  closures can be seen as simple objects
    dynamic lookup via function pointers
    abstraction through static scoping
    but no subtyping, and no inheritance
  object-oriented design
    identify the objects
    identify the semantics of these objects
    identify the relationships among objects
    implement the objects
  program structure
    function-oriented: each operation contains a dispatch based on type
    object-oriented: each type contains each operation
  design patterns
    a pattern solves a problem
    it is a proven concept
    the solution is not obvious
    it describes a relationship
    the pattern has a significant human component (minimize human intervention)
    examples: singleton, facade
- Simula
  added to Algol 60
    class concepts and reference variables (pointers to objects)
    pass-by-reference
    char, text, i/o features
    coroutines, a mechanism for concurrent programs
  removed from Algol 60
    changed pass-by-name to a combination of pass-by-value and pass-by-reference
    some initialization requirements on variables
    own variables (C static variables)
    the Algol 60 string type (in favor of the new text type)
  class: a procedure that returns a pointer to its activation record
  object: an activation record produced by a call to a class, called an instance of the class
  dynamic lookup: operations are selected from the activation record
  abstraction: in later versions
  subtyping: arising from the way types are associated with classes
  inheritance: in the form of class prefixing, including redefining parts of the subclass
  extras
    inner: a subclass method is called in combination with execution of the
      superclass code containing inner
    inspect and qua: inspect is a class (type) test; qua is a form of type cast,
      checked at run time
- Smalltalk
  differences from Simula
    in Smalltalk, everything is an object, even a class
    all operations are therefore messages to objects
    objects/classes were shown to be useful organizing concepts for building an
      entire programming environment and system
  abstraction was added, distinguishing between private (protected, actually)
    instance variables (data) and public methods (code)
  dynamic lookup: method lookup by the receiver of the message;
    search the local method dictionary first, then upward through superclasses
  Ingalls test: can you define a new type of integer and have it work everywhere?
    passing the Ingalls test can mean substantial run-time cost; Java avoids this
  abstraction: as noted above
  subtyping: no compile-time types; as long as two objects respond to the same
    messages they are subtypes ... at observing semantic substitution
  inheritance: by default from the superclass
- Objects and run-time efficiency: C++
  C++ goals
    data abstraction and object-oriented features
    better static type checking
    backwards compatibility with C (mostly)
    efficiency of compiled code: don't pay a cost for features you don't use
  C structs become C++ objects
  non-object additions to C
    type bool
    reference types and pass-by-reference (also pass-by-const-reference)
    user-defined overloading
    function templates
    exceptions
    misc: new/delete, i/o, default parameter values, end-of-line comments,
      reduced need for typedef
  dynamic lookup: for virtual functions
  abstraction: via public, private, protected, etc.
  subtyping: using subclassing; interfaces via abstract classes; the subclasser
    controls whether the superclass type is visible
  inheritance: using subclassing; single or multiple inheritance
  problems of C++
    casts and conversions can be complex and unpredictable
    objects allocated on the stack
    overloading can interact unpredictably with dynamic virtual function lookup
    multiple inheritance: fun with vtables
  pure virtual = abstract
  C++ lookup vs Smalltalk lookup: compile time vs run time
  multiple inheritance
    casts with multiple inheritance cause run-time pointer arithmetic
    name clashes
      implicit resolution: Python and CLOS
      explicit resolution: C++ (Eiffel?)
      disallow name clashes
- portability and safety: Java
  Java
    the language
    the compilers and runtime (VM)
    the library and toolkits
  goals
    portability
    safety
    simplicity
    efficiency
      reduced emphasis on efficiency gave the Java designers more flexibility
  dropped from C/C++
    structs and unions
    functions
    multiple inheritance
    goto
    operator overloading (to be added)
    automatic coercions
    pointers
  dynamic lookup:
  abstraction:
  subtyping:
  inheritance:
- concurrency and distributed programming
  multiprogramming: multiple programs, one processor
  multiprocessing: multiple programs, multiple processors
  nondeterminism
  mechanisms for communication
    buffered vs unbuffered
    sync vs async
    message order
  coordination
    locking
    semaphores
  atomicity
    critical sections
      mutual exclusion: only one process at a time in the critical section
      progress: if no process is in the section, one may proceed
      bounded waiting: no waiting process should have to wait indefinitely
    implementation: locks and busy waiting
    deadlocks
  semaphores
    wait and signal (P and V: pass and release)
    binary vs counting
    little busy waiting (just for the critical section on the counter and wait queue)
  monitors
    more common in programming languages than raw semaphores,
    but conceptually equally powerful
  actors
  Concurrent ML
    threads, events, channels
    M-variables (MVars, from Id)
  Java threads
    communication via memory, etc.
    synchronization
      object locks for mutual exclusion
      objects' wait sets for semaphore-like coordination
    thread termination
  RMI
    locating remote objects
    communicating with remote objects
    transmitting objects across the network
- logic programming paradigm and prolog ...
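The critical-section properties above (mutual exclusion via semaphore wait/signal) can be demonstrated with Python's `threading` module; `Semaphore(1)` acts as the binary semaphore, with `acquire` as wait (P) and `release` as signal (V). The worker setup and counts are illustrative choices.

```python
import threading

counter = 0
sem = threading.Semaphore(1)      # binary semaphore guarding the critical section

def worker(iterations):
    global counter
    for _ in range(iterations):
        sem.acquire()             # wait / P: enter the critical section
        counter += 1              # critical section: one thread at a time
        sem.release()             # signal / V: leave the critical section

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                    # → 40000: mutual exclusion kept every increment
```

Without the semaphore, the read-modify-write on `counter` could interleave between threads and lose updates; with it, the three critical-section properties (mutual exclusion, progress, bounded waiting under a fair scheduler) hold for this sketch.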
- C
  function type syntax
  arithmetic promotion
- Smalltalk
  message passing
  method dictionary
- C++
  vtables (virtual function tables)
  name mangling and other linker tricks
- Algol 60
  pass by name
- definition of the state of a computation (similar to the architecture section)
  program counter
  stack pointer
- memory management
  stack allocation
    activation records
    frame pointer
  heap allocation
    malloc
      fragmentation
    handles
      allow compaction to prevent fragmentation
      must double dereference
    garbage collection
      conservative
      mark and sweep
      stop and copy
        compaction helps VM performance, avoids fragmentation
      generational collectors
  shared memory
- error handling
  exceptions
- concurrency
  threads (kernel vs user)
  synchronization
    monitors
    semaphores (binary and counting)
    test-and-set (spin) vs locks (block)
    reentrance
- lambda calculus
  alpha renaming
  beta reduction
  Y fixed-point operator
- operational vs denotational semantics
- argument passing
  pass by value
  pass by reference
  pass by name
- iteration
  recursion
  tail recursion
- polymorphism
  generic types
- object oriented
  encapsulation
  inheritance
  overloading
- scope
  static scope
  dynamic scope
  object lifetime
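For the iteration/tail-recursion item above, a short sketch: a tail-recursive function carries its running state in an accumulator argument, so the recursive call is the entire result and an optimizing compiler can reuse the activation record. Python does not perform this optimization, so the loop form shown alongside is what tail-call elimination would produce; the function names are illustrative.

```python
def fact_rec(n, acc=1):
    # Tail-recursive: the recursive call is the whole return value,
    # so each frame could be reused (constant space with TCO).
    if n == 0:
        return acc
    return fact_rec(n - 1, acc * n)

def fact_iter(n):
    # The constant-space loop that tail-call optimization would yield.
    acc = 1
    while n > 0:
        acc *= n
        n -= 1
    return acc

print(fact_rec(5), fact_iter(5))  # → 120 120
```

Compare the naive `fact(n) = n * fact(n-1)`: there the multiplication happens after the recursive call returns, so the frame must be kept and the recursion needs linear stack space.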