Systems qualifying exam - OS
Ed Swierk
28 May 2002


1.

(a) In a system with VM, a page fault causes a trap into the OS, which
handles the fault and then restarts the faulting instruction from the
beginning.

In the case of the mem_incr instruction, a page fault could occur in two
places: on the inner mem[] and/or on the outer one.  If the outer mem[]
faults and the instruction is restarted, the inner ++ will execute a
second time, which is incorrect.

(b)

A binary rewriter could overwrite each mem_incr instruction with a
sequence of ld followed by increment.  These instructions have the
same effect as mem_incr, but if a page fault occurs during an ld, then
only the ld will be restarted and the increment will not be repeated.

The scheduler would have to be modified to handle any code that depends
on the mem_incr being atomic.  Alternatively, the rewritten code could
also include an instruction to disable interrupts beforehand and
reenable them afterwards.

(The rewritten code might also have to save any extra registers that
the ld uses and restore them afterwards.)

(c)

The virtual machine monitor would have to run at privilege level 0 in
order to implement virtual memory for the underlying OS.  If the OS runs
at level 1, code in the OS kernel that expects to be running at level 0
(e.g. by explicitly checking privlev) would break.

A binary rewriter could overwrite any privlev instruction with a jump to
a routine in the VMM that returns the privilege level that the OS is
expecting.


2.

Overall steps:

Create file:
(1) find empty inode in free list
(2) write file data
(3) write inode
(4) update free list [atomic]

Append entry to directory:
(5) copy directory data to a new location
(6) add entry to the new copy of the directory, pointing to the new file
(7) modify directory inode to point to first data block [atomic]

If a crash occurs during any of the non-atomic steps, the recovery routine
just needs to detect unreferenced inodes and remove them, which fsck does
already.

Main challenge: implement (4) and (7) atomically given only atomic write of
a single byte.  Inode numbers and block numbers have to be at least 3-4 bytes
on a storage device with 60 GB.

Idea: put a shadow copy of the free list and of each directory node at an
easy to find location (e.g. given address 0XXXXXXX the shadow copy resides
at 1XXXXXXX), and put a bit in the main copy indicating that it is being
updated.  If a crash occurs during update, the recovery routine can detect
that it is corrupt and use the shadow copy instead.


3.

(1)
- protect against denial of service attacks: reject SYN floods, ping
  attacks and other nastygrams, not allowing such packets to make it to
  the OS
- speed up the common case: put a mini web server with the most
  frequently accessed pages (preprocessed into TCP segments with
  precomputed checksums) onto the network controller
      
(2) The OS could require that the untrusted code be implemented in a
type-safe language like Java.  When loading new code into the interrupt
handler, the OS would run a bytecode verifier on it, and then compile it
into native code with a JIT-like mechanism.  The code could declare its
memory requirements beforehand to eliminate the need for dynamic memory
allocation and garbage collection.

The verifier would have to guarantee that the code does not attempt to
access random areas of memory, which is easy in the case of Java code.

A harder challenge is ensuring that the code does not use too many CPU
cycles, which cannot be guaranteed statically.  However the OS could set
a hard limit on the amount of time any interrupt handler spends doing its
job, and could use a timer (at a higher interrupt priority) to kill a
runaway handler.