Systems qualifying exam - OS Ed Swierk 28 May 2002 1. (a) In a system with VM, a page fault causes a trap into the OS, which handles the fault and then restarts the faulting instruction from the beginning. In the case of the mem_incr instruction, a page fault could occur in two places: on the inner mem[] and/or on the outer one. If the outer mem[] faults and the instruction is restarted, the inner ++ will execute a second time, which is incorrect. (b) A binary rewriter could overwrite each mem_incr instruction with a sequence of ld followed by increment. These instructions have the same effect as mem_incr, but if a page fault occurs during an ld, then only the ld will be restarted and the increment will not be repeated. The scheduler would have to be modified to handle any code that depends on the mem_incr being atomic. Alternatively, the rewritten code could also include an instruction to disable interrupts beforehand and reenable them afterwards. (The rewritten code might also have to save any extra registers that the ld uses and restore them afterwards.) (c) The virtual machine monitor would have to run at privilege level 0 in order to implement virtual memory for the underlying OS. If the OS runs at level 1, code in the OS kernel that expects to be running at level 0 (e.g. by explicitly checking privlev) would break. A binary rewriter could overwrite any privlev instruction with a jump to a routine in the VMM that returns the privilege level that the OS is expecting. 2. Overall steps: Create file: (1) find empty inode in free list (2) write file data (3) write inode (4) update free list [atomic] Append entry to directory: (5) copy directory data to a new location (6) add entry to the new copy of the directory, pointing to the new file (7) modify directory inode to point to first data block [atomic] If a crash occurs during any of the non-atomic steps, the recovery routine just needs to detect unreferenced inodes and remove them, which fsck does already. Main challenge: implement (4) and (7) atomically given only atomic write of a single byte. Inode numbers and block numbers have to be at least 3-4 bytes on a storage device with 60 GB. Idea: put a shadow copy of the free list and of each directory node at an easy to find location (e.g. given address 0XXXXXXX the shadow copy resides at 1XXXXXXX), and put a bit in the main copy indicating that it is being updated. If a crash occurs during update, the recovery routine can detect that it is corrupt and use the shadow copy instead. 3. (1) - protect against denial of service attacks: reject SYN floods, ping attacks and other nastygrams, not allowing such packets to make it to the OS - speed up the common case: put a mini web server with the most frequently accessed pages (preprocessed into TCP segments with precomputed checksums) onto the network controller (2) The OS could require that the untrusted code be implemented in a type-safe language like Java. When loading new code into the interrupt handler, the OS would run a bytecode verifier on it, and then compile it into native code with a JIT-like mechanism. The code could declare its memory requirements beforehand to eliminate the need for dynamic memory allocation and garbage collection. The verifier would have to guarantee that the code does not attempt to access random areas of memory, which is easy in the case of Java code. A harder challenge is ensuring that the code does not use too many CPU cycles, which cannot be guaranteed statically. However the OS could set a hard limit on the amount of time any interrupt handler spends doing its job, and could use a timer (at a higher interrupt priority) to kill a runaway handler.