A Hierarchical Object Cache
USC and UC Boulder; Armando's comments on lessons learned and how they apply
to TCS proxy
One-line summary
Various important (and nonobvious) lessons from implementing Harvest cache.
Main ideas
- A detailed timing breakdown shows that TCP connection setup accounts for
15 ms of the 20 ms Harvest response time seen by clients.
- Getting nonblocking disk I/O and the select loop right required tuning
for different systems.
- Transparency was hardest to get right, because of the initial assumption
that a URL plus its MIME headers uniquely names an object; in practice,
MIME headers vary widely across clients.
- Large MIME headers that had to be fragmented sometimes caused timeouts,
producing faults that could not be masked from the user.
- Some noncompliant HTTP servers close the client connection before reading
all of the MIME headers!
- DNS "negative caching" timeouts were too long, leading users to report
that DNS lookups worked fine until Harvest was deployed.
- Browser-specific, dynamically generated Web pages hurt hit rates and,
for correctness, really require MIME headers to be included in the
comparison.
- Client and server implementation differences, noncompliance with standards,
and vendor interoperability problems in general have forced tradeoffs among
efficiency/performance, design cleanliness, and operational transparency.
- Keeping metadata in memory and limiting the VM image size to avoid
page faults was an important win.
- Monolithic filesystems are the wrong model for the evolving Internet:
the feature set is overkill for many applications, implementations are complex
and nonmodular, and vendor interoperability is harder since components
are "larger" and more tightly coupled to the rest of the OS...
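The select-loop point can be made concrete with a minimal single-process event
loop of the style Harvest uses: one process multiplexes all connections over
nonblocking descriptors instead of blocking per client. This is an illustrative
Python sketch using the stdlib selectors module, not Harvest's actual C code;
the echo/collect handlers and event count are made up for the demo.

```python
import selectors
import socket

def run_event_loop(sel, n_events):
    """Dispatch ready file objects until n_events handlers have run.
    A single process services every connection; nothing blocks on one
    slow client (the same structure as a C select() loop)."""
    handled = 0
    while handled < n_events:
        for key, _mask in sel.select(timeout=1.0):
            key.data(key.fileobj)   # key.data holds the handler callback
            handled += 1
    return handled

# Demo: echo between the two ends of a nonblocking socketpair.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel = selectors.DefaultSelector()
replies = []

def echo(sock):
    data = sock.recv(4096)          # safe: only called when readable
    sock.sendall(b"echo:" + data)

def collect(sock):
    replies.append(sock.recv(4096))

sel.register(a, selectors.EVENT_READ, echo)
sel.register(b, selectors.EVENT_READ, collect)

b.sendall(b"hi")                    # makes `a` readable -> echo runs,
run_event_loop(sel, 2)              # whose reply makes `b` readable
a.close(); b.close()
```

The "tuning for different systems" lesson shows up even here: selectors picks
epoll, kqueue, or plain select depending on the platform.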
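The transparency and hit-rate bullets pull in opposite directions: naming an
object by URL alone serves the wrong body for header-dependent dynamic pages,
while naming it by URL plus *all* MIME headers destroys sharing because headers
vary widely across clients. One compromise is to fold only a normalized subset
of "significant" headers into the object name. This sketch is a hypothetical
illustration of that compromise; the header set chosen is an assumption, not
what Harvest does:

```python
import hashlib

# Headers assumed (for this sketch) to change the response body.
# Client-specific noise like User-Agent is deliberately excluded so
# different browsers can still share one cache entry.
SIGNIFICANT = ("accept-language", "accept-encoding")

def cache_key(url, headers):
    """Name an object by URL plus a normalized subset of its
    MIME request headers."""
    norm = {k.lower().strip(): v.strip() for k, v in headers.items()}
    parts = [url] + [h + "=" + norm.get(h, "") for h in SIGNIFICANT]
    return hashlib.sha1("\x00".join(parts).encode()).hexdigest()
```

Two clients that differ only in ignored headers hash to the same entry, while
a change in a significant header yields a distinct one.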