A Hierarchical Object Cache
USC and UC Boulder; Armando's comments on lessons learned and how they apply
to TCS proxy
One-line summary
Various important (and nonobvious) lessons from implementing Harvest cache.
Main ideas
- A detailed timing breakdown shows that TCP connection setup accounts for
15 ms of the 20 ms Harvest response time seen by clients.
- Getting nonblocking disk I/O and the select loop right required tuning
for different systems.
- Transparency was hardest to get right, because of the initial assumption
that a URL plus its MIME headers uniquely names an object; in practice,
MIME headers vary widely across clients.
- Large MIME headers that had to be fragmented sometimes caused timeouts,
producing faults that could not be masked from the user.
- Some noncompliant HTTP servers close the client connection before reading
all of the MIME headers!
- DNS "negative caching" timeouts were too long, leading users to report
that DNS lookups worked fine until Harvest was deployed.
- Browser-specific, dynamically generated Web pages hurt hit rates and,
for correctness, really require MIME headers to be included in the
comparison.
- Client and server implementation differences, noncompliance with standards,
and vendor interoperability problems in general have forced tradeoffs among
efficiency/performance, design cleanliness, and operational transparency.
- Keeping metadata in memory and limiting the VM image size to avoid
page faults was an important win.
- Monolithic filesystems are the wrong model for the evolving Internet:
the feature set is overkill for many applications, implementations are complex
and nonmodular, and vendor interoperability is harder since components
are "larger" and more tightly coupled to the rest of the OS...
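The select-loop point can be made concrete with a minimal single-process event
loop of the style Harvest uses: one process multiplexes all connections over
nonblocking descriptors instead of blocking per client. This is an illustrative
Python sketch using the stdlib selectors module, not Harvest's actual C code;
the echo/collect handlers and event count are made up for the demo.

```python
import selectors
import socket

def run_event_loop(sel, n_events):
    """Dispatch ready file objects until n_events handlers have run.
    A single process services every connection; nothing blocks on one
    slow client (the same structure as a C select() loop)."""
    handled = 0
    while handled < n_events:
        for key, _mask in sel.select(timeout=1.0):
            key.data(key.fileobj)   # key.data holds the handler callback
            handled += 1
    return handled

# Demo: echo between the two ends of a nonblocking socketpair.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel = selectors.DefaultSelector()
replies = []

def echo(sock):
    data = sock.recv(4096)          # safe: only called when readable
    sock.sendall(b"echo:" + data)

def collect(sock):
    replies.append(sock.recv(4096))

sel.register(a, selectors.EVENT_READ, echo)
sel.register(b, selectors.EVENT_READ, collect)

b.sendall(b"hi")                    # makes `a` readable -> echo runs,
run_event_loop(sel, 2)              # whose reply makes `b` readable
a.close(); b.close()
```

The "tuning for different systems" lesson shows up even here: selectors picks
epoll, kqueue, or plain select depending on the platform.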
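The transparency and hit-rate bullets pull in opposite directions: naming an
object by URL alone serves the wrong body for header-dependent dynamic pages,
while naming it by URL plus *all* MIME headers destroys sharing because headers
vary widely across clients. One compromise is to fold only a normalized subset
of "significant" headers into the object name. This sketch is a hypothetical
illustration of that compromise; the header set chosen is an assumption, not
what Harvest does:

```python
import hashlib

# Headers assumed (for this sketch) to change the response body.
# Client-specific noise like User-Agent is deliberately excluded so
# different browsers can still share one cache entry.
SIGNIFICANT = ("accept-language", "accept-encoding")

def cache_key(url, headers):
    """Name an object by URL plus a normalized subset of its
    MIME request headers."""
    norm = {k.lower().strip(): v.strip() for k, v in headers.items()}
    parts = [url] + [h + "=" + norm.get(h, "") for h in SIGNIFICANT]
    return hashlib.sha1("\x00".join(parts).encode()).hexdigest()
```

Two clients that differ only in ignored headers hash to the same entry, while
a change in a significant header yields a distinct one.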