High order bits from 1997 Internet Cache Workshop, Boulder, CO
Names in brackets are primary authors of papers that are the basis of
each comment.
Overview/Main Points
 -  Former USSR and Eastern Bloc countries: inter-country
       connectivity is tens to hundreds of Kbit/s where it exists at
       all.  Big opportunity for distillation proxies.  Caches are
       arranged in a serious hierarchy, but hit rates are low...suspect
       either user locality or cache size is not enough.  Funding to
       make things better is a challenge, esp. for research
       institutions.  [Krashakov]
  
 -  Interestingly, in Armenia, they had
       copyright problems with a Russian content provider when trying to
       mirror the site, but not when trying to cache it with Harvest!
       Is there a legal distinction between mirrors and caches??  Other
       big problems include consistency/freshness management and web
       sites that do IP-based authentication (which breaks through a
       chain of proxies, since the origin server sees a proxy's address
       rather than the end client's).
  
 -  School systems, esp. K-12, are relying heavily on caching and
       prefetching.  Major subsidized projects in Tennessee (turnkey
       "cache box" for individual schools) and Washington state (interconnected
       caches running on high end multiheaded servers with 10GB disks)
       are aimed at giving schools faster access to Internet content.
  
 -  "Squid Proxy Analysis" paper from UBC studied Silicon Valley
       proxy cache (the big node in the NLANR cache hierarchy), internal
       corporate cache, and community college.
       Through simulation, they found that greater # hits per day
       implies greater hit rate
       (as we found for greater # users...not quite the same thing?),
       but their hit rates flattened out at 35%, even for big caches.
       Maybe they didn't simulate large enough user pools?  (Their one
       "big" workload, the NLANR cache, is excessively prone to
       thrashing since it only serves other caches, not a single
       end-user population.)
  
 -  New Zealand cache (which communicates with NLANR cache) warns
       that a hierarchy of caches can actually make things worse
       if the cumulative bandwidth-delay product is very high because of
       the intermediate nodes.  Cut-through routing and persistent HTTP
       connections help.
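
A back-of-the-envelope model of why a long chain of caches can hurt, and
why persistent connections and cut-through help.  The model and the
numbers are mine, not the NZ paper's: one extra RTT per hop for TCP
setup when connections aren't persistent, and full store-and-forward at
each hop unless the cache relays bytes as they arrive.

def chain_latency(hop_rtts_s, hop_bw_Bps, obj_bytes,
                  persistent=False, cut_through=False):
    """Rough end-to-end latency (seconds) for one miss through a proxy chain."""
    setup = 0.0 if persistent else sum(hop_rtts_s)        # TCP handshake per hop
    request = sum(hop_rtts_s)                             # request/response propagation
    if cut_through:
        transfer = obj_bytes / min(hop_bw_Bps)            # pipelined: slowest link dominates
    else:
        transfer = sum(obj_bytes / bw for bw in hop_bw_Bps)  # wait for whole object per hop
    return setup + request + transfer

# Illustrative 3-hop chain (client -> NZ cache -> US cache -> origin), 20 KB page.
rtts = [0.005, 0.150, 0.080]          # seconds per hop
bws  = [1e6, 16e3, 64e3]              # bytes/sec per hop
for p in (False, True):
    for c in (False, True):
        t = chain_latency(rtts, bws, 20_000, persistent=p, cut_through=c)
        print(f"persistent={p!s:5}  cut_through={c!s:5}  ->  {t:.2f} s")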
  
 -  Singapore will be deploying a PICS (i.e. censorship) proxy soon.
       An interesting technical detail is the need to cache "PICS labels",
       which presumably map PICS tags to descriptions of what the tags
       say about specific content.  This is comparable to caching DNS
       resolutions as Harvest does, and important for the same reasons
       (a rough sketch of such a label cache is below).
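
The label cache is presumably just a TTL cache keyed by URL, like a DNS
cache.  A minimal sketch; the TTL, the label format, and the bureau
lookup below are guesses, not details from the paper.

import time

class LabelCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}                      # url -> (label, expiry time)

    def get(self, url, fetch_label):
        """Return a cached PICS label for url, fetching from the bureau on a miss."""
        entry = self._store.get(url)
        if entry is not None and entry[1] > time.time():
            return entry[0]                   # fresh: no round trip to the label bureau
        label = fetch_label(url)              # e.g. an HTTP query to the label bureau
        self._store[url] = (label, time.time() + self.ttl)
        return label

# Usage: the proxy consults the cache before deciding whether to serve a page.
cache = LabelCache(ttl_seconds=6 * 3600)
label = cache.get("http://example.com/page.html",
                  fetch_label=lambda u: {"rating": "unrated"})   # stand-in fetcher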
  
 -  Krishnamurthy and Wills are beginning to investigate something I
       thought of last year: clients piggyback invalidate or update requests
       on requests to either caches or servers, and/or servers piggyback
       invalidates/updates on replies to clients.  (Similar to
       release-consistent DSM.)  They haven't investigated how to limit
       the size of the piggybacks, how to select which subsets to
       piggyback if the full set of resources is too large, whether the
       client or server makes that selection, etc., and of course those
       are the things that determine both the overhead of doing this and
       the consistency model clients will see.  They do mention using a
       compressed (or key-compressed, in the DB sense) representation of
       resource names to reduce the size of the piggyback payload.  A
       rough sketch of the server side is below.
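
Roughly what I imagine the server side of the piggybacking looking like;
the header name, the per-client bookkeeping, and the size cap are my
inventions, not their design.

import time

MAX_PIGGYBACK = 20                            # cap so replies don't bloat

class PiggybackServer:
    def __init__(self):
        self.mtimes = {}                      # resource path -> last-modified time
        self.last_contact = {}                # client id -> time of its previous request

    def touch(self, path):
        self.mtimes[path] = time.time()       # a resource changed on the server

    def respond(self, client_id, path):
        """Serve path, piggybacking names of resources changed since last contact."""
        since = self.last_contact.get(client_id, 0.0)
        self.last_contact[client_id] = time.time()
        changed = sorted(p for p, t in self.mtimes.items()
                         if t > since and p != path)
        headers = {"X-Piggyback-Invalidate": ",".join(changed[:MAX_PIGGYBACK])}
        return headers, f"<contents of {path}>"

class PiggybackCache:
    def __init__(self):
        self.entries = {}                     # path -> cached body

    def apply(self, headers):
        for p in filter(None, headers.get("X-Piggyback-Invalidate", "").split(",")):
            self.entries.pop(p, None)         # drop (or revalidate) stale copies

server, cache = PiggybackServer(), PiggybackCache()
server.touch("/b.html")                       # /b.html changes on the server
hdrs, body = server.respond("cacheA", "/a.html")
cache.apply(hdrs)                             # cacheA learns /b.html is stale for free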
  
 -  NZGate and the New Zealand cache have had some problems enticing
       users to use the cache, mainly because current caching software
       does not provide a way for the cache operator to resell effective
       bandwidth with a charging model consistent with other options.
       I.e. it is difficult to come up with a model that lets customers
       decide whether they'd rather buy cache service or buy
       direct-connect bandwidth.  Metering per byte doesn't seem to work
       well, so they're experimenting with metering based on peak
       bandwidth (suitably smoothed, aggregated, etc.) over several days
       from the cache to each customer (one possible formulation is
       sketched below).
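
One plausible reading of "peak bandwidth, suitably smoothed and
aggregated": bucket the bytes delivered to a customer into fixed
intervals, take a trailing moving average, and bill on the peak of that.
The interval and window sizes below are my guesses, not NZGate's actual
formula.

def smoothed_peak_bps(samples, bucket_s=300, window_buckets=12):
    """samples: iterable of (unix_time, bytes_sent) records for one customer."""
    buckets = {}
    for t, nbytes in samples:
        b = int(t // bucket_s)
        buckets[b] = buckets.get(b, 0) + nbytes           # bytes per 5-minute bucket
    if not buckets:
        return 0.0
    lo, hi = min(buckets), max(buckets)
    rates = [buckets.get(i, 0) * 8 / bucket_s for i in range(lo, hi + 1)]  # bits/sec
    peak = 0.0
    for i in range(len(rates)):
        window = rates[max(0, i - window_buckets + 1):i + 1]
        peak = max(peak, sum(window) / len(window))       # 1-hour moving average, then max
    return peak

# Example: a few transfers logged as (timestamp, bytes) over ~25 hours.
log = [(0, 2_000_000), (200, 500_000), (3600, 8_000_000), (90000, 1_000_000)]
print(f"billable rate: {smoothed_peak_bps(log) / 1000:.1f} kbit/s")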
  
 -  Sally Floyd et al. have a paper on combining Web caching with
       multicast to avoid redundant transmission...still to be read.
  
 -  The guys at UCL propose a cache "mesh" (rather than a hierarchy)
       that works like IP routing: caches maintain "routes" to other
       caches and exchange routing updates, so if you miss on a page,
       you know explicitly which other cache to go to for it.  What's
       interesting about this, to me, is that they're proposing to solve
       the "application-level routing" problem, which would also have to
       be solved to build meshes of cooperating proxies.  I didn't have
       the whole paper locally, so I didn't read how they did this part.
       (An illustrative sketch of the routing idea is below.)
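
An illustrative sketch of application-level routing between caches.
Keying routes on hostnames and merging neighbors' advertisements
distance-vector style are my simplifications; I don't know what UCL
actually routes on.

from urllib.parse import urlparse

class MeshCache:
    def __init__(self, name):
        self.name = name
        self.store = {}                       # url -> body
        self.routes = {}                      # hostname -> (next-hop cache, cost)

    def advertise(self):
        """Routes this cache offers its neighbors: what it holds locally, at cost 0."""
        return {urlparse(u).hostname: (self, 0) for u in self.store}

    def merge_update(self, neighbor, update, link_cost=1):
        """Keep the cheapest known route for each hostname a neighbor advertises."""
        for host, (_, cost) in update.items():
            new_cost = cost + link_cost
            if host not in self.routes or new_cost < self.routes[host][1]:
                self.routes[host] = (neighbor, new_cost)

    def lookup(self, url):
        if url in self.store:
            return self.store[url]            # local hit
        route = self.routes.get(urlparse(url).hostname)
        if route is not None:
            return route[0].lookup(url)       # explicit next hop: no blind probing of siblings
        return None                           # no route: go to the origin server

# B learns that A is the cache to ask for example.com pages.
a, b = MeshCache("A"), MeshCache("B")
a.store["http://example.com/x.html"] = "<html>...</html>"
b.merge_update(a, a.advertise())
print(b.lookup("http://example.com/x.html"))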
  
 -  Similarly, Jeffery et al.: hierarchical caching is not scalable
       because each higher level needs a bigger cache.  Bullshit: once
       an object is in some leaf caches, it is fine to evict it from
       higher-level ones.  This is the L1/L2 inclusion problem in the
       context of Web caches; their argument is only true if higher-level
       caches blindly cache everything (and perhaps not even then, if
       only a subset of their children are going to request the object
       anyway).
  
 -  Michael Schwartz, of @Home and formerly Harvest, has this to say
       about how @Home arranges its cache hierarchy, and I quote: "In
       terms of availability, we achieve redundant cache service at the head
       ends by running multiple proxy servers, each of which answers a subset
       of proxy requests. Browsers execute proxy auto-configuration
       scripts that hash on URLs to select a proxy, with mechanisms for
       timeout/failover."  Sound familiar?  His paper says ICP
       will not survive long-term (since it doesn't enable "formal
       service agreements" as opposed to informal information-sharing
       between caches).
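
A sketch of the hashing idea in that quote.  Real proxy auto-config
scripts are JavaScript; the proxy names and the health check here are
placeholders.

import hashlib

PROXIES = ["proxy1.example.net:3128",
           "proxy2.example.net:3128",
           "proxy3.example.net:3128"]

def is_alive(proxy):
    return True                               # stand-in for a real timeout/health check

def select_proxy(url, proxies=PROXIES):
    """Map a URL deterministically to a proxy, falling over to the next one on failure."""
    h = int(hashlib.md5(url.encode()).hexdigest(), 16)
    start = h % len(proxies)                  # same URL -> same proxy, so its cache stays hot
    for i in range(len(proxies)):
        candidate = proxies[(start + i) % len(proxies)]
        if is_alive(candidate):
            return candidate
    return "DIRECT"                           # all proxies down: go straight to the origin

print(select_proxy("http://www.example.com/index.html"))

Note that plain modulo hashing reshuffles almost every URL-to-proxy
assignment whenever the proxy list changes; that is the price of this
simple partitioning scheme.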
  
 -  Static Caching:  Define today's
       working set to be exactly the same as yesterday's, based on
       server logs and a ranking metric for "most valuable" documents
       (ratio of # references to bytes) such that the working set just
       fills the cache.  Recompute the working set each day, but between
       recomputations, don't cache anything not in the WS.  They claim
       that increasing cache size beyond 64 MB is not useful...no way
       this is true.  (They used server logs from various servers,
       including UCB, but no client traces or proxy traces.)  They talk
       about compressing documents on the server and/or in the cache,
       but not about having the cache compress them lazily as it fetches
       them.  A nice idea, but the work is not mature.  (The policy is
       sketched below.)
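
As I understand it, the policy boils down to something like this; the
log format and field names are invented for illustration.

def compute_working_set(yesterdays_log, cache_bytes):
    """yesterdays_log: iterable of (url, size_bytes) records, one per reference."""
    refs, sizes = {}, {}
    for url, size in yesterdays_log:
        refs[url] = refs.get(url, 0) + 1
        sizes[url] = size
    ranked = sorted(refs, key=lambda u: refs[u] / sizes[u], reverse=True)
    working_set, used = set(), 0
    for url in ranked:                        # greedily fill the cache by value density
        if used + sizes[url] <= cache_bytes:
            working_set.add(url)
            used += sizes[url]
    return working_set

def should_cache(url, working_set):
    """Between daily recomputations, cache nothing outside the working set."""
    return url in working_set

# Toy log and a small cache: the big, rarely referenced file gets excluded.
log = [("/a.gif", 2_000)] * 50 + [("/big.ps", 900_000)] * 3 + [("/b.html", 8_000)] * 10
print(sorted(compute_working_set(log, cache_bytes=500_000)))

The refs/bytes ranking favors small, frequently referenced objects, so
the chosen working set tends to be much smaller than the full document
population.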
 