High-order bits from the 1997 Internet Cache Workshop, Boulder, CO
Names in brackets are primary authors of papers that are the basis of
each comment.
Overview/Main Points
- Former USSR and eastern-bloc countries: inter-country
connectivity is tens to hundreds of Kbit/s where it exists at
all. Big opportunity for distillation proxies. Caches are
arranged in a serious hierarchy, but hit rates are low...suspect
either user locality or cache size is insufficient. Funding to
make things better is a challenge, especially for research
institutions. [Krashakov]
- Interestingly, in Armenia, they had copyright problems with a
Russian content provider when trying to mirror the site, but not
when caching it with Harvest! Is there a legal distinction
between mirrors and caches?? Other big problems include
consistency/freshness management and web sites that do IP-based
authentication, which fails through a chain of proxies because
the server sees the proxy's address rather than the client's.
- School systems, especially K-12, are relying heavily on caching
and prefetching. Major subsidized projects in Tennessee (a
turnkey "cache box" for individual schools) and Washington state
(interconnected caches running on high-end multi-headed servers
with 10GB disks) are aimed at giving schools faster access to
Internet content.
- "Squid Proxy Analysis" paper from UBC studied Silicon Valley
proxy cache (the big node in the NLANR cache hierarchy), internal
corporate cache, and community college.
Through simulation, they found that greater # hits per day
implies greater hit rate
(as we found for greater # users...not quite the same thing?),
but their hit rates flattened out at 35%, even for big caches.
Maybe they didn't simulate large enough user pools? (Their one
"big" workload, the NLANR cache, is excessively prone to
thrashing since it only serves other caches, not a single
end-user population.)
- The New Zealand cache (which communicates with the NLANR cache)
warns that a hierarchy of caches can actually make things worse
if the cumulative bandwidth-delay product is very high because of
the intermediate nodes. Cut-through routing and persistent HTTP
connections help (rough numbers below).
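A back-of-envelope illustration of why depth hurts: with
store-and-forward, each intermediate cache receives the whole
object before passing it on, so serialization cost is paid per
hop; cut-through relays bytes as they arrive and pays it roughly
once. A minimal sketch, with purely illustrative link speeds and
delays:

    def store_and_forward(obj_bytes, hops, link_bps, hop_delay):
        # each hop: full serialization plus per-hop latency
        return hops * (8 * obj_bytes / link_bps + hop_delay)

    def cut_through(obj_bytes, hops, link_bps, hop_delay):
        # pipelined: one serialization plus per-hop latency
        return 8 * obj_bytes / link_bps + hops * hop_delay

    # A 50 KB page over three 128 kbit/s hops, 100 ms each:
    # store_and_forward(50_000, 3, 128_000, 0.1) -> ~9.7 seconds
    # cut_through(50_000, 3, 128_000, 0.1)       -> ~3.4 seconds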
- Singapore will be deploying a PICS (i.e. censorship) proxy soon.
An interesting technical detail is the need to cache "PICS
labels", which presumably map PICS tags to descriptions of what
the tags say about specific content. This is comparable to
caching DNS resolutions as Harvest does, and important for the
same reasons.
- Krishnamurthy and Wills are beginning to investigate something I
thought of last year: clients piggyback invalidate or update
requests on requests to either caches or servers, and/or servers
piggyback invalidates/updates on replies to clients. (Similar to
release-consistent DSM; a rough sketch follows.) They haven't
investigated how to limit the size of the piggybacks, how to
select which subsets to piggyback if the full set of resources is
too large, whether the client or server makes that selection,
etc., and of course those are the things that determine both the
overhead of doing this and the consistency model clients will
see. They do mention using a compressed (or key-compressed, in
the DB sense) representation of resource names to reduce the size
of the piggyback payload.
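A minimal sketch of the server-to-client direction, assuming the
server tracks a version number per resource; all names here are
hypothetical, not from the Krishnamurthy/Wills paper:

    class Server:
        def __init__(self):
            self.versions = {}  # resource name -> current version
            self.bodies = {}    # resource name -> content

        def publish(self, name, body):
            self.versions[name] = self.versions.get(name, 0) + 1
            self.bodies[name] = body

        def get(self, name, client_versions, max_piggyback=100):
            # Serve `name` and piggyback invalidations for anything
            # the client reports caching at a stale version. Capping
            # the summary and reply sizes is exactly the open issue
            # noted above.
            stale = [n for n, v in client_versions.items()
                     if self.versions.get(n, 0) > v][:max_piggyback]
            return self.bodies.get(name), stale

    class Client:
        def __init__(self, server):
            self.server = server
            self.cache = {}     # name -> (version, body)

        def fetch(self, name):
            # piggyback a summary of what we cache onto the request
            summary = {n: v for n, (v, _) in self.cache.items()}
            body, stale = self.server.get(name, summary)
            for n in stale:     # apply piggybacked invalidations
                self.cache.pop(n, None)
            if body is not None:
                self.cache[name] = (self.server.versions[name], body)
            return body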
- NZGate and the New Zealand cache have had some problems enticing
users to use the cache, mainly because current caching software
does not provide a way for the cache to resell effective
bandwidth with a charging model consistent with other options.
That is, it is difficult to come up with a model that lets
customers decide whether they'd rather buy cache service or buy
direct-connect bandwidth. Metering per byte doesn't seem to work
well, so they're experimenting with metering based on peak
bandwidth (suitably smoothed, aggregated, etc.) over several days
from the cache to each customer; one plausible scheme is sketched
below.
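One plausible reading of "peak bandwidth, suitably smoothed" is
percentile billing: charge on a high percentile of per-interval
throughput from the cache to the customer over a multi-day
window. The 5-minute buckets and 95th percentile below are my
assumptions, not details from the talk:

    def metered_rate(byte_counts, interval_secs=300, pct=0.95):
        # byte_counts: bytes delivered to one customer in each
        # fixed-length interval over several days. Returns the
        # smoothed peak rate in bits per second.
        rates = sorted(8 * b / interval_secs for b in byte_counts)
        # Discard the top (1 - pct) of samples so short bursts
        # don't set the price, then take the largest that remains.
        idx = min(int(pct * len(rates)), len(rates) - 1)
        return rates[idx]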
- Sally Floyd et al. have a paper on combining Web caching with
multicast to avoid redundant transmission...still to be read.
- The guys at UCL propose a cache "mesh" (rather than a hierarchy)
that works like IP routing: caches maintain "routes" to other
caches and exchange routing updates, so if you miss on a page,
you know explicitly which other cache to go to for it. What's
interesting about this, to me, is that they're proposing to solve
the "application-level routing problem" that would have to be
solved to have meshes of cooperating proxies as well. I didn't
have the whole paper locally, so I didn't read how they did this
part; my guess at the idea is sketched below.
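A guess at the mechanism, keyed here by URL prefix and merged
distance-vector style; entirely hypothetical, since I haven't
read how the UCL paper actually does it:

    class MeshCache:
        def __init__(self, name):
            self.name = name
            self.store = {}    # url -> body
            self.routes = {}   # url prefix -> (next_hop, cost)

        def insert(self, url, body, prefix):
            self.store[url] = body
            self.routes[prefix] = (self, 0)  # we serve this prefix

        def advertise(self):
            # tell a neighbor what we can reach, one hop further out
            return {p: (self, c + 1)
                    for p, (_, c) in self.routes.items()}

        def learn(self, neighbor_routes):
            # distance-vector merge: adopt any cheaper route
            for prefix, route in neighbor_routes.items():
                if (prefix not in self.routes
                        or route[1] < self.routes[prefix][1]):
                    self.routes[prefix] = route

        def lookup(self, url):
            if url in self.store:
                return self.store[url]
            # on a miss we know explicitly which cache to try next
            # (loop suppression omitted for brevity)
            for prefix, (hop, _) in self.routes.items():
                if url.startswith(prefix) and hop is not self:
                    return hop.lookup(url)
            return None  # no route: fall through to the origin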
- Similarly, Jeffery et al.: hierarchical caching is not scalable
because each higher level needs a bigger cache. Bullshit: once an
object is in some leaf caches, it is fine to evict it from
higher-level ones. This is the L1/L2 inclusion problem in the
context of Web caches; their argument is only true if
higher-level caches blindly cache everything (and perhaps not
even then, if only a subset of a cache's children are going to
request the object anyway). A toy non-inclusive parent is
sketched below.
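A toy illustration of the rebuttal, in the style of an exclusive
(victim) L2 cache; hypothetical, not from the Jeffery paper:

    class ExclusiveParent:
        def __init__(self, fetch_from_origin):
            self.store = {}  # url -> body
            self.fetch_from_origin = fetch_from_origin

        def get_for_child(self, url):
            if url in self.store:
                # The child will now hold the object, so the parent
                # drops its copy rather than growing without bound;
                # it gets the object back if the child evicts it.
                return self.store.pop(url)
            return self.fetch_from_origin(url)  # pass through

        def accept_victim(self, url, body):
            # Children hand back objects they evict, which is how
            # an exclusive upper level gets populated at all.
            self.store[url] = body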
- Michael Schwartz, of @Home and formerly Harvest, has this to say
about how @Home arranges its cache hierarchy, and I quote: "In
terms of availability, we achieve redundant cache service at the head
ends by running multiple proxy servers, each of which answers a subset
of proxy requests. Browsers execute proxy auto-configuration
scripts that hash on URLs to select a proxy, with mechanisms for
timeout/failover." Sound familiar? His paper says ICP
will not survive long-term (since it doesn't enable "formal
service agreements" as opposed to informal information-sharing
between caches).
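The selection logic in miniature: hash the URL to pick one of N
proxies, failing over around the ring if the choice is down. The
real thing lives in a JavaScript proxy auto-configuration (PAC)
script; this Python rendering, with made-up proxy names and a
stubbed liveness check, just shows the shape of it:

    import zlib

    PROXIES = ["proxy1.example:3128", "proxy2.example:3128",
               "proxy3.example:3128"]

    def is_alive(proxy):
        return True  # stand-in for a real timeout/health check

    def pick_proxy(url):
        start = zlib.crc32(url.encode()) % len(PROXIES)
        # try the hashed choice first, then fail over in order
        for i in range(len(PROXIES)):
            candidate = PROXIES[(start + i) % len(PROXIES)]
            if is_alive(candidate):
                return candidate
        return None  # all proxies down: browser goes DIRECT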
- Static Caching: Define today's working set to be exactly the
same as yesterday's, based on server logs and a ranking metric
for "most valuable" documents (ratio of # references to bytes)
such that the working set just fills the cache. Recompute the
working set each day, but between recomputes, don't cache
anything not in the WS (sketched below). They claim that
increasing cache size beyond 64 MB is not useful...no way this is
true. (They used server logs from various servers, including UCB,
but no client traces or proxy traces.) They talk about
compressing documents on the server and/or in the cache, but not
about having the cache compress them lazily as it fetches them. A
nice idea, but the work is not mature.
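The daily recompute as I understand it, with a made-up log
format:

    def compute_working_set(log_entries, cache_bytes):
        # log_entries: iterable of (url, size_bytes), one per
        # reference in yesterday's server logs.
        refs, sizes = {}, {}
        for url, size in log_entries:
            refs[url] = refs.get(url, 0) + 1
            sizes[url] = size
        # rank by value = references per byte, most valuable first
        ranked = sorted(refs, key=lambda u: refs[u] / sizes[u],
                        reverse=True)
        working_set, used = set(), 0
        for url in ranked:
            if used + sizes[url] <= cache_bytes:  # fill greedily
                working_set.add(url)
                used += sizes[url]
        return working_set

    # Between recomputes the cache admits only URLs in the set:
    # cache iff url in working_set, otherwise pass through.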