Caching

Latency-based caching

Using server download latency alone as the eviction metric does worse than LRU, LFU, and Size!
Hybrid: ((ConnTime + WB/Bandwidth) * Nrefs^WN) / Size, where WB and WN are tunable weights.
ConnTime and Bandwidth are estimated using TCP-like smoothing.
They modified Harvest to use their algorithm (it actually uses buckets, not servers).
Note: estimates are kept per server, not per URL, so they stay robust even when per-URL data is stale.
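A minimal Python sketch of the hybrid policy as described in the talk. Assumptions (ours, not theirs): "TCP-like smoothing" is taken to be an exponentially weighted moving average with gain 1/8, as in TCP's smoothed-RTT estimator; the document with the lowest hybrid value is the one evicted; and the names `ServerEstimate`, `CachedDoc`, `hybrid_value`, and `choose_victim` are hypothetical. This is not their Harvest code, which uses buckets rather than servers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical names throughout; not the authors' Harvest implementation.
# Assumed smoothing gain, borrowed from TCP's classic SRTT estimator (1/8).
ALPHA = 0.125


@dataclass
class ServerEstimate:
    """Per-server (not per-URL) smoothed estimates."""
    conn_time: Optional[float] = None  # seconds
    bandwidth: Optional[float] = None  # bytes per second

    def observe(self, conn_time: float, bandwidth: float) -> None:
        # The first sample seeds the estimate; later samples are blended in
        # with an exponentially weighted moving average.
        if self.conn_time is None or self.bandwidth is None:
            self.conn_time, self.bandwidth = conn_time, bandwidth
        else:
            self.conn_time += ALPHA * (conn_time - self.conn_time)
            self.bandwidth += ALPHA * (bandwidth - self.bandwidth)


@dataclass
class CachedDoc:
    url: str
    server: str
    size: int   # bytes
    nrefs: int  # references while in the cache


def hybrid_value(doc: CachedDoc, est: ServerEstimate, wb: float, wn: float) -> float:
    # ((ConnTime + WB/Bandwidth) * Nrefs^WN) / Size
    assert est.conn_time is not None and est.bandwidth is not None
    return (est.conn_time + wb / est.bandwidth) * (doc.nrefs ** wn) / doc.size


def choose_victim(docs: List[CachedDoc], estimates: Dict[str, ServerEstimate],
                  wb: float, wn: float) -> CachedDoc:
    # Assumed rule: evict the lowest-valued document, i.e. the one that is
    # quick to refetch, rarely referenced, or large.
    return min(docs, key=lambda d: hybrid_value(d, estimates[d.server], wb, wn))
```

Because estimates are keyed by server, a document that has never been fetched before still gets a sensible value as long as its server has been seen.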
The hybrid formula is more robust across a variety of workloads (trace playback as well as real users), and on average gives better end-to-end latency than minimizing any single metric.
Minimizing cache misses (server hits) is sensitive to WB; minimizing bandwidth is sensitive to WN; but minimizing end-to-end time is insensitive to both!
"With 120K refs from aol.com, results inconclusive". (AOL gave them traces!) Hypothesis: less locality (or a different kind of locality) than in the BU and VT traces.
Problems:
- Self-selection: some people tend not to visit docs whose URLs indicate they're far away. Hypothesis: we may be underestimating the improvement.
- Variance in download times etc. is high in practice.
- How about caches that adapt their algorithms to traffic patterns? (future work)

Points I brought up:

Modified Harvest, cool! Customizable eviction?
Sharing traces and playback engines
How did you get AOL traces?
How big was the cache, and how does the performance of each algorithm depend on size? They used a cache that was 10% of "infinite size". Relative performance is invariant to cache size down to about 1% of "infinite size", at which point LFU gets much better.

Action items: We should share Harvest mods (they are very interested in partitioned Harvest) and traces (they don't distribute traces on CD-ROM, but keep them online, queryable via a Java applet that they'd be happy to give us).

Finding salient features by looking for word clusters

Goal: extract "word clusters" from documents, then use them to perform the query "other documents like this one" (Excite does something like this)
Word clusters are derived from word counts, syntactic analysis, etc., with no semantics or "prior knowledge".
Future work: rank-ordering rather than raw word counts; word stemming; combinations of terms (logical connectives, etc.); hierarchical cluster refinement.
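The talk didn't say exactly how clusters are matched against documents, so the following Python sketch simply assumes the clusters already exist (given as sets of words) and answers "other documents like this one" by comparing per-cluster raw word counts with cosine similarity. Cosine similarity is our stand-in, not necessarily their measure, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def word_counts(text: str) -> Counter:
    # Raw counts of lowercase alphabetic tokens; no stemming (listed as future work).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cluster_profile(counts: Counter, clusters: List[Set[str]]) -> List[int]:
    # How strongly the document hits each (hypothetical, precomputed) word cluster.
    return [sum(counts[w] for w in cluster) for cluster in clusters]


def cosine(a: List[int], b: List[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def more_like_this(query_doc: str, corpus: Dict[str, str],
                   clusters: List[Set[str]]) -> List[str]:
    # Rank corpus documents by similarity of their cluster profiles to the query's.
    q = cluster_profile(word_counts(query_doc), clusters)
    scores = {name: cosine(q, cluster_profile(word_counts(text), clusters))
              for name, text in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)
```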
Problems:
- No word-sense disambiguation (i.e., by context), since the approach is purely statistical. (Proposed solution: since clusters are small, try to determine semantic relationships between the words in a cluster using a lexical database; the same can be done on the original query to compare semantic similarity.)
- Subject to "spam words", outliers, etc. (The mechanism above also gives a formal metric for "cohesiveness", which should throw these out.)
Conclusion: categorization of documents less useful than word clusters for doing "similarity" searches
Flaw: a big leap from statistics/syntax to semantics. The natural language folks have tried this time and again and most semantic efforts have foundered on the amount of context really needed.
Flaw: document sample size is 85. Yes, 85.

NSTP - notification service transport protocol for groupware (Lotus)

Toolkit for "synchronous groupware", using Java or C++. Looks similar to what McCanne et al. are doing with MASH and object libraries, but far more stupid.
Single-server, multiple-client model with one TCP connection per client. Forget it.
"What about consistency in multiuser apps": "It's an application level problem" (they provide locks, etc.)
The "What about scalability?" question got a fudgy answer and handwaving (sounds like it's not designed for wide-area anything).
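To make the architecture concrete, here is a hypothetical in-memory Python sketch of the single-server model: one outbox per client (standing in for one TCP connection each), a shared state table, and application-managed locks. None of these names come from NSTP, and the real wire protocol isn't modeled. Every update is copied into every client's outbox, so per-update work at the server grows linearly with the number of clients, which is where we'd expect the scalability trouble to show up.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Hypothetical notification record: (thing, new_value, changed_by)
Notification = Tuple[str, str, str]


class NotificationServer:
    """Hypothetical central server; each outbox stands in for one TCP connection."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.outboxes: Dict[str, Deque[Notification]] = {}
        self.locks: Dict[str, str] = {}  # thing -> client holding the lock

    def connect(self, client: str) -> None:
        self.outboxes[client] = deque()

    def acquire(self, client: str, thing: str) -> bool:
        holder = self.locks.get(thing)
        if holder is None or holder == client:
            self.locks[thing] = client
            return True
        return False

    def release(self, client: str, thing: str) -> None:
        if self.locks.get(thing) == client:
            del self.locks[thing]

    def set(self, client: str, thing: str, value: str) -> bool:
        # Consistency is left to the application: an update is refused only
        # if some *other* client currently holds the lock on this thing.
        holder = self.locks.get(thing)
        if holder is not None and holder != client:
            return False
        self.state[thing] = value
        for box in self.outboxes.values():  # fan out to every connected client
            box.append((thing, value, client))
        return True
```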
Demo: playing tic-tac-toe and chatting using Java applets built on top of the notification toolkit.
Sources available for noncommercial use at nstp.research.lotus.com
This doesn't sound useful to me.

WebRule

Web server plug-in that contains a rules database that allows rules to be triggered by actions.
Actions can be local (startup, shutdown, URL access request, permissions violation, etc.) or remote (another WebRule server sends you an action request, rule update, etc.)
Actions trigger rules, which are basically little scripts with various attributes attached (permissions, etc.)
Rule example: "When such-and-such page changes [the action part], go get it, plus the following other pages, and then run them through this table-merging program [the rule part]".
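A hypothetical Python sketch of the action/rule dispatch, loosely following the page-change example above. Rules are registered per action kind, carry a permission attribute (here, just whether remote actions may fire them), and only admins may install them. The script records the steps it would take instead of actually fetching pages; `EXTRA_PAGES`, `/rates.html`, and all class and function names are made up for illustration and are not WebRule's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str    # e.g. "page_changed", "startup", "url_access"
    source: str  # "local", or the name of the remote server that sent it
    payload: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    trigger: str                          # action kind that fires this rule
    script: Callable[[Action], List[str]]
    allow_remote: bool = False            # hypothetical permission attribute


class RuleEngine:
    def __init__(self) -> None:
        self.rules: Dict[str, List[Rule]] = {}

    def add_rule(self, rule: Rule, is_admin: bool) -> None:
        # Mirrors the observation that only admins can install rules.
        if not is_admin:
            raise PermissionError("only admins may install rules")
        self.rules.setdefault(rule.trigger, []).append(rule)

    def dispatch(self, action: Action) -> List[str]:
        log: List[str] = []
        for rule in self.rules.get(action.kind, []):
            if action.source != "local" and not rule.allow_remote:
                continue  # remote actions may not fire this rule
            log.extend(rule.script(action))
        return log


EXTRA_PAGES = ["/rates.html"]  # hypothetical "other pages" named by the rule


def refetch_and_merge(action: Action) -> List[str]:
    # Records the steps instead of performing real fetches.
    pages = [action.payload["url"], *EXTRA_PAGES]
    return [f"fetch {p}" for p in pages] + ["run table-merge"]
```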
Can build little groups of collaborating WebRule servers to support such services. Examples they gave weren't terrifically well motivated but I think it has potential.
Flaws:
- It's a server plug-in written in Java and C, but this application clearly has more leverage on the proxy (imagine a scalable proxy augmented with the rule/action paradigm).
- Not scalable for the obvious reasons, and also not clear what happens to scalability if rules cause lots of cross-server interactions.
- As far as I can tell, individual users cannot modify or upload rules -- only WebRule admins can.
Wouldn't the scalable proxy be a great place to run a rule/action system like this one?

Pseudo-Serving

Idea: let clients "bid" CPU/disk resources to get faster service from servers
Interesting idea, but the implementation and simulation results are half-baked and thoroughly unconvincing, and the author didn't handle questions particularly well.
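The talk gave few details on how bids work, so this is only one naive, hypothetical reading in Python: the server keeps a priority queue and serves clients that offer more resources first, FIFO among equal bids. The `BidQueue` name and the single scalar bid are assumptions, not the author's design.

```python
import heapq
import itertools
from typing import List, Tuple


class BidQueue:
    """Hypothetical server-side queue: bigger resource bids are served first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-break among equal bids

    def submit(self, client: str, bid: float) -> None:
        # heapq is a min-heap, so negate the bid to pop the largest first.
        heapq.heappush(self._heap, (-bid, next(self._seq), client))

    def next_client(self) -> str:
        _, _, client = heapq.heappop(self._heap)
        return client
```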