Caching
Latency-based caching
  -  Caching on server download latency alone does worse than LRU, LFU,
        and Size!
  
 -  Hybrid: ((ConnTime + WB/Bandwidth)*(Nrefs^WN))/Size
  
 -  Estimate ConnTime and Bandwidth using TCP-like smoothing (see the
        sketch after this list)
  
 -  Modified Harvest to use their alg (actually uses buckets, not
       servers).
  
 -  Note: per-server, not per-URL.  More robust estimates even when
       per-URL data is stale.
  
 -  Hybrid formula is more robust across a variety of workloads
        (trace playback as well as real users); in terms of e2e latency it
        does better on average than minimizing any single metric
  
 -  Minimizing cache misses (server hits) is sensitive to WB; minimizing
        Bandwidth is sensitive to WN; but minimizing e2eTime is
        insensitive to both!
  
 -  "With 120K refs from aol.com, results inconclusive".  (AOL gave
       them traces!)  Hypothesis: less locality (or different locality)
       compared to BU and VT traces.
  
 -  Problems:
       
         -  Self-selection: some people tend not to visit docs whose URLs
              indicate they're far away.  Hypothesis: we may be underestimating
              the improvement.
         
          -  Variance in download times etc. is high in practice.
         
          -  How about caches that adapt their algs according to
               traffic patterns? (future)
       
 
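A minimal sketch (Python; mine, not the authors') of how the hybrid value
and the per-server estimates might fit together.  The smoothing gain and
the default WB/WN weights are illustrative placeholders, not values from
the talk:

    # Per-server (not per-URL) estimates, smoothed TCP-style, feeding the
    # hybrid value ((ConnTime + WB/Bandwidth) * Nrefs^WN) / Size.
    ALPHA = 0.125                        # smoothing gain, by analogy with TCP SRTT

    class ServerEstimate:
        def __init__(self, conn_time=1.0, bandwidth=10000.0):
            self.conn_time = conn_time   # seconds to open a connection
            self.bandwidth = bandwidth   # bytes per second

        def update(self, measured_conn_time, measured_bandwidth):
            # exponential smoothing of each new measurement
            self.conn_time += ALPHA * (measured_conn_time - self.conn_time)
            self.bandwidth += ALPHA * (measured_bandwidth - self.bandwidth)

    def hybrid_value(doc_size, nrefs, est, WB=8192.0, WN=0.9):
        """Value of keeping a cached document; evict the one whose value
        is smallest (cheap to re-fetch, rarely referenced, large)."""
        return (est.conn_time + WB / est.bandwidth) * (nrefs ** WN) / doc_size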
 
Points I brought up:
  -  Modified Harvest, cool! Customizable eviction?
  
 -  Sharing traces and playback engines
  
 -  How did you get AOL traces?
  
 -  How big is the cache, and how does the performance of each algorithm
        depend on cache size?  They used a cache that was 10% of "infinite
        size".  Relative performance is invariant to cache size down to
        about 1% of "infinite size", at which point LFU gets much better.
        (Sketch of this kind of size sweep below.)
 
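A rough sketch (mine) of the kind of trace-playback sweep that question is
about: replay the reference trace at several cache sizes, expressed as a
fraction of the "infinite" (hold-everything) size, and compare hit rates.
LRU here is just a stand-in for whichever algorithm is being measured:

    from collections import OrderedDict

    def replay(trace, capacity):
        """Replay (url, size) references through an LRU cache of
        `capacity` bytes and return the hit rate."""
        cache, used, hits = OrderedDict(), 0, 0
        for url, size in trace:
            if url in cache:
                cache.move_to_end(url)           # refresh recency
                hits += 1
                continue
            if size > capacity:
                continue                         # too big to cache at all
            while used + size > capacity:
                _, evicted_size = cache.popitem(last=False)   # drop LRU entry
                used -= evicted_size
            cache[url] = size
            used += size
        return hits / len(trace)

    def sweep(trace):
        # "infinite" size = bytes needed to hold every distinct URL at once
        infinite = sum({url: size for url, size in trace}.values())
        for frac in (0.01, 0.05, 0.10, 0.50, 1.00):
            print("%3d%% of infinite: hit rate %.3f"
                  % (100 * frac, replay(trace, frac * infinite)))
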
Action items: We should share Harvest mods (they are very
interested in partitioned Harvest) and traces (they don't distribute on
CDROM, but have them online and queryable by Java applet, which they'd
be happy to give us).
Finding salient features by looking for word clusters
  -  Goal: extract "word clusters" from documents, then use them to
       perform the query "other documents like this one" (Excite does
       something like this)
  
 -  get "word clusters" based on word counts, syntactic analysis,
        etc. -- no semantics or "prior knowledge" (see the sketch after
        this list)
  
 -  future: rank-ordering rather than raw word counts; word stemming;
       combinations of terms (logical connectives, etc); hierarchical
       cluster refinement
  
 -  Problems:
       
          -  no word-sense disambiguation (i.e., by context), since it's
               purely statistical (solution: since cluster size is small,
               try to determine semantic relationships between words in a
               cluster using a lexical database; can also do the same on
               the original query and compare semantic similarity)
         
          -  subject to "spam words", outliers, etc. (the above mechanism
               also gives a formal metric for "cohesiveness", which should
               throw these out)
       
 
   -  Conclusion: categorization of documents less useful than word
       clusters for doing "similarity" searches
  
 -  Flaw: a big leap from statistics/syntax to semantics. The natural
       language folks have tried this time and again and most semantic efforts
       have foundered on the amount of context really needed.
  
 -  Flaw: document sample size is 85.  Yes, 85.
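
A minimal sketch (mine, not the authors' algorithm) of the purely
statistical flavor of "other documents like this one": take the top-scoring
terms of a seed document as its cluster, then rank other documents by how
heavily those terms occur in them.  The tokenizer, stopword list, and
cluster size are all placeholders:

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

    def terms(text):
        return [w for w in re.findall(r"[a-z]+", text.lower())
                if w not in STOPWORDS]

    def word_cluster(text, k=10):
        """Top-k terms by raw count -- the crudest stand-in for a word cluster."""
        return {w for w, _ in Counter(terms(text)).most_common(k)}

    def similar(seed_text, corpus, k=10):
        """Rank documents in `corpus` (name -> text) by overlap with the
        seed document's cluster; no semantics, no prior knowledge."""
        cluster = word_cluster(seed_text, k)
        scored = [(sum(Counter(terms(text))[w] for w in cluster), name)
                  for name, text in corpus.items()]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]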
 
NSTP - notification service transport protocol for groupware (Lotus)
  -  Toolkit for "synchronous groupware", using Java or C++.  Looks
       similar to what McCanne et al. are doing with MASH and object
       libraries, but far more stupid.
  
 -  Server/multiple-clients model; one TCP connection per client (toy
        sketch after this list).  Forget it.
  
 -  "What about consistency in multiuser apps": "It's an application
       level problem" (they provide locks, etc.)
  
 -  "What about scalability" question got a fudgy
        answer and handwaving (sounds like it's not designed for
       wide-area anything)
  
 -  demo: playing tic-tac-toe and chatting using Java applets that
        have the notification toolkit under them
  
 -  Sources available for noncommercial use at nstp.research.lotus.com
  
 -  This doesn't sound useful to me.
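
For concreteness, a toy version of the architecture as I understood it: one
TCP connection per client, the server fanning every notification out to all
the other clients, consistency left to the application.  This is my sketch,
not NSTP code; the port number is arbitrary and dead connections are never
cleaned up:

    import socket, threading

    clients = []                         # one open TCP connection per client
    clients_lock = threading.Lock()

    def pump(conn):
        """Read notifications from one client; fan them out to everyone else."""
        for line in conn.makefile("rb"):
            with clients_lock:
                for other in clients:
                    if other is not conn:
                        other.sendall(line)

    def serve(host="0.0.0.0", port=9999):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind((host, port))
        srv.listen(5)
        while True:
            conn, _addr = srv.accept()
            with clients_lock:
                clients.append(conn)
            threading.Thread(target=pump, args=(conn,), daemon=True).start()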
 
WebRule
  -  Web server plug-in containing a rules database; rules are
        triggered by actions.
  
 -  Actions can be local (startup, shutdown, URL access request,
       permissions violation, etc.) or remote (another WebRule server
       sends you an action request, rule update, etc.)
  
 -  Actions trigger rules, which are basically little scripts with
       various attributes attached (permissions, etc.)
  
 -  Rule example: "When such-and-such page changes [the action part],
        go get it, plus the following other pages, and then run them
        through this table-merging program [the rule part]".  (Sketch of
        this paradigm after this list.)
  
 -  Can build little groups of collaborating WebRule servers to
       support such services.  Examples they gave weren't terrifically
       well motivated but I think it has potential.
  
 -  Flaws:
       
          -  Server plug-in written in Java and C.  Clearly this kind of
               application has more leverage at the proxy (imagine a
               scalable proxy augmented with the rule/action paradigm)
         
          -  Not scalable for the obvious reasons, and also not clear
               what happens to scalability if rules cause lots of
               cross-server interactions.
         
          -  As far as I can tell, individual users cannot modify or
               upload rules -- only WebRule admins can.
       
 
   -  Wouldn't the scalable proxy be a great place to run a rule/action
       system like this one?
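
A rough sketch (my reading, not WebRule's implementation) of the
rule/action paradigm: actions are named events, rules are little scripts
with attributes, and a dispatcher runs every rule whose trigger matches an
incoming local or remote action.  fetch() and merge_tables() are stubs:

    from dataclasses import dataclass, field

    def fetch(url):                                    # stub
        return "<contents of %s>" % url

    def merge_tables(pages):                           # stub
        print("merged", len(pages), "pages")

    @dataclass
    class Rule:
        trigger: str                                   # action name that fires it
        script: object                                 # the "little script" (a callable)
        attrs: dict = field(default_factory=dict)      # permissions, owner, ...

    class RuleEngine:
        def __init__(self):
            self.rules = []

        def add(self, rule):
            self.rules.append(rule)

        def action(self, name, **details):
            """Local or remote action request: run every matching rule."""
            for rule in self.rules:
                if rule.trigger == name:
                    rule.script(**details)

    # "When such-and-such page changes, go get it plus these other pages,
    # then run them through this table-merging program."
    engine = RuleEngine()
    engine.add(Rule("page-changed",
                    lambda url, related=(): merge_tables(
                        [fetch(u) for u in (url, *related)]),
                    attrs={"owner": "admin"}))
    engine.action("page-changed", url="http://example.com/prices.html",
                  related=("http://example.com/index.html",))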
 
Pseudo-Serving
  -  Idea: let clients "bid" CPU/disk resources to get faster
        service from servers (toy sketch below)
  
 -  Interesting idea, but half-baked implementation and simulation
        results; thoroughly unconvincing, and the author didn't handle
        questions particularly well.
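
The core idea fits in a one-screen sketch (mine, not the author's): the
server orders pending requests by the resources a client bids, so clients
offering more CPU/disk get served first:

    import heapq

    class PseudoServer:
        """Serve requests in descending order of bid (offered resources)."""
        def __init__(self):
            self._queue = []
            self._seq = 0                    # tie-breaker for equal bids

        def request(self, client, bid):
            heapq.heappush(self._queue, (-bid, self._seq, client))
            self._seq += 1

        def serve_next(self):
            if not self._queue:
                return None
            _bid, _seq, client = heapq.heappop(self._queue)
            return client

    server = PseudoServer()
    server.request("freeloader", bid=0)
    server.request("contributor", bid=50)    # offers, say, 50 MB of disk
    assert server.serve_next() == "contributor"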