"Performability: an e-Utility Imperative"
J.F. Meyer, L. Spainhower
Proc. ICSSEA 2001
[PDF.zip]
Summary by Armando Fox
One-line summary:
To apply the performability framework to Internet services, pick a metric such as
the probability of a rejected request (due to admission control during a load spike)
or the probability of excess latency in satisfying a request; then assign reward
levels 0 or 1 to the random variable, and generate and solve a stochastic-process
model of the system to obtain a performability rating for it.
Overview/Main Points
Interesting short paper from Lisa Spainhower (IBM) and John Meyer (UMich)
on applying performability to Internet services.
They propose a 5-step approach that would allow a performability
characterization of an Internet service. Basically it requires the steps I
outlined at the mini-tutorial on Friday, viz:
1. Pick the parameter you want to measure that captures
"degradation". (In the example in this paper, they showed
how you could use the probability that the server rejects a request
because it's down, too busy, etc., or you could use the incremental
latency experienced by a particular request due to high load. They did
not consider anything like having the server return a less-than-perfect
result, or other types of degradation.)
2. Characterize the workload with respect to the parameter. (E.g., given
the request load over a particular observation window, you can figure
out *from the user's perspective* the probability distribution that a
given request will be rejected; see the first sketch after this list.)
3-5. Build a stochastic model of the system and solve it (the
state-transition-diagram stuff I showed you, resulting in the different
'performability' curves; see the second sketch below). In the paper they
don't try to do this, but assert that "well-known methods" can be used.
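
To make steps 1-2 concrete, here is a minimal sketch (mine, not the paper's) of
estimating the user-visible reward variable from an observed request trace; the
trace format, field names, and the 1-second latency SLO are all assumptions for
illustration:

    # Sketch of steps 1-2: pick a binary reward variable (request served
    # acceptably = 1, else 0) and estimate its distribution over an
    # observation window. Trace format and SLO are invented, not the paper's.
    from dataclasses import dataclass

    @dataclass
    class Request:
        t: float          # arrival time within the observation window (s)
        rejected: bool    # True if refused by admission control or a failure
        latency: float    # response latency in seconds (ignored if rejected)

    def reward(r: Request, latency_slo: float = 1.0) -> int:
        # 1 if served within the latency SLO, 0 otherwise (binary, as in the paper)
        return 0 if (r.rejected or r.latency > latency_slo) else 1

    def characterize(trace, latency_slo=1.0):
        # From the user's perspective: probability a given request is rejected,
        # suffers excess latency, or (either way) earns reward 0.
        n = len(trace)
        rejected = sum(r.rejected for r in trace)
        slow = sum((not r.rejected) and r.latency > latency_slo for r in trace)
        return {"P(rejected)": rejected / n,
                "P(excess latency)": slow / n,
                "P(reward = 0)": (rejected + slow) / n}

    # Tiny synthetic example
    trace = [Request(0.1, False, 0.3), Request(0.5, True, 0.0),
             Request(0.9, False, 1.7), Request(1.2, False, 0.4)]
    print(characterize(trace))
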
There is a pointer to another paper by Meyer (reference #16) that
"suggests the feasibility" of doing this, but I haven't read
that paper... anyone?
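
And to make steps 3-5 concrete, a second minimal sketch: a reward-annotated
Markov model solved numerically. The three states and every rate below are
invented for illustration, not taken from the paper (which defers to
"well-known methods"):

    # Sketch of steps 3-5: a 3-state continuous-time Markov model of the
    # service -- up, overloaded (admission control rejecting), down -- with a
    # reward rate attached to each state. All numbers are made up.
    import numpy as np

    # Generator matrix Q (rows sum to zero); states: 0=up, 1=overloaded, 2=down.
    Q = np.array([
        [-0.11,  0.10,  0.01],   # up -> overloaded (load spike), up -> down
        [ 1.00, -1.02,  0.02],   # overloaded -> up (spike ends), -> down
        [ 0.50,  0.00, -0.50],   # down -> up (repair)
    ])

    # Reward rate per state: fraction of requests served acceptably there.
    reward = np.array([1.0, 0.3, 0.0])

    # Steady-state distribution pi solves pi @ Q = 0 with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(3)])
    b = np.concatenate([np.zeros(3), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    print("steady state:", pi)
    print("expected reward rate (a performability measure):", float(pi @ reward))

The 'performability curves' would come from solving the transient or
cumulative-reward version of such a model over a finite observation window,
rather than just this long-run number.
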
-----Original Message-----
From: Lisa Spainhower [mailto:[email protected]]
Sent: Monday, September 16, 2002 12:21 PM
To: [email protected]
Subject: RE: MTTR beats MTTF
Armando, The mainframe mixed workload case has parallels in the
distributed world and, in my estimation, will increasingly do so. Even
in a simple web server environment there are DNS and firewall and
usually some kind of workload distributor servers that have more global
effect if they fail (bigger 'impact ratio') as compared to web servers
per se. In ERP applications like SAP it is common to have a big backend
DB server and lots of smaller application servers, but some applications
- e.g., ordering - may impact other apps and some applications -
reporting, for instance - don't. This sort of componentization is
probably going to increase. I've quickly looked over your paper and want
to read it more carefully. Not to seem like I'm pushing my own paper,
but a more recent publication by John Meyer and myself on performability
does seem relevant. It is "Performability: an e-Utility
Imperative" published in ICSSEA 2001. I've attached the final
version. (See attached file: meyspain.doc)
Relevance
A tie between performability and Internet service workloads, which are not
what performability was originally designed to describe.
Flaws
The metrics proposed to be captured (probability of a rejected request,
probability of excess latency) are effectively binary-valued; we should be able
to do better with gracefully-degrading primitives (see the sketch below).
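
For contrast, here is the kind of gracefully-degrading reward variable we have
in mind (my sketch, not anything proposed in the paper); "completeness" is a
hypothetical measure of how much of the full answer was actually returned:

    # Sketch of a graded (non-binary) reward variable: 0 for a rejected
    # request, otherwise the fraction of the full result returned, discounted
    # smoothly when the latency SLO is missed. Names and SLO are invented.
    def graded_reward(rejected: bool, latency: float, completeness: float,
                      latency_slo: float = 1.0) -> float:
        if rejected:
            return 0.0
        r = completeness                  # e.g. 0.8 if 80% of shards answered
        if latency > latency_slo:
            r *= latency_slo / latency    # penalty grows with excess latency
        return r
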
Doesn't say what to do with the performability curve once we have it.
(This came up at the Sept. Santa Cruz retreat.) What does an operator do with
this number? What does it tell him/her? (We have been arguing that
MTTR is the metric for impact, not performability, but that we need a way to
capture degraded behavior when we are working on lowering MTTR.)