Performability: an e-utility imperative
J.F. Meyer, L. Spainhower,
Proc. ICSSEA 2001
Summary by
Armando Fox

One-line summary:

To apply the performability framework to an Internet service, pick a metric that captures degradation, such as the probability that a request is rejected (due to admission control during a load spike) or the probability of excess latency in satisfying a request; then assign reward levels of 0 or 1 to the corresponding random variable, and generate and solve a stochastic-process model of the system to obtain a performability rating for it.
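In symbols (my notation, not the paper's): let X_t be the state of the service at time t, and let r(s), taking the value 0 or 1, be the reward assigned to state s (1 if a request is served acceptably, 0 if it is rejected or too slow). A simple performability figure over an observation window [0, T] is then the expected time-averaged reward,

    Perf = (1/T) E[ \int_0^T r(X_t) dt ]  =  \sum_s r(s) \pi(s)   (in steady state),

where \pi is the state-occupancy distribution of the stochastic-process model; with a 0/1 reward this expectation is just the probability that a request is handled acceptably. (Meyer's general framework works with the full probability distribution of accumulated reward, not only its expectation.)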

Overview/Main Points

Interesting short paper from Lisa Spainhower and John Meyer (UMich) on applying performability to Internet services.

They propose a 5-step approach that would allow a performability characterization of an Internet service. Basically it requires the steps I outlined at the mini-tutorial on Friday, viz:

1. Pick the parameter you want to measure that captures "degradation". In the example in this paper, they show how you could use the probability that the server rejects a request because it's down, too busy, etc., or the incremental latency experienced by a particular request due to high load. (They did not consider anything like having the server return a less-than-perfect result, or other types of degradation.)

2. Characterize the workload with respect to the parameter (e.g., given the request load over a particular observation window, you can figure out *from the user's perspective* the probability distribution that a given request will be rejected).

3-5. Build a stochastic model of the system and solve it (the state-transition-diagram stuff I showed you, resulting in the different "performability" curves). In the paper they don't try to do this, but assert that "well known methods" can be used; a toy sketch of what this looks like is below.
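To make steps 3-5 concrete, here is a toy sketch (my own construction, not taken from the paper) of a three-state Markov reward model in Python: states "up", "overloaded" (admission control rejects requests), and "down"; a reward of 1 in the state where requests are accepted and 0 elsewhere; and made-up transition rates that would in practice come from the workload and failure characterization of steps 1-2. Solving for the steady-state distribution and taking the expected reward gives the performability number, i.e., the long-run probability that an arriving request is accepted.

    # Toy performability sketch (my construction, not from the paper).
    # Three-state continuous-time Markov model of a server: "up",
    # "overloaded" (admission control rejects requests), and "down".
    # Binary reward: 1 where a request is accepted, 0 otherwise.
    # All rates are made-up illustrative numbers (events per hour).
    import numpy as np

    states = ["up", "overloaded", "down"]
    reward = np.array([1.0, 0.0, 0.0])   # reward level 0 or 1 per state

    # Generator matrix Q: Q[i, j] is the transition rate from state i to
    # state j; diagonal entries make each row sum to zero.
    Q = np.array([
        [-0.11,  0.10,  0.01],   # up -> overloaded (load spike), up -> down (failure)
        [ 2.00, -2.05,  0.05],   # overloaded -> up (spike ends), overloaded -> down
        [ 0.50,  0.00, -0.50],   # down -> up (repair)
    ])

    # Steady-state distribution pi solves pi Q = 0 with sum(pi) = 1;
    # replace one balance equation with the normalization constraint.
    A = np.vstack([Q.T[:-1], np.ones(len(states))])
    b = np.zeros(len(states)); b[-1] = 1.0
    pi = np.linalg.solve(A, b)

    performability = float(pi @ reward)
    for s, p in zip(states, pi):
        print(f"P[{s}] = {p:.4f}")
    print(f"expected reward (P[request accepted]) = {performability:.4f}")

With these made-up rates the answer comes out around 0.93; the point is only the mechanics (pick rewards, build the state-transition model, solve it), not the numbers. The "well known methods" the paper alludes to presumably cover richer analyses (transient behavior, distributions of accumulated reward) than this steady-state shortcut.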

There is a pointer to another paper by Meyer (reference #16) that "suggests the feasibility" of doing this, but I haven't read that paper... anyone?

-----Original Message----- 

From: Lisa Spainhower [mailto:[email protected]]
Sent: Monday, September 16, 2002 12:21 PM
To: [email protected]
Subject: RE: MTTR beats MTTF

Armando,

The mainframe mixed workload case has parallels in the distributed world and, in my estimation, will increasingly do so. Even in a simple web server environment there are DNS and firewall and usually some kind of workload distributor servers that have more global effect if they fail (bigger 'impact ratio') as compared to web servers per se. In ERP applications like SAP it is common to have a big backend DB server and lots of smaller application servers, but some applications - e.g., ordering - may impact other apps and some applications - reporting, for instance - don't. This sort of componentization is probably going to increase.

I've quickly looked over your paper and want to read it more carefully. Not to seem like I'm pushing my own paper, but a more recent publication by John Meyer and myself on performability does seem relevant. It is "Performability: an e-Utility Imperative" published in ICSSEA 2001. I've attached the final version. (See attached file: meyspain.doc)

Relevance

A tie between performability and Internet service workloads, which are not what performability was originally designed to describe.

Flaws

The metrics proposed to be captured (probability of a rejected request, probability of excess latency) are effectively binary-valued... we should be able to do better with gracefully-degrading primitives.

Doesn't say what to do with the performability curve once we have it. (This came up at the Sept. Santa Cruz retreat.) What does an operator do with this number? What does it tell him/her? (We have been arguing that MTTR, not performability, is the metric for impact, but that we need a way to capture degraded behavior when we are working on lowering MTTR.)


Summaries may be used for non-commercial purposes only, provided the summary's author and origin are acknowledged. For all other uses, please contact us.