Guides to Software Evaluation
Authors: Hans-Ludwig Hausen, Dieter Welzel
Arbeitspapiere der GMD 746, April 1993, 92 pages
Summary by George Candea

One-sentence summary:
Describes a framework for measuring, assessing, and certifying the
quality of a software product; reported information is meant to be
equally useful to software producers, vendors, and users.
Overview/Main Points
- According to one of the authors, the guide was used in about 100
    industrial case studies in the European Union and was adopted by
    the European Space Agency after the Ariane-5 disaster to improve
    software quality.  The framework intends to be useful for
    evaluating a wide range of products, from off-the-shelf packages
    to custom/commissioned software to embedded systems.
 - The evaluation procedure needs to be cost-effective and run by an
    independent third party, such as a specialized testing lab.  This
    software evaluation guide is meant to be a handbook for such labs,
    as well as for producers and purchasers of software.
 - There are 4 principles guiding the authors:
    - repeatability: repeated evaluation of the same product yields
        the same results
    - reproducibility: evaluation by different labs yields the same
        results
    - impartiality
    - objectivity: only a minimal amount of subjective judgment is
        required
    
  
 - The evaluation procedure consists of 5 steps (key terms are defined
    below); a rough sketch of the flow follows this list:
    - The lab's client (e.g., a software vendor) states the evaluation
        requirements, including evaluation levels, and the lab analyzes
        them.
    - The lab produces an evaluation specification.  For example, the
        spec describes the evaluation techniques to be used for each
        software characteristic: black box testing + glass box testing
        + UI inspection + algorithmic complexity analysis + static
        analysis + design process evaluation.  The client can accept
        the spec or withdraw from the evaluation process.
    - The lab designs the evaluation process based on the spec; the
        client accepts or withdraws.
    - The lab performs the evaluation.  This may involve manual
        processes, computer-aided processes (e.g., using a checklist
        manager for applying checklists), as well as automatic
        evaluation (e.g., measuring complexity with a static analyzer).
    - The lab reports the results; the client accepts or lodges an
        appeal in court or some other legal forum (otherwise, the
        results become public information).
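A minimal sketch of this flow in Python, assuming the client's
accept/withdraw checkpoints sit after the specification, design, and
reporting steps as described above; all names here are illustrative,
not terms taken from the guide:

    # Illustrative sketch of the 5-step procedure; step names, Decision
    # values, and checkpoint placement are assumptions, not the guide's terms.
    from enum import Enum, auto

    class Step(Enum):
        ANALYZE_REQUIREMENTS = auto()  # client states requirements + evaluation levels
        WRITE_SPECIFICATION = auto()   # lab produces the evaluation specification
        DESIGN_EVALUATION = auto()     # lab designs the evaluation process
        PERFORM_EVALUATION = auto()    # manual, computer-aided, and automatic techniques
        REPORT_RESULTS = auto()        # client accepts or appeals the results

    class Decision(Enum):
        ACCEPT = auto()
        WITHDRAW = auto()              # possible after the spec and design steps
        APPEAL = auto()                # possible after the final report

    # Steps at which the client decides whether to continue.
    CHECKPOINTS = {Step.WRITE_SPECIFICATION, Step.DESIGN_EVALUATION,
                   Step.REPORT_RESULTS}

    def run_evaluation(client_decides):
        """Walk the steps in order; client_decides is a callback
        Step -> Decision invoked at each checkpoint."""
        for step in Step:
            if step in CHECKPOINTS:
                decision = client_decides(step)
                if decision is not Decision.ACCEPT:
                    return decision
        return Decision.ACCEPT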
    
  
 - An evaluation level represents the depth/thoroughness of the
    evaluation techniques and results, with respect to safety,
    security, economic risk, availability, and application constraints.
    Evaluation levels are named A, B, C, D, and each dimension has its
    own range (a sketch follows this list):
    - safety: can range from a safety breach causing "small damage to
        property + no risk to people" to "unrecoverable environmental
        damage + many people killed".
    - economy: ranges from "negligible loss" to "financial disaster
        that ruins the company".
    - security: defined in terms of ITSEC assurance levels.
    - availability: ranges from "up on request/demand" to "no down
        time at all".
    - application domain: e.g., "office automation + entertainment",
        "air and railway systems", etc.
    
  
 - The software characteristics are: functionality, reliability,
    usability, efficiency, maintainability, and portability.
 - An evaluation module encapsulates atomic evaluation procedures
    (e.g., "check whether the system automatically proposes corrections
    to clearly correctable user errors"), metrics and evaluation levels
    for each software characteristic (for the example just given,
    "yes=2, some=1, no=0"), a description of the assessment procedure,
    and a format for reporting results and costs; see the sketch after
    this item.
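A minimal sketch of an evaluation module as a data structure; the field
names are hypothetical, and only the example question and its
yes=2/some=1/no=0 scale come from the text above:

    # Hypothetical structure for an evaluation module; field names are
    # invented, only the example item and its scale come from the summary.
    from dataclasses import dataclass

    @dataclass
    class ChecklistItem:
        question: str              # an atomic evaluation procedure
        scale: dict                # maps an answer to a score

    @dataclass
    class EvaluationModule:
        characteristic: str        # e.g. "usability"
        levels: list               # evaluation levels the module applies to
        items: list                # atomic procedures / checklist items
        assessment_procedure: str  # how scores are combined and judged
        report_format: str         # how results and costs are reported

    error_correction = ChecklistItem(
        question=("Check whether the system automatically proposes "
                  "corrections to clearly correctable user errors."),
        scale={"yes": 2, "some": 1, "no": 0},
    )
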
 - The information required by the lab for the evaluation can
    include: product information (user handbook, design docs, object
    code, code listings, etc.), development process information
    (management report, quality assurance report, project
    documentation), etc.
 - Annex B shows an evaluation module in the form of a usability
    checklist.  It is based on the following usability metrics:
    installability from scratch, learnability, use efficiency,
    interface customizability, and experienced-user migration ease.
    A rough instantiation is sketched below.
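Continuing the sketch above, the Annex B module could be instantiated
roughly as follows; the questions, the reuse of the yes/some/no scale,
and the report format are invented placeholders, and only the metric
names come from the summary:

    # Placeholder instantiation of the Annex B usability module, reusing
    # the ChecklistItem/EvaluationModule sketch above; questions invented.
    usability_metrics = [
        "installability from scratch",
        "learnability",
        "use efficiency",
        "interface customizability",
        "experienced-user migration ease",
    ]

    annex_b = EvaluationModule(
        characteristic="usability",
        levels=["A", "B", "C", "D"],
        items=[ChecklistItem(question="Assess the product's " + metric + ".",
                             scale={"yes": 2, "some": 1, "no": 0})
               for metric in usability_metrics],
        assessment_procedure="sum the item scores and compare to a threshold",
        report_format="score per metric, plus evaluation effort and cost",
    )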
  
Relevance
- Lays out a reasonably well-thought-out process that can be used in
    evaluating software.  However, this seems like a more useful tool
    for the software producers themselves, as part of their internal
    reviews, than for customers, because many of the features in the
    evaluation may not be things customers care about.  Hence, it's
    unclear to what extent one could compare two products on the basis
    of their evaluations.
 - In defining the process, the authors employ existing/available
    tools and technology, which makes the guide easier to adopt.  They
    don't invent new languages or use exotic tools.
 - It is a realistic guide, in that it allows reasonably useful
    reviews to be conducted on real-world software.  It is clearly
    not the result of a committee.  The approach is rather holistic
    and end-to-end, much unlike most evaluation processes I've seen so
    far (e.g., verification and validation).
 - Would certainly be useful and have an impact on the industry
    if all major players agreed upon it.
  
Drawbacks
Some of these are not necessarily flaws of this guide, but rather
issues that would affect any process that tried to evaluate software
quality.
- The development of evaluation modules is closely tied to existing
    computing paradigms (e.g., windowing interfaces for UIs).  As
    such, the modules might become obsolete.  Worse, the existence of
    some "standard" evaluation modules may thwart innovation (e.g.,
    see the industry's focus on TPC-C/W performance).  Then again,
    "for better or worse, benchmarks shape a field" (Dave Patterson).
 - In step 1, "analyzing evaluation requirements," the guide suggests
    that in the case of complex systems the lab should collaborate
    closely with its customer in order to reduce costs; this, however,
    can seriously compromise objectivity.
 - Some of the evaluation procedures are inherently subjective and
    there is no way around it.  E.g., in the usability module, there
    are a few metrics that evaluate the "understandability" and
    "clarity" of documentation.
 - Impartiality can also be a problem.  In the usability module, for
    the "check that all required installation files are present"
    metric, there is a note saying "The architecture of the product
    may make it impossible for the evaluator to directly test for this
    feature.  If so, the evaluator should answer this question by
    querying the developer."
 - Today's software industry may not be amenable to the guide.  For
    any reasonably complex system, source code and other internal,
    company-confidential documents may be required.  It also seems
    like few companies would be willing to commit to the "results
    reporting" step unless the evaluation truly guarantees
    reproducibility (e.g., like TPC-C).  However, the
    necessarily-subjective items would prevent such reproducibility.
 - I haven't seen the case studies mentioned by the authors, so I
    can't speak to the domain of applicability or to the
    effectiveness.  I don't know how many man-hours such an evaluation
    requires, nor how automatable the evaluation really is; many
    items, however, do not seem to lend themselves to automated
    evaluation.
 - Finally, I don't know whether independent labs using this guide
    would do much more for the software industry than PC Magazine,
    Gartner, etc. do when they publish their research.  It is even
    possible that these organizations already use parts of this guide
    in their work.
  