
Center for Software Engineering
Abstracts (Last revised on 5/2/2002)
Automated Generation of Self-Checking
Function Tests
Amit Paradkar
Submitted to ISSRE 2002, July 2002
This paper describes a new unified test generation method which simultaneously addresses the following critical issues in software function testing: (1) selection of appropriate combinations of parameter values for testing individual operations, (2) selection of an appropriate sequence of operation invocations, (3) generation of test oracles in the form of self-checking sequences, and (4) generation of negative test cases. Our method exploits a novel mutation scheme applied to operations specified as pre- and postconditions (actions) on parameters and state variables; a set of novel abstraction techniques which result in a new form of compact transition system, called a quasi-reachability graph; and techniques developed for planning under resource constraints to automatically generate self-checking positive and negative test cases with appropriate parameter values. The test cases generated using our approach target detection of certain faults in an implementation. We discuss the reduction techniques used in our method to control the size of the generated test suite. We also report our experiences with using our method in an industrial setting.
Improving Education of Software Engineers
Through Use of Defect Analysis
Theresa Kratschmer
Submitted to IEEE Software Magazine, Sept/Oct 2002
In this paper, we discuss the use of software defect analysis in teaching professional software engineers how to improve their development process. After a brief summary of the basics of the defect analysis methodology, we discuss how the education of software developers in this technology has evolved over the years, reflecting changes in our deployment model. We also highlight factors that have been critical for successful learning. Our results indicate that defect analysis is an effective and expedient method of teaching software engineers the skills to improve the development process and the quality of products. However, to reach large audiences and achieve maximum retention of the material, it is critical that the appropriate amount of information is conveyed at the right time, that there is plenty of opportunity to practice the newly acquired knowledge, and that the educational materials effectively support the concepts and principles taught.
Decision Support for Software Management
Sunita Chulani, P. Santhanam, Bob Leszkowicz, Anna Nowacki
Submitted to IEEE Software
Typical software metrics and measurements (e.g. code coverage, McCabe complexity, Function points, OO metrics, etc) target the technical professionals in a software development organization. While this is useful to the practitioners in the trenches, the managers and executives of these organizations face a broader set of issues and concerns such as functionality and schedule compared to the competition, effectiveness of the support to customers, meeting customer expectations at a reasonable cost, market share, customer retention, etc. In order to make better decisions on these issues, the collection of business and software engineering data and its regular use throughout the product life cycle is essential.
The product life cycle typically consists of initial market research, design and development of the product, and service and support of the product, during which various pieces of information related to the product are collected. The aim of this paper is to describe a research project that analyzes data from three separate data sources: (i) customer support, (ii) critical situations involving customers, and (iii) customer satisfaction surveys. We describe how all these data, currently used in isolation, can be integrated at the product level. This enables mid- and high-level management to derive relationships among the inhibitors and drivers that are important through the entire product life cycle. The data can be used to understand the sensitivity of the metrics across development, service, customer satisfaction and, finally, revenue. For example,
In addition, we highlight some of the issues involved in the data integration and the lessons learned in building a consolidated data warehouse. We also present a discussion of the models and relationships between some of the different data elements and suggest directions for future research.
Bayesian Methods for Change-Point Detection
in Long-Range Dependent Processes
Bonnie K. Ray, Ruey S. Tsay
To appear in Journal of Time Series Analysis, Blackwell Publisher, 2002
We describe a Bayesian method for detecting structural changes in a long-range dependent process. In particular, we focus on changes in the long-range dependence parameter, $d$, and changes in the process level, $\mu$. Markov chain Monte Carlo methods are used to estimate the posterior probability and size of a change at time $t$, along with other model parameters. A time-dependent Kalman filter approach is used to evaluate the likelihood of the fractionally integrated ARMA model characterizing the long-range dependence. The method allows for multiple change points and can be extended to the long-memory stochastic volatility case. We apply the method to three examples: a change in persistence of the yearly Nile River minima, structural changes in the series of durations between intraday trades of IBM stock on the New York Stock Exchange, and structural breaks in daily stock returns for the Coca Cola Company during the 1990s.
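For reference, a fractionally integrated ARMA process with level $\mu$ and long-range dependence parameter $d$ is conventionally written as shown below. This is the standard textbook form and not necessarily the exact parameterization used in the paper; the backshift operator $B$, the polynomials $\phi(B)$ and $\theta(B)$, and the white-noise term $\epsilon_t$ are notation assumed here.

$$\phi(B)\,(1-B)^{d}\,(y_t-\mu)=\theta(B)\,\epsilon_t, \qquad (1-B)^{d}=\sum_{k=0}^{\infty}\binom{d}{k}(-B)^{k}$$

Under this form, a structural change at time $t$ corresponds to $d$ or $\mu$ taking different values before and after $t$.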
Modeling Vector Nonlinear Time Series
Using POLYMARS
Bonnie K. Ray, Jan DeGooijer
To appear in Computational Statistics and Data Analysis, Elsevier Publisher, 2002
A modified multivariate adaptive regression splines method for modeling vector nonlinear time series is investigated. The method results in models that can capture certain types of vector self-exciting threshold autoregressive behavior, as well as provide good predictions for more general vector nonlinear time series. The effect of different model selection criteria on fitted models and predictions is evaluated through simulation. The method is illustrated for a real data example, to model a series of intra-day electricity loads in two neighboring Australian states.
Software Engineering Economics
Sunita Chulani, Barry Boehm
To appear as chapter in Software Management Tutorial, 6th Edition
This paper summarizes the current state of the art and recent trends in software engineering economics. It provides an overview of economic analysis techniques and their applicability to software engineering and management. It surveys the field of software cost estimation, including the major estimation techniques available and the state of the art in algorithmic cost models.
Specifying and Selecting Small Yet Effective
Set of Parameter Values for Testing
Amit Paradkar
Submitted to ACM SIGSOFT Foundations of Software Engineering - 10, November 11, 2002
We present an approach to selecting parameter values for testing an operation. The main contributions are (1) a method to derive a small set of test variations, which are constraints on parameters, from the operation's test specification, (2) a method to select parameter values which satisfy the test variations, (3) an evaluation which indicates that our approach is better than those based on Design of Experiments (DOE) techniques, and (4) results and lessons learnt from two case studies using our approach in an industrial setting. In particular, we present techniques for deriving test variations from test specifications which describe logical relationships among the manually derived abstract parameter partitions. The test specifications also associate concrete data values with each abstract partition. We have implemented our techniques in a tool. We conducted an experiment to compare our techniques with those based on DOE. We compared the cost, measured by test data set size, and the effectiveness, measured by code and mutant coverage, of the two approaches.
A Technique for Generating Optimized System
Test Suites
Clay Williams, Theresa Kratschmer
Submitted to the 10th International Symposium on the Foundations of Software Engineering (FSE-10), November 2002
One goal of system testing is to test the complete set of functional capabilities delivered by a system. As systems deliver increasingly large sets of functional capabilities, developing efficient and effective test cases for such testing can be a challenging task. Using concepts from the Unified Modeling Language (UML), we present a representation for testing system functionality called a Constrained Use Case Flow Graph (CUCFG). We describe an approach for optimizing a CUCFG based on integer linear programming (ILP) and show how to generate test plans or test cases from the optimized graph. Finally, we present an analysis of the technique based on its use in testing a data analysis system. This analysis explores both in-process and field-discovered defects.
Use of Defect Analysis in Transferring
a Software Project
Theresa Kratschmer, P. Santhanam
Submitted to IEEE Software, Nov/Dec 2002
There is an increasing trend in the IT industry to transfer the development and/or maintenance responsibility of a software project from one organization to another. The typical business reasons for such a decision are change in the mission of the original group and/or perceived reduced cost of development and support due to a less expensive alternative. There is generally an attempt to make sure that this transfer does not affect the customers adversely in their satisfaction with product functionality and quality. Once the decision is made to go ahead with the transfer of mission to another group, there remains the daunting task of how to make it happen in the most efficient manner so that the new organization can come up to speed in their understanding of the software project details and the implied processes to achieve their goal. In this paper, we discuss a unique application of software defect analysis in a real project transfer situation to help in this task and the results of this experience.
The specific concerns that the receiving organization had were:
Since the transferring team had been collecting software defect data during its development process, we analyzed the data to address these concerns in an objective way. The defect analysis, which included the use of the Orthogonal Defect Classification (ODC) methodology, indicated several significant things. First, we were able to determine which components had more exposure in terms of lack of requirements definition and design completeness, so that the appropriate documentation could be included in the contractual specifications. Second, the analysis showed that the planned approach for training testers would not be effective, since some of the existing problems in the transferring organization would also be carried over to the receiving organization. Therefore, we suggested an alternative approach for training. Finally, we concluded that the design of this product was still evolving and that the testing targeted basic functions only. The level of testing complexity for specific components was assessed and the results of this analysis were then reported to the technical teams so that improvements could be made. Overall, the use of defect analysis proved surprisingly effective in dealing with mission transfer across organizations.
Software debugging, testing, and verification
B. Hailpern and P. Santhanam
IBM Systems Journal, Vol. 41, No. 1, February 2002.
In commercial software development organizations, increased complexity of products, shortened development cycles, and higher customer expectations of quality have placed a major responsibility on the areas of software debugging, testing, and verification. As this issue of the IBM Systems Journal illustrates, there are exciting improvements in the underlying technology on all three fronts. However, we observe that due to the informal nature of software development as a whole, the prevalent practices in the industry are still immature, even in areas where improved technology exists. In addition, tools that incorporate the more advanced aspects of this technology are not ready for large-scale commercial use. Hence there is reason to hope for significant improvements in this area over the next several years.
THE STCL TESTWARE ARCHITECTURE
The STCL Architecture Committee
IBM Systems Journal, Vol. 41, No. 1, February 2002.
The Software Test Community Leadership (STCL) is an IBM-wide initiative focused on improving software test and quality practices across IBM. In 1999, the STCL began working to develop an architecture to integrate the strategic testing tools across IBM. This article discusses the issues associated with developing an architecture for a large base of existing tools which span a variety of platforms and domains. This includes the phases involved in developing the architecture, the standards selected for implementing it, and the artifacts that must be developed to support the variety of tools used in test across IBM. The architecture is being designed and developed in four phases: enabling data exchange, providing data integration, providing control integration, and providing a single GUI into the toolset. Because of the heterogeneous nature of the platforms and domains the architecture must support, using open, widely available standards was essential. The architecture uses XML as a data exchange medium, WebDAV as a data integration standard, Java for control integration, and the Eclipse framework for building an integrated GUI. Given these standards, we discuss the artifacts required to provide generic "plug-and-play" capability in the architecture. Data integration is managed by defining generic resource types, which can be mapped to specific tools that support common test capabilities. Control integration will be managed by providing a framework in Java for managing cooperation and workflow between tools. GUI integration will be provided by developing components using Eclipse which support the resource types and integrate into the control framework. We present examples of each of these phases, discuss our progress to date, and outline our roadmap for the future.
IMPROVING SOFTWARE TESTING VIA ODC: THREE CASE STUDIES
Mark Butcher, Hilora Munro (IBM UK), and Theresa Kratschmer
IBM Systems Journal, Vol. 41, No. 1, February 2002.
Orthogonal Defect Classification (ODC) is a technology which uses analysis of in-process and customer reported defects to improve the quality of software products. In this paper, we will discuss three different products that have used ODC to improve overall quality by focusing on specific areas of the testing process. The first product, MQSeries, is a family of products providing messaging, business integration, and workflow management services. The messaging product development team started using ODC in May of 2000. They had three objectives: 1) measuring test effectiveness, 2) evaluating customer usage in order to make improvements to test, and 3) identifying and prioritizing specific actions to implement in the development process to improve quality. The second product, CICS, is an application server that provides industrial strength, online transaction management for mission-critical applications. Despite being a world-class quality product, CICS has implemented ODC to analyze customer reported defects in order to improve customer satisfaction and decrease warranty costs by strengthening test. The final product to be discussed is SoDA, a product developed to allow testers and developers to visualize their ODC classified data. ODC has been crucial in the management of SoDA development to help focus the limited resources available for testing.
NEW METRICS TO EVALUATE VENDOR DEVELOPED SOFTWARE BASED ON THE TEST CASE EXECUTION RESULTS
Kathryn Bassin, Shriram Biyani, and P. Santhanam
IBM Systems Journal, Vol. 41, No. 1, February 2002.
Due to skill and cost considerations, a growing number of organizations rely on external vendors to develop the software for their needs. One of the challenges the vendee organization faces is the evaluation of the delivered code in terms of functionality, performance, etc. Due to the implicit risks in software projects, the contractual commitments for quality and completeness are generally at a high level, and typically the vendee organization ends up with its own system/acceptance test to validate its own expectations. Typically the code gets delivered in an incremental fashion over a few iterations and much of the day-to-day data from the vendor are not available to the vendee. The Sydney Olympics was one such project: when IBM had to evaluate vendor-delivered code, we helped the IBM Olympics management team evaluate the readiness of the delivered code based on the execution records of the test cases. We developed new metrics that can be derived from the actual test execution results to distinguish the various common scenarios of test case failure. Examples are: functionality never enabled, bad fixes, defects that were never fixed over successive iterations, etc. We were able to calculate these metrics based on actual data and provide them to the management. The relationship of these metrics to the actual cause was validated through explicit communications with the vendor and the subsequent actions to improve the quality and completeness of the delivered code. This paper shows how these metrics can be derived from the execution data and used in a real software project execution environment.
AN APPROACH TO HIGHER RELIABILITY USING SOFTWARE COMPONENTS
Hongxia Jin and P. Santhanam
To be presented at the 12th International Symposium on Software Reliability Engineering (ISSRE) 2001, Hong Kong, November 27-30, 2001
The general belief that component reuse improves software reliability is based on the assumption that prior usage has exposed the potential software faults. In reality, this is not necessarily true, due to the inherent differences in the environments and usage of the component. To achieve high reliability for a component-based software system, we need reliable components that interoperate properly in the new environment. In this paper, we present a unified approach to evaluating the interoperability of components. This involves a generic and systematic capture of the component behavior that explicitly expresses the various assumptions made by the designers about components and their interconnections. With the information captured at a semantic level, this approach can detect potential mismatches between components in the new environment and also give guidance on how to resolve the mismatches to fit components in the new context. The capture of this information in an appropriate format (e.g. XML) and an automated analysis can show serious exposures to reliability in a component-based system before it is integrated.
EVALUATING THE SOFTWARE TEST STRATEGY FOR THE 2000 SYDNEY OLYMPICS
Kathryn Bassin, Shriram Biyani, and P. Santhanam
To be presented at the 12th International Symposium on Software Reliability Engineering (ISSRE) 2001, Hong Kong, November 27-30, 2001
The 2000 Summer Olympic Games event was a major information technology challenge. With a fixed deadline for completion, its inevitable dependency on software systems and immense scope, the testing and verification effort was critical to its success. One way in which success was assured was the use of innovative techniques using ODC based analysis to evaluate planned and executed test activities. The techniques were used to verify that the plan was comprehensive, yet efficient, and ensured that progress could be accurately measured. This paper describes some of these techniques and provides examples of the benefits derived. We also discuss the applicability of the techniques to other software projects.
MANAGING THE MAINTENANCE OF PORTED, OUTSOURCED, AND LEGACY SOFTWARE VIA ORTHOGONAL DEFECT CLASSIFICATION
Kathryn Bassin and P. Santhanam
To be presented at the International Conference on Software Maintenance 2001, Florence, Italy, November 6-11, 2001.
Are you stymied by the challenges of managing and evaluating software that includes legacy, ported, and outsourced code?
It may be that you have the means to address many of your challenges already at hand. Defect records either discovered in-house or by customers (or both!), classified using Orthogonal Defect Classification (ODC), provide a wealth of information pertinent to many aspects of developing, verifying, and maintaining outsourced and ported software.
This paper sets forth a method of exploiting this data to provide decision support regarding maintaining and measuring a complex project, isolating and defining specific problem areas, and taking actions targeting these areas to mitigate the risk associated with them. Actual case studies will be used to illustrate key points.
TOWARD A TEST-READY META-MODEL FOR USE CASES
Clay Williams
Presented at Practical UML-Based Rigorous Development Methods -- Countering or Integrating the eXtremists Workshop (co-located with UML 2001), July 27, 2001
In the UML, use cases are used to define coherent units of functionality associated with classifiers (classes, subsystems, or systems). Two principal purposes that use cases serve are specifying the functionality the classifier will provide and providing a basis for developing test cases for the classifier. This paper discusses issues that arise when using use cases as the basis for model-based testing. Based on this discussion, a test-ready meta-model for use cases is developed. We also describe a tool constructed using the concepts from the meta-model and provide data from the initial pilots of this tool.
DERIVING A SOFTWARE QUALITY VIEW FROM CUSTOMER SATISFACTION AND SERVICE DATA
Sunita Chulani, P. Santhanam, Darrell Moore, Bob Leszkowicz, Gary Davidson
Presented and published in the Proceedings of the European Software Control and Metrics (ESCOM) Conference, London, England, April 2001.
Most quality and cost models use defect density to represent software quality. Customers' quality expectations are not typically based on the size and complexity of the product they buy, and their satisfaction can be influenced substantially by other product attributes that are not typically mapped to defects (e.g. ease of installation and use, timely support, etc.). Consequently, new ways to measure the customer view of quality are needed. In this paper, we provide analyses of customer service and survey data from eight products and discuss some key insights to lay the foundation for a better understanding of the customer view of software quality. We believe that this approach can help us identify actions in software development and support that will address the concerns of our customers and improve their satisfaction with our products.
CERTIFYING THE CORRECTNESS OF BRANCH-AND-BOUND COMPUTATIONS*
Hongxia Jin, Gregory F. Sullivan and Gerald M. Masson
Submitted to the IEEE Transactions on Software Engineering, 2000
eFlow: A JAVA-BASED WORKFLOW SERVICE
G. Robert Malan, Jarir Chaar, Santanu Paul, and Peter Masters
Proceedings of the OOPSLA '98 Workshop on Implementation and Application of OO Workflow Management Systems, October 1998
In today's enterprise, workflow management technologies help focus on the business processes of an enterprise rather than on the individual functions of the components of such an enterprise. They are used to ensure the reliable and repeatable execution of business processes and to improve the efficiency of the enterprise by supporting the definition, planning, execution, and analysis of such processes. eFlow is a lightweight framework for supporting workflow on the Internet. It allows Web-based workflow processes to inter-operate through the use of the Rainmaker interface. These web-based workflow processes are exported to users in the form of web pages with embedded Java applets. The applets consist of eSuite components including: spreadsheet, word processor, charting, and presentation applets. eFlow uses e-mail as its notification mechanism, providing a universal inbox for users. Persistence of both state and application data is provided through the Servlet File System (SFS), a distributed web-based file system. eFlow allows mobile users to access their worklist tasks from any point in the network (intranet or Internet) using only a Java-enabled web browser or a thin client platform such as a Network Computer (NC). As a Java-based application, the implementation as well as the data items are defined within an object oriented framework. This paper describes the eFlow architecture and illustrates its functionality through the use of an example process: a grant proposal workflow process.
AN OBJECTIVE APPROACH TO EVALUATE SOFTWARE DEVELOPMENT
In today's software development environment, the delicate compromise over functionality, time to market, and quality drives all business decisions. The success of a software development effort is dependent on whether the development team can efficiently design, code, test and support the software in a timely fashion. This article describes an objective and time-tested method to meet the needs of all the key people in a software organization: the executive, release manager, developers and testers. The method is based on the use of software defects as a diagnostic probe in an organization and the capture of semantic information from the defect analysis via the "Orthogonal Defect Classification" scheme. Through three real-life case studies in different parts of IBM, we illustrate the use of one of the most powerful methods available today for software management and technical decision support.
EXPLORING DEFECT DATA FROM DEVELOPMENT AND CUSTOMER USAGE ON SOFTWARE MODULES OVER MULTIPLE RELEASES
Traditional defect analyses of software modules have focused on either identifying error prone modules or predicting the number of faults in a module, based on a set of module attributes such as complexity, lines of code, etc. In contrast to these metrics-based modeling studies, this paper explores the relationship of the number of faults per module to the prior history of the module. Specifically, we examine the relationship between (a) the faults discovered during development of a product release and those escaped to the field, and (b) faults in the current release and faults in previous releases. Based on the actual data from four releases of a commercial application product consisting of several thousand modules, we show that:
These results can be used to improve the prediction of quality at the module level of future releases based on the past history.
USE OF SOFTWARE TRIGGERS TO EVALUATE SOFTWARE PROCESS EFFECTIVENESS AND CAPTURE CUSTOMER USAGE PROFILES
We have analyzed fault data comprising nearly 30,000 records (including in-process and field data) from two real products A and B over multiple releases, using Orthogonal Defect Classification (ODC). We exploit the information captured by ODC triggers to evaluate the development activities, and identify specific actions for improvement in development. We illustrate the use of triggers to capture customer usage in a way directly meaningful to product development and show
This is the most comprehensive use of ODC triggers in development and field reported to date.
ERROR INJECTION AIMED AT FAULT REMOVAL IN FAULT TOLERANCE MECHANISMS - CRITERIA FOR ERROR SELECTION USING FIELD DATA ON SOFTWARE FAULTS
Fault injection allows a detailed study of complex interactions between faults and fault handling mechanisms. It can be a useful complement to analytical modeling and formal verification techniques in the testing of fault tolerant systems. However, work on fault injection has not matured adequately to provide industry with cost effective alternatives for the validation of fault tolerant systems. This study analyzes 408 customer discovered faults (defects) in a release of a large operating system product. We discuss methods to select the error types for an error injection experiment in the system test environment, aimed at fault removal. Using four levels of severity and a total of 24 error types as recorded in the customer defects records, we analyze the faults in terms of fault types and system test triggers as defined in ODC. Our work shows examples of criteria that can be used to select errors for injection that use the information from the field reported defects. In particular:
As software dominates most discussions in the information technology business, one needs to examine carefully where we are headed in software reliability. It is reasonable to ask about the nature of software faults and the remedy for them, either from a fault-tolerance perspective or, more generally, on a software dependability front. There are conferences on this topic, and over 300 technical papers that discuss some of its aspects. However, as many industry specialists agree, software is one area in the information technology industry which continues to baffle the scientist from a dependability perspective. While hardware & technology have seen four orders of magnitude improvement in the past decade, software has probably marginally improved or, some will argue, gotten worse. One then wonders whether research in this area is headed in the right direction.
Thus, it is vital to examine some of the fundamentals, e.g., "What is a software failure?" This question is usually assumed to be well understood. However, when one examines it closely, it is startling how little is truly understood regarding those faults that matter. This problem is appreciably better understood in hardware systems, which have been better tracked over the years and for which there is a larger body of experience in failure modes & effects analysis. The counterpart in software is far less understood, and it is further complicated by a lack of clarity as to what a software failure is.
GUIDING FAULT INJECTION AND BROADENING ITS SCOPE USING ODC TRIGGERS
Thus, the ideas of fault-injection are made applicable to a much larger domain - namely of products which don't have to be fault-tolerant in design to take advantage and improve overall system reliability.
A COMPARATIVE ANALYSIS OF EVENT TUPLING SCHEMES
Event logs provide an effective means of improving system availability. However, the majority of faults produce many errors because faults propagate in the time and error detection domains. Thus, the ability to coalesce related events is critical.
The tupling heuristics developed at Carnegie-Mellon University provide one such methodology. These heuristics were applied to a new and larger set of data in order to evaluate the generality of the scheme and to extend the previous work. The extensions included deriving a semantic understanding of why the rules work, expanded statistical analysis, and a comprehensive sensitivity study to determine the effects of changes in the rules.
The results prove that tupling is a useful and general methodology. The sensitivity study enabled the identification of refinements to the rules, while the high degree of skew in the tuple variables enables us to propose that the extreme percentiles be used as an alarm threshold for proactive fault management.
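The coalescing idea can be illustrated with a minimal sketch. It assumes a single gap-based rule (an event joins the current tuple if it arrives within `max_gap` seconds of the previous event), which is a simplification and not the full set of tupling rules evaluated in the paper. The `Event` type, the function names, the 180-second default, and the nearest-rank percentile are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    timestamp: float  # seconds since the start of the log (assumed unit)
    message: str


def tuple_events(events: List[Event], max_gap: float = 180.0) -> List[List[Event]]:
    """Coalesce events into tuples using one gap rule (an assumption).

    Events are sorted by time; a new tuple starts whenever the gap to the
    previous event exceeds max_gap seconds.
    """
    tuples: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if tuples and ev.timestamp - tuples[-1][-1].timestamp <= max_gap:
            tuples[-1].append(ev)
        else:
            tuples.append([ev])
    return tuples


def percentile_threshold(values: List[float], pct: float = 95.0) -> float:
    """Nearest-rank percentile of values, usable as an alarm threshold."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    log = [Event(t, "err") for t in (0, 10, 20, 1000, 1005, 5000)]
    sizes = [len(t) for t in tuple_events(log)]
    print(sizes)
    print(percentile_threshold(sizes, 95.0))
```

A tuple variable such as tuple size or duration can then be compared against the percentile threshold computed from historical tuples; which variable and which percentile to use would depend on the data.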
GENERATION OF AN ERROR SET THAT EMULATES SOFTWARE FAULTS - BASED ON FIELD DATA
A significant issue in fault injection experiments is whether the injected faults are representative of software faults observed in the field. Another important issue is the time used, as we want experiments to be conducted without excessive time spent waiting for the consequences of a fault. An approach to accelerate the failure process would be to inject errors instead of faults, but this would require a mapping between representative software faults and injectable errors. Furthermore, it must be assured that the injected errors emulate software faults and not hardware faults. These issues were addressed in a study of software faults encountered in one release of a large IBM operating system product. The key results are:
The explosive growth in global networking infrastructures, as demonstrated by the World Wide Web (WWW) on the Internet, the IBM Global Network, the AT&T Business Network, and other networks managed by telecommunication companies has begun to open up new possibilities for collaborative processes on a global scale. We identify software development as one that can benefit immensely from the proper use of global networking. Its feasibility however is contingent on the resolution of at least four critical factors. First, virtual teams of software developers can be assembled intelligently and swiftly on a per-project basis from a global pool of resources. Second, the project owner can manage the development process using the workflow technology that can be deployed over a global network such as the Internet. Third, developers and testers can be equipped with development environments, tools, and methodologies that are logical extensions of their current environments and practices. Fourth, a suitable team communication infrastructure can be put in place to facilitate unstructured communication between team members. These concepts are explored briefly in this paper.
As software dominates most discussions in the information technology business, one needs to carefully examine where we are headed in software dependability. This paper reexamines some of the basic premises upon which the area of software fault-tolerance is built and critiques some current practices and beliefs. A few of the thoughts and contributions are:
The definition of a software failure needs to change from a specification-based thought to one of customer expectation and ability to do productive work. This will cause a significant shift in what we build fault-tolerance for. However, it would also help narrow the gap between today's theory, practice, and customer need.
Data on customer problems illustrates that 90% of the problems reported are what we have traditionally considered as non-defect -- implying no need for a programming change. However, with the new definition of failure, we will need to address this more seriously as a part of fault-tolerance. This change could level the playing field and help achieve greater customer satisfaction.
A rationale for determining the amount of fault-tolerance, based on the concept of the threshold of pain, is suggested. It helps guide the prioritization of fault-tolerance amongst competing forces, by platform and market segment.
In conclusion, the paper reflects on a few of the development world "realities" to temper what can be achieved and what we as a community need to be aware of.
SOFTWARE TRIGGERS AS A FUNCTION OF TIME - ODC ON FIELD FAULTS
The dynamics of software faults becoming failures during the use of a product is one of the least understood aspects regarding software faults today. This paper addresses this problem by analyzing the software triggers that activate faults into failures. The work is conducted on faults experienced by a large operating systems product for two years after release into the field. The results provide some of the first demonstrations of the changing trigger distribution as a function of time after release. Specifically, this paper:
This paper uses the relative growth of defects classified using Orthogonal Defect Classification to get a finer insight into dynamics of the software development process during later parts of testing. This is particularly useful to help identify management actions to better use people resources (both skill and staffing levels) to respond to difficulties experienced with the product in test. Specifically the technique helps to:
One of the perpetual challenges of a developer is the use of in-process measurement to quantify, understand, and manage a software development process. Software quality measurements are beset with undue complexity and have over time gradually advanced away from the developer. In an area where the processes are so amorphous, the tangibles required for measurement and modeling are few. In this area, the need to derive tractable measurements that are reasonable to undertake and intuitively plausible cannot be overstated. Measurement without an underlying theme can leave the experimentalist, the theorist and the practitioner very confused.
At the T.J. Watson Research Center we have been developing and deploying a technology called Orthogonal Defect Classification. ODC is a significant breakthrough in measurement concepts and brings a new dimension to software defect analysis. It is established on a very subtle but key finding - demonstrating the existence of a semantic classification of defects which can explain the progress of the product through the process. ODC opens the doors to bring a systematic and scientific methodology to the fuzzy area of in-process measurement. Our research in this area is growing as is our experience with its use at several IBM labs.
A key problem in today's complex software systems is software failure detection and isolation, given that most software failures are only partial and, if efficiently diagnosed, isolated and recovered, could avert a total outage. The probe detects failed software components in a running software system by requesting service, or a certain level of service, from a set of functions, modules and/or subsystems (target) and checking the response to the request. The objective is to localize the failure only up to the level of a target, while achieving a high degree of efficiency and confidence in the process. Targets can be identified at different levels or layers in the software. The choice is based on the granularity of fault detection that is desired, taken into consideration with the level at which recovery is implemented. The implementation of the probe system is made self-testing against any single failure in its operational components, using the idea of a null probe. The probe system has been designed, taking advantage of the latency characteristics of errors, to provide a low-overhead mechanism. The ideas are implementable in either a single or multiple computer system.
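As a rough illustration of the request-and-check idea, the sketch below probes a set of targets and reports those that fail to respond correctly. It is only one possible reading of the abstract. The names `run_probe`, `send_request`, `null_target` and `ProbeSelfTestError` are hypothetical. Targets are modeled as plain callables that return `True` when healthy. The null probe is interpreted here as a request to a do-nothing target, issued first so that a fault in the probe machinery itself is not mistaken for a target failure.

```python
class ProbeSelfTestError(RuntimeError):
    """Raised when the probe machinery itself appears to be broken."""


def null_target():
    """A target that does no real work and always answers correctly."""
    return True


def send_request(target):
    """Request service from a target and check the response.

    Any exception or unexpected response counts as a failed target.
    """
    try:
        return target() is True
    except Exception:
        return False


def run_probe(targets, requester=send_request):
    """Return the names of targets that failed, localized to target level."""
    # Null probe: if even the trivial target "fails", distrust the prober.
    if not requester(null_target):
        raise ProbeSelfTestError("null probe failed; probe results are not trustworthy")
    return [name for name, target in targets.items() if not requester(target)]
```

A caller would pass a mapping such as `{"scheduler": check_scheduler, "spooler": check_spooler}`, where each value is a cheap service request chosen at the granularity at which recovery is implemented.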
ODC FOR PROCESS MEASUREMENT, ANALYSIS AND CONTROL
This paper provides the motivation and overview of Orthogonal Defect Classification (ODC), a new technology for software process measurement and analysis. ODC provides a significant step forward in being able to understand the dynamics of software development by using classification of defects, so that they provide measurements. This breakthrough is being used at several IBM labs and is now supported by several processes, analyses and tools from the Thomas J. Watson Research Center.
AN INFERENCE STRUCTURE FOR PROCESS FEEDBACK: TECHNIQUE AND IMPLEMENTATION
This paper presents an automatic technique for making simple inferences about the stages in a software production process along with notes about its implementation. The technique represents an approach to automate process feedback which may be based either on human experience and common sense or on historical data. Specifically, we present
In closing, we view the above as a case study of an approach which makes use of both human judgment and historical data, and generalize the lessons learnt.
An approach that involves both automatic and human interpretation to correct the software production process during development is becoming important in IBM as a means to improve quality and productivity. A key step of the approach is the interpretation of defect data by the project team. This paper uses examples of such correction to evaluate and evolve the approach, and to inform and teach those who will use the approach in software development. The methodology is shown to benefit different kinds of products beyond what can be achieved by current practices, and the collection of examples discussed represents the experiences of using a model of correction.
Consolidated workloads running around the clock mean that today's large, general-purpose computers must meet high availability demands. To meet these demands, the Model 982 provides fault tolerance by combining enhanced circuit-level error detection and failure isolation techniques with system-level techniques exploiting inherent redundancy.
We present a case study of the use of a software process improvement method which is based on the analysis of defect data. The first step of the method is the classification of software defects using attributes which relate defects to specific process activities. Such classification captures the semantics of the defects in a fashion which is useful for process correction. The second step utilizes a machine-assisted approach to data exploration which allows a project team to discover such knowledge from defect data as is useful for process correction. We show that such analysis of defect data can readily lead a project team to improve their process during development.
IN-PROCESS EVALUATION FOR SOFTWARE INSPECTION AND TEST
The goal of software inspection and test is to reduce the expected cost of software failure over the life of a product. This paper extends the use of defect triggers, the events that cause defects to be discovered, to help evaluate the effectiveness of inspections and test scenarios. In the case of inspections, the defect trigger is defined as a set of values that associate the skills of the inspector with the discovered defect. Similarly, for test scenarios, the defect trigger values embody the differing strategies being used in creating these scenarios.
The usefulness of triggers in evaluating the effectiveness of software inspections and tests is demonstrated by evaluating the inspection and test activities of some software products. These evaluations are used to point to both deficiencies in inspection and test strategies and to progress made in improving such strategies. The trigger distribution of the entire inspection or test series may then be used to highlight areas for further investigation, with the aim of improving the design, implementation, and test processes.
SOFTWARE RECREATE PROBLEMS ESTIMATED TO RANGE 10-20 PERCENT: A CASE STUDY ON TWO OPERATING SYSTEM PRODUCTS
Software recreates are necessitated by inadequate diagnostic capability following a failure. They impact the service process and the perception of availability, but have never been adequately quantified. This paper develops a technique to make the key measurements of percent recreate, arrival rate and open time from problem service data, without requiring any additional instrumentation. The study is conducted over an 18-month period on two operating system products that are among the best in the industry for diagnosis and service. The results provide the first insight into the problem and some accurate baselines. Specific to these products:
Clearly, the problem is not insignificant and the results underscore the need for improvement in diagnosis and isolation.
ON THE EVALUATION OF SOFTWARE INSPECTIONS AND TESTS
The goal of software inspections and tests is to reduce the expected cost of software failure over the life of a product. This paper extends the use of defect triggers, the events which cause defects to be discovered, to help evaluate the effectiveness of inspection and test activities. In the case of inspections, the defect trigger is defined as a set of values which associate the skills of the inspector with the discovered defect. Similarly, for tests, the defect trigger values embody the various strategies being used in creating test scenarios.
The usefulness of triggers in evaluating the effectiveness of software inspections and tests is demonstrated by evaluating the inspection and test activities of some software products. These evaluations are used to point to both deficiencies in inspection and test strategies, and progress made in improving such strategies.
EXPERIENCES IN TRANSFERRING A SOFTWARE PROCESS IMPROVEMENT METHODOLOGY TO PRODUCTION LABORATORIES
This paper describes the experience of transferring a software process methodology developed in a research laboratory to different production laboratories at IBM. The methodology involves the classification and analysis of software defects with a view to improving the software development process. The experience is reported in two parts. The first part details those factors which were anticipated, at least to some extent, or which seem more obvious. The second part details those parts of the experience which were completely unanticipated, appear quite subtle, and which are probably not well known.
This paper provides a new insight into the design of system level fault injection experiments. The failure acceleration theory is used to conduct an experiment on the NFS distributed file system. A matched pair of experiments is conducted at two different levels of acceleration, studying its effect on two key parameters: Probability of Failure and Error Propagation. In the second set, these are done approaching almost maximum acceleration, yielding some insight into how acceleration works and validating earlier theory. These results are valuable to experimentalists since they provide the stepping stones towards systematic design of such experiments. Specifically:
The paper will be useful to the fault-injection and experimental validation communities, and will provide insight to modelers.
This paper identifies key problem areas for the fault-tolerant community to address. Changes in technology, expectation of society, and needs of the market pressure the design point for fault-tolerance in their own special manner. A developer who has only a finite set of resources and limited time responds to these pressures with a set of priorities. I believe that the top five challenges, which ultimately drive the exploitation of fault-tolerant technology, are: (1) Shipping a product on schedule, (2) Reducing unavailability, (3) Non-disruptive change management, (4) Human fault-tolerance, (5) All over again in the distributed world. Each of these is discussed to explore its influence on the choice for fault-tolerance. Understanding them is key to guide research investment and maximize its derivatives.
This paper describes orthogonal defect classification (ODC), a concept that enables in-process feedback to developers by extracting signatures on the development process from defects. The ideas are evolved from an earlier finding that demonstrates the use of semantic information from defects to extract cause-effect relationships in the development process. This finding is leveraged to develop a systematic framework for building measurement and analysis methods. This paper
A COMPARISON OF SOFTWARE DEFECTS IN DATABASE MANAGEMENT SYSTEMS AND OPERATING SYSTEMS
A clear understanding of software defects that occur in the field is critical for the development of effective validation methods and strategies for fault-tolerance. This paper presents an analysis of software defects reported at customer sites in two large IBM database management products, DB2 and IMS. The analysis considers several different error classification systems and compares the results to those of an earlier study of field defects in IBM's MVS operating system. The paper:
DESIGN FOR FAULT-TOLERANCE IN SYSTEM ES/9000 MODEL 900
The ES/9000 Model 900 is IBM's high-end fault-tolerant commercial processor. Although high-end commercial processors were traditionally designed to be very reliable, this is the first one that implements a fault-tolerant machine. The design combines circuit-level concurrent error detection, fault identification and reconfiguration with system-level techniques when multiple functional resources are available. It provides true graceful degradation during Central Processor or Channel reconfiguration and repair. This paper:
RELIABILITY GROWTH FOR TYPED DEFECTS
This paper presents a reliability growth model for defects that have been categorized into defect types associated with specific stages in the software development process. Modeling the reliability growth of defects for each type separately allows identification of problems in the development process which may otherwise be masked when defects of all types are modeled together. This paper:
Since each defect type can be associated with a software development stage, comparing the estimated defect detection rates and the dependency between types provides a basis for feedback on the process.
In recent years, software defects have become the dominant cause of unplanned outage; improvements in software reliability and quality have not kept pace with those of hardware. Despite their importance, software defects are still not understood adequately enough to provide a clear strategy for avoiding or tolerating them. To gain the necessary insight, we study defects reported between 1986 and 1989 in the MVS* Operating System. The study compares typical defects (regular) to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix.
This paper:
Further analysis is provided on defects in fixes to other defects, failure symptoms, and the impact of defects on customers. These results provide a base line understanding useful to designers and developers. The data will also help develop realistic fault models for use in fault-injection experiments.
*MVS is a registered trademark of the IBM Corporation.
This paper describes Orthogonal Defect Classification, a means by which defects can be used to provide feedback on the development process. A careful selection of classification codes with orthogonal properties provides signatures in the distribution of the codes. These signatures reflect the progress of the process, detect departures when they occur, and provide the necessary insight to make adjustments.
The properties of software defects are captured by the defect type, defect trigger, source, impact, and environment attributes. The paper describes these attributes and illustrates their use with results from pilot studies in many IBM labs. It is noted that Orthogonal Defect Classification has the merit of being independent of product, thereby providing a framework for general use.
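To make the notion of a signature in the distribution of codes concrete, the following sketch tabulates the relative frequency of one classification attribute per process phase. It is an illustration under assumptions, not the analysis used in the pilot studies. The record layout (a dict with `phase` and `defect_type` keys), the function name, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def distribution_by_phase(defects, attribute="defect_type"):
    """Return {phase: {attribute value: fraction of that phase's defects}}."""
    counts = defaultdict(Counter)
    for defect in defects:
        counts[defect["phase"]][defect[attribute]] += 1
    result = {}
    for phase, counter in counts.items():
        total = sum(counter.values())
        result[phase] = {value: n / total for value, n in counter.items()}
    return result


# Made-up records for illustration only.
sample = [
    {"phase": "design", "defect_type": "function"},
    {"phase": "design", "defect_type": "function"},
    {"phase": "design", "defect_type": "function"},
    {"phase": "design", "defect_type": "interface"},
    {"phase": "system test", "defect_type": "assignment"},
    {"phase": "system test", "defect_type": "checking"},
]
signatures = distribution_by_phase(sample)
```

Comparing the per-phase distributions against what is expected for each stage is one way a departure in the process might be noticed. The same function can be pointed at another attribute, such as a trigger field, by changing `attribute`.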
SOFTWARE DEFECTS AND THEIR IMPACT ON SYSTEM AVAILABILITY - A STUDY OF FIELD FAILURES IN OPERATING SYSTEMS
In recent years, software defects have become the dominant cause of unplanned outage; improvements in software reliability and quality have not kept pace with those of hardware. Despite their importance, software defects are still not understood adequately enough to provide a clear strategy for avoiding or tolerating them. To gain the necessary insight, we study defects reported between 1986 and 1989 in the MVS* Operating System. The study compares typical defects (regular) to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix.
This paper:
Further analysis is provided on defects in fixes to other defects, failure symptoms, and the impact of defects on customers. These results provide a base line understanding useful to designers and developers. The data will also help develop realistic fault models for use in fault-injection experiments.
This paper presents an empirical investigation on possible cause and effect relationships between defects and the development process. Establishing such relationships is critical to make software development into a process with greater understanding and control. This paper:
Thus, we show that it is plausible that there exist other cause-effect relationships that could be identified. The impact of this finding is that it could well pave the way for a more systematic process control methodology to be applied to software development.
The difficulty with the measurement of fault latency is due to the lack of observability of the fault occurrence and error generation instants in a production environment. This paper describes an experiment, using data from a VAX 11/780 under real workload, to accurately study fault latency in the memory subsystem. Fault latency distributions are generated for s-a-0 and s-a-1 permanent fault models. The results show that the mean fault latency of an s-a-0 fault is nearly 5 times that of the s-a-1 fault. An analysis of variance is performed to quantify the relative influence of different workload parameters on the measured latency.
This paper uses fault injection to characterize large system failures. Thus, it overcomes limitations imposed by the lack of complete information in field failure data. The experiment is conducted on a commercial transaction processing system and this paper:
These results enhance our understanding of large system failures and provide a foundation for design enhancements and modeling of availability.