IBM Research

Center for Software Engineering


 

Abstracts (Last revised on 5/2/2002)


Automated Generation of Self-Checking Function Tests
Amit Paradkar
Submitted to ISSRE 2002, July 2002

This paper describes a new unified test generation method which simultaneously addresses the following critical issues in software function testing: (1) selection of appropriate combinations of parameter values for testing individual operations, (2) selection of an appropriate sequence of operation invocations, (3) generation of test oracles in the form of self-checking sequences, and (4) generation of negative test cases. Our method exploits a novel mutation scheme applied to operations specified as pre- and postconditions (actions) on parameters and state variables; a set of novel abstraction techniques which result in a new form of compact transition system, called a quasi-reachability graph; and techniques developed for planning under resource constraints to automatically generate self-checking positive and negative test cases with appropriate parameter values. The test cases generated using our approach target detection of certain faults in an implementation. We discuss the reduction techniques used in our method to control the size of the generated test suite. We also report our experiences with using our method in an industrial setting.


 

Improving Education of Software Engineers Through Use of Defect Analysis
Theresa Kratschmer
Submitted to IEEE Software Magazine, Sept/Oct 2002

In this paper, we discuss the use of software defect analysis in teaching professional software engineers how to improve their development process. After a brief summary of the basics of the defect analysis methodology, we discuss how the education of software developers in this technology has evolved over the years, reflecting changes in our deployment model. We also highlight factors that have been critical for successful learning. Our results indicate that defect analysis is an effective and expedient method of teaching software engineers the skills to improve the development process and the quality of products. However, it is critical that the appropriate amount of information is conveyed at the right time, that there is plenty of opportunity to practice the newly acquired knowledge, and that the educational materials effectively support the concepts and principles taught in order to reach large audiences and achieve maximum retention of the material.


 

Decision Support for Software Management
Sunita Chulani, P. Santhanam, Bob Leszkowicz, Anna Nowacki
Submitted to IEEE Software

Typical software metrics and measurements (e.g. code coverage, McCabe complexity, function points, OO metrics, etc.) target the technical professionals in a software development organization. While this is useful to the practitioners in the trenches, the managers and executives of these organizations face a broader set of issues and concerns such as functionality and schedule compared to the competition, effectiveness of the support to customers, meeting customer expectations at a reasonable cost, market share, customer retention, etc. In order to make better decisions on these issues, the collection of business and software engineering data and its regular use throughout the product life cycle is essential.

The product life cycle typically consists of initial market research, design and development of the product, and service and support of the product, during which various pieces of information related to the product are collected. The aim of this paper is to describe a research project that analyzes data from three separate data sources: (i) customer support, (ii) critical situations involving customers, and (iii) customer satisfaction surveys. We describe how all these data, currently used in isolation, can be integrated at the product level. This enables mid- and high-level management to derive relationships among the inhibitors and drivers that are important through the entire product life cycle. The data can be used to understand the sensitivity of the metrics across development, service, customer satisfaction and, finally, revenue. For example,

In addition, we highlight some of the issues involved in the data integration and the lessons learned in building a consolidated data warehouse. We also present a discussion of the models and relationships between some of the different data elements and suggest directions for future research.


 

Bayesian Methods for Change-Point Detection in Long-Range Dependent Processes
Bonnie K. Ray, Ruey S. Tsay
To appear in Journal of Time Series Analysis, Blackwell Publisher, 2002

We describe a Bayesian method for detecting structural changes in a long-range dependent process. In particular, we focus on changes in the long-range dependence parameter, $d$, and changes in the process level, $\mu$. Markov chain Monte Carlo methods are used to estimate the posterior probability and size of a change at time $t$, along with other model parameters. A time-dependent Kalman filter approach is used to evaluate the likelihood of the fractionally integrated ARMA model characterizing the long-range dependence. The method allows for multiple change points and can be extended to the long-memory stochastic volatility case. We apply the method to three examples: to investigate a change in persistence of the yearly Nile River minima, to investigate structural changes in the series of durations between intraday trades of IBM stock on the New York Stock Exchange, and to detect structural breaks in daily stock returns for the Coca-Cola Company during the 1990s.
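
For orientation, the fractionally integrated ARMA model named in the abstract is usually written as below. This is the standard textbook form, not a transcription of the paper's notation: $B$ is the backshift operator, $\phi(B)$ and $\theta(B)$ are the autoregressive and moving-average polynomials, and $\varepsilon_t$ is white noise. The second line is only an assumed way to express a piecewise-constant $d$ with jumps $\delta_j$ at change points $\tau_j$ (both symbols are hypothetical); the level $\mu$ could be treated the same way.

```latex
\[
  \phi(B)\,(1 - B)^{d}\,(y_t - \mu) = \theta(B)\,\varepsilon_t,
  \qquad
  (1 - B)^{d} = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^{k}
\]
\[
  d_t = d_0 + \sum_{j} \delta_j \,\mathbf{1}\{t \ge \tau_j\}
\]
```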


 

Modeling Vector Nonlinear Time Series Using POLYMARS
Bonnie K. Ray, Jan DeGooijer
To appear in Computational Statistics and Data Analysis, Elsevier Publisher, 2002

A modified multivariate adaptive regression splines method for modeling vector nonlinear time series is investigated. The method results in models that can capture certain types of vector self-exciting threshold autoregressive behavior, as well as provide good predictions for more general vector nonlinear time series. The effect of different model selection criteria on fitted models and predictions is evaluated through simulation. The method is illustrated for a real data example, to model a series of intra-day electricity loads in two neighboring Australian states.


 

Software Engineering Economics
Sunita Chulani, Barry Boehm
To appear as chapter in Software Management Tutorial, 6th Edition

This paper summarizes the current state of the art and recent trends in software engineering economics. It provides an overview of economic analysis techniques and their applicability to software engineering and management. It surveys the field of software cost estimation, including the major estimation techniques available and the state of the art in algorithmic cost models.


 

Specifying and Selecting Small Yet Effective Set of Parameter Values for Testing
Amit Paradkar
Submitted to ACM SIGSOFT Foundations of Software Engineering - 10, November 11, 2002

We present an approach to selecting parameter values for testing an operation. The main contributions are (1) a method to derive a small set of test variations, which are constraints on parameters, from the operation's test specification, (2) a method to select parameter values which satisfy the test variations, (3) an evaluation which indicates that our approach is better than those based on the Design of Experiments (DOE) techniques, and (4) results and lessons learnt from two case studies using our approach in an industrial setting. In particular, we present techniques for deriving test variations from test specifications which describe logical relationships among the manually derived abstract parameter partitions. The test specifications also associate concrete data values with each abstract partition. We have implemented our techniques in a tool. We conducted an experiment to compare our techniques with those based on DOE. We compared cost, measured by test data set size, and effectiveness, measured by code and mutant coverage, of the two approaches.


 

A Technique for Generating Optimized System Test Suites
Clay Williams, Theresa Kratschmer
Submitted to the 10th International Symposium on the Foundations of Software Engineering (FSE-10)
November 2002

One goal of system testing is to test the complete set of functional capabilities delivered by a system. As systems deliver increasingly large sets of functional capabilities, developing efficient and effective test cases for such testing can be a challenging task. Using concepts from the Unified Modeling Language (UML), we present a representation for testing system functionality called a Constrained Use Case Flow Graph (CUCFG). We describe an approach for optimizing a CUCFG based on integer linear programming (ILP) and show how to generate test plans or test cases from the optimized graph. Finally, we present an analysis of the technique based on its use in testing a data analysis system. This analysis explores both in-process and field-discovered defects.


 

Use of Defect Analysis in Transferring a Software Project
Theresa Kratschmer, P. Santhanam
Submitted to IEEE Software, Nov/Dec 2002

There is an increasing trend in the IT industry to transfer the development and/or maintenance responsibility of a software project from one organization to another. The typical business reasons for such a decision are change in the mission of the original group and/or perceived reduced cost of development and support due to a less expensive alternative. There is generally an attempt to make sure that this transfer does not affect the customers adversely in their satisfaction with product functionality and quality. Once the decision is made to go ahead with the transfer of mission to another group, there remains the daunting task of how to make it happen in the most efficient manner so that the new organization can come up to speed in their understanding of the software project details and the implied processes to achieve their goal. In this paper, we discuss a unique application of software defect analysis in a real project transfer situation to help in this task and the results of this experience.

The specific concerns that the receiving organization had were:

Since the transferring team had been collecting the software defect data during its development process, we proceeded to analyze them to address these concerns in an objective way. The defect analysis, which included the use of Orthogonal Defect Classification (ODC) methodology, indicated several significant things. First, we were able to determine from the defect analysis which components had more exposure in terms of lack of requirements definition and design completeness so that the appropriate documentation could be included in the contractual specifications. Second, the analysis showed that the planned approach for training testers would not be effective, since some of the existing problems in the transferring organization would also be carried over to the receiving organization. Therefore, we suggested an alternative approach for training. Finally, we concluded that the design of this product was still evolving and that the testing targeted basic functions only. The level of testing complexity for specific components was assessed and the results of this analysis were then reported to the technical teams so that improvements could be made. Overall, the use of defect analysis proved surprisingly effective in dealing with mission transfer across organizations.


Software debugging, testing, and verification
B. Hailpern and P. Santhanam
IBM Systems Journal, Vol. 41, No. 1, February 2002.

In commercial software development organizations, increased complexity of products, shortened development cycles, and higher customer expectations of quality have placed a major responsibility on the areas of software debugging, testing, and verification. As this issue of the IBM Systems Journal illustrates, there are exciting improvements in the underlying technology on all three fronts. However, we observe that due to the informal nature of software development as a whole, the prevalent practices in the industry are still immature, even in areas where improved technology exists. In addition, tools that incorporate the more advanced aspects of this technology are not ready for large-scale commercial use. Hence there is reason to hope for significant improvements in this area over the next several years.


THE STCL TESTWARE ARCHITECTURE

The STCL Architecture Committee
IBM Systems Journal, Vol. 41, No. 1, February 2002.

The Software Test Community Leadership (STCL) is an IBM-wide initiative focused on improving software test and quality practices across IBM. In 1999, they began working to develop an architecture to integrate the strategic testing tools across IBM. This article discusses the issues associated with developing an architecture for a large base of existing tools which span a variety of platforms and domains. This includes the phases involved in developing the architecture, the standards selected for implementing it, and the artifacts that must be developed to support the variety of tools used in test across IBM. The architecture is being designed and developed in four phases: enabling data exchange, providing data integration, providing control integration, and providing a single GUI into the toolset. Because of the heterogeneous nature of the platforms and domains the architecture must support, using open, widely available standards was essential. The architecture uses XML as a data exchange medium, WebDAV as a data integration standard, Java for control integration, and the Eclipse framework for building an integrated GUI. Given these standards, we discuss the artifacts required to provide generic "plug-and-play" capability in the architecture. Data integration is managed by defining generic resource types, which can be mapped to specific tools that support common test capabilities. Control integration will be managed by providing a framework in Java for managing cooperation and workflow between tools. GUI integration will be provided by developing components using Eclipse which support the resource types and integrate into the control framework. We present examples of each of these phases, discuss our progress to date, and outline our roadmap for the future.


IMPROVING SOFTWARE TESTING VIA ODC: THREE CASE STUDIES

Mark Butcher, Hilora Munro (IBM UK), and Theresa Kratschmer
IBM Systems Journal, Vol. 41, No. 1, February 2002.

Orthogonal Defect Classification (ODC) is a technology which uses analysis of in-process and customer-reported defects to improve the quality of software products. In this paper, we will discuss three different products that have used ODC to improve overall quality by focusing on specific areas of the testing process. The first product, MQSeries, is a family of products providing messaging, business integration, and workflow management services. The messaging product development team started using ODC in May of 2000. They had three objectives: 1) measuring test effectiveness, 2) evaluating customer usage in order to make improvements to test, and 3) identifying and prioritizing specific actions to implement in the development process to improve quality. The second product, CICS, is an application server that provides industrial-strength, online transaction management for mission-critical applications. Despite being a world-class quality product, CICS has implemented ODC to analyze customer-reported defects in order to improve customer satisfaction and decrease warranty costs by strengthening test. The final product to be discussed is SoDA, a product developed to allow testers and developers to visualize their ODC-classified data. ODC has been crucial in the management of SoDA development to help focus the limited resources available for testing.


NEW METRICS TO EVALUATE VENDOR DEVELOPED SOFTWARE BASED ON THE TEST CASE EXECUTION RESULTS

Kathryn Bassin, Shriram Biyani, and P. Santhanam
IBM Systems Journal, Vol. 41, No. 1, February 2002.

Due to skill and cost considerations, a growing number of organizations rely on external vendors to develop the software for their needs. One of the challenges the vendee organization faces is the evaluation of the delivered code in terms of functionality, performance, etc. Due to the implicit risks in software projects, the contractual commitments for quality and completeness are generally at a high level, and typically the vendee organization ends up with its own system/acceptance test to validate its own expectations. Typically the code gets delivered in an incremental fashion over a few iterations, and much of the day-to-day data from the vendor is not available to the vendee. The Sydney Olympics was one such project: when IBM had to evaluate vendor-delivered code, we helped the IBM Olympics management team evaluate the readiness of the delivered code based on the execution records of the test cases. We developed new metrics that can be derived from the actual test execution results to distinguish the various common scenarios of test case failure. Examples are: functionality never enabled, bad fixes, defects that were never fixed over successive iterations, etc. We were able to calculate these metrics based on actual data and provide them to the management. The relationship of these metrics to the actual cause was validated through explicit communications with the vendor and the subsequent actions to improve the quality and completeness of the delivered code. This paper will show how these metrics can be derived from the execution data and used in a real software project execution environment.


AN APPROACH TO HIGHER RELIABILITY USING SOFTWARE COMPONENTS

Hongxia Jin and P. Santhanam
To be presented at the 12th International Symposium on Software Reliability Engineering (ISSRE) 2001, Hong Kong, November 27-30, 2001

The general belief that component reuse improves software reliability is based on the assumption that the prior usage has exposed the potential software faults. In reality, this is not necessarily true due to the inherent differences in the environments and usage of the component. To achieve a high reliability for a component-based software system, we need reliable components that interoperate properly in the new environment. In this paper, we present a unified approach to do an evaluation of the interoperability of components. This involves a generic and systematic capture of the component behavior that expresses the various assumptions made by the designers about components and their interconnections explicitly. With the information captured at a semantic level, this approach can detect potential mismatches between components in the new environment and also give guidance on how to resolve the mismatches to fit components in the new context. The capture of this information in an appropriate format (e.g. XML) and an automated analysis can show serious exposures to reliability in a component-based system, before it is integrated.


EVALUATING THE SOFTWARE TEST STRATEGY FOR THE 2000 SYDNEY OLYMPICS

Kathryn Bassin, Shriram Biyani, and P. Santhanam
To be presented at the 12th International Symposium on Software Reliability Engineering (ISSRE) 2001, Hong Kong, November 27-30, 2001

The 2000 Summer Olympic Games event was a major information technology challenge. With a fixed deadline for completion, its inevitable dependency on software systems and immense scope, the testing and verification effort was critical to its success. One way in which success was assured was the use of innovative techniques using ODC based analysis to evaluate planned and executed test activities. The techniques were used to verify that the plan was comprehensive, yet efficient, and ensured that progress could be accurately measured. This paper describes some of these techniques and provides examples of the benefits derived. We also discuss the applicability of the techniques to other software projects.


MANAGING THE MAINTENANCE OF PORTED, OUTSOURCED, AND LEGACY SOFTWARE VIA ORTHOGONAL DEFECT CLASSIFICATION

Kathryn Bassin and P. Santhanam
To be presented at the International Conference on Software Maintenance 2001, Florence, Italy, November 6-11, 2001.

Are you stymied by the challenges of managing and evaluating software that includes legacy, ported, and outsourced code?

It may be that you have the means to address many of your challenges already at hand. Defect records either discovered in-house or by customers (or both!), classified using Orthogonal Defect Classification (ODC), provide a wealth of information pertinent to many aspects of developing, verifying, and maintaining outsourced and ported software.

This paper sets forth a method of exploiting this data to provide decision support regarding maintaining and measuring a complex project, isolating and defining specific problem areas, and taking actions targeting these areas to mitigate the risk associated with them. Actual case studies will be used to illustrate key points.


TOWARD A TEST-READY META-MODEL FOR USE CASES

Clay Williams
Presented at Practical UML-Based Rigorous Development Methods -- Countering or Integrating the eXtremists Workshop (co-located with UML 2001), July 27, 2001

In the UML, use cases are used to define coherent units of functionality associated with classifiers (classes, subsystems, or systems). Two principal purposes that use cases serve are specifying the functionality the classifier will provide and providing a basis for developing test cases for the classifier. This paper discusses issues that arise when using use cases as the basis for model-based testing. Based on this discussion, a test-ready meta-model for use cases is developed. Also described is a tool constructed using the concepts from the meta-model, and data is provided from the initial pilots of this tool.


DERIVING A SOFTWARE QUALITY VIEW FROM CUSTOMER SATISFACTION AND SERVICE DATA

Sunita Chulani, P. Santhanam, Darrell Moore, Bob Leszkowicz, Gary Davidson
Presented and published in the Proceedings of the European Software Control and Metrics (ESCOM)
Conference, London, England, April 2001.

Most quality and cost models use defect density to represent software quality. Customers' quality expectations are not typically based on the size and complexity of the product they buy, and their satisfaction can be influenced substantially by other product attributes that are not typically mapped to defects (e.g. ease of installation and use, timely support, etc.). Consequently, new ways to measure the customer view of quality are needed. In this paper, we provide analyses of customer service and survey data from eight products and discuss some key insights to lay the foundation for a better understanding of the customer view of software quality. We believe that this approach can help us identify actions in software development and support that will address the concerns of our customers and improve their satisfaction with our products.


CERTIFYING THE CORRECTNESS OF BRANCH-AND-BOUND COMPUTATIONS*

Hongxia Jin, Gregory F. Sullivan and Gerald M. Masson
Submitted to the IEEE Transactions on Software Engineering, 2000

Many broadly applied combinatorial optimization problems, such as the well-known Traveling Salesman Problem (TSP) and the Knapsack problem, are computationally intensive and sometimes NP-hard. The execution of intensive computations can suffer from faults in hardware and errors in software which remain dormant during less intensive computations. These problems provide a major motivation for developing techniques for certifying answers. Indeed, we show that checking the correctness of the results obtained from these intensive computations can also be computationally intensive. We have conducted an investigation of problems which are amenable to the powerful and widely used 'branch-and-bound' strategy. This strategy can be used to solve difficult problems such as the TSP and the Knapsack problem.
We describe an efficient result checking technique based on certificates. We show how the certificate-based approach can productively be applied to efficiently checking the correctness of results generated by branch-and-bound programs. We apply the method to a generic branch-and-bound algorithm and we prove that the certificates for branch-and-bound programs are correctly developed to check the correctness of the results. We will apply the correctness checking of branch-and-bound approaches to several computationally intensive problems. We also present experimental results, which demonstrate that the certifier computation is much faster than the original computation. This means that dramatic time savings are possible when building reliable software using the described method.
*This research was supported by NSF grant CCR-9804076
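
The idea of a certifier that is cheaper than the original branch-and-bound run can be illustrated with a small sketch for the 0/1 Knapsack problem. This is not the authors' algorithm or certificate format: here the "certificate" is simply the claimed item set and value, and the certifier re-walks the search tree with the claimed value as its initial incumbent so that it prunes more aggressively than the solver. All function names are hypothetical, and item weights are assumed to be positive.

```python
from typing import List, Sequence, Tuple

Item = Tuple[int, int]  # (value, weight); weight is assumed to be > 0


def fractional_bound(ranked: Sequence[Item], cap: int, start: int, value: int) -> float:
    """Upper bound on any completion: greedy fill, last item taken fractionally.

    `ranked` must be sorted by value/weight in descending order.
    """
    bound = float(value)
    for v, w in ranked[start:]:
        if w <= cap:
            cap -= w
            bound += v
        else:
            bound += v * cap / w
            break
    return bound


def rank(items: Sequence[Item]) -> List[int]:
    """Indices of items sorted by value density, best first."""
    return sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)


def solve_knapsack(items: Sequence[Item], capacity: int) -> Tuple[int, List[int]]:
    """Primary computation: depth-first branch-and-bound."""
    order = rank(items)
    ranked = [items[i] for i in order]
    best_value, best_set = 0, []

    def visit(k: int, cap: int, value: int, chosen: List[int]) -> None:
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, list(chosen)
        if k == len(ranked) or fractional_bound(ranked, cap, k, value) <= best_value:
            return  # leaf reached, or this subtree cannot beat the incumbent
        v, w = ranked[k]
        if w <= cap:
            chosen.append(order[k])
            visit(k + 1, cap - w, value + v, chosen)
            chosen.pop()
        visit(k + 1, cap, value, chosen)

    visit(0, capacity, 0, [])
    return best_value, sorted(best_set)


def certify(items: Sequence[Item], capacity: int, claimed_value: int, chosen: Sequence[int]) -> bool:
    """Certifier: accept only if the claimed solution is feasible and optimal."""
    if len(set(chosen)) != len(chosen) or any(not 0 <= i < len(items) for i in chosen):
        return False
    if sum(items[i][1] for i in chosen) > capacity:
        return False
    if sum(items[i][0] for i in chosen) != claimed_value:
        return False
    ranked = [items[i] for i in rank(items)]

    def beats(k: int, cap: int, value: int) -> bool:
        """True if some feasible selection in this subtree exceeds the claim."""
        if value > claimed_value:
            return True
        if k == len(ranked) or fractional_bound(ranked, cap, k, value) <= claimed_value:
            return False
        v, w = ranked[k]
        if w <= cap and beats(k + 1, cap - w, value + v):
            return True
        return beats(k + 1, cap, value)

    return not beats(0, capacity, 0)
```

A typical use is to call `solve_knapsack` once and then pass its output to `certify`, possibly on a different machine. Because the certifier starts with the final incumbent, it never explores a subtree that the solver only visited while its incumbent was still weak.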

APPROXIMATE CORRECTNESS-CHECKING OF COMPUTATIONAL RESULTS*

Hongxia Jin, Gregory F. Sullivan and Gerald M. Masson
IEEE Transactions on Reliability, December 1999, pp 338-350

The process of checking the correctness of time-critical computations involves a fundamental and inherently conflicting tradeoff: namely, the precision of the checking versus the time required to perform the checking process. This tradeoff is particularly relevant when the computations are intensive and can require significant time for completion. In this paper, we address this issue by developing an innovative technique based on an extension of the utilization of certificates which permits the precision of the correctness checking to be explicitly specified. We develop this concept relative to the broad class of problems which are amenable to the powerful and widely used 'branch-and-bound' strategy. It is extremely relevant to the motivation and significance of this paper to keep in mind that checking the correctness of results for computationally intensive problems can be just as demanding as generating the results themselves. Accordingly, for time-critical applications in which a run-time result verification must be accomplished before a generated result can be released, a methodology for highly efficient approximate correctness checking of results in which it is efficiently determined that an obtained result is within some specified percentage of an optimal solution can be of enormous importance. This paper presents an approximate correctness checking methodology in which the precision of the result check can be explicitly specified and guaranteed even when an exact correct result is unknown. We will show that if the precision of the correctness check is approximated to be only slightly less than an exact check, a dramatic efficiency in the checking process can be attained, in some cases reaching orders of magnitude in speedup. 
This controllable tradeoff inherent in our methodology can be the basis for new classes of adaptive result checking strategies in which the precision of the check of a result can be varied depending on the impact of the result within time-critical applications.
*This research was supported by NSF grant CCR-9804076
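
The precision-versus-time tradeoff described above can be sketched for the 0/1 Knapsack problem as follows. This is an illustration under stated assumptions, not the paper's certificate-based method: the hypothetical function `certify_within` accepts a claimed value only if it can show that the value is at least a given fraction of the optimum, and it does so by pruning every subtree whose upper bound does not exceed `claimed_value / fraction`. The claimed value is assumed to come from a feasible solution that has been checked separately, and item weights are assumed to be positive.

```python
from typing import Sequence, Tuple

Item = Tuple[int, int]  # (value, weight); weight is assumed to be > 0


def fractional_bound(ranked: Sequence[Item], cap: int, start: int, value: int) -> float:
    """Upper bound on any completion: greedy fill, last item taken fractionally."""
    bound = float(value)
    for v, w in ranked[start:]:
        if w <= cap:
            cap -= w
            bound += v
        else:
            bound += v * cap / w
            break
    return bound


def certify_within(items: Sequence[Item], capacity: int, claimed_value: int, fraction: float) -> bool:
    """Return True iff claimed_value >= fraction * optimum.

    fraction = 1.0 is an exact optimality check; smaller fractions raise the
    pruning threshold, so more of the search tree is cut off.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    threshold = claimed_value / fraction
    # Sort by value density so that the fractional bound is valid.
    ranked = sorted(items, key=lambda item: item[0] / item[1], reverse=True)

    def beats(k: int, cap: int, value: int) -> bool:
        """True if some feasible selection in this subtree exceeds the threshold."""
        if value > threshold:
            return True
        if k == len(ranked) or fractional_bound(ranked, cap, k, value) <= threshold:
            return False
        v, w = ranked[k]
        if w <= cap and beats(k + 1, cap - w, value + v):
            return True
        return beats(k + 1, cap, value)

    return not beats(0, capacity, 0)
```

Any subtree pruned at one threshold is also pruned at every higher threshold, so when a claim is accepted, loosening `fraction` never makes the check visit more nodes. That is the controllable tradeoff in miniature.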

DISTRIBUTED APPLET-BASED CERTIFIABLE PROCESSING*

Hongxia Jin, Gregory F. Sullivan and Gerald M. Masson
Submitted to IEEE Transactions on Parallel and Distributed Systems, 2000

Numerous applications of distributed computing implementations demand highly efficient computational result correctness checking. We describe and demonstrate the concept of Distributed Applet-based Certifiable Processing (DACP). DACP offers a low-overhead framework for Web-based distributed environments in which a main machine (server) partitions a given computational problem into a set of sub-problems, distributes these sub-problems across a network to other machines (clients), efficiently certifies the correctness of the sub-problem results returned by the clients, and then assembles them into a final solution of the original computational problem. The resource and time advantages of the DACP methodology are directly related to the effectiveness and efficiency offered by an innovative distributed implementation of the certificate-based approach to computational result checking. We have considered a class of computationally intensive search problems which provides a major motivation for developing techniques for result correctness checking. We have conducted an investigation of problems which are amenable to the widely used branch-and-bound strategy. We apply the DACP methodology to a generic branch-and-bound algorithm and also prove the correctness of the three algorithms which are designated the original, primary and certifier algorithms. Our experimental assessment of DACP, performed with the use of Java applets which we have developed to provide distributed branch-and-bound solutions, considers variations in sub-problem partitionings. These results emphatically demonstrate that within a Web-based distributed environment employing DACP, the time required to certify the correctness of the computation plus the communication overhead is small compared to the time required to perform the computation.
The results in this paper indicate that DACP offers significant advantages in comparison with other known result correctness checking techniques for reliable distributed computing.
*This research was supported by NSF grant CCR-9319945

SALT - AN INTEGRATED ENVIRONMENT TO AUTOMATE GENERATION OF FUNCTION TESTS FOR APIs

Amit Paradkar
Presented at and published in the Proceedings of the International Symposium on Software Reliability Engineering (ISSRE'00), San Jose, CA, October 2000.

Function testing is one of the most critical defect removal activities in the software life cycle. Studies indicate that a significant fraction of field defects could have been found during function testing. Furthermore, the test design phase of function testing consumes approximately two thirds of the effort. Thus, automation of test design is essential both to reduce the substantial cost of testing and to improve the delivered software reliability. A few commercial tools attempt to address test design automation but have limitations of their own. We argue for a model-based approach specifically designed from a test perspective to automate test design. We also describe essential characteristics of an environment meant for such test design automation, such as separation of the logical model of the functions under test from the data model to be used in testing, association of a plausible fault model with the logical model, and flexibility in user-specified test conditions and test sequences. We describe features of the Specification and Assertion Language for Testing (SALT) environment, which embodies these characteristics. SALT allows testers to capture relationships among partitions of input and output variables for a function under test. The tester can also specify (potential) updates to context which result from the function invocation. This context enables generation of sequences of function invocations with expected outputs. These test specifications, along with a fault model, allow generation of an optimized set of test variations. Also, the language provides hooks for fragments of actual test cases to be associated with a function. This enables derivation of test cases from the test variations. We describe an example to illustrate SALT usage and report results of our pilot study using SALT.

SOFTWARE DEVELOPMENT COST ESTIMATION APPROACHES

Sunita Chulani, Barry Boehm and Chris Abts
Published in the Annals of Software Engineering, Fall 2000.

This paper summarizes several classes of software cost estimation models and techniques: parametric models, expertise-based techniques, learning-oriented techniques, dynamics-based models, regression-based models, and composite-Bayesian techniques for integrating expertise-based and regression-based models. Experience to date indicates that neural-net and dynamics-based techniques are less mature than the other classes of techniques, but that all classes of techniques are challenged by the rapid pace of change in software technology. The primary conclusion is that no single technique is best for all situations, and that a careful comparison of the results of several approaches is most likely to produce realistic estimates.

COCOMO II

Sunita Chulani
Published in Wiley Software Engineering Encyclopedia, Fall 2000.

This article introduces the reader to the Constructive Cost Model (COCOMO) II - a well-known model used in software cost and schedule estimation. First, the background of the model is presented, tracing the history of COCOMO from its inception to the present time. Second, a detailed description of COCOMO II is presented, including inputs and outputs, internal algorithms, and calibration results. Third, some of the major differences between COCOMO 81, Ada COCOMO and COCOMO II are described. Fourth, some currently available computerized editions of COCOMO II are briefly discussed. Finally, an independent evaluation of COCOMO II is presented. Although this article cannot provide all the information required to understand thoroughly or use COCOMO II, it will give the reader a basic comprehension of the model and its capabilities.

FUTURE TRENDS, IMPLICATIONS IN COST ESTIMATION MODELS

Sunita Chulani, Barry Boehm, Chris Abts, Jongmoon Baik, A. Windsor Brown, Brad Clark, Ellis Horowitz, Ray Madachy, Don Reifer and Bert Steece
Published in Crosstalk, the Journal of Defense Engineering, April 2000.

The rapid pace of change in software technology requires everybody in the software business to continually rethink and update their practices just to stay relevant and effective. This article discusses this challenge first with respect to the USC COCOMO II software cost modeling project, and then for software-intensive organizations in general. It then presents a series of adaptive feedback loops by which organizations can use COCOMO II-type models to help cope with the challenges of change.

FROM MULTIPLE REGRESSION TO BAYESIAN ANALYSIS FOR CALIBRATING COCOMO II

Sunita Chulani, Barry Boehm, and Bert Steece
Best Software Track paper, Best Conference paper. Proc. of the 21st Annual Conference of the International Society of Parametric Analysts (ISPA), Spring 2000.
Published in the Journal of Parametrics, Spring 2000.

Created to provide a software cost estimation model suited for a rapidly evolving environment, the COCOMO II model is the result of a 1994 research effort to update the 1981 COnstructive COst MOdel and its 1987 Ada version. Boehm et al [Boehm95, USC-CSE98] provided the initial definition and rationale for this model. The model’s inputs include Source Lines of Code and/or Function Points as the sizing parameter, adjusted for both reuse and breakage; a set of 17 multiplicative effort multipliers and a set of 5 exponential scale factors. The first calibration was based on a dataset of 83 completed projects collected from Commercial, Aerospace, Government and FFRDC organizations using a 10% weighted average multiple regression approach. It was presented at the ISPA conference in 1997 and became popular as COCOMO II.1997 [Chulani98]. Since then, the COCOMO II database has grown to 161 datapoints. The Bayesian approach was used to calibrate the 2000 version of the model to 161 datapoints [Chulani99].
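As a rough illustration of how the inputs listed above combine, the following Python sketch uses the commonly published COCOMO II effort form, in which effort grows as size raised to an exponent driven by the scale factors and is then scaled by the product of the effort multipliers. The function name `cocomo2_effort` is hypothetical, and the default constants `a` and `b` are taken from the published COCOMO II.2000 definition rather than from this abstract, so treat them as assumptions. The sketch does not model the reuse and breakage adjustments to size.

```python
import math


def cocomo2_effort(ksloc, scale_factors, effort_multipliers, a=2.94, b=0.91):
    """Estimate effort in person-months (illustrative sketch only).

    ksloc              -- size in thousands of source lines of code
    scale_factors      -- the 5 exponential scale factor ratings
    effort_multipliers -- the 17 multiplicative effort multiplier ratings
    a, b               -- calibration constants (assumed values, see lead-in)
    """
    if ksloc <= 0:
        raise ValueError("size must be positive")
    # Scale factors act on the exponent, so their effect grows with size.
    exponent = b + 0.01 * sum(scale_factors)
    # Effort multipliers act linearly on the size-driven effort.
    return a * ksloc ** exponent * math.prod(effort_multipliers)


# Hypothetical ratings, chosen only to exercise the function.
estimate = cocomo2_effort(100.0, [3.0] * 5, [1.0] * 17)
```

Because the scale factors sit in the exponent, raising any one of them penalizes large projects more than small ones, while an effort multiplier changes the estimate by the same ratio at every size.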
This paper compares and contrasts the two approaches used to calibrate the successive versions of COCOMO II, i.e. the 1997 and 2000 calibrations: the 10% weighted average multiple regression approach and the Bayesian approach. It concludes that the Bayesian approach used in the 2000 calibration is better and more robust than the multiple regression approach used in the 1997 calibration. We note that the predictive performance of the Bayesian approach (i.e. COCOMO II.2000) is significantly better than that of the multiple regression approach (i.e. COCOMO II.1997). COCOMO II.2000 gives predictions that are within 30% of the actuals 75% of the time, whereas COCOMO II.1997 gives predictions within 30% of the actuals only 52% of the time.
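The accuracy figures quoted above ("within 30% of the actuals 75% of the time") can be read as the fraction of projects whose relative estimation error is at most 30%. A minimal sketch of that computation follows; the function name `fraction_within` and the sample numbers are hypothetical, and the abstract does not state the exact error definition used, so relative error against the actual value is an assumption.

```python
def fraction_within(actuals, estimates, level=0.30):
    """Return the fraction of estimates within `level` of the actual value.

    Relative error is measured against the actual (assumed definition).
    """
    if not actuals or len(actuals) != len(estimates):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1
        for actual, estimate in zip(actuals, estimates)
        if abs(estimate - actual) / actual <= level
    )
    return hits / len(actuals)


# Hypothetical effort data in person-months: two of the four estimates
# fall within 30% of their actuals.
actual_effort = [100.0, 100.0, 100.0, 100.0]
estimated_effort = [110.0, 129.0, 131.0, 200.0]
score = fraction_within(actual_effort, estimated_effort)
```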

BAYESIAN ANALYSIS OF EMPIRICAL SOFTWARE ENGINEERING COST MODELS

Sunita Chulani, Barry Boehm and Bert Steece
IEEE Transactions on Software Engineering, Special Issue on Empirical Methods in Software Engineering, Vol. 25, No. 4, July/August 1999.

To date many software engineering cost models have been developed to predict the cost, schedule and quality of the software under development. But, the rapidly changing nature of software development has made it extremely difficult to develop empirical models that continue to yield high prediction accuracies. Software development costs continue to increase and practitioners continually express their concerns over their inability to accurately predict the costs involved. Thus, one of the most important objectives of the software engineering community has been to develop useful models that constructively explain the software development lifecycle and accurately predict the cost of developing a software product. To that end, many parametric software estimation models have evolved in the last two decades [Putnam92, Jones97, Park88, Jensen83, Rubin83, Boehm81, Boehm95, Walkerden97, Conte86, Fenton91, Masters85, Mohanty81].
Almost all of the above mentioned parametric models have been empirically calibrated to actual data from completed software projects. The most commonly used technique for empirical calibration has been the popular classical multiple regression approach. As discussed in this paper, the multiple regression approach imposes a few assumptions frequently violated by software engineering datasets. The source data is also generally imprecise in reporting size, effort and cost-driver ratings, particularly across different organizations. This results in the development of inaccurate empirical models that don't perform very well when used for prediction. This paper illustrates the problems faced by the multiple regression approach during the calibration of one of the popular software engineering cost models, COCOMO II. It describes the use of a pragmatic 10% weighted average approach that was used for the first publicly available calibrated version [Clark98]. It then moves on to show how a more sophisticated Bayesian approach can be used to alleviate some of the problems faced by multiple regression. It compares and contrasts the two empirical approaches, and concludes that the Bayesian approach was better and more robust than the multiple regression approach.
Bayesian analysis is a well-defined and rigorous process of inductive reasoning that has been used in many scientific disciplines [the reader can refer to Gelman95, Zellner83, Box73 for a broader understanding of the Bayesian Analysis approach]. A distinctive feature of the Bayesian approach is that it permits the investigator to use both sample (data) and prior (expert judgement) information in a logically consistent manner in making inferences. This is done by using Bayes’ theorem to produce a ‘post-data’ or posterior distribution for the model parameters. Using Bayes’ theorem, prior (or initial) values are transformed to post-data views. This transformation can be viewed as a learning process. The posterior distribution is determined by the variances of the prior and sample information. If the variance of the prior information is smaller than the variance of the sampling information, then a higher weight is assigned to the prior information. On the other hand, if the variance of the sample information is smaller than the variance of the prior information, then a higher weight is assigned to the sample information causing the posterior estimate to be closer to the sample information.
The Bayesian approach discussed in this paper enables stronger solutions to one of the biggest problems faced by the software engineering community: the challenge of making good decisions using data that is usually scarce and incomplete. We note that the predictive performance of the Bayesian approach (i.e. within 30% of the actuals 75% of the time) is significantly better than that of the previous multiple regression approach (i.e. within 30% of the actuals only 52% of the time) on our latest sample of 161 project datapoints.
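The variance-based weighting described above can be sketched for the simplest case: a single parameter with a normal prior and a normal sampling distribution, both with known variances. This is a textbook simplification chosen for illustration, not the multi-parameter calibration procedure of the paper, and the function name `combine_prior_and_sample` is hypothetical.

```python
def combine_prior_and_sample(prior_mean, prior_var, sample_mean, sample_var):
    """Precision-weighted combination of prior and sample information.

    Assumes one parameter, normal prior, normal sample estimate, and
    known variances. Returns (posterior_mean, posterior_variance).
    """
    if prior_var <= 0 or sample_var <= 0:
        raise ValueError("variances must be positive")
    # Precision is the inverse of variance; smaller variance, larger weight.
    prior_precision = 1.0 / prior_var
    sample_precision = 1.0 / sample_var
    posterior_var = 1.0 / (prior_precision + sample_precision)
    posterior_mean = posterior_var * (
        prior_precision * prior_mean + sample_precision * sample_mean
    )
    return posterior_mean, posterior_var


# Hypothetical values: a tight expert prior and noisy sample data pull the
# posterior toward the prior; swapping the variances pulls it toward the data.
near_prior, _ = combine_prior_and_sample(1.0, 0.1, 3.0, 10.0)
near_sample, _ = combine_prior_and_sample(1.0, 10.0, 3.0, 0.1)
```

The posterior variance is always smaller than either input variance, which is one way to see the "learning process" view: combining the two sources never leaves the investigator less certain than the better source alone.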

THE ROSETTA STONE: MAKING COCOMO 81 ESTIMATES WORK WITH COCOMO II

Sunita Chulani, Don Reifer and Barry Boehm
Published in Crosstalk, The Journal of Defense Engineering, February 1999

As part of our efforts to help Constructive Cost Model (COCOMO) users, we, the COCOMO research team at the Center for Software Engineering at the University of Southern California (USC), have developed the Rosetta Stone to convert COCOMO 81 files to run using the new COCOMO II software cost estimating model. The Rosetta Stone is extremely important because it allows users to update estimates made with the earlier version of the model so that they can take full advantage of the many new features incorporated into the COCOMO II package. This article describes both the Rosetta Stone and guidelines to make the job of conversion easy.

CONSTRUCTIVE QUALITY MODELING FOR DEFECT DENSITY PREDICTION: COQUALMO

Sunita Chulani
Presented and in Proceedings of the International Symposium on Software Reliability Engineering, Boca Raton, FL., Nov. 1-4, 1999

The aim of this paper is to present COQUALMO, a quality prediction model. COQUALMO predicts the defect density of the software under development where defects conceptually flow into a holding tank through various defect introduction pipes and are removed through various defect removal pipes. COQUALMO consists of 2 sub-models, namely the ‘Defect Introduction (DI)’ and the ‘Defect Removal (DR)’ models. The DI model is formulated using product, process, computer and personnel attributes (based on COCOMO II, USC-CSE, 1999) and predicts the number of requirements, design and coding defects that are introduced during various activities of the development life cycle. The DR model captures the effects of 3 relatively orthogonal profiles of defect removal techniques, namely Automated Analysis, People Reviews, Execution Testing and Tools, and predicts the number of requirements, design and coding defects that are eliminated. The residual number of defects is the difference between the number of defects introduced and the number of defects removed. As discussed below, the model has been validated against published studies on defect densities and gives comparable results.
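The holding-tank picture above can be sketched in a few lines of Python: defects flow in per artifact type, each removal profile takes some out, and what is left is the residual. The sketch assumes that each of the three removal profiles removes a fixed fraction of the defects still present, applied one after another; that assumption, the function name `residual_defects`, and all numbers are illustrative and are not taken from the abstract.

```python
def residual_defects(introduced, removal_fractions):
    """Return residual defects per artifact type.

    introduced        -- defects introduced, keyed by artifact type
    removal_fractions -- per artifact type, one removal fraction for each
                         removal profile (assumed multiplicative model)
    """
    residual = {}
    for artifact, count in introduced.items():
        remaining = float(count)
        for fraction in removal_fractions[artifact]:
            if not 0.0 <= fraction <= 1.0:
                raise ValueError("removal fraction must be between 0 and 1")
            remaining *= 1.0 - fraction
        residual[artifact] = remaining
    return residual


# Hypothetical inputs: counts from a defect introduction estimate, and
# removal fractions ordered as (automated analysis, people reviews,
# execution testing and tools).
introduced = {"requirements": 100.0, "design": 200.0, "code": 400.0}
fractions = {
    "requirements": [0.0, 0.5, 0.5],
    "design": [0.1, 0.5, 0.5],
    "code": [0.2, 0.5, 0.5],
}
left = residual_defects(introduced, fractions)
removed = {k: introduced[k] - left[k] for k in introduced}
```

With these inputs, the number removed for each artifact type is the difference between defects introduced and the residual, matching the relationship stated in the abstract.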

SOFTWARE TESTING AND THE UML

Clay E. Williams
Presented and in Proceedings of the International Symposium on Software Reliability Engineering (ISSRE'99), Boca Raton, Nov. 1-4, 1999

The Unified Modeling Language (UML) has emerged as an industrial standard for modeling software systems, and has been presented to the International Organization for Standardization for consideration as an international standard [4]. UML has received a great deal of attention (both positive and negative) from the software design and development communities, and work is ongoing to enhance and expand its capabilities. However, the software testing community has had much less awareness and debate about UML, and has largely been absent as the modeling standard was developed. This is an important issue, because in many software development organizations, the cost of testing can account for more than 40% of the total development cost for a software system. Given these facts, this abstract seeks to explore the possibility of using the UML for software testing.

EFFICIENT REGRESSION TESTING OF MULTI-PANEL SYSTEMS

Clay Williams and Amit Paradkar
Presented and in Proceedings of the 1999 International Symposium on Software Reliability Engineering (ISSRE'99), Boca Raton, FL., Nov. 1-4, 1999.

Multi-panel systems are systems that interact with a user via multiple input panels. The flow through the panels is influenced by the interaction. Multi-panel systems are ubiquitous, and include panel-based legacy applications, automated teller machines, and web-based systems. Finding regression test suites which efficiently cover the functionality of these systems is difficult, because it requires covering interactions between input fields within a panel as well as flows between panels. Previous approaches to covering input field interactions include partitioning methods and combinatorial design methods. State machines and decision tables have been used to cover flow between panels. None of these techniques produce a conceptually simple, unified model that supports both intra- and inter-panel coverage. Our new method for capturing and representing test specifications provides such a model. This model is then used to generate a locally minimal set of test cases which completely covers the model. We applied this technique in a pilot study for regression testing, and this pilot had promising results which we discuss. We conclude by presenting our plans for generalizing the technique beyond multi-panel systems and regression testing.

COQUALMO (COnstructive QUALity MOdel): A SOFTWARE DEFECT DENSITY PREDICTION MODEL

Sunita Chulani
Proceedings of the European Software Conference on Metrics (ESCOM), April 1999

Cost, schedule and quality are highly correlated factors in software development. They basically form three sides of the same triangle. Beyond a certain point (the "Quality is Free" point), it is difficult to increase the quality without increasing either the cost or schedule or both for the software under development. Similarly, development schedule cannot be drastically compressed without hampering the quality of the software product and/or increasing the cost of development. Software estimation models can (and should) play an important role in facilitating the balance of cost/schedule and quality.
Recognizing this important association, an attempt is being made to develop a quality model extension to COCOMO II (the popular cost and schedule estimation model), namely COQUALMO. This paper presents the two sub-models, i.e., the Defect Introduction and the Defect Removal sub-models, of COQUALMO. It also illustrates the integration of COQUALMO with COCOMO II to facilitate cost/schedule/quality tradeoffs.

ODC FOR SOFTWARE PROCESS MEASUREMENT, ANALYSIS AND CONTROL

P. Santhanam and Ram Chillarege
Presented at the 1998 Korea-U.S. Technical Conference on Strategic Technologies, Tysons Corner, VA., October 22-24, 1998.

This paper provides the motivation and an overview of Orthogonal Defect Classification (ODC), a proven technology for software process measurement and analysis. ODC provides a significant step forward in capturing the dynamics of software development by using the semantic classification of defects, so that they provide measurements. This methodology is being used successfully at several organizations in IBM and other companies.

eFlow: A JAVA-BASED WORKFLOW SERVICE

G. Robert Malan, Jarir Chaar, Santanu Paul, and Peter Masters
Proceedings of the OOPSLA '98 Workshop on Implementation and Application of OO Workflow Management Systems, October 1998

In today's enterprise, workflow management technologies help focus on the business processes of an enterprise rather than on the individual functions of the components of such an enterprise. They are used to ensure the reliable and repeatable execution of business processes and to improve the efficiency of the enterprise by supporting the definition, planning, execution, and analysis of such processes. eFlow is a lightweight framework for supporting workflow on the Internet. It allows Web-based workflow processes to inter-operate through the use of the Rainmaker interface. These web-based workflow processes are exported to users in the form of web pages with embedded Java applets. The applets consist of eSuite components including: spreadsheet, word processor, charting, and presentation applets. eFlow uses e-mail as its notification mechanism, providing a universal inbox for users. Persistence of both state and application data is provided through the Servlet File System (SFS), a distributed web-based file system. eFlow allows mobile users to access their worklist tasks from any point in the network (intranet or Internet) using only a Java-enabled web browser or a thin client platform such as a Network Computer (NC). As a Java-based application, the implementation as well as the data items are defined within an object oriented framework. This paper describes the eFlow architecture and illustrates its functionality through the use of an example process: a grant proposal workflow process.


AN OBJECTIVE APPROACH TO EVALUATE SOFTWARE DEVELOPMENT

Kathryn A. Bassin, Theresa Kratschmer, and P. Santhanam
IEEE Software Magazine, November/December 1998 Special Issue, Vol. 15 No.6 pp 66-74, 1998

In today's software development environment, the delicate compromise over functionality, time to market, and quality drives all business decisions. The success of a software development effort is dependent on whether the development team can efficiently design, code, test and support the software in a timely fashion. This article describes an objective and time-tested method to meet the needs of all the key people in a software organization: the executive, release manager, developers and testers. The method is based on the use of software defects as a diagnostic probe in an organization and the capture of semantic information from the defect analysis via the "Orthogonal Defect Classification" scheme. Through three real-life case studies in different parts of IBM, we illustrate the use of one of the most powerful methods available today for software management and technical decision support.


EXPLORING DEFECT DATA FROM DEVELOPMENT AND CUSTOMER USAGE ON SOFTWARE MODULES OVER MULTIPLE RELEASES

Shriram Biyani and P. Santhanam
Proceedings of the Ninth International Symposium on Software Reliability Engineering, Paderborn, Germany, November 4-7, 1998, pp 316-320

Traditional defect analyses of software modules have focused on either identifying error prone modules or predicting the number of faults in a module, based on a set of module attributes such as complexity, lines of code, etc. In contrast to these metrics-based modeling studies, this paper explores the relationship of the number of faults per module to the prior history of the module. Specifically, we examine the relationship between (a) the faults discovered during development of a product release and those escaped to the field, and (b) faults in the current release and faults in previous releases. Based on the actual data from four releases of a commercial application product consisting of several thousand modules, we show that:

These results can be used to improve the prediction of quality at the module level of future releases based on the past history.


CALIBRATING THE COCOMO II POST-ARCHITECTURE MODEL

Sunita Chulani, Bradford Clark and Barry Boehm
Proceedings of the International Conference on Software Engineering (ICSE), April 1998.

The COCOMO II model was created to meet the need for a cost model that accounted for future software development practices. This paper describes some of the lessons learned in calibrating the COCOMO II Post-Architecture model from eighty-three observations. The results of the multiple regression analysis, their implications, and a future calibration strategy are discussed.

MODELING SOFTWARE DEFECT INTRODUCTION

Sunita Chulani
Proceedings of the California Software Symposium, November 1997.

This paper presents an initial version of the Defect Introduction sub-model of the empirical quality modeling extension to the existing COCOMO II software cost estimation model. The Quality Model is an estimation model that can be used for predicting number of residual defects/KSLOC (thousands of Source Lines of Code) or defects/FP (Function Point) in a software product. It applies in the early activities such as analysis and design as well as in the later stages for refining the estimate when more information is available. It enables ‘what-if’ analyses that demonstrate the impact of various defect removal techniques and the effects of personnel, project, product and platform characteristics on software quality. It also provides insights on determining ship time, assessment of quality investment payoffs and understanding of quality strategy interactions. The model has two sub-models, namely the Defect Introduction Model and the Defect Removal Model. This paper focuses on the Initial version of the Defect Introduction Model which is the result of a two-round Delphi analysis. It discusses in depth the Defect Introduction Rate sensitivities of the various COCOMO II parameters and gives a detailed explanation of the rationale behind the suggested numeric ratings associated with each of the parameters.

USE OF SOFTWARE TRIGGERS TO EVALUATE SOFTWARE PROCESS EFFECTIVENESS AND CAPTURE CUSTOMER USAGE PROFILES

Kathryn A. Bassin and P. Santhanam
Eighth International Symposium on Software Reliability Engineering (ISSRE'97), Albuquerque, NM, Nov. 2-5, 1997.

We have analyzed fault data comprising nearly 30,000 records (including in-process and field data) from two real products A and B over multiple releases, using Orthogonal Defect Classification (ODC). We exploit the information captured by ODC triggers to evaluate the development activities, and identify specific actions for improvement in development. We illustrate the use of triggers to capture customer usage in a way directly meaningful to product development and show

This is the most comprehensive use of ODC triggers in development and field reported to date.


ERROR INJECTION AIMED AT FAULT REMOVAL IN FAULT TOLERANCE MECHANISMS - CRITERIA FOR ERROR SELECTION USING FIELD DATA ON SOFTWARE FAULTS

J. Christmansson and P. Santhanam
Seventh International Symposium on Software Reliability Engineering (ISSRE'96), White Plains, N.Y., Oct 31 - Nov 2, 1996

Fault injection allows a detailed study of complex interactions between faults and fault handling mechanisms. It can be a useful complement to analytical modeling and formal verification techniques in the testing of fault tolerant systems. However, work on fault injection has not matured adequately to provide industry with cost effective alternatives for the validation of fault tolerant systems. This study analyzes 408 customer discovered faults (defects) in a release of a large operating system product. We discuss methods to select the error types for an error injection experiment in the system test environment, aimed at fault removal. Using four levels of severity and a total of 24 error types as recorded in the customer defects records, we analyze the faults in terms of fault types and system test triggers as defined in ODC. Our work shows examples of criteria that can be used to select errors for injection that use the information from the field reported defects. In particular:

1. Six error types accounted for nearly 80% of the highest severity defects.
2. Nine error types accounted for about 80% of the defects exposed by recovery procedures or exception handlers.
3. The six error types that were common to the two sets were: program management, storage management, data structures, flag, serialization and parameter, indicating their significance in the operating system software environment.
4. We have identified the best components in the product for the injection of these errors by considering the rank ordered lists of components with highest severity defects and/or defects in the recovery procedures and exception handlers.
These results show that a systematic analysis of field defects with ODC and error type analysis can be very beneficial in the planning of a focused error injection experiment in system test for the validation of fault tolerant mechanisms in a large software product.

WHAT IS SOFTWARE FAILURE?

Ram Chillarege
IEEE Transactions on Reliability, Vol. 45, No. 3, September, 1996.

As software dominates most discussions in the information technology business, one needs to examine carefully where we are headed in software reliability. It is reasonable to ask about the nature of software faults and the remedy for them, either from a fault-tolerance perspective or, more generally, on a software dependability front. There are conferences on this topic, and over 300 technical papers that discuss some of its aspects. However, as many industry specialists agree, software is one area in the information technology industry which continues to baffle the scientist from a dependability perspective. While hardware & technology have seen four orders of magnitude improvement in the past decade, software has probably marginally improved or, some will argue, gotten worse. One then wonders whether research in this area is headed in the right direction.

Thus, it is vital to examine some of the fundamentals, e.g., "What is a software failure?" This question is usually assumed to be well understood. However, when one examines it closely, it is startling how little is truly understood regarding those faults that matter. This problem is appreciably better understood in hardware systems, where failures have been better tracked over the years and there is a larger body of experience in failure modes & effects analysis. However, the counterpart in software is far less understood. It is further complicated by a lack of clarity as to what is a software failure.


GUIDING FAULT INJECTION AND BROADENING ITS SCOPE USING ODC TRIGGERS

R. Chillarege
Invited Paper, CADET, Beijing, China, July 1996 and
Fourth International Conference on Advanced Computing, December 16-18, 1996.

Thus, the ideas of fault-injection are made applicable to a much larger domain - namely of products which don't have to be fault-tolerant in design to take advantage and improve overall system reliability.


A COMPARATIVE ANALYSIS OF EVENT TUPLING SCHEMES

Michael Buckley and Daniel Siewiorek
Presented at The 26th Annual International Symposium on Fault-Tolerant Computing, Sendai, Japan, June 25-27, 1996.

Event logs provide an effective means of improving system availability. However, the majority of faults produce many errors because faults propagate in the time and error detection domains. Thus, the ability to coalesce related events is critical.

The tupling heuristics developed at Carnegie-Mellon University provide one such methodology. These heuristics were applied to a new and larger set of data in order to evaluate the generality of the scheme and to extend the previous work. The extensions included deriving a semantic understanding of why the rules work, expanded statistical analysis, and a comprehensive sensitivity study to determine the effects of changes in the rules.

The results prove that tupling is a useful and general methodology. The sensitivity study enabled the identification of refinements to the rules, while the high degree of skew in the tuple variables enables us to propose that the extreme percentiles be used as an alarm threshold for proactive fault management.


GENERATION OF AN ERROR SET THAT EMULATES SOFTWARE FAULTS - BASED ON FIELD DATA

J. Christmansson and R. Chillarege
Presented at The 26th Annual International Symposium on Fault-Tolerant Computing, Sendai, Japan, June 25-27, 1996.

A significant issue in fault injection experiments is that the injected faults are representative of software faults observed in the field. Another important issue is the time used, as we want experiments to be conducted without excessive time spent waiting for the consequences of a fault. An approach to accelerate the failure process would be to inject errors instead of faults, but this would require a mapping between representative software faults and injectable errors. Furthermore, it must be assured that the injected errors emulate software faults and not hardware faults. These issues were addressed in a study of software faults encountered in one release of a large IBM operating system product. The key results are:


VIRTUAL PROJECT MANAGEMENT FOR SOFTWARE

Jarir Chaar, Santanu Paul and Ram Chillarege
Appearing at the NSF Workshop on Workflow & Process Automation, May 8-10, 1996

The explosive growth in global networking infrastructures, as demonstrated by the World Wide Web (WWW) on the Internet, the IBM Global Network, the AT&T Business Network, and other networks managed by telecommunication companies has begun to open up new possibilities for collaborative processes on a global scale. We identify software development as one that can benefit immensely from the proper use of global networking. Its feasibility however is contingent on the resolution of at least four critical factors. First, virtual teams of software developers can be assembled intelligently and swiftly on a per-project basis from a global pool of resources. Second, the project owner can manage the development process using the workflow technology that can be deployed over a global network such as the Internet. Third, developers and testers can be equipped with development environments, tools, and methodologies that are logical extensions of their current environments and practices. Fourth, a suitable team communication infrastructure can be put in place to facilitate unstructured communication between team members. These concepts are explored briefly in this paper.


CHALLENGES FACING SOFTWARE FAULT-TOLERANCE

Ram Chillarege
First Conference on Fault-Tolerant Systems, IIT Madras, December 20-22, 1995

As software dominates most discussions in the information technology business one needs to carefully examine where we are headed in software dependability. This paper reexamines some of the basic premises upon which the area of software fault-tolerance is built and critiques some current practices and beliefs. A few of the thoughts and contributions are:

The definition of a software failure needs to change from a specification based thought to one of customer expectation and ability to do productive work. This will cause a significant shift on what we build fault-tolerance for. However, it would also help narrow the gap between today's theory, practice, and customer need. Data on customer problems illustrates that 90% of the problems reported are what we have traditionally considered as non-defect -- implying no need for a programming change. However, with the new definition of failure, we will need to address this more seriously as a part of fault-tolerance. This change could level the playing field and help achieve greater customer satisfaction. A rationale for determining the amount of fault-tolerance based on the concept of the threshold of pain, is suggested. It helps guide the prioritization of fault-tolerance amongst competing forces, by platform and market segment.

In conclusion the paper reflects on a few of the development world "realities" to temper what can be achieved and what we as a community need to be aware of.


DISCOVERING RELATIONSHIPS BETWEEN SERVICE AND CUSTOMER SATISFACTION

Michael Buckley and Ram Chillarege
International Conference on Software Maintenance, Opio (Nice), France, October 16-20, 1995

Organizations spend significant resources tracking customer satisfaction and managing service delivery. Although a great deal of effort is expended in understanding what goes on within each of these areas, little or no effort has been applied to identifying and quantifying the relationships between the two. The objective of this research is to discover and establish potential relationships between service data and customer satisfaction. This understanding will enable more effective management, which will lead to improved quality, reduced cost and increased customer satisfaction.
This study uses three years of data from an IBM operating system to measure the correlation between 15 service variables and nine customer satisfaction attributes. The results show that:

SOFTWARE TRIGGERS AS A FUNCTION OF TIME - ODC ON FIELD FAULTS

R. Chillarege and Kathryn A. Bassin
Fifth IFIP Working Conference on Dependable Computing for Critical Applications, September 1995

The dynamics of software faults becoming failures during the use of a product is one of the least understood aspects regarding software faults today. This paper addresses this problem by analyzing the software triggers that activate faults into failures. The work is conducted on faults experienced by a large operating systems product for two years after release into the field. The results provide some of the first demonstrations of the changing trigger distribution as a function of time after release. Specifically, this paper:

1. Defines triggers for the three primary verification activities: software review and inspection, function test and
system test.
2. Provides three trigger distributions as a function of time, attributable to escapes to the field from each of the
verification activities: review, function test and system test.
3. Illustrates that each trigger peaks at a different time from the date of release. This is a key finding with significant
implications in several aspects of software dependability and software engineering.
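
As a rough illustration of what a trigger distribution as a function of time might look like in practice, the sketch below buckets field faults into fixed-length periods after release and reports, per period, the share of faults attributed to each trigger. The record layout, the trigger labels, the six-month period length, and the function names are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter, defaultdict


def trigger_distribution_by_period(records, period_months=6):
    """Return {period_index: {trigger: fraction of that period's faults}}."""
    counts = defaultdict(Counter)
    for trigger, months_after_release in records:
        counts[months_after_release // period_months][trigger] += 1
    distribution = {}
    for period, counter in sorted(counts.items()):
        total = sum(counter.values())
        distribution[period] = {t: n / total for t, n in counter.items()}
    return distribution


def peak_period(records, trigger, period_months=6):
    """Return the period index in which `trigger` has the most faults (raw count)."""
    per_period = Counter(
        months // period_months for t, months in records if t == trigger
    )
    return max(per_period, key=per_period.get) if per_period else None


# Hypothetical field-fault records: (trigger label, months after release).
records = [
    ("boundary_conditions", 1), ("boundary_conditions", 2), ("boundary_conditions", 8),
    ("recovery", 7), ("recovery", 9), ("recovery", 14),
    ("workload_stress", 13), ("workload_stress", 15), ("workload_stress", 16),
]

distribution = trigger_distribution_by_period(records)
peaks = {t: peak_period(records, t) for t in ("boundary_conditions", "recovery", "workload_stress")}
```

With this made-up data, each trigger peaks in a different period, which is the kind of pattern the third item above describes.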

MEASUREMENT OF FAILURE RATE IN WIDELY DISTRIBUTED SOFTWARE

Ram Chillarege, Shriram Biyani and Jeanette Rosenthal
Proceedings of the 25th Annual Symposium on Fault-Tolerant Computing, IEEE Computer Society, June 1995
In the history of empirical failure rate measurement, one problem that continues to plague researchers and
practitioners is that of measuring the customer-perceived failure rate of commercial software. Unfortunately, even
order-of-magnitude measures of failure rate are not truly available for commercial software which is widely
distributed. Given repeated reports on the criticality of software, and its significance, the industry flounders for some
real baselines.

IDENTIFYING RISK USING ODC BASED GROWTH MODELS
R. Chillarege and S. Biyani
Proceedings, 5th International Symposium on Software Reliability Engineering, IEEE Computer, November 1994.

This paper uses the relative growth of defects classified using Orthogonal Defect Classification to get a finer insight into dynamics of the software development process during later parts of testing. This is particularly useful to help identify management actions to better use people resources (both skill and staffing levels) to respond to difficulties experienced with the product in test. Specifically the technique helps to:


ORTHOGONAL DEFECT CLASSIFICATION - A BREAKTHROUGH FOR IN-PROCESS MEASUREMENT

Ram Chillarege
A Tutorial - Fifth International Symposium on Software Reliability Engineering, Monterey, California, November 6-9, 1994

One of the perpetual challenges of a developer is the use of in-process measurement to quantify, understand, and
manage a software development process. Software quality measurements are beset with undue complexity and have
over time gradually advanced away from the developer. In an area where the processes are so amorphous, the
tangibles required for measurement and modeling are few. In this area, the need to derive tractable measurements that are reasonable to undertake and intuitively plausible cannot be overstated. Measurement without an underlying
theme can leave the experimentalist, the theorist and the practitioner very confused.

At the T.J. Watson Research Center we have been developing and deploying a technology called Orthogonal Defect
Classification. ODC is a significant breakthrough in measurement concepts and brings a new dimension to software
defect analysis. It is established on a very subtle but key finding - demonstrating the existence of a semantic
classification of defects which can explain the progress of the product through the process. ODC opens the doors to
bring a systematic and scientific methodology to the fuzzy area of in-process measurement. Our research in this area
is growing as is our experience with its use at several IBM labs.


SELF-TESTING SOFTWARE PROBE SYSTEM FOR FAILURE DETECTION AND DIAGNOSIS

Ram Chillarege
Proceedings CASCON '94 CD ROM, Toronto, Canada, October 31 - November 3, 1994.

A key problem in today's complex software systems is software failure detection and isolation, given that most
software failures are only partial, and if efficiently diagnosed, isolated and recovered, could avert a total outage. The
probe detects failed software components in a running software system by requesting service, or a certain level of
service, from a set of functions, modules and/or subsystems (target) and checking the response to the request. The
objective is to localize the failure only up to the level of a target, while achieving a high degree of efficiency and
confidence in the process. Targets can be identified at different levels or layers in the software. The choice is based
on the granularity of fault detection that is desired, taken in consideration with the level at which recovery is
implemented. The implementation of the probe system is made self-testing against any single failure in its operational
components, using the idea of a null probe. The probe system has been designed, taking advantage of the latency
characteristics of errors, to provide a low-overhead mechanism. The ideas are implementable in either a single or
multiple computer system.
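
The probing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's design: a probe is modeled as a request sent to a target plus a check on the response, and the "null probe" is interpreted here as a probe against a trivial target that exercises only the probe path itself. The names `run_probe`, `null_target`, and `probe_system`, and the example targets, are hypothetical.

```python
def run_probe(target, request, is_valid):
    """Send `request` to `target` and report whether the response passes `is_valid`."""
    try:
        response = target(request)
    except Exception:
        # Any exception from the target is treated as a failed probe.
        return False
    return is_valid(response)


def null_target(request):
    """Hypothetical null target: echoes the request, exercising only the probe path."""
    return request


def probe_system(targets):
    """Probe each target after a null probe has checked the probe machinery.

    `targets` maps a name to a (callable, request, validator) tuple.
    Returns the names of the targets whose probe failed.
    """
    if not run_probe(null_target, "ping", lambda r: r == "ping"):
        raise RuntimeError("null probe failed: the probe system itself is suspect")
    return [
        name
        for name, (fn, req, ok) in targets.items()
        if not run_probe(fn, req, ok)
    ]


# Hypothetical targets: one healthy lookup service and one that never answers.
def lookup(key):
    return {"a": 1}[key]


def broken(_request):
    raise TimeoutError("no response")


targets = {
    "lookup": (lookup, "a", lambda r: r == 1),
    "queue": (broken, None, lambda r: True),
}
failed = probe_system(targets)
```

Here the failure is localized only to the level of a named target, matching the granularity described above; a finer or coarser choice of targets would change what can be isolated.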


ODC FOR PROCESS MEASUREMENT, ANALYSIS AND CONTROL

R. Chillarege
Fourth International Conference of Software Quality, Maryland, October 1994.

This paper provides the motivation and overview of Orthogonal Defect Classification (ODC), a new technology for software process measurement and analysis. ODC provides a significant step forward in being able to understand the dynamics of software development by using classification of defects, so that they provide measurements. This breakthrough is being used at several IBM labs and is now supported by several processes, analyses and tools from the Thomas J. Watson Research Center.


AN INFERENCE STRUCTURE FOR PROCESS FEEDBACK: TECHNIQUE AND IMPLEMENTATION

Inderpal Bhandari, Bonnie Ray, M. Y. Wong, David Choi, A. Watanabe, Ram Chillarege, M. Halliday, Alan Dooley, Jarir Chaar
Software Quality Journal, Vol. 3, No. 3, September 1994.

This paper presents an automatic technique for making simple inferences about the stages in a software production process, along with notes about its implementation. The technique represents an approach to automating process feedback, which may be based either on human experience and common sense or on historical data. Specifically, we present:

1. A software defect classification scheme that relates defects to process stages. As an example of such a scheme,
we describe Orthogonal Defect Classification which associates defects with stages such as high-level design,
low level design, code, and system test.
2. Rules that provide simple in-process feedback about a process stage based on the number of defects that
were associated with the stage. A stage may be inferred to be "good", "exposed", "fixed", or "revamp". We
discuss how the inferences can be made with and without historical information for the process and the
application of this inference technique to the software production process.
3. The implementation of the rules using a statistical programming language such as SAS.

In closing, we view the above as a case study of an approach which makes use of both human judgment and historical data, and generalize the lessons learnt.


IN-PROCESS IMPROVEMENT THROUGH DEFECT DATA INTERPRETATION

Inderpal Bhandari, Michael Halliday, Jarir Chaar, Ram Chillarege, Kevin Jones, Janette Atkinson, Clotilde Lepori-Costello, Pamela Jasper, Eric Tarver, Cecilia Carranza Lewis, Masato Yonezawa
IBM Systems Journal, Vol. 33, No. 1, 1994.

An approach that involves both automatic and human interpretation to correct the software production process during development is becoming important in IBM as a means to improve quality and productivity. A key step of the approach is the interpretation of defect data by the project team. This paper uses examples of such correction to evaluate and evolve the approach, and to inform and teach those who will use the approach in software development. The methodology is shown to benefit different kinds of products beyond what can be achieved by current practices, and the collection of examples discussed represents the experiences of using a model of correction.


IBM'S ES/9000 MODEL 982'S FAULT-TOLERANT DESIGN FOR CONSOLIDATION

Lisa Spainhower, Thomas A. Gregg, and Ram Chillarege
IEEE Micro, 1994.

Consolidated workloads running around the clock mean that today's large, general-purpose computers must meet
high availability demands. To meet these demands, the Model 982 provides fault tolerance by combining enhanced
circuit-level error detection and failure isolation techniques with system-level techniques exploiting inherent
redundancy.


A CASE STUDY OF SOFTWARE PROCESS IMPROVEMENT DURING DEVELOPMENT

Inderpal Bhandari, Michael Halliday, Eric Tarver, David Brown, Jarir Chaar, and Ram Chillarege
IEEE Transactions on Software Engineering, Vol. 19, No. 12, December 1993.

We present a case study of the use of a software process improvement method which is based on the analysis of defect data. The first step of the method is the classification of software defects using attributes which relate defects to specific process activities. Such classification captures the semantics of the defects in a fashion which is useful for process correction. The second step utilizes a machine-assisted approach to data exploration which allows a project team to discover such knowledge from defect data as is useful for process correction. We show that such analysis of defect data can readily lead a project team to improve their process during development.


IN-PROCESS EVALUATION FOR SOFTWARE INSPECTION AND TEST

Jarir K. Chaar, Michael J. Halliday, Inderpal S. Bhandari, and Ram Chillarege
IEEE Transactions on Software Engineering, Vol. 19, No. 11, November 1993.

The goal of software inspection and test is to reduce the expected cost of software failure over the life of a product. This paper extends the use of defect triggers, the events that cause defects to be discovered, to help evaluate the effectiveness of inspections and test scenarios. In the case of inspections, the defect trigger is defined as a set of values that associate the skills of the inspector with the discovered defect. Similarly, for test scenarios, the defect trigger values embody the differing strategies being used in creating these scenarios.

The usefulness of triggers in evaluating the effectiveness of software inspections and tests is demonstrated by evaluating the inspection and test activities of some software products. These evaluations are used to point to both deficiencies in inspection and test strategies and to progress made in improving such strategies. The trigger distribution of the entire inspection or test series may then be used to highlight areas for further investigation, with the aim of improving the design, implementation, and test processes.


SOFTWARE RECREATE PROBLEMS ESTIMATED TO RANGE 10-20 PERCENT: A CASE STUDY ON TWO OPERATING SYSTEM PRODUCTS

R. Chillarege, B. Ray, A. Garrigan, and D. Ruth
Proceedings, Fourth International Symposium on Software Reliability Engineering, IEEE, Denver, Colorado, November 1993.

Software recreates are necessitated by inadequate diagnostic capability following a failure. They impact the service process and the perception of availability, but have never been adequately quantified. This paper develops a technique to make the key measurements of percent recreate, arrival rate, and open time from problem service data without requiring any additional instrumentation. The study is conducted over an 18-month period on two operating system products that are among the best in the industry for diagnosis and service. The results provide the first insight into the problem and some accurate baselines. Specific to these products:

Clearly, the problem is not insignificant and the results underscore the need for improvement in diagnosis and isolation.


ON THE EVALUATION OF SOFTWARE INSPECTIONS AND TESTS

Jarir K. Chaar, Michael J. Halliday, Inderpal S. Bhandari, and Ram Chillarege
Proceedings, International Test Conference, Baltimore, Maryland, October 1993.

The goal of software inspections and tests is to reduce the expected cost of software failure over the life of a product. This paper extends the use of defect triggers, the events which cause defects to be discovered, to help evaluate the effectiveness of inspection and test activities. In the case of inspections, the defect trigger is defined as a set of values which associate the skills of the inspector with the discovered defect. Similarly, for tests, the defect trigger values embody the various strategies being used in creating test scenarios.

The usefulness of triggers in evaluating the effectiveness of software inspections and tests is demonstrated by evaluating the inspection and test activities of some software products. These evaluations are used to point to both deficiencies in inspection and test strategies, and progress made in improving such strategies.


EXPERIENCES IN TRANSFERRING A SOFTWARE PROCESS IMPROVEMENT METHODOLOGY TO PRODUCTION LABORATORIES

M. Halliday, I. Bhandari, J. Chaar and R. Chillarege
Proceedings of the 2nd International Symposium on Fault-Tolerant Computing, Venice, Italy, October 18-20, 1993.

This paper describes the experience of transferring a software process methodology developed in a research laboratory to different production laboratories at IBM. The methodology involves the classification and analysis of software defects with a view to improving the software development process. The experience is reported in two parts. The first part details those factors which were anticipated, at least to some extent, or which seem more obvious. The second part details those parts of the experience which were completely unanticipated, appear quite subtle, and which are probably not well known.


ERROR PROPAGATION IS INVERSELY PROPORTIONAL TO FAILURE ACCELERATION: ESTABLISHED BY FAULT-INJECTION ON THE NFS FILE SYSTEM

Ram Chillarege, Kumar Goswami, Murthy Devarakonda
Presented at Dependable Computing for Critical Applications, San Diego, California, July 4-6, 1993

This paper provides a new insight into the design of system level fault injection experiments. The failure acceleration theory is used to conduct an experiment on the NFS distributed file system. A matched pair of experiments are conducted at two different levels of acceleration, studying its effect on two key parameters: Probability of Failure and Error Propagation. In the second set, these are done approaching almost maximum acceleration, yielding some insight into how acceleration works and validating earlier theory. These results are valuable to experimentalists since they provide the stepping stones towards systematic design of such experiments. Specifically:

The paper will be useful to the fault-injection and experimental validation communities, and will provide insight to modelers.


TOP FIVE CHALLENGES FACING THE INDUSTRY IN THE PRACTICE OF FAULT-TOLERANCE

Ram Chillarege
Workshop on Hardware & Software Architectures for Fault-Tolerance, June 1993.

This paper identifies key problem areas for the fault-tolerant community to address. Changes in technology,
expectation of society, and needs of the market pressure the design point for fault-tolerance in their own special
manner. A developer who has only a finite set of resources and limited time responds to these pressures with a set of
priorities. I believe that the top five challenges, which ultimately drive the exploitation of fault-tolerant technology are:
(1) Shipping a product on schedule, (2) Reducing unavailability, (3) Non-disruptive change management, (4) Human
fault-tolerance, (5) All over again in the distributed world. Each of these is discussed to explore its influence on
the choice for fault-tolerance. Understanding them is key to guide research investment and maximize its derivatives.


ORTHOGONAL DEFECT CLASSIFICATION - A CONCEPT FOR IN-PROCESS MEASUREMENTS

R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.Y. Wong
IEEE Transactions on Software Engineering, Vol. 18, No. 11, November 1992.

This paper describes orthogonal defect classification (ODC), a concept that enables in-process feedback to developers by extracting signatures on the development process from defects. The ideas are evolved from an earlier finding that demonstrates the use of semantic information from defects to extract cause-effect relationships in the development process. This finding is leveraged to develop a systematic framework for building measurement and analysis methods. This paper


A COMPARISON OF SOFTWARE DEFECTS IN DATABASE MANAGEMENT SYSTEMS AND OPERATING SYSTEMS

Mark Sullivan and Ram Chillarege
Proceedings, The 22nd International Symposium on Fault Tolerant Computing, Boston, MA, July 1992.

A clear understanding of software defects that occur in the field is critical for the development of effective validation methods and strategies for fault-tolerance. This paper presents an analysis of software defects reported at customer sites in two large IBM database management products, DB2 and IMS. The analysis considers several different error classification systems and compares the results to those of an earlier study of field defects in IBM's MVS operating system. The paper:


DESIGN FOR FAULT-TOLERANCE IN SYSTEM ES/9000 MODEL 900

Lisa Spainhower, Jack Isenberg, Ram Chillarege, Joseph Berding
The 22nd International Symposium on Fault-Tolerant Computing, Boston, MA, July 1992.

The ES/9000 Model 900 is IBM's high-end fault-tolerant commercial processor. Although high-end commercial processors were traditionally designed to be very reliable, this is the first one that implements a fault-tolerant machine. The design exploits circuit level concurrent-error detection, fault-identification and reconfiguration with system level techniques when multiple functional resources are available. It provides true graceful degradation during Central Processor or Channel reconfiguration and repair. This paper:


RELIABILITY GROWTH FOR TYPED DEFECTS

B.K. Ray, I. Bhandari, and R. Chillarege
Proceedings, Annual Reliability and Maintainability Symposium, Las Vegas, Nevada, January 1992.

This paper presents a reliability growth model for defects that have been categorized into defect types associated with specific stages in the software development process. Modeling the reliability growth of defects for each type separately allows identification of problems in the development process which may otherwise be masked when defects of all types are modeled together. This paper:

Since each defect type can be associated with a software development stage, comparing the estimated defect detection rates and the dependency between types provides a basis for feedback on the process.


SOFTWARE DEFECTS IN THE MVS OPERATING SYSTEM: UNDERSTANDING THEIR CHARACTERISTICS AND IMPACT

Mark Sullivan and Ram Chillarege
IEEE Transactions on Reliability, January 1992

In recent years, software defects have become the dominant cause of unplanned outage; improvements in software reliability and quality have not kept pace with those of hardware. Despite their importance, software defects are still not understood adequately enough to provide a clear strategy for avoiding or tolerating them. To gain the necessary insight, we study defects reported between 1986 and 1989 in the MVS* Operating System. The study compares typical defects (regular) to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix.

This paper:

Further analysis is provided on defects in fixes to other defects, failure symptoms, and the impact of defects on customers. These results provide a base line understanding useful to designers and developers. The data will also help develop realistic fault models for use in fault-injection experiments.

*MVS is a registered trademark of the IBM Corporation.


ORTHOGONAL DEFECT CLASSIFICATION FOR DEFECT CONTROL

R. Chillarege, I. Bhandari, J. Chaar, M. Halliday, D. Moebus, B. Ray and M. Wong
IEEE Transactions on Software Engineering, October 28, 1991.

This paper describes Orthogonal Defect Classification, a means by which defects can be used to provide feedback on the development process. A careful selection of classification codes with orthogonal properties provides signatures in the distribution of the codes. These signatures reflect the progress of the process, detect departures when they occur, and provide the necessary insight to make adjustments.

The properties of software defects are captured by the defect type, defect trigger, source, impact, and environment attributes. The paper describes these attributes and illustrates their use with results from pilot studies in many IBM labs. It is noted that Orthogonal Defect Classification has the merit of being independent of product, thereby providing a framework for general use.


SOFTWARE DEFECTS AND THEIR IMPACT ON SYSTEM AVAILABILITY - A STUDY OF FIELD FAILURES IN OPERATING SYSTEMS

Mark Sullivan and Ram Chillarege
Digest 21st International Symposium on Fault-Tolerant Computing, Montreal, Canada, June 1991.

In recent years, software defects have become the dominant cause of unplanned outage; improvements in software reliability and quality have not kept pace with those of hardware. Despite their importance, software defects are still not understood adequately enough to provide a clear strategy for avoiding or tolerating them. To gain the necessary insight, we study defects reported between 1986 and 1989 in the MVS* Operating System. The study compares typical defects (regular) to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix.

This paper:

Further analysis is provided on defects in fixes to other defects, failure symptoms, and the impact of defects on customers. These results provide a base line understanding useful to designers and developers. The data will also help develop realistic fault models for use in fault-injection experiments.

*MVS is a registered trademark of the IBM Corporation.


DEFECT TYPE AND ITS IMPACT ON THE GROWTH CURVE

Ram Chillarege, Wei-Lun Kao and Richard G. Condit
Proceedings of the 13th International Conference on Software Engineering, Austin, Texas, May 1991.

This paper presents an empirical investigation on possible cause and effect relationships between defects and the development process. Establishing such relationships is critical to make software development into a process with greater understanding and control. This paper:

Thus, we show that it is plausible that there exist other cause-effect relationships that could be identified. The impact of this finding is that it could well pave the way for a more systematic process control methodology to be applied to software development.


AN EXPERIMENTAL STUDY OF MEMORY FAULT LATENCY

Ram Chillarege and Ravi K. Iyer
IEEE Transactions on Computers, Vol. 38, No. 6, 1989.

The difficulty with the measurement of fault latency is due to the lack of observability of the fault occurrence and error generation instants in a production environment. This paper describes an experiment, using data from a VAX 11/780 under real workload, to accurately study fault latency in the memory subsystem. Fault latency distributions are generated for s-a-0 and s-a-1 permanent fault models. The results show that the mean fault latency of an s-a-0 fault is nearly 5 times that of the s-a-1 fault. An analysis of variance is performed to quantify the relative influence of different workload parameters on the measured latency.


UNDERSTANDING LARGE SYSTEM FAILURES - A FAULT INJECTION EXPERIMENT

Ram Chillarege and Nicholas S. Bowen
Proceedings, International Symposium on Fault Tolerant Computing, 1989.

This paper uses fault injection to characterize large system failures. Thus, it overcomes limitations imposed by the lack of complete information in field failure data. The experiment is conducted on a commercial transaction processing system and this paper:

These results enhance our understanding of large system failures and provide a foundation for design enhancements and modeling of availability.

