Statistical techniques offer immense value to managers and developers who want to maximize quality and efficiency throughout the entire software lifecycle. Finally, there's a guide to using statistical techniques to solve specific software productivity and maintenance problems. Using actual software product data, one of the field's leading experts leads you through every step of the statistical analysis, helping you avoid pitfalls and extract all the value your data has to offer. Katrina Maxwell begins by outlining an intelligent methodology and an exclusive set of "recipes" for analyzing software project data, showing how to answer crucial questions without "getting lost in the data." Starting with actual software project data, organized in a database, Maxwell walks through the entire analysis process, explaining essential techniques such as correlation, regression, and analysis of variance. Along the way, Maxwell presents four real-world case studies focused on the key tasks software project managers face: improving productivity, optimizing time to market, building software development cost models, and identifying software maintenance cost drivers. For all managers, developers, and researchers who want to apply statistical methods to improving the efficiency and quality of their software projects.
As an aid to readers of Applied Statistics for Software Managers, we are making available here the datasets featured in Appendices A, B, and C. The datasets are Excel files contained in a ZIP archive.
1. Data Analysis Methodology.
Graphs. Tables. Correlation Analysis. Stepwise Regression Analysis. Numerical Variable Checks. Categorical Variable Checks. Testing the Residuals. Detecting Influential Observations.
Creation of New Variables. Data Modifications. Identifying Subsets of Categorical Variables. Model Selection. Graphs. Tables. Correlation Analysis. Stepwise Regression Analysis. Numerical Variable Checks. Categorical Variable Checks. Testing the Residuals. Detecting Influential Observations.
Model Selection. Graphs. Tables. Correlation Analysis. Stepwise Regression Analysis. Numerical Variable Checks. Categorical Variable Checks. Testing the Residuals. Detecting Influential Observations.
Choice of Data. Model Selection. Graphs. Tables. Correlation Analysis. Stepwise Regression Analysis. Numerical Variable Checks. Categorical Variable Checks. Testing the Residuals. Detecting Influential Observations. Common Accuracy Statistics. Boxplots of Estimation Error. Wilcoxon Signed-Rank Test. Accuracy Segmentation. The 95% Confidence Interval. Identifying Subsets of Categorical Variables. Model Selection. Building the Multi-Variable Model. Checking the Models. Measuring Estimation Accuracy. Comparison of 1991 and 1993 Models. Management Implications.
It's the Results That Matter. Cost Drivers of Annual Corrective Maintenance (by Katrina D. Maxwell and Pekka Forselius). From Data to Knowledge. Variable and Model Selection. Preliminary Analyses. Building the Multi-Variable Model. Checking the Model. Extracting the Equation. Interpreting the Equation. Accuracy of Model Prediction. The Telon Analysis. Further Analyses. Final Comments.
Describing Individual Variables. The Normal Distribution. Overview of Sampling Theory. Other Probability Distributions. Identifying Relationships in the Data. Comparing Two Estimation Models. Final Comments.
You've implemented a measurement program and have collected some software metrics data. Great, but do you know how to make the most of this valuable asset? Categorical variables such as language, development platform, application type, and tool use can be important factors in explaining the cost, duration, and productivity of your company's software projects. However, analyzing a database containing many non-numerical variables is not a straightforward task.
Statistics, like software development, is as much an art as it is a science. Choosing the appropriate statistical methods, selecting the variables to use, creating new variables, removing outliers, picking the best model, detecting confounded variables, choosing baseline categorical variables, and handling influential observations all require that you make many decisions during the data analysis process, decisions for which there are often no clear rules. What should you do? Read on.
Using real software project data, this book leads you through all the steps necessary to extract the most value from your data. In Chapter 1, I describe my methodology for analyzing software project data. You do not need to understand statistics to follow the methodology. I simply explain what to do, why I do it, how to do it, and what to watch out for at each step.
Common problems that occur when analyzing real data are thoroughly covered in four case studies of gradually increasing complexity. Each case study is based around a business issue of interest to software managers. In Chapter 2, you will learn how to determine which variables explain differences in software development productivity. In Chapter 3, you will look at factors that influence time to market. In Chapter 4, you will learn how to develop and measure the accuracy of cost estimation models. In Chapter 5, you will study the cost drivers of software maintenance, with an emphasis on presenting results. Finally, in Chapter 6, you will learn what you need to know about descriptive statistics, statistical tests, correlation analysis, regression analysis, and analysis of variance.
Intended audience
I wrote this book for current and future software managers. In particular, the unique combination of statistics applied to software business issues should help every future software engineer/manager understand why software measurement is useful and what to do with the data.
This book could be used as the basis for a corporate training program, and in the software engineering and information systems curricula of universities. Additionally, it could be used in statistics courses taught to computer scientists, as it contains examples of interest to them.
Prerequisites
Anyone who wants to analyze data will need to know how to use a statistical software tool. As far as mathematics goes, a basic knowledge of algebra is sufficient.