 # A Data Analysis Methodology for Software Managers

• Print
This chapter is from the book

## Extracting the Equation

Our model has passed all our checks. So far, everything has been calculated automatically. We have not been forced to extract the equation and calculate effort ourselves. What is the actual equation? From the final model (Example 1.23), I see that the equation to calculate leffort is:

`ln(effort) = 5.088876 + 0.7678266 x ln(size) – 0.3856721 x t14`

How did I read the equation off the output? The equation is a linear equation of the form y = a + bx1 + cx2. y is ln(effort), x1 is ln(size), and x2 is t14. a, b, and c are the coefficients (Coef.) from the output. The constant (_cons), a, is 5.088876, the coefficient of ln(size), b, is 0.7678266, and the coefficient of t14, c, is –0.3856721.

In a presentation or report, I give the results in the form of an equation for effort, not ln(effort). I find it is easier for people to understand. Keep in mind that most people don't want to know how you analyzed the data or the equation; they just want to know the management implications. I almost never include an equation in an oral presentation. By all means, prepare some slides about the methodology and the equation, but do not show them unless specifically asked to go into the details in public.

To transform ln(effort) into effort, I take the inverse natural log (or e) of each side of the equation. To do this accurately, I use all seven significant digits of the coefficients from the output. However, when I present the equation, I round the transformed coefficients to four digits. This results in about a 0.025% difference in total predicted effort (between a one- to two-hour difference) in this example compared with using the seven-digit coefficients. Rounding the coefficients to two digits resulted in a 100-hour difference in predicted effort for some projects in this sample, which I consider unacceptable. If I were to use the equation in practice to calculate effort, I would retain all seven significant digits. Try to always simplify as much as possible what you present to others, but be sure to use all the accuracy of the initial equations for your own calculations.

`effort = 162.2074 x size0.7678 x e–0.3857xt14`

To prove to yourself that these two equations are the same, transform the effort equation back to the initial ln(effort) equation by taking the ln of both sides and applying the following three rules from algebra:

`ln(xyz) = ln(x) + ln(y) + ln(z),  ln(x)a = aln(x),  and  ln(e) = 1 `

In Chapters 3, 4, and 5, you will see how to extract the equation from models that include categorical variables. The impact of categorical variables in an equation is simply to modify the constant term (a).