## Extracting the Equation

Our model has passed all our checks. So far, everything has been calculated
automatically. We have not been forced to extract the equation and calculate
effort ourselves. What is the actual equation? From the final model (Example
1.23), I see that the equation to calculate *leffort* is:

$$\ln(\mathit{effort}) = 5.088876 + 0.7678266 \times \ln(\mathit{size}) - 0.3856721 \times t14$$

How did I read the equation off the output? The equation is a linear equation
of the form $y = a + bx_1 + cx_2$, where *y* is *ln*(*effort*), $x_1$ is
*ln*(*size*), and $x_2$ is *t14*. *a*, *b*, and *c* are the coefficients
(*Coef.*) from the output. The constant (*_cons*), *a*, is 5.088876; the
coefficient of *ln*(*size*), *b*, is 0.7678266; and the coefficient of *t14*,
*c*, is –0.3856721.
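As a minimal sketch of how the coefficients map onto the linear form, the Python below plugs them into *y = a + bx1 + cx2*. The dictionary key `lsize`, the function name `predict_ln_effort`, and the project values are assumptions made for illustration; they do not come from the model output.

```python
import math

# Coefficients copied from the ln(effort) equation (seven significant digits).
# The key "lsize" is a hypothetical label for ln(size).
COEF = {"_cons": 5.088876, "lsize": 0.7678266, "t14": -0.3856721}


def predict_ln_effort(size, t14):
    """Return ln(effort) = a + b * ln(size) + c * t14."""
    a = COEF["_cons"]   # constant term
    b = COEF["lsize"]   # coefficient of ln(size)
    c = COEF["t14"]     # coefficient of t14
    return a + b * math.log(size) + c * t14


# Hypothetical project used only to exercise the function.
print(predict_ln_effort(size=1000, t14=1))
```

With *size* = 1 and *t14* = 0, both variable terms vanish and the function returns the constant, which is a quick way to confirm that each coefficient sits in the right place.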

In a presentation or report, I give the results in the form of an equation
for *effort*, not *ln*(*effort*). I find it is easier for people
to understand. Keep in mind that most people don't want to know how you
analyzed the data or the equation; they just want to know the management
implications. I almost never include an equation in an oral presentation. By all
means, prepare some slides about the methodology and the equation, but do not
show them unless specifically asked to go into the details in public.

To transform *ln*(*effort*) into *effort*, I take the inverse natural log of
each side of the equation (that is, I raise *e* to the power of each side). To do this accurately, I
use all seven significant digits of the coefficients from the output. However,
when I present the equation, I round the transformed coefficients to four
digits. This results in about a 0.025% difference in total predicted
*effort* (a difference of one to two hours) in this example compared
with using the seven-digit coefficients. Rounding the coefficients to two digits
resulted in a 100-hour difference in predicted *effort* for some projects
in this sample, which I consider unacceptable. If I were to use the equation in
practice to calculate *effort*, I would retain all seven significant
digits. Always try to simplify what you present to others as much as possible,
but be sure to use all the accuracy of the initial equations for your own
calculations.

$$\mathit{effort} = 162.2074 \times \mathit{size}^{0.7678} \times e^{-0.3857 \times t14}$$
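The effect of rounding can be checked with a short sketch like the one below. It compares *effort* computed from the full seven-digit coefficients with *effort* computed from the multiplicative form after rounding. Two things here are assumptions: "digits" is interpreted as decimal places (which matches the four-decimal coefficients in the equation above), and the project values *size* = 1000 and *t14* = 1 are hypothetical, not taken from the sample.

```python
import math

# Full-precision coefficients from the ln(effort) equation.
A, B, C = 5.088876, 0.7678266, -0.3856721


def effort_full(size, t14):
    """Effort from the log-linear equation with all seven digits retained."""
    return math.exp(A + B * math.log(size) + C * t14)


def effort_rounded(size, t14, digits=4):
    """Effort from the multiplicative form with rounded coefficients.

    Assumption: 'digits' means decimal places.
    """
    a = round(math.exp(A), digits)  # transformed constant, e**A
    b = round(B, digits)            # exponent of size
    c = round(C, digits)            # multiplier of t14 inside e**(...)
    return a * size ** b * math.exp(c * t14)


def relative_difference(size, t14, digits):
    """Absolute relative gap between rounded and full-precision effort."""
    full = effort_full(size, t14)
    return abs(effort_rounded(size, t14, digits) - full) / full


# Hypothetical project, for illustration only.
for digits in (4, 2):
    print(digits, relative_difference(size=1000, t14=1, digits=digits))
```

The size of the gap depends on the project, because the rounding error in the exponent of *size* is multiplied by *ln*(*size*); larger projects are affected more.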

To prove to yourself that these two equations are the same, transform the
*effort* equation back to the initial *ln*(*effort*) equation by
taking the *ln* of both sides and applying the following three rules from
algebra:

$$\ln(xyz) = \ln(x) + \ln(y) + \ln(z), \quad \ln(x^{a}) = a\ln(x), \quad \text{and} \quad \ln(e) = 1$$
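Applied step by step to the presented *effort* equation, the three rules give the following derivation. It uses the rounded four-digit coefficients, so the constant comes back as approximately 5.0889 rather than the full 5.088876.

```latex
\begin{aligned}
\ln(\mathit{effort})
  &= \ln\!\left(162.2074 \times \mathit{size}^{0.7678} \times e^{-0.3857 \times t14}\right) \\
  &= \ln(162.2074) + \ln\!\left(\mathit{size}^{0.7678}\right) + \ln\!\left(e^{-0.3857 \times t14}\right) \\
  &= \ln(162.2074) + 0.7678 \times \ln(\mathit{size}) - 0.3857 \times t14 \times \ln(e) \\
  &\approx 5.0889 + 0.7678 \times \ln(\mathit{size}) - 0.3857 \times t14
\end{aligned}
```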

In Chapters 3, 4, and 5, you will see how to extract the equation from models
that include categorical variables. The impact of categorical variables in an
equation is simply to modify the constant term (*a*).