# Forcing the Constant in Regression to Zero: Understanding Excel's LINEST() Error

One of the options in LINEST(), available as its third argument, is to force the constant in the regression equation to a value of zero. Whether to do so, regardless of the software in use, has been a contentious subject in the literature on regression analysis for decades. This paper touches only lightly on the question of whether it is appropriate to adopt the option: There are well reasoned arguments on each side of the issue. Instead, Excel expert Conrad Carlberg, author of Predictive Analytics: Microsoft Excel, focuses on a serious error in the LINEST() results when the option is selected. The error was not corrected until Excel 2003, and it remains in Excel 2010, in the values of R2 that can be displayed with chart trendlines.

One of the options that has always been available in Excel's LINEST() worksheet function is the const argument, short for constant. The function's syntax is:

=LINEST(Y values, X values, const, stats)

where:

• Y values represents the range that contains the outcome variable (or the variable that is to be predicted by the regression equation).
• X values represents the range that contains the variable or variables that are used as predictors.
• const is either TRUE or FALSE, and indicates whether LINEST() should include a constant (also called an intercept) in the equation, or should omit the constant. If const is TRUE or omitted, the constant is calculated and included. If const is FALSE, the constant is omitted from the equation.
• stats, if TRUE, tells LINEST() to include statistics that are helpful in evaluating the quality of the regression equation as a means of gauging the strength of the relationship between the Y values and the X values.
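The effect of the const argument is easy to mimic outside Excel. The following Python sketch (with invented single-predictor data, not taken from the figures in this paper) fits the same values both ways: with an intercept calculated normally, and with the intercept forced to zero.

```python
# Python sketch of const = TRUE vs. const = FALSE for one predictor.
# The data are invented for illustration; they are not from the figures.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.5, 4.1, 5.4, 7.2, 8.4]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# const = TRUE: slope and intercept by ordinary least squares
b_with = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a_with = ybar - b_with * xbar

# const = FALSE: the intercept is forced to zero, and the slope
# becomes sum(x*y) / sum(x*x)
b_zero = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print(a_with, b_with, b_zero)
```

Notice that forcing the constant to zero changes not just the intercept but the slope as well: the two fits are different lines, not the same line shifted.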

Setting the const argument to FALSE can easily have major implications for the nature of the results that LINEST() returns. And there is a real question of whether the const argument is a useful option at all. In fact, the question is not limited to LINEST() and Excel. It extends to the whole area of regression analysis, regardless of the platform used to carry out the regression.

Some credible practitioners believe that it's important to force the constant to zero in certain situations, usually in the context of regression discontinuity designs.

Others, including myself, believe that if setting the constant to zero appears to be a useful and informative option, then linear regression itself is often the wrong model for the data.

## The Excel 2003 Through 2010 Versions

Figure 1 shows an example of the difference between LINEST() results when the constant is calculated normally, and when it is forced to equal zero.

In Figure 1, the two sets of results are based on the same underlying data set, with the Y values in A2:A21 and the X values in B2:D21. The first set of results in F3:I7 is based on a constant calculated normally (const = TRUE). The second set of results in F10:I14 is based on a constant that is forced to equal zero (const = FALSE).

Notice that not a single value in the results is the same when the constant is forced to zero as when the constant is calculated normally.

### Basing the Deviations on the Means

Figure 2 begins to demonstrate how this comes about.

In Figure 2, cells G15:H15 contain the sums of squares for the regression and the residual, respectively. They are based on the predicted Y values, in L21:L40, and the deviations of the predicted values from the actuals, in M21:M40.

The sums of squares are calculated by means of the DEVSQ() function, which subtracts every value in the argument's range from the mean of those values, squares the result, and sums the squares.

The value in cell G13, 0.595, is the R2 for the regression. One useful way to calculate that figure (and a useful way to think of it) is:

=G15/(G15+H15)

That is, R2 is the ratio of the sum of squares regression to the total sum of squares of the Y values. The result, 0.595, states that 59.5% of the variability in the Y values is attributable to variability in the composite of the X values.
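That ratio can be reproduced outside Excel as well. Here is a minimal Python sketch, with invented single-predictor data, of the DEVSQ()-style sums of squares and the ratio that defines R2:

```python
# Sketch of R2 as (sum of squares regression) / (total sum of squares),
# using invented data and a single predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.5, 4.1, 5.4, 7.2, 8.4]

def devsq(v):
    # Excel's DEVSQ(): sum of squared deviations from the mean
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v)

# Ordinary least squares with a constant
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / devsq(x)
a = ybar - b * xbar
pred = [a + b * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]

ss_reg = devsq(pred)    # sum of squares regression
ss_res = devsq(resid)   # sum of squares residual
r_squared = ss_reg / (ss_reg + ss_res)

# With the constant in the equation, the two components add up to the
# total sum of squares of the Y values.
print(r_squared, ss_reg + ss_res, devsq(y))
```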

Notice in Figure 2 that the statistics reported in G11:J15 are identical to those reported in G3:J7 (except that LINEST() reports the regression coefficients and their standard errors in the reverse of worksheet order). The former are calculated using Excel's matrix functions; the latter are calculated using the LINEST function.

Also notice in Figure 2 that the correlation between the actual and the predicted Y values is given in cell H22. It is 0.772. The square of that correlation, in cell H23, is 0.595—that is of course R2, the same value that you get by calculating the ratio of the sum of squares regression to the total sum of squares.

There's nothing magical about any of this. It all follows from the mathematics underlying regression analysis.

### Changing the Deviation Basis to Zero

Now examine the same sort of analysis shown in Figure 3.

Notice the values for the sum of squares regression and the sum of squares residual in Figure 3. They are both much larger than the sums of squares reported in Figure 2. The reason is that the deviations that are squared and summed in Figure 3 are the differences between the values and zero, not between the values and their mean.

This change in the nature of the deviations always increases the total sum of squares. (For the reason that this is so, see Statistical Analysis: Microsoft Excel 2010, Que, 2011, Chapter 2.)
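The algebra behind that claim is the decomposition SUMSQ = DEVSQ + N × mean². Because N × mean² cannot be negative, measuring deviations from zero can never shrink a sum of squares. A quick check in Python, with arbitrary values:

```python
# Check of the identity: the sum of squares about zero equals the sum of
# squares about the mean plus N times the squared mean. Values are arbitrary.
vals = [3.0, 5.0, 7.0]
n = len(vals)
m = sum(vals) / n

devsq = sum((v - m) ** 2 for v in vals)   # deviations from the mean
sumsq = sum(v * v for v in vals)          # deviations from zero

print(sumsq, devsq + n * m * m)  # both are 83.0 for these values
```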

The change from centering the predicted values and the prediction errors on their means to centering them on zero also changes the relative sizes of the sums of squares. The sum of squares regression can grow larger relative to the sum of squares residual, and the result is to increase the apparent value of R2. Using the sums of squares shown in Figure 2 and Figure 3, for example:

Figure 2:

12870.037 / (12870.037 + 8742.913) = .595

(Compare with cells G5 and G13.)

Figure 3:

55879.198 / (55879.198 + 12875.802) = .813

(Compare with cells G5 and G13.)
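The same inflation is easy to reproduce. In this Python sketch (invented data: a weak one-predictor relationship with Y values far from zero), the ratio-style R2 jumps when the deviations are taken from zero instead of from the mean:

```python
# Invented data: the Y values cluster near 10.7 and are only weakly
# related to X, so forcing the fit through the origin inflates the
# ratio-style R2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.5, 9.8, 11.2, 10.1, 11.9]

def devsq(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v)

def sumsq(v):
    return sum(t * t for t in v)

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# With the constant: DEVSQ()-style sums of squares
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / devsq(x)
a = ybar - b * xbar
pred = [a + b * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]
r2_with = devsq(pred) / (devsq(pred) + devsq(resid))

# Constant forced to zero: SUMSQ()-style sums of squares
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sumsq(x)
pred0 = [b0 * xi for xi in x]
resid0 = [yi - pi for yi, pi in zip(y, pred0)]
r2_zero = sumsq(pred0) / (sumsq(pred0) + sumsq(resid0))

print(round(r2_with, 3), round(r2_zero, 3))  # roughly 0.331 vs. 0.846
```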

So the suppression of the constant in Figure 3 has resulted in an increase in the R2 from .595 to .813, and that's a substantial increase. But does it really mean that the regression equation that's returned in Figure 3 is more accurate than the one returned in Figure 2? After all, the square root of R2 is the multiple correlation between the actual Y values and the composite, predicted Y values. The higher that correlation, the more accurate the prediction.

### How the Deviations Affect the R2

We can test that accuracy by calculating the correlations, squaring them, and comparing the results to the values for R2 that are returned under the two conditions for the constant: present and absent.

Look first again at Figure 2. There, the multiple R is calculated at .772, and the multiple R2 is calculated at .595 (cells H22 and H23). The value of .595 agrees with the value returned by LINEST() in cell G5, and by the ratio of the sums of squares in cell G13.

Now return to Figure 3. There, the multiple R is calculated at .684, and the multiple R2 is calculated at .468 (cells H22 and H23). But the value of .468 does not agree with the value returned by LINEST() in cell G5, and by the ratio of the sums of squares in cell G13.

In sum, running LINEST() on the data shown in Figure 2 and Figure 3 has these effects on the apparent accuracy of the predictions:

• The R2 reported by LINEST() without the constant is higher than that reported by LINEST() with the constant.
• The accuracy of the regression equation when evaluated by means of the correlation between the actual Y values and the predicted Y values is lower when the regression equation omits the constant.

This is an inconsistency, even an apparent contradiction. Regarded as a ratio of sums of squares, R2 is higher without the constant. Regarded as the square of the correlation between the actual and predicted Y values, R2 is lower without the constant.
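The contradiction can be shown directly: with the constant omitted, square the correlation between the actual and predicted Y values and compare it to the ratio of the zero-based sums of squares. A Python sketch, using the same sort of invented data as before:

```python
# Invented data. With the constant forced to zero, the ratio-style R2
# and the squared actual-vs-predicted correlation no longer agree.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.5, 9.8, 11.2, 10.1, 11.9]

def sumsq(v):
    return sum(t * t for t in v)

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Regression through the origin
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sumsq(x)
pred0 = [b0 * xi for xi in x]
resid0 = [yi - pi for yi, pi in zip(y, pred0)]

r2_ratio = sumsq(pred0) / (sumsq(pred0) + sumsq(resid0))
r2_corr = pearson(y, pred0) ** 2

print(round(r2_ratio, 3), round(r2_corr, 3))  # roughly 0.846 vs. 0.331
```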

### The Constant and the Deviations

Of course, the problem is due to the fact that in omitting the constant, we are redefining what's meant by the term "sum of squares." As a result, we're dismembering the meaning of the R2.

When you include the constant, the deviations are the differences between the observed values and their mean—that's what "least squares" is all about. When you omit the constant, the deviations are the differences between the observed values and zero—that's what "regression without the constant" is all about.

If the predicted values happen to be generally farther from zero than from their own mean, then the sum of squares regression will be inflated as compared to regression with the constant. In that case, the R2 will tend to be greater without the constant in the regression equation than it is with the constant.

## A Negative R2?

Finally, suppose you're still using a version of Excel through Excel 2002, and you have used LINEST(), without the constant, on a data set such as the one shown in Figure 4.

Even the idea of a negative R2 is ridiculous. Outside the realm of imaginary numbers, the square of a number cannot be negative, and ordinary least squares analysis does not involve imaginary numbers. How does the R2 value of -0.09122 in cell F4 of Figure 4 get there?

For that matter, how does Excel 2002 come up with a negative sum of squares regression and a negative F ratio (cells F6 and F5 respectively in Figure 4)? If the square of a number cannot be negative, then a sum of squared numbers cannot be negative either. And an F ratio is the ratio of two variances. A variance is an average of squared deviations, and therefore cannot be negative. Nor can the ratio of two nonnegative quantities be negative.

### How to Get a Negative R2

The answer is poorly informed coding. Recall that, when the constant is calculated normally, the total sum of squares of the actual Y values equals the total of the sum of squares regression and the sum of squares residual. For example, in Figure 2, the total sum of squares is shown in cell A23 at 21612.950. It is returned by Excel's DEVSQ() function, which sums the squared deviations of each value from the mean of the values.

Also in Figure 2, the sum of squares regression and the sum of squares residual are shown in cells G15:H15. The total of those two figures is 21612.950: the value of the total sum of squares in cell A23.

Therefore, one way to calculate the sum of squares regression is to subtract the sum of squares residual from the total sum of squares. Another method, of course, is to calculate the sum of squares regression directly on the predicted values. But if you're writing the underlying code in, say, C, it's much quicker to get the sum of squares regression by subtraction than by doing the math from scratch on the predicted values.
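In code, the shortcut is a single subtraction. A Python sketch with invented data, where the constant is calculated normally:

```python
# With the constant included, the sum of squares regression can be had
# directly from the predicted values or by subtraction. Invented data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.5, 4.1, 5.4, 7.2, 8.4]

def devsq(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v)

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / devsq(x)
a = ybar - b * xbar
pred = [a + b * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]

ss_reg_direct = devsq(pred)                # from the predicted values
ss_reg_shortcut = devsq(y) - devsq(resid)  # total minus residual

print(ss_reg_direct, ss_reg_shortcut)  # the two agree, up to rounding
```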

When the constant is forced to zero, the sum of squares residual that's returned in all versions of Excel equals the result of pointing SUMSQ(), not DEVSQ(), at the residual values. This is entirely correct, given that you want to force the constant to zero.

The sum of squares residual using the normal calculation of the constant is as follows:

Residual = Actual Y – Predicted Y (Ŷ)

SS Residual = Σ (Residual – mean of the Residuals)²

That is, find each of N residual values, which is the actual Y value less the predicted Y value (Ŷ). Subtract the mean of the residuals from each residual, square the difference, and sum the squared differences. Excel's DEVSQ() function does precisely this.

The sum of squares residual forcing the constant to zero is as follows:

SS Residual = Σ (Residual – 0)²

or, more simply:

SS Residual = Σ Residual²

Excel's SUMSQ() function does precisely this.
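In Python terms, the two Excel functions amount to this sketch:

```python
# Plain-Python equivalents of Excel's DEVSQ() and SUMSQ().
def devsq(values):
    # Sum of squared deviations from the mean
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sumsq(values):
    # Sum of squared deviations from zero
    return sum(v * v for v in values)

vals = [1.0, 2.0, 3.0, 4.0]
print(devsq(vals), sumsq(vals))  # 5.0 and 30.0
```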

### The Mistake, Corrected—In Part

Now, what LINEST() did in Excel version 2002 (and earlier) was to use the equivalent of SUMSQ() to get the sum of squares residual, but the equivalent of DEVSQ() to get the total sum of squares. If you add SUMSQ(Predicted values) to SUMSQ(Residual values), you get SUMSQ(Actual values).

But only in the situation where the mean of the actual values is zero can SUMSQ(Predicted values) plus SUMSQ(Residual values) equal DEVSQ(Actual values).
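A Python sketch (invented data) reproduces the old error. Take the residual sum of squares the SUMSQ() way, but the total sum of squares the DEVSQ() way, and derive the sum of squares regression by subtraction; the "regression" component, the F ratio built on it, and the R2 can all go negative:

```python
# Invented data with a weak x-y relationship and Y values far from
# zero: the situation that exposed the pre-Excel-2003 error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.5, 9.8, 11.2, 10.1, 11.9]

def devsq(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v)

def sumsq(v):
    return sum(t * t for t in v)

# Regression through the origin (constant forced to zero)
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sumsq(x)
resid0 = [yi - b0 * xi for xi, yi in zip(x, y)]

ss_res = sumsq(resid0)                 # zero-based, as it should be
ss_tot_wrong = devsq(y)                # mean-based: the mismatch
ss_reg_wrong = ss_tot_wrong - ss_res   # goes negative
r2_wrong = ss_reg_wrong / ss_tot_wrong

print(round(ss_reg_wrong, 2), round(r2_wrong, 2))  # both negative
```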

The problem has been corrected in Excel 2003 and subsequent versions. But as late as Excel 2010, the problem lives on in Excel charts. If you add a linear trendline to a chart, call for it to force the constant to zero, and display the R2 value on the chart, it can still show up as a negative number. See Figure 5.

Notice in Figure 5 that although Excel 2010 was used to produce the chart, the linear trendline's properties include a negative R2 value. (The equation would be correct, though, if you chose to show it along with R2.)

## Conclusion

This series of papers on how Microsoft has implemented LINEST() concludes with a discussion of Microsoft's extraordinary decision regarding how to handle extreme multicollinearity in the X variables.
