Generalized Linear Models
Standard linear models assume that the response measure is normally distributed around its predicted mean and that each unit change in a predictor variable produces a constant change in the response measure. In many real-world situations, however, these assumptions are inappropriate, and a linear model may be unreliable.
For example, suppose that you want to model how weekly in-store sales of an item respond to targeted coupons. A linear model might tell you that sales per store increase by a thousand units for each one-dollar decrease in the net price. However, when you inspect the prediction errors for this model, you find that the model significantly overestimates the incremental sales for stores that typically sell only a thousand units a week, and significantly underestimates incremental sales for stores that typically sell ten thousand units a week or more.
Based on an analysis of the errors from the linear model, you reformulate the model to predict the percentage change in store sales from changes in the net price. In other words, you change the model from a linear response model to an exponential, or log-linear, response model. Generalized linear models provide the flexibility needed to make this change.
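To make the coupon example concrete, the following minimal sketch compares a constant-units response with a constant-percentage response. All figures are synthetic and chosen for illustration (a 10% lift per dollar in every store is an assumption, not data from the example above), and the variable names are hypothetical.

```python
import math

# Synthetic, illustrative figures (assumptions, not data from the text):
# typical weekly unit sales for a small and a large store, and sales in the
# same stores after a one-dollar cut in net price, assuming a 10% lift in both.
baseline_sales = [1_000, 10_000]
sales_after_cut = [1_100, 11_000]
n = len(baseline_sales)

# Linear response: every store gains the same number of units per dollar.
unit_lift = sum(a - b for a, b in zip(sales_after_cut, baseline_sales)) / n
linear_pred = [b + unit_lift for b in baseline_sales]

# Log-linear response: every store gains the same percentage per dollar,
# that is, log(sales) shifts by the same constant in every store.
log_lift = sum(math.log(a / b) for a, b in zip(sales_after_cut, baseline_sales)) / n
log_linear_pred = [b * math.exp(log_lift) for b in baseline_sales]

# Prediction errors (predicted minus actual) for each store.
linear_errors = [p - a for p, a in zip(linear_pred, sales_after_cut)]
log_linear_errors = [p - a for p, a in zip(log_linear_pred, sales_after_cut)]

print("linear errors:    ", linear_errors)
print("log-linear errors:", log_linear_errors)
```

With these assumed figures, the shared unit lift overestimates sales in the small store by 450 units and underestimates them in the large store by the same amount, while the percentage-based version fits both stores essentially exactly.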
Whereas standard linear models require a normally distributed response measure, generalized linear models work with many other distributions. Moreover, while linear models assume that the mean of the response measure is a linear function of the predictors, generalized linear models assume only that a transformation of that mean, given by a link function, is linear in the predictors.
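The contrast can be written out in symbols. The notation here is an assumption introduced for illustration: Y is the response measure, x_1 through x_p are the predictors, the betas are coefficients, and g is the link function.

```latex
% Standard linear model: the mean response itself is linear in the predictors.
\mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p

% Generalized linear model: the link function g of the mean is linear in the predictors.
g\bigl(\mathbb{E}[Y \mid x]\bigr) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p

% Example with the log link and a single predictor: a one-unit change in x_1
% multiplies the mean response by e^{\beta_1}, a constant percentage change.
\log \mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x_1
\quad\Longleftrightarrow\quad
\mathbb{E}[Y \mid x] = e^{\beta_0} \, e^{\beta_1 x_1}
```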
With generalized linear models, the analyst specifies three things: a probability distribution that describes the response measure, a link function that relates the mean of the response measure to the predictors, and a linear predictor, that is, a linear combination of the predictor variables. The distribution is drawn from the exponential family, a broad class that includes the Bernoulli, Beta, Chi-squared, Dirichlet, Exponential, Gamma, Normal, Poisson, and Wishart distributions. In practice, the response distributions most commonly used in generalized linear models are the Normal, Bernoulli or Binomial, Poisson, and Gamma distributions.
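As a rough illustration of how the three choices fit together, the sketch below fits a generalized linear model with a Poisson distribution, a log link, and a single predictor, using iteratively reweighted least squares, a standard fitting technique for such models. The function name `fit_poisson_log_link` and the data are hypothetical, and the code is a teaching sketch rather than a substitute for a statistical library.

```python
import math

def fit_poisson_log_link(xs, ys, iterations=25):
    """Fit E[y] = exp(b0 + b1 * x) by iteratively reweighted least squares."""
    # Start from the intercept-only fit: log of the mean response, zero slope.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        etas = [b0 + b1 * x for x in xs]        # linear predictor
        mus = [math.exp(eta) for eta in etas]   # inverse of the log link
        # Working response and weights for the Poisson family with a log link.
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        ws = mus
        # Weighted least squares of z on x (closed form for one predictor).
        sw = sum(ws)
        x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
        z_bar = sum(w * z for w, z in zip(ws, zs)) / sw
        sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
        sxz = sum(w * (x - x_bar) * (z - z_bar) for w, x, z in zip(ws, xs, zs))
        b1 = sxz / sxx
        b0 = z_bar - b1 * x_bar
    return b0, b1

# Hypothetical counts that grow roughly like exp(2 + 0.5 * x).
xs = [0, 1, 2, 3]
ys = [7, 12, 20, 33]
b0, b1 = fit_poisson_log_link(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

Swapping the distribution or the link changes only the working response and the weights inside the loop; the linear predictor stays the same. This is why software can expose the distribution and the link as separate options.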
Generalized linear models demand more of the analyst, who must choose a response distribution and a link function in addition to the predictors. Software implementations of GLMs often include diagnostic tools that help the analyst identify an appropriate distribution for the response measure and that recommend a link function.