In this lesson you will learn to:
General linear models are a family of statistical models based on the normal distribution.
Many classical statistical methods come under the unifying umbrella of general linear models. The t-test, ANOVA, linear regression and ANCOVA are four examples of statistical methods that are all general linear models.
General linear models are often called linear models for short.
General linear models have parameters, like all statistical models.
Fitting a general linear model means using the data to find suitable values for the parameters of the model.
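As a sketch, a general linear model with a single explanatory variable can be written as below. The symbols y, x, β0, β1 and σ are notation introduced here for illustration; they are not taken from the lesson.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n
```

Here the parameters to be estimated are the coefficients β0 and β1 and the residual standard deviation σ. The intercept-only model `HEIGHT ~ 1` used below is the special case with no β1 x_i term, so its only coefficient is β0.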
You can use the lm() function in R to fit a general linear model. Below is an example:
# Fit a simple general linear model to the human height data
m = lm(HEIGHT ~ 1, data=human)
The lm() function fits the model by least squares. For a general linear model this gives the same coefficient estimates as maximum likelihood, so you will see both descriptions. We don't need to know the details of the fitting, but we do need to know how to interpret its results.
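To see what the fitted parameters mean in the simplest case, here is a minimal sketch using simulated data. The data frame `heights` and its values are hypothetical, not the lesson's `human` data. For the intercept-only model, the estimated intercept is the sample mean and the residual standard error is the sample standard deviation.

```r
# Simulated heights in cm (hypothetical data for illustration only)
set.seed(1)
heights <- data.frame(HEIGHT = rnorm(50, mean = 170, sd = 8))

# Intercept-only general linear model: HEIGHT = beta0 + error
m0 <- lm(HEIGHT ~ 1, data = heights)

# The single fitted coefficient (the intercept) equals the sample mean
beta0_hat <- unname(coef(m0)[1])
sample_mean <- mean(heights$HEIGHT)
print(c(intercept = beta0_hat, mean = sample_mean))

# The residual standard error from summary() equals the sample SD
# (both divide by n - 1 for this model)
sigma_hat <- summary(m0)$sigma
print(c(sigma = sigma_hat, sd = sd(heights$HEIGHT)))
```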
In texts you will see several names used for essentially the same concept. Here are a few that you will encounter in this lesson:
A statistical glossary on Brightspace has a more complete list of statistical terminology.
(Video 4 mins 46 sec)
The R code below uses the lm() command to fit and display the first general linear model shown in the video above.
# Subset data for heights of women
humanF = subset(human, SEX=='F')
# Fit a general linear model to heights of women
m = lm(HEIGHT ~ 1, data=humanF)
# Produce a summary of the fitted model
summary(m)
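The `t value` reported for the intercept in `summary(m)` is the same statistic as a classical one-sample t-test of whether the mean is zero. The sketch below checks this on simulated data; the data frame `humanF_sim` is hypothetical. Testing whether mean height is zero is not a scientifically interesting hypothesis; the point is only that the two calculations agree.

```r
# Simulated heights of women in cm (hypothetical data for illustration)
set.seed(2)
humanF_sim <- data.frame(HEIGHT = rnorm(40, mean = 163, sd = 7))

m <- lm(HEIGHT ~ 1, data = humanF_sim)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(m)$coefficients
t_from_lm <- coef_table["(Intercept)", "t value"]
se_from_lm <- coef_table["(Intercept)", "Std. Error"]

# The same statistic from a classical one-sample t-test of mean = 0
tt <- t.test(humanF_sim$HEIGHT, mu = 0)
t_from_ttest <- unname(tt$statistic)

# Standard error of the mean computed by hand: sd / sqrt(n)
se_by_hand <- sd(humanF_sim$HEIGHT) / sqrt(nrow(humanF_sim))

print(c(lm = t_from_lm, t.test = t_from_ttest))
```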
The R code below uses the lm() command to fit and display the second general linear model shown in the video above.
# Fit a general linear model to heights of women and men
m = lm(HEIGHT ~ 1 + SEX, data=human)
# Produce a summary of the fitted model
summary(m)
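With R's default treatment coding, the model `HEIGHT ~ 1 + SEX` uses the first level of `SEX` as the baseline: the intercept is the mean of that group and the `SEXM` coefficient is the difference between the group means. Its t statistic matches a classical two-sample t-test that assumes equal variances, up to sign. The sketch below assumes hypothetical simulated data in `human_sim` with levels `F` and `M`.

```r
# Simulated heights (cm) for women (F) and men (M); hypothetical data
set.seed(3)
human_sim <- data.frame(
  SEX = factor(rep(c("F", "M"), each = 30)),
  HEIGHT = c(rnorm(30, mean = 163, sd = 7), rnorm(30, mean = 177, sd = 7))
)

m <- lm(HEIGHT ~ 1 + SEX, data = human_sim)
b <- coef(m)

# Treatment coding, F is the baseline level:
# (Intercept) = mean of F, SEXM = mean of M minus mean of F
group_means <- tapply(human_sim$HEIGHT, human_sim$SEX, mean)

# Classical two-sample t-test assuming equal variances
tt <- t.test(HEIGHT ~ SEX, data = human_sim, var.equal = TRUE)

# t.test() uses mean(F) - mean(M), so its statistic has the opposite sign
t_lm <- summary(m)$coefficients["SEXM", "t value"]
print(c(lm = t_lm, t.test = unname(tt$statistic)))
```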
Classical linear regression as a general linear model
(Video 3 mins 35 sec)
The R code below uses the lm() command to fit and display the general linear model shown in the video above.
# Subset data for heights of men
humanM = subset(human, SEX=='M')
# Fit a general linear model to the relationship
# between HEIGHT and WEIGHT of men
m = lm(HEIGHT ~ 1 + WEIGHT, data=humanM)
# Produce a summary of the fitted model
summary(m)
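For a single quantitative explanatory variable, the least-squares estimates have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus slope times mean(x). The sketch below checks this against lm() on hypothetical simulated data in `humanM_sim`.

```r
# Simulated weights (kg) and heights (cm) for men; hypothetical data
set.seed(4)
WEIGHT <- rnorm(50, mean = 80, sd = 10)
HEIGHT <- 120 + 0.7 * WEIGHT + rnorm(50, sd = 5)
humanM_sim <- data.frame(WEIGHT, HEIGHT)

m <- lm(HEIGHT ~ 1 + WEIGHT, data = humanM_sim)

# Least-squares estimates written out by hand
slope_by_hand <- cov(humanM_sim$WEIGHT, humanM_sim$HEIGHT) / var(humanM_sim$WEIGHT)
intercept_by_hand <- mean(humanM_sim$HEIGHT) - slope_by_hand * mean(humanM_sim$WEIGHT)

print(rbind(lm = coef(m), by_hand = c(intercept_by_hand, slope_by_hand)))
```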
Classical ANCOVA as a general linear model
(Video 4 mins 4 sec)
The R code below uses the lm() command to fit and display the general linear model shown in the video above.
# Fit a general linear model to the relationship between
# height and weight for women and men
m = lm(HEIGHT ~ 1 + SEX + WEIGHT + SEX:WEIGHT, data=human)
# Produce a summary of the fitted model
summary(m)
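The `SEX:WEIGHT` interaction term lets each sex have its own slope. With treatment coding, the slope for the baseline group is the `WEIGHT` coefficient, and the slope for the other group is `WEIGHT` plus `SEXM:WEIGHT`. The sketch below, on hypothetical simulated data in `human_sim`, shows that these point estimates match separate regressions fitted to each sex (the standard errors differ, because the combined model pools the residual variance).

```r
# Simulated data: height depends on weight with a different slope per sex
set.seed(5)
n <- 40
SEX <- factor(rep(c("F", "M"), each = n))
WEIGHT <- c(rnorm(n, 65, 8), rnorm(n, 80, 10))
HEIGHT <- ifelse(SEX == "F", 130 + 0.5 * WEIGHT, 120 + 0.7 * WEIGHT) + rnorm(2 * n, sd = 4)
human_sim <- data.frame(SEX, WEIGHT, HEIGHT)

m <- lm(HEIGHT ~ 1 + SEX + WEIGHT + SEX:WEIGHT, data = human_sim)
b <- coef(m)

# Slopes implied by the interaction model (F is the baseline level)
slope_F <- unname(b["WEIGHT"])
slope_M <- unname(b["WEIGHT"] + b["SEXM:WEIGHT"])

# The same slopes from separate simple regressions, one per sex
slope_F_sep <- unname(coef(lm(HEIGHT ~ WEIGHT, data = subset(human_sim, SEX == "F")))["WEIGHT"])
slope_M_sep <- unname(coef(lm(HEIGHT ~ WEIGHT, data = subset(human_sim, SEX == "M")))["WEIGHT"])

print(c(F = slope_F, F_separate = slope_F_sep, M = slope_M, M_separate = slope_M_sep))
```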
A general linear model makes a number of assumptions about the population it is trying to model.
The main assumptions are (in order of decreasing importance):
Validating the assumptions of a general linear model
(Video 3 mins 4 sec)
Validating the assumptions of a general linear model
(Video 47 secs)
Validating the assumptions of a general linear model
(Video 4 mins 41 secs)
Validating the assumptions of a general linear model
(Video 1 min)
Validating the assumptions of a general linear model
(Video 2 mins 14 sec)
The R code below uses the lm() command to fit and display the general linear model shown in the video above.
# Fit a general linear model to heights of women and men
m = lm(HEIGHT ~ 1 + SEX, data=human)
# Produce residuals versus fitted plot (homogeneity of variance)
plot(m, which=1)
# Produce QQ plot of the residuals (normality)
plot(m, which=2)
By default, plot() produces four validation plots for a fitted general linear model:
# Display the default four validation plots
plot(m)
The assumptions of homogeneity of variance and normality can be validated by looking at the residuals from a fitted general linear model.
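A residual is simply the observed response minus the fitted value. The sketch below, on hypothetical simulated data in `human_sim`, computes the residuals by hand and redraws the points behind the first two validation plots: raw residuals against fitted values, and a normal QQ plot of standardized residuals.

```r
# Simulated two-group data (hypothetical) fitted with HEIGHT ~ 1 + SEX
set.seed(6)
human_sim <- data.frame(
  SEX = factor(rep(c("F", "M"), each = 25)),
  HEIGHT = c(rnorm(25, 163, 7), rnorm(25, 177, 7))
)
m <- lm(HEIGHT ~ 1 + SEX, data = human_sim)

# Residual = observed response minus fitted value
res <- residuals(m)
fit <- fitted(m)
res_by_hand <- human_sim$HEIGHT - fit

# Residuals versus fitted values (the points in plot(m, which = 1))
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal QQ plot of standardized residuals (as in plot(m, which = 2))
qqnorm(rstandard(m))
qqline(rstandard(m))
```

For a model with an intercept, the residuals always sum to (numerically) zero, so the residuals-versus-fitted plot is centred on the dashed line; what matters is whether the vertical spread looks similar across fitted values.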
General linear models are fairly robust to mild violations of the assumptions. They are most robust to departures from normality.
Below we give some examples of residuals-versus-fitted plots and quantile-quantile (QQ) plots from fitted general linear models that suggest one of these two assumptions has been violated.
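You can also generate a violation yourself. The sketch below simulates hypothetical data whose spread grows with the explanatory variable `x`, which breaks homogeneity of variance and typically gives a "funnel" shape in the residuals-versus-fitted plot. Comparing the residual standard deviation in the lowest and highest thirds of the fitted values is an informal numerical check chosen for this illustration, not a formal test.

```r
# Hypothetical data: the standard deviation of y grows with x
set.seed(7)
x <- runif(600, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(600, sd = 0.5 * x)
m_bad <- lm(y ~ 1 + x)

# Residuals versus fitted values: expect a widening funnel
plot(m_bad, which = 1)

# Informal check: residual spread in the lowest vs highest third of fitted values
fit <- fitted(m_bad)
res <- residuals(m_bad)
cuts <- quantile(fit, probs = c(1/3, 2/3))
sd_low <- sd(res[fit <= cuts[1]])
sd_high <- sd(res[fit > cuts[2]])
print(c(low = sd_low, high = sd_high))
```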
General linear models bring together many classical methods into a single approach. The classical terminology is still used even if the statistical model is a general linear model.
Below are some examples you will come across:
Classical term | General linear model equivalent |
---|---|
ANOVA (analysis of variance) | A general linear model where all explanatory variables (usually no more than two) are qualitative (i.e. factors) |
t-test | A general linear model with one qualitative explanatory variable with two levels |
Linear regression | A general linear model with one quantitative continuous explanatory variable, and an expected straight-line relationship between response and explanatory variable |
ANCOVA (analysis of covariance) | A general linear model with two explanatory variables (one quantitative continuous and one qualitative). |
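To make the first row of the table concrete, the sketch below fits a one-way ANOVA as a general linear model and compares its F statistic with R's classical one-way ANOVA test assuming equal variances. The data frame `dat` and its variables `GROUP` and `Y` are hypothetical simulated data.

```r
# Hypothetical data: one qualitative explanatory variable with three levels
set.seed(8)
dat <- data.frame(
  GROUP = factor(rep(c("A", "B", "C"), each = 20)),
  Y = c(rnorm(20, 10, 2), rnorm(20, 12, 2), rnorm(20, 11, 2))
)

# One-way ANOVA fitted as a general linear model
m <- lm(Y ~ 1 + GROUP, data = dat)
F_lm <- anova(m)["GROUP", "F value"]

# Classical one-way ANOVA F test with equal variances assumed
ow <- oneway.test(Y ~ GROUP, data = dat, var.equal = TRUE)
F_classical <- unname(ow$statistic)

print(c(lm = F_lm, oneway.test = F_classical))
```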