This post comes out of the blue, nearly 2 years since my last one. I realize I’ve been lazy, so here’s hoping I move from an inertia of rest to one of motion, which should mean regular and (hopefully) relevant posts. I also chanced upon some wisdom while scrolling through my Twitter feed:

This blog post in particular was meant to be a reminder to myself and other R users that the much-used `lm()` function in R (for fitting linear models) can be replaced with some handy matrix operations to obtain regression coefficients, their standard errors, and the other goodness-of-fit stats printed out when `summary()` is called on an `lm` object.

Linear regression can be formulated mathematically as follows:

$$y = X\beta + \epsilon,$$

where $y$ is the outcome variable and $X$ is the data matrix of independent predictor variables (including a vector of ones corresponding to the intercept). The ordinary least squares (OLS) estimate for the vector of coefficients is:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
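As a minimal sketch (not the original script behind this post), $\hat{\beta}$ can be computed directly in R with `solve()` and matrix products. The built-in `swiss` data, which appears to be the dataset described later on, is used here with `Fertility` as the outcome; both choices are assumptions for illustration only:

```r
# Sketch: OLS coefficients via the normal equations (illustrative assumptions:
# the built-in swiss data, with Fertility as the outcome variable)
y <- swiss$Fertility
X <- cbind(Intercept = 1, as.matrix(swiss[, -1]))   # prepend a column of ones

beta_hat <- drop(solve(t(X) %*% X) %*% t(X) %*% y)  # (X'X)^{-1} X'y
beta_hat
# compare with: coef(lm(Fertility ~ ., data = swiss))
```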

The covariance matrix of the estimated coefficients can be obtained with some handy matrix operations:

$$\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1},$$

given that

$$\hat{\sigma}^2 = \frac{(y - X\hat{\beta})^T (y - X\hat{\beta})}{n - p},$$

where $n$ is the number of observations and $p$ the number of estimated coefficients. The standard errors of the coefficients are basically the square roots of the diagonal entries of this matrix, and with these, one can compute the t-statistics and their corresponding p-values (from a t-distribution with $n - p$ degrees of freedom).
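Continuing the same sketch (reusing the `X`, `y` and `beta_hat` objects assumed above), the covariance matrix, standard errors, t-statistics and p-values could be computed like this:

```r
n <- nrow(X)
p <- ncol(X)                                  # number of estimated coefficients

res        <- y - drop(X %*% beta_hat)        # residuals of the full model
sigma2_hat <- sum(res^2) / (n - p)            # residual variance estimate
vcov_beta  <- sigma2_hat * solve(t(X) %*% X)  # covariance matrix of beta_hat

se      <- sqrt(diag(vcov_beta))              # standard errors
t_stat  <- beta_hat / se                      # t-statistics
p_value <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)

cbind(Estimate = beta_hat, `Std. Error` = se,
      `t value` = t_stat, `Pr(>|t|)` = p_value)
# compare with: coef(summary(lm(Fertility ~ ., data = swiss)))
```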

Lastly, the F-statistic and its corresponding p-value can be calculated after computing the two residual sum of squares (RSS) statistics:

- $RSS_1$ – for the full model with all predictors
- $RSS_0$ – for the partial (intercept-only) model, with the observed mean of the outcome as the estimated outcome

The F-statistic is then

$$F = \frac{(RSS_0 - RSS_1)/(p - 1)}{RSS_1/(n - p)},$$

which is compared against an F-distribution with $p - 1$ and $n - p$ degrees of freedom.
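A sketch of that last step, again reusing the objects from the previous snippets (`res`, `y`, `n` and `p` are assumed to be defined as above):

```r
# RSS for the full model and for the intercept-only model, then the F-statistic
rss_full <- sum(res^2)                          # RSS_1
rss_null <- sum((y - mean(y))^2)                # RSS_0: mean of y as fitted value

f_stat <- ((rss_null - rss_full) / (p - 1)) / (rss_full / (n - p))
f_pval <- pf(f_stat, df1 = p - 1, df2 = n - p, lower.tail = FALSE)

c(`F` = f_stat, `p-value` = f_pval)
# compare with the F-statistic line printed by summary(lm(Fertility ~ ., swiss))
```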

I wrote some R code to construct the output from summarizing `lm` objects, using all the math spewed thus far. The data used for this exercise is available in R, and comprises standardized fertility measures and socio-economic indicators for each of 47 French-speaking provinces of Switzerland from 1888. Try it out and see for yourself the linear algebra behind linear regression.
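The original script isn't reproduced here, but the remaining pieces of a `summary()`-style output (R-squared and adjusted R-squared) follow from the same two RSS quantities, and everything can be cross-checked against `lm()` itself; this assumes the objects from the earlier snippets and the `swiss`/`Fertility` setup:

```r
# R-squared and adjusted R-squared from the RSS of the full and null models
r_squared     <- 1 - rss_full / rss_null
adj_r_squared <- 1 - (rss_full / (n - p)) / (rss_null / (n - 1))

c(r.squared = r_squared, adj.r.squared = adj_r_squared)

# Cross-check the whole exercise against R's own output
summary(lm(Fertility ~ ., data = swiss))
```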

Hope this was useful and worth your time!

