Ordinary Least Squares

A method for choosing the unknown parameters in a linear regression model.


In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squared differences between the observed values of the dependent variable and the values predicted by the (linear) function of the independent variables.

In a linear regression model, the response variable is a linear function of the regressors:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\mathbf{y}$ and $\boldsymbol{\varepsilon}$ are $n \times 1$ vectors of the response variables and the errors of the $n$ observations, and $\mathbf{X}$ is an $n \times p$ matrix of regressors, also sometimes called the design matrix, whose row $i$ is $\mathbf{x}_i^{\operatorname{T}}$ and contains the $i$-th observations on all the explanatory variables.

The goal is to find the coefficients $\boldsymbol{\beta}$ which fit the equations "best", in the sense of solving the quadratic minimization problem $\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\, S(\boldsymbol{\beta})$, where the objective function $S$ is given by

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left| y_i - \sum_{j=1}^{p} X_{ij}\beta_j \right|^2 = \left\| \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right\|^2.$$
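As a concrete illustration, the objective $S(\boldsymbol{\beta})$ can be evaluated directly. The sketch below uses NumPy and a small, made-up data set (the matrix `X` and vector `y` are purely illustrative) to compute the sum of squared residuals for a candidate coefficient vector.

```python
import numpy as np

# Illustrative data: n = 5 observations, p = 2 regressors
# (an intercept column and one explanatory variable).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

def S(beta):
    """Objective S(beta) = ||y - X beta||^2, the sum of squared residuals."""
    r = y - X @ beta
    return r @ r

print(S(np.array([1.0, 1.0])))  # value of the objective at a trial beta
```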

This minimization problem has a unique solution, provided that the $p$ columns of the matrix $\mathbf{X}$ are linearly independent, given by:

$$\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^{\operatorname{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\operatorname{T}} \mathbf{y}.$$
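A minimal numerical sketch of this closed-form estimator, continuing the illustrative `X` and `y` from the snippet above; solving the normal equations $\mathbf{X}^{\operatorname{T}}\mathbf{X}\,\boldsymbol{\beta} = \mathbf{X}^{\operatorname{T}}\mathbf{y}$ is preferred in practice over forming the inverse explicitly.

```python
# Closed-form OLS estimate beta_hat = (X^T X)^{-1} X^T y,
# computed by solving the normal equations instead of inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least-squares problem and should agree.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```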

The fitted values from the regression will be

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{P}\mathbf{y},$$

where $\mathbf{P} = \mathbf{X}\left(\mathbf{X}^{\operatorname{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\operatorname{T}}$ is the projection matrix onto the space $V$ spanned by the columns of $\mathbf{X}$. The annihilator matrix $\mathbf{M} = \mathbf{I}_n - \mathbf{P}$ is a projection matrix onto the space orthogonal to $V$.
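Continuing the same illustrative sketch, the projection matrix and the fitted values can be formed explicitly (for exposition only; statistical software typically avoids building the $n \times n$ matrix $\mathbf{P}$):

```python
# Projection ("hat") matrix P = X (X^T X)^{-1} X^T and fitted values y_hat = P y.
P = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = P @ y
assert np.allclose(y_hat, X @ beta_hat)  # P y equals X beta_hat
```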

Both matrices $\mathbf{P}$ and $\mathbf{M}$ are symmetric and idempotent (meaning that $\mathbf{P}^2 = \mathbf{P}$ and $\mathbf{M}^2 = \mathbf{M}$), and relate to the data matrix $\mathbf{X}$ via the identities $\mathbf{P}\mathbf{X} = \mathbf{X}$ and $\mathbf{M}\mathbf{X} = \mathbf{0}$. Matrix $\mathbf{M}$ creates the residuals from the regression:

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{M}\mathbf{y} = \mathbf{M}(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = (\mathbf{M}\mathbf{X})\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\varepsilon} = \mathbf{M}\boldsymbol{\varepsilon}.$$
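The stated properties of $\mathbf{P}$ and $\mathbf{M}$ can be checked numerically on the same illustrative data:

```python
# Annihilator matrix M = I_n - P and the residuals eps_hat = M y.
n = X.shape[0]
M = np.eye(n) - P
eps_hat = M @ y
assert np.allclose(eps_hat, y - y_hat)                    # residuals = y - fitted values
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)    # idempotent
assert np.allclose(P, P.T) and np.allclose(M, M.T)        # symmetric
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0.0)  # PX = X, MX = 0
```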