Ordinary Least Squares

A method for choosing the unknown parameters in a linear regression model.


In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squared differences between the observed values of the dependent variable and the values predicted by the (linear) function of the independent variables.

In a linear regression model, the response variable is a linear function of the regressors:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\mathbf{y}$ and $\boldsymbol{\varepsilon}$ are $n \times 1$ vectors of the response variables and the errors of the $n$ observations, and $\mathbf{X}$ is an $n \times p$ matrix of regressors, also sometimes called the design matrix, whose row $i$ is $\mathbf{x}_i^{\operatorname{T}}$ and contains the $i$-th observations on all the explanatory variables.

The goal is to find the coefficients $\boldsymbol{\beta}$ which fit the equations "best", in the sense of solving the quadratic minimization problem $\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\, S(\boldsymbol{\beta})$, where the objective function $S$ is given by

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left| y_i - \sum_{j=1}^{p} X_{ij}\beta_j \right|^2 = \left\| \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right\|^2.$$
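As a concrete illustration, the objective $S(\boldsymbol{\beta})$ can be evaluated directly. The sketch below uses NumPy and a small, made-up data set (the matrix `X` and vector `y` are purely illustrative) to compute the sum of squared residuals for a candidate coefficient vector.

```python
import numpy as np

# Illustrative data: n = 5 observations, p = 2 regressors
# (an intercept column and one explanatory variable).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

def S(beta):
    """Objective S(beta) = ||y - X beta||^2, the sum of squared residuals."""
    r = y - X @ beta
    return r @ r

print(S(np.array([1.0, 1.0])))  # value of the objective at a trial beta
```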

This minimization problem has a unique solution, provided that the $p$ columns of the matrix $\mathbf{X}$ are linearly independent, given by:

$$\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^{\operatorname{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\operatorname{T}} \mathbf{y}.$$
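A minimal numerical sketch of this closed-form estimator, continuing the illustrative `X` and `y` from the snippet above; solving the normal equations $\mathbf{X}^{\operatorname{T}}\mathbf{X}\,\boldsymbol{\beta} = \mathbf{X}^{\operatorname{T}}\mathbf{y}$ is preferred in practice over forming the inverse explicitly.

```python
# Closed-form OLS estimate beta_hat = (X^T X)^{-1} X^T y,
# computed by solving the normal equations instead of inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least-squares problem and should agree.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```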

The fitted values from the regression will be

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{P}\mathbf{y},$$

where $\mathbf{P} = \mathbf{X}\left(\mathbf{X}^{\operatorname{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\operatorname{T}}$ is the projection matrix onto the space $V$ spanned by the columns of $\mathbf{X}$. The annihilator matrix $\mathbf{M} = \mathbf{I}_n - \mathbf{P}$ is a projection matrix onto the space orthogonal to $V$.
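Continuing the same illustrative sketch, the projection matrix and the fitted values can be formed explicitly (for exposition only; statistical software typically avoids building the $n \times n$ matrix $\mathbf{P}$):

```python
# Projection ("hat") matrix P = X (X^T X)^{-1} X^T and fitted values y_hat = P y.
P = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = P @ y
assert np.allclose(y_hat, X @ beta_hat)  # P y equals X beta_hat
```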

Both matrices $\mathbf{P}$ and $\mathbf{M}$ are symmetric and idempotent (meaning that $\mathbf{P}^2 = \mathbf{P}$ and $\mathbf{M}^2 = \mathbf{M}$), and relate to the data matrix $\mathbf{X}$ via the identities $\mathbf{P}\mathbf{X} = \mathbf{X}$ and $\mathbf{M}\mathbf{X} = \mathbf{0}$. Matrix $\mathbf{M}$ creates the residuals from the regression:

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{M}\mathbf{y} = \mathbf{M}(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = (\mathbf{M}\mathbf{X})\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\varepsilon} = \mathbf{M}\boldsymbol{\varepsilon}.$$
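The stated properties of $\mathbf{P}$ and $\mathbf{M}$ can be checked numerically on the same illustrative data:

```python
# Annihilator matrix M = I_n - P and the residuals eps_hat = M y.
n = X.shape[0]
M = np.eye(n) - P
eps_hat = M @ y
assert np.allclose(eps_hat, y - y_hat)                    # residuals = y - fitted values
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)    # idempotent
assert np.allclose(P, P.T) and np.allclose(M, M.T)        # symmetric
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0.0)  # PX = X, MX = 0
```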