Simple Linear Regression

A linear regression model with a single explanatory variable.

In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. Suppose we observe $n$ data pairs and call them $\{(x_i, y_i),\ i = 1, \dots, n\}$. We can describe the underlying relationship between $y_i$ and $x_i$, involving an error term $\varepsilon_i$, by

$$y_i = \alpha + \beta x_i + \varepsilon_i.$$
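
As a concrete illustration (not part of the original text), the following minimal Python sketch simulates data from this model; the particular values of $\alpha$, $\beta$, the noise scale, and the sample size are arbitrary assumptions chosen only for demonstration.

```python
import numpy as np

# Simulate n data pairs (x_i, y_i) from y_i = alpha + beta * x_i + eps_i.
# The parameter values and noise scale below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 50
alpha, beta = 2.0, 0.5
x = rng.uniform(0.0, 10.0, size=n)   # explanatory variable
eps = rng.normal(0.0, 1.0, size=n)   # error term
y = alpha + beta * x + eps           # dependent variable
```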

To estimate the regression coefficients $\alpha$ and $\beta$, we adopt the least squares approach: we seek the line that minimizes the sum of squared residuals $\widehat{\varepsilon}_i$ (the differences between the actual and predicted values of the dependent variable $y$). In other words, $\widehat{\alpha}$ and $\widehat{\beta}$ solve the following minimization problem:

$$(\widehat{\alpha},\, \widehat{\beta}) = \operatorname{argmin}\left(Q(\alpha, \beta)\right),$$

where the objective function $Q$ is:

$$Q(\alpha, \beta) = \sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.$$
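
To make the minimization concrete, here is a short sketch (an illustration, not from the original text) that evaluates $Q$ and minimizes it numerically with `scipy.optimize.minimize`; it continues with the `x` and `y` arrays simulated earlier, and the starting point is an arbitrary assumption. The closed-form solution is derived next.

```python
import numpy as np
from scipy.optimize import minimize

def Q(params, x, y):
    """Sum of squared residuals for a candidate (alpha, beta)."""
    a, b = params
    resid = y - a - b * x
    return np.sum(resid ** 2)

# Numerical argmin of Q over (alpha, beta); x and y come from the simulation
# sketch above, and the starting point (0, 0) is an arbitrary choice.
result = minimize(Q, x0=[0.0, 0.0], args=(x, y))
alpha_hat_num, beta_hat_num = result.x
```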

By expanding $Q$ to obtain a quadratic expression in $\alpha$ and $\beta$, we can derive the values of the arguments that minimize it, denoted $\widehat{\alpha}$ and $\widehat{\beta}$:

$$\begin{aligned}
\widehat{\alpha} &= \bar{y} - \widehat{\beta}\,\bar{x}, \\
\widehat{\beta} &= \frac{s_{x,y}}{s_x^{2}} = r_{xy}\,\frac{s_y}{s_x},
\end{aligned}$$

where

  • $r_{xy} = \dfrac{s_{x,y}}{s_x s_y}$ is the sample correlation coefficient between $x$ and $y$
  • $s_x$ and $s_y$ are the uncorrected sample standard deviations of $x$ and $y$
  • $s_x^{2}$ and $s_{x,y}$ are the sample variance and sample covariance, respectively
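
The closed-form estimators above translate directly into code. This sketch (again an illustration, continuing with the simulated `x` and `y`) computes them using the uncorrected sample statistics as defined in the list.

```python
import numpy as np

# Closed-form least-squares estimates, using uncorrected (divide-by-n) statistics.
x_bar, y_bar = x.mean(), y.mean()
s_xy = np.mean((x - x_bar) * (y - y_bar))   # sample covariance s_{x,y}
s_x2 = np.mean((x - x_bar) ** 2)            # sample variance s_x^2
s_y2 = np.mean((y - y_bar) ** 2)            # sample variance s_y^2

beta_hat = s_xy / s_x2                      # slope estimate
alpha_hat = y_bar - beta_hat * x_bar        # intercept estimate

r_xy = s_xy / np.sqrt(s_x2 * s_y2)          # sample correlation coefficient
# Equivalent slope expression: beta_hat == r_xy * sqrt(s_y2 / s_x2)
```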

The residual sum of squares (RSS) is the sum of the squares of the residuals $\widehat{\varepsilon}_i$. For the least-squares regression line, RSS is given by:

$$\operatorname{RSS} = n\left(s_y^{2} - \frac{s_{x,y}^{2}}{s_x^{2}}\right) = n\, s_y^{2}\left(1 - r_{xy}^{2}\right),$$

where the factor $n$ arises because $s_y^{2}$, $s_x^{2}$, and $s_{x,y}$ are the uncorrected (divide-by-$n$) sample statistics defined above.
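
As a quick numerical check (illustrative, continuing with the quantities computed above), RSS evaluated directly from the fitted residuals should match the identity just stated.

```python
import numpy as np

# Residual sum of squares computed directly from the fitted line...
resid = y - (alpha_hat + beta_hat * x)
rss = np.sum(resid ** 2)

# ...and via the identity RSS = n * s_y^2 * (1 - r_xy^2).
rss_identity = len(y) * s_y2 * (1.0 - r_xy ** 2)

assert np.isclose(rss, rss_identity)
```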

The coefficient of determination ($R^2$, "R squared") is the proportion of the variation in the dependent variable that is predictable from the independent variable(s). The most general definition of the coefficient of determination is:

$$R^{2} = 1 - \frac{\operatorname{RSS}}{\operatorname{TSS}},$$

where $\operatorname{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^{2}$ is the total sum of squares, the total variation of $y$ about its mean.

For the case of a linear model with a single independent variable, the coefficient of determination equals the square of $r_{xy}$, Pearson's product-moment correlation coefficient.
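
To close the loop, a brief sketch (illustrative, continuing with the quantities above) computes $R^2$ from RSS and TSS and confirms that it equals $r_{xy}^{2}$ for this single-variable model.

```python
import numpy as np

# Coefficient of determination from the general definition R^2 = 1 - RSS/TSS.
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1.0 - rss / tss

# For simple linear regression, R^2 equals the squared sample correlation.
assert np.isclose(r_squared, r_xy ** 2)
```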