Optimisation of a Linear Regression Model in R

Categories: Optimisation, R

Author: Joshua Philipp Entrop

Published: April 26, 2020

In this post I would like to show how to manually optimise a linear regression model using the optim() command in R. When you learn how to fit a linear regression model in R, you are usually taught to use the lm() command. However, examples of how to do the estimation manually are rare.

If you want to optimise a function, the most important question of course is: which function should be optimised? As taught in most statistics classes, for a linear regression model this function is based on the distribution of the residuals \(R\). We know that the residuals follow a normal distribution with mean 0 and an unknown standard deviation \(\sigma\):

\[ R_i \sim N(0, \sigma), \quad i = 1, \dots, n \]

where the residual \(R_i\) is defined as:

\[ R_i = y_i - \hat{y}_i \]

Thus, we are interested in the parameter values for which the fitted values \(\hat{y}_i\) yield the smallest residuals and hence the best estimates for the model of interest. Taking all this information together, we can write the function we would like to optimise, shown below.
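Written out, the quantity that the code below minimises is the negative log-likelihood of the residuals; this step is implicit in the derivation above and is spelled out here only for clarity:

\[ -\ell(\alpha, \beta_1, \beta_2, \sigma) = -\sum_{i = 1}^{n} \ln f_N(R_i \mid 0, \sigma) \]

where \(f_N\) denotes the density function of the normal distribution.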

#Reading the example data set icu from the package aplore3
library(aplore3)

y  <- icu$sys                 #Set our dependent variable
x1 <- icu$age                 #Set our first independent variable
x2 <- as.numeric(icu$gender)  #Set our second independent variable

#Define the negative log-likelihood function we would like to optimise
ll_lm <- function(par, y, x1, x2){
  
  alpha <- par[1]
  beta1 <- par[2]
  beta2 <- par[3]
  sigma <- par[4]
  
  #Residuals for the current parameter values
  R <- y - alpha - beta1 * x1 - beta2 * x2
  
  #Negative sum of the log-densities of the residuals
  -sum(dnorm(R, mean = 0, sd = sigma, log = TRUE))
}

A function that is passed to the optim() command needs a par argument, which contains the unknown parameters. The par argument expects a vector with initial values, or guesses, for all unknown parameters. As shown in the example above, par includes all four unknown parameters of our model: the first value corresponds to \(\alpha\), the second to \(\beta_1\), the third to \(\beta_2\) and the fourth to \(\sigma\), the unknown standard deviation of the normal distribution of our residuals. Additionally, we included our two independent variables \(x_1\) and \(x_2\) and our dependent variable \(y\) as further function arguments, which we will later supply in the optim() call. A toy illustration of this mechanism follows below.
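To see this mechanism in isolation, here is a minimal toy example, separate from the regression model above; the names f and target are made up for this illustration:

#Toy example: optim() varies the values in par to minimise fn,
#while additional arguments (here: target) are passed through unchanged
f <- function(par, target) sum((par - target)^2)

optim(par = c(x = 0, y = 0), fn = f, target = c(1, 2))$par

The returned values are approximately 1 and 2, the point at which the quadratic loss is minimised.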

Also note that I used the log argument of the dnorm() function to obtain the logarithm of the density values. This way we can sum the individual log-likelihood contributions instead of taking the product of the likelihood values.
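A quick numerical aside, separate from our model, illustrates why this matters:

#Summing log-densities equals taking the log of the product of densities
set.seed(1)
r <- rnorm(5)
log(prod(dnorm(r)))       #Product first, then logarithm
sum(dnorm(r, log = TRUE)) #Sum of log-densities: same value

Both lines return the same number, but for larger samples the product of many small density values underflows to zero, while the sum of the logarithms remains numerically stable.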

The linear model we would like to fit in this example is:

\[ E(Y|\mathbf{X}) = \alpha + \beta_1 x_1 + \beta_2 x_2\]

Hence, the residuals for this model can be calculated as:

\[R = y - \alpha - \beta_1 x_1 - \beta_2 x_2\]

Since we know that these residuals follow a normal distribution with mean 0, we just need to find the standard deviation of this distribution and the coefficient values that best fit our data. We do this by minimising the negative log-likelihood with the optim() command. Note that optim() minimises functions by default, which is why we put a minus in front of the summation in ll_lm() above: minimising the negative log-likelihood is equivalent to maximising the likelihood.
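As a side note, optim() can also be told to maximise directly by setting control = list(fnscale = -1). A minimal self-contained sketch, with simulated data and names (z, ll_norm) invented for this illustration:

#Sketch: maximising a log-likelihood directly via fnscale = -1
set.seed(2)
z <- rnorm(50, mean = 5, sd = 2)
ll_norm <- function(par, z) sum(dnorm(z, mean = par[1], sd = par[2], log = TRUE))

optim(par = c(mean = 0, sd = 1), fn = ll_norm, z = z,
      control = list(fnscale = -1))$par

For our regression model we stick with the common convention of minimising the negative log-likelihood instead.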

Before we run the optim() command, we also need to find reasonable guesses for our estimates, since the initial parameter values chosen for the optimisation influence the estimates. In this case we simply calculate conditional means for subgroups of our data and use them as guesses for our coefficients.

est_alpha <- mean(icu$sys)  #Overall mean as guess for the intercept
est_beta1 <- mean(icu$sys[icu$age >= 40 & icu$age <= 41]) -
  mean(icu$sys[icu$age >= 41 & icu$age <= 52])  #Difference between two age groups
est_beta2 <- mean(icu$sys[icu$gender == "Male"]) -
  mean(icu$sys[icu$gender == "Female"])         #Difference between the gender groups
est_sigma <- sd(icu$sys)    #Overall standard deviation as guess for sigma

Now we can use the optim() function to search for the maximum likelihood estimates (MLEs) of our coefficients. The optim() function needs the function to be optimised, in this case ll_lm(), our guesses for the estimates, and the empirical data we want to use for the optimisation.

mle_par <- optim(fn = ll_lm,                #Function to be optimised
                 par = c(alpha = est_alpha, #Initial values
                         beta1 = est_beta1, 
                         beta2 = est_beta2, 
                         sigma = est_sigma), 
                 y = icu$sys,               #Empirical data from the data set icu
                 x1 = icu$age,
                 x2 = as.numeric(icu$gender))

mle_par$par                                 #Showing estimates for unknown parameters
       alpha        beta1        beta2        sigma 
123.99854056   0.06173058   3.37040256  32.77094809 
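A useful extension, not part of the original call above, is to request the Hessian from optim(). Because ll_lm() returns the negative log-likelihood, the inverse of the Hessian at the minimum approximates the variance-covariance matrix of the estimates; the object name mle_par_h is made up for this sketch:

#Re-run the optimisation, additionally requesting the Hessian
mle_par_h <- optim(fn = ll_lm,
                   par = c(alpha = est_alpha, beta1 = est_beta1,
                           beta2 = est_beta2, sigma = est_sigma),
                   y = icu$sys, x1 = icu$age, x2 = as.numeric(icu$gender),
                   hessian = TRUE)

sqrt(diag(solve(mle_par_h$hessian)))        #Approximate standard errors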

If we now compare our estimates with the results of the lm() command for the same model, we can see slight differences in the estimates, which may be due to the different optimisation algorithms used or to our initial guesses for the parameters.

summary(lm(sys ~ age + as.numeric(gender), data = icu))

Call:
lm(formula = sys ~ age + as.numeric(gender), data = icu)

Residuals:
    Min      1Q  Median      3Q     Max 
-98.544 -20.510  -1.445  18.884 122.272 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        124.39209    9.32708  13.337   <2e-16 ***
age                  0.06276    0.11738   0.535    0.593    
as.numeric(gender)   3.09867    4.83773   0.641    0.523    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 33.05 on 197 degrees of freedom
Multiple R-squared:  0.003889,  Adjusted R-squared:  -0.006224 
F-statistic: 0.3845 on 2 and 197 DF,  p-value: 0.6813

This approach to manually optimising a linear regression model can also be adapted to other linear regression models with more coefficients. In that case, the formula for the residuals needs to be adjusted to the structure of the model you would like to fit, as sketched below.
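A minimal sketch of such a generalisation uses a design matrix, so that the same function works for any number of coefficients; the names ll_lm_general and X are made up for this sketch:

#Generalised negative log-likelihood using a design matrix X
ll_lm_general <- function(par, y, X){
  k     <- ncol(X)
  beta  <- par[1:k]          #All regression coefficients, including the intercept
  sigma <- par[k + 1]        #Standard deviation of the residuals
  R     <- y - X %*% beta    #Residuals for the current parameter values
  -sum(dnorm(R, mean = 0, sd = sigma, log = TRUE))
}

X <- cbind(1, icu$age, as.numeric(icu$gender)) #Intercept column plus covariates

optim(fn = ll_lm_general,
      par = c(est_alpha, est_beta1, est_beta2, est_sigma),
      y = icu$sys, X = X)$par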