# Optimisation of a Linear Regression Model in R

In this post I would like to show how to manually optimise a linear regression model using the `optim()` command in R. Usually if you learn how to fit a linear regression model in R, you would learn how to use the `lm()` command to do this. However, if you would like to know how to do this manually, examples are rare.

If you want to optimise a function, the most important question of course is which function should be optimised? As it is taught in most statistic classes this function is in case of a linear regression model the distribution of the residuals \((R)\). We know that this distribution follows a normal distribution with mean 0 and a unknown standard deviation \(\sigma\):

\[\sum_{i = 1}^{i = n} R_i \sim N(0, \sigma)\]

Where \(R\) equals:

\[ R_i = y_1 - \hat{y_i} \]

Thus, we are interested in a function for \(\hat{y_i}\) which minimises the residuals and hence, gives us the best estimates for the function of interest. Taking all this information together we can write the function we would like to optimise, shown below.

```
#Reading the example data set icu from the package aplore3
library(aplore3)
y = icu$sys #Set our depended variable
x1 = icu$age #Set our fist independed variable
x2 = as.numeric(icu$gender) #Set our second independed variable
#Define our liklihood function we like to optimise
ll_lm <- function(par, y, x1, x2){
alpha <- par[1]
beta1 <- par[2]
beta2 <- par[3]
sigma <- par[4]
R = y - alpha - beta1 * x1 - beta2 * x2
-sum(dnorm(R, mean = 0, sigma, log = TRUE))
}
```

A function that can be used for the `optim()` command needs to have a `par` argument, which includes the unknown parameters. The `par` arguments needs a vector with initial values or guesses for all unknown parameters. As shown in the example above the `par` argument includes initial values for all 4 unknown parameters. In this example the first value of the `par` arguments equals \(\alpha\), the second \(\beta_1\), the third \(\beta_2\) and the fourth \(\sigma\), which is our unknown standard deviation of the normal distribution of our residuals. Additionally, we included our two independent variables \(x_1\) and \(x_2\) and our dependent variable y as function arguments in the `optim()` call.

Also note, that I used the `log` argument of the `dnorm()` function to get the logarithmic values of the dnorm function. This is necessary, if we would like to sum the single likelihood values instead of taking the product of them.

The linear model we would like to fit in this example is:

\[ E(Y|\mathbf{X}) = \alpha + \beta_1 x_1 + \beta_2 x_2\]

Hence, the residuals for this model can be calculated as:

\[R = y - \alpha - \beta_1 x_1 - \beta_2 x_2\]

Since we know that these residuals follow a normal distribution with mean 0, we just need to find the standard deviation for the normal distribution of the residuals and the values for our coefficients, that fits best our data. To do this, we would like to minimise the sum of errors. This is done by the `optim()` command. However, since the `optim()` command always maximises functions, we just need to put a minus before our summation.

Before, we run the `optim()` command we also need to find good guesses for our estimates, since the initial parameter values which are chosen for the optimisation influences our estimates. In this case we just calculate the conditional means for our subgroups and use them as guess for our coefficients.

```
est_alpha <- mean(icu$sys)
est_beta1 <- mean(icu$sys[icu$age >= 40 & icu$age <= 41]) - mean(icu$sys[icu$age >= 41 & icu$age <= 52])
est_beta2 <- mean(icu$sys[icu$gender == "Male"]) - mean(icu$sys[icu$gender == "Female"])
est_sigma <- sd(icu$sys)
```

Now we can use the `optim()` function to search for our maximum likelihood estimates (mles) for the different coefficients. For the `optim()` function, we need the function we like to optimise, in this case `ll_lm()`, our guesses for the estimates and the empirical data we want to use for the optimisation.

```
mle_par <- optim(fn = ll_lm, #Function to be optimised
par = c(alpha = est_alpha, #Initial values
beta1 = est_beta1,
beta2 = est_beta2,
sigma = est_sigma),
y = icu$sys, #Empirical data from the data set icu
x1 = icu$age,
x2 = as.numeric(icu$gender))
mle_par$par #Showing estimates for unknown parameters
```

```
## alpha beta1 beta2 sigma
## 123.99854056 0.06173058 3.37040256 32.77094809
```

If we now compare our estimates with the results of the `lm()` command for the same model, we can see slight differences in the estimates, which may be due to different optimisation algorithms or due to our initial guesses for the parameters.

`summary(lm(sys ~ age + as.numeric(gender), data = icu))`

```
##
## Call:
## lm(formula = sys ~ age + as.numeric(gender), data = icu)
##
## Residuals:
## Min 1Q Median 3Q Max
## -98.544 -20.510 -1.445 18.884 122.272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.39209 9.32708 13.337 <2e-16 ***
## age 0.06276 0.11738 0.535 0.593
## as.numeric(gender) 3.09867 4.83773 0.641 0.523
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.05 on 197 degrees of freedom
## Multiple R-squared: 0.003889, Adjusted R-squared: -0.006224
## F-statistic: 0.3845 on 2 and 197 DF, p-value: 0.6813
```

This approach for a manual optimisation of a linear regression model can also be adopted for other linear regression model with more coefficients. In this case the formula for the residuals needs to be adjusted to the structure of the model that you like to fit.