# Optimisation of a Logistic Regression Model using Optimx in R

In my last post I used the `optim()` command to optimise a linear regression model. In this post, I am going to take that approach a little further and optimise a logistic regression model in the same manner. Thanks to John C. Nash, I got a first glimpse into the world of optimisation functions in `R`. His book showed me how important it is to compare the results of different optimisation algorithms. Where I used the `base` optimisation function `optim()` in my last post, I will use `optimx()` from the `optimx` package in this post. The `optimx` package and function were developed by Nash and colleagues as a wrapper of the `base` `optim()` function. There are numerous advantages in using `optimx()` instead of `optim()`. In my opinion, among the most important is an easier comparison between different optimisation methods. In case someone is more interested in the variety of optimisation functions and problems that come with them, I can warmly recommend John C. Nash’s book *Nonlinear Parameter Optimisation Using R Tools*.

However, coming back to my main focus: the optimisation of a logistic regression model using the `optimx()` function in R. For this, I would like to use the `icu` data set from the package `aplore3`. The data set contains data from 200 patients in an intensive care unit (ICU) and provides information whether the patient survived their stay or died. The particular question I would like to take a look at is whether the probability of dying during the ICU stay \(P(Y = 1)\) is related to age \((x_1)\) and sex \((x_2)\). In order to do so, I firstly would like to load the data set and set up our variables:

```
# 1. Prefix -------------------------------------------------------------------
# Remove all files from ls
rm(list = ls())
# Loading packages
require(aplore3)
require(optimx)
require(numDeriv)
require(dplyr)
# 2. Loading dataset ----------------------------------------------------------
#Reading the example data set icu from the package aplore3
icu <- as.data.frame(icu)
icu$sta_n <- ifelse(icu$sta == "Died", 1, 0)
icu$female <- ifelse(icu$gender == "Female", 1, 0)
```

The specific model that I would like to use is:

\[ P(Y|x_1, x_2) \sim \alpha + \beta_1 x_1 + \beta_2 x_2\] Using the logistic link-function we can find a linear function for the right side of the equation.

\[ \ln \Bigg[ \frac{P(Y|x_1, x_2)}{1 - P(Y|x_1, x_2)} \Bigg] = \alpha + \beta_1 x_1 + \beta_2 x_2\]

For this model, I would like to find the values \(\alpha\), \(\beta_1\) and \(\beta_2\) that maximize the log-likelihood function and hence, provides the best fit to our empirical data provided in the `icu` data set. Therefore, we also need to define the log-likelihood function for our logistic regression model. According to Hosmer and Lemeshow the log-likelihood function for a logistic regression model can be defined as

\[ \sum_{i = 1}^{n}(y_i - \ln(\pi_i)) + (1 - y_i) * \ln(1 - \pi_i)). \] Where \(\pi\) is defined using the sigmoid function as

\[ P(Y|x_1, x_2) = \pi = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2)}. \]

For the optimisation in R we need to define the log-likelihood function as a function in `R`. Additionally, we need to add the constrain \(0 < \pi < 1\) to our like-likelihood function, since we are interested in a probability \(\pi\) which needs to be in the range between \(0\) and \(1\). We can use an `if` statement in `R` to include our constrain to our R function. For all parameter values that return a value of \(\pi\) that is out of the bounds, we set the value to a very high number, for instance \(10^{200}\). Using these high numbers for values outside the bounds, the optimisation algorithm will dismiss these parameter values from our solutions. Note that we calculate `-sum()`, since we want to find the maximum of the log-likelihood function.

```
# 3. Define log-likelihood function for logistic regression model -------------
# (see applied logistic regression)
negll <- function(par){
#Extract guesses for alpha and beta1
alpha <- par[1]
beta1 <- par[2]
beta2 <- par[3]
#Define dependent and independent variables
y <- icu$sta_n
x1 <- icu$age
x2 <- icu$female
#Calculate pi and xb
xb <- alpha + beta1 * x1 + beta2 * x2
pi <- exp(xb) / (1 + exp(xb))
#Set high values for 0 < pi < 1
if(any(pi) > 1 || any(pi) < 0) {
val <- 1e+200
} else {
val <- -sum(y * log(pi) + (1 - y) * log(1 - pi))
}
val
}
```

Additionally to our log-likelihood function, it is also useful to specify the gradient function for our log-likelihood. This is not necessary for all optimisation algorithms, however, it improves the testing for convergence. Hence, we can obtain better estimates by also supplying the gradient function. According to Hosmer and Lemeshow, the gradient function of the log-likelihood function is defined as

\[ g_\alpha(\pi) = \sum(y_i - \pi_i) \] \[ g_{x_j}(\pi) = \sum(x_{ji} * (y_i - \pi_i)). \]

In our case we yield 3 gradient functions

\[ g_\alpha(\pi) = \sum(y_i - \pi_i) \] \[ g_{x_1}(\pi) = \sum(x_{1_i} * (y_i - \pi_i)) \]

\[ g_{x_2}(\pi) = \sum(x_{2_i} * (y_i - \pi_i)). \]

We can then use these 3 functions to calculate the gradients in R. Also here we need to use `-sum()` for the gradient functions.

```
# 4. Define fradient function for logistic regression model -------------------
# (see applied logistic regression)
negll.grad <- function(par){
#Extract guesses for alpha and beta1
alpha <- par[1]
beta1 <- par[2]
beta2 <- par[3]
#Define dependent and independent variables
y <- icu$sta_n
x1 <- icu$age
x2 <- icu$female
#Create output vector
n <- length(par[1])
gg <- as.vector(rep(0, n))
#Calculate pi and xb
xb <- alpha + beta1 * x1 + beta2 * x2
pi <- exp(xb) / (1 + exp(xb))
#Calculate gradients for alpha and beta1
gg[1] <- -sum(y - pi)
gg[2] <- -sum(x1 * (y - pi))
gg[3] <- -sum(x2 * (y - pi))
return(gg)
}
```

`R` also provides functions to estimate a numerical approximation of the gradient function. One of these function is `grad()` from the `numDeriv` package. It is useful to double check your analytic gradient function using one of these numerical approximations. Since, `optimx()` uses the `grad()` function for doing this, we are also going to use this function

```
# 4.1 Compare gradient function with numeric approximation of gradient ========
# compare gradient at 0, 0, 0
mygrad <- negll.grad(c(0, 0, 0))
numgrad <- grad(x = c(0, 0, 0), func = negll)
all.equal(mygrad, numgrad)
```

`## [1] TRUE`

We see, that the results from our analytic gradient function are identical to the results using the `grad()` function. So we can proceed and use the `optimx()` function to find the maximum of our log-likelihood function. As a first guess we use \(0\) as initial value for all unknown parameters.

```
# 4. Find maximum of log-likelihood function ----------------------------------
opt <- optimx(par = c(0, 0, 0), negll,
gr = negll.grad,
control = list(trace = 0, all.methods = TRUE))
# print reulsts of optimisation
# remove not needed information for purpose of presentation
summary(opt, order = "convcode") %>%
select(-value, -niter, -gevals, -fevals)
```

```
## p1 p2 p3 convcode kkt1 kkt2 xtime
## Nelder-Mead -3.05574129 0.02756878 -0.011265895 0 FALSE TRUE 0.01
## L-BFGS-B -3.05676203 0.02758493 -0.011303170 0 TRUE TRUE 0.01
## nlm -3.05669063 0.02758409 -0.011311003 0 TRUE TRUE 0.02
## nlminb -3.05669048 0.02758409 -0.011311590 0 TRUE TRUE 0.01
## CG -0.03645581 -0.01905827 -0.008845892 1 FALSE TRUE 0.14
## Rcgmin -3.05669070 0.02758409 -0.011310980 1 TRUE TRUE 0.21
## Rvmmin 0.00000000 0.00000000 0.000000000 21 FALSE TRUE 0.00
## BFGS NA NA NA 9999 NA NA 0.00
## spg NA NA NA 9999 NA NA 0.00
## ucminf NA NA NA 9999 NA NA 0.00
## newuoa NA NA NA 9999 NA NA 0.00
## bobyqa NA NA NA 9999 NA NA 0.00
## nmkb NA NA NA 9999 NA NA 0.00
## hjkb NA NA NA 9999 NA NA 0.00
```

A value of \(0\) in the `convcode` column of the output indicates, that the algorithm converged. Even though multiple algorithms converged and gave us a value for our three unknown parameters, they all provide slightly different estimates. Therefore, I think it would be interesting to compare our estimates with the estimates from the commonly used `glm()` function. Below I wrote a small function that estimates the mean differences in the estimates from the different optimisation methods and the `glm` model.

```
# 5. Estimate regression coeficents using glm ---------------------------------
glm_model <- glm(sta_n ~ age + female,
data = icu,
family = binomial(link = "logit"))
# 6. Comparing results from optimx and glm ------------------------------------
glm_results <- unname(coef(glm_model))
coef_opt <- coef(opt)
lapply(1:nrow(coef_opt), function(i){
optimisation_algorithm <- attributes(coef_opt)$dimnames[[1]][i]
mle_glm1 <- (coef_opt[i, "p1"] - glm_results[1])
mle_glm2 <- (coef_opt[i, "p2"] - glm_results[2])
mle_glm3 <- (coef_opt[i, "p3"] - glm_results[3])
mean_difference <- mean(mle_glm1, mle_glm2, mle_glm3, na.rm = TRUE)
data.frame(optimisation_algorithm, mean_difference)
}) %>%
bind_rows() %>%
filter(!is.na(mean_difference)) %>%
mutate(mean_difference = abs(mean_difference)) %>%
arrange(mean_difference)
```

```
## optimisation_algorithm mean_difference
## 1 Rcgmin 1.887799e-08
## 2 nlm 5.856574e-08
## 3 nlminb 2.071622e-07
## 4 L-BFGS-B 7.134885e-05
## 5 Nelder-Mead 9.493966e-04
## 6 CG 3.020235e+00
## 7 Rvmmin 3.056691e+00
```

This shows that the `Rcgmin` algorithm yield the most similar results to the estimates from the `glm` model. However, most of the algorithms in the table provide estimates similar to the estimates from the `glm` model, which indicates that our optimisation of the logistic regression model using the log-likelihood function and the gradient function worked out well.