# Overview

### Description

buildmlx uses SAMBA (Stochastic Approximation for Model Building Algorithm), an iterative procedure to accelerate and optimize the process of model building by identifying at each step how best to improve some of the model components. This method allows to find the optimal statistical model which minimizes some information criterion in very few steps.

Penalization criterion can be either a custom penalization of the form gamma*(number of parameters), AIC (gamma=2) or BIC (gamma=log(N)).

Several strategies can be used for building the covariate model at each iteration of the algorithm: direction=“full” means that all the possible models are compared (default when the number of covariates is less than 10). Othrwise, direction is the mode of stepwise search of stepAIC {MASS}, can be one of “both”, “backward”, or “forward”, with a default of “both” when there are at least 10 covariates.

### Usage

buildmlx <- function(project, final.project=NULL, model="all",
paramToUse="all", covToTest="all", covToTransform="none",
criterion="BIC", direction=NULL,
max.iter=20, print=TRUE, nb.model=1, linearization=FALSE,
seqcc=FALSE, p.max=1, steps=1000, exp.iter=2)

### Arguments

project
the initial Monolix project
final.project
the final Monolix project (default adds “_built" to the original project)
model
the components of the statistical model to build among c(“covariate”, “correlation”, “residualError”), (default=“all”)
paramToUse
list of parameters possibly function of covariates (default=“all”)
covToTest
components of the covariate model that can be modified (default=“all”)
covToTransform
list of (continuous) covariates to be log-transformed (default=“none”)
criterion
penalization criterion to optimize c({“BIC”}, “AIC”, gamma)
direction
method for covariate search c({“full”}, “both”, “backward”, “forward”), (default=“full” or “both”)
max.iter
maximum number of iterations (default=20)
exp.iter
number of iterations during the exploratory phase (default=1)
print
{TRUE}/FALSE display the results (default=TRUE)
nb.model
number of models to display at each iteration (default=1)
linearization
TRUE/{FALSE} whether the computation of the likelihood is based on a linearization of the model (default=FALSE)
seqcc
TRUE/{FALSE} whether the covariate model is built before the correlation model (default=FALSE)
p.max
maximum p-value (for the correlation test) for keeping a covariate in a model (default=1)
steps
maximum number of iteration for stepAIC/BIC (default=1000)

# An example: warfarin PK

## Using the default settings

We need first to load the Rsmlx library and set the Monolix connectors in order to run Monolix from R

library(Rsmlx)
initializeMlxConnectors(software = "monolix")
## [INFO] The path to monolix installation directory has not been given.
## [INFO] The directory specified in the initialization file of the Lixoft Suite (located at "C:\Users\Marc\lixoft\lixoft.ini") will be used by default.
## -> "C:/ProgramData/Lixoft/MonolixSuite2018R1"
## [INFO] The library mlxConnectors ("C:/ProgramData/Lixoft/MonolixSuite2018R1/lib/mlxConnectors.dll") is already loaded -> nothing to be done.

We select a Monolix project where the initial statistical model is defined

warfPK.project <- "projects/warfarinPK.mlxtran"

The structural PK model is a one compartment model for oral administration with a lag time. There are three individual covariates in the data: weight, sex and age.

The statistical model is a basic model, without any covariates, without any correlation between random and with a constant error model. The four individul PK parameters $$Tlag$$, $$ka$$, $$V$$ and $$Cl$$ are lognormally distributed.

We can now use buildmlx for building the statistical model:

warfPK.res1 <- buildmlx(warfPK.project)
## [INFO] Results have been succesfully loaded
## [INFO] Results have been succesfully loaded
##
##
## direction: = "full" will be used for the covariate search
##
## ____________________________________________
## Initialization:
## Covariate model:
##      sex age wt
## Tlag   0   0  0
## ka     0   0  0
## V      0   0  0
## Cl     0   0  0
##
## Correlation model:
## list()
##
## Residual error model:
##         y1
## "constant"
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 739.34   0.09 757.34 770.53
## ____________________________________________
## Iteration 1:
##
## Covariate model:
##      sex age wt
## Tlag   0   0  0
## ka     0   0  0
## V      0   0  1
## Cl     0   1  1
##
## Correlation model:
## list()
##
## Residual error model:
##          y1
## "combined2"
## Run scenario for model 1 ...
## Estimation of the population parameters...
## Sampling from the conditional distribution...
## Estimation of the log-likelihood...
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 651.90   0.10 677.90 696.96
## No difference between two successive iterations
## ____________________________________________
## Final model:
##
## Covariate model:
##      age sex wt
## Tlag   0   0  0
## ka     0   0  0
## V      0   0  1
## Cl     1   0  1
##
## Correlation model:
## list()
##
## Residual error model:
##          y1
## "combined2"
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 651.90   0.10 677.90 696.96
## ____________________________________________
## total time: 7.2s
## ____________________________________________

The output of buildmlx contains the name of the new project and the new statistical model

print(names(warfPK.res1))
## [1] "project"           "covariate.model"   "correlation.model"
## [4] "error.model"
print(warfPK.res1$project) ## [1] "projects/warfarinPK_built.mlxtran" print(warfPK.res1$covariate.model)
## $Tlag ## sex age wt ## FALSE FALSE FALSE ## ##$ka
##   sex   age    wt
## FALSE FALSE FALSE
##
## $V ## sex age wt ## FALSE FALSE TRUE ## ##$Cl
##   sex   age    wt
## FALSE  TRUE  TRUE
print(warfPK.res1$correlation) ## list() print(warfPK.res1$error)
##          y1
## "combined2"

Remarks:

• The new Monolix project tt>warfarinPK_built.mlxtran created by buildmlx can now be loaded and used in Monolix.
• In project warfarinPK_built.mlxtran, volume is function of weight. Since $$V_i$$ is log-normally distributed, $$\log(V_i)$$ is a linear function of $$W_i$$ in this model: $\log(V_i) = \log(V_{\rm pop}) + \beta_{V,W} \left({W_i}-{W_{\rm pop}} \right) + \eta_{V,i}$ where $$W_{\rm pop}=\bar{W}$$ is the empirical mean weight (in other words, continuous covariates are automatically centered by their mean values). We will see below how to transform continuous covariates automatically.
• A subfolder buildmlx is created in the result folder warfarinPK of the original project. All the intermediary runs and a summary are stored in this folder.

## Selecting covariates

We can select the list of covariates to test (the original covariate model is used for the other ones)

warfPK.res2 <- buildmlx(project=warfPK.project, covToTest = "wt", print=FALSE)
print(warfPK.res2$covariate.model) ##$Tlag
##   sex   age    wt
## FALSE FALSE FALSE
##
## $ka ## sex age wt ## FALSE FALSE FALSE ## ##$V
##   sex   age    wt
## FALSE FALSE  TRUE
##
## $Cl ## sex age wt ## FALSE FALSE TRUE ## Selecting parameters We can select the list of parameters for which a covariate model is built warfPK.res3 <- buildmlx(project=warfPK.project, paramToUse =c("ka", "Cl"), print=FALSE) print(warfPK.res3$covariate.model)
## $Tlag ## sex age wt ## FALSE FALSE FALSE ## ##$ka
##   sex   age    wt
## FALSE FALSE FALSE
##
## $V ## sex age wt ## FALSE FALSE FALSE ## ##$Cl
##   sex   age    wt
## FALSE  TRUE  TRUE

## Using AIC instead of BIC

warfPK.res4 <- buildmlx(project=warfPK.project, criterion="AIC", print=FALSE)
print(warfPK.res4$covariate.model) ##$Tlag
##   sex   age    wt
## FALSE FALSE FALSE
##
## $ka ## sex age wt ## FALSE FALSE FALSE ## ##$V
##   sex   age    wt
##  TRUE FALSE  TRUE
##
## $Cl ## sex age wt ## FALSE TRUE TRUE ## Testing log-transforms of the covariates Instead of considering that the individual log-parameters are linear functions of the original covariates, we can consider a linear relationship between log-covariates and log-parameters, such as, for instance: $\log(V_i) = \log(V_{\rm pop}) + \beta_{V,W} \log\left({W_i}/{W_{\rm pop}} \right) + \beta_{V,A} \log\left({A_i}/{A_{\rm pop}} \right) + \eta_{V,i}$ Another equivalent representation of this model writes $V_i = V_{\rm pop}\left(\frac{W_i}{W_{\rm pop}} \right)^{\beta_{V,W}}\left(\frac{A_i}{A_{\rm pop}} \right)^{\beta_{V,A}}e^{\eta_{V,i}}$ In this example, additional covariates $${\rm lwt}_i=\log({W_i}/{\bar{W}})$$ and $${\rm lage}_i=\log({A_i}/{\bar{A}})$$, where $$\bar{W}$$ and $$\bar{A}$$ are the empirical means of $$(W_i)$$ and $$(A_i)$$, are automatically created, warfPK.res5 <- buildmlx(project=warfPK.project, covToTransform = "all", print=FALSE) print(warfPK.res5$covariate.model)
## $Tlag ## sex age lage lwt wt ## FALSE FALSE FALSE FALSE FALSE ## ##$ka
##   sex   age  lage   lwt    wt
## FALSE FALSE FALSE FALSE FALSE
##
## $V ## sex age lage lwt wt ## FALSE FALSE FALSE TRUE FALSE ## ##$Cl
##   sex   age  lage   lwt    wt
## FALSE FALSE  TRUE  TRUE FALSE

# Building models with many covariates

SAMBA implemented in buildmlx is very powerful when a large number of covariates are available. In this new example, 40 artificial continuous and categorical covariates have been added to the original warfarin datafile.

warfPK40.project <- "projects/warfarinPK_40covariates.mlxtran"

The total number of possible covariate models is really huge but buildmlx will find the solution very quickly:

warfPK40.res <- buildmlx(warfPK40.project, covToTransform = c("age", "wt"))
## [INFO] Results have been succesfully loaded
## [INFO] Results have been succesfully loaded
##
##
## direction: = "both" will be used for the covariate search
##
## ____________________________________________
## Initialization:
## Covariate model:
##      cat1 cat10 cat11 cat12 cat13 cat14 cat15 cat16 cat17 cat18 cat19 cat2
## Tlag    0     0     0     0     0     0     0     0     0     0     0    0
## ka      0     0     0     0     0     0     0     0     0     0     0    0
## V       0     0     0     0     0     0     0     0     0     0     0    0
## Cl      0     0     0     0     0     0     0     0     0     0     0    0
##      cat20 cat3 cat4 cat5 cat6 cat7 cat8 cat9 sex age cont1 cont10 cont11
## Tlag     0    0    0    0    0    0    0    0   0   0     0      0      0
## ka       0    0    0    0    0    0    0    0   0   0     0      0      0
## V        0    0    0    0    0    0    0    0   0   0     0      0      0
## Cl       0    0    0    0    0    0    0    0   0   0     0      0      0
##      cont12 cont13 cont14 cont15 cont16 cont17 cont18 cont19 cont2 cont20
## Tlag      0      0      0      0      0      0      0      0     0      0
## ka        0      0      0      0      0      0      0      0     0      0
## V         0      0      0      0      0      0      0      0     0      0
## Cl        0      0      0      0      0      0      0      0     0      0
##      cont3 cont4 cont5 cont6 cont7 cont8 cont9 wt
## Tlag     0     0     0     0     0     0     0  0
## ka       0     0     0     0     0     0     0  0
## V        0     0     0     0     0     0     0  0
## Cl       0     0     0     0     0     0     0  0
##
## Correlation model:
## list()
##
## Residual error model:
##         y1
## "constant"
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 739.57   0.13 757.57 770.77
## ____________________________________________
## Iteration 1:
##
## Covariate model:
##      cat1 cat10 cat11 cat12 cat13 cat14 cat15 cat16 cat17 cat18 cat19 cat2
## Tlag    0     0     0     0     0     0     0     0     0     0     0    0
## ka      0     0     0     0     0     0     0     0     0     0     0    0
## V       1     0     0     0     0     0     0     0     0     0     0    0
## Cl      0     0     0     0     0     0     0     0     0     0     0    0
##      cat20 cat3 cat4 cat5 cat6 cat7 cat8 cat9 sex age cont1 cont10 cont11
## Tlag     0    0    0    0    0    0    0    0   0   0     0      0      0
## ka       0    0    0    0    0    0    0    0   0   0     0      0      0
## V        0    0    0    0    1    0    0    0   0   0     0      0      0
## Cl       0    0    0    0    0    0    0    0   0   0     0      0      0
##      cont12 cont13 cont14 cont15 cont16 cont17 cont18 cont19 cont2 cont20
## Tlag      0      0      0      0      0      0      0      0     0      0
## ka        0      0      0      0      0      0      0      0     0      0
## V         0      0      0      0      0      0      0      0     0      0
## Cl        0      0      0      1      0      0      0      0     0      0
##      cont3 cont4 cont5 cont6 cont7 cont8 cont9 wt lage lwt
## Tlag     0     0     0     0     0     0     0  0    0   0
## ka       0     0     0     0     0     0     0  0    0   0
## V        1     0     0     0     0     0     0  0    0   1
## Cl       0     0     0     0     0     0     0  0    1   1
##
## Correlation model:
## list()
##
## Residual error model:
##          y1
## "combined2"
## Run scenario for model 1 ...
## Estimation of the population parameters...
## Sampling from the conditional distribution...
## Estimation of the log-likelihood...
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 629.43   0.08 663.43 688.35
## ____________________________________________
## Iteration 2:
##
## Covariate model:
##      cat1 cat10 cat11 cat12 cat13 cat14 cat15 cat16 cat17 cat18 cat19 cat2
## Tlag    0     0     0     0     0     0     0     0     0     0     0    0
## ka      0     0     0     0     0     0     0     0     0     0     0    0
## V       0     1     0     0     0     0     0     0     0     0     0    0
## Cl      0     0     0     0     0     0     0     0     0     0     0    0
##      cat20 cat3 cat4 cat5 cat6 cat7 cat8 cat9 sex age cont1 cont10 cont11
## Tlag     0    0    0    0    0    0    0    0   0   0     0      0      0
## ka       0    0    0    0    0    0    0    0   0   0     0      0      0
## V        0    0    0    0    1    1    0    0   0   0     0      0      0
## Cl       0    0    0    0    0    0    0    0   0   0     0      0      0
##      cont12 cont13 cont14 cont15 cont16 cont17 cont18 cont19 cont2 cont20
## Tlag      0      0      0      0      0      0      0      0     0      0
## ka        0      0      0      0      0      0      0      0     0      0
## V         0      0      0      0      0      0      0      0     0      0
## Cl        0      0      0      1      0      0      0      0     0      0
##      cont3 cont4 cont5 cont6 cont7 cont8 cont9 wt lage lwt
## Tlag     0     0     0     0     0     0     0  0    0   0
## ka       0     0     0     0     0     0     0  0    0   0
## V        1     0     0     1     0     0     0  0    0   1
## Cl       0     0     0     0     0     0     0  0    1   1
##
## Correlation model:
## list()
##
## Residual error model:
##          y1
## "combined2"
## Run scenario for model 2 ...
## Estimation of the population parameters...
## Sampling from the conditional distribution...
## Estimation of the log-likelihood...
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 610.10   0.09 648.10 675.95
## No difference between two successive iterations
## ____________________________________________
## Final model:
##
## Covariate model:
##      age cat1 cat10 cat11 cat12 cat13 cat14 cat15 cat16 cat17 cat18 cat19
## Tlag   0    0     0     0     0     0     0     0     0     0     0     0
## ka     0    0     0     0     0     0     0     0     0     0     0     0
## V      0    0     1     0     0     0     0     0     0     0     0     0
## Cl     0    0     0     0     0     0     0     0     0     0     0     0
##      cat2 cat20 cat3 cat4 cat5 cat6 cat7 cat8 cat9 cont1 cont10 cont11
## Tlag    0     0    0    0    0    0    0    0    0     0      0      0
## ka      0     0    0    0    0    0    0    0    0     0      0      0
## V       0     0    0    0    0    1    1    0    0     0      0      0
## Cl      0     0    0    0    0    0    0    0    0     0      0      0
##      cont12 cont13 cont14 cont15 cont16 cont17 cont18 cont19 cont2 cont20
## Tlag      0      0      0      0      0      0      0      0     0      0
## ka        0      0      0      0      0      0      0      0     0      0
## V         0      0      0      0      0      0      0      0     0      0
## Cl        0      0      0      1      0      0      0      0     0      0
##      cont3 cont4 cont5 cont6 cont7 cont8 cont9 lage lwt sex wt
## Tlag     0     0     0     0     0     0     0    0   0   0  0
## ka       0     0     0     0     0     0     0    0   0   0  0
## V        1     0     0     1     0     0     0    0   1   0  0
## Cl       0     0     0     0     0     0     0    1   1   0  0
##
## Correlation model:
## list()
##
## Residual error model:
##          y1
## "combined2"
##
## Estimated criteria (importanceSampling):
##   -2LL   s.e.    AIC    BIC
## 610.10   0.09 648.10 675.95
## ____________________________________________
## total time: 15.7s
## ____________________________________________

Log-weight and log-age are found again by buildmlx, as with the original data. There are only six additional false discoveries in the final model:

warfPK40.test <- testmlx(warfPK40.res$project) pp <- warfPK40.test$covariate$p.value.parameters print(pp[pp$in.model,])
##     parameter covariate   p.value in.model
## 122     log.V       lwt 4.600e-09     TRUE
## 140     log.V      cat6 4.566e-03     TRUE
## 141     log.V      cat7 1.921e-02     TRUE
## 125     log.V     cat10 4.935e-02     TRUE
## 100     log.V    ccont6 5.130e-02     TRUE
## 99      log.V    ccont3 2.098e-01     TRUE
## 170    log.Cl       lwt 2.245e-02     TRUE
## 146    log.Cl   ccont15 6.208e-02     TRUE
## 169    log.Cl      lage 7.666e-02     TRUE

# BIC versus correlation test

We show with this (simulated) example that putting a covariate on a parameter can significantly improve BIC, even if the correlation between this covariate and the parameter is not significant.

sim.project <- "projects/simulPK_project.mlxtran"

Let us start by testing all covariates on all parameters

sim.res1 <- buildmlx(project=sim.project, print=FALSE)
print(sim.res1$covariate.model) ##$ka
##    C1    C2    C3    C4    C5
##  TRUE FALSE FALSE FALSE FALSE
##
## $V ## C1 C2 C3 C4 C5 ## FALSE TRUE TRUE FALSE FALSE ## ##$Cl
##    C1    C2    C3    C4    C5
##  TRUE FALSE FALSE  TRUE  TRUE

Cl is function of C5 in the final model. Nevertheless, we can check that the correlation between C5 and log.Cl is not significant

sim.test1 <- testmlx(sim.res1$project) pp <- sim.test1$covariate$p.value.parameters print(pp[pp$in.model,])
##    parameter covariate   p.value in.model
## 6     log.ka       cC1 5.421e-13     TRUE
## 18     log.V       cC3 1.326e-12     TRUE
## 17     log.V       cC2 2.289e-12     TRUE
## 26    log.Cl       cC1 1.776e-06     TRUE
## 29    log.Cl       cC4 2.419e-05     TRUE
## 30    log.Cl       cC5 5.511e-01     TRUE

We can decide that covariate C5 may in the model for Cl only if the correlation between C5 and log.Cl is significant enough, i.e. if the p-value of the correlation test is less than 0.2 for instance

sim.res2 <- buildmlx(project=project, print=FALSE, p.max=0.2)
print(sim.res2$covariate.model) ##$ka
##    C1    C2    C3    C4    C5
##  TRUE FALSE FALSE FALSE FALSE
##
## $V ## C1 C2 C3 C4 C5 ## FALSE TRUE TRUE FALSE FALSE ## ##$Cl
##    C1    C2    C3    C4    C5
##  TRUE FALSE FALSE  TRUE FALSE

# Strarting from different statistical models

There are 6 individual parameters and 3 covariates in this example.

In this first run, the initial model is a simple statistical model, without any covariate:

remi.res1 <- buildmlx("projects/remifentanil1.mlxtran", covToTransform = "all", print=FALSE)
print(remi.res1)
## $project ## [1] "projects/remifentanil1_built.mlxtran" ## ##$covariate.model
## $covariate.model$Cl
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE FALSE FALSE  TRUE  TRUE
##
## $covariate.model$V1
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE FALSE  TRUE  TRUE FALSE
##
## $covariate.model$Q2
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE  TRUE FALSE FALSE FALSE
##
## $covariate.model$V2
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE  TRUE  TRUE FALSE FALSE
##
## $covariate.model$Q3
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE  TRUE FALSE FALSE FALSE
##
## $covariate.model$V3
##   SEX   AGE   LBM  lAGE  lLBM
## FALSE FALSE FALSE  TRUE FALSE
##
##
## $correlation.model ##$correlation.model[[1]]
## [1] "Cl" "Q2" "V2" "Q3"
##
##
## $error.model ## y_1 ## "combined1" Let us start now with a totally different initial statistical model, assuming that all parameters are function of all covariates remi.res2 <- buildmlx("projects/remifentanil2.mlxtran", covToTransform = "all", print=FALSE) print(remi.res2) ##$project
## [1] "projects/remifentanil2_built.mlxtran"
##
## $covariate.model ##$covariate.model$Cl ## SEX AGE LBM lAGE lLBM ## FALSE FALSE FALSE TRUE TRUE ## ##$covariate.model$V1 ## SEX AGE LBM lAGE lLBM ## FALSE FALSE TRUE TRUE FALSE ## ##$covariate.model$Q2 ## SEX AGE LBM lAGE lLBM ## FALSE TRUE FALSE FALSE FALSE ## ##$covariate.model$V2 ## SEX AGE LBM lAGE lLBM ## FALSE TRUE FALSE FALSE FALSE ## ##$covariate.model$Q3 ## SEX AGE LBM lAGE lLBM ## TRUE TRUE FALSE FALSE FALSE ## ##$covariate.model$V3 ## SEX AGE LBM lAGE lLBM ## TRUE FALSE FALSE TRUE FALSE ## ## ##$correlation.model
## $correlation.model[[1]] ## [1] "Cl" "Q2" "V2" "Q3" ## ## ##$error.model
##         y_1
## "combined1"

Even if the two initial models are very different, the two final models are very similar.