1 Quick start
The main functions for regression-type models are modelSelection and bestBIC (along with companions like bestEBIC, bestAIC and bestIC).
Details are in subsequent sections. Below we illustrate how to obtain:
- Information criteria for all models (including those from MCMC exploration)
- Posterior model probabilities
- Marginal posterior inclusion probabilities
- BMA point estimates and posterior intervals
See modelSelectionGGM for Gaussian graphical models and bfnormmix for Gaussian mixture models.
1.1 Linear regression
We simulate linear regression data \[ y_i = \sum_{j=1}^p x_{ij} \theta_j + \epsilon_i, \] for \(p=3\) covariates and \(i=1,\ldots,100\) individuals. We set the regression coefficients to \(\theta_1=1\), \(\theta_2=1\), \(\theta_3=0\), take \(\epsilon_i \sim N(0,1)\), and fix the random number seed for reproducibility. It is good practice to store the outcome and covariates in a data frame, as we do below.
library(modelSelection)
set.seed(1234)
x <- matrix(rnorm(100*3), nrow=100, ncol=3)
theta <- matrix(c(1,1,0), ncol=1)
y <- x %*% theta + rnorm(100)
df <- data.frame(y, x)
1.1.1 L0 criteria
bestBIC obtains the BIC for all models. As usual in R we can use formulas like y ~ X1 + X2 + X3 (provided the variables are stored in a data frame), or simply y ~ . to consider all the variables in the data (other than y) as covariates. An intercept is added automatically, giving a total of 4 variables and \(2^4=16\) possible models (the intercept can be removed by adding -1 to the formula, as usual). bestBIC enumerates these 16 models and finds the model with the best (lowest) BIC (when there are many covariates, MCMC is used to explore the model space instead). A much faster alternative is the heuristic implemented in bestBIC_fast, based on coordinate descent and other tricks; see Section 3 (for Gaussian and binary outcomes it relies on package L0Learn (Hazimeh, Mazumder, and Nonet 2023)). We set verbose=FALSE to avoid printing progress information and cluttering the book.
For our simulated data the selected model matches the data-generating truth, which only features the first two covariates.
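The output shown next can be obtained with a call along these lines (a minimal sketch; the object name fit.bic is our choice, not fixed by the package):
fit.bic <- bestBIC(y ~ ., data=df, verbose=FALSE)  # enumerate all models and keep the one with lowest BIC
fit.bic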
## icfit object
## Model with best BIC: X1 X2
## Use summary(), coef() and predict() to get inference for the top model
## Use coef(object$msfit) and predict(object$msfit) to get BMA estimates and predictions
We list the BIC for the top 5 models (index 1 corresponds to the intercept, 2 to x[,1] and so on). We can also use standard functions like summary and coef to view the MLE for the best model, and predict to obtain predictions for new data.
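A sketch of the corresponding calls, assuming the fitted object is called fit.bic as above; the component storing the per-model BIC table is an assumption (check names(fit.bic) for the exact slot name):
head(fit.bic$models, 5)   # per-model BIC table (component name is an assumption)
summary(fit.bic)          # GLM summary for the top model
coef(fit.bic)             # MLE under the top model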
## # A tibble: 5 × 2
## modelid bic
## <chr> <dbl>
## 1 2,3 302.
## 2 2,3,4 307.
## 3 1,2,3 307.
## 4 1,2,3,4 311.
## 5 3 381.
##
## Call:
## glm(formula = f, family = family2glm(ms$family), data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## X1 1.1505 0.1022 11.26 <2e-16 ***
## X2 1.1509 0.1006 11.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.06776)
##
## Null deviance: 371.43 on 100 degrees of freedom
## Residual deviance: 104.64 on 98 degrees of freedom
## AIC: 294.32
##
## Number of Fisher Scoring iterations: 2
## X1 X2
## 1.150549 1.150920
1.1.2 Bayesian model selection
A limitation of L0 criteria is that they ignore the uncertainty in the selected model, i.e. we're not completely sure that it's the correct one. We use Bayesian model selection (BMS) to assess that uncertainty. BMS requires setting a prior distribution on the models and on the parameters of each model. For now, we run modelSelection with default priors (a Beta-Binomial prior on the models, and a pMOM prior with the default prior precision on the coefficients).
priorCoef <- momprior()
priorModel <- modelbbprior()
fit.bms <- modelSelection(y ~ ., data=df,
priorCoef=priorCoef, priorModel=priorModel,
verbose=FALSE)
postProb shows posterior model probabilities (sorted in decreasing order), and coef gives BMA point estimates, 0.95 posterior intervals, and marginal posterior inclusion probabilities \(P(\theta_j \neq 0 \mid y)\). Below, phi refers to the error variance \(\sigma^2\) (which is included with probability 1). In our example, BMS selects the right covariates and assigns high posterior probability to that solution, as one would ideally wish.
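The output below corresponds to calls along these lines (coef output shown first, then postProb):
coef(fit.bms)       # BMA estimates, 0.95 intervals and marginal inclusion probabilities
postProb(fit.bms)   # posterior model probabilities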
## estimate 2.5% 97.5% margpp
## (Intercept) 0.007082034 -0.02658464 0.04089499 0.007366249
## X1 1.133309621 0.93331088 1.33480178 1.000000000
## X2 1.134404673 0.93919629 1.33501531 1.000000000
## X3 0.000366013 0.00000000 0.00000000 0.008254065
## phi 1.103715115 0.84213596 1.44604848 1.000000000
## modelid family pp
## 7 2,3 normal 9.845428e-01
## 8 2,3,4 normal 8.090989e-03
## 15 1,2,3 normal 7.203173e-03
## 16 1,2,3,4 normal 1.630761e-04
## 3 3 normal 3.424188e-17
Finally, we can use predict to obtain point predictions and 0.95 posterior predictive intervals.
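A sketch of the call behind the output below, assuming predict() on the fitted object returns predictions for the observed data (use the newdata argument for new observations) and that head() is used to limit the printed rows:
head(predict(fit.bms))   # point predictions and 0.95 posterior predictive intervals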
## mean 2.5% 97.5%
## 1 -0.8928148 -1.1160111 -0.66885457
## 2 -0.2161415 -0.3514485 -0.08236455
## 3 1.3134407 1.0653356 1.56205993
## 4 -3.2261301 -3.6793885 -2.77249364
## 5 -0.4427614 -0.6498843 -0.23853199
## 6 0.7716784 0.6332783 0.90914325
1.2 Logistic regression
For binary outcomes, we simply specify family='binomial' (and for Poisson outcomes, family='poisson'). We first create a binary version of our simulated outcome, and then use bestBIC and modelSelection as before. The selected model is still the correct one, but the posterior probability of (wrongly) including x[,3] is higher than for the linear regression data. This is intuitively expected: binary outcomes carry less information than Gaussian ones, so there is more uncertainty about the chosen model.
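A sketch of these two steps; the thresholding rule defining the binary outcome, the object name fit2.bic, and passing family='binomial' to bestBIC are assumptions here, not taken from the original code:
ybin <- as.numeric(y > 0)    # binary version of the outcome (thresholding rule is an assumption)
dfbin <- data.frame(ybin, x)
fit2.bic <- bestBIC(ybin ~ ., data=dfbin, family='binomial', verbose=FALSE)
fit2.bic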
## icfit object
## Model with best BIC: X1 X2
## Use summary(), coef() and predict() to get inference for the top model
## Use coef(object$msfit) and predict(object$msfit) to get BMA estimates and predictions
fit2.bms <- modelSelection(ybin ~ ., data=dfbin,
priorCoef=priorCoef, priorModel=priorModel,
family='binomial', verbose=FALSE)
coef(fit2.bms)
## Warning in hasPostSampling(object): Exact posterior sampling not implemented,
## using Normal approx instead
## estimate 2.5% 97.5% margpp
## (Intercept) 0.16323081 0.03275118 0.2669387 2.730669e-10
## X1 1.38893900 0.78354882 2.0102542 1.000000e+00
## X2 1.05744788 0.53310589 1.6067301 9.999136e-01
## X3 0.07043283 0.00000000 0.7391405 1.695233e-01
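To quantify that extra uncertainty at the model level, postProb can be applied to the logistic fit as well, exactly as in the linear case:
postProb(fit2.bms)   # posterior model probabilities under the logistic model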
1.3 Non-Linear effects via Generalized Additive Models (GAMs)
Non-linear effects can be modeled via cubic splines using the smooth argument (the default is 9 knots, producing 5 columns in the design matrix for each non-linear covariate). When using the smooth argument we cannot use the ~ . notation to include all covariates; rather, we must list those for which we wish to include a non-linear effect (see the example below).
The effect of each covariate is decomposed into a linear part plus a deviation from linearity (which is forced to be orthogonal to the linear term). This is useful to identify covariates for which a linear effect is sufficient, and covariates for which there are non-linearities. modelSelection considers 3 possibilities for each covariate: excluding it entirely, including only the linear effect, and including both linear and non-linear terms. For further details on this decomposition, see D. Rossell and Rubio (2021).
The linear effect coefficients are displayed using the original variable names, and the non-linear coefficients with an .s appended. Here we have 5 columns coding for the non-linear effect, labelled .s1 through .s5. In our example, there is strong evidence for (correctly) including the linear effects of X1 and X2, excluding their non-linear effects, and excluding X3 entirely.
fit.gam <- modelSelection(y ~ ., data=df,
smooth = ~ X1 + X2 + X3,
priorCoef=priorCoef,
priorModel=priorModel, verbose=FALSE)
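The coefficient summary below (and the accompanying warning) can be obtained with coef, as in the earlier examples:
coef(fit.gam)   # linear and non-linear (".s") coefficients, with inclusion probabilities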
## Warning in hasPostSampling(object): Exact posterior sampling not implemented,
## using Normal approx instead
## estimate 2.5% 97.5% margpp
## (Intercept) 8.127021e-03 -0.01028342 0.02684467 7.499102e-03
## X1 1.140319e+00 1.02940753 1.25420735 1.000000e+00
## X2 1.139502e+00 1.03040279 1.24949508 1.000000e+00
## X3 1.579339e-04 0.00000000 0.00000000 8.545216e-03
## X1.s1 -1.088515e-04 0.00000000 0.00000000 4.026757e-04
## X1.s2 1.357836e-04 0.00000000 0.00000000 4.026757e-04
## X1.s3 -2.810730e-04 0.00000000 0.00000000 4.026757e-04
## X1.s4 2.847856e-04 0.00000000 0.00000000 4.026757e-04
## X1.s5 2.822071e-05 0.00000000 0.00000000 4.026757e-04
## X2.s1 4.295849e-03 0.00000000 0.00000000 4.296470e-03
## X2.s2 5.891295e-03 0.00000000 0.00000000 4.296470e-03
## X2.s3 2.200417e-03 0.00000000 0.00000000 4.296470e-03
## X2.s4 3.947671e-03 0.00000000 0.00000000 4.296470e-03
## X2.s5 1.376460e-03 0.00000000 0.00000000 4.296470e-03
## X3.s1 -1.142249e-04 0.00000000 0.00000000 1.855263e-05
## X3.s2 8.956766e-05 0.00000000 0.00000000 1.855263e-05
## X3.s3 7.400172e-05 0.00000000 0.00000000 1.855263e-05
## X3.s4 5.668310e-05 0.00000000 0.00000000 1.855263e-05
## X3.s5 1.327733e-04 0.00000000 0.00000000 1.855263e-05
## phi 1.093750e+00 0.82918219 1.43831078 1.000000e+00
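As with the earlier fits, predict can be used to obtain predictions that account for the selected non-linear terms (a sketch; as before, head() just limits the printed rows):
head(predict(fit.gam))   # point predictions and 0.95 posterior predictive intervals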