High-dimensional model choice. A hands-on take
2025-09-26
Preface
The motivation for writing this book was twofold: to provide supporting material for the undergraduate and graduate students to whom I introduce fundamental notions behind high-dimensional model selection,
and to serve as documentation for the R package modelSelection
(previously called mombf
), which I have been developing over the years.
With these two goals in mind, the book evolved into a hands-on guide to model selection.
The book focuses on sparse inference, mainly Bayesian model selection (BMS) and averaging (BMA), for a number of popular models listed below. The package also implements L0 criteria such as the AIC, BIC and EBIC (as well as more general information criteria). The package’s C++ implementation is not fully optimized, but it is designed to remain scalable in sparse high-dimensional settings (large \(p\)). A lot of work went into coding and maintaining the package; if you use it, please cite at least one of the papers indicated below.
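For reference, the L0 criteria mentioned above are commonly defined as follows, where \(\hat L\) is the maximized likelihood of a given model, \(k\) its number of parameters, \(n\) the sample size, and \(p\) the total number of candidate covariates; the EBIC includes a tuning parameter \(\gamma \in [0,1]\). These are the standard textbook forms; the exact conventions used by the package may differ slightly.
\[
\begin{aligned}
\mathrm{AIC} &= -2 \log \hat L + 2k, \\
\mathrm{BIC} &= -2 \log \hat L + k \log n, \\
\mathrm{EBIC}_{\gamma} &= -2 \log \hat L + k \log n + 2\gamma \log \binom{p}{k}.
\end{aligned}
\]
Smaller values indicate a better model; the EBIC adds a penalty on model size that grows with \(p\), which is useful when \(p\) is large relative to \(n\).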
For a quick-start guide to modelSelection
, see Section 1. The main models handled by the package are:
Generalized linear models: linear, logistic and Poisson regression. BMS, BMA and L0 criteria (Johnson and Rossell 2012; D. Rossell and Telesca 2017; D. Rossell, Abril, and Bhattacharya 2021).
Linear regression with non-normal residuals (D. Rossell and Rubio 2018), including asymmetric Normal, Laplace and asymmetric Laplace residuals.
Accelerated Failure Time models for right-censored survival data (D. Rossell and Rubio 2021).
Bayesian inference for Gaussian graphical models.
Bayesian inference for Gaussian mixture models (Fúquene, Steel, and Rossell 2019).
On the Bayesian side, modelSelection
is the main package implementing non-local priors (NLPs), but other popular priors are also implemented, e.g. Zellner’s and Normal shrinkage priors in regression, or Gaussian spike-and-slab priors in graphical models.
NLPs are briefly reviewed in this book, see Johnson and Rossell (2010) and Johnson and Rossell (2012) for their model selection properties,
D. Rossell and Telesca (2017) for parameter estimation,
and D. Rossell, Abril, and Bhattacharya (2021) for computational approximations to marginal likelihoods.