Published January 30, 2025 | Version v1
Dataset Open

Analysis of bootstrap and subsampling in high-dimensional regularized regression (code)

  • 1. École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation laboratory, CH-1015 Lausanne, Switzerland
  • 2. École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics laboratory, CH-1015 Lausanne, Switzerland
  • 3. École Polytechnique Fédérale de Lausanne (EPFL), Information and Network Dynamics laboratory, CH-1015 Lausanne, Switzerland
  • 4. Département d'Informatique, École Normale Supérieure - PSL & CNRS, Paris, France

Description

We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, the bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, in the limit where the number of samples n and the dimension d of the covariates grow proportionally, with fixed ratio α = n/d. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when α is large enough do they provide consistent and reliable error estimates (we give convergence rates); iii) in the over-parametrized regime α < 1 relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization. This record provides the code to reproduce the numerical experiments of the related paper "Analysis of bootstrap and subsampling in high-dimensional regularized regression".
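To make the setting concrete, the following is a minimal sketch of how bootstrap and subsampling variance estimates for ridge regression can be compared against a Monte Carlo reference at a given ratio α = n/d. It is not the code distributed in this record. The data model (Gaussian covariates, linear teacher, Gaussian noise), the function names, and the values of `d`, `lam` and `noise_std` are assumptions chosen purely for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def resampled_variance(X, y, lam, n_resamples, rng, subsample_frac=None):
    """Variance of the ridge weights across resampled datasets,
    averaged over coordinates.

    subsample_frac=None -> pair bootstrap (n rows drawn with replacement)
    subsample_frac=r    -> subsampling (round(r * n) rows, no replacement)
    """
    n = X.shape[0]
    fits = []
    for _ in range(n_resamples):
        if subsample_frac is None:
            idx = rng.integers(0, n, size=n)
        else:
            m = max(1, int(round(subsample_frac * n)))
            idx = rng.choice(n, size=m, replace=False)
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return float(np.var(np.stack(fits), axis=0).mean())


def sample_data(n, w_star, noise_std, rng):
    """Assumed toy model: Gaussian covariates, linear teacher, Gaussian noise."""
    d = w_star.shape[0]
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = X @ w_star + noise_std * rng.standard_normal(n)
    return X, y


def reference_variance(n, w_star, noise_std, lam, n_trials, rng):
    """Monte Carlo reference: variance of the estimator over fresh datasets."""
    fits = [ridge_fit(*sample_data(n, w_star, noise_std, rng), lam)
            for _ in range(n_trials)]
    return float(np.var(np.stack(fits), axis=0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, lam, noise_std = 100, 0.1, 0.5  # illustrative values, not from the paper
    w_star = rng.standard_normal(d)
    for alpha in (0.5, 2.0, 10.0):
        n = int(alpha * d)
        X, y = sample_data(n, w_star, noise_std, rng)
        boot = resampled_variance(X, y, lam, 100, rng)
        sub = resampled_variance(X, y, lam, 100, rng, subsample_frac=0.8)
        ref = reference_variance(n, w_star, noise_std, lam, 100, rng)
        print(f"alpha={alpha}: bootstrap={boot:.4g} "
              f"subsampling={sub:.4g} reference={ref:.4g}")
```

Running the script prints the three variance estimates for a few values of α, so the gap between the resampling estimates and the reference can be inspected as α changes. The subsampling estimate is computed at a reduced sample size and no rescaling is applied here, so it should be read as a rough comparison only.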

Files

Previewed file: files_description.md

Files (957.8 KiB)

Checksum                                  Size
md5:48609dda85b02cbafe1d87597c7cc702      318 Bytes
md5:9ecf4b0632902209f673b53919ac1512      957.0 KiB
md5:98c73ff79efc66b38ed648aad8eef65e      500 Bytes

References

Journal reference
L. Clarté, A. Vandenbroucque, G. Dalle, B. Loureiro, F. Krzakala, L. Zdeborová, "Analysis of bootstrap and subsampling in high-dimensional regularized regression", Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244, 787-819 (2024)