Published June 1, 2025 | Version v1
Dataset Open

Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms

  • 1. Science Institute and Faculty of Physical Sciences, University of Iceland, Reykjavík, Iceland
  • 2. Department of Mechanical and Materials Engineering, Queen's University, Kingston, Ontario, Canada, K7L 3N6

* Contact person

Description

The increasing use of high-throughput computational chemistry demands rigorous methods for evaluating algorithm performance. We present a Bayesian hierarchical modeling paradigm (brms/Stan) for analyzing key performance metrics: function evaluations, computation time, and success/failure. This framework accounts for variability across different systems and functionals, and provides reliable uncertainty estimates that go beyond subjective visual assessment and avoid the limitations of frequentist approaches. We applied it to compare conjugate gradient (CG) and L-BFGS algorithms for the rotation phase of the Dimer method (in EON, with and without removal of external rotations) on a benchmark of 500 initial saddle search approximations, analyzing over 2000 runs. Our results show that CG rotations generally outperform L-BFGS, exhibiting a statistically credible, small reduction in PES calls and significantly higher odds of successful convergence. Conversely, enabling rotation removal incurred a substantial PES call penalty without a corresponding credible improvement in success odds in the implementation studied. These findings, from our novel application of Bayesian hierarchical modeling, suggest that CG may be preferable for Dimer rotational optimization in similar contexts. The statistical framework supports revisiting optimization strategies, quantifying uncertainty, and improving high-throughput computational chemistry methods.

This record contains the saddle search output logs for EON with NWChem across four settings: with or without removal of external rotations, combined with either CG or L-BFGS for the rotational phase of the dimer. The record also includes the fitted Bayesian hierarchical models for the performance and success analyses. These models and data are used to generate the figures and validate the analysis in the manuscript. For details, refer to the code in the associated GitHub repository.
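As a reading aid, the kind of hierarchical model described above can be sketched in generic form. This is not the specification fitted in the manuscript: the likelihoods (negative binomial for PES-call counts, Bernoulli for success), the single system-level random intercept, and every symbol below are assumptions chosen for illustration.

```latex
% Illustrative only: i indexes the benchmark system, j the algorithm setting.
% x_j = 1 for CG (0 for L-BFGS); z_j = 1 when external rotations are removed.
\begin{align}
y_{ij} &\sim \mathrm{NegBinomial}(\mu_{ij}, \phi), &
\log \mu_{ij} &= \beta_0 + \beta_{\mathrm{opt}}\, x_j + \beta_{\mathrm{rot}}\, z_j + u_i, \\
s_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), &
\operatorname{logit} p_{ij} &= \gamma_0 + \gamma_{\mathrm{opt}}\, x_j + \gamma_{\mathrm{rot}}\, z_j + v_i, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), &
v_i &\sim \mathcal{N}(0, \sigma_v^2).
\end{align}
```

Under these assumptions, $\exp(\beta_{\mathrm{opt}})$ is the multiplicative change in expected PES calls when switching from L-BFGS to CG, and $\exp(\gamma_{\mathrm{opt}})$ is the corresponding odds ratio for successful convergence. A posterior interval for either quantity that excludes 1 is what a phrase such as "statistically credible" would refer to in this setting.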

Files

Previewed file: files_description.md

Files (1.7 GiB)

Checksum                               Size
md5:5d92ff42fdc3f7ab1af34b71f83b724d   466 Bytes
md5:5cd1034a2d822b45a4e918445b9c2e86   192.3 MiB
md5:a18d30d132e6dcfcd56b57823371a5c1   1.5 GiB
md5:d5e4323e70b1aec0dd4fa2901c99a8b5   4.5 KiB
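The MD5 checksums listed above can be used to confirm that downloaded files are intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `check_downloads` and the download directory are hypothetical, and because the listing does not pair checksums with file names, the check only tests whether each local file's digest appears among the published ones.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record (without the "md5:" prefix).
RECORD_MD5 = {
    "5d92ff42fdc3f7ab1af34b71f83b724d",
    "5cd1034a2d822b45a4e918445b9c2e86",
    "a18d30d132e6dcfcd56b57823371a5c1",
    "d5e4323e70b1aec0dd4fa2901c99a8b5",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Map each regular file in `directory` to True if its MD5 is listed in the record."""
    return {
        path.name: md5_of(path) in RECORD_MD5
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

Calling `check_downloads("downloads")`, where `downloads` is a placeholder for the local folder holding the files, returns a dictionary of file names and booleans. Any `False` entry points to a file that is either not part of this record or was corrupted in transit. Chunked reading matters here because the largest archive is 1.5 GiB.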

References

Preprint (Preprint describing the model and analysis of the data in the record.)
R. Goswami, "Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms," May 19, 2025, arXiv:2505.13621, doi: 10.48550/arXiv.2505.13621

Software (Software collection used to generate the models and data in this record.)
R. Goswami, GitHub.