<a id="org5026264"></a>
Clone the GitHub repository:
git clone git@github.com:TheochemUI/otgpd_repro.git
<a id="orgaed715b"></a>
This repository contains the reproduction data for the publication on the wall-time efficient and robust optimal transport Gaussian Process dimer (OTGPD).
It is the third in a series of papers [1, 2] on the Dimer method, with a focus on the failure modes of the Gaussian Process Regression model on the 500-system benchmark of Hermes et al. [3].
<a id="orgbc606a4"></a>
If you use this repository or any of its parts, please cite the corresponding publication or data source.
<a id="org3c106ce"></a>
- R. Goswami and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” Oct. 8, 2025, arXiv
<a id="org36b01fe"></a>
Before using the scripts in the repository, inflate the data archives obtained
from the Materials Cloud record. Assuming the .tar.xz files are in data/
relative to the repository root, run the following:
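# GITROOT is assumed to be the repository root, e.g. GITROOT=$(git rev-parse --show-toplevel)
# Fitted models with predictions
cd "$GITROOT/data"
tar -xf models_and_preds.tar.xz && rm -f models_and_preds.tar.xz
# Raw benchmark data, i.e., OTGPD eOn output logs
# Create the target directory first so that cp does not write a plain file in its place
mkdir -p "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
cp "$GITROOT/data/softest_mode_scg_barrier.tar.xz" "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier/"
cd "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
tar -xf softest_mode_scg_barrier.tar.xz && rm -f softest_mode_scg_barrier.tar.xz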
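The extraction steps above can also be wrapped in a small helper that checks for the archive before unpacking. The following is a minimal sketch, assuming a POSIX-like shell with tar available; `inflate_archive` is a hypothetical name and not part of the repository, and unlike the commands above it leaves the archive in place after extraction.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): extract an archive
# into a target directory, failing early if the archive is missing.
inflate_archive() {
    local archive="$1" target="$2"
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    # Create the target so extraction never depends on it already existing.
    mkdir -p "$target" || return 1
    # -C extracts into the target without changing the working directory.
    tar -xf "$archive" -C "$target"
}

# Example usage, assuming GITROOT points at the repository root:
# inflate_archive "$GITROOT/data/models_and_preds.tar.xz" "$GITROOT/data"
# inflate_archive "$GITROOT/data/softest_mode_scg_barrier.tar.xz" \
#     "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
```

Using `tar -C` avoids the copy-then-cd sequence, so a failed step cannot leave the shell in an unexpected directory.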
<a id="org0f9753c"></a>
To make the models and predictions reusable, the record provides both
FAIR-formatted results and R formats that are easier to work with.
The record contains R objects for the brms models and their predictions.
Load the model dependencies first; after that, the base R function
readRDS suffices:
library('brms')
model <- readRDS("data/models/brms_pes.rds")
More helper functions for generating and using these models and predictions are in the GitHub repository.
Without R, access takes a few more steps. Predictions are
provided as Apache Parquet files, along with the model training data.
Each trained model is also exported from brms as Stan code.
For each model (e.g., brms_pes), we provide three key components, including:
- Stan code (.stan): The complete model definition translated into the Stan programming language. This is the logic of the model, e.g., data/stancode/brms_pes.stan
- Stan data (.parquet): The data that was passed to the Stan model for fitting. This file is essential for re-running the model from scratch, e.g., data/standata_parquet/brms_pes_standata.zstd.parquet
<a id="orgc38b286"></a>
The repository has code archives, benchmark runs, and scripts for analysis.
❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs
The data in the archives expands to locations within the benchmark runs.
Each benchmark has the following structure:
.
├── doublet
│   ├── 000
# .....
│   └── 234
└── singlet
    ├── 000
# .....
    └── 264
Together these comprise 500 systems: 235 doublets (000 to 234) and 265 singlets (000 to 264).
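After extraction, the system count can be checked against the layout above. The following is a minimal sketch, assuming a shell with `find` and `wc`; `count_systems` is a hypothetical helper that is not part of the repository, and it assumes each system is an immediate subdirectory of doublet or singlet.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): count the system
# directories directly under doublet/ and singlet/ in a benchmark root.
count_systems() {
    local root="$1" total=0 n spin
    for spin in doublet singlet; do
        # Only immediate subdirectories count as systems; files are ignored.
        n=$(find "$root/$spin" -mindepth 1 -maxdepth 1 -type d | wc -l)
        total=$((total + n))
    done
    echo "$total"
}

# Example usage, assuming GITROOT points at the repository root:
# count_systems "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
```

A result other than 500 for a fully extracted benchmark would suggest an incomplete or misplaced extraction.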
For comparison, the archive named below expands at the indicated location to this top-level layout:
<a id="org8da5349"></a>
# softest_mode_scg_barrier.tar.xz
# $GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier
tree -L 1 .
.
├── doublet
└── singlet
<a id="orgc2e084b"></a>
[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.
[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.
[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.