Preliminary

Clone the GitHub repository:

git clone git@github.com:TheochemUI/otgpd_repro.git

About

This repository contains the reproduction data for the publication on the wall-time efficient and robust optimal transport Gaussian process dimer (OTGPD).

It is the third in a series of papers [1, 2] on the dimer method and focuses on the failure modes of the Gaussian process regression model on the 500-system benchmark of Hermes et al. [3].

Reference

If you use this repository or any part of it, please cite the corresponding publication or data source.

Preprint

Goswami, and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” Oct 08, 2025, arXiv

Replication data

Before using the scripts in this repository, download the data archives from the Materials Cloud record and inflate them. Assuming the .tar.xz files have been placed in data/ relative to the repository root, run:

# Repository root (assumes the commands are run inside the clone)
GITROOT="$(git rev-parse --show-toplevel)"
# Fitted models with predictions
cd "$GITROOT/data"
tar -xf models_and_preds.tar.xz && rm -f models_and_preds.tar.xz
# Raw benchmark data, i.e., OTGPD eOn output logs
mkdir -p "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
cp "$GITROOT/data/softest_mode_scg_barrier.tar.xz" "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
cd "$GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier"
tar -xf softest_mode_scg_barrier.tar.xz && rm -f softest_mode_scg_barrier.tar.xz
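For readers who prefer to script the inflation step, here is a minimal Python sketch using only the standard-library tarfile module. The helper name inflate and its remove flag are hypothetical and not part of the repository. As in the shell commands, the archive is deleted only after extraction succeeds.

```python
import tarfile
from pathlib import Path


def inflate(archive: Path, dest: Path, remove: bool = True) -> list[str]:
    """Extract a .tar.xz archive into dest and return the member names.

    Hypothetical helper mirroring `tar -xf ARCHIVE && rm -f ARCHIVE`:
    the archive is deleted only if extraction completes without raising,
    and only when remove is True.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute
            # paths, links escaping dest, and similar unsafe members.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    if remove:
        archive.unlink()
    return names
```

With root set to the repository root, inflate(root / "data" / "models_and_preds.tar.xz", root / "data") reproduces the first step. inflate(root / "data" / "softest_mode_scg_barrier.tar.xz", root / "runs/automated/snake_runs/softest_mode_scg_barrier", remove=False) extracts the benchmark logs while keeping the original archive in data, which matches the copy-then-delete shell sequence.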

Reusing models

For reusing the models and predictions, both F.A.I.R.-formatted results and the more convenient R formats are provided.

From R

The record contains R objects for the brms models and their predictions. Load the model dependencies first; after that, the base R function readRDS suffices:

library('brms')
model <- readRDS("data/models/brms_pes.rds")

Additional helper functions for generating and using these models and predictions are available in the GitHub repository.

F.A.I.R.-formatted usage

Without R, accessing the results takes a few more steps. Predictions are provided as Apache Parquet files, together with the model training data, and each trained model is also exported from brms as Stan code. For each model (e.g., brms_pes), we provide three key components:

Stan code (.stan): the complete model definition translated into the Stan programming language, i.e., the logic of the model, e.g. data/stancode/brms_pes.stan
Stan data (.parquet): the data passed to the Stan model for fitting; this file is essential for re-running the model from scratch, e.g. data/standata_parquet/brms_pes_standata.zstd.parquet
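One way to locate and read these files from Python without R is sketched below. It assumes pyarrow (built with zstd support) is installed, and the path pattern in stan_artifacts is inferred from the brms_pes examples above, so it may not hold for every model in the record. Both helper names are hypothetical.

```python
from pathlib import Path


def stan_artifacts(model: str, data_dir: Path = Path("data")) -> dict[str, Path]:
    """Paths to the exported Stan code and Stan data for one model.

    The naming pattern is an assumption generalized from brms_pes.
    """
    return {
        "stancode": data_dir / "stancode" / f"{model}.stan",
        "standata": data_dir / "standata_parquet" / f"{model}_standata.zstd.parquet",
    }


def load_standata(model: str, data_dir: Path = Path("data")):
    """Read the Stan data table for `model` as a pyarrow Table."""
    # Imported lazily so the path helper works even without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_table(str(stan_artifacts(model, data_dir)["standata"]))
```

For instance, stan_artifacts("brms_pes")["stancode"].read_text() returns the model definition, and load_standata("brms_pes").schema lists the stored columns. How Stan's scalars and arrays are laid out within the table is an assumption to verify before refitting the Stan code.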

Structure

The repository has code archives, benchmark runs, and scripts for analysis.

❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs

The data in the archives expands to locations within the benchmark directories.

Each of the benchmarks consists of the following structure:

.
├── doublet
│   ├── 000
│   ├── ...
│   └── 234
└── singlet
    ├── 000
    ├── ...
    └── 264

Together, these comprise 500 systems (235 doublet and 265 singlet systems).
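A minimal sketch for checking this layout programmatically, assuming the two-level doublet/singlet structure shown above with one directory per system; list_systems is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_systems(bench_root: Path) -> dict[str, list[str]]:
    """Map each spin-state directory to its sorted system IDs.

    Assumes the layout <root>/{doublet,singlet}/<NNN>/ and ignores
    any plain files found next to the system directories.
    """
    systems = {}
    for spin in ("doublet", "singlet"):
        spin_dir = bench_root / spin
        systems[spin] = sorted(p.name for p in spin_dir.iterdir() if p.is_dir())
    return systems
```

After inflation, sum(len(v) for v in list_systems(Path("runs/automated/snake_runs/softest_mode_scg_barrier")).values()) should give 500 if the extracted runs match the layout above.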

For comparison with the other methods:

GPDimer runs: extract from the relevant Materials Cloud archive.
Dimer (rotation-separated) runs: from this archive.

OTGPD runs

# softest_mode_scg_barrier.tar.xz
# $GITROOT/runs/automated/snake_runs/softest_mode_scg_barrier
tree -L 1 .
.
├── doublet
└── singlet

References

[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.

[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.

[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.
