* Preliminary
Clone the GitHub repository:
#+begin_src bash
git clone git@github.com:HaoZeke/brms_idrot_repro.git
#+end_src
* About
This repository contains the reproduction details for the publication on performance and success models for the dimer method, compared across rotational optimizers and with or without external rotation removal.
** Reference
If you use this repository or any part of it, please cite the corresponding publication or data source.
*** Preprint
#+begin_quote
R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” May 19, 2025, arXiv:2505.13621. doi: 10.48550/arXiv.2505.13621.
#+end_quote
** Replication data
Before using the scripts in the repository, inflate the data archives obtained from the Materials Cloud record. Assuming the ~.xz~ files are in ~data~ relative to the repository root, run:
#+begin_src bash
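# Assumption: GITROOT is the repository root; set it here if not already defined
GITROOT="${GITROOT:-$(git rev-parse --show-toplevel)}"
# Fitted models with predictions
cd "$GITROOT/data"
tar -xf models_and_preds.tar.xz && rm -f models_and_preds.tar.xz
# Raw benchmark data, i.e., EON output logs
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz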
#+end_src
*** Reusing models
To reuse the models and predictions, both F.A.I.R.-formatted results and easier-to-use ~R~ formats are provided.
**** From R
The record contains ~R~ objects for the ~brms~ models and their predictions. Once the model dependencies are loaded, the base ~R~ function ~readRDS~ suffices:
#+begin_src R
library('brms')
model <- readRDS("data/models/brms_pes_cglbfgs_norot.rds")
#+end_src
The predictions are ~zstd~-compressed files (level 22), which need to be read using ~archive::file_read~ and ~readRDS~:
#+begin_src R
con <- archive::file_read(file = "data/models/preds/brms_pes_cg_rotrem.rds")
res <- readRDS(con)
close(con)
#+end_src
More helper functions for generating and using these models and predictions are in the GitHub repository.
**** F.A.I.R. formatted usage
Without ~R~, access takes a few more steps. Predictions are provided as Apache Arrow Parquet files, along with the model training data. Each fitted model is also exported from ~brms~ as Stan code.
For each model (e.g., ~brms_pes_cg_rotrem~), we provide three key components:
- Stan code (~.stan~): The complete model definition translated into the Stan programming language. This is the logic of the model.
  - File example: ~data/fair_forms/stancode/brms_pes_cg_rotrem.stan~
- Stan data (~.parquet~): The data that was passed to the Stan model for fitting. This file is essential for re-running the model from scratch.
  - File example: ~data/fair_forms/standata_parquet/brms_pes_cg_rotrem_standata.zstd.parquet~
- Predictions (~.parquet~): The pre-computed predictions generated by our R run of the model. This is the most direct way to use the model's output.
  - File example: ~data/fair_forms/brms_pes_cg_rotrem_preds.zstd.parquet~
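As a minimal sketch of the non-~R~ route, the file examples above can be assembled from a model name and the predictions read with ~pandas~. The helper names (~fair_paths~, ~load_predictions~) are hypothetical, the naming pattern is inferred only from the ~brms_pes_cg_rotrem~ examples listed above, and reading assumes ~pandas~ with a Parquet engine such as ~pyarrow~ is installed. The columns of the resulting table are not described here, so inspect them before use.
#+begin_src python
from pathlib import Path


def fair_paths(model_name, root="data/fair_forms"):
    """Build the three F.A.I.R. file paths for one model.

    Assumption: every model follows the naming pattern of the
    brms_pes_cg_rotrem examples in this readme.
    """
    base = Path(root)
    return {
        "stancode": base / "stancode" / f"{model_name}.stan",
        "standata": base / "standata_parquet" / f"{model_name}_standata.zstd.parquet",
        "preds": base / f"{model_name}_preds.zstd.parquet",
    }


def load_predictions(model_name, root="data/fair_forms"):
    """Read the pre-computed predictions into a pandas DataFrame."""
    # Imported lazily so that building paths works without pandas installed.
    import pandas as pd

    return pd.read_parquet(fair_paths(model_name, root)["preds"])


if __name__ == "__main__":
    # Print where each component is expected to live for one model.
    for kind, path in fair_paths("brms_pes_cg_rotrem").items():
        print(f"{kind}: {path}")
#+end_src
Calling ~load_predictions("brms_pes_cg_rotrem")~ from the repository root would then return the predictions table, provided the archives have been extracted.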
*** Structure
The repository itself is structured into code archives, benchmark runs, and scripts for analysis.
#+begin_src bash
➜ tree -L 2
.
├── bench_runs
│   ├── base_config.ini
│   ├── calc_rundata.py
│   ├── profiles
│   ├── readme.org
│   ├── rundata
│   ├── run_eon.py
│   ├── scripts
│   └── Snakefile
├── data
│   └── sella_si_data.zip
├── docs
│   └── source
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── scripts
│   └── env_setup.sh
└── subrepos
    ├── ase
    ├── chemparseplot
    ├── eOn
    ├── IterativeRotationsAssignments
    ├── nwchem
    ├── pychumpchem
    └── rgpycrumbs
#+end_src
The data in the archives expands to locations within the benchmark runs.
Each benchmark has the following structure:
#+begin_src bash
.
├── doublets
│   ├── 000
# .....
│   └── 234
└── singlets
    ├── 000
# .....
    └── 264
#+end_src
This comprises 500 systems in total (235 doublets and 265 singlets).
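The total of 500 follows from the zero-padded directory names in the listing: doublets run from ~000~ to ~234~ and singlets from ~000~ to ~264~. The sketch below enumerates the expected directories and reports any that are absent. It assumes the numbering is contiguous (the listing elides the intermediate entries), and ~expected_systems~ is a hypothetical helper name.
#+begin_src python
from pathlib import Path

# Last directory index per spin state, taken from the listing above.
LAST_INDEX = {"doublets": 234, "singlets": 264}


def expected_systems(bench_root="."):
    """Return the expected system directories under one benchmark root.

    Assumption: indices are contiguous and zero-padded to three digits.
    """
    root = Path(bench_root)
    return [
        root / spin / f"{idx:03d}"
        for spin, last in LAST_INDEX.items()
        for idx in range(last + 1)
    ]


if __name__ == "__main__":
    systems = expected_systems()
    print(len(systems))  # 235 doublets + 265 singlets
    missing = [p for p in systems if not p.is_dir()]
    print(f"{len(missing)} expected directories not found")
#+end_src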
*** EON Dimer runs
#+begin_src bash
# hpc.tar.xz
# $GITROOT/bench_runs/runs/hpc
➜ tree -L 3 .
.
├── cg
│   ├── no_rot_remove
│   │   ├── doublets
│   │   └── singlets
│   └── rot_remove
│       ├── doublets
│       └── singlets
└── lbfgs
    ├── no_rot_remove
    │   ├── doublets
    │   └── singlets
    └── rot_remove
        ├── doublets
        └── singlets
#+end_src
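The layout above is a full cross of rotational optimizer (~cg~, ~lbfgs~), rotation removal setting, and spin state. A short sketch for iterating over the eight combinations follows. It assumes the archive was extracted to ~bench_runs/runs/hpc~ and that the script is run from the repository root. The function name ~run_dirs~ is hypothetical.
#+begin_src python
from itertools import product
from pathlib import Path

# Directory names as they appear in the listing above.
OPTIMIZERS = ("cg", "lbfgs")
ROTATION = ("no_rot_remove", "rot_remove")
SPINS = ("doublets", "singlets")


def run_dirs(hpc_root="bench_runs/runs/hpc"):
    """Yield (optimizer, rotation, spin, path) for every run group."""
    root = Path(hpc_root)
    for opt, rot, spin in product(OPTIMIZERS, ROTATION, SPINS):
        yield opt, rot, spin, root / opt / rot / spin


if __name__ == "__main__":
    # Report which of the eight run groups are present on disk.
    for opt, rot, spin, path in run_dirs():
        status = "ok" if path.is_dir() else "missing"
        print(f"{path}: {status}")
#+end_src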