* Preliminary

Clone the GitHub repository:
#+begin_src bash
git clone git@github.com:HaoZeke/brms_idrot_repro.git
#+end_src

* About
This repository contains the reproduction details for the publication on performance and success models for the dimer method, compared across rotational optimizers and with or without external rotation removal.

** Reference
If you use this repository or any part of it, please cite the corresponding publication or data source.

*** Preprint
#+begin_quote
R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” May 19, 2025, arXiv:2505.13621. doi: 10.48550/arXiv.2505.13621.
#+end_quote

** Replication data
Remember to extract the data obtained from the Materials Cloud record before using the scripts in the repository. Assuming that the ~.xz~ files are in ~data~ relative to the repository root, and that ~GITROOT~ points to that root, run:
#+begin_src bash
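# Assumes GITROOT is set to the repository root, e.g.
# export GITROOT="$(git rev-parse --show-toplevel)"
# Fitted models with predictions
cd "$GITROOT/data"
tar -xf models_and_preds.tar.xz && rm -f models_and_preds.tar.xz
# Raw benchmark data, i.e., EON output logs
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz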
#+end_src
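Extraction fails partway if one of the archives was not downloaded, so a quick check beforehand can help. The following is a minimal sketch: ~check_archives~ is a hypothetical helper, not part of the repository, and it only assumes the two archive names used in the commands above.

#+begin_src bash
# Hypothetical helper: report which of the two archives named above are missing.
# Usage: check_archives DIR   (DIR is assumed to be "$GITROOT/data")
check_archives() {
    local dir="$1" missing=0 name
    for name in models_and_preds.tar.xz hpc.tar.xz; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    # Exit status 0 only if both archives are present
    return "$missing"
}
#+end_src

For example, ~check_archives "$GITROOT/data" && echo "archives found"~ prints the message only when both files exist.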

*** Reusing models
To reuse the models and predictions, both F.A.I.R. formatted results and easier-to-use ~R~ formats are provided.
**** From R
The record contains ~R~ objects for the ~brms~ models and their predictions. Once the model dependencies are loaded, the base ~R~ function ~readRDS~ suffices:
#+begin_src R
library('brms')
model <- readRDS("data/models/brms_pes_cglbfgs_norot.rds")
#+end_src
The predictions are ~zstd~-compressed files (level 22), which must be opened with ~archive::file_read~ before being passed to ~readRDS~:
#+begin_src R
# Open a decompressing connection, read the object, then release the connection
con <- archive::file_read(file = "data/models/preds/brms_pes_cg_rotrem.rds")
res <- readRDS(con)
close(con)
#+end_src
More helper functions for generating and using these models and predictions are in the GitHub repository.
**** F.A.I.R. formatted usage
Without ~R~, access takes a few more steps. Predictions are provided as Apache Parquet files, along with the model training data. Each fitted model is also exported from ~brms~ as Stan code.

For each model (e.g., ~brms_pes_cg_rotrem~), we provide three key components:

- Stan code (~.stan~): The complete model definition translated into the Stan programming language. This is the logic of the model.
  - File example: ~data/fair_forms/stancode/brms_pes_cg_rotrem.stan~
- Stan data (~.parquet~): The data that was passed to the Stan model for fitting. This file is essential for re-running the model from scratch.
  - File example: ~data/fair_forms/standata_parquet/brms_pes_cg_rotrem_standata.zstd.parquet~
- Predictions (~.parquet~): The pre-computed predictions generated by our R run of the model. This is the most direct way to use the model's output.
  - File example: ~data/fair_forms/brms_pes_cg_rotrem_preds.zstd.parquet~
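
The three components listed above can be located from the model name alone. The sketch below uses a hypothetical helper, ~fair_paths~, and assumes that every model follows the same naming pattern as the ~brms_pes_cg_rotrem~ file examples; that pattern is inferred from the examples rather than documented.

#+begin_src bash
# Hypothetical helper: print the three F.A.I.R. component paths for one model.
# The layout is inferred from the file examples above; other models are
# assumed (not confirmed) to follow the same naming pattern.
fair_paths() {
    local model="$1" base="data/fair_forms"
    printf '%s\n' \
        "$base/stancode/$model.stan" \
        "$base/standata_parquet/${model}_standata.zstd.parquet" \
        "$base/${model}_preds.zstd.parquet"
}
#+end_src

Running ~fair_paths brms_pes_cg_rotrem~ from the repository root prints the Stan code, Stan data, and predictions paths in that order, one per line.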
*** Structure
The repository itself is structured into code archives, benchmark runs, and scripts for analysis.
#+begin_src bash
➜ tree -L 2
.
├── bench_runs
│   ├── base_config.ini
│   ├── calc_rundata.py
│   ├── profiles
│   ├── readme.org
│   ├── rundata
│   ├── run_eon.py
│   ├── scripts
│   └── Snakefile
├── data
│   └── sella_si_data.zip
├── docs
│   └── source
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── scripts
│   └── env_setup.sh
└── subrepos
    ├── ase
    ├── chemparseplot
    ├── eOn
    ├── IterativeRotationsAssignments
    ├── nwchem
    ├── pychumpchem
    └── rgpycrumbs
#+end_src

The data in the archives expands to locations within the benchmark runs.

Each benchmark has the following structure:

#+begin_src bash
.
├── doublets
│   ├── 000
# .....
│   └── 234
└── singlets
    ├── 000
# .....
    └── 264
#+end_src

In total, this comprises 500 systems: 235 doublets (~000~ to ~234~) and 265 singlets (~000~ to ~264~).
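
After extraction, the system count can be checked against the listing above. This is a minimal sketch with a hypothetical helper, ~count_systems~, which assumes that each system is a directory directly under ~doublets~ or ~singlets~.

#+begin_src bash
# Hypothetical helper: count the numbered system directories in one benchmark.
# Usage: count_systems BENCH_DIR, where BENCH_DIR holds doublets/ and singlets/.
count_systems() {
    local bench="$1" kind n total=0
    for kind in doublets singlets; do
        # Count only the immediate subdirectories (000, 001, ...)
        n=$(find "$bench/$kind" -mindepth 1 -maxdepth 1 -type d | wc -l)
        echo "$kind: $((n))"
        total=$((total + n))
    done
    echo "total: $total"
}
#+end_src

For a complete benchmark, the listing above implies 235 doublets, 265 singlets, and a total of 500.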

*** EON Dimer runs
#+begin_src bash
# hpc.tar.xz
# $GITROOT/bench_runs/runs/hpc
➜ tree -L 3 .
.
├── cg
│   ├── no_rot_remove
│   │   ├── doublets
│   │   └── singlets
│   └── rot_remove
│       ├── doublets
│       └── singlets
└── lbfgs
    ├── no_rot_remove
    │   ├── doublets
    │   └── singlets
    └── rot_remove
        ├── doublets
        └── singlets
#+end_src
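
The tree above is the product of two optimizers, two rotation removal settings, and two spin multiplicities, which gives eight leaf directories. As a sketch for scripting over them, the hypothetical helper ~list_run_dirs~ below enumerates the paths; it does not check that they exist.

#+begin_src bash
# Hypothetical helper: print the eight leaf run directories under ROOT.
# Usage: list_run_dirs ROOT   (ROOT is assumed to be "$GITROOT/bench_runs/runs/hpc")
list_run_dirs() {
    local root="$1" opt rot spin
    for opt in cg lbfgs; do
        for rot in no_rot_remove rot_remove; do
            for spin in doublets singlets; do
                echo "$root/$opt/$rot/$spin"
            done
        done
    done
}
#+end_src

The output can be fed to a loop, for example ~list_run_dirs "$GITROOT/bench_runs/runs/hpc" | while read -r d; do ls "$d" | head -n 3; done~, to peek at the first few entries in each directory.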