The tarball datasets.tar.gz contains all data, including XYZ files, property CSV files, extended XYZ files (for MACE), and dataset splits.
It includes three datasets: TM-GSspin+, tmPHOTO, and OctaKulik. For a detailed description, see README_DATA.md.
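The plain XYZ files can be read without extra dependencies. The sketch below is a minimal reader that assumes the standard single-molecule XYZ layout (atom count, comment line, then one "element x y z" line per atom); the function names parse_xyz and read_xyz are hypothetical. It is meant for the plain XYZ files only: the extended XYZ files prepared for MACE carry additional per-atom columns and key-value metadata and are better handled by a dedicated parser such as ASE's ase.io.read.

```python
from pathlib import Path
from typing import List, Tuple

Coord = Tuple[float, float, float]


def parse_xyz(text: str) -> Tuple[List[str], List[Coord], str]:
    """Parse one XYZ block into (elements, coordinates, comment line).

    Assumes a single molecule per block: line 1 is the atom count,
    line 2 is a free-form comment, then one 'El x y z' line per atom.
    """
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""
    elements: List[str] = []
    coords: List[Coord] = []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        elements.append(fields[0])
        coords.append((float(fields[1]), float(fields[2]), float(fields[3])))
    if len(elements) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(elements)}")
    return elements, coords, comment


def read_xyz(path) -> Tuple[List[str], List[Coord], str]:
    """Hypothetical convenience wrapper: read and parse one XYZ file."""
    return parse_xyz(Path(path).read_text())
```

The explicit atom-count check catches truncated files early instead of silently returning a partial molecule.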
The tarballs representation_TM-GSspinPlus.tar.gz, representation_tmPHOTO.tar.gz, and
representation_OctaKulik.tar.gz contain NumPy arrays of molecular representations used in this work.
For OctaKulik:
- HOMO_LUMO_gap/: contains representations for HOMO, LUMO, and gap using both low-spin and high-spin geometries and the corresponding spin states
- splitting/: contains representations for spin splitting using low-spin geometries and the low-spin state
The subdirectory cMBDF_MODA_PC3_MAOC/ contains cMBDF, MODA, and PC3-MAOC representations discussed in the Supporting Information.
Files named refcode-{dataset}.txt provide the refcode ordering for the corresponding NumPy arrays.
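Because each array comes with a refcode-{dataset}.txt file, representations can be matched to molecules (for example, to the rows of a property CSV file) by refcode rather than by position. A minimal sketch, assuming one refcode per line in the same order as the first axis of a plain .npy array; parse_refcodes, load_representation, and align_to are hypothetical helpers, and no particular array file name inside the tarballs is assumed.

```python
from pathlib import Path

import numpy as np


def parse_refcodes(text: str) -> list:
    """One refcode per non-empty line, kept in file order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_representation(array_path, refcode_path):
    """Load an array and its refcodes; X[i] is assumed to belong to refcodes[i]."""
    X = np.load(array_path)  # assumes a plain .npy array, not an .npz archive
    refcodes = parse_refcodes(Path(refcode_path).read_text())
    if X.shape[0] != len(refcodes):
        raise ValueError(f"{X.shape[0]} rows but {len(refcodes)} refcodes")
    index = {code: i for i, code in enumerate(refcodes)}
    return X, refcodes, index


def align_to(X, index, wanted_refcodes):
    """Reorder the rows of X to follow another refcode list (e.g. from a CSV)."""
    rows = [index[code] for code in wanted_refcodes]
    return X[rows]
```

A KeyError raised by align_to points to a refcode that has no stored representation, which is easier to diagnose than a silent positional mismatch.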
The tarball MACE.tar.gz contains trained MACE models for intensive property prediction, along with SLURM job scripts and logs for all three datasets.
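Before a SLURM job script can be resubmitted, its local paths have to be changed, including the --train_file and --test_file arguments. The sketch below shows one way to rewrite such arguments programmatically. set_cli_value is a hypothetical helper; it assumes each flag appears as --flag=value or --flag value on a single line (optionally quoted), and it does not handle the path to the MACE installation, whose exact form in the scripts is not specified here.

```python
import re


def set_cli_value(script_text: str, flag: str, new_value: str) -> str:
    """Replace the value passed to `flag` (e.g. "--train_file") in a script.

    Matches `--flag=value` and `--flag value` (value bare or quoted) and
    writes the new value in double quotes. Raises ValueError if absent.
    """
    pattern = re.compile(
        "(" + re.escape(flag) + r")(?![\w-])"   # the flag, not a longer flag name
        r"(=|[ \t]+)"                          # separator: '=' or spaces on one line
        r"(\"[^\"]*\"|'[^']*'|\S+)"            # quoted or bare value
    )
    # A function replacement avoids backslash escapes in new_value being interpreted.
    new_text, n = pattern.subn(
        lambda m: f'{m.group(1)}{m.group(2)}"{new_value}"', script_text
    )
    if n == 0:
        raise ValueError(f"{flag} not found in job script")
    return new_text
```

Usage would be to read a .job file, call set_cli_value once for --train_file and once for --test_file, and write the result back before running sbatch.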
For each dataset, models are organized by type:
- MACE_equivariant/: equivariant MACE (max_L = 2) with model="MACE"
- MACE_invariant/: invariant MACE (max_L = 0) with model="MACE"
- AtomicDipolesMACE/: equivariant dipole MACE (max_L = 2) with model="AtomicDipolesMACE"
- *.job: SLURM job scripts (update the local path to your source-built MACE installation and the paths for --train_file and --test_file)
- *.out: training and evaluation logs (optional, for reference)
- *_stagetwo.model: final trained models used for the reported results and for inference
- Models with "embedding" in the name include charge and spin embeddings
The tarball 3DMol.tar.gz contains trained 3DMol models and logs for each dataset.
- *best_checkpoint.pt: checkpoint of the best-performing model
- *.log: training and evaluation log files
- Models with "emb" in the name include charge and spin embeddings
- Models with "global" in the name correspond to the global variant (full molecule)
- Models with "local" in the name correspond to the local variant (metal center only)
COMMAND> shown in the corresponding log file
- --splitter: text or NumPy file containing test set indices
- --dataset: dataset loader path adjusted to your local setup
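The --splitter argument takes a text or NumPy file with test-set indices. A minimal sketch for inspecting such a file and recovering the complementary training indices, assuming zero-based integer positions into the dataset loaded via --dataset and, for text files, whitespace-separated integers; load_test_indices and train_indices are hypothetical helpers, not part of 3DMol.

```python
from pathlib import Path

import numpy as np


def load_test_indices(path) -> np.ndarray:
    """Read integer test-set indices from a .npy file or a plain text file."""
    path = Path(path)
    if path.suffix == ".npy":
        idx = np.load(path)
    else:
        # Assumes whitespace-separated integers; ndmin=1 keeps a single index 1-D.
        idx = np.loadtxt(path, dtype=int, ndmin=1)
    return np.asarray(idx, dtype=int).ravel()


def train_indices(n_samples: int, test_idx: np.ndarray) -> np.ndarray:
    """Return the indices in range(n_samples) that are not in the test set."""
    mask = np.ones(n_samples, dtype=bool)
    mask[test_idx] = False
    return np.flatnonzero(mask)
```

Computing the training set as a complement guarantees that no molecule appears in both splits, provided the indices refer to the same ordering as the loaded dataset.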