The tarball data.tar.gz contains all data, including XYZ files, property CSV files, extended XYZ files (for MACE), and dataset splits.
It includes three datasets: TM-GSspinPlus, tmPHOTO, and Octa-MK. For a detailed description, see README_DATA.md.
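To check what an archive holds before unpacking it, a minimal sketch using only the Python standard library could look like the following. The helper names list_members and extract_archive are hypothetical, and the internal directory layout of data.tar.gz is not assumed here; README_DATA.md remains the authoritative description.

```python
import tarfile
from pathlib import Path


def list_members(archive_path, limit=None):
    """Return the member names stored in a gzip-compressed tarball."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    return names if limit is None else names[:limit]


def extract_archive(archive_path, dest="."):
    """Extract a gzip-compressed tarball into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version offers it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest, **extra)
    return dest
```

For example, list_members("data.tar.gz", limit=20) would show the first entries, and extract_archive("data.tar.gz", "data_unpacked") would unpack everything into a directory of your choice (the destination name is only an illustration).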
The tarballs representation_TM-GSspinPlus.tar.gz, representation_tmPHOTO.tar.gz, and
representation_Octa-MK.tar.gz contain NumPy arrays of molecular representations used in this work.
For Octa-MK:
- HOMO_LUMO_gap/: contains representations for HOMO, LUMO, and gap, using both low-spin and high-spin geometries and corresponding spin states
- splitting/: contains representations for spin splitting, using low-spin geometries and low-spin state

The subdirectory cMBDF_MODA_PC3_MAOC/ contains cMBDF, MODA, and PC3-MAOC representations discussed in the Supporting Information.
Files named refcode-{dataset}.txt provide the refcode ordering for the corresponding NumPy arrays.
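Since the refcode files define the row order of the arrays, a representation can be paired with its refcodes as sketched below. This is an illustration under stated assumptions, not a documented loader: it assumes the array is stored as a single .npy file, that its first axis indexes molecules, and that the refcode file lists one refcode per line. The function names load_representation and row_for are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_representation(array_path, refcode_path):
    """Return (refcodes, X), where row i of X is assumed to belong to refcodes[i]."""
    X = np.load(array_path)
    # Assumption: one refcode per line; blank lines are skipped.
    lines = Path(refcode_path).read_text().splitlines()
    refcodes = [line.strip() for line in lines if line.strip()]
    if len(refcodes) != X.shape[0]:
        raise ValueError(
            f"{len(refcodes)} refcodes but {X.shape[0]} rows in {array_path}"
        )
    return refcodes, X


def row_for(refcode, refcodes, X):
    """Look up the representation vector of a single refcode."""
    return X[refcodes.index(refcode)]
```

The length check is there to catch a mismatch between an array and a refcode file from a different dataset or subdirectory early, before any model is trained on misaligned rows.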
The tarball MACE.tar.gz contains trained MACE models for intensive property prediction, along with SLURM job scripts and logs for all three datasets.
For each dataset, models are organized by type:
- MACE_equivariant/: equivariant MACE models (max_L = 2) with model="MACE" for predicting energies (splitting, HOMO, LUMO, or gap)
- MACE_invariant/: invariant MACE models (max_L = 0) with model="MACE" for predicting energies (splitting, HOMO, LUMO, or gap)
- AtomicDipolesMACE/: equivariant dipole MACE models (max_L = 2) with model="AtomicDipolesMACE" for predicting the dipole moment magnitude

- *.job: SLURM job scripts (update the local path to your source-built MACE installation and the paths for --train_file and --test_file)
- *.out: training and evaluation logs (optional, for reference)
- *.model: final trained models used for reported results and inference
- Files with embedding in the name include charge and spin embeddings (additional hyperparameter: --embedding_specs)

The tarball 3DMol.tar.gz contains trained 3DMol models and logs for each dataset.
- *best_checkpoint.pt: checkpoint of the best-performing model
- *.log: training and evaluation log files
- Files with emb in the name include charge and spin embeddings
- Files with global in the name correspond to the global variant (full molecule)
- Files with local in the name correspond to the local variant (metal center only)

The command used for each run is shown after COMMAND> in the corresponding log file. Relevant arguments:
- --splitter: text or NumPy file containing test set indices
- --dataset: dataset loader path, to be adjusted to your local setup

Note: In MACE.tar.gz and 3DMol.tar.gz, the Octa-MK dataset is labeled as OctaKulik in file names or within files.
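To recover the command recorded in a 3DMol log and point it at local paths, something along these lines may help. It is a sketch under assumptions: the command is taken to sit on the same line as the COMMAND> marker, and flags are taken to follow the usual "--flag value" or "--flag=value" form. The helper names find_commands and override_args are hypothetical and not part of 3DMol.

```python
import shlex
from pathlib import Path

MARKER = "COMMAND>"


def find_commands(log_path):
    """Return the text following each COMMAND> marker found in a log file."""
    commands = []
    text = Path(log_path).read_text(errors="replace")
    for line in text.splitlines():
        if MARKER in line:
            commands.append(line.split(MARKER, 1)[1].strip())
    return commands


def override_args(command, overrides):
    """Return command with the values of the given flags replaced.

    overrides maps a flag such as "--dataset" to its new value.
    """
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        flag, sep, _ = token.partition("=")
        if flag not in overrides:
            continue
        if sep:
            # "--flag=value" form
            tokens[i] = f"{flag}={overrides[flag]}"
        elif i + 1 < len(tokens):
            # "--flag value" form
            tokens[i + 1] = overrides[flag]
    return shlex.join(tokens)
```

A typical use would be to take the first entry returned by find_commands for a given log, then call override_args with new values for --dataset and --splitter before running the result in your own environment.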