Structures and targets computed at r2SCAN and PBE levels of theory that comprise the
dataset are provided in extended XYZ format. Energies (the total energy under the key
energy and the atomization energy under the key atomization_energy), stresses
(stress), and lattice parameters (Lattice) are stored in the header line of each
structure, while atom types (species), positions (pos), and forces (forces) are stored
as space-separated per-atom columns. Cartesian coordinates, energies, forces, and stresses
are given in Å, eV, eV/Å, and eV/Å^3, respectively. Also stored in the header are the name of the
subset to which each structure belongs (subset) and the numeric index of the structure
within its subset (frame_id). Together these uniquely identify each structure in the
dataset.
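As a rough illustration of the layout described above, the following sketch parses extended XYZ frames with the Python standard library only and indexes them by (subset, frame_id). The function names (parse_header, read_frames), the sample frame, and all numerical values in it are made up for illustration and are not taken from the dataset; the sketch also assumes that each header carries a Properties entry of the usual name:type:width form. In practice a dedicated reader, for example ase.io.read with index=":", is likely the more robust choice.

```python
import shlex

# Illustrative frame only: the values below are invented, not dataset entries.
SAMPLE = """\
2
Lattice="10.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 10.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-1.5 atomization_energy=-0.5 stress="0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0" subset=demo frame_id=0
H 0.0 0.0 0.0 0.0 0.0 0.1
H 0.0 0.0 0.74 0.0 0.0 -0.1
"""


def parse_header(comment_line):
    """Split a header line into a {key: raw string value} dictionary."""
    info = {}
    # shlex keeps quoted values such as Lattice="..." together as one token.
    for token in shlex.split(comment_line):
        key, sep, value = token.partition("=")
        info[key] = value if sep else ""
    return info


def read_frames(lines):
    """Yield (info, atoms) pairs, one per frame, from an iterable of lines."""
    it = iter(lines)
    for first in it:
        if not first.strip():
            continue  # skip blank lines between frames
        n_atoms = int(first)
        info = parse_header(next(it))
        # Properties is a flat list of name:type:width triplets.
        fields = info["Properties"].split(":")
        columns = [
            (fields[i], fields[i + 1], int(fields[i + 2]))
            for i in range(0, len(fields), 3)
        ]
        atoms = []
        for _ in range(n_atoms):
            parts = next(it).split()
            atom, start = {}, 0
            for name, kind, width in columns:
                raw = parts[start:start + width]
                start += width
                if kind == "R":
                    values = [float(x) for x in raw]
                elif kind == "I":
                    values = [int(x) for x in raw]
                else:
                    values = raw  # strings (S) and logicals (L) are kept as text
                atom[name] = values[0] if width == 1 else values
            atoms.append(atom)
        yield info, atoms


frames = list(read_frames(SAMPLE.splitlines()))

# (subset, frame_id) is the pair that identifies a structure in the dataset.
index = {
    (info["subset"], int(info["frame_id"])): (info, atoms)
    for info, atoms in frames
}

info, atoms = index[("demo", 0)]
print(float(info["energy"]), atoms[1]["species"], atoms[1]["pos"])
```

To read one of the provided files instead of the inline sample, the same generator can be fed an open file object, since it only requires an iterable of lines.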
The MAD-1.5 dataset is computed in its entirety at the r2SCAN level of theory.
Specifically, the record contains:
mad-1.5-r2scan-train.xyz -> the training split used in training PET-MAD-1.5 models
mad-1.5-r2scan-val.xyz -> the validation split used in training PET-MAD-1.5 models
mad-1.5-r2scan-test.xyz -> the test split used in evaluating PET-MAD-1.5 models
mad-1.5-r2scan-llpr-rejected.xyz -> the 8244 structures rejected in the LLPR-uncertainty-based cleaning step
Additionally, we provide structures, energies, forces, and stresses (for periodic structures) from the MAD-1 subsets (MC3D, MC3D-rattled, MC3D-random, MC3D-surface, MC3D-cluster, MC2D, ShiftML-molcrys, ShiftML-molfrags) plus monomers and MC3D-random-extended from MAD-1.5, computed with the PBE functional but with all other DFT settings kept consistent with the r2SCAN calculations.
The PBE targets were used in model training but were fitted with output heads separate from those used for the r2SCAN targets, which was found to help training and to result in lower force errors (more details can be found in the supporting preprint). The cross-validation splits used in training are consistent with the r2SCAN splits listed above.
Specifically, we provide the file:
Other files made available in this data archive: