The tarball `datasets.tar.gz` contains three folders, one for each dataset used in the article.
Each folder contains the geometries (xyz files), the SMILES and properties (CSV files), and the raw binary data (data splits, results, and fingerprints/representations, stored as NumPy `.npy` files).
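The archive can be unpacked with any tar tool. As a minimal sketch, here is how to do it with Python's standard-library `tarfile` module, assuming `datasets.tar.gz` sits in the working directory. The helper name `extract_datasets` is hypothetical. It returns the top-level folder names so you can check that the three dataset folders are present.

```python
import tarfile
from pathlib import Path


def extract_datasets(archive="datasets.tar.gz", dest="."):
    """Unpack `archive` into `dest` and return the sorted top-level entry names."""
    with tarfile.open(archive, "r:gz") as tar:
        # First path component of each member, e.g. "cyclo" for "cyclo/full_dataset.csv";
        # pathlib drops a leading "./", and a bare "." entry has no parts at all.
        top_level = {Path(m.name).parts[0] for m in tar.getmembers() if Path(m.name).parts}
        if hasattr(tarfile, "data_filter"):
            # Safer extraction filter, available in recent Python releases.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(top_level)
```

Calling `extract_datasets()` should return the folder names `cyclo`, `gdb7-22-ts` and `proparg` described below.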
./cyclo:
  full_dataset.csv                  full dataset and target properties
  dataset_subset_750.csv            subset split and target properties
  B2R2(l)-model
    b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
    b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
    b2r2_l.npy                      representations for the full dataset
    b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
  DRFP-model
    drfp_10_fold.npy                results on the 10-fold cross-validation data splits
    drfp.npy                        representations for the full dataset
  MFP-model
    mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
    mfp.npy                         representations for the full dataset
  SLATM-model
    slatm_10_fold.npy               results on the 10-fold cross-validation data splits
    slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
  Geometries
    xyz                             DFT-level geometries
    xyz-xtb                         xTB-level geometries
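The `xyz` and `xyz-xtb` folders hold Cartesian geometries. Assuming each file follows the common XYZ layout (atom count, a free-form comment line, then one `symbol x y z` line per atom), a minimal standard-library reader could look like the sketch below. The function `read_xyz` is hypothetical, and the individual file names inside the folders are not specified here. If a file stores several frames, this sketch reads only the first.

```python
from pathlib import Path


def read_xyz(path):
    """Parse the first frame of an XYZ file.

    ASSUMPTION: standard layout, i.e. atom count, comment line, then
    one "symbol x y z" line per atom. Returns (symbols, coords, comment).
    """
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""
    symbols, coords = [], []
    # Only the n_atoms lines after the comment belong to the first frame.
    for line in lines[2 : 2 + n_atoms]:
        fields = line.split()
        symbols.append(fields[0])
        coords.append(tuple(float(v) for v in fields[1:4]))
    if len(symbols) != n_atoms:
        raise ValueError(f"{path}: expected {n_atoms} atoms, found {len(symbols)}")
    return symbols, coords, comment
```

Reading the same reaction from `xyz` and `xyz-xtb` this way makes it easy to compare DFT- and xTB-level structures, provided both files use the same atom ordering (also an assumption).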
./gdb7-22-ts:
  ccsdtf12_dz.csv                   CCSD(T)-F12-level computed data and target properties
  ccsdtf12_dz_subset_750.csv        subset of the CCSD(T)-F12-level data and target properties
  tr_sizes.npy                      training-set sizes for each split
  B2R2(l)-model
    b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
    b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
    b2r2_l.npy                      representations for the full dataset
    b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
  DRFP-model
    drfp_10_fold.npy                results on the 10-fold cross-validation data splits
    drfp.npy                        representations for the full dataset
  MFP-model
    mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
    mfp.npy                         representations for the full dataset
  SLATM-model
    slatm_10_fold.npy               results on the 10-fold cross-validation data splits
    slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
  Geometries
    xyz                             DFT-level geometries
    xyz-xtb                         xTB-level geometries
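The `*_10_fold.npy` result files, together with `tr_sizes.npy` in `gdb7-22-ts`, lend themselves to a learning-curve summary. The layout of the result arrays is not documented here, so the sketch below ASSUMES a 2-D array of test errors with one row per cross-validation fold and one column per training size in `tr_sizes.npy`. The helper `learning_curve` is hypothetical; inspect the real array shapes (and whether `np.load` needs `allow_pickle=True`) before relying on it.

```python
import numpy as np


def learning_curve(results_path, sizes_path):
    """Return (train_sizes, mean_error, std_error) across folds.

    ASSUMPTION: the results file holds a 2-D array of test errors,
    rows = cross-validation folds, columns = training sizes.
    """
    sizes = np.load(sizes_path)
    errors = np.load(results_path)
    if errors.ndim != 2 or errors.shape[1] != len(sizes):
        raise ValueError(f"unexpected shape {errors.shape} for {len(sizes)} training sizes")
    # Average over folds (axis 0) for each training size.
    return sizes, errors.mean(axis=0), errors.std(axis=0)
```

For example, passing one model's `*_10_fold.npy` file and `tr_sizes.npy` from the extracted `gdb7-22-ts` folder would give the mean and spread of the error at each training size, ready for a log-log learning-curve plot.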
./proparg:
  data.csv                          full dataset and target properties
  data_fixarom_smiles.csv           dataset with fixed aromaticity in the SMILES
  data_fixarom_smiles_stereo.csv    dataset with fixed aromaticity and stereochemistry in the SMILES
  data_subset_750.csv               subset split
  B2R2(l)-model
    b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
    b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
    b2r2_l.npy                      representations for the full dataset
    b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
  DRFP-model
    drfp.npy                        representations for the full dataset
    drfp_10_fold.npy                results on the 10-fold cross-validation data splits
    drfp_combinatorial.npy          representations for the full dataset (combinatorial variant)
    drfp_combinatorial_10_fold.npy  results on the 10-fold cross-validation data splits (combinatorial variant)
    drfp_stereo.npy                 representations for the full dataset (including stereochemistry)
    drfp_stereo_10_fold.npy         results on the 10-fold cross-validation data splits (including stereochemistry)
  MFP-model
    mfp.npy                         representations for the full dataset
    mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
    mfp_combinatorial.npy           representations for the full dataset (combinatorial variant)
    mfp_combinatorial_10_fold.npy   results on the 10-fold cross-validation data splits (combinatorial variant)
    mfp_stereo.npy                  representations for the full dataset (including stereochemistry)
    mfp_stereo_10_fold.npy          results on the 10-fold cross-validation data splits (including stereochemistry)
  SLATM-model
    slatm_10_fold.npy               results on the 10-fold cross-validation data splits
    slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
  Geometries
    xyz                             DFT-level geometries
    xyz-xtb                         xTB-level geometries
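For `proparg`, several fingerprint variants are provided (plain, combinatorial, and with stereochemistry). One quick check is to count how many reactions actually get a different fingerprint when stereochemistry is included. The sketch below ASSUMES that both representation files hold 2-D arrays with one row per reaction, in the same row order and with the same number of bits. The helper `changed_rows` is hypothetical.

```python
import numpy as np


def changed_rows(base, variant):
    """Indices of rows (reactions) whose fingerprint differs between two arrays.

    ASSUMPTION: both arrays are 2-D with the same shape and row order.
    """
    base = np.asarray(base)
    variant = np.asarray(variant)
    if base.shape != variant.shape:
        raise ValueError(f"shape mismatch: {base.shape} vs {variant.shape}")
    # A row counts as changed if any of its entries differ.
    return np.flatnonzero(np.any(base != variant, axis=1))
```

Applied to the arrays loaded from `drfp.npy` and `drfp_stereo.npy` (or `mfp.npy` and `mfp_stereo.npy`), the result indicates which reactions the stereochemistry-aware fingerprint can distinguish at all; reactions outside that set have identical inputs in both variants.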