The data contained inside AUGMENTED_VOLCANO_DATA.tar.gz are organised in 6 folders:

1) EPSim_MAPS: contains 2 binary NumPy array files (X.npy, Y.npy) and 1 text file (color.txt):
   X.npy: x-axis of the EPSim map. Contains the descriptor variable (DeltaG[I5]) for all the catalysts and the Sabatier ideal.
   Y.npy: y-axis of the EPSim map. Contains a normalized similarity measure of each potential catalyst to the Sabatier ideal.
   color.txt (3 columns).
      1st column: Name of compound (SABATIER, training and oos-test set).
      2nd column: integer variable for coloring the map according to the pds of each catalyst.
      3rd column: integer variable for coloring the map according to metal center.
2) geometries: contains 2 subfolders:
   train: geometry of each compound included in the training set (xyz format).
   oos_test: geometry of each compound included in the out-of-sample test set (xyz format).
3) INTERMEDIATE_ENERGIES: contains 2 text files:
   intermediate_energies_train.txt (6 columns).
      1st column: Name of compound (training set).
      2nd-6th columns: DeltaG of catalytic intermediates 3 to 7 relative to DeltaG of intermediate 2 [kcal/mol].
   intermediate_energies_oos_test.txt (6 columns).
      1st column: Name of compound (oos test set).
      2nd-6th columns: DeltaG of catalytic intermediates 3 to 7 relative to DeltaG of intermediate 2 [kcal/mol].
4) PDS: contains 2 text files:
   pds_train.txt (2 columns).
      1st column: Name of compound (training set).
      2nd column: number of the potential determining step in the catalytic cycle.
   pds_oos_test.txt (2 columns).
      1st column: Name of compound (oos test set).
      2nd column: number of the potential determining step in the catalytic cycle.
5) REACTION_ENERGIES: contains 2 text files:
   rxn_energies_sabatier_train.txt (7 columns).
      1st column: Name of compound (SABATIER and training set).
      2nd-7th columns: Reaction energy for each catalytic step [kcal/mol].
   rxn_energies_oos_test.txt (7 columns).
      1st column: Name of compound (out-of-sample test set).
      2nd-7th columns: Reaction energy for each catalytic step [kcal/mol].
6) tSNE_MAPS: contains 2 binary NumPy array files (p1.npy, p2.npy) and 1 text file (color.txt):
   p1.npy: first dimension of the t-SNE map.
   p2.npy: second dimension of the t-SNE map.
   color.txt (3 columns).
      1st column: Name of compound (SABATIER, training and oos-test set).
      2nd column: integer variable for coloring the map according to the pds of each catalyst.
      3rd column: integer variable for coloring the map according to metal center.
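The files described above could be read along the following lines. This is only a minimal sketch, not a loader shipped with the data: the helper names read_table and load_map are hypothetical, and it assumes that the text files are whitespace-separated with one compound per row and no unmarked header line, which the description above does not state.

```python
from pathlib import Path

import numpy as np


def read_table(path):
    """Read a text table: compound name in column 1, numbers in the rest.

    Assumption: columns are separated by whitespace. Blank lines and
    lines starting with '#' are skipped.
    """
    names, rows = [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            names.append(fields[0])
            rows.append([float(value) for value in fields[1:]])
    return names, np.array(rows)


def load_map(folder, first, second):
    """Load the two coordinate arrays and the color table of a map folder."""
    folder = Path(folder)
    x = np.load(folder / first)
    y = np.load(folder / second)
    names, colors = read_table(folder / "color.txt")
    # Columns 2 and 3 of color.txt are integer labels (pds, metal center).
    return x, y, names, colors.astype(int)
```

For example, if the archive has been extracted into a directory called AUGMENTED_VOLCANO_DATA (an assumed name), load_map("AUGMENTED_VOLCANO_DATA/EPSim_MAPS", "X.npy", "Y.npy") would return the EPSim coordinates with their labels, and load_map("AUGMENTED_VOLCANO_DATA/tSNE_MAPS", "p1.npy", "p2.npy") the t-SNE ones. The energy and pds files can be passed to read_table directly. Whether the array entries follow the row order of color.txt is not stated here and should be checked, for instance by comparing lengths.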