The data in AUGMENTED_VOLCANO_DATA.tar.gz are organised into 6 folders:

1) EPSim_MAPS: 
    contains 2 binary NumPy array files (X.npy, Y.npy) and 1 text file (color.txt):
        X.npy: x-axis of the EPSim map. Contains the descriptor variable (DeltaG[I5]) for all the catalysts and the Sabatier ideal.
        Y.npy: y-axis of the EPSim map. Contains a normalized similarity measure of each potential catalyst to the Sabatier ideal.
        color.txt (3 columns).
            1st column: Name of compound (SABATIER, training and oos-test set). 
            2nd column: integer variable for coloring the map according to the potential determining step (PDS) of each catalyst.
            3rd column: integer variable for coloring the map according to metal center.
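
The following is a minimal sketch of how the three EPSim_MAPS files could be read together in Python. It assumes (this README does not state it) that color.txt is whitespace-delimited with no header line, that both integer columns are plain integers, and that the i-th line of color.txt describes the i-th entry of X.npy and Y.npy. The function name load_map is hypothetical.

```python
from pathlib import Path

import numpy as np


def load_map(folder, x_name="X.npy", y_name="Y.npy", color_name="color.txt"):
    """Read the two map axes and the 3-column color file from one folder.

    Assumption: color.txt has no header and its rows follow the same
    order as the entries of the two .npy arrays.
    """
    folder = Path(folder)
    # Flatten in case the arrays are stored as column vectors.
    x = np.load(folder / x_name).ravel()
    y = np.load(folder / y_name).ravel()

    names, pds_code, metal_code = [], [], []
    with open(folder / color_name) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            names.append(parts[0])
            pds_code.append(int(parts[1]))
            metal_code.append(int(parts[2]))

    # Fail early if the row-order assumption cannot hold.
    if not (x.size == y.size == len(names)):
        raise ValueError(
            f"size mismatch: X={x.size}, Y={y.size}, color rows={len(names)}"
        )
    return x, y, names, np.array(pds_code), np.array(metal_code)
```

A call such as load_map("EPSim_MAPS") returns the two axes, the compound names, and the two integer color codes. The size check raises an error if the number of rows and array entries differ.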

2) geometries: 
    contains 2 subfolders:
        train: geometry of each compound included in the training set (xyz format).
        oos_test: geometry of each compound included in the out-of-sample test set (xyz format).
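
As an illustration, a geometry file could be parsed with a few lines of Python. This sketch assumes the conventional XYZ layout (first line: number of atoms; second line: free-form comment; then one "element x y z" line per atom). The README does not specify the layout, units, or file naming, so these remain assumptions; read_xyz is a hypothetical helper.

```python
import numpy as np


def read_xyz(path):
    """Parse a single-structure XYZ file.

    Assumes the conventional layout: atom count, comment line, then
    one 'element x y z' line per atom.
    """
    with open(path) as fh:
        lines = fh.read().splitlines()

    n_atoms = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""

    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        symbols.append(parts[0])
        coords.append([float(v) for v in parts[1:4]])

    if len(symbols) != n_atoms:
        raise ValueError(f"{path}: expected {n_atoms} atoms, found {len(symbols)}")
    return symbols, np.array(coords, dtype=float), comment
```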

3) INTERMEDIATE_ENERGIES:
    contains 2 text files:
        intermediate_energies_train.txt (6 columns).
            1st column: Name of compound (training set). 2nd-6th: DeltaG of catalytic intermediates 3 to 7, each relative to DeltaG of intermediate 2 [kcal/mol].
        intermediate_energies_oos_test.txt (6 columns).
            1st column: Name of compound (oos test set). 2nd-6th: DeltaG of catalytic intermediates 3 to 7, each relative to DeltaG of intermediate 2 [kcal/mol].
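
A text file with one name column followed by numeric columns, as described above, could be read as sketched below. The sketch assumes whitespace-separated columns, no header line, and compound names without embedded spaces; none of this is confirmed by the README. load_energy_table is a hypothetical helper.

```python
import numpy as np


def load_energy_table(path, n_values=5):
    """Read 'name v1 ... vN' rows into a name list and an (n_rows, N) array.

    Assumptions: whitespace-delimited, no header line, and names that
    contain no spaces. Energies are kept in the file's units (kcal/mol).
    """
    names, rows = [], []
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != n_values + 1:
                raise ValueError(
                    f"{path}:{line_no}: expected {n_values + 1} columns, "
                    f"got {len(parts)}"
                )
            names.append(parts[0])
            rows.append([float(v) for v in parts[1:]])
    return names, np.array(rows, dtype=float).reshape(-1, n_values)
```

With n_values=5 this matches the 6-column layout of the intermediate-energy files. The column-count check reports the offending line if a file deviates from that layout.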

4) PDS:
    contains 2 text files:
        pds_train.txt (2 columns).
            1st column: Name of compound (training set). 2nd: number of the potential determining step (PDS) in the catalytic cycle.
        pds_oos_test.txt (2 columns).
            1st column: Name of compound (oos test set). 2nd: number of the potential determining step (PDS) in the catalytic cycle.
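
One possible way to read a PDS file and see how often each step is potential determining is sketched here. It assumes whitespace-separated columns, no header line, and an integer in the second column. load_pds and count_pds are hypothetical helper names.

```python
from collections import Counter


def load_pds(path):
    """Return {compound name: PDS step number} from a 2-column text file.

    Assumptions: whitespace-delimited, no header, integer step numbers.
    """
    pds = {}
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            pds[parts[0]] = int(parts[1])
    return pds


def count_pds(pds):
    """Count how many compounds share each PDS step number."""
    return Counter(pds.values())
```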

5) REACTION_ENERGIES:
    contains 2 text files:
        rxn_energies_sabatier_train.txt (7 columns).
            1st column: Name of compound (SABATIER and training set). 2nd-7th: Reaction energy for each catalytic step [kcal/mol].
        rxn_energies_oos_test.txt (7 columns).
            1st column: Name of compound (out-of-sample test set). 2nd-7th: Reaction energy for each catalytic step [kcal/mol].
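
Because rxn_energies_sabatier_train.txt mixes the Sabatier reference with the training compounds, it may be convenient to separate the two after loading. The sketch below works on names and values that are already in memory. It assumes the reference row is labelled exactly "SABATIER" and appears once; the README suggests this naming but does not guarantee it. split_reference is a hypothetical helper.

```python
import numpy as np


def split_reference(names, values, label="SABATIER"):
    """Separate the reference row from the catalyst rows.

    Assumption: exactly one row carries the name given by `label`.
    Returns (reference_row, catalyst_names, catalyst_values).
    """
    values = np.asarray(values, dtype=float)
    if values.shape[0] != len(names):
        raise ValueError("names and values must have the same number of rows")

    matches = [i for i, name in enumerate(names) if name == label]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one row named {label!r}, found {len(matches)}"
        )

    ref_index = matches[0]
    keep = [i for i in range(len(names)) if i != ref_index]
    return values[ref_index], [names[i] for i in keep], values[keep]
```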

6) tSNE_MAPS:
    contains 2 binary NumPy array files (p1.npy, p2.npy) and 1 text file (color.txt):
        p1.npy: first dimension of the t-SNE map. 
        p2.npy: second dimension of the t-SNE map.
        color.txt (3 columns).
            1st column: Name of compound (SABATIER, training and oos-test set).
            2nd column: integer variable for coloring the map according to the potential determining step (PDS) of each catalyst.
            3rd column: integer variable for coloring the map according to metal center.
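
To color the map, the points can be grouped by one of the integer columns of color.txt so that each class is drawn as its own series. This is a minimal sketch assuming that the i-th entries of p1.npy and p2.npy belong to the i-th row of color.txt, which the README does not state explicitly. group_points_by_code is a hypothetical helper, and the meaning of each integer code is not defined here.

```python
import numpy as np


def group_points_by_code(p1, p2, codes):
    """Group 2-D map points by an integer class code.

    Assumption: p1[i], p2[i] and codes[i] refer to the same compound.
    Returns {code: array of shape (n_points_with_that_code, 2)}.
    """
    p1 = np.asarray(p1, dtype=float).ravel()
    p2 = np.asarray(p2, dtype=float).ravel()
    codes = np.asarray(codes).ravel()
    if not (p1.size == p2.size == codes.size):
        raise ValueError("p1, p2 and codes must have the same length")

    points = np.column_stack([p1, p2])
    return {int(c): points[codes == c] for c in np.unique(codes)}
```

Each value of the returned dictionary can then be passed to a plotting routine as one series, with one color per code.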