This repository contains the relevant files for two example applications of the NaviCatGA code.

* example_1.zip contains:

* * ga_run_setup: code for the GA setup used in example 1.

* * * alpha.npy: numpy array containing the regression coefficients for the trained ML model.
* * * mbtypes.npy: numpy array containing the many-body types for the trained ML model.
* * * X_tr.npy: numpy array containing the SLATM representation of the training set.
* * * sigma.npy: numpy array containing the sigma parameter of the kernel for the trained ML model.
* * * database.csv: database of SMILES substituents for the GA run.
* * * run_ga.py: example Python script for running example 1 with NaviCatGA.

* * ml_model: training data and xyz structures for the ML model used in example 1. Labels are taken from Cordova et al., DOI: 10.1021/acscatal.0c00774.

* * * D_sorted.txt: descriptor variable (labels) for the xyz structures in the training set.
* * * functions_compounds.py, functions_ml_model.py: Python modules for training the ML model (see train_ml_model.py below).
* * * train_ml_model.py: python code for training the ML model used in example 1.
* * * Training_Int4_Geoms: directory containing the 1649 xyz structures in the training set.

* * results: xyz structures generated per generation in the five GA runs showcased in the paper.

* example_2.zip contains:

* * ga_run_setup: code for the GA setup used in example 2, including AaronTools formatted (xyz) substituents and scaffolds.

* * * mlr_module: Python module implementing the MLR fitness function.
* * * run_ga.py: example Python script for running example 2 with NaviCatGA.
* * * scaffolds: directory containing the xyz files for the scaffold database of example 2.
* * * substituents: directory containing the AaronTools.py formatted xyz files for the substituents in example 2.

* * mlr_model: training data and xyz structures for the MLR model used in example 2. Labels are taken from Gallarati et al., DOI: 10.1039/D1SC00482D.

* * * dde_cv_atools.py: performs cross validation on the DDE MLR model.
* * * de_cv_atools.py: performs cross validation on the DE MLR model.
* * * descriptors.xyz: contains descriptors used in the MLR model.
* * * energies.txt: contains energies used to train the MLR model.
* * * volcano_data.csv: contains energy profiles for generating the volcano plot.
* * * Reference_Int1_Geoms_AaronTools: directory containing the 67 xyz structures used in the MLR model fitting.

* * results: xyz structures generated per generation in the three GA runs showcased in the paper.
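The files for the example 1 ML model (X_tr.npy, alpha.npy, sigma.npy) look like the ingredients of a kernel ridge regression (KRR) model. Assuming the standard KRR prediction rule y = K(X_query, X_tr) · alpha, the minimal NumPy sketch below shows how these arrays could be combined. The kernel type is not stated in this repository, so both a Laplacian and a Gaussian kernel are offered. The function names `kernel_matrix` and `krr_predict` are hypothetical, and building the SLATM vector of a new structure (with the same mbtypes.npy) is not shown.

```python
import numpy as np


def kernel_matrix(A, B, sigma, kind="laplacian"):
    """Kernel matrix between the rows of A (n, d) and the rows of B (m, d).

    Assumption: the kernel used for the trained model is not documented here,
    so two common KRR kernels are offered.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sigma = float(np.ravel(sigma)[0])  # sigma.npy may hold a 0-d or 1-element array
    K = np.empty((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):  # row by row to keep memory low for long SLATM vectors
        diff = B - a
        if kind == "laplacian":
            # K_ij = exp(-||a_i - b_j||_1 / sigma)
            K[i] = np.exp(-np.abs(diff).sum(axis=1) / sigma)
        elif kind == "gaussian":
            # K_ij = exp(-||a_i - b_j||_2^2 / (2 sigma^2))
            K[i] = np.exp(-(diff ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        else:
            raise ValueError(f"unknown kernel kind: {kind}")
    return K


def krr_predict(X_query, X_tr, alpha, sigma, kind="laplacian"):
    """KRR prediction: y = K(X_query, X_tr) @ alpha."""
    return kernel_matrix(X_query, X_tr, sigma, kind) @ np.asarray(alpha, dtype=float)


# Hypothetical usage (file locations inside ga_run_setup are assumed):
#   X_tr = np.load("X_tr.npy")
#   alpha = np.load("alpha.npy")
#   sigma = np.load("sigma.npy")
#   y_new = krr_predict(x_new, X_tr, alpha, sigma)  # x_new: SLATM vector of a new structure
```

Computing the kernel one query row at a time avoids allocating an (n, m, d) array, which matters because SLATM vectors are long and the training set has 1649 entries.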
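The cross-validation scripts for example 2 (dde_cv_atools.py, de_cv_atools.py) are not reproduced here. As a minimal sketch, assuming each MLR model is an ordinary least-squares fit with an intercept and that cross validation is k-fold with the mean absolute error as the metric, the procedure could look as follows in NumPy. The function names, the fold count and the metric are hypothetical choices, not taken from those scripts.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict_mlr(X, intercept, coefs):
    """Linear prediction y = b0 + X @ b."""
    return intercept + X @ coefs


def kfold_mae(X, y, k=5, seed=0):
    """Mean absolute error of k-fold cross-validated MLR predictions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))  # shuffle once, then split into k folds
    folds = np.array_split(idx, k)
    errors = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(idx, test)  # all indices not in the held-out fold
        b0, b = fit_mlr(X[train], y[train])
        errors[test] = np.abs(predict_mlr(X[test], b0, b) - y[test])
    return errors.mean()
```

Here X would hold one row of descriptors per reference structure and y the corresponding energies; how descriptors.xyz and energies.txt are parsed into these arrays is not assumed.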