This repository contains the files for two applications of the NaviCatGA code.

* example_1.zip contains:
  * ga_run_setup: code for the GA setup used in example 1.
    * alpha.npy: NumPy array containing the regression coefficients of the trained ML model.
    * mbtypes.npy: NumPy array containing the many-body types of the trained ML model.
    * X_tr.npy: NumPy array containing the SLATM representation of the training set.
    * sigma.npy: NumPy array containing the sigma parameter of the kernel of the trained ML model.
    * database.csv: database of SMILES substituents for the GA run.
    * run_ga.py: example Python script for running example 1 with NaviCatGA.
  * ml_model: training data and xyz structures for the ML model used in example 1. Labels are extracted from Cordova et al. (DOI: 10.1021/acscatal.0c00774).
    * D_sorted.txt: descriptor variable (labels) for the xyz structures in the training set.
    * functions_compounds.py, functions_ml_model.py: Python modules for training the ML model (see train_ml_model.py below).
    * train_ml_model.py: Python script for training the ML model used in example 1.
    * Training_Int4_Geoms: directory containing the 1649 xyz structures in the training set.
  * results: xyz structures generated per generation in the five GA runs showcased in the paper.
* example_2.zip contains:
  * ga_run_setup: code for the GA setup used in example 2, including AaronTools-formatted xyz substituents and scaffolds.
    * mlr_module: Python module describing the MLR fitness function.
    * run_ga.py: example Python script for running example 2 with NaviCatGA.
    * scaffolds: directory containing the xyz files for the scaffold database of example 2.
    * substituents: directory containing the AaronTools.py-formatted xyz files for the substituents in example 2.
  * mlr_model: training data and xyz structures for the MLR model used in example 2. Labels are extracted from Gallarati et al. (DOI: 10.1039/D1SC00482D).
    * dde_cv_atools.py: performs cross-validation on the DDE MLR model.
    * de_cv_atools.py: performs cross-validation on the DE MLR model.
    * descriptors.xyz: contains the descriptors used in the MLR model.
    * energies.txt: contains the energies used to train the MLR model.
    * volcano_data.csv: contains the energy profiles for generating the volcano plot.
    * Reference_Int1_Geoms_AaronTools: directory containing the 67 xyz structures used in the MLR model fitting.
  * results: xyz structures generated per generation in the three GA runs showcased in the paper.
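
After unpacking the archives, the structure counts quoted in the listing above (1649 training geometries and 67 reference geometries) offer a quick way to check that nothing is missing. The following is a minimal sketch, not part of the repository: the helper names `count_xyz_files` and `check_archive_contents` are hypothetical, and both the relative paths and the `.xyz` file extension are assumptions about how the archives unpack, so adjust them to the actual layout.

```python
from pathlib import Path

# Expected counts come from the file listing; the relative paths and the
# ".xyz" extension are ASSUMPTIONS about how the archives unpack.
EXPECTED_XYZ_COUNTS = {
    "example_1/ml_model/Training_Int4_Geoms": 1649,
    "example_2/mlr_model/Reference_Int1_Geoms_AaronTools": 67,
}


def count_xyz_files(directory):
    """Return the number of *.xyz files directly inside `directory`."""
    return sum(1 for path in Path(directory).glob("*.xyz") if path.is_file())


def check_archive_contents(root, expected=EXPECTED_XYZ_COUNTS):
    """Compare xyz counts under `root` with the expected values.

    Returns a dict mapping each relative path to (found, expected);
    a missing directory is reported with found = None.
    """
    report = {}
    for relative_path, expected_count in expected.items():
        directory = Path(root) / relative_path
        found = count_xyz_files(directory) if directory.is_dir() else None
        report[relative_path] = (found, expected_count)
    return report


if __name__ == "__main__":
    # Run from the directory into which both archives were extracted.
    for name, (found, expected_count) in check_archive_contents(".").items():
        status = "ok" if found == expected_count else "MISMATCH"
        print(f"{name}: found {found}, expected {expected_count} [{status}]")
```

A directory reported with `found None` most likely means the assumed path does not match the unpacked layout rather than that data is missing.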