About the repository
====================

This repository contains data files to reproduce the analysis in the article "Adsorbate chemical environment-based machine learning framework for heterogeneous catalysis" by Pushkar Ghanekar, Siddharth Deshpande, and Jeffrey Greeley. The files are in a format that can be used by the software package "ACE_GCN" at https://gitlab.com/jgreeley-group/ace_gcn.

To access the processed datasets in this directory, use the `pickle` module to extract the relevant features:

```python
import pickle

# Replace 'pickle_file.pkl' with the path to an extracted .pkl file
with open('pickle_file.pkl', 'rb') as f:
    data = pickle.load(f)
```

To save space, most of the data is stored in tar.bz2 format. You can compress and uncompress it on your local cluster or machine with the following bash functions:

```bash
# Compress a folder into <folder_name>.tar.bz2
function compress_tar(){
    folder_name=$1
    tar -jcvf "$folder_name.tar.bz2" "$folder_name"
}

# Extract an archive; pass the full archive name, e.g. 4NO_OHE_6_pkls.tar.bz2
function uncompress_tar(){
    archive_name=$1
    tar -jxvf "$archive_name"
}
```

Directory structure
===================

Soure_Data
├── Fig3
│   ├── NO5_POSCAR_CONTCAR_NO1234_model.csv
│   ├── NO6_POSCAR_CONTCAR_NO12345_model.csv
│   ├── train_NO123_full_sumNO4.csv
│   └── val_NO123_full_sumNO4.csv
├── Fig4
│   ├── Pt221_4OH_Train_123Pt221_Pt100.csv
│   ├── Pt221_5OH_w4OH.csv
│   └── Pt221_6OH_w45OH.csv
├── Fig5.py
├── Pt3Sn_NO
│   ├── pkls
│   │   ├── 4NO
│   │   │   ├── 4NO_OHE_6_pkls.tar.bz2
│   │   │   └── id_prop_4NO_CONTCAR_POSCAR.csv
│   │   ├── 5NO
│   │   │   ├── 5NO_OHE_6_pkls.tar.bz2
│   │   │   └── id_prop_5NO_CONTCAR_POSCAR.csv
│   │   ├── 6NO
│   │   │   ├── 6NO_OHE_6_pkls.tar.bz2
│   │   │   └── id_prop_6NO_CONTCAR_POSCAR.csv
│   │   └── Pt3Sn_NO_1_6_processed.pkl.tar.bz2
│   └── raw_files
│       ├── 4NO-cont-pos-out.tar.bz2
│       ├── 5NO-cont-pos-out.tar.bz2
│       ├── 6NO-cont-pos-out.tar.bz2
│       └── raw_converged_123456NO_PtSn.tar.bz2
├── Pt_OH
│   ├── pkls
│   │   ├── Pt100
│   │   │   ├── Pt100_CN_OHE_6_Hbonds_pkls.tar.bz2
│   │   │   └── id_prop_Pt100.csv
│   │   ├── Pt100_Pt221_12345OH.pkl.tar.bz2
│   │   └── Pt221
│   │       ├── 221_123OH_CN_OHE_6_Hbonds_pkls.tar.bz2
│   │       ├── 221_3OH_UNIQUE_CN_OHE_6_Hbonds_pkls.tar.bz2
│   │       ├── 221_4OH_TOP_SITE_DFT_PKLS.tar.bz2
│   │       ├── 221_5OH_TOP_SITE_DFT_PKLS.tar.bz2
│   │       ├── Pt221_456OH_guess
│   │       │   ├── POSCAR_Pt221_5OH_TOP_SITE_EXHAUST_GUESS_PKLS.tar.bz2
│   │       │   ├── POSCAR_Pt221_6OH_TOP_SITE_EXHAUST_GUESS_PKLS.tar.bz2
│   │       │   └── POSCAR_Pt_221_4OH_TOP_SITE_DFT_PKLS.tar.bz2
│   │       ├── id_prop_1OH.csv
│   │       ├── id_prop_2OH.csv
│   │       ├── id_prop_3OH_most_stable.csv
│   │       ├── id_prop_OH4_TOP_DFT.csv
│   │       ├── id_prop_OH5_TOP_DFT.csv
│   │       └── id_prop_UNIQUE_3OH.csv
│   └── raw_files
│       ├── Pt100_123OH_raw_converged.tar.bz2
│       ├── Pt221_123OH.tar.bz2
│       ├── Pt221_4OH_GUESS_SITES.tar.bz2
│       ├── Pt221_4OH_TOP_SITES_DFT.tar.bz2
│       ├── Pt221_5OH_GUESS_SITES.tar.bz2
│       ├── Pt221_5OH_TOP_SITES_DFT.tar.bz2
│       └── Pt221_6OH_GUESS_SITES.tar.bz2
├── README.txt
└── processing_scripts
    ├── binding_distance_utils.py
    ├── data_coverage_OHE_CN_HBonding.py
    └── make_graph_objects_dask.py

Description of files
====================

The files provided in this directory fall into three broad categories:

1) Atom position files (POSCARs) and trajectory files (OUTCARs) for the Pt3Sn/NO and Pt/OH examples. These files are compressed in tar.bz2 format. You can extract them by running `tar -jxvf <archive_name>.tar.bz2` on the bash command line.

2) Graph objects: processed atom position files abstracted into graph objects through the surf graph algorithm, ready to be read by the ACE-GCN code. Their names end in '_pkls' or '.pkl' before the .tar.bz2 extension.

3) NumPy-processed objects: atom objects ready for training and prediction with the ACE-GCN model. A simple example is provided for Pt/OH (/Pt_OH/pkls/Pt100_Pt221_12345OH.pkl.tar.bz2) and PtSn/NO (/Pt3Sn_NO/pkls/Pt3Sn_NO_1_6_processed.pkl.tar.bz2).

Fig files
=========

These folders contain comma-separated-value files with the raw data generated by ACE-GCN and used to plot the figures. All plots are made with the `matplotlib` and `seaborn` packages.
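As an alternative to the bash functions, you can unpack the tar.bz2 archives and load them directly from Python with the standard-library `tarfile` and `pickle` modules. This is a minimal sketch and is not part of ACE_GCN. The function names `extract_archive` and `load_pickle` are hypothetical, and the example path comes from the directory listing. Only unpickle files from a trusted source.

```python
import pickle
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.bz2 archive (like `tar -jxvf`) and return its member names."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        # Use the safer 'data' extraction filter when this Python version provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=dest_dir, filter="data")
        else:
            tar.extractall(path=dest_dir)
    return names


def load_pickle(pkl_path):
    """Load one extracted pickle file (trusted sources only)."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Example archive from the directory listing; adjust to your local copy
    archive = Path("Pt_OH/pkls/Pt100_Pt221_12345OH.pkl.tar.bz2")
    if archive.exists():
        for name in extract_archive(archive, dest_dir=archive.parent):
            if name.endswith(".pkl"):
                data = load_pickle(archive.parent / name)
                print(name, type(data))
    else:
        print(f"{archive} not found; run this from the data directory")
```

The archive is extracted next to itself, which mirrors running `tar -jxvf` inside that folder.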
Pt3Sn_NO
========

pkls: processed graph objects for the relaxed and unrelaxed (guess) configurations.

raw_files: optimized atomic trajectories for 1-6 NO* configurations on Pt3Sn(111). This brute-force enumeration was used to train the ACE-GCN model. The folder also contains the initial guess structures used for the 4/5/6 NO* cases.

Pt_OH
=====

pkls: compressed graph objects for the unrelaxed and optimized atomic positions of the Pt100 and Pt221 configurations.

raw_files: atom positions and trajectories.

To read the pickle files, use the following snippet. Make sure to use the id_prop file, which is the look-up dataset for reading the graph objects.

```python
import numpy as np

from processing_scripts.binding_distance_utils import generate_dataset_list

# directory_path, id_prop_file and pickle_path must be set to the extracted data for one system
graph_dataset = np.array(generate_dataset_list(directory_path, id_prop_file, pickle_path), dtype=object)
```
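Before calling `generate_dataset_list`, it can help to check that every entry in an id_prop file has a matching graph object. The sketch below is a hypothetical pre-check, not the ACE_GCN loader. It assumes a CGCNN-style id_prop layout: no header row, the structure ID in the first column, and numeric targets in the remaining columns. It also assumes each graph object is stored as `<pickle_dir>/<structure_id>.pkl`. Both assumptions should be checked against the actual files.

```python
import csv
from pathlib import Path


def read_id_prop(id_prop_file):
    """Read an id_prop CSV into {structure_id: [float targets]}.

    ASSUMPTION: no header row; first column is the ID, the rest are numbers.
    """
    table = {}
    with open(id_prop_file, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            table[row[0].strip()] = [float(x) for x in row[1:]]
    return table


def match_pickles(id_prop_file, pickle_dir, suffix=".pkl"):
    """Pair each id_prop ID with a pickle path and list IDs with no file.

    ASSUMPTION (hypothetical naming): files are named <structure_id><suffix>.
    """
    found, missing = {}, []
    for sid, props in read_id_prop(id_prop_file).items():
        path = Path(pickle_dir) / f"{sid}{suffix}"
        if path.exists():
            found[sid] = (path, props)
        else:
            missing.append(sid)
    return found, missing
```

A non-empty `missing` list indicates that the id_prop file and the extracted pickle folder do not correspond, or that the naming assumption does not hold.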