######################################################################
Data for:
"A new kind of atlas of zeolite building blocks"
Benjamin A. Helfrecht*, Rocio Semino, Giovanni Pireddu,
Scott M. Auerbach, and Michele Ceriotti**

*benjamin.helfrecht@epfl.ch
**michele.ceriotti@epfl.ch

DOI: 10.1063/1.5119751

Data curated by Benjamin Helfrecht, Nov. 2019
######################################################################

This folder contains several subdirectories, which in turn contain the
KPCA and classical descriptor data and the environment property
decompositions computed for the paper. See the paper and its
supplemental material (DOI above) for more details about the
descriptors.

The directories and their contents are as follows:

DEEM_{1k, 10k}_{3.5A, 6.0A}
===========================
Directories containing the first 100 kernel principal components,
environment volumes, and environment energies computed for each Si
atom in the 1k or 10k DEEM samples used in the paper, using SOAP as a
descriptor with a radial cutoff of 3.5 angstroms or 6.0 angstroms.

kpca100.dat
-----------
First 100 kernel principal components of the SOAP vectors

volumes.dat
-----------
Si-centered environment contributions to the structure volume

energies.dat
------------
Si-centered environment contributions to the structure energy

DEEM_{1k, 10k}_{Angles, Distances}
==================================
Directories containing the angle or distance descriptors for each Si
atom in the 1k or 10k DEEM samples used in the paper.

{angles, distances}.dat
-----------------------
Angle or distance descriptors

DEEM_{1k, 10k}_{King, Short}_{Binary, Distribution}
===================================================
Directories containing the ring descriptors for each Si atom in the
1k or 10k DEEM samples used in the paper. The ring descriptors are
composed of ring counts from the King or Shortest Path definitions in
the "Binary" (boolean) or "Distribution" (ring counts) representation.

rings.dat
---------
Ring descriptor

The directory also contains several supplemental files:

ids_natoms_10k.dat
------------------
The structure IDs and the number of Si atoms in the DEEM database
structures in the 10k sample used in the paper. The structures are
ordered to correspond to the descriptors listed above. For example,
the 10th line in the descriptor files corresponds to the second Si
atom in structure PCOD8000033.

ids_natoms_1k.dat
-----------------
The structure IDs and the number of Si atoms in the DEEM database
structures in the 1k sample used in the paper. The structures are
ordered to correspond to the descriptors listed above. For example,
the 10th line in the descriptor files corresponds to the second Si
atom in structure PCOD8000332.

build_xyz.py
------------
A Python script that constructs an extended xyz file containing the
Si atoms of the DEEM database structures, with the descriptors (and
environment property values) appended to the Si atoms. It requires an
extended xyz file containing the database structures ordered by
increasing ID number. Help for the script can be obtained using
`python build_xyz.py -h`.

build_xyz_all
-------------
A bash script that builds the extended xyz files for all of the
descriptors, given the filepaths to the extended xyz files of the 1k
and 10k DEEM database structures. Given the structures, e.g.,
`DEEM_1k.xyz` and `DEEM_10k.xyz`, all of the extended xyz files can be
built using `build_xyz_all DEEM_1k.xyz DEEM_10k.xyz`.
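
The correspondence between descriptor rows and structures described
for ids_natoms_{1k, 10k}.dat can be sketched in Python as shown below.
This is only an illustration and is not part of the distributed
scripts. It assumes that each line of an ids_natoms file holds two
whitespace-separated fields (structure ID, then number of Si atoms);
check this against the actual files before relying on it. The function
names `parse_ids_natoms` and `locate_row` and the IDs `STRUCT_A` and
`STRUCT_B` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def parse_ids_natoms(lines):
    """Parse (structure_id, n_si) pairs from an iterable of text lines.

    Assumption: every non-empty line has two whitespace-separated
    fields, a structure ID followed by the number of Si atoms.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        records.append((fields[0], int(fields[1])))
    return records


def locate_row(records, row):
    """Map a 1-based descriptor row to (structure_id, 1-based Si index)."""
    if not records:
        raise IndexError("no structures were given")
    # offsets[i] = number of descriptor rows up to and including structure i
    offsets = list(accumulate(n for _, n in records))
    if not 1 <= row <= offsets[-1]:
        raise IndexError(f"row {row} is outside 1..{offsets[-1]}")
    # Index of the first structure whose cumulative count reaches `row`
    i = bisect_right(offsets, row - 1)
    previous = offsets[i - 1] if i > 0 else 0
    return records[i][0], row - previous


# Hypothetical stand-in for the first lines of an ids_natoms file.
# With a real file, pass an open file object instead of this list.
example_lines = ["STRUCT_A 8", "STRUCT_B 4"]
records = parse_ids_natoms(example_lines)
print(locate_row(records, 10))  # prints ('STRUCT_B', 2)
```

In this made-up example the first structure has 8 Si atoms, so
descriptor row 10 belongs to the second Si atom of the second
structure. The same pattern applies to the real files if their layout
matches the assumption above.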