######################################################################
Data for:
"A new kind of atlas of zeolite building blocks"

Benjamin A. Helfrecht*, Rocio Semino, Giovanni Pireddu, 
Scott M. Auerbach, and Michele Ceriotti**

*benjamin.helfrecht@epfl.ch
**michele.ceriotti@epfl.ch

DOI: 10.1063/1.5119751

Data curated by Benjamin Helfrecht, Nov. 2019                            
######################################################################

This folder contains several subdirectories, which in turn contain
the kernel principal component analysis (KPCA) and classical descriptor data,
as well as the environment property decompositions computed for the paper.
See the paper and its supplemental material (DOI above) for more
details about the descriptors.

The directories and their contents are as follows:

DEEM_{1k, 10k}_{3.5A, 6.0A}
===========================
Directories containing the first 100 kernel principal components,
environment volumes, and environment energies
for each Si atom in the 1k or 10k DEEM samples used in the paper,
computed with SOAP as the descriptor and a radial cutoff
of 3.5 or 6.0 angstroms.

    kpca100.dat
    -----------
    First 100 kernel principal components of the SOAP vectors

    volumes.dat
    -----------
    Si-centered environment contributions to the structure volume

    energies.dat
    ------------
    Si-centered environment contributions to the structure energy
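A minimal sketch for loading one of these directories with NumPy follows. It assumes that each `.dat` file is whitespace-delimited plain text with one row per Si atom, which this README does not state explicitly. `np.loadtxt` skips lines starting with `#` by default; any other header would need `skiprows`. The function name `load_environment_data` is hypothetical.

```python
from pathlib import Path

import numpy as np


def load_environment_data(directory):
    """Load kpca100.dat, volumes.dat and energies.dat from one SOAP directory.

    Assumption: plain whitespace-delimited text, one row per Si atom,
    header lines (if any) starting with '#', which np.loadtxt skips by default.
    """
    directory = Path(directory)
    # ndmin=2 keeps an (n_atoms, n_components) shape even for a single row.
    kpca = np.loadtxt(directory / "kpca100.dat", ndmin=2)
    # Assumption: one environment contribution per row.
    volumes = np.loadtxt(directory / "volumes.dat", ndmin=1)
    energies = np.loadtxt(directory / "energies.dat", ndmin=1)

    # All three files describe the same Si atoms in the same order,
    # so their row counts must agree.
    n_atoms = kpca.shape[0]
    if not (volumes.shape[0] == energies.shape[0] == n_atoms):
        raise ValueError(
            "row counts differ between kpca100.dat, volumes.dat and energies.dat"
        )
    return kpca, volumes, energies
```

For example, `kpca, volumes, energies = load_environment_data("DEEM_1k_3.5A")` should give arrays whose first dimension is the number of Si atoms in the 1k sample, if the files match the assumed layout.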

DEEM_{1k, 10k}_{Angles, Distances}
==================================
Directories containing the angle or distance descriptors for each Si atom
in the 1k or 10k DEEM samples used in the paper.

    {angles, distances}.dat
    -----------------------
    Angle or distance descriptors
    
DEEM_{1k, 10k}_{King, Short}_{Binary, Distribution}
===================================================
Directories containing the ring descriptors for each Si atom
in the 1k or 10k DEEM samples used in the paper.
The ring descriptors are built from ring counts obtained with
either the King ("King") or the shortest-path ("Short") ring definition,
in either the "Binary" (boolean) or "Distribution" (ring count) representation.

    rings.dat
    ---------
    Ring descriptor
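The following minimal sketch illustrates how the two representations relate. It assumes that each column of `rings.dat` corresponds to one ring size, and that the Binary form only records whether a ring size occurs around the Si atom (count > 0). The function name and the example counts are hypothetical. The sketch is not guaranteed to reproduce the distributed Binary files exactly.

```python
import numpy as np


def counts_to_binary(ring_counts):
    """Collapse a "Distribution" ring descriptor into a "Binary" one.

    Assumption: one column per ring size; the binary form marks
    only whether that ring size is present (count > 0).
    """
    counts = np.asarray(ring_counts)
    return (counts > 0).astype(int)


# Hypothetical ring counts for two Si atoms over four ring sizes.
distribution = np.array([[0, 3, 1, 0],
                         [2, 0, 0, 4]])
binary = counts_to_binary(distribution)
print(binary)
```

Because this mapping discards the counts themselves, the Distribution form carries strictly more information than the Binary form under this assumption.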

This folder also contains several supplemental files:

ids_natoms_10k.dat
------------------
The structure IDs and the number of Si atoms
in each DEEM database structure
in the 10k sample used in the paper.
The structures are listed in the same order as the descriptor files above,
and each structure occupies as many consecutive lines as it has Si atoms.
For example, the 10th line in the descriptor files
corresponds to the second Si atom in structure PCOD8000033.

ids_natoms_1k.dat
-----------------
The structure IDs and the number of Si atoms
in each DEEM database structure
in the 1k sample used in the paper.
The structures are listed in the same order as the descriptor files above,
and each structure occupies as many consecutive lines as it has Si atoms.
For example, the 10th line in the descriptor files
corresponds to the second Si atom in structure PCOD8000332.
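The row ordering of the descriptor files can be turned into a lookup from a descriptor-file line number to a structure and Si atom. This is a minimal sketch. It assumes that each line of `ids_natoms_*.dat` holds the structure ID and its Si count as two whitespace-separated columns with no header. `read_ids_natoms` and `row_to_structure` are hypothetical helper names.

```python
import numpy as np


def read_ids_natoms(path):
    """Read an ids_natoms_*.dat file into (ids, natoms).

    Assumption: two whitespace-separated columns per line
    (structure ID, number of Si atoms) and no header line.
    """
    ids, natoms = [], []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            ids.append(fields[0])
            natoms.append(int(fields[1]))
    return ids, np.array(natoms)


def row_to_structure(row, ids, natoms):
    """Map a 1-based descriptor-file line number to (structure ID, 1-based Si index).

    Rows list all Si atoms of the first structure, then all Si atoms
    of the second structure, and so on.
    """
    ends = np.cumsum(natoms)               # last row (1-based) of each structure
    s = int(np.searchsorted(ends, row))    # first structure whose block reaches `row`
    if row < 1 or s >= len(ids):
        raise IndexError("row outside the descriptor file")
    start = ends[s] - natoms[s]            # rows used by earlier structures
    return ids[s], int(row - start)
```

Consider two hypothetical structures, "A" with 8 Si atoms and "B" with 4. Line 10 then falls on the second Si atom of "B". If the file layout matches the assumptions, `row_to_structure(10, *read_ids_natoms("ids_natoms_10k.dat"))` should return `("PCOD8000033", 2)`, consistent with the example given for the 10k sample.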

build_xyz.py
------------
A Python script that constructs an extended xyz file
containing the Si atoms of the DEEM database structures,
with the descriptors (and environment property values) appended
to each Si atom. It requires as input an extended xyz file
containing the database structures ordered by increasing ID number.
Help for the script can be obtained using `python build_xyz.py -h`.
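This README does not describe the exact output written by `build_xyz.py`, so the following is only a sketch of the general idea. It builds an extended xyz frame in which each Si atom line carries its descriptor values as extra columns declared in the `Properties` key. The function name, the default property name, and the number formatting are hypothetical choices, not necessarily those used by the script.

```python
def format_extxyz_frame(symbols, positions, lattice, descriptors, name="descriptor"):
    """Format one structure as an extended xyz frame with per-atom descriptor columns.

    symbols:     element symbols (here, only the Si atoms)
    positions:   Cartesian (x, y, z) coordinates in angstroms
    lattice:     3x3 cell, one cell vector per row
    descriptors: equal-length per-atom descriptor vectors
    name:        hypothetical property name for the descriptor columns
    """
    n_atoms = len(symbols)
    if n_atoms == 0:
        raise ValueError("a frame needs at least one atom")
    if not (len(positions) == len(descriptors) == n_atoms):
        raise ValueError("symbols, positions and descriptors must have the same length")
    n_desc = len(descriptors[0])
    if any(len(d) != n_desc for d in descriptors):
        raise ValueError("all descriptor vectors must have the same length")

    # Lattice lists the nine Cartesian components of the three cell vectors.
    lattice_str = " ".join(f"{x:.8f}" for vec in lattice for x in vec)
    # Properties declares the per-atom columns: symbol, position, descriptor.
    comment = (
        f'Lattice="{lattice_str}" '
        f"Properties=species:S:1:pos:R:3:{name}:R:{n_desc} "
        'pbc="T T T"'
    )
    lines = [str(n_atoms), comment]
    for sym, pos, desc in zip(symbols, positions, descriptors):
        values = " ".join(f"{v:.8f}" for v in (*pos, *desc))
        lines.append(f"{sym} {values}")
    return "\n".join(lines) + "\n"
```

Frames in this layout should be readable by extended-xyz-aware tools such as ASE's `ase.io.read`, which exposes the extra columns as per-atom arrays.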

build_xyz_all
-------------
A Bash script that builds the extended xyz files for all of the descriptors,
given the file paths to the extended xyz files
of the 1k and 10k DEEM database structures.
For example, given the structure files `DEEM_1k.xyz` and `DEEM_10k.xyz`,
all of the extended xyz files can be built using
`build_xyz_all DEEM_1k.xyz DEEM_10k.xyz`.
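Each brace-pattern heading in this README stands for several directories: four for the SOAP descriptors, four for the angle and distance descriptors, and eight for the ring descriptors, 16 in total. The sketch below enumerates them and pairs each with the structure file of matching sample size. This is presumably how the two arguments of `build_xyz_all` are used, but that is an assumption. The sketch does not call `build_xyz.py`, whose options are listed by `python build_xyz.py -h`. The helper names are hypothetical.

```python
from itertools import product


def expand(prefix, *groups):
    """Expand a brace pattern such as DEEM_{1k, 10k}_{3.5A, 6.0A} into directory names."""
    return ["_".join((prefix, *combo)) for combo in product(*groups)]


SAMPLES = ("1k", "10k")
DIRECTORIES = (
    expand("DEEM", SAMPLES, ("3.5A", "6.0A"))
    + expand("DEEM", SAMPLES, ("Angles", "Distances"))
    + expand("DEEM", SAMPLES, ("King", "Short"), ("Binary", "Distribution"))
)


def structure_file_for(directory, xyz_1k="DEEM_1k.xyz", xyz_10k="DEEM_10k.xyz"):
    """Return the structure file whose sample size matches the directory name."""
    sample = directory.split("_")[1]  # "1k" or "10k"
    return {"1k": xyz_1k, "10k": xyz_10k}[sample]


if __name__ == "__main__":
    for d in DIRECTORIES:
        print(d, "->", structure_file_for(d))
```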