This is the dataset for the publication "Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks", by D. Schwalbe-Koda, A.R. Tan, and R. Gómez-Bombarelli. The repository contains the simulation data used to train the neural network potentials, as well as the adversarial attacks. The datasets are:

zeolite.json
: contains all structures, with their DFT energies and forces, for both unloaded zeolite frameworks and zeolite-molecule pairs.
ala2.json
: contains all geometries, energies, and forces for alanine dipeptide, calculated with the OPLS force field.
ammonia.json
: contains all geometries, energies, and forces for the ammonia molecule, calculated with DFT.

The columns of each dataset are described in detail below. In all datasets, energies are given in kcal/mol and forces in kcal/mol/Å. Positions and lattice parameters are given in Å.
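Many atomistic tools expect eV and eV/Å rather than kcal/mol and kcal/mol/Å. The following is a minimal sketch of the conversion, using the standard factor 1 eV ≈ 23.0605 kcal/mol. The function names and the sample values are hypothetical and are not taken from the datasets.

```python
# Conversion factor: 1 eV is approximately 23.060548 kcal/mol.
KCAL_PER_MOL_PER_EV = 23.060548


def kcal_mol_to_ev(value):
    """Convert an energy in kcal/mol (or a force component in kcal/mol/Å) to eV (or eV/Å)."""
    return value / KCAL_PER_MOL_PER_EV


def forces_to_ev_per_angstrom(forces):
    """Convert a list of per-atom force vectors from kcal/mol/Å to eV/Å."""
    return [[kcal_mol_to_ev(component) for component in atom] for atom in forces]


# Hypothetical values for illustration only (not taken from the datasets).
energy_kcal = -23.060548
forces_kcal = [[23.060548, 0.0, -46.121096]]

energy_ev = kcal_mol_to_ev(energy_kcal)
forces_ev = forces_to_ev_per_angstrom(forces_kcal)
print(energy_ev, forces_ev)
```

Because the length unit (Å) is the same in both conventions, the same scalar factor applies to energies and to force components.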
In zeolite.json, all DFT energies and forces were calculated with VASP 5.4.4, using PBE-D3 (see Methods). The dataset contains the following information:

dataset_name
: name of the method used to create the pose. It can be one of:
    MD
    : geometry created by sampling an MD trajectory; it corresponds to a single frame.
    random_displacement
    : geometry created by randomly displacing the atoms (see Methods of the manuscript).
    adv_attack
    : adversarial attack on the ground state geometry.
    NNMD
    : geometry created by performing an NN-based MD simulation, then calculating the DFT energies and forces on a few sampled frames.
    ground_state
    : geometry obtained by optimizing the zeolite-OSDA poses using DFT simulations.
zeolite
: IZA code of the zeolite framework.
molecule
: SMILES string of the OSDA docked inside the zeolite (None for a pure-silica framework).
loading
: number of OSDAs docked in that particular pose of the zeolite.
num_atoms
: number of atoms in the geometry.
nxyz
: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom.
lattice
: lattice matrix (in Å) of the structure.
energy
: DFT energy of the configuration (in kcal/mol).
forces
: DFT forces calculated for each atom (in kcal/mol/Å).

In ala2.json, all energies and forces were calculated using the OPLS force field, as implemented in OpenMM (see Methods).
nxyz
: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom.
energy
: energy of the configuration (in kcal/mol), as computed with OPLS.
forces
: forces calculated for each atom (in kcal/mol/Å), as computed with OPLS.
phi
: collective variable phi, in degrees.
psi
: collective variable psi, in degrees.

In ammonia.json, all energies and forces were calculated using BP86-D3, as implemented in ORCA (see Methods).
dataset_name
: name of the dataset used to train each model. It can be one of:
    gen1
    : geometries generated from Hessian displacements of the ammonia molecule.
    gen2
    : gen1 plus adversarial attacks from the first generation of NN potentials.
    gen3
    : gen2 plus adversarial attacks from the second generation of NN potentials.
    random
    : geometries generated by randomly displacing atomic coordinates by 0.1 and 0.3 Å.
nxyz
: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom.
energy
: energy of the configuration (in kcal/mol).
forces
: forces calculated for each atom (in kcal/mol/Å).
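
The nxyz and dataset_name columns described above can be unpacked with a few lines of Python. The sketch below assumes that a file has already been loaded (for example with the standard json module) into a list of records, one dictionary per geometry, keyed by the column names. That layout is an assumption, not something stated in this description; if the files are column-oriented, the access pattern needs to be adapted. The helper names and the sample records are hypothetical placeholders, not real data.

```python
from collections import Counter


def split_nxyz(nxyz):
    """Split nxyz rows [Z, x, y, z] into atomic numbers and Cartesian positions (in Å)."""
    atomic_numbers = [int(row[0]) for row in nxyz]
    positions = [[float(c) for c in row[1:4]] for row in nxyz]
    return atomic_numbers, positions


def count_by_dataset(records):
    """Count how many records carry each dataset_name label."""
    return Counter(record["dataset_name"] for record in records)


# Hypothetical records for illustration; the coordinates are placeholders.
records = [
    {"dataset_name": "gen1", "nxyz": [[7, 0.0, 0.0, 0.1], [1, 0.9, 0.0, -0.3]]},
    {"dataset_name": "gen1", "nxyz": [[7, 0.0, 0.0, 0.0], [1, 1.0, 0.0, 0.0]]},
    {"dataset_name": "random", "nxyz": [[7, 0.1, 0.0, 0.0], [1, 1.0, 0.2, 0.0]]},
]

numbers, positions = split_nxyz(records[0]["nxyz"])
counts = count_by_dataset(records)
print(numbers, positions, dict(counts))
```

The same split_nxyz helper applies to all three datasets, since each stores geometries in the same four-column nxyz form.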