This is the dataset for the publication "Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks", by D. Schwalbe-Koda, A.R. Tan, R. Gómez-Bombarelli. The repository contains the simulation data used to train the neural network potentials, as well as the adversarial attacks. The contents of the datasets are:
zeolite.json: contains all structures, with DFT energies and forces, for both unloaded zeolite frameworks and zeolite-molecule pairs.
ala2.json: contains all geometries, energies and forces for alanine dipeptide, calculated with the OPLS force field.
ammonia.json: contains all geometries, energies and forces for the ammonia molecule, calculated with DFT.

In all datasets, energies are given in kcal/mol and forces in kcal/mol/Å. Positions and lattice parameters are given in Å. A detailed description of the columns in each dataset is given below.
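Tools that work in eV and eV/Å need these values converted first. The following is a minimal sketch of that conversion, assuming only the standard factor 1 eV ≈ 23.0605 kcal/mol. The function names are illustrative and are not part of the dataset or of any code accompanying the publication.

```python
# Standard conversion factor: 1 eV = 23.060548 kcal/mol (rounded).
KCAL_PER_MOL_PER_EV = 23.060548


def energy_to_ev(energy_kcal_mol):
    """Convert a single energy from kcal/mol to eV."""
    return energy_kcal_mol / KCAL_PER_MOL_PER_EV


def forces_to_ev_per_angstrom(forces_kcal_mol_ang):
    """Convert per-atom forces from kcal/mol/Å to eV/Å.

    Assumes `forces_kcal_mol_ang` is a list of [fx, fy, fz] rows,
    one row per atom. Lengths are already in Å, so only the energy
    part of the unit changes.
    """
    return [
        [component / KCAL_PER_MOL_PER_EV for component in row]
        for row in forces_kcal_mol_ang
    ]
```

Positions and lattice parameters are already in Å and need no conversion for eV/Å-based workflows.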
For zeolite.json, all DFT energies and forces were calculated with VASP 5.4.4, using PBE-D3 (see Methods). The dataset contains the following information:
dataset_name: name of the method used to create the pose. It can be one of:
    MD: geometry created by sampling an MD trajectory; corresponds to a single frame
    random_displacement: geometry created by randomly displacing the atoms (see Methods of the manuscript)
    adv_attack: adversarial attack on the ground state geometry
    NNMD: geometry created by performing an NN-based MD simulation, then calculating the DFT energies/forces on a few sampled frames
    ground_state: geometry obtained by optimizing the zeolite-OSDA poses using DFT simulations
zeolite: IZA code of the zeolite framework
molecule: SMILES string of the OSDA docked inside the zeolite (None for a pure-silica framework)
loading: number of OSDAs docked in that particular pose of the zeolite
num_atoms: number of atoms in the geometry
nxyz: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom
lattice: lattice matrix (in Å) of the structure
energy: DFT energy of the configuration (in kcal/mol)
forces: DFT forces calculated for each atom (in kcal/mol/Å)

For ala2.json, all energies and forces were calculated using the OPLS force field, as implemented in OpenMM (see Methods). The dataset contains the following information:
nxyz: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom
energy: energy of the configuration (in kcal/mol) as computed with OPLS
forces: forces calculated for each atom (in kcal/mol/Å) as computed with OPLS
phi: collective variable phi, in degrees
psi: collective variable psi, in degrees

For ammonia.json, all energies and forces were calculated using BP86-D3, as implemented in ORCA (see Methods). The dataset contains the following information:
dataset_name: name of the dataset used to train each model. It can be one of:
    gen1: geometries generated from Hessian displacements of the ammonia molecule
    gen2: gen1 plus adversarial attacks from the first generation of NN potentials
    gen3: gen2 plus adversarial attacks from the second generation of NN potentials
    random: geometries generated by randomly displacing atomic coordinates by 0.1 and 0.3 Å
nxyz: atomic number (first column) and xyz coordinates (in Å, second to fourth columns) of each atom
energy: energy of the configuration (in kcal/mol)
forces: forces calculated for each atom (in kcal/mol/Å)
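The nxyz layout described above (atomic number followed by x, y, z) can be split into separate lists of atomic numbers and positions, as in the sketch below. It assumes that each entry's nxyz is a nested list of four-number rows and that forces is a nested list of three-number rows. The exact JSON layout of the files is not specified in this description, so the record shown is a hypothetical placeholder with illustrative values, not data taken from the dataset, and the helper names are likewise illustrative.

```python
from collections import Counter


def split_nxyz(nxyz):
    """Split nxyz rows into atomic numbers and xyz positions (in Å)."""
    numbers = [int(row[0]) for row in nxyz]
    positions = [[float(value) for value in row[1:4]] for row in nxyz]
    return numbers, positions


def count_by_dataset_name(records):
    """Count how many records carry each dataset_name label."""
    return Counter(record["dataset_name"] for record in records)


# Hypothetical ammonia-like record; the numbers are placeholders only.
record = {
    "dataset_name": "gen1",
    "nxyz": [
        [7, 0.0, 0.0, 0.1],
        [1, 0.0, 0.94, -0.27],
        [1, 0.81, -0.47, -0.27],
        [1, -0.81, -0.47, -0.27],
    ],
    "energy": 0.0,
    "forces": [[0.0, 0.0, 0.0] for _ in range(4)],
}

numbers, positions = split_nxyz(record["nxyz"])

# Sanity check: there should be one force row per atom.
if len(record["forces"]) != len(numbers):
    raise ValueError("forces and nxyz disagree on the number of atoms")
```

The same helper applies to all three files, since each of them lists nxyz in this column order. For zeolite.json, the lattice matrix would need to be read alongside the positions to describe the periodic structure.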