This is the dataset for the publication "Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles" by A. R. Tan, S. Urata, S. Goldman, J. C. B. Dietschreit, and R. Gomez-Bombarelli. The repository contains the simulation data and the adversarially sampled data used to train and test the neural network potentials. The datasets consist of the following files:
ammonia_train.xyz
: contains 78 geometries, energies, and forces for the ammonia molecule, as calculated with DFT. The contents of this file were downloaded from the paper "Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks". The energies and forces were calculated at the BP86-D3/def2-SVP level of theory as implemented in ORCA. The geometries were generated by Hessian displacement along the normal mode vectors of initial molecular conformers, which were generated using RDKit with the MMFF94 force field.
ammonia_test.xyz
: contains 129 geometries, energies, and forces for the ammonia molecule, sampled using the adversarial sampling method and calculated at the same level of DFT theory as the training set. The energies of the geometries in this set range from 0 to 100 kcal/mol (or 0 to 25 kcal/mol/atom). The geometries in this file are used to test the robustness and extrapolative power of the neural network potentials.
As the test set comprises higher-energy geometries and the training set encompasses low-energy structures near the ground state, the ammonia dataset serves as a fundamental example of an out-of-domain, extrapolative uncertainty quantification (UQ) challenge. While energy is generally not a valid criterion for separating in-domain (ID) and out-of-domain (OOD) data, it separates the configurations well here because ammonia is such a small molecule. For a more comprehensive analysis of the ID and OOD partition, see Supplementary Section IIA in the Supplementary Information of the paper.
silica_train.xyz
: contains 1318 structures, lattice cells, energies, and forces for the silica glass system. The structures were sampled from simulations using force-matching potentials (FMP) from the paper "How fluorine minimizes density fluctuations of silica glass: Molecular dynamics study with machine-learning assisted force-matching potential", with energies and forces calculated using DFT. Note that a value of -125667.96875 kcal/mol has been subtracted from the original DFT-calculated energies in this file.
silica_test.xyz
: contains 373 structures, lattice cells, energies, and forces for the silica glass system, sampled using the adversarial sampling method and calculated with DFT. The structures in this file are used to test the robustness and extrapolative power of the neural network potentials. Note that a value of -125667.96875 kcal/mol has been subtracted from the original DFT-calculated energies in this file.
Each silica glass structure consists of 699 atoms (233 Si and 466 O atoms). In total, 1591 silica structures were sampled from the molecular dynamics trajectories and 101 structures using adversarial sampling. The structures were then partitioned so that the training set contains only structures generated under low-temperature, low-deformation-rate conditions, whereas the test set contains structures extracted from higher-temperature and higher-deformation-rate trajectories. As can be seen in Supplementary Figures 2 & 3 and Supplementary Section IIB in the Supplementary Information of the paper, it is much harder to split the data for this much larger system into clearly in-distribution and out-of-distribution parts. DFT calculations were performed on the structures using the Vienna Ab-initio Simulation Package (VASP). Details of the MD simulations, DFT calculations, and adversarial sampling of the silica structures are discussed in Supplementary Section IIB. Because of its high configurational complexity (including high-energy fracture geometries) and low chemical complexity, our silica dataset represents a step up in the generalization of UQ in extreme-condition regimes.
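The energy offset noted for the two silica files can be undone when absolute DFT energies are needed. The following is a minimal sketch, assuming the literal reading of the note (the stored energy equals the original DFT energy minus -125667.96875 kcal/mol). The names `SILICA_ENERGY_SHIFT`, `restore_dft_energy`, and `energy_per_atom` are hypothetical and are not part of the dataset or the paper's code.

```python
# Hypothetical helpers, not part of the dataset or the paper's code.
# Assumption: E_file = E_dft - SHIFT with SHIFT = -125667.96875 kcal/mol,
# so the original value is recovered as E_dft = E_file + SHIFT.
SILICA_ENERGY_SHIFT = -125667.96875  # kcal/mol, value quoted in the file notes

N_SI, N_O = 233, 466   # atoms per silica structure, as stated above
N_ATOMS = N_SI + N_O   # 699 atoms in total


def restore_dft_energy(stored_energy: float) -> float:
    """Return the original DFT energy (kcal/mol) for a stored, shifted energy."""
    return stored_energy + SILICA_ENERGY_SHIFT


def energy_per_atom(energy: float, n_atoms: int = N_ATOMS) -> float:
    """Normalize a total energy (kcal/mol) by the number of atoms."""
    return energy / n_atoms
```

Because the same constant is applied to every structure, the shift does not affect forces or energy differences between structures. It matters only when comparing against unshifted reference energies.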
In all datasets, energies are given in kcal/mol and forces in kcal/mol/Angstrom. Positions and lattice parameters are given in Angstrom.
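Some simulation tools work in eV and eV/Angstrom instead, so a unit conversion may be needed before using the data. The following is a minimal sketch, assuming the approximate conversion factor 1 eV = 23.0605 kcal/mol. The function names are illustrative only and do not come from the repository.

```python
# Illustrative unit helpers; names and structure are assumptions.
KCAL_MOL_PER_EV = 23.0605  # approximate value of 1 eV in kcal/mol


def kcal_mol_to_ev(value: float) -> float:
    """Convert an energy in kcal/mol to eV.

    The same factor converts forces from kcal/mol/Angstrom to eV/Angstrom,
    since the length unit is unchanged.
    """
    return value / KCAL_MOL_PER_EV


def per_atom(energy: float, n_atoms: int) -> float:
    """Normalize a total energy by the number of atoms."""
    return energy / n_atoms


# Ammonia (NH3) has 4 atoms, so the upper end of the test-set range,
# 100 kcal/mol, corresponds to 25 kcal/mol/atom.
upper_per_atom = per_atom(100.0, 4)
upper_in_ev = kcal_mol_to_ev(100.0)
```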