Publication date: Sep 04, 2020
Most datasets used to benchmark machine-learning models contain minimum-energy structures, or small fluctuations around stable geometries, and focus on the diversity of chemical compositions or the presence of different phases. This dataset provides a large number (7732489) of configurations for a simple CH4 composition, generated in an almost completely unbiased fashion. Hydrogen atoms are randomly distributed in a 3 Å sphere centered on the carbon atom, and the only structures discarded are those containing atoms closer than 0.5 Å to one another, or those for which the reference DFT calculation does not converge. This makes the dataset well suited to benchmarking structural representations and regression algorithms, and to verifying whether they can reach arbitrary accuracy in the data-rich regime.
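The geometric part of the sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' generation script: the function names are hypothetical, 3 Å is assumed to be the sphere radius, the hydrogen positions are assumed to be uniform in volume, and the 0.5 Å cutoff is assumed to apply to every pair of atoms (C-H and H-H). The DFT step, and the discarding of non-converged calculations, is not reproduced.

```python
import itertools
import math
import random

R_MAX = 3.0  # Å; assumed to be the radius of the sphere around the carbon atom
D_MIN = 0.5  # Å; structures with any closer pair of atoms are discarded


def random_point_in_sphere(rng, radius):
    """Draw a point uniformly in volume by rejection from the enclosing cube."""
    while True:
        point = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(point, (0.0, 0.0, 0.0)) <= radius:
            return point


def sample_methane(rng, r_max=R_MAX, d_min=D_MIN):
    """Return 5 positions: carbon at the origin followed by 4 hydrogens."""
    carbon = (0.0, 0.0, 0.0)
    while True:
        atoms = [carbon] + [random_point_in_sphere(rng, r_max) for _ in range(4)]
        # Keep the structure only if no pair of atoms is closer than d_min.
        pairs = itertools.combinations(atoms, 2)
        if all(math.dist(a, b) >= d_min for a, b in pairs):
            return atoms


rng = random.Random(0)  # fixed seed so the sketch is reproducible
configuration = sample_methane(rng)
```

Each call returns one candidate geometry; in the dataset, every such geometry is additionally paired with a DFT energy.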
|609.0 MiB||7732489 random methane molecules along with their DFT energies, in the extended XYZ format|