Molecules in this dataset are taken from the following sources:

  1. water/water_randomized_1000.xyz is a shuffled version of the 1000 molecules listed in water_monomer.xyz at doi:10.24435/materialscloud:2018.0009/v1/
  2. ethanol/ethanol_4500.xyz contains a random subselection of 4500 ethanol molecules (with the Fock and overlap matrices transformed to the PySCF format) from the data used in Ref 1.
  3. qm7/data/qm7-chno-frames.xyz is a selection of 6868 molecules containing C, H, N, O atoms from the QM7 dataset.
  4. Please use the ncenter-reps code available at Ref 2.
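
The .xyz files listed above can be inspected without any extra dependencies. The following is a minimal sketch, assuming the files follow the usual XYZ layout (an atom-count line, a comment line, then one "symbol x y z" line per atom, repeated for each frame). The helper name read_xyz_frames is hypothetical and is not part of the ncenter-reps code; any extra per-frame metadata on the comment line is returned as a raw string rather than parsed.

```python
import io


def read_xyz_frames(stream):
    """Yield (comment, symbols, positions) for each frame in an XYZ stream.

    Assumes plain XYZ layout: atom count, comment line, then one line per
    atom whose first four fields are the symbol and the x, y, z coordinates.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between frames
        n_atoms = int(header.split()[0])
        comment = stream.readline().rstrip("\n")
        symbols, positions = [], []
        for _ in range(n_atoms):
            fields = stream.readline().split()
            symbols.append(fields[0])
            positions.append(tuple(float(x) for x in fields[1:4]))
        yield comment, symbols, positions


# Tiny in-memory example standing in for a real file (made-up coordinates).
example = io.StringIO(
    "3\n"
    "water frame 0\n"
    "O 0.000 0.000 0.000\n"
    "H 0.757 0.586 0.000\n"
    "H -0.757 0.586 0.000\n"
)
frames = list(read_xyz_frames(example))
```

To read one of the dataset files instead, pass an open file handle, for example open("water/water_randomized_1000.xyz"), in place of the in-memory stream. If the file matches the description above, the number of frames returned should equal the number of molecules listed for it.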

[1] K. T. Schütt, M. Gastegger, A. Tkatchenko, K. R. Müller, R. J. Maurer, Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions, Nature Commun. 10(1), 1-10 (2019)

[2] J. Nigam, M. Ceriotti, N-center representations for atomic-scale modeling, github.com/cosmo-epfl/ncenter-reps