The XYZ files here contain both the geometries and the dipoles of the datasets used to fit the scalar-vector dipole models in this publication.
All dipoles are expressed in units of Debye. All files are in extended-XYZ format, readable by ASE, where the dipole is then accessible as follows:
atoms.info['dipole_<method>']
where atoms is a single ASE Atoms object and <method> is one of b3lyp, ccsd, or scan0.
For the QM7b set, both CCSD/daDZ and B3LYP/daDZ dipoles are included, as well as polarizabilities, all calculated as described here and available with the alphaML dataset. The molecules are randomly shuffled to ease random partitioning into training and test sets (the first 5400 molecules form the training set and the last 1811 the test set), but their indices in farthest-point sampling (FPS) ordering are given under the key fps_order.
A sample from the QM9 database, with B3LYP dipoles only. As with the QM7b set, the molecules were randomly shuffled; the first 20000 were chosen as the training set and the next 1000 as the test set. The key id corresponds to the QM9 ID of the molecule as given in the dataset.
This test set consists of the first 29 molecules of the AlphaML showcase (also available here) plus 31 additional amino acid derivatives. Dipoles were computed at the B3LYP, CCSD, and SCAN0 levels.
Finally, four challenge sets are provided. These are all series of molecules of increasing length: some made of polar fragments, one with a large separation of charge, and one "control" with a nearly constant dipole as a function of length. Dipoles were computed only at the B3LYP level.
This dataset is licensed under a Creative Commons Attribution 4.0 International License.