Source data for dipole fits

The XYZ files here provide the geometries and dipoles of the datasets used to fit the scalar-vector dipole models in this publication.

All dipoles are given in Debye. All files are in extended-XYZ format, readable by ASE; once a file is read, the dipole is accessible as follows:

atoms.info['dipole_<method>']

where atoms is a single ASE Atoms object, and <method> is one of b3lyp, ccsd, or scan0.
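
As a minimal sketch of the access pattern above, the following Python helper reads every frame of an extended-XYZ file with ase.io.read and collects the dipoles for one method. The names dipole_key, load_dipoles, and DEBYE_TO_E_ANGSTROM are hypothetical and not part of the dataset; the conversion factor is the standard value of 1 Debye in e*Angstrom, added only for convenience.

```python
METHODS = ("b3lyp", "ccsd", "scan0")
DEBYE_TO_E_ANGSTROM = 0.20819434  # 1 Debye expressed in e*Angstrom


def dipole_key(method):
    """Return the atoms.info key for a method (hypothetical helper)."""
    method = method.lower()
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return f"dipole_{method}"


def load_dipoles(path, method):
    """Read all frames; return (frames, list of (x, y, z) dipoles in Debye)."""
    # Imported here so that dipole_key stays usable without ASE installed.
    from ase.io import read

    # index=":" returns every frame in the file as a list of Atoms objects.
    frames = read(path, index=":", format="extxyz")
    key = dipole_key(method)
    # A KeyError here means the file has no dipoles for the requested method.
    dipoles = [tuple(float(c) for c in atoms.info[key]) for atoms in frames]
    return frames, dipoles
```

For example, load_dipoles("qm7b.xyz", "ccsd") would return the frames and their CCSD dipoles, where "qm7b.xyz" is a placeholder file name. Not every file carries every method (the QM9 sample, for instance, has B3LYP dipoles only).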

QM7b

Both CCSD/daDZ and B3LYP/daDZ dipoles are included, as well as polarizabilities, all calculated as described here and available with the AlphaML dataset. The molecules are randomly shuffled to ease partitioning into training and test sets: the first 5400 molecules are the training set and the last 1811 are the test set. Their indices in FPS (farthest point sampling) ordering are given under the key fps_order.
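
The QM7b partition amounts to slicing the list of frames by position, and the FPS ordering can be recovered by sorting. The sketch below assumes that frames is the list of ASE Atoms objects in file order and that fps_order is stored as an integer in atoms.info; both function names are hypothetical.

```python
N_TRAIN_QM7B = 5400  # first 5400 shuffled molecules: training set
N_TEST_QM7B = 1811   # last 1811 shuffled molecules: test set


def split_qm7b(frames):
    """Split the shuffled QM7b frames into (train, test) by position."""
    expected = N_TRAIN_QM7B + N_TEST_QM7B
    if len(frames) != expected:
        raise ValueError(f"expected {expected} frames, got {len(frames)}")
    return frames[:N_TRAIN_QM7B], frames[-N_TEST_QM7B:]


def sort_by_fps(frames):
    """Return a new list of frames sorted by their FPS index.

    Assumes each frame exposes info["fps_order"]; sorting works whether
    the stored indices start at 0 or at 1.
    """
    return sorted(frames, key=lambda atoms: int(atoms.info["fps_order"]))
```

Because the two sets together cover all 7211 molecules, the length check guards against applying the split to a different file by mistake.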

QM9

A sample from the QM9 database, B3LYP dipoles only. As with the QM7b set, the molecules were randomly shuffled; the first 20000 were chosen as the training set and the next 1000 as the test set. The key id corresponds to the QM9 ID of the molecule as given in the dataset.
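
The QM9 partition can be sketched the same way. Unlike QM7b, the test set here is the 1000 molecules that follow the training set, so the slice is taken from the front rather than from the end. The function names are hypothetical, and the sketch assumes that the id key is stored in atoms.info.

```python
N_TRAIN_QM9 = 20000  # first 20000 shuffled molecules: training set
N_TEST_QM9 = 1000    # next 1000 shuffled molecules: test set


def split_qm9(frames):
    """Split the shuffled QM9 sample into (train, test) by position."""
    needed = N_TRAIN_QM9 + N_TEST_QM9
    if len(frames) < needed:
        raise ValueError(f"need at least {needed} frames, got {len(frames)}")
    train = frames[:N_TRAIN_QM9]
    test = frames[N_TRAIN_QM9:needed]
    return train, test


def index_by_qm9_id(frames):
    """Map each frame's QM9 ID (info["id"], assumed present) to the frame."""
    return {atoms.info["id"]: atoms for atoms in frames}
```

The dictionary returned by index_by_qm9_id makes it straightforward to match a molecule in this sample to its entry in the original QM9 database.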

Showcase

This test set consists of the first 29 molecules of the AlphaML showcase (also available here) plus 31 additional amino acid derivatives, 60 molecules in total. Dipoles were computed at the B3LYP, CCSD, and SCAN0 levels.

Challenge sets

Finally, four challenge sets are provided. Each is a series of molecules of increasing length: some are made of polar fragments, one has a large separation of charge, and one is a "control" with a nearly constant dipole as a function of length. Dipoles were computed only at the B3LYP level.

License

This dataset is licensed under a Creative Commons Attribution 4.0 International License.