Published February 20, 2024 | Version v2
Dataset Open

Electronic excited states from physically-constrained machine learning

  • 1. Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Pisa, Italy
  • 2. Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  • 3. Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA

* Contact person

Description

Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or be combined explicitly with physically grounded operations. We present an example of an integrated modeling approach, in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those it is trained on, and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parameterization corresponding to a minimal atom-centered basis. Our results on a comprehensive dataset of hydrocarbons emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency, and providing a blueprint for developing ML-augmented electronic-structure methods.

This record contains the dataset accompanying the paper linked below. It covers hydrocarbons including ethane, ethene, butadiene, hexane, hexatriene, isoprene, styrene, polyalkenes (dodecahexaene, tetradecaheptaene, hexadecaoctaene, octadecanonaene, eicosadecaene), aromatics (benzene, azulene, naphthalene, biphenyl), anthracene, beta-carotene, and fullerene. We also provide scripts to generate the Fock and overlap matrices in this dataset. The machine-learning code can be found at the Software reference below.
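For readers unfamiliar with how a Fock matrix and an overlap matrix are used together, the following is a minimal sketch of the standard generalized eigenvalue problem F C = S C ε, solved with symmetric (Löwdin) orthogonalization. It is not the authors' pipeline and does not compute excitation energies. The function name `orbital_energies` and the 2x2 toy matrices are assumptions made purely for illustration and are not taken from the dataset.

```python
import numpy as np


def orbital_energies(fock, overlap):
    """Solve F C = S C eps for a symmetric Fock matrix and a
    symmetric positive-definite overlap matrix (hypothetical helper)."""
    # Build S^(-1/2) from the eigendecomposition of the overlap matrix.
    s_vals, s_vecs = np.linalg.eigh(overlap)
    s_inv_sqrt = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T

    # Transform the Fock matrix to the orthogonalized basis and diagonalize.
    fock_orth = s_inv_sqrt @ fock @ s_inv_sqrt
    eps, c_orth = np.linalg.eigh(fock_orth)

    # Back-transform the eigenvectors to the original non-orthogonal basis.
    coeffs = s_inv_sqrt @ c_orth
    return eps, coeffs


# Toy two-orbital example (illustrative numbers only, not from the dataset).
toy_fock = np.array([[-1.0, -0.5],
                     [-0.5, -1.0]])
toy_overlap = np.array([[1.0, 0.25],
                        [0.25, 1.0]])

toy_eps, toy_coeffs = orbital_energies(toy_fock, toy_overlap)
print(toy_eps)
```

For this symmetric toy case the eigenvalues can be checked by hand: (F11 + F12) / (1 + S12) = -1.2 and (F11 - F12) / (1 - S12) = -2/3. The returned coefficients satisfy C^T S C = I.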

Files

File preview

files_description.md

All files

Files (4.2 GiB)

Checksum                                Size
md5:df82f87e5992e71e87e66df0389b11cc    343 Bytes
md5:ad20d9c9ca8109ac24f845bb383162f9    4.2 GiB
md5:42c4dda53ca3a875a4a3fd66bd41d461    3.5 KiB

References

Journal reference (Paper in which methods and data are discussed)
E. Cignoni, D. Suman, J. Nigam, L. Cupellini, B. Mennucci, and M. Ceriotti, doi: doi.org/10.1021/acscentsci.3c01480

Preprint (Preprint in which the data is described)
E. Cignoni, D. Suman, J. Nigam, L. Cupellini, B. Mennucci, and M. Ceriotti, arXiv preprint arXiv:2311.00844.

Software (GitHub repository with the code for generating data and machine learning)
E. Cignoni, Hamiltonian learning for excited states (HaLEx)