Electronic excited states from physically-constrained machine learning

Edoardo Cignoni1*, Divya Suman2, Jigyasa Nigam2*, Lorenzo Cupellini1, Benedetta Mennucci1, Michele Ceriotti3,2

1 Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Pisa, Italy

2 Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

3 Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA

* Corresponding authors' emails: edoardo.cignoni@phd.unipi.it, jigyasa.nigam@epfl.ch
DOI: 10.24435/materialscloud:j2-58 [version v2]

Publication date: Feb 20, 2024

How to cite this record

Edoardo Cignoni, Divya Suman, Jigyasa Nigam, Lorenzo Cupellini, Benedetta Mennucci, Michele Ceriotti, Electronic excited states from physically-constrained machine learning, Materials Cloud Archive 2024.34 (2024), https://doi.org/10.24435/materialscloud:j2-58


Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or be combined explicitly with physically-grounded operations. We present an example of an integrated modeling approach, in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those it is trained on, and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parameterization corresponding to a minimal atom-centered basis. Our results on a comprehensive dataset of hydrocarbons emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency, and providing a blueprint for developing ML-augmented electronic-structure methods. This record contains the dataset accompanying the paper linked below. It covers hydrocarbons including ethane, ethene, butadiene, hexane, hexatriene, isoprene, styrene, polyalkenes (dodecahexaene, tetradecaheptaene, hexadecaoctaene, octadecanonaene, eicosadecaene), aromatics (benzene, azulene, naphthalene, biphenyl), anthracene, beta-carotene, and fullerene. We also provide scripts to generate the Fock and overlap matrices in this dataset. The machine-learning code is available at the Software reference below.
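Because the dataset provides Fock and overlap matrices, a basic physically-grounded operation that connects them to spectral quantities is the generalized eigenvalue problem F C = S C ε, whose eigenvalues ε are the orbital energies. The sketch below is a minimal illustration, not the authors' ML code: it assumes that a Fock matrix and its overlap matrix have already been loaded as NumPy arrays in the same atomic-orbital basis (the on-disk format is described in the README, not here), and the function names `orbital_energies` and `homo_lumo_gap` are hypothetical. It uses symmetric (Löwdin) orthogonalization so that only NumPy is needed.

```python
import numpy as np


def orbital_energies(fock: np.ndarray, overlap: np.ndarray) -> np.ndarray:
    """Solve the generalized eigenproblem F C = S C eps; return eps in ascending order.

    Assumes `overlap` is symmetric positive definite and that both matrices
    are expressed in the same (non-orthogonal) atomic-orbital basis.
    """
    # Symmetric (Löwdin) orthogonalization: build X = S^{-1/2}
    s_vals, s_vecs = np.linalg.eigh(overlap)
    x = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Transform F into the orthonormal basis, where the problem is an ordinary eigenproblem
    fock_ortho = x.T @ fock @ x
    return np.linalg.eigvalsh(fock_ortho)


def homo_lumo_gap(fock: np.ndarray, overlap: np.ndarray, n_occ: int) -> float:
    """Gap between the lowest unoccupied and highest occupied orbital.

    `n_occ` is the number of doubly occupied orbitals (closed-shell assumption).
    """
    eps = orbital_energies(fock, overlap)
    return float(eps[n_occ] - eps[n_occ - 1])


if __name__ == "__main__":
    # Toy 2x2 example with hypothetical values in arbitrary energy units
    fock = np.array([[-1.0, -0.5], [-0.5, -1.0]])
    overlap = np.array([[1.0, 0.2], [0.2, 1.0]])
    print(orbital_energies(fock, overlap))
    print(homo_lumo_gap(fock, overlap, n_occ=1))
```

For this symmetric two-orbital toy case the eigenvalues are (a + b)/(1 + s) and (a - b)/(1 - s), i.e. -1.25 and -0.625, so the gap is 0.625. SciPy's `scipy.linalg.eigh(fock, overlap)` solves the same generalized problem directly if SciPy is available.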

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.


File name    Size       Description
             3.5 KiB    README describing the repository architecture and data
             4.2 GiB    Dataset of hydrocarbons of varying conjugation, length, and aromaticity, along with scripts to run calculations and produce figures


Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Preprint (Preprint in which the data is described)
Software (GitHub repository with the code for generating data and machine learning)


Keywords: ERC, Hamiltonian, excited states, machine learning, EPFL, FIAMMA, LIFETimeS

Version history:

2024.34 (version v2) [This version] Feb 20, 2024 DOI: 10.24435/materialscloud:j2-58
2024.11 (version v1) Jan 23, 2024 DOI: 10.24435/materialscloud:5s-gm