Machine learning meets volcano plots: Computational discovery of cross-coupling catalysts

Authors: Benjamin Meyer1*, Boodsarin Sawatlon1*, Stefan Niklaus Heinen2*, O. Anatole von Lilienfeld2*, Clémence Corminboeuf1*

  1. Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland;
  2. Institute of Physical Chemistry, Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
  • Corresponding authors

DOI: 10.24435/materialscloud:2018.0014/v1 (version v1, submitted on 01 August 2018)

How to cite this entry

Benjamin Meyer, Boodsarin Sawatlon, Stefan Niklaus Heinen, O. Anatole von Lilienfeld, Clémence Corminboeuf, Machine learning meets volcano plots: Computational discovery of cross-coupling catalysts, Materials Cloud Archive (2018), doi: 10.24435/materialscloud:2018.0014/v1.


The application of modern machine learning to challenges in atomistic simulation is gaining traction. We present new machine learning models that can predict the energy of the oxidative addition process between a transition metal complex and a substrate in C-C cross-coupling reactions. This quantity can in turn serve as a descriptor to estimate the activity of homogeneous catalysts using molecular volcano plots. The versatility of this approach is illustrated for vast libraries of organometallic catalysts based on Pt, Pd, Ni, Cu, Ag, and Au combined with 91 ligands. Out-of-sample machine learning predictions were made for a total of 18,062 compounds, yielding 557 catalyst candidates that fall into the ideal thermodynamic window. This number was further narrowed by keeping only candidates with an estimated price below 10 US$/mmol. The 37 resulting finalists are dominated by palladium/phosphine ligand combinations but also include the earth-abundant transition metal Cu combined with less common ligands. Our results indicate that modern statistical learning techniques can be applied to the computational discovery of readily available and promising catalyst candidates.
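The abstract describes a two-stage use of the predicted energy: it is placed on a molecular volcano plot to estimate activity, and the candidates that land in the favourable window are then filtered by price. The sketch below illustrates that logic under stated assumptions. It uses one common form of a thermodynamic volcano (the negative free energy of the most endergonic step of the cycle), with intermediate energies obtained from linear scaling relations whose slopes and intercepts are made up for illustration, a hypothetical descriptor window, and toy candidates. Only the 10 US$/mmol price cutoff comes from the text; `ScalingRelation`, `volcano_height`, and `screen` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ScalingRelation:
    """Hypothetical linear scaling relation: G = slope * descriptor + intercept."""
    slope: float
    intercept: float

    def __call__(self, descriptor: np.ndarray) -> np.ndarray:
        return self.slope * descriptor + self.intercept


def volcano_height(descriptor, relations, reaction_energy):
    """Thermodynamic volcano: minus the most endergonic step of the cycle.

    The free catalyst sits at 0, each relation gives the next intermediate,
    and the cycle closes at the overall reaction energy (all in kcal/mol).
    """
    d = np.atleast_1d(np.asarray(descriptor, dtype=float))
    levels = [np.zeros_like(d)] + [rel(d) for rel in relations]
    levels.append(np.full_like(d, reaction_energy))
    step_energies = np.diff(np.stack(levels), axis=0)  # one row per step
    return -step_energies.max(axis=0)


def screen(candidates, window, max_price):
    """Keep candidates whose predicted descriptor lies inside the window and
    whose estimated price (US$/mmol) is below max_price."""
    lo, hi = window
    return [
        c["name"]
        for c in candidates
        if lo <= c["descriptor"] <= hi and c["price"] < max_price
    ]


# Illustrative, NOT fitted: the first intermediate's energy equals the
# descriptor; the second follows an assumed slope and intercept.
RELATIONS = [ScalingRelation(1.0, 0.0), ScalingRelation(0.5, -20.0)]
REACTION_ENERGY = -40.0  # assumed overall reaction free energy

if __name__ == "__main__":
    grid = np.linspace(-40.0, 0.0, 5)
    print(volcano_height(grid, RELATIONS, REACTION_ENERGY))
    toy = [  # hypothetical candidates, not data from this entry
        {"name": "cat_A", "descriptor": -14.0, "price": 3.0},
        {"name": "cat_B", "descriptor": -14.0, "price": 25.0},
        {"name": "cat_C", "descriptor": -35.0, "price": 1.0},
    ]
    print(screen(toy, window=(-20.0, -8.0), max_price=10.0))
```

With these toy relations the volcano peaks where all steps are equally exergonic (descriptor of about -13.3 kcal/mol); in practice the window would be read off a volcano built from fitted scaling relations.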



Files (size, MD5 checksum, description):
  • 30.9 MiB, MD5 030cd6a0e4fc77b0974e9ceb33fe8ce8: all 25,116 generated structures of the catalytic intermediates.
  • 10.6 MiB, MD5 5e6865d8715bd983a2b814d550108ba5: all 7,054 geometries of the catalytic intermediates optimized at the B3LYP-D3/3-21G level.
  • 782.4 KiB, MD5 275389a88c051b4e41b84b6c71c298e6: single-point energies computed at the B3LYP-D3/def2-TZVP level, the corresponding binding energies, and the 18,062 out-of-sample binding energies predicted by machine learning with the Coulomb Matrix, Bag of Bonds, and SLATM representations.
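The last file lists predictions obtained with three molecular representations. As a minimal sketch of how such a model can be assembled, the code below builds the Coulomb Matrix representation (Rupp et al., 2012: diagonal 0.5 Z_i^2.4, off-diagonal Z_i Z_j / |R_i - R_j|) and fits a kernel ridge regression with a Laplacian kernel. The regression method, the kernel, the row-norm sorting, the hyperparameters `sigma` and `lam`, and all function names are assumptions for illustration; Bag of Bonds and SLATM are not shown. The file counts are consistent with training on the DFT-computed structures and predicting the rest, since 7,054 + 18,062 = 25,116.

```python
import numpy as np


def coulomb_matrix(charges, coords, size):
    """Sorted Coulomb matrix, zero-padded to `size` atoms and flattened to
    its upper triangle. M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    n = len(z)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        m = np.outer(z, z) / dist  # diagonal is inf here, overwritten below
    np.fill_diagonal(m, 0.5 * z**2.4)
    # Sort atoms by row norm so the vector does not depend on atom ordering.
    order = np.argsort(-np.linalg.norm(m, axis=1))
    m = m[order][:, order]
    padded = np.zeros((size, size))
    padded[:n, :n] = m
    return padded[np.triu_indices(size)]


def laplacian_kernel(a, b, sigma):
    """K_ij = exp(-||a_i - b_j||_1 / sigma)."""
    l1 = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return np.exp(-l1 / sigma)


def krr_fit(x_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    k = laplacian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_new, sigma):
    """Predicted energies for new (out-of-sample) representations."""
    return laplacian_kernel(x_new, x_train, sigma) @ alpha
```

In use, each structure would be turned into a vector with `coulomb_matrix(z, r, size=n_max)`, where `n_max` is the largest atom count in the library, and `sigma` and `lam` would typically be chosen by cross-validation on the training set.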


Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.

External references

Journal reference
B. Meyer, B. Sawatlon, S. N. Heinen, O. A. von Lilienfeld and C. Corminboeuf. Machine learning meets volcano plots: Computational discovery of cross-coupling catalysts, Accepted. doi:10.1039/C8SC01949E


Keywords: machine learning, homogeneous catalysis, volcano plot, transition metal complexes

Version history

01 August 2018 [This version]