Reaction-agnostic featurization of bidentate ligands for Bayesian ridge regression of enantioselectivity

Alexandre A. Schoepfer1,2,3, Ruben Laplaza1,3, Matthew D. Wodrich1,3, Jerome Waser2,3*, Clemence Corminboeuf1,3*

1 Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

2 Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

3 National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

* Corresponding authors' emails: jerome.waser@epfl.ch, clemence.corminboeuf@epfl.ch

DOI: 10.24435/materialscloud:c0-7z [version v1]

Publication date: Dec 12, 2023

How to cite this record

Alexandre A. Schoepfer, Ruben Laplaza, Matthew D. Wodrich, Jerome Waser, Clemence Corminboeuf, Reaction-agnostic featurization of bidentate ligands for Bayesian ridge regression of enantioselectivity, Materials Cloud Archive 2023.193 (2023), https://doi.org/10.24435/materialscloud:c0-7z

Description

Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often non-transferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, constitution) and is well-suited for early-stage ligand optimization. Using only small datasets, our workflow capably predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective new ligands. Finally, we constructed the BDL-Cu-2023 dataset, comprising 312 bidentate ligands extracted from the Cambridge Structural Database (CSD), and screened it with this procedure to identify promising ligand candidates for a challenging asymmetric oxy-alkynylation reaction.
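To make the role of the uncertainty estimates concrete, the following is a minimal sketch of Bayesian linear regression on synthetic data, not the authors' pipeline. It assumes fixed noise and weight precisions (`alpha`, `lam`), whereas a full Bayesian ridge regression would also estimate these from the data. The features, targets, candidate ligands, and the upper-confidence-bound score with `kappa` are all hypothetical and chosen only for illustration. The sketch uses the standard library only.

```python
import random


def solve(a, b):
    """Solve a x = b for a small square system (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit(X, y, alpha, lam):
    """Return the posterior precision matrix A and the posterior mean weights."""
    d = len(X[0])
    A = [[lam * (i == j) + alpha * sum(x[i] * x[j] for x in X) for j in range(d)]
         for i in range(d)]
    rhs = [alpha * sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return A, solve(A, rhs)


def predict(A, w, x, alpha):
    """Return the predictive mean and standard deviation for one feature vector."""
    mean = sum(wi * xi for wi, xi in zip(w, x))
    # Predictive variance = noise variance + x^T A^{-1} x.
    var = 1.0 / alpha + sum(xi * si for xi, si in zip(x, solve(A, x)))
    return mean, var ** 0.5


# Synthetic "ligand" data: a bias term plus two made-up descriptors.
rng = random.Random(0)
X = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
y = [0.5 + 2.0 * x[1] - 1.0 * x[2] + rng.gauss(0, 0.1) for x in X]

alpha, lam = 100.0, 1.0  # assumed noise precision (1 / 0.1**2) and weight precision
A, w = fit(X, y, alpha, lam)

# One candidate inside the training range and one far outside it.
mean_near, std_near = predict(A, w, [1.0, 0.2, -0.3], alpha)
mean_far, std_far = predict(A, w, [1.0, 3.0, 3.0], alpha)

# A simple upper-confidence-bound score; kappa is a hypothetical trade-off weight.
kappa = 1.0
for label, mean, std in [("near", mean_near, std_near), ("far", mean_far, std_far)]:
    print(f"{label}: mean={mean:.3f} std={std:.3f} ucb={mean + kappa * std:.3f}")
```

The candidate far from the training data receives a larger predictive standard deviation, which is the signal an acquisition function can use to balance exploring unfamiliar ligands against exploiting those predicted to perform well.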

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name | Size | Description | MD5
chemiscopify.ipynb | 40.6 KiB | Notebook to generate Chemiscope files | 3398199b2ce36283d150ab56ce2c32f3
lit_xyz.tar.gz | 93.3 KiB | xyz literature ligand structures | 3049e2c0f2d6ada0f6deb7a94cdfcd60
csd_xyz.tar.gz | 217.3 KiB | xyz CSD ligand structures | 6256d527c7b0a5d379cc93b27e51c07e
mc_lit.csv | 410.2 KiB | literature ligand features | 44683dba570bfd0bfe8dee2a40292c01
mc_csd.csv | 1.3 MiB | CSD ligand features | 75c3c148a02a04d55e8605fa82455235
mc_preds_oa.csv | 3.3 KiB | OA dataset predictions | 5eb918892a52e8f3fe5cf748e304f939
mc_preds_cp.csv | 4.6 KiB | CP dataset predictions | 61b7916c6727d54275a786feb39a3be8
mc_preds_cc.csv | 4.7 KiB | CC dataset predictions | 99cb3e3d78e336fdc7523571aee48b5c
mc_preds_da_f.csv | 5.0 KiB | DA dataset predictions | 2fbc097415efe52056d7d130234e223e
lit_ligs-chemiscope.json.gz | 261.5 KiB | literature ligand chemiscope JSON | eb4f21d154d737cc4a121b507c199168
csd_ligs-chemiscope.json.gz | 730.9 KiB | CSD ligand chemiscope JSON | 36fd24082b9f8229383fbc0ed5c683a1
oa_preds-chemiscope.json.gz | 18.8 KiB | OA dataset chemiscope JSON | 75f1120e6c830e27519683bc74ecc4a9
cp_preds-chemiscope.json.gz | 22.4 KiB | CP dataset chemiscope JSON | 1adfe59be19b8a6036b01dd02af20c87
cc_preds-chemiscope.json.gz | 26.5 KiB | CC dataset chemiscope JSON | 9e81a5f4c1fc7119ae668b055bfefcb3
da_f_preds-chemiscope.json.gz | 27.6 KiB | DA dataset chemiscope JSON | bcb30d3e88e67cdf5a0858bdf5bc16f6
README.md | 653 Bytes | Read me | e3a11fad358dd332637f47dafca4b62f
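
Since each file is listed with an MD5 checksum, a download can be checked for integrity before use. The sketch below is one possible way to do this with the Python standard library. The helper names `md5_of` and `verify` are hypothetical, the `EXPECTED` dictionary copies only a subset of the checksums listed above, and the files are assumed to have been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file list of this record (subset, for illustration).
EXPECTED = {
    "README.md": "e3a11fad358dd332637f47dafca4b62f",
    "mc_lit.csv": "44683dba570bfd0bfe8dee2a40292c01",
    "mc_csd.csv": "75c3c148a02a04d55e8605fa82455235",
}


def verify(directory, expected=EXPECTED):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == checksum
    return results


# Assumes the files were downloaded into the current working directory.
for name, ok in verify(".").items():
    print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use constant, which matters little for files of this size but makes the helper reusable for larger archives.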

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference (Paper where the data is generated and used.)
A. A. Schoepfer, R. Laplaza, M. D. Wodrich, J. Waser, C. Corminboeuf, submitted.

Keywords

catalysis, homogeneous catalysis, ligands, bidentate ligands, NCCR Catalysis, EPFL

Version history:

2023.193 (version v1) [This version], Dec 12, 2023, DOI: 10.24435/materialscloud:c0-7z