Publication date: Dec 12, 2023
Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often non-transferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, constitution) and is well-suited for early-stage ligand optimization. Using only small datasets, our workflow capably predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective new ligands. Finally, we constructed the BDL-Cu-2023 dataset, comprising 312 bidentate ligands extracted from the Cambridge Structural Database (CSD), and screened it with this procedure to identify promising ligand candidates for a challenging asymmetric oxy-alkynylation reaction.
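To illustrate how uncertainty estimates from a Bayesian linear model can guide the exploration of a ligand pool, here is a minimal sketch. It is not the authors' pipeline: it uses synthetic features instead of the deposited ligand descriptors, fixes the noise precision `alpha` and the prior precision `lam` by hand (Bayesian ridge regression normally estimates them from the data), and assumes an upper-confidence-bound acquisition rule, which the record does not specify. All function names are hypothetical.

```python
import numpy as np

def fit_bayesian_linear(X, y, alpha, lam):
    """Gaussian posterior over weights for y = X w + noise.

    alpha: noise precision, lam: precision of the zero-mean weight prior.
    Both are fixed by hand here (an assumption of this sketch).
    """
    n_features = X.shape[1]
    cov = np.linalg.inv(lam * np.eye(n_features) + alpha * X.T @ X)
    mean = alpha * cov @ X.T @ y
    return mean, cov

def predict_with_std(X_new, mean, cov, alpha):
    """Predictive mean and standard deviation for each row of X_new."""
    pred = X_new @ mean
    # Noise variance plus the variance contributed by weight uncertainty.
    var = 1.0 / alpha + np.einsum("ij,jk,ik->i", X_new, cov, X_new)
    return pred, np.sqrt(var)

def upper_confidence_bound(pred, std, kappa=1.0):
    """Assumed acquisition rule: favor high predictions and high uncertainty."""
    return pred + kappa * std

# Synthetic stand-in data: 20 "measured" ligands with 3 features each.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.5, 0.0])
X_train = rng.normal(size=(20, 3))
y_train = X_train @ true_w + rng.normal(scale=0.1, size=20)

w_mean, w_cov = fit_bayesian_linear(X_train, y_train, alpha=100.0, lam=1.0)

# Rank a pool of 50 untested candidates and pick the next one to try.
X_pool = rng.normal(size=(50, 3))
pool_pred, pool_std = predict_with_std(X_pool, w_mean, w_cov, alpha=100.0)
next_candidate = int(np.argmax(upper_confidence_bound(pool_pred, pool_std)))
```

With real data, the rows of `X_train` would be the selected ligand features and `y_train` the measured selectivities. The `kappa` parameter sets how strongly the search favors uncertain candidates over confidently good ones.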
File name | MD5 | Size | Description |
---|---|---|---|
chemiscopify.ipynb | 3398199b2ce36283d150ab56ce2c32f3 | 40.6 KiB | Notebook to generate Chemiscope files |
lit_xyz.tar.gz | 3049e2c0f2d6ada0f6deb7a94cdfcd60 | 93.3 KiB | xyz literature ligand structures |
csd_xyz.tar.gz | 6256d527c7b0a5d379cc93b27e51c07e | 217.3 KiB | xyz CSD ligand structures |
mc_lit.csv | 44683dba570bfd0bfe8dee2a40292c01 | 410.2 KiB | literature ligand features |
mc_csd.csv | 75c3c148a02a04d55e8605fa82455235 | 1.3 MiB | CSD ligand features |
mc_preds_oa.csv | 5eb918892a52e8f3fe5cf748e304f939 | 3.3 KiB | OA dataset predictions |
mc_preds_cp.csv | 61b7916c6727d54275a786feb39a3be8 | 4.6 KiB | CP dataset predictions |
mc_preds_cc.csv | 99cb3e3d78e336fdc7523571aee48b5c | 4.7 KiB | CC dataset predictions |
mc_preds_da_f.csv | 2fbc097415efe52056d7d130234e223e | 5.0 KiB | DA dataset predictions |
lit_ligs-chemiscope.json.gz | eb4f21d154d737cc4a121b507c199168 | 261.5 KiB | literature ligand chemiscope JSON |
csd_ligs-chemiscope.json.gz | 36fd24082b9f8229383fbc0ed5c683a1 | 730.9 KiB | CSD ligand chemiscope JSON |
oa_preds-chemiscope.json.gz | 75f1120e6c830e27519683bc74ecc4a9 | 18.8 KiB | OA dataset chemiscope JSON |
cp_preds-chemiscope.json.gz | 1adfe59be19b8a6036b01dd02af20c87 | 22.4 KiB | CP dataset chemiscope JSON |
cc_preds-chemiscope.json.gz | 9e81a5f4c1fc7119ae668b055bfefcb3 | 26.5 KiB | CC dataset chemiscope JSON |
da_f_preds-chemiscope.json.gz | bcb30d3e88e67cdf5a0858bdf5bc16f6 | 27.6 KiB | DA dataset chemiscope JSON |
README.md | e3a11fad358dd332637f47dafca4b62f | 653 Bytes | Read me |
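
Each file in the listing above comes with an MD5 checksum, so downloads can be checked for corruption. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_downloads` are hypothetical, the download directory is an assumption, and only a subset of the listed checksums is copied into the dictionary.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record (subset).
EXPECTED_MD5 = {
    "chemiscopify.ipynb": "3398199b2ce36283d150ab56ce2c32f3",
    "lit_xyz.tar.gz": "3049e2c0f2d6ada0f6deb7a94cdfcd60",
    "csd_xyz.tar.gz": "6256d527c7b0a5d379cc93b27e51c07e",
    "README.md": "e3a11fad358dd332637f47dafca4b62f",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_downloads("downloads")`, where `downloads` is a hypothetical folder holding the retrieved files, returns one status per file. Any `mismatch` suggests the file should be downloaded again.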
2023.193 (version v1) [This version] | Dec 12, 2023 | DOI: 10.24435/materialscloud:c0-7z |