Reaction-agnostic featurization of bidentate ligands for Bayesian ridge regression of enantioselectivity

{
  "id": "1985", 
  "updated": "2023-12-12T16:21:53.963875+00:00", 
  "metadata": {
    "version": 1, 
    "contributors": [
      {
        "givennames": "Alexandre A.", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "National Center for Competence in Research \u2013 Catalysis (NCCR-Catalysis), \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Schoepfer"
      }, 
      {
        "givennames": "Ruben", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "National Center for Competence in Research \u2013 Catalysis (NCCR-Catalysis), \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Laplaza"
      }, 
      {
        "givennames": "Matthew D.", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "National Center for Competence in Research \u2013 Catalysis (NCCR-Catalysis), \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Wodrich"
      }, 
      {
        "givennames": "Jerome", 
        "affiliations": [
          "Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "National Center for Competence in Research \u2013 Catalysis (NCCR-Catalysis), \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "email": "jerome.waser@epfl.ch", 
        "familyname": "Waser"
      }, 
      {
        "givennames": "Clemence", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "National Center for Competence in Research \u2013 Catalysis (NCCR-Catalysis), \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "email": "clemence.corminboeuf@epfl.ch", 
        "familyname": "Corminboeuf"
      }
    ], 
    "title": "Reaction-agnostic featurization of bidentate ligands for Bayesian ridge regression of enantioselectivity", 
    "_oai": {
      "id": "oai:materialscloud.org:1985"
    }, 
    "keywords": [
      "catalysis", 
      "homogeneous catalysis", 
      "ligands", 
      "bidentate ligands", 
      "NCCR Catalysis", 
      "EPFL"
    ], 
    "publication_date": "Dec 12, 2023, 17:21:53", 
    "_files": [
      {
        "key": "chemiscopify.ipynb", 
        "description": "Notebook to generate Chemiscope files", 
        "checksum": "md5:3398199b2ce36283d150ab56ce2c32f3", 
        "size": 41567
      }, 
      {
        "key": "lit_xyz.tar.gz", 
        "description": "xyz literature ligand structures", 
        "checksum": "md5:3049e2c0f2d6ada0f6deb7a94cdfcd60", 
        "size": 95528
      }, 
      {
        "key": "csd_xyz.tar.gz", 
        "description": "xyz CSD ligand structures", 
        "checksum": "md5:6256d527c7b0a5d379cc93b27e51c07e", 
        "size": 222507
      }, 
      {
        "key": "mc_lit.csv", 
        "description": "literature ligand features", 
        "checksum": "md5:44683dba570bfd0bfe8dee2a40292c01", 
        "size": 420040
      }, 
      {
        "key": "mc_csd.csv", 
        "description": "CSD ligand features", 
        "checksum": "md5:75c3c148a02a04d55e8605fa82455235", 
        "size": 1331958
      }, 
      {
        "key": "mc_preds_oa.csv", 
        "description": "OA dataset predictions", 
        "checksum": "md5:5eb918892a52e8f3fe5cf748e304f939", 
        "size": 3339
      }, 
      {
        "key": "mc_preds_cp.csv", 
        "description": "CP dataset predictions", 
        "checksum": "md5:61b7916c6727d54275a786feb39a3be8", 
        "size": 4690
      }, 
      {
        "key": "mc_preds_cc.csv", 
        "description": "CC dataset predictions", 
        "checksum": "md5:99cb3e3d78e336fdc7523571aee48b5c", 
        "size": 4841
      }, 
      {
        "key": "mc_preds_da_f.csv", 
        "description": "DA dataset predictions", 
        "checksum": "md5:2fbc097415efe52056d7d130234e223e", 
        "size": 5107
      }, 
      {
        "key": "lit_ligs-chemiscope.json.gz", 
        "description": "literature ligand chemiscope JSON", 
        "checksum": "md5:eb4f21d154d737cc4a121b507c199168", 
        "size": 267788
      }, 
      {
        "key": "csd_ligs-chemiscope.json.gz", 
        "description": "CSD ligand chemiscope JSON", 
        "checksum": "md5:36fd24082b9f8229383fbc0ed5c683a1", 
        "size": 748446
      }, 
      {
        "key": "oa_preds-chemiscope.json.gz", 
        "description": "OA dataset chemiscope JSON", 
        "checksum": "md5:75f1120e6c830e27519683bc74ecc4a9", 
        "size": 19261
      }, 
      {
        "key": "cp_preds-chemiscope.json.gz", 
        "description": "CP dataset chemiscope JSON", 
        "checksum": "md5:1adfe59be19b8a6036b01dd02af20c87", 
        "size": 22901
      }, 
      {
        "key": "cc_preds-chemiscope.json.gz", 
        "description": "CC dataset chemiscope JSON", 
        "checksum": "md5:9e81a5f4c1fc7119ae668b055bfefcb3", 
        "size": 27132
      }, 
      {
        "key": "da_f_preds-chemiscope.json.gz", 
        "description": "DA dataset chemiscope JSON", 
        "checksum": "md5:bcb30d3e88e67cdf5a0858bdf5bc16f6", 
        "size": 28257
      }, 
      {
        "key": "README.md", 
        "description": "Read me", 
        "checksum": "md5:e3a11fad358dd332637f47dafca4b62f", 
        "size": 653
      }
    ], 
    "references": [
      {
        "comment": "Paper where the data is generated and used.", 
        "citation": "A. A. Schoepfer, R. Laplaza, M. D. Wodrich, J. Waser, C. Corminboeuf, submitted.", 
        "type": "Journal reference"
      }
    ], 
    "description": "Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by identifying an ideal catalyst more rapidly. These approaches, however, are often non-transferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, constitution) and is well suited for early-stage ligand optimization. Using only small datasets, our workflow accurately predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective new ligands. Finally, we constructed the BDL-Cu-2023 dataset, comprising 312 bidentate ligands extracted from the Cambridge Structural Database (CSD), and screened it with this procedure to identify promising ligand candidates for a challenging asymmetric oxy-alkynylation reaction.", 
    "status": "published", 
    "license": "Creative Commons Attribution 4.0 International", 
    "conceptrecid": "1984", 
    "is_last": true, 
    "mcid": "2023.193", 
    "edited_by": 576, 
    "id": "1985", 
    "owner": 1009, 
    "license_addendum": null, 
    "doi": "10.24435/materialscloud:c0-7z"
  }, 
  "revision": 4, 
  "created": "2023-11-22T15:12:54.753686+00:00"
}
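The `_files` list pairs each archive file with an MD5 checksum and a byte size. A minimal Python sketch for checking downloaded copies against those values, assuming the JSON export has been parsed into a dict and all files sit in one local directory (both hypothetical setup choices, not part of the record):

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large archives are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_record(record: dict, data_dir: Path) -> dict:
    """Check every entry of record["metadata"]["_files"] against local copies.

    Returns {key: status} with status "ok", "mismatch", "size-mismatch" or "missing".
    """
    results = {}
    for entry in record["metadata"]["_files"]:
        key = entry["key"]
        path = data_dir / key
        if not path.is_file():
            results[key] = "missing"
            continue
        # Cheap size check first; it catches truncated downloads without hashing.
        if path.stat().st_size != entry["size"]:
            results[key] = "size-mismatch"
            continue
        # Checksums in the record have the form "md5:<hex digest>".
        algo, _, expected = entry["checksum"].partition(":")
        if algo != "md5":
            raise ValueError(f"unsupported checksum algorithm {algo!r} for {key}")
        results[key] = "ok" if md5_of(path) == expected else "mismatch"
    return results
```

For example, with the export saved as `record.json` and the files downloaded into `data/` (hypothetical paths), `verify_record(json.loads(Path("record.json").read_text()), Path("data"))` returns one status per file key; anything other than `"ok"` points to an incomplete or corrupted download.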
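The description states that Bayesian ridge regression supplies uncertainty estimates that make Bayesian optimization over a pool of prospective ligands possible. The sketch below is a generic illustration of that idea, not the authors' pipeline: a Bayesian linear regression with a fixed noise precision `alpha` and a fixed weight-prior precision `lam`, whose predictive mean and standard deviation feed an upper-confidence-bound (UCB) rule for choosing the next candidate. The hyperparameter values, the UCB acquisition, the absence of an intercept (features and targets are assumed centered), and the convention that larger predicted values are better are all assumptions. Only the standard library is used so the linear algebra stays visible.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for a few features)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_bayesian_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
                       alpha: float = 100.0, lam: float = 1.0) -> Tuple[List[float], Matrix, float]:
    """Weight posterior for y = w . x + Gaussian noise, with fixed precisions.

    alpha: noise precision (1 / sigma^2); lam: precision of the N(0, I / lam) prior on w.
    Returns (posterior mean m, posterior covariance S, alpha).
    """
    d = len(X[0])
    # Posterior precision A = lam * I + alpha * X^T X; covariance S = A^-1.
    A = [[(lam if i == j else 0.0) + alpha * sum(x[i] * x[j] for x in X)
          for j in range(d)] for i in range(d)]
    S = _invert(A)
    # Posterior mean m = alpha * S X^T y.
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    m = [alpha * sum(S[i][j] * Xty[j] for j in range(d)) for i in range(d)]
    return m, S, alpha


def predict(model, x: Sequence[float]) -> Tuple[float, float]:
    """Predictive mean and standard deviation (noise plus weight uncertainty) at x."""
    m, S, alpha = model
    d = len(m)
    mean = sum(mi * xi for mi, xi in zip(m, x))
    var = 1.0 / alpha + sum(x[i] * S[i][j] * x[j] for i in range(d) for j in range(d))
    return mean, math.sqrt(var)


def pick_next_ucb(model, candidates: Sequence[Sequence[float]], kappa: float = 2.0) -> int:
    """Index of the candidate maximizing mean + kappa * std (upper confidence bound)."""
    scores = [mu + kappa * sd for mu, sd in (predict(model, c) for c in candidates)]
    return max(range(len(candidates)), key=scores.__getitem__)
```

scikit-learn's `BayesianRidge` additionally estimates both precisions from the data by maximizing the marginal likelihood and fits an intercept; its `predict(X, return_std=True)` returns the same kind of mean and standard deviation used here. The `kappa` parameter trades off exploitation (high predicted mean) against exploration (high uncertainty).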