Data-driven discovery of organic electronic materials enabled by hybrid top-down/bottom-up design


JSON Export

{
  "id": "1855", 
  "updated": "2023-08-10T12:47:17.700312+00:00", 
  "metadata": {
    "version": 3, 
    "contributors": [
      {
        "givennames": "J. Terence", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Blaskovits"
      }, 
      {
        "givennames": "R.", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Laplaza"
      }, 
      {
        "givennames": "S.", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "familyname": "Vela"
      }, 
      {
        "givennames": "C.", 
        "affiliations": [
          "Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "email": "clemence.corminboeuf@epfl.ch", 
        "familyname": "Corminboeuf"
      }
    ], 
    "title": "Data-driven discovery of organic electronic materials enabled by hybrid top-down/bottom-up design", 
    "_oai": {
      "id": "oai:materialscloud.org:1855"
    }, 
    "keywords": [
      "organic molecules", 
      "crystal structures", 
      "optical properties", 
      "photophysical properties", 
      "donor-acceptor copolymers"
    ], 
    "publication_date": "Aug 10, 2023, 14:47:17", 
    "_files": [
      {
        "key": "README.txt", 
        "description": "README file detailing the contents of this record.", 
        "checksum": "md5:9cfe7467dc8b61f90bcdac45a6174ddb", 
        "size": 984
      }, 
      {
        "key": "Data_FORMED.csv", 
        "description": "CSV file containing the tabulated properties for the FORMED database.", 
        "checksum": "md5:9f31404de41180f603c86027993b8677", 
        "size": 99694228
      }, 
      {
        "key": "Data_dimers_selected.csv", 
        "description": "CSV file containing the tabulated properties for the selected dimers.", 
        "checksum": "md5:4aad2e015bacc3f7be16b63a6678602b", 
        "size": 697042
      }, 
      {
        "key": "Data_dimers_predicted.csv", 
        "description": "CSV file containing the tabulated properties (obtained with ML) for the predicted dimers.", 
        "checksum": "md5:8ebee42a57392516ef58a94fdf1b7578", 
        "size": 105124607
      }, 
      {
        "key": "XYZ_FORMED.tar.gz", 
        "description": "Compressed file with all the XYZ files of the FORMED database.", 
        "checksum": "md5:584c00f6fbd6d56b0055685938848654", 
        "size": 99371312
      }, 
      {
        "key": "XYZ_dimers_selected.tar.gz", 
        "description": "Compressed file with all the XYZ files of the selected dimers.", 
        "checksum": "md5:20789d5a174f5fa27cd0226c5ca2ffa8", 
        "size": 2415084
      }, 
      {
        "key": "XYZ_dimers_predicted.tar.gz", 
        "description": "Compressed file with all the XYZ files of the predicted dimers.", 
        "checksum": "md5:088eeebd2d10dc40beba11238084124a", 
        "size": 889586236
      }, 
      {
        "key": "FORMED_chemiscope.json.gz", 
        "description": "Chemiscope file containing the properties and structures of the FORMED database.", 
        "checksum": "md5:05236f475f8c01672bc313480df7a549", 
        "size": 94007012
      }, 
      {
        "key": "Dimers_selected_chemiscope.json.gz", 
        "description": "Chemiscope file containing the properties and structures of the selected dimers.", 
        "checksum": "md5:547a7fb2245ae0a5ef4d4edd1752c1f8", 
        "size": 2131293
      }, 
      {
        "key": "chemiscopify.ipynb", 
        "description": "Notebook exemplifying how the provided XYZ structures and csv files can be combined to generate the Chemiscope json files.", 
        "checksum": "md5:3885bfbfcba50076deb2914be9e52979", 
        "size": 37872266
      }, 
      {
        "key": "Data_FORMED_scored.csv", 
        "description": "CSV file containing the tabulated properties for the FORMED database, plus SMILES, canonical SMILES, SAScores and SCScores for all but 1743 molecules, which could not be processed.", 
        "checksum": "md5:7f6c580975810525cffeb8cc63cf173f", 
        "size": 122129274
      }, 
      {
        "key": "Data_top_1500_dimers_scored.csv", 
        "description": "CSV file containing the filenames, SMILES, canonical SMILES, S1-T1-based scores, SAScores and SCScores for the top 1500 dimers from the generated subset, except for 30 molecules that could not be processed.", 
        "checksum": "md5:45a5fe3952e6308e0ee7a9add3f0052a", 
        "size": 289923
      }
    ], 
    "references": [
      {
        "comment": "Manuscript to be submitted. Reference will be updated shortly.", 
        "citation": "J. T. Blaskovits, R. Laplaza, S. Vela, C. Corminboeuf, To be submitted (2022)", 
        "type": "Journal reference"
      }
    ], 
    "description": "The high-throughput molecular exploration and screening of organic electronic materials often starts with either a 'top-down' mining of existing repositories or the 'bottom-up' assembly of fragments based on predetermined rules and known synthetic templates. In both instances, the datasets used are often produced on a case-by-case basis and require the high-quality computation of electronic properties and extensive user input: curation in the top-down approach, and the construction of a fragment library and the introduction of rules for linking fragments in the bottom-up approach. Both approaches are time-consuming and require significant computational resources. Here, we generate a top-down set named FORMED, consisting of 117K synthesized molecules together with their optimized structures, associated electronic and topological properties, and chemical composition, and use these structures as a vast library of molecular building blocks for bottom-up fragment-based materials design. A tool is developed to automate the coupling of these building blocks at their available Csp2-H bonds, thus providing a fundamental link between the two philosophies of dataset construction. Statistical models are trained on this dataset and on a subset of the resulting hybrid top-down/bottom-up compounds (selected dimers); these models enable on-the-fly prediction of key ground-state (frontier molecular orbital gaps) and excited-state (S1 and T1 energies) properties from molecular geometries with high accuracy across the whole known p-block organic chemical space.\nWith ab initio-quality optical properties in hand, this bottom-up pipeline can be applied to any materials design campaign, using existing compounds as molecular building blocks. To illustrate this, we construct and screen over a million molecular candidates (predicted dimers) for efficient intramolecular singlet fission; the leading candidates provide insight into the structural features that may promote this multiexciton-generating process.", 
    "status": "published", 
    "license": "Creative Commons Attribution 4.0 International", 
    "conceptrecid": "1560", 
    "is_last": true, 
    "mcid": "2023.124", 
    "edited_by": 576, 
    "id": "1855", 
    "owner": 643, 
    "license_addendum": null, 
    "doi": "10.24435/materialscloud:aa-2w"
  }, 
  "revision": 3, 
  "created": "2023-08-10T09:20:07.611783+00:00"
}
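Each entry in `_files` records a size in bytes and an MD5 checksum (as `"md5:<hex>"`), so downloads can be checked locally before use. The following is a minimal sketch, assuming the JSON export above has been saved as `record.json` (a hypothetical filename) in the same directory as the downloaded files. The helper names `md5_of_file` and `verify_files` are illustrative and are not part of any Materials Cloud tooling.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(record: dict, data_dir: Path) -> dict:
    """Compare local files against the "_files" entries of a record export.

    Returns a mapping from file key to one of "ok", "missing",
    "size mismatch", "checksum mismatch" or "unsupported checksum".
    """
    results = {}
    for entry in record["metadata"]["_files"]:
        key = entry["key"]
        path = data_dir / key
        if not path.is_file():
            results[key] = "missing"
            continue
        # Cheap size check first; avoids hashing multi-GB files that are truncated.
        if path.stat().st_size != entry["size"]:
            results[key] = "size mismatch"
            continue
        algorithm, _, expected = entry["checksum"].partition(":")
        # Only the "md5:<hex>" form seen in this record is handled.
        if algorithm != "md5":
            results[key] = "unsupported checksum"
            continue
        results[key] = "ok" if md5_of_file(path) == expected else "checksum mismatch"
    return results


if __name__ == "__main__":
    # Hypothetical layout: the export saved as "record.json" next to the downloads.
    data_dir = Path(".")
    record_path = data_dir / "record.json"
    if record_path.is_file():
        record = json.loads(record_path.read_text(encoding="utf-8"))
        for key, status in verify_files(record, data_dir).items():
            print(f"{status:>20}  {key}")
```

Files not yet downloaded are reported as `missing` rather than treated as errors, so the script can be run after fetching only a subset (for example, skipping the large `XYZ_dimers_predicted.tar.gz`).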