This record has versions v1, v2, v3. This is version v2.

Data-driven discovery of organic electronic materials enabled by hybrid top-down/bottom-up design

J. Terence Blaskovits¹, R. Laplaza¹, S. Vela¹, C. Corminboeuf¹*

¹ Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

* Corresponding author's email: clemence.corminboeuf@epfl.ch
DOI: 10.24435/materialscloud:nh-gb [version v2]

Publication date: Mar 22, 2023

How to cite this record

J. Terence Blaskovits, R. Laplaza, S. Vela, C. Corminboeuf, Data-driven discovery of organic electronic materials enabled by hybrid top-down/bottom-up design, Materials Cloud Archive 2023.47 (2023), https://doi.org/10.24435/materialscloud:nh-gb

Description

The high-throughput molecular exploration and screening of organic electronic materials often starts with either a 'top-down' mining of existing repositories, or the 'bottom-up' assembly of fragments based on predetermined rules and known synthetic templates. In both instances, the datasets used are often produced on a case-by-case basis, and require the high-quality computation of electronic properties and extensive user input: curation in the top-down approach, and the construction of a fragment library and introduction of rules for linking them in the bottom-up approach. Both approaches are time-consuming and require significant computational resources. Here, we generate a top-down set named FORMED consisting of 117K synthesized molecules containing their optimized structures, associated electronic and topological properties and chemical composition, and use these structures as a vast library of molecular building blocks for bottom-up fragment-based materials design. A tool is developed to automate the coupling of these building block units based on their available Csp2-H bonds, thus providing a fundamental link between the two philosophies of dataset construction. Statistical models are trained on this dataset and a subset of the resulting hybrid top-down/bottom-up compounds (selected dimers), which enable on-the-fly prediction of key ground state (frontier molecular orbital gaps) and excited state (S1 and T1 energies) properties from molecular geometries with high accuracy across all known p-block organic compound space. With access to ab initio-quality optical properties in hand, it is possible to apply this bottom-up pipeline using existing compounds as molecular building blocks to any materials design campaign. 
To illustrate this, we construct and screen over a million molecular candidates (predicted dimers) for efficient intramolecular singlet fission, the leading candidates of which provide insight into the structural features that may promote this multiexciton-generating process.
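As a rough illustration of the kind of screen described above, the sketch below filters candidates by a commonly used energetic prerequisite for singlet fission, E(S1) ≥ 2E(T1). This is not necessarily the S1-T1-based score used by the authors. The column names (`name`, `S1_eV`, `T1_eV`), the sample values and the `tolerance` parameter are all hypothetical; the actual headers of the CSV files in this record should be checked before adapting it.

```python
import csv
import io

# Hypothetical stand-in for a few rows of a properties table. The column names
# (name, S1_eV, T1_eV) and the values are illustrative, not taken from the record.
SAMPLE_CSV = """name,S1_eV,T1_eV
dimer_a,2.40,1.10
dimer_b,3.00,1.80
dimer_c,2.50,1.00
"""


def sf_driving_force(s1, t1):
    """Return E(S1) - 2*E(T1) in eV; non-negative means fission is not uphill."""
    return s1 - 2.0 * t1


def screen(rows, tolerance=0.0):
    """Keep rows with E(S1) - 2*E(T1) >= -tolerance, largest driving force first."""
    hits = []
    for row in rows:
        gap = sf_driving_force(float(row["S1_eV"]), float(row["T1_eV"]))
        if gap >= -tolerance:
            hits.append((row["name"], round(gap, 3)))
    return sorted(hits, key=lambda item: item[1], reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, gap in screen(rows):
    print(f"{name}: E(S1) - 2E(T1) = {gap:+.3f} eV")
```

A small positive `tolerance` admits slightly endoergic candidates, which may be useful when the predicted energies carry model error.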

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name | Size | MD5 | Description
README.txt | 984 Bytes | 9cfe7467dc8b61f90bcdac45a6174ddb | README file detailing the contents of this record.
Data_FORMED.csv | 95.1 MiB | 9f31404de41180f603c86027993b8677 | CSV file containing the tabulated properties for the FORMED database.
Data_dimers_selected.csv | 680.7 KiB | 4aad2e015bacc3f7be16b63a6678602b | CSV file containing the tabulated properties for the selected dimers.
Data_dimers_predicted.csv | 100.4 MiB | ae8916898d920626f346d6c6ba8dd42b | CSV file containing the tabulated properties (obtained with ML) for the predicted dimers.
XYZ_FORMED.tar.gz | 94.8 MiB | 584c00f6fbd6d56b0055685938848654 | Compressed file with all the XYZ files of the FORMED database.
XYZ_dimers_selected.tar.gz | 2.3 MiB | 20789d5a174f5fa27cd0226c5ca2ffa8 | Compressed file with all the XYZ files of the selected dimers.
XYZ_dimers_predicted.tar.gz | 855.0 MiB | 2f54d2274ed5fe3ceda5027ea7d567a9 | Compressed file with all the XYZ files of the predicted dimers.
FORMED_chemiscope.json.gz | 89.7 MiB | 05236f475f8c01672bc313480df7a549 | Chemiscope file containing the properties and structures of the FORMED database.
Dimers_selected_chemiscope.json.gz | 2.0 MiB | 547a7fb2245ae0a5ef4d4edd1752c1f8 | Chemiscope file containing the properties and structures of the selected dimers.
Dimers_predicted_chemiscope.json.gz | 785.9 MiB | 1682a79e2f29fffadffeadb7173a4733 | Chemiscope file containing the properties and structures of the predicted dimers.
chemiscopify.ipynb | 36.1 MiB | 3885bfbfcba50076deb2914be9e52979 | Notebook exemplifying how the provided XYZ structures and csv files can be combined to generate the Chemiscope json files.
Data_FORMED_scored.csv | 116.5 MiB | 7f6c580975810525cffeb8cc63cf173f | CSV file containing the tabulated properties for the FORMED database plus SMILES, canonical SMILES and SAScores and SCScores for all except 1743 molecules which could not be processed.
Data_top_1500_dimers_scored.csv | 283.1 KiB | 45a5fe3952e6308e0ee7a9add3f0052a | CSV file containing the filenames, SMILES, canonical SMILES, S1-T1-based scores and SAScores and SCScores for 1500 top dimers from the generated subset except 30 molecules which could not be processed.
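Since each file is listed with an MD5 digest, downloads can be checked for corruption before use. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_downloads` are illustrative, and only three of the smaller files are included in the lookup table for brevity. It assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record (subset, for brevity).
EXPECTED_MD5 = {
    "README.txt": "9cfe7467dc8b61f90bcdac45a6174ddb",
    "Data_dimers_selected.csv": "4aad2e015bacc3f7be16b63a6678602b",
    "XYZ_dimers_selected.tar.gz": "20789d5a174f5fa27cd0226c5ca2ffa8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Check the current working directory; adjust the path as needed.
    for name, status in verify_downloads(".").items():
        print(f"{name}: {status}")
```

Reading in chunks matters here because several archives in this record are hundreds of MiB in size.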

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference (Manuscript to be submitted. Reference will be updated shortly.)
J. T. Blaskovits, R. Laplaza, S. Vela, C. Corminboeuf, To be submitted (2022)

Keywords

organic molecules, crystal structures, optical properties, photophysical properties, donor-acceptor copolymers