Publication date: Mar 22, 2023
The high-throughput molecular exploration and screening of organic electronic materials often starts with either a 'top-down' mining of existing repositories or the 'bottom-up' assembly of fragments based on predetermined rules and known synthetic templates. In both cases, the datasets are usually produced on a case-by-case basis. They require high-quality computation of electronic properties and extensive user input: curation in the top-down approach, and in the bottom-up approach the construction of a fragment library and the introduction of rules for linking its fragments. Both approaches are time-consuming and computationally demanding. Here, we generate a top-down set named FORMED, consisting of 117K synthesized molecules together with their optimized structures, associated electronic and topological properties, and chemical composition. We then use these structures as a vast library of molecular building blocks for bottom-up fragment-based materials design. We develop a tool that automates the coupling of these building blocks through their available Csp2-H bonds, thus providing a fundamental link between the two philosophies of dataset construction. Statistical models trained on this dataset and on a subset of the resulting hybrid top-down/bottom-up compounds (the selected dimers) enable on-the-fly prediction, from molecular geometries, of key ground-state (frontier molecular orbital gap) and excited-state (S1 and T1 energy) properties with high accuracy across the whole known p-block organic chemical space. With ab initio-quality optical properties in hand, this bottom-up pipeline can be applied to any materials design campaign, using existing compounds as molecular building blocks.
To illustrate this, we construct and screen over a million molecular candidates (the predicted dimers) for efficient intramolecular singlet fission. The leading candidates provide insight into the structural features that may promote this multiexciton-generating process.
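The singlet fission screen rests on an energetic condition. A commonly used necessary (though not sufficient) criterion is that splitting a singlet exciton into two triplets should not be uphill in energy, i.e. E(S1) ≥ 2E(T1). The Python sketch below applies only that criterion to rows of a CSV table. It is not the scoring function used in this work, and the column names `S1` and `T1` (assumed to be in eV) are hypothetical placeholders; the real headers of Data_dimers_predicted.csv should be checked in README.txt.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical column names: the record's CSV headers may differ (see README.txt).
# Energies are assumed to be in eV.
S1_COL = "S1"
T1_COL = "T1"


def fission_driving_force(s1: float, t1: float) -> float:
    """Return E(S1) - 2*E(T1); a value >= 0 means fission is not uphill in energy."""
    return s1 - 2.0 * t1


def screen_singlet_fission(rows: Iterable[dict], min_gap: float = 0.0) -> Iterator[dict]:
    """Yield the rows whose S1 - 2*T1 driving force is at least `min_gap` (eV)."""
    for row in rows:
        try:
            s1 = float(row[S1_COL])
            t1 = float(row[T1_COL])
        except (KeyError, ValueError):
            continue  # skip rows with a missing column or a non-numeric energy
        if fission_driving_force(s1, t1) >= min_gap:
            yield row


# Toy rows standing in for a real CSV file; the values are made up for illustration.
demo_csv = io.StringIO(
    "name,S1,T1\n"
    "mol_a,2.30,1.10\n"
    "mol_b,2.00,1.20\n"
    "mol_c,,0.90\n"
)
hits = list(screen_singlet_fission(csv.DictReader(demo_csv)))
print([row["name"] for row in hits])  # ['mol_a']
```

To run it on the real data, one would pass `csv.DictReader(open("Data_dimers_predicted.csv", newline=""))` after adjusting `S1_COL` and `T1_COL`; raising `min_gap` keeps only clearly exoergic candidates.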
File name | Size | Description | MD5 |
---|---|---|---|
README.txt | 984 Bytes | README file detailing the contents of this record. | 9cfe7467dc8b61f90bcdac45a6174ddb |
Data_FORMED.csv | 95.1 MiB | CSV file containing the tabulated properties for the FORMED database. | 9f31404de41180f603c86027993b8677 |
Data_dimers_selected.csv | 680.7 KiB | CSV file containing the tabulated properties for the selected dimers. | 4aad2e015bacc3f7be16b63a6678602b |
Data_dimers_predicted.csv | 100.4 MiB | CSV file containing the tabulated ML-predicted properties for the predicted dimers. | ae8916898d920626f346d6c6ba8dd42b |
XYZ_FORMED.tar.gz | 94.8 MiB | Compressed archive with all the XYZ files of the FORMED database. | 584c00f6fbd6d56b0055685938848654 |
XYZ_dimers_selected.tar.gz | 2.3 MiB | Compressed archive with all the XYZ files of the selected dimers. | 20789d5a174f5fa27cd0226c5ca2ffa8 |
XYZ_dimers_predicted.tar.gz | 855.0 MiB | Compressed archive with all the XYZ files of the predicted dimers. | 2f54d2274ed5fe3ceda5027ea7d567a9 |
FORMED_chemiscope.json.gz | 89.7 MiB | Chemiscope file containing the properties and structures of the FORMED database. | 05236f475f8c01672bc313480df7a549 |
Dimers_selected_chemiscope.json.gz | 2.0 MiB | Chemiscope file containing the properties and structures of the selected dimers. | 547a7fb2245ae0a5ef4d4edd1752c1f8 |
Dimers_predicted_chemiscope.json.gz | 785.9 MiB | Chemiscope file containing the properties and structures of the predicted dimers. | 1682a79e2f29fffadffeadb7173a4733 |
chemiscopify.ipynb | 36.1 MiB | Notebook showing how the provided XYZ structures and CSV files can be combined to generate the Chemiscope JSON files. | 3885bfbfcba50076deb2914be9e52979 |
Data_FORMED_scored.csv | 116.5 MiB | CSV file containing the tabulated properties for the FORMED database plus SMILES, canonical SMILES, SAScore and SCScore for all molecules except 1743 that could not be processed. | 7f6c580975810525cffeb8cc63cf173f |
Data_top_1500_dimers_scored.csv | 283.1 KiB | CSV file containing the filenames, SMILES, canonical SMILES, S1/T1-based scores, SAScore and SCScore for the top 1500 dimers of the generated subset, except 30 molecules that could not be processed. | 45a5fe3952e6308e0ee7a9add3f0052a |
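Each file in the table above is listed with an MD5 checksum, and the structures are shipped as XYZ files inside .tar.gz archives. The Python sketch below, using only the standard library, verifies a download against its listed checksum and streams the XYZ structures straight out of an archive without unpacking it. It assumes that every archive member ending in `.xyz` is a single-frame file in the standard XYZ layout (atom count, comment line, then `symbol x y z` rows); the helper names and the toy water structure are illustrative and not part of the record.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple

# Checksums copied from the file table (subset); compare them after downloading.
EXPECTED_MD5 = {
    "README.txt": "9cfe7467dc8b61f90bcdac45a6174ddb",
    "Data_FORMED.csv": "9f31404de41180f603c86027993b8677",
    "XYZ_FORMED.tar.gz": "584c00f6fbd6d56b0055685938848654",
}

Atom = Tuple[str, float, float, float]


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's MD5 hex digest, reading 1 MiB chunks so memory use stays bounded."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse one standard XYZ frame: atom count, comment line, then 'symbol x y z' rows."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip() if len(lines) > 1 else ""
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return comment, atoms


def iter_xyz_archive(archive: Path) -> Iterator[Tuple[str, str, List[Atom]]]:
    """Yield (member name, comment, atoms) for each .xyz member without extracting to disk."""
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".xyz")):
                continue
            handle = tar.extractfile(member)
            if handle is not None:
                comment, atoms = parse_xyz(handle.read().decode("utf-8"))
                yield member.name, comment, atoms


# Self-contained demo on a toy archive holding one made-up water structure.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.tar.gz"
    payload = b"3\nwater (toy example)\nO 0.0 0.0 0.0\nH 0.0 0.757 0.586\nH 0.0 -0.757 0.586\n"
    with tarfile.open(demo, "w:gz") as tar:
        info = tarfile.TarInfo("demo/water.xyz")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    demo_digest = md5sum(demo)
    demo_entries = list(iter_xyz_archive(demo))

print(demo_entries[0][0], len(demo_entries[0][2]))  # demo/water.xyz 3

# With the real downloads in the working directory (file locations are an assumption):
# for name, expected in EXPECTED_MD5.items():
#     assert md5sum(Path(name)) == expected, f"checksum mismatch for {name}"
```

Streaming members with `tarfile.extractfile` avoids unpacking the 855.0 MiB dimer archive to disk; pairing each structure with its CSV row is what chemiscopify.ipynb demonstrates for the Chemiscope files.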
Version | Date | DOI |
---|---|---|
2023.124 (v3) | Aug 10, 2023 | 10.24435/materialscloud:aa-2w |
2023.47 (v2, this version) | Mar 22, 2023 | 10.24435/materialscloud:nh-gb |
2022.162 (v1) | Dec 05, 2022 | 10.24435/materialscloud:j6-e2 |