Dictionary of 140k GDB and ZINC derived AMONs

Bing Huang1,2*, Anatole von Lilienfeld1,2*

1 Department of Chemistry, University of Basel, CH-4056 Basel, Switzerland

2 Faculty of Physics, University of Vienna, 1090 Wien, Austria

* Corresponding authors emails: hbdft2008@gmail.com, anatole.vonlilienfeld@gmail.com
DOI: 10.24435/materialscloud:1s-51 [version v1]

Publication date: Apr 11, 2021

How to cite this record

Bing Huang, Anatole von Lilienfeld, Dictionary of 140k GDB and ZINC derived AMONs, Materials Cloud Archive 2021.61 (2021), https://doi.org/10.24435/materialscloud:1s-51

Description

We present AGZ7, the set of all AMONs of the GDB and ZINC databases with no more than 7 non-hydrogen atoms: a calculated organic chemistry building-block dictionary based on the AMON approach [Huang and von Lilienfeld, Nature Chemistry (2020)]. AGZ7 records Cartesian coordinates of compositional and constitutional isomers, as well as properties, for ∼140k small organic molecules. These molecules were obtained by systematically fragmenting all molecules of ZINC and the majority of GDB-17 into smaller entities that contain no more than 7 heavy (non-hydrogen) atoms and are saturated with hydrogens. AGZ7 covers the elements H, B, C, N, O, F, Si, P, S, Cl, Br, Sn and I. It includes optimized geometries, total energy and its decomposition, Mulliken atomic charges, dipole moment vectors, quadrupole tensors, electronic spatial extent, eigenvalues of all occupied orbitals, LUMO, gap, isotropic polarizability, harmonic frequencies, reduced masses, force constants, IR intensity, normal coordinates, rotational constants, zero-point energy, internal energy, enthalpy, entropy, free energy, and heat capacity (all at ambient conditions), computed at the B3LYP/cc-pVTZ level of theory (pseudopotentials were used for Sn and I). We exemplify the usefulness of this data set with AMON-based machine learning models that predict the total potential energy of seven of the most rigid GDB-17 molecules.
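The 7-heavy-atom cutoff described above can be illustrated with a short sketch. The Python snippet below counts non-hydrogen atoms in a SMILES string using a deliberately simplified tokenizer. The names count_heavy_atoms, is_within_cutoff and MAX_HEAVY_ATOMS, as well as the regular expression, are illustrative assumptions and not part of the AGZ7 tooling; it is not a full SMILES parser, and a cheminformatics toolkit would be the safer choice for real work.

```python
import re

# Cutoff taken from the record description: at most 7 non-hydrogen atoms.
MAX_HEAVY_ATOMS = 7

# Simplified SMILES atom matcher (an assumption for illustration only):
#  - bracket atoms such as [Si], [nH], [NH3+], [13CH4], [H]
#  - organic-subset atoms outside brackets (Cl and Br tried before C and B)
#  - aromatic lowercase atoms b, c, n, o, p, s
_ATOM = re.compile(
    r"\[\d*(?P<br>[A-Z][a-z]?|[bcnops])[^\]]*\]"
    r"|(?P<org>Cl|Br|[BCNOPSFI]|[bcnops])"
)


def count_heavy_atoms(smiles: str) -> int:
    """Count atoms other than hydrogen in a SMILES string (simplified)."""
    count = 0
    for match in _ATOM.finditer(smiles):
        symbol = match.group("br") or match.group("org")
        # Explicit hydrogens only appear as bracket atoms, e.g. [H] or [2H].
        if symbol != "H":
            count += 1
    return count


def is_within_cutoff(smiles: str, limit: int = MAX_HEAVY_ATOMS) -> bool:
    """Return True if the molecule has at most `limit` heavy atoms."""
    return count_heavy_atoms(smiles) <= limit


if __name__ == "__main__":
    for smi in ["CCO", "c1ccccc1", "C[Si](C)(C)Cl", "CCCCCCCC"]:
        print(smi, count_heavy_atoms(smi), is_within_cutoff(smi))
```

A check of this kind could be used as a quick sanity filter on the SMILES files of this record, under the assumption that they hold one SMILES string per entry.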

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name | Size | MD5 | Description
README.md | 2.9 KiB | a84a7b7306bc700b92254f3b47c19041 | README file
zinc.can | 1.2 MiB | 2ae912085b1226832a2734e723b5cf3b | SMILES strings of AMONs unique to ZINC
gdb17.can | 173.0 KiB | cb8994a40f3aa64b778b33d2dcecd68a | SMILES strings of AMONs unique to GDB17
gdb17-zinc-comm.can | 345.2 KiB | f7311fd7a704b54e830829195335f691 | SMILES strings of AMONs shared by ZINC & GDB17
zinc.tar.gz | 634.6 MiB | 90b07d3d60018e6ee6cfec0bd90c36a9 | JSON files containing geometry and properties for all ZINC AMONs
gdb17.tar.gz | 258.3 MiB | be6cc52e6d6308a4e05f310bc19de083 | JSON files containing geometry and properties for all GDB17 AMONs
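
Since each file is listed with an MD5 checksum, a download can be checked against the record before use. The following is a minimal Python sketch using only the standard library; the helper names md5_of_file and verify are illustrative assumptions, and the checksums are copied from the file list of this record.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "README.md": "a84a7b7306bc700b92254f3b47c19041",
    "zinc.can": "2ae912085b1226832a2734e723b5cf3b",
    "gdb17.can": "cb8994a40f3aa64b778b33d2dcecd68a",
    "gdb17-zinc-comm.can": "f7311fd7a704b54e830829195335f691",
    "zinc.tar.gz": "90b07d3d60018e6ee6cfec0bd90c36a9",
    "gdb17.tar.gz": "be6cc52e6d6308a4e05f310bc19de083",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its name."""
    name = Path(path).name
    return md5_of_file(path) == expected[name]


if __name__ == "__main__":
    # Only files present in the current directory are checked.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

Reading in chunks keeps memory use low for the larger archives (zinc.tar.gz is 634.6 MiB).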

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

Keywords

building blocks, quantum machine learning, organic chemical space, SNSF, MARVEL, ERC

Version history:

2021.61 (version v1) [This version] Apr 11, 2021 DOI: 10.24435/materialscloud:1s-51