ReDD-COFFEE database

This is the landing page for the ReDD-COFFEE database, the Ready-to-use and Diverse Database of Covalent Organic Frameworks with Force field based Energy Evaluation. This database contains a diverse set of 268 687 covalent organic frameworks (COFs) and accompanying ab initio derived, system-specific force fields.

If you use any of this data, please cite the corresponding paper.

Structures

All 268 687 COF structures are collected in the provided tar files, which are ordered by linkage type and dimensionality. To untar the folders, run

tar -xf Linkage_ND.tar.xz

Each optimized structure file is named top_SBU1_SBU2_..._SBUN_optimized.chk, where top is the topology of the structure and SBUX is the SBU placed on the X-th Wyckoff set of the topology. The structure files are provided in the molmod .chk format (see molmod.github.io/molmod), but can be converted to .cif files using the write_to_cif.py post-processing script.
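The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of the database tooling: the helper name parse_structure_name is hypothetical, and it assumes that underscores only separate the fields and never occur inside a topology or SBU name.

```python
from pathlib import Path


def parse_structure_name(filename):
    """Split 'top_SBU1_..._SBUN_optimized.chk' into (topology, [SBU1, ..., SBUN])."""
    name = Path(filename).name  # drop any leading folder such as Linkage_ND/
    suffix = "_optimized.chk"
    if not name.endswith(suffix):
        raise ValueError(f"unexpected structure file name: {filename}")
    # Assumption: '_' is used only as a field separator.
    fields = name[: -len(suffix)].split("_")
    if len(fields) < 2 or not all(fields):
        raise ValueError(f"expected a topology and at least one SBU: {filename}")
    return fields[0], fields[1:]


if __name__ == "__main__":
    # 'hcb', 'A' and 'B' are placeholder names chosen for illustration only.
    topology, sbus = parse_structure_name("Linkage_ND/hcb_A_B_optimized.chk")
    print(topology, sbus)
```

The position of each SBU in the returned list corresponds to the Wyckoff set it occupies, so sbus[0] is the SBU on the first Wyckoff set.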

Force field parameters

The force field parameters of all structures are collected in the pars.txt file. This is a Yaff parameter file (see molmod.github.io/yaff), which can be used directly to start molecular simulations in Yaff. To generate the force field for a specific structure, run the following lines in a Python interpreter:

from yaff import System, ForceField
system = System.from_file('structure.chk')
ff = ForceField.generate(system, 'pars.txt', **kwargs)

For more information on how to start molecular simulations, or convert the files to other formats, we refer to the molmod and Yaff documentation.

Post-processing scripts

extract_subset.sh

As discussed in the original paper, a subset can be extracted from the ReDD-COFFEE database that has approximately the same variety and disparity as the full database. To this end, the structures are sorted using a maxmin algorithm, and the resulting order is given in the ordered_maxmin.txt file. To extract a subset, run the extract_subset.sh script in a terminal:

bash extract_subset.sh N

Here, N is the number of structures required in the subset. If N is not provided, the subset contains 10 000 structures and is identical to the subset mentioned in the paper.
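To illustrate the idea behind a maxmin ordering, the sketch below implements a generic greedy farthest-point selection in plain Python. It is not the code used to produce ordered_maxmin.txt: the descriptors and distance metric of the paper are not reproduced here, and the toy feature vectors, the Euclidean distance, and the function name maxmin_order are all assumptions made for illustration.

```python
import math


def maxmin_order(points, start=0):
    """Return the indices of `points` in greedy maxmin (farthest-point) order.

    Each step appends the point whose distance to its nearest
    already-selected point is largest.
    """
    n = len(points)
    if n == 0:
        return []
    order = [start]
    selected = {start}
    # Distance from every point to the closest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(order) < n:
        candidates = (i for i in range(n) if i not in selected)
        nxt = max(candidates, key=lambda i: min_dist[i])
        order.append(nxt)
        selected.add(nxt)
        # Update the nearest-selected distances with the new point.
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return order


if __name__ == "__main__":
    # Toy one-dimensional "descriptors", for illustration only.
    toy = [(0.0,), (1.0,), (10.0,), (5.0,)]
    order = maxmin_order(toy)
    print(order)
    print(order[:2])  # a diverse subset of size 2 is a prefix of the ordering
```

Because every prefix of such an ordering is itself spread out over the data, a subset of any size N can be taken as the first N entries, which is consistent with the single ordered list that extract_subset.sh relies on.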

extract_pars.sh

Since the pars.txt file contains the force field parameters of all structures, most of its lines are redundant when loading the parameters for one specific structure. To extract only the necessary parameters, use the extract_pars.sh script:

bash extract_pars.sh struct.chk
bash extract_pars.sh struct1.chk struct2.chk
bash extract_pars.sh BoronateEster_2D/*.chk
bash extract_pars.sh */*.chk

For each structure argument, the script generates a separate file pars_STRUCT.txt, with STRUCT being the structure name, containing only the force field parameters required to generate the force field for that specific structure. The file is placed in the same folder as the .chk file given in the argument, and can replace the pars.txt argument in the Python command above.

write_to_cif.py

To convert the molmod .chk format to the common .cif format, use the write_to_cif.py script. This script requires the molmod and Yaff packages to be installed.

python write_to_cif.py struct.chk
python write_to_cif.py struct1.chk struct2.chk
python write_to_cif.py BoronateEster_2D/*.chk
python write_to_cif.py */*.chk

For each structure argument, the script generates a separate file STRUCT.cif, with STRUCT being the structure name, in the same folder as the .chk file.
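Both extract_pars.sh and write_to_cif.py write their output next to the input .chk file. The sketch below shows how the expected output paths could be derived in Python, for example to check which files already exist. The helper name output_paths is hypothetical, and the sketch assumes that STRUCT is the file name without its .chk extension, which is not stated explicitly above.

```python
from pathlib import Path


def output_paths(chk_path):
    """Return the expected (pars file, cif file) paths for one .chk structure."""
    chk = Path(chk_path)
    if chk.suffix != ".chk":
        raise ValueError(f"expected a .chk file, got: {chk_path}")
    struct = chk.stem  # assumption: STRUCT is the file name without '.chk'
    pars_file = chk.with_name(f"pars_{struct}.txt")  # from extract_pars.sh
    cif_file = chk.with_name(f"{struct}.cif")        # from write_to_cif.py
    return pars_file, cif_file


if __name__ == "__main__":
    pars_file, cif_file = output_paths("BoronateEster_2D/struct.chk")
    print(pars_file)
    print(cif_file)
```

Under the same assumption, the returned pars file path can be passed to ForceField.generate in place of 'pars.txt'.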