Published November 7, 2025 | Version v1
Dataset
Open
NaviDiv: a web app for monitoring chemical diversity in generative molecular design
Description
Rapid progress in generative models for molecular design has produced extensive libraries of candidate molecules for biological and chemical applications. However, ensuring that these molecules are diverse and representative of the broader chemical space remains challenging: without adequate monitoring tools, researchers often over-explore limited regions or miss promising candidates. This work presents NaviDiv (Navigating Diversity in Chemical Space), a comprehensive web-based framework for managing chemical diversity in string-based generative molecular design through three integrated capabilities: multi-metric diversity analysis capturing structural, syntactic, and molecular framework variations; interactive real-time visualization enabling immediate detection of model collapse; and adaptive constraint generation that dynamically guides optimization while preserving diversity. Through a singlet fission material discovery case study using REINVENT4, we demonstrate that different diversity metrics (i.e., structural similarity, fragment composition, and sequence patterns) respond differently during optimization, with constraint effectiveness depending critically on representational alignment with the generative model. N-gram-based constraints outperform fingerprint-based approaches because they correspond directly to SMILES generation, and combined constraints maintain diversity across all metrics while achieving optimization performance within 15% of unconstrained baselines. The framework is freely available at https://github.com/LCMD-epfl/NaviDiv, providing accessible tools for data-driven decisions about diversity-property trade-offs in automated molecular discovery.
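To make the idea of a sequence-pattern (n-gram) diversity signal concrete, the following is a minimal illustrative sketch and not NaviDiv's implementation. It assumes character-level n-grams, which is a simplification (a real SMILES tokenizer treats multi-character atoms such as `Cl` or bracket atoms as single tokens), and the function names and the `max_fraction` threshold are hypothetical. It reports which n-grams occur in an unusually large share of generated molecules, the kind of pattern a diversity constraint might then penalize.

```python
from collections import Counter


def char_ngrams(smiles, n):
    """Yield every contiguous character n-gram of a SMILES string."""
    for i in range(len(smiles) - n + 1):
        yield smiles[i:i + n]


def ngram_molecule_counts(smiles_list, n=3):
    """Count, for each n-gram, how many molecules contain it at least once."""
    counts = Counter()
    for smiles in smiles_list:
        # A set ensures each n-gram is counted once per molecule.
        counts.update(set(char_ngrams(smiles, n)))
    return counts


def overrepresented_ngrams(smiles_list, n=3, max_fraction=0.5):
    """Return n-grams present in more than max_fraction of the molecules.

    max_fraction is a hypothetical threshold chosen for illustration.
    """
    total = len(smiles_list)
    if total == 0:
        return {}
    counts = ngram_molecule_counts(smiles_list, n)
    return {
        gram: count / total
        for gram, count in counts.items()
        if count / total > max_fraction
    }
```

For example, `overrepresented_ngrams(["CCO", "CCN", "CCC", "OCC"], n=2)` flags only `"CC"`, which appears in every molecule of that toy list.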
This dataset contains all materials generated during the singlet fission material discovery case study presented in this work. It includes: (1) the REINVENT4 prior model used for molecule generation, (2) the trained ChemProp model checkpoints used to evaluate singlet fission potential, and (3) the generated molecules from each experimental run. For each run, molecules are stored in a CSV file alongside the corresponding chemical diversity analysis outputs in the same directory.
Files
(1.7 GiB)
| MD5 checksum | Size |
|---|---|
| e65bb9759095ad344a3f3226ace581be | 2.1 KiB |
| 5e0bca4b6672e9b8729e0c54647238d4 | 3.3 MiB |
| 3f8d1cc22605428f5ce399b87473e124 | 14.6 KiB |
| 99e5450ed3045344b2c995e8191e1ce9 | 20.7 MiB |
| aa7f9e44058b531e4bef08dfa183bac1 | 1.5 KiB |
| 3827c05e75f8a13f41f6d34f850c170c | 242.9 MiB |
| 79da704e3b45e87f2f760ab0a3392744 | 1.5 GiB |
| 2fb8b9603d9de5df73ac2ea7905a1b1a | 3.6 KiB |
| f25dc4f78b23e6518ea7ecdfbdb45407 | 9.8 KiB |
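The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. The sketch below shows one way to do this with Python's standard library; the helper names are illustrative assumptions, and any local file path passed to them is up to the user, since the record listing here does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use bounded for the larger files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Check a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Calling `verify_download(local_path, "md5:79da704e3b45e87f2f760ab0a3392744")` on the downloaded 1.5 GiB file should return `True` if the file is intact.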
References
Journal reference (paper in which the data and methods are discussed; submitted to Digital Discovery, under review): M. Azzouzi, T. Worakul, C. Corminboeuf, Digital Discovery XX, XX (20XX).
Preprint (conference proceeding where this work was first presented): M. Azzouzi, T. Worakul, C. Corminboeuf, AI for Accelerated Materials Design - NeurIPS 2025.