Published November 7, 2025 | Version v1
Dataset Open

NaviDiv: a web app for monitoring chemical diversity in generative molecular design

  • 1. École Polytechnique Fédérale de Lausanne

* Contact person

Description

The rapid progress in generative models for molecular design has led to extensive libraries of candidate molecules for biological and chemical applications. However, ensuring that these molecules are diverse and representative of broader chemical space remains challenging: researchers often over-explore limited regions or miss promising candidates because monitoring tools are inadequate. This work presents NaviDiv (Navigating Diversity in Chemical Space), a comprehensive web-based framework for managing chemical diversity in string-based generative molecular design through three integrated capabilities: multi-metric diversity analysis capturing structural, syntactic, and molecular framework variations; interactive real-time visualization enabling immediate detection of model collapse; and adaptive constraint generation that dynamically guides optimization while preserving diversity. Through a singlet fission material discovery case study using REINVENT4, we demonstrate that different diversity metrics (i.e., structural similarity, fragment composition, and sequence patterns) respond differently during optimization, with constraint effectiveness depending critically on representational alignment with the generative model. N-gram-based constraints outperform fingerprint-based approaches because they correspond directly to SMILES generation, and combined constraints maintain diversity across all metrics while achieving optimization performance within 15% of unconstrained baselines. The framework is freely available at https://github.com/LCMD-epfl/NaviDiv, providing accessible tools for data-driven decisions about diversity-property trade-offs in automated molecular discovery.
This dataset contains all materials generated during the singlet fission material discovery case study presented in this work. It includes: (1) the REINVENT4 prior model used for molecule generation, (2) the trained ChemProp model checkpoints used to evaluate singlet fission potential, and (3) the generated molecules from each experimental run. For each run, molecules are stored in a CSV file alongside the corresponding chemical diversity analysis outputs in the same directory.
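
To illustrate the kind of sequence-pattern (n-gram) measure mentioned in the description, the following is a minimal sketch in Python. It is not NaviDiv's implementation: the function names are hypothetical, and it splits SMILES strings character by character, which is a simplification (a real SMILES tokenizer would treat multi-character atoms such as Cl or Br as single tokens). It only shows how the share of distinct n-grams in a batch of SMILES strings can act as a rough diversity signal.

```python
from collections import Counter


def smiles_ngrams(smiles: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of a SMILES string.

    Assumption: character-level splitting, not a chemistry-aware tokenizer.
    """
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def ngram_diversity(smiles_list: list[str], n: int = 3) -> float:
    """Fraction of distinct n-grams among all n-grams in the batch.

    Values near 1 mean few repeated patterns; values near 0 mean the
    batch keeps reusing the same substrings. Returns 0.0 for an empty batch.
    """
    counts = Counter(g for s in smiles_list for g in smiles_ngrams(s, n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


if __name__ == "__main__":
    batch = ["CCO", "CCN", "c1ccccc1"]  # illustrative molecules only
    print(ngram_diversity(batch, n=2))
```

A batch that collapses onto a few repeated motifs would score lower on this ratio than a batch with varied substrings, which is the intuition behind monitoring sequence patterns during generation.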

Files

Files (1.7 GiB)

Checksum                              Size
md5:e65bb9759095ad344a3f3226ace581be  2.1 KiB
md5:5e0bca4b6672e9b8729e0c54647238d4  3.3 MiB
md5:3f8d1cc22605428f5ce399b87473e124  14.6 KiB
md5:99e5450ed3045344b2c995e8191e1ce9  20.7 MiB
md5:aa7f9e44058b531e4bef08dfa183bac1  1.5 KiB
md5:3827c05e75f8a13f41f6d34f850c170c  242.9 MiB
md5:79da704e3b45e87f2f760ab0a3392744  1.5 GiB
md5:2fb8b9603d9de5df73ac2ea7905a1b1a  3.6 KiB
md5:f25dc4f78b23e6518ea7ecdfbdb45407  9.8 KiB
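
The MD5 values listed above can be used to check that a download completed without corruption. The sketch below uses only the Python standard library (Python 3.9 or later for `str.removeprefix`). The helper names are hypothetical, and the file path in the usage comment is a placeholder, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large files (e.g. the 1.5 GiB entry) are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed: str) -> bool:
    """Compare a local file against a checksum written as 'md5:<hex>'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file you actually downloaded):
# matches_record("downloaded_file", "md5:e65bb9759095ad344a3f3226ace581be")
```

A result of `False` suggests the file should be downloaded again before use.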

References

Journal reference (paper where the data and methods are discussed; submitted to Digital Discovery, under review)
M. Azzouzi, T. Worakul, C. Corminboeuf, Digital Discovery XX, XX (20XX)

Preprint (conference proceeding where this work was first presented)
M. Azzouzi, T. Worakul, C. Corminboeuf, AI for Accelerated Materials Design - NeurIPS 2025.