NaviDiv Submission Files - Data Repository for Paper

This repository contains all data, configurations, and scripts used to generate the results presented in the paper. It serves as a reference for reproducing the experiments and figures.

Table of Contents

Repository Overview
Prerequisites
Initial Setup
Reproducing Paper Results
Directory Structure
Generating Figures
Configuration Details
Troubleshooting

Repository Overview

This repository contains:

Configuration files for all experiments (conf_folder/)
Complete experimental data used in the paper (Singlet_fission_run/)
Scripts for running REINVENT with NaviDiv (run_reinvent*.sh)
Figure generation scripts (Figures_script/)
Trained models and priors (SF_model/, reinvent_prior/)
Path management utilities for easy setup on different systems

Prerequisites

Required Software

Anaconda or Miniconda - for environment management
Python 3.8+ - installed through conda
CUDA-capable GPU (optional but recommended)
Navi_diversity package - the main NaviDiv codebase

Installing Navi_diversity

Clone the Navi_diversity repository:

git clone https://github.com/your-org/Navi_diversity.git
cd Navi_diversity
# Follow installation instructions in the Navi_diversity repository
conda env create -f environment.yml
conda activate NaviDiv_test

Note: Replace your-org with the actual organization/user hosting the Navi_diversity repository.

Initial Setup

Step 1: Update Paths for Your System

All configuration files contain absolute paths that need to be updated to match your local environment. We provide an automated tool for this:

cd /path/to/NaviDiv_submission_files_2

# Interactive setup (recommended for first-time users)
bash update_all_paths.sh

What it does:

Prompts you for the path to your Navi_diversity installation
Automatically updates all configuration files (.yaml, .toml, .sh, .py)
Creates backups of modified files in .backups/ directory
Generates a .env.template file with your configuration
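
The behaviour listed above can be sketched in a few lines of Python. This is an illustration only and not the contents of update_paths.py: the function name rewrite_paths, its old_prefix and new_prefix arguments, and the layout of the backup directory are assumptions made for the example.

```python
import shutil
from pathlib import Path

# File types the update tool is described as touching.
SUFFIXES = {".yaml", ".toml", ".sh", ".py"}


def rewrite_paths(root, old_prefix, new_prefix, backup_dir=".backups"):
    """Replace old_prefix with new_prefix in config-like files under root.

    Hypothetical helper: a copy of each modified file is stored under
    root/backup_dir before the file is rewritten. Returns the changed files.
    """
    root = Path(root)
    backups = root / backup_dir
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        if backups in path.parents:
            continue  # never rewrite the backups themselves
        text = path.read_text(encoding="utf-8")
        if old_prefix not in text:
            continue
        target = backups / path.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # keep the original before editing
        path.write_text(text.replace(old_prefix, new_prefix), encoding="utf-8")
        changed.append(path)
    return changed
```

Copying each file before rewriting it means a wrong prefix can be undone by restoring from the backup directory.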

Non-interactive setup (alternative):

python3 update_paths.py --navidiv-path /path/to/Navi_diversity

Step 2: Verify Your Configuration

After running the path update script, check that paths are correct:

Open .env.template and verify:
- NAVIDIV_PATH points to your Navi_diversity installation
- WORKSPACE_ROOT points to this repository
Check a sample config file:
```
cat conf_folder/test.yaml
```
Ensure paths are correct for your system.
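
The manual check of .env.template can also be scripted. The sketch below assumes the file consists of simple KEY=VALUE lines, optionally prefixed with export; the actual format written by the update script may differ. The helper names read_env_file and missing_paths are hypothetical.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines (optionally prefixed with 'export')."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any (assumed quoting style).
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_paths(values, keys=("NAVIDIV_PATH", "WORKSPACE_ROOT")):
    """Return the keys that are unset or do not point at an existing directory."""
    return [k for k in keys if not values.get(k) or not Path(values[k]).is_dir()]
```

Calling missing_paths(read_env_file(".env.template")) returns an empty list when both variables point at existing directories.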

Step 3: Set Up Environment Variables

# Source the environment template
source .env.template

# Or create a permanent .env file
cp .env.template .env
# Edit .env with your preferred paths
source .env

Reproducing Paper Results

The paper presents results from multiple experimental runs with different diversity scoring configurations. All data is already included in the Singlet_fission_run/ directory.

Paper Experiments Location

The experiments used in the paper are located in:

Singlet_fission_run/
├── experiment_1206/      # First complete run (date: 12 June)
├── experiment_1306/      # Second complete run (date: 13 June)
├── experiment_1406_1/    # Third run, replicate 1 (date: 14 June)
├── experiment_1406_2/    # Third run, replicate 2
├── experiment_1406_3/    # Third run, replicate 3
├── experiment_1406_4/    # Third run, replicate 4
└── experiment_1406_5/    # Third run, replicate 5

Each experiment folder contains subdirectories for different diversity scoring configurations:

All_constraints/ - Combined high constraints (all diversity metrics enabled)
All_weak_constraints/ - Combined low constraints (relaxed thresholds)
fragement_only/ - Fragment-based diversity only
ngram_only/ - N-gram-based diversity only
scaffold_only/ - Scaffold-based diversity only
similarity_only/ - Similarity-based diversity only

Running New Experiments

To reproduce the experiments or run new ones:

Quick Test Run (100 steps)

# Make sure environment is set up
source .env

# Run a test with one diversity scorer
./run_reinvent_updated.sh

This runs REINVENT for 100 steps (quick test) with only the first diversity scorer configuration, because the script contains a break statement that stops after the first scorer.

Full Paper Reproduction

To run the full experiments as in the paper, modify run_reinvent_updated.sh:

Set maximum steps to 1000 (paper value):

# In run_reinvent_updated.sh, change:
reinvent_common.max_steps=100
# to:
reinvent_common.max_steps=1000

Remove the break statement to run all diversity scorers:

# In run_reinvent_updated.sh, remove or comment out:
# break

Run multiple replicates:

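# Run five replicates, one RUN_INDEX per run.
# Passing RUN_INDEX through the environment only works if the script does not
# overwrite it; otherwise edit RUN_INDEX in the script before each run.
for i in {1..5}; do
    RUN_INDEX=$i ./run_reinvent_updated.sh
done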
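
The same replicate loop can be driven from Python, which makes it easier to log or dry-run. This is a sketch only. It assumes that run_reinvent_updated.sh reads RUN_INDEX from its environment rather than overwriting it, which should be checked in the script first. The function run_replicates and its runner argument are hypothetical.

```python
import os
import subprocess


def run_replicates(script="./run_reinvent_updated.sh", indices=range(1, 6),
                   runner=subprocess.run):
    """Call the run script once per replicate with RUN_INDEX in its environment.

    Assumption: the script reads RUN_INDEX from the environment instead of
    overwriting it. `runner` is injectable so the loop can be dry-run.
    """
    results = []
    for i in indices:
        # Copy the current environment and add RUN_INDEX for this replicate.
        env = dict(os.environ, RUN_INDEX=str(i))
        results.append(runner([script], env=env, check=True))
    return results
```

With the default runner, check=True makes the loop stop at the first replicate that exits with a non-zero status.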

Key Parameters in Run Script

ENV_NAME: Conda environment name (default: NaviDiv_test)
CONFIG_NAME: Configuration file to use (default: test)
WD: Working directory for output
RUN_INDEX: Run number for organizing replicates
reinvent_common.max_steps: Number of RL steps (100 for test, 1000 for paper)

Configuration Files

All diversity scorer configurations are in:

conf_folder/diversity_scorer/
├── 1_default.yaml              # Baseline configuration
├── All_constraints.yaml        # All metrics with high thresholds
├── All_weak_constraints.yaml   # All metrics with low thresholds
├── fragement_only.yaml         # Fragment diversity only
├── ngram_only.yaml             # N-gram diversity only
├── scaffold_only.yaml          # Scaffold diversity only
└── similarity_only.yaml        # Tanimoto similarity only

To use a specific configuration:

# Edit run_reinvent_updated.sh
# Change the diversity_scorer parameter:
diversity_scorer="All_constraints"  # or any other config name

Directory Structure

NaviDiv_submission_files_2/
├── README.md                          # This file
├── update_paths.py                    # Path update utility
├── update_all_paths.sh                # Interactive path setup
├── run_reinvent_updated.sh            # Main run script (updated paths)
├── run_reinvent.sh                    # Original run script (legacy)
│
├── conf_folder/                       # Configuration files
│   ├── test.yaml                      # Main REINVENT config
│   ├── default_config.toml            # Transfer learning config
│   ├── diversity_scorer/              # Diversity scoring configs
│   │   ├── 1_default.yaml
│   │   ├── All_constraints.yaml
│   │   ├── All_weak_constraints.yaml
│   │   ├── fragement_only.yaml
│   │   ├── ngram_only.yaml
│   │   ├── scaffold_only.yaml
│   │   └── similarity_only.yaml
│   └── reinvent_common/               # Common REINVENT settings
│
├── Singlet_fission_run/               # ** PAPER DATA - All experimental results **
│   ├── experiment_1206/               # First complete run
│   │   ├── All_constraints/           # Results for each diversity config
│   │   ├── All_weak_constraints/
│   │   ├── fragement_only/
│   │   ├── ngram_only/
│   │   ├── scaffold_only/
│   │   └── similarity_only/
│   ├── experiment_1306/               # Second complete run
│   └── experiment_1406_[1-5]/         # Five replicates of third run
│
├── Figures_script/                    # ** Figure generation scripts **
│   ├── README.md                      # Detailed plotting guide
│   ├── plot_steps_multi_experiment.py # Generate plots from multiple runs
│   ├── plot_steps_single_experiment.py # Generate plots from single run
│   ├── customize_figure_multi_experiment.py # Customize and filter plots
│   ├── raw/                           # Generated raw plots
│   │   ├── steps_plot_multi_experiment.pkl
│   │   └── steps_plot_multi_experiment.png
│   └── modified/                      # Customized final plots
│       └── steps_plot_multi_experiment_second.png
│
├── SF_model/                          # Singlet fission model
│   ├── formed.prior                   # Pre-trained prior
│   ├── agents/                        # Agent checkpoints
│   └── formed_chemprop/               # ChemProp model files
│
├── reinvent_prior/                    # REINVENT prior model
│   └── formed.prior
│
└── outputs/                           # Test outputs (not used in paper)

Generating Figures

All figures in the paper were generated using the scripts in Figures_script/. See the detailed guide in Figures_script/README.md.

Quick Start - Regenerate Paper Figures

cd Figures_script

# Step 1: Generate raw plots from all paper experiments
python plot_steps_multi_experiment.py

This will:

Read data from Singlet_fission_run/experiment_*/
Generate plots for all diversity metrics
Save outputs to raw/steps_plot_multi_experiment.png and .pkl

# Step 2: Customize plots to show only specific panels
python customize_figure_multi_experiment.py

This will:

Load the saved pickle file
Select specific axes (subplots) to display
Apply custom styling (labels, colors, fonts)
Save customized figure to modified/steps_plot_multi_experiment_second.png

Customizing Which Plots to Show

To select different panels for your figure, edit customize_figure_multi_experiment.py:

# Change keep_indices to show different subplots
# Indices correspond to the order in the raw figure
fig_custom = customize_axes(
    axes,
    keep_indices=[12, 13, 4, 5],  # Show only these subplot indices
    # ... other parameters
)

How to find subplot indices:

Open raw/steps_plot_multi_experiment.png
Count subplots from top-left to bottom-right (starting at 0)
Note the indices of the plots you want to keep
Update keep_indices in the customize script
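
Counting from top-left to bottom-right corresponds to row-major order, so an index can be computed instead of counted by hand. The helpers below are a sketch. They assume the axes are stored in that order, and ncols (the number of subplot columns in the raw figure) must be read off the image because it is not stated here.

```python
def flat_index(row, col, ncols):
    """Index of the subplot at (row, col), counting left to right, top to bottom."""
    if not 0 <= col < ncols:
        raise ValueError("col must be in range(ncols)")
    return row * ncols + col


def grid_position(index, ncols):
    """Inverse of flat_index: return (row, col) for a flat subplot index."""
    return divmod(index, ncols)
```

For example, in a hypothetical grid with 4 columns, grid_position(12, 4) gives (3, 0), that is, the first subplot in the fourth row.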

Available Metrics in Plots

The following diversity metrics are plotted:

Score - Overall reward score from REINVENT
Prior - Negative log-likelihood from prior model
Appeared more than 10 times - Structures appearing in >10% of molecules
mean_distance - Average Tanimoto distance (diversity)
mean_similarity - Average Tanimoto similarity
Percentage of Unique Fragments - Fragment diversity ratio
Unique Circles (Morgan Fingerprint) - Circular fingerprint diversity
10-gram statistics - N-gram based diversity metrics
Scaffold statistics - Scaffold diversity metrics

For detailed plotting instructions, see Figures_script/README.md.

Configuration Details

Main REINVENT Configuration (`conf_folder/test.yaml`)

Key parameters:

run_mode: Type of REINVENT run (e.g., "transfer_learning", "reinforcement_learning")
max_steps: Number of RL optimization steps (100 for testing, 1000 for paper)
prior_path: Path to pre-trained prior model
agent_path: Path to agent checkpoint
diversity_scorer: Which diversity configuration to use

Diversity Scorer Configurations

Each YAML file in conf_folder/diversity_scorer/ defines:

Enabled metrics: Which diversity metrics to calculate
Thresholds: Penalty thresholds for each metric
Weights: Relative importance of each metric
Scoring mode: How penalties are combined

Example structure:

diversity_metrics:
  fragment_diversity:
    enabled: true
    threshold: 0.7
    weight: 1.0
  ngram_diversity:
    enabled: true
    threshold: 0.8
    weight: 1.0
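
A configuration with this shape can be sanity-checked before a long run. The sketch below works on a plain dictionary that mirrors the example structure, as a YAML loader would typically return it. The key names come from the example only, and the checks (thresholds within [0, 1], non-negative weights) are assumptions rather than rules taken from the NaviDiv code.

```python
# Mirrors the example structure above as a plain dict; the key names are
# taken from that example only.
example = {
    "diversity_metrics": {
        "fragment_diversity": {"enabled": True, "threshold": 0.7, "weight": 1.0},
        "ngram_diversity": {"enabled": True, "threshold": 0.8, "weight": 1.0},
    }
}


def enabled_metrics(config):
    """Return {metric: (threshold, weight)} for metrics marked enabled."""
    out = {}
    for name, opts in config.get("diversity_metrics", {}).items():
        if opts.get("enabled", False):
            out[name] = (opts["threshold"], opts["weight"])
    return out


def check_config(config):
    """Collect human-readable problems instead of failing on the first one."""
    problems = []
    for name, (threshold, weight) in enabled_metrics(config).items():
        # Assumed valid ranges; adjust if the real scorer uses other scales.
        if not 0.0 <= threshold <= 1.0:
            problems.append(f"{name}: threshold {threshold} outside [0, 1]")
        if weight < 0:
            problems.append(f"{name}: negative weight {weight}")
    return problems
```

Disabled metrics are skipped, so only settings that would actually affect scoring are reported.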

Model Files

Prior Model (SF_model/formed.prior, reinvent_prior/formed.prior):
- Pre-trained generative model for sampling molecules
- Used as baseline for RL optimization
Agent Checkpoints (SF_model/agents/agent_*.chkpt):
- Saved agent states during training
- Can be used to resume training or analyze learning progression
Property Predictor (SF_model/formed_chemprop/):
- ChemProp model for predicting singlet fission properties
- Used as reward function during RL

Troubleshooting

Common Issues

1. Path Errors

Problem: FileNotFoundError or paths not found

Solution:

# Re-run path update script
bash update_all_paths.sh

# Verify that no paths from the original system remain in the config files
grep -r "/media/mohammed" conf_folder/
# Should return nothing if the paths were updated correctly

2. Conda Environment Not Found

Problem: Environment 'NaviDiv_test' not found

Solution:

# Create environment from Navi_diversity repository
cd /path/to/Navi_diversity
conda env create -f environment.yml

# Or check existing environments
conda env list

3. CUDA/GPU Errors

Problem: CUDA out of memory or GPU not available

Solution:

# Edit conf_folder/test.yaml to use CPU
device: "cpu"  # instead of "cuda:0"

# Or reduce batch size
batch_size: 50  # instead of 100

4. Import Errors

Problem: ModuleNotFoundError: No module named 'navidiv'

Solution:

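# Make sure PYTHONPATH is set (NAVIDIV_PATH must already be defined, e.g. via `source .env`)
# The ${VAR:+...} form avoids a leading ":" when PYTHONPATH is empty
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${NAVIDIV_PATH}/src/navidiv/reinvent"

# Or add to .bashrc for a permanent fix (NAVIDIV_PATH must also be exported in .bashrc)
echo 'export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${NAVIDIV_PATH}/src/navidiv/reinvent"' >> ~/.bashrc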
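
To confirm whether the change took effect, the interpreter can be asked directly whether it can locate the module. This is a small diagnostic sketch, not part of the repository; the function names is_importable and report are hypothetical, and navidiv is the module name from the error message above.

```python
import importlib.util
import sys


def is_importable(module_name):
    """True if Python can locate a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(module_name="navidiv"):
    """Describe whether module_name is visible and which paths were searched."""
    if is_importable(module_name):
        return f"{module_name}: found"
    searched = "\n  ".join(p for p in sys.path if p)
    return f"{module_name}: NOT found; sys.path entries searched:\n  {searched}"
```

Running print(report()) inside the activated conda environment shows whether the PYTHONPATH entry appears among the searched locations.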

5. Permission Errors

Problem: Cannot write to output directory

Solution:

# Ensure output directories exist and are writable
mkdir -p test_case_3
chmod 755 test_case_3

Getting Help

If you encounter issues:

Check the Navi_diversity repository documentation
Verify all paths are correctly updated
Ensure conda environment is activated
Check log files in the output directory

Citation

If you use this data or code in your research, please cite:

@article{your_paper_2025,
  title={Your Paper Title},
  author={Your Name and Others},
  journal={Journal Name},
  year={2025}
}

License

[Add your license information here]

Contact

For questions or issues:

Open an issue in the repository
Contact: [your-email@example.com]

Acknowledgments

This work was supported by [funding sources]. We thank the developers of REINVENT and the NaviDiv framework.

Last Updated: October 2025
