This repository contains all data, configurations, and scripts used to generate the results presented in the paper. It serves as a reference for reproducing the experiments and figures.
This repository contains:
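- Configuration files (conf_folder/)
- Experimental data (Singlet_fission_run/)
- Run scripts (run_reinvent*.sh)
- Figure generation scripts (Figures_script/)
- Models (SF_model/, reinvent_prior/)

Clone the Navi_diversity repository: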
git clone https://github.com/your-org/Navi_diversity.git
cd Navi_diversity
# Follow installation instructions in the Navi_diversity repository
conda env create -f environment.yml
conda activate NaviDiv_test
Note: Replace your-org with the actual organization/user hosting the Navi_diversity repository.
All configuration files contain absolute paths that need to be updated to match your local environment. We provide an automated tool for this:
cd /path/to/NaviDiv_submission_files_2
# Interactive setup (recommended for first-time users)
bash update_all_paths.sh
What it does:
- Updates paths in configuration files (.yaml, .toml, .sh, .py)
- Creates backups in the .backups/ directory
- Generates a .env.template file with your configuration

Manual setup (alternative):
python3 update_paths.py --navidiv-path /path/to/Navi_diversity
After running the path update script, check that paths are correct:
Open .env.template and verify:
- NAVIDIV_PATH points to your Navi_diversity installation
- WORKSPACE_ROOT points to this repository

Check a sample config file:
cat conf_folder/test.yaml
Ensure paths are correct for your system.
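The manual check above can also be automated. The following is a minimal sketch, not part of the repository: the function name `find_stale_paths` is hypothetical, the file suffixes are the ones the update tool is described as handling, and the default prefix `/media/mohammed` is the one searched for in the troubleshooting `grep` command. It lists every line under a folder that still contains the old absolute path prefix.

```python
from pathlib import Path

# Suffixes the path update tool is described as handling.
CONFIG_SUFFIXES = {".yaml", ".toml", ".sh", ".py"}


def find_stale_paths(root, stale_prefix):
    """Return (file, line number, line) for each line that still contains stale_prefix."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in CONFIG_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if stale_prefix in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed prefix: the one used in the troubleshooting grep; adjust if yours differs.
    for file, number, line in find_stale_paths("conf_folder", "/media/mohammed"):
        print(f"{file}:{number}: {line}")
```

An empty result suggests that no file under conf_folder/ still refers to the original author's paths.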
# Source the environment template
source .env.template
# Or create a permanent .env file
cp .env.template .env
# Edit .env with your preferred paths
source .env
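Before launching a run, it can help to confirm that the variables in .env point to real directories. The sketch below assumes that .env consists of simple `KEY=VALUE` lines, optionally prefixed with `export`; the actual layout of .env.template may differ. The helper names `parse_env_text` and `missing_directories` are hypothetical, and the parser does not expand shell variables such as `$HOME`.

```python
from pathlib import Path

# Variables the setup instructions ask you to verify.
REQUIRED_KEYS = ("NAVIDIV_PATH", "WORKSPACE_ROOT")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and comments and a leading 'export '."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Remove surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_directories(values, required=REQUIRED_KEYS):
    """Return required keys that are unset or do not point to an existing directory."""
    return [k for k in required if not values.get(k) or not Path(values[k]).is_dir()]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        problems = missing_directories(parse_env_text(env_file.read_text()))
        print("All required paths exist." if not problems else f"Check these keys: {problems}")
    else:
        print("No .env file found; run `cp .env.template .env` first.")
```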
The paper presents results from multiple experimental runs with different diversity scoring configurations. All data is already included in the Singlet_fission_run/ directory.
The experiments used in the paper are located in:
Singlet_fission_run/
├── experiment_1206/ # First complete run (date: Jun 12)
├── experiment_1306/ # Second complete run (date: Jun 13)
├── experiment_1406_1/ # Third run, replicate 1 (date: Jun 14)
├── experiment_1406_2/ # Third run, replicate 2
├── experiment_1406_3/ # Third run, replicate 3
├── experiment_1406_4/ # Third run, replicate 4
└── experiment_1406_5/ # Third run, replicate 5
Each experiment folder contains subdirectories for different diversity scoring configurations:
- All_constraints/ - Combined high constraints (all diversity metrics enabled)
- All_weak_constraints/ - Combined low constraints (relaxed thresholds)
- fragement_only/ - Fragment-based diversity only
- ngram_only/ - N-gram-based diversity only
- scaffold_only/ - Scaffold-based diversity only
- similarity_only/ - Similarity-based diversity only

To reproduce the experiments or run new ones:
# Make sure environment is set up
source .env
# Run a test with one diversity scorer
./run_reinvent_updated.sh
This will run REINVENT for 100 steps (quick test) with the first diversity scorer configuration.
To run the full experiments as in the paper, modify run_reinvent_updated.sh:
Set maximum steps to 1000 (paper value):
# In run_reinvent_updated.sh, change:
reinvent_common.max_steps=100
# to:
reinvent_common.max_steps=1000
Remove the break statement to run all diversity scorers:
# In run_reinvent_updated.sh, remove or comment out:
# break
Run multiple replicates:
# Change RUN_INDEX for each replicate
for i in {1..5}; do
    # Edit RUN_INDEX in the script or pass as parameter
    RUN_INDEX=$i ./run_reinvent_updated.sh
done
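The replicate loop above can also be driven from Python, which makes it easy to preview the runs before starting them. This is a sketch under one important assumption: that run_reinvent_updated.sh reads RUN_INDEX from the environment instead of overwriting it internally. The function names are hypothetical.

```python
import os
import subprocess


def replicate_environments(run_indices, base_env=None):
    """Build one environment mapping per replicate, each with RUN_INDEX set."""
    base = dict(os.environ if base_env is None else base_env)
    return [{**base, "RUN_INDEX": str(i)} for i in run_indices]


def run_replicates(script, run_indices, dry_run=True):
    """Run the script once per replicate; with dry_run=True only print the plan."""
    for env in replicate_environments(run_indices):
        if dry_run:
            print(f"RUN_INDEX={env['RUN_INDEX']} {script}")
        else:
            # check=True stops at the first failing replicate.
            subprocess.run([script], env=env, check=True)


if __name__ == "__main__":
    # Preview the five replicates; pass dry_run=False to launch them.
    run_replicates("./run_reinvent_updated.sh", range(1, 6), dry_run=True)
```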
- ENV_NAME: Conda environment name (default: NaviDiv_test)
- CONFIG_NAME: Configuration file to use (default: test)
- WD: Working directory for output
- RUN_INDEX: Run number for organizing replicates
- reinvent_common.max_steps: Number of RL steps (100 for test, 1000 for paper)

All diversity scorer configurations are in:
conf_folder/diversity_scorer/
├── 1_default.yaml # Baseline configuration
├── All_constraints.yaml # All metrics with high thresholds
├── All_weak_constraints.yaml # All metrics with low thresholds
├── fragement_only.yaml # Fragment diversity only
├── ngram_only.yaml # N-gram diversity only
├── scaffold_only.yaml # Scaffold diversity only
└── similarity_only.yaml # Tanimoto similarity only
To use a specific configuration:
# Edit run_reinvent_updated.sh
# Change the diversity_scorer parameter:
diversity_scorer="All_constraints" # or any other config name
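A typo in the configuration name is easy to make (note that `fragement_only` is spelled this way in the repository). The sketch below assumes that a valid configuration name is simply the file name of a YAML file in conf_folder/diversity_scorer/ without its extension. `available_scorers` and `check_scorer` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def available_scorers(config_dir="conf_folder/diversity_scorer"):
    """Return the names (file stems) of all YAML files in config_dir."""
    directory = Path(config_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.yaml"))


def check_scorer(name, config_dir="conf_folder/diversity_scorer"):
    """Return name unchanged, or raise ValueError if no matching YAML file exists."""
    names = available_scorers(config_dir)
    if name not in names:
        raise ValueError(f"Unknown diversity scorer {name!r}; available: {names}")
    return name


if __name__ == "__main__":
    # Run from the repository root to list the selectable configurations.
    print(available_scorers())
```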
NaviDiv_submission_files_2/
├── README.md # This file
├── update_paths.py # Path update utility
├── update_all_paths.sh # Interactive path setup
├── run_reinvent_updated.sh # Main run script (updated paths)
├── run_reinvent.sh # Original run script (legacy)
│
├── conf_folder/ # Configuration files
│ ├── test.yaml # Main REINVENT config
│ ├── default_config.toml # Transfer learning config
│ ├── diversity_scorer/ # Diversity scoring configs
│ │ ├── All_constraints.yaml
│ │ ├── All_weak_constraints.yaml
│ │ ├── fragement_only.yaml
│ │ ├── ngram_only.yaml
│ │ ├── scaffold_only.yaml
│ │ └── similarity_only.yaml
│ └── reinvent_common/ # Common REINVENT settings
│
├── Singlet_fission_run/ # ** PAPER DATA - All experimental results **
│ ├── experiment_1206/ # First complete run
│ │ ├── All_constraints/ # Results for each diversity config
│ │ ├── All_weak_constraints/
│ │ ├── fragement_only/
│ │ ├── ngram_only/
│ │ ├── scaffold_only/
│ │ └── similarity_only/
│ ├── experiment_1306/ # Second complete run
│ └── experiment_1406_[1-5]/ # Five replicates of third run
│
├── Figures_script/ # ** Figure generation scripts **
│ ├── README.md # Detailed plotting guide
│ ├── plot_steps_multi_experiment.py # Generate plots from multiple runs
│ ├── plot_steps_single_experiment.py # Generate plots from single run
│ ├── customize_figure_multi_experiment.py # Customize and filter plots
│ ├── raw/ # Generated raw plots
│ │ ├── steps_plot_multi_experiment.pkl
│ │ └── steps_plot_multi_experiment.png
│ └── modified/ # Customized final plots
│ └── steps_plot_multi_experiment_second.png
│
├── SF_model/ # Singlet fission model
│ ├── formed.prior # Pre-trained prior
│ ├── agents/ # Agent checkpoints
│ └── formed_chemprop/ # ChemProp model files
│
├── reinvent_prior/ # REINVENT prior model
│ └── formed.prior
│
└── outputs/ # Test outputs (not used in paper)
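To confirm that a download or clone contains the complete paper data, the layout above can be checked programmatically. The sketch below is an illustration with a hypothetical function name. It assumes that every experiment folder holds the same six configuration subdirectories that the tree shows for experiment_1206.

```python
from pathlib import Path

# Experiment folders and configuration subdirectories as listed in the tree above.
EXPERIMENTS = ["experiment_1206", "experiment_1306"] + [
    f"experiment_1406_{i}" for i in range(1, 6)
]
CONFIGS = [
    "All_constraints",
    "All_weak_constraints",
    "fragement_only",
    "ngram_only",
    "scaffold_only",
    "similarity_only",
]


def missing_result_dirs(root="Singlet_fission_run"):
    """Return every expected experiment/configuration directory that is absent."""
    root = Path(root)
    return [
        str(root / experiment / config)
        for experiment in EXPERIMENTS
        for config in CONFIGS
        if not (root / experiment / config).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_result_dirs()
    print("Layout complete." if not missing else "Missing:\n" + "\n".join(missing))
```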
All figures in the paper were generated using the scripts in Figures_script/. See the detailed guide in Figures_script/README.md.
cd Figures_script
# Step 1: Generate raw plots from all paper experiments
python plot_steps_multi_experiment.py
This will:
- Load data from Singlet_fission_run/experiment_*/
- Save raw/steps_plot_multi_experiment.png and .pkl

# Step 2: Customize plots to show only specific panels
python customize_figure_multi_experiment.py
This will:
- Save modified/steps_plot_multi_experiment_second.png

To select different panels for your figure, edit customize_figure_multi_experiment.py:
# Change keep_indices to show different subplots
# Indices correspond to the order in the raw figure
fig_custom = customize_axes(
    axes,
    keep_indices=[12, 13, 4, 5],  # Show only these subplot indices
    # ... other parameters
)
How to find subplot indices:
- Open raw/steps_plot_multi_experiment.png
- Update keep_indices in the customize script

The following diversity metrics are plotted:
For detailed plotting instructions, see Figures_script/README.md.
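When reading subplot indices off the raw figure, it may help to convert between a panel's grid position and its flat index. The sketch below assumes that the axes are numbered row by row from the top left, starting at 0, which is how a flattened Matplotlib subplot grid is ordered. The column count of 4 is purely hypothetical; use the number of columns in your raw figure.

```python
def flat_index(row, col, ncols):
    """Row-major index of the subplot at zero-based (row, col)."""
    if row < 0 or not 0 <= col < ncols:
        raise ValueError(f"Position ({row}, {col}) is outside a grid with {ncols} columns")
    return row * ncols + col


def grid_position(index, ncols):
    """Zero-based (row, col) of the subplot with the given row-major index."""
    if index < 0 or ncols <= 0:
        raise ValueError("index must be non-negative and ncols positive")
    return divmod(index, ncols)


if __name__ == "__main__":
    NCOLS = 4  # Hypothetical column count, not taken from the plotting scripts.
    for index in [12, 13, 4, 5]:
        row, col = grid_position(index, NCOLS)
        print(f"index {index} -> row {row}, column {col}")
```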
Main REINVENT configuration (conf_folder/test.yaml)
Key parameters:
- run_mode: Type of REINVENT run (e.g., "transfer_learning", "reinforcement_learning")
- max_steps: Number of RL optimization steps (100 for testing, 1000 for paper)
- prior_path: Path to pre-trained prior model
- agent_path: Path to agent checkpoint
- diversity_scorer: Which diversity configuration to use

Each YAML file in conf_folder/diversity_scorer/ defines:
Example structure:
diversity_metrics:
  fragment_diversity:
    enabled: true
    threshold: 0.7
    weight: 1.0
  ngram_diversity:
    enabled: true
    threshold: 0.8
    weight: 1.0
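As a rough illustration of how such a configuration might be inspected, the sketch below works on a plain dictionary that mirrors the example structure. This is the shape `yaml.safe_load` from PyYAML would return for it. The function `enabled_metrics` is hypothetical, and the real configuration files may contain additional keys.

```python
# Dictionary mirroring the example structure above.
EXAMPLE_CONFIG = {
    "diversity_metrics": {
        "fragment_diversity": {"enabled": True, "threshold": 0.7, "weight": 1.0},
        "ngram_diversity": {"enabled": True, "threshold": 0.8, "weight": 1.0},
    }
}


def enabled_metrics(config):
    """Return {metric name: (threshold, weight)} for every enabled metric."""
    metrics = config.get("diversity_metrics", {})
    return {
        name: (settings["threshold"], settings["weight"])
        for name, settings in metrics.items()
        if settings.get("enabled", False)
    }


if __name__ == "__main__":
    for name, (threshold, weight) in enabled_metrics(EXAMPLE_CONFIG).items():
        print(f"{name}: threshold={threshold}, weight={weight}")
```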
Prior Model (SF_model/formed.prior, reinvent_prior/formed.prior):
Agent Checkpoints (SF_model/agents/agent_*.chkpt):
Property Predictor (SF_model/formed_chemprop/):
Problem: FileNotFoundError or paths not found
Solution:
# Re-run path update script
bash update_all_paths.sh
# Verify paths in config files
grep -r "/media/mohammed" conf_folder/
# Should return empty if paths are updated correctly
Problem: Environment 'NaviDiv_test' not found
Solution:
# Create environment from Navi_diversity repository
cd /path/to/Navi_diversity
conda env create -f environment.yml
# Or check existing environments
conda env list
Problem: CUDA out of memory or GPU not available
Solution:
# Edit conf_folder/test.yaml to use CPU
device: "cpu" # instead of "cuda:0"
# Or reduce batch size
batch_size: 50 # instead of 100
Problem: ModuleNotFoundError: No module named 'navidiv'
Solution:
# Make sure PYTHONPATH is set
export PYTHONPATH="${PYTHONPATH}:${NAVIDIV_PATH}/src/navidiv/reinvent"
# Or add to .bashrc for permanent fix
echo 'export PYTHONPATH="${PYTHONPATH}:${NAVIDIV_PATH}/src/navidiv/reinvent"' >> ~/.bashrc
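To check whether the PYTHONPATH change took effect, you can ask Python directly whether it can locate the module. The sketch below uses only the standard library; `is_importable` is a hypothetical helper name and is intended for top-level module names such as `navidiv`.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate the top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("navidiv"):
        print("navidiv is importable.")
    else:
        print("navidiv not found. Directories currently searched:")
        for entry in sys.path:
            print("  ", entry)
```

Run it in the same shell and conda environment that you use for run_reinvent_updated.sh, since PYTHONPATH is specific to the shell session.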
Problem: Cannot write to output directory
Solution:
# Ensure output directories exist and are writable
mkdir -p test_case_3
chmod 755 test_case_3
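The same check can be done from Python before a long run starts. This sketch uses a hypothetical helper, `ensure_writable_dir`. It creates the directory if needed and then writes a temporary file inside it, which tests write access more directly than inspecting permission bits.

```python
import tempfile
from pathlib import Path


def ensure_writable_dir(path):
    """Create path if needed and confirm that a file can be written inside it.

    Raises OSError (for example PermissionError) if the directory cannot be
    created or written to.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    # The temporary file is removed automatically when the block exits.
    with tempfile.TemporaryFile(dir=directory):
        pass
    return directory
```

For example, `ensure_writable_dir("test_case_3")` returns the path when the output directory is usable and raises an error otherwise.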
If you encounter issues:
If you use this data or code in your research, please cite:
@article{your_paper_2025,
title={Your Paper Title},
author={Your Name and Others},
journal={Journal Name},
year={2025}
}
[Add your license information here]
For questions or issues:
This work was supported by [funding sources]. We thank the developers of REINVENT and the NaviDiv framework.
Last Updated: October 2025