# Source data files for "A data-driven perspective on the colours of metal-organic frameworks"

## 0_models 
Contains the `.joblib` files of the trained `scikit-learn` models, which can be opened in Python 3 with the following code snippet:

```python
import joblib

# Replace the placeholder path with one of the .joblib files in this folder
model = joblib.load("path/to/file.joblib")
```

More information about how to use `.joblib` files can be found at https://scikit-learn.org/stable/modules/model_persistence.html. 

The folder contains four files:

- `regressor_0_1run_2020_09_10_13_19_1599736778False.joblib`: the model predicting the 0.1 quantiles
- `regressor_0_9run_2020_09_10_13_19_1599736778False.joblib`: the model predicting the 0.9 quantiles
- `regressor_medianrun_2020_09_10_13_19_1599736778False.joblib`: the model predicting the median colours
- `scaler_run_2020_09_10_13_19_1599736778.joblib`: a standard scaler object 
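
As a minimal sketch, the files listed above could be combined to obtain a median prediction together with its 0.1 and 0.9 quantile bounds. The helper `predict_with_interval` and the dictionary keys are hypothetical names, and the sketch assumes that the scaler has to be applied to the feature matrix before it is passed to the models; the expected feature layout is not described here and must match the training data.

```python
from pathlib import Path

import joblib
import numpy as np

RUN = "run_2020_09_10_13_19_1599736778"

# File names as listed in this folder
MODEL_FILES = {
    "lower": f"regressor_0_1{RUN}False.joblib",
    "median": f"regressor_median{RUN}False.joblib",
    "upper": f"regressor_0_9{RUN}False.joblib",
}
SCALER_FILE = f"scaler_{RUN}.joblib"


def predict_with_interval(model_dir, features):
    """Return predictions of the 0.1 quantile, median and 0.9 quantile models.

    Assumptions: `features` is a 2D array (samples x features) laid out like
    the training data, and the scaler is applied before the models.
    """
    model_dir = Path(model_dir)
    scaler = joblib.load(str(model_dir / SCALER_FILE))
    scaled = scaler.transform(np.asarray(features, dtype=float))
    return {
        name: joblib.load(str(model_dir / filename)).predict(scaled)
        for name, filename in MODEL_FILES.items()
    }
```

Calling `predict_with_interval("0_models", X)` would then return a dictionary with the keys `lower`, `median` and `upper`, where `X` is a feature matrix you provide.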


## 1_bootstrapped_metrics
Contains `.csv` files of bootstrapped metrics on a holdout set for the model and two baseline models: 

- `mean_bootsrapped_results.csv`: results for a mean baseline model
- `median_bootsrapped_results.csv`: results for a median baseline model 
- `model_bootsrapped_results.csv`: results for the GBDT model 
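
The bootstrapped results can be condensed into a mean and a percentile interval per metric. The following is a sketch only: `summarise_bootstrap` is a hypothetical helper, and it assumes that each numeric column of a `.csv` file holds one metric with one row per bootstrap sample, which is not documented here.

```python
import pandas as pd


def summarise_bootstrap(results, lower=0.025, upper=0.975):
    """Mean and percentile interval for every numeric column.

    Assumption: rows are bootstrap samples and numeric columns are metrics.
    """
    numeric = results.select_dtypes("number")
    return pd.DataFrame(
        {
            "mean": numeric.mean(),
            "lower": numeric.quantile(lower),
            "upper": numeric.quantile(upper),
        }
    )


# Usage, with a file name from this folder:
# summary = summarise_bootstrap(pd.read_csv("model_bootsrapped_results.csv"))
```

Running the same function on the three files allows a side-by-side comparison of the GBDT model with the two baselines.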

## 2_data 
Contains the holdout and development sets that were used to train and test the model.

## 3_feature_importance
Contains the SHAP feature importance arrays (with features in the columns and averaged over all the base estimators of the bagged model). These are the data needed to reproduce Figures 6 and 7 in the manuscript. 

2D arrays (samples x features):
- `shap_blue.txt`: feature importance for blue color channel
- `shap_red.txt`: feature importance for red color channel 
- `shap_green.txt`: feature importance for green color channel
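
One common way to turn such a (samples x features) array into a global ranking is to average the absolute SHAP values over the samples. The sketch below illustrates this; `mean_abs_shap` and `rank_features` are hypothetical helper names, and reading the files with the default settings of `np.loadtxt` (whitespace-delimited text) is an assumption.

```python
import numpy as np


def mean_abs_shap(shap_values):
    """Average |SHAP| over samples: one importance score per feature."""
    shap_values = np.asarray(shap_values, dtype=float)
    return np.abs(shap_values).mean(axis=0)


def rank_features(shap_values):
    """Feature indices sorted from most to least important."""
    return np.argsort(mean_abs_shap(shap_values))[::-1]


# Usage (assumes whitespace-delimited text, the np.loadtxt default):
# shap_red = np.loadtxt("shap_red.txt")
# print(rank_features(shap_red)[:10])
```

The returned indices refer to the columns of the array; mapping them to feature names requires the column order of the data set.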

Interaction values, 3D arrays (samples x features x features):
- `interaction_blue_averaged.npy`: feature interaction values for blue color channel
- `interaction_red_averaged.npy`: feature interaction values for red color channel 
- `interaction_green_averaged.npy`: feature interaction values for green color channel
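
The interaction arrays can be summarised in a similar way. As a rough sketch, the hypothetical helper `strongest_interactions` averages the absolute interaction values over the samples and lists the strongest feature pairs. It assumes that the (features x features) matrix of each sample is symmetric, so only pairs with `i < j` are reported, and it ignores the diagonal.

```python
import numpy as np


def strongest_interactions(interaction_values, top_k=5):
    """Return the top_k pairs (i, j, score), i < j, by mean |interaction|."""
    values = np.asarray(interaction_values, dtype=float)
    mean_abs = np.abs(values).mean(axis=0)  # features x features
    # Upper triangle without the diagonal: each feature pair appears once
    rows, cols = np.triu_indices(mean_abs.shape[0], k=1)
    scores = mean_abs[rows, cols]
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]


# Usage, with a file name from this folder:
# interactions = np.load("interaction_red_averaged.npy")
# print(strongest_interactions(interactions))
```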