# Source data files for "A data-driven perspective on the colours of metal-organic frameworks"

## 0_models

Contains the `.joblib` files of trained `scikit-learn` models, which can be opened in Python 3 with the following code snippet:

```python
import joblib

# Replace <joblibfile> with the path to one of the files listed below
model = joblib.load("<joblibfile>")
```

More information about how to use `.joblib` files can be found at https://scikit-learn.org/stable/modules/model_persistence.html.

The folder contains four files:

- `regressor_0_1run_2020_09_10_13_19_1599736778False.joblib`: the model predicting the 0.1 quantiles
- `regressor_0_9run_2020_09_10_13_19_1599736778False.joblib`: the model predicting the 0.9 quantiles
- `regressor_medianrun_2020_09_10_13_19_1599736778False.joblib`: the model predicting the median colours
- `scaler_run_2020_09_10_13_19_1599736778.joblib`: a standard scaler object

## 1_bootstrapped_metrics

Contains `.csv` files of bootstrapped metrics on a holdout set for the GBDT model and two baseline models:

- `mean_bootsrapped_results.csv`: results for a mean baseline model
- `median_bootsrapped_results.csv`: results for a median baseline model
- `model_bootsrapped_results.csv`: results for the GBDT model

## 2_data

Contains the holdout and development sets that were used to train and test the model.

## 3_feature_importance

Contains the SHAP feature importance arrays (with features in the columns and averaged over all the base estimators of the bagged model). These are the data needed to reproduce Figures 6 and 7 in the manuscript.

Feature importance values, 2D arrays (samples x features):

- `shap_blue.txt`: feature importance for the blue colour channel
- `shap_red.txt`: feature importance for the red colour channel
- `shap_green.txt`: feature importance for the green colour channel

Interaction values, 3D arrays (samples x features x features):

- `interaction_blue_averaged.npy`: feature interaction values for the blue colour channel
- `interaction_red_averaged.npy`: feature interaction values for the red colour channel
- `interaction_green_averaged.npy`: feature interaction values for the green colour channel
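
The arrays in `3_feature_importance` can be summarised with a few lines of NumPy. The following is a minimal sketch and not the code used for the manuscript figures. The helper names (`load_channel`, `mean_abs_importance`, `top_features`, `mean_abs_interactions`) are hypothetical, and it assumes that the `.txt` files are whitespace-delimited so that `numpy.loadtxt` can read them with its defaults, and that the script is run from the repository root. Ranking features by mean absolute SHAP value is one common convention and is assumed here.

```python
import os

import numpy as np


def load_channel(channel, folder="3_feature_importance"):
    """Load the SHAP and interaction arrays for one channel ("red", "green" or "blue").

    Assumption: the .txt file is whitespace-delimited (the numpy.loadtxt default).
    """
    shap_values = np.loadtxt(os.path.join(folder, f"shap_{channel}.txt"))
    interactions = np.load(os.path.join(folder, f"interaction_{channel}_averaged.npy"))
    return shap_values, interactions


def mean_abs_importance(shap_values):
    """Mean absolute SHAP value per feature for a (samples x features) array."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2:
        raise ValueError("expected a 2D array (samples x features)")
    return np.abs(shap_values).mean(axis=0)


def top_features(shap_values, k=5):
    """Column indices of the k features with the largest mean |SHAP|, largest first."""
    importance = mean_abs_importance(shap_values)
    return np.argsort(importance)[::-1][:k]


def mean_abs_interactions(interaction_values):
    """Mean absolute interaction matrix (features x features) from a 3D array."""
    interaction_values = np.asarray(interaction_values, dtype=float)
    if interaction_values.ndim != 3:
        raise ValueError("expected a 3D array (samples x features x features)")
    return np.abs(interaction_values).mean(axis=0)
```

For example, `shap_red, interaction_red = load_channel("red")` followed by `top_features(shap_red)` would return the column indices of the five most influential features for the red channel. Mapping those indices back to feature names requires the column order of the data in `2_data`, which this sketch does not assume.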