The zip archive contains the following data:

- data folder with the following files:
  - mof_model: Data for the model trained on the MOF subset of the CSD:
    - features_all.npy: numpy array with all features (train, validation, test)
    - labels_all.npy: numpy array with all labels (train, validation, test)
    - names_all.pkl: pickle file with all names (CSD codes, for train, validation, test)
    - scaler_0.joblib: the standard scaler file
    - votingclassifier.joblib: the pretrained votingclassifier
  - all_csd_cod: Data for the model trained on all the CSD and the COD:
    - features_train.npy: numpy array with all features used for training
    - features_test.npy: numpy array with all features used for testing
    - features_valid.npy: numpy array with all features used for validation
    - labels_train.npy: numpy array with all labels used for training
    - labels_test.npy: numpy array with all labels used for testing
    - labels_valid.npy: numpy array with all labels used for validation
    - 20200825-200503_scaler.joblib: the standard scaler file
    - 20200827-120447_ensemble_0.joblib: the pretrained votingclassifier
    - all_data.csv: containing all features, labels and names in csv format (with duplicates dropped)
  - all_data.csv: Features, labels, names and oxidation states for all metal sites

The models (votingclassifier.joblib, 20200827-120447_ensemble_0.joblib) require some modules (the votingclassifier class) from 10.5281/zenodo.3567011 and 10.5281/zenodo.3567274. The code at these references was also used to train and test the models. The code can be installed (preferably in a fresh virtual environment, see e.g. https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html), for example using

    pip install git+https://github.com/kjappelbaum/oximachine_featurizer.git
    pip install git+https://github.com/kjappelbaum/learn_mof_ox_state.git

Dockerfiles for a webapp that uses the model (and that install all dependencies) are available at https://github.com/kjappelbaum/oximachinetool.
A Python package that allows you to run the model to make new predictions for a given CIF is available at https://github.com/kjappelbaum/oximachinerunner.

The code at 10.5281/zenodo.3567274 was used to generate the input data for the models (features, labels, names).

To open the .npy files you need the NumPy Python package, version 1.16 or above. You can load the data from Python in this way (see https://docs.scipy.org/doc/numpy/reference/routines.io.html):

    import numpy
    data = numpy.load("features_all.npy")
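
Because features and labels are stored in separate arrays, it can help to load them together and check that they have the same number of rows. The following is a minimal sketch and not part of the original code base: the helper name `load_split` is hypothetical, and it assumes that the first axis of each array indexes the samples.

```python
from pathlib import Path

import numpy as np


def load_split(folder, split):
    """Load the features and labels for one split from `folder`.

    `split` is the suffix used in the file names listed above, e.g. "all"
    for mof_model or "train", "valid", "test" for all_csd_cod.
    """
    folder = Path(folder)
    features = np.load(folder / f"features_{split}.npy")
    labels = np.load(folder / f"labels_{split}.npy")
    # Assumption: row i of the features belongs to entry i of the labels.
    if features.shape[0] != labels.shape[0]:
        raise ValueError(
            f"{features.shape[0]} feature rows but {labels.shape[0]} labels"
        )
    return features, labels
```

With the folder layout described above, a call could look like `features, labels = load_split("data/mof_model", "all")`.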

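To open the .joblib files you need the joblib Python package (https://pypi.org/project/joblib/). Loading the models additionally requires the packages listed above to be installed, because the votingclassifier class must be importable. You can load the data from Python in this way:

    import joblib
    model = joblib.load("votingclassifier.joblib")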
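
Once a scaler and a model are loaded, predictions would typically be made by scaling the features first and then calling the classifier. The sketch below assumes that the scaler exposes a scikit-learn-style `transform` method and that the voting classifier exposes `predict`; this README does not state that interface, and `predict_scaled` is a hypothetical helper name.

```python
import numpy as np


def predict_scaled(scaler, model, features):
    """Scale raw features with `scaler`, then predict with `model`.

    Assumption: `scaler.transform` and `model.predict` follow the
    scikit-learn convention of taking a 2D array (n_samples, n_features).
    """
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        # Treat a single feature vector as a batch of one sample.
        features = features.reshape(1, -1)
    scaled = scaler.transform(features)
    return model.predict(scaled)
```

The `scaler` and `model` arguments would be the objects returned by `joblib.load`, presumably taken from the same folder so that the scaler matches the model it was trained with.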

To open the pickle file you need Python version 3 or higher. It can then be loaded in the following way:

    import pickle
    with open('names_all.pkl', 'rb') as fh:
        names = pickle.load(fh)
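
The names can be used to look up the labels that belong to a given structure. The following is a minimal sketch, assuming that the pickle holds a sequence of names in the same order as the entries of labels_all.npy (this ordering is an assumption, it is not stated here); `labels_by_name` is a hypothetical helper. Labels are collected in lists because a name may occur more than once, for example if a structure has several metal sites.

```python
import pickle
from collections import defaultdict

import numpy as np


def labels_by_name(names_path, labels_path):
    """Map each name to the list of labels stored at the same positions."""
    # Only unpickle files from a source you trust.
    with open(names_path, "rb") as fh:
        names = pickle.load(fh)
    labels = np.load(labels_path)
    if len(names) != len(labels):
        raise ValueError(f"{len(names)} names but {len(labels)} labels")
    grouped = defaultdict(list)
    for name, label in zip(names, labels):
        # Convert NumPy scalars (or rows) to plain Python values.
        grouped[name].append(np.asarray(label).tolist())
    return dict(grouped)
```

A call could look like `labels_by_name("names_all.pkl", "labels_all.npy")`.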