Using collective knowledge to assign oxidation states

Authors: Kevin Maik Jablonka (1), Daniele Ongari (1), Seyed Mohamad Moosavi (1), Berend Smit (1,*)

  1. Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Sion, VS, Switzerland
  * Corresponding author email: berend.smit@epfl.ch

DOI: 10.24435/materialscloud:2019.0085/v1 (version v1, submitted on 11 December 2019)

How to cite this entry

Kevin Maik Jablonka, Daniele Ongari, Seyed Mohamad Moosavi, Berend Smit, Using collective knowledge to assign oxidation states, Materials Cloud Archive (2019), doi: 10.24435/materialscloud:2019.0085/v1.

Description

Knowledge of the oxidation state of a metal centre in a material is essential for understanding its properties. Chemists have developed several theories to predict the oxidation state on the basis of the chemical formula. These methods are quite successful for simple compounds but often fail to describe the oxidation states of more complex systems, such as metal-organic frameworks. In this work, we present a data-driven approach to automatically assign oxidation states, using a machine learning algorithm trained on the assignments by chemists encoded in the chemical names in the Cambridge Structural Database (CSD). Our approach considers only the immediate local chemical environment around a metal centre and is therefore robust to most of the experimental uncertainties in these structures (such as incorrect protonation or unbound solvents). The predictions are accurate enough (>98%) that we can use our method to identify a large number of incorrect assignments in the database. The model's predictions follow chemical intuition, even though the model was never explicitly taught those heuristics. This work illustrates how powerful the collective knowledge of chemists is. Machine learning can harvest this knowledge and convert it into a useful tool for chemists.

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive entry.

Files

File name: datapackage.zip
MD5: 06c443c12aaf2584dbdcadaf2144d13a
Size: 100.7 MiB
Description: datapackage containing features, labels, CSD codes, feature names and pre-trained models.

File name: README.txt
MD5: c43b268e6071f1793417b125060f5b93
Size: 2.1 KiB
Description: README.txt detailing file contents of the datapackage
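
The MD5 checksums listed above can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming the files were saved under their listed names in the current working directory; the helper names md5_of_file and verify_download are illustrative and not part of the datapackage.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this archive entry.
EXPECTED_MD5 = {
    "datapackage.zip": "06c443c12aaf2584dbdcadaf2144d13a",
    "README.txt": "c43b268e6071f1793417b125060f5b93",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so a ~100 MiB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    path = Path(path)
    return md5_of_file(path) == expected[path.name]


if __name__ == "__main__":
    # Assumption: the downloads sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            status = "OK" if verify_download(name) else "MISMATCH"
        else:
            status = "not found"
        print(f"{name}: {status}")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.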

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.

External references

Journal reference
K. M. Jablonka, D. Ongari, S. M. Moosavi, B. Smit, submitted, 2019.
Software (code that can be used to generate the feature matrix)
Software (code to train and test the models)

Keywords

ERC, MOF, ML, oxidation state, LSMO, EPFL

Version history

11 December 2019 [This version]