Structure-property maps with kernel principal covariates regression

Benjamin A. Helfrecht¹*, Rose K. Cersonsky¹*, Guillaume Fraux¹*, Michele Ceriotti¹*

¹ Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

* Corresponding authors' emails: benjamin.helfrecht@epfl.ch, rose.cersonsky@epfl.ch, guillaume.fraux@epfl.ch, michele.ceriotti@epfl.ch
DOI: 10.24435/materialscloud:9e-3j [version v2]

Publication date: Dec 20, 2021

How to cite this record

Benjamin A. Helfrecht, Rose K. Cersonsky, Guillaume Fraux, Michele Ceriotti, Structure-property maps with kernel principal covariates regression, Materials Cloud Archive 2021.225 (2021), https://doi.org/10.24435/materialscloud:9e-3j

Description

Data analyses based on linear methods constitute the simplest, most robust, and transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we introduce a kernelized version of PCovR and a sparsified extension, and demonstrate the performance of this approach in revealing and predicting structure-property relations in chemistry and materials science, showing a variety of examples including elemental carbon, porous silicate frameworks, organic molecules, amino acid conformers, and molecular materials.
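As an illustration of how such a map interpolates between the two limits, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the convention in which alpha=1 recovers (kernel) principal component analysis and alpha=0 recovers the regression limit, builds a modified kernel that mixes the structural kernel with the outer product of regression predictions, and projects onto its leading eigenvectors. The function names `center_kernel` and `kpcovr_map`, the trace-based scaling, and the `regularization` parameter are assumptions introduced for this example.

```python
import numpy as np


def center_kernel(K):
    """Double-center a square (training-set) kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kpcovr_map(K, Y, alpha=0.5, n_components=2, regularization=1e-8):
    """Project samples onto a low-dimensional map that mixes kernel PCA
    (alpha=1) and kernel ridge regression (alpha=0).

    K is an (n, n) kernel between samples; Y holds the target properties.
    The scaling conventions below are assumptions made for this sketch.
    """
    K = center_kernel(np.asarray(K, dtype=float))
    n = K.shape[0]
    Y = np.asarray(Y, dtype=float).reshape(n, -1)
    Y = Y - Y.mean(axis=0)

    # Scale both blocks so that neither dominates the mixture (assumption).
    K = K / (np.trace(K) / n)
    Y = Y / np.sqrt((Y ** 2).sum() / n)

    # Kernel ridge regression approximation of the targets.
    Y_hat = K @ np.linalg.solve(K + regularization * np.eye(n), Y)

    # Modified kernel: alpha weights structure, (1 - alpha) weights properties.
    K_tilde = alpha * K + (1.0 - alpha) * (Y_hat @ Y_hat.T)

    # Leading eigenpairs of the symmetric modified kernel.
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(eigvals)


if __name__ == "__main__":
    # Synthetic data only; a linear kernel reduces the sketch to plain PCovR.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    y = X @ rng.normal(size=5)
    T = kpcovr_map(X @ X.T, y, alpha=0.5)
    print(T.shape)
```

Passing a precomputed kernel keeps the sketch independent of any particular structural representation: with a linear kernel it reduces to the non-kernelized method, while a nonlinear kernel gives the kernelized variant.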

Files

File name: datasets.tgz
MD5: fd7f42bcd62917a994115b7dac03dbf9
Size: 102.7 MiB
Description: Gzipped TAR archive containing all the datasets used in XYZ format

File name: arginine-kpcovr-0.55-chemiscope.json.gz
MD5: 4901d18f01498450fddf70d4f1bd0d9e
Size: 1.1 MiB
Description: Map created with KPCovR for the Arginine-Dipeptide dataset at alpha=0.55 using the chemiscope.org visualizer JSON format

File name: azaphenacenes-kpcovr-0.65-chemiscope.json.gz
MD5: 8a5d0f6f04c6c26a7c3ce9b3c0668d80
Size: 280.9 KiB
Description: Map created with KPCovR for the Azaphenacenes dataset at alpha=0.65 using the chemiscope.org visualizer JSON format

File name: C-VII-kpcovr-0.0-chemiscope.json.gz
MD5: 500809d4a4a62b864c1dd42f2c01732c
Size: 1.6 MiB
Description: Map created with KPCovR for the AIRSS carbon dataset at alpha=0.0 using the chemiscope.org visualizer JSON format

File name: C-VII-kpcovr-0.5-chemiscope.json.gz
MD5: 1af1bd5df08b2cb21aecdd9885829a51
Size: 1.6 MiB
Description: Map created with KPCovR for the AIRSS carbon dataset at alpha=0.5 using the chemiscope.org visualizer JSON format

File name: C-VII-kpcovr-1.0-chemiscope.json.gz
MD5: 427aa3e3a939fee3a976fa475092e4bb
Size: 1.6 MiB
Description: Map created with KPCovR for the AIRSS carbon dataset at alpha=1.0 using the chemiscope.org visualizer JSON format

File name: CSD-1000R-kpcovr-0.5-chemiscope.json.gz
MD5: 709118a8c4ec0460efda82059b0b57a0
Size: 1.0 MiB
Description: Map created with KPCovR for the NMR Chemical shielding dataset at alpha=0.5 using the chemiscope.org visualizer JSON format

File name: DEEM-global-kpcovr-0.5-chemiscope.json.gz
MD5: 433087121bd75a693da1c51bdd91a519
Size: 3.0 MiB
Description: Map created with KPCovR for global properties of DEEM zeolites at alpha=0.5 using the chemiscope.org visualizer JSON format

File name: DEEM-local-kpcovr-0.5-chemiscope.json.gz
MD5: 196238f8be2815f22257fe791eaa2199
Size: 753.9 KiB
Description: Map created with KPCovR for local properties of DEEM zeolites at alpha=0.5 using the chemiscope.org visualizer JSON format

File name: qm9-12PC-kpcovr-0.5-chemiscope.json.gz
MD5: 9000a600226b8bb361eccf89b88e1613
Size: 3.3 MiB
Description: Map created with KPCovR for the QM9 dataset at alpha=0.5 using the chemiscope.org visualizer JSON format

File name: qm9-12PC-kpcovr-1.0-chemiscope.json.gz
MD5: 604a61a5e9993b2ba7b59b04a1f6306f
Size: 3.3 MiB
Description: Map created with KPCovR for the QM9 dataset at alpha=1.0 using the chemiscope.org visualizer JSON format
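
After downloading, a file can be checked against the MD5 digest listed for it and, for the gzipped JSON maps, opened for inspection. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `load_gzipped_json` are hypothetical, and the example assumes the file has been saved in the current directory under its listed name. Nothing is assumed here about the internal layout of the JSON.

```python
import gzip
import hashlib
import json
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_gzipped_json(path):
    """Decompress a .json.gz file and parse its contents."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Assumed local path; the digest is the one listed for this file.
    path = "arginine-kpcovr-0.55-chemiscope.json.gz"
    expected = "4901d18f01498450fddf70d4f1bd0d9e"
    if os.path.exists(path):
        if md5_of_file(path) != expected:
            raise SystemExit("MD5 mismatch: the download may be corrupted")
        data = load_gzipped_json(path)
        # Print only the top-level type; the schema is not assumed here.
        print(type(data).__name__)
```

Reading in chunks keeps memory use low for the larger archive, and the same `md5_of_file` helper applies to `datasets.tgz` before it is unpacked.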

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

Keywords

machine learning, materials science, dimensionality reduction, kernel methods, MaX, SNSF, ERC, EPFL, MARVEL/DD1

Version history:

2021.225 (version v2) [This version] Dec 20, 2021 DOI: 10.24435/materialscloud:9e-3j
2020.80 (version v1) Jul 16, 2020 DOI: 10.24435/materialscloud:ay-eq