Training sets based on uncertainty estimates in the cluster-expansion method

David Kleiven1*, Jaakko Akola1,2, Andrew Peterson3,4, Tejs Vegge4, Jin Hyun Chang4*

1 Department of Physics, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway

2 Computational Physics Laboratory, Tampere University, P.O. Box 692, FI-33014 Tampere, Finland

3 School of Engineering, Brown University, Providence, RI 02912, United States of America

4 Department of Energy Conversion and Storage, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark

* Corresponding authors emails: david.kleiven@ntnu.no, jchang@dtu.dk
DOI: 10.24435/materialscloud:ha-ca [version v1]

Publication date: Feb 03, 2022

How to cite this record

David Kleiven, Jaakko Akola, Andrew Peterson, Tejs Vegge, Jin Hyun Chang, Training sets based on uncertainty estimates in the cluster-expansion method, Materials Cloud Archive 2022.21 (2022), doi: 10.24435/materialscloud:ha-ca.


Cluster expansion (CE) has become increasingly popular in recent years, and many strategies have been proposed for training and fitting CE models to first-principles calculation results. The paper reports a new strategy for constructing a training set that selects structures based on their relevance in Monte Carlo sampling for statistical analysis and reduction of the expected error. We call the new strategy the "bootstrapping uncertainty structure selection" (BUSS) scheme and compare its performance against a popular scheme that combines random structures with a ground-state search (referred to as RGS). The provided dataset contains the training sets generated using BUSS and RGS for constructing a CE model of the disordered Cu2ZnSnS4 material. The files are in the Atomic Simulation Environment (ASE) database format (see the ASE documentation for more information: https://wiki.fysik.dtu.dk/ase/index.html). Each `.db` file contains 100 DFT calculations, generated over successive iteration cycles. Each iteration cycle is referred to as a generation (marked with the `gen` key in the database); each database contains 10 generations, and each generation consists of 10 training structures. See the paper for more details.
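As a rough illustration of how the `gen` key described above might be used, the sketch below groups the rows of an ASE database by generation. It is not taken from the paper: the helper names `group_by_generation` and `load_rows` are hypothetical, the `ase` package is assumed to be installed, and the file path passed to `load_rows` is a placeholder because the actual file names are not given in this record.

```python
from collections import defaultdict


def group_by_generation(rows, key="gen"):
    """Group database rows by the value stored under `key`.

    Each row is expected to expose a `key_value_pairs` dict, as ASE
    database rows do. Rows without the key are collected under None.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row.key_value_pairs.get(key)].append(row)
    return dict(groups)


def load_rows(db_path):
    """Read all rows from an ASE database file (requires `ase`)."""
    # Imported here so that group_by_generation has no ASE dependency.
    from ase.db import connect

    db = connect(db_path)
    return list(db.select())
```

A possible usage, with a placeholder path, is `groups = group_by_generation(load_rows("path/to/training_set.db"))`. If a file matches the description above, one would expect 10 groups of 10 rows each, and `row.toatoms()` can then be called on any row to recover the corresponding structure as an ASE `Atoms` object.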

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.


File name Size Description
972.0 KiB ASE database containing the training structures generated using the BUSS scheme
544.0 KiB ASE database containing the training structures generated using the RGS scheme


Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference (Paper in which the method is described)


Keywords: BIG-MAP, cluster expansion, Monte Carlo, phase transition, bootstrapping, machine learning, energy materials

Version history:

2022.21 (version v1) [This version] Feb 03, 2022 DOI: 10.24435/materialscloud:ha-ca