Training sets based on uncertainty estimates in the cluster-expansion method

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Kleiven, David</dc:creator>
  <dc:creator>Akola, Jaakko</dc:creator>
  <dc:creator>Peterson, Andrew</dc:creator>
  <dc:creator>Vegge, Tejs</dc:creator>
  <dc:creator>Chang, Jin Hyun</dc:creator>
  <dc:description>Cluster expansion (CE) has become increasingly popular in recent years, and many strategies have been proposed for training and fitting CE models to first-principles calculation results. The paper reports a new strategy for constructing a training set in which structures are selected based on their relevance in Monte Carlo sampling for statistical analysis and on the reduction of the expected error. We call the new strategy the "bootstrapping uncertainty structure selection" (BUSS) scheme and compare its performance against a popular scheme that combines random structures with a ground-state search (referred to as RGS). The provided dataset contains the training sets generated using BUSS and RGS for constructing a CE model for the disordered Cu2ZnSnS4 material. The files are in the Atomic Simulation Environment (ASE) database format (please refer to the ASE documentation for more information). Each `.db` file contains 100 DFT calculations, which were generated in iteration cycles. Each iteration cycle is referred to as a generation (marked with the `gen` key in the database), and each database contains 10 generations of 10 training structures each. See the paper for more details.</dc:description>
  <dc:publisher>Materials Cloud</dc:publisher>
  <dc:rights>Creative Commons Attribution 4.0 International</dc:rights>
  <dc:subject>cluster expansion</dc:subject>
  <dc:subject>Monte Carlo</dc:subject>
  <dc:subject>phase transition</dc:subject>
  <dc:subject>machine learning</dc:subject>
  <dc:subject>energy materials</dc:subject>
  <dc:title>Training sets based on uncertainty estimates in the cluster-expansion method</dc:title>