This README provides an overview of the dataset included in this submission: a collection of VASP outputs, xyz files, and a Chemiscope file corresponding to the HEA25 dataset. The dataset describes fcc and bcc configurations containing 25 d-block elements (all elements of the first three d-block rows except Tc, Cd, Re, Os, and Hg), as published in the article https://arxiv.org/abs/2212.13254.
The file HEA25.extxyz contains a collection of fcc and bcc configurations containing 25 d-block elements (all elements of the first three d-block rows except Tc, Cd, Re, Os, and Hg). Each configuration is accompanied by its energies, forces, and stress tensor computed using VASP. Volumes are set to random perturbations around the mean molar volume of each composition. In some structures the atoms sit at the ideal lattice positions; in others they are distorted.
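Each frame of an extended-XYZ file stores per-structure metadata as key=value pairs on its comment line (the second line of the frame). The sketch below reads those pairs using only the Python standard library. It runs on an inline two-frame sample: the sample values, the lattices, and the helper names `iter_comment_lines` and `parse_info` are assumptions made for illustration and are not taken from the dataset. In practice, a library reader such as ASE's `ase.io.read` is the more usual route.

```python
import shlex

# Hypothetical two-frame sample; all numbers are made up for illustration.
SAMPLE = """\
2
Lattice="3.6 0.0 0.0 0.0 3.6 0.0 0.0 0.0 3.6" Properties=species:S:1:pos:R:3 class=1 crystal=fcc name=0 energy=-10.0 free_energy=-10.1
Cu 0.0 0.0 0.0
Ni 1.8 1.8 0.0
1
Lattice="2.9 0.0 0.0 0.0 2.9 0.0 0.0 0.0 2.9" Properties=species:S:1:pos:R:3 class=2 crystal=bcc name=5 energy=-5.0 free_energy=-5.2
Fe 0.1 0.0 0.0
"""


def iter_comment_lines(lines):
    """Yield the comment line (second line) of every frame."""
    i = 0
    while i < len(lines):
        n_atoms = int(lines[i])   # first line of a frame: number of atoms
        yield lines[i + 1]        # second line: key=value metadata
        i += n_atoms + 2          # skip the atom lines to reach the next frame


def parse_info(comment_line):
    """Split a comment line into a dict; quoted values stay in one piece."""
    info = {}
    for token in shlex.split(comment_line):
        key, sep, value = token.partition("=")
        if sep:
            info[key] = value     # values are kept as strings
    return info


infos = [parse_info(c) for c in iter_comment_lines(SAMPLE.splitlines())]

# Select one subset, e.g. bcc structures of class 2.
selected = [d for d in infos if d["class"] == "2" and d["crystal"] == "bcc"]
print([d["name"] for d in selected])
```

All values are returned as strings, so numeric fields such as "energy" need an explicit `float(...)` conversion before use.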
Every structure has the following info fields: "class", "crystal", "name", "energy", "free_energy", "stress".
"class" reflects how the structure was generated:
1 -- perfect crystals; 3-8 elements per structure
2 -- shuffled positions (standard deviation 0.2 Å); 3-8 elements per structure
3 -- shuffled positions (standard deviation 0.5 Å); 3-8 elements per structure
4 -- shuffled positions (standard deviation 0.2 Å); 3-25 elements per structure
"crystal" indicates the lattice taken as a basis for the structure:
"fcc" -- structure initialized with a face-centered cubic lattice
"bcc" -- structure initialized with a body-centered cubic lattice
"name" is the sequential number of an SCF DFT calculation (and its corresponding folder) within the subset that shares the same "class" and "crystal" values.
"energy" is the value of energy(sigma->0) output of VASP in eV
"free_energy" is the value of TOTEN output of VASP in eV
"stress" is the value of STRESS output of VASP in eV/A^2
The Chemiscope file is a data file that can be used to generate an interactive visualization of the training data and can be loaded at http://chemiscope.org. Each structure in the dataset is described by the first 4 components of a principal covariates regression (PCovR) performed using the radial spectrum as the input descriptor and the free energy as the regression target.
The VASP output archive contains some of the raw output files obtained during the calculation of the HEA25 dataset. It holds 8 folders (bcc1, bcc2, bcc3, bcc4, fcc1, fcc2, fcc3, fcc4), where the name indicates the basis lattice (bcc, fcc) and the "class" of the subset (1, 2, 3, 4), reflecting how the subset was generated (see the description of "class" for HEA25.extxyz). The name of each subfolder corresponds to the "name" tag in the extxyz file.
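The folder layout described above can be turned into a path for a given structure, and the two energy fields can be cross-checked against a raw output file such as OUTCAR. The following is a minimal sketch, not a definitive parser. The root directory name `data`, the helper names `outcar_path` and `final_energies`, and the sample text (with made-up numbers) are assumptions. The regular expressions assume the usual OUTCAR lines that report `free  energy   TOTEN` and `energy(sigma->0)`.

```python
import re
from pathlib import Path


def outcar_path(root, crystal, cls, name):
    """Build <root>/<crystal><class>/<name>/OUTCAR, e.g. data/bcc3/12/OUTCAR."""
    return Path(root) / f"{crystal}{cls}" / str(name) / "OUTCAR"


# Patterns for the two energy lines; whitespace in OUTCAR varies, hence \s+.
_TOTEN = re.compile(r"free\s+energy\s+TOTEN\s*=\s*(-?\d+\.\d+)")
_SIGMA0 = re.compile(r"energy\(sigma->0\)\s*=\s*(-?\d+\.\d+)")


def final_energies(outcar_text):
    """Return the last reported energies, or None if either line is missing."""
    toten = _TOTEN.findall(outcar_text)
    sigma0 = _SIGMA0.findall(outcar_text)
    if not toten or not sigma0:
        return None
    # Field names follow the extxyz info fields "free_energy" and "energy".
    return {"free_energy": float(toten[-1]), "energy": float(sigma0[-1])}


# Hypothetical OUTCAR excerpt; the numbers are made up for illustration.
SAMPLE_OUTCAR = """
  free  energy   TOTEN  =      -123.45678901 eV

  energy  without entropy=     -123.40000000  energy(sigma->0) =     -123.42839450
"""

print(outcar_path("data", "bcc", 3, 12).as_posix())
print(final_energies(SAMPLE_OUTCAR))
```

Taking the last match matters because OUTCAR repeats the energy lines during the electronic iterations. Not every subfolder is guaranteed to contain every output file, so the returned path should be checked with `Path.exists()` before reading.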
Each subfolder corresponds to a single-point calculation and may contain several VASP output files, including "OUTCAR", "EIGENVAL", "OSZICAR", and "DOSCAR".
"OUTCAR" is an output file generated by VASP that provides information on the calculation progress, convergence, and final results of a VASP calculation.
"EIGENVAL" is an output file generated by VASP that contains the Kohn-Sham eigenvalues at each k-point, which can be used to plot the band structure of the system.
"OSZICAR" is a file containing the convergence history of a VASP calculation, including the energy at each electronic and ionic step.
"DOSCAR" is a file that contains the density of states (DOS) of the system, which provides information on the energy levels available to the electrons in the system.
Every folder contains a file named files_dict.json, which lists, for each type of data file, the indices of the subfolders (same as the "name" tag in HEA25.extxyz) where that file can be found. The "XYZ" tag gives the complete list of indices (or "name"s) in the corresponding subset (crystal type + class) that were included in the "HEA25.extxyz" file.
files_dict_summary.json contains a summary of the number of files of each type found in each folder; the "XYZ" tag indicates how many structures of this type are included in the final extxyz file (HEA25.extxyz).
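As a usage sketch, the index files can be queried to find which structures have a given set of raw outputs available. The exact JSON layout is not specified above, so this example assumes that files_dict.json is an object mapping each tag (for example "OUTCAR" or "XYZ") to a list of integer indices. The sample content and the helper name `names_with_files` are hypothetical.

```python
import json

# Hypothetical content of one files_dict.json (assumed layout: tag -> indices).
SAMPLE_FILES_DICT = (
    '{"OUTCAR": [0, 1, 2, 3], "EIGENVAL": [0, 2, 3], '
    '"DOSCAR": [2], "XYZ": [0, 1, 2]}'
)


def names_with_files(files_dict, required):
    """Return sorted indices that appear under every tag in `required`."""
    sets = [set(files_dict.get(tag, [])) for tag in required]
    return sorted(set.intersection(*sets)) if sets else []


files_dict = json.loads(SAMPLE_FILES_DICT)

# Structures that are in HEA25.extxyz and also have OUTCAR and EIGENVAL.
available = names_with_files(files_dict, ["XYZ", "OUTCAR", "EIGENVAL"])
print(available)

# Per-tag counts, i.e. the kind of numbers a per-folder summary would hold.
counts = {tag: len(indices) for tag, indices in files_dict.items()}
print(counts)
```

With real data, `json.loads(SAMPLE_FILES_DICT)` would be replaced by `json.load` on the opened files_dict.json of the folder of interest. If the indices turn out to be stored as strings rather than integers, the same set logic still applies.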