The beta-glycine dataset
########################

Description
===========

The beta-glycine dataset was created to validate the electron machine learning potential (eMLP) on crystalline beta-glycine. It contains 25,676 configurations with normal-mode perturbations of the nuclei and unit cell, and with electric field perturbations. Energies, forces and Wannier centers are computed using density functional theory (DFT) with the PBE functional and a plane-wave basis set in the ab initio quantum chemistry code Quantum ESPRESSO.

The beta-glycine dataset is described and used in the following paper:

M. Cools-Ceuppens, J. Dambre, T. Verstraelen, Modeling electronic response properties with an explicit-electron machine learning potential (in preparation)

Extended xyz file format
========================

All the xyz files in this archive are stored in the so-called extended XYZ file format (https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz). An example is given below::

    50
    Lattice="5.0819235 -0.0011333923 0.027774505 -0.00016987571 6.236614 -0.0009511302 -2.105486 0.0019548798 4.9750934" Properties=species:S:1:pos:R:3:Z:I:1:force:R:3 energy=-3161.012449830798 efield="0 0 0" hf_estimate=-3161.012449830798 electronic_dipole="0.6779043475404715 -10.902173951948958 -0.7185390453433134" ionic_dipole="287.4206492098615 197.39115999784147 2.266397702946465"
    C 0.5408310890197754 0.32683509588241577 0.30854031443595886 6 0.0007037109251808858 -0.0003177883461905644 0.0006335198098815134
    C 4.52894926071167 3.444002389907837 -0.2688392102718353 6 -0.001697956503432433 -0.00028744932932123863 -0.0007579612011082392
    C 1.0813000202178955 0.41322383284568787 -1.1176987886428833 6 0.0003437564877482076 0.0001910329621517713 6.067803373865145e-05
    C 3.9890048503875732 3.531252145767212 1.1575239896774292 6 5.2193393427738325e-05 0.00023525593589350034 -0.0004378588621056078
    H 3.0946781635284424 0.03335396572947502 -0.5957079529762268 1 0.0002344846049561446 0.00018589075590273306 0.0001532377462213401
    ...
    Es 5.423905400939941 3.3092987278233172 -0.3752628593983651 99 0 0 0
    Es 4.461880484802246 3.263245607667878 1.6205625053172112 99 0 0 0

The positions of the Wannier centers are stored as if they were einsteinium atoms (Es) with atomic number 99. These correspond to the valence electrons of the system.

The first and fifth columns give the element symbol and the atomic number of each entry. The second to fourth columns specify the positions of the atoms (or centers) in angstrom, and the last three columns are the forces in eV/angstrom.

In the comment line, the energy and hf_estimate (Harris-Foulkes estimate) are given in eV. The keyword efield is the electric field in atomic units. The keyword stress, if given, is the stress tensor in GPa. The ionic_dipole and electronic_dipole vectors, if given, are the dipole moments of the nuclei and electrons, respectively, in atomic units.

Files in this archive
=====================

beta_glycine_dataset.tar.gz
---------------------------

The archive of the beta-glycine dataset with 25,676 samples. It contains two extended xyz files. The first one, 'stress_data.xyz', contains 15,871 samples computed without external fields and includes the stress tensor. The second one, 'ext_field_data.xyz', contains 9,805 samples computed with an applied external field and does not include the stress tensor.

optimized_geometry.xyz
----------------------

The extended xyz file of the optimized geometry and the Wannier centers of beta-glycine.
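
As with the two dataset files, the nuclei and Wannier centers in this file can be separated by their atomic number. The following is a minimal Python sketch and is not part of the archive: ``parse_frame`` is a hypothetical helper, the embedded frame uses made-up values rather than dataset entries, and the column order is assumed to match the Properties string in the example given earlier (species, position, Z, force).

```python
import shlex

# Shortened, made-up frame in the layout described in this README.
# The numbers are illustrative only and are NOT taken from the dataset.
SAMPLE = """\
4
Lattice="5.0 0.0 0.0 0.0 6.0 0.0 -2.0 0.0 5.0" Properties=species:S:1:pos:R:3:Z:I:1:force:R:3 energy=-10.5 efield="0 0 0"
C 0.5 0.3 0.3 6 0.001 -0.002 0.003
H 3.0 0.1 -0.6 1 0.0 0.0 0.0
Es 5.4 3.3 -0.4 99 0 0 0
Es 4.5 3.2 1.6 99 0 0 0
"""


def parse_frame(text):
    """Parse one extended XYZ frame into (info, nuclei, wannier_centers).

    Hypothetical helper: assumes the column order species, x, y, z, Z, fx, fy, fz.
    """
    lines = text.strip().splitlines()
    n_entries = int(lines[0])
    # shlex keeps quoted values such as Lattice="..." together as one token.
    info = dict(token.split("=", 1) for token in shlex.split(lines[1]))
    nuclei, centers = [], []
    for line in lines[2:2 + n_entries]:
        fields = line.split()
        entry = {
            "symbol": fields[0],
            "position": tuple(float(x) for x in fields[1:4]),  # angstrom
            "Z": int(fields[4]),
            "force": tuple(float(x) for x in fields[5:8]),  # eV/angstrom
        }
        # Wannier centers are stored as einsteinium (Es, atomic number 99).
        (centers if entry["Z"] == 99 else nuclei).append(entry)
    return info, nuclei, centers


info, nuclei, centers = parse_frame(SAMPLE)
lattice = [float(x) for x in info["Lattice"].split()]  # 9 cell components
print(len(nuclei), "nuclei,", len(centers), "Wannier centers")
print("energy (eV):", float(info["energy"]))
```

The sketch handles a single frame. The dataset files hold many frames back to back, so a full reader would loop over the file, using the count on the first line of each frame to find the next one. The ``ase.io.read`` function from ASE, whose format documentation is linked above, is an alternative for reading complete files.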