The files represent the two databases used for the analysis in the paper. The .npz files contain information about each structure; the paths to the files of each database can be collected with a dataclass:
from dataclasses import dataclass
from pathlib import Path


@dataclass
class R4Data:
    structures: Path  # crystal structures (.xyz file)
    data: Path  # only for the MP (contains energies)
    geo: Path  # geometric npz: distribution of radii, alpha-parameters, etc.
    soap: Path  # SOAP vectors
    chem: Path  # chemiscope file

    @classmethod
    def from_prefix(cls, prefix: Path):
        # Build all file paths from the directory holding one database.
        return cls(
            structures=prefix / "structures.xyz",
            data=prefix / "data.zip",
            geo=prefix / "geo.npz",
            soap=prefix / "soap.npz",
            chem=prefix / "chem.json.gz",
        )
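As an illustration, the sketch below shows one way the dataclass might be used once an archive has been unpacked. The class is repeated so the snippet runs on its own. The directory name "MP", the existing() method, and the npz_keys() function are hypothetical additions for this example and are not part of the published code. The names of the arrays stored in the .npz files are not documented here, so the sketch only lists them rather than assuming any.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class R4Data:
    structures: Path
    data: Path
    geo: Path
    soap: Path
    chem: Path

    @classmethod
    def from_prefix(cls, prefix: Path) -> "R4Data":
        return cls(
            structures=prefix / "structures.xyz",
            data=prefix / "data.zip",
            geo=prefix / "geo.npz",
            soap=prefix / "soap.npz",
            chem=prefix / "chem.json.gz",
        )

    def existing(self) -> dict:
        """Hypothetical helper: map field name -> path for files present on disk."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name).exists()
        }


def npz_keys(path: Path) -> list:
    """Hypothetical helper: list the array names stored in an .npz file."""
    import numpy as np  # third-party dependency, imported only when needed

    with np.load(path) as archive:
        return list(archive.files)


# Assumption: the MP archive was unpacked into a directory called "MP".
mp = R4Data.from_prefix(Path("MP"))
for name, path in mp.existing().items():
    print(f"{name}: {path}")
```

Some files are absent from a given database (data.zip is only provided for the MP, and the full .xyz file is not shipped with the MC3D), so checking which paths exist before loading avoids a FileNotFoundError.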
More detailed instructions on how to re-run the code can be found on GitHub: https://github.com/epfl-theos/r4-project.
The MC3D.tar.xz represents the Materials Cloud 3-dimensional crystal structures "source" database (MC3D), containing 79854 inorganic structures. The compressed file contains
The full .xyz file has not yet been published and is therefore not included in the compressed file. However, the MC3D_ids.yaml file contains the version and the ID of each compound.
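The archives are xz-compressed tarballs, so they can be inspected and unpacked with the Python standard library alone. The following is a minimal sketch. The function names list_members() and extract() and the destination directory in the usage comment are hypothetical, and no particular layout inside the archive is assumed.

```python
import tarfile
from pathlib import Path


def list_members(archive: Path) -> list:
    """Return the names of all entries in an xz-compressed tar archive."""
    with tarfile.open(archive, mode="r:xz") as tar:
        return tar.getnames()


def extract(archive: Path, dest: Path) -> None:
    """Unpack an xz-compressed tar archive into dest (created if missing)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions provide safe extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


# Example usage (assumes MC3D.tar.xz is in the current directory):
# print(list_members(Path("MC3D.tar.xz")))
# extract(Path("MC3D.tar.xz"), Path("unpacked"))
```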
The MP.tar.xz represents the Materials Project crystal structures database (MP), containing 83989 inorganic structures. The compressed file contains