The dataset consists of a large number of Quantum ESPRESSO log files, packed into separate tar.gz archives. There are six archives: five are grouped by the chemical species adsorbed to the surface, and one contains the reference calculations.

H.tar.gz: adsorption of H
C.tar.gz: adsorption of C, CH, CH2 and CH3
N.tar.gz: adsorption of N, NH and NH2
O.tar.gz: adsorption of O, OH and H2O
S.tar.gz: adsorption of S and SH (and some additional adsorption of H)
references.tar.gz: Bulk structures, slab geometries without adsorbates, and gas phase molecules used as references.

After downloading the files of interest, they can be extracted with tar:

$ tar -xzvf H.tar.gz

which will extract .log files to the directories winther/, mamunm/ and jrboes/, containing the files generated by each author.
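The same extraction step can be done from Python, which is convenient when several archives are processed in one script. The following is a minimal sketch using only the standard library; the function names `extract_archive` and `count_logs_by_directory` are illustrative and not part of the dataset or of any package.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Extract a .tar.gz archive into dest and return the paths of its .log files."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Collect the regular .log members before extracting everything
        log_names = [m.name for m in tar.getmembers()
                     if m.isfile() and m.name.endswith(".log")]
        tar.extractall(dest)
    return sorted(dest / name for name in log_names)


def count_logs_by_directory(log_paths, root="."):
    """Count .log files per top-level directory below root (e.g. winther/)."""
    counts = {}
    for path in log_paths:
        top = Path(path).relative_to(root).parts[0]
        counts[top] = counts.get(top, 0) + 1
    return counts
```

Calling `extract_archive("H.tar.gz")` followed by `count_logs_by_directory(...)` on the returned paths should give one count per author directory, assuming the archive layout described above.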

After extraction, the log files can be read as atomic structures with ASE (https://wiki.fysik.dtu.dk/ase/):

```
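#!/usr/bin/env python

from ase.io import read
from ase.visualize import view

# Read the last structure stored in the log file
atoms = read('winther/12116.log')

# Total energy of that structure (ASE reports energies in eV)
E = atoms.get_potential_energy()

# Open the structure in the ASE GUI
view(atoms)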
```
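To process many log files at once, the single-file example can be wrapped in a loop. The sketch below collects the final energy of every .log file in one directory. The function name `collect_energies` and its `reader` parameter are illustrative assumptions; by default the reader is `ase.io.read`, imported only when the function is called. Files that fail to parse (for example, truncated logs) are skipped with a message rather than stopping the loop.

```python
from pathlib import Path


def collect_energies(log_dir, reader=None):
    """Return {file name: potential energy} for every .log file in log_dir."""
    if reader is None:
        # Imported lazily so that ASE is only required when actually reading logs
        from ase.io import read as reader
    energies = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        try:
            atoms = reader(str(path))
            energies[path.name] = atoms.get_potential_energy()
        except Exception as err:
            # Assumption: a log that cannot be parsed is reported and skipped
            print(f"skipping {path}: {err}")
    return energies
```

For example, `collect_energies('winther')` would return a dictionary keyed by file name for the logs extracted to winther/.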

Preprocessed adsorption energies from the dataset are available from http://www.Catalysis-Hub.org/publications/MamunHighT2019/ as well as through the catalysis-hub Python API: https://github.com/SUNCAT-Center/CatHub/tutorials/1_bimetallic_alloys/. Although each initial structure is unique, there is a noticeable number of duplicate geometries among the optimized structures, because the adsorbates reorganize during relaxation. The script used to determine surface reconstruction and adsorption sites after relaxation is available at https://github.com/SUNCAT-Center/CatHub/cathub/classification.py.
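
For readers who want to recompute an adsorption energy from the raw logs rather than use the preprocessed values, the usual definition is the energy of the slab with adsorbate, minus the energy of the clean slab, minus the gas-phase reference energies. The sketch below only encodes that arithmetic; the function name is illustrative, and the choice and weighting of gas-phase references used for the published values is not specified here, so it is left to the caller.

```python
def adsorption_energy(e_slab_ads, e_slab, e_gas_refs):
    """E_ads = E(slab + adsorbate) - E(clean slab) - sum of gas-phase reference energies.

    e_gas_refs is an iterable of reference energies, already weighted by the
    caller according to the chosen reference scheme (an assumption of this sketch).
    All energies must be in the same unit, e.g. eV as returned by ASE.
    """
    return e_slab_ads - e_slab - sum(e_gas_refs)
```

The three inputs would come from `get_potential_energy()` on the adsorbate log, the matching empty-slab log, and the gas-phase molecule logs in references.tar.gz.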