High-throughput inverse design and optimization of functionalities: spin splitting in two-dimensional compounds: Data description

The data provided in this repository is divided into three main sections:

  • SS Tables (ss_tables.tar.gz): CSV files (Excel compatible) giving an overview of the work's main findings, including a full list of the spin splittings (SSs) identified, separated by SS prototype (Rashba, Dresselhaus, Zeeman and high-order), as defined in the main paper. They contain each material's general information (e.g. id, symmetry, band gap, energy above convex hull) and SS-related information (e.g. SS magnitude and location in the band structure).
  • Calculations data: Structures and main calculation results for each material selected in the work. These include the full list of k-points, eigenvalues, and orbital and spin polarization projections computed for each material's band structure, in a DFT-code-agnostic, Python-based format. Rendered images of the structure and band structure plots for each compound are also included in the folders. The data are available in the materials.tar.gz file, with one subfolder per material, labeled by its material ID from the C2DB database [1].
  • SS description: All post-processed information on the spin splittings identified by the developed algorithm, for all materials analysed in the work. It is provided as a JSON file (ss_2d_materials.json) and a MongoDB database dump (ss_2d_materials.gz), both contained in the databases.tar.gz tarball.

Instructions for accessing the data with Python are given below.
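Before any of the files can be read, the tarballs have to be unpacked. A minimal sketch using only the Python standard library is given here; the helper name unpack and the destination folder "data" are illustrative choices, not part of the repository.

```python
import tarfile
from pathlib import Path


def unpack(archive, dest="."):
    """Extract a .tar.gz archive into dest and return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Archive names as listed above; adjust the paths to the download location.
    for name in ("ss_tables.tar.gz", "materials.tar.gz", "databases.tar.gz"):
        if Path(name).exists():
            print(name, "->", len(unpack(name, "data")), "entries")
```

Archives that are not found in the working directory are simply skipped.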

Calculations data

Each material subfolder in the materials folder contains the following files:

  • band_structure.png figures: Plotted bandstructures for each spin polarization direction as well as one without spin texture.
  • bands_data: Pickle file with complete data of computed band structures, including orbital and spin polarization projections.
  • Structure file: cif file containing structural information of the calculated material, with the naming convention of <material_id>.cif.
  • Structure image: png file with a rendered representation of the material's structure, with the naming convention of <material_id>.png.

Instructions for reading/opening them are below:

Bands data

These pickle files can be read with:

import pickle

with open('path/to/materials/material_id/bands_data.pickle', 'rb') as file:
    bands_data = pickle.load(file)

The resulting bands_data is a python dictionary with the following keys:

  • nbands [int]: number of calculated bands
  • nkpoints [int]: number of calculated kpoints along the entire band structure reciprocal path
  • vb_index [int]: index of the valence band.
  • cb_index [int]: index of the conduction band.
  • efermi [float]: Fermi energy according to the calculation results parsed by ASE [2].
  • nelect [float]: number of electrons in the calculation, parsed by Pymatgen [3] from the OUTCAR file. It is stored as a float because the parsed values deviate negligibly from integers.
  • cell [numpy.ndarray]: Structure unit cell as a 3x3 numpy array, with one lattice vector per row.
  • rec_cell [numpy.ndarray]: Reciprocal unit cell as a 3x3 numpy array, with one reciprocal lattice vector per row.
  • labels_dict [dict]: Python dictionary of high-symmetry kpoints. Keys are kpoint indexes according to the calculation and values are kpoint labels according to the convention employed by ASE [2].
  • kpoints_rec_coords [list[lists]]: List of kpoints along the high-symmetry k-path employed in the band structure calculation. The position of a kpoint in the list is its kpoint index, which is referred to across the calculation results. For each kpoint, a list of reciprocal coordinates is provided.
  • eigenvalues [numpy.ndarray]: Numpy array with shape (nbands, nkpoints) containing the energy eigenvalues of the bandstructure calculation as parsed by Pymatgen [3].
  • orbital_projections [dict]: Python dictionary of the orbital projections. Keys are the individual ion and orbital contributions, taking equivalent ions into account, and follow the naming convention (<Element>_<wyckoff symbol>)_<orbital>. Each key stores a numpy array with shape (nbands, nkpoints) containing the orbital projections for each calculated kpoint/band index.
  • spin_projections [numpy.ndarray]: Numpy array with shape (nbands, nkpoints, 3) containing the projected magnetizations along the (x, y, z) directions for each set of calculated kpoint/band index, stored in the third dimension of the array according to the same order.
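
As an illustration of how these keys fit together, the sketch below computes the band gap and reads the spin polarization at the valence band maximum. It runs on a small synthetic dictionary that mimics the layout described above, so all numbers are made up. It also assumes that vb_index and cb_index are zero-based row indices into eigenvalues, which should be checked against the actual files. The helper name band_edges is hypothetical.

```python
import numpy as np


def band_edges(bands_data):
    """Return (gap, k-index of VBM, k-index of CBM) from a bands_data-like dict."""
    eig = np.asarray(bands_data["eigenvalues"])  # shape (nbands, nkpoints)
    vb = eig[bands_data["vb_index"]]
    cb = eig[bands_data["cb_index"]]
    k_vbm = int(np.argmax(vb))
    k_cbm = int(np.argmin(cb))
    return float(cb[k_cbm] - vb[k_vbm]), k_vbm, k_cbm


# Synthetic stand-in for a loaded pickle: 2 bands, 3 k-points (made-up values).
toy = {
    "vb_index": 0,
    "cb_index": 1,
    "eigenvalues": np.array([[-1.0, -0.5, -0.8],
                             [1.2, 1.0, 0.7]]),
    "spin_projections": np.zeros((2, 3, 3)),
}
toy["spin_projections"][0, 1] = [0.0, 0.0, 0.9]  # VB at k-index 1 polarized along z

gap, k_vbm, k_cbm = band_edges(toy)
sx, sy, sz = toy["spin_projections"][toy["vb_index"], k_vbm]
print(f"gap = {gap:.2f} eV, VBM at k-index {k_vbm}, CBM at k-index {k_cbm}")
print(f"spin at VBM: ({sx:.1f}, {sy:.1f}, {sz:.1f})")
```

With a real file, toy would be replaced by the bands_data dictionary loaded from the pickle.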

Structure data

These are cif files, named according to the material id. They can be opened and visualized with your software of choice (e.g. VESTA), or converted into Pymatgen Structures [3] with:

from pymatgen.core.structure import Structure

structure = Structure.from_file('path/to/cif_file.cif')

SS description

This section describes the post-processed data from all spin-splitting analyses.

MongoDB database

A backed-up MongoDB dump file is available with all the resulting data from our analysis. The data structure is divided into three groups: Material-specific, band structure, and spin-splitting specific data.

Material-specific description keys

  • _id [string]: C2DB unique identifier.
  • stoichiometry [string]: Anonymous formula describing the material stoichiometry.
  • elements [array[string]]: List of elements in composition.
  • cations [array[string]]: List of cations defined by the calculated Bader charges.
  • anions [array[string]]: List of anions defined by the calculated Bader charges.
  • unit_cell [Object]: Unit cell basis vectors (Å).
    • .a1 [array[float]]
    • .a2 [array[float]]
    • .a3 [array[float]]
  • sites [array[Object]]: List of site objects. Each object provides the element and the coordinates of each occupied site in the unit cell.
    • [{element [string], coords [array[float]]}, ...]
  • bravais_lattice [string]: Two-dimensional Bravais lattice (primitive hexagonal, centred rectangular, primitive oblique, primitive square, primitive rectangular).
  • structural_cluster [string]: Structural cluster label.
  • space_group [integer]: Space group number.
  • polar [boolean]: Crystal structure is polar.
  • e_hull [float]: Energy above hull (eV).
  • band_gap [float]: Band gap (eV).

Band structure description keys

  • n_electrons [integer]: Total number of valence electrons used in the calculations.
  • n_bands [integer]: Number of calculated bands.
  • BZ_cell [Object]: Brillouin zone basis vectors (Å⁻¹).
    • .b1 [array[float]]
    • .b2 [array[float]]
    • .b3 [array[float]]
  • n_kpoints [integer]: Number of k-points along the entire band structure.
  • kpoints [array[array]]: List of k-points' fractional coordinates. It is ordered by the band structure k-point sequence.
  • k_path [array[Object]]: List of high-symmetry k-point objects ordered by the band structure's k-path sequence. Each object provides the label and the coordinates of the respective k-point.
    • [{label [string], coords [array[float]]}, ...]
  • bands [array[array]]: Array containing n_bands arrays, each representing an individual band with a total of n_kpoints energy eigenvalues.
  • NOMAD_files [array[string]]: List of URLs to input and output files of the non-collinear DFT calculation. These include the INCAR, POSCAR, KPOINTS, OUTCAR, and vasprun.xml.

Spin-splitting description keys

  • vb [Object]: The object containing the valence band spin-splitting data description.
    • index [integer]: Valence band index
    • LSS [array[Object]]: List of linear spin-splitting objects. Each object has the following fields:
      • rashba_param [float]: Rashba parameter (eV·Å).
      • delta_e [float]: Energy delta (eV).
      • delta_k [float]: Delta k (Å⁻¹).
      • accessibility [float]: Distance in energy to the VBM (eV).
      • kpoint [array[float]]: K-point fractional coordinates of the LSS.
      • k_segment [string]: Initial and final high-symmetry (HS) k-points of the k-path segment in which the LSS occurs.
      • anti_crossing [boolean]: Spin splitting has anti-crossing.
    • HOSS [array[Object]]: List of high-order spin-splitting objects. Each object has the same keys as the LSS object, except rashba_param, which is not defined for HOSS.
    • ZSS [array[Object]]: List of Zeeman spin-splitting objects. Each object has three keys: spin_splitting, the splitting energy (eV); accessibility, with the same meaning as in the LSS object; and kpoint, which is a high-symmetry k-point object indicating the location of the ZSS.
  • cb [Object]: The object containing the conduction band spin-splitting data description. It is analogous to the vb structure described above, except that the SS values are calculated with respect to the conduction band minimum instead.
  • has_ZSS [boolean]: Valence or conduction band has ZSS.
  • has_LSS [boolean]: Valence or conduction band has LSS.
  • has_HOSS [boolean]: Valence or conduction band has HOSS.
  • vb_SS [boolean]: Valence band has SS.
  • cb_SS [boolean]: Conduction band has SS.

Setting up the database

To access the database, the user needs to download and install the MongoDB Community Edition. The installation is OS-dependent, so we suggest following the official installation documentation and making sure the MongoDB process (mongod) is running before trying to access the database.

After the mongod process is started, the database can be loaded with the mongorestore command:

mongorestore --gzip --archive=ss_2d_materials.gz

The database is loaded under the name ss_2d_materials, which contains the collection ss_descriptions that stores all provided data.

Querying the database

Data queries can be made with multiple programming languages, including Python with the PyMongo library. Another option is to use the Compass program, which provides a graphical user interface for queries and other functionalities.

As an example, we demonstrate how to use PyMongo in Python to retrieve the data analyzed in the "Data Validation" section.

from pymongo import MongoClient

# Connect to the local MongoDB server (default host and port)
client = MongoClient()

# Select the 'ss_2d_materials' database
db = client['ss_2d_materials']

# Select the 'ss_descriptions' collection
collection = db.ss_descriptions

# Select only the entries with ZSS >= 0.1 eV (in the vb OR in the cb).
# Field names are case-sensitive and follow the key description above.
query = {
    '$or': [
        {"vb.ZSS.spin_splitting": {'$gte': 0.1}},
        {"cb.ZSS.spin_splitting": {'$gte': 0.1}}
    ]
}

# Fields to be returned (_id is included by default)
project = {"cations": 1, "anions": 1, "structural_cluster": 1}

docs = collection.find(query, project)
print(list(docs))

# Result (showing only the first two results):
'''
[
{
    "_id": "HgSe-619ed885f677",
    "cations": ["Hg"],
    "anions": ["Se"],
    "structural_cluster": "AB-c2"
},
{
    "_id": "HgO-a8678fa85c38",
    "cations": ["Hg"],
    "anions": ["O"],
    "structural_cluster": "AB-c2"
},
...
]
'''

The query dictionary specifies that only materials presenting a ZSS of at least 0.1 eV, in either the valence or the conduction band, should be selected. The project dictionary sets the data keys that should be returned: cations, anions, and structural_cluster. The result is a list of 120 entries that fit the query criteria. For further options, refer to the MongoDB manual on querying documents.
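
Queries on the LSS and HOSS lists have to match several conditions within the same array element, for which MongoDB provides the $elemMatch operator. The sketch below only builds the filter dictionary, so it can be inspected without a running server. The helper name lss_filter and the thresholds are hypothetical, and the field names assume the casing given in the key description above.

```python
def lss_filter(min_rashba, max_accessibility):
    """Build a filter for materials with a strong, accessible LSS in vb or cb."""
    cond = {"$elemMatch": {
        "rashba_param": {"$gte": min_rashba},          # eV·Å
        "accessibility": {"$lte": max_accessibility},  # eV
    }}
    return {"$or": [{"vb.LSS": cond}, {"cb.LSS": cond}]}


query = lss_filter(min_rashba=1.0, max_accessibility=0.1)
print(query)

# With a running server, the filter would be passed to find(), e.g.:
# docs = collection.find(query, {"cations": 1, "anions": 1})
```

Using $elemMatch ensures that both conditions are satisfied by the same LSS entry, rather than by two different entries of the list.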

JSON format

We also provide a JSON file with the same structure as the MongoDB data. The file can be read in Python with the following:

import json

with open('ss_2d_materials.json', 'r') as f:
    data = json.load(f)

The data variable stores a list with a length corresponding to the total number of materials (436). Each entry in this list is a dictionary with the data structure described above. The user can then convert the data to a simple table, taking special care to flatten the nested fields, such as vb and cb.
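
One possible way to build such a table is to emit one row per spin-splitting entry, carrying the material-level fields along. This is a sketch under assumptions: the helper name flatten and the chosen columns are illustrative, and the sample record uses invented values.

```python
import csv
import io


def flatten(data, material_keys=("_id", "space_group", "band_gap")):
    """Yield one flat dict per SS entry found in the vb/cb objects."""
    for doc in data:
        base = {key: doc.get(key) for key in material_keys}
        for band in ("vb", "cb"):
            for kind in ("LSS", "HOSS", "ZSS"):
                for entry in doc.get(band, {}).get(kind, []):
                    row = dict(base, band=band, kind=kind)
                    # Keep scalar fields only; nested ones (e.g. kpoint)
                    # would need their own handling.
                    row.update({k: v for k, v in entry.items()
                                if not isinstance(v, (list, dict))})
                    yield row


# Invented sample record with the layout described above.
sample = [{"_id": "example-id", "space_group": 156, "band_gap": 1.1,
           "vb": {"LSS": [{"rashba_param": 0.8, "delta_e": 0.02,
                           "kpoint": [0, 0, 0]}],
                  "ZSS": [{"spin_splitting": 0.15, "accessibility": 0.0}]},
           "cb": {}}]

rows = list(flatten(sample))
columns = sorted({key for row in rows for key in row})
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Passing the data list loaded from the JSON file instead of sample would produce one row per LSS, HOSS, or ZSS entry; columns that do not apply to a given prototype are left empty.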

References:

  1. Sten Haastrup, Mikkel Strange, Mohnish Pandey, Thorsten Deilmann, Per S. Schmidt, Nicki F. Hinsche, Morten N. Gjerding, Daniele Torelli, Peter M. Larsen, Anders C. Riis-Jensen, Jakob Gath, Karsten W. Jacobsen, Jens Jørgen Mortensen, Thomas Olsen, Kristian S. Thygesen. The Computational 2D Materials Database: High-Throughput Modeling and Discovery of Atomically Thin Crystals. 2D Materials, 2018, 5, 042002.
  2. Ask Hjorth Larsen, Jens Jørgen Mortensen, Jakob Blomqvist, Ivano E. Castelli, Rune Christensen, Marcin Dułak, Jesper Friis, Michael N. Groves, Bjørk Hammer, Cory Hargus, Eric D. Hermes, Paul C. Jennings, Peter Bjerre Jensen, James Kermode, John R. Kitchin, Esben Leonhard Kolsbjerg, Joseph Kubal, Kristen Kaasbjerg, Steen Lysgaard, Jón Bergmann Maronsson, Tristan Maxson, Thomas Olsen, Lars Pastewka, Andrew Peterson, Carsten Rostgaard, Jakob Schiøtz, Ole Schütt, Mikkel Strange, Kristian S. Thygesen, Tejs Vegge, Lasse Vilhelmsen, Michael Walter, Zhenhua Zeng, Karsten Wedel Jacobsen. The Atomic Simulation Environment—A Python library for working with atoms. J. Phys.: Condens. Matter, 2017, 29, 273002.
  3. Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent Chevrier, Kristin A. Persson, Gerbrand Ceder. Python Materials Genomics (pymatgen) : A Robust, Open-Source Python Library for Materials Analysis. Computational Materials Science, 2013, 68, 314–319. doi:10.1016/j.commatsci.2012.10.028