Published January 7, 2026 | Version v1
Dataset Open

Score-based diffusion models for accurate crystal-structure inpainting and reconstruction of hydrogen positions

  • 1. PSI Center for Scientific Computing, Theory and Data, Paul Scherrer Institute, 5232 Villigen PSI, Switzerland
  • 2. Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, I-43124 Parma, Italy
  • 3. Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università degli Studi di Modena e Reggio Emilia, Modena, Italy
  • 4. Centro S3, Istituto Nanoscienze-CNR, Modena, Italy

* Contact person

Description

Generative AI models, such as score-based diffusion models, have recently advanced the field of computational materials science by enabling the generation of new materials with desired properties. These models can also be leveraged to reconstruct crystal structures for which only partial information is available. One relevant example is the reliable determination of the atomic positions occupied by hydrogen atoms in hydrogen-containing crystalline materials. While crucial to the analysis and prediction of many materials properties, the identification of hydrogen positions can be difficult and expensive, as it is challenging in X-ray scattering experiments and often requires dedicated neutron scattering measurements. As a consequence, inorganic crystallographic databases frequently report lattice structures in which hydrogen atoms have been either omitted or inserted using heuristics or chemical intuition. Here, we combine diffusion models from the field of materials science with techniques originally developed in computer vision for image inpainting. We show how this knowledge transfer across domains enables a much faster and more accurate completion of host structures, compared to unconditioned diffusion models or previous approaches based solely on DFT. Overall, our approach exceeds a success rate of 97% in terms of finding a structural match or predicting a more stable configuration than the initial reference, both when starting from structures that were already relaxed with DFT and when starting directly from the experimentally determined host structures.

Files

Files (27.9 GiB)

Checksum Size
md5:4766c3646f15d103842ff86d14b22e8d 101.3 MiB
md5:49478edb715e897bc74d2732624303cf 22.0 GiB
md5:076e656461d7603d5fba74a980530571 3.8 GiB
md5:92fc5388a9f9d11e5f5c99b46e97a7d3 1.9 GiB
md5:9e70f2cf6bbf15d6f0989a447d266c53 10.7 KiB
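
Because the largest file in this record is 22.0 GiB, it can be worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative and are not part of the record or of the XtalPaint package.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read 1 MiB at a time so multi-GiB files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or as plain '<hex>'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

As a usage example, `matches_checksum("downloaded_file", "md5:4766c3646f15d103842ff86d14b22e8d")` would return `True` only if the local file is byte-identical to the 101.3 MiB file listed above; `"downloaded_file"` is a placeholder path, since the file names are not shown in this listing.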

References

Preprint
T. Reents, A. Cantarella, M. Bercx, P. Bonfà and G. Pizzi, arXiv preprint arXiv:2601.01959 (2026), doi: 10.48550/arXiv.2601.01959

Software (Model checkpoints for the models discussed in the paper.)
Hugging Face repository: t-reents/XtalPaint

Software (Code package related to the paper)
XtalPaint