Published January 5, 2026 | Version v1
Dataset Open

Resolving the body-order paradox of machine learning interatomic potentials

  • 1. École Polytechnique Fédérale de Lausanne
  • 2. Harvard University
  • 3. University of Modena and Reggio Emilia

* Contact person

Description

This dataset contains the model training scripts, training sets, evaluation results, and analysis notebooks associated with the work "Resolving the Body-Order Paradox of Machine Learning Interatomic Potentials" by Chong et al. The abstract of the work is reproduced below:

"In many cases, the predictions of machine learning interatomic potentials (MLIPs) can be interpreted as a sum of body-ordered contributions, which is explicit when the model is directly built on neighbor density correlation descriptors, and implicit when the model captures the correlations through non-linear functions of low body-order terms. In both cases, the "effective body-orderedness" of MLIPs remains largely unexplained: how do the models decompose the total energy into body-ordered contributions, and how does their body-orderedness affect the accuracy and learning behavior? In answering these questions, we first discuss the complexities in imposing the many-body expansion on ab initio calculations at the atomic limit. Next, we train a curated set of MLIPs on datasets of hydrogen clusters and reveal the inherent tendency of the ML models to deduce their own, effective body-order trends, which are dependent on the model type and dataset makeup. Finally, we present different trends in the convergence of the body-orders and generalizability of the models, providing useful insights for the development of future MLIPs."

Files

Files (262.6 MiB)

Checksum                                Size
md5:06c4d7744969758c0ad0be71ab8c31cd    262.6 MiB
md5:6102329704b7f73972dfad39d86dc08f    1.1 KiB
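
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, not part of the dataset itself: the helper names `md5_of_file` and `matches_checksum` and the path `downloaded_archive` are hypothetical placeholders, since the file names are not shown in this record.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example usage (the path is a placeholder; substitute the downloaded file):
# ok = matches_checksum("downloaded_archive", "md5:06c4d7744969758c0ad0be71ab8c31cd")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.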

References

Preprint (Preprint where the data is discussed.)
S. Chong et al., arXiv preprint, arXiv:2509.14146, 2025.