This README describes the additional steps required to run the Jupyter notebooks in the 'notebooks' directory.

The following Python packages should be installed into your environment with pip:

- numpy, scipy, matplotlib, ase, scikit-learn (imported as sklearn), torch, tqdm (optional), skmatter, rascaline, equistore

Installation instructions for rascaline and equistore can be found at the following addresses:
https://luthaf.fr/rascaline/latest/index.html
https://lab-cosmo.github.io/equistore/latest/
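
After installation, a quick check along the following lines can show which of the packages listed above are still missing. This is only a minimal sketch: the helper names 'missing_packages' and 'report' are illustrative and not part of this repository, and the check tests import names (hence "sklearn" rather than the pip name "scikit-learn").

```python
import importlib.util

# Import names (not pip names) of the packages listed in this README.
# Assumption: each package is importable under the name given here.
REQUIRED = ["numpy", "scipy", "matplotlib", "ase", "sklearn", "torch",
            "skmatter", "rascaline", "equistore"]
OPTIONAL = ["tqdm"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print which required and optional packages are not installed."""
    print("missing required:", missing_packages(REQUIRED) or "none")
    print("missing optional:", missing_packages(OPTIONAL) or "none")
```

Calling report() from the environment that will run the notebooks prints the two lists; this only checks that a package can be located, not that its version is compatible with the notebooks.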

---------- 'section2_GaAs.ipynb' ----------

-- Download the dataset of GaAs structures from Imbalzano and Ceriotti, PRM, 2021, available at < https://archive.materialscloud.org/record/2021.95 >. From this Materials Cloud record, download 'gaas-data.zip', extract the file '/data/database/TRAINING.data' from it, and place that file in the 'datasets' directory.
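
The extraction step can also be scripted. The sketch below assumes that 'gaas-data.zip' has already been downloaded by hand; the function name 'extract_member' is hypothetical, and matching on the end of the member path is an assumption made because the archive may or may not wrap its contents in a top-level folder.

```python
import os
import shutil
import zipfile


def extract_member(zip_path, member_suffix, dest_dir):
    """Copy the first file in the archive whose path ends with `member_suffix`
    into `dest_dir`, dropping the directories it had inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directory entries, which end with "/".
            if name.endswith(member_suffix) and not name.endswith("/"):
                os.makedirs(dest_dir, exist_ok=True)
                target = os.path.join(dest_dir, os.path.basename(name))
                with archive.open(name) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return target
    raise FileNotFoundError(f"no member ending with {member_suffix!r} in {zip_path}")
```

For example, extract_member("gaas-data.zip", "data/database/TRAINING.data", "datasets") would write 'datasets/TRAINING.data', provided the zip file sits in the current directory.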

-- The 'TRAINING.data' file is provided in the "runner" format, which is the native file format of n2p2. To read this format with ASE, copy 'datasets/runner.py' into the 'ase/io' directory of the Python environment in which ASE is installed (e.g. for an Anaconda base environment: 'anaconda3/lib/python3.9/site-packages/ase/io'). Then edit 'formats.py' in the same directory and add the line "F('runner', 'Runner input file','+F', module='runner')" to the block where all ASE IO formats are defined (around lines 310-465; the exact line numbers depend on the ASE version).
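
If it is unclear where the 'ase/io' directory lives in a given environment, it can be located from Python. This is a minimal sketch; 'package_dir' is a hypothetical helper name, and it simply reports where an installed package is stored without modifying anything.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed package, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. "ase" for "ase.io") is missing.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.abspath(list(spec.submodule_search_locations)[0])
```

Running print(package_dir("ase.io")) in the target environment shows the directory into which 'runner.py' should be copied and where 'formats.py' is found. Once both steps are done, a call such as ase.io.read('datasets/TRAINING.data', index=':', format='runner') is expected to load the structures, assuming the format was registered under the name 'runner' as described above.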



---------- 'section3_film.ipynb' and 'section4_NN_expreg.ipynb' ----------

-- Download the dataset of carbon structures from Deringer and Csányi, PRB, 2017, available at < https://www.repository.cam.ac.uk/handle/1810/262814 >. From this repository, download and extract 'aC_GAP_data_main.tar.gz', then place the extracted 'E_F_tests' directory in the 'datasets' directory.
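
As with the GaAs data, this step can be scripted once the archive has been downloaded by hand. The sketch below uses a hypothetical helper, 'extract_directory', and assumes only that a directory named 'E_F_tests' exists somewhere inside the archive; its exact location within the archive is not assumed.

```python
import os
import shutil
import tarfile
import tempfile


def extract_directory(tar_path, dir_name, dest_root):
    """Unpack `tar_path` into a temporary location and copy the first directory
    called `dir_name` found inside it to `dest_root/dir_name`."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tar_path, "r:*") as archive:
            # Use the safer "data" extraction filter where this Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(tmp, **kwargs)
        for root, dirs, _files in os.walk(tmp):
            if dir_name in dirs:
                target = os.path.join(dest_root, dir_name)
                # copytree raises FileExistsError if `target` already exists.
                shutil.copytree(os.path.join(root, dir_name), target)
                return target
    raise FileNotFoundError(f"no directory named {dir_name!r} in {tar_path}")
```

For example, extract_directory("aC_GAP_data_main.tar.gz", "E_F_tests", "datasets") would create 'datasets/E_F_tests'. The function refuses to overwrite an existing 'datasets/E_F_tests' directory, so a stale copy has to be removed first.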