Predicting polymerization reactions via transfer learning using chemical language models

Brenda S. Ferrari1*, Matteo Manica2*, Ronaldo Giro1*, Teodoro Laino2,3*, Mathias B. Steiner1*

1 IBM Research Brazil - Avenida República do Chile, 330 - 11o. e 12. andares Rio De Janeiro, RJ 20031-170, Brazil

2 IBM Research Europe - Säumerstrasse 4, 8803 Rüschlikon, Switzerland

3 National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland

* Corresponding authors' emails: bferrari@ibm.com, TTE@zurich.ibm.com, rgiro@br.ibm.com, TEO@zurich.ibm.com, mathiast@br.ibm.com
DOI: 10.24435/materialscloud:ef-4j [version v2]

Publication date: Feb 29, 2024

How to cite this record

Brenda S. Ferrari, Matteo Manica, Ronaldo Giro, Teodoro Laino, Mathias B. Steiner, Predicting polymerization reactions via transfer learning using chemical language models, Materials Cloud Archive 2024.40 (2024), https://doi.org/10.24435/materialscloud:ef-4j

Description

Polymers are candidate materials for a wide range of sustainability applications such as carbon capture and energy storage. However, computational polymer discovery lacks automated analysis of reaction pathways and stability assessment through retrosynthesis. Here, we report the first extension of transformer-based language models to polymerization reactions for both forward and retrosynthesis tasks. We curated a polymerization dataset for vinyl polymers covering reactions and retrosynthesis for representative homopolymers and co-polymers. Overall, we report a forward model accuracy of 80% and a backward (retrosynthesis) model accuracy of 60%. We further analyse the model performance on a set of case studies by providing polymerization and retrosynthesis examples and evaluating the quality of the model's predictions from a materials science perspective.

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name: hta_dataset_all_combinations.csv
MD5: e98e53c67b90a7e2ec049a93a17dd78f
Size: 2.1 MiB
Description: Polymerization reactions with head and tail atoms assigned by a Python script based on polymerization mechanisms and a nucleophilic index derived from the quantum-chemical atomic population of the Highest Occupied Molecular Orbital (HOMO).

File name: m2p_dataset_all_combinations.csv
MD5: def01c2f8f848ab7ad329372cb37898c
Size: 1.5 MiB
Description: Polymerization reactions with head and tail atoms assigned by a modified version of the Python tool Monomers to Polymers (M2P).

File name: trained_models.zip
MD5: f43e79c28fec01b7bf665b853b6359a3
Size: 3.8 GiB
Description: Zip file containing the trained machine learning models (forward and retrosynthesis). The config files have the yml extension and the model weights have the pt extension (pickle binary, readable with PyTorch).

File name: input.csv
MD5: 47aa28056236aeaa416d7034577dd58b
Size: 18.6 KiB
Description: Ground-truth dataset of monomers and polymer repeat units used to validate the head and tail assignment (HTA) and M2P algorithms.
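
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, assuming the four files were saved under their original names in a single local directory. The helper names `md5_of_file` and `verify_downloads` are illustrative and are not part of this record.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "hta_dataset_all_combinations.csv": "e98e53c67b90a7e2ec049a93a17dd78f",
    "m2p_dataset_all_combinations.csv": "def01c2f8f848ab7ad329372cb37898c",
    "trained_models.zip": "f43e79c28fec01b7bf665b853b6359a3",
    "input.csv": "47aa28056236aeaa416d7034577dd58b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the downloaded files sit in the current working directory.
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use flat for the 3.8 GiB model archive. Because the weight files are pickle binaries, it is prudent to verify the checksum of trained_models.zip before loading anything from it.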

License

Files and data are licensed under the terms of the following license: MIT License. CDLA-Permissive-2.0
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

No external references available for this Materials Cloud Archive record.

Keywords

polymerization reaction, machine learning, homopolymers, co-polymers, reactants, reagents (solvents, catalysts), products

Version history:

2024.40 (version v2) [This version] Feb 29, 2024 DOI: 10.24435/materialscloud:ef-4j
2023.137 (version v1) Sep 06, 2023 DOI: 10.24435/materialscloud:zw-be