This record has versions v1, v2. This is version v1.

Predicting polymerization reactions via transfer learning using chemical language models

Brenda S. Ferrari1*, Matteo Manica2*, Ronaldo Giro1*, Teodoro Laino2,3*, Mathias B. Steiner1*

1 IBM Research Brazil - Avenida República do Chile, 330 - 11o. e 12. andares Rio De Janeiro, RJ 20031-170, Brazil

2 IBM Research Europe - Säumerstrasse 4, 8803 Rüschlikon, Switzerland

3 National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland

* Corresponding authors
DOI: 10.24435/materialscloud:zw-be [version v1]

Publication date: Sep 06, 2023

How to cite this record

Brenda S. Ferrari, Matteo Manica, Ronaldo Giro, Teodoro Laino, Mathias B. Steiner, Predicting polymerization reactions via transfer learning using chemical language models, Materials Cloud Archive 2023.137 (2023), https://doi.org/10.24435/materialscloud:zw-be


Polymers are candidate materials for a wide range of sustainability applications such as carbon capture and energy storage. However, computational polymer discovery lacks automated analysis of reaction pathways and stability assessment through retrosynthesis. Here, we report the first extension of transformer-based language models to polymerization reactions for both forward and retrosynthesis tasks. We curated a polymerization dataset for vinyl polymers covering reactions and retrosynthesis for representative homo-polymers and co-polymers. Overall, we report a forward model accuracy of 80% and a backward model accuracy of 60%. We further analyse the model performance on a set of case studies by providing polymerization and retrosynthesis examples and evaluating the quality of the model’s predictions from a materials science perspective.

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.


File name Size Description
2.1 MiB Polymerization reactions with head and tail atoms assigned by a Python script based on polymerization mechanisms and a nucleophilic index derived from the quantum-chemistry atomic population of the highest occupied molecular orbital (HOMO)
1.5 MiB Polymerization reactions with head and tail atoms assigned by a modified version of the Python tool Monomers to Polymers (M2P)
3.8 GiB Zip file containing the machine learning models (forward and retrosynthesis); the config files have the .yml extension and the model weights have the .pt extension (pickled binaries that can be read with PyTorch)
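As a rough illustration of the archive layout described above, the following sketch separates the members of a zip file into config files (.yml) and weight files (.pt) using only the Python standard library. The member names, the folder layout, and the helper functions `split_archive_members` and `make_demo_archive` are hypothetical and used only for demonstration; the actual file names inside the record's zip file are not listed here.

```python
import io
import zipfile
from pathlib import PurePosixPath


def split_archive_members(archive):
    """Group zip members into config (.yml) and weight (.pt) files.

    `archive` may be a path or a file-like object.
    """
    configs, weights = [], []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            suffix = PurePosixPath(name).suffix.lower()
            if suffix == ".yml":
                configs.append(name)
            elif suffix == ".pt":
                weights.append(name)
    return sorted(configs), sorted(weights)


def make_demo_archive():
    """Build a tiny in-memory zip with made-up member names (assumption)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("forward/config.yml", "placeholder: true\n")
        zf.writestr("forward/model.pt", b"\x00")
        zf.writestr("retro/config.yml", "placeholder: true\n")
        zf.writestr("retro/model.pt", b"\x00")
        zf.writestr("README.txt", "not a model file\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    configs, weights = split_archive_members(make_demo_archive())
    print("configs:", configs)
    print("weights:", weights)
```

With PyTorch installed, an extracted .pt file could then be opened with `torch.load`. Because such files are pickled, they should only be loaded from a trusted source.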


Files and data are licensed under the terms of the following licenses: MIT License, CDLA-Permissive-2.0.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

No external references available for this Materials Cloud Archive record.


Keywords: polymerization reaction, machine learning, homopolymers, co-polymers, reactants, reagents (solvents, catalysts), products

Version history:

2024.40 (version v2) Feb 29, 2024 DOI: 10.24435/materialscloud:ef-4j
2023.137 (version v1) [This version] Sep 06, 2023 DOI: 10.24435/materialscloud:zw-be