HTA - An open-source software for assigning heads and tails to SMILES in polymerization reactions

Brenda de Souza Ferrari1*, Ronaldo Giro2*, Mathias B. Steiner3*

1 University of Jyväskylä - Department of Chemistry - Seminaarinkatu 15, PL 35, 40014 Jyväskylän yliopisto - Finland

2 Rd J Fco Aguirre Proença Km 9 Sp101. Chacara Assay Hortolândia, SP 13186-900, Brazil

3 IBM Research Brazil - Avenida República do Chile, 330 - 11o. e 12. andares Rio De Janeiro, RJ 20031-170, Brazil

* Corresponding authors' emails: brndafferrari@gmail.com, rgiro@br.ibm.com, mathiast@br.ibm.com

DOI: 10.24435/materialscloud:tx-b9 [version v1]

Publication date: Jan 10, 2025

How to cite this record

Brenda de Souza Ferrari, Ronaldo Giro, Mathias B. Steiner, HTA - An open-source software for assigning heads and tails to SMILES in polymerization reactions, Materials Cloud Archive 2025.6 (2025), https://doi.org/10.24435/materialscloud:tx-b9

Description

Polymers are versatile materials with a wide range of applications. Improving polymer properties requires attention to how the repeating units are connected (head-to-tail, head-to-head, tail-to-tail) to build the polymer structure, since this connectivity directly influences the morphology, the chain topology and, consequently, the properties of the material. Artificial intelligence (AI) based approaches are beginning to impact several domains of human life, science and technology. Polymer informatics is one such domain, where AI and machine learning (ML) tools are being used in the efficient development, design and discovery of polymers. A key enabling factor for polymer informatics is a machine-readable polymer representation. Polymers have been represented in a string format in which special characters tag the head and tail positions, indicating where the linking bond forms between repeat units. The tools currently available for assigning head and tail positions are limited in their applicability. In this work we present a new tool that assigns the head and tail atoms for a given monomer. On a database of 206 polymer precursors curated from the literature, our algorithm correctly predicted the class of 201 data points (97.6% accuracy) and correctly assigned the head and tail positions for 188 data points (91.3% accuracy).
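The two accuracy figures quoted above follow directly from the counts in the description. The short Python sketch below reproduces the arithmetic; the function name `accuracy_percent` is a hypothetical helper introduced only for illustration and is not part of the HTA tool.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correct predictions as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


# Counts taken from the description of this record.
TOTAL_PRECURSORS = 206
CORRECT_CLASS = 201
CORRECT_HEAD_TAIL = 188

class_accuracy = accuracy_percent(CORRECT_CLASS, TOTAL_PRECURSORS)
head_tail_accuracy = accuracy_percent(CORRECT_HEAD_TAIL, TOTAL_PRECURSORS)

print(f"class prediction: {class_accuracy}%")           # 97.6%
print(f"head/tail assignment: {head_tail_accuracy}%")   # 91.3%
```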

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name: input.csv
Size: 17.4 KiB
MD5: df1241b40f9b0329dfd5ad070754d94e
Description: The file input.csv, with 206 monomers from various polymer classes, was used as input for the validation of the HTA tool: https://github.com/IBM/HeadTailAssign. The last column of input.csv contains the ground-truth repeat units, with the head and tail tagged with the symbols [*:1] and [*:2], respectively.

File name: output_hta.csv
Size: 31.8 KiB
MD5: 32007eea133418e68723f6684626fefa
Description: Output file obtained from the HTA tool: https://github.com/IBM/HeadTailAssign
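After downloading the files, two quick sanity checks may be useful: comparing each file against the MD5 checksum listed in this record, and confirming that a tagged repeat unit carries the head and tail symbols. The Python sketch below uses only the standard library. The helper names (`md5_of_file`, `matches_record`, `has_one_head_and_one_tail`) are hypothetical and not part of the HTA tool, the example string is illustrative rather than taken from input.csv, and the check assumes that each repeat unit carries exactly one [*:1] and one [*:2] tag.

```python
import hashlib

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "input.csv": "df1241b40f9b0329dfd5ad070754d94e",
    "output_hta.csv": "32007eea133418e68723f6684626fefa",
}

# Head and tail symbols used in the ground-truth column of input.csv.
HEAD_TAG = "[*:1]"
TAIL_TAG = "[*:2]"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, file_name):
    """Compare a local file against the checksum listed for `file_name`."""
    return md5_of_file(path) == EXPECTED_MD5[file_name]


def has_one_head_and_one_tail(repeat_unit):
    """Assumption: a tagged repeat unit has exactly one head and one tail tag."""
    return repeat_unit.count(HEAD_TAG) == 1 and repeat_unit.count(TAIL_TAG) == 1


# Illustrative string (not from input.csv): an ethylene-like repeat unit.
example = "[*:1]CC[*:2]"
print(has_one_head_and_one_tail(example))
```

With the files in the working directory, `matches_record("input.csv", "input.csv")` would return `True` for an intact download.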

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International. Community Data License Agreement Permissive 2.0 - CDLA-Permissive-2.0
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference (Paper in which the method is described and the data is discussed)
B. S. Ferrari, R. Giro and M. B. Steiner, Journal of Chemical Theory and Computation - submitted
Preprint
B. S. Ferrari, R. Giro and M. B. Steiner - submitted to ChemRxiv

Keywords

polymer repeat unit head and tail atom position

Version history:

2025.6 (version v1) [This version] Jan 10, 2025 DOI: 10.24435/materialscloud:tx-b9