Aluminum alloy compositions and properties extracted from a corpus of scientific manuscripts and US patents

Abstract:
Researchers continue to explore and develop aluminum alloys with new compositions and improved performance characteristics. An understanding of the current design space can help accelerate the discovery of new alloys. We present two datasets for predominantly wrought aluminum alloys: 1) chemical composition and 2) mechanical properties. The first dataset contains 14,884 entries on aluminum alloy compositions extracted from academic literature and US patents using text processing techniques, including 550 wrought aluminum alloys that are already registered with the Aluminum Association. The second dataset contains 1,278 entries on mechanical properties of aluminum alloys, extracted from tables in academic literature; each entry is associated with a particular wrought series designation.

------------------------------------------------

Data Summary:
- property.csv
  - Each row in the dataset represents an aluminum alloy
  - Number of attributes: 11 (5 descriptive headers, 4 property-related headers, 2 'flag' headers)
- composition.csv
  - Each row in the dataset represents an aluminum alloy
  - Number of attributes: 75 (6 descriptive headers and 69 element headers)

------------------------------------------------

Variable Information:
Each entry gives the attribute name, datatype, and a brief description.

Property Dataset:
- 'doi' (String): Digital Object Identifier (DOI) of the associated journal article
- 'name' (String): Original table row name
- 'series' (Integer, class): Aluminum alloy series designation, one of (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000).
  The 'series' value is initially assigned from the alloy composition associated with the same 'doi' and is then manually cleaned after validation.
- 'caption' (String): Original source table caption
- 'table_extr_AA_des' (Integer): Table-extracted AA designation, i.e., the AA designation code extracted from the original source table row name or table caption by text matching digits of the format 'XXXX'
- 'YS' (Decimal): Yield strength (MPa)
- 'UTS' (Decimal): Ultimate tensile strength (MPa)
- 'temper' (String): Temper designation of the alloy
- 'elong' (Decimal): Elongation (%)
- 'flag' (True/False): Whether the alloy underwent special processing
- 'flag_note' (String): Reason for the flag (e.g., equal-channel angular extrusion)

Composition Dataset:
- 'source' (String, class): Original source of the composition information, one of (named, full text, table, patent)
- 'ft_doi_list' (String, list format): Full-text DOI list, i.e., a list of all unique DOIs associated with a given composition
- 'table_doi' (String): DOI of the journal article associated with the table
- 'name' (String): Determined by 'source' as follows:
  - 'named': four-digit identifier code designated by the Aluminum Association (AA) (see Ref. [1])
  - 'full text': N/A
  - 'table': original source table row name
  - 'patent': patent publication number
- 'table_extr_AA_des' (Integer): Table-extracted AA designation, i.e., the AA designation code extracted from the original source table row name or table caption by text matching digits of the format 'XXXX'
- 'comp_rule_based_series' (Integer, class): Composition rule-based series, i.e., the aluminum alloy series assigned by applying a set of rules to the alloy's composition
- <element> (Decimal): Weight percent (wt%) of this <element> within the Al alloy

------------------------------------------------

References
[1] The Aluminum Association. International Alloy Designations and Chemical Composition Limits for Wrought Aluminum and Wrought Aluminum Alloys (2018).
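Two derived attributes in the Composition Dataset come from simple text or composition rules: 'table_extr_AA_des' (matching four-digit 'XXXX' codes in table text) and 'comp_rule_based_series' (a series assigned from the composition). The sketch below illustrates both ideas under stated assumptions. The regular expression, the function names `extract_aa_designation` and `rule_based_series`, and the Mg/Si thresholds are all hypothetical; the series rules are a simplified version of the AA principal-alloying-element convention (1xxx at least 99.00% Al, 2xxx Cu, 3xxx Mn, 4xxx Si, 5xxx Mg, 6xxx Mg and Si, 7xxx Zn, 8xxx other elements) and are not the rules used to build the dataset.

```python
import math
import re

# Hypothetical pattern: a standalone four-digit code ('XXXX') whose first
# digit is 1-8, i.e. one of the wrought series listed above. The exact
# matching rules used to build the dataset are not documented here.
AA_PATTERN = re.compile(r"(?<!\d)([1-8]\d{3})(?!\d)")

# Principal alloying element -> wrought series, following the AA convention.
SERIES_BY_ELEMENT = {"Cu": 2000, "Mn": 3000, "Si": 4000, "Mg": 5000, "Zn": 7000}


def extract_aa_designation(text):
    """Return the first four-digit AA-style code in text as an int, or None."""
    match = AA_PATTERN.search(text or "")
    return int(match.group(1)) if match else None


def _wt(comp, element):
    """Weight percent of element, treating missing or NaN entries as 0."""
    value = comp.get(element)
    if value is None or math.isnan(value):
        return 0.0
    return float(value)


def rule_based_series(comp):
    """Assign a wrought series from a dict mapping element symbol -> wt%.

    Simplified, hypothetical rules; NOT the rules behind
    'comp_rule_based_series'. The Mg/Si thresholds are illustrative.
    """
    additions = {el: _wt(comp, el) for el in comp if el != "Al"}
    # Use the reported Al content if nonzero, else treat Al as the balance.
    al = _wt(comp, "Al") or 100.0 - sum(additions.values())
    if al >= 99.0:
        return 1000
    mg, si = _wt(comp, "Mg"), _wt(comp, "Si")
    # Mg and Si together in moderate amounts -> 6000 series (illustrative).
    if mg >= 0.3 and si >= 0.3 and mg < 2.0:
        return 6000
    if not additions or max(additions.values()) <= 0.0:
        return None  # cannot identify a principal alloying element
    principal = max(additions, key=additions.get)
    # Any other principal element (e.g. Fe, Li) falls into the 8000 series.
    return SERIES_BY_ELEMENT.get(principal, 8000)


if __name__ == "__main__":
    print(extract_aa_designation("Tensile properties of AA6061-T6"))  # 6061
    print(rule_based_series({"Zn": 5.6, "Mg": 2.5, "Cu": 1.6}))  # 7000
```

Note that a bare four-digit pattern also matches years (e.g., "2018") and other numbers in captions, and a "largest addition wins" rule can misclassify borderline compositions, so a practical pipeline would likely need extra context and the manual validation described for 'series'.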