Papers
arxiv:2511.16366

From Patents to Dataset: Scraping for Oxide Glass Compositions and Properties

Published on Nov 20, 2025
Authors:
,
,
,
,
,
,

Abstract

In this work, we present web scraping techniques to extract in- formation from patent tables, clean and structure them for future use in predictive machine learning models to develop new glasses. We extracted compositions and three properties relevant to the development of new glasses and structured them into a database to be used together with information from other available datasets. We also analyzed the consistency of the information obtained and what it adds to the existing databases. The extracted liquidus temperatures comprise 5,696 compositions; the second subset includes 4,298 refractive indexes and, finally, 1,771 compositions with Abbe numbers. The extraction performed here increases the available information by approximately 10.4% for liquidus temperature, 6.6% for refractive index, and 4.9% for Abbe number. The impact extends beyond quantity: the newly extracted data introduce compositions with property values that are more diverse than those in existing databases, thereby expanding the accessible compositional and property space for glass modeling applications. We emphasize that the compositions of the new database contain relatively more titanium, magnesium, zirconium, niobium, iron, tin, and yttrium oxides than those of the existing bases.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.16366 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.16366 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.16366 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.