Identifying Instruments in the ADS/SciX Corpus with LLM Agents

PO
Not scheduled
15m
Wichernhaus

poster presentation Automation of data pipeline and workflows Poster

Speaker

Jean-Claude Paquin (Harvard-Smithsonian Center for Astrophysics)

Description

Bibliographies are a core tool that observatories use to evaluate the impact of their facilities and instruments. Yet identifying and classifying the papers that reference a specific instrument is usually a manual, time-intensive task. We developed a large language model (LLM)-augmented pipeline that automatically constructs a comprehensive list of the instruments referenced across the full astronomy corpus of the Astrophysics Data System (ADS/SciX), roughly 3 million records. Grounding the LLM agents with web search increased the number of true-positive n-gram-to-instrument associations. Work that would have taken a single human curator a week of focused effort was completed in hours, and the pipeline can now be run incrementally on new additions to the corpus to dynamically identify novel instruments as they appear in the literature.
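The abstract does not give implementation details, so the following is only a minimal sketch of how an incremental n-gram-to-instrument pass might be organized. Everything in it is an assumption made for illustration: the function names (candidate_ngrams, update_instrument_list), the acronym heuristic, the min_count threshold, and above all the classify callback, which stands in for the web-search-grounded LLM agent and is not the authors' interface.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical tokenizer: letters/digits, allowing internal hyphens and slashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-/]*")


def looks_like_acronym(token: str) -> bool:
    # Assumed heuristic: two or more uppercase letters, e.g. "HARPS", "NIRSpec".
    return sum(ch.isupper() for ch in token) >= 2


def candidate_ngrams(text: str, max_n: int = 3) -> list[str]:
    """Return n-grams (1..max_n tokens) containing an acronym-like token."""
    tokens = TOKEN_RE.findall(text)
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if any(looks_like_acronym(t) for t in window):
                out.append(" ".join(window))
    return out


def update_instrument_list(
    records: Iterable[str],
    classify: Callable[[str], bool],  # stand-in for the LLM agent (assumption)
    known: set[str],
    seen: set[str],
    min_count: int = 2,
) -> set[str]:
    """Classify unseen candidate n-grams from a batch of records.

    `seen` persists across runs so that an n-gram already judged is not
    sent to the (expensive) classifier again.
    """
    counts = Counter()
    for text in records:
        # Count each n-gram once per record (document frequency in this batch).
        counts.update(set(candidate_ngrams(text)))
    for ngram, count in counts.items():
        if count < min_count or ngram in seen:
            continue
        seen.add(ngram)
        if classify(ngram):
            known.add(ngram)
    return known
```

In this sketch, a first call over the existing records populates known and seen; later calls pass only the newly added records together with the same two sets, so the classifier is invoked only for n-grams it has not judged before. Note that min_count is applied per batch here, which a real incremental run would likely replace with cumulative counts.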

Affiliation of the submitter: Harvard-Smithsonian Center for Astrophysics
Attendance: remote

Primary author

Jean-Claude Paquin (Harvard-Smithsonian Center for Astrophysics)

Co-authors

Kelly Lockhart (Harvard-Smithsonian Center for Astrophysics)
Sergi Blanco-Cuaresma (Harvard-Smithsonian Center for Astrophysics)
