Speaker
Description
Bibliographies are a core tool used by observatories to evaluate the impact of their facilities and instruments. Yet identifying and classifying papers that reference specific instruments is usually a manual, time-intensive task. We developed a large language model (LLM)-augmented pipeline to automatically construct a comprehensive list of instruments referenced across the full astronomy corpus of the Astrophysics Data System (ADS/SciX), roughly 3 million records. By grounding LLM agents with web search, we increased the number of true-positive n-gram-to-instrument associations. What would have taken a week of focused work by a single human curator was accomplished in hours, and the pipeline can now be run incrementally on new additions to the corpus to identify novel instruments as they appear in the literature.
| Affiliation of the submitter | Harvard-Smithsonian Center for Astrophysics |
|---|---|
| Attendance | remote |