Re-writing the VizieR catalogue ingestion pipeline

PO
Not scheduled
15m
Wichernhaus


Board: L232
Poster presentation: Lessons learned

Speaker

Ivan Brossard (CDS)

Description

VizieR provides a library of published astronomical catalogues (tables and associated data) whose contents are verified and enriched. Since its creation in 1995, the ingestion workflow has been a semi-automated process that makes authors' datasets compatible with the Virtual Observatory (VO) and adds complementary data, such as positions or links.
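One enrichment mentioned above is adding positions. The following is a minimal sketch, not the VizieR implementation. It assumes a catalogue that gives J2000 coordinates in sexagesimal columns with the labels commonly used in CDS ReadMe files (RAh, RAm, RAs, DE-, DEd, DEm, DEs) and derives decimal-degree columns from them. The function names and the output labels RAJ2000/DEJ2000 are illustrative. No precession is applied, so the sketch only holds when the input is already J2000.

```python
def ra_to_deg(hours: int, minutes: int, seconds: float) -> float:
    """Right ascension in h m s to decimal degrees (1 h = 15 deg)."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_deg(sign: str, degrees: int, arcmin: int, arcsec: float) -> float:
    """Declination to decimal degrees.

    The sign comes from a separate column (labelled "DE-" in many CDS ReadMe
    files) so that values such as -00 30 00 keep their negative sign.
    """
    value = degrees + arcmin / 60.0 + arcsec / 3600.0
    return -value if sign == "-" else value


def add_position(row: dict) -> dict:
    """Return a copy of `row` with computed RAJ2000/DEJ2000 columns.

    Hypothetical helper: assumes RAh, RAm, RAs, DE-, DEd, DEm, DEs are
    present and already refer to J2000 (no precession is applied).
    """
    enriched = dict(row)
    enriched["RAJ2000"] = ra_to_deg(row["RAh"], row["RAm"], row["RAs"])
    enriched["DEJ2000"] = dec_to_deg(row["DE-"], row["DEd"], row["DEm"], row["DEs"])
    return enriched


out = add_position({"RAh": 12, "RAm": 30, "RAs": 0.0,
                    "DE-": "-", "DEd": 0, "DEm": 30, "DEs": 0.0})
print(out["RAJ2000"], out["DEJ2000"])  # 187.5 -0.5
```

Keeping the declination sign in its own column is what lets the computation handle declinations between -1 and 0 degrees, where the degree field alone would read as zero.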

We rewrote the catalogue ingestion process around an architecture that supports new developments while remaining backwards compatible with the ~30K catalogues already in the database. This was necessary because VizieR has to keep up with a scientific context that keeps evolving, driven by the dynamic VO and the Open Data era.

The new developments improve internal processes (UCD (Unified Content Descriptor) auto-detection, cross-correlation…) and integrate astronomical standards such as MOCs (Multi-Order Coverage maps), the UAT (Unified Astronomy Thesaurus) or Datalink. They also follow FAIR principles more closely and better handle licences, ORCIDs and linked resources.
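In its simplest form, UCD auto-detection could be a rule table that matches a column's label and unit. The sketch below is a hypothetical heuristic for illustration only: the rule set, the function name `guess_ucd` and the matching logic are assumptions, not the CDS method. It returns `None` when no rule applies, so that the UCD can still be assigned manually.

```python
import re
from typing import Optional

# Hypothetical rule table: (label pattern, accepted units, UCD1+ string).
# Rules are tried in order; "---" is the CDS ReadMe notation for no unit.
RULES = [
    (re.compile(r"^RA(J2000|deg)?$", re.IGNORECASE), {"deg"}, "pos.eq.ra;meta.main"),
    (re.compile(r"^(DE|Dec)(J2000|deg)?$", re.IGNORECASE), {"deg"}, "pos.eq.dec;meta.main"),
    (re.compile(r"^(plx|parallax)$", re.IGNORECASE), {"mas"}, "pos.parallax"),
    (re.compile(r"^Vmag$"), {"mag"}, "phot.mag;em.opt.V"),
    (re.compile(r"^(Name|ID)$", re.IGNORECASE), {"---"}, "meta.id;meta.main"),
]


def guess_ucd(label: str, unit: str) -> Optional[str]:
    """Return the UCD of the first rule matching both label and unit, else None."""
    for pattern, units, ucd in RULES:
        if pattern.match(label) and unit in units:
            return ucd
    return None


print(guess_ucd("RAJ2000", "deg"))  # pos.eq.ra;meta.main
print(guess_ucd("RAJ2000", "h"))    # None: unit does not match, left to a curator
```

Checking the unit as well as the label avoids tagging, for example, an hour-angle column named "RA" as a position in degrees.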

While the original program was written exclusively in C, the new version mixes C and Python and keeps metadata extraction separate from database population. It still uses ReadMe files for metadata (see doi:10.1051/aas:2000169) and accepts several data formats, such as FITS, MRT or CSV files. These files are then formatted and ingested into a PostgreSQL database and an Elasticsearch index.
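To illustrate the separation between metadata extraction and database population, here is a minimal sketch, assuming a simplified ReadMe "Byte-by-byte Description" block and a fixed-width ASCII data file. All names (`Column`, `extract_metadata`, `sql_type`, `create_table_sql`, `parse_row`) are hypothetical, and the type mapping is deliberately simplified. It is not the CDS code: it does not handle FITS, MRT or CSV input, and it leaves out inserting rows (for instance with PostgreSQL's COPY) and Elasticsearch indexing.

```python
import re
from dataclasses import dataclass


@dataclass
class Column:
    start: int        # first byte (1-based, as written in the ReadMe)
    end: int          # last byte (inclusive)
    fmt: str          # Fortran-like format such as "A8", "I2", "F5.2"
    unit: str         # "---" means dimensionless in CDS ReadMe files
    label: str
    explanation: str


# Matches "  16- 20  F5.2  s  RAs  Right ascension" and single-byte "  9  A1 ..."
LINE_RE = re.compile(
    r"^\s*(\d+)(?:\s*-\s*(\d+))?\s+([AIFE]\d+(?:\.\d+)?)\s+(\S+)\s+(\S+)\s*(.*)$"
)


def extract_metadata(byte_by_byte: str) -> list[Column]:
    """Metadata extraction: parse column lines, skipping titles and rules."""
    columns = []
    for line in byte_by_byte.splitlines():
        m = LINE_RE.match(line)
        if m:
            start = int(m.group(1))
            end = int(m.group(2) or start)
            columns.append(Column(start, end, m.group(3), m.group(4),
                                  m.group(5), m.group(6).strip()))
    return columns


def sql_type(fmt: str) -> str:
    """Map a ReadMe format to a PostgreSQL type (simplified)."""
    if fmt[0] == "A":
        return "text"
    if fmt[0] == "I":
        # Up to 9 digits always fits in a 32-bit integer.
        return "integer" if int(fmt[1:]) <= 9 else "bigint"
    return "double precision"  # F and E formats


def create_table_sql(table: str, columns: list[Column]) -> str:
    """Database population, step 1: DDL built from metadata only."""
    body = ",\n  ".join(f'"{c.label}" {sql_type(c.fmt)}' for c in columns)
    return f'CREATE TABLE "{table}" (\n  {body}\n);'


def parse_row(line: str, columns: list[Column]) -> dict:
    """Database population, step 2: slice a fixed-width record by position."""
    row = {}
    for c in columns:
        raw = line[c.start - 1:c.end].strip()  # character = byte for ASCII
        if not raw:
            row[c.label] = None  # blank field stored as NULL
        elif c.fmt[0] == "A":
            row[c.label] = raw
        elif c.fmt[0] == "I":
            row[c.label] = int(raw)
        else:
            row[c.label] = float(raw)
    return row


README = """\
Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  8  A8    ---     Name      Source name
  10- 11  I2    h       RAh       Right ascension (J2000)
  13- 14  I2    min     RAm       Right ascension (J2000)
  16- 20  F5.2  s       RAs       Right ascension (J2000)
--------------------------------------------------------------------------------
"""

cols = extract_metadata(README)
print(create_table_sql("table1", cols))
print(parse_row("Star A   12 30 05.25", cols))
# {'Name': 'Star A', 'RAh': 12, 'RAm': 30, 'RAs': 5.25}
```

The population functions only consume the `Column` list, so the same metadata could drive other back ends (an Elasticsearch mapping, for example) without re-parsing the ReadMe. Quoting labels in the DDL keeps mixed-case names and labels such as "DE-" valid as PostgreSQL identifiers.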

In this poster, we describe the method we used to rewrite the ingestion pipeline. We then present new ideas and features that the new pipeline now makes it possible to implement.

Affiliation of the submitter: CDS - Observatoire Astronomique de Strasbourg
Attendance: in-person

Primary author

Co-author

Gilles Landais (CDS, Observatoire Astronomique de Strasbourg)
