Speaker
Description
The quantities of data produced by next-generation instruments such as the SKA, the DSA2000 and the ngVLA require new software ecosystems to convert observational data into science-ready data products.
Historically, data and compute at such scales have been handled with traditional HPC software and infrastructure. While this approach remains relevant going forward, three developments, (1) the advent of ubiquitous cloud compute, (2) the expansion of the scientific Python (PyData) ecosystem and (3) the concurrent explosion of data processing techniques used in Machine Learning and AI, have enabled a complementary approach that sacrifices some performance for the flexibility to rapidly prototype and develop distributed processing pipelines by embedding “experts in the loop”.
The Pangeo project has pioneered this approach in the geosciences domain, building cloud-based pipelines on a software stack of Xarray for dataset representation, Zarr for distributed storage, and Dask, NumPy and SciPy for compute. The approach has also been adopted within the Radio Astronomy community, both in software ecosystems such as Africanus and in the Measurement Set v4 Working Group's xradio prototype.
This talk will discuss how, through the adoption of these new interfaces and formats, Radio Astronomy stands on the cusp of a new wave of software built upon open-source technologies. It will demonstrate how, through the use of the Xarray interface, data scientists and radio astronomers will be able to manipulate large-scale datasets to produce science. Additionally, it will discuss pertinent developments in the broader open-source community relevant to Radio Astronomy going forward.
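As a rough illustration of the kind of manipulation the Xarray interface allows, the following minimal sketch builds a small synthetic visibility dataset and computes a flag-aware mean amplitude spectrum using named dimensions. The dimension names, variable names (`VISIBILITY`, `FLAG`), array sizes and frequency range are assumptions chosen for illustration; they are not taken from the Measurement Set v4 schema or from xradio.

```python
import numpy as np
import xarray as xr

# Hypothetical sizes and names, for illustration only.
ntime, nbl, nchan, npol = 8, 6, 16, 4
shape = (ntime, nbl, nchan, npol)
dims = ("time", "baseline", "frequency", "polarization")

rng = np.random.default_rng(42)
vis = rng.normal(size=shape) + 1j * rng.normal(size=shape)  # synthetic visibilities
flag = rng.random(size=shape) < 0.1                         # roughly 10% flagged

ds = xr.Dataset(
    data_vars={
        "VISIBILITY": (dims, vis),
        "FLAG": (dims, flag),
    },
    coords={
        "time": np.arange(ntime, dtype=np.float64),
        "frequency": np.linspace(0.856e9, 1.712e9, nchan),  # arbitrary band, in Hz
        "polarization": ["XX", "XY", "YX", "YY"],
    },
)

# Visibility amplitude, with flagged samples replaced by NaN.
amplitude = np.abs(ds.VISIBILITY).where(~ds.FLAG)

# Select one correlation by label and average over named dimensions;
# NaN (flagged) samples are skipped by default for floating-point data.
mean_spectrum = amplitude.sel(polarization="XX").mean(dim=["time", "baseline"])

print(mean_spectrum.dims, mean_spectrum.shape)
```

The arrays here are held in memory as NumPy arrays. In a distributed setting one would presumably chunk the dataset with Dask (for example via `ds.chunk`) and store it in Zarr, with the selection and reduction code left essentially unchanged.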
| Affiliation of the submitter | South African Radio Astronomy Observatory |
|---|---|
| Attendance | in-person |