Speaker
Description
The GAIA Data Mining Platform provides interactive, JupyterHub-based access to the GAIA Data Release 3 (DR3) dataset, which comprises 7 TB of data.
The GAIA Data Release 4 (DR4) dataset is expected to exceed 600 TB.
We describe our progress in evolving the GAIA Data Mining Platform into a modern, Kubernetes-based, platform-independent deployment named Astroflow, which adds Dask functionality to the existing large-scale Spark analytical processing.
In conjunction with the closely related SPACIOUS project, we report our findings and successes in deploying the platform to both on-premises (OpenStack) and commercial (Google) cloud platforms.
We outline our plans to incorporate Apache Iceberg into our architecture so that it scales efficiently to support the forthcoming GAIA DR4 release, and to use the data lake model to support and combine multiple future data sources for large-scale analytical processing and data mining in our interactive environment.
| Affiliation of the submitter | Institute For Astronomy, University Of Edinburgh |
|---|---|
| Attendance | in-person |