Practical lessons in building collaborative science platforms with open-source tools

P11
12 Nov 2025, 15:00
15m
Synagoge

Görlitz
oral presentation Lessons learned Plenary Session 11

Speaker

Hubert Siejkowski (Academic Computer Centre CYFRONET of the AGH University of Krakow)

Description

Advanced science platforms must handle large data volumes, complex workflows, and collaborations that span multiple disciplines and partners. While scientific questions may differ between fields, the challenges of building reliable and reproducible data-driven research infrastructures are very similar.

This talk demonstrates how a geophysics-oriented science platform integrates well-known open-source services to support the full lifecycle of data-intensive research. For monitoring and observability, we use Prometheus and Grafana to collect and visualise performance metrics, alongside Fluentd and Kibana for log aggregation and analysis. These tools provide real-time insight into the health and efficiency of the platform and its scientific workflows. They also generate the metrics and usage reports needed to satisfy the accountability requirements of funding agencies.

Beyond infrastructure monitoring, collaboration and reproducibility are enabled through Gitea for distributed version control and lightweight code hosting, allowing users to develop and run custom applications on our platform. To support the increasing role of machine learning in scientific workflows, we integrate tools such as Weights & Biases for experiment tracking and model management, and Label Studio for collaborative dataset curation and annotation. The platform supports execution on remote workers and HPC clusters, enabling scalable computation and flexible integration with existing research infrastructures.

Our experience shows that combining these tools into a modular ecosystem yields science platforms that are scalable, compliant with funders' reporting requirements, and conducive to reproducible research. Key lessons are the need to balance automation with researcher control, to design platforms that enable cross-disciplinary collaboration, and to adopt testing frameworks that ensure platform reliability. Although developed for geophysics, this approach is directly transferable to astrophysics and other data-intensive fields. The talk will highlight practical insights on building platforms that support scientific workflows and collaboration in the big data era.

Affiliation of the submitter Academic Computer Centre CYFRONET of the AGH University of Krakow
Attendance in-person

Primary author

Hubert Siejkowski (Academic Computer Centre CYFRONET of the AGH University of Krakow)

Co-authors

Bartłomiej Wenda (Academic Computer Centre CYFRONET of the AGH University of Krakow)
Jakub Sto (Academic Computer Centre CYFRONET of the AGH University of Krakow)
Joanna Kocot (Academic Computer Centre CYFRONET of the AGH University of Krakow)
Krystyna Milian (Academic Computer Centre CYFRONET of the AGH University of Krakow)
Maciej Leśniak (Academic Computer Centre CYFRONET of the AGH University of Krakow)
Tomasz Balawajder (Academic Computer Centre CYFRONET of the AGH University of Krakow)
