CAOM-AI: Content-Based Image Search System

P10
12 Nov 2025, 11:30
15m
Synagoge

Görlitz
oral presentation
Science platforms in the big data era
Plenary Session 10

Speaker

Hossen Teimoorinia (HAA/CADC)

Description

Modern astronomical surveys and observatories such as HST, JWST, Euclid, and LSST are generating petabyte-scale imaging archives across multiple wavelengths and epochs. Traditional image retrieval methods, which rely solely on metadata such as sky position, filter, or exposure time, are insufficient for identifying objects with similar visual or physical characteristics. Efficient discovery in these massive datasets requires content-based search methods that operate directly on image representations. Early studies in this area relied on handcrafted features or supervised convolutional neural networks (CNNs), which require extensive labelled data and therefore scale poorly. More recent work employs unsupervised or self-supervised learning to extract intrinsic representations of astronomical images. In particular, Teimoorinia et al. (2021) introduced a fully unsupervised two-stage framework that combines self-organizing maps (SOMs) and CNN autoencoders for image modelling and similarity search, demonstrating the feasibility of data-driven discovery without labels. Building on this foundation, we present a practical and scalable framework that integrates state-of-the-art self-supervised learning into the Canadian Astronomy Data Centre’s Common Archive Observation Model (CAOM). Using techniques such as variance–invariance–covariance regularization (VICReg) and deep embedded clustering (DEC), our system generates high-dimensional embeddings that capture both the visual morphology and physical properties of astronomical sources. We demonstrate that these learned representations enable efficient, scientifically meaningful image retrieval within large, heterogeneous archives, paving the way for next-generation, content-based discovery tools in data center environments.
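As an illustration of the retrieval step described above, the following is a minimal sketch of similarity search over precomputed embeddings, not the CAOM-AI implementation. It assumes each image has already been mapped to a fixed-length vector by some encoder (for example one trained with VICReg) and ranks archive entries by cosine similarity to a query vector. The function names, the observation identifiers, and the 3-dimensional toy vectors are hypothetical; a production system would likely use much higher-dimensional embeddings and an approximate nearest-neighbour index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k entries of `index` most similar to `query`.

    `index` maps a (hypothetical) observation ID to its embedding.
    Results are (observation ID, cosine similarity) pairs, best first.
    """
    q = l2_normalize(query)
    scored = []
    for image_id, embedding in index.items():
        e = l2_normalize(embedding)
        similarity = sum(a * b for a, b in zip(q, e))
        scored.append((image_id, similarity))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for encoder outputs (assumed values).
toy_index = {
    "obs-A": [1.0, 0.0, 0.0],
    "obs-B": [0.9, 0.1, 0.0],
    "obs-C": [0.0, 1.0, 0.0],
    "obs-D": [-1.0, 0.0, 0.0],
}

results = top_k_similar([1.0, 0.05, 0.0], toy_index, k=2)
for image_id, similarity in results:
    print(f"{image_id}: {similarity:.4f}")
```

Normalizing both vectors makes the ranking depend only on direction in embedding space, which is a common choice for learned representations; whether cosine similarity or another metric best suits the embeddings described here is an open design decision.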

Affiliation of the submitter HAA/CADC
Attendance in-person

Primary author

Co-author

Patrick Dowler (HAA/CADC)
