Speaker
Description
Modern astronomical surveys such as HST, JWST, Euclid, and LSST are generating petabyte-scale imaging archives across multiple wavelengths and epochs. Traditional image retrieval methods, based solely on metadata such as sky position, filter, or exposure time, are insufficient for identifying objects with similar visual or physical characteristics. Enabling efficient discovery in these massive datasets requires content-based search methods that operate directly on image representations. Early studies in this area relied on handcrafted features or supervised convolutional neural networks (CNNs), which require extensive labelled data and thus limit scalability. More recent advances employ unsupervised or self-supervised learning to extract intrinsic representations of astronomical images. In particular, Teimoorinia et al. (2021) introduced a fully unsupervised two-stage framework that combines self-organizing maps (SOMs) and CNN autoencoders for image modelling and similarity search, demonstrating the feasibility of data-driven discovery without labels. Building on this foundation, we present a practical and scalable framework that integrates state-of-the-art self-supervised learning into the Canadian Astronomy Data Centre’s Common Archive Observation Model (CAOM). Using techniques such as variance–invariance–covariance regularization (VICReg) and deep embedded clustering (DEC), our system generates high-dimensional embeddings that capture both the visual morphology and physical properties of astronomical sources. We demonstrate that these learned representations enable efficient, scientifically meaningful image retrieval within large, heterogeneous archives, paving the way for next-generation, content-based discovery tools in data centre environments.
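
To make the representation-learning step concrete, the sketch below shows a generic PyTorch formulation of the VICReg objective named in the abstract (following Bardes et al. 2022). It is an illustrative example only, not the CADC implementation; the loss weights, variance target, and embedding dimensions are assumptions chosen for clarity.

```python
# Generic sketch of the VICReg loss for two batches of embeddings z_a, z_b of
# shape (N, D), produced from two augmented views of the same images.
# Coefficients (sim_w, var_w, cov_w) are illustrative defaults, not tuned values.
import torch
import torch.nn.functional as F


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance term: embeddings of two views of the same source should agree.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping the std of each embedding dimension
    # above 1, which prevents collapse to a trivial constant representation.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance term: penalise off-diagonal covariance entries so that
    # embedding dimensions carry decorrelated information.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov_loss = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```

Once an encoder is trained with an objective of this kind, content-based retrieval of archive images reduces to nearest-neighbour search (for example, by cosine similarity) over the stored embeddings.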
| Affiliation of the submitter | HAA/CADC |
|---|---|
| Attendance | in-person |