Speaker
Description
We review the application of machine learning (ML) to identify Young
Stellar Objects (YSOs) in the era of large-scale astronomical surveys. We begin by
outlining the limitations of traditional identification methods (e.g., infrared excess,
spectroscopic confirmation), which struggle with the volume, complexity, and high
dimensionality of data from projects like the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) and
JWST. Key challenges highlighted include severe class imbalance, as YSOs are rare,
and contamination from other astrophysical sources. We present several case studies
demonstrating the success of ML techniques. These examples show how algorithms
like Gradient Boosting Machines, CatBoost combined with Self-Paced Ensemble (SPE),
and Probabilistic Random Forests, especially when enhanced with methods
like SMOTE to address class imbalance, have successfully identified large, pure samples
of YSO candidates from multi-wavelength data. Looking forward, we predict that
next-generation surveys will drive the adoption of more sophisticated models like Graph
Neural Networks and Transformers. Future trends will include a shift towards physically
informed ML that estimates parameters like mass and age, and a greater focus on
model explainability using tools like SHAP to build trust and gain scientific insights. In
conclusion, ML is revolutionizing YSO identification, turning the challenge of big data
into an unprecedented opportunity for statistical studies of star formation.
| Affiliation of the submitter | National Astronomical Observatories, Chinese Academy of Sciences |
|---|---|
| Attendance | in-person |