TY - JOUR
T1 - Detecting outliers and learning complex structures with large spectroscopic surveys - A case study with APOGEE stars
AU - Reis, Itamar
AU - Poznanski, Dovi
AU - Baron, Dalya
AU - Zasowski, Gail
AU - Shahaf, Sahar
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2018/5/11
Y1 - 2018/5/11
AB - In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.
KW - Methods: data analysis
KW - Stars: general
KW - Stars: peculiar
UR - http://www.scopus.com/inward/record.url?scp=85047176217&partnerID=8YFLogxK
U2 - 10.1093/mnras/sty348
DO - 10.1093/mnras/sty348
M3 - Article
AN - SCOPUS:85047176217
SN - 0035-8711
VL - 476
SP - 2117
EP - 2136
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
IS - 2
ER -