The weirdest SDSS galaxies: Results from an outlier detection algorithm

Dalya Baron*, Dovi Poznanski

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


How can we discover objects we did not know existed within the large data sets that now abound in astronomy? We present an outlier detection algorithm that we developed, based on an unsupervised Random Forest. We test the algorithm on more than two million galaxy spectra from the Sloan Digital Sky Survey and examine the 400 galaxies with the highest outlier score. We find objects with extreme emission line ratios and abnormally strong absorption lines, and objects with unusual continua, including extremely reddened galaxies. We find galaxy-galaxy gravitational lenses, double-peaked emission line galaxies and close galaxy pairs. We find galaxies with high ionization lines, galaxies that host supernovae and galaxies with unusual gas kinematics. Only a fraction of the outliers we find were reported by previous studies that used specific and tailored algorithms to find a single class of unusual objects. Our algorithm is general and detects all of these classes, and many more, regardless of what makes them peculiar. It can be applied to imaging, time series and other spectroscopic data, operates well with thousands of features, is not sensitive to missing values and is easily parallelizable.
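The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of one common way to build an outlier score from an unsupervised Random Forest, not the authors' implementation. It assumes that synthetic data are made by shuffling each feature independently, that a forest is trained to separate real from synthetic objects, and that two real objects are "similar" in proportion to the number of trees in which they land in the same leaf. The function names `make_synthetic` and `outlier_scores` and all parameter values are hypothetical, and the toy data merely stand in for real spectra.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_synthetic(X, rng):
    """Shuffle each feature column independently (assumed construction).

    This keeps every feature's marginal distribution but destroys the
    correlations between features.
    """
    return np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])


def outlier_scores(X, n_trees=100, seed=0):
    """Return one score per row of X; higher means more unusual."""
    rng = np.random.default_rng(seed)
    X_syn = make_synthetic(X, rng)

    # Label real objects 1 and synthetic objects 0, then train a classifier.
    data = np.vstack([X, X_syn])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(data, labels)

    # leaves[i, t] is the index of the leaf that object i reaches in tree t.
    leaves = forest.apply(X)
    n = len(X)
    similarity = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        col = leaves[:, t]
        similarity += col[:, None] == col[None, :]
    similarity /= leaves.shape[1]

    # Score = mean dissimilarity to all other objects (the diagonal is zero).
    return (1.0 - similarity).sum(axis=1) / (n - 1)


# Toy data: two strongly correlated features, plus one object (last row)
# whose values are individually ordinary but break the correlation.
rng = np.random.default_rng(42)
base = rng.normal(size=300)
X = np.column_stack([base, base + 0.1 * rng.normal(size=300)])
X = np.vstack([X, [2.0, -2.0]])

scores = outlier_scores(X)
rank = int((scores > scores[-1]).sum()) + 1
print(f"planted object: score {scores[-1]:.3f}, rank {rank} of {len(scores)}")
```

In this sketch the pairwise similarity matrix costs memory quadratic in the number of objects, so a sample of millions of spectra would need a chunked or otherwise more economical computation; because each tree can be evaluated separately, the work is straightforward to split across processes.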

Original language: English
Pages (from-to): 4530-4555
Number of pages: 26
Journal: Monthly Notices of the Royal Astronomical Society
Issue number: 4
State: Published - 11 Mar 2017


  • Galaxies: general
  • Galaxies: peculiar
  • Methods: data analysis
  • Methods: statistical

