Transductive and Inductive Outlier Detection with Robust Autoencoders

Ofir Lindenbaum, Yariv Aizenbud, Yuval Kluger

Research output: Contribution to journal › Conference article › peer-review

Abstract

Accurate detection of outliers is crucial for the success of numerous data analysis tasks. In this context, we propose the Probabilistic Robust AutoEncoder (PRAE), which can simultaneously remove outliers during training (transductive) and learn a mapping that can be used to detect outliers in new data (inductive). We first present the Robust AutoEncoder (RAE) objective, which excludes outliers while retaining a subset of samples (inliers) that can be effectively reconstructed by an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while including as many samples as possible. This can be formulated as a regularized objective: an ℓ0 norm counting the number of selected samples is subtracted from the reconstruction term. Since this leads to an intractable combinatorial problem, we propose two probabilistic relaxations of RAE that are differentiable and alleviate the need for a combinatorial search. We prove that the solution of the PRAE problem is equivalent to the solution of RAE. We then use synthetic data to demonstrate that PRAE can accurately remove outliers at various contamination levels. Finally, we show that using PRAE for outlier detection leads to state-of-the-art results for both inductive and transductive outlier detection.
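The abstract does not include code, but the relaxed objective it describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes a logistic (Gumbel-Sigmoid-style) reparameterization of per-sample Bernoulli inclusion gates in place of the paper's specific relaxation, and all names (RelaxedRAE, gate_logits, lam) are hypothetical.

```python
import torch
import torch.nn as nn

class RelaxedRAE(nn.Module):
    """Autoencoder with relaxed per-sample Bernoulli gates (illustrative)."""

    def __init__(self, dim, n_train, hidden=32, latent=8, lam=1.0):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, latent))
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))
        # one learnable gate logit per training sample (transductive setting)
        self.gate_logits = nn.Parameter(torch.zeros(n_train))
        self.lam = lam  # reward per included sample, relaxing -lam * ||s||_0

    def loss(self, x, idx, temp=0.5):
        # per-sample squared reconstruction error
        err = ((x - self.dec(self.enc(x))) ** 2).sum(dim=1)
        # relaxed Bernoulli gates in (0, 1) via logistic noise reparameterization
        u = torch.rand(len(idx))
        logistic = torch.log(u) - torch.log1p(-u)
        s = torch.sigmoid((self.gate_logits[idx] + logistic) / temp)
        # gated reconstruction error minus a reward for each included sample
        return (s * err).sum() - self.lam * s.sum()

# usage sketch: inliers with reconstruction error below lam drive their
# gates toward 1, while outliers are pushed toward 0
model = RelaxedRAE(dim=10, n_train=200)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(200, 10)
for _ in range(100):
    opt.zero_grad()
    model.loss(x, torch.arange(200)).backward()
    opt.step()
scores = torch.sigmoid(model.gate_logits)  # per-sample inclusion probabilities
```

Under this reading, the subtracted ℓ0 term acts as a fixed per-sample reward, so a sample is worth including exactly when its reconstruction error falls below the threshold lam.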

Original language: English
Pages (from-to): 2271-2293
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 244
State: Published - 2024
Event: 40th Conference on Uncertainty in Artificial Intelligence, UAI 2024 - Barcelona, Spain
Duration: 15 Jul 2024 - 19 Jul 2024

Funding

Funder: National Institutes of Health
Funder numbers: U54AG076043, UM1DA051410, P50CA121974, U54AG079759, R01GM131642, U01DA053628
