Representative Selection in Nonmetric Datasets

Elad Liebman*, Benny Chor, Peter Stone

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This study considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements. This subset needs to inherently reflect the type of information contained in the entire set, while minimizing redundancy. For such purposes, clustering might seem like a natural approach. However, existing clustering methods are not ideally suited for representative selection, especially when dealing with nonmetric data, in which only a pairwise similarity measure exists. In this article we propose δ-medoids, a novel approach that can be viewed as an extension of the k-medoids algorithm and is specifically suited for sample representative selection from nonmetric data. We empirically validate δ-medoids in two domains: music analysis and motion analysis. We also show some theoretical bounds on the performance of δ-medoids and the hardness of representative selection in general.

Original language: English
Pages (from-to): 807-838
Number of pages: 32
Journal: Applied Artificial Intelligence
Volume: 29
Issue number: 8
State: Published - 14 Sep 2015
