Curious instance selection

Michal Moran, Tom Cohen, Yuval Ben-Zion, Goren Gordon

Research output: Contribution to journal › Article › peer-review

Abstract

When building machine learning models, data must sometimes be sampled before the learning process can be applied. This step, known as instance selection, is mostly done to reduce the data to a volume that lowers the computing resources required for the learning phase. It also removes noisy data that can degrade learning quality. These two objectives often conflict, yet most current approaches offer no way to control the balance between them. We propose a reinforcement learning-based approach for instance selection, called curious instance selection (CIS), which evaluates clusters of instances using the curiosity loop architecture. The output of the algorithm is a matrix that represents the value of adding a cluster of instances to the instances already selected. This matrix enables the computation of the Pareto front and demonstrates the ability to balance the noise reduction and volume reduction objectives. CIS was evaluated on five datasets, and its performance was compared with that of three state-of-the-art algorithms. Our results show that CIS not only provides enhanced flexibility but also achieves higher effectiveness (reduction times accuracy). This approach strengthens the appeal of using curiosity-based algorithms in data science.
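
As an illustration of the two quantities named in the abstract, the sketch below computes effectiveness as the product of reduction rate and accuracy, and extracts a Pareto front from a set of candidate selections. It is not the CIS algorithm itself. The function names `effectiveness` and `pareto_front` and the candidate values are hypothetical and chosen only for the example. Both reduction and accuracy are assumed to lie in [0, 1], with higher being better.

```python
def effectiveness(reduction, accuracy):
    """Effectiveness as described in the abstract: reduction times accuracy."""
    return reduction * accuracy


def pareto_front(points):
    """Return the (reduction, accuracy) pairs that no other pair dominates.

    A pair dominates another if it is at least as good on both objectives
    and strictly better on at least one.
    """
    front = []
    for i, (r_i, a_i) in enumerate(points):
        dominated = any(
            (r_j >= r_i and a_j >= a_i) and (r_j > r_i or a_j > a_i)
            for j, (r_j, a_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((r_i, a_i))
    return sorted(front)


# Made-up candidate selections as (reduction, accuracy); not results from the paper.
candidates = [(0.90, 0.70), (0.50, 0.85), (0.40, 0.80), (0.20, 0.90), (0.90, 0.60)]

front = pareto_front(candidates)
best = max(front, key=lambda p: effectiveness(*p))

print("Pareto front:", front)
print("Most effective point on the front:", best)
```

Each point on the front is a different trade-off between volume reduction and accuracy. Picking the point with the highest product is only one possible way to choose among them.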

Original language: English
Pages (from-to): 794-808
Number of pages: 15
Journal: Information Sciences
Volume: 608
State: Published - Aug 2022

Keywords

  • Curiosity loop
  • Data science
  • Instance selection
  • Intrinsic motivation learning
  • Machine learning
  • Reinforcement learning
