Enhancing learning algorithms to support data with short sequence features by automated feature discovery

Ofer Dor, Yoram Reich*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In this paper, we propose a VECtor DIScovery approach, called VECDIS, which enhances the learning performance of existing classifiers directly from various data types and can discover features composed of multiple feature types for explanatory purposes. The data types can be combinations of multivariate, short time-series, or short sequential data. Each feature in the dataset can hold a single item and/or an ordered list of items, and the lists can differ in size. The approach handles raw vector data without prior manipulation (i.e., preprocessing). The discovered features are made of vector and non-vector mathematical relations. In each iteration, the algorithm generates new vector features and mathematical-expression features, which are passed to the next iteration alongside, or in place of, previously generated features. The approach can search for and automatically discover thousands of different features, each a manipulation of the sequence features. We performed a large number of experiments with various synthetic and simulated datasets and with a wide range of classifiers. The results show that VECDIS significantly enhanced the classification performance of existing classifiers on datasets that have multiple feature types with short sequence features. Nevertheless, there is no guarantee that the mathematical library presented in this paper suits all sequence datasets or leads to the discovery of a valuable feature set. Therefore, VECDIS allows the mathematical library to be expanded or exchanged as desired.
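The general idea of deriving scalar features from short sequences of different sizes can be illustrated with a minimal Python sketch. This is not the VECDIS algorithm: the operation library `SEQ_OPS`, the helper names `expand_row`, `threshold_accuracy`, and `discover`, and the single-threshold scoring rule are all assumptions introduced here for illustration, and the sketch performs only one generation step rather than an iterative search.

```python
from statistics import mean

# Hypothetical operation library (an assumption for illustration,
# not the mathematical library used in the paper).
SEQ_OPS = {
    "mean": mean,
    "max": max,
    "min": min,
    "range": lambda s: max(s) - min(s),
    "delta": lambda s: s[-1] - s[0],  # last item minus first item
    "length": len,
}


def expand_row(row):
    """Turn one record (scalars and short sequences) into flat scalar features."""
    out = {}
    for name, value in row.items():
        if isinstance(value, (list, tuple)):
            # Apply every library operation to the raw sequence.
            for op_name, op in SEQ_OPS.items():
                out[f"{op_name}({name})"] = float(op(value))
        else:
            out[name] = float(value)
    return out


def threshold_accuracy(values, labels):
    """Best accuracy of a one-threshold rule on a feature (binary 0/1 labels)."""
    best = 0.0
    n = len(values)
    for t in sorted(set(values)):
        pred = [1 if v >= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(pred, labels)) / n
        best = max(best, acc, 1.0 - acc)  # either polarity of the rule
    return best


def discover(rows, labels, top_k=3):
    """Score every generated feature and return the top_k (name, score) pairs."""
    expanded = [expand_row(r) for r in rows]
    scores = {
        f: threshold_accuracy([e[f] for e in expanded], labels)
        for f in expanded[0]
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Toy data: "s" is a short sequence whose length varies between records.
    rows = [
        {"x": 1.0, "s": [1, 2, 3]},
        {"x": 2.0, "s": [5, 4]},
        {"x": 2.0, "s": [0, 1, 0, 4]},
        {"x": 1.0, "s": [3, 3, 1]},
    ]
    labels = [1, 0, 1, 0]
    for name, score in discover(rows, labels, top_k=10):
        print(f"{name}: {score:.2f}")
```

Because the operations live in a plain dictionary, entries can be added or swapped without touching the rest of the code, which loosely mirrors the abstract's point that the mathematical library can be expanded or exchanged.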

Original language: English
Pages (from-to): 114-132
Number of pages: 19
Journal: Knowledge-Based Systems
Volume: 52
DOIs
State: Published - Nov 2013

Keywords

  • Feature construction
  • Feature discovery
  • Feature selection
  • Preprocessing
  • Sequential data
  • Short sequence
