TY - JOUR
T1 - Enhancing learning algorithms to support data with short sequence features by automated feature discovery
AU - Dor, Ofer
AU - Reich, Yoram
PY - 2013/11
Y1 - 2013/11
AB - In this paper, we propose a VECtor DIScovery approach, called VECDIS, which enhances the learning performance of existing classifiers directly on various data types and can discover features composed of multiple feature types for explanatory purposes. The data types may be combinations of multivariate, short time-series, or short sequential data. Features in the dataset may hold a single item and/or an ordered list of items of varying size. The approach handles raw vector data without prior manipulation (i.e., preprocessing). The discovered features are composed of vector and non-vector mathematical relations. The algorithm generates new vector features and mathematical expression features that are transmitted to the next iterative step or exchanged with previously generated features. The approach can search for and automatically discover thousands of different features (sequence manipulations) derived from the sequence features. We performed a large number of experiments with various synthetic and simulated datasets and a wide range of classifiers. The results show that VECDIS significantly enhanced the classification performance of existing classifiers on datasets having multiple feature types with short sequence features. Nevertheless, there is no guarantee that the mathematical library presented in this paper suits all sequence datasets or leads to the discovery of a valuable feature set. Therefore, VECDIS allows the mathematical library to be expanded or exchanged as desired.
KW - Feature construction
KW - Feature discovery
KW - Feature selection
KW - Preprocessing
KW - Sequential data
KW - Short sequence
UR - http://www.scopus.com/inward/record.url?scp=84883872953&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2013.07.013
DO - 10.1016/j.knosys.2013.07.013
M3 - Article
AN - SCOPUS:84883872953
SN - 0950-7051
VL - 52
SP - 114
EP - 132
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -