Linear separability of gene expression data sets

Giora Unger*, Benny Chor

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

We study simple geometric properties of gene expression data sets, where samples are taken from two distinct classes (e.g., two types of cancer). Specifically, the problem of linear separability for pairs of genes is investigated. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated with the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomenon (e.g., a specific cancer). We developed and implemented novel efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it to an upper bound on the number expected by chance and to the numbers obtained empirically by shuffling the labels of the data at random. Seven out of these 10 data sets are highly separable. Statistically, this phenomenon is highly significant and very unlikely to occur at random. It is therefore reasonable to expect that it manifests a functional association between separating genes and the underlying phenotypic classes.
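To make the notion of a separating gene pair concrete, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm, which the abstract describes only as efficient and based on computational geometric properties. The function names, the data layout (`expression[g][s]` is the level of gene g in sample s, and labels are 0 or 1), and the toy values are all assumptions made for illustration. The check relies on a standard fact: two finite planar point sets are strictly separable by a line exactly when some separating direction is parallel or perpendicular to the difference of two of their points.

```python
from itertools import combinations
import random


def is_linearly_separable(points_a, points_b):
    """Return True if some line strictly separates two non-empty 2-D point sets.

    Brute force: try every direction given by the difference of two points
    (from either set) and its perpendicular, and test whether the
    projections of the two sets onto that direction do not overlap.
    """
    all_points = list(points_a) + list(points_b)
    for (px, py), (qx, qy) in combinations(all_points, 2):
        dx, dy = px - qx, py - qy
        if dx == 0 and dy == 0:
            continue  # identical points define no direction
        for wx, wy in ((dx, dy), (-dy, dx)):
            proj_a = [wx * x + wy * y for x, y in points_a]
            proj_b = [wx * x + wy * y for x, y in points_b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return True
    return False


def count_separating_pairs(expression, labels):
    """Count gene pairs whose joint expression separates the two classes."""
    idx_a = [s for s, lab in enumerate(labels) if lab == 0]
    idx_b = [s for s, lab in enumerate(labels) if lab == 1]
    count = 0
    for g1, g2 in combinations(range(len(expression)), 2):
        pts_a = [(expression[g1][s], expression[g2][s]) for s in idx_a]
        pts_b = [(expression[g1][s], expression[g2][s]) for s in idx_b]
        if is_linearly_separable(pts_a, pts_b):
            count += 1
    return count


def shuffled_counts(expression, labels, trials, seed=0):
    """Empirical baseline: recount separating pairs after shuffling labels."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        counts.append(count_separating_pairs(expression, shuffled))
    return counts


# Toy data (hypothetical): 3 genes, 6 samples, 3 samples per class.
toy_expression = [
    [1, 2, 3, 7, 8, 9],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
toy_labels = [0, 0, 0, 1, 1, 1]

observed = count_separating_pairs(toy_expression, toy_labels)
baseline = shuffled_counts(toy_expression, toy_labels, trials=20)
print(observed, baseline)
```

Each pair costs time cubic in the number of samples, so this sketch is only suitable for small inputs. Comparing `observed` with the distribution in `baseline` mirrors, in miniature, the label-shuffling comparison described in the abstract.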

Original language: English
Article number: 4604654
Pages (from-to): 375-381
Number of pages: 7
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 7
Issue number: 2
State: Published - 2010

Funding

Funders | Funder number
Israel Science Foundation | 418/00

    Keywords

    • Bioinformatics (genome or protein) databases
    • Biology and genetics
    • DNA microarrays
    • Data mining
    • Diagnosis
    • Gene expression analysis
    • Geometrical problems and computations
    • Heuristic methods
    • Information filtering
    • Linear separation
