TY - JOUR
T1 - Optimal set cover formulation for exclusive row biclustering of gene expression
AU - Painsky, Amichai
AU - Rosset, Saharon
N1 - Funding Information:
This research was funded in part by the Israeli Science Foundation under Grant No. 1227/09 and by a grant to Amichai Painsky from the Israeli Center for Absorption in Science. A preliminary version of the paper was published in the Proceedings of ICDM 2012. ©2014 Springer Science + Business Media, LLC & Science Press, China
PY - 2014/5
Y1 - 2014/5
N2 - The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can be a member of only a single bicluster while columns can participate in multiple biclusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to carry only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. Our algorithmic solution combines existing biclustering algorithms with combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while considering their unique nature.
KW - biclustering
KW - exclusive row biclustering
KW - gene expression
KW - projected clustering
UR - http://www.scopus.com/inward/record.url?scp=84901657035&partnerID=8YFLogxK
U2 - 10.1007/s11390-014-1440-y
DO - 10.1007/s11390-014-1440-y
M3 - Article
AN - SCOPUS:84901657035
SN - 1000-9000
VL - 29
SP - 423
EP - 435
JO - Journal of Computer Science and Technology
JF - Journal of Computer Science and Technology
IS - 3
ER -