TY - JOUR
T1 - Discovering Local Structure in Gene Expression Data
T2 - The Order-Preserving Submatrix Problem
AU - Ben-Dor, Amir
AU - Chor, Benny
AU - Karp, Richard
AU - Yakhini, Zohar
PY - 2003
Y1 - 2003
N2 - This paper concerns the discovery of patterns in gene expression matrices, in which each element gives the expression level of a given gene in a given experiment. Most existing methods for pattern discovery in such matrices are based on clustering genes by comparing their expression levels in all experiments, or clustering experiments by comparing their expression levels for all genes. Our work goes beyond such global approaches by looking for local patterns that manifest themselves when we focus simultaneously on a subset G of the genes and a subset T of the experiments. Specifically, we look for order-preserving submatrices (OPSMs), in which the expression levels of all genes induce the same linear ordering of the experiments (we show that the OPSM search problem is NP-hard in the worst case). Such a pattern might arise, for example, if the experiments in T represent distinct stages in the progress of a disease or in a cellular process and the expression levels of all genes in G vary across the stages in the same way. We define a probabilistic model in which an OPSM is hidden within an otherwise random matrix. Guided by this model, we develop an efficient algorithm for finding the hidden OPSM in the random matrix. In data generated according to the model, the algorithm recovers the hidden OPSM with a very high success rate. Application of the methods to breast cancer data seems to reveal significant local patterns.
AB - This paper concerns the discovery of patterns in gene expression matrices, in which each element gives the expression level of a given gene in a given experiment. Most existing methods for pattern discovery in such matrices are based on clustering genes by comparing their expression levels in all experiments, or clustering experiments by comparing their expression levels for all genes. Our work goes beyond such global approaches by looking for local patterns that manifest themselves when we focus simultaneously on a subset G of the genes and a subset T of the experiments. Specifically, we look for order-preserving submatrices (OPSMs), in which the expression levels of all genes induce the same linear ordering of the experiments (we show that the OPSM search problem is NP-hard in the worst case). Such a pattern might arise, for example, if the experiments in T represent distinct stages in the progress of a disease or in a cellular process and the expression levels of all genes in G vary across the stages in the same way. We define a probabilistic model in which an OPSM is hidden within an otherwise random matrix. Guided by this model, we develop an efficient algorithm for finding the hidden OPSM in the random matrix. In data generated according to the model, the algorithm recovers the hidden OPSM with a very high success rate. Application of the methods to breast cancer data seems to reveal significant local patterns.
KW - Data analysis
KW - Gene expression
KW - Local pattern
KW - Local structure
KW - Non-parametric methods
UR - http://www.scopus.com/inward/record.url?scp=0242690489&partnerID=8YFLogxK
U2 - 10.1089/10665270360688075
DO - 10.1089/10665270360688075
M3 - Article
C2 - 12935334
AN - SCOPUS:0242690489
SN - 1066-5277
VL - 10
SP - 373
EP - 384
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 3-4
ER -