TY - JOUR
T1 - Machine Learning-Based Gene Prioritization Identifies Novel Candidate Risk Genes for Inflammatory Bowel Disease
AU - Isakov, Ofer
AU - Dotan, Iris
AU - Ben-Shachar, Shay
N1 - Publisher Copyright: © 2017 Crohn's & Colitis Foundation.
PY - 2017/9/1
Y1 - 2017/9/1
AB - Background: The inflammatory bowel diseases (IBDs) are chronic inflammatory disorders, associated with genetic, immunologic, and environmental factors. Although hundreds of genes are implicated in IBD etiology, it is likely that additional genes play a role in the disease process. We developed a machine learning-based gene prioritization method to identify novel IBD-risk genes. Methods: Known IBD genes were collected from genome-wide association studies and annotated with expression and pathway information. Using these genes, a model was trained to identify IBD-risk genes. A comprehensive list of 16,390 genes was then scored and classified. Results: Immune and inflammatory responses, as well as pathways such as cell adhesion, cytokine-cytokine receptor interaction, and sulfur metabolism were identified to be related to IBD. Scores predicted for IBD genes were significantly higher than those for non-IBD genes (P < 10^-20). There was a significant association between the score and having an IBD publication (P < 10^-20). Overall, 347 genes had a high prediction score (>0.8). A literature review of the genes, excluding those used to train the model, identified 67 genes without any publication concerning IBD. These genes represent novel candidate IBD-risk genes, which can be targeted in future studies. Conclusions: Our method successfully differentiated IBD-risk genes from non-IBD genes by using information from expression data and a multitude of gene annotations. Crucial features were defined, and we were able to detect novel candidate risk genes for IBD. These findings may help detect new IBD-risk genes and improve the understanding of IBD pathogenesis.
KW - RNA-seq
KW - big data
KW - gene expression
KW - genetics
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85028358144&partnerID=8YFLogxK
U2 - 10.1097/MIB.0000000000001222
DO - 10.1097/MIB.0000000000001222
M3 - Article
C2 - 28795970
AN - SCOPUS:85028358144
SN - 1078-0998
VL - 23
SP - 1516
EP - 1523
JO - Inflammatory Bowel Diseases
JF - Inflammatory Bowel Diseases
IS - 9
ER -