Machine Learning-Based Gene Prioritization Identifies Novel Candidate Risk Genes for Inflammatory Bowel Disease

Ofer Isakov, Iris Dotan, Shay Ben-Shachar*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

48 Scopus citations

Abstract

Background: The inflammatory bowel diseases (IBDs) are chronic inflammatory disorders associated with genetic, immunologic, and environmental factors. Although hundreds of genes are implicated in IBD etiology, it is likely that additional genes play a role in the disease process. We developed a machine learning-based gene prioritization method to identify novel IBD-risk genes.

Methods: Known IBD genes were collected from genome-wide association studies and annotated with expression and pathway information. Using these genes, a model was trained to identify IBD-risk genes. A comprehensive list of 16,390 genes was then scored and classified.

Results: Immune and inflammatory responses, as well as pathways such as cell adhesion, cytokine-cytokine receptor interaction, and sulfur metabolism, were identified as related to IBD. Scores predicted for IBD genes were significantly higher than those for non-IBD genes (P < 10⁻²⁰). There was a significant association between the score and having an IBD publication (P < 10⁻²⁰). Overall, 347 genes had a high prediction score (>0.8). A literature review of the genes, excluding those used to train the model, identified 67 genes without any publication concerning IBD. These genes represent novel candidate IBD-risk genes, which can be targeted in future studies.

Conclusions: Our method successfully differentiated IBD-risk genes from non-IBD genes by using information from expression data and a multitude of gene annotations. Crucial features were defined, and we were able to detect novel candidate risk genes for IBD. These findings may help detect new IBD-risk genes and improve the understanding of IBD pathogenesis.
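The candidate-selection step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the placeholder gene names, and the toy scores are all hypothetical, and only the 0.8 cutoff and the two exclusion criteria (training genes, genes with an existing IBD publication) come from the abstract.

```python
from typing import Dict, List, Set

# Cutoff for a "high prediction score" as quoted in the abstract.
SCORE_THRESHOLD = 0.8


def novel_candidates(
    scores: Dict[str, float],
    training_genes: Set[str],
    genes_with_ibd_publication: Set[str],
    threshold: float = SCORE_THRESHOLD,
) -> List[str]:
    """Return high-scoring genes that were neither used for training
    nor already linked to IBD in the literature, best score first.

    Hypothetical helper for illustration only.
    """
    candidates = [
        gene
        for gene, score in scores.items()
        if score > threshold
        and gene not in training_genes
        and gene not in genes_with_ibd_publication
    ]
    return sorted(candidates, key=lambda gene: scores[gene], reverse=True)


# Toy data with placeholder gene names (not real results).
toy_scores = {
    "GENE_A": 0.95,  # high score, but part of the training set
    "GENE_B": 0.85,  # high score, no prior IBD publication
    "GENE_C": 0.81,  # high score, already published in IBD
    "GENE_D": 0.40,  # low score
    "GENE_E": 0.80,  # exactly at the cutoff, so not strictly above it
}
toy_training = {"GENE_A"}
toy_published = {"GENE_C"}

result = novel_candidates(toy_scores, toy_training, toy_published)
print(result)
```

The strict comparison mirrors the ">0.8" wording in the abstract; whether the original analysis treated the boundary inclusively is not stated there.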

Original language: English
Pages (from-to): 1516-1523
Number of pages: 8
Journal: Inflammatory Bowel Diseases
Volume: 23
Issue number: 9
State: Published - 1 Sep 2017

Funding

Funders | Funder number
U.S. Department of the Interior |
Parkinson's Disease Foundation | 2017, 4
Fetzer Institute |
Esther B. Kahn Charitable Foundation |
Leona M. and Harry B. Helmsley Charitable Trust | 2015PG-ISL020
Boston Medical Center | 6423906
Crohn's and Colitis Foundation | DOI 10.1097/MIB.0000000000001222
Institute of Genetics |
Tel Aviv University |
Center for Nanoscience and Nanotechnology, Tel Aviv University |

Keywords

• RNA-seq
• big data
• gene expression
• genetics
• machine learning
