TY - JOUR

T1 - FactorialHMM

T2 - Fast and exact inference in factorial hidden Markov models

AU - Schweiger, Regev

AU - Erlich, Yaniv

AU - Carmi, Shai

N1 - Publisher Copyright:
© The Author(s) 2018.

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Motivation: Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state, and the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a na?'ve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of sub-chains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications. Results: We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (i) the (Viterbi) sequence of states with the highest posterior probability; (ii) the likelihood of the data and (iii) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications.

AB - Motivation: Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state, and the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a na?'ve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of sub-chains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications. Results: We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (i) the (Viterbi) sequence of states with the highest posterior probability; (ii) the likelihood of the data and (iii) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications.

UR - http://www.scopus.com/inward/record.url?scp=85068424289&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty944

DO - 10.1093/bioinformatics/bty944

M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???

C2 - 30445428

AN - SCOPUS:85068424289

SN - 1367-4803

VL - 35

SP - 2162

EP - 2164

JO - Bioinformatics

JF - Bioinformatics

IS - 12

ER -