TY - GEN
T1 - Probabilistic inference of viral quasispecies subject to recombination
AU - Zagordi, Osvaldo
AU - Töpfer, Armin
AU - Prabhakaran, Sandhya
AU - Roth, Volker
AU - Halperin, Eran
AU - Beerenwinkel, Niko
PY - 2012
Y1 - 2012
N2 - RNA viruses are present in a single host as a population of different but related strains. This population, shaped by the combination of genetic change and selection, is called a quasispecies. Genetic change is due to both point mutations and recombination events. We present a jumping hidden Markov model that describes the generation of the viral quasispecies and a method to infer its parameters by analysing next-generation sequencing data. The model introduces position-specific probability tables over the sequence alphabet to explain the diversity that can be found in the population at each site. Recombination events are indicated by a change of state, allowing a single observed read to originate from multiple sequences. We present an implementation of the EM algorithm to find maximum likelihood estimates of the model parameters and a method to estimate the distribution of viral strains in the quasispecies. The model is validated on simulated data, showing the advantage of explicitly taking the recombination process into account, and applied to reads obtained from two experimental HIV samples.
KW - Hidden Markov model
KW - Molecular sequence analysis
KW - Next-generation sequencing
KW - Sequencing and genotyping technologies
KW - Viral quasispecies
UR - http://www.scopus.com/inward/record.url?scp=84860819723&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-29627-7_36
DO - 10.1007/978-3-642-29627-7_36
M3 - Conference contribution
AN - SCOPUS:84860819723
SN - 9783642296260
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 342
EP - 354
BT - Research in Computational Molecular Biology - 16th Annual International Conference, RECOMB 2012, Proceedings
T2 - 16th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2012
Y2 - 21 April 2012 through 24 April 2012
ER -