TY - JOUR
T1 - A Probabilistic Model for Indel Evolution: Differentiating Insertions from Deletions
AU - Loewenthal, Gil
AU - Rapoport, Dana
AU - Avram, Oren
AU - Moshe, Asher
AU - Wygoda, Elya
AU - Itzkovitch, Alon
AU - Israeli, Omer
AU - Azouri, Dana
AU - Cartwright, Reed A.
AU - Mayrose, Itay
AU - Pupko, Tal
N1 - Publisher Copyright: © The Author(s) 2021.
PY - 2021
Y1 - 2021
AB - Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
KW - Alignments
KW - Approximate Bayesian computation
KW - Evolutionary models
KW - Indels
KW - Molecular evolution
UR - http://www.scopus.com/inward/record.url?scp=85122549245&partnerID=8YFLogxK
U2 - 10.1093/molbev/msab266
DO - 10.1093/molbev/msab266
M3 - Article
C2 - 34469521
AN - SCOPUS:85122549245
SN - 0737-4038
VL - 38
SP - 5769
EP - 5781
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 12
ER -