Abstract
We address the problem of explicit state and word duration modeling in hidden Markov models (HMMs). A major weakness of conventional HMMs is that they implicitly model state durations with a geometric distribution, which is usually inappropriate. Explicit modeling of state and word durations can significantly enhance the performance of speech recognition systems. The main outcome of this work is a modified Viterbi algorithm that, by incorporating both state and word duration modeling, reduces the string error rate of the conventional Viterbi algorithm by 29% and 43% for known and unknown string lengths, respectively, on a speaker-independent, connected-digit string task. The algorithm is unique in that, unlike alternative approaches, it adds the duration metric at each frame transition (rather than at the end of a state, word, or sentence), thus enhancing performance.
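To illustrate the implicit duration model mentioned in the abstract: in a conventional HMM, a state with self-transition probability a is occupied for exactly d frames with probability a^(d-1)(1-a), which is a geometric distribution. The short sketch below only demonstrates this standard property; it is not the paper's algorithm. The function names and the self-loop value 0.8 are assumptions chosen for illustration.

```python
def geometric_duration_pmf(self_loop_prob, max_duration):
    """Implicit HMM state-duration probabilities for d = 1..max_duration.

    Staying d frames means taking the self-loop (d - 1) times and then
    leaving the state once: P(d) = a^(d-1) * (1 - a).
    """
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop_prob)


# Illustrative value only: a self-loop probability of 0.8.
pmf = geometric_duration_pmf(0.8, 200)

# The most likely duration is always a single frame, and the probability
# decays monotonically from there, whatever the value of a.
most_likely_duration = pmf.index(max(pmf)) + 1
mean_duration = expected_duration(0.8)
```

Because the probability always peaks at a single frame and decays from there, this implicit model cannot represent a speech unit whose typical duration is several frames, which is the weakness that explicit duration modeling targets.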
Original language | English |
---|---|
Pages (from-to) | 548-551 |
Number of pages | 4 |
Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
Volume | 1 |
State | Published - 1995 |
Event | Proceedings of the 1995 20th International Conference on Acoustics, Speech, and Signal Processing. Part 1 (of 5) - Detroit, MI, USA. Duration: 9 May 1995 → 12 May 1995 |