TY - JOUR
T1 - Segmental modeling using a continuous mixture of nonparametric models
AU - Goldberger, Jacob
AU - Burshtein, David
AU - Franco, Horacio
N1 - Funding Information:
Manuscript received March 24, 1997; revised July 5, 1998. This work was supported in part by DARPA through the Office of Naval Research under Contract N00014-94-C-0181. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Joseph Picone. J. Goldberger and D. Burshtein are with Tel-Aviv University, Ramat Aviv, Tel Aviv 69978, Israel (e-mail: [email protected]; [email protected]). H. Franco is with SRI International, Menlo Park, CA 94025 USA (e-mail: [email protected]). Publisher Item Identifier S 1063-6676(99)02733-9.
PY - 1999
Y1 - 1999
N2 - A major limitation of hidden Markov model (HMM) based automatic speech recognition is the inherent assumption that successive observations within a state are independent and identically distributed (IID). The IID assumption is reasonable for some of the states (e.g., a state that corresponds to a steady-state vowel). However, most states clearly violate this assumption (e.g., states corresponding to vowel-consonant transitions, diphthongs, etc.) and are in fact characterized by a highly correlated and nonstationary speech signal. In recent years, alternative models have been proposed that attempt to describe the dynamics of the signal within a phonetic unit. This approach is generally known as segmental modeling, since the speech signal is modeled at the segment level rather than at the frame level (as in an HMM). We propose a family of new segmental models that are composed of two elements. The first element is a nonparametric representation of the mean and variance trajectories, and the second is a parameterized transformation (e.g., random shift) of the trajectory that is global to the entire segment. The new model is in fact a continuous mixture of segment trajectories. We present recognition results on a large vocabulary task, and compare the model to alternative segment models on a triphone recognition task.
AB - A major limitation of hidden Markov model (HMM) based automatic speech recognition is the inherent assumption that successive observations within a state are independent and identically distributed (IID). The IID assumption is reasonable for some of the states (e.g., a state that corresponds to a steady-state vowel). However, most states clearly violate this assumption (e.g., states corresponding to vowel-consonant transitions, diphthongs, etc.) and are in fact characterized by a highly correlated and nonstationary speech signal. In recent years, alternative models have been proposed that attempt to describe the dynamics of the signal within a phonetic unit. This approach is generally known as segmental modeling, since the speech signal is modeled at the segment level rather than at the frame level (as in an HMM). We propose a family of new segmental models that are composed of two elements. The first element is a nonparametric representation of the mean and variance trajectories, and the second is a parameterized transformation (e.g., random shift) of the trajectory that is global to the entire segment. The new model is in fact a continuous mixture of segment trajectories. We present recognition results on a large vocabulary task, and compare the model to alternative segment models on a triphone recognition task.
UR - http://www.scopus.com/inward/record.url?scp=0032636332&partnerID=8YFLogxK
U2 - 10.1109/89.759032
DO - 10.1109/89.759032
M3 - Article
AN - SCOPUS:0032636332
SN - 1063-6676
VL - 7
SP - 262
EP - 271
JO - IEEE Transactions on Speech and Audio Processing
JF - IEEE Transactions on Speech and Audio Processing
IS - 3
ER -