A Session-GMM generative model using test utterance Gaussian mixture modeling for speaker verification

Hagai Aronowitz, David Burshtein, Amihood Amir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Test-utterance parameterization (TUP) using Gaussian mixture models (GMMs) has recently been shown to be beneficial for speaker indexing, owing to its computational efficiency and to accuracy identical to that of classic GMM-based recognizers. In this paper we show that TUP can also lead to more accurate speaker recognition. On the NIST-2004 evaluation corpus, the recognition error rate was reduced by 8% compared to the classic GMM-based algorithm. Furthermore, we introduce a novel generative statistical model for the generation of test utterances by speakers. This model is incorporated naturally into the TUP framework and improves speaker recognition accuracy: on the NIST-2004 evaluation corpus, the recognition error rate was reduced by 15% compared to the classic GMM-based algorithm.
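The abstract does not give the paper's equations, so the following is only a rough illustration of the contrast between classic GMM scoring and TUP. It is a minimal sketch under assumptions that are ours, not necessarily the paper's: a diagonal-covariance universal background model (UBM), mean-only MAP adaptation with a relevance factor of 16, and, for the TUP side, a weighted, variance-normalized squared distance between adapted means as the model-to-model score. The session-GMM generative model introduced in the paper is not implemented here. All names (`DiagGMM`, `map_adapt_means`, `classic_score`, `tup_score`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class DiagGMM:
    """Hypothetical container: a GMM with diagonal covariances."""
    weights: List[float]
    means: List[Vector]
    variances: List[Vector]


def component_log_densities(gmm: DiagGMM, x: Vector) -> List[float]:
    """Return log(w_g * N(x; mu_g, diag(var_g))) for every component g."""
    out = []
    for w, mu, var in zip(gmm.weights, gmm.means, gmm.variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        out.append(ll)
    return out


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def map_adapt_means(ubm: DiagGMM, frames: List[Vector], relevance: float = 16.0) -> DiagGMM:
    """Mean-only MAP adaptation (assumed setup): weights and variances are copied from the UBM."""
    dim = len(ubm.means[0])
    counts = [0.0] * len(ubm.weights)            # zeroth-order statistics n_g
    firsts = [[0.0] * dim for _ in ubm.weights]  # first-order statistics F_g
    for x in frames:
        logs = component_log_densities(ubm, x)
        total = log_sum_exp(logs)
        for g, lg in enumerate(logs):
            post = math.exp(lg - total)  # posterior of component g given frame x
            counts[g] += post
            for d in range(dim):
                firsts[g][d] += post * x[d]
    # Adapted mean = (F_g + r * mu_g) / (n_g + r); equals the UBM mean when n_g = 0.
    means = [[(firsts[g][d] + relevance * mu[d]) / (counts[g] + relevance) for d in range(dim)]
             for g, mu in enumerate(ubm.means)]
    return DiagGMM(list(ubm.weights), means, [list(v) for v in ubm.variances])


def classic_score(spk: DiagGMM, ubm: DiagGMM, frames: List[Vector]) -> float:
    """Classic GMM-UBM score: average per-frame log-likelihood ratio (all frames, per speaker)."""
    llr = sum(log_sum_exp(component_log_densities(spk, x)) - log_sum_exp(component_log_densities(ubm, x))
              for x in frames)
    return llr / len(frames)


def tup_score(spk: DiagGMM, test: DiagGMM, ubm: DiagGMM) -> float:
    """Assumed TUP-style score: compare the speaker GMM with the test-utterance GMM directly.
    Negative weighted, variance-normalized squared distance between adapted means (higher = closer)."""
    dist = 0.0
    for w, ms, mt, var in zip(ubm.weights, spk.means, test.means, ubm.variances):
        dist += w * sum((a - b) ** 2 / v for a, b, v in zip(ms, mt, var))
    return -0.5 * dist


# Toy 1-D example with a 2-component UBM (all numbers are made up for illustration).
ubm = DiagGMM([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
speaker = map_adapt_means(ubm, [[-1.5], [2.5]] * 20)  # enrollment data
test_same = [[-1.5], [2.5]] * 10                      # resembles the enrollment data
test_other = [[-2.5], [1.5]] * 10                     # shifted the other way

# TUP: each test utterance is parameterized once; scoring then touches only model parameters.
tup_same = tup_score(speaker, map_adapt_means(ubm, test_same), ubm)
tup_other = tup_score(speaker, map_adapt_means(ubm, test_other), ubm)
classic_same = classic_score(speaker, ubm, test_same)
classic_other = classic_score(speaker, ubm, test_other)
print(f"TUP: same={tup_same:.4f} other={tup_other:.4f}")
print(f"classic: same={classic_same:.4f} other={classic_other:.4f}")
```

The design choice this illustrates is where the cost goes: the classic score re-evaluates every test frame against every speaker model, while the TUP-style score summarizes the test utterance once and then compares parameters, which is what makes it attractive for indexing many speakers. Whether the accuracy gains reported above follow from this particular distance is not something the sketch claims.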

Original language: English
Title of host publication: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 733-736
Number of pages: 4
ISBN (Print): 0780388747, 9780780388741
State: Published - 2005
Event: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States
Duration: 18 Mar 2005 to 23 Mar 2005

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: I
ISSN (Print): 1520-6149

Conference

Conference: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Country/Territory: United States
City: Philadelphia, PA
Period: 18/03/05 to 23/03/05
