A phonetic vocoder with scalable adaptation to speaker codebooks

Israel Halaly, Yuval Bistritz

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The paper presents a very low bit rate phonetic vocoder based on speech recognition and synthesis, with scalable speaker adaptation using a set of speaker phoneme codebooks (SPCBs). These SPCBs are designed by an LBG-like algorithm and are available to both the encoder and the decoder. The encoder periodically compares the input speech with adapted synthesized speech produced from the SPCBs. For each SPCB, the sequence of recognized phonemes and the pitch contour are used to synthesize speech, which is then adapted by a scalable multi-segmented spectral warping. The identity of the SPCB that achieves the least spectral distortion, together with its adaptation parameters, is transmitted to the decoder. This work revises a vocoder presented earlier this year by giving it a more flexible adaptation scheme at the cost of only a slight increase in bit rate. Experiments conducted at a bit rate typical of phonetic vocoders (around 300 bps) demonstrate that the scalable adaptation reduces the average spectral distortion and improves speaker recognizability as judged by listeners.
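The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the warping function, the distortion measure, or the codebook layout, so everything below is an assumption made for illustration. In particular, the codebooks are assumed to map each phoneme to one log-spectral envelope, the warp is assumed to be piecewise linear with fixed end points, the distortion is assumed to be an RMS log-spectral difference, and the names `warp_envelope`, `spectral_distortion`, and `select_codebook` are hypothetical. Pitch and waveform synthesis are left out entirely.

```python
import math


def warp_envelope(envelope, knots):
    """Resample a log-spectral envelope along a piecewise-linear warped axis.

    Assumption: interior breakpoints sit at i / (len(knots) + 1) on the
    normalized output axis and map to the positions given in `knots`;
    the end points 0 and 1 stay fixed. More knots give a finer warp.
    """
    n = len(envelope)
    segments = len(knots) + 1
    xs = [i / segments for i in range(segments + 1)]
    ys = [0.0, *knots, 1.0]
    warped = []
    for k in range(n):
        f = k / (n - 1)                       # normalized output frequency
        seg = min(int(f * segments), segments - 1)
        t = (f - xs[seg]) / (xs[seg + 1] - xs[seg])
        src = (ys[seg] + t * (ys[seg + 1] - ys[seg])) * (n - 1)
        lo = max(0, min(int(src), n - 2))     # linear interpolation index
        frac = src - lo
        warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped


def spectral_distortion(ref_frames, syn_frames):
    """Mean over frames of the RMS difference between log envelopes."""
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        total += math.sqrt(sum((r - s) ** 2 for r, s in zip(ref, syn)) / len(ref))
    return total / len(ref_frames)


def select_codebook(input_frames, phonemes, codebooks, knot_grid):
    """Try every codebook and every warp; keep the least-distortion pair."""
    best = None
    for speaker_id, codebook in codebooks.items():
        synthesized = [codebook[p] for p in phonemes]
        for knots in knot_grid:
            adapted = [warp_envelope(env, knots) for env in synthesized]
            d = spectral_distortion(input_frames, adapted)
            if best is None or d < best[0]:
                best = (d, speaker_id, knots)
    return best


# Toy data (invented for this sketch): two codebooks, two phonemes each.
codebooks = {
    "A": {"a": [0.0, 1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0, 0.0]},
    "B": {"a": [0.0, 2.0, 1.0, 5.0, 3.0], "b": [1.0, 0.0, 4.0, 2.0, 6.0]},
}
knot_grid = [(0.4,), (0.5,), (0.6,)]          # candidate two-segment warps
phonemes = ["a", "b", "a"]                    # recognized phoneme sequence

# Simulate an input speaker that is a warped version of codebook "B".
input_frames = [warp_envelope(codebooks["B"][p], (0.6,)) for p in phonemes]

distortion, speaker_id, knots = select_codebook(
    input_frames, phonemes, codebooks, knot_grid
)
print(speaker_id, knots, distortion)
```

Under these assumptions, only the chosen codebook index and the warp parameters would need to be sent alongside the phoneme and pitch information, which is consistent with the abstract's claim that the added flexibility costs only a slight increase in bit rate.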

Original language: English
Title of host publication: 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2008
Pages: 684-688
Number of pages: 5
State: Published - 2008
Event: 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2008 - Eilat, Israel
Duration: 3 Dec 2008 to 5 Dec 2008

Publication series

Name: IEEE Convention of Electrical and Electronics Engineers in Israel, Proceedings

Conference

Conference: 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2008
Country/Territory: Israel
City: Eilat
Period: 3/12/08 to 5/12/08

Keywords

  • Speaker recognizability
  • Spectral warping
  • Very low bit rate speech coding
  • Vocal tract length normalization
