Abstract
This paper presents a very low bit rate phonetic vocoder based on speech recognition and speech synthesis, with speaker adaptation provided by a set of speaker phoneme codebooks (SPCBs). The vocoder incorporates a well-designed set of SPCBs that is available to both the encoder and the decoder. Periodically, the encoder performs analysis by synthesis: for each SPCB, it compares the incoming speech with the speech that the decoder could synthesize from the phoneme recognizer's output stream and the quantized pitch data, adapting each candidate to the incoming speech by spectral warping. The index of the best-performing SPCB and its adaptation parameter are transmitted to the decoder, together with the pitch and recognizer output bit streams, so that the synthesized speech better resembles the speaker. In experiments conducted at a bit rate typical of phonetic vocoders (below 300 bps), the adaptation reduced the average spectral distortion and increased speaker recognizability as judged by listeners. Copyright by EURASIP.
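The codebook selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the warping function, the distortion measure, or the search procedure, so everything below is an assumption made for illustration: envelopes are lists of log-spectral values in dB, warping is a linear stretch of the frequency axis by a factor `alpha`, distortion is an RMS log-spectral difference, and the search is exhaustive over a small grid. The names `warp_envelope`, `spectral_distortion`, and `select_codebook` are hypothetical and do not come from the paper.

```python
import math

def warp_envelope(envelope, alpha):
    """Linearly stretch (alpha > 1) or compress (alpha < 1) the frequency axis.

    Assumed warping form; the paper's actual warping function is not given.
    """
    n = len(envelope)
    warped = []
    for k in range(n):
        pos = min(k / alpha, n - 1)      # source bin position, clamped to last bin
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between neighbouring bins
        warped.append((1 - frac) * envelope[lo] + frac * envelope[hi])
    return warped

def spectral_distortion(a, b):
    """RMS difference between two log-spectral envelopes (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def select_codebook(frames, phonemes, codebooks, alphas):
    """Return (codebook index, alpha, average distortion) of the best match.

    frames    : observed envelopes, one per recognized phoneme segment
    phonemes  : recognized phoneme label for each frame
    codebooks : list of dicts mapping phoneme label -> envelope
    alphas    : candidate warping factors to try (assumed search grid)
    """
    best = None
    for index, codebook in enumerate(codebooks):
        for alpha in alphas:
            total = 0.0
            for frame, ph in zip(frames, phonemes):
                candidate = warp_envelope(codebook[ph], alpha)
                total += spectral_distortion(frame, candidate)
            avg = total / len(frames)
            if best is None or avg < best[2]:
                best = (index, alpha, avg)
    return best
```

In this sketch only the winning index and `alpha` would need to be sent alongside the phoneme and pitch streams, which mirrors the side information the abstract describes; how those values are quantized is not stated in the source.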
Original language | English |
---|---|
Journal | European Signal Processing Conference |
State | Published - 2008 |
Event | 16th European Signal Processing Conference, EUSIPCO 2008, Lausanne, Switzerland. Duration: 25 Aug 2008 → 29 Aug 2008 |