TY - GEN
T1 - A phonetic vocoder with scalable adaptation to speaker codebooks
AU - Halaly, Israel
AU - Bistritz, Yuval
PY - 2008
Y1 - 2008
N2 - The paper presents a very low bit rate phonetic vocoder based on speech recognition and synthesis with scalable speaker adaptation using a set of speaker phoneme codebooks (SPCBs). These SPCBs are designed by an LBG-like algorithm and are available to both encoder and decoder. The encoder periodically compares the input speech to adapted synthesized speech produced by the SPCBs. For each SPCB, the sequence of recognized phonemes and the pitch contour are used to synthesize speech that is adapted by a scalable multi-segmented spectral warping. The information about the SPCB and its adaptation parameters that achieve the least spectral distortion is transmitted to the decoder. This revises a vocoder presented earlier this year by giving it a more flexible adaptation scheme with only a slight increase in bit rate. Experiments conducted at a bit rate typical of phonetic vocoders (around 300 bps) demonstrate that the scalable adaptation reduces the average spectral distortion and improves speaker recognizability as judged by listeners.
AB - The paper presents a very low bit rate phonetic vocoder based on speech recognition and synthesis with scalable speaker adaptation using a set of speaker phoneme codebooks (SPCBs). These SPCBs are designed by an LBG-like algorithm and are available to both encoder and decoder. The encoder periodically compares the input speech to adapted synthesized speech produced by the SPCBs. For each SPCB, the sequence of recognized phonemes and the pitch contour are used to synthesize speech that is adapted by a scalable multi-segmented spectral warping. The information about the SPCB and its adaptation parameters that achieve the least spectral distortion is transmitted to the decoder. This revises a vocoder presented earlier this year by giving it a more flexible adaptation scheme with only a slight increase in bit rate. Experiments conducted at a bit rate typical of phonetic vocoders (around 300 bps) demonstrate that the scalable adaptation reduces the average spectral distortion and improves speaker recognizability as judged by listeners.
KW - Speaker recognizability
KW - Spectral warping
KW - Very low bit rate speech coding
KW - Vocal tract length normalization
UR - http://www.scopus.com/inward/record.url?scp=62749197609&partnerID=8YFLogxK
U2 - 10.1109/EEEI.2008.4736621
DO - 10.1109/EEEI.2008.4736621
M3 - Conference contribution
AN - SCOPUS:62749197609
SN - 9781424424825
T3 - IEEE Convention of Electrical and Electronics Engineers in Israel, Proceedings
SP - 684
EP - 688
BT - 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2008
Y2 - 3 December 2008 through 5 December 2008
ER -