TY - JOUR
T1 - Compact Time-Domain Representation for Logical Access Spoofed Audio
AU - Karo, Matan
AU - Yeredor, Arie
AU - Lapidot, Itshak
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2024
Y1 - 2024
AB - Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to suggest new representations for genuine and spoofed speech, based on estimating the probability mass function (PMF) of the audio waveform amplitudes. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, our method directly processes the time-domain audio samples. The PMF is utilized by designing a feature extractor based on different PMF distance and similarity measures. As an additional step, we use filterbank preprocessing, which significantly affects the discriminative characteristics of the features and facilitates convenient visualization of possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data lies. The suggested embeddings allow simple linear separators to achieve a 12.99% Equal Error Rate (EER) on the ASVspoof2019 Logical Access (LA) test set for female samples, and 12.09% for male samples. In addition, we present a convenient way to visualize the data, which helps to assess the efficiency of different spoofing techniques. We also present a reduced-complexity embedding method using compander quantization, which in some cases even improves the EER on the test set by up to 3.00%. The experimental results show the potential of multichannel PMF-based features for the anti-spoofing task, as well as the benefits of diffusion maps as both an analysis tool and an embedding tool.
KW - Anti-spoofing
KW - compander
KW - diffusion maps
KW - speech embedding
KW - speech probability mass function
UR - http://www.scopus.com/inward/record.url?scp=85179778645&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2023.3341000
DO - 10.1109/TASLP.2023.3341000
M3 - Article
AN - SCOPUS:85179778645
SN - 2329-9290
VL - 32
SP - 946
EP - 958
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -