Compact Time-Domain Representation for Logical Access Spoofed Audio

Matan Karo*, Arie Yeredor, Itshak Lapidot

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. This paper proposes new representations for genuine and spoofed speech, based on estimating the probability mass function (PMF) of the audio waveform amplitudes. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, it processes the time-domain audio samples directly. The feature extractor is built on different PMF distance and similarity measures. As an additional step, we use filterbank preprocessing, which significantly affects the discriminative characteristics of the features and facilitates convenient visualization of possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data lies. The suggested embeddings allow simple linear separators to achieve a 12.99% Equal Error Rate (EER) on the ASVspoof2019 Logical Access (LA) test set for female samples, and 12.09% for male samples. In addition, we present a convenient way to visualize the data, which helps to assess the efficiency of different spoofing techniques. We also present a reduced-complexity embedding method based on compander quantization, which in some cases even improves the EER on the test set up to 3.00%. The experimental results show the potential of multichannel PMF-based features for the anti-spoofing task, as well as the benefits of diffusion maps both as an analysis tool and as an embedding tool.
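
As a rough illustration of the kind of time-domain, PMF-based feature the abstract describes, the following sketch estimates an amplitude PMF from a waveform, optionally after compander quantization, and compares two PMFs with a distance measure. It is not the authors' implementation: the mu-law compander, the 256-bin resolution, the Hellinger distance, and all function names are assumptions chosen for the example, and the filterbank and diffusion-map stages are omitted.

```python
import numpy as np


def mu_law_compress(x, mu=255.0):
    """Mu-law compander (assumed here): maps [-1, 1] onto [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def amplitude_pmf(x, n_bins=256, compand=True, mu=255.0):
    """Estimate the PMF of waveform amplitudes over n_bins uniform bins.

    Samples are assumed to be normalized to [-1, 1]; values outside
    that range are clipped before binning.
    """
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    if compand:
        x = mu_law_compress(x, mu)
    counts, _ = np.histogram(x, bins=n_bins, range=(-1.0, 1.0))
    return counts / counts.sum()


def hellinger(p, q):
    """Hellinger distance between two PMFs (one possible PMF distance)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


# Synthetic stand-ins for two utterances (not real speech data).
rng = np.random.default_rng(0)
utt_a = np.clip(rng.laplace(scale=0.05, size=16000), -1.0, 1.0)
utt_b = np.clip(rng.normal(scale=0.05, size=16000), -1.0, 1.0)

pmf_a = amplitude_pmf(utt_a)
pmf_b = amplitude_pmf(utt_b)
distance = hellinger(pmf_a, pmf_b)
```

In a setup like this, the distances between an utterance's PMF and a set of reference PMFs could serve as a feature vector for a linear classifier; which references and which distances to use is left open here.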

Original language: English
Pages (from-to): 946-958
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
State: Published - 2024

Keywords

  • Anti-spoofing
  • compander
  • diffusion maps
  • speech embedding
  • speech probability mass function
