Abstract
We present a method for clustering short push-to-talk speech segments when the number of speakers varies. An iterative mean shift algorithm based on the cosine distance is used to perform speaker clustering on i-vectors generated from many short speech segments. We report results in terms of accuracy, the average number of detected speakers (ANDS), the average cluster purity (ACP), the average speaker purity (ASP) and K. We achieve clustering accuracies of 90.0%, 86.9% and 72.1% for 3, 15 and 60 speakers, respectively.
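As an illustration only, the following Python sketch shows one way a flat-kernel mean shift under cosine distance could be run on i-vector-like row vectors, together with purity scores for the result. The function names, the `threshold` bandwidth, the mode-merging rule and the toy data are assumptions made for this example and are not the authors' implementation. The ACP, ASP and K formulas follow definitions commonly used in speaker diarization (K as the geometric mean of ACP and ASP), which the abstract itself does not spell out, and they are computed here per segment.

```python
import numpy as np


def unit_rows(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero-length rows


def cosine_mean_shift(vectors, threshold=0.3, max_iter=50, tol=1e-6):
    """Cluster row vectors with a flat-kernel mean shift under cosine distance.

    `threshold` is a hypothetical bandwidth: the neighbours of a mode are the
    points whose cosine distance (1 - cosine similarity) to it is below it.
    """
    points = unit_rows(np.asarray(vectors, dtype=float))
    modes = points.copy()  # every point starts as its own mode
    for _ in range(max_iter):
        dist = 1.0 - modes @ points.T          # mode-to-point cosine distances
        neighbours = (dist < threshold).astype(float)
        shifted = unit_rows(neighbours @ points)  # normalised neighbour mean
        moved = np.max(np.linalg.norm(shifted - modes, axis=1))
        modes = shifted
        if moved < tol:
            break
    # Merge modes that converged to nearly the same direction.
    labels = np.full(len(points), -1)
    centres = []
    for i, mode in enumerate(modes):
        for k, centre in enumerate(centres):
            if 1.0 - mode @ centre < threshold / 2:
                labels[i] = k
                break
        else:
            centres.append(mode)
            labels[i] = len(centres) - 1
    return labels


def purity_scores(cluster_labels, speaker_labels):
    """Return (ACP, ASP, K) from segment-level cluster and speaker labels."""
    _, ci = np.unique(np.asarray(cluster_labels).ravel(), return_inverse=True)
    _, si = np.unique(np.asarray(speaker_labels).ravel(), return_inverse=True)
    counts = np.zeros((ci.max() + 1, si.max() + 1))
    np.add.at(counts, (ci, si), 1)  # counts[i, j]: segments of speaker j in cluster i
    total = counts.sum()
    acp = (counts ** 2 / counts.sum(axis=1, keepdims=True)).sum() / total
    asp = (counts ** 2 / counts.sum(axis=0, keepdims=True)).sum() / total
    return acp, asp, np.sqrt(acp * asp)


# Toy data (not real i-vectors): three well-separated directions plus noise.
rng = np.random.default_rng(0)
true_speakers = np.repeat([0, 1, 2], 20)
toy_vectors = np.eye(10)[true_speakers] + 0.05 * rng.standard_normal((60, 10))
toy_labels = cosine_mean_shift(toy_vectors)
toy_acp, toy_asp, toy_k = purity_scores(toy_labels, true_speakers)
```

In this sketch the number of clusters is not given in advance; it falls out of how many distinct modes survive the merge step, which is why a measure such as ANDS is meaningful for this kind of algorithm. The bandwidth would need tuning on real i-vectors.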
Original language | English |
---|---|
Pages (from-to) | 3031-3035 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2015-January |
State | Published - 2015 |
Externally published | Yes |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany. Duration: 6 Sep 2015 → 10 Sep 2015 |
Keywords
- Cosine distance
- Mean-shift clustering
- Short segments
- Speaker clustering