Who said that? A comparative study of non-negative matrix factorization techniques

Teun Krikke, Frank Broz, David Michael Lane

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution



In noisy environments it is difficult for a computer to understand what a person is saying, especially when there are multiple speakers. In this paper we concentrate on separating overlapping speech. Non-negative matrix factorisation (NMF) is a source separation method that does not need a lot of data. The choice of cost function can have a significant impact on the performance of NMF. We evaluate NMF using three different cost functions (Euclidean, Itakura-Saito and Kullback-Leibler), including modifications that add sparsity, convolution or additional information in the form of the direction of arrival. We conduct this evaluation on three different speech corpora. Adding directional information to NMF in the form of non-negative tensor factorisation (NTF) gives the best result on the map task and vocalization corpora, while the Itakura-Saito cost function performs best on the acoustic-camera corpus. Using acoustic evaluation measurements, we show that the Itakura-Saito cost function is the most robust cost function when the recording contains noise.
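To make the role of the cost function concrete, the following is a minimal sketch of NMF with multiplicative updates. It is not the authors' implementation. It assumes the common beta-divergence formulation, in which beta = 2, 1 and 0 give the Euclidean, Kullback-Leibler and Itakura-Saito costs respectively. The names `nmf`, `beta_divergence`, `BETA` and `EPS` are hypothetical, and the input is a random non-negative matrix standing in for a magnitude spectrogram. The sparsity, convolutive and NTF variants mentioned in the abstract are not covered.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)

# Assumed mapping from cost-function name to beta-divergence parameter.
BETA = {"euclidean": 2.0, "kullback-leibler": 1.0, "itakura-saito": 0.0}


def beta_divergence(V, V_hat, beta):
    """Divergence between a non-negative matrix V and its approximation V_hat."""
    V = V + EPS
    V_hat = V_hat + EPS
    if beta == 2.0:  # Euclidean (half squared Frobenius norm)
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1.0:  # generalised Kullback-Leibler
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0.0:  # Itakura-Saito
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1.0)
    raise ValueError("beta must be 0, 1 or 2 in this sketch")


def nmf(V, n_components, cost="euclidean", n_iter=200, seed=0):
    """Factorise V (freq x frames) as W @ H using multiplicative updates."""
    beta = BETA[cost]
    # Exponent used for beta < 1 so that each update does not increase the cost.
    gamma = 1.0 / (2.0 - beta) if beta < 1.0 else 1.0
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + EPS  # spectral basis vectors
    H = rng.random((n_components, n_frames)) + EPS  # activations over time
    for _ in range(n_iter):
        V_hat = W @ H + EPS
        H *= ((W.T @ (V * V_hat ** (beta - 2)))
              / (W.T @ V_hat ** (beta - 1) + EPS)) ** gamma
        V_hat = W @ H + EPS
        W *= (((V * V_hat ** (beta - 2)) @ H.T)
              / (V_hat ** (beta - 1) @ H.T + EPS)) ** gamma
    return W, H


if __name__ == "__main__":
    # Synthetic stand-in for a magnitude spectrogram (20 bins, 30 frames).
    V = np.random.default_rng(1).random((20, 30)) + 0.1
    for cost, beta in BETA.items():
        W, H = nmf(V, n_components=2, cost=cost)
        print(cost, beta_divergence(V, W @ H, beta))
```

The three costs penalise reconstruction errors differently: the Itakura-Saito divergence depends only on the ratio between `V` and `W @ H`, so it is scale-invariant, whereas the Euclidean cost is dominated by high-energy entries. The printed values are on different scales and are not directly comparable across cost functions.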
Original language: English
Title of host publication: Proceedings of Interspeech 2018
Number of pages: 5
Publication status: Published - 2018
Event: Interspeech 2018 - Hyderabad, India
Duration: 2 Sept 2018 to 6 Sept 2018


Conference: Interspeech 2018

