In noisy environments it is difficult for a computer to understand what a person is saying, especially when several people speak at once. In this paper we concentrate on separating overlapping speech. Non-negative matrix factorisation (NMF) is a source-separation method that does not require large amounts of data, and the choice of cost function can have a significant impact on its performance. We evaluate NMF with three different cost functions (Euclidean, Itakura-Saito and Kullback-Leibler), including variants that add sparsity, convolution, or additional information in the form of the direction of arrival. We conduct this evaluation on three different speech corpora. Adding directional information to NMF, in the form of non-negative tensor factorisation (NTF), gives the best results on the Map Task and Vocalization corpora, while the Itakura-Saito cost function performs best on the acoustic-camera corpus. Using acoustic evaluation measures, we further show that the Itakura-Saito cost function is the most robust when the recording contains noise.
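The three cost functions compared above are all members of the beta-divergence family (β = 2 gives the squared Euclidean distance, β = 1 the generalised Kullback-Leibler divergence, β = 0 the Itakura-Saito divergence), so one set of multiplicative updates covers all three. The sketch below assumes NumPy and a non-negative magnitude or power spectrogram `V`. It illustrates plain NMF only and is not the authors' implementation: the sparsity, convolutive and NTF (direction-of-arrival) variants are omitted, and the function names `nmf_beta` and `beta_divergence` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def beta_divergence(V, V_hat, beta):
    """Total beta-divergence D_beta(V | V_hat), summed over all entries."""
    V = V + EPS
    V_hat = V_hat + EPS
    if beta == 2:  # squared Euclidean distance (halved)
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1:  # generalised Kullback-Leibler divergence
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1)
    raise ValueError("this sketch only covers beta in {0, 1, 2}")


def nmf_beta(V, rank, beta, n_iter=200, seed=0):
    """Factorise non-negative V (F x N) as W @ H using multiplicative updates.

    Returns W (F x rank), H (rank x N) and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + EPS  # spectral basis vectors
    H = rng.random((rank, N)) + EPS  # time-varying activations
    costs = [beta_divergence(V, W @ H, beta)]
    for _ in range(n_iter):
        # Update activations with the basis held fixed.
        V_hat = W @ H + EPS
        H *= (W.T @ (V_hat ** (beta - 2) * V)) / (W.T @ V_hat ** (beta - 1))
        # Update the basis with the activations held fixed.
        V_hat = W @ H + EPS
        W *= ((V_hat ** (beta - 2) * V) @ H.T) / (V_hat ** (beta - 1) @ H.T)
        costs.append(beta_divergence(V, W @ H, beta))
    return W, H, costs


if __name__ == "__main__":
    # Toy strictly positive "spectrogram"; real use would take |STFT|^2 of a mixture.
    V = np.random.default_rng(1).random((64, 100)) + 0.1
    for beta, name in [(2, "Euclidean"), (1, "Kullback-Leibler"), (0, "Itakura-Saito")]:
        W, H, costs = nmf_beta(V, rank=8, beta=beta)
        print(f"{name}: cost {costs[0]:.2f} -> {costs[-1]:.2f}")
```

These are the standard heuristic multiplicative rules. For 1 ≤ β ≤ 2 they are known not to increase the cost; for the Itakura-Saito case (β = 0) they are widely used in practice, but a monotonicity guarantee requires a modified update exponent.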
| Field | Value |
|---|---|
| Title of host publication | Proceedings of Interspeech 2018 |
| Number of pages | 5 |
| Publication status | Published - 2018 |
| Event | Interspeech 2018, Hyderabad, India |
| Period | 2 Sep 2018 → 6 Sep 2018 |