On Improved Training of CNN for Acoustic Source Localisation

Elizabeth Vargas, James R. Hopgood, Keith Brown, Kartic Subr

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional Neural Networks (CNNs) are a popular choice for estimating Direction of Arrival (DoA) without explicitly estimating delays between multiple microphones. The CNN method first optimises the unknown filter weights of a CNN using observations and ground-truth directional information. The trained CNN is then used to predict incident directions given test observations. Most existing methods train using spectrally flat random signals and test using speech. In this paper, which focuses on single-source DoA estimation, we find that training with speech or music signals produces a relative improvement in DoA accuracy for a variety of audio classes across 16 acoustic conditions and 9 DoAs, amounting to an average improvement of around 17% and 19% respectively when compared to training with spectrally flat random signals. This improvement is also observed in scenarios in which the speech and music signals are synthesised using, for example, a Generative Adversarial Network (GAN). When the acoustic environments during test and training are similar and reverberant, training a CNN with speech outperforms Generalized Cross Correlation (GCC) methods by about 125%. When the test conditions are different, a CNN performs comparably to GCC methods. This paper takes a step towards answering open questions in the literature regarding the nature of the signals used during training, as well as the amount of data required for estimating DoA using CNNs.
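
For readers unfamiliar with the GCC baseline named in the abstract, the following is a minimal sketch of one common variant, GCC with PHAT weighting, for a single microphone pair. It is an illustration under stated assumptions, not the configuration evaluated in the paper: the PHAT weighting, the far-field two-microphone geometry, the speed of sound of 343 m/s, and the function names `gcc_phat_delay` and `delay_to_doa` are all assumptions introduced here.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_tau=None):
    """Estimate the delay (in seconds) of y relative to x with GCC-PHAT.

    Assumption: PHAT weighting is used; other GCC weightings exist.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 is lag -max_shift and the last index is +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_to_doa(tau, mic_distance, c=343.0):
    """Convert a delay to a DoA in degrees from broadside.

    Assumptions: far-field source, two microphones, speed of sound c in m/s.
    """
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


# Synthetic check: a spectrally flat random signal delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
true_shift = 5
y = np.concatenate((np.zeros(true_shift), x[:-true_shift]))

tau = gcc_phat_delay(x, y, fs)
estimated_shift = int(round(tau * fs))
doa_deg = delay_to_doa(tau, mic_distance=0.2)
```

Unlike the CNN approach, this estimator works by explicitly estimating the inter-microphone delay and then mapping it to an angle, which is the step the abstract says CNN-based methods avoid.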

Original language: English
Pages (from-to): 720-732
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 29
Early online date: 8 Jan 2021
Publication status: Published - 2021

Keywords

  • Direction of arrival
  • convolutional neural network (CNN)
  • generative adversarial network (GAN)
  • microphone arrays
  • neural networks

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
