Deep Head Pose: Gaze-Direction Estimation in Multimodal Video

Sankha Subhra Mukherjee, Neil Robertson

Research output: Contribution to journal › Article › peer-review

82 Citations (Scopus)
270 Downloads (Pure)

Abstract

In this paper we present a convolutional neural network (CNN)-based model for human head-pose estimation in low-resolution multimodal RGB-D data. We pose the problem as one of classifying human gaze direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets that range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. Building on our robust head-pose estimation, we introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene-understanding tasks, such as human-human and human-scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.
Original language: English
Pages (from-to): 2094-2107
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 17
Issue number: 11
Early online date: 28 Sep 2015
Publication status: Published - Nov 2015

Keywords

  • Convolutional neural networks (CNNs)
  • deep learning
  • gaze direction
  • head-pose
  • RGB-D

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Media Technology
  • Computer Science Applications
