Multi-Modal Multi-Stage Multi-Task Learning for Occlusion-Aware Facial Landmark Localisation

Research output: Contribution to journal › Article › peer-review


Abstract

Thermal facial imaging enables non-contact measurement of facial heat patterns that is valuable for healthcare and affective computing, but common occluders (glasses, masks, scarves) and the single-channel, texture-poor nature of thermal frames make robust landmark localisation and visibility estimation challenging. We propose M3MSTL, a multi-modal, multi-stage, multi-task framework for occlusion-aware landmarking on thermal faces. M3MSTL pairs a ResNet-50 backbone with two lightweight heads: a compact fully connected landmark regressor and a Vision Transformer occlusion classifier that explicitly fuses per-landmark temperature cues. A three-stage curriculum (mask-based backbone pretraining, head specialisation with a frozen trunk, and final joint fine-tuning) stabilises optimisation and improves generalisation from limited thermal data. On the TFD68 dataset, M3MSTL substantially improves both visibility estimation and localisation: occlusion-classification accuracy rises from 89.7% to 91.8%, mean NME drops from 0.382 to 0.246, ROC–AUC reaches 0.974, and AP reaches 0.966. Paired statistical tests confirm that these gains are significant. Our approach aims to improve the reliability of temperature-based biometric and clinical measurements in the presence of realistic occluders.
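The localisation metric reported above, normalised mean error (NME), is conventionally the average point-to-point Euclidean error divided by a reference distance. The abstract does not state the paper's exact normaliser, so the sketch below assumes inter-ocular distance; the function name and landmark indices are illustrative, not taken from the paper.

```python
import numpy as np

def nme(pred, gt, left_eye_idx, right_eye_idx):
    """Normalised mean error for one face.

    pred, gt: (N, 2) arrays of predicted and ground-truth landmarks.
    Normaliser assumed here: distance between the two eye-centre landmarks.
    """
    # Per-landmark Euclidean error
    errors = np.linalg.norm(pred - gt, axis=1)
    # Inter-ocular reference distance from the ground truth
    d = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return errors.mean() / d

# Toy example with 3 landmarks; indices 0 and 1 act as eye centres
gt = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
pred = gt + np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(round(nme(pred, gt, 0, 1), 4))  # → 0.0667
```

Lower is better, so the reported drop from 0.382 to 0.246 is the improvement.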
Original language: English
Article number: 28
Journal: AI
Volume: 7
Issue number: 1
Early online date: 15 Jan 2026
DOIs
Publication status: Published - Jan 2026

Keywords

  • thermal imaging
  • occlusion-aware landmark
  • biometrics
  • multi-stage training
  • multi-task learning
  • ResNet-50
  • vision transformer
