NeFT-Net: N-window extended frequency transformer for rhythmic motion prediction

Adeyemi Ademola, David Sinclair, Babis Koniaris, Samantha Hannah, Kenny Mitchell

Research output: Contribution to journal › Article › peer-review


Abstract

Advances in the prediction of human motion sequences are critical for enabling online virtual reality (VR) users to dance and move in ways that accurately mirror real-world actions, delivering a more immersive and connected experience. However, latency in networked motion tracking remains a significant challenge: it disrupts engagement and necessitates predictive solutions to achieve real-time synchronization of remote motions. To address this issue, we propose a novel approach that leverages a synthetically generated dataset based on supervised foot anchor placement timings for rhythmic motions, ensuring periodicity and reducing prediction errors. Our model integrates a discrete cosine transform (DCT) to encode motion, refine high-frequency components, and smooth motion sequences, mitigating jittery artifacts. Additionally, we introduce a feed-forward attention mechanism designed to learn from N-window pairs of 3D key-point pose histories for precise future motion prediction. Quantitative and qualitative evaluations on the Human3.6M dataset show significant improvements in the mean per joint position error (MPJPE) metric, demonstrating the superiority of our technique over state-of-the-art approaches. We further introduce novel result pose visualizations produced with generative AI methods.
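To make two of the techniques named in the abstract concrete, the sketch below shows DCT-based frequency encoding with high-frequency truncation (the smoothing idea) and the MPJPE metric used in the evaluation. This is a minimal illustrative sketch, not the paper's implementation: the array shapes, the truncation ratio `keep`, and the synthetic test signal are assumptions for demonstration only.

# Minimal sketch (not the authors' code) of DCT motion smoothing and MPJPE.
# Shapes (T frames, J joints, 3 coordinates) and `keep` are illustrative.
import numpy as np
from scipy.fft import dct, idct

def dct_smooth(motion, keep=0.25):
    """Encode a motion sequence with a DCT along the time axis, zero out
    the highest-frequency coefficients, and reconstruct a smoothed sequence.

    motion: array of shape (T, J, 3); keep: fraction of coefficients kept.
    """
    T = motion.shape[0]
    coeffs = dct(motion, type=2, norm="ortho", axis=0)  # frequency encoding
    cutoff = max(1, int(T * keep))
    coeffs[cutoff:] = 0.0                               # drop high frequencies
    return idct(coeffs, type=2, norm="ortho", axis=0)   # smoothed motion

def mpjpe(pred, target):
    """Mean per joint position error: average Euclidean distance between
    predicted and ground-truth 3D joint positions."""
    return np.linalg.norm(pred - target, axis=-1).mean()

# Example: smooth a noisy synthetic rhythmic sequence, score against clean.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100)
clean = np.stack([np.sin(t + j) for j in range(17)], axis=1)[..., None] * np.ones(3)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
print(f"MPJPE before smoothing: {mpjpe(noisy, clean):.4f}")
print(f"MPJPE after smoothing:  {mpjpe(dct_smooth(noisy), clean):.4f}")

Truncating the DCT spectrum acts as a low-pass filter over the pose trajectory, which is why it suppresses the jittery high-frequency artifacts the abstract mentions while preserving the dominant rhythmic components.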
Original language: English
Article number: 104244
Journal: Computers and Graphics
Volume: 129
Early online date: 17 May 2025
Publication status: Published - Jun 2025

Keywords

  • Machine learning
  • Motion processing
  • Rendering
  • Virtual reality
