
Feature Fusion of Deep and Spatio-Temporal Features for Dynamic Scene Understanding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Classifying scenes from video sequences in which both the environment and the elements within it are in motion, known as dynamic scene recognition, is crucial for applications such as surveillance, autonomous navigation, and environmental monitoring. Yet it remains challenging, since it requires jointly modeling spatial and temporal cues under complex real-world variability. Early approaches based on hand-crafted descriptors lacked robustness and scalability, while many current methods still struggle under conditions such as camera motion, occlusions, and changing illumination. To address these challenges, we propose a novel hybrid spatio-temporal framework that integrates motion and appearance information. The most informative frames are identified through Top-K keyframe extraction, spatial features are captured using ResNet101, and temporal motion patterns are modeled through the Volume Local Directional Number (VLDN) descriptor. These fused spatio-temporal representations are then classified using a tuned 1D Convolutional Neural Network (1D-CNN) designed for sequential data. Extensive experiments on YUPENN and its extended dataset demonstrate that the proposed lightweight pipeline consistently delivers robust and generalizable dynamic scene recognition and achieves state-of-the-art performance.
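The pipeline described above can be sketched end-to-end. This is a minimal NumPy illustration of the data flow only: the motion-energy keyframe ranking, the histogram "spatial" embedding, and the simplified directional coding are hypothetical stand-ins (the paper uses a pretrained ResNet101 and the full VLDN descriptor); the sketch only fixes the shapes and the fusion step that feeds the 1D-CNN.

```python
import numpy as np

def top_k_keyframes(frames, k=3):
    """Rank frames by inter-frame difference energy and keep the top k.
    frames: (T, H, W) grayscale video. Assumption: 'most informative'
    is approximated here as 'largest motion energy'."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    energy = diffs.reshape(diffs.shape[0], -1).sum(axis=1)
    # Frame t is scored by its difference to frame t-1; frame 0 scores 0.
    scores = np.concatenate([[0.0], energy])
    return np.sort(np.argsort(scores)[-k:])

def spatial_features(frame, dim=8):
    """Stand-in for the ResNet101 embedding: a coarse intensity histogram.
    (The paper uses a pretrained CNN; this placeholder only fixes shapes.)"""
    hist, _ = np.histogram(frame, bins=dim, range=(0, 256), density=True)
    return hist

def directional_code(frames):
    """Toy stand-in for VLDN: per pixel, record which of the 8 neighbours
    of the middle keyframe gives the strongest response, then histogram
    the resulting directional codes into an 8-bin temporal descriptor."""
    f = frames[len(frames) // 2].astype(np.float64)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    responses = np.stack([np.roll(f, s, axis=(0, 1)) - f for s in shifts])
    codes = np.argmax(np.abs(responses), axis=0)
    hist, _ = np.histogram(codes, bins=8, range=(0, 8), density=True)
    return hist

def fused_descriptor(frames, k=3):
    """Concatenate per-keyframe spatial features with the temporal code;
    the fused vector is what the tuned 1D-CNN would classify."""
    idx = top_k_keyframes(frames, k=k)
    spatial = np.concatenate([spatial_features(frames[i]) for i in idx])
    temporal = directional_code(frames[idx])
    return np.concatenate([spatial, temporal])

rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(10, 32, 32))  # synthetic 10-frame clip
desc = fused_descriptor(video, k=3)
print(desc.shape)  # (32,): 3 keyframes x 8 spatial bins + 8 temporal bins
```

The fusion here is plain concatenation; the final 1D-CNN classifier is omitted, as its tuned architecture is not specified in the abstract.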
Original language: English
Title of host publication: 2nd International Conference on Artificial Intelligence, Metaverse, and Cybersecurity (ICAMAC)
Publisher: IEEE
ISBN (Electronic): 9798331572259
Publication status: Published - 25 Feb 2026
