A Unified Framework for Jointly Compressing Visual and Semantic Data

Shizhan Liu, Weiyao Lin, Yihang Chen, Yufeng Zhang, Wenrui Dai, John See, Hongkai Xiong

Research output: Contribution to journal › Article › peer-review



The rapid advancement of multimedia and imaging technologies has produced increasingly diverse visual and semantic data. A wide range of applications, such as remote-assisted driving, require the combined storage and transmission of various visual and semantic data. However, existing works insufficiently exploit the redundancy between different types of data. In this paper, we propose a unified framework to jointly compress a diverse spectrum of visual and semantic data, including images, point clouds, segmentation maps, object attributes, and relations. We develop a unifying process that embeds the representations of these data into a joint embedding graph according to their categories, which enables flexible handling of joint compression tasks for various visual and semantic data. To fully leverage the redundancy between different data types, we further introduce an embedding-based adaptive joint encoding process and a Semantic Adaptation Module that efficiently encode diverse data based on the learned embeddings in the joint embedding graph. Experiments on the Cityscapes, MSCOCO, and KITTI datasets demonstrate the superiority of our framework, highlighting promising steps toward scalable multimedia processing.
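To make the "joint embedding graph" idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: each data item (image, point cloud, segmentation map, object attribute, or relation) becomes a typed node holding an embedding vector, and edges link items that describe the same scene so a downstream encoder can exploit cross-modal redundancy. All class and method names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Category tag, e.g. "image", "point_cloud", "segmentation",
    # "attribute", or "relation" (the data types listed in the abstract).
    category: str
    # Learned representation of the data item (toy low-dimensional vector).
    embedding: list

class JointEmbeddingGraph:
    """Hypothetical container: typed embedding nodes plus undirected edges
    between items of the same scene, over which a joint encoder could share
    information."""

    def __init__(self):
        self.nodes = []
        self.edges = []  # undirected edges stored as (i, j) index pairs

    def add(self, category, embedding):
        """Insert a node and return its index."""
        self.nodes.append(Node(category, embedding))
        return len(self.nodes) - 1

    def connect(self, i, j):
        """Link two items whose content is expected to be redundant."""
        self.edges.append((i, j))

    def neighbors(self, i):
        """Indices of all nodes connected to node i."""
        return [b if a == i else a for a, b in self.edges if i in (a, b)]

# Toy usage: an image and its segmentation map of the same scene.
g = JointEmbeddingGraph()
img = g.add("image", [0.2, 0.7])
seg = g.add("segmentation", [0.1, 0.6])
g.connect(img, seg)
print(g.neighbors(img))  # → [1]
```

In a real system the embeddings would come from learned per-modality encoders, and the graph structure would drive the adaptive joint encoding; this sketch only shows the data-organization idea.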
Original language: English
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Early online date: 28 Mar 2024
Publication status: E-pub ahead of print - 28 Mar 2024


  • Computer Networks and Communications
  • Hardware and Architecture


