Abstract
Recovering a reliable three-dimensional (3D) hand mesh in camera space from a monocular image remains highly challenging, particularly when it comes to encoding fine-grained metric depth geometry. To address this challenge, we propose a novel deep learning model that estimates the absolute hand position in camera space while enhancing the details of the reconstructed mesh. Specifically, our model employs a shared-weight feature encoder integrated with a depth regression head to extract latent hand representations and predict an initial coarse depth map. A key component is the Pseudo Stereo System, which generates pseudo-right features from left-view features and disparity cues, and establishes geometric constraints using a flexible feature-matching module. This design enables the model to learn depth-aware representations under training-time geometric supervision, with dense hand depth estimation guided by disparity map regression, while inference relies on a single RGB image. Finally, a Transformer-based recovery module combines 2D image-plane features and depth features to infer the 3D hand mesh. Extensive experiments on the FreiHAND dataset demonstrate that our model significantly outperforms existing methods in camera-centered 3D hand reconstruction and exhibits robust generalization across datasets in both camera-centered and root-relative settings. Our code is publicly available at: https://github.com/ShaoXiang23/Pseudo-Stereo-Hand.
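The link between disparity regression and metric depth mentioned in the abstract can be illustrated with the textbook rectified-stereo relation Z = f · B / d. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the focal length, and the baseline are hypothetical, and the warp operates on a single row of scalar values with integer disparities, whereas the model described above works on learned feature maps with a flexible feature-matching module.

```python
def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to metric depth via Z = f * B / d.

    focal_px and baseline_m are assumed camera parameters (hypothetical here);
    eps guards against division by zero for near-zero disparities.
    """
    return [[focal_px * baseline_m / max(d, eps) for d in row] for row in disparity]


def warp_left_to_right(left_row, disparity_row, fill=0.0):
    """Forward-warp one row of left-view values into a pseudo-right view.

    In a rectified stereo pair, a left pixel at column x appears at column
    x - d in the right view. Positions that receive no value keep `fill`.
    """
    right = [fill] * len(left_row)
    for x, (value, d) in enumerate(zip(left_row, disparity_row)):
        xr = x - int(round(d))  # integer shift; a real model would interpolate
        if 0 <= xr < len(right):
            right[xr] = value
    return right


if __name__ == "__main__":
    # Hypothetical camera: 500 px focal length, 0.1 m baseline.
    print(disparity_to_depth([[10.0, 20.0]], focal_px=500.0, baseline_m=0.1))
    print(warp_left_to_right([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1]))
```

Larger disparities map to smaller depths, which is why supervising a disparity map during training can constrain dense depth even though only one RGB image is available at inference.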
| Field | Value |
|---|---|
| Original language | English |
| Article number | 115583 |
| Journal | Knowledge-Based Systems |
| Volume | 339 |
| Early online date | 24 Feb 2026 |
| DOIs | |
| Publication status | E-pub ahead of print - 24 Feb 2026 |
Keywords
- Camera-Centered mesh
- Depth-Aware fusion
- Hand mesh reconstruction
- Pseudo stereo supervision
- Transformer modeling
ASJC Scopus subject areas
- Management Information Systems
- Software
- Information Systems and Management
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Camera-Space Hand Mesh Reconstruction from a Monocular Image via Pseudo Stereo Perception'. Together they form a unique fingerprint.