Self-supervised learning for fine-grained monocular 3D face reconstruction in the wild.

Huang, D., Shi, Y., Liu, J. and Tang, W., 2024. Self-supervised learning for fine-grained monocular 3D face reconstruction in the wild. Multimedia Systems, 30, 235.

Full text available as:

PDF (Accepted Version, 3MB): Self-supervised learning for fine-grained monocular 3D face reconstruction in the wild.pdf
Available under License Creative Commons Attribution Non-commercial.

DOI: 10.1007/s00530-024-01436-3

Abstract

Reconstructing a 3D face from monocular images is a challenging computer vision task, due to the limitations of traditional 3DMMs (3D Morphable Models) and the lack of high-fidelity 3D facial scanning data. To address this, we propose a novel coarse-to-fine self-supervised learning framework for reconstructing fine-grained 3D faces from monocular images in the wild. In the coarse stage, face parameters extracted from a single image are used to reconstruct a coarse 3D face through a 3DMM. In the refinement stage, we design a wavelet transform perception model to extract facial details in different frequency domains from the input image. Furthermore, we propose a depth displacement module based on the wavelet transform perception model to generate a refined displacement map from the unwrapped UV textures of the input image and the rendered coarse face, which is used to synthesize detailed 3D face geometry. Moreover, we propose a novel albedo map module based on the wavelet transform perception model to capture high-frequency texture information and generate a detailed albedo map consistent with the face illumination. The detailed face geometry and albedo map are used to reconstruct a fine-grained 3D face without any labeled data. We have conducted extensive experiments that demonstrate the superiority of our method over existing state-of-the-art approaches for 3D face reconstruction on four public datasets: CelebA, LS3D, LFW, and the NoW benchmark. The experimental results indicate that our method achieves higher accuracy and robustness, particularly under challenging conditions such as occlusion, large poses, and varying illumination.
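As a rough illustration of the frequency-domain split described in the abstract, the sketch below (not the paper's implementation) decomposes a grayscale face crop into low- and high-frequency wavelet sub-bands using the PyWavelets library; the high-frequency bands are the kind of signal a wavelet transform perception model could draw facial details from. The array shapes and the choice of the Haar wavelet are assumptions for illustration only.

```python
# Minimal sketch (assumed setup, not the authors' code): split a face image
# into wavelet sub-bands so that low- and high-frequency content can be
# processed separately, as in a coarse-to-fine detail pipeline.
import numpy as np
import pywt


def wavelet_subbands(gray_face: np.ndarray, wavelet: str = "haar"):
    """Single-level 2D DWT: returns the low-frequency approximation band and
    the horizontal, vertical, and diagonal high-frequency detail bands."""
    cA, (cH, cV, cD) = pywt.dwt2(gray_face, wavelet)
    return cA, cH, cV, cD


if __name__ == "__main__":
    # Stand-in for a 256x256 grayscale face crop.
    face = np.random.rand(256, 256).astype(np.float32)
    cA, cH, cV, cD = wavelet_subbands(face)
    # Each sub-band is half the spatial resolution of the input (128x128 here).
    print(cA.shape, cH.shape, cV.shape, cD.shape)
```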

Item Type: Article
ISSN: 0942-4962
Uncontrolled Keywords: 3D face reconstruction; Monocular image; 3DMM; Self-supervised learning; Coarse-to-fine model
Group: Faculty of Media & Communication
ID Code: 41389
Deposited By: Symplectic RT2
Deposited On: 22 Sep 2025 13:08
Last Modified: 22 Sep 2025 13:08
