Jiang, J., Lu, X., Zhao, L., Dazeley, R. and Wang, M., 2023. Masked autoencoders in 3D point cloud representation learning. IEEE Transactions on Multimedia. (In Press)
Full text available as: TMM_015431.pdf (PDF, 4MB) - Accepted Version. Restricted to repository staff only until 13 September 2025. Available under License Creative Commons Attribution Non-commercial.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked autoencoders have been explored in this field. The input can be intuitively masked when it has a regular structure, such as word sequences and 2D pixel grids. However, extending this idea to 3D point clouds is challenging due to their irregularity. In this paper, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and the high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module with a multi-task loss completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with the point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively). Our source code is available at: https://github.com/Jinec98/MAE3D.
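To make the patch-and-mask stage of the abstract concrete, below is a minimal sketch in NumPy. The grouping strategy (farthest point sampling for patch centers plus kNN neighborhoods), the patch count, patch size, and 75% mask ratio are illustrative assumptions common to point-cloud masked autoencoders, not details confirmed by this record; `make_patches` and `mask_patches` are hypothetical helper names.

```python
import numpy as np

def farthest_point_sample(points, n_centers, rng):
    """Greedy FPS: pick n_centers point indices that are mutually far apart."""
    n = points.shape[0]
    centers = np.empty(n_centers, dtype=np.int64)
    centers[0] = rng.integers(n)
    dist = np.full(n, np.inf)
    for i in range(1, n_centers):
        # Distance to the nearest center chosen so far.
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[i - 1]], axis=1))
        centers[i] = int(dist.argmax())
    return centers

def make_patches(points, n_patches=64, patch_size=32, rng=None):
    """Split a point cloud (N, 3) into n_patches local patches of patch_size
    points each: FPS centers + kNN grouping (an assumed strategy)."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = farthest_point_sample(points, n_patches, rng)
    # kNN: each patch is the patch_size points nearest to its center.
    d = np.linalg.norm(points[None, :, :] - points[centers][:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :patch_size]   # (n_patches, patch_size)
    return points[idx], points[centers]           # patches, patch centers

def mask_patches(n_patches=64, mask_ratio=0.75, rng=None):
    """Randomly choose which patches are masked (the ratio is an assumption)."""
    if rng is None:
        rng = np.random.default_rng(0)
    masked = rng.choice(n_patches, int(mask_ratio * n_patches), replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked] = True
    return mask  # True = masked, to be reconstructed by the decoder

# Example: a 2048-point cloud -> 64 patches, 75% masked.
pts = np.random.default_rng(1).standard_normal((2048, 3)).astype(np.float32)
patches, centers = make_patches(pts)
mask = mask_patches()
visible = patches[~mask]  # only these would feed the Patch Embedding Module
print(patches.shape, visible.shape)  # (64, 32, 3) (16, 32, 3)
```

In this sketch, only the visible patches would be embedded and encoded; the latent representations of the masked patches would then be completed and decoded back to points, matching the completion pre-text task described in the abstract.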
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Self-supervised learning; Point cloud; Completion |
| Group: | Faculty of Media & Communication |
| ID Code: | 39113 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 10 Nov 2023 10:04 |
| Last Modified: | 10 Nov 2023 10:04 |