Jiang, J., Lu, X., Zhao, L., Dazeley, R. and Wang, M., 2023. Masked autoencoders in 3D point cloud representation learning. IEEE Transactions on Multimedia. (In Press)
Full text available as: TMM_015431.pdf (PDF, 4MB) - Accepted Version. Restricted to repository staff only until 13 September 2025. Available under License Creative Commons Attribution Non-commercial.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked autoencoders have been explored in this field. The input can be intuitively masked when it has a regular structure, such as word sequences and 2D pixel grids. However, extending this idea to 3D point clouds is challenging due to their irregularity. In this paper, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and the high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module with a multi-task loss completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with the point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively). Our source code is available at: https://github.com/Jinec98/MAE3D.
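To make the patch-and-mask stage of the abstract concrete, below is a minimal sketch in NumPy. The grouping strategy (farthest point sampling for patch centers plus kNN neighborhoods), the patch count, patch size, and 75% mask ratio are illustrative assumptions common to point-cloud masked autoencoders, not details confirmed by this record; `make_patches` and `mask_patches` are hypothetical helper names.

```python
import numpy as np

def farthest_point_sample(points, n_centers, rng):
    """Greedy FPS: pick n_centers point indices that are mutually far apart."""
    n = points.shape[0]
    centers = np.empty(n_centers, dtype=np.int64)
    centers[0] = rng.integers(n)
    dist = np.full(n, np.inf)
    for i in range(1, n_centers):
        # Distance to the nearest center chosen so far.
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[i - 1]], axis=1))
        centers[i] = int(dist.argmax())
    return centers

def make_patches(points, n_patches=64, patch_size=32, rng=None):
    """Split a point cloud (N, 3) into n_patches local patches of patch_size
    points each: FPS centers + kNN grouping (an assumed strategy)."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = farthest_point_sample(points, n_patches, rng)
    # kNN: each patch is the patch_size points nearest to its center.
    d = np.linalg.norm(points[None, :, :] - points[centers][:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :patch_size]   # (n_patches, patch_size)
    return points[idx], points[centers]           # patches, patch centers

def mask_patches(n_patches=64, mask_ratio=0.75, rng=None):
    """Randomly choose which patches are masked (the ratio is an assumption)."""
    if rng is None:
        rng = np.random.default_rng(0)
    masked = rng.choice(n_patches, int(mask_ratio * n_patches), replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked] = True
    return mask  # True = masked, to be reconstructed by the decoder

# Example: a 2048-point cloud -> 64 patches, 75% masked.
pts = np.random.default_rng(1).standard_normal((2048, 3)).astype(np.float32)
patches, centers = make_patches(pts)
mask = mask_patches()
visible = patches[~mask]  # only these would feed the Patch Embedding Module
print(patches.shape, visible.shape)  # (64, 32, 3) (16, 32, 3)
```

In this sketch, only the visible patches would be embedded and encoded; the latent representations of the masked patches would then be completed and decoded back to points, matching the completion pre-text task described in the abstract.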
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Self-supervised learning; Point cloud; Completion |
| Group: | Faculty of Media & Communication |
| ID Code: | 39113 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 10 Nov 2023 10:04 |
| Last Modified: | 10 Nov 2023 10:04 |