Empower dynamic scene understanding through scene flow estimation and object segmentation.

Li, Z., 2025. Empower dynamic scene understanding through scene flow estimation and object segmentation. Doctoral thesis. Bournemouth University.

Full text available as:

PDF: LI, Zhiqi_Ph.D._2025.pdf (12 MB)
Available under a Creative Commons Attribution Non-commercial licence.

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

Abstract

Understanding dynamic 3D scenes, which is critical for applications such as autonomous navigation and mixed reality, requires parsing both motion (scene flow) and object interactions (segmentation). Scene flow captures the 3D motion field, while segmentation isolates individual objects. Together they enable systems to interpret evolving environments. Integrating these tasks offers a holistic view of the scene but is computationally challenging because of scene flow's high dimensionality. This work proposes a lightweight deep learning architecture that combines an enhanced Point Transformer for efficient feature extraction with a point-voxel correlation module for stable motion estimation. To avoid labor-intensive object annotation, scene flow is used as auxiliary supervision for segmentation. Instead of predicting masks for all points, this thesis focuses on key points, reducing complexity while maintaining accuracy. The proposed clustering-free approach achieves state-of-the-art results on indoor datasets. For temporal consistency, an unsupervised method integrates continuous point cloud sequences (encoding spatial embeddings) with time-independent queries (encoding object semantics). This enables masks to be predicted gradually across frames without direct labels and accommodates dynamic inputs. Overall, the framework advances dynamic scene understanding by harmonizing motion estimation and segmentation, as supported by competitive benchmark results and flexible input handling.

Item Type:	Thesis (Doctoral)
Additional Information:	If you feel that this work infringes your copyright, please contact the BURO Manager.
Group:	Faculty of Media & Communication
ID Code:	41016
Deposited By:	Symplectic RT2
Deposited On:	12 May 2025 11:57
Last Modified:	12 May 2025 12:00
