Li, Z., Adamczewska, N. and Tang, W., 2025. Beyond short segments: A comprehensive multi-modal salience prediction dataset with standard-length 360-degree videos. 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), 44-53.
Full text available as: VRST.pdf - Accepted Version (PDF, 2MB), available under License Creative Commons Attribution Non-commercial.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
DOI: 10.1109/AIxVR63409.2025.00015
Abstract
Understanding user interactions in immersive 360-degree video environments is crucial for the quality of user experience. Complex spatial ambisonic information in 360-degree videos enriches sensory experiences but also poses unique challenges for multimodal salience prediction. Recent research has introduced various 360-degree datasets containing ambisonic sound, but these datasets primarily contain only short video segments, typically under 30 seconds. Our present study shows that extrapolating ambisonic features learnt from short video segments to longer ones leads to inaccuracies in VR video streaming. Therefore, we have developed a comprehensive multimodal standard-length (40s to 240s) 360-degree video dataset in response to the challenges of multimodal salience prediction. Based on data collected from 30 participants in mono and ambisonic audio settings, our user behaviour analysis sheds new light on the relationship between ambisonic audio distribution and viewer attention across full-length videos. The findings of our study underscore the complexity and challenges of applying a feature learning strategy from short segments to standard video lengths. They demonstrate that extrapolating learning from short video segments to longer ones is generally not applicable, even though it is widely used in current practice. Furthermore, we assess existing salience prediction models and introduce an efficient baseline model to evaluate the impact of different modality features in our dataset. The insights and the new dataset of our study establish a more realistic benchmark for future research on multimodal salience prediction in 360-degree videos.
Item Type: Article
ISSN: 2771-7445
Additional Information: 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), Lisbon, Portugal, 27-29 January 2025
Uncontrolled Keywords: 360-degree video; Ambisonic Sound; Salience Prediction; Multimodal Learning
Group: Faculty of Science & Technology
ID Code: 41245
Deposited By: Symplectic RT2
Deposited On: 20 Aug 2025 14:33
Last Modified: 20 Aug 2025 14:33