Pandey, H., 2023. SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity Recognition. In: IEEE 41st International Conference on Consumer Electronics (ICCE 2023), 6-8 January 2023, Las Vegas, USA.
Full text available as:

| PDF | 1570860747 paper (3)_Redacted.pdf - Accepted Version. Available under License Creative Commons Attribution Non-commercial. 6MB |
|---|---|
| PDF | 1570860747 paper (3).pdf - Accepted Version. Restricted to Repository staff only. 6MB |
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Official URL: https://icce.org/2023/Home.html
Abstract
Drone-camera-based human activity recognition (HAR) has received significant attention from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is made challenging by complex poses, varying viewpoints, and the environmental scenarios in which the action takes place. To address these complexities, this paper proposes a novel Sparse Weighted Temporal Fusion (SWTF) module that exploits sparsely sampled video frames to obtain a globally weighted temporal fusion outcome. The proposed SWTF consists of two components: first, a temporal segment network that sparsely samples a given set of frames; second, a weighted temporal fusion that combines feature maps derived from optical flow with raw RGB images. This is followed by the base network, a convolutional neural network with fully connected layers that performs the activity classification. SWTF can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without requiring a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, surpassing the previous state-of-the-art performance by a significant margin.
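To make the pipeline concrete, the sketch below illustrates one plausible reading of the SWTF idea described in the abstract: TSN-style sparse frame sampling combined with a learned weighted fusion of optical-flow and RGB feature maps. This is a minimal, hypothetical sketch in PyTorch, not the authors' implementation; the class name `SparseWeightedTemporalFusion`, the scalar flow gate, and the per-segment softmax weights are assumptions made for illustration only.

```python
# Illustrative SWTF-style module (hypothetical; not the paper's code).
# Assumes per-segment RGB and optical-flow feature maps of shape
# (batch, segments, channels, height, width), fused with learned weights.
import torch
import torch.nn as nn


class SparseWeightedTemporalFusion(nn.Module):
    """Fuses per-segment RGB and optical-flow feature maps with learned
    weights, then pools across the sparse segments (assumed design)."""

    def __init__(self, num_segments: int):
        super().__init__()
        # One learnable fusion weight per sparse segment, softmax-normalised.
        self.segment_weights = nn.Parameter(torch.ones(num_segments))
        # Scalar gate balancing optical-flow vs. RGB features (assumption).
        self.flow_gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, rgb_feats: torch.Tensor, flow_feats: torch.Tensor) -> torch.Tensor:
        # rgb_feats, flow_feats: (B, K, C, H, W) per-segment feature maps.
        g = torch.sigmoid(self.flow_gate)
        fused = g * flow_feats + (1.0 - g) * rgb_feats       # (B, K, C, H, W)
        w = torch.softmax(self.segment_weights, dim=0)        # (K,)
        # Weighted temporal fusion across the K sparsely sampled segments.
        return (fused * w.view(1, -1, 1, 1, 1)).sum(dim=1)   # (B, C, H, W)


def sparse_sample_indices(num_frames: int, num_segments: int) -> list[int]:
    """TSN-style sparse sampling: pick one frame from the centre of each
    of num_segments equal-length snippets."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


if __name__ == "__main__":
    B, K, C, H, W = 2, 8, 64, 14, 14
    swtf = SparseWeightedTemporalFusion(num_segments=K)
    out = swtf(torch.randn(B, K, C, H, W), torch.randn(B, K, C, H, W))
    print(out.shape)                                   # torch.Size([2, 64, 14, 14])
    print(sparse_sample_indices(num_frames=120, num_segments=K))
```

In this reading, the fused (B, C, H, W) map would feed the base network (CNN plus fully connected layers) mentioned in the abstract, which is why the module can act as a plug-in replacement for a separate temporal stream.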
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Uncontrolled Keywords: | Human Activity Recognition; Video Understanding; Drone Action Recognition |
| Group: | Faculty of Science & Technology |
| ID Code: | 37768 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 10 Nov 2022 15:40 |
| Last Modified: | 03 Apr 2023 11:03 |