Yadav, S., Kera, S. B., Gonela, R. V., Tiwari, K., Pandey, H. and Akbar, S. A., 2022. TBAC: Transformers Based Attention Consensus for Human Activity Recognition. In: IEEE WCCI 2022 International Joint Conference on Neural Networks (IJCNN 2022), 18-23 July 2022, University of Padua, Italy.
Full text available as:
PDF: TACT_WCCI.pdf - Accepted Version. Available under License Creative Commons Attribution Non-commercial. 5MB
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
Human Activity Recognition is an important task in Computer Vision that involves the utilization of spatio-temporal features of videos to classify human actions. The temporal portion of videos contains vital information needed for accurate classification. However, common Deep Learning methods simply average the temporal features, thereby giving all frames equal importance irrespective of their relevance, which negatively impacts the accuracy of the model. To combat this adverse effect, this paper proposes a novel Transformer Based Attention Consensus (TBAC) module. The TBAC module can be used in a plug-and-play manner as an alternative to the conventional consensus methods of any existing video action recognition network. The TBAC module contains four components: (i) Query Sampling Unit, (ii) Attention Extraction Unit, (iii) Softening Unit, and (iv) Attention Consensus Unit. Our experiments demonstrate that the use of the TBAC module in place of classical consensus can improve the performance of CNN-based action recognition models, such as Channel Separated Convolutional Network (CSN), Temporal Shift Module (TSM), and Temporal Segment Network (TSN). We also propose the Decision Consensus (DC) algorithm, which uses a novel fusion algorithm to combine multiple independent but related action recognizer models in order to improve upon the performance of most of these constituent models. Results have been obtained on two benchmark human action recognition datasets, HMDB51 and HAA500. The use of the proposed TBAC module along with Decision Consensus achieves state-of-the-art performance, with 85.23% and 83.73% classification accuracies on the two datasets HMDB51 and HAA500, respectively. The code will be made publicly available.
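The contrast the abstract draws between average consensus and attention-weighted consensus can be illustrated with a minimal sketch. This is not the TBAC module itself: the four units are not specified in this record, so the code below substitutes generic scaled dot-product attention pooling. The function names, the `query` vector, and the `temperature` parameter are all hypothetical and chosen only for illustration.

```python
import math


def average_consensus(frame_scores):
    """Classical consensus: every frame gets the same weight 1/T."""
    num_frames = len(frame_scores)
    num_classes = len(frame_scores[0])
    return [
        sum(frame[c] for frame in frame_scores) / num_frames
        for c in range(num_classes)
    ]


def softmax(values, temperature=1.0):
    """Numerically stable softmax; a larger temperature gives flatter weights."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_consensus(frame_features, frame_scores, query, temperature=1.0):
    """Generic attention pooling (an assumption, not the paper's TBAC design).

    Each frame is weighted by the softmax of its scaled dot product with
    `query`, and the per-frame class scores are fused with those weights.
    """
    dim = len(query)
    logits = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in frame_features
    ]
    weights = softmax(logits, temperature)
    num_classes = len(frame_scores[0])
    fused = [
        sum(w * frame[c] for w, frame in zip(weights, frame_scores))
        for c in range(num_classes)
    ]
    return fused, weights


# Toy data: three frames, 2-d features, two classes (values are made up).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
scores = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
query = [1.0, 0.0]  # hypothetical query that favours frames 0 and 2

avg = average_consensus(scores)
fused, weights = attention_consensus(features, scores, query)
print("average consensus:", avg)
print("attention consensus:", fused)
print("frame weights:", weights)
```

In this toy setup the frame whose features do not match the query receives a smaller weight, so it contributes less to the fused class scores than it does under plain averaging. With an all-zero query the weights become uniform and the two functions agree.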
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | Video Action Recognition; Human Activity Recognition; Transformers; Temporal Attention; Consensus; Convolutional Neural Networks |
Group: | Faculty of Science & Technology |
ID Code: | 36995 |
Deposited By: | Symplectic RT2 |
Deposited On: | 30 May 2022 10:26 |
Last Modified: | 01 Sep 2022 12:32 |