
TBAC: Transformers Based Attention Consensus for Human Activity Recognition.

Yadav, S., Kera, S. B., Gonela, R. V., Tiwari, K., Pandey, H. and Akbar, S. A., 2022. TBAC: Transformers Based Attention Consensus for Human Activity Recognition. In: IEEE WCCI 2022 International Joint Conference on Neural Networks (IJCNN 2022), 18-23 July 2022, University of Padua, Italy.

Full text available as:

TACT_WCCI.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.



Human Activity Recognition is an important task in Computer Vision that involves the utilization of spatio-temporal features of videos to classify human actions. The temporal portion of videos contains vital information needed for accurate classification. However, common Deep Learning methods simply average the temporal features, thereby giving all frames equal importance irrespective of their relevance, which negatively impacts the accuracy of the model. To combat this adverse effect, this paper proposes a novel Transformer Based Attention Consensus (TBAC) module. The TBAC module can be used in a plug-and-play manner as an alternative to the conventional consensus methods of any existing video action recognition network. The TBAC module contains four components: (i) Query Sampling Unit, (ii) Attention Extraction Unit, (iii) Softening Unit, and (iv) Attention Consensus Unit. Our experiments demonstrate that the use of the TBAC module in place of classical consensus can improve the performance of CNN-based action recognition models, such as Channel Separated Convolutional Network (CSN), Temporal Shift Module (TSM), and Temporal Segment Network (TSN). We also propose the Decision Consensus (DC) algorithm, which uses a novel fusion algorithm to combine multiple independent but related action recognizer models in order to improve upon the performance of most of these constituent models. Results have been obtained on two benchmark human action recognition datasets, HMDB51 and HAA500. The use of the proposed TBAC module along with Decision Consensus achieves state-of-the-art performance, with classification accuracies of 85.23% and 83.73% on HMDB51 and HAA500, respectively. The code will be made publicly available.
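To illustrate the contrast the abstract draws between averaging and attention-based consensus, here is a minimal sketch in plain Python. It is not the TBAC module itself: the abstract does not specify how the four units work, so the function names, the hand-picked `relevance` scores, and the `temperature` parameter are all hypothetical. The sketch shows only that weighting per-frame class scores by relevance can change the video-level prediction compared with a uniform average.

```python
import math


def average_consensus(frame_scores):
    """Classical consensus: every frame contributes equally."""
    n = len(frame_scores)
    num_classes = len(frame_scores[0])
    return [sum(f[c] for f in frame_scores) / n for c in range(num_classes)]


def softmax(values, temperature=1.0):
    """Numerically stable softmax; `temperature` is a hypothetical knob."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_consensus(frame_scores, relevance, temperature=1.0):
    """Weighted consensus: frames with higher relevance count for more.

    `relevance` is assumed to come from some attention mechanism; here it
    is supplied by hand purely for illustration.
    """
    weights = softmax(relevance, temperature)
    num_classes = len(frame_scores[0])
    return [
        sum(w * f[c] for w, f in zip(weights, frame_scores))
        for c in range(num_classes)
    ]


# Three frames, two classes. Only the first frame shows the decisive motion.
frame_scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
relevance = [4.0, 0.0, 0.0]  # hypothetical per-frame relevance scores

avg = average_consensus(frame_scores)
att = attention_consensus(frame_scores, relevance)

print("average prediction:  ", avg.index(max(avg)))
print("attention prediction:", att.index(max(att)))
```

In this toy input the uniform average is dominated by the two uninformative frames, while the weighted version follows the frame marked as relevant. How TBAC actually derives its weights is described in the paper, not here.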

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Video Action Recognition; Human Activity Recognition; Transformers; Temporal Attention; Consensus; Convolutional Neural Networks
Group: Faculty of Science & Technology
ID Code: 36995
Deposited By: Symplectic RT2
Deposited On: 30 May 2022 10:26
Last Modified: 01 Sep 2022 12:32

