Yadav, S., Deshmukh, A., Gonela, R., Kera, S., Tiwari, K., Pandey, H. and Akbar, S. A., 2022. MS-KARD: A Benchmark for Multimodal Karate Action Recognition. In: IEEE WCCI 2022 International Joint Conference on Neural Networks (IJCNN 2022), 18-23 July 2022, University of Padua, Italy.
Full text available as:
PDF: KarateNet_WCCI.pdf (Accepted Version, 2MB). Available under License Creative Commons Attribution Non-commercial.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
Classifying complex human motion sequences is a major research challenge in the domain of human activity recognition. Most popular datasets currently lack a specialized set of classes covering action sequences with similar spatial trajectories. Recognizing such complex action sequences with high inter-class similarity, such as those in karate, requires multiple streams. To fulfill this need, we propose MS-KARD, a Multi-Stream Karate Action Recognition Dataset that uses multiple vision perspectives as well as sensor data (accelerometer and gyroscope). It includes 1518 video clips along with their corresponding sensor data. Each video was shot at 30 fps and lasts around one minute, giving a total of 2,814,930 frames and 5,623,734 sensor data samples. The dataset covers 23 classes such as Jodan Zuki and Oi Zuki. The acquisition setup combines two orthogonal web cameras and three wearable inertial sensors, recording vision and inertial data respectively. The aim of this dataset is to aid research on recognizing human actions that have similar spatial trajectories. The paper describes the dataset's statistics and acquisition setting, and provides baseline performance figures using popular action recognizers. We propose an ensemble-based method, KarateNet, that performs decision-level fusion of the two input modalities (vision and sensor data) to classify actions. For the first stream, RGB frames are extracted from the videos and passed to action recognition networks such as the Temporal Segment Network (TSN) and the Temporal Shift Module (TSM). For the second stream, the sensor data is converted into a 2-D image and fed into a Convolutional Neural Network (CNN). The reported results were obtained by fusing the two streams. We also report ablations that use fusion with various input settings. The dataset and code will be made publicly available.
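To make the pipeline described in the abstract concrete, below is a minimal, hypothetical NumPy sketch of the two ideas it names: packing accelerometer/gyroscope samples into a 2-D "image" for a CNN, and decision-level (late) fusion of the vision and sensor streams. Everything here (function names, the fixed time-axis width, the equal fusion weight) is an assumption for illustration, not the authors' released code.

```python
import numpy as np

# Hypothetical sketch of the two-stream idea from the abstract; names,
# shapes, and the fusion weight are illustrative assumptions only.

NUM_CLASSES = 23  # MS-KARD defines 23 karate action classes


def sensor_to_image(samples: np.ndarray, length: int = 256) -> np.ndarray:
    """Pack a clip's inertial window (T, 6) -- 3-axis accelerometer +
    3-axis gyroscope -- into a fixed-size 2-D array a CNN can consume:
    min-max normalise each channel, then pad/crop the time axis."""
    x = samples.T.astype(np.float64)              # (6, T)
    mins = x.min(axis=1, keepdims=True)
    span = np.ptp(x, axis=1, keepdims=True) + 1e-8
    x = (x - mins) / span                         # each channel in [0, 1]
    if x.shape[1] < length:
        x = np.pad(x, ((0, 0), (0, length - x.shape[1])))
    return x[:, :length]                          # (6, length) "image"


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def late_fuse(vision_logits: np.ndarray,
              sensor_logits: np.ndarray,
              w: float = 0.5) -> int:
    """Decision-level fusion: average the per-class probabilities of the
    vision stream (e.g. TSN/TSM) and the sensor-image CNN, then argmax."""
    probs = w * softmax(vision_logits) + (1.0 - w) * softmax(sensor_logits)
    return int(np.argmax(probs))


# Toy usage with random stand-ins for the two streams' outputs.
rng = np.random.default_rng(0)
img = sensor_to_image(rng.standard_normal((300, 6)))  # fake 300-sample clip
pred = late_fuse(rng.standard_normal(NUM_CLASSES),
                 rng.standard_normal(NUM_CLASSES))
print(img.shape, pred)  # (6, 256) and a class index in [0, 23)
```

In the paper's actual pipeline the two logit vectors would come from trained TSN/TSM and CNN models; the equal weighting above is just the simplest choice for a late-fusion sketch.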
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Uncontrolled Keywords: | Action recognition; Multimodal; Karate and martial arts; Sports and exercises; Deep learning; Vision and wearable |
| Group: | Faculty of Science & Technology |
| ID Code: | 36996 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 30 May 2022 10:33 |
| Last Modified: | 01 Sep 2022 12:32 |