Tliba, M., Kerkouri, M.A., Ghariba, B., Chetouani, A., Coltekin, A., Shehata, M.S. and Bruno, A., 2022. SATSal: A Multi-Level Self-Attention Based Architecture for Visual Saliency Prediction. IEEE Access, 10, 20701-20713.
Full text available as:
PDF (OPEN ACCESS ARTICLE)
SATSal.pdf - Published Version, available under a Creative Commons Attribution licence. 1MB
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
DOI: 10.1109/ACCESS.2022.3152189
Abstract
Human visual attention modelling is a persistent interdisciplinary research challenge that has gained renewed interest in recent years, mainly due to the latest developments in deep learning. This is particularly evident in saliency benchmarks. Novel deep learning-based visual saliency models show promising results in capturing high-level (top-down) human visual attention processes, and therefore differ strongly from earlier approaches, which were mainly characterised by low-level (bottom-up) visual features. These developments account for innate human selectivity mechanisms that rely on both high- and low-level factors; moreover, the two factors interact with each other. Motivated by the importance of these interactions, in this work we tackle visual saliency modelling holistically, examining whether both the high- and low-level features that govern human attention can be considered jointly. Specifically, we propose a novel method, SAtSal (Self-Attention Saliency). SAtSal leverages both high- and low-level features through multi-level merging of skip connections during the decoding stage. To properly integrate the valuable signals from multi-level spatial features, we incorporate convolutional self-attention modules on the skip connections from the encoder to the decoder network. The self-attention modules thus learn, jointly with the main encoder-decoder backbone, to filter the latent representation of the salient regions out from the remaining irrelevant information. Finally, we evaluate SAtSal against various existing solutions on the well-known standard saliency benchmark MIT300 to validate our approach. To further examine SAtSal's robustness on other image types, we also evaluate it on the Le Meur saliency painting benchmark.
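The abstract describes the self-attention modules on the skip connections only at a high level. As a rough illustration of what such a module can look like, the following NumPy sketch implements a generic non-local self-attention block over a single encoder feature map; the 1x1-convolution weights (`wq`, `wk`, `wv`) and the shapes are hypothetical stand-ins for learned parameters, not the paper's exact implementation.

```python
import numpy as np

def conv1x1(x, w):
    # Pointwise (1x1) convolution: x is (C_in, H, W), w is (C_out, C_in).
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def self_attention_skip(feat, wq, wk, wv):
    """Spatial self-attention over a skip-connection feature map (hypothetical sketch).

    feat: (C, H, W) encoder features; wq/wk: (C', C) query/key projections;
    wv: (C, C) value projection. Returns features of the same shape, with a
    residual connection so the module can pass signals through unchanged.
    """
    c, h, w = feat.shape
    q = conv1x1(feat, wq).reshape(wq.shape[0], -1)   # (C', N), N = H*W positions
    k = conv1x1(feat, wk).reshape(wk.shape[0], -1)   # (C', N)
    v = conv1x1(feat, wv).reshape(c, -1)             # (C, N)
    logits = q.T @ k / np.sqrt(q.shape[0])           # (N, N) pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over key positions
    out = v @ attn.T                                 # (C, N) attention-weighted values
    return feat + out.reshape(c, h, w)               # residual skip output

# Toy usage with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
wq, wk = rng.standard_normal((2, 8)), rng.standard_normal((2, 8))
wv = rng.standard_normal((8, 8))
refined = self_attention_skip(feat, wq, wk, wv)
print(refined.shape)  # (8, 4, 4)
```

In a full encoder-decoder model, a block like this would sit on each skip connection before its features are merged into the corresponding decoder stage, letting the network re-weight spatial locations before fusion.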
| Item Type: | Article |
| --- | --- |
| ISSN: | 2169-3536 |
| Uncontrolled Keywords: | Eye movements; low and high vision; saliency prediction; self-attention; visual attention |
| Group: | Faculty of Science & Technology |
| ID Code: | 36795 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 29 Mar 2022 14:18 |
| Last Modified: | 29 Mar 2022 14:18 |