The combination of global attention and positional attention mechanisms is a very promising direction in deep learning! It gives models a powerful tool, especially for tasks involving complex spatial structures and temporal relationships.
Combining the two leverages the strengths of both to improve model performance and accuracy; for example, the representative model AFFAM achieved an accuracy as high as 99.29%. The global attention mechanism attends to all parts of the input, weighting the entire sequence or image so that it can capture key information across the whole input. This helps the model understand the overall structure and content of the data and identify key features and patterns more accurately. The positional attention mechanism, on the other hand, exploits the positional information of elements in the input, helping the model understand spatial or temporal relationships between them and thereby better capture critical information such as the locations of objects in images.
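To make the two ideas concrete, here is a minimal PyTorch sketch (an illustration of the general pattern, not any specific paper's implementation): a global channel-attention block that re-weights channels from a global pooled summary, followed by a spatial positional-attention block that re-weights individual positions. All module names and hyperparameters below are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Squeeze-and-excitation style global attention: one weight per channel,
    computed from a global average over all spatial positions."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                                  # x: (B, C, H, W)
        return x * self.fc(x)                              # re-weight channels

class SpatialPositionAttention(nn.Module):
    """Spatial attention: one weight per position, computed from channel-pooled
    descriptors, so the model can emphasise where the important content is."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                  # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)                 # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                                    # re-weight positions

# Combine: global channel attention first, then spatial positional attention
block = nn.Sequential(GlobalChannelAttention(64), SpatialPositionAttention())
features = torch.randn(2, 64, 32, 32)
out = block(features)                                      # same shape as input
```

The block can be dropped into an existing backbone without changing feature map shapes, which is what makes this kind of combination easy to attach to detectors, segmenters, or classifiers.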
To keep everyone up to date with the forefront of the field, I have compiled 7 representative combination methods, including the original papers and code. Let's take a look!
Scan the QR code below, reply with "Global + Position" to get all paper collections and project code for free
Paper Analysis
Enhancing Multivariate Time Series Classifiers through Self-Attention and Relative Positioning Infusion
「Paper Summary」
In this paper, the authors propose two new attention blocks, a Global Temporal Attention (GTA) module and a Temporal Pseudo-Gaussian enhanced Self-Attention (TPS) module, that can enhance deep learning-based TSC methods even when those methods are designed and optimized for specific datasets or tasks. The authors validate this claim by evaluating several state-of-the-art deep learning-based TSC models on the University of East Anglia (UEA) benchmark, a standardized collection of 30 multivariate time series classification (MTSC) datasets.
Experiments show that adding the proposed attention blocks improves the average accuracy of the baseline models by 3.6%. Furthermore, the proposed TPS block uses a novel injection module to incorporate relative positional information into the transformer. As a standalone unit with lower computational complexity, TPS outperforms most state-of-the-art DNN-based TSC methods.
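The core idea of injecting relative positional information into self-attention can be sketched as a learnable bias, indexed by the offset between two time steps, added to the attention logits. The snippet below is a simplified single-head illustration of that pattern; it is not the paper's TPS block (which uses a pseudo-Gaussian formulation), and the names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class RelPosSelfAttention(nn.Module):
    """Single-head self-attention over a time series with a learnable
    relative-position bias added to the attention logits."""
    def __init__(self, dim, max_len):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        self.max_len = max_len
        # one learnable bias per relative offset in [-(max_len-1), max_len-1]
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))

    def forward(self, x):                                  # x: (B, T, dim)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale      # (B, T, T)
        idx = torch.arange(T)
        rel = idx[:, None] - idx[None, :] + self.max_len - 1  # offset -> table index
        logits = logits + self.rel_bias[rel]               # inject positional bias
        return logits.softmax(dim=-1) @ v                  # (B, T, dim)

x = torch.randn(4, 100, 64)                  # embedded multivariate time series
out = RelPosSelfAttention(dim=64, max_len=100)(x)
```

Because the bias depends only on the offset between time steps, the same table is reused at every position, keeping the extra parameter count and computational cost small.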
Adaptive feature fusion with attention mechanism for multi-scale target detection
「Paper Summary」
To detect objects of different sizes, object detectors such as YOLO V3 and DSSD produce multi-scale outputs. To improve detection performance, they fuse features by combining two adjacent scales. However, fusing features only between adjacent scales is insufficient: it does not make use of features from the remaining scales. Furthermore, concatenation, the common feature fusion operation, provides no mechanism to learn the importance and correlation of features at different scales.
This paper proposes an Adaptive Feature Fusion Attention Mechanism (AFFAM) for multi-scale object detection. AFFAM uses path layers and sub-pixel convolution layers to adjust the size of feature maps, which helps the network learn complex feature maps better. It then applies a global attention mechanism and a spatial positional attention mechanism to adaptively learn the correlation of channel features and the importance of spatial features at different scales, respectively. Finally, the authors combine AFFAM with YOLO V3 to build an efficient multi-scale object detector.
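The resizing-then-fusing step can be illustrated with a small sketch: a sub-pixel convolution (PixelShuffle) enlarges a coarse feature map to match a finer scale, and a learned weighting replaces plain concatenation. This is an assumption-laden simplification for illustration, not the AFFAM layers themselves; the channel counts and scales below are placeholders.

```python
import torch
import torch.nn as nn

class UpsampleToScale(nn.Module):
    """Sub-pixel convolution (PixelShuffle) to enlarge a coarse feature map,
    aligning resolutions before multi-scale fusion."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))        # (B, out_ch, H*scale, W*scale)

class WeightedFusion(nn.Module):
    """Learn per-scale fusion weights instead of plain concatenation."""
    def __init__(self, num_scales):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_scales))

    def forward(self, features):                 # list of (B, C, H, W), same shape
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, features))

coarse = torch.randn(1, 512, 13, 13)             # e.g. deepest detector scale
mid    = torch.randn(1, 256, 26, 26)
up = UpsampleToScale(512, 256, scale=2)(coarse)  # now (1, 256, 26, 26)
fused = WeightedFusion(2)([up, mid])
```

In a full detector, the channel and spatial attention blocks described above would then operate on the fused map before the detection head.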
DPAFNet: A Residual Dual-Path Attention-Fusion Convolutional Neural Network for Multimodal Brain Tumor Segmentation
「Paper Summary」
This paper proposes an efficient 3D segmentation model (DPAFNet) based on a dual-path (DP) module and a multi-scale attention fusion (MAF) module. In DPAFNet, dual-path convolution is used to expand the network's capacity, and residual connections are introduced to avoid network degradation. The proposed attention fusion module fuses channel-level global and local information and combines feature maps at different scales to obtain features rich in semantic information, ensuring that the information of small tumor objects is fully emphasized.
In addition, a 3D iterative dilated convolution merging (IDCM) module expands the receptive field and improves contextual awareness. Ablation experiments identify the optimal combination of dilation rates for the IDCM module and show that post-processing further improves segmentation accuracy.
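The basic mechanism behind dilated convolution merging, parallel 3D convolutions with increasing dilation rates whose outputs are combined, can be sketched as follows. This is only a rough illustration of the idea; the dilation rates and channel counts here are assumptions, not the paper's tuned values.

```python
import torch
import torch.nn as nn

class DilatedMerge3D(nn.Module):
    """Parallel 3D convolutions with different dilation rates, summed together,
    enlarging the receptive field without any extra downsampling."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):                        # x: (B, C, D, H, W)
        return sum(branch(x) for branch in self.branches)

volume = torch.randn(1, 32, 16, 64, 64)          # e.g. multimodal MRI features
out = DilatedMerge3D(32)(volume)                 # same size, larger receptive field
```

Each branch sees the same resolution but a different neighborhood size, so the merged output mixes fine local detail with wider context, which is what drives the improved contextual awareness.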
Combining Global and Local Attention with Positional Encoding for Video Summarization
「Paper Summary」
This paper proposes a novel supervised video summarization method. To overcome the drawbacks of existing RNN-based summarization architectures, namely difficulty modeling dependencies between distant frames and limited ability to parallelize training, the model relies on self-attention mechanisms to estimate the importance of video frames. Unlike previous attention-based summarization methods that model frame dependencies by observing the entire frame sequence, this method combines global and local multi-head attention mechanisms to capture frame dependencies at different levels of granularity.
Furthermore, the attention mechanism integrates components that encode the temporal position of video frames, which is essential when creating video summaries. Experiments on two datasets (SumMe and TVSum) show that the proposed model compares favorably with existing attention-based methods and is competitive with other state-of-the-art supervised summarization methods.
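A minimal sketch of the global-plus-local idea with positional encoding is given below: every frame attends globally to the whole sequence and locally within a fixed temporal window, and the two views are combined into a per-frame importance score. This is an illustrative simplification under assumed dimensions and window size, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalFrameScorer(nn.Module):
    """Score video frames using global attention over the whole sequence plus
    local attention restricted to a temporal window, on positionally encoded
    frame features."""
    def __init__(self, dim=256, heads=4, window=32, max_len=2048):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))  # learned positions
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window = window
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, frames):                   # frames: (B, T, dim)
        B, T, _ = frames.shape
        x = frames + self.pos[:, :T]
        g, _ = self.global_attn(x, x, x)         # every frame attends to all frames
        # local attention: block frames outside a fixed temporal window
        idx = torch.arange(T)
        mask = (idx[:, None] - idx[None, :]).abs() > self.window  # True = blocked
        l, _ = self.local_attn(x, x, x, attn_mask=mask)
        return torch.sigmoid(self.score(torch.cat([g, l], dim=-1))).squeeze(-1)

frames = torch.randn(2, 300, 256)                # e.g. CNN features per frame
importance = GlobalLocalFrameScorer()(frames)    # (2, 300) frame importance scores
```

The predicted importance scores can then be thresholded or used with a knapsack-style selection step to assemble the final summary, as is common in supervised summarization pipelines.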