ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection
### Introduction
In this paper, we propose a novel feature fusion framework of dual cross-attention transformers that models global feature interactions and captures complementary information across modalities simultaneously. In addition, we introduce an iterative interaction mechanism into the dual cross-attention transformers, which shares parameters among block-wise multimodal transformers to reduce model complexity and computation cost. The proposed method is general and can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
Paper: https://arxiv.org/pdf/2308.07504.pdf
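
To make the idea in the introduction concrete, the following is a minimal, dependency-free sketch of iterative dual cross-attention. It is an illustration under stated assumptions, not the code in this repository: the function names (`cross_attention`, `iterative_dual_cross_attention`), the single-head attention, the residual connection, and the toy token sizes are all hypothetical choices. Each modality's tokens act as queries over the other modality's tokens, and the same projection weights are reused in every iteration, which is one way to read "shares parameters among block-wise multimodal transformers".

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection (assumed design).

    Queries come from one modality; keys and values come from the other.
    """
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attended = matmul([softmax(row) for row in scores], v)
    return [[x + y for x, y in zip(r, a)] for r, a in zip(query_tokens, attended)]


def iterative_dual_cross_attention(rgb, thermal, params_rgb, params_thermal, num_iters=2):
    """Apply the same two cross-attention blocks num_iters times.

    Reusing params_rgb / params_thermal in every iteration means the
    parameter count does not grow with the number of iterations.
    """
    for _ in range(num_iters):
        new_rgb = cross_attention(rgb, thermal, *params_rgb)          # RGB queries thermal
        new_thermal = cross_attention(thermal, rgb, *params_thermal)  # thermal queries RGB
        rgb, thermal = new_rgb, new_thermal
    return rgb, thermal


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 6 tokens per modality, 4 channels each (hypothetical sizes).
rng = random.Random(0)
dim = 4
rgb_tokens = random_matrix(6, dim, rng)
thermal_tokens = random_matrix(6, dim, rng)
params_rgb = [random_matrix(dim, dim, rng) for _ in range(3)]      # w_q, w_k, w_v
params_thermal = [random_matrix(dim, dim, rng) for _ in range(3)]  # w_q, w_k, w_v
fused_rgb, fused_thermal = iterative_dual_cross_attention(
    rgb_tokens, thermal_tokens, params_rgb, params_thermal, num_iters=3
)
```

In this sketch, raising `num_iters` adds computation but no new weights, since both parameter sets are created once and reused. A practical implementation would typically use multi-head attention, normalization, and feed-forward layers in a tensor library; those are omitted here to keep the example short.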
### Overview
Fig 1. Overview of our multispectral object detection framework