DOI

10.5703/1288284317868

Description

We propose Hi-CAM, a Hierarchical Convolutional Attention Mixing transformer for medical image segmentation. It uses a pre-trained MaxViT encoder and a novel decoder that combines multi-head self-attention, spatial attention, and squeeze-and-excitation modules to capture long-range dependencies. Deep and shallow convolutions enhance spatial features, while skip connections fuse multi-level features and suppress noise. Experiments on the ACDC and Synapse datasets show that Hi-CAM significantly outperforms state-of-the-art models, highlighting the effectiveness of the hierarchical attention-mixing strategy.
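The decoder described above mixes several attention mechanisms with deep and shallow convolutions. A minimal PyTorch sketch of one such decoder block is shown below; it is an illustrative assumption, not the authors' implementation, and the class names (`AttentionMixingBlock`, `SqueezeExcitation`, `SpatialAttention`) and layer sizes are hypothetical:

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Channel attention: global pooling followed by a bottleneck MLP."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class SpatialAttention(nn.Module):
    """Spatial attention map built from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class AttentionMixingBlock(nn.Module):
    """Hypothetical decoder block mixing multi-head self-attention,
    spatial attention, and squeeze-and-excitation, with deep (3x3)
    and shallow (1x1) convolutions and a skip-connection input."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.spatial = SpatialAttention()
        self.se = SqueezeExcitation(channels)
        self.deep_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.shallow_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x, skip=None):
        if skip is not None:
            x = x + skip  # fuse multi-level encoder features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        attn, _ = self.mhsa(tokens, tokens, tokens)    # long-range dependencies
        tokens = self.norm(tokens + attn)
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.se(self.spatial(x))                   # mix attention types
        return x + self.shallow_conv(self.deep_conv(x))

x = torch.randn(2, 32, 16, 16)      # decoder feature map
skip = torch.randn(2, 32, 16, 16)   # matching encoder skip feature
out = AttentionMixingBlock(32)(x, skip)
print(tuple(out.shape))  # (2, 32, 16, 16): shape is preserved
```

In a full decoder, several such blocks would be stacked at increasing resolutions, each consuming the skip feature from the corresponding MaxViT encoder stage.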

Hierarchical Convolutional Attention Mixing (Hi-CAM) Transformer for Medical Image Segmentation