Abstract: Audio-visual segmentation (AVS) aims to achieve precise object segmentation by leveraging multimodal cues. However, effective alignment and fusion of audio and visual features are often ...
Recent studies have integrated convolutions into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it ...