Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...
Abstract: Heterogeneous remote sensing image change detection (HRSICD) seeks to identify surface changes by comparing images captured at different times. However, CD faces significant challenges due ...
In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens. To address this, ...