Visual Programming Language Tutorial

Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models

Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...

IEEE

Cross-Modal Visual Perception Consistency: A Language-Enhanced Approach for Heterogeneous Change Detection

Abstract: Heterogeneous remote sensing image change detection (HRSICD) seeks to identify surface changes by comparing images captured at different times. However, CD faces significant challenges due ...

GitHub

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens. To address this, ...
