Recent research shows that using a pre-trained vision-language model (VLM) such as CLIP to align a query image with detailed, LLM-generated text descriptions can enhance zero-shot classification.
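The alignment step described above can be sketched as follows. This is a minimal illustration, not any specific paper's method: it assumes CLIP-style image and text embeddings have already been computed (here they are stand-in NumPy arrays), and the function name `zero_shot_classify` is purely illustrative. Each class is scored by the cosine similarity between the image embedding and the mean of that class's description embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize vectors to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, class_description_embs):
    """Pick the class whose averaged description embedding is most
    similar (cosine) to the image embedding.

    image_emb: 1-D array, a precomputed image embedding.
    class_description_embs: dict mapping class name -> 2-D array of
        embeddings of LLM-generated descriptions for that class.
    Returns (predicted_class, per_class_scores).
    """
    img = l2_normalize(image_emb)
    scores = {}
    for cls, descs in class_description_embs.items():
        # Average the normalized description embeddings into one class prototype.
        prototype = l2_normalize(l2_normalize(descs).mean(axis=0))
        scores[cls] = float(img @ prototype)
    return max(scores, key=scores.get), scores

# Toy stand-ins for CLIP outputs: the image embedding points toward "cat".
image_emb = np.array([1.0, 0.0, 0.0])
class_descs = {
    "cat": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "dog": np.array([[0.0, 1.0, 0.0]]),
}
pred, scores = zero_shot_classify(image_emb, class_descs)
```

In a real pipeline the stand-in arrays would come from a CLIP image encoder and text encoder applied to the query image and the generated descriptions; the scoring logic stays the same.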
Abstract: In this study, we propose Multimodal Fusion-supervised Cross-modality Alignment Perception (MulFS-CAP), a novel framework for single-stage fusion of unregistered infrared-visible images.
Abstract: For privacy protection of subjects in electroencephalogram (EEG)-based brain-computer interfaces (BCIs), using source-free domain adaptation (SFDA) for cross-subject recognition has proven ...