BNBU
Professor, Department of Computer Science, Beijing Normal-Hong Kong Baptist University (BNBU), Zhuhai, Guangdong, China.
Email: wfsu@uic.edu.cn
Weifeng SU received his Ph.D. degree in Computer Science and Engineering from the Hong Kong University of Science & Technology in 2007. He is currently a Professor in the Department of Computer Science at Beijing Normal-Hong Kong Baptist University (BNBU) in Zhuhai, Guangdong, China. He has authored or co-authored many peer-reviewed publications in leading journals and conferences, including NeurIPS, IEEE TKDE, IEEE TNNLS, ACM TODS, and ACM TWeb.
Vision–Language Models (VLMs) are rapidly advancing as a cornerstone of modern artificial intelligence, offering powerful capabilities to connect visual and textual information. This talk provides an overview of the foundations, representative architectures, and emerging applications of VLMs, with a particular emphasis on the healthcare domain. The presentation first traces the evolution of model inputs, from single-modality designs to frameworks that integrate textual prompts, visual cues, and heterogeneous multimodal data. Representative paradigms are then introduced, ranging from alignment-based dual encoders to unified encoder–decoder systems and large language model pipelines that incorporate images as contextual prompts. Finally, the medical relevance of VLMs is explored through their potential to support disease diagnosis, enhance clinical decision-making, and improve patient care across a variety of scenarios.