FITAT 2025

Speaker

Weifeng Su

BNBU

Professor, Department of Computer Science, Beijing Normal-Hong Kong Baptist University (BNBU), Zhuhai, Guangdong, China.

Email: wfsu@uic.edu.cn

Biography:

Weifeng SU received his Ph.D. degree in Computer Science and Engineering from the Hong Kong University of Science & Technology in 2007. He is currently a Professor in the Department of Computer Science at Beijing Normal-Hong Kong Baptist University (BNBU) in Zhuhai, Guangdong, China. He has authored or co-authored many peer-reviewed publications in leading journals and conferences, including NeurIPS, IEEE TKDE, IEEE TNNLS, ACM TODS, and ACM TWeb.

Vision–Language Models for Healthcare: From Foundations to Applications

Vision–Language Models (VLMs) are rapidly advancing as a cornerstone of modern artificial intelligence, offering powerful capabilities to connect visual and textual information. This talk provides an overview of the foundations, representative architectures, and emerging applications of VLMs, with a particular emphasis on the healthcare domain. The presentation first traces the evolution of model inputs, from single-modality designs to frameworks that integrate textual prompts, visual cues, and heterogeneous multimodal data. Representative paradigms are then introduced, ranging from alignment-based dual encoders to unified encoder–decoder systems and large language model pipelines that incorporate images as contextual prompts. Finally, the medical relevance of VLMs is explored through their potential to support disease diagnosis, enhance clinical decision-making, and improve patient care across a variety of scenarios.