Abstract: The increasing interest in learning from paired medical images and textual reports highlights the need for methods that can achieve multi-grained alignment between these two modalities.
Abstract: Image captioning is an emerging field at the intersection of computer vision and natural language processing (NLP). It has shown great potential to enhance accessibility by automatically ...