Ph.D. Defense of the student Rita Moreira Parada Ramos

Area: Computer Science and Engineering

Jury Appointment Order

Thesis Title: Image Captioning with Retrieval Augmentation

Venue: Amphitheatre PA-3 (Floor -1 of the Mathematics Building), IST

Date: 24/07/2025

Time: 14:00
Abstract: This Ph.D. thesis addresses the study and development of deep learning methods that combine visual and textual content, with an emphasis on image captioning. Our research specifically explores retrieval augmentation for image captioning. Retrieval augmentation has been gaining traction in natural language processing, but remains relatively unexplored in Vision-and-Language (V&L) tasks, especially in non-knowledge-intensive tasks such as image captioning. We propose retrieval-augmented models for image captioning, demonstrating that incorporating retrieved captions not only improves in-domain performance, but also reduces the number of model parameters and enables domain transfer without additional training. We also propose multilingual captioning models augmented with retrieval, demonstrating that retrieved examples can reduce the need for extensive multilingual data and facilitate language transfer. Finally, we provide a thorough analysis of how retrieved information can affect caption generation, and propose a more retrieval-robust approach to mitigate inaccuracies in the retrieved captions. Overall, our contributions demonstrate that retrieval augmentation enhances captioning performance and leads to parameter- and data-efficient, adaptable, and more inclusive captioning models.

Topics: