PhD Defense of Ricardo Costa Dias Rei

Area: Computer Science and Engineering (Engenharia Informática e de Computadores)
Thesis Title: Robust, Interpretable and Efficient MT Evaluation with Fine-tuned Metrics
Location: https://videoconf-colibri.zoom.us/j/93213825745
Date: 10 April 2024
Time: 13:00
Abstract: As Machine Translation (MT) becomes increasingly necessary in a globalized world, so does the need to continually evaluate the quality of the translations it produces. This evaluation can be carried out by human annotators performing quality assessments or by automatic metrics. Although human evaluation is preferred, it is expensive and time-consuming. Consequently, over the past decade, MT progress has been measured primarily with automatic metrics that assess lexical similarity to reference translations. However, numerous studies have shown that lexical metrics correlate poorly with human judgments, casting doubt on the reliability of MT research. Motivated by these challenges, the main goal of this thesis is to advance the state of MT evaluation by developing new automatic metrics that satisfy four criteria: 1) strong correlation with human judgments, 2) robustness across domains and language pairs, 3) interpretability, and 4) efficiency. To that end, we propose COMET, a framework for developing supervised metrics that are optimized toward human judgments of MT quality, such as Direct Assessments (DA), Multidimensional Quality Metrics (MQM), or Human-mediated Translation Edit Rate (HTER). Our results show that metrics developed within this framework achieve state-of-the-art correlations with human judgments across various domains and language pairs. We also show that these metrics are interpretable, as they leverage token-level information that can be attributed to translation errors. Additionally, we present several experiments aimed at reducing the computational cost and model size of COMET models. Through this work, we undertake the ambitious task of transforming MT evaluation by introducing new metrics that excel in performance, robustness, and interpretability while remaining lightweight. This thesis represents substantial progress toward that goal.


