Wednesdays@DEI: Talks, 21-02-2024

Author and Affiliation: Miguel Couceiro, Université de Lorraine

Bio: Miguel Couceiro is a Full Professor in Computer Science at the University of Lorraine. He received his PhD in mathematics in December 2006 from the University of Tampere, Finland, on topics pertaining to multiple-valued logic, universal algebra and ordered structures. He also holds two habilitation degrees: the first, obtained in 2013 in computer science from the Université Paris-Dauphine, France, on algebraic tools in multicriteria decision aiding, and the second, obtained in 2023 in mathematics from the Instituto Superior Técnico, University of Lisbon, on topics pertaining to knowledge discovery and reasoning. He has (co-)authored more than 200 articles in international journals, conference proceedings and book chapters. He was an elected member of the IEEE MVL-SC technical committee (2018-2020), is a member of the management committee of the new GDR CNRS RADIA (Reasoning, Learning, and Decision in Artificial Intelligence), and is regularly invited as a PC member of several major conferences.

He is the head of the Orpailleur team at LORIA. He has been actively contributing to fair and explainable ML/AI models (he recently proposed FixOut to handle the fairness/accuracy trade-off in KD algorithms, which is currently under startup incubation) and to automated reasoning in machine learning (especially analogy and case-based reasoning). These research topics are supported by several international and French research projects, e.g., the EU project TAILOR and the ANR projects AT2TA, RICOCHETS and InExtenso. He also actively participates in the (co-)supervision of PhD students (8 ongoing, 9 defended), and he holds several pedagogical responsibilities (local coordinator of the Erasmus Mundus LCT, responsible for the 2nd year of the MSc in NLP, and responsible for international relations at the IDMC, UL).

Title: Mitigating Language Model Stereotypes by Reinforcement Learning
Abstract: The widespread adoption of applications powered by large language models (LLMs) such as BERT and GPT highlights concerns within the community about the unintended bias that such models can inherit from training data. For example, past work reports evidence of LLMs proliferating gender stereotypes, as well as geographical and racial biases. This is particularly worrying given the current use of LLMs in decision support systems, policy making and autonomous agents. Previous bias mitigation approaches have focused on data pre-processing or on embedding debiasing techniques, which come with substantial limitations: increased resource requirements, heavy annotation effort, and restricted applicability to a narrow range of bias types. In this talk, we will present REFINE-LM, a post-hoc bias filtering method based on reinforcement learning that is both model and bias-type agnostic. Experiments across a range of models show that the proposed method (i) substantially reduces stereotypical bias while preserving language model performance; (ii) applies to a wide range of bias types, generalizing across contexts such as gender, ethnicity, religion, and nationality-based biases; and (iii) reduces the required training resources. This is joint work with R. Qureshi (UCDublin), L. Galarraga (Inria) and Yvette Graham (TCDublin), but we will also discuss some perspectives and extensions, in particular ongoing work in collaboration with A. Kulkarni (MBZ, IDIAP).
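
To give a flavour of the approach ahead of the talk, below is a minimal, illustrative PyTorch sketch (not the authors' actual implementation) of the kind of post-hoc setup the abstract describes: a small trainable head reweights a frozen LM's top-k token probabilities and is trained with a REINFORCE-style policy gradient against a reward that penalizes stereotyped completions. All class, function and variable names here are hypothetical.

    # Illustrative sketch only: a post-hoc "filter" head that rescores a frozen
    # LM's top-k token probabilities, trained with REINFORCE against a
    # bias-penalizing reward. The LM itself is never updated.
    import torch
    import torch.nn as nn

    class TopKFilter(nn.Module):
        """Tiny trainable head that reweights the top-k candidates of a frozen LM."""
        def __init__(self, k: int = 20, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, k))

        def forward(self, topk_probs: torch.Tensor) -> torch.Tensor:
            # topk_probs: (batch, k) probabilities produced by the frozen LM.
            return torch.softmax(self.net(topk_probs), dim=-1)  # reweighted policy

    def bias_reward(choice_is_stereotyped: torch.Tensor) -> torch.Tensor:
        # Toy reward: +1 for a non-stereotyped completion, -1 otherwise.
        # In practice the reward would compare completions across demographic
        # groups on templated probes.
        return 1.0 - 2.0 * choice_is_stereotyped.float()

    def reinforce_step(filter_head, optimizer, topk_probs, stereotyped_mask):
        policy = filter_head(topk_probs)                       # (batch, k)
        dist = torch.distributions.Categorical(probs=policy)
        actions = dist.sample()                                # pick one candidate per example
        picked = stereotyped_mask.gather(1, actions.unsqueeze(1)).squeeze(1)
        reward = bias_reward(picked.bool())
        loss = -(dist.log_prob(actions) * reward).mean()       # REINFORCE objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage with stand-in data (a real setup would take top-k outputs from an LM):
    head = TopKFilter(k=20)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    probs = torch.softmax(torch.randn(8, 20), dim=-1)          # stand-in for LM top-k probs
    mask = (torch.rand(8, 20) > 0.5).float()                   # stand-in bias labels per candidate
    reinforce_step(head, opt, probs, mask)

Because only the small head is updated while the underlying LM stays frozen, such a wrapper can in principle sit on top of any model exposing top-k probabilities, which matches the intuition behind the method's model-agnosticism and low training cost.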

Tags: