Wednesdays@DEI: Talks, 13-09-2023

Speaker and affiliation: Armando Vieira, Lead Data Scientist at Hazy (UK)
Title: Synthetic Data Methods and Use Cases
Abstract: The presentation starts by defining synthetic data as data generated artificially by machine learning (ML) algorithms. Synthetic data offers a solution to key issues such as data scarcity and privacy. It also helps ML through data augmentation, improving model performance. I will delve into different methods for generating synthetic data, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Synthpop, a method especially noted for its effectiveness. I will discuss tabular data, multi-table data stored in relational database management systems (RDBMSs), and sequential data. Appropriate metrics for fidelity, utility, and privacy will be presented. I will discuss some applications of synthetic data, from simulating patient data in healthcare to simulating diverse economic conditions for stress testing in financial services. Finally, I will discuss the use of synthetic data to mitigate bias and improve fairness in ML models.