Agile Social Media Analysis With Neural Networks

Abstract: The large-scale analysis of social media has the potential to revolutionize disciplines concerned with human activities that lack enough data to afford data-driven methods (e.g. social sciences and public health). However, current Social Media Analysis (SMA) methodologies are still hampered by several limitations. The noise, brevity and ambiguity of social media pose challenges to traditional NLP methods, often forcing analysts to rely on sub-optimal methods or devote extensive effort to model development. Moreover, social media users are not a representative sample of the population. Yet current approaches tend to ignore the inherent biases of social media, and thus the outcomes of the analyses might not reflect broader trends. This thesis aims to reduce the cost and improve the quality of SMA systems by tackling the fundamental challenges of processing user-generated content and the main limitations of current methodologies. To that end, an agile framework is proposed to accelerate model development by reducing feature engineering and data annotation efforts, and by allowing linguistic resources to be re-used for various applications. The framework relies on a novel method to derive low-resource neural networks in two steps: (i) learning unsupervised neural embeddings for words and users; (ii) constructing minimalist neural architectures that yield low-capacity models, which can be trained with scarce labeled data. Then, a methodology is proposed to sample demographically representative digital cohorts of social media users. These cohorts can be leveraged to conduct demographically controlled studies, thereby mitigating sampling biases and allowing analysts to gain deeper insights and extrapolate findings gleaned from imperfect datasets.
The evaluation was conducted over two case studies of real-world social media analyses: the first concerning the development of bespoke classifiers for social science studies; the second concerning the deployment of DEMOS, a novel digital epidemiology system to track public health discussions and monitor the prevalence of mental illnesses.
