Full Seminar Details
Dr. Goran Glavaš
University of Mannheim
This event took place on Thursday 10 October 2019 at 15:15
Cross-lingual word embeddings (CLEs) hold the promise of multilingual modeling of meaning and cross-lingual transfer of NLP models. Early models for inducing cross-lingual word vector spaces, which required sentence- or document-level bilingual signal (i.e., parallel or comparable corpora), have recently been replaced by resource-leaner projection-based CLE models, which require cheap word-level bilingual supervision or even no supervision at all. Despite the ubiquitous use of CLEs in downstream tasks, they are almost exclusively evaluated intrinsically, on the single task of bilingual lexicon induction (BLI). Even BLI evaluations vary greatly, preventing us from correctly interpreting the performance and behavior of different CLE models. In this talk, I will present initial steps towards a comprehensive evaluation of cross-lingual word embeddings. I will present the results of a systematic comparative evaluation of both supervised and unsupervised projection-based CLE models on a large number of language pairs, both in BLI and in three diverse downstream tasks, and provide new insights into the ability of cutting-edge CLE models to support cross-lingual NLP. Our study shows that the performance of CLE models depends largely on the downstream task and that overfitting CLE models to BLI can severely hurt downstream performance. Finally, I will indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess simple baselines, which display competitive performance in many settings.
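As background, the projection-based CLE models the abstract refers to typically learn a linear map from the source-language embedding space into the target-language space using a small seed translation dictionary. A common instance is the orthogonal Procrustes solution. The sketch below (synthetic toy data, NumPy; all names and the data are illustrative assumptions, not from the talk) shows this mapping step and how BLI would then retrieve translations by nearest neighbor:

```python
import numpy as np

# Toy data: row i of X (source space) and Y (target space) is a seed
# translation pair -- this is the cheap word-level bilingual supervision.
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 50))                        # target-language vectors
W_true = np.linalg.qr(rng.normal(size=(50, 50)))[0]   # hidden rotation between spaces
X = Y @ W_true.T                                      # source vectors = rotated targets

# Orthogonal Procrustes: the orthogonal W minimizing ||XW - Y||_F
# is U @ Vt, where U, S, Vt is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Project source vectors into the target space. For BLI, each source
# word's translation is its nearest target neighbor in this shared space.
X_mapped = X @ W
print(np.allclose(X_mapped, Y, atol=1e-6))  # True on this noise-free toy data
```

On real embeddings the seed pairs are noisy, so the mapped vectors only approximate their translations, and retrieval quality is exactly what BLI evaluation measures.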
Maven of the Month
We are also inviting top experts in AI and Knowledge Technologies to discuss major socio-technological topics with an audience comprising both members of the Knowledge Media Institute and the wider staff at The Open University. Unlike our seminar series, these events follow a Q&A format.