CCIL Linguistics: Marco Baroni (Università di Trento)

Location
Room 52.119, Departament de Traducció i Ciències del Llenguatge, UPF, Roc Boronat (Poble Nou), Barcelona

Marco Baroni (Università di Trento)

Vector-based models of semantic relations

Abstract:

A large and growing tradition in computational linguistics and cognitive science (e.g., Lund & Burgess 1996, Landauer & Dumais 1997, Schütze 1997, Rapp 2003, Sahlgren 2006, Padó & Lapata 2007) has shown that simple word co-occurrence statistics extracted from corpora capture important facets of lexical meaning. In this introductory lecture, I will focus on the class of corpus-based models that represent the distributional profile of a word as a vector and use standard geometric tools to capture relations between words in the resulting vector space. After discussing the intuition behind these approaches and introducing the basic formal machinery, I will describe the different models that have been proposed, focusing in particular on how they differ in their representation of linguistic context and on the role played by dimensionality reduction techniques. I will then discuss the various ways in which these models have been evaluated, and I will conclude with my view of the most important open issues in the field.
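As a minimal illustration of the basic idea (not code from the lecture), the following Python sketch assumes a tiny made-up corpus, a symmetric context window of two tokens, and raw co-occurrence counts. It builds a context vector for each word and compares two words by the cosine of the angle between their vectors. The helper names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the context words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat drank milk",
    "the dog drank water",
    "the car needs fuel",
    "the truck needs fuel",
]

vecs = cooccurrence_vectors(corpus, window=2)

if __name__ == "__main__":
    print("cat ~ dog:", round(cosine(vecs["cat"], vecs["dog"]), 3))
    print("cat ~ car:", round(cosine(vecs["cat"], vecs["car"]), 3))
```

In this toy space "cat" ends up closer to "dog" than to "car", because the first two share contexts such as "chased" and "drank". With raw counts, very frequent words such as "the" dominate the vectors; the models discussed in the literature typically reweight the counts and may apply dimensionality reduction (for example, the singular value decomposition used in Latent Semantic Analysis) before comparing vectors.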