
GloVe: Global Vectors for Word Representation


Jeffrey Pennington, Richard Socher, Christopher D. Manning
https://resea.org/10.3115/v1/d14-1162

Abstract

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
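The abstract states the idea only in words: a log-bilinear regression fit only to the nonzero cells of a word-word co-occurrence matrix. The pure-Python sketch below makes that concrete. It is not the authors' implementation. It follows the weighted least-squares objective from the full paper, J = Σ f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)², with weighting f(x) = (x/x_max)^α for x < x_max (else 1), x_max = 100, α = 3/4, and 1/d distance weighting when counting. Several choices here are assumptions for illustration: the window size, the dimension, the learning rate, and the toy corpus. The helper names (`cooccurrence`, `train`, `analogy`) are hypothetical. Plain SGD stands in for the AdaGrad optimizer the paper uses.

```python
import math
import random
from collections import defaultdict


def cooccurrence(corpus, window=2):
    """Sparse co-occurrence map {(i, j): X_ij}; only nonzero cells are stored.
    A pair d words apart adds 1/d (symmetric window size is an assumption)."""
    vocab = {}
    for sentence in corpus:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    X = defaultdict(float)
    for sentence in corpus:
        ids = [vocab[w] for w in sentence]
        for pos, i in enumerate(ids):
            for d in range(1, window + 1):
                if pos + d < len(ids):
                    j = ids[pos + d]
                    X[(i, j)] += 1.0 / d
                    X[(j, i)] += 1.0 / d
    return vocab, dict(X)


def f(x, x_max=100.0, alpha=0.75):
    """Weighting: down-weights rare pairs and caps frequent pairs at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def loss(X, W, Wc, bw, bc):
    """Weighted least squares over the nonzero entries of X only."""
    total = 0.0
    for (i, j), x in X.items():
        err = sum(p * q for p, q in zip(W[i], Wc[j])) + bw[i] + bc[j] - math.log(x)
        total += f(x) * err * err
    return total


def train(X, n, dim=8, epochs=50, lr=0.1, seed=0):
    """Plain SGD over shuffled nonzero entries (the paper uses AdaGrad)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    Wc = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    bw, bc = [0.0] * n, [0.0] * n
    pairs = list(X.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (i, j), x in pairs:
            err = sum(p * q for p, q in zip(W[i], Wc[j])) + bw[i] + bc[j] - math.log(x)
            g = f(x) * err  # the factor 2 from the square is folded into lr
            for k in range(dim):
                # simultaneous update: both right-hand sides use the old values
                W[i][k], Wc[j][k] = W[i][k] - lr * g * Wc[j][k], Wc[j][k] - lr * g * W[i][k]
            bw[i] -= lr * g
            bc[j] -= lr * g
    return W, Wc, bw, bc


def analogy(a, b, c, vocab, vectors):
    """'a is to b as c is to ?': nearest word by cosine to v_b - v_a + v_c."""
    def cos(u, v):
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        return sum(x * y for x, y in zip(u, v)) / (nu * nv)
    va, vb, vc = (vectors[vocab[w]] for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(vectors[vocab[w]], target))


corpus = [
    "the king rules the land".split(),
    "the queen rules the land".split(),
    "a man walks".split(),
    "a woman walks".split(),
]
vocab, X = cooccurrence(corpus)
W, Wc, bw, bc = train(X, len(vocab))
# Final vectors are the sum W + W~, as in the paper.
vectors = [[p + q for p, q in zip(u, v)] for u, v in zip(W, Wc)]
print(analogy("man", "king", "woman", vocab, vectors))
```

Because the loop visits only the keys of `X`, the cost per epoch scales with the number of nonzero co-occurrences and not with |V|². On a four-sentence toy corpus the analogy answer is not meaningful. The call only shows the vector-offset query behind the word analogy task.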