Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces
Ivan Vulić, Anna Korhonen, Goran Glavaš
Abstract
Work on projection-based induction of cross-lingual word embedding spaces (CLWEs) predominantly focuses on the improvement of the projection (i.e., mapping) mechanisms. In this work, in contrast, we show that a simple method for post-processing monolingual embedding spaces facilitates learning of the cross-lingual alignment and, in turn, substantially improves bilingual lexicon induction (BLI). The post-processing method we examine is grounded in the generalisation of first- and second-order monolingual similarities to the nth-order similarity. By post-processing monolingual spaces before the cross-lingual alignment, the method can be coupled with any projection-based method for inducing CLWE spaces. We demonstrate the effectiveness of this simple monolingual post-processing across a set of 15 typologically diverse languages (i.e., 15×14 BLI setups), and in combination with two different projection methods.
- Anthology ID: 2020.repl4nlp-1.7
- Volume: Proceedings of the 5th Workshop on Representation Learning for NLP
- Month: July
- Year: 2020
- Address: Online
- Venues: ACL | RepL4NLP | WS
- SIG: SIGREP
- Publisher: Association for Computational Linguistics
- Pages: 45–54
- URL: https://www.aclweb.org/anthology/2020.repl4nlp-1.7
- PDF: https://www.aclweb.org/anthology/2020.repl4nlp-1.7.pdf