Nick Howell
2020
An Unsupervised Method for Weighting Finite-state Morphological Analyzers
Amr Keleg | Francis Tyers | Nick Howell | Tommi Pirinen
Proceedings of The 12th Language Resources and Evaluation Conference
Morphological analysis has been studied for many years, and a variety of techniques have been used to build models that perform it. Models based on finite-state transducers have proved more suitable for low-resource languages. In this paper, we develop a method for weighting a morphological analyzer built with finite-state transducers in order to disambiguate its output. The method is based on a word2vec model that is trained in a completely unsupervised way on raw, untagged corpora and is able to capture the semantic meaning of words. Most methods for disambiguating the output of a morphological analyzer rely on tagged corpora that must be built manually. Additionally, unlike most other techniques, which rely heavily on a word's context to disambiguate its set of candidate analyses, the method developed here uses information about the token irrespective of its context.
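The abstract does not spell out how the word2vec model is turned into weights. As a rough illustration only, the sketch below assumes one plausible scheme, which is not necessarily the paper's: find the token's nearest neighbours in embedding space, keep the neighbours that the analyzer treats as unambiguous, and let their tags vote for the token's candidate analyses. Votes are smoothed and converted to -log probabilities, following the tropical-semiring convention in which a lower weight marks a preferred path. The function names, the `analyze` callback, the `k` parameter, and the toy vectors and lexicon are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weight_analyses(token, candidates, embeddings, analyze, k=3):
    """Hypothetical scheme: return a -log p weight for each candidate tag.

    candidates: tags the analyzer proposes for `token`
    embeddings: dict mapping word -> vector (e.g. taken from a word2vec model)
    analyze:    hypothetical callback mapping word -> list of tags
    """
    target = embeddings[token]
    # Rank every other vocabulary word by similarity to the token alone
    # (no sentence context is consulted).
    neighbours = sorted(
        (w for w in embeddings if w != token),
        key=lambda w: cosine(target, embeddings[w]),
        reverse=True,
    )
    # Keep the k nearest neighbours that have exactly one analysis.
    unambiguous = [w for w in neighbours if len(analyze(w)) == 1][:k]
    votes = defaultdict(float)
    for w in unambiguous:
        tag = analyze(w)[0]
        if tag in candidates:
            votes[tag] += 1.0
    # Add-one smoothing keeps every candidate's weight finite.
    total = sum(votes[c] + 1.0 for c in candidates)
    return {c: -math.log((votes[c] + 1.0) / total) for c in candidates}


# Toy, invented data: "walks" is ambiguous, its neighbours are not.
emb = {
    "walks": [1.0, 0.1],
    "sings": [0.9, 0.2],
    "eats": [0.95, 0.15],
    "tables": [0.1, 1.0],
}
lexicon = {
    "walks": ["verb", "noun"],
    "sings": ["verb"],
    "eats": ["verb"],
    "tables": ["noun"],
}
weights = weight_analyses("walks", ["verb", "noun"], emb, lexicon.get)
```

With this toy data the two nearest neighbours, "eats" and "sings", both carry the verb tag, so the verb reading of "walks" receives the lower (preferred) weight. Because only the token's own vector is consulted, the resulting weights are context-independent, which matches the property the abstract emphasises.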
Effort-value payoff in lemmatisation for Uralic languages
Nick Howell | Maria Bibaeva | Francis M. Tyers
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages
Language Models for Cloze Task Answer Generation in Russian
Anastasia Nikiforova | Sergey Pletenev | Daria Sinitsyna | Semen Sorokin | Anastasia Lopukhina | Nick Howell
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
Linguistic predictability is the degree of confidence with which a language unit (word, part of speech, etc.) is expected to be the next in a sequence. Experiments have shown that a correct prediction simplifies the perception of a language unit and its integration into the context, whereas an incorrect prediction slows language processing down. Currently, measuring the predictability of a language unit requires conducting a neurolinguistic experiment known as a cloze task with a large number of participants. Cloze tasks are resource-intensive, and some researchers criticize them as an insufficiently valid measure of predictability. In this paper, we compare different language models that attempt to simulate the performance of human respondents on the cloze task. Using a language model to simulate cloze tasks would require significantly less time and would make studies of linguistic predictability easier to conduct.
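A cloze probability is conventionally the proportion of respondents who supply a given word for a gap. The minimal sketch below shows one common way to compare a language model with human respondents: compute human cloze probabilities from raw responses and correlate them with the probabilities a model assigns to the same target words. The responses and model probabilities are invented toy values; in a real setup the model probability would be read from the language model's output distribution at the gap, and the paper's own comparison metrics may differ.

```python
import math
from collections import Counter


def cloze_probabilities(responses):
    """Share of participants who produced each word for a single gap."""
    counts = Counter(r.strip().lower() for r in responses)
    n = len(responses)
    return {word: c / n for word, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: (target word, human responses, model probability).
gaps = [
    ("кофе", ["кофе", "кофе", "чай", "кофе"], 0.62),
    ("книгу", ["книгу", "газету", "книгу", "письмо"], 0.35),
    ("дом", ["дом", "дом", "дом", "дом"], 0.81),
]

# Human cloze probability of each gap's target word.
human = [cloze_probabilities(resp).get(target, 0.0) for target, resp, _ in gaps]
# Probability the (hypothetical) model assigned to the same target word.
model = [p for _, _, p in gaps]
r = pearson(human, model)
```

A high correlation would indicate that the model ranks gaps by predictability much as people do; correlation is used here only because it is a simple, widely used choice.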
Co-authors
- Amr Keleg 1
- Francis Tyers 1
- Tommi A. Pirinen 1
- Maria Bibaeva 1
- Francis M. Tyers 1