Konstantin Vorontsov
2020
Topic Balancing with Additive Regularization of Topic Models
Eugeniia Veselova | Konstantin Vorontsov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
This article proposes a new approach to building topic models on unbalanced collections, based on existing methods and our experiments with them. Real-world data collections contain topics in varying proportions, and documents from a relatively small theme often become scattered across larger topics instead of being grouped into one topic. To address this issue, we design a new regularizer for the Theta and Phi matrices in the probabilistic Latent Semantic Analysis (pLSA) model. We verify that this regularizer improves the quality of topic models trained on unbalanced collections. In addition, our experiments lend conceptual support to the regularizer.
TopicNet: Making Additive Regularisation for Topic Modelling Accessible
Victor Bulatov | Vasiliy Alekseev | Konstantin Vorontsov | Darya Polyudova | Eugenia Veselova | Alexey Goncharov | Evgeny Egorov
Proceedings of The 12th Language Resources and Evaluation Conference
This paper introduces TopicNet, a new Python module for topic modeling. This package, distributed under the MIT license, focuses on bringing additive regularization topic modelling (ARTM) to non-specialists using a general-purpose high-level language. The module's features include powerful model visualization techniques, various training strategies, semi-automated model selection, support for user-defined goal metrics, and a modular approach to topic model training. Source code and documentation are available at https://github.com/machine-intelligence-laboratory/TopicNet
Co-authors
- Eugeniia Veselova 1
- Victor Bulatov 1
- Vasiliy Alekseev 1
- Darya Polyudova 1
- Eugenia Veselova 1