Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text
Shengbin Jia, Ling Ding, Xiaojun Chen, Shijia E, Yang Xiang
Abstract
Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text by leveraging uncertain word segmentation information. This uncertain information covers all the potential segmentation states of a sentence, which gives the model a channel for inferring deep word-level characteristics. We propose a trilogy (i.e., Candidate Position Embedding => Position Selective Attention => Adaptive Word Convolution) to encode the uncertain word segmentation information and acquire an appropriate word-level representation. Experimental results on a social media corpus show that our model effectively alleviates the cascading of segmentation errors and achieves a significant performance improvement of 2% over previous state-of-the-art methods.
- Anthology ID:
- 2020.socialnlp-1.7
- Volume:
- Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Venues:
- SocialNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 51–60
- URL:
- https://www.aclweb.org/anthology/2020.socialnlp-1.7
- PDF:
- https://www.aclweb.org/anthology/2020.socialnlp-1.7.pdf