Tomoyuki Kajiwara


2020

Text Classification with Negative Supervision
Sora Ohashi | Junya Takayama | Tomoyuki Kajiwara | Chenhui Chu | Yuki Arase
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Advanced pre-trained models for text representation have achieved state-of-the-art performance on various text classification tasks. However, a discrepancy between the semantic similarity of texts and the labelling standards hurts classifiers: performance drops in cases where classifiers should assign different labels to semantically similar texts. To address this problem, we propose a simple multitask learning model that uses negative supervision. Specifically, our model encourages texts with different labels to have distinct representations. Comprehensive experiments show that our model outperforms the state-of-the-art pre-trained model on single- and multi-label classification, sentence and document classification, and classification in three different languages.
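
As a rough illustration of the negative supervision idea described above, the sketch below adds a term to the training objective that penalizes high cosine similarity between representations of texts with different labels. This is only a hedged sketch: the hinge margin, the weighting coefficient alpha, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F


def negative_supervision_loss(reps: torch.Tensor, labels: torch.Tensor,
                              margin: float = 0.0) -> torch.Tensor:
    # Pairwise cosine similarity between all text representations in the batch.
    sim = F.cosine_similarity(reps.unsqueeze(1), reps.unsqueeze(0), dim=-1)
    # Mask selecting pairs whose gold labels differ.
    diff_label = (labels.unsqueeze(1) != labels.unsqueeze(0)).float()
    # Hinge on similarity: only similar pairs with different labels are penalized.
    penalty = torch.clamp(sim - margin, min=0.0) * diff_label
    denom = diff_label.sum().clamp(min=1.0)
    return penalty.sum() / denom


def total_loss(logits: torch.Tensor, reps: torch.Tensor, labels: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    # Multitask objective: standard classification loss plus the negative term.
    # alpha is a hypothetical interpolation weight, not a value from the paper.
    ce = F.cross_entropy(logits, labels)
    neg = negative_supervision_loss(reps, labels)
    return ce + alpha * neg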

Word Complexity Estimation for Japanese Lexical Simplification
Daiki Nishihara | Tomoyuki Kajiwara
Proceedings of The 12th Language Resources and Evaluation Conference

We introduce three language resources for Japanese lexical simplification: 1) a large-scale word complexity lexicon, 2) the first synonym lexicon for converting complex words to simpler ones, and 3) the first toolkit for developing and benchmarking Japanese lexical simplification systems. Our word complexity lexicon is expanded to a broader vocabulary using a classifier trained on a small, high-quality word complexity lexicon created by Japanese language teachers. Based on this word complexity estimator, we extracted simplified word pairs from a large-scale synonym lexicon and constructed a simplified synonym lexicon useful for lexical simplification. In addition, we developed a Python library that implements automatic evaluation and key methods in each subtask to ease the construction of a lexical simplification pipeline. Experimental results show that the proposed method based on our lexicon achieves the highest performance on Japanese lexical simplification. Lexical simplification has so far been studied mainly in English, which is rich in language resources such as lexicons and toolkits. The language resources constructed in this study will help advance lexical simplification systems in Japanese.
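
To make the pipeline above concrete, the following sketch shows how a word complexity estimator could be used to extract simplification pairs from a synonym lexicon. The function names, the complexity scale, and the minimum complexity gap are hypothetical assumptions for illustration; this is not the paper's toolkit.

from typing import Dict, List, Tuple


def extract_simplification_pairs(
    complexity: Dict[str, float],
    synonyms: Dict[str, List[str]],
    min_gap: float = 0.1,
) -> List[Tuple[str, str]]:
    # Keep (complex word, simpler synonym) pairs whose complexity drops by min_gap.
    pairs = []
    for word, candidates in synonyms.items():
        if word not in complexity:
            continue
        for cand in candidates:
            if cand in complexity and complexity[word] - complexity[cand] >= min_gap:
                pairs.append((word, cand))
    return pairs


# Toy usage with made-up complexity scores (higher = more complex).
complexity = {"購入する": 0.8, "買う": 0.2, "美味": 0.7, "おいしい": 0.3}
synonyms = {"購入する": ["買う"], "美味": ["おいしい"]}
print(extract_simplification_pairs(complexity, synonyms))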

Annotation of Adverse Drug Reactions in Patients’ Weblogs
Yuki Arase | Tomoyuki Kajiwara | Chenhui Chu
Proceedings of The 12th Language Resources and Evaluation Conference

Adverse drug reactions are a severe problem that significantly degrades patients' quality of life, or even threatens their lives. Patient-generated texts available on the web have been gaining attention as a promising source of information in this regard. While previous studies annotated such patient-generated content, they reported only limited information, such as whether a text described an adverse drug reaction or not. Further, they annotated only short texts of a few sentences crawled from online forums and social networking services. The dataset we present in this paper is unique for the richness of its annotated information, including detailed descriptions of drug reactions with full context. We crawled patients' weblog articles shared on an online patient-networking platform and annotated the effects of the drugs reported therein. We identified spans describing drug reactions and assigned labels for related drug names, standard codes for the symptoms of the reactions, and types of effects. As a first dataset, we annotated 677 drug reactions with these detailed labels based on 169 weblog articles by Japanese lung cancer patients. Our annotation dataset is publicly available at our website (https://yukiar.github.io/adr-jp/) for further research on the detection of adverse drug reactions and, more broadly, on patient-generated text processing.
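
To make the annotation scheme concrete, here is a hypothetical sketch of one annotated record with the kinds of fields the abstract mentions (reaction span, related drug name, standard symptom code, and effect type). The field names and values are illustrative only and do not reflect the dataset's actual schema.

from dataclasses import dataclass


@dataclass
class DrugReactionAnnotation:
    article_id: str    # weblog article the span comes from
    span_start: int    # character offsets of the reaction description
    span_end: int
    drug_name: str     # related drug mentioned in the article
    symptom_code: str  # standard code for the symptom (placeholder value below)
    effect_type: str   # type of effect, e.g. adverse vs. intended


# Illustrative record; all values are made up.
example = DrugReactionAnnotation(
    article_id="blog_0001",
    span_start=120,
    span_end=158,
    drug_name="gefitinib",
    symptom_code="10037844",
    effect_type="adverse",
)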

SAPPHIRE: Simple Aligner for Phrasal Paraphrase with Hierarchical Representation
Masato Yoshinaka | Tomoyuki Kajiwara | Yuki Arase
Proceedings of The 12th Language Resources and Evaluation Conference

We present SAPPHIRE, a Simple Aligner for Phrasal Paraphrase with HIerarchical REpresentation. Monolingual phrase alignment is a fundamental problem in natural language understanding and a crucial technique in applications such as natural language inference and semantic textual similarity assessment. Previous methods for monolingual phrase alignment are language-resource intensive: they require large-scale synonym/paraphrase lexica and high-quality parsers. In contrast, SAPPHIRE depends only on a monolingual corpus to train word embeddings, so it is easily transferable to specific domains and different languages. Specifically, SAPPHIRE first obtains word alignments using pre-trained word embeddings and then expands them to phrase alignments with bilingual phrase extraction methods. To estimate the likelihood of phrase alignments, SAPPHIRE uses phrase embeddings that are hierarchically composed of word embeddings. Finally, SAPPHIRE searches for a set of consistent phrase alignments on a lattice of phrase alignment candidates. It achieves search efficiency by constraining the lattice so that all paths go through the phrase alignment pair with the highest alignment score. Experimental results on the standard dataset for phrase alignment evaluation show that SAPPHIRE outperforms the previous method and establishes state-of-the-art performance.
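
The sketch below illustrates the first steps described above: word alignment from embedding similarity and scoring candidate phrase pairs with compositionally built phrase embeddings. Mean pooling and the similarity threshold are simplifying assumptions for illustration, not necessarily the paper's exact formulation, and the lattice search over consistent alignments is omitted.

from typing import List, Tuple

import numpy as np


def word_alignments(src_vecs: np.ndarray, tgt_vecs: np.ndarray,
                    threshold: float = 0.5) -> List[Tuple[int, int]]:
    # Align word i to word j when their cosine similarity exceeds a threshold.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T
    return [(i, j) for i in range(sim.shape[0])
            for j in range(sim.shape[1]) if sim[i, j] >= threshold]


def phrase_embedding(word_vecs: np.ndarray, start: int, end: int) -> np.ndarray:
    # Compose a phrase embedding from its word embeddings (mean pooling here,
    # as a stand-in for the hierarchical composition described in the abstract).
    return word_vecs[start:end].mean(axis=0)


def phrase_alignment_score(src_vecs: np.ndarray, tgt_vecs: np.ndarray,
                           src_span: Tuple[int, int],
                           tgt_span: Tuple[int, int]) -> float:
    # Score a candidate phrase pair by cosine similarity of phrase embeddings.
    p = phrase_embedding(src_vecs, *src_span)
    q = phrase_embedding(tgt_vecs, *tgt_span)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))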