Niranjan Balasubramanian
2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao | Harsh Trivedi | Aruna Balasubramanian | Niranjan Balasubramanian
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Transformer-based QA models use input-wide self-attention – i.e. across both the question and the input passage – at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-trained weights of a standard transformer and fine-tune directly on the target QA dataset. We show that DeFormer versions of BERT and XLNet speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open-source the code at https://github.com/StonyBrookNLP/deformer.
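The decomposition the abstract describes can be illustrated with a minimal single-head sketch (NumPy, weight matrices omitted; function names are illustrative, not the paper's code): in the lower layers the question and passage each attend only within themselves, so the passage branch never sees the question and can be pre-computed offline, while the upper layers run standard input-wide self-attention over the joint sequence.

```python
import numpy as np

def self_attention(x):
    # single-head scaled dot-product self-attention
    # (projection weights omitted to keep the sketch minimal)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def deformer_layer(question, passage, lower=True):
    if lower:
        # DeFormer lower layers: question-wide and passage-wide attention only,
        # so passage representations are question-independent and cacheable
        return self_attention(question), self_attention(passage)
    # upper layers: standard input-wide self-attention over the joint sequence
    joint = self_attention(np.concatenate([question, passage], axis=0))
    return joint[:len(question)], joint[len(question):]
```

Because the lower-layer passage output depends only on the passage, it can be computed once per document and reused across all incoming questions, which is where the runtime savings come from.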
Modeling Label Semantics for Predicting Emotional Reactions
Radhika Gaonkar | Heeyoung Kwon | Mohaddeseh Bastan | Niranjan Balasubramanian | Nathanael Chambers
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Predicting how events induce emotions in the characters of a story is typically seen as a standard multi-label classification task, which usually treats labels as anonymous classes to predict. Such approaches ignore information that may be conveyed by the emotion labels themselves. We propose that the semantics of emotion labels can guide a model’s attention when representing the input story. Further, we observe that the emotions evoked by an event are often related: an event that evokes joy is unlikely to also evoke sadness. In this work, we explicitly model label classes via label embeddings, and add mechanisms that track label-label correlations both during training and inference. We also introduce a new semi-supervision strategy that regularizes for the correlations on unlabeled data. Our empirical evaluations show that modeling label semantics yields consistent benefits, and we advance the state-of-the-art on an emotion inference task.
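The two ideas in the abstract — label embeddings guiding attention over the story, and label-label correlations shaping predictions — can be sketched as follows (a minimal NumPy illustration under assumed names; the paper's exact architecture and correlation mechanism may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def label_attention_logits(story_vecs, label_embs):
    # each emotion label embedding attends over the story token vectors,
    # yielding a label-specific view of the story and a per-label logit
    scores = label_embs @ story_vecs.T                    # (L, T) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    context = weights @ story_vecs                        # (L, D) label-specific views
    return (context * label_embs).sum(axis=-1)            # (L,) per-label logits

def apply_label_correlations(logits, corr):
    # adjust logits with a label-label correlation matrix, so that e.g. strong
    # evidence for "joy" suppresses "sadness" when corr[joy, sadness] < 0
    return logits + corr @ sigmoid(logits)
```

With a negative entry between two labels, a high logit for one pushes the other's logit down, capturing the "joy is unlikely to co-occur with sadness" intuition at inference time.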
Hierarchical Modeling for User Personality Prediction: The Role of Message-Level Attention
Veronica Lynn | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Not all documents are equally important. Language processing is increasingly finding use as a supplement for questionnaires to assess psychological attributes of consenting individuals, but most approaches neglect to consider whether all documents of an individual are equally informative. In this paper, we present a novel model that uses message-level attention to learn the relative weight of users’ social media posts for assessing their five factor personality traits. We demonstrate that models with message-level attention outperform those with word-level attention, and ultimately yield state-of-the-art accuracies for all five traits by using both word and message attention in combination with past approaches (an average increase in Pearson r of 2.5%). In addition, examination of the high-signal posts identified by our model provides insight into the relationship between language and personality, helping to inform future work.
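The core mechanism — learning the relative weight of each of a user's posts — amounts to an attention layer over message-level encodings. A minimal sketch (NumPy, with a hypothetical trainable attention vector; the paper's full model also includes word-level encoders and trait-specific heads):

```python
import numpy as np

def message_attention(message_vecs, attn_vec):
    # score each post's encoding against a trainable attention vector,
    # then form the user representation as a weighted average of posts
    scores = message_vecs @ attn_vec                 # (M,) one score per post
    weights = np.exp(scores - scores.max())          # softmax over posts
    weights /= weights.sum()
    user_vec = weights @ message_vecs                # (D,) user representation
    return user_vec, weights

# hypothetical usage: 4 posts encoded as 6-dim vectors
rng = np.random.default_rng(0)
posts = rng.standard_normal((4, 6))
attn = rng.standard_normal(6)
user_vec, weights = message_attention(posts, attn)
```

The attention weights double as an interpretability signal: the highest-weighted posts are the "high-signal" messages the abstract refers to.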