Demoting Racial Bias in Hate Speech Detection

Mengzhou Xia; Anjalie Field; Yulia Tsvetkov

Abstract

In current hate speech detection datasets, there is a high correlation between African American English (AAE) and annotators’ perceptions of toxicity. This bias in the annotated training data, together with the tendency of machine learning models to amplify it, causes current hate speech classifiers to frequently mislabel AAE text as abusive, offensive, or hate speech (a high false positive rate). Here, we use adversarial training to mitigate this bias. Experimental results on one hate speech dataset and one AAE dataset suggest that our method reduces the false positive rate for AAE text with only a minimal loss in hate speech classification performance.
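The bias described above is measured as a false positive rate (FPR) on AAE text: among examples whose gold label is not offensive, the fraction a classifier flags as offensive. The following is a minimal sketch of computing FPR separately per dialect group, assuming binary labels (1 = offensive/hate, 0 = not offensive) and a hypothetical per-example group tag such as "AAE" or "other". The function name, tags, and toy data are illustrative assumptions, not the paper's evaluation code or results.

```python
from collections import defaultdict


def false_positive_rate_by_group(labels, preds, groups):
    """Return {group: FPR}, where FPR = FP / (FP + TN) over gold-negative examples.

    labels, preds: 1 = offensive/hate, 0 = not offensive (assumed binary scheme).
    groups: a hypothetical dialect tag per example, e.g. "AAE" or "other".
    Groups with no gold-negative examples are omitted (FPR is undefined there).
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, y_hat, g in zip(labels, preds, groups):
        if y == 0:  # only gold non-offensive examples can be false positives
            negatives[g] += 1
            if y_hat == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


if __name__ == "__main__":
    # Toy data invented for illustration only; not results from the paper.
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    preds = [1, 1, 0, 0, 1, 0, 1, 1]
    groups = ["AAE", "AAE", "AAE", "other", "other", "other", "AAE", "other"]
    rates = false_positive_rate_by_group(labels, preds, groups)
    print(rates)  # AAE: 2/3, other: 1/3
```

A debiasing method of the kind the abstract describes would aim to lower the AAE entry of this dictionary while keeping overall classification quality roughly unchanged.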

Anthology ID: 2020.socialnlp-1.2
Volume: Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media
Month: July
Year: 2020
Address: Online
Venues: SocialNLP | WS
Publisher: Association for Computational Linguistics
Pages: 7–14
URL: https://www.aclweb.org/anthology/2020.socialnlp-1.2
PDF: https://www.aclweb.org/anthology/2020.socialnlp-1.2.pdf