Abstract
The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) |
Publisher | Association for Computational Linguistics |
Publication date | 2020 |
Pages | 2637–2648 |
DOIs | |
Publication status | Published - 2020 |
Event | The 2020 Conference on Empirical Methods in Natural Language Processing, online, 16 Nov 2020 → 20 Nov 2020, http://2020.emnlp.org |
Conference
Conference | The 2020 Conference on Empirical Methods in Natural Language Processing |
---|---|
Location | online |
Period | 16/11/2020 → 20/11/2020 |
Internet address | http://2020.emnlp.org |