Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

Antonia Karamolegkou*, Oliver Eberle, Phillip Rust, Carina Kauf, Anders Søgaard

*Corresponding author for this work

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; peer-reviewed

Abstract

Detecting ambiguity is important for language understanding, including uncertainty estimation, humour detection, and processing garden path sentences. We assess language models' sensitivity to ambiguity by introducing an adversarial ambiguity dataset that includes syntactic, lexical, and phonological ambiguities along with adversarial variations (e.g., word-order changes, synonym replacements, and random-based alterations). Our findings show that direct prompting fails to robustly identify ambiguity, while linear probes trained on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%. Our results offer insights into the prompting paradigm and how language models encode ambiguity at different layers. We release both our code and data: coastalcph/lm_ambiguity.

Original language: English
Title: Findings of the Association for Computational Linguistics: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2025
Pages: 18542-18561
ISBN (Electronic): 9798891762565
DOI
Status: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/2025 to 01/08/2025
Sponsor: Alibaba Cloud, Ant Group, Bloomberg Engineering, Citadel Securities
Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN: 0736-587X

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.
