Investigating the Impact of Model Instability on Explanations and Uncertainty

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Abstract

Explainable AI methods facilitate the understanding of model behaviour, yet small, imperceptible perturbations to inputs can vastly distort explanations. As these explanations are typically evaluated holistically before model deployment, it is difficult to assess when a particular explanation is trustworthy. Some studies have tried to create confidence estimators for explanations, but none have investigated an existing link between uncertainty and explanation quality. We artificially simulate epistemic uncertainty in text input by introducing noise at inference time. In this large-scale empirical study, we insert different levels of noise perturbations and measure the effect on the output of pre-trained language models and on different uncertainty metrics. Realistic perturbations have minimal effect on performance and explanations, yet masking has a drastic effect. We find that high uncertainty does not necessarily imply low explanation plausibility; the correlation between the two metrics can be moderately positive when the model is exposed to noise during training. This suggests that noise-augmented models may be better at identifying salient tokens when uncertain. Furthermore, when predictive and epistemic uncertainty measures are over-confident, the robustness of a saliency map to perturbation can indicate model stability issues. Integrated Gradients shows the greatest overall robustness to perturbation, while still showing model-specific patterns in performance; however, this phenomenon is limited to smaller Transformer-based language models. Code: https://github.com/spaidataiga/unc-and-xai-noise
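The sketch below (Python, assuming PyTorch and Hugging Face Transformers) illustrates the kind of inference-time perturbation and uncertainty measurement the abstract describes: a fraction of input tokens is replaced with the mask token and predictive entropy is compared before and after the perturbation. It is a minimal illustration, not the authors' implementation; the example checkpoint, the 15% masking rate, and the helper names (mask_tokens, predictive_entropy) are assumptions made here for clarity. The actual experimental code is in the linked repository.

    # Minimal sketch, not the authors' code: simulate noise at inference time by
    # masking random tokens, then compare predictive entropy on clean vs. noisy input.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint (assumption)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

    def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
        """Entropy of the softmax distribution, one common predictive-uncertainty metric."""
        probs = torch.softmax(logits, dim=-1)
        return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

    def mask_tokens(input_ids: torch.Tensor, mask_prob: float = 0.15) -> torch.Tensor:
        """Replace a random subset of non-special tokens with [MASK] (simulated noise)."""
        noisy = input_ids.clone()
        special = torch.tensor(tokenizer.all_special_ids)
        maskable = ~torch.isin(noisy, special)
        to_mask = maskable & (torch.rand_like(noisy, dtype=torch.float) < mask_prob)
        noisy[to_mask] = tokenizer.mask_token_id
        return noisy

    text = "The movie was surprisingly good."
    enc = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        clean_logits = model(**enc).logits
        noisy_ids = mask_tokens(enc["input_ids"])
        noisy_logits = model(input_ids=noisy_ids, attention_mask=enc["attention_mask"]).logits

    print("entropy (clean): ", predictive_entropy(clean_logits).item())
    print("entropy (masked):", predictive_entropy(noisy_logits).item())

The same perturbed inputs could then be fed to an attribution method (e.g., Integrated Gradients) to compare saliency maps before and after perturbation, which is the explanation-robustness side of the study.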

Original language: English
Title: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Number of pages: 26
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2024
Pages: 11854-11879
ISBN (electronic): 9798891760998
Status: Published - 2024
Event: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 2024 - 16 Aug 2024

Conference

Conference: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Hybrid, Bangkok
Period: 11/08/2024 - 16/08/2024
Sponsors: Apple, LG AI Research, Megagon Labs, Meta AI, NewsBreak, et al.

Bibliographic note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
