Abstract
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses. The framework encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not just the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark at three levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, benchmark, and code available at https://github.com/yuxiaw/Factcheck-GPT.
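The abstract describes a multi-stage, claim-level fact-checking pipeline (decompose a response into claims, gather evidence, verify, and aggregate). The sketch below is a minimal illustration of that general pipeline shape, assuming illustrative names (`decompose`, `retrieve_evidence`, `verify_claim`, `fact_check`) and naive placeholder logic; it is not the Factcheck-GPT implementation, which is available at the repository linked above.

```python
# Minimal sketch of a claim-level fact-checking pipeline in the spirit of the
# framework described in the abstract. All function names and the data layout
# are illustrative assumptions, not the actual Factcheck-GPT code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    text: str                        # atomic factual claim extracted from a sentence
    evidence: List[str] = field(default_factory=list)
    verdict: str = "unverified"      # e.g. "supported", "refuted", "unverified"


def decompose(response: str) -> List[Claim]:
    """Split an LLM response into claims.
    A naive sentence split stands in for an LLM-based decomposer."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    return [Claim(text=s) for s in sentences]


def retrieve_evidence(claim: Claim) -> List[str]:
    """Placeholder retriever; a real system would query a search engine
    or knowledge base for passages relevant to the claim."""
    return []


def verify_claim(claim: Claim) -> str:
    """Placeholder verifier; a real system would compare the claim against
    the retrieved evidence (e.g. with an NLI model or an LLM judge)."""
    return "supported" if claim.evidence else "unverified"


def fact_check(response: str) -> List[Claim]:
    """End-to-end pass: decompose, retrieve, and verify each claim."""
    claims = decompose(response)
    for claim in claims:
        claim.evidence = retrieve_evidence(claim)
        claim.verdict = verify_claim(claim)
    return claims


if __name__ == "__main__":
    for c in fact_check("The Eiffel Tower is in Paris. It was completed in 1850."):
        print(c.verdict, "|", c.text)
```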
| Original language | English |
|---|---|
| Title | EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024 |
| Editors | Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen |
| Number of pages | 32 |
| Publisher | Association for Computational Linguistics (ACL) |
| Publication date | 2024 |
| Pages | 14199-14230 |
| ISBN (electronic) | 9798891761681 |
| DOI | |
| Status | Published - 2024 |
| Event | 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, USA. Duration: 12 Nov 2024 → 16 Nov 2024 |
Conference
| Conference | 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 |
|---|---|
| Country/Territory | USA |
| City | Hybrid, Miami |
| Period | 12/11/2024 → 16/11/2024 |
| Sponsors | Apple, Bloomberg, Citadel Securities, et al., Google, Meta |
Bibliographic note
Publisher Copyright: © 2024 Association for Computational Linguistics.