Abstract
The relationship between the quality of a string, as judged by a human reader, and its probability, p(y), under a language model undergirds the development of better language models. For example, many popular algorithms for sampling from a language model have been conceived with the goal of manipulating p(y) to place higher probability on strings that humans deem of high quality (Fan et al., 2018; Holtzman et al., 2020). In this article, we examine the probability–quality relationship in language models explicitly aligned to human preferences, e.g., through reinforcement learning from human feedback. We show that, when sampling corpora from an aligned language model, there exists a trade-off between the strings’ average reward and average log-likelihood under the prior language model, i.e., the same model before alignment with human preferences. We provide a formal treatment of this phenomenon and demonstrate how the choice of sampling adaptor determines how much likelihood is exchanged for reward.
Original language | English |
---|---|
Title | Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing |
Editors | Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen |
Number of pages | 24 |
Volume | 1 |
Place of publication | Miami, Florida, US |
Publisher | ACL |
Publication date | 2024 |
Pages | 14805-14829 |
DOI | |
Status | Published - 2024 |
Event | 2024 Conference on Empirical Methods in Natural Language Processing - EMNLP, Miami, USA Duration: 12 Nov 2024 → 16 Nov 2024 |
Conference
Conference | 2024 Conference on Empirical Methods in Natural Language Processing |
---|---|
Location | EMNLP |
Country/Territory | USA |
City | Miami |
Period | 12/11/2024 → 16/11/2024 |