Reward Collapse in Aligning Large Language Models (Q6438249)
From MaRDI portal
preprint article from arXiv
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Reward Collapse in Aligning Large Language Models | preprint article from arXiv | |
Statements
Publication date: 27 May 2023
arXiv classification: cs.LG, cs.AI, cs.CL, math.OC, stat.ML
Authors: Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su