LLM-Alignment Papers
• Concrete Problems in AI Safety (arXiv:1606.06565, Jun 2016)
• The Off-Switch Game (arXiv:1611.08219, Nov 2016)
• Learning to summarize from human feedback (arXiv:2009.01325, Sep 2020)
• Truthful AI: Developing and governing AI that does not lie (arXiv:2110.06674, Oct 2021)
• Scaling Laws for Neural Language Models (arXiv:2001.08361, Jan 2020)
• Training language models to follow instructions with human feedback (arXiv:2203.02155, Mar 2022)
• Constitutional AI: Harmlessness from AI Feedback (arXiv:2212.08073, Dec 2022)
• Discovering Language Model Behaviors with Model-Written Evaluations (arXiv:2212.09251, Dec 2022)
• Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions (arXiv:2406.09264, Jun 2024)
• Scalable AI Safety via Doubly-Efficient Debate (arXiv:2311.14125, Nov 2023)