
Essential Papers for Understanding LLMs
Foundational Research on Large Language Models: Capabilities, Safety, and Limitations
Curated by Ti Zhao
This collection surveys foundational research spanning the evolution of large language models, from early transformer architectures to modern safety-aligned systems. The papers trace key developments from BERT and the GPT series through T5, InstructGPT, and GPT-4, establishing the core architectural and training innovations behind current LLM capabilities.
Core themes include:
- Architectural foundations: Transformer attention mechanisms, scaling laws, and efficiency improvements (Mamba, BitNet)
- Capability enhancement: Few-shot learning, in-context learning, reasoning (Chain-of-Thought, Tree of Thoughts), and tool use (Toolformer, ReAct)
- Safety and alignment: Human feedback training (RLHF), constitutional AI, red teaming methodologies, and preference optimization (DPO)
- Understanding and interpretability: Mechanistic interpretability, emergent abilities, self-evaluation capabilities, and factual knowledge localization
The collection provides essential context for understanding how modern LLMs achieve their capabilities, while highlighting ongoing challenges in safety, interpretability, and alignment. It serves as a foundation for researchers working on LLM development, evaluation, and governance.