Author: Andrej Karpathy • Source: karpathy.bearblog.dev • Date Read: December 25, 2025



17 cards • Last updated: Dec 2025


Flashcards

1. Modern LLM Production Stack

What are the four stages of the modern LLM production stack as of 2025?
  1. Pretraining
  2. Supervised Finetuning (SFT)
  3. RLHF (Reinforcement Learning from Human Feedback)
  4. RLVR (Reinforcement Learning from Verifiable Rewards)

Tags: #LLM #training #RLVR

2. RLVR Definition

What is RLVR (Reinforcement Learning from Verifiable Rewards)?

Training LLMs against auto-verifiable rewards (math/code puzzles), causing them to spontaneously develop reasoning strategies through optimization rather than human prescription.

Tags: #RLVR #reasoning #training
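To make the idea concrete, here is a minimal sketch of what an auto-verifiable reward could look like for a math puzzle: the reward is computed by a program that checks the model's final answer against a known solution, with no human judgment in the loop. The function name and the "Answer:" extraction convention are illustrative assumptions, not anything specified in the post.

```python
import re

def verifiable_math_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known solution, else 0.0.

    Illustrative convention (an assumption, not from the post): the model is
    asked to end its response with a line like "Answer: 42".
    """
    match = re.search(r"Answer:\s*(.+)", model_output)
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    predicted = match.group(1).strip()
    return 1.0 if predicted == ground_truth.strip() else 0.0

# Example: a correct, checkable answer earns the full reward.
print(verifiable_math_reward("Let me reason step by step...\nAnswer: 42", "42"))  # 1.0
```

Because the reward is computed programmatically, it can be applied to millions of rollouts without human raters, which is what makes the long optimization runs described in the next card feasible.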

3. Computational Intensity Comparison

How does RLVR differ from SFT and RLHF in terms of computational intensity?

SFT and RLHF are relatively thin/short stages (minor finetunes), while RLVR allows much longer optimization runs because it trains against objective, non-gameable reward functions. This shifted compute from pretraining to RL, resulting in similar-sized LLMs but longer RL runs.

Tags: #RLVR #compute #optimization

4. Test-Time Compute Scaling

What new "knob" did RLVR introduce for controlling LLM capability?

Test-time compute scaling—the ability to increase capability by generating longer reasoning traces and increasing "thinking time" during inference, with its own associated scaling law.

Tags: #RLVR #scaling #inference
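A hedged sketch of what this knob looks like from the caller's side: capability is traded against inference cost by allowing the model a larger budget of reasoning tokens before it must answer. The `generate` function and `thinking_budget` parameter below are hypothetical stand-ins for whatever interface a given model exposes, not a real API.

```python
def generate(prompt: str, thinking_budget: int) -> str:
    # Stand-in: a real implementation would let the model emit up to
    # 'thinking_budget' hidden reasoning tokens before its final answer.
    return f"<answer produced with a {thinking_budget}-token thinking budget>"

prompt = "Prove that the sum of two even integers is even."

# Same model, same weights: only the test-time compute changes.
for budget in (256, 1024, 4096, 16384):
    print(budget, generate(prompt, thinking_budget=budget))
```

In practice each answer would be scored, and quality generally rises with the budget, which is the new scaling law referenced in card 9.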

5. Emergent Reasoning Strategies

What reasoning strategies emerged spontaneously from RLVR training?

Models spontaneously developed strategies like:

  • Reflection and double-checking answers
  • Trying multiple approaches
  • Breaking down complex problems
  • Backtracking when stuck

These emerged through optimization rather than being explicitly programmed.

Tags: #RLVR #reasoning #emergence

6. Impact on Training Compute Allocation

How did RLVR change the allocation of compute between pretraining and RL stages?

RLVR shifted significant compute from pretraining to the RL stage. While pretraining used to dominate compute usage, RLVR allows for much longer RL optimization runs, rebalancing the compute distribution across the training pipeline.

Tags: #RLVR #compute #training

7. Key Requirement for RLVR

What is the key requirement for effective RLVR training?

Auto-verifiable rewards that are objective and non-gameable. Examples include math puzzles and code problems where correctness can be automatically verified without human judgment.

Tags: #RLVR #verification #rewards
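For code problems, the same idea is usually realized by executing the candidate program against held-out unit tests; a pass/fail outcome is objective and hard to game. The sketch below is a simplified illustration (running untrusted code would need sandboxing in practice), and the task format is an assumption, not something specified in the post.

```python
def verify_code_solution(candidate_src: str, tests: list[tuple[tuple, object]]) -> float:
    """Reward = fraction of unit tests the candidate's solve() function passes.

    NOTE: exec'ing model-written code like this is only safe inside a sandbox;
    this sketch skips that for brevity.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)           # define the candidate's solve()
        solve = namespace["solve"]
        passed = sum(1 for args, expected in tests if solve(*args) == expected)
        return passed / len(tests)
    except Exception:
        return 0.0                               # crashes or a missing solve() earn nothing

# Example task: return the maximum of two numbers.
candidate = "def solve(a, b):\n    return a if a > b else b\n"
print(verify_code_solution(candidate, [((1, 2), 2), ((5, 3), 5), ((0, 0), 0)]))  # 1.0
```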

8. RLVR vs Human Feedback

How does RLVR differ from RLHF in terms of feedback mechanism?

RLHF relies on human preferences and feedback, while RLVR uses automatically verifiable rewards from objective tasks (like math/code puzzles) that don't require human judgment.

Tags: #RLVR #RLHF #feedback

9. Scaling Laws Evolution

What new scaling law did RLVR introduce to the LLM ecosystem?

Test-time compute scaling law—the relationship between inference-time compute (thinking time) and model capability, adding to the existing pretraining scaling laws.

Tags: #scaling #RLVR #inference
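The post does not give an equation; the usual way such relationships are summarized is as a rough power law, analogous in shape to the pretraining scaling laws. The form below is illustrative only (an assumption), with error falling smoothly as the test-time compute budget grows:

```latex
% Illustrative only: error on a task family as a function of test-time compute C_test,
% by analogy with pretraining scaling laws (not a formula from the post).
\[
  \mathrm{Error}(C_{\text{test}}) \;\approx\; E_{\infty} + \frac{A}{C_{\text{test}}^{\,\alpha}},
  \qquad \alpha > 0,
\]
% where E_infty is the irreducible error and A, alpha are task- and model-dependent fit constants.
```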

10. Year of RLVR Breakthrough

According to Karpathy, what year marked the breakthrough for RLVR in LLM development?

2025—Karpathy identifies this as the year when RLVR became a major component of the LLM production stack, fundamentally changing how models are trained and scaled.

Tags: #RLVR #timeline #2025

11. OpenAI Model Inflection Points

Which OpenAI models marked the inflection point for RLVR capabilities?

OpenAI o1 (late 2024) was the first demonstration of an RLVR model, but o3 (early 2025) was the obvious inflection point where the difference became intuitively noticeable.

Tags: #OpenAI #RLVR #models

12. Ghosts vs. Animals Metaphor

What is the "Ghosts vs. Animals" metaphor for understanding LLM intelligence?

We're not "evolving/growing animals" but "summoning ghosts." LLMs have a fundamentally different architecture, training data, algorithms, and optimization pressures than biological intelligence: they are optimized to imitate text and collect rewards rather than to survive. That makes the animal lens a misleading way to think about them.

Tags: #metaphor #intelligence #LLM

13. Jagged Intelligence

What is "jagged intelligence" in the context of LLMs?

LLMs display uneven performance characteristics—they "spike" in capability in verifiable domains (where RLVR applies) while remaining weak elsewhere. They can simultaneously be a "genius polymath" and a "confused grade schooler" who can be tricked by simple jailbreaks.

Tags: #intelligence #capabilities #RLVR

14. LLM App Layer Functions

What are the four key functions of an "LLM app" layer like Cursor?
  1. Context engineering
  2. Orchestrating multiple LLM calls in complex DAGs while balancing performance/cost (see the sketch after this card)
  3. Application-specific GUI for human-in-the-loop
  4. An "autonomy slider" for user control

Tags: #LLMapp #architecture #Cursor
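As a rough illustration of point 2, an LLM app typically fans a task out across several model calls arranged as a small DAG, using cheaper models for easy nodes and an expensive one for the hard final node, with the engineered context flowing between them. Everything below (the `call_llm` stub, model names, node layout) is hypothetical, not a description of Cursor's internals.

```python
# Hypothetical sketch of DAG-style orchestration in an LLM app.
# 'call_llm' stands in for a real model API; model names are placeholders.
def call_llm(model: str, prompt: str) -> str:
    return f"[{model} output for: {prompt[:40]}...]"

def answer_codebase_question(question: str, files: list[str]) -> str:
    # Node 1 (cheap model): pick the relevant files -> context engineering.
    relevant = call_llm("small-fast-model", f"Pick relevant files for: {question}\n{files}")
    # Node 2 (cheap model, could run per file in parallel): summarize what was chosen.
    summaries = call_llm("small-fast-model", f"Summarize: {relevant}")
    # Node 3 (expensive model): answer from the engineered context.
    return call_llm("big-reasoning-model", f"Context:\n{summaries}\n\nQuestion: {question}")

print(answer_codebase_question("Where is auth handled?", ["auth.py", "db.py", "ui.py"]))
```

The application GUI and the autonomy slider then decide how much of such a pipeline runs before the human is asked to review.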

15. LLM Labs vs Apps Division

How does Karpathy predict the division between LLM labs and LLM apps will evolve?

LLM labs will graduate "generally capable college students," while LLM apps will organize, finetune, and animate teams of them into "deployed professionals" in specific verticals by supplying private data, sensors, actuators, and feedback loops.

Tags: #ecosystem #labs #apps

16. Claude Code Paradigm

What paradigm shift does Claude Code represent according to Karpathy?

Claude Code represents AI that "lives on your computer"—a loopy agent combining tool use and reasoning for extended problem solving, running locally with access to your private environment, data, and context. It's a "little spirit/ghost" paradigm distinct from web-based AI.

Tags: #ClaudeCode #paradigm #localAI
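The "loopy agent" pattern can be summarized as a short loop: the model reasons, optionally requests a tool (read a file, run a command), observes the result, and repeats until it decides it is done. The sketch below is a generic agent loop with stubbed-out model and tools; it is not Claude Code's actual implementation.

```python
# Generic agent loop (a sketch, not Claude Code's implementation).
# 'call_llm' and the tools are stand-ins for a real model API and real local access.
def call_llm(transcript: list[str]) -> dict:
    return {"action": "finish", "content": "done"}    # stub: a real model decides here

TOOLS = {
    "read_file": lambda path: open(path).read(),      # local data and context
    "run_shell": lambda cmd: f"(would run: {cmd})",    # actuator on your machine
}

def agent(task: str, max_steps: int = 10) -> str:
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):                         # the "loop" in loopy agent
        step = call_llm(transcript)                    # reason about what to do next
        if step["action"] == "finish":
            return step["content"]                     # the model decides it is done
        result = TOOLS[step["action"]](step["content"])  # use a local tool
        transcript.append(f"OBSERVATION: {result}")    # feed the result back in
    return "gave up after max_steps"

print(agent("summarize the repo"))
```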

17. Google Gemini “Nano Banana” and LLM GUI

What is Google Gemini "Nano Banana" and why is it paradigm-shifting?

A model representing the beginning of "LLM GUI"—moving beyond text-based chat (like 1980s console commands) toward LLMs communicating via images, infographics, animations, and web apps. Its power comes from joint text generation, image generation, and world knowledge tangled in model weights.

Tags: #Gemini #GUI #multimodal


Key Insights

  • RLVR represents a paradigm shift: Moving from human-prescribed behaviors to emergent reasoning through optimization
  • Compute reallocation: The industry is shifting compute from pretraining to RL stages
  • Dual scaling: We now have both training-time and test-time scaling laws to optimize
  • Objective verification is key: The success of RLVR depends on having tasks with clear, verifiable correct answers