Hacker News

by delduca on 5/3/25, 7:59 PM with 1 comment

by krackers on 5/10/25, 10:19 PM

Turns out that https://www.fortressofdoors.com/four-magic-words/ was right and all you need to do in training is have the LLM meditate on a single example.

Reinforcement Learning for Reasoning in LLMs with One Training Example