Excellent post! Curious if you write down the entire knowledge document in one file, e.g. `agents.md`, and then point your agent of choice to it?
A monolithic file/system tends to hurt performance. A few things I've found (see the sketch after this list):
1. Separate contexts for separate concerns: long-running tasks pollute the window with irrelevant history
2. Step-specific prompts: each step sees only relevant knowledge, which improves consistency
3. Debuggability: when something breaks, you need to know which step failed and what it was working with
4. Tool design: related to separate contexts, but for tool use you want to be detailed (yet clear) - example usage, edge cases, clear documentation - detail that usually isn't needed in the main context
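A minimal sketch of what I mean by separate, step-specific contexts - the `call_llm` client, step names, and knowledge snippets are all hypothetical stand-ins, not any particular framework:

```python
# Each pipeline step gets only its own prompt plus the knowledge relevant to
# that step, instead of one monolithic agents.md shared by everything.
# `call_llm` is a placeholder for whatever model client you use.

from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    prompt: str                                           # step-specific instructions
    knowledge: list[str] = field(default_factory=list)    # only what this step needs

def call_llm(system: str, user: str) -> str:
    """Placeholder for your model client of choice."""
    return f"[model output given {len(system)} chars of system context]"

def run_pipeline(steps: list[Step], task: str) -> dict[str, str]:
    results: dict[str, str] = {}
    for step in steps:
        # Fresh, narrow context per step: prompt + relevant knowledge only.
        # History from long-running earlier steps never enters this window.
        system = step.prompt + "\n\n" + "\n".join(step.knowledge)
        results[step.name] = call_llm(system, task)
        # Per-step logging keeps failures debuggable: you know which step
        # broke and exactly what context it was working with.
        print(f"[{step.name}] context={len(system)} chars")
    return results

steps = [
    Step("explore", "Summarize the dataset and note anomalies.",
         ["Column descriptions...", "Known data quirks..."]),
    Step("model", "Propose a baseline model and training plan.",
         ["Validation scheme...", "Past approaches that worked..."]),
]
run_pipeline(steps, "Tabular competition: predict churn.")
```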
Are you planning on sharing some of your code or files for this?
Just like when we teach humans, improving understanding (vs fixing code) creates transferable capability.
As I was reading your post, I started mapping your approach to the signal detection framework I apply to strategic thinking (true/false positives/negatives). What I see in your ML agent's learning:
✅ Recognize relevant patterns (true positive)
✅ Ignore obvious noise (true negative)
✅ You help it discover missing knowledge when stuck (false negative)
What I'm less clear on is whether the agent also learns to question whether it's over-indexing on spurious patterns (the false positive case). When you say "identify missing knowledge, enhance the system's understanding, resume" - does it also address what the agent might be wrong about? Patterns that worked in previous competitions but are actually spurious correlations that don't generalize?
I ask because in human strategic thinking, false positives are often harder to detect than false negatives. We accumulate patterns from experience that are context-specific but that we treat as universal, or correlations we mistake for causation. These get reinforced through success until they actively mislead us.
The debugging questions feel fundamentally different.
False negative: "What am I missing?"
False positive: "What am I wrong about?"
Does your meta-coding approach address both? Or does the validation/competition structure naturally select out false positives in ways that make explicit detection less critical for ML than it is for human cognition?
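To make the false-positive question concrete, here's a rough, hypothetical sketch (not the author's method, scores invented for illustration): re-test a "learned" heuristic on held-out competitions it wasn't learned from, and demote it if its apparent edge doesn't replicate.

```python
# Hypothetical check for a "false positive" pattern: a heuristic that looked
# useful in past competitions but whose edge vanishes on held-out tasks.

from statistics import mean

def looks_spurious(with_heuristic: list[float],
                   without_heuristic: list[float],
                   min_gain: float = 0.01) -> bool:
    """True if the heuristic's gain on held-out tasks is too small to trust."""
    gain = mean(with_heuristic) - mean(without_heuristic)
    return gain < min_gain

# Validation scores on competitions the heuristic was NOT learned from:
held_out_with = [0.81, 0.79, 0.80]
held_out_without = [0.80, 0.80, 0.79]

if looks_spurious(held_out_with, held_out_without):
    print("Edge doesn't replicate: demote or drop the pattern.")
```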
Thanks for sharing.