I recently placed in the top 20% of a Kaggle competition. I didn’t write a single line of feature engineering. I didn’t tune a single hyperparameter. Instead, I spent my time doing something fundamentally different: I built a system that knows how to approach machine learning problems, then pointed it at the competition and watched.
When something broke - and things always break - I didn’t fix the bug. I asked a different question: What was missing from the system’s understanding? Then I added that understanding. Not a patch. Not a hotfix. A new piece of institutional knowledge that would prevent entire categories of future failures.
This is meta-coding: coding systems that code solutions. And I think it’s where all knowledge work is heading.
From Solving Problems to Solving Problem-Solving
Here’s the traditional workflow for a data scientist entering a Kaggle competition: load the data, explore and clean it, engineer features, try models, tune hyperparameters, iterate until the deadline. Each competition is a fresh start. Sure, you bring experience. But the work is still manual, still problem-specific, still ground-level.
Now here’s what I actually do: describe the competition to my ML agent, watch it read the problem, set up validation, and run experiments. When it gets stuck, I don’t fix its code - I improve its understanding. That understanding persists. The next competition starts smarter.
I’ve moved up a level in the stack.
The Architecture of Meta-Coding
Let me show you what this looks like concretely. I built a system called ML Planner - an AI agent with specialized knowledge for machine learning workflows. Here’s a simplified view:
Agent        →  Knowledge Layer     →  Execution Sandbox  →  Persistent Memory
(Gemini)        • Feature eng.         (Python env)          (TRAINING_LOG.md)
(Claude)        • Validation
(OpenAI)        • GPU management
                • Cost control

Each piece of the knowledge layer is mainly text - not code - that teaches the agent how to approach a class of problems. I don’t have a hardcoded feature engineering pipeline. I have knowledge about feature engineering: when to use target encoding, how to prevent leakage with GroupKFold, proven hyperparameters for LightGBM.
I don’t have explicit instructions to SSH into GPU instances. The knowledge layer teaches the agent how to think about GPU infrastructure: launch instances, monitor jobs, auto-terminate to prevent runaway costs.
This is the key insight: knowledge documents are meta-code. They’re instructions for writing instructions.
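To make that concrete, here is roughly the kind of code the agent ends up writing when it follows the target-encoding guidance - a minimal sketch, assuming a pandas DataFrame `train` with a categorical column and a binary target. Nothing like this is hardcoded in ML Planner; the knowledge document simply insists the encoding be computed out of fold.

```python
import pandas as pd
from sklearn.model_selection import KFold

def oof_target_encode(train: pd.DataFrame, col: str, target: str, n_splits: int = 5) -> pd.Series:
    """Encode a categorical column with target means computed out of fold,
    so no row's encoding ever sees its own label."""
    encoded = pd.Series(index=train.index, dtype=float)
    global_mean = train[target].mean()
    for fit_idx, enc_idx in KFold(n_splits, shuffle=True, random_state=0).split(train):
        fold_means = train.iloc[fit_idx].groupby(col)[target].mean()
        encoded.iloc[enc_idx] = (
            train.iloc[enc_idx][col].map(fold_means).fillna(global_mean).to_numpy()
        )
    return encoded
```

The point isn’t this particular function. It’s that the agent reaches for the out-of-fold version because the knowledge document tells it why the naive version leaks.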
This Isn’t Theoretical
If meta-coding sounds abstract, consider CaseText. Jake Heller founded the company in 2013, got early access to GPT-4 in summer 2022, rebuilt their entire product around it, and sold to Thomson Reuters for $650 million about a year later.
What did they build? Not a legal search engine. They encoded how expert lawyers think.
Heller describes their process: “Ask yourself this question - how would the best person in that field do this if they had unlimited time and unlimited resources like a thousand AIs that can all work in, you know, simultaneously to accomplish this task, right? How would the best person do this and work backwards from there, right?” (~11:20)
For legal research, that meant decomposing the task into its reasoning steps: understand the query, ask clarifying questions, make a research plan, execute dozens of searches, read every result carefully, filter for relevance, take notes on why something matters, synthesize into an essay, then verify citations. (~11:45)
Each step became a prompt or piece of deterministic code. The knowledge of how lawyers actually think was the product.
Here’s what struck me most: when their system failed, they didn’t debug code. They debugged understanding. Heller describes building hundreds of test cases, then spending weeks refining a single prompt until it passed 97%+ of them. When accuracy plateaued at 60%, most people gave up. CaseText kept grinding on the knowledge layer. (~17:00)
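That grind is easy to picture as code. Here’s a generic sketch of the loop the story implies - not CaseText’s actual harness, just one prompt step run against a set of test cases with a deliberately crude pass criterion:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    query: str
    must_contain: str  # crude pass criterion, for illustration only

def pass_rate(step: Callable[[str], str], cases: list[TestCase]) -> float:
    """Run one prompt step over every test case and report the fraction that passed."""
    passed = sum(case.must_contain in step(case.query) for case in cases)
    return passed / len(cases)

# The grind: refine the prompt behind `step`, re-run, repeat until pass_rate > 0.97.
```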
The New Development Loop
In traditional development, when something breaks, you debug the code: find the cause, fix it, resume. In meta-coding, when something breaks, you debug the thinking: identify missing knowledge, enhance the system’s understanding, resume.
Real example: My agent was running training jobs on Lambda Labs GPU instances. The jobs would complete, but the instance would keep running - burning $1.50/hour for nothing.
I could have added a terminate_instance() call to the training script. That’s the traditional fix. Instead, I asked: Why didn’t the agent think to do this?
The answer: the agent didn’t understand the cost model of GPU infrastructure. It was optimizing for correctness, not cost.
So I updated the knowledge layer:
## Safety and Cost Management
The system MUST emphasize cost control for GPU instances:
- Auto-termination on boot failure, SSH failure, or job completion
- Instance existence checks before launching new instances
- Never leave instances running after job completion
- Boot timeout: auto-terminate if SSH fails within 5 minutes

Now the agent thinks about costs on every GPU task. Not because I added a termination call to one script, but because I taught it that cost matters. The fix propagates to every future interaction.
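For illustration, this is the shape of the behavior that knowledge produces - a sketch only, where `ssh_is_reachable`, `launch_job`, `job_is_finished`, and `terminate_instance` are hypothetical callables standing in for the real Lambda Labs operations:

```python
import time

BOOT_TIMEOUT_SECONDS = 5 * 60  # from the knowledge doc: terminate if SSH fails within 5 minutes

def run_job_with_cost_guard(instance_id, *, ssh_is_reachable, launch_job,
                            job_is_finished, terminate_instance, poll_interval=30):
    """Run a remote training job while guaranteeing the instance never outlives it."""
    try:
        deadline = time.time() + BOOT_TIMEOUT_SECONDS
        while not ssh_is_reachable(instance_id):      # boot guard
            if time.time() > deadline:
                raise RuntimeError("SSH unreachable within boot timeout")
            time.sleep(poll_interval)

        launch_job(instance_id)                       # start training remotely
        while not job_is_finished(instance_id):       # poll until the job completes
            time.sleep(poll_interval)
    finally:
        terminate_instance(instance_id)               # cost guard: always terminate
```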
What This Looks Like in Practice
Competition 1: Playground Series S5E11 - Predicting Loan Payback
Score: 0.92601 AUC | Rank: 729 of 3,726 (top ~20%)
This is a tabular classification problem: predict whether a borrower will repay their loan. The dataset has a 4:1 class imbalance - 80% of borrowers pay back, 20% default.
This is exactly the kind of problem ML Planner was built for. The agent loaded the data, identified the imbalance, set up stratified GroupKFold validation, and ran experiments with LightGBM and XGBoost. When early runs showed inflated CV scores that didn’t transfer to the leaderboard - a classic sign of data leakage - I didn’t debug the pipeline. Instead, I added knowledge about how to detect and prevent leakage in temporal financial data. The agent re-ran experiments with that understanding and converged on a clean solution.
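The knowledge I added boils down to a comparison the agent can run itself. A hedged sketch, assuming `X`, `y`, and `df` are already loaded and that a hypothetical `borrower_id` column links related rows:

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

def cv_auc(X, y, cv, groups=None) -> float:
    """Mean out-of-fold AUC for a baseline LightGBM model under a given split."""
    model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
    return float(np.mean(cross_val_score(model, X, y, cv=cv, groups=groups, scoring="roc_auc")))

naive = cv_auc(X, y, KFold(5, shuffle=True, random_state=0))
grouped = cv_auc(X, y, GroupKFold(5), groups=df["borrower_id"])
print(f"naive CV AUC: {naive:.4f}  grouped CV AUC: {grouped:.4f}")
# A large gap means related rows are leaking across folds and the naive estimate is inflated.
```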
Competition 2: Santa 2025 - Christmas Tree Packing Challenge
Score: ~72.886 | Rank: 144 of 1,162 (top ~12%) | Competition ongoing
This isn’t machine learning - it’s computational geometry. The task: given N identical tree-shaped polygons, find the smallest square that can contain all N trees without overlap, for N ranging from 1 to 200.
The challenge here was different. My ML knowledge layer had nothing to offer. The agent needed to understand collision detection between non-convex polygons, gradient-free optimization (simulated annealing, basin-hopping), and heuristics for initializing placements. I built new knowledge documents from scratch: how to represent polygons, how to check for overlaps efficiently via the separating axis theorem, and when to try rotation vs. translation moves. I’m also building frameworks that teach the agent to reason through external resources and do its own research, though automatically incorporating cutting-edge research is still unreliable.
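One of those documents is easy to show in miniature. The separating axis theorem applies to convex shapes, so in practice each non-convex tree is first decomposed into convex pieces; this is a sketch of the per-piece check, with polygons represented as NumPy (n, 2) vertex arrays:

```python
import numpy as np

def edge_normals(poly: np.ndarray) -> np.ndarray:
    """Perpendiculars to each edge of a convex polygon given as an (n, 2) vertex array."""
    edges = np.roll(poly, -1, axis=0) - poly
    normals = np.stack([-edges[:, 1], edges[:, 0]], axis=1)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def convex_overlap(a: np.ndarray, b: np.ndarray) -> bool:
    """Separating axis theorem: two convex polygons overlap iff no edge normal
    of either polygon separates their projections."""
    for axis in np.vstack([edge_normals(a), edge_normals(b)]):
        pa, pb = a @ axis, b @ axis
        if pa.max() < pb.min() or pb.max() < pa.min():
            return False  # found a separating axis: no overlap
    return True
```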
In any case, watching the agent iterate on packing 50 trees into a box - trying configurations, rejecting overlaps, slowly compressing the boundary - felt like watching someone learn.
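What I was watching is, underneath, a very plain accept/reject loop. A toy sketch of it, with `propose_move`, `has_overlap`, and `bounding_side` standing in for the real geometry code:

```python
import math
import random

def anneal(placements, propose_move, has_overlap, bounding_side,
           steps: int = 20_000, t0: float = 1.0):
    """Toy simulated annealing: shrink the side of the square containing all trees."""
    current, temperature = placements, t0
    for _ in range(steps):
        candidate = propose_move(current)           # translate or rotate one tree
        if has_overlap(candidate):
            continue                                # hard constraint: trees may not overlap
        delta = bounding_side(candidate) - bounding_side(current)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current = candidate                     # accept improving or occasional worse moves
        temperature *= 0.9995                       # cool slowly
    return current
```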
The pattern across both competitions: neither required me to write feature engineering pipelines or optimization loops by hand. Both required me to articulate how to think about the problem domain. The loan competition exercised knowledge I already had. The Santa competition forced me to build new knowledge from scratch. Both improved the system permanently.
The Meta-Ladder
There’s something recursive happening here worth making explicit.
Consider the value chain:
Raw data → Features → Models → Predictions → Decisions → Economic value

Traditional ML work operates on the left side of this chain - transforming data into predictions. Meta-coding operates one level up: building systems that transform data into predictions. But it doesn’t stop there.
The knowledge documents themselves can be improved systematically. I’ve started noticing patterns in how I update the knowledge layer - common failure modes in my own teaching, better structures for conveying certain types of understanding. That’s meta-meta-coding: improving how I improve the improver.
Each level up the ladder has different economics:
Level 1 (doing ML): Value scales with hours worked times skill.
Level 2 (building ML systems): Value scales with how many problems the system can solve.
Level 3 (improving how you build ML systems): Value compounds - you’re changing the slope of the curve itself.
This is why the skill of articulating tacit knowledge becomes so valuable. Most expertise lives in people’s heads as intuition - pattern matching they can’t explain. The meta-coder’s job is to surface that intuition, make it explicit, and encode it in a form an agent can use.
Knowledge Is the Product
Here’s a prediction: within five years, the most valuable artifacts in any knowledge domain won’t be code, documents, or datasets. They’ll be knowledge definitions - encoded expertise for how to think through problems in each economically valuable domain.
CaseText spent a decade building conventional legal software. They did fine - $20 million in revenue, 100 employees. But when they rebuilt around GPT-4 with deep domain knowledge baked in, everything changed. Word-of-mouth exploded. Salespeople became “order takers.”
The difference wasn’t the underlying model - every competitor had access to GPT-4. The difference was the knowledge layer: years of legal expertise encoded into prompts, evals, and workflows.
Heller puts it bluntly: “How do professionals really do it? Break it down to steps. Each step basically becomes a prompt or piece of code. And then you test each step. Test the whole workflow all together. If you just do these two things, you’ll be like 90% of your way there to building a better AI app than what most of the crap that’s out there, ok? Because most people never eval. And they never take the time to figure out how professionals really do the job.” (~23:05)
The bottleneck is no longer “can AI do X?” It’s “can we articulate what good X looks like?”
Beyond Machine Learning
I’ve been using ML as my example because it’s familiar. I wouldn’t call myself an ML expert, though I did author a paper on one approach to the Netflix Prize. That work was deeply hands-on, and a lot of the instinct in ML Planner comes from that era.
But the pattern generalizes. CaseText did it for legal research. The same approach applies everywhere:
- In software engineering, you write specifications that agents turn into code; when bugs appear, you improve the spec.
- In writing, you develop voice guides and structural templates; the agent drafts, you refine the meta-layer.
- In research, you build agents that know how to evaluate literature and identify gaps; when they miss something, you teach them what “relevance” means in your field.
The common thread: you stop being the practitioner and start being the teacher of practitioners.
What We Lose (and Gain)
There’s a craft to ground-level work that meta-coding abstracts away. The intuition you build by manually cleaning messy data, by staring at learning curves, by debugging a pipeline failure at 2 AM - that intuition matters. It informs the knowledge you write.
If you skip straight to meta-coding without ever doing the ground-level work, your knowledge will be shallow. You’ll miss edge cases that only experience reveals. The ideal path is to earn your meta-coding license: do the work, feel the pain, then systematize your understanding so others don’t have to.
But there’s an obvious question: aren’t you just teaching yourself out of a job?
Yes - out of this job. That’s the point. The goal isn’t to cling to ground-level work as AI gets better at it. The goal is to keep moving up the value chain. You encode your current expertise, hand it to a system, and go solve the next harder problem - the one that requires intuition you haven’t articulated yet.
The meta-coder who systematizes feature engineering today is freed up to work on problem selection, strategy, or the judgment calls that don’t yet have names. The ladder keeps extending upward. The job isn’t to stay on your current rung; it’s to keep climbing.
Getting Started
If you want to explore meta-coding in your own domain:
1. Pick a repeating workflow in your work.
2. Write down how you think about it - not just the steps, but the heuristics and failure modes.
3. Give that document to an AI agent (a minimal sketch of this step follows the list).
4. See where it gets stuck. Those sticking points reveal implicit knowledge you haven’t articulated yet.
5. Iterate on the knowledge, not the code.
6. Accumulate over time. The compounding is the point.
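Step 3 can be as small as this - a minimal sketch using the OpenAI Python SDK, where the file name `knowledge/feature_engineering.md` and both prompts are placeholders; swap in whichever model and provider you actually use:

```python
from pathlib import Path
from openai import OpenAI

knowledge = Path("knowledge/feature_engineering.md").read_text()  # your written-down thinking

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"You are an ML agent. Follow this playbook:\n\n{knowledge}"},
        {"role": "user", "content": "Set up validation for train.csv and tell me where you get stuck."},
    ],
)
print(response.choices[0].message.content)
```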
Here’s a useful heuristic: pick a job - not a task, not a feature, but a job that people pay other people to do. Then ask: how would the best person in that field approach this with unlimited time and a thousand AI instances?
Write that down. That’s your first knowledge document.
I’m still figuring this out. The ML Planner system is a prototype, full of rough edges. But every time I use it, I learn something new about how to teach rather than do. And that feels like the right direction.
The code is becoming the comment. The solution is becoming the process. The programmer is becoming the teacher.
Welcome to meta-coding.