Studying the Landscape
I have a rule I’ve followed for fifteen years: plan first; don’t jump straight into solving. When you’re excited about building something, the temptation is to start writing code immediately. But the fastest way to waste months of your life is to build the wrong thing confidently.
So February 2026 was my research month. Before I committed to building Jeff as a full coding harness, I needed to understand the problem space properly. What existed. What worked. What didn’t. And why.
The actual implementation, not the marketing
The AI coding agent market was exploding. AI was contributing a measurable percentage of commits on GitHub. The market was valued in billions. Every week a new tool launched with breathless claims about “autonomous software engineering.”
I didn’t care about any of that. I cared about how they actually worked.
I went deep. I studied the leading tools not through their landing pages or demo videos, but through their architecture. How they manage context windows (the fixed amount of text an AI model can process at once). How they structure tool use, deciding when to read a file, run a command, or make an edit. How they handle memory and persistence across sessions.
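Strip away the branding and almost every harness reduces to the same loop: build a prompt that fits the context window, let the model either answer or request a tool, execute the tool, feed the result back in. Here’s that loop in miniature (the `model.complete` API and the tool set are hypothetical stand-ins, not any specific tool’s interface):

```python
# A minimal sketch of the loop most coding harnesses share. The
# `model.complete` call is a hypothetical API, not any real tool's;
# the tool set is deliberately tiny.
import subprocess
from pathlib import Path

def run_tool(name: str, args: dict) -> str:
    """Execute one tool call and return its output as text."""
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "run_command":
        done = subprocess.run(args["command"], shell=True,
                              capture_output=True, text=True)
        return done.stdout + done.stderr
    if name == "edit_file":
        Path(args["path"]).write_text(args["content"])
        return "ok"
    return f"unknown tool: {name}"

def agent_loop(model, messages: list[dict], max_steps: int = 20) -> str:
    """Send context to the model, run requested tools, repeat until done."""
    for _ in range(max_steps):
        reply = model.complete(messages)      # hypothetical model API
        if reply.tool_call is None:           # no tool requested: finished
            return reply.text
        output = run_tool(reply.tool_call.name, reply.tool_call.args)
        messages.append({"role": "tool", "content": output})
    return "step budget exhausted"
```

Everything interesting in these tools (memory, compaction, permissions) turned out to be a refinement of some part of that loop.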
For one tool alone, I wrote eight separate research documents. Architecture. Memory systems. Compaction (how they summarise old conversation to fit new context). Context construction (how they decide what information to include in each request). Agent orchestration. Tools and permissions. Conversation handling. Hooks and plugins.
Eight documents. One tool. That’s what proper research looks like.
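To make one of those concrete: compaction is conceptually just a budget check plus a summarisation step, even though every tool implements it differently. Something like this, where the token estimate and the `model.summarise` call are stand-ins of my own:

```python
# A sketch of compaction: when the transcript nears the context limit,
# fold the oldest turns into a summary. The 4-chars-per-token estimate
# and the `model.summarise` call are stand-ins, not any tool's actual
# implementation.

def estimate_tokens(messages: list[dict]) -> int:
    # crude heuristic: roughly four characters per token
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages: list[dict], model, limit: int = 100_000) -> list[dict]:
    if estimate_tokens(messages) < limit:
        return messages                      # still fits, nothing to do
    keep = 10                                # keep recent turns verbatim
    old, recent = messages[:-keep], messages[-keep:]
    summary = model.summarise(old)           # hypothetical summarisation call
    header = {"role": "system",
              "content": f"Summary of earlier conversation: {summary}"}
    return [header] + recent
```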
The platform problem
I also studied the platform Jeff originally ran on, an open-source agent framework that let you customise LLM-powered assistants. From the outside it looked solid. From the inside, less so.
The security posture was concerning. A scan revealed tens of thousands of exposed instances, many with default configurations, some leaking API keys. The cost model was brutal for active use. Running it properly meant spending somewhere between $300 and $750 per month, depending on usage patterns. And the architecture had fundamental limitations around how it handled context and memory.
These weren’t theoretical concerns. I’d been running Jeff on this platform for months. I knew the friction firsthand. But studying it systematically turned “this feels wrong” into “here’s exactly what’s wrong and why.”
Multi-agent orchestration
The research rabbit hole I enjoyed most was multi-agent orchestration. Not the “swarm” hype you see in conference talks, where fifty agents collaborate on a task like some kind of AI beehive. That’s mostly theatre.
The practical version is much simpler and more useful. How does one agent hand off a task to another? How do they share context without duplicating everything? How do you keep them coordinated without a human manually directing traffic? What happens when one agent’s work conflicts with another’s?
These patterns exist in distributed systems already. The trick was adapting them for AI agents, where the “workers” are non-deterministic, occasionally wrong, and need different amounts of context depending on the task.
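To give the hand-off question a concrete shape: the pattern worth borrowing is a small task record that carries a scoped slice of context plus enough metadata to detect conflicts. A sketch, with field names of my own invention:

```python
# An illustrative hand-off record between agents. The point is that the
# receiving agent gets a scoped slice of context, not the whole
# transcript, and that claimed files make conflicts detectable.
# Field names are invented for this sketch.
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    CONFLICTED = "conflicted"

@dataclass
class Handoff:
    task: str                   # what the receiving agent should do
    context: list[str]          # only the notes and files it needs
    files_claimed: set[str]     # what it intends to touch
    status: Status = Status.PENDING
    result: str | None = None

def detect_conflict(a: Handoff, b: Handoff) -> bool:
    """Two in-flight tasks conflict if they claim overlapping files."""
    return bool(a.files_claimed & b.files_claimed)
```

The non-determinism changes the details (you verify work before flipping a task to done, rather than trusting the worker), but the record itself is plain distributed-systems bookkeeping.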
The gap
What I found across all this research was a clear gap in the market. The tools that existed fell into two categories.
The first: chat interfaces pretending to be agents. You could talk to an AI about your code, and it would suggest changes, but it had no real agency. No ability to run things, verify its work, or maintain context across sessions. Fancy autocomplete with a conversation wrapper.
The second: fully autonomous systems where you couldn’t see or control what they were doing. They’d go off, make changes, and you’d come back to a pull request with forty modified files and no clear explanation of why. Impressive, sometimes. Trustworthy, rarely.
I wanted something in between. An agent that could genuinely act, with real tools and real autonomy, but where you could see what it was thinking, control what it was allowed to do, and understand why it made each decision.
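Mechanically, that middle ground needs nothing exotic; it’s mostly a question of where you put the gate. Something like this, where the destructive-command rules are purely illustrative:

```python
# A sketch of a permission gate: classify each proposed command and
# pause for approval on anything destructive. The classification rules
# here are illustrative, not a real tool's policy.

DESTRUCTIVE_PREFIXES = ("rm ", "git push", "git reset --hard", "drop table")

def is_destructive(command: str) -> bool:
    return command.strip().lower().startswith(DESTRUCTIVE_PREFIXES)

def gated_run(command: str, ask_user) -> str:
    """Run a command, but only after approval if it looks destructive."""
    if is_destructive(command) and not ask_user(f"Allow {command!r}?"):
        return "denied by user"
    # ...execute the command and return its output...
    return "executed"
```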
What it shaped
Every significant decision that followed traced back to this research month. The brain system, Jeff’s persistent memory backed by git, came from studying how other tools handled (or failed to handle) long-term context. The office concept, a structured workspace where agents coordinate, came from the orchestration research. The permission model, where Jeff asks before doing anything destructive, came from watching what happens when agents don’t.
None of it was invented from scratch. All of it was informed by understanding what already existed and where it fell short.
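The brain system is the easiest of those to illustrate. Git-backed memory can be as plain as appending notes to files in a repository and committing each change, so history, diffs, and rollback come for free. A sketch (the paths and note format are mine, and it assumes the directory is already a git repo):

```python
# A sketch of git-backed memory: notes are plain files in a repository
# and every update is a commit, so history, diffs, and rollback come
# for free. Paths and the note format are mine; it assumes `brain/` is
# already an initialised git repository.
import subprocess
from pathlib import Path

BRAIN = Path("brain")

def remember(topic: str, note: str) -> None:
    """Append a note under a topic and commit the change."""
    note_file = BRAIN / f"{topic}.md"
    with note_file.open("a") as f:
        f.write(note.rstrip() + "\n")
    subprocess.run(["git", "-C", str(BRAIN), "add", note_file.name], check=True)
    subprocess.run(["git", "-C", str(BRAIN), "commit", "-m", f"note: {topic}"],
                   check=True)
```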
The unglamorous truth
Research doesn’t feel productive. You’re not shipping code. You’re not closing tickets. You’re reading, comparing, diagramming, and writing documents that nobody else will ever see. There’s a voice in your head the entire time saying “just start building.”
But it saves you from building the wrong thing. And in a space moving as fast as AI tooling was in early 2026, building the wrong thing meant wasting months that you wouldn’t get back.
The weeks I spent studying the landscape were, in hindsight, the most productive weeks of the entire project.