Why do we still need to babysit coding agents?

One of the biggest problems with coding agents today is that they invent implicit rules and then rigidly follow them.

Take Claude, for example. It loves adding backward compatibility. Sounds thoughtful, right? But in practice, this often leads to bloated, brittle, or just plain terrible code. The intent is noble. The outcome? Messy.
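To make that concrete, here's a hypothetical sketch of the pattern (the function names are mine, not from any real codebase): you ask for one small signature change, and the agent wraps it in compatibility shims nobody requested.

```python
# Hypothetical example: you ask for load_config() to take a Path instead of
# a str, and the agent "helpfully" preserves every old entry point.

from pathlib import Path
import warnings


def load_config(path: Path) -> dict:
    """The function you actually asked for."""
    return {"path": str(path)}  # placeholder for real parsing


def load_config_from_str(path: str) -> dict:
    """Unrequested shim kept 'for backward compatibility'."""
    warnings.warn("load_config_from_str is deprecated", DeprecationWarning)
    return load_config(Path(path))


def load_config_legacy(*args, **kwargs) -> dict:
    """Second unrequested shim, in case anyone used the old keyword name."""
    if "config_path" in kwargs:
        kwargs["path"] = kwargs.pop("config_path")
    return load_config_from_str(*args, **kwargs)
```

One renamed parameter, three functions to maintain. Multiply that across a codebase and you get the bloat.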

You can try solving this by writing explicit guidelines in the CLAUDE.md file. But even then, things slip through. Frequently.
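For context, CLAUDE.md is the project-level instructions file that Claude Code reads at the start of a session. The excerpt below is a hypothetical example of the kind of explicit rules people add, not a guaranteed fix:

```markdown
# CLAUDE.md (hypothetical excerpt)

- Do not add backward-compatibility shims, deprecated aliases, or wrapper
  functions unless explicitly asked.
- When renaming a function or parameter, update all call sites instead of
  keeping the old name around.
- If a requirement is ambiguous, stop and ask before writing code.
```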

The inability to reliably follow explicit instructions makes these agents poor coding partners. You think you've been clear, but they hallucinate structure or logic that no one asked for.

To be fair, Claude has gotten better from Sonnet 3.7 to Sonnet 4; it now follows instructions more faithfully. But still not enough to be truly reliable.

Now, the idea of relying only on explicit instructions is appealing in theory. But in practice, human programmers don't work like that either. We bring in context, experience, and intuition. We fill in the blanks when instructions are ambiguous or incomplete, which they always are. Because to write 100% complete instructions, you'd have to describe the universe.

A human will pause, ask questions, and seek clarification when in doubt. An LLM? It just plows ahead, confidently wrong.

I'm sure there are some RLHF techniques that could improve this. But at some level, I think this is an inherent limitation.

And that's fine if you're ready to babysit. Not fine if you expect the agent to ship production-quality code without dragging you into a pit of tech debt.

Depending on how much oversight is needed, using an LLM might actually be slower than coding by hand. The time you save up front? You'll probably spend it later debugging and refactoring.

All this is loosely tied to another truth: LLMs tend to generate average code at best. At worst, they produce overengineered, subtly broken slop that looks okay until it bites you.

A junior developer can be coached, mentored, and improved. An LLM can't, even if you think a little prompt engineering is all it takes to fix this. Believe me, I've tried. A lot.