Writing tests with coding agents is a bad idea
One thing I've found extremely important: now that AI makes code much easier to produce, you should use that to make your code extremely explicit and declarative. Explicit code is also much, much easier to evaluate. I'm talking about things like avoiding inheritance, preferring functional patterns, and so on.
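To make that concrete, here's a minimal sketch in Python, with made-up names: a discount rule written as plain data plus a pure function, instead of a class hierarchy with overridden methods.

```python
from dataclasses import dataclass

# Hypothetical example: the whole rule is visible in one place,
# with no BaseDiscount hierarchy to chase through.

@dataclass(frozen=True)
class Order:
    subtotal: float
    is_first_purchase: bool

def discount(order: Order) -> float:
    """Return the discount amount for an order."""
    if order.is_first_purchase:
        return order.subtotal * 0.10
    if order.subtotal > 100:
        return order.subtotal * 0.05
    return 0.0
```

Reviewing a function like this takes seconds, which is exactly the property I want when an agent is generating most of the code.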
The one part where AI doesn't get me, and where I genuinely get angry, is tests.
My observation is that whenever I involve AI tools in my test-writing process, they tend to generate tests for all kinds of random scenarios. A lot of the time, this feels distracting and even manipulative. It pushes me to think purely in terms of code logic instead of from the user's point of view.
My approach to writing tests is to cover everything that's important for the user. The goal of writing tests is to ensure that my code doesn't break in production due to known reasons. If it breaks for an unknown reason, I fix it and then add a test for that specific case. But if I just focus on code logic and start writing all sorts of imaginary tests, it becomes useless.
Inventing imaginary failures upfront is usually wasted effort.
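As a sketch of what that looks like, here are a few tests against the hypothetical discount() function from above (the discounts module name is also made up): each one maps to a behavior a user would notice if it broke, and the last one is the kind I only add after a real failure.

```python
import pytest

# Hypothetical tests for the discount() sketch above; "discounts" is an
# assumed module name. Each test covers something a user would notice.
from discounts import Order, discount

def test_first_purchase_gets_ten_percent_off():
    assert discount(Order(subtotal=50.0, is_first_purchase=True)) == pytest.approx(5.0)

def test_large_order_gets_five_percent_off():
    assert discount(Order(subtotal=200.0, is_first_purchase=False)) == pytest.approx(10.0)

def test_small_repeat_order_gets_no_discount():
    # The kind of case I add after seeing a real failure, not invented upfront.
    assert discount(Order(subtotal=80.0, is_first_purchase=False)) == pytest.approx(0.0)
```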
Even during PR reviews, AI can be distracting. It often suggests unnecessary test coverage or overcomplicates existing tests, even when I, as the developer, know it isn't required. The whole PR ends up polluted with irrelevant comments. Before AI, a dense, insightful comment on a PR was considered a good thing. Now, I'm not so sure.
AI also tends to add excessive fallbacks for everything. As a developer, I understand the risks involved and can make informed decisions about them. AI tools, however, don't see the full picture and often suggest solutions that I never asked for.
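Here's a sketch of the pattern I mean, with hypothetical names and config shape: the catch-all fallback an agent tends to suggest, next to the version I actually want, where an impossible state fails loudly.

```python
# Hypothetical config-loading example; names and config shape are made up.

# The kind of fallback an agent often adds: swallow the error, invent a default.
def load_timeout_with_fallback(config: dict) -> int:
    try:
        return int(config["timeout_seconds"])
    except (KeyError, ValueError, TypeError):
        return 30  # silently papers over a broken config

# What I actually want: a missing or malformed value is a deployment bug,
# so let it raise immediately instead of hiding the problem behind a default.
def load_timeout(config: dict) -> int:
    return int(config["timeout_seconds"])
```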
We should use AI to make code explicit and easier to evaluate. Use it to reduce boilerplate and to catch obvious mistakes. But when it comes to tests, treat AI more like autocomplete at best, not a peer. Keep tests focused on what matters to users, let impossible invariants fail loudly, and add tests for real failures as they happen. That discipline keeps code honest, reviews useful, and teams moving.

