Tokens != intelligence
The perceived cost of intelligence can be misleading.
A model might require longer or more complex prompts, or more iterations, to produce output of the same quality as another model. Intelligence shows up in the quality of the response, not in raw token cost.
Tokenizer efficiency also matters.
Claude's tokenizer produces approximately 16% more tokens than GPT-4o for the same input. That varies by domain: for mathematical equations the overhead is about 21%, and for Python code Claude generates roughly 30% more tokens.
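A sketch of how tokenizer overhead changes the effective price, using the overhead figures above. The dollar prices and function names here are hypothetical, purely for illustration:

```python
# Adjust an advertised per-token price for tokenizer overhead.
# Overhead percentages come from the comparison above; prices are made up.

OVERHEAD = {          # extra tokens vs. GPT-4o's tokenizer, by domain
    "english": 0.16,
    "math": 0.21,
    "python": 0.30,
}

def effective_price(advertised_price_per_mtok: float, domain: str) -> float:
    """Price per million GPT-4o-equivalent tokens of input."""
    return advertised_price_per_mtok * (1 + OVERHEAD[domain])

# A model advertised at $3.00/MTok effectively costs ~$3.90/MTok
# when tokenizing Python code.
print(round(effective_price(3.00, "python"), 2))  # 3.9
```

The point: two per-token prices are only comparable after normalizing to the same tokenizer.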
When comparing models on real tasks:
While a model like GPT-5 may have a cheaper per-token price than Sonnet-4, an agent using GPT-5 might consume far more tokens and require more turns to get a good result, negating the cost savings.
The ultimate cost combines token price, total tokens used, and the number of turns required to complete a task.
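That combined cost can be sketched as a simple product. All of the numbers below are hypothetical; they just illustrate how a lower per-token price can still lose once token volume and turn count are factored in:

```python
def task_cost(price_per_mtok: float, tokens_per_turn: float, turns: int) -> float:
    """Total dollars to complete one task: price x tokens x turns."""
    return price_per_mtok * tokens_per_turn * turns / 1_000_000

# Hypothetical: a cheaper-per-token model that needs more tokens and turns.
cheap  = task_cost(price_per_mtok=1.25, tokens_per_turn=40_000, turns=12)  # $0.60
pricey = task_cost(price_per_mtok=3.00, tokens_per_turn=15_000, turns=4)   # $0.18
print(cheap > pricey)  # True: the "cheaper" model costs more per task
```

Benchmarking on end-to-end tasks, rather than per-token price sheets, is what surfaces this difference.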

