On June 10, OpenAI slashed the list price of its flagship reasoning model, o3, by roughly 80%: from $10 per million input tokens and $40 per million output tokens down to $2 and $8, respectively. API resellers reacted immediately: Cursor now counts one o3 request the same as a GPT-4o call, and Windsurf lowered its "o3-reasoning" tier to a single credit as well. For Cursor users, that's a ten-fold price cut overnight.
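To put the new list prices in concrete terms, here's a back-of-the-envelope sketch of what one request costs before and after the cut. The request size (10k input / 2k output tokens) is an illustrative assumption, not a measurement of a typical Cursor call.

```python
# Back-of-the-envelope cost comparison for o3 before and after the June 10 price cut.
# The request size (10k input / 2k output tokens) is an assumed example, not a benchmark.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one request at the given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

OLD = (10.00, 40.00)   # $/M input, $/M output before the cut
NEW = (2.00, 8.00)     # $/M input, $/M output after the cut

old_cost = request_cost(10_000, 2_000, *OLD)
new_cost = request_cost(10_000, 2_000, *NEW)

print(f"before: ${old_cost:.3f}  after: ${new_cost:.3f}  "
      f"savings: {1 - new_cost / old_cost:.0%}")
# before: $0.180  after: $0.036  savings: 80%
```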
Latency improved in parallel. OpenAI hasn't published new latency figures, and third-party dashboards still show time to first token (TTFT) in the 15s to 20s range for long prompts. Still, thanks to new Nvidia GB200 clusters and a revamped scheduler that shards long prompts across more GPUs, o3 feels snappier in real use. It's still slower than lightweight models, but no longer coffee-break slow.
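If you'd rather measure TTFT yourself than trust third-party dashboards, a rough check looks like the sketch below. It assumes the model is exposed as `o3` through the OpenAI Chat Completions streaming API and that `OPENAI_API_KEY` is set; the prompt is a placeholder you'd swap for something long enough to hit the slow case.

```python
# Rough time-to-first-token (TTFT) measurement against the OpenAI streaming API.
# Assumes the model is exposed as "o3" and OPENAI_API_KEY is set in the environment.
import time
from openai import OpenAI

client = OpenAI()

# Placeholder; substitute a realistically long prompt to reproduce the 15-20s case.
prompt = "Refactor this function for readability: ..."

start = time.monotonic()
first_token_at = None

stream = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    # The first chunk carrying visible content marks the end of the TTFT window.
    if first_token_at is None and chunk.choices[0].delta.content:
        first_token_at = time.monotonic()
        break

if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.1f}s")
```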
Claude 4 is quick but sloppy
A lot of the community's oxygen has gone to Claude 4. It's undeniably fast, and its 200k context window feels luxurious. But in day-to-day coding, I, along with many Reddit and Discord posters, keep tripping over Claude's action bias: it happily invents stubbed functions instead of real implementations, fakes unit tests, or rewrites mocks it was told to leave alone. The speed is great; the follow-through often isn't.