Hidden costs in AI deployment: Why Claude models may be 20-30% more expensive than GPT in enterprise settings

It's a well-known fact that different model families can use different tokenizers. However, there has been limited analysis of how the process of "tokenization" itself varies across these tokenizers. Do all tokenizers produce the same number of tokens for a given input text? If not, how different are the generated tokens? How significant are the differences?

In this article, we explore these questions and examine the practical implications of tokenization variability. We present a comparative story of two frontier model families: OpenAI's ChatGPT vs Anthropic's Claude. Although their advertised "cost-per-token" figures are highly competitive, experiments reveal that Anthropic models can be 20–30% more expensive than GPT models.

API Pricing — Claude 3.5 Sonnet vs GPT-4o

As of June 2024, the pricing structure for these two advanced frontier models is highly competitive. Both Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o have identical costs for output tokens, while Claude 3.5 Sonnet offers a 40% lower cost for input tokens.

Source: Vantage

The hidden “tokenizer inefficiency”

Despite the lower input token rates of the Anthropic model, we observed that the total cost of running experiments (on a given set of fixed prompts) is much lower with GPT-4o than with Claude 3.5 Sonnet.

Why?

The Anthropic tokenizer tends to break down the same input into more tokens than OpenAI's tokenizer. This means that, for identical prompts, Anthropic models produce considerably more tokens than their OpenAI counterparts. As a result, while the per-token cost for Claude 3.5 Sonnet's input may be lower, the increased tokenization can offset these savings, leading to higher overall costs in practical use cases.

This hidden cost stems from the way Anthropic's tokenizer encodes information, often using more tokens to represent the same content. The token count inflation has a significant impact on costs and context window utilization.
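
The interaction between a lower input price and a higher token count can be sketched with a few lines of arithmetic. The prices below are illustrative assumptions chosen only to match the description above (identical output price, 40% lower input price), and the sketch further assumes that the same fractional token overhead applies to both input and output; the function names are hypothetical.

```python
# Illustrative per-million-token prices in USD. These are assumptions chosen to
# match "identical output price, 40% lower input price"; check current price lists.
GPT_PRICES = {"input": 5.00, "output": 15.00}
CLAUDE_PRICES = {"input": 3.00, "output": 15.00}


def cost_usd(input_tokens: float, output_tokens: float, prices: dict) -> float:
    """Cost of one workload given token counts and per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


def claude_to_gpt_cost_ratio(gpt_input_tokens: float, gpt_output_tokens: float,
                             overhead: float) -> float:
    """Ratio of Claude cost to GPT cost for the same text.

    Assumes the same fractional token overhead on input and output tokens.
    """
    scale = 1.0 + overhead
    claude = cost_usd(gpt_input_tokens * scale, gpt_output_tokens * scale, CLAUDE_PRICES)
    gpt = cost_usd(gpt_input_tokens, gpt_output_tokens, GPT_PRICES)
    return claude / gpt


if __name__ == "__main__":
    workloads = [("balanced", 1_000, 1_000), ("output-heavy", 200, 2_000)]
    for label, tokens_in, tokens_out in workloads:
        ratio = claude_to_gpt_cost_ratio(tokens_in, tokens_out, overhead=0.30)
        print(f"{label}: Claude/GPT cost ratio = {ratio:.2f}")
```

Under these assumptions the ratio depends on the input/output mix: the lower input price matters most for input-heavy workloads, while the token overhead dominates as the share of output tokens grows.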

Domain-dependent tokenization inefficiency

Different types of domain content are tokenized differently by Anthropic's tokenizer, leading to varying levels of increased token counts compared to OpenAI's models. The AI research community has noted similar tokenization differences. We tested our findings on three popular domains, namely: English articles, code (Python) and math.

% Token Overhead of Claude 3.5 Sonnet Tokenizer (relative to GPT-4o) Source: Lavanya Gupta

When comparing Claude 3.5 Sonnet to GPT-4o, the degree of tokenizer inefficiency varies significantly across content domains. For English articles, Claude's tokenizer produces approximately 16% more tokens than GPT-4o for the same input text. This overhead increases sharply with more structured or technical content: for mathematical equations, the overhead stands at 21%, and for Python code, Claude generates 30% more tokens.

This variation arises because some content types, such as technical documents and code, often contain patterns and symbols that Anthropic's tokenizer fragments into smaller pieces, leading to a higher token count. In contrast, more natural language content tends to exhibit a lower token overhead.
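
The overhead percentages can be reproduced from the per-domain token counts reported in the article's table. The following is a minimal sketch; the helper name `overhead_pct` is hypothetical, and the counts come from single sample inputs, so they should be read as indicative rather than as stable averages.

```python
# Token counts for the article's three sample inputs: (GPT-4o, Claude 3.5 Sonnet).
SAMPLE_COUNTS = {
    "English articles": (77, 89),
    "Code (Python)": (60, 78),
    "Math": (114, 138),
}


def overhead_pct(gpt_tokens: int, claude_tokens: int) -> float:
    """Extra tokens produced by Claude, as a percentage of the GPT token count."""
    return 100.0 * (claude_tokens - gpt_tokens) / gpt_tokens


if __name__ == "__main__":
    for domain, (gpt_tokens, claude_tokens) in SAMPLE_COUNTS.items():
        print(f"{domain}: {overhead_pct(gpt_tokens, claude_tokens):.1f}% overhead")
```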

Other practical implications of tokenizer inefficiency

Beyond the direct implication on costs, there is also an indirect impact on context window utilization. While Anthropic models claim a larger context window of 200K tokens, versus OpenAI's 128K tokens, the tokenizer's verbosity means the effective usable token space may be smaller for Anthropic models. Hence, there could be a small or large difference between the "advertised" and the "effective" context window sizes.
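
One way to put a number on the "effective" window is to express it in GPT-equivalent tokens, that is, how much text (measured in GPT tokens) fits once the token overhead is accounted for. This is a rough sketch under the assumption that a single overhead figure applies uniformly to the whole context; the function name is hypothetical.

```python
def gpt_equivalent_window(advertised_tokens: int, overhead: float) -> int:
    """Advertised context window expressed in GPT-equivalent tokens.

    `overhead` is the fractional token inflation relative to GPT (0.30 for 30%).
    """
    return int(advertised_tokens / (1.0 + overhead))


if __name__ == "__main__":
    # Overheads taken from the article's per-domain measurements.
    for domain, overhead in [("English", 0.16), ("Math", 0.21), ("Python code", 0.30)]:
        effective = gpt_equivalent_window(200_000, overhead)
        print(f"{domain}: 200K advertised is about {effective:,} GPT-equivalent tokens")
```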

Implementation of tokenizers

GPT models use Byte Pair Encoding (BPE), which merges frequently co-occurring character pairs to form tokens. Specifically, the latest GPT models use the open-source o200k_base tokenizer. The tokenizer that each GPT model uses (in the tiktoken library) is listed in the mapping below.
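
To make the merging idea concrete, here is a toy character-level BPE trainer. It is a simplified sketch for illustration only: production tokenizers such as o200k_base operate on bytes with a pre-tokenization step and a far larger merge table, and all function names here are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words: dict):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words: dict, pair: tuple) -> dict:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: str, num_merges: int):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    merges, words = train_bpe("low low low lower lowest", num_merges=2)
    print(merges)
    print(words)
```

A tokenizer with fewer or different learned merges splits the same text into more pieces, which is the effect the token counts in this article measure.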

{
    # reasoning
    "o1-xxx": "o200k_base",
    "o3-xxx": "o200k_base",

    # chat
    "chatgpt-4o-": "o200k_base",
    "gpt-4o-xxx": "o200k_base",  # e.g., gpt-4o-2024-05-13
    "gpt-4-xxx": "cl100k_base",  # e.g., gpt-4-0314, etc., plus gpt-4-32k
    "gpt-3.5-turbo-xxx": "cl100k_base",  # e.g., gpt-3.5-turbo-0301, -0401, etc.
}
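
A mapping like this is typically consulted by prefix: a dated model name is matched against the known prefixes to find its encoding. The sketch below illustrates that lookup with the "xxx" placeholders dropped. The helper `encoding_for` is hypothetical and is not tiktoken's implementation; the library provides its own lookup, `tiktoken.encoding_for_model`.

```python
from typing import Optional

# Prefix table adapted from the mapping above, with the "xxx" placeholders dropped.
PREFIX_TO_ENCODING = {
    "o1-": "o200k_base",
    "o3-": "o200k_base",
    "chatgpt-4o-": "o200k_base",
    "gpt-4o-": "o200k_base",
    "gpt-4-": "cl100k_base",
    "gpt-3.5-turbo-": "cl100k_base",
}


def encoding_for(model_name: str) -> Optional[str]:
    """Return the encoding name for a model, or None if no prefix matches."""
    # Check longer prefixes first so a more specific entry wins over a shorter one.
    for prefix in sorted(PREFIX_TO_ENCODING, key=len, reverse=True):
        if model_name.startswith(prefix):
            return PREFIX_TO_ENCODING[prefix]
    return None


if __name__ == "__main__":
    for name in ["gpt-4o-2024-05-13", "gpt-4-0314", "claude-3-5-sonnet"]:
        print(name, "->", encoding_for(name))
```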

Unfortunately, not much can be said about Anthropic's tokenizers, as they are not as directly and easily available as GPT's. Anthropic released their Token Counting API in Dec 2024. However, it was soon discontinued in later 2025 versions.

Latenode reports that "Anthropic uses a unique tokenizer with only 65,000 token variations, compared to OpenAI's 100,261 token variations for GPT-4." This Colab notebook contains Python code to analyze the tokenization differences between GPT and Claude models. Another tool that enables interfacing with some common, publicly available tokenizers validates our findings.

The ability to proactively estimate token counts (without invoking the actual model API) and budget costs is crucial for AI enterprises.
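
As a rough illustration of such an estimate, the sketch below scales locally measured GPT token counts by the per-domain overheads reported in this article to approximate Claude token counts for a mixed workload. The overhead values come from small samples, the workload figures are made up for the example, and the function names are hypothetical; treat the result as a planning estimate, not a measurement.

```python
# Per-domain overheads reported in the article (Claude 3.5 Sonnet relative to GPT-4o).
DOMAIN_OVERHEAD = {"english": 0.16, "code": 0.30, "math": 0.21}


def estimate_claude_tokens(gpt_tokens_by_domain: dict) -> float:
    """Estimate Claude token counts from GPT token counts split by domain."""
    return sum(
        tokens * (1.0 + DOMAIN_OVERHEAD[domain])
        for domain, tokens in gpt_tokens_by_domain.items()
    )


def blended_overhead(gpt_tokens_by_domain: dict) -> float:
    """Overall fractional overhead for a workload with a given domain mix."""
    total_gpt = sum(gpt_tokens_by_domain.values())
    return estimate_claude_tokens(gpt_tokens_by_domain) / total_gpt - 1.0


if __name__ == "__main__":
    # Hypothetical monthly workload, counted in GPT tokens (e.g. with tiktoken).
    workload = {"english": 500_000, "code": 300_000, "math": 200_000}
    print(f"Estimated Claude tokens: {estimate_claude_tokens(workload):,.0f}")
    print(f"Blended overhead: {blended_overhead(workload):.1%}")
```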

Key Takeaways

Anthropic's competitive pricing comes with hidden costs:
While Anthropic's Claude 3.5 Sonnet offers 40% lower input token costs compared to OpenAI's GPT-4o, this apparent cost advantage can be misleading due to differences in how input text is tokenized.
Hidden "tokenizer inefficiency":
Anthropic models are inherently more verbose. For businesses that process large volumes of text, understanding this discrepancy is crucial when evaluating the true cost of deploying models.
Domain-dependent tokenizer inefficiency:
When choosing between OpenAI and Anthropic models, evaluate the nature of your input text. For natural language tasks, the cost difference may be minimal, but technical or structured domains may lead to significantly higher costs with Anthropic models.
Effective context window:
Due to the verbosity of Anthropic's tokenizer, its larger advertised 200K context window may offer less effective usable space than OpenAI's 128K, leading to a potential gap between the advertised and the actual context window.

Anthropic did not respond to VentureBeat's requests for comment by press time. We'll update the story if they respond.


Domain	Model Input	GPT Tokens	Claude Tokens	% Token Overhead
English articles		77	89	~16%
Code (Python)		60	78	~30%
Math		114	138	~21%
