When people talk about AI, they usually focus on outputs: the perfect email, the killer business plan, the bedtime story written in the voice of Morgan Freeman. What they don’t talk about is the meter running in the background. Every interaction with an AI model - every prompt you type, every word it spits back - costs you something.

Not metaphorically. Literally. Money.

Welcome to the world of tokens - the micro-units of language that AI models chew through to do their job. If you’re a founder, investor, or even a casual GenAI user, you need to understand tokens. Because in AI, tokens are money.

This post is your no-nonsense guide to what tokens are, how they’re counted, why they cost what they do, and what to think about to get the best ROI.


What is a token?

A token is a chunk of text. It could be a whole word, part of a word, punctuation, or even a trailing space.

To give you a rough idea:

  • "apple" = 1 token
  • "hamburger" = 3 tokens ("ham", "bur", "ger")
  • "Let's go!" = 4 tokens ("Let", "'s", "go", "!")

Different AI models use different tokenisers - rules that decide how to break up text - but the general rule of thumb is:

1 token ≈ 4 characters or 0.75 words in English.

This matters because you’re not paying by the word. You’re paying by the token.

If you want to get to grips with it yourself, you can actually have a play with OpenAI's version here: https://platform.openai.com/tokenizer
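The rule of thumb above is easy to turn into a back-of-envelope estimator. This is a sketch, not a real tokeniser - `estimate_tokens` is a made-up helper that just averages the two heuristics (≈4 characters per token, ≈0.75 words per token) - but it's close enough for budgeting English prose:

```python
# Rough token estimator based on the "1 token ≈ 4 characters ≈ 0.75 words"
# rule of thumb. Real tokenisers (like the one behind OpenAI's tokenizer
# page) will give different counts, but this is fine for ballpark budgeting.

def estimate_tokens(text: str) -> int:
    """Estimate token count by averaging the character and word heuristics."""
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Let's get started on revenue projections."))  # → 9
```

Run it against a real tokeniser and you'll see drift of a token or two either way - which is exactly why the heuristic is for budgets, not bills.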


Why AI models use tokens (and not words)

Imagine a model trying to process your input one character at a time. It would be slow, blind to your intended context, and expensive.

Tokens give models a way to process text efficiently in manageable chunks. They strike a balance between granularity (distinguishing "re-" from "redo" and "revenue") and performance (not blowing the compute budget on every single "the").

When an AI model runs, it reads your prompt token by token, processes it, and generates a response - again, token by token. You pay for both.


What are you paying for?

Every AI model you use is essentially charging you for compute time, memory access, and model inference - priced in tokens. Think of tokens like electricity: they’re not the point of the interaction, but you’re not getting anything done without them.

Providers charge separately for input and output tokens - usually quoted per million - and rates vary widely depending on model size, context window, and vendor. Here's a rough breakdown:

Provider  | Model           | Input (per 1M tokens) | Output (per 1M tokens) | Context limit
OpenAI    | GPT-4 Turbo     | $10.00                | $30.00                 | 128K tokens
Anthropic | Claude 3 Opus   | $15.00                | $18.75                 | 200K tokens
Google    | Gemini 2.5 Pro  | $1.25 - $2.50         | $10 - $15              | 1M tokens
Meta      | Llama 3/4 (API) | $0.10 - $0.59         | $0.50 - $0.88          | ~130K tokens

So, summarising a 3-page memo might cost a fraction of a cent on Llama, but around $0.04 on Claude Opus. Not catastrophic - until you're doing it 10,000 times.

This is where smart usage (and smart tooling) makes a difference.


Real world example: the due diligence memo

Say you’re a PE firm summarising CIMs using Claude Opus:

  • 3-page document = ~2,000 input tokens = $0.03
  • Summary output = ~500 tokens = $0.009
  • Total = $0.039 per doc

Do that for 500 deals a year and you’re close to $20 just for one use case. Triple that if you run multiple drafts, models, or stakeholders.
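The memo maths above is simple enough to script. This sketch uses the Claude Opus rates from the pricing table ($15 input / $18.75 output per million tokens); `doc_cost` is an illustrative helper, not part of any official SDK:

```python
# Sketch of the due diligence memo maths: you pay for the prompt you send
# AND the response you get back, each at its own per-million-token rate.

def doc_cost(input_tokens: int, output_tokens: int,
             in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Dollar cost of one request: input and output billed separately."""
    return (input_tokens * in_rate_per_m
            + output_tokens * out_rate_per_m) / 1_000_000

per_doc = doc_cost(2_000, 500, 15.00, 18.75)   # Claude Opus table rates
print(f"Per document: ${per_doc:.3f}")          # → Per document: $0.039
print(f"500 deals/year: ${per_doc * 500:.2f}")  # → 500 deals/year: $19.69
```

Swap in the Llama rates from the table and the same 500-deal workload drops to pennies - which is the whole argument for model choice.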

Could you use a cheaper model like Llama for 80% of that work at a fraction of the cost? Probably. Could you automatically trim prompts, cache contexts, and reduce output verbosity by 30% without losing quality? Definitely - if you know how.

That’s where teams like GiantKelp earn their keep: not by throwing the fanciest model at every problem, but by using the right one, the right way.


Tokenisation isn't standard

One frustrating twist? Different models tokenise differently. That means the same sentence can be cheaper on one model than another, even if pricing looks similar.

  • "Let's get started on revenue projections."
    • GPT-4: 8 tokens
    • Claude: 7 tokens
    • Gemini: 6 tokens
    • Llama: 10 tokens

It’s like each provider invented their own version of Scrabble scoring. And it means prompt design isn’t portable - another reason not to hardwire yourself to a single tool.


Why this matters

Tokens aren’t just a billing mechanism. They’re the atomic unit of how AI works. Understanding tokens helps you:

  • Control cost without compromising performance
  • Choose the right model for the right job
  • Build scalable tools without accidental waste
  • Ask better questions in vendor meetings

And when you're serious about AI - as an operator or an investor - that's the difference between playing with AI and using it strategically.


Five quick wins for smarter token use

  1. Use shorter, clearer prompts - Remove filler, test variations
  2. Set output limits - Don’t let the model ramble
  3. Cache and reuse - Don’t resend the same content repeatedly
  4. Trim chat history - Especially in assistant-style apps
  5. Choose the right model - Don’t use a sledgehammer on a thumbtack
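Tip 4 is the easiest to sketch in code: drop the oldest messages once the conversation exceeds a token budget. The helper names here are made up, and `rough_tokens` uses the ~4-characters-per-token heuristic from earlier; a production app would count with the model's actual tokeniser:

```python
# Sketch of tip 4: keep only the most recent messages that fit a token
# budget, so every request stops resending the whole conversation.

def rough_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token (minimum of 1)."""
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined token estimate fits budget."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest-first
        cost = rough_tokens(msg)
        if total + cost > budget:
            break                        # oldest survivors get dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order

history = ["old question " * 50, "old answer " * 50,
           "recent question", "recent answer"]
print(trim_history(history, budget=20))
# → ['recent question', 'recent answer']
```

Ten lines like this, run before every API call, is often the difference between a chat app whose cost grows linearly with conversation length and one whose cost stays flat.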

And yes, these are the kinds of things we obsess over at GiantKelp. Because in AI, quality and efficiency aren’t opposites. They’re a compound return.


Final thought: listen for the meter

Tokens are the sound of your AI engine ticking over. You don’t need to count every one. But you do need to know they’re there. Because they matter and they add up - to cost, to performance, to outcomes.

The good news? Once you learn to hear them, you can start making smarter choices.

And if you want someone who already knows where the efficiencies are hidden? You know where to find us.

·- ··

At GiantKelp, we build AI tools which elevate your people and your business. Talk to us to find out how. #GrowLikeKelp

·- ··

#GenerativeAI #AIforBusiness #TokensExplained #SMBtech #DigitalTransformation