Token

The basic unit of text that language models read and generate — roughly three-quarters of an English word on average.

A token is the smallest chunk of text a large language model processes. Not a word, but often a piece of one. Common words like "the" are a single token. Longer or rarer words may get split: depending on the tokenizer, "understanding" might become "under" + "standing." As a rule of thumb for English, one token is roughly 0.75 words, or about four characters. A typical business email runs around 200 tokens. A 200-page document is about 75,000 tokens, assuming around 280 words per page.
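The rules of thumb above can be turned into a quick estimator. This is a rough sketch only: the function names are illustrative, the ratios are the averages quoted in this entry, and actual counts depend on the specific model's tokenizer and on the language of the text.

```python
import math

# Rule-of-thumb ratios for English text, taken from the entry above.
# Real tokenizers differ by model, so treat results as estimates.
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from raw text (about four characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_tokens_from_words(150))     # a 150-word email
    print(estimate_tokens_from_words(56_250))  # 200 pages at about 280 words each
```

For billing or context-limit decisions, an estimate like this is a starting point; the provider's own tokenizer or token-counting endpoint gives the authoritative number.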

Tokens are how you pay. Every major provider (OpenAI, Anthropic, Google) prices input and output tokens separately, and output tokens typically cost more. A support system processing 10,000 conversations a day at 500 tokens each burns 5 million tokens daily. Whether that's noise or a real budget line depends entirely on the model tier you picked.
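The arithmetic behind that example can be sketched as follows. The prices and the 400/100 split between input and output tokens are hypothetical placeholders chosen for illustration, not any provider's actual rates; substitute the figures from your provider's price list.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Price in US dollars per one million tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


def daily_tokens(conversations: int, tokens_each: int) -> int:
    """Total tokens processed per day."""
    return conversations * tokens_each


def daily_cost(conversations: int, input_tokens_each: int,
               output_tokens_each: int, pricing: Pricing) -> float:
    """Daily spend in dollars, pricing input and output tokens separately."""
    input_total = conversations * input_tokens_each
    output_total = conversations * output_tokens_each
    return (input_total * pricing.input_per_million
            + output_total * pricing.output_per_million) / 1_000_000


# Hypothetical tiers with a 10x price gap, for illustration only.
SMALL_TIER = Pricing(input_per_million=0.50, output_per_million=1.50)
LARGE_TIER = Pricing(input_per_million=5.00, output_per_million=15.00)

if __name__ == "__main__":
    print(daily_tokens(10_000, 500))  # 5,000,000 tokens per day
    # Assumed split: 400 input tokens and 100 output tokens per conversation.
    print(daily_cost(10_000, 400, 100, SMALL_TIER))
    print(daily_cost(10_000, 400, 100, LARGE_TIER))
```

Under these assumed prices the same workload costs a few dollars a day on the small tier and ten times that on the large one, which is the gap that makes the tier choice a budget decision.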

Tokens also define what the model can work with at once. A model's context window is measured in tokens. Exceed it and you need to chunk the input, which introduces engineering complexity and new failure modes: losing context at chunk boundaries, inconsistent retrieval, harder debugging. Transformer models attend over every token in the window at once, and in standard self-attention the compute grows roughly quadratically with the number of tokens, which is why larger windows cost more compute and more money.
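One common way to chunk an over-long input is a sliding window with some overlap between neighbours, so that text near a boundary appears in both chunks. The sketch below is a minimal illustration, not a production splitter: `chunk_tokens` is a hypothetical helper, and the whitespace split in the usage example is only a stand-in for a real tokenizer.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], max_tokens: int, overlap: int = 0) -> List[Sequence[T]]:
    """Split a token sequence into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens, which reduces (but does not
    eliminate) the loss of context at chunk boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window has reached the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace splitting is a placeholder for a model-specific tokenizer.
    words = "the quick brown fox jumps over the lazy dog again".split()
    for chunk in chunk_tokens(words, max_tokens=4, overlap=1):
        print(chunk)
```

The overlap is a trade-off: larger values preserve more boundary context but mean more tokens are processed, and paid for, twice.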

Treat token usage like any other cloud resource. Know your per-transaction counts, right-size models to your workload, and watch for runaway costs from verbose system prompts or bloated context loads. A 10x cost difference between models is common. The model that's "good enough" for your use case is almost always the right choice.

Related Terms