LLM (Large Language Model)

A neural network trained on massive text datasets that can generate, summarize, and reason about language.

A large language model is a neural network — almost always a transformer — trained on massive text datasets to predict the next token in a sequence. That one objective, applied at sufficient scale, produces a system that can generate, summarize, translate, answer questions, and follow complex instructions.
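To make "predict the next token" concrete, here is a toy sketch. It is not a transformer and not how any production LLM works internally: it is a simple bigram counter, and the names `train_bigram` and `generate` are illustrative assumptions. It shows only the shape of the objective: given what came before, pick a likely next token, then repeat.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # assumption: whitespace tokenization, for illustration only
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_tokens=5):
    """Greedy decoding: repeatedly append the most frequent next token."""
    out = [start]
    for _ in range(max_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last token was never followed by anything in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


corpus = "the cat sat on the mat and the cat sat down"
model = train_bigram(corpus)
print(generate(model, "the", max_tokens=2))  # prints "the cat sat"
```

A real LLM replaces the count table with a neural network conditioned on the whole preceding context, but the generation loop has the same outline.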

The "next token prediction" framing sounds limiting. It isn't. At scale, the model has to build a useful internal representation of the world to predict text well, which is why these systems can reason about things they were never explicitly taught to reason about. The capability is real. The hype around it is also real, which means calibration matters.

For practical purposes, you rarely need to care about architecture details. What you do need to evaluate: context window size, reliability on your specific task, latency, and cost per token. A smaller, cheaper model that handles 95% of your cases reliably can be a better choice than a frontier model, depending on what errors on the remaining 5% cost you.
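The trade-off between per-token price and error cost can be sketched as simple arithmetic. Every number below (prices, error rates, token counts) is a made-up assumption for illustration, not a measurement of any real model, and `expected_cost_per_request` is a hypothetical helper.

```python
def expected_cost_per_request(tokens_per_request, price_per_1k_tokens,
                              error_rate, cost_per_error):
    """Inference cost plus the expected cost of handling a wrong answer."""
    inference = tokens_per_request / 1000 * price_per_1k_tokens
    return inference + error_rate * cost_per_error


def cheaper_model(cost_per_error, tokens_per_request=2000):
    """Compare two hypothetical models at a given cost per error."""
    # Assumed figures: the small model is 20x cheaper per token but errs more often.
    small = expected_cost_per_request(tokens_per_request, 0.0005, 0.05, cost_per_error)
    frontier = expected_cost_per_request(tokens_per_request, 0.01, 0.02, cost_per_error)
    return "small" if small < frontier else "frontier"


# When a mistake is cheap to fix, the small model wins on total cost.
print(cheaper_model(cost_per_error=0.10))
# When a mistake is expensive, the lower error rate pays for itself.
print(cheaper_model(cost_per_error=5.00))
```

Under these assumed figures, the break-even point is a cost per error of roughly $0.63; the useful exercise is plugging in your own measured error rates and failure costs.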

The most consequential choice isn't which LLM to use. It's whether to use a general-purpose model as-is, fine-tune one for your domain, or add retrieval on top of it with retrieval-augmented generation (RAG). Those decisions determine cost, maintainability, and how hard it is to recover when the model gets something wrong.

Related Terms