GroveAI
Glossary

Context Window

A context window is the maximum amount of text (measured in tokens) that a language model can process in a single interaction, encompassing both the input prompt and the generated response.

What is a Context Window?

The context window is the total amount of text a language model can "see" and work with at any given time. It includes everything: the system prompt, conversation history, any documents or data provided, and the model's own response. Once this limit is exceeded, the model cannot process additional information without dropping earlier content.

Context windows are measured in tokens: sub-word units that typically represent 3-4 characters of English text. Modern models offer context windows ranging from 8,000 tokens (about 6,000 words) to over 1 million tokens (roughly 750,000 words), and sizes continue to grow.
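The tokens-to-text relationship above can be sketched with a back-of-the-envelope estimator. This is a heuristic only, not a real tokeniser (exact counts require the provider's tokeniser), and the 1,000-token output reserve is an illustrative assumption:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count using the ~4-characters-per-token rule of thumb
    for English text. A heuristic, not an exact tokeniser."""
    return max(1, round(len(text) / chars_per_token))

def fits_in_window(prompt: str, window_size: int,
                   reserve_for_output: int = 1_000) -> bool:
    """Check whether a prompt likely fits in the window, leaving
    room for the model's response."""
    return estimate_tokens(prompt) + reserve_for_output <= window_size
```

By this estimate, a 400-character prompt is about 100 tokens and fits comfortably in an 8,000-token window, while a 400,000-character document does not.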

How Context Windows Work

The context window is a fundamental architectural constraint of transformer-based models. The attention mechanism that gives transformers their power computes relationships between every pair of tokens in the context, which means computational cost scales quadratically with context length. Longer contexts therefore require more memory and compute to process.

In practice, most applications manage context carefully. For chatbots, this means maintaining a sliding window of recent conversation history. For document analysis, it means chunking large documents and processing them in sections. For retrieval-augmented generation (RAG) systems, it means retrieving only the most relevant passages rather than entire documents.

Not all parts of the context window are equally effective. Research has shown that information in the middle of very long contexts can be less reliably attended to than information at the beginning or end, a phenomenon sometimes called "lost in the middle". This has practical implications for how documents and context should be ordered within prompts.
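The sliding-window strategy for chat history can be sketched as follows. This is a minimal illustration, not a production implementation: `estimate` stands in for a real token counter, and the message format simply mirrors common chat APIs:

```python
def trim_history(messages, max_tokens, estimate):
    """Keep the system prompt plus as many of the most recent messages
    as fit within the token budget (a simple sliding window)."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - estimate(system["content"])
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate(msg["content"])
        if cost > budget:
            break  # older messages are dropped
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))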

Why Context Windows Matter for Business

Context window size directly determines what applications are feasible. A small context window limits the model to short conversations and small documents. A large context window enables processing entire codebases, lengthy legal contracts, full research papers, or extended multi-turn conversations without losing context.

For businesses, this affects architectural decisions. Applications that need to process large documents must either use models with sufficient context windows or implement strategies like chunking and summarisation to work within limits. The trade-off between context size, cost, and speed is a key consideration.

Larger context windows also increase API costs, since providers typically charge per token processed. Processing a 100-page document through a million-token context window costs proportionally more than processing a single page. Understanding these economics is essential for cost-effective AI deployment.
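The per-token economics can be made concrete with simple arithmetic. The rates in this sketch are illustrative assumptions, not any provider's actual pricing; check your provider's rate card:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate API cost in dollars, with prices quoted per million
    tokens (the convention most providers use)."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# A 100-page document (~75,000 tokens) versus a single page (~750 tokens),
# each generating a 1,000-token response, at assumed illustrative rates
# of $3/M input tokens and $15/M output tokens:
hundred_pages = estimate_cost(75_000, 1_000, 3.0, 15.0)  # $0.24
single_page = estimate_cost(750, 1_000, 3.0, 15.0)       # ~$0.017
```

Under these assumed rates, the input tokens dominate the cost of the large-document call, which is why retrieving or summarising before sending pays off.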

Working With Context Limits

Several strategies help organisations work effectively within context window constraints. Retrieval-augmented generation (RAG) retrieves only the most relevant information for each query, keeping context focused and efficient. Summarisation condenses earlier conversation history or long documents into shorter representations that preserve key information. Context management frameworks automatically track token usage and apply strategies like truncation, compression, or hierarchical summarisation as conversations grow long. For applications that must process very large documents, map-reduce approaches handle the document in chunks and combine the results.

Choosing the right model for the task is also important. Not every application needs a million-token context window, and using a smaller, faster model for simple tasks is often more cost-effective than always using the largest available context.
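The map-reduce approach can be sketched in a few lines. In this sketch, `summarise` stands in for a call to a language model, and the character-based chunking is a stand-in for token-aware chunking:

```python
def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks (a simplification;
    real systems chunk on token counts and semantic boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_summarise(document: str, summarise,
                         chunk_size: int = 2_000) -> str:
    """Map: summarise each chunk independently, so no single call
    exceeds the context window.
    Reduce: summarise the concatenated chunk summaries."""
    partials = [summarise(c) for c in chunk(document, chunk_size)]
    return summarise("\n".join(partials))
```

For documents whose combined chunk summaries still exceed the window, the reduce step can be applied recursively (hierarchical summarisation).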

Frequently Asked Questions

How much text fits in a 128,000-token context window?

Approximately 96,000 words or 300-400 pages of standard text. This is enough for most business documents, but varies significantly based on the content type. Code, technical documentation, and non-English text may use more tokens per word.

Is a bigger context window always better?

Not necessarily. While longer contexts allow more information to be processed, models can be less reliable at attending to information buried in very long contexts. For many applications, providing focused, relevant context produces better results than simply maximising the amount of input.

What happens when the context limit is reached?

When the context limit is reached, the model cannot process additional tokens. Applications typically handle this by truncating older conversation history, summarising previous exchanges, or returning an error. Well-designed applications monitor token usage and manage context proactively.

Need help implementing this?

Our team can help you apply these concepts to your business. Book a free strategy call.