GroveAI Glossary

Beam Search

Beam search is a text generation strategy that explores multiple candidate sequences simultaneously, keeping the top-scoring options at each step to find a globally better output than greedy decoding.

Beam search is a decoding algorithm used in sequence generation tasks. Rather than greedily selecting the single most probable next token at each step, beam search maintains multiple candidate sequences (called beams) in parallel. At each generation step, it expands all current beams, scores the results, and keeps only the top-scoring candidates.

The beam width (or beam size) parameter controls how many candidates are maintained: a beam width of 5 means the algorithm tracks the 5 most promising sequences at all times. This allows the algorithm to explore paths that might not start with the single highest-probability token but lead to better overall sequences.

Beam search strikes a balance between exhaustive search (computationally infeasible for long sequences) and greedy decoding (which can miss globally optimal solutions). It is particularly effective for tasks where output quality is more important than diversity, such as machine translation, speech recognition, and summarisation.

Why Beam Search Matters for Business

The choice of decoding strategy directly affects the quality and characteristics of AI-generated outputs. Beam search tends to produce more polished, coherent outputs than sampling-based methods, making it suitable for production applications where consistency and accuracy are paramount. In machine translation systems, beam search has been the standard approach for years because it reliably produces high-quality translations. In document summarisation, it helps ensure that generated summaries are grammatically correct and factually consistent with the source material. However, beam search can also produce outputs that are safe but bland, lacking the natural variation of human language. For conversational AI and creative applications, sampling-based approaches like top-k or top-p sampling are often preferred. Understanding these trade-offs helps teams choose the right generation strategy for each use case.

FAQ

When should I use beam search instead of sampling?

Use beam search for tasks requiring accuracy and consistency, such as translation or summarisation. Use sampling (top-k, top-p) for conversational or creative tasks where natural variety is desired. Many modern chat models default to sampling-based approaches.

What beam width should I use?

Typical beam widths range from 2 to 10. Larger beams explore more possibilities but increase computation time and memory usage. Research suggests that very large beam widths often yield diminishing returns and can even degrade quality.

Do conversational AI systems use beam search?

Most modern conversational AI systems use sampling-based methods rather than beam search, as sampling produces more natural and varied responses. Beam search is more common in behind-the-scenes tasks like translation and transcription.
