Streaming
Streaming is a technique in which an AI model's response is delivered incrementally, token by token, as it is generated, instead of being held back until the complete response is ready.
How Streaming Works in AI
Why Streaming Matters for Business
Frequently asked questions
Streaming does not change the quality or content of a response. It changes only how the response is delivered, not what is generated: for the same generation, the final output is identical whether streaming is enabled or disabled. It is purely a delivery optimisation.
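As a rough illustration of why the output is the same either way, the sketch below delivers one simulated generation in both modes. The token list and the function names (`generate_tokens`, `respond_blocking`, `respond_streaming`) are hypothetical stand-ins for a real model and API, used here only to show the idea.

```python
from typing import Iterator

# Hypothetical stand-in for a model: a fixed list of tokens.
TOKENS = ["Streaming", " changes", " delivery", ",", " not", " content", "."]


def generate_tokens() -> Iterator[str]:
    """Simulate a model producing tokens one by one."""
    for token in TOKENS:
        yield token


def respond_blocking() -> str:
    """Non-streaming delivery: collect every token, then return the full text."""
    return "".join(generate_tokens())


def respond_streaming() -> Iterator[str]:
    """Streaming delivery: pass each token to the caller as soon as it exists."""
    yield from generate_tokens()


# The caller of the streaming version can display tokens as they arrive;
# joining them afterwards gives the same text as the blocking version.
streamed_text = "".join(respond_streaming())
blocking_text = respond_blocking()
```

Both paths consume the same token sequence, so the only difference is when the caller gets to see each piece.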
Time-to-first-token (TTFT) measures how long a user waits before seeing the first token of the response. It is a key latency metric for streamed AI applications, as it determines the perceived responsiveness of the system.
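One way TTFT might be measured on the client is sketched below: start a timer when the request begins and record the elapsed time when the first token arrives. The `simulated_stream` generator and its delays are assumptions standing in for a real streaming API; only the timing logic in `measure_stream` is the point.

```python
import time
from typing import Iterator, Tuple


def simulated_stream(first_token_delay: float = 0.05, gap: float = 0.01) -> Iterator[str]:
    """Hypothetical token stream: waits before the first token, then between tokens."""
    time.sleep(first_token_delay)
    yield "Hello"
    for token in [",", " world", "!"]:
        time.sleep(gap)
        yield token


def measure_stream(stream: Iterator[str]) -> Tuple[str, float, float]:
    """Return (full_text, ttft_seconds, total_seconds) for a token stream."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for token in stream:
        if ttft is None:
            # First token has arrived: this elapsed time is the TTFT.
            ttft = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return "".join(parts), ttft, total


text, ttft, total = measure_stream(simulated_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms")
```

Comparing `ttft` with `total` shows how much of the wait a streamed interface hides from the user: the response starts appearing after `ttft`, even though generation finishes only at `total`.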
Streaming requires more complex client-side implementation to handle partial responses. It can also make it harder to implement features that depend on the complete response, such as formatting entire tables or validating structured output before display.
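A common way to handle the structured-output case is to buffer the streamed chunks and validate only once the stream has ended. The sketch below assumes the model is expected to return a JSON object; the helper name `collect_structured_output` and the sample chunks are hypothetical.

```python
import json
from typing import Iterable, Optional


def collect_structured_output(chunks: Iterable[str]) -> Optional[dict]:
    """Buffer streamed chunks and parse them only after the stream has ended.

    Returns the parsed object, or None if the full text is not a valid JSON object.
    """
    buffer = []
    for chunk in chunks:
        # A partial chunk is rarely valid JSON on its own, so just store it.
        buffer.append(chunk)
    try:
        parsed = json.loads("".join(buffer))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Hypothetical chunks, split at arbitrary points as a stream might deliver them.
chunks = ['{"plan": "pro", ', '"seats": ', '12}']
result = collect_structured_output(chunks)         # complete stream
truncated = collect_structured_output(chunks[:2])  # stream cut off early
```

The trade-off is that nothing can be shown to the user until the last chunk arrives, which gives up the latency benefit of streaming for that particular response.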