Receive tokens as they are generated instead of waiting for the full response.
Streaming lets your application display or process tokens as the model generates them, which reduces perceived latency: users see the first tokens while the rest of the response is still being generated. The response arrives as a continuous stream of small chunks over a persistent HTTP connection.
Routeway supports Server-Sent Events (SSE) streaming using the same format as the OpenAI API. Any client that works with OpenAI streaming will work with Routeway out of the box.
Set "stream": true in your request. The response will be delivered as a series of data: lines, each containing a JSON chunk. The stream ends with data: [DONE].
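To make the wire format concrete, here is a minimal sketch of reading such a stream by hand. It assumes each `data:` line carries one complete JSON object, and the helper name `iter_sse_chunks` and the sample lines are illustrative, not part of any Routeway SDK.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON payload of each `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and SSE comment lines (they start with ":").
        if not line or line.startswith(":"):
            continue
        # Ignore other SSE fields such as "event:" or "id:".
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel; anything after it is ignored
        yield json.loads(payload)


# Illustrative input; in practice the lines come from the HTTP response body.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
    'data: {"ignored": true}',
]
chunks = list(iter_sse_chunks(sample))
print(len(chunks))
```

With an HTTP client that exposes the response body line by line, the same function can be fed those lines directly.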
Each chunk carries a delta object instead of a full message. Text arrives through delta.content. The final chunk sets finish_reason to indicate why the model stopped.
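As a rough illustration of how deltas combine into a full message, the sketch below concatenates `delta.content` from already-parsed chunks and records the last non-null `finish_reason`. The chunk dictionaries are hand-written samples in the OpenAI-style shape described above, and `collect_text` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def collect_text(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join delta.content pieces and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # nothing to read in this chunk
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Hand-written sample chunks; the final one has an empty delta and a finish_reason.
sample_chunks = [
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
text, reason = collect_text(sample_chunks)
print(text, reason)
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on long responses.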
Errors can appear mid-stream as a regular SSE data frame containing an error key. Always check for this before assuming a chunk contains choices.
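    # `stream` is the iterator returned by a request made with stream=True.
    for chunk in stream:
        # Check for an error frame before touching choices
        error = getattr(chunk, "error", None)
        if error:
            print(f"Stream error: {error}")
            break
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)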
Do not treat the stream as plain text. Parse each chunk as JSON and handle finish_reason and error frames explicitly.
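One way to make error handling explicit when working with raw parsed JSON frames rather than SDK objects is a small guard that raises as soon as an error frame appears. This is only a sketch: `StreamError` and `guard_errors` are hypothetical names, and the shape of the error payload shown is illustrative, since only the presence of an `error` key is described above.

```python
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Hypothetical exception for an error frame received mid-stream."""


def guard_errors(frames: Iterable[dict]) -> Iterator[dict]:
    """Pass frames through unchanged, raising on the first error frame."""
    for frame in frames:
        if "error" in frame:
            raise StreamError(str(frame["error"]))
        yield frame


frames = [
    {"choices": [{"delta": {"content": "Hi"}, "finish_reason": None}]},
    {"error": {"message": "upstream timeout"}},  # illustrative shape only
]
received = []
try:
    for frame in guard_errors(frames):
        received.append(frame)
except StreamError as exc:
    print(f"Stream error: {exc}")
```

Raising instead of printing keeps the decision about retries or user-facing messages with the caller, and frames received before the error remain available.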