Streaming lets your application display or process tokens as the model generates them, dramatically reducing perceived latency for users. Instead of waiting for the entire response to finish, you receive a continuous stream of small chunks over a persistent HTTP connection.
Routeway supports Server-Sent Events (SSE) streaming using the same format as the OpenAI API. Any client that works with OpenAI streaming will work with Routeway out of the box.

How It Works

Set "stream": true in your request. The response will be delivered as a series of data: lines, each containing a JSON chunk. The stream ends with data: [DONE].
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"finish_reason":null}]}

data: [DONE]
Each chunk carries a delta object instead of a full message. Text arrives through delta.content. The final chunk sets finish_reason to indicate why the model stopped.
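
If you are not using an SDK, the wire format above can be parsed by hand. The following is a minimal sketch, assuming the response body has already been split into text lines; the helper name iter_sse_chunks and the id value chatcmpl-1 are placeholders for illustration, not part of the Routeway API.

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The sentinel marks the end of the stream; it is not JSON.
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Sample lines in the same shape as the stream shown above (placeholder id).
sample = [
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}',
    '',
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"finish_reason":null}]}',
    '',
    'data: [DONE]',
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
)
print(text)  # Hello world
```

The generator stops at the [DONE] sentinel rather than trying to decode it, since that line is the only data payload that is not JSON.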

Basic Example

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum entanglement simply."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

Streaming with Tool Calls

When a model decides to call a tool, arguments arrive incrementally through delta.tool_calls. Accumulate the chunks before executing the function.
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"],
            },
        },
    }
]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    stream=True,
)

tool_call_chunks = {}

for chunk in stream:
    # Some chunks (such as a usage-only chunk) carry no choices
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Print text content as it arrives
    if delta.content:
        print(delta.content, end="", flush=True)

    # Accumulate tool call fragments, keyed by the call's index
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_call_chunks:
                tool_call_chunks[idx] = {"id": tc.id, "name": tc.function.name, "args": ""}
            if tc.function.arguments:
                tool_call_chunks[idx]["args"] += tc.function.arguments

# After the stream ends, parse the complete arguments and dispatch each call
for tc in tool_call_chunks.values():
    # A tool with no parameters may stream an empty argument string
    args = json.loads(tc["args"] or "{}")
    print(f"\nCalling {tc['name']} with {args}")

Include Usage Stats

To receive token usage at the end of a stream, pass stream_options:
{
  "model": "gpt-4o-mini",
  "messages": [...],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
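The last chunk before data: [DONE] will include a usage object. In the OpenAI format that Routeway follows, this chunk's choices array is empty, so check for that before indexing choices[0]: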
{
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
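
With the Python SDK, the same option is passed as stream_options={"include_usage": True} to client.chat.completions.create. The sketch below shows one way to collect both the text and the usage object from such a stream. It assumes the usage-only chunk has an empty choices list, as in the OpenAI format; the helper names collect_text_and_usage and fake_chunk are illustrative, and the stand-in chunks exist only so the example runs without a network call.

```python
from types import SimpleNamespace

def collect_text_and_usage(chunks):
    """Return (text, usage) from an iterable of streamed chunk objects."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-only chunk has no choices, so guard before indexing.
        if chunk.choices:
            content = chunk.choices[0].delta.content
            if content:
                parts.append(content)
        # usage is None on ordinary chunks and set on the usage chunk.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage

# Stand-in chunks for illustration; a real stream yields SDK objects
# with the same attribute names.
def fake_chunk(content=None, usage=None):
    choices = []
    if content is not None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=content))]
    return SimpleNamespace(choices=choices, usage=usage)

demo = [
    fake_chunk("Hi"),
    fake_chunk(" there"),
    fake_chunk(usage=SimpleNamespace(
        prompt_tokens=42, completion_tokens=128, total_tokens=170)),
]

text, usage = collect_text_and_usage(demo)
print(text, usage.total_tokens)  # Hi there 170
```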

Handling Errors in Streams

Errors can appear mid-stream as a regular SSE data frame containing an error key. Always check for this before assuming a chunk contains choices.
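for chunk in stream:
    # Check for an error frame before touching choices
    error = getattr(chunk, "error", None)
    if error:
        print(f"Stream error: {error}")
        break

    # Skip chunks without choices, such as a usage-only chunk
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)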
Do not treat the stream as plain text. Parse each chunk as JSON and handle finish_reason and error frames explicitly.
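
When working with raw JSON frames rather than SDK objects, the same checks can be gathered into one helper. This is a sketch under assumptions: the exact shape of the error value is not specified here, so the code accepts either an object with a message field or any other value, and the helper name read_frame and the sample error text are hypothetical.

```python
def read_frame(frame):
    """Extract error, text, and finish_reason from one parsed SSE frame (a dict)."""
    result = {"error": None, "text": "", "finish_reason": None}

    # An error frame is checked first because it has no usable choices.
    if "error" in frame:
        err = frame["error"]
        if isinstance(err, dict):
            result["error"] = err.get("message", str(err))
        else:
            result["error"] = str(err)
        return result

    # Frames with an empty or missing choices list carry no text.
    choices = frame.get("choices") or []
    if choices:
        choice = choices[0]
        result["text"] = (choice.get("delta") or {}).get("content") or ""
        result["finish_reason"] = choice.get("finish_reason")
    return result

frames = [
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
    {"error": {"message": "upstream timeout"}},  # hypothetical error frame
]

for frame in frames:
    print(read_frame(frame))
```

Returning all three fields at once means the caller does not lose a finish_reason that arrives in the same frame as the last piece of text.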

Best Practices

Flush output immediately

Use flush=True (Python) or process.stdout.write (Node.js) so tokens render in real time.

Handle reconnects

Network interruptions can cut streams short. Keep the text received so far, treat a stream that closes without a finish_reason or data: [DONE] as incomplete, and retry the request if needed.

Accumulate tool args

Tool call arguments arrive in fragments. Always concatenate all fragments before parsing them with json.loads (Python) or JSON.parse (JavaScript).

Check finish_reason

stop means the model finished naturally. length means the output was cut off at max_tokens. tool_calls means the model requested a function call.
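
As one possible reading of the reconnect advice above, the sketch below restarts the whole request when the connection drops and discards the partial text from the failed attempt. Everything here is an assumption for illustration: stream_with_retry and flaky_stream are hypothetical names, open_stream stands for any callable that starts a new request and yields text pieces, and ConnectionError stands in for whatever exception your HTTP client or SDK raises on a dropped connection.

```python
import time

def stream_with_retry(open_stream, max_attempts=3, backoff_seconds=0.0):
    """Collect text from open_stream(), restarting from scratch on connection errors."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        received = []  # text from this attempt only
        try:
            for piece in open_stream():
                received.append(piece)
            return "".join(received)
        except ConnectionError as exc:
            last_error = exc
            # Partial text is discarded; the next attempt regenerates the reply.
            time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"stream failed after {max_attempts} attempts") from last_error

# Simulated stream that drops the connection on its first attempt only.
attempts = {"count": 0}

def flaky_stream():
    attempts["count"] += 1
    yield "Hello"
    if attempts["count"] == 1:
        raise ConnectionError("connection dropped")  # simulated interruption
    yield " world"

result = stream_with_retry(flaky_stream)
print(result)  # Hello world
```

Restarting is the simpler choice because a new request may produce different text, so stitching a retry onto a partial reply is not generally safe. If the partial text has already been shown to the user, clear or replace it when the retry begins.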