Skip to main content
The Responses API simplifies multi-turn conversations by letting you reference a previous response instead of manually maintaining and resending the full message history on every request.

How It Works

Each response has a unique id. Pass it as previous_response_id in your next request, and the model automatically has access to the full conversation context up to that point.
You don’t need to resend any messages from earlier turns. The server reconstructs the full conversation from the chain of response IDs.

Basic Example

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

# Turn 1
response = client.responses.create(
    model="gpt-4o",
    input="What is Rust?"
)
print(response.output_text)

# Turn 2 — references the first response
response = client.responses.create(
    model="gpt-4o",
    input="How does its ownership model work?",
    previous_response_id=response.id
)
print(response.output_text)

# Turn 3 — references the second response (which includes turn 1)
response = client.responses.create(
    model="gpt-4o",
    input="Give me a simple code example.",
    previous_response_id=response.id
)
print(response.output_text)

Using Instructions Across Turns

The instructions parameter applies system-level guidance to every turn. When using previous_response_id, include instructions on each request to maintain consistent behavior.
instructions = "You are a Rust tutor. Give concise, beginner-friendly explanations."

response = client.responses.create(
    model="gpt-4o",
    instructions=instructions,
    input="What are lifetimes in Rust?"
)

response = client.responses.create(
    model="gpt-4o",
    instructions=instructions,
    input="Why are they needed?",
    previous_response_id=response.id
)
Unlike Chat Completions where the system message is part of messages, the instructions parameter sits outside the conversation history. This makes it easy to change instructions between turns without modifying the stored context.

Conversation Branching

Since each response has its own ID, you can branch conversations by referencing the same previous_response_id in multiple follow-up requests:
# Base question
base = client.responses.create(
    model="gpt-4o",
    input="Compare Python and Rust."
)

# Branch A — performance focus
branch_a = client.responses.create(
    model="gpt-4o",
    input="Focus on performance differences.",
    previous_response_id=base.id
)

# Branch B — ecosystem focus (same parent)
branch_b = client.responses.create(
    model="gpt-4o",
    input="Focus on ecosystem and libraries.",
    previous_response_id=base.id
)

Disabling Storage

By default, responses are stored server-side so they can be referenced by future requests. If you don’t need multi-turn and want to avoid storing data, set store: false:
{
  "model": "gpt-4o",
  "input": "One-off question, no need to remember this.",
  "store": false
}
When store is false, the response ID cannot be used as a previous_response_id in future requests. Only disable storage for truly one-off queries.

Comparison: Multi-turn Approaches

ApproachWhen to use
previous_response_idConversations where the server manages state — simpler client code
Manual input arrayWhen you need full control over what context the model sees, or want to trim/edit history
Chat Completions messagesMaximum model coverage, non-OpenAI SDKs, or fine-grained message control

Token Usage in Multi-turn

Each turn in a previous_response_id chain re-processes the full conversation history as input tokens. The billing is the same as Chat Completions — longer conversations cost more per turn. To manage costs:
  • Keep conversations short when possible
  • Use instructions instead of long system messages repeated in input
  • Start a new conversation chain when the topic changes significantly
  • Use Prompt Caching on models that support it
Token usage reported in the usage field reflects the full context processed for that turn, including all prior messages from the chain.