The Responses API simplifies multi-turn conversations by letting you reference a previous response instead of manually maintaining and resending the full message history on every request.
How It Works
Each response has a unique id. Pass it as previous_response_id in your next request, and the model automatically has access to the full conversation context up to that point.
You don’t need to resend any messages from earlier turns. The server reconstructs the full conversation from the chain of response IDs.
Basic Example
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.routeway.ai/v1",
api_key=os.getenv("ROUTEWAY_API_KEY")
)
# Turn 1
response = client.responses.create(
model="gpt-4o",
input="What is Rust?"
)
print(response.output_text)
# Turn 2 — references the first response
response = client.responses.create(
model="gpt-4o",
input="How does its ownership model work?",
previous_response_id=response.id
)
print(response.output_text)
# Turn 3 — references the second response (which includes turn 1)
response = client.responses.create(
model="gpt-4o",
input="Give me a simple code example.",
previous_response_id=response.id
)
print(response.output_text)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.routeway.ai/v1",
apiKey: process.env.ROUTEWAY_API_KEY,
});
// Turn 1
let response = await client.responses.create({
model: "gpt-4o",
input: "What is Rust?",
});
console.log(response.output_text);
// Turn 2
response = await client.responses.create({
model: "gpt-4o",
input: "How does its ownership model work?",
previous_response_id: response.id,
});
console.log(response.output_text);
// Turn 3
response = await client.responses.create({
model: "gpt-4o",
input: "Give me a simple code example.",
previous_response_id: response.id,
});
console.log(response.output_text);
# Turn 1
curl https://api.routeway.ai/v1/responses \
-H "Authorization: Bearer $ROUTEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "What is Rust?"
}'
# Returns: {"id": "resp_abc123", ...}
# Turn 2 — reference the previous response
curl https://api.routeway.ai/v1/responses \
-H "Authorization: Bearer $ROUTEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "How does its ownership model work?",
"previous_response_id": "resp_abc123"
}'
Using Instructions Across Turns
The instructions parameter applies system-level guidance to every turn. When using previous_response_id, include instructions on each request to maintain consistent behavior.
instructions = "You are a Rust tutor. Give concise, beginner-friendly explanations."
response = client.responses.create(
model="gpt-4o",
instructions=instructions,
input="What are lifetimes in Rust?"
)
response = client.responses.create(
model="gpt-4o",
instructions=instructions,
input="Why are they needed?",
previous_response_id=response.id
)
Unlike Chat Completions where the system message is part of messages, the instructions parameter sits outside the conversation history. This makes it easy to change instructions between turns without modifying the stored context.
Conversation Branching
Since each response has its own ID, you can branch conversations by referencing the same previous_response_id in multiple follow-up requests:
# Base question
base = client.responses.create(
model="gpt-4o",
input="Compare Python and Rust."
)
# Branch A — performance focus
branch_a = client.responses.create(
model="gpt-4o",
input="Focus on performance differences.",
previous_response_id=base.id
)
# Branch B — ecosystem focus (same parent)
branch_b = client.responses.create(
model="gpt-4o",
input="Focus on ecosystem and libraries.",
previous_response_id=base.id
)
Disabling Storage
By default, responses are stored server-side so they can be referenced by future requests. If you don’t need multi-turn and want to avoid storing data, set store: false:
{
"model": "gpt-4o",
"input": "One-off question, no need to remember this.",
"store": false
}
When store is false, the response ID cannot be used as a previous_response_id in future requests. Only disable storage for truly one-off queries.
Comparison: Multi-turn Approaches
| Approach | When to use |
|---|
previous_response_id | Conversations where the server manages state — simpler client code |
Manual input array | When you need full control over what context the model sees, or want to trim/edit history |
Chat Completions messages | Maximum model coverage, non-OpenAI SDKs, or fine-grained message control |
Token Usage in Multi-turn
Each turn in a previous_response_id chain re-processes the full conversation history as input tokens. The billing is the same as Chat Completions — longer conversations cost more per turn.
To manage costs:
- Keep conversations short when possible
- Use
instructions instead of long system messages repeated in input
- Start a new conversation chain when the topic changes significantly
- Use Prompt Caching on models that support it
Token usage reported in the usage field reflects the full context processed for that turn, including all prior messages from the chain.