POST /v1/responses
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

response = client.responses.create(
    model="gpt-4o",
    input="What is the capital of France?"
)

print(response.output_text)

{
  "id": "<string>",
  "object": "response",
  "created_at": 123,
  "completed_at": 123,
  "status": "<string>",
  "incomplete_details": {
    "reason": "<string>"
  },
  "model": "<string>",
  "previous_response_id": "<string>",
  "instructions": "<string>",
  "output": [
    {
      "type": "<string>",
      "id": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ]
    }
  ],
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "tools": [
    {
      "type": "function",
      "name": "<string>",
      "description": "<string>",
      "parameters": {},
      "strict": true
    }
  ],
  "tool_choice": {
    "type": "function",
    "name": "<string>"
  },
  "parallel_tool_calls": true,
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_p": 123,
  "presence_penalty": 123,
  "frequency_penalty": 123,
  "top_logprobs": 123,
  "temperature": 123,
  "reasoning": {},
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123,
    "input_tokens_details": {
      "cached_tokens": 123
    },
    "output_tokens_details": {
      "reasoning_tokens": 123
    }
  },
  "max_output_tokens": 123,
  "max_tool_calls": 123,
  "store": true,
  "background": true,
  "service_tier": "<string>",
  "prompt_cache_key": "<string>"
}
Generates a model response from a plain text string or an array of input items. The Responses API is a higher-level alternative to Chat Completions with built-in tools and optional server-side conversation state.

Create Response

To create a response, use the following endpoint: POST /v1/responses

Request Body

model
string
required
The model ID to use for the response (e.g. "gpt-4o", "gpt-4o-mini"). Only models that advertise /v1/responses in their endpoints array are supported. See Models for the full list.
input
string | array
required
The input for the model. Can be a plain text string or an array of input items (text, image, file, etc.).
instructions
string
A system-level instruction that guides the model’s behavior throughout the response. Equivalent to the system message in Chat Completions.
max_output_tokens
integer
The maximum number of tokens to generate in the output.
temperature
number
Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.
stream
boolean
Whether to stream the response back as Server-Sent Events (SSE).
previous_response_id
string
The ID of a previous response to continue from. Enables multi-turn conversations without resending full message history.
tools
array
A list of custom tools the model may call.
tool_choice
string | object
Controls whether and how the model calls tools. Can be "auto", "none", "required", or an object specifying a particular tool.
store
boolean
Whether to store the response server-side so it can be referenced by future requests via previous_response_id. Defaults to true.
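
The parameters above can be combined into a single request body. What follows is a minimal sketch, not documented client behavior: build_request and weather_tool are hypothetical names introduced for illustration, and the tool's shape (type, name, description, parameters, strict) mirrors the tools entry in the sample response on this page. The follow-up request assumes the first response was stored (store left at its default) so that its ID can be passed as previous_response_id.

```python
from typing import Any, Optional


def build_request(
    model: str,
    user_input: str,
    *,
    instructions: Optional[str] = None,
    previous_response_id: Optional[str] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    tool_choice: Any = None,
    max_output_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a request body, leaving out every optional field that is unset."""
    optional = {
        "instructions": instructions,
        "previous_response_id": previous_response_id,
        "tools": tools,
        "tool_choice": tool_choice,
        "max_output_tokens": max_output_tokens,
        "temperature": temperature,
    }
    body: dict[str, Any] = {"model": model, "input": user_input}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


# Hypothetical function tool, shaped like the "tools" entry in the sample response.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
    "strict": True,
}

first = build_request(
    "gpt-4o",
    "What is the capital of France?",
    instructions="Answer in one short sentence.",
    temperature=0.2,
)

# "resp_123" is a placeholder; in practice use the id returned by the first call.
follow_up = build_request(
    "gpt-4o",
    "What is the weather there?",
    previous_response_id="resp_123",
    tools=[weather_tool],
    tool_choice="auto",
)
```

Either dict can be handed to the client from the sample request as client.responses.create(**first). Because the follow-up carries previous_response_id, it does not need to resend the first question.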

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
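
For callers that do not use the OpenAI SDK, the bearer header can be set by hand. The following is a minimal sketch using only the Python standard library. build_http_request is a hypothetical helper, and the URL is an assumption formed by joining the base URL from the sample request with the endpoint path.

```python
import json
import urllib.request

# Assumed full URL: sample base_url + "/responses".
API_URL = "https://api.routeway.ai/v1/responses"


def build_http_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the bearer token."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_http_request(
    "YOUR_API_KEY",  # placeholder; read the real key from ROUTEWAY_API_KEY
    {"model": "gpt-4o", "input": "What is the capital of France?"},
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as http_response:
#     print(json.load(http_response)["id"])
```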

Headers

x-api-key
string

Body

application/json
model
string | null
input
previous_response_id
string | null
include
enum<string>[] | null
Available options: reasoning.encrypted_content, message.output_text.logprobs
tools
(FunctionToolParam · object | ResponsesToolParam · object)[] | null
tool_choice
text
TextParam · object
temperature
number | null
top_p
number | null
presence_penalty
number | null
frequency_penalty
number | null
parallel_tool_calls
boolean | null
stream
boolean | null

Whether to stream response events as server-sent events.

background
boolean | null

Whether to run the request in the background and return immediately.

max_output_tokens
integer | null
Required range: x >= 16
max_tool_calls
integer | null
Required range: x >= 1
reasoning
ReasoningParam · object
prompt_cache_key
string | null
Maximum string length: 64
truncation
enum<string> | null
Available options: auto, disabled
instructions
string | null
store
boolean | null

Whether to store the response so it can be retrieved later.

service_tier
enum<string> | null
Available options: auto, default, flex, priority
top_logprobs
integer | null
Required range: 0 <= x <= 20
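
The ranges and enumerations listed for the body can be checked on the client before a request is sent. This is a sketch under stated assumptions: check_body is a hypothetical helper, it covers only the constraints written out above, and the server remains the authority on what is accepted.

```python
from typing import Any

# Allowed values copied from the Body section above.
INCLUDE_OPTIONS = {"reasoning.encrypted_content", "message.output_text.logprobs"}
TRUNCATION_OPTIONS = {"auto", "disabled"}
SERVICE_TIERS = {"auto", "default", "flex", "priority"}


def check_body(body: dict[str, Any]) -> list[str]:
    """Return one message per documented constraint that the body violates."""
    problems: list[str] = []

    def present(key: str) -> bool:
        # Fields are nullable, so a missing key and null both mean "not set".
        return body.get(key) is not None

    if present("max_output_tokens") and body["max_output_tokens"] < 16:
        problems.append("max_output_tokens must be >= 16")
    if present("max_tool_calls") and body["max_tool_calls"] < 1:
        problems.append("max_tool_calls must be >= 1")
    if present("prompt_cache_key") and len(body["prompt_cache_key"]) > 64:
        problems.append("prompt_cache_key must be at most 64 characters")
    if present("top_logprobs") and not 0 <= body["top_logprobs"] <= 20:
        problems.append("top_logprobs must be between 0 and 20")
    if present("truncation") and body["truncation"] not in TRUNCATION_OPTIONS:
        problems.append("truncation must be auto or disabled")
    if present("service_tier") and body["service_tier"] not in SERVICE_TIERS:
        problems.append("service_tier must be auto, default, flex, or priority")

    unknown = set(body.get("include") or []) - INCLUDE_OPTIONS
    if unknown:
        problems.append(f"unknown include options: {sorted(unknown)}")
    return problems
```

An empty list means none of the checked constraints were violated; it does not guarantee that the request will succeed.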

Response

Successful Response

id
string
required

The unique ID of the response that was created.

object
enum<string>
required

The object type, which is always response.

Available options: response
created_at
integer
required

The Unix timestamp (in seconds) for when the response was created.

completed_at
integer | null
required
status
string
required

The status of the response.

incomplete_details
IncompleteDetails · object
required
model
string
required

The model that generated this response.

previous_response_id
string | null
required
instructions
string | null
required
output
(Message · object | FunctionCall · object | FunctionCallOutput · object | ReasoningBody · object | CompactionBody · object)[]
required

The output items that were generated by the model.

error
Error1 · object
required
tools
Tool · object[]
required

The tools that were available to the model during response generation.

tool_choice
required
truncation
enum<string>
required
Available options: auto, disabled
parallel_tool_calls
boolean
required

Whether the model was allowed to call multiple tools in parallel.

text
TextField · object
required
top_p
number
required

The nucleus sampling parameter that was used for this response.

presence_penalty
number
required

The presence penalty that was used to penalize new tokens based on whether they appear in the text so far.

frequency_penalty
number
required

The frequency penalty that was used to penalize new tokens based on their frequency in the text so far.

top_logprobs
integer
required

The number of most likely tokens that were returned at each position, along with their log probabilities.

temperature
number
required

The sampling temperature that was used for this response.

reasoning
Reasoning · object
required
usage
Usage · object
required
max_output_tokens
integer | null
required
max_tool_calls
integer | null
required
store
boolean
required

Whether this response was stored so it can be retrieved later.

background
boolean
required

Whether this request was run in the background.

service_tier
string
required

The service tier that was used for this response.

prompt_cache_key
string | null
required
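
When working with the raw JSON instead of the SDK's response.output_text, the text has to be collected from the output items. A minimal sketch, assuming the response shape shown in the sample JSON on this page: output_text and summarize are hypothetical helpers, and the "type" and "status" values in the sample dict are assumptions, since the reference shows only "<string>" placeholders for them.

```python
from typing import Any


def output_text(response: dict[str, Any]) -> str:
    """Concatenate every "text" field found in the response's output items."""
    parts: list[str] = []
    for item in response.get("output") or []:
        # Items without a "content" list (for example function calls) are skipped.
        for block in item.get("content") or []:
            if isinstance(block.get("text"), str):
                parts.append(block["text"])
    return "".join(parts)


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Pull out the fields most callers check first."""
    usage = response.get("usage") or {}
    incomplete = response.get("incomplete_details") or {}
    error = response.get("error") or {}
    return {
        "id": response["id"],
        "status": response["status"],
        "text": output_text(response),
        "total_tokens": usage.get("total_tokens"),
        "incomplete_reason": incomplete.get("reason"),
        "error_message": error.get("message"),
    }


# Hand-written sample; the "type" and "status" values are assumed, not documented here.
sample = {
    "id": "resp_123",
    "status": "completed",
    "output": [
        {"type": "reasoning", "id": "rs_1"},
        {
            "type": "message",
            "id": "msg_1",
            "content": [
                {"type": "output_text", "text": "Paris is the capital of France."}
            ],
        },
    ],
    "usage": {"input_tokens": 14, "output_tokens": 8, "total_tokens": 22},
    "incomplete_details": None,
    "error": None,
}

summary = summarize(sample)
```

The helpers treat null and missing objects the same way, so they do not fail on a response whose incomplete_details or error is null.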