Multimodal Inputs - Routeway Docs

Multimodal inputs let you send more than plain text to a model. Routeway supports images, PDFs/documents, and audio through the same OpenAI-compatible endpoint — the model can read, describe, and reason over all of them in a single request.

Vision and multimodal support depends on the model, not just the endpoint. Models like gpt-4o, gpt-4o-mini, claude-opus-4-5, and gemini-2.5-pro support image and document inputs. Check the Models page for per-model capability details.

Images

Pass an image by URL or as a base64-encoded data URI inside a content array. The model can describe, analyze, compare, or extract information from images.

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image? Describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: process.env.ROUTEWAY_API_KEY,
});

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "What's in this image? Describe it in detail.",
          },
          {
            type: "image_url",
            image_url: {
              url: "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png",
            },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);

cURL

curl https://api.routeway.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.png"
            }
          }
        ]
      }
    ]
  }'

Base64 Images

For local files or private images that aren’t publicly accessible via URL, encode them as base64 and embed them directly.

import os
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe any UI issues you see in this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
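The MIME type in the data URI should match the file being sent. The following is a minimal sketch of a helper that guesses the type from the file extension using only the standard library. The name `to_data_uri` is hypothetical and is not part of the OpenAI SDK or of Routeway.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot guess MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be passed as the `url` value of an `image_url` part, for example `{"type": "image_url", "image_url": {"url": to_data_uri("screenshot.png")}}`.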

Image Detail Level

Control how much detail the model uses when processing an image. Higher detail increases token usage and cost.

{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/chart.png",
        "detail": "high"   # "low", "high", or "auto" (default)
    }
}

Value	Behavior	Best for
`"auto"`	Model chooses based on image size	General use
`"low"`	Fast, cheap — 85 tokens fixed	Thumbnails, simple images
`"high"`	Tiles the image for fine detail	Charts, documents, dense text

Multiple Images

Pass several images in the same message to compare, diff, or analyze them together.

# Reuses the `client` configured in the Images example.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the differences between these two designs?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v1.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v2.png"}},
            ]
        }
    ],
)
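When the list of images is dynamic, the content array can be assembled from a list of URLs. The sketch below assumes a hypothetical helper, `build_image_message`, and a default cap of 10 images, which is the lower end of the model-dependent range listed under Supported Formats. Adjust the cap for the model you use.

```python
def build_image_message(
    prompt: str, image_urls: list[str], max_images: int = 10
) -> dict:
    """Build one user message holding a text prompt and several images."""
    if not image_urls:
        raise ValueError("At least one image URL is required")
    if len(image_urls) > max_images:
        raise ValueError(f"Got {len(image_urls)} images, limit is {max_images}")
    # Text part first, then one image_url part per URL, in the given order.
    content = [{"type": "text", "text": prompt}]
    content.extend(
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    )
    return {"role": "user", "content": content}


message = build_image_message(
    "What are the differences between these two designs?",
    ["https://example.com/design-v1.png", "https://example.com/design-v2.png"],
)
```

The returned dictionary can be used directly as an entry in `messages`.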

PDFs and Documents

Send PDF files as base64-encoded data for the model to read and reason over. Useful for contract analysis, document Q&A, and data extraction.

Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

with open("contract.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarize the key obligations and termination clauses in this contract."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:application/pdf;base64,{pdf_data}"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: process.env.ROUTEWAY_API_KEY,
});

async function main() {
  const pdfBuffer = fs.readFileSync("contract.pdf");
  const pdfBase64 = pdfBuffer.toString("base64");

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Summarize the key obligations and termination clauses in this contract.",
          },
          {
            type: "image_url",
            image_url: {
              url: `data:application/pdf;base64,${pdfBase64}`,
            },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);

PDF support depends on the model. gpt-4o and claude-opus-4-5 handle multi-page PDFs well. For very large documents, consider extracting the relevant pages first to reduce token cost.

Multi-turn Vision Conversations

Images stay available across turns as long as the message that contains them remains in the messages list you send, just like text messages. The model can then refer back to a previously sent image in later turns.

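# The first user turn carries the image; later turns can be plain text.
# Reuses the `client` configured in the Images example.
messages = [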
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart of our Q3 sales data."},
            {"type": "image_url", "image_url": {"url": "https://example.com/q3-chart.png"}}
        ]
    },
    {
        "role": "assistant",
        "content": "The chart shows a strong uptick in September with a peak of $2.4M..."
    },
    {
        "role": "user",
        "content": "Which month had the lowest performance and what might explain it?"
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
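Continuing the conversation means appending the model's reply and the next question to the same list, so the image-bearing message is sent again with each request. The sketch below assumes Routeway behaves like other OpenAI-compatible chat endpoints in this respect. `add_turn` is a hypothetical helper, and the assistant text is a placeholder rather than real model output.

```python
def add_turn(messages: list[dict], assistant_reply: str, next_question: str) -> list[dict]:
    """Return a new history with the assistant reply and next user question appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_question},
    ]


history = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart of our Q3 sales data."},
            {"type": "image_url", "image_url": {"url": "https://example.com/q3-chart.png"}},
        ],
    }
]

# Placeholder reply text; in practice use response.choices[0].message.content.
history = add_turn(
    history,
    "The chart shows a strong uptick in September...",
    "Which month had the lowest performance and what might explain it?",
)
```

After this call, `history` still begins with the original image message, so the follow-up question can be answered against the same chart.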

Supported Formats

Type or limit	Formats or value	Notes
Images	PNG, JPEG, WebP, GIF	GIF uses only the first frame
Documents	PDF	Via base64 data URI
Max image size	20 MB per image	Resize large images before sending
Max images per request	10–20	Model-dependent
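These limits can be checked on the client before any upload happens. The following sketch uses hypothetical names (`check_image`, `MAX_IMAGE_BYTES`) and assumes that "20 MB" means 20 × 1024 × 1024 bytes; the exact byte threshold is not stated in this page.

```python
from pathlib import Path

SUPPORTED_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB interpreted as 20 MiB


def check_image(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size falls outside the listed limits."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_SUFFIXES:
        raise ValueError(f"Unsupported image format: {suffix or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        raise ValueError(
            f"Image is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES} bytes"
        )
```

For a file on disk, the size can be read with `Path(path).stat().st_size` before calling the check.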

Best Practices

Resize before sending

Large images increase token usage significantly with little quality gain. Resize each image so it fits within 1024×1024 pixels before encoding, keeping the aspect ratio. The model doesn’t need 4K resolution to understand content.
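The target dimensions for a resize can be computed with a little arithmetic that preserves the aspect ratio and never upscales. This is a sketch with a hypothetical function name, `fit_within`. The resize itself would be done with an imaging library of your choice (for example, Pillow's `Image.thumbnail`).

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down to fit in a max_side square, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never scale up: cap the factor at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3840, 2160))  # (1024, 576)
print(fit_within(800, 600))    # (800, 600)
```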

Use low detail for quick checks

When you only need a rough description or classification, use "detail": "low". It uses a fixed 85 tokens regardless of image size and returns much faster.

Be specific in your text prompt

Tell the model exactly what to look for. Vague prompts like “describe this” produce generic results. “List every line item and its price from this receipt” produces structured, useful output.

Combine with Structured Outputs

Pair vision with a JSON schema to extract structured data from images. Ideal for receipts, invoices, forms, and screenshots.

# Reuses the `client` configured in the Images example.
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float
    line_items: list[str]
    due_date: str | None = None

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the invoice details."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
            ]
        }
    ],
    response_format=Invoice,
)

invoice = response.choices[0].message.parsed
