Multimodal inputs let you send more than plain text to a model. Routeway supports images, PDFs/documents, and audio through the same OpenAI-compatible endpoint — the model can read, describe, and reason over all of them in a single request.
Vision and multimodal support depends on the model, not just the endpoint. Models like gpt-4o, gpt-4o-mini, claude-opus-4-5, and gemini-2.5-pro support image and document inputs. Check the Models page for per-model capability details.

Images

Pass an image by URL or as a base64-encoded data URI inside a content array. The model can describe, analyze, compare, or extract information from images.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image? Describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)

Base64 Images

For local files or private images that aren’t publicly accessible via URL, encode them as base64 and embed them directly.
import os
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe any UI issues you see in this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
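
The example above hard-codes `image/png` in the data URI. For mixed file types, a small helper can derive the MIME type from the file extension. This is a minimal sketch using only the standard library; `to_data_uri` is a hypothetical name, not part of the OpenAI SDK or the Routeway API.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Return a base64 data URI for a local image file (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".jpg" -> "image/jpeg".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value of an `image_url` content part, in place of the f-string in the example above.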

Image Detail Level

Control how much detail the model uses when processing an image. Higher detail increases token usage and cost.
{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/chart.png",
        "detail": "high"   # "low", "high", or "auto" (default)
    }
}
| Value | Behavior | Best for |
| --- | --- | --- |
| "auto" | Model chooses based on image size | General use |
| "low" | Fast and cheap; fixed 85 tokens | Thumbnails, simple images |
| "high" | Tiles the image for fine detail | Charts, documents, dense text |

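To get a feel for what "high" detail costs, the sketch below estimates tokens per image. It assumes the tile accounting OpenAI has published for GPT-4o-class models (downscale to fit 2048×2048, downscale so the shortest side is at most 768 px, then 170 tokens per 512×512 tile plus an 85-token base). Other models routed through Routeway may count differently, so treat the result as a rough guide rather than a billing figure. `estimate_image_tokens` is a hypothetical helper.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Rough token estimate for one image (assumes OpenAI-style tile accounting)."""
    if detail == "low":
        return 85  # fixed cost for low detail
    # Any other value ("high" or "auto") is treated as high detail here.
    # Assumption: the image is first scaled down to fit within 2048x2048.
    longest = max(width, height)
    if longest > 2048:
        scale = 2048 / longest
        width, height = round(width * scale), round(height * scale)
    # Assumption: it is then scaled down so the shortest side is at most 768 px.
    shortest = min(width, height)
    if shortest > 768:
        scale = 768 / shortest
        width, height = round(width * scale), round(height * scale)
    # Assumption: 170 tokens per 512x512 tile, plus an 85-token base cost.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85


print(estimate_image_tokens(1024, 1024))         # 765
print(estimate_image_tokens(1024, 1024, "low"))  # 85
```

Under these assumptions, a 1024×1024 image costs roughly nine times more at high detail than at low detail.
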
Multiple Images

Pass several images in the same message to compare, diff, or analyze them together.
# Reuses the `client` created in the earlier examples.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the differences between these two designs?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v1.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v2.png"}},
            ]
        }
    ],
)
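
When the number of images varies, the content array can be built programmatically. The sketch below is one way to do that; `build_image_message` is a hypothetical helper, and the default cap of 10 is an assumed placeholder, since the real per-request limit depends on the model.

```python
def build_image_message(prompt: str, image_urls: list[str], max_images: int = 10) -> dict:
    """Build one user message: a text part followed by one part per image."""
    if len(image_urls) > max_images:
        raise ValueError(
            f"{len(image_urls)} images exceeds the assumed cap of {max_images}"
        )
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each URL may be an https:// link or a base64 data URI.
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_image_message(
    "What are the differences between these two designs?",
    ["https://example.com/design-v1.png", "https://example.com/design-v2.png"],
)
```

The resulting `message` can be passed as the single entry of `messages` in the request above.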

PDFs and Documents

Send PDF files as base64-encoded data for the model to read and reason over. Useful for contract analysis, document Q&A, and data extraction.
import os
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.getenv("ROUTEWAY_API_KEY")
)

with open("contract.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarize the key obligations and termination clauses in this contract."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:application/pdf;base64,{pdf_data}"
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
PDF support depends on the model. gpt-4o and claude-opus-4-5 handle multi-page PDFs well. For very large documents, consider extracting the relevant pages first to reduce token cost.

Multi-turn Vision Conversations

Images persist in the conversation history just like text messages. The model can refer back to a previously sent image in later turns.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart of our Q3 sales data."},
            {"type": "image_url", "image_url": {"url": "https://example.com/q3-chart.png"}}
        ]
    },
    {
        "role": "assistant",
        "content": "The chart shows a strong uptick in September with a peak of $2.4M..."
    },
    {
        "role": "user",
        "content": "Which month had the lowest performance and what might explain it?"
    }
]

# Reuses the `client` created in the earlier examples.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
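
In a live conversation the history grows turn by turn rather than being written out by hand. The sketch below shows one way to extend it while keeping the original image message intact. `continue_conversation` is a hypothetical helper, and the reply string is a placeholder for whatever `response.choices[0].message.content` returned.

```python
def continue_conversation(
    messages: list[dict], assistant_reply: str, next_question: str
) -> list[dict]:
    """Return a new history with the assistant reply and the next user turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_question},
    ]


history = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart of our Q3 sales data."},
            {"type": "image_url", "image_url": {"url": "https://example.com/q3-chart.png"}},
        ],
    }
]

# The image stays in the first message, so it is sent again with every later turn.
history = continue_conversation(
    history,
    "(assistant reply from the previous response)",
    "Which month had the lowest performance and what might explain it?",
)
```

Because the full history is resent on each request, the image is included again on every turn.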

Supported Formats

| Type | Formats | Notes |
| --- | --- | --- |
| Images | PNG, JPEG, WebP, GIF | GIF uses only the first frame |
| Documents | PDF | Via base64 data URI |
| Max image size | 20 MB per image | Resize large images before sending |
| Max images per request | 10–20 | Model-dependent |
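
A pre-flight check can catch unsupported or oversized files before any tokens are spent. The sketch below is a hypothetical helper using only the standard library. It assumes the 20 MB limit is measured on the raw file rather than on the base64-encoded payload, which is roughly a third larger, and it checks only the file extension, not the file contents.

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" applies to the raw file


def check_image(path: str) -> None:
    """Raise ValueError if the file looks unsuitable to send (hypothetical check)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"Image is {size} bytes; the assumed limit is {MAX_IMAGE_BYTES}")
```

Calling `check_image("screenshot.png")` before encoding returns `None` when the file passes and raises otherwise.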

Best Practices

Large images increase token usage significantly with little quality gain. Resize to 1024×1024 or smaller before encoding. The model doesn’t need 4K resolution to understand content.
When you only need a rough description or classification, use "detail": "low". It uses a fixed 85 tokens regardless of image size and returns much faster.
Tell the model exactly what to look for. Vague prompts like “describe this” produce generic results. “List every line item and its price from this receipt” produces structured, useful output.
Pair vision with a JSON schema to extract structured data from images. Ideal for receipts, invoices, forms, and screenshots.
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float
    line_items: list[str]
    due_date: str | None = None

# Reuses the `client` created in the earlier examples.
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the invoice details."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
            ]
        }
    ],
    response_format=Invoice,
)

# `parsed` is an Invoice instance, or None if the model refused to answer.
invoice = response.choices[0].message.parsed
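
Returning to the resizing advice at the start of this section: the target dimensions for a downscale can be computed without any imaging library. The sketch below keeps the aspect ratio and never upscales. `fit_within` is a hypothetical helper, and the 1024 default simply mirrors the recommendation above.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return dimensions scaled down to fit max_side x max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Clamp to at least 1 px so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3840, 2160))  # (1024, 576)
print(fit_within(800, 600))    # (800, 600)
```

If you use an imaging library such as Pillow, the returned tuple can be passed to its resize call before the image is base64-encoded.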