Send images, PDFs, and audio alongside text in a single request.
Multimodal inputs let you send more than plain text to a model. Routeway supports images, PDFs/documents, and audio through the same OpenAI-compatible endpoint — the model can read, describe, and reason over all of them in a single request.
Vision and multimodal support depends on the model, not just the endpoint. Models like gpt-4o, gpt-4o-mini, claude-opus-4-5, and gemini-2.5-pro support image and document inputs. Check the Models page for per-model capability details.
Pass an image by URL or as a base64-encoded data URI inside a content array. The model can describe, analyze, compare, or extract information from images.
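The two options above can be sketched as small payload builders. This is a minimal sketch, assuming the OpenAI-style image_url content part used elsewhere on this page; the helper names image_part_from_url, image_part_from_file, and user_message are hypothetical and not part of any SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_url(url: str) -> dict:
    """Build an image content part that points at a reachable image URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_file(path: str) -> dict:
    """Build an image content part that embeds a local file as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def user_message(text: str, *image_parts: dict) -> dict:
    """Combine a text prompt and any number of image parts into one user message."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *image_parts]}
```

The dictionary returned by user_message can be placed directly in the messages list of a chat completion request. Passing several image parts in one message is how you would ask the model to compare images.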
PDFs go in the same content array: read the file, base64-encode it, and pass it as a data URI.

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: process.env.ROUTEWAY_API_KEY,
});

async function main() {
  // Read the PDF from disk and base64-encode it for the data URI.
  const pdfBuffer = fs.readFileSync("contract.pdf");
  const pdfBase64 = pdfBuffer.toString("base64");

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Summarize the key obligations and termination clauses in this contract.",
          },
          {
            type: "image_url",
            image_url: {
              url: `data:application/pdf;base64,${pdfBase64}`,
            },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);
PDF support depends on the model. gpt-4o and claude-opus-4-5 handle multi-page PDFs well. For very large documents, consider extracting the relevant pages first to reduce token cost.
Images persist in the conversation history just like text messages. The model can refer back to a previously sent image in later turns.
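import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeway.ai/v1",
    api_key=os.environ["ROUTEWAY_API_KEY"],
)

messages = [
    # First turn: the image is sent together with the text.
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart of our Q3 sales data."},
            {"type": "image_url", "image_url": {"url": "https://example.com/q3-chart.png"}},
        ],
    },
    # The assistant's earlier reply stays in the history.
    {
        "role": "assistant",
        "content": "The chart shows a strong uptick in September with a peak of $2.4M...",
    },
    # The follow-up question refers back to the image from the first turn.
    {
        "role": "user",
        "content": "Which month had the lowest performance and what might explain it?",
    },
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response.choices[0].message.content)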
Large images increase token usage significantly with little quality gain. Resize to 1024×1024 or smaller before encoding. The model doesn’t need 4K resolution to understand content.
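The target size for a resize can be computed before you touch the pixels. This is a minimal sketch that only does the arithmetic, assuming the 1024-pixel limit suggested above; the function name fit_within is hypothetical, and the actual resampling would be done with whatever imaging library you already use.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return dimensions scaled down to fit a max_side square, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height
    scale = max_side / longest
    # Clamp to at least 1 pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 4K frame shrinks to fit within 1024 pixels on its longest side.
print(fit_within(3840, 2160))
```

Images that already fit are returned unchanged, so the helper is safe to call on every upload.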
Use low detail for quick checks
When you only need a rough description or classification, set "detail": "low" on the image_url object. On gpt-4o this costs a fixed 85 tokens regardless of image size and returns much faster; other models may count image tokens differently.
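In the OpenAI-style request format, the detail setting sits next to the URL inside the image_url object. This is a minimal sketch, assuming Routeway forwards the field unchanged to the underlying model; the helper name image_part_with_detail is hypothetical.

```python
ALLOWED_DETAIL = {"low", "high", "auto"}


def image_part_with_detail(url: str, detail: str = "auto") -> dict:
    """Build an image content part with an explicit detail level."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


# A quick classification check only needs the low-detail setting.
quick_check = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Is this a photo of a receipt? Answer yes or no."},
        image_part_with_detail("https://example.com/upload.jpg", detail="low"),
    ],
}
```

The URL above is a placeholder. Models that do not recognize the detail field may ignore it, so check the per-model capability details before relying on the token savings.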
Be specific in your text prompt
Tell the model exactly what to look for. Vague prompts like “describe this” produce generic results. “List every line item and its price from this receipt” produces structured, useful output.
Combine with Structured Outputs
Pair vision with a JSON schema to extract structured data from images. Ideal for receipts, invoices, forms, and screenshots.
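A request that pairs an image with a JSON schema might look like the following. This is a minimal sketch, assuming the model accepts the OpenAI-style response_format field with type json_schema; the receipt schema, its field names, and the helper receipt_request are hypothetical and only illustrate the shape.

```python
import json

# Hypothetical schema for a receipt; adjust the fields to your own documents.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "price": {"type": "number"},
                },
                "required": ["description", "price"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["merchant", "total", "line_items"],
    "additionalProperties": False,
}


def receipt_request(image_url: str, model: str = "gpt-4o") -> dict:
    """Build chat completion arguments that ask for schema-conforming JSON."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "List every line item and its price from this receipt."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "receipt", "strict": True, "schema": RECEIPT_SCHEMA},
        },
    }


# Print the request body to inspect it before sending.
print(json.dumps(receipt_request("https://example.com/receipt.jpg"), indent=2))
```

With an OpenAI client configured for Routeway, the dictionary can be unpacked into client.chat.completions.create(**receipt_request(url)), and the reply text parsed with json.loads.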