messages array — an ordered list of turns that represents the full conversation context. The model reads the entire array on each call and generates the next turn.
Message Structure
Each message object has two required fields:| Field | Type | Description |
|---|---|---|
role | string | Who sent the message: system, user, or assistant |
content | string | array | The message text, or a mixed array of text and media objects |
Roles
system
Sets the model’s behavior, persona, and constraints for the entire conversation. Processed before any user message, giving it the highest priority.
system to:
- Define the model’s persona or tone
- Restrict the model to a specific domain
- Provide background context or instructions that apply globally
user
A message from the end user (or your application acting as the user). This is the primary way to send instructions and content to the model.
assistant
A message generated by the model. When building multi-turn conversations, append the model’s previous responses as assistant messages so it can refer back to them.
You can also inject
assistant messages manually — for example, to prime the model with a specific tone or continue a previous conversation without replaying all prior turns.Multi-turn Conversations
To maintain context across turns, include all previous messages in each new request. The model has no memory between API calls — themessages array is the memory.
- Python
- Node.js
- cURL
Rich Content Messages
Thecontent field can be an array of typed objects instead of a plain string. This is how you pass images, PDFs, and mixed media alongside text.
Common Patterns
Persona injection
Persona injection
Give the model a name and personality via the
system message. Consistent persona instructions reduce drift in long conversations.Few-shot examples
Few-shot examples
Prepend example
user/assistant pairs to show the model the exact format you want.Context injection
Context injection
Inject retrieved documents or data into a
user or system message before the question. This is the foundation of RAG (retrieval-augmented generation).Context Window Limits
Themessages array is bounded by the model’s context window — the maximum number of tokens it can process in a single request. Exceeding this limit causes a 400 error.
For long conversations, trim old messages from the middle of the array (keeping the system message and the most recent turns) or use Prompt Caching to reduce costs when the prefix is stable.
See Tokens & Context for more detail.