The parameters described here directly affect your API costs and response quality. Misconfiguration can lead to unexpected bills or poor user experiences.
Core Parameters Overview
The ClassAI chat system operates through six fundamental parameters that work together to create coherent, contextual conversations. Each parameter serves a specific purpose in shaping how the AI model generates responses.

Generation Control
max_tokens, top_p - Control response length and creativity

Repetition Management
frequency_penalty, presence_penalty - Prevent repetitive outputs

Messages Structure
The messages array is the backbone of every conversation. It maintains the complete context and history that the AI uses to generate relevant responses.
Message Format
The messages array is processed sequentially. The AI model sees the entire conversation history with each request, so longer conversations consume more tokens.
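A minimal sketch of the shape such an array might take in Python (the contents and phrasing are illustrative, not ClassAI defaults):

```python
# Illustrative messages array; field names follow the role/content
# format described above. Contents are placeholder examples.
messages = [
    {"role": "system", "content": "You are a concise study assistant. Answer in plain English."},
    {"role": "user", "content": "Explain what a prime number is."},
    {"role": "assistant", "content": "A prime number is a whole number greater than 1 that ..."},
    {"role": "user", "content": "Is 91 prime?"},  # newest turn; the full history above is re-sent with it
]
```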
Token Consumption Pattern
Each message in the array consumes tokens based on its content length. The model processes:
- System message: Sets behavior context (processed once per conversation)
- User messages: Direct input requiring responses
- Assistant messages: Previous AI responses that inform future outputs
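As a rough illustration of this growth, the sketch below estimates prompt size with the simple 4-characters-per-token heuristic used under max_tokens later on this page; a real tokenizer will give different numbers:

```python
def estimate_prompt_tokens(messages):
    """Very rough estimate using ~4 characters per token for English text.

    Real tokenizers (plus per-message overhead) will give different counts;
    use this only to reason about growth, not for billing.
    """
    return sum(len(m["content"]) for m in messages) // 4

# Because the full history is re-sent, every new turn pays for all earlier turns.
history = [{"role": "system", "content": "You are a helpful tutor."}]
for question in ["First question ...", "A follow-up ...", "Another follow-up ..."]:
    history.append({"role": "user", "content": question})
    print(estimate_prompt_tokens(history), "tokens (approx.) sent with this request")
```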
Roles Explained
The role field defines the message source and fundamentally changes how the model interprets content.
system
Purpose: Establishes the AI’s behavior, personality, and operational constraints.

Best Practices:
- Keep system messages concise but specific
- Define output format requirements here
- Set behavioral boundaries and expertise areas
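For example, a system message following these practices might look like this (the wording is purely illustrative):

```python
# A concise but specific system message: expertise, output format, and boundaries.
system_message = {
    "role": "system",
    "content": (
        "You are a physics teaching assistant for high-school students. "
        "Answer in at most three short paragraphs, define every symbol you use, "
        "and decline to solve graded exam questions directly."
    ),
}
```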
user
Purpose: Represents human input and queries to the AI system.

Best Practices:
- Be specific about desired output format
- Include relevant context within the message
- Break complex requests into smaller parts
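A user message applying these practices might bundle context, task, and desired format in one place (again, illustrative wording):

```python
# Specific format request, with the relevant context included in the message itself.
user_message = {
    "role": "user",
    "content": (
        "Context: our class just covered Newton's second law (F = ma). "
        "Task: give two everyday examples of it. "
        "Format: a numbered list, one sentence per example."
    ),
}
```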
assistant
Purpose: Contains AI-generated responses that become part of conversation context.

Usage Notes:
- Automatically added by ClassAI after each response
- Can be manually included for few-shot prompting (see the sketch below)
- Affects the model’s understanding of conversation flow
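A sketch of few-shot prompting with manually included assistant messages (the reviews and labels are made up for illustration):

```python
# Hand-written user/assistant pairs demonstrate the desired answer style;
# only the final user message is the real query.
few_shot_messages = [
    {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
    {"role": "user", "content": "Review: The labs were engaging and well organised."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: The lectures were impossible to follow."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Office hours were genuinely helpful."},
]
```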
Response Generation Controls
max_tokens
Controls the maximum length of generated responses, directly impacting both cost and completeness.

Token Estimation: Roughly 4 characters = 1 token in English. A 300-token response equals approximately 1,200 characters, or 200-250 words.
Setting max_tokens to 1000 when you only need 200-word responses can increase your costs by up to 5x unnecessarily.
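One way to size max_tokens from a word budget is a small helper like the hypothetical one below, built on a rough 1.5-tokens-per-word assumption consistent with the estimate above; treat its output as a starting point, not an exact limit:

```python
def max_tokens_for_words(word_count, headroom=1.3):
    """Size max_tokens from a target word count.

    Assumes roughly 1.5 tokens per English word, then adds headroom so
    responses are not cut off mid-sentence. Real counts vary by tokenizer.
    """
    return int(word_count * 1.5 * headroom)

# A ~200-word answer fits comfortably in ~400 tokens,
# far cheaper than a blanket max_tokens of 1000.
print(max_tokens_for_words(200))  # -> 390
```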
top_p (Nucleus Sampling)
Controls response creativity and randomness through probability distribution sampling.

| Value | Behavior | Use Case |
|---|---|---|
| 0.1 | Highly deterministic, focused responses | Code generation, factual queries |
| 0.5 | Balanced creativity and consistency | General conversation |
| 0.9 | Creative, varied responses | Creative writing, brainstorming |
Example Configuration
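A possible configuration, sketched as a plain HTTP request with Python's requests library; the endpoint URL, model name, and auth header are placeholders rather than confirmed ClassAI values:

```python
import requests

payload = {
    "model": "classai-chat",  # placeholder model name, not a confirmed ClassAI identifier
    "messages": [
        {"role": "system", "content": "You are a helpful study assistant."},
        {"role": "user", "content": "Suggest three essay topics about renewable energy."},
    ],
    "max_tokens": 400,  # enough headroom for a ~250-word answer
    "top_p": 0.9,       # higher value suits a brainstorming task
}

# Hypothetical endpoint and auth header, shown only to illustrate the request shape.
response = requests.post(
    "https://api.example.com/v1/chat/completions",
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
print(response.json())
```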
Values above 0.95 can produce incoherent responses. Values below 0.05 may result in repetitive, robotic outputs.

Repetition Prevention
frequency_penalty
Reduces the likelihood of repeating frequently used tokens within the current response.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.8 for most applications

- 0.0: Natural repetition allowed
- 0.5: Moderate repetition reduction
- 1.0: Strong avoidance of repeated words
presence_penalty
Encourages the model to introduce new topics and concepts rather than rehashing existing ones.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.6 for most applications
Combined Usage: Use both penalties together for optimal results. Try frequency_penalty: 0.3 with presence_penalty: 0.2 as a starting point, as in the sketch below.
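Those starting values slot into the same payload shape used in the earlier configuration example (the model name and message contents remain placeholders):

```python
# Same request shape as the earlier example, with both penalties added.
payload = {
    "model": "classai-chat",   # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a creative writing coach."},
        {"role": "user", "content": "Draft an opening paragraph for a mystery story."},
    ],
    "max_tokens": 300,
    "top_p": 0.9,
    "frequency_penalty": 0.3,  # discourage repeating the same words
    "presence_penalty": 0.2,   # nudge the model toward new topics
}
```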
FAQ

Responses Cut Off Mid-Sentence
Cause: max_tokens set too low for the requested content complexity.
Solution: Increase max_tokens or simplify the request scope.
Repetitive or Robotic Responses
Cause: top_p too low or penalties set to negative values.
Solution: Increase top_p to 0.5-0.7 and set positive penalty values.
Inconsistent Response Quality
Cause: Missing or vague system message context.
Solution: Provide detailed system instructions with examples.