Understanding the core mechanics behind ClassAI’s chat system is crucial for building robust AI applications. This guide breaks down the key parameters that control conversation flow, response quality, and model behavior.
The parameters described here directly affect your API costs and response quality. Misconfiguration can lead to unexpected bills or poor user experiences.

Core Parameters Overview

The ClassAI chat system operates through six fundamental parameters that work together to create coherent, contextual conversations. Each parameter serves a specific purpose in shaping how the AI model generates responses.

Conversation Structure

messages, role - Maintain conversation history and identify message sources

Generation Control

max_tokens, top_p - Control response length and creativity

Repetition Management

frequency_penalty, presence_penalty - Prevent repetitive outputs

Messages Structure

The messages array is the backbone of every conversation. It maintains the complete context and history that the AI uses to generate relevant responses.

Message Format

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant specialized in technical documentation."
    },
    {
      "role": "user",
      "content": "Explain how JWT tokens work"
    },
    {
      "role": "assistant",
      "content": "JWT (JSON Web Tokens) are a compact, URL-safe means of representing claims..."
    }
  ]
}
The messages array is processed sequentially. The AI model sees the entire conversation history with each request, so longer conversations consume more tokens.
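To make the growth of the history concrete, here is a minimal Python sketch of a multi-turn loop. The endpoint URL and the response shape (choices[0].message.content) are placeholder assumptions, not ClassAI's documented interface; substitute your actual client and response parsing.

import requests  # generic HTTP client; any client works

API_URL = "https://api.classai.example/v1/chat"  # hypothetical endpoint

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."}
]

def send_turn(user_input: str) -> str:
    """Send one user turn and record the assistant's reply in the history."""
    messages.append({"role": "user", "content": user_input})
    # The entire history is sent on every request, so the payload
    # (and the prompt token cost) grows with each turn.
    resp = requests.post(API_URL, json={"messages": messages, "max_tokens": 300})
    reply = resp.json()["choices"][0]["message"]["content"]  # assumed response shape
    messages.append({"role": "assistant", "content": reply})
    return reply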

Token Consumption Pattern

Each message in the array consumes tokens based on its content length. The model processes:
  • System message: Sets behavior context (resent with every request as part of the conversation history)
  • User messages: Direct input requiring responses
  • Assistant messages: Previous AI responses that inform future outputs
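Since each message contributes tokens, it can help to estimate prompt size before sending. Below is a rough sketch using the ~4-characters-per-token rule of thumb described in the max_tokens section; a real tokenizer will give somewhat different counts.

def estimate_prompt_tokens(messages: list[dict]) -> int:
    """Approximate prompt tokens with the ~4 characters per token heuristic."""
    return sum(len(m["content"]) for m in messages) // 4

history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain how JWT tokens work"},
]
print(estimate_prompt_tokens(history))  # rough estimate only, not an exact count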

Roles Explained

The role field defines the message source and fundamentally changes how the model interprets content.

system

Purpose: Establishes the AI's behavior, personality, and operational constraints.

Best Practices:
  • Keep system messages concise but specific
  • Define output format requirements here
  • Set behavioral boundaries and expertise areas
{
  "role": "system",
  "content": "You are a senior software engineer. Provide code examples with explanations. Always include error handling in your solutions."
}

user

Purpose: Represents human input and queries to the AI system.

Best Practices:
  • Be specific about desired output format
  • Include relevant context within the message
  • Break complex requests into smaller parts
{
  "role": "user",
  "content": "Write a Python function to validate email addresses using regex. Include docstring and type hints."
}

assistant

Purpose: Contains AI-generated responses that become part of conversation context.

Usage Notes:
  • Automatically added by ClassAI after each response
  • Can be manually included for few-shot prompting
  • Affects the model’s understanding of conversation flow
{
  "role": "assistant",
  "content": "Here's a robust email validation function:\n\n```python\nimport re\nfrom typing import bool\n\ndef validate_email(email: str) -> bool:\n    \"\"\"Validate email address using RFC-compliant regex pattern.\"\"\"\n    # Implementation details...\n```"
}
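As noted above, assistant messages can also be seeded by hand for few-shot prompting. The sketch below shows the pattern; the example turns are illustrative, not taken from ClassAI's documentation.

few_shot = [
    {"role": "system", "content": "Answer with a one-line summary, then a code snippet."},
    # Hand-written demonstration turn: this assistant message was never
    # generated by the model; it is included so the model imitates its format.
    {"role": "user", "content": "How do I reverse a list in Python?"},
    {"role": "assistant", "content": "Use slicing.\n\nitems[::-1]"},
    # The real question comes after the demonstration:
    {"role": "user", "content": "How do I deduplicate a list in Python?"},
]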

Response Generation Controls

max_tokens

Controls the maximum length of generated responses, directly impacting both cost and completeness.
{
  "max_tokens": 100,
  "messages": [...]
}
Token Estimation: Roughly 4 characters = 1 token in English. A 300-token response equals approximately 1,200 characters or 200-250 words.
Cost Impact: max_tokens is a ceiling rather than a target, and you are billed for the tokens actually generated. A generous ceiling still lets the model run long: a 1,000-token budget for answers that only need ~200 words (~270 tokens) allows responses that cost several times more than necessary.
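One way to avoid over-provisioning is to derive max_tokens from the response length you actually want. The sketch below uses the word-to-token ratio above plus a safety margin; the 1.2 buffer is an arbitrary choice, not a documented recommendation.

def max_tokens_for_words(target_words: int, buffer: float = 1.2) -> int:
    """Convert a target word count into a max_tokens budget.

    Assumes ~0.75 words per token (about 4/3 tokens per word) and adds
    headroom so responses are not truncated mid-sentence.
    """
    return int(target_words * 4 / 3 * buffer)

payload = {
    "max_tokens": max_tokens_for_words(200),  # ~320 tokens for a ~200-word answer
    "messages": [{"role": "user", "content": "Summarize JWT authentication."}],
}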

top_p (Nucleus Sampling)

Controls response creativity and randomness through probability distribution sampling.
Value | Behavior                                | Use Case
0.1   | Highly deterministic, focused responses | Code generation, factual queries
0.5   | Balanced creativity and consistency     | General conversation
0.9   | Creative, varied responses              | Creative writing, brainstorming
Example Configuration
{
  "top_p": 0.7,
  "messages": [...],
  "max_tokens": 400
}
Values above 0.95 can produce incoherent responses. Values below 0.05 may result in repetitive, robotic outputs.
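If you serve several kinds of requests, the table above maps naturally onto per-task presets. The following sketch simply mirrors the table's values; tune them against your own outputs.

# Illustrative presets taken from the table above.
TOP_P_PRESETS = {
    "code": 0.1,      # deterministic: code generation, factual queries
    "chat": 0.5,      # balanced: general conversation
    "creative": 0.9,  # varied: creative writing, brainstorming
}

def build_request(task: str, messages: list[dict]) -> dict:
    """Assemble a request body with a task-appropriate top_p."""
    return {
        "top_p": TOP_P_PRESETS.get(task, 0.5),  # fall back to the balanced preset
        "max_tokens": 400,
        "messages": messages,
    }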

Repetition Prevention

frequency_penalty

Reduces the likelihood of repeating frequently used tokens within the current response.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.8 for most applications
{
  "frequency_penalty": 0.3,
  "messages": [...]
}
Effect Demonstration:
  • 0.0: Natural repetition allowed
  • 0.5: Moderate repetition reduction
  • 1.0: Strong avoidance of repeated words

presence_penalty

Encourages the model to introduce new topics and concepts rather than rehashing existing ones.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.6 for most applications
{
  "presence_penalty": 0.4,
  "messages": [...]
}
Combined Usage: Use both penalties together for optimal results. Try frequency_penalty: 0.3 with presence_penalty: 0.2 as a starting point.
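Because the right penalty values depend on your prompts, it is worth sweeping a few combinations and comparing outputs by eye. A sketch of such a sweep follows; the prompt and the value grid are arbitrary examples.

# Compare how repetitive the replies are at each setting.
for freq_pen, pres_pen in [(0.0, 0.0), (0.3, 0.2), (0.8, 0.6)]:
    payload = {
        "frequency_penalty": freq_pen,
        "presence_penalty": pres_pen,
        "messages": [{"role": "user", "content": "List ten synonyms for 'fast'."}],
    }
    # Send `payload` with your client and inspect each reply for repetition;
    # keep the lowest penalties that eliminate it.
    print(f"testing frequency_penalty={freq_pen}, presence_penalty={pres_pen}")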

FAQ

Responses are cut off mid-sentence

Cause: max_tokens set too low for the requested content complexity.
Solution: Increase max_tokens or simplify the request scope.
// Instead of this:
{ "max_tokens": 50, "content": "Write a detailed explanation..." }

// Use this:
{ "max_tokens": 300, "content": "Write a detailed explanation..." }

Responses feel repetitive or robotic

Cause: top_p too low or penalties set to negative values.
Solution: Increase top_p to 0.5-0.7 and set positive penalty values.
{
  "top_p": 0.6,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.2
}

Responses ignore instructions or lack structure

Cause: Missing or vague system message context.
Solution: Provide detailed system instructions with examples.
{
  "role": "system",
  "content": "You are an expert in API documentation. Always include: 1) Clear parameter descriptions, 2) Code examples, 3) Expected responses. Format responses in markdown."
}