Understanding the core mechanics behind ClassAI’s chat system is crucial for building robust AI applications. This guide breaks down the key parameters that control conversation flow, response quality, and model behavior.
The parameters described here directly affect your API costs and response quality. Misconfiguration can lead to unexpected bills or poor user experiences.

Core Parameters Overview

The ClassAI chat system operates through six fundamental parameters that work together to create coherent, contextual conversations. Each parameter serves a specific purpose in shaping how the AI model generates responses.

Conversation Structure

messages, role - Maintain conversation history and identify message sources

Generation Control

max_tokens, top_p - Control response length and creativity

Repetition Management

frequency_penalty, presence_penalty - Prevent repetitive outputs

Messages Structure

The messages array is the backbone of every conversation. It maintains the complete context and history that the AI uses to generate relevant responses.

Message Format

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant specialized in technical documentation."
    },
    {
      "role": "user",
      "content": "Explain how JWT tokens work"
    },
    {
      "role": "assistant",
      "content": "JWT (JSON Web Tokens) are a compact, URL-safe means of representing claims..."
    }
  ]
}
The messages array is processed sequentially. The AI model sees the entire conversation history with each request, so longer conversations consume more tokens.
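To make the growth of the history concrete, here is a minimal Python sketch of a multi-turn loop. The endpoint URL and the response shape (choices[0].message.content) are placeholder assumptions, not ClassAI's documented interface; substitute your actual client and response parsing.

import requests  # generic HTTP client; any client works

API_URL = "https://api.classai.example/v1/chat"  # hypothetical endpoint

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."}
]

def send_turn(user_input: str) -> str:
    """Send one user turn and record the assistant's reply in the history."""
    messages.append({"role": "user", "content": user_input})
    # The entire history is sent on every request, so the payload
    # (and the prompt token cost) grows with each turn.
    resp = requests.post(API_URL, json={"messages": messages, "max_tokens": 300})
    reply = resp.json()["choices"][0]["message"]["content"]  # assumed response shape
    messages.append({"role": "assistant", "content": reply})
    return reply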

Token Consumption Pattern

Each message in the array consumes tokens based on its content length. The model processes:
  • System message: Sets behavior context (resent with every request as part of the conversation history)
  • User messages: Direct input requiring responses
  • Assistant messages: Previous AI responses that inform future outputs
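Since each message contributes tokens, it can help to estimate prompt size before sending. Below is a rough sketch using the ~4-characters-per-token rule of thumb described in the max_tokens section; a real tokenizer will give somewhat different counts.

def estimate_prompt_tokens(messages: list[dict]) -> int:
    """Approximate prompt tokens with the ~4 characters per token heuristic."""
    return sum(len(m["content"]) for m in messages) // 4

history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain how JWT tokens work"},
]
print(estimate_prompt_tokens(history))  # rough estimate only, not an exact count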

Roles Explained

The role field defines the message source and fundamentally changes how the model interprets content.

system

Purpose: Establishes the AI's behavior, personality, and operational constraints.

Best Practices:
  • Keep system messages concise but specific
  • Define output format requirements here
  • Set behavioral boundaries and expertise areas
{
  "role": "system",
  "content": "You are a senior software engineer. Provide code examples with explanations. Always include error handling in your solutions."
}

user

Purpose: Represents human input and queries to the AI system.

Best Practices:
  • Be specific about desired output format
  • Include relevant context within the message
  • Break complex requests into smaller parts
{
  "role": "user",
  "content": "Write a Python function to validate email addresses using regex. Include docstring and type hints."
}

assistant

Purpose: Contains AI-generated responses that become part of conversation context.

Usage Notes:
  • Automatically added by ClassAI after each response
  • Can be manually included for few-shot prompting
  • Affects the model’s understanding of conversation flow
{
  "role": "assistant",
  "content": "Here's a robust email validation function:\n\n```python\nimport re\nfrom typing import bool\n\ndef validate_email(email: str) -> bool:\n    \"\"\"Validate email address using RFC-compliant regex pattern.\"\"\"\n    # Implementation details...\n```"
}
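As noted above, assistant messages can also be seeded by hand for few-shot prompting. The sketch below shows the pattern; the example turns are illustrative, not taken from ClassAI's documentation.

few_shot = [
    {"role": "system", "content": "Answer with a one-line summary, then a code snippet."},
    # Hand-written demonstration turn: this assistant message was never
    # generated by the model; it is included so the model imitates its format.
    {"role": "user", "content": "How do I reverse a list in Python?"},
    {"role": "assistant", "content": "Use slicing.\n\nitems[::-1]"},
    # The real question comes after the demonstration:
    {"role": "user", "content": "How do I deduplicate a list in Python?"},
]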

Response Generation Controls

max_tokens

Controls the maximum length of generated responses, directly impacting both cost and completeness.
{
  "max_tokens": 100,
  "messages": [...]
}
Token Estimation: Roughly 4 characters = 1 token in English. A 300-token response equals approximately 1,200 characters or 200-250 words.
Cost Impact: max_tokens is a ceiling rather than a target, and you are billed for the tokens actually generated. A generous ceiling still lets the model run long: a 1,000-token budget for answers that only need ~200 words (~270 tokens) allows responses that cost several times more than necessary.
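One way to avoid over-provisioning is to derive max_tokens from the response length you actually want. The sketch below uses the word-to-token ratio above plus a safety margin; the 1.2 buffer is an arbitrary choice, not a documented recommendation.

def max_tokens_for_words(target_words: int, buffer: float = 1.2) -> int:
    """Convert a target word count into a max_tokens budget.

    Assumes ~0.75 words per token (about 4/3 tokens per word) and adds
    headroom so responses are not truncated mid-sentence.
    """
    return int(target_words * 4 / 3 * buffer)

payload = {
    "max_tokens": max_tokens_for_words(200),  # ~320 tokens for a ~200-word answer
    "messages": [{"role": "user", "content": "Summarize JWT authentication."}],
}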

top_p (Nucleus Sampling)

Controls response creativity and randomness through probability distribution sampling.
Value | Behavior                                | Use Case
0.1   | Highly deterministic, focused responses | Code generation, factual queries
0.5   | Balanced creativity and consistency     | General conversation
0.9   | Creative, varied responses              | Creative writing, brainstorming
Example Configuration
{
  "top_p": 0.7,
  "messages": [...],
  "max_tokens": 400
}
Values above 0.95 can produce incoherent responses. Values below 0.05 may result in repetitive, robotic outputs.
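If you serve several kinds of requests, the table above maps naturally onto per-task presets. The following sketch simply mirrors the table's values; tune them against your own outputs.

# Illustrative presets taken from the table above.
TOP_P_PRESETS = {
    "code": 0.1,      # deterministic: code generation, factual queries
    "chat": 0.5,      # balanced: general conversation
    "creative": 0.9,  # varied: creative writing, brainstorming
}

def build_request(task: str, messages: list[dict]) -> dict:
    """Assemble a request body with a task-appropriate top_p."""
    return {
        "top_p": TOP_P_PRESETS.get(task, 0.5),  # fall back to the balanced preset
        "max_tokens": 400,
        "messages": messages,
    }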

Repetition Prevention

frequency_penalty

Reduces the likelihood of repeating frequently used tokens within the current response.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.8 for most applications
{
  "frequency_penalty": 0.3,
  "messages": [...]
}
Effect Demonstration:
  • 0.0: Natural repetition allowed
  • 0.5: Moderate repetition reduction
  • 1.0: Strong avoidance of repeated words

presence_penalty

Encourages the model to introduce new topics and concepts rather than rehashing existing ones.

Range: -2.0 to 2.0
Recommended: 0.0 to 0.6 for most applications
{
  "presence_penalty": 0.4,
  "messages": [...]
}
Combined Usage: Use both penalties together for optimal results. Try frequency_penalty: 0.3 with presence_penalty: 0.2 as a starting point.
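Because the right penalty values depend on your prompts, it is worth sweeping a few combinations and comparing outputs by eye. A sketch of such a sweep follows; the prompt and the value grid are arbitrary examples.

# Compare how repetitive the replies are at each setting.
for freq_pen, pres_pen in [(0.0, 0.0), (0.3, 0.2), (0.8, 0.6)]:
    payload = {
        "frequency_penalty": freq_pen,
        "presence_penalty": pres_pen,
        "messages": [{"role": "user", "content": "List ten synonyms for 'fast'."}],
    }
    # Send `payload` with your client and inspect each reply for repetition;
    # keep the lowest penalties that eliminate it.
    print(f"testing frequency_penalty={freq_pen}, presence_penalty={pres_pen}")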

FAQ

Responses are cut off mid-sentence

Cause: max_tokens set too low for the requested content complexity.
Solution: Increase max_tokens or simplify the request scope.
// Instead of this:
{ "max_tokens": 50, "content": "Write a detailed explanation..." }

// Use this:
{ "max_tokens": 300, "content": "Write a detailed explanation..." }

Responses feel repetitive or robotic

Cause: top_p too low or penalties set to negative values.
Solution: Increase top_p to 0.5-0.7 and set positive penalty values.
{
  "top_p": 0.6,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.2
}

Responses ignore instructions or lack structure

Cause: Missing or vague system message context.
Solution: Provide detailed system instructions with examples.
{
  "role": "system",
  "content": "You are an expert in API documentation. Always include: 1) Clear parameter descriptions, 2) Code examples, 3) Expected responses. Format responses in markdown."
}