Chatting with Models — nanochat Guide

This section covers interactive chatting with your trained models using either a web-based user interface or a command-line interface (CLI). It’s designed for end users who have completed model training or evaluation and want to test conversational capabilities in real time. Chatting leverages efficient generation with key-value caching for faster responses. For preparing models to chat, see Training Chat Models and Model Evaluation. For hardware setup that affects performance here, see Getting Started and Configuration Reference.

Overview

The chatting features provide two ways to interact with models: a Web Chat UI resembling familiar chat applications (like ChatGPT) for browser-based use, and a CLI Chat for terminal-based sessions. Both support multi-turn conversations, streaming responses (text appears as generated), and sampling controls like temperature and top-k for varied outputs. Conversations use special tokens to structure user/assistant exchanges. Start a new session anytime, and clear history as needed.

Web Chat UI

Access the Web Chat UI by launching the web server and opening the provided URL in your browser (e.g., http://0.0.0.0:8000 or a public IP for remote access). The interface features a clean, responsive design with a header, scrollable chat area, and bottom input bar.

Interface Elements

Header: Contains the nanochat title and New Conversation button (plus icon; shortcut: Ctrl+Shift+N). Clicking it clears the current chat and starts fresh.
Chat Area: Displays messages in sequence. User messages appear on the right in light gray bubbles. Assistant responses stream in real-time on the left (plain text style, hover for subtle highlight).
Input Area:
- Chat input textarea (placeholder: Ask anything). Supports multi-line text; expands up to 200px height. Focus highlights border in blue.
- Send button (arrow icon). Enabled when input has text; disabled otherwise. Hover darkens; click or Enter (with Shift+Enter for new line) sends.
A typing indicator (···) shows during generation.

Starting and Managing Conversations

Launch the server (see ## Launch Options below).
Open the printed URL (e.g., http://YOUR-IP:8000).
Type in the chat input and press Send or Enter.
Watch the Assistant response stream in.
Repeat for multi-turn chat; history is maintained until New Conversation.
Use New Conversation to reset.

[!NOTE]
Conversations are session-based in the browser. Refreshing reloads a blank chat.

Limits and Validation

The UI enforces safeguards to prevent overload:

Limit	Value	Effect
Messages per request	500 max	Exceeding shows error: “Too many messages. Maximum 500 messages allowed per request”
Characters per message	8000 max	Exceeding shows error for that message: “Message X is too long. Maximum 8000 characters allowed per message”
Total conversation length	32000 characters max	Exceeding shows error: “Total conversation is too long. Maximum 32000 characters allowed”
Roles	user, assistant, system only	Invalid role shows error: “Message X has invalid role. Must be ‘user’, ‘assistant’, or ‘system’”
Temperature	0.0–2.0	Out-of-range clamped or rejected
Top-k	0–200 (0 = full vocab)	Out-of-range clamped or rejected
Max tokens	1–4096	Out-of-range clamped or rejected

Errors appear as red banners below the input.

graph TB
    subgraph "Chat Session"
        Start["Open UI at http://IP:8000"] --> Input["Type in **chat input**"]
        Input -->|"Press **Send** or Enter"| Generate["**Assistant** response streams in<br/>Typing indicator shows"]
        Generate --> Display["Message added to chat area"]
        Display --> Loop["Continue typing"]
        Loop --> Input
    end
    subgraph "Controls"
        NewBtn["Click **New Conversation**<br/>(*Ctrl+Shift+N*)"] -->|"Clears history"| Start
    end
    NewBtn -.-> Start

CLI Chat

Run the CLI for terminal-based chatting. It supports interactive multi-turn sessions or single-prompt responses.

Interface and Commands

Prompts as User: (your input).
Responses stream as Assistant: (text appears token-by-token).
Special commands: | Command | Effect | |———|——–| | quit or exit | Ends session | | clear | Resets conversation history | | Empty input | Skips turn |

Starting Sessions

Launch with defaults for interactive mode (see ## Launch Options).
Type at User: prompt.
Response streams at Assistant:.
Continue or use commands.

For one-shot: Provide –prompt value; gets single response and exits.

graph LR
    Start["Launch CLI"] --> Interactive["**User:** prompt appears"]
    Interactive --> Input["Type message"]
    Input -->|"quit/exit"| End["Goodbye!"]
    Input -->|"clear"| Reset["History cleared<br/>New **User:** prompt"]
    Input -->|"Enter"| Generate["**Assistant:** streams response"]
    Generate --> Interactive
    Prompt["Use --prompt for single response"] --> Generate --> End
    Reset -.-> Interactive

Launch Options

Both tools load models from sft or rl sources (see Training Chat Models). Use tags/steps from checkpoints.

Web Server Options

Console shows: worker init, access URL, health/stats endpoints (GET /health, GET /stats).

CLI Options

Troubleshooting

Common issues appear in browser errors, console output, or terminal.

Message	Severity	Meaning
“Too many messages. Maximum 500 messages allowed per request”	Error	Web request exceeded message limit; shorten history or start new conversation
“Message X is too long. Maximum 8000 characters allowed per message”	Error	Specific message too long; edit and resend
“Total conversation is too long. Maximum 32000 characters allowed”	Error	Full history too long; use New Conversation
“At least one message is required” or “Message X has empty content”	Error	Empty submission; add text to chat input
“Initializing worker pool with X GPUs…” followed by per-GPU loads	Info	Normal startup; wait for “All workers initialized!”
No response or slow generation	Warning	Check hardware (Configuration Reference); try cpu if GPU issues; reduce max-tokens
“Goodbye!”	Info	Normal CLI exit via quit/exit or –prompt

[!WARNING]
Long contexts may slow down or hit limits; clear often. Multiple GPUs require CUDA.

Summary

Use Web Chat UI for browser-friendly, streaming multi-turn chats with new conversation controls and built-in limits.
Use CLI Chat for terminal sessions with clear, quit, or single –prompt responses.
Customize generation via launch options like –temperature, –top-k, and model selection (–source, –model-tag).
Access web via printed URL; monitor server console for status.
For model prep, see Training Chat Models and Model Evaluation. Tune hardware in Configuration Reference. Explore advanced use in Advanced Workflows.

Generated by ESX Wiki