This section covers Advanced Workflows for experienced users conducting systematic experiments on model scaling, training series, and generating custom datasets. These workflows build on core training and evaluation processes to explore performance trends, such as how model depth affects capabilities and efficiency. They are ideal for researchers tuning hyperparameters, analyzing scaling laws, or creating specialized training data. For foundational training, see Training Base Models. For evaluation metrics like CORE score, see Model Evaluation. Hardware and precision options are detailed in Configuration Reference.
Overview
Advanced Workflows provide automated sequences for running multiple training runs, collecting metrics into summary files, and generating synthetic datasets. Key capabilities include:
- Sweeping model depths in a miniseries to benchmark efficiency.
- Fixed compute budget experiments for scaling laws analysis.
- Creating diverse conversation datasets for chat model finetuning.
Results are saved as CSV files with columns for depths, parameters, training time, validation bits per byte (BPB), and CORE score, plus formatted terminal tables for quick review.
Miniseries Training
Use this workflow to train a series of models with increasing depths (from 12 to 26) and compare their scaling behavior. It automatically handles setup, training, metric extraction, and CSV logging.
Running the Workflow
- Open a terminal in the project directory.
- Run miniseries with an optional series name (e.g., jan11; defaults to current date in lowercase like oct15).
- The workflow performs initial setup (downloads tokenizer data, trains tokenizer unless skipped), then trains each depth sequentially.
- Monitor progress via terminal logs and optional Weights & Biases (wandb) integration.
- At completion, view a formatted table of results and a CSV file in the cache directory under series_name_miniseries_results/results.csv.
[!NOTE]
Set environment variables before running: SKIP_SETUP=1 to bypass tokenizer setup, NPROC_PER_NODE=8 (default) for GPU count per node, WANDB_RUN=series_name_miniseries for logging.
Outputs and Metrics
Training adjusts device batch size automatically (32 for depths <20, 16 for 20-27, 8 for ≥28) to prevent memory issues. Each run logs to a per-depth file and appends to the CSV.
| Field | Description |
|---|---|
| depth | Model depth (12 to 26). |
| model_dim | Embedding dimension (depth × 64). |
| num_params | Total model parameters. |
| num_scaling_params | Parameters in scaling components (transformer matrices, etc.). |
| num_iterations | Training steps completed. |
| tokens_trained | Total tokens seen (iterations × 524288). |
| param_data_ratio | Tokens per scaling parameter (higher is more data-efficient). |
| val_bpb | Final validation bits per byte (lower is better). |
| core_score | Final CORE metric score (higher is better). |
| train_time_sec | Wall-clock time in seconds. |
Example results table (from a sample run):
| depth | model_dim | num_params | core_score | train_time_sec |
|---|---|---|---|---|
| 12 | 768 | 124M | 0.45 | 180 |
| 20 | 1280 | 345M | 0.62 | 450 |
| 26 | 1664 | 582M | 0.71 | 720 |
graph TB
subgraph "Preparation"
A["Run miniseries<br/>with series name"] --> B["Setup tokenizer<br/>(skip with SKIP_SETUP=1)"]
end
subgraph "Training Loop"
B --> C["For each depth 12-26"]
C --> D["Train model<br/>Adjust batch size by depth"]
D --> E["Log to wandb and file<br/>Eval CORE at end"]
end
subgraph "Results"
E --> F["Extract metrics<br/>Append to CSV"]
F -->|All depths complete| G["Display formatted table<br/>CSV: results.csv"]
end
C -.->|Loop| D
Scaling Laws Experiments
This workflow trains models across depths (8 to 20) at fixed compute budgets (FLOPs like 1e18 to 1e19) to study optimal allocation. It skips completed runs and provides detailed parameter breakdowns.
Running the Workflow
- Open a terminal in the project directory.
- Run scaling_laws (uses label like jan26 by default; set LABEL env var).
- It trains only missing combinations, using ~100M tokens for final evaluation.
- Results append to scaling_laws_results_label/results.csv with a terminal table.
[!NOTE]
Customize via NPROC_PER_NODE, WANDB_RUN, EVAL_TOKENS.
Outputs and Metrics
CSV includes granular parameter counts for analysis.
| Field | Description |
|---|---|
| flops_budget | Target FLOPs (e.g., 1e18). |
| depth | Model depth. |
| params_transformer | Transformer matrix parameters. |
| params_total | All parameters. |
| val_bpb | Final validation BPB. |
| core_score | Final CORE score. |
Synthetic Data Generation
Generate diverse multi-turn conversations between users and the model for use in supervised finetuning (SFT). Outputs JSONL files compatible with Training Chat Models.
Running the Workflow
- Set OPENROUTER_API_KEY environment variable.
- Run gen_synthetic_data in the terminal.
- It produces a JSONL file with conversations varying by topic (identity, architecture, etc.), user persona (e.g., curious beginner), dynamics (e.g., skeptical arc), and opening messages.
Each conversation is a structured JSON object with balanced diversity for high-quality SFT data.
Configuration Options
Common to all workflows:
| Setting | Default | Options | What It Controls |
|---|---|---|---|
| NPROC_PER_NODE | 8 | Positive integer | GPUs per node. |
| SKIP_SETUP | Unset (0) | 1 to skip | Tokenizer/dataset prep. |
| WANDB_RUN | Auto-generated | Custom string | Logging project name. |
| SERIES_NAME/LABEL | Date/label | Custom string | Results folder prefix. |
Troubleshooting
| Message | Severity | Meaning |
|---|---|---|
| “Skipping d=X at Y FLOPs (already in results)” | Info | Run exists in CSV; no action needed. |
| “WARNING: Could not extract CORE score for d=X” | Warning | Log missing final eval; check hardware/logs and rerun. |
| “Training d=X: params=…, CORE=Z” | Info | Summary of completed run; use for quick checks. |
[!WARNING]
Large depths may cause out-of-memory; lower device batch size manually if needed.
Summary
- Miniseries automates depth sweeps (12-26), outputs CSV with CORE, BPB, time for efficiency plots; see results table.
- Scaling Laws tests fixed FLOPs budgets across depths, skips duplicates, detailed params for analysis.
- Synthetic Data creates diverse JSONL conversations via API for SFT; requires API key.
- Integrates with Training Base Models, Model Evaluation; customize hardware in Configuration Reference. For chatting results, use Chatting with Models.