# Scripts Directory

This directory contains utility scripts for the Pico training framework.

## generate_data.py

Generates `data.json` for the dashboard from training log files.

### What it does

This script parses log files from the `runs/` directory and extracts:
- **Training metrics**: Loss, learning rate, and inf/NaN counts at each step
- **Evaluation results**: Paloma evaluation metrics
- **Model configuration**: Architecture parameters (d_model, n_layers, etc.)

### Usage

```bash
# Generate data.json from the default runs directory
python scripts/generate_data.py

# Specify custom runs directory
python scripts/generate_data.py --runs-dir /path/to/runs

# Specify custom output file
python scripts/generate_data.py --output /path/to/output.json
```

### How it works

1. **Scans runs directory**: Looks for subdirectories containing training runs
2. **Finds log files**: Locates `.log` files in each run's `logs/` subdirectory
3. **Parses log content**: Uses regex patterns to extract structured data
4. **Generates JSON**: Creates a structured JSON file for the dashboard

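Steps 1, 2, and 4 could be sketched roughly as follows. This is an illustration, not the script's actual code: `find_log_files` and `build_summary` are hypothetical helper names, and the only details taken from this README are the `runs/<run>/logs/*.log` layout and the fields of the `summary` object.

```python
from pathlib import Path


def find_log_files(runs_dir):
    """Map each run name to the .log files in its logs/ subdirectory.

    Hypothetical helper: assumes the runs/<run>/logs/*.log layout.
    """
    root = Path(runs_dir)
    found = {}
    if not root.is_dir():
        return found
    for run_dir in sorted(root.iterdir()):
        logs_dir = run_dir / "logs"
        if not logs_dir.is_dir():
            continue  # not a run directory, or a run without logs
        log_files = sorted(logs_dir.glob("*.log"))
        if log_files:
            found[run_dir.name] = log_files
    return found


def build_summary(found):
    """Build a summary object with the run count and the run names."""
    return {"total_runs": len(found), "run_names": list(found)}
```

Runs without a `logs/` subdirectory, or with no `.log` files in it, are skipped in this sketch, so they would not appear in the summary.
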
### Log Format Requirements

The script expects training metrics to be logged in the following format:

```
2025-08-29 02:09:12 - pico-train - INFO - Step 500 -- 🔄 Training Metrics
2025-08-29 02:09:12 - pico-train - INFO - ├── Loss: 10.8854
2025-08-29 02:09:12 - pico-train - INFO - ├── Learning Rate: 3.13e-06
2025-08-29 02:09:12 - pico-train - INFO - └── Inf/NaN count: 0
```

Evaluation results are expected in this format:

```
2025-08-29 02:15:26 - pico-train - INFO - Step 1000 -- 📊 Evaluation Results
2025-08-29 02:15:26 - pico-train - INFO - └── paloma: 7.125172406420199e+27
```

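A minimal sketch of regex-based parsing for these two block types is shown below. The patterns and the `parse_log` and `to_number` names are assumptions made for illustration; the script's real patterns may differ. The icon in the header and the tree prefix on each field line are matched generically, so the exact glyphs do not matter here.

```python
import re

# A header line carries the step number and the block type.
HEADER = re.compile(r"Step (\d+) -- .*(Training Metrics|Evaluation Results)\s*$")
# A field line looks like "... - INFO - <tree prefix> <name>: <value>".
FIELD = re.compile(r" - INFO - \S+ ([^:]+): (\S+)\s*$")


def to_number(text):
    """Convert to int if possible, then float; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_log(lines):
    """Return (training_metrics, evaluation_results) as lists of dicts."""
    training, evaluation = [], []
    current = None  # the block currently being filled, if any
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = {"step": int(header.group(1))}
            is_training = header.group(2) == "Training Metrics"
            (training if is_training else evaluation).append(current)
            continue
        field = FIELD.search(line)
        if field and current is not None:
            # "Inf/NaN count" -> "inf_nan_count", "Learning Rate" -> "learning_rate"
            key = re.sub(r"[^a-z0-9]+", "_", field.group(1).lower()).strip("_")
            current[key] = to_number(field.group(2))
        else:
            current = None  # any other line ends the current block
    return training, evaluation
```

Normalizing field names this way yields the keys used in `data.json` (`loss`, `learning_rate`, `inf_nan_count`, `paloma`) without listing each metric by hand.
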
### Output Format

The generated `data.json` has this structure:

```json
{
  "runs": [
    {
      "run_name": "model-name",
      "log_file": "log_filename.log",
      "training_metrics": [
        {
          "step": 0,
          "loss": 10.9914,
          "learning_rate": 0.0,
          "inf_nan_count": 0
        }
      ],
      "evaluation_results": [
        {
          "step": 1000,
          "paloma": 59434.76600609756
        }
      ],
      "config": {
        "d_model": 96,
        "n_layers": 12,
        "max_seq_len": 2048,
        "vocab_size": 50304,
        "lr": 0.0003,
        "max_steps": 200000,
        "batch_size": 8
      }
    }
  ],
  "summary": {
    "total_runs": 1,
    "run_names": ["model-name"]
  }
}
```

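To check a generated file against this structure, something like the following sketch can help. `load_data` and `summarize_runs` are hypothetical helpers, not part of the script; they rely only on the keys shown in the structure above.

```python
import json


def load_data(path="data.json"):
    """Read a generated data file (assumed to be UTF-8 encoded JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_runs(data):
    """Return one row per run: latest logged step, its loss, and eval count."""
    rows = []
    for run in data.get("runs", []):
        metrics = run.get("training_metrics", [])
        last = max(metrics, key=lambda m: m["step"]) if metrics else None
        rows.append({
            "run_name": run["run_name"],
            "last_step": last["step"] if last else None,
            "last_loss": last["loss"] if last else None,
            "num_evals": len(run.get("evaluation_results", [])),
        })
    return rows


# Example usage:
#   for row in summarize_runs(load_data("data.json")):
#       print(row)
```

A run whose row shows `last_step` as `None` had no training metrics extracted, which usually points to a log format mismatch.
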
### When to use

- **After training**: Generate updated dashboard data
- **Adding new runs**: Include new training sessions in the dashboard
- **Debugging**: Verify log parsing is working correctly
- **Dashboard setup**: Initial setup of the training metrics dashboard

### Troubleshooting

If the script doesn't find any data:
1. Check that log files exist in `runs/*/logs/`
2. Verify that the log format matches the expected pattern
3. Ensure the log files contain training metrics entries
4. Check file permissions and encoding

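Checks 1, 3, and 4 can be automated with a small diagnostic like the sketch below. `diagnose` is a hypothetical helper, and it assumes the logs are UTF-8 encoded, which this README does not state.

```python
from pathlib import Path


def diagnose(runs_dir="runs"):
    """Report, per log file, whether it is readable and has metric headers."""
    report = []
    for log_path in sorted(Path(runs_dir).glob("*/logs/*.log")):
        entry = {"path": str(log_path), "readable": True, "metric_headers": 0}
        try:
            # Assumption: logs are UTF-8 text.
            text = log_path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Permission problems or an unexpected encoding.
            entry["readable"] = False
        else:
            entry["metric_headers"] = text.count("Training Metrics")
        report.append(entry)
    return report
```

An empty report means no files matched `runs/*/logs/*.log`. A file with `readable` set to `False` points to a permission or encoding problem, and one with `metric_headers` equal to 0 contains no training metrics entries.
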