---
license: apache-2.0
language:
- en
tags:
- multimodal
- vision-language
- openvino
- optimum-intel
- testing
- tiny-model
- minicpmo
base_model: openbmb/MiniCPM-o-2_6
library_name: transformers
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

A tiny (~42 MB) randomly-initialized version of [MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6) designed for **testing purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

## Purpose

This model was created to replace the existing test model at `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6` (185 MB) with a smaller alternative for CI/CD testing. Smaller test models reduce:

- Download times in CI pipelines
- Storage requirements
- Test execution time

## Size Comparison

| Model | Total Size | Model Weights |
|-------|------------|---------------|
| [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (Original) | 17.4 GB | ~17 GB |
| [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6) (Current Test Model) | 185 MB | 169 MB |
| **hrithik-dev8/tiny-random-MiniCPM-o-2_6** (This Model) | **~42 MB** | **41.55 MB** |

**Result: ~4× smaller than the current test model.**

## Model Configuration

| Component | This Model | Original |
|-----------|------------|----------|
| **Vocabulary** | 5,000 tokens | 151,700 tokens |
| **LLM Hidden Size** | 128 | 3,584 |
| **LLM Layers** | 1 | 40 |
| **LLM Attention Heads** | 8 | 28 |
| **Vision Hidden Size** | 128 | 1,152 |
| **Vision Layers** | 1 | 27 |
| **Image Size** | 980 (preserved) | 980 |
| **Patch Size** | 14 (preserved) | 14 |
| **Audio d_model** | 64 | 1,280 |
| **TTS Hidden Size** | 128 | - |

## Parameter Breakdown

| Component | Parameters | Size (MB) |
|-----------|------------|-----------|
| TTS/DVAE | 19,339,766 | 36.89 |
| LLM | 1,419,840 | 2.71 |
| Vision | 835,328 | 1.59 |
| Resampler | 91,392 | 0.17 |
| Audio | 56,192 | 0.11 |
| Other | 20,736 | 0.04 |
| **Total** | **21,763,254** | **~41.5** |

## Technical Details

### Why Keep TTS/DVAE Components?

The TTS (Text-to-Speech) component, which includes the DVAE (Discrete Variational Auto-Encoder), accounts for approximately 37 MB (~85%) of the model size. While the optimum-intel tests do **not** exercise TTS functionality (they only test image+text → text generation), we retain this component because:

1. **Structural Consistency**: Removing TTS via `init_tts=False` causes structural differences in the model that lead to numerical divergence between PyTorch and OpenVINO outputs (see the loading sketch after this list).
2. **Test Compatibility**: The `test_compare_to_transformers` test compares PyTorch vs OpenVINO outputs and requires exact structural matching.
3. **Architecture Integrity**: The MiniCPM-o architecture expects TTS weights to be present during model loading.
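
A minimal loading sketch that keeps the TTS/DVAE weights in place; `init_tts` is the MiniCPM-o remote-code flag mentioned above, and its exact name and default may differ between revisions of that code:

```python
from transformers import AutoModel

# Keep the DVAE/TTS weights so the loaded structure matches what the OpenVINO
# export and the PyTorch-vs-OpenVINO comparison tests expect. Passing
# init_tts=False would drop these weights and break that structural match.
model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
    init_tts=True,  # shown explicitly; the remote-code default may already be True
)
```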
### Tokenizer Shrinking

The vocabulary was reduced from 151,700 to 5,000 tokens:

- **Base tokens**: IDs 0-4899 (the first 4,900 most common tokens)
- **Special tokens**: IDs 4900-4949 (remapped from their original high IDs)
- **BPE merges**: Filtered from 151,387 to 4,644 (only merges involving retained tokens)

Key token mappings:

| Token | ID |
|-------|-----|
| `` | 4900 |
| `<\|endoftext\|>` | 4901 |
| `<\|im_start\|>` | 4902 |
| `<\|im_end\|>` | 4903 |

### Reproducibility

Model weights are initialized with a fixed random seed (42) to ensure:

- Reproducible outputs between runs
- Consistent behavior between PyTorch and OpenVINO
- Passing of `test_compare_to_transformers`, which compares framework outputs

## Test Results

Tested with `pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v`:

| Test | Status | Notes |
|------|--------|-------|
| `test_compare_to_transformers` | ✅ PASSED | PyTorch/OpenVINO outputs match |
| `test_generate_utils` | ✅ PASSED | Generation pipeline works |
| `test_model_can_be_loaded_after_saving` | ⚠️ FAILED | Windows file locking issue (not model-related) |

The `test_model_can_be_loaded_after_saving` failure is a **Windows-specific issue**: OpenVINO keeps file handles open, which prevents cleanup of temporary directories. This is a known platform limitation, not a model defect; the test passes on Linux/macOS.

## Usage

### For optimum-intel Testing

```python
# In optimum-intel/tests/openvino/utils_tests.py, update MODEL_NAMES:
MODEL_NAMES = {
    # ... other models ...
    "minicpmo": "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
}
```

Then run the tests:

```bash
pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v
```

### Basic Model Loading

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
```

## Files Included

| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 41.55 MB | Model weights (bfloat16) |
| `config.json` | 5.33 KB | Model configuration |
| `tokenizer.json` | 338.27 KB | Shrunk tokenizer (5,000 tokens) |
| `tokenizer_config.json` | 12.78 KB | Tokenizer settings |
| `vocab.json` | 85.70 KB | Vocabulary mapping |
| `merges.txt` | 36.58 KB | BPE merge rules |
| `preprocessor_config.json` | 1.07 KB | Image processor config |
| `generation_config.json` | 121 B | Generation settings |
| `added_tokens.json` | 1.13 KB | Special tokens |
| `special_tokens_map.json` | 1.24 KB | Special token mappings |

## Requirements

- Python 3.8+
- transformers >= 4.45.0, < 4.52.0
- torch
- For OpenVINO testing: optimum-intel with the OpenVINO backend

## Limitations

⚠️ **This model is for testing only.** It produces random, meaningless outputs and should not be used for inference.
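
For OpenVINO-based testing, the export path exercised by the tests above can be reproduced roughly as follows. This is a sketch that assumes `OVModelForVisualCausalLM` is the optimum-intel class covering this architecture and that `export=True` converts the checkpoint on the fly; class names and arguments may differ across optimum-intel versions.

```python
from optimum.intel import OVModelForVisualCausalLM

# Convert the tiny PyTorch checkpoint to OpenVINO IR on the fly and reload it
# through the OpenVINO runtime, roughly mirroring what test_seq2seq.py does.
ov_model = OVModelForVisualCausalLM.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    export=True,
    trust_remote_code=True,
)
ov_model.save_pretrained("tiny-minicpmo-ov")  # saved IR can be reloaded without re-exporting
```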