---
license: apache-2.0
language:
- en
tags:
- multimodal
- vision-language
- openvino
- optimum-intel
- testing
- tiny-model
- minicpmo
base_model: openbmb/MiniCPM-o-2_6
library_name: transformers
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

A tiny (~42 MB) randomly-initialized version of [MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6) designed for **testing purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

## Purpose

This model was created to replace the existing test model at `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6` (185 MB) with a smaller alternative for CI/CD testing. Smaller test models reduce:

- Download times in CI pipelines
- Storage requirements
- Test execution time

## Size Comparison

| Model | Total Size | Model Weights |
|-------|------------|---------------|
| [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (Original) | 17.4 GB | ~17 GB |
| [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6) (Current Test Model) | 185 MB | 169 MB |
| **hrithik-dev8/tiny-random-MiniCPM-o-2_6** (This Model) | **~42 MB** | **41.55 MB** |

**Result: ~4× smaller than the current test model.**

## Model Configuration

| Component | This Model | Original |
|-----------|------------|----------|
| **Vocabulary** | 5,000 tokens | 151,700 tokens |
| **LLM Hidden Size** | 128 | 3,584 |
| **LLM Layers** | 1 | 40 |
| **LLM Attention Heads** | 8 | 28 |
| **Vision Hidden Size** | 128 | 1,152 |
| **Vision Layers** | 1 | 27 |
| **Image Size** | 980 (preserved) | 980 |
| **Patch Size** | 14 (preserved) | 14 |
| **Audio d_model** | 64 | 1,280 |
| **TTS Hidden Size** | 128 | - |

## Parameter Breakdown

| Component | Parameters | Size (MB) |
|-----------|------------|-----------|
| TTS/DVAE | 19,339,766 | 36.89 |
| LLM | 1,419,840 | 2.71 |
| Vision | 835,328 | 1.59 |
| Resampler | 91,392 | 0.17 |
| Audio | 56,192 | 0.11 |
| Other | 20,736 | 0.04 |
| **Total** | **21,763,254** | **~41.5** |

## Technical Details

### Why Keep TTS/DVAE Components?

The TTS (Text-to-Speech) component, which includes the DVAE (Discrete Variational Auto-Encoder), accounts for approximately 37 MB (~85%) of the model size. While the optimum-intel tests do **not** exercise TTS functionality (they only test image+text → text generation), we retain this component because:

1. **Structural Consistency**: Removing TTS via `init_tts=False` causes structural differences in the model that lead to numerical divergence between PyTorch and OpenVINO outputs (see the loading sketch after this list).
2. **Test Compatibility**: The `test_compare_to_transformers` test compares PyTorch vs OpenVINO outputs and requires exact structural matching.
3. **Architecture Integrity**: The MiniCPM-o architecture expects TTS weights to be present during model loading.
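
A minimal loading sketch that keeps the TTS/DVAE weights in place; `init_tts` is the MiniCPM-o remote-code flag mentioned above, and its exact name and default may differ between revisions of that code:

```python
from transformers import AutoModel

# Keep the DVAE/TTS weights so the loaded structure matches what the OpenVINO
# export and the PyTorch-vs-OpenVINO comparison tests expect. Passing
# init_tts=False would drop these weights and break that structural match.
model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
    init_tts=True,  # shown explicitly; the remote-code default may already be True
)
```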
### Tokenizer Shrinking

The vocabulary was reduced from 151,700 to 5,000 tokens:

- **Base tokens**: IDs 0-4899 (the first 4,900 most common tokens)
- **Special tokens**: IDs 4900-4949 (remapped from their original high IDs)
- **BPE merges**: Filtered from 151,387 to 4,644 (only merges involving retained tokens)

Key token mappings:

| Token | ID |
|-------|-----|
| `` | 4900 |
| `<\|endoftext\|>` | 4901 |
| `<\|im_start\|>` | 4902 |
| `<\|im_end\|>` | 4903 |

### Reproducibility

Model weights are initialized with a fixed random seed (42) to ensure:

- Reproducible outputs between runs
- Consistent behavior between PyTorch and OpenVINO
- Passing of `test_compare_to_transformers`, which compares framework outputs

## Test Results

Tested with `pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v`:

| Test | Status | Notes |
|------|--------|-------|
| `test_compare_to_transformers` | ✅ PASSED | PyTorch/OpenVINO outputs match |
| `test_generate_utils` | ✅ PASSED | Generation pipeline works |
| `test_model_can_be_loaded_after_saving` | ⚠️ FAILED | Windows file locking issue (not model-related) |

The `test_model_can_be_loaded_after_saving` failure is a **Windows-specific issue**: OpenVINO keeps file handles open, which prevents cleanup of temporary directories. This is a known platform limitation, not a model defect; the test passes on Linux/macOS.

## Usage

### For optimum-intel Testing

```python
# In optimum-intel/tests/openvino/utils_tests.py, update MODEL_NAMES:
MODEL_NAMES = {
    # ... other models ...
    "minicpmo": "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
}
```

Then run the tests:

```bash
pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v
```

### Basic Model Loading

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
```

## Files Included

| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 41.55 MB | Model weights (bfloat16) |
| `config.json` | 5.33 KB | Model configuration |
| `tokenizer.json` | 338.27 KB | Shrunk tokenizer (5,000 tokens) |
| `tokenizer_config.json` | 12.78 KB | Tokenizer settings |
| `vocab.json` | 85.70 KB | Vocabulary mapping |
| `merges.txt` | 36.58 KB | BPE merge rules |
| `preprocessor_config.json` | 1.07 KB | Image processor config |
| `generation_config.json` | 121 B | Generation settings |
| `added_tokens.json` | 1.13 KB | Special tokens |
| `special_tokens_map.json` | 1.24 KB | Special token mappings |

## Requirements

- Python 3.8+
- transformers >= 4.45.0, < 4.52.0
- torch
- For OpenVINO testing: optimum-intel with the OpenVINO backend

## Limitations

⚠️ **This model is for testing only.** It produces random, meaningless outputs and should not be used for inference.
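
For OpenVINO-based testing, the export path exercised by the tests above can be reproduced roughly as follows. This is a sketch that assumes `OVModelForVisualCausalLM` is the optimum-intel class covering this architecture and that `export=True` converts the checkpoint on the fly; class names and arguments may differ across optimum-intel versions.

```python
from optimum.intel import OVModelForVisualCausalLM

# Convert the tiny PyTorch checkpoint to OpenVINO IR on the fly and reload it
# through the OpenVINO runtime, roughly mirroring what test_seq2seq.py does.
ov_model = OVModelForVisualCausalLM.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    export=True,
    trust_remote_code=True,
)
ov_model.save_pretrained("tiny-minicpmo-ov")  # saved IR can be reloaded without re-exporting
```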