---
tags:
- model-merge
- hermite-interpolation
- deepseek
base_model:
- deepseek-ai/deepseek-math-7b-instruct
- deepseek-ai/deepseek-coder-7b-instruct-v1.5
---
# deepseek-7b-math-code-lambda010

A merged model created by linear interpolation of two models.
## Merge Configuration
| Parameter | Value |
|-----------|-------|
| Model A | `deepseek-ai/deepseek-math-7b-instruct` |
| Model B | `deepseek-ai/deepseek-coder-7b-instruct-v1.5` |
| λ_a | 0.10 |
| λ_b | 0.90 |
| Formula | θ* = 0.10 × θ_a + 0.90 × θ_b |
| dtype | torch.float16 |
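As a minimal sketch (not the exact merge script used for this model), the interpolation above can be reproduced with plain PyTorch by averaging the two checkpoints' state dicts with λ_a = 0.10 and λ_b = 0.90; the output path and the shape-mismatch handling below are illustrative assumptions.

```python
# Sketch of the linear interpolation θ* = 0.10 * θ_a + 0.90 * θ_b.
# Assumes both checkpoints share the same architecture and parameter names;
# the actual merge additionally builds the union tokenizer (see below).
import torch
from transformers import AutoModelForCausalLM

LAMBDA_A, LAMBDA_B = 0.10, 0.90

model_a = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-math-7b-instruct", torch_dtype=torch.float32
)
model_b = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-7b-instruct-v1.5", torch_dtype=torch.float32
)

state_a = model_a.state_dict()
state_b = model_b.state_dict()

merged = {}
for name, tensor_b in state_b.items():
    tensor_a = state_a[name]
    if tensor_a.shape != tensor_b.shape:
        # Embedding / LM-head matrices differ in row count when the
        # vocabularies differ; those are handled by the union-tokenizer step.
        continue
    merged[name] = (LAMBDA_A * tensor_a + LAMBDA_B * tensor_b).to(torch.float16)

# Load the merged weights into Model B's architecture and save in float16.
model_b.load_state_dict(merged, strict=False)
model_b.half().save_pretrained("deepseek-7b-math-code-lambda010")
```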
## Tokenizer

Union tokenizer (mergekit-style): the vocabularies of both models are merged.

- Union vocab size: 100016
- Tokens added from Model B: 14
- Tokens only in Model A: 0

For tokens missing from one model's vocabulary, the other model's embedding row is used as a fallback before the linear interpolation is applied.
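A simplified sketch of how the union-vocabulary embedding matrix could be built under this fallback rule (illustrative only; the helper name, arguments, and `union_vocab` mapping are assumptions, not the exact script used):

```python
# Sketch of the union-tokenizer embedding merge.
# `tok_a`/`tok_b` are the two models' tokenizers, `emb_a`/`emb_b` their input
# embedding matrices, and `union_vocab` maps token string -> id in the union vocab.
import torch

def build_union_embeddings(union_vocab, tok_a, tok_b, emb_a, emb_b,
                           lambda_a=0.10, lambda_b=0.90):
    hidden = emb_a.shape[1]
    merged = torch.zeros(len(union_vocab), hidden, dtype=torch.float32)
    vocab_a, vocab_b = tok_a.get_vocab(), tok_b.get_vocab()
    for token, new_id in union_vocab.items():
        row_a = emb_a[vocab_a[token]] if token in vocab_a else None
        row_b = emb_b[vocab_b[token]] if token in vocab_b else None
        # Fallback: a token missing from one vocabulary borrows the other
        # model's row, so the interpolation reduces to that existing row.
        if row_a is None:
            row_a = row_b
        if row_b is None:
            row_b = row_a
        merged[new_id] = lambda_a * row_a + lambda_b * row_b
    return merged
```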
## Description

This model was created by linearly interpolating the parameters of two models:

- **Model A** (`deepseek-ai/deepseek-math-7b-instruct`): weight = 0.10
- **Model B** (`deepseek-ai/deepseek-coder-7b-instruct-v1.5`): weight = 0.90
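A typical way to load and run the merged model with `transformers`; the repository id below is a placeholder for wherever this model is hosted, and the prompt is only an example.

```python
# Example usage; replace the repo id with the actual location of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/deepseek-7b-math-code-lambda010"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)

prompt = "Write a Python function that computes the nth Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```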