---
tags:
  - model-merge
  - hermite-interpolation
  - deepseek
base_model:
  - deepseek-ai/deepseek-math-7b-instruct
  - deepseek-ai/deepseek-coder-7b-instruct-v1.5
---

# deepseek-7b-math-code-lambda060

A merged model created by linear interpolation of two models.

## Merge Configuration

| Parameter | Value |
|-----------|-------|
| Model A | deepseek-ai/deepseek-math-7b-instruct |
| Model B | deepseek-ai/deepseek-coder-7b-instruct-v1.5 |
| λ_a | 0.60 |
| λ_b | 0.40 |
| Formula | θ* = 0.60 × θ_a + 0.40 × θ_b |
| dtype | torch.float16 |
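
A minimal sketch of this interpolation, assuming both checkpoints share the same architecture and parameter names; this is illustrative, not the exact merge script used for this upload:

```python
import torch
from transformers import AutoModelForCausalLM

LAMBDA_A, LAMBDA_B = 0.60, 0.40  # interpolation weights from the table above

model_a = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-math-7b-instruct", torch_dtype=torch.float16
)
model_b = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-7b-instruct-v1.5", torch_dtype=torch.float16
)

state_b = model_b.state_dict()
merged = {}
for name, theta_a in model_a.state_dict().items():
    theta_b = state_b[name]
    if theta_a.shape != theta_b.shape:
        # Vocab-dependent tensors (embeddings, lm_head) differ in shape;
        # they are handled by the union-tokenizer step described below.
        merged[name] = theta_a
        continue
    # theta* = 0.60 * theta_a + 0.40 * theta_b, accumulated in float32
    # to limit rounding error, then cast back to float16.
    merged[name] = (
        LAMBDA_A * theta_a.float() + LAMBDA_B * theta_b.float()
    ).to(torch.float16)

model_a.load_state_dict(merged)
model_a.save_pretrained("deepseek-7b-math-code-lambda060")
```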

## Tokenizer

Union tokenizer (mergekit-style): the vocabularies of both models are merged.

- Union vocab size: 100016
- Tokens added from Model B: 14
- Tokens only in Model A: 0

For tokens missing from one model's vocabulary, the other model's embedding row is used as a fallback before linear interpolation.
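
A sketch of that fallback rule for the embedding matrices. The names here (`union_vocab`, `tok_a`, `tok_b`, `emb_a`, `emb_b`, `merge_embeddings`) are illustrative assumptions, not identifiers from the actual merge script:

```python
import torch

def merge_embeddings(union_vocab, tok_a, tok_b, emb_a, emb_b,
                     lambda_a=0.60, lambda_b=0.40):
    """Build the union-vocab embedding matrix row by row (illustrative)."""
    dim = emb_a.shape[1]
    merged = torch.empty(len(union_vocab), dim, dtype=torch.float16)
    vocab_a, vocab_b = tok_a.get_vocab(), tok_b.get_vocab()
    for row, token in enumerate(union_vocab):
        id_a, id_b = vocab_a.get(token), vocab_b.get(token)
        if id_a is not None and id_b is not None:
            # Token exists in both vocabularies: interpolate normally.
            merged[row] = lambda_a * emb_a[id_a] + lambda_b * emb_b[id_b]
        elif id_a is not None:
            # Missing from Model B: Model A's row stands in on both sides
            # of the interpolation, so the result is that row unchanged.
            merged[row] = emb_a[id_a]
        else:
            # Missing from Model A (e.g. the 14 tokens added from Model B).
            merged[row] = emb_b[id_b]
    return merged
```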

## Description

This model was created by linearly interpolating the parameters of two models:

- Model A (deepseek-ai/deepseek-math-7b-instruct): weight = 0.60
- Model B (deepseek-ai/deepseek-coder-7b-instruct-v1.5): weight = 0.40
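
A hypothetical usage example; the repo id below is inferred from the model name and uploader and may differ from the actual upload:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "lejelly/deepseek-7b-math-code-lambda060"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that returns the nth Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```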