ValueError Rope Scaling
Hi, when I try to deploy the model via an HF Inference Endpoint I get this: [Server Message] Endpoint Failed to Start.
With the following details:
Exit code: 1. The tail of the logged config dump and the error:

```
        1.0,
        1.0,
        1.0,
        1.0
      ],
      "type": "longrope"
    },
    "rope_theta": 10000.0,
    "transformers_version": "4.48.3",
    "use_cache": true,
    "vocab_size": 200064
  }
ValueError: rope_scaling's short_factor field must have length 64, got 48
{"timestamp":"2025-03-17T11:18:59.041473Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2025-03-17T11:18:59.041521Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart
```
I'm pretty new to using HF Endpoints, so I just wanted to know whether there's a way I can fix it myself, or if I need to wait for a model update or something like that.
Same thing here too:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <cell line: 0>()
----> 1 generate(PHI3, messages)

5 frames
/usr/local/lib/python3.11/dist-packages/transformers/models/phi3/configuration_phi3.py in _rope_scaling_validation(self)
    206             )
    207         if not len(rope_scaling_short_factor) == self.hidden_size // self.num_attention_heads // 2:
--> 208             raise ValueError(
    209                 f"rope_scaling's short_factor field must have length {self.hidden_size // self.num_attention_heads // 2}, got {len(rope_scaling_short_factor)}"
    210             )

ValueError: rope_scaling's short_factor field must have length 64, got 48
```
Hi @clawvyrin and @3mar2000 ,
Thanks for your interest!
Yes, support for the new model has already been added in the latest transformers (v4.49.0) and vLLM (v0.7.3).
Can you upgrade your transformers version, or patch in the changes below and try?
vLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947
Hi @ykim362 , I'm not using Transformers but InferenceClient and it still doesn't work, unfortunately.
I am trying to deploy the model directly from this page: https://huggingface.co/microsoft/Phi-4-mini-instruct
If you want to fix it yourself, patch the validation in `configuration_phi3.py` so it accounts for the partial rotary factor (hard-coded here as 0.75):

```python
rotary_ndims = int(self.hidden_size // self.num_attention_heads * 0.75)
if not len(rope_scaling_short_factor) == rotary_ndims // 2:
    raise ValueError(
        f"rope_scaling's short_factor field must have length {rotary_ndims // 2}, got {len(rope_scaling_short_factor)}"
    )
```
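To see why the validator expects 64 while the shipped config has 48, here is the arithmetic. The dimensions below (hidden size 3072, 24 attention heads) are what Phi-4-mini reportedly uses; check your own config.json rather than taking them as authoritative:

```python
# Illustrative Phi-4-mini-style dimensions; verify against your config.json
hidden_size = 3072
num_attention_heads = 24
partial_rotary_factor = 0.75

head_dim = hidden_size // num_attention_heads         # 128 dims per head
# Old validation rotates the full head dimension:
print(head_dim // 2)                                  # 64 -- what older transformers expects
# With partial RoPE only 75% of each head's dims are rotated:
rotary_ndims = int(head_dim * partial_rotary_factor)  # 96
print(rotary_ndims // 2)                              # 48 -- what the repo's config actually ships
```

That mismatch (64 expected, 48 present) is exactly the error in the logs above.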
+1
The RoPE scaling error in microsoft/Phi-4-mini-instruct typically surfaces when the rope_scaling config field doesn't match what the installed version of transformers expects. Phi-4-mini uses a longrope configuration with a partial rotary factor, and older transformers releases validate the short_factor/long_factor lengths without accounting for it. The fix is usually one of two things: either upgrade transformers to at least the version the maintainers point to above (v4.49.0), or, if you're loading with trust_remote_code=True, make sure the local config.json rope_scaling dict includes all required keys, e.g. the newer "rope_type" key rather than the older "type" field. You can verify what your config.json actually contains by running AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct") and inspecting the rope_scaling attribute directly.
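A minimal sketch of that key rename, assuming the newer schema simply swaps "type" for "rope_type" (check the validator in your installed transformers before relying on this):

```python
def normalize_rope_scaling(rope_scaling):
    """Rename the legacy "type" key to "rope_type" without mutating the input.

    Sketch only: newer transformers validates "rope_type"; this is not the
    library's canonical migration path.
    """
    scaled = dict(rope_scaling)
    if "type" in scaled and "rope_type" not in scaled:
        scaled["rope_type"] = scaled.pop("type")
    return scaled

# Phi-4-mini-style dict with the legacy key (illustrative values)
cfg = {"type": "longrope", "short_factor": [1.0] * 48, "long_factor": [1.0] * 48}
print(normalize_rope_scaling(cfg)["rope_type"])  # longrope
```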
If you're loading this model as part of a multi-agent pipeline, there's an additional wrinkle worth noting: if different agents or worker processes load the model with different transformers versions in the same environment (common in git-worktree-style multi-agent setups), you can get inconsistent RoPE validation errors that are hard to reproduce. The config validation happens at load time and is version-sensitive, so environment isolation matters more than people expect here.
One concrete debugging step: post the full traceback, including the failing line in configuration_phi3.py (Phi-4-mini reuses the Phi-3 modeling code), along with your transformers.__version__. The error message usually tells you exactly which key is missing or malformed in the rope_scaling dict, and that narrows it down quickly.
The same root cause can show up slightly differently depending on where you load the model: the config.json in the repo has fields that are only valid for newer transformers versions. First thing to check: run pip show transformers and compare against the minimum version specified in the model card. If you're on anything below v4.49.0 you'll likely hit this. The fix is usually just upgrading, but if you're pinned to an older version for other reasons, you can manually patch the rope_scaling dict in the config to strip unrecognized keys before loading.
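A minimal version-gate sketch, in pure Python (the v4.49.0 threshold comes from the maintainer's reply earlier in the thread; the helper names are made up for illustration):

```python
def parse_version(v):
    """Turn "4.48.3" into (4, 48, 3) for tuple comparison; ignores pre-release tags."""
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

MIN_TRANSFORMERS = (4, 49, 0)  # per the maintainer's reply in this thread

def supports_phi4_mini(installed_version):
    return parse_version(installed_version) >= MIN_TRANSFORMERS

print(supports_phi4_mini("4.48.3"))  # False -- the version in the failing config dump
print(supports_phi4_mini("4.49.0"))  # True
```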
If you're loading the model programmatically, you can work around it by patching the config before loading (the exact set of keys your version accepts is version-dependent, so treat the allowed set here as a starting point):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print(config.rope_scaling)  # inspect what the repo actually ships
# Keep only the keys your transformers version's longrope validator recognizes
allowed = {"rope_type", "type", "short_factor", "long_factor"}
config.rope_scaling = {k: v for k, v in config.rope_scaling.items() if k in allowed}
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct", config=config)
```
The exact fields transformers validates against are defined in modeling_rope_utils.py in the transformers source; it's worth diffing that against what's in the model's config.json to see exactly which key is tripping the validator.
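A quick way to do that diff programmatically. The allowed-key set below is a placeholder for illustration; pull the real one out of the modeling_rope_utils.py you actually have installed:

```python
def unrecognized_rope_keys(rope_scaling, allowed):
    """Return the keys in a rope_scaling dict that the validator would reject."""
    return sorted(set(rope_scaling) - set(allowed))

# Placeholder allowed set; replace with the keys your transformers version validates
ALLOWED = {"rope_type", "short_factor", "long_factor"}
cfg = {"type": "longrope", "short_factor": [1.0] * 48, "long_factor": [1.0] * 48}
print(unrecognized_rope_keys(cfg, ALLOWED))  # ['type'] -- the legacy key tripping newer validators
```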
One broader note: this kind of silent config mismatch becomes a real headache in multi-agent pipelines where you're dynamically loading models at runtime. In the infrastructure we build at AgentGraph, we've started treating model config validation as part of the agent provisioning step: catching these errors before an agent is assigned work rather than mid-execution. The RoPE issue is a good example of why version pinning and config schema validation deserve more attention than they usually get in agent deployment setups.