NV-EmbedCode-7B → Hexagon NPU (QHexRT, v81)

On-device code-retrieval embedding bundle of nvidia/nv-embedcode-7b-v1 for the Qualcomm Hexagon v81 NPU (SM8850), runnable with the QHexRT C++ runtime. No Python in the hot path.

MistralBiDirectionalModel (bidirectional Mistral-7B: hidden 4096, 32 layers, 32q/8kv, head_dim 128, plain rope theta 10000, full attention) + avg-pool + L2 → a 4096-d embedding. The 7B encoder ships as 4 chained W8 context parts (llama_embed_sharded, the encoder-split engine; 1.6 GB each ≈ 6.4 GB, fits the ~12 GB device).

Device validation (v81, SM8850) — device-vs-reference embedding cosine

input	cosine
"function to compute fibonacci numbers"	0.9992
"def fib(n): ... fib(n-1)+fib(n-2)"	0.9993
"SELECT * FROM users WHERE age > 30"	0.9993
"binary search implementation in python"	0.9993

W8 7B vs the fp32 reference — near bit-faithful (≥ 0.9992). Code retrieval: cos(fib-query, fib-code) 0.6878 device vs 0.6872 reference, and 0.037 for an irrelevant SQL doc (correct ranking). ~930 ms/embedding. (Reference computed via a host-rope replication validated bit-exact to HF's real MistralDecoderLayer, since the model's shipped custom forward is transformers-5.x-incompatible.)

What's inside (`v81/`)

nvembedcode7b_enc_p{0..3}_w8.bin — the 4 chained W8 encoder parts (graphs nvembedcode7b_enc_p{k}_w8).
nvembedcode7b_embed_f16.bin — token-embedding table (fp16, vocab 32000).
tokenizer.json — Mistral sentencepiece BPE tokenizer.
nv-embedcode-7b.json — the QHexRT manifest (host-op llama_embed_sharded, plain rope).

Run

adb push v81 /data/local/tmp/wq/nb14emb
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. QHX_EMB_QUERY_PREFIX= QHX_EMB_DOC_PREFIX= \
  ./qhx_embed nb14emb/nv-embedcode-7b.json libQnnHtp.so libQnnSystem.so nb14emb \
  'function to compute fibonacci numbers' 'def fib(n): return n if n<2 else fib(n-1)+fib(n-2)'"

Embeddings over raw text (no prefix). A QNN context binary is arch/QAIRT-pinned (v81, QAIRT 2.47, soc_model 87) — won't load on v79/v83.

Downloads last month: -; Downloads are not tracked for this model. How to track

NV-EmbedCode-7B → Hexagon NPU (QHexRT, v81)

Device validation (v81, SM8850) — device-vs-reference embedding cosine

What's inside (v81/)

Run

What's inside (`v81/`)