π We ran genuine quantum key-recovery on 'real IBM quantum hardware' β and pushed the frontier well past the largest hardware demos we're aware of (which sat at N=4).
Using Simon's algorithm on ibm_kingston, we recovered the secret key of two symmetric-cipher structures: β’ EvenβMansour β N=5 β N=10 β’ 3-round Feistel (DES-family) β block 6 β 8
Each verified against an 'independent control key', using error mitigation only (no QEC).
π§ Honest scope: this is not a quantum speedup (the effective difficulty tracks the classical birthday bound ~2^{n/2}), not a break of real AES/RSA, and not 16-round DES (ours is 3-round). The recovery method is reserved for a forthcoming paper; formal record status is pending peer review.
π§ Does your LLM know when it's about to be wrong?
Most leaderboards measure accuracy. We measure metacognition β whether a model catches its own errors. Benchmark + leaderboard + adapters, all open. π
The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005 β ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.
Two independent axes (never compared across a row): β trap_rate β does it fall for tempting trap options? (lower = stronger) β‘ adapter gain Ξ β how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere β how efficiently you use the GPUs you already have.
Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to pull several times more out of the same GPU β the effect of plugging in one more "virtual GPU."
VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):
Qwen3.5-35B-A3B (MoE): 25.7 β 601 tok/s (23.4Γ) Darwin-36B-Opus (in-house MoE): 25.0 β 280.8 (11.2Γ) 10,000+ tok/s peak aggregate under concurrency The key: it's reproducible β model + serving shipped as one container.
docker pull vidraft/qwen35-vkae:601 Don't take our word for it β run it yourself. The mechanism will be released as a paper.
AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere β how efficiently you use the GPUs you already have.
Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to pull several times more out of the same GPU β the effect of plugging in one more "virtual GPU."
VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):
Qwen3.5-35B-A3B (MoE): 25.7 β 601 tok/s (23.4Γ) Darwin-36B-Opus (in-house MoE): 25.0 β 280.8 (11.2Γ) 10,000+ tok/s peak aggregate under concurrency The key: it's reproducible β model + serving shipped as one container.
docker pull vidraft/qwen35-vkae:601 Don't take our word for it β run it yourself. The mechanism will be released as a paper.