Young Sik Hong

RICHARDYHONG


AI & ML interests

None yet


Organizations

None yet

upvoted an article 7 days ago

Article

Quantum Cryptanalysis on Real Hardware: Pushing Symmetric-Structure Key Recovery Beyond the Published Frontier

FINAL-Bench • 7 days ago • 15

reacted to SeaWolf-AI's post with 👍🔥❤️ 7 days ago

Post

5135

🔓 We ran genuine quantum key recovery on real IBM quantum hardware and pushed the frontier well past the largest hardware demos we're aware of (which sat at N=4).

Using Simon's algorithm on ibm_kingston, we recovered the secret key of two symmetric-cipher structures:
• Even–Mansour — N=5 → N=10
• 3-round Feistel (DES-family) — block 6 → 8

Each result was verified against an independent control key, using error mitigation only (no QEC).

🧭 Honest scope: this is not a quantum speedup (the effective difficulty tracks the classical birthday bound ~2^{n/2}), not a break of real AES/RSA, and not 16-round DES (ours is 3-round). The recovery method is reserved for a forthcoming paper; formal record status is pending peer review.
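
For readers who want to see the structure that Simon's algorithm exploits in the Even–Mansour case, here is a minimal classical sketch. It is not the authors' recovery method (the post reserves that for a forthcoming paper) and it runs no quantum circuit. It assumes the textbook construction E(x) = P(x ⊕ k1) ⊕ k2 with a public permutation P, for which f(x) = E(x) ⊕ P(x) satisfies f(x) = f(x ⊕ k1). The function names, the toy block size n = 6, and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def make_even_mansour(n, seed=0):
    """Toy Even-Mansour cipher E(x) = P(x ^ k1) ^ k2 on n-bit blocks."""
    rng = random.Random(seed)
    perm = list(range(2 ** n))
    rng.shuffle(perm)                 # public random permutation P
    k1 = rng.randrange(1, 2 ** n)     # nonzero, so the hidden period is nontrivial
    k2 = rng.randrange(2 ** n)

    def encrypt(x):
        return perm[x ^ k1] ^ k2

    def public_perm(x):
        return perm[x]

    return encrypt, public_perm, k1


def recover_k1(encrypt, public_perm, n):
    """Find k1 classically from collisions of f(x) = E(x) ^ P(x)."""
    size = 2 ** n
    seen = defaultdict(list)          # f-value -> inputs that produced it
    for x in range(size):
        y = encrypt(x) ^ public_perm(x)
        for earlier in seen[y]:
            candidate = x ^ earlier
            # The true k1 makes E(z) ^ P(z ^ k1) the same constant (k2) for every z.
            constants = {encrypt(z) ^ public_perm(z ^ candidate) for z in range(size)}
            if len(constants) == 1:
                return candidate
        seen[y].append(x)
    return None


n = 6
encrypt, public_perm, secret_k1 = make_even_mansour(n)
recovered = recover_k1(encrypt, public_perm, n)
print(recovered == secret_k1)
```

The sketch scans every input for simplicity; sampling random inputs instead would be expected to hit a collision after roughly 2^{n/2} queries, which is the classical birthday bound the post refers to.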

📄 Write-up: https://huggingface.co/blog/FINAL-Bench/quantum
🕹️ Try it live in your browser: https://vidraft-quantumos.hf.space/crypto
🏆 Leaderboard: FINAL-Bench/quantum-bench-leaderboard

#quantum #cryptography #quantumcomputing

liked a model 9 days ago

FINAL-Bench/Qwen3.5-35B-A3B-VKAE

Text Generation • Updated 8 days ago • 28

reacted to SeaWolf-AI's post with 👍🔥👀❤️ 9 days ago

Post

2996

🚀 Adding a GPU without building one

AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere — how efficiently you use the GPUs you already have.

Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to pull several times more out of the same GPU — the effect of plugging in one more "virtual GPU."

VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):

• Qwen3.5-35B-A3B (MoE): 25.7 → 601 tok/s (23.4×)
• Darwin-36B-Opus (in-house MoE): 25.0 → 280.8 tok/s (11.2×)
• 10,000+ tok/s peak aggregate under concurrency

The key: it's reproducible. Model and serving ship as one container:

docker pull vidraft/qwen35-vkae:601
Don't take our word for it — run it yourself. The mechanism will be released as a paper.
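
To make the cost-per-token argument concrete, the short sketch below recomputes the speedup factors from the throughput figures quoted above and converts throughput into a cost per million tokens. The GPU hourly price is a made-up placeholder, not a figure from the post, and the helper names are illustrative.

```python
def speedup(baseline_tok_s, accelerated_tok_s):
    """Ratio of accelerated to baseline throughput."""
    return accelerated_tok_s / baseline_tok_s


def cost_per_million_tokens(gpu_price_per_hour, tok_s):
    """Cost of generating one million tokens on one GPU at a given throughput."""
    tokens_per_hour = tok_s * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical price, chosen only to illustrate the arithmetic.
HYPOTHETICAL_GPU_PRICE_PER_HOUR = 5.0

# (model, baseline tok/s, accelerated tok/s) as quoted in the post.
measurements = [
    ("Qwen3.5-35B-A3B", 25.7, 601.0),
    ("Darwin-36B-Opus", 25.0, 280.8),
]

for name, before, after in measurements:
    factor = speedup(before, after)
    cost_before = cost_per_million_tokens(HYPOTHETICAL_GPU_PRICE_PER_HOUR, before)
    cost_after = cost_per_million_tokens(HYPOTHETICAL_GPU_PRICE_PER_HOUR, after)
    print(f"{name}: {factor:.1f}x, cost per 1M tokens {cost_before:.2f} -> {cost_after:.2f}")
```

At a fixed hourly price, cost per token falls by the same factor that throughput rises, which is the sense in which acceleration acts like extra GPUs.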

🏆 Leaderboard & demo 👉 VIDraft/vkae
Articles 👉 https://huggingface.co/blog/FINAL-Bench/vkae-leaderboard

3 replies

reacted to ginigen-ai's post with 👍👀🔥❤️ 11 days ago

Post

10429

🧠 Does your LLM know when it's about to be wrong?

Most leaderboards measure accuracy. We measure metacognition — whether a model catches its own errors. Benchmark + leaderboard + adapters, all open. 🎉

The surprise: even the K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, about 2 misses in 400), yet it is blind to its own free-form mistakes (self-confidence AUROC = 0.5, no better than chance). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
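
As a rough illustration of the two numbers quoted above, the sketch below computes a trap rate and an AUROC in plain Python. It assumes trap_rate is simply trapped answers divided by total problems, and that AUROC is computed between a per-answer "probability of being wrong" score and whether the answer really was wrong; the benchmark's exact definitions may differ. The function names and toy scores are illustrative.

```python
def trap_rate(num_trapped, num_problems):
    """Fraction of problems where the model picked the trap option."""
    return num_trapped / num_problems


def auroc(scores, was_wrong):
    """Probability that a wrong answer gets a higher score than a right one.

    Ties count as half a win, so a score that never varies yields 0.5.
    """
    wrong = [s for s, y in zip(scores, was_wrong) if y]
    right = [s for s, y in zip(scores, was_wrong) if not y]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one right answer")
    wins = sum(
        1.0 if w > r else 0.5 if w == r else 0.0
        for w in wrong
        for r in right
    )
    return wins / (len(wrong) * len(right))


print(trap_rate(2, 400))

labels = [1, 0, 1, 0]                              # 1 = the answer was wrong
print(auroc([0.5, 0.5, 0.5, 0.5], labels))         # uninformative score
print(auroc([0.9, 0.1, 0.8, 0.2], labels))         # perfectly separating score
```

An AUROC of 0.5 means the score ranks wrong and right answers no better than a coin flip, which is how the post characterizes the model's own self-confidence.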

What's open:
📊 300+100 trap problems (each with a hidden trap + TICOS type)
🏆 24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))
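
One way to picture "reads the hidden state → P(wrong)" is a small probe trained on top of frozen features. The sketch below is a plain logistic-regression probe on toy two-dimensional vectors; it is an assumption about the general idea, not the released adapter architecture, and the data, dimensions, and names are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_p_wrong(weights, bias, hidden):
    """Map a (frozen) hidden-state vector to a probability of being wrong."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)


def train_probe(hidden_states, was_wrong, lr=0.5, steps=200):
    """Fit the probe by batch gradient descent; the base model is never touched."""
    dim = len(hidden_states[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(hidden_states)
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for hidden, label in zip(hidden_states, was_wrong):
            error = predict_p_wrong(weights, bias, hidden) - label
            for i in range(dim):
                grad_w[i] += error * hidden[i]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy stand-ins for hidden states; real ones would come from the frozen model.
toy_hidden = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.2]]
toy_labels = [1, 1, 0, 0]              # 1 = the model's answer was wrong

weights, bias = train_probe(toy_hidden, toy_labels)
print(predict_p_wrong(weights, bias, [1.0, 0.0]) > 0.5)
print(predict_p_wrong(weights, bias, [-1.0, 0.0]) < 0.5)
```

Only the probe's weights and bias are updated, which mirrors the post's point that the base model stays frozen.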

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → ginigen-ai/Metacognition-Leaderboard-Space

📊 Benchmark → ginigen-ai/Metacognition-Bench

🧩 Adapters → FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961

📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).

11 replies

upvoted an article 11 days ago

Article

Does Your LLM Know When It's About to Be Wrong?

ginigen-ai • 11 days ago • 21

reacted to ginigen-ai's post with 😎❤️👍🔥 13 days ago

Post

5195

🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action) — and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks — picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more — inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.
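
The scoring described above can be sketched as follows, assuming each seed records a list of pass/fail episode outcomes per task and that the final SR is the mean over seeds of the per-seed task average. The exact evaluation protocol may differ from paper to paper; the task names and outcomes here are toy placeholders, not benchmark data.

```python
def task_success(episodes):
    """Fraction of episodes in which the task was completed."""
    return sum(1 for ok in episodes if ok) / len(episodes)


def seed_success_rate(results_by_task):
    """Average success across all tasks for one seed."""
    rates = [task_success(episodes) for episodes in results_by_task.values()]
    return sum(rates) / len(rates)


def success_rate(results_by_seed):
    """Average the per-seed success rate over all seeds."""
    per_seed = [seed_success_rate(r) for r in results_by_seed]
    return sum(per_seed) / len(per_seed)


# Two seeds, two placeholder tasks (the real benchmark has 24 tasks).
toy_results = [
    {"task_a": [True, True], "task_b": [True, False]},
    {"task_a": [True, False], "task_b": [False, True]},
]
print(success_rate(toy_results))
```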

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched) — head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols — self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop — a different, humanoid-based variant suite, separated to avoid confusion.
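
The "compare only what is comparable" rule amounts to grouping entries by protocol before ranking them. A minimal sketch, assuming each entry carries a protocol label and a success rate; the field names, model names, and scores below are placeholders invented for illustration, not leaderboard data.

```python
from collections import defaultdict


def build_tables(entries):
    """Group entries by protocol, then rank by success rate within each group."""
    tables = defaultdict(list)
    for entry in entries:
        tables[entry["protocol"]].append(entry)
    return {
        protocol: sorted(rows, key=lambda row: row["sr"], reverse=True)
        for protocol, rows in tables.items()
    }


# Placeholder entries; scores are made up.
toy_entries = [
    {"model": "model_a", "protocol": "kitchen-24-matched", "sr": 0.40},
    {"model": "model_b", "protocol": "kitchen-24-matched", "sr": 0.55},
    {"model": "model_c", "protocol": "other", "sr": 0.70},
]

tables = build_tables(toy_entries)
for protocol, rows in tables.items():
    print(protocol, [row["model"] for row in rows])
```

Here model_c is never ranked against model_a or model_b, even though its number is higher, because it was measured under a different setup.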

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard

liked a Space 13 days ago

RoboCasa Kitchen Leaderboard

🍳

Neutral aggregation of VLA success rates on RoboCasa Kitchen

reacted to SeaWolf-AI's post with 👀 13 days ago

Post

5157

🐯 Chitos — The Security Scanner That Actually Proves It

Most security scanners hand you a suspect list and walk away. That gap between detection and proof is where attackers live — and it's exactly the gap that Chitos was built to close.

Chitos is the successor to Mythos, a static analyzer built for quick code health checks. Mythos was good at pattern matching: spotting dangerous sinks, mapping CWEs, producing readable reports. But static analysis has a structural ceiling. A rule that sees eval(user_input) can tell you that the call looks dangerous. It cannot tell you whether the input is reachable, whether sanitization three layers up covers this path, or whether there's a live exploit chain for your exact framework version. Chitos was built to answer those questions.
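
The ceiling described above is easy to see with a toy pattern rule. The sketch below uses Python's standard `ast` module to flag every call to `eval`; it is an illustrative rule written for this note, not code from Mythos or Chitos. It reports a call sitting in unreachable code just as readily as one fed by user input.

```python
import ast


def find_eval_calls(source):
    """Return the line numbers of every direct call to eval() in the source."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            lines.append(node.lineno)
    return sorted(lines)


sample = '''
def handler(user_input):
    return eval(user_input)

def dead_code():
    if False:
        eval("1 + 1")
'''

print(find_eval_calls(sample))
```

Both calls are flagged, although only the first is driven by external input and the second can never run. Telling them apart takes reachability and data-flow evidence that a pattern match alone does not provide.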

🔍 Phase 1 applies 50 language-agnostic rules across Python, JavaScript, Go, Java, C/C++, Rust, PHP, YAML and more — covering injection sinks, deserialization gadgets, credential leakage, broken crypto, and prototype pollution. Every candidate is re-verified before reaching the report. Findings that can't be substantiated are excluded, not handed to you as noise.

🔬 Phase 2 dispatches an autonomous web-search agent to hunt live CVE databases, exploit advisories, and public PoC repositories. It formulates hypotheses, verifies them, and synthesizes a structured threat narrative. This phase needs a user-supplied Claude API key — Phases 1 and 3 run entirely free.

🎯 Phase 3 is where Chitos diverges from everything else. Against targets you own or are authorized to test, it fires real payloads — XSS, SQLi, path traversal, command injection — mutates on block, captures hard evidence, and connects every proven finding into a kill-chain showing which vulnerabilities to remediate first.

No installation. No account. No code sent to third-party APIs.

Article: https://huggingface.co/blog/FINAL-Bench/chitos

Try it now 👉 https://chitos.vidraft.net

9 replies
