Daniel Leong

daniel-ltw

AI & ML interests

None yet

Recent Activity

reacted to SeaWolf-AI's post with πŸ‘ about 24 hours ago
πŸš€ Introducing MARL β€” Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning Now available on PyPI Β· GitHub Β· ClawHub Β· HuggingFace AI models sense they could be wrong, but they can't actually fix what's broken. πŸ€— Live A/B test: https://huggingface.co/spaces/VIDraft/MARL We evaluated 9 SOTA models (GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, etc.) across 1,800 assessments in FINAL Bench and found a 39.2%p gap between "recognizing potential errors (MA=0.694)" and "actually finding and fixing them (ER=0.302)." MARL (Model-Agnostic Runtime Middleware for LLMs) was built to close this metacognitive gap. It decomposes a single LLM call into a 5-stage expert pipeline (Hypothesis β†’ Solver β†’ Auditor β†’ Adversarial Verifier β†’ Synthesizer), transforming "answer in one shot" into "think, doubt, correct, and rewrite." No weight modification β€” works instantly with GPT-5.4, Claude, Gemini, Llama, or any OpenAI API-compatible LLM by changing one line: base_url. Ships with 9 domain-specific emergence engines (invention, pharma, genomics, chemistry, ecology, law, and more β€” 5,538 expert data items) activated by a simple tag like model="gpt-5.4::pharma". pip install marl-middleware MARL is also officially registered on ClawHub, the skill marketplace of OpenClaw β€” an AI agent platform with 260K+ developers and 3,200+ skills. It's the first middleware in the Reasoning Enhancement category. One command β€” clawhub install marl-middleware β€” gives your AI agent a metacognition upgrade. πŸ“ Technical deep dive: https://huggingface.co/blog/FINAL-Bench/marl-middleware πŸ“¦ PyPI: https://pypi.org/project/marl-middleware/ πŸ™ GitHub: https://github.com/Vidraft/MARL πŸ¦€ ClawHub: https://clawhub.ai/Cutechicken99/marl-middleware #MARL #LLM #Hallucination #Metacognition #MultiAgent #AIMiddleware #FINALBench #OpenClaw #ClawHub #PyPI #AGI #HuggingFace #ReasoningAI #SelfCorrection #GlassBoxAI
reacted to SeaWolf-AI's post with πŸ”₯ about 24 hours ago
πŸš€ Introducing MARL β€” Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning Now available on PyPI Β· GitHub Β· ClawHub Β· HuggingFace AI models sense they could be wrong, but they can't actually fix what's broken. πŸ€— Live A/B test: https://huggingface.co/spaces/VIDraft/MARL We evaluated 9 SOTA models (GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, etc.) across 1,800 assessments in FINAL Bench and found a 39.2%p gap between "recognizing potential errors (MA=0.694)" and "actually finding and fixing them (ER=0.302)." MARL (Model-Agnostic Runtime Middleware for LLMs) was built to close this metacognitive gap. It decomposes a single LLM call into a 5-stage expert pipeline (Hypothesis β†’ Solver β†’ Auditor β†’ Adversarial Verifier β†’ Synthesizer), transforming "answer in one shot" into "think, doubt, correct, and rewrite." No weight modification β€” works instantly with GPT-5.4, Claude, Gemini, Llama, or any OpenAI API-compatible LLM by changing one line: base_url. Ships with 9 domain-specific emergence engines (invention, pharma, genomics, chemistry, ecology, law, and more β€” 5,538 expert data items) activated by a simple tag like model="gpt-5.4::pharma". pip install marl-middleware MARL is also officially registered on ClawHub, the skill marketplace of OpenClaw β€” an AI agent platform with 260K+ developers and 3,200+ skills. It's the first middleware in the Reasoning Enhancement category. One command β€” clawhub install marl-middleware β€” gives your AI agent a metacognition upgrade. πŸ“ Technical deep dive: https://huggingface.co/blog/FINAL-Bench/marl-middleware πŸ“¦ PyPI: https://pypi.org/project/marl-middleware/ πŸ™ GitHub: https://github.com/Vidraft/MARL πŸ¦€ ClawHub: https://clawhub.ai/Cutechicken99/marl-middleware #MARL #LLM #Hallucination #Metacognition #MultiAgent #AIMiddleware #FINALBench #OpenClaw #ClawHub #PyPI #AGI #HuggingFace #ReasoningAI #SelfCorrection #GlassBoxAI
replied to abusyed's post 12 days ago
I use multiple AI coding agents daily: Claude Code, Cursor, Codex (one of them's good at design, one's good at problem solving, one's good to just have an overall plan)... and I kept running into two problems that were driving me insane:

1. Context loss on every switch. Every time I moved from Cursor to Claude Code (or vice versa), I'd have to re-explain the entire project philosophy, past decisions, and why I chose X architecture over Y. Half my prompts became "here's what the last agent did and why."

2. Agent drift: technically correct but philosophically wrong code. This is the sneaky one. I build AI tutors that force students to reason through problems instead of getting answers handed to them. One agent literally added a "Skip Reasoning" button to the UI. Technically valid code; it completely violates the entire product philosophy. And the agent had no way of knowing that, because it couldn't see the design intent.

So I built LedgerSync, a file-based shared context protocol that solves both problems. How it works:

- An append-only ledger (.ledgersync/ledger.jsonl) logs every agent decision with full reasoning traces: not just what happened, but WHY.
- Agents read grounding documents (product philosophy, design constraints, user research) before making decisions.
- When you switch tools, the new agent reads the ledger and picks up where the last one left off, with full context.
- It auto-generates agent-specific instruction files (CLAUDE.md, .cursorrules, etc.).

No server, no accounts, no setup. Just files that live in your repo. Your agents already know how to read files; LedgerSync just gives them the right ones.

The key insight: the problem isn't that agents are bad at coding. It's that they have no memory and no product awareness. LedgerSync gives them both.

MIT licensed, early stage: https://github.com/Metacog-AI/ledgersync

Has anyone else dealt with the agent drift problem?
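The append-only ledger the reply describes can be sketched with the standard library. The field names below are guesses at a plausible entry shape, not LedgerSync's actual schema; only the .ledgersync/ledger.jsonl path comes from the post.

```python
import json
import pathlib
from datetime import datetime, timezone

# Hypothetical entry shape for the append-only ledger: one JSON object per
# line, recording the decision AND the reasoning behind it.
LEDGER = pathlib.Path(".ledgersync/ledger.jsonl")
LEDGER.parent.mkdir(exist_ok=True)

entry = {
    "ts": datetime.now(timezone.utc).isoformat(),
    "agent": "claude-code",
    "decision": "Keep reasoning steps mandatory in the tutor UI",
    "reasoning": "Product philosophy: students must reason, never skip.",
}

# Append-only: always open in "a" mode so earlier decisions are never rewritten.
with LEDGER.open("a", encoding="utf-8") as f:
    f.write(json.dumps(entry) + "\n")
```

Because the ledger is just JSON Lines in the repo, any agent that can read files can replay the full decision history before it starts working.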

Organizations

None yet