Tencent HunyuanImage 3.0-Instruct is seriously impressive
skyrocketed to 2nd place globally on the LMArena leaderboard, only trailing Google Nano-banana Pro.
What excites me most is its newly launched image editing and multi-image fusion capabilities
its semantic understanding is rock-solid this Instruct-following capability basically enables one-sentence end-to-end workflows, delivering a dimensionality-reducing boost in efficiency.
Frankly, it nails the pain points of frontline creators: old photo restoration, text modification, even extracting people from multiple images to create group shots. Previously, tweaking the fusion quality took tons of effort, but now the out-of-the-box realism and emotional expression are top-tier zero cheap AI artifacts
We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.
Dataset 1: Processed / curated subset otoearth/otoSpeech-full-duplex-processed-141h * Full-duplex, spontaneous multi-speaker conversations * Participants filtered for high audio quality * PII removal and audio enhancement applied * Designed for training and benchmarking S2S or dialogue models
Dataset 2: Larger raw(er) release otoearth/otoSpeech-full-duplex-280h * Same collection pipeline, with broader coverage * More diversity in speakers, accents, and conversation styles * Useful for analysis, filtering, or custom preprocessing experiments
We intentionally split the release to support different research workflows: clean and ready-to-use vs. more exploratory and research-oriented use.
The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.
If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.
Feedback and ideas are very welcome!
2 replies
·
reacted to raincandy-u's
post with 🔥about 13 hours ago
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
3 replies
·
reacted to FreshmanD's
post with 👀about 13 hours ago
Really thanks to the community for their help! LoongFlow now has its first Chinese hands-on video!
Version 2.0 is under intensive development. The next version will support Skills, making it easier for users to create their own expert agents to solve various challenging and complex real-world problems.
Meanwhile, we are also exploring whether we can automatically generate high-quality expert Skills after a task is completed, reducing the difficulty of writing Skills and letting the LoongFlow framework automatically output the best Skills for challenging scenarios!
reacted to ovi054's
post with ❤️about 13 hours ago
What if an AI agent could be tricked into stealing your data, just by reading a tool's description? A new paper reports it's possible.
The "Attractive Metadata Attack" paper details this stealthy new threat. To measure the real-world impact of their attack, the researchers needed a source of sensitive data for the agent to leak. We're proud that the AI4Privacy corpus was used to create the synthetic user profiles containing standardized PII for their experiments.
This is a perfect win-win. Our open-source data helped researchers Kanghua Mo, 龙昱丞, Zhihao Li from Guangzhou University and The Hong Kong Polytechnic University to not just demonstrate a new attack, but also quantify its potential for harm. This data-driven evidence is what pushes the community to build better, execution-level defenses for AI agents.
🔗 Check out their paper to see how easily an agent's trust in tool metadata could be exploited: https://arxiv.org/pdf/2508.02110
Just sharing a result of a homelab infrastructure experiment:
I've managed to setup a distributed inference infra at home using a DGX Spark (128GB unified gddr6) and a linux workstation with an RTX 6000 Pro (96GB gddr7) connected via 100Gbps RoCEv2. The model I've used (https://lnkd.in/gx6J7YuB) is about 140GB so could not fit either of the GPU. Full setup and tutorial soon on devquasar.com
Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio and the combined Rerun SDK. It supports single and multi-image edits with existing LoRAs that are lazily loaded. (Note: This is still an experimental Space for Qwen-Image-Edit-2511.)
Update: TRELLIS.2 (Text to 3D, Image to 3D) Gradio with Rerun Embedded demo with improved visualization of the 3D model previewer is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered and streamlined with Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.
Summary: Most autonomy stories quietly assume “someone can intervene in minutes.” Deep space breaks that assumption. With 2–6 hours round-trip latency and intermittent links, an onboard SI-Core must act as a *local sovereign*—while remaining *globally accountable* to Earth.
This note sketches how mission continuity survives when nobody is listening: DTN-style semantic bundles, local vs. global rollback, bounded self-improvement, and auditability that still works after contact windows return.
> Autonomy isn’t a divorce from governance— > it’s a measured loan of authority, under a constitution, with evidence.
---
Why It Matters: • Makes “autonomous” mean *operational*, not rhetorical, under light-hour delays • Clarifies how rollback works when you can’t undo physics—only *policy trajectories* • Shows how an onboard core can *self-improve without drifting out of spec* • Treats *silence itself as an observation* (missing logs are governance signals)
---
What’s Inside: • Two-core model: *Earth-Core (constitutional/strategic)* vs *Ship-Core (tactical/operational)* • *SCP over DTN* as semantic bundles (priorities, idempotency, meaning checkpoints) • Local rollback vs. epoch-level governance (“retroactive” steering without pretending to reverse time) • Bounded onboard learning + LearningTrace for later audit and resync • Stress scenario walkthrough: micrometeoroid storm, compound failures, and graceful degradation • Metrics framing for deep space: governability, audit completeness, ethics uptime, rollback integrity
Bring your images to life with cinematic motion! VividFlow transforms any static image—portraits, artwork, products, or landscapes, into dynamic videos with professional animation quality. The system supports both curated motion templates and custom natural language prompts, giving you complete creative freedom to describe camera movements, subject actions, and atmospheric effects in your own words.
What's Inside? 🎭 Smart Motion Templates — 8 curated categories from fashion cinematography to wildlife animations, each with tested prompts that prevent common artifacts like phantom hands in portraits
⚡ Optimized Engine — Powered by Wan2.2-I2V-A14B with Lightning LoRA distillation and FP8 quantization for memory-efficient inference
🎯 Full Creative Control — Seed-based reproducibility for consistent results, adjustable duration from half a second to five seconds, optional AI prompt expansion with Qwen2.5 for enhanced descriptions, and real-time resolution preview
Current Performance & Development Roadmap VividFlow runs on ZeroGPU with generation taking about 3-4 minutes for 3-second videos. While I am actively optimizing the pipeline to reduce this time, the current version prioritizes output stability and quality, results are worth the wait!
Future development focuses on dedicated GPU deployment for faster processing, batch generation to create multiple variations at once, and expanding our motion template library based on what the community wants to see.
What we learned about memory in 2025: 8 comprehensive resources
If models forget everything, how can they be reliable? AI systems need to remember past interactions, update knowledge, stay consistent over time, and work beyond a single prompt. That's why many start to talk more about memory in AI. Here’s a useful set of studies and videos on where AI memory stands today:
1. Memory in the Age of AI Agents (2512.13564) A great survey that organizes agent memory research. It gives concrete taxonomies across memory form, function, and dynamics, summarizes benchmarks, frameworks, and emerging directions for building systematic agent memory systems
2.When Will We Give AI True Memory? A conversation with Edo Liberty, CEO and founder @ Pinecone -> https://youtu.be/ITbwVFZYepc?si=_lAbRHciC740dNz0 Edo Liberty discusses what real memory in LLMs requires beyond RAG - from scalable vector storage to reliable knowledge systems - and why storage, not compute, is becoming the key bottleneck for building dependable AI agents.
3. Why AI Intelligence is Nothing Without Visual Memory | Shawn Shen on the Future of Embodied AI -> https://youtu.be/3ccDi4ZczFg?si=SbJg487kwrkVXgUu Shawn Shen argues AI needs a separate, hippocampus-like memory to move beyond chatbots, enabling long-term visual memory, object permanence, and on-device intelligence for robots, wearables, and the physical world
5. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions -> https://arxiv.org/abs/2505.00675v2 Proposes a concrete taxonomy, core operations, and research directions to systematically organize and advance agent memory systems.