Dev Mode Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

DmitryRyumin authored a paper 6 days ago

Team RAS in 11th ABAW Competition: Multimodal Ambivalence Recognition Approach

DmitryRyumin authored a paper 6 days ago

Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach

nielsr submitted a paper 11 days ago

MonkeyOCRv2: A Visual-Text Foundation Model for Document AI

Nymbo

posted an update about 17 hours ago

Introducing Inflect-v2, two exceptionally small, open-weight English TTS models at just 3.9M and 9.3M parameters. Both generate speech multiple times faster than real-time on CPU. Despite their size, Inflect-v2 delivers quality that is competitive with much larger lightweight TTS systems, including KittenTTS, Piper, and Supertonic-3.

CPU, CUDA, PyTorch, and ONNX are supported, and both models are released under Apache 2.0.

See them for yourselves:
owensong/Inflect-Micro-v2
owensong/Inflect-Nano-v2

Try the demos:
Nymbo/Inflect-TTS (unlimited CPU usage)
owensong/Inflect-v2 (ultra-fast ZeroGPU usage)
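"Faster than real-time" is usually quantified as a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. An RTF below 1 means faster than real time, and 1/RTF is the speed-up. The sketch below is a generic way to measure it. The synthesize callable is a placeholder for whatever inference call you use (it is not an Inflect API), and the 24 kHz sample rate and timings in the example are made-up numbers.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis time / audio duration. RTF < 1 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int) -> float:
    """Time one TTS call. `synthesize` is a placeholder for your model's inference function."""
    start = time.perf_counter()
    audio = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio), sample_rate)


# Hypothetical numbers: 10 s of 24 kHz audio generated in 2.5 s of wall-clock time.
rtf = real_time_factor(2.5, 240_000, 24_000)
print(rtf, 1 / rtf)  # 0.25 4.0, i.e. four times faster than real time
```

For a fair CPU comparison between models, time several utterances of different lengths and exclude model-loading time from the measurement.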

DmitryRyumin

authored 2 papers 6 days ago

Team RAS in 11th ABAW Competition: Multimodal Ambivalence Recognition Approach

Paper • 2607.14702 • Published 11 days ago

Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach

Paper • 2603.12848 • Published Mar 13

osanseviero

authored a paper 17 days ago

Gemma 4 Technical Report

Paper • 2607.02770 • Published 25 days ago • 72

GeorgeBredis

submitted a paper to Daily Papers 18 days ago

Rank-Then-Act: Reward-Free Control from Frame-Order Progress

Paper • 2607.01897 • Published 25 days ago • 7

osanseviero

submitted a paper to Daily Papers 18 days ago

Gemma 4 Technical Report

Paper • 2607.02770 • Published 25 days ago • 72

DongfuJiang

authored 2 papers 21 days ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2606.15007 • Published Jun 12 • 19

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Paper • 2606.14885 • Published Jun 12 • 11

Prabhjotschugh

authored 4 papers 27 days ago

Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation

Paper • 2606.17188 • Published Jun 17 • 1

FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes

Paper • 2606.20769 • Published Jun 18 • 1

Beyond 'One Language, One Script': Quantifying Orthographic Bias in Multilingual VLMs with PuMVR

Paper • 2606.20770 • Published Jun 18 • 1

AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion

Paper • 2605.26130 • Published May 20 • 1

eienmojiki

posted an update about 1 month ago

Hi everyone,

I've created a Gradio space for embedding and extracting invisible watermarks in images:
👉 eienmojiki/blind-watermark-studio

It supports hiding text, images, and bit arrays using the DWT-DCT-SVD algorithm.
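As a rough illustration of the DWT-DCT-SVD idea, here is a simplified NumPy-only sketch, not the blind_watermark library's actual code. It takes a one-level Haar DWT, tiles the low-frequency (LL) band into 4x4 blocks, applies a 2-D DCT to each block, and hides one bit per block by quantizing the block's largest singular value s1 to (floor(s1/step) + 1/4 + bit/2) * step, a form of quantization index modulation. Extraction is blind: the bit is read from s1 mod step, so the original image is not needed. The block size, the step of 36, the Haar wavelet, and all function names are assumptions for illustration.

```python
import numpy as np

BLOCK = 4     # assumed block size inside the LL band
STEP = 36.0   # assumed quantization step for the largest singular value


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D, so D @ X @ D.T is the 2-D DCT of X."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def haar_dwt2(img: np.ndarray):
    """One level of the orthonormal 2-D Haar DWT (image sides must be even)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return (a + b + c + d) / 2, ((a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2)


def haar_idwt2(ll: np.ndarray, details) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def block_origins(shape):
    """Top-left corners of the non-overlapping BLOCK x BLOCK tiles of the LL band."""
    h, w = shape
    return [(r, c) for r in range(0, h - BLOCK + 1, BLOCK) for c in range(0, w - BLOCK + 1, BLOCK)]


def embed_bits(img: np.ndarray, bits, step: float = STEP) -> np.ndarray:
    """Hide one bit per LL block by quantizing the block's largest singular value."""
    ll, details = haar_dwt2(img.astype(np.float64))
    D = dct_matrix(BLOCK)
    origins = block_origins(ll.shape)
    if len(bits) > len(origins):
        raise ValueError("image too small for this many bits")
    for bit, (r, c) in zip(bits, origins):
        coeffs = D @ ll[r:r + BLOCK, c:c + BLOCK] @ D.T            # 2-D DCT of the block
        u, s, vt = np.linalg.svd(coeffs)
        s[0] = (np.floor(s[0] / step) + 0.25 + 0.5 * bit) * step    # residue step/4 or 3*step/4
        ll[r:r + BLOCK, c:c + BLOCK] = D.T @ (u @ np.diag(s) @ vt) @ D  # inverse DCT
    marked = haar_idwt2(ll, details)
    return np.clip(np.round(marked), 0, 255).astype(np.uint8)


def extract_bits(img: np.ndarray, n_bits: int, step: float = STEP) -> list:
    """Blind extraction: read each bit from the residue of the largest singular value."""
    ll, _ = haar_dwt2(img.astype(np.float64))
    D = dct_matrix(BLOCK)
    bits = []
    for r, c in block_origins(ll.shape)[:n_bits]:
        s = np.linalg.svd(D @ ll[r:r + BLOCK, c:c + BLOCK] @ D.T, compute_uv=False)
        bits.append(int(s[0] % step > step / 2))
    return bits


rng = np.random.default_rng(0)
cover = rng.integers(40, 216, size=(64, 64)).astype(np.uint8)  # synthetic grayscale image
message = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_bits(cover, message)
print(extract_bits(marked, len(message)) == message)  # True
```

One caveat about this stripped-down version: because the DCT is orthonormal, it leaves singular values unchanged, so here it mainly mirrors the DWT-DCT-SVD structure rather than adding robustness. A larger step survives more distortion (such as JPEG re-compression) at the cost of more visible changes.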

Credits:
- Original library: https://github.com/guofei9987/blind_watermark
- Author: Guo Fei

:)

KingNish

posted an update about 1 month ago

We trained an open-source, Mythos-like cybersecurity LLM for the Build Small Hackathon. Meet OpenMythos!

Trained in two stages: SFT on ~1.84K filtered arXiv cs.CR papers plus real CVE data, then RLVR on GitHub repos paired with their past vulnerabilities, with a verifier model checking outputs against the ground truth.

Trained on: H100s from Modal

The RLVR stage made the biggest difference: responses became more precise and less prone to confusing similar vulnerability classes.
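RLVR depends on a reward that can be checked automatically against ground truth. The post uses a verifier model for that check. So that this sketch runs on its own, a simple string-matching verifier stands in for the model. The CWE-ID labeling scheme and every function name here are assumptions for illustration, not OpenMythos's training code. The second example shows the kind of near-miss, a related but different weakness class, that earns zero reward.

```python
import re
from typing import Callable

# A verifier takes (model_output, ground_truth) and returns True if they agree.
Verifier = Callable[[str, str], bool]

CWE_PATTERN = re.compile(r"CWE-(\d+)", re.IGNORECASE)


def cwe_match_verifier(output: str, ground_truth: str) -> bool:
    """Stand-in for a verifier model: pass only if the output names exactly
    the CWE IDs in the ground truth (hypothetical labeling scheme)."""
    predicted = set(CWE_PATTERN.findall(output))
    expected = set(CWE_PATTERN.findall(ground_truth))
    return bool(expected) and predicted == expected


def verifiable_reward(output: str, ground_truth: str,
                      verifier: Verifier = cwe_match_verifier) -> float:
    """Binary RLVR-style reward: 1.0 if the verifier accepts the output, else 0.0."""
    return 1.0 if verifier(output, ground_truth) else 0.0


rewards = [
    verifiable_reward("Heap overflow in parse_header (CWE-122).", "CWE-122"),
    verifiable_reward("Looks like a classic buffer overflow, CWE-120.", "CWE-122"),
]
print(rewards)  # [1.0, 0.0]
```

A strict binary reward like this pushes the policy away from answers that are only roughly right, which is one plausible reason an RL stage would reduce confusion between neighboring vulnerability classes.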

Everything is open:
🤖 Demo → build-small-hackathon/OpenMythos
🧠 Model → build-small-hackathon/OpenMythos
📦 CVE Dataset → build-small-hackathon/CVE_Vulnerailities_Detailed
📄 ArXiv Dataset → himanshu17HF/ArvixImport-Filtered-Final

Try it out and let us know where it breaks 🙏

Abhaykoul

posted an update about 1 month ago

Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship 10k+ token system prompts; vtx's is ~2,200 tokens. Less prompt overhead means more room for your code in the model's context window.

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist
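Here is one plausible reading of the two permission modes. In prompt mode, commands on a safe allowlist run without a prompt and everything else asks the user. Auto mode runs commands without asking. The post does not say exactly how vtx combines the modes with its allowlist, so the mode semantics, the allowlist contents, and the function names below are all assumptions.

```python
import shlex
from enum import Enum
from typing import Callable


class Mode(Enum):
    PROMPT = "prompt"  # assumed: ask before anything not on the allowlist
    AUTO = "auto"      # assumed: run commands without asking


# Hypothetical allowlist of read-only commands; not vtx's actual list.
SAFE_COMMANDS = {"ls", "pwd", "cat", "git status"}
SHELL_METACHARS = set(";&|<>`$\n")


def is_safe(command: str) -> bool:
    """True only for a single allowlisted command with no chaining or redirection."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False  # e.g. "ls; rm -rf ~" must not ride on the "ls" entry
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if not tokens:
        return False
    return tokens[0] in SAFE_COMMANDS or " ".join(tokens[:2]) in SAFE_COMMANDS


def should_run(command: str, mode: Mode, ask_user: Callable[[str], bool]) -> bool:
    """Decide whether a bash tool call may execute."""
    if mode is Mode.AUTO or is_safe(command):
        return True
    return ask_user(command)


def deny(command: str) -> bool:
    """A user who declines every prompt."""
    return False


print(should_run("git status", Mode.PROMPT, deny))    # True (allowlisted)
print(should_run("rm -rf build", Mode.PROMPT, deny))  # False (user declined)
print(should_run("rm -rf build", Mode.AUTO, deny))    # True (auto mode)
```

Rejecting shell metacharacters before consulting the allowlist matters. Without that check, a prefix match on a harmless command could approve a chained destructive one.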

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.
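To make "override built-in tools by name" and "chain handler logic across subscribers" concrete, here is a toy model of that pattern. ToyExtensionAPI, register_tool, on, emit, and the tool_call event name are hypothetical. They are not vtx's real extension API, so check the project's repository for the actual interface.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Handler = Callable[[Payload], Payload]


class ToyExtensionAPI:
    """Hypothetical stand-in for the object handed to register(api).
    This is NOT vtx's real interface; the method names are illustrative."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}
        self.subscribers: Dict[str, List[Handler]] = {}

    def register_tool(self, name: str, fn: Callable[..., str]) -> None:
        # Registering under an existing name replaces it (override by name).
        self.tools[name] = fn

    def on(self, event: str, handler: Handler) -> None:
        self.subscribers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: Payload) -> Payload:
        # Chain subscribers: each handler receives the previous handler's output.
        for handler in self.subscribers.get(event, []):
            payload = handler(payload)
        return payload


def register(api: ToyExtensionAPI) -> None:
    """What an extension file's entry point might look like under this toy API."""
    api.register_tool("read", lambda path: f"(extension) read {path}")
    api.on("tool_call", lambda call: {**call, "args": call["args"].strip()})
    api.on("tool_call", lambda call: {**call, "audited": True})


api = ToyExtensionAPI()
api.register_tool("read", lambda path: Path(path).read_text())  # stands in for a built-in
register(api)  # the extension's "read" replaces the built-in one
print(api.tools["read"]("notes.txt"))  # (extension) read notes.txt
print(api.emit("tool_call", {"tool": "bash", "args": "  ls  "}))
# {'tool': 'bash', 'args': 'ls', 'audited': True}
```

In this model, overriding comes from keying tools by name in a dict, and chaining comes from passing each handler's return value into the next.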

Apache 2.0. Install with uv tool install vtx-coding-agent and you're up and running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.