Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Paper • 2604.08545 • Published 14 days ago • 41
Vero: An Open RL Recipe for General Visual Reasoning Paper • 2604.04917 • Published 17 days ago • 32
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 24 days ago • 58
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published Mar 19 • 33
BitDance Collection BitDance: Open-source autoregressive model with binary visual tokens. A research project for building powerful multimodal autoregressive models. • 10 items • Updated Mar 2 • 11
BitDance-14B-64x Space • Running on Zero • MCP • Featured • 85 Open-source autoregressive model with binary visual tokens.
UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model Paper • 2602.14178 • Published Feb 15 • 14
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53