nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD • Image-Text-to-Text • Updated Nov 13, 2025
Enhancing Training Efficiency Using Packing with Flash Attention • Paper • 2407.09105 • Published Jul 12, 2024
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Article • Aug 21, 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs • Paper • 2408.07055 • Published Aug 13, 2024
Welcome Falcon Mamba: The first strong attention-free 7B model • Article • Aug 12, 2024