FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published May 28, 2025 • 1
PaTH Attention: Position Encoding via Accumulating Householder Transformations Paper • 2505.16381 • Published May 22, 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14, 2025 • 1
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11, 2025
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published Sep 11, 2024 • 20
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
The infrastructure powering IBM's Gen AI model development Paper • 2407.05467 • Published Jul 7, 2024 • 3
FlexAttention for Efficient High-Resolution Vision-Language Models Paper • 2407.20228 • Published Jul 29, 2024 • 1
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 17
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26, 2024 • 48
Autonomous Tree-search Ability of Large Language Models Paper • 2310.10686 • Published Oct 14, 2023 • 2
SALMON: Self-Alignment with Principle-Following Reward Models Paper • 2310.05910 • Published Oct 9, 2023 • 2
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention Paper • 2304.03282 • Published Apr 6, 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision Paper • 2305.03047 • Published May 4, 2023 • 1