---
language:
- en
---
# AscendKernelGen/KernelGen-LM-4B
KernelGen-LM-4B is a state-of-the-art domain-adaptive large language model specialized for low-level NPU kernel generation, specifically for the Huawei Ascend architecture using the AscendC programming language. Built upon the Qwen3-4B backbone, it is trained on the Ascend-CoT dataset and refined via reinforcement learning with execution feedback.
## Introduction
Our framework, AscendKernelGen (AKGen), bridges the gap between general-purpose code generation and hardware-specific programming through a closed-loop system of data construction, training, and evaluation. Key innovations include:
- Ascend-CoT Dataset: A high-quality, domain-specific dataset incorporating Chain-of-Thought (CoT) reasoning. It combines documentation-based reasoning, code-centric reasoning derived from real-world kernel implementations, and general reasoning chains to capture the structured logic required for low-level NPU programming.
- Domain-Adaptive Post-Training: A two-stage optimization process that yields KernelGen-LM. We first employ Supervised Fine-Tuning (SFT) with error-derived supervision (correcting API misuse and numerical errors). This is followed by Reinforcement Learning (RL) using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
- Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark that assesses compilation success, functional correctness, and performance (latency) on real Ascend hardware across varying complexity levels.
- Performance: The model demonstrates significant improvement on complex Level-2 kernels over baselines, and it effectively solves tasks where general-purpose models (e.g., Qwen3, Llama3.1) fail completely.
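The RL stage above uses DPO, whose objective can be stated concretely. As a minimal sketch (not the training code used for KernelGen-LM), the per-pair DPO loss compares the policy's and a frozen reference model's log-probabilities of a preferred (e.g., executing correctly) versus a rejected kernel completion:

```python
import math

def dpo_loss(logp_chosen_policy: float, logp_rejected_policy: float,
             logp_chosen_ref: float, logp_rejected_ref: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    All arguments are sequence log-probabilities; beta scales how strongly
    the policy is pushed away from the reference model. The value 0.1 is
    an illustrative default, not a reported training hyperparameter.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion, relative to the reference model's preference.
    margin = beta * ((logp_chosen_policy - logp_chosen_ref)
                     - (logp_rejected_policy - logp_rejected_ref))
    # -log(sigmoid(margin)): small when the chosen kernel is favored.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree (margin 0), the loss is ln 2; it decreases as the policy learns to favor the execution-verified kernel over the rejected one.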
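The three evaluation axes (compilation, correctness, latency) can be aggregated per complexity level. The sketch below assumes a hypothetical per-kernel record layout; NPUKernelBench's actual schema and metric names may differ:

```python
from statistics import mean

def summarize_results(results: list[dict]) -> dict:
    """Aggregate per-kernel outcomes into benchmark-style metrics.

    Each record is assumed to look like
    {"compiled": bool, "correct": bool, "latency_us": float | None},
    where latency is measured only for functionally correct kernels.
    """
    n = len(results)
    compile_rate = sum(r["compiled"] for r in results) / n
    pass_rate = sum(r["correct"] for r in results) / n
    # Only correct kernels contribute to the latency statistic.
    latencies = [r["latency_us"] for r in results
                 if r["correct"] and r["latency_us"] is not None]
    return {
        "compile_rate": compile_rate,
        "pass_rate": pass_rate,
        "mean_latency_us": mean(latencies) if latencies else None,
    }
```

Running this once per difficulty level (Level-1, Level-2, ...) reproduces the per-level breakdown described above.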