---
tags:
- gpu-runtime-prediction
- code-understanding
- regression
- performance-modeling
datasets:
- RajBhope/gpu-runtime-prediction-dataset
language:
- code
library_name: scikit-learn
pipeline_tag: tabular-regression
---

# GPU Runtime Predictor 🚀⚡

Predicts GPU kernel/operation **runtime in milliseconds** given **source code** + **GPU hardware specifications**.

## How It Works

1. **Code Feature Extraction**: Analyzes source code to extract 48 features (tensor dimensions, operation types, complexity indicators)
2. **GPU Feature Encoding**: Uses 12 hardware specs (CUDA cores, memory bandwidth, compute capability, etc.)
3. **ML Prediction**: Ensemble of Gradient Boosted Trees + Random Forest + Neural Network
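
The three stages above can be sketched end to end. This is a minimal illustration, not the released training code: the toy `extract_code_features` and `encode_gpu_features` helpers stand in for the real 48-feature and 12-feature extractors, and the `VotingRegressor` averaging of GBR + RF + NN is an assumption about how the ensemble combines its members.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor

def extract_code_features(source):
    """Toy stand-in for the 48 code features (the real extractor is richer)."""
    return np.array([
        float(len(source)),             # code size as a crude complexity proxy
        float(source.count("matmul")),  # operation-type indicator
        float(source.count("for")),     # loop-nesting indicator
    ])

def encode_gpu_features(gpu):
    """Toy stand-in for the 12 hardware-spec features."""
    return np.array([gpu["fp32_tflops"], gpu["mem_bw_gbps"], gpu["vram_gb"]])

# GBR + RF + NN, averaged (an assumed combination for the "Ensemble" row).
ensemble = VotingRegressor([
    ("gbr", GradientBoostingRegressor(n_estimators=20)),
    ("rf", RandomForestRegressor(n_estimators=20)),
    ("nn", MLPRegressor(hidden_layer_sizes=(16,), max_iter=300)),
])

# Fit on synthetic data purely to make the sketch runnable end to end.
rng = np.random.default_rng(0)
X = rng.random((100, 6))
y = X @ rng.random(6)
ensemble.fit(X, y)

features = np.concatenate([
    extract_code_features("out = a.matmul(b)"),
    encode_gpu_features({"fp32_tflops": 8.1, "mem_bw_gbps": 320.0, "vram_gb": 16.0}),
])
pred_ms = ensemble.predict(features.reshape(1, -1))[0]
```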

### Model Comparison

| Model | R² | RMSE | Spearman ρ | MAPE % |
|-------|-----|------|------------|--------|
| **GBR** | 0.9923 | 0.0728 | 0.9264 | 16.5% |
| **RF** | 0.9924 | 0.0724 | 0.9277 | 16.3% |
| **NN** | 0.9932 | 0.0687 | 0.9187 | 17.0% |
| **Ensemble** | 0.9930 | 0.0693 | 0.9272 | 16.3% |
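
The four columns of the table can be computed from predictions as follows. A small NumPy-only sketch (not the evaluation script used for the card); the Spearman computation assumes no ties in the data, where rank-then-Pearson is exact.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R², RMSE, Spearman ρ, and MAPE — the table's four columns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    r2 = 1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)
    rmse = np.sqrt(np.mean(resid**2))
    # Spearman ρ = Pearson correlation of the ranks (exact when there are no ties)
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    rho = np.corrcoef(rank(y_true), rank(y_pred))[0, 1]
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    return {"r2": r2, "rmse": rmse, "spearman": rho, "mape_pct": mape}

m = regression_metrics([1.0, 2.0, 4.0, 8.0], [1.1, 1.9, 4.2, 7.5])
```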

### GPU Catalog (12 GPUs)

| GPU | FP32 TFLOPS | Memory BW | VRAM |
|-----|------------|-----------|------|
| NVIDIA T4 | 8.1 | 320 GB/s | 16 GB |
| NVIDIA V100 | 15.7 | 900 GB/s | 32 GB |
| NVIDIA A10G | 31.2 | 600 GB/s | 24 GB |
| NVIDIA A100 40GB | 19.5 | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5 | 2039 GB/s | 80 GB |
| NVIDIA L4 | 30.3 | 300 GB/s | 24 GB |
| NVIDIA L40S | 91.6 | 864 GB/s | 48 GB |
| NVIDIA RTX 3090 | 35.6 | 936 GB/s | 24 GB |
| NVIDIA RTX 4090 | 82.6 | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM | 67.0 | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0 | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7 | 768 GB/s | 48 GB |
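
The catalog entries become model inputs by flattening the specs into an ordered numeric vector. A minimal sketch using the three columns shown above; the remaining specs (CUDA cores, compute capability, etc.) would follow the same pattern, and the dict layout here is an assumption, not the card's actual data format.

```python
# Three of the twelve hardware features, taken from the table above.
GPU_CATALOG = {
    "NVIDIA T4":        {"fp32_tflops": 8.1,  "mem_bw_gbps": 320.0,  "vram_gb": 16.0},
    "NVIDIA A100 40GB": {"fp32_tflops": 19.5, "mem_bw_gbps": 1555.0, "vram_gb": 40.0},
    "NVIDIA H100 SXM":  {"fp32_tflops": 67.0, "mem_bw_gbps": 3350.0, "vram_gb": 80.0},
}

def encode_gpu(name):
    """Turn a catalog entry into an ordered numeric feature vector."""
    g = GPU_CATALOG[name]
    return [g["fp32_tflops"], g["mem_bw_gbps"], g["vram_gb"]]

vec = encode_gpu("NVIDIA H100 SXM")
```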

### 15 Supported Workload Types

matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm,
softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward

## Usage

```python
# See the Gradio demo for interactive use,
# or load a trained model directly:
import pickle

with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```

## Training

- **Dataset**: [RajBhope/gpu-runtime-prediction-dataset](https://hf.co/datasets/RajBhope/gpu-runtime-prediction-dataset)
- **51,900 samples** = 4,325 workloads × 12 GPUs
- Runtimes were generated with a physics-based roofline performance model
- Based on research from [Regression Language Models](https://arxiv.org/abs/2509.26476) and [HELP](https://arxiv.org/abs/2106.08630)

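A roofline model bounds runtime by whichever resource the kernel saturates first: compute throughput or memory bandwidth. The sketch below is an assumption about how such a label generator might look, not the dataset's actual generator; the matmul FLOP and byte counts are standard estimates.

```python
def roofline_runtime_ms(flops, bytes_moved, fp32_tflops, mem_bw_gbps):
    """Runtime lower bound: limited by the slower of compute and memory."""
    compute_s = flops / (fp32_tflops * 1e12)   # time if purely compute-bound
    memory_s = bytes_moved / (mem_bw_gbps * 1e9)  # time if purely memory-bound
    return max(compute_s, memory_s) * 1e3

# 4096x4096 FP32 matmul on a T4 (8.1 TFLOPS, 320 GB/s from the catalog):
n = 4096
flops = 2 * n**3             # one multiply + one add per inner-product term
bytes_moved = 3 * n * n * 4  # read A, read B, write C, 4 bytes per float
t_ms = roofline_runtime_ms(flops, bytes_moved, 8.1, 320)
```

For this shape the compute term dominates (~17 ms vs. ~0.6 ms for memory), so the matmul is compute-bound on a T4.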
|