---
tags:
- gpu-runtime-prediction
- code-understanding
- regression
- performance-modeling
datasets:
- RajBhope/gpu-runtime-prediction-dataset
language:
- code
library_name: scikit-learn
pipeline_tag: tabular-regression
---

# GPU Runtime Predictor 🚀⚡

Predicts GPU kernel/operation **runtime in milliseconds** given **source code** + **GPU hardware specifications**.

## How It Works

1. **Code Feature Extraction**: Analyzes source code to extract 48 features (tensor dimensions, operation types, complexity indicators)
2. **GPU Feature Encoding**: Uses 12 hardware specs (CUDA cores, memory bandwidth, compute capability, etc.)
3. **ML Prediction**: Ensemble of Gradient Boosted Trees + Random Forest + Neural Network
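
The three stages above can be sketched end to end. This is a minimal illustration, not the released training code: the toy `extract_code_features` and `encode_gpu_features` helpers stand in for the real 48-feature and 12-feature extractors, and the `VotingRegressor` averaging of GBR + RF + NN is an assumption about how the ensemble combines its members.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor

def extract_code_features(source):
    """Toy stand-in for the 48 code features (the real extractor is richer)."""
    return np.array([
        float(len(source)),             # code size as a crude complexity proxy
        float(source.count("matmul")),  # operation-type indicator
        float(source.count("for")),     # loop-nesting indicator
    ])

def encode_gpu_features(gpu):
    """Toy stand-in for the 12 hardware-spec features."""
    return np.array([gpu["fp32_tflops"], gpu["mem_bw_gbps"], gpu["vram_gb"]])

# GBR + RF + NN, averaged (an assumed combination for the "Ensemble" row).
ensemble = VotingRegressor([
    ("gbr", GradientBoostingRegressor(n_estimators=20)),
    ("rf", RandomForestRegressor(n_estimators=20)),
    ("nn", MLPRegressor(hidden_layer_sizes=(16,), max_iter=300)),
])

# Fit on synthetic data purely to make the sketch runnable end to end.
rng = np.random.default_rng(0)
X = rng.random((100, 6))
y = X @ rng.random(6)
ensemble.fit(X, y)

features = np.concatenate([
    extract_code_features("out = a.matmul(b)"),
    encode_gpu_features({"fp32_tflops": 8.1, "mem_bw_gbps": 320.0, "vram_gb": 16.0}),
])
pred_ms = ensemble.predict(features.reshape(1, -1))[0]
```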

### Model Comparison

| Model | R² | RMSE | Spearman ρ | MAPE % |
|-------|-----|------|------------|--------|
| **GBR** | 0.9923 | 0.0728 | 0.9264 | 16.5% |
| **RF** | 0.9924 | 0.0724 | 0.9277 | 16.3% |
| **NN** | 0.9932 | 0.0687 | 0.9187 | 17.0% |
| **Ensemble** | 0.9930 | 0.0693 | 0.9272 | 16.3% |
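
The four columns of the table can be computed from predictions as follows. A small NumPy-only sketch (not the evaluation script used for the card); the Spearman computation assumes no ties in the data, where rank-then-Pearson is exact.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R², RMSE, Spearman ρ, and MAPE — the table's four columns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    r2 = 1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)
    rmse = np.sqrt(np.mean(resid**2))
    # Spearman ρ = Pearson correlation of the ranks (exact when there are no ties)
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    rho = np.corrcoef(rank(y_true), rank(y_pred))[0, 1]
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    return {"r2": r2, "rmse": rmse, "spearman": rho, "mape_pct": mape}

m = regression_metrics([1.0, 2.0, 4.0, 8.0], [1.1, 1.9, 4.2, 7.5])
```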

### GPU Catalog (12 GPUs)

| GPU | FP32 TFLOPS | Memory BW | VRAM |
|-----|------------|-----------|------|
| NVIDIA T4 | 8.1 | 320 GB/s | 16 GB |
| NVIDIA V100 | 15.7 | 900 GB/s | 32 GB |
| NVIDIA A10G | 31.2 | 600 GB/s | 24 GB |
| NVIDIA A100 40GB | 19.5 | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5 | 2039 GB/s | 80 GB |
| NVIDIA L4 | 30.3 | 300 GB/s | 24 GB |
| NVIDIA L40S | 91.6 | 864 GB/s | 48 GB |
| NVIDIA RTX 3090 | 35.6 | 936 GB/s | 24 GB |
| NVIDIA RTX 4090 | 82.6 | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM | 67.0 | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0 | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7 | 768 GB/s | 48 GB |
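
The catalog entries become model inputs by flattening the specs into an ordered numeric vector. A minimal sketch using the three columns shown above; the remaining specs (CUDA cores, compute capability, etc.) would follow the same pattern, and the dict layout here is an assumption, not the card's actual data format.

```python
# Three of the twelve hardware features, taken from the table above.
GPU_CATALOG = {
    "NVIDIA T4":        {"fp32_tflops": 8.1,  "mem_bw_gbps": 320.0,  "vram_gb": 16.0},
    "NVIDIA A100 40GB": {"fp32_tflops": 19.5, "mem_bw_gbps": 1555.0, "vram_gb": 40.0},
    "NVIDIA H100 SXM":  {"fp32_tflops": 67.0, "mem_bw_gbps": 3350.0, "vram_gb": 80.0},
}

def encode_gpu(name):
    """Turn a catalog entry into an ordered numeric feature vector."""
    g = GPU_CATALOG[name]
    return [g["fp32_tflops"], g["mem_bw_gbps"], g["vram_gb"]]

vec = encode_gpu("NVIDIA H100 SXM")
```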

### 15 Supported Workload Types

matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm,
softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward

## Usage

```python
# See the Gradio demo for interactive use,
# or load a trained model directly:
import pickle

with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```

## Training

- **Dataset**: [RajBhope/gpu-runtime-prediction-dataset](https://hf.co/datasets/RajBhope/gpu-runtime-prediction-dataset)
- **51,900 samples** = 4,325 workloads × 12 GPUs
- Runtimes were generated with a physics-based roofline performance model
- Based on research from [Regression Language Models](https://arxiv.org/abs/2509.26476) and [HELP](https://arxiv.org/abs/2106.08630)

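A roofline model bounds runtime by whichever resource the kernel saturates first: compute throughput or memory bandwidth. The sketch below is an assumption about how such a label generator might look, not the dataset's actual generator; the matmul FLOP and byte counts are standard estimates.

```python
def roofline_runtime_ms(flops, bytes_moved, fp32_tflops, mem_bw_gbps):
    """Runtime lower bound: limited by the slower of compute and memory."""
    compute_s = flops / (fp32_tflops * 1e12)   # time if purely compute-bound
    memory_s = bytes_moved / (mem_bw_gbps * 1e9)  # time if purely memory-bound
    return max(compute_s, memory_s) * 1e3

# 4096x4096 FP32 matmul on a T4 (8.1 TFLOPS, 320 GB/s from the catalog):
n = 4096
flops = 2 * n**3             # one multiply + one add per inner-product term
bytes_moved = 3 * n * n * 4  # read A, read B, write C, 4 bytes per float
t_ms = roofline_runtime_ms(flops, bytes_moved, 8.1, 320)
```

For this shape the compute term dominates (~17 ms vs. ~0.6 ms for memory), so the matmul is compute-bound on a T4.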
|