---
library_name: kernels
{% if license %}license: {{ license }}
{% endif %}---
TiledAttention is a scaled dot-product attention (SDPA) forward kernel for NVIDIA GPUs, implemented in cuTile Python (TileIR) and exposed for PyTorch-oriented workflows. The design follows FlashAttention-style online softmax with tiled (K,V) streaming, while emphasizing schedule-level modifiability (tile shapes, staging, shared-memory layout) for reproducible kernel research.
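As a rough illustration of the online-softmax idea with tiled (K, V) streaming, the following plain-Python sketch computes attention for a single query row, once with all scores at hand and once by consuming K/V in tiles. It is not the TiledAttention kernel: the function names, the `tile` parameter, and the list-based data layout are assumptions made for this example only, and it runs on the CPU without cuTile.

```python
import math


def attention_row_reference(q, keys, values):
    """Naive softmax(q . K^T / sqrt(d)) @ V for one query row."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim_v)
    ]


def attention_row_streaming(q, keys, values, tile=2):
    """Same result, but K/V are visited tile by tile (hypothetical helper).

    Only a running maximum, a running normalizer, and an output
    accumulator are kept; the full score row is never materialized.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])

    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]

        # Rescale the state accumulated so far to the new running maximum.
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        # Fold the current tile into the normalizer and the accumulator.
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]

        running_max = new_max

    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any tile size, which is the property that lets a kernel stream K/V tiles without storing the full attention matrix.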
In the accompanying study, TiledAttention is evaluated against PyTorch SDPA auto-dispatch and explicit baselines across sequence length, head dimension, causal/non-causal masking, and FP16/BF16 precision.
This Hub kernel is packaged as a Python-only CUDA kernel. At runtime, it also requires `cupy-cuda13x` and `cuda-tile` to be installed in the consumer environment.
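One way to confirm that these runtime dependencies are present before loading the kernel is to query the installed distribution metadata. This is a minimal sketch using only the Python standard library; the helper name `missing_distributions` is hypothetical and not part of `kernels` or of this kernel.

```python
from importlib import metadata

# Distribution names as listed in the requirement note above.
REQUIRED_DISTRIBUTIONS = ("cupy-cuda13x", "cuda-tile")


def missing_distributions(names=REQUIRED_DISTRIBUTIONS):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_distributions()
    if absent:
        print("Missing runtime dependencies: " + ", ".join(absent))
    else:
        print("All runtime dependencies are installed.")
```

The check only confirms that the packages are installed; it does not verify that a compatible GPU or CUDA driver is available.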
How to use
{% if functions %}
# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel
kernel_module = get_kernel("{{ repo_id }}", version={{ version }})
{{ functions[0] }} = kernel_module.{{ functions[0] }}
{{ functions[0] }}(...)
{% else %}
Usage example not available. {% endif %}
Available functions
{% if functions %} {% for func in functions %}
{{ func }}{% endfor %} {% else %}
Function list not available. {% endif %} {% if layers %}
Available layers
{% for layer in layers %}
{{ layer }}{% endfor %} {% endif %}
Benchmarks
{% if has_benchmark %}
A benchmarking script is available for this kernel. Run `kernels benchmark {{ repo_id }} --version {{ version }}`.
{% else %}
No benchmark available yet. {% endif %} {% if upstream %}
Source code
The source code of this kernel originally comes from {{ upstream }} and was adapted for compatibility with `kernels`.
{% endif %}