flash-attn-4 4.0.0b19

Flash Attention CUTE (CUDA Template Engine) implementation

Requires Python: >=3.10

FlashAttention-4 (CuTeDSL)

FlashAttention-4 is a CuTeDSL-based implementation of FlashAttention for Hopper and Blackwell GPUs.

Installation

pip install flash-attn-4

If you're on CUDA 13, install with the cu13 extra for best performance:

pip install "flash-attn-4[cu13]"
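The choice between the two install commands can be scripted. The following is a minimal sketch, not part of the package: `install_command` is a hypothetical helper, and it assumes the CUDA version arrives as a string such as "12.8" or "13.0". It selects the cu13 extra only when the major version is exactly 13, matching the guidance above.

```python
from typing import Optional


def install_command(cuda_version: Optional[str] = None) -> str:
    """Return a pip command for the given CUDA version string.

    Hypothetical helper for illustration; not shipped with flash-attn-4.
    """
    base = "flash-attn-4"
    if cuda_version is not None:
        # "13.0" -> 13; only the major version decides the extra.
        major = int(cuda_version.split(".")[0])
        if major == 13:
            return f'pip install "{base}[cu13]"'
    # Unknown or non-13 CUDA versions fall back to the plain install.
    return f"pip install {base}"


print(install_command("13.0"))
print(install_command("12.8"))
```

If PyTorch is already installed, `torch.version.cuda` is one possible source for the version string. It reports the CUDA version PyTorch was built against and is `None` on CPU-only builds, which the helper treats as the plain install.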

Usage

from flash_attn.cute import flash_attn_func, flash_attn_varlen_func

out = flash_attn_func(q, k, v, causal=True)
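The snippet above leaves `q`, `k`, and `v` undefined. The sketch below fills them in under stated assumptions: tensors use the `(batch, seqlen, nheads, headdim)` layout and bfloat16 on a CUDA device, as in earlier FlashAttention releases, and the number of query heads is a multiple of the number of key/value heads. `check_qkv_shapes` and `run_causal_attention` are hypothetical helpers, not part of the package. The shape check runs without a GPU. The attention call itself needs a supported GPU with `torch` and `flash-attn-4` installed.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # assumed layout: (batch, seqlen, nheads, headdim)


def check_qkv_shapes(q: Shape, k: Shape, v: Shape) -> None:
    """Raise ValueError if the shapes break the assumed layout rules."""
    if not (len(q) == len(k) == len(v) == 4):
        raise ValueError("q, k, v must be 4-D: (batch, seqlen, nheads, headdim)")
    if not (q[0] == k[0] == v[0]):
        raise ValueError("batch sizes differ")
    if k[1] != v[1]:
        raise ValueError("k and v must share a sequence length")
    if k[2] != v[2]:
        raise ValueError("k and v must share a head count")
    if q[2] % k[2] != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    if q[3] != k[3]:
        raise ValueError("q and k must share a head dimension")


def run_causal_attention(batch=2, seqlen=1024, nheads=8, headdim=128):
    """Call flash_attn_func on random inputs; requires a supported GPU."""
    # Imported lazily so check_qkv_shapes stays usable without torch or a GPU.
    import torch
    from flash_attn.cute import flash_attn_func

    shape = (batch, seqlen, nheads, headdim)
    check_qkv_shapes(shape, shape, shape)
    q, k, v = (
        torch.randn(shape, device="cuda", dtype=torch.bfloat16) for _ in range(3)
    )
    result = flash_attn_func(q, k, v, causal=True)
    # Assumption: tolerate a build that returns (out, lse) rather than out alone.
    return result[0] if isinstance(result, tuple) else result
```

Under these assumptions, `run_causal_attention()` should return a tensor with the same shape as `q`. The tuple handling is defensive only; check the return type of the version you have installed.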

Development

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e "flash_attn/cute[dev]"       # CUDA 12.x
pip install -e "flash_attn/cute[dev,cu13]"  # CUDA 13.x (e.g. B200)
pytest tests/cute/