flash-attn-4 4.0.0b19
Flash Attention CUTE (CUDA Template Engine) implementation
Requires Python: >=3.10
FlashAttention-4 (CuTeDSL)
FlashAttention-4 is a CuTeDSL-based implementation of FlashAttention for Hopper and Blackwell GPUs.
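The core idea behind FlashAttention, independent of this package's CuTeDSL kernels, is to process keys and values in blocks while keeping a running maximum and a running softmax denominator, so the full row of attention scores never has to be materialized. The pure-Python sketch below illustrates that online-softmax technique and checks it against a naive softmax. The function names, the block size, and the rule that causal query i attends to keys 0..i (equal query and key lengths) are illustrative assumptions, not this library's API.

```python
import math


def naive_attention(q, k, v, causal=False):
    """Reference softmax(q k^T / sqrt(d)) v, computed one query row at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Assumed causal rule: query i sees keys 0..i (equal-length q and k).
        n_keys = i + 1 if causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(n_keys)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(n_keys)) / z
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block=2, causal=False):
    """Same result, but K/V are consumed block by block with an online softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for i, qi in enumerate(q):
        n_keys = i + 1 if causal else len(k)
        m = -math.inf   # running max of all scores seen so far
        denom = 0.0     # running sum of exp(score - m)
        acc = [0.0] * dv  # running unnormalized output row
        for start in range(0, n_keys, block):
            end = min(start + block, n_keys)
            scores = [scale * sum(a * b for a, b in zip(qi, k[j]))
                      for j in range(start, end)]
            m_new = max(m, max(scores))
            # Rescale earlier partial sums to the new max (exp(-inf) == 0.0 on the first block).
            alpha = math.exp(m - m_new)
            denom *= alpha
            acc = [a * alpha for a in acc]
            for s, j in zip(scores, range(start, end)):
                p = math.exp(s - m_new)
                denom += p
                for c in range(dv):
                    acc[c] += p * v[j][c]
            m = m_new
        # Normalize once at the end, after every block has been seen.
        out.append([a / denom for a in acc])
    return out
```

Only the running max, the running denominator, and one accumulator row survive between blocks, which is what lets a GPU kernel keep each tile in on-chip memory.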
Installation
pip install flash-attn-4
If you're on CUDA 13, install with the cu13 extra for best performance:
pip install "flash-attn-4[cu13]"
Usage
from flash_attn.cute import flash_attn_func, flash_attn_varlen_func
# q, k, v: fp16/bf16 CUDA tensors of shape (batch, seqlen, nheads, headdim)
out = flash_attn_func(q, k, v, causal=True)
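flash_attn_varlen_func is only imported above, not demonstrated. In FlashAttention-2, the varlen entry point takes sequences packed along the token dimension plus cumulative sequence lengths (cu_seqlens_q / cu_seqlens_k: int32 prefix sums of shape (batch + 1,) starting at 0). This page does not state that FlashAttention-4 uses the same convention, so treat the following as an assumption. The sketch shows that packing bookkeeping in plain Python. The helpers cu_seqlens, pack and unpack are hypothetical, not part of flash_attn.

```python
from itertools import accumulate


def cu_seqlens(lengths):
    """Prefix sums of sequence lengths, starting at 0 (hypothetical helper).

    Sequence b occupies rows cu[b]:cu[b + 1] of the packed token dimension.
    """
    return [0, *accumulate(lengths)]


def pack(seqs):
    """Concatenate per-sequence token lists; return (packed, cu_seqlens)."""
    packed = [tok for seq in seqs for tok in seq]
    return packed, cu_seqlens([len(seq) for seq in seqs])


def unpack(packed, cu):
    """Split a packed list back into its sequences using the offsets."""
    return [packed[cu[b]:cu[b + 1]] for b in range(len(cu) - 1)]


if __name__ == "__main__":
    seqs = [["a", "b", "c"], ["d"], ["e", "f"]]
    packed, cu = pack(seqs)
    print(packed, cu)
```

With PyTorch tensors, the same offsets are usually built with torch.cumsum over a length tensor, a leading 0 prepended, and the result cast to torch.int32.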
Development
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e "flash_attn/cute[dev]" # CUDA 12.x
pip install -e "flash_attn/cute[dev,cu13]" # CUDA 13.x (e.g. B200)
pytest tests/cute/