cache-dit 1.1.9
A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
pip install cache-dit
Requires Python: >=3.10
Dependencies
- pyyaml
- torch>=2.7.1
- diffusers>=0.35.1
- transformers>=4.55.2
- einops>=0.8.1; extra == "parallelism"
- torchao>=0.14.1; extra == "quantization"
- bitsandbytes>=0.48.1; extra == "quantization"
- scipy; extra == "metrics"
- scikit-image; extra == "metrics"
- image-reward; extra == "metrics"
- lpips==0.1.4; extra == "metrics"
- fastapi>=0.104.0; extra == "serving"
- uvicorn>=0.24.0; extra == "serving"
- pydantic>=2.0.0; extra == "serving"
- packaging; extra == "dev"
- pre-commit; extra == "dev"
- pytest<8.0.0,>=7.0.0; extra == "dev"
- pytest-html; extra == "dev"
- expecttest; extra == "dev"
- hypothesis; extra == "dev"
- accelerate; extra == "dev"
- peft; extra == "dev"
- protobuf; extra == "dev"
- sentencepiece; extra == "dev"
- opencv-python-headless; extra == "dev"
- ftfy; extra == "dev"
- scikit-image; extra == "dev"
- cache-dit[parallelism]; extra == "all"
- cache-dit[quantization]; extra == "all"
- cache-dit[metrics]; extra == "all"
- cache-dit[serving]; extra == "all"
A PyTorch-native and Flexible Inference Engine with
Hybrid Cache Acceleration and Parallelism for 🤗DiTs
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |
Scheme: DBCache + SCM (steps_computation_mask) + TS (TaylorSeer) + FP8*, FLUX.1-Dev, L20x1. S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/speedup_v4.png>
🔥Highlights
We are excited to announce that the 🎉v1.1.0 version of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, thus making it a PyTorch-native and Flexible Inference Engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism and 🎉SOTA performance.
pip3 install -U cache-dit # Also, pip3 install git+https://github.com/huggingface/diffusers.git (latest)
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Optional extras are also available, e.g., pip3 install -U "cache-dit[all]" for parallelism, quantization, metrics, and serving support. Then try ♥️ Cache Acceleration with just one line of code ~ ♥️
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
>>> output = pipe(...) # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
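For context, here is a slightly fuller, self-contained sketch of the same workflow; the model ID, prompt, dtype, and the optional torch.compile step are illustrative choices, not requirements:

```python
import torch
import cache_dit
from diffusers import DiffusionPipeline

# Load any supported DiT-based pipeline (Qwen/Qwen-Image as a placeholder).
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

# Enable hybrid cache acceleration with the default cache options.
cache_dit.enable_cache(pipe)

# Optional: cache-dit is designed to compose with torch.compile.
# pipe.transformer = torch.compile(pipe.transformer)

# Call the pipeline as normal, then inspect the cache acceleration stats.
image = pipe("a cat wearing sunglasses").images[0]
stats = cache_dit.summary(pipe)

# Restore the original, uncached pipeline when done.
cache_dit.disable_cache(pipe)
```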
📚Core Features
- 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiT-based pipelines, covering 30+ model series and nearly 100 pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-Image, 🔥Wan, etc.
- 🎉Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal.
- 🎉State-of-the-Art Performance: Compared with other algorithms, cache-dit achieved SOTA results with a 7.4x↑🎉 speedup on ClipScore! Surprisingly, its DBCache even works for extremely few-step distilled models.
- 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization, CPU or Sequential Offloading, 🔥Context Parallelism, 🔥Tensor Parallelism, etc.
- 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes: DBCache acts as the Indicator that decides when to cache, while the Calibrator decides how to cache (see the toy sketch after this list).
- 🎉HTTP Serving Support: Built-in HTTP serving capabilities for production deployment with simple REST API. Supports text-to-image, image editing, text-to-video, and image-to-video generation.
- 🤗Diffusers Ecosystem Integration: 🔥cache-dit has joined the Diffusers community ecosystem as the first DiT-specific cache acceleration framework for 🤗diffusers, 🔥SGLang Diffusion, and 🔥vLLM-Omni.
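To make the indicator/calibrator split above concrete, here is a minimal toy sketch of block-wise caching. It illustrates the idea only and is not cache-dit's actual implementation; all names (run_blocks, n_front, threshold) are made up for this example:

```python
import torch

def run_blocks(blocks, x, cache, n_front=2, threshold=0.05):
    """Toy block-wise cache: compute the front blocks, use the change in
    their residual as the indicator, and reuse the cached tail residual
    (a trivial calibrator) when the change is small."""
    h = x
    for block in blocks[:n_front]:      # front blocks are always computed
        h = block(h)
    residual = h - x                    # indicator signal
    if "residual" in cache:
        prev = cache["residual"]
        change = (residual - prev).abs().mean() / (prev.abs().mean() + 1e-8)
        if change.item() < threshold:   # indicator says: safe to reuse
            # Calibrator step. Plain reuse here; a TaylorSeer-style
            # calibrator would extrapolate the cached features instead.
            return h + cache["tail_residual"]
    tail_in = h
    for block in blocks[n_front:]:      # cache miss: run the remaining blocks
        h = block(h)
    cache["residual"] = residual        # refresh the cache for the next step
    cache["tail_residual"] = h - tail_in
    return h
```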

🔥Supported DiTs
[!Tip] One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level, so any pipeline that includes a supported transformer is already supported by cache-dit. ✅: known to work and officially supported; ✖️: not officially supported yet, but may be supported in the future.
Q: 4-bit models w/ nunchaku W4A4; 💡C*: Hybrid Cache Acceleration; CP: Context Parallelism; TP: Tensor Parallelism; TE: Text Encoder Parallelism.
| 📚Model | C* | CP | TP | TE | 📚Model | C* | CP | TP | TE |
|---|---|---|---|---|---|---|---|---|---|
| 🔥Z-Image | ✅ | ✅ | ✅ | ✅ | 🔥Z-Image-Control | ✖️ | ✖️ | ✅ | ✅ |
| 🔥Ovis-Image | ✅ | ✅ | ✅ | ✅ | 🔥HunyuanVideo 1.5 | ✅ | ✖️ | ✖️ | ✅ |
| 🔥FLUX.2 | ✅ | ✅ | ✅ | ✅ | 🎉FLUX.1 Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉FLUX.1 | ✅ | ✅ | ✅ | ✅ | 🎉Qwen-Image Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉Qwen-Image | ✅ | ✅ | ✅ | ✅ | 🎉Qwen...Edit Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉Qwen...Edit | ✅ | ✅ | ✅ | ✅ | 🎉Qwen.E.Plus Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉Qwen..Light | ✅ | ✅ | ✅ | ✅ | 🎉Qwen...Light Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉Wan 2.2 T2V/I2V | ✅ | ✅ | ✅ | ✅ | 🎉Qwen.E.Light Q | ✅ | ✅ | ✖️ | ✅ |
| 🎉Wan 2.2 VACE | ✅ | ✅ | ✅ | ✅ | 🎉Mochi | ✅ | ✖️ | ✅ | ✅ |
| 🎉Wan 2.1 T2V/I2V | ✅ | ✅ | ✅ | ✅ | 🎉HiDream | ✅ | ✖️ | ✖️ | ✅ |
| 🎉Wan 2.1 VACE | ✅ | ✅ | ✅ | ✅ | 🎉HunyuanDiT | ✅ | ✖️ | ✅ | ✅ |
| 🎉HunyuanVideo | ✅ | ✅ | ✅ | ✅ | 🎉Sana | ✅ | ✖️ | ✖️ | ✅ |
| 🎉ChronoEdit | ✅ | ✅ | ✅ | ✅ | 🎉Bria | ✅ | ✖️ | ✖️ | ✅ |
| 🎉CogVideoX | ✅ | ✅ | ✅ | ✅ | 🎉SkyReelsV2 | ✅ | ✅ | ✅ | ✅ |
| 🎉CogVideoX 1.5 | ✅ | ✅ | ✅ | ✅ | 🎉Lumina 1/2 | ✅ | ✖️ | ✅ | ✅ |
| 🎉CogView4 | ✅ | ✅ | ✅ | ✅ | 🎉DiT-XL | ✅ | ✅ | ✖️ | ✅ |
| 🎉CogView3Plus | ✅ | ✅ | ✅ | ✅ | 🎉Allegro | ✅ | ✖️ | ✖️ | ✅ |
| 🎉PixArt Sigma | ✅ | ✅ | ✅ | ✅ | 🎉Cosmos | ✅ | ✖️ | ✖️ | ✅ |
| 🎉PixArt Alpha | ✅ | ✅ | ✅ | ✅ | 🎉OmniGen | ✅ | ✖️ | ✖️ | ✅ |
| 🎉Chroma-HD | ✅ | ✅ | ✅ | ✅ | 🎉EasyAnimate | ✅ | ✖️ | ✖️ | ✅ |
| 🎉VisualCloze | ✅ | ✅ | ✅ | ✅ | 🎉StableDiffusion3 | ✅ | ✖️ | ✖️ | ✅ |
| 🎉HunyuanImage | ✅ | ✅ | ✅ | ✅ | 🎉PRX T2I | ✅ | ✖️ | ✖️ | ✅ |
| 🎉Kandinsky5 | ✅ | ✅ | ✅ | ✅ | 🎉Amused | ✅ | ✖️ | ✖️ | ✅ |
| 🎉LTXVideo | ✅ | ✅ | ✅ | ✅ | 🎉AuraFlow | ✅ | ✖️ | ✖️ | ✅ |
| 🎉ConsisID | ✅ | ✅ | ✅ | ✅ | 🎉LongCatVideo | ✅ | ✖️ | ✖️ | ✅ |
🔥Click here to show many Image/Video cases🔥
🎉Now, cache-dit covers almost All Diffusers' DiT Pipelines🎉
🔥Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus 🔥
🔥FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2 🔥
🔥HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow🔥
🔥CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID🔥
🔥Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt🔥
🔥Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL🔥
🔥Wan2.2 MoE | +cache-dit:2.0x↑🎉 | HunyuanVideo | +cache-dit:2.1x↑🎉
🔥Qwen-Image | +cache-dit:1.8x↑🎉 | FLUX.1-dev | +cache-dit:2.1x↑🎉
🔥Qwen...Lightning | +cache-dit:1.14x↑🎉 | HunyuanImage | +cache-dit:1.7x↑🎉
🔥Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit:1.6x↑🎉 | 1.9x↑🎉
🔥FLUX-Kontext-dev | Baseline | +cache-dit:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉
🔥HiDream-I1 | +cache-dit:1.9x↑🎉 | CogView4 | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥CogView3 | +cache-dit:1.5x↑🎉 | 2.0x↑🎉| Chroma1-HD | +cache-dit:1.9x↑🎉
🔥Mochi-1-preview | +cache-dit:1.8x↑🎉 | SkyReelsV2 | +cache-dit:1.6x↑🎉
🔥VisualCloze-512 | Model | Cloth | Baseline | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥LTX-Video-0.9.7 | +cache-dit:1.7x↑🎉 | CogVideoX1.5 | +cache-dit:2.0x↑🎉
🔥OmniGen-v1 | +cache-dit:1.5x↑🎉 | 3.3x↑🎉 | Lumina2 | +cache-dit:1.9x↑🎉
🔥Allegro | +cache-dit:1.36x↑🎉 | AuraFlow-v0.3 | +cache-dit:2.27x↑🎉
🔥Sana | +cache-dit:1.3x↑🎉 | 1.6x↑🎉| PixArt-Sigma | +cache-dit:2.3x↑🎉
🔥PixArt-Alpha | +cache-dit:1.6x↑🎉 | 1.8x↑🎉| SD 3.5 | +cache-dit:2.5x↑🎉
🔥Amused | +cache-dit:1.1x↑🎉 | 1.2x↑🎉 | DiT-XL-256 | +cache-dit:1.8x↑🎉
♥️ Please consider leaving a ⭐️ Star to support us ~ ♥️
📖Table of Contents
For more advanced features such as Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, Hybrid Forward Pattern, Patch Functor, DBCache, DBPrune, TaylorSeer Calibrator, SCM, Hybrid Cache CFG, Context Parallelism (w/ UAA) and Tensor Parallelism, please refer to the 🎉User_Guide.md for details.
🚀Quick Links
- 📊Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- 🌐HTTP Serving - Deploy cache-dit models with an HTTP API for text-to-image, image editing, multi-image editing, and text-to-video generation (a hypothetical client sketch follows this list).
- ❓FAQ - Frequently asked questions including attention backend configuration, troubleshooting, and optimization tips.
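As a rough idea of what a client for such a REST service might look like, here is a sketch using Python's requests library. The port, endpoint path, and payload/response fields are hypothetical placeholders, so check the HTTP Serving docs for the actual schema:

```python
import base64
import requests

# Hypothetical text-to-image request; the URL and JSON fields below are
# placeholders, not cache-dit's documented serving schema.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "a cat wearing sunglasses", "num_inference_steps": 28},
    timeout=300,
)
resp.raise_for_status()

# Assuming the server returns a base64-encoded image in the response body.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["image"]))
```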
📚Documentation
- ⚙️Installation
- 🔥Supported DiTs
- 🔥Benchmarks
- 🎉Unified Cache APIs
- ⚡️DBCache: Dual Block Cache
- ⚡️DBPrune: Dynamic Block Prune
- ⚡️Hybrid Cache CFG
- 🔥Hybrid TaylorSeer Calibrator
- 🤖SCM: Steps Computation Masking
- ⚡️Hybrid Context Parallelism
- 🤖UAA: Ulysses Anything Attention
- 🤖Async Ulysses QKV Projection
- 🤖Async FP8 Ulysses Attention
- ⚡️Hybrid Tensor Parallelism
- 🤖Parallelize Text Encoder
- 🤖Low-bits Quantization
- 🤖How to use FP8 Attention
- 🛠Metrics Command Line
- ⚙️Torch Compile
- 📊Torch Profiler Usage
- 📚API Documents
👋Contribute
How to contribute? Star ⭐️ this repo to support us or check CONTRIBUTE.md.
🎉Projects Using CacheDiT
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories like jetson-containers, flux-fast, sdnext, 🔥vLLM-Omni, and 🔥SGLang Diffusion. 🎉CacheDiT has also been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub, and GiantPandaCV.
©️Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer, and LeMiCa.
©️Citations
@misc{cache-dit@2025,
title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}