xgrammar 0.1.22
Efficient, Flexible and Portable Structured Generation
pip install xgrammar
Requires Python: >=3.8,<4
Dependencies
- pydantic
- torch>=1.10.0
- transformers>=4.38.0
- triton; platform_system == "Linux" and platform_machine == "x86_64"
- mlx-lm; platform_system == "Darwin" and platform_machine == "arm64"
- ninja
- typing-extensions>=4.9.0
- huggingface-hub[cli]; extra == "test"
- protobuf; extra == "test"
- pytest; extra == "test"
- sentencepiece; extra == "test"
- tiktoken; extra == "test"
- transformers<4.50.0; platform_system == "Darwin" and extra == "test"
News
- [2025/02] XGrammar has been officially integrated into Modular's MAX.
- [2025/01] XGrammar has been officially integrated into TensorRT-LLM.
- [2024/12] XGrammar has been officially integrated into vLLM.
- [2024/12] We presented research talks on XGrammar at CMU, UC Berkeley, MIT, THU, SJTU, Ant Group, LMSys, Qingke AI, and Camel AI. The slides can be found here.
- [2024/11] XGrammar has been officially integrated into SGLang.
- [2024/11] XGrammar has been officially integrated into MLC-LLM.
- [2024/11] We officially released XGrammar v0.1.0!
Overview
XGrammar is an open-source library for efficient, flexible, and portable structured generation.
It leverages constrained decoding to ensure 100% structural correctness of the output. It supports general context-free grammars, which enables a broad range of structures, including JSON, regular expressions, and custom context-free grammars.
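To make the idea of constrained decoding concrete, here is a minimal pure-Python sketch. It does not use XGrammar's API and is not how XGrammar is implemented. The toy vocabulary, the balanced-bracket "grammar", the scoring function, and every helper name are assumptions made up for illustration. At each step, tokens the grammar would reject are masked out before the highest-scoring token is chosen, so the output can only ever be structurally valid.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ["[", "]", "[]", "x", "<eos>"]
EOS = "<eos>"


def depth_after(depth, token):
    """Return the bracket depth after consuming `token`, or None if rejected."""
    for ch in token:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # closing bracket with nothing open
                return None
        else:  # any character outside the toy grammar
            return None
    return depth


def token_mask(depth, emitted_any):
    """One boolean per vocabulary entry: True if the token is allowed next."""
    mask = []
    for tok in VOCAB:
        if tok == EOS:
            # Stopping is only allowed on a non-empty, fully balanced string.
            mask.append(depth == 0 and emitted_any)
        else:
            mask.append(depth_after(depth, tok) is not None)
    return mask


def constrained_decode(score_fn, max_steps=16):
    """Greedy decoding where disallowed tokens are masked to -inf."""
    out, depth = [], 0
    for _ in range(max_steps):
        scores = score_fn(out)
        mask = token_mask(depth, bool(out))
        masked = [s if ok else -math.inf for s, ok in zip(scores, mask)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[best] == EOS:
            break
        out.append(VOCAB[best])
        depth = depth_after(depth, VOCAB[best])
    return "".join(out)


def toy_scores(out):
    """Stand-in for model logits: always prefers the invalid token "x"."""
    open_score = 4.5 if len(out) < 2 else 2.0
    # Order matches VOCAB: "[", "]", "[]", "x", "<eos>"
    return [open_score, 4.0, 1.0, 5.0, 3.0]


print(constrained_decode(toy_scores))  # prints [[]]
```

Without the mask, greedy decoding over `toy_scores` would emit `x` at every step. With the mask, the preferred but invalid token is never selectable. A production engine has to do the same job against a full tokenizer vocabulary and a general grammar at every decoding step, which is where the overhead discussed next comes from.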
XGrammar uses careful optimizations to achieve extremely low overhead in structured generation. It has achieved near-zero overhead in JSON generation, making it one of the fastest structured generation engines available.
XGrammar features universal deployment. It supports:
- Platforms: Linux, macOS, Windows
- Hardware: CPU, NVIDIA GPU, AMD GPU, Apple Silicon, TPU, etc.
- Languages: Python, C++, and JavaScript APIs
- Models: Qwen, Llama, DeepSeek, Phi, Gemma, etc.
XGrammar is easy to integrate with LLM inference engines. It is the default structured generation backend for most LLM inference engines, including vLLM, SGLang, TensorRT-LLM, and MLC-LLM, and it is also used by many companies. You can try out the structured generation modes of these engines as well!
Get Started
Install XGrammar:
pip install xgrammar
Import XGrammar:
import xgrammar as xgr
Please visit our documentation to get started with XGrammar.
Adoption
XGrammar has been adopted by many projects and companies.
Citation
If you find XGrammar useful in your research, please consider citing our paper:
@article{dong2024xgrammar,
  title={Xgrammar: Flexible and efficient structured generation engine for large language models},
  author={Dong, Yixin and Ruan, Charlie F and Cai, Yaxing and Lai, Ruihang and Xu, Ziyi and Zhao, Yilong and Chen, Tianqi},
  journal={Proceedings of Machine Learning and Systems 7},
  year={2024}
}