optimum-amd 0.1.0
The Optimum library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from hardware partners and interface with their specific functionality.
pip install optimum-amd
Optimum-AMD
🤗 Optimum-AMD is an extension to the Hugging Face libraries enabling performance optimizations for AMD GPUs through ROCm and for the AMD Ryzen AI NPU accelerator.
Install
The Optimum-AMD library can be installed through pip:
pip install --upgrade-strategy eager optimum[amd]
Installation is possible from source as well:
git clone https://github.com/huggingface/optimum-amd.git
cd optimum-amd
pip install -e .
ROCm support for AMD GPUs
Hugging Face libraries natively support AMD GPUs through PyTorch for ROCm, with no code changes required.
🤗 Transformers natively supports Flash Attention 2 and GPTQ quantization with ROCm. The 🤗 Text Generation Inference library for LLM deployment has native ROCm support, including Flash Attention 2, Paged Attention, and fused positional encoding & layer norm kernels.
Find out more about these integrations in the documentation!
In the future, Optimum-AMD may host more ROCm-specific optimizations.
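As an illustration of this zero-code-change support, the sketch below loads a model with 🤗 Transformers on an AMD GPU. It assumes a ROCm build of PyTorch and the Flash Attention 2 kernels for ROCm are installed; the model and generation parameters are only examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ROCm builds of PyTorch expose AMD GPUs through the usual "cuda" device string.
model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # requires Flash Attention 2 for ROCm
).to("cuda")

inputs = tokenizer("What is Deep Learning?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))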
How to use it: Text Generation Inference
Text Generation Inference library for LLM deployment supports AMD Instinct MI210/MI250 GPUs. Deployment can be done as follows:
- Install ROCm 5.7 on the host machine.
- Example LLM server setup: launch a Falcon-7B model server using the ROCm-enabled Docker image.
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.2-rocm --model-id $model
- Client setup: Open another shell and run:
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
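The same endpoint can also be queried from Python. Below is a minimal client sketch using the requests library, assuming the server started above is listening on 127.0.0.1:8080.
import requests

# Query the /generate endpoint of the TGI server started above.
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}},
)
print(response.json())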
How to use it: ONNX Runtime with ROCm
Optimum ONNX Runtime integration supports ROCm for AMD GPUs. Usage is as follows:
- Install ROCm 5.7 on the host machine.
- Use the example Dockerfile or install the onnxruntime-rocm package locally from source. Pip wheels are not available at this time.
- Run a BERT text classification ONNX model using ROCMExecutionProvider:
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer
ort_model = ORTModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased-finetuned-sst-2-english",
export=True,
provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
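# With a ROCm build, the AMD GPU is still selected through the "cuda:0" device string.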
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
Ryzen AI
AMD's Ryzen™ AI family of laptop processors provides users with an integrated Neural Processing Unit (NPU) that offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime, combined with quantization tools and a pre-optimized model zoo. All of this is made possible by Ryzen™ AI technology built on the AMD XDNA™ architecture, purpose-built to run AI workloads efficiently and locally, offering a host of benefits for developers innovating the next groundbreaking AI app.
Optimum-AMD provides an easy interface for loading and running inference of Hugging Face models on the Ryzen AI accelerator.
Ryzen AI Environment setup
A Ryzen AI environment needs to be enabled to use this library. Please refer to Ryzen AI's Installation and Runtime Setup.
How to use it?
- Quantize the ONNX model with Optimum or with the RyzenAI quantization tools (a minimal Optimum quantization sketch is shown after the loading example below).
For more information on quantization, refer to the Model Quantization guide.
- Load the model with the Ryzen AI class
To load a model and run inference with RyzenAI, simply replace your AutoModelForXxx class with the corresponding RyzenAIModelForXxx class.
import requests
from PIL import Image
- from transformers import AutoModelForImageClassification
+ from optimum.amd.ryzenai import RyzenAIModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
model_id = <path of the model>
- model = AutoModelForImageClassification.from_pretrained(model_id)
+ model = RyzenAIModelForImageClassification.from_pretrained(model_id, vaip_config=<path to config file>)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
outputs = cls_pipe(image)
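As referenced in the first step above, here is a minimal sketch of quantizing an exported ONNX model with Optimum's generic ORTQuantizer. The paths are placeholders and the dynamic-quantization config is only illustrative; for deployment on the Ryzen AI NPU, follow the RyzenAI quantization tools and the Model Quantization guide instead.
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Placeholder paths: point these at an exported ONNX model directory and an output directory.
quantizer = ORTQuantizer.from_pretrained("path/to/onnx_model_dir")
quantization_config = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="path/to/quantized_model", quantization_config=quantization_config)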
If you find any issues while using these, please open an issue or submit a pull request.