Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.

Optimum-AMD

🤗 Optimum-AMD is an extension to the Hugging Face libraries enabling performance optimizations for AMD GPUs through ROCm and for the AMD NPU accelerator through Ryzen AI.

Install

The Optimum-AMD library can be installed through pip:

pip install --upgrade-strategy eager optimum[amd]

Installation from source is also possible:

git clone https://github.com/huggingface/optimum-amd.git
cd optimum-amd
pip install -e .

ROCm support for AMD GPUs

Hugging Face libraries natively support AMD GPUs through PyTorch for ROCm, with zero code changes.
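
For example, on a ROCm build of PyTorch the AMD GPU is addressed through the familiar cuda device, so existing GPU code runs unchanged. A minimal sanity check, assuming a working PyTorch-for-ROCm install:

import torch

# With PyTorch for ROCm, AMD GPUs are exposed through the usual "cuda" device,
# so code written for NVIDIA GPUs runs as-is.
print(torch.cuda.is_available())      # True on a ROCm-enabled machine
print(torch.cuda.get_device_name(0))  # reports the AMD GPU
x = torch.ones(3, device="cuda")
print(x * 2)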

🤗 Transformers natively supports Flash Attention 2 and GPTQ quantization with ROCm. The 🤗 Text Generation Inference library for LLM deployment also has native ROCm support, including Flash Attention 2, Paged Attention, and fused positional encoding & layer norm kernels.
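
As an illustration, the snippet below loads a causal LM with Flash Attention 2 enabled on a ROCm GPU. This is a minimal sketch: it assumes a recent Transformers release (older versions use use_flash_attention_2=True instead of attn_implementation), a ROCm build of flash-attention, and the accelerate package for device_map="auto".

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# attn_implementation="flash_attention_2" requires flash-attention built for ROCm.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
inputs = tokenizer("What is Deep Learning?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))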

Find out more about these integrations in the documentation!

In the future, Optimum-AMD may host more ROCm-specific optimizations.

How to use it: Text Generation Inference

The Text Generation Inference library for LLM deployment supports AMD Instinct MI210/MI250 GPUs. Deployment can be done as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Example LLM server setup: launch a Falcon-7B model server using the ROCm-enabled Docker image.
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.2-rocm --model-id $model
  3. Client setup: open another shell and run:
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
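
Instead of curl, the server can also be queried from Python with the huggingface_hub client. A minimal sketch, assuming huggingface_hub is installed and the server from step 2 is listening on port 8080:

from huggingface_hub import InferenceClient

# Point the client at the local Text Generation Inference server started above.
client = InferenceClient("http://127.0.0.1:8080")
generated = client.text_generation("What is Deep Learning?", max_new_tokens=20)
print(generated)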

How to use it: ONNX Runtime with ROCm

The Optimum ONNX Runtime integration supports ROCm for AMD GPUs. Usage is as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Use the example Dockerfile or install the onnxruntime-rocm package locally from source, as pip wheels are not available at this time.
  3. Run a BERT text classification ONNX model using the ROCMExecutionProvider:
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
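
If the pipeline unexpectedly runs on CPU, a quick sanity check is to confirm that your ONNX Runtime build exposes the ROCm execution provider. This inspects ONNX Runtime directly and is independent of Optimum:

import onnxruntime

# A ROCm build should list "ROCMExecutionProvider" among the available providers;
# if it is missing, onnxruntime-rocm was not installed correctly.
print(onnxruntime.get_available_providers())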

Ryzen AI

AMD's Ryzen™ AI family of laptop processors provides users with an integrated Neural Processing Unit (NPU) which offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime combined with quantization tools and a pre-optimized model zoo. All of this is made possible by Ryzen™ AI technology built on the AMD XDNA™ architecture, purpose-built to run AI workloads efficiently and locally, offering a host of benefits for the developer innovating the next groundbreaking AI app.

Optimum-AMD provides an easy interface for loading and running inference with Hugging Face models on the Ryzen AI accelerator.

Ryzen AI Environment setup

A Ryzen AI environment needs to be enabled to use this library. Please refer to Ryzen AI's Installation and Runtime Setup.

How to use it?

  • Quantize the ONNX model with Optimum or with the Ryzen AI quantization tools

For more information on quantization, refer to the Model Quantization guide.
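
As a rough sketch of the Optimum path, the example below quantizes an exported ONNX image-classification model with the RyzenAIOnnxQuantizer. The class names, the ipu_cnn_config preset, and the calibration-dataset helper follow the Model Quantization guide; the model directory and preprocessing function are placeholders, so check the guide for the exact API of your installed version.

from functools import partial

from optimum.amd.ryzenai import AutoQuantizationConfig, RyzenAIOnnxQuantizer
from transformers import AutoFeatureExtractor

# Placeholder: directory containing the ONNX model exported beforehand.
onnx_model_dir = "path/to/onnx_model_dir"

quantizer = RyzenAIOnnxQuantizer.from_pretrained(onnx_model_dir)
# Default quantization configuration targeting the Ryzen AI NPU.
quantization_config = AutoQuantizationConfig.ipu_cnn_config()

feature_extractor = AutoFeatureExtractor.from_pretrained(onnx_model_dir)

def preprocess_fn(example, feature_extractor):
    # Turn a raw image into the pixel_values the model expects.
    return feature_extractor(example["image"].convert("RGB"))

# Static quantization needs a small calibration dataset.
calibration_dataset = quantizer.get_calibration_dataset(
    "imagenet-1k",
    preprocess_function=partial(preprocess_fn, feature_extractor=feature_extractor),
    num_samples=100,
    dataset_split="train",
)

quantizer.quantize(
    quantization_config=quantization_config,
    dataset=calibration_dataset,
    save_dir="quantized_model",
)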

  • Load the model with the corresponding Ryzen AI class

To load a model and run inference with RyzenAI, you can just replace your AutoModelForXxx class with the corresponding RyzenAIModelForXxx class.

import requests
from PIL import Image

- from transformers import AutoModelForImageClassification
+ from optimum.amd.ryzenai import RyzenAIModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = <path of the model>
- model = AutoModelForImageClassification.from_pretrained(model_id)
+ model = RyzenAIModelForImageClassification.from_pretrained(model_id, vaip_config=<path to config file>)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
outputs = cls_pipe(image)
print(outputs)

If you run into any issues while using these integrations, please open an issue or submit a pull request.