furiosa-serving 0.10.2
Furiosa serving framework, easy to use inference server.
pip install furiosa-serving
Requires Python: ~=3.8
Dependencies
- furiosa-server ==0.10.*
- Pillow
- python-multipart
- httpx
- prometheus-client
- opentelemetry-instrumentation-fastapi
- opentelemetry-instrumentation-logging
- opentelemetry-exporter-otlp
- opentelemetry-api
- opentelemetry-sdk
- furiosa-server[openvino] ==0.10.*; extra == "openvino"
- transformers; extra == "openvino"
- mypy; extra == "test"
- pytest; extra == "test"
- pytest-asyncio ~=0.17.2; extra == "test"
- pytest-cov; extra == "test"
- ruff; extra == "test"
- types-Pillow; extra == "test"
Furiosa Serving
Furiosa serving is a lightweight library based on FastAPI for building a model server that runs on a Furiosa NPU.
Dependency
Furiosa serving depends on the following:
- Furiosa NPU
- furiosa-server
Installation
furiosa-serving can be installed from PyPI using pip (note that the package name is different from the importable name):
pip install 'furiosa-sdk[serving]'
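To check that the install worked and that the importable module is furiosa.serving (not furiosa-serving), you can, for example, run:
python -c "from furiosa.serving import ServeAPI"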
Getting started
There is one main API called ServeAPI. You can think of ServeAPI as a kind of FastAPI wrapper.
Run server
# main.py
from fastapi import FastAPI
from furiosa.serving import ServeAPI
serve = ServeAPI()
# This is FastAPI instance
app: FastAPI = serve.app
You can run a uvicorn server via the internal app variable of the ServeAPI instance, just like a normal FastAPI application:
$ uvicorn main:app # or uvicorn main:serve.app
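If you prefer launching the server from Python instead of the CLI, a minimal sketch using uvicorn's standard API (reusing the main.py above, with an assumed host and port) is:
# run.py -- starts the same app programmatically, equivalent to `uvicorn main:app`
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="127.0.0.1", port=8000)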
Load model
From ServeAPI, you can load your model binary, which will run on a Furiosa NPU. You should specify the model name and the URI to load the model from. The URI can be one of the following:
- Local file
- HTTP
- S3
Note that the model binary must be in a format supported by the Furiosa NPU, such as ONNX (used in the examples below) or TFLite.
from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel
serve = ServeAPI()
# Load model from local disk
imagenet: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

# Load model from HTTP
resnet: ServeModel = synchronous(serve.model("furiosart"))(
    'resnet',
    location='https://raw.githubusercontent.com/onnx/models/main/vision/classification/resnet/model/resnet50-v1-12.onnx'
)

# Load model from S3 (auth environment variables for the aioboto library are required)
densenet: ServeModel = synchronous(serve.model("furiosart"))(
    'densenet',
    location='s3://furiosa/models/93d63f654f0f192cc4ff5691be60fb9379e9d7fd'
)
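For the S3 case, credentials are read from the environment; for example, with the standard AWS variables (assuming your aioboto setup uses them):
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
export AWS_DEFAULT_REGION=<region>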
Define API
From the model you just created, you can use FastAPI path operation decorators like post() and get() to expose API endpoints.
You should follow the FastAPI Request Body concept to correctly define the payload.
:warning: The example below does not work as-is: you have to define your own preprocess() and postprocess() functions first.
from typing import Dict, List

from fastapi import File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel
import numpy as np

serve = ServeAPI()

model: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)


@model.post("/models/imagenet/infer")
async def infer(image: UploadFile = File(...)) -> Dict:
    # Convert image to Numpy array with your preprocess() function
    tensors: List[np.ndarray] = preprocess(image)

    # Infer from ServeModel
    result: List[np.ndarray] = await model.predict(tensors)

    # Classify from the numpy array with your postprocess() function
    response: Dict = postprocess(result)

    return response
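For reference, a minimal sketch of such functions could look like the following; the input size, layout, and normalization used here are assumptions about the model, not requirements from furiosa-serving:
from typing import Dict, List

import numpy as np
from fastapi import UploadFile
from PIL import Image


def preprocess(image: UploadFile) -> List[np.ndarray]:
    # Decode the uploaded file and resize to an assumed 224x224 input
    img = Image.open(image.file).convert("RGB").resize((224, 224))
    # HWC uint8 -> NCHW float32 with an assumed 0-1 normalization
    tensor = np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0
    return [np.expand_dims(tensor, axis=0)]


def postprocess(result: List[np.ndarray]) -> Dict:
    # Take the index of the highest score in the first output tensor
    return {"class_index": int(np.argmax(result[0]))}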
After running the uvicorn server, you can find the documentation provided by FastAPI at localhost:8000/docs.
Use sub applications
Furiosa serving provides predefined FastAPI sub applications to give you additional functionality out of the box.
You can mount the sub applications using mount(). We provide several sub applications, listed below:
- Repository: model repository to list models and load/unload a model dynamically
- Model: model metadata, model readiness
- Health: server health, server readiness
from fastapi import FastAPI
from furiosa.serving import ServeAPI
from furiosa.serving.apps import health, model, repository
# Create ServeAPI with Repository instance. This repository maintains models
serve = ServeAPI(repository.repository)
app: FastAPI = serve.app
app.mount("/repository", repository.app)
app.mount("/models", model.app)
app.mount("/health", health.app)
You can also find documentation for the sub applications at localhost:8000/{application}/docs. Note that the model sub application has a different default doc URL, localhost:8000/{application}/api/docs, since the default doc URL conflicts with the model API.
Use processors for pre/post processing
Furiosa serving provides several processors, which are predefined pre/post process functions that convert your data for each model.
from typing import Dict

import numpy as np
from fastapi import File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI
from furiosa.serving.processors import imagenet

serve = ServeAPI()

model: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)


@model.post("/models/imagenet/infer")
async def infer(image: UploadFile = File(...)) -> Dict:
    shape = model.inputs[0].shape

    input = await imagenet.preprocess(shape, image)
    output = await model.predict(input)

    return await imagenet.postprocess(
        output[0], label='./examples/assets/labels/ImageNetLabels.txt'
    )
Compose models
You can compose multiple models using FastAPI dependency injection.
:warning: The example below does not work as-is, as there is no segmentnet processor in processors yet.
from typing import Dict, List

import numpy as np
from fastapi import Depends, File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI

serve = ServeAPI()

imagenet: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

segmentnet: ServeModel = synchronous(serve.model("furiosart"))(
    'segmentnet',
    location='./examples/assets/models/image_segmentation.onnx'
)


# Note that there is no "imagenet.post()" here, so this endpoint is not exposed
async def classify(image: UploadFile = File(...)) -> List[np.ndarray]:
    from furiosa.serving.processors.imagenet import preprocess

    tensors: List[np.ndarray] = await preprocess(
        imagenet.inputs[0].shape, image
    )
    return await imagenet.predict(tensors)


@segmentnet.post("/models/composed/infer")
async def segment(tensors: List[np.ndarray] = Depends(classify)) -> Dict:
    from furiosa.serving.processors.segmentnet import postprocess

    tensors = await segmentnet.predict(tensors)
    return await postprocess(tensors)
Example 1
You can find a complete example at examples/image_classify.py
cd examples
examples$ python image_classify.py
INFO:furiosa_sdk_runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/debug/libnux.so (0.4.0-dev d1720b938)
INFO: Started server process [984608]
INFO:uvicorn.error:Started server process [984608]
INFO: Waiting for application startup.
INFO:uvicorn.error:Waiting for application startup.
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.27935523s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1079.9143s
▪▪▪▪▪ [2/3] Lowering...Done in 93.315895s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 45.07178s
Done in 1218.3285s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.002127793s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.096237786s
[5/6] 🔍 Compiling from gir to lir
Done in 0.03271749s
[6/6] 🔍 Compiling from lir to enf
Done in 0.48739022s
✨ Finished in 1219.4524s
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
You can find the available APIs at http://localhost:8000/docs#/.
Send an image to the server you just launched to classify it:
examples$ curl -X 'POST' \
'http://127.0.0.1:8000/imagenet/infer' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'image=@assets/images/car.jpg'
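Since httpx is already a dependency of furiosa-serving, the same request can also be sent from Python; a small sketch, assuming the server above is running locally:
import httpx

with open("assets/images/car.jpg", "rb") as f:
    response = httpx.post(
        "http://127.0.0.1:8000/imagenet/infer",
        files={"image": ("car.jpg", f, "image/jpeg")},
    )

print(response.json())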
Example 2
In many user scenarios, for each request users may want to split a large image into a number of small images and process all of them at once.
In such use cases, using multiple devices can boost throughput, eventually leading to lower latency.
The example examples/number_classify.py shows how to implement this use case with a session pool and Python async/await/gather.
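The core idea is to fan the per-tile inferences out with asyncio.gather; a minimal sketch of that pattern (the tile list and model argument here are stand-ins, not the actual code in number_classify.py):
import asyncio
from typing import List

import numpy as np


async def infer_tiles(model, tiles: List[np.ndarray]) -> list:
    # Issue one predict() per tile; awaiting them together lets the
    # session pool spread the requests across the available NPU devices.
    return await asyncio.gather(*(model.predict([tile]) for tile in tiles))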
cd examples
examples$ python number_classify.py
INFO: Started server process [57892]
INFO: Waiting for application startup.
2022-10-28T05:36:42.468215Z INFO nux::npu: Npu (npu0pe0-1) is being initialized
2022-10-28T05:36:42.473084Z INFO nux: NuxInner create with pes: [PeId(0)]
2022-10-28T05:36:42.503103Z INFO nux::npu: Npu (npu1pe0-1) is being initialized
2022-10-28T05:36:42.507724Z INFO nux: NuxInner create with pes: [PeId(0)]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
You can find the available APIs at http://localhost:8000/docs#/.
Send an image to the server you just launched to classify it:
examples$ curl -X 'POST' \
'http://127.0.0.1:8000/infer' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@assets/images/1234567890.jpg'
Code
The code and issue tracker are hosted on GitHub:
https://github.com/furiosa-ai/furiosa-sdk
Contributing
We welcome many types of contributions - bug reports, pull requests (code, infrastructure or documentation fixes). For more information about how to contribute to the project, see the CONTRIBUTING.md
file in the repository.