Furiosa Serving

Furiosa serving is a lightweight library based on FastAPI for building a model server that runs on a Furiosa NPU.

Dependency

Furiosa serving depends on the following:

Installation

furiosa-serving can be installed from PyPI using pip (note that the package name is different from the importable name):

pip install 'furiosa-sdk[serving]'
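
For example, even though the package is installed as furiosa-sdk[serving], the module is imported as furiosa.serving:

# The importable name is furiosa.serving
from furiosa.serving import ServeAPI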

Getting started

There is one main API called ServeAPI. You can think of ServeAPI as a kind of FastAPI wrapper.

Run server

# main.py
from fastapi import FastAPI
from furiosa.serving import ServeAPI

serve = ServeAPI()

# This is a FastAPI instance
app: FastAPI = serve.app

You can run a uvicorn server via the internal app variable of the ServeAPI instance, just like a normal FastAPI application:

$ uvicorn main:app # or uvicorn main:serve.app
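
If you prefer launching the server from Python instead of the uvicorn CLI, a minimal sketch (assuming the module above is named main.py):

# run.py - equivalent to `uvicorn main:app`
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="127.0.0.1", port=8000)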

Load model

From ServeAPI, you can load a model binary that will run on a Furiosa NPU. You should specify the model name and the URI from which to load the model. The URI can be one of the following:

  • Local file
  • HTTP
  • S3

Note that the model binary must be in a format supported by the Furiosa NPU (the examples below use ONNX).

from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel


serve = ServeAPI()


# Load model from local disk
imagenet: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

# Load model from HTTP
resnet: ServeModel = synchronous(serve.model("furiosart"))(
    'resnet',
    location='https://raw.githubusercontent.com/onnx/models/main/vision/classification/resnet/model/resnet50-v1-12.onnx'
)

# Load model from S3 (auth environment variables for the aioboto library are required)
densenet: ServeModel = synchronous(serve.model("furiosart"))(
    'densenet',
    location='s3://furiosa/models/93d63f654f0f192cc4ff5691be60fb9379e9d7fd'
)
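
For the S3 case, the credentials are the standard AWS environment variables; which ones you need depends on your AWS setup. A sketch for illustration only (in practice, export them in your shell rather than hard-coding them):

import os

# Standard AWS credential variables read by the underlying S3 client
os.environ["AWS_ACCESS_KEY_ID"] = "<access key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret key>"
os.environ["AWS_DEFAULT_REGION"] = "<region>"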

Define API

On the model you just created, you can use FastAPI path operation decorators like post() and get() to expose API endpoints.

You should follow FastAPI Request Body concept to correctly define payload.

:warning: The example below does not work as-is, as you have to define your own preprocess() and postprocess() functions first.

from typing import Dict, List

from fastapi import File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel
import numpy as np


serve = ServeAPI()


model: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

@model.post("/models/imagenet/infer")
async def infer(image: UploadFile = File(...)) -> Dict:
    # Convert image to Numpy array with your preprocess() function
    tensors: List[np.ndarray] = preprocess(image)

    # Infer from ServeModel
    result: List[np.ndarray] = await model.predict(tensors)

    # Classify model from numpy array with your postprocess() function
    response: Dict = postprocess(result)

    return response
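
The preprocess() and postprocess() functions are up to you. A minimal sketch, assuming Pillow is installed and the model takes a single 1x3x224x224 float32 tensor (adjust the shape, layout, and normalization to your model):

from typing import Dict, List

import numpy as np
from fastapi import UploadFile
from PIL import Image


def preprocess(image: UploadFile) -> List[np.ndarray]:
    # Decode the uploaded file and resize it to the model input size
    decoded = Image.open(image.file).convert("RGB").resize((224, 224))
    # HWC uint8 -> NCHW float32 with a batch dimension
    array = np.asarray(decoded, dtype=np.float32).transpose(2, 0, 1)[np.newaxis, ...]
    return [array]


def postprocess(result: List[np.ndarray]) -> Dict:
    # Return the index of the highest-scoring class
    return {"class_index": int(np.argmax(result[0]))}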

After running the uvicorn server, you can find the API documentation provided by FastAPI at localhost:8000/docs
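
To exercise the endpoint defined above, you can post a multipart request from any HTTP client. A sketch using the requests library (the image path is hypothetical):

import requests

with open("car.jpg", "rb") as f:  # any local image file
    response = requests.post(
        "http://127.0.0.1:8000/models/imagenet/infer",
        # The field name must match the `image` parameter of infer()
        files={"image": f},
    )

print(response.json())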

Use sub applications

Furiosa serving provides predefined FastAPI sub applications that give you additional functionality out of the box.

You can mount the sub applications using mount(). The following sub applications are provided:

  • Repository: model repository to list models and load/unload a model dynamically
  • Model: model metadata, model readiness
  • Health: server health, server readiness

from fastapi import FastAPI
from furiosa.serving import ServeAPI
from furiosa.serving.apps import health, model, repository


# Create ServeAPI with Repository instance. This repository maintains models
serve = ServeAPI(repository.repository)

app: FastAPI = serve.app

app.mount("/repository", repository.app)
app.mount("/models", model.app)
app.mount("/health", health.app)

You can also find documentation for the sub applications at localhost:8000/{application}/docs. Note that the model sub application uses a different default doc URL, localhost:8000/{application}/api/docs, since the default doc URL conflicts with the model API.

Use processors for pre/post processing

Furiosa serving provides several processors, which are predefined pre/post-processing functions that convert your data for each model.

from typing import Dict

import numpy as np
from fastapi import File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI
from furiosa.serving.processors import imagenet


serve = ServeAPI()

model: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

@model.post("/models/imagenet/infer")
async def infer(image: UploadFile = File(...)) -> Dict:
    shape = model.inputs[0].shape
    input = await imagenet.preprocess(shape, image)
    output = await model.predict(input)
    return await imagenet.postprocess(
        output[0], label='./examples/assets/labels/ImageNetLabels.txt'
    )

Compose models

You can compose multiple models using FastAPI dependency injection.

:warning: The example below does not work as-is, as there is no segmentnet module in processors yet.

from typing import Dict, List

import numpy as np
from fastapi import Depends, File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI


serve = ServeAPI()

imagenet: ServeModel = synchronous(serve.model("furiosart"))(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

segmentnet: ServeModel = synchronous(serve.model("furiosart"))(
    'segmentnet',
    location='./examples/assets/models/image_segmentation.onnx'
)

# Note that there is no "imagenet.post()" here, so this function is not exposed as an endpoint
async def classify(image: UploadFile = File(...)) -> List[np.ndarray]:
    from furiosa.serving.processors.imagenet import preprocess

    tensors: List[np.ndarray] = await preprocess(
        imagenet.inputs[0].shape, image
    )
    return await imagenet.predict(tensors)

@segmentnet.post("/models/composed/infer")
async def segment(tensors: List[np.ndarray] = Depends(classify)) -> Dict:
    from furiosa.serving.processors.segmentnet import postprocess

    tensors = await segmentnet.predict(tensors)
    return await postprocess(tensors)

Example 1

You can find a complete example at examples/image_classify.py

cd examples

examples$ python image_classify.py
INFO:furiosa_sdk_runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/debug/libnux.so (0.4.0-dev d1720b938)
INFO:     Started server process [984608]
INFO:uvicorn.error:Started server process [984608]
INFO:     Waiting for application startup.
INFO:uvicorn.error:Waiting for application startup.
[1/6] 🔍   Compiling from tflite to dfg
Done in 0.27935523s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1079.9143s
▪▪▪▪▪ [2/3] Lowering...Done in 93.315895s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 45.07178s
Done in 1218.3285s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.002127793s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.096237786s
[5/6] 🔍   Compiling from gir to lir
Done in 0.03271749s
[6/6] 🔍   Compiling from lir to enf
Done in 0.48739022s
✨  Finished in 1219.4524s
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

You can find the available APIs at http://localhost:8000/docs#/

Send an image to the server you just launched to classify it.

examples$ curl -X 'POST' \
  'http://127.0.0.1:8000/imagenet/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'image=@assets/images/car.jpg'

Example 2

In many user scenarios, each request may require splitting a large image into a number of small images and processing all of them at once. In such cases, using multiple devices can boost throughput and ultimately lower latency. The example examples/number_classify.py shows how to implement this use case with a session pool and Python async/await/gather.
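
The core of that pattern is plain asyncio fan-out: submit one predict() call per sub-image and gather the results. A minimal sketch using the ServeModel API shown earlier (tiling and the session pool details in number_classify.py are omitted):

import asyncio
from typing import List

import numpy as np
from furiosa.serving import ServeModel


async def infer_all(model: ServeModel, tiles: List[np.ndarray]) -> List[List[np.ndarray]]:
    # Run one inference per sub-image concurrently and collect the results
    return await asyncio.gather(*(model.predict([tile]) for tile in tiles))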

cd examples

examples$ python number_classify.py
INFO:     Started server process [57892]
INFO:     Waiting for application startup.
2022-10-28T05:36:42.468215Z  INFO nux::npu: Npu (npu0pe0-1) is being initialized
2022-10-28T05:36:42.473084Z  INFO nux: NuxInner create with pes: [PeId(0)]
2022-10-28T05:36:42.503103Z  INFO nux::npu: Npu (npu1pe0-1) is being initialized
2022-10-28T05:36:42.507724Z  INFO nux: NuxInner create with pes: [PeId(0)]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)


You can find the available APIs at http://localhost:8000/docs#/

Send an image to the server you just launched to classify it.

examples$ curl -X 'POST' \
  'http://127.0.0.1:8000/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@assets/images/1234567890.jpg'

Code

The code and issue tracker are hosted on GitHub:
https://github.com/furiosa-ai/furiosa-sdk

Contributing

We welcome many types of contributions: bug reports and pull requests (code, infrastructure, or documentation fixes). For more information about how to contribute to the project, see the CONTRIBUTING.md file in the repository.