outlines-core 0.2.3
Structured Text Generation in Rust
pip install outlines-core
Requires Python: >=3.8
Dependencies (extra == "test"):
- jsonschema
- pre-commit
- pydantic
- pytest
- pytest-benchmark
- pytest-cov
- pytest-mock
- coverage[toml]>=5.1
- diff-cover
- numpy
- scipy
- asv
- psutil
- setuptools-rust
Outlines-core
This package provides the core functionality for structured generation, formerly implemented in Outlines. With a focus on performance and portability, it offers a convenient way to:
- build regular expressions from JSON schemas
- construct an Index object by combining a Vocabulary and a regular expression, to efficiently map tokens from a given vocabulary to state transitions in a finite-state automaton
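To build intuition for what such an index represents, here is a toy sketch in Python. This is NOT the outlines-core API; the vocabulary, states, and transitions below are invented for illustration. Conceptually, an index is a map from (state, token id) to the next automaton state, so at each step only tokens with a defined transition are allowed.

```python
# Toy sketch (NOT the outlines-core API): the "index" is a transition table
# over a hypothetical 3-token vocabulary: 0 -> '{', 1 -> '}', 2 -> '"a"'.
transitions = {
    (0, 0): 1,  # from the start state, only '{' is allowed
    (1, 2): 2,  # after '{', only the key '"a"' is allowed
    (2, 1): 3,  # after the key, '}' closes the object
}
final_states = {3}

def allowed_tokens(state):
    """Token ids that have a transition defined from `state`."""
    return [tok for (s, tok) in transitions if s == state]

state = 0
generated = []
while state not in final_states:
    tok = allowed_tokens(state)[0]  # a real sampler would choose among these
    generated.append(tok)
    state = transitions[(state, tok)]

print(generated)  # [0, 2, 1], i.e. '{', '"a"', '}'
```

Any sequence produced this way is guaranteed to match the automaton, which is the core idea behind constraining generation with an index.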
Example
Basic example of how it all fits together.
use outlines_core::prelude::*;

// Define a JSON schema
let schema = r#"{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
    },
    "required": ["name", "age"]
}"#;

// Generate a regular expression from it
let regex = json_schema::regex_from_str(&schema, None)?;

// Create a `Vocabulary` from a pretrained large language model (manual construction is also possible)
let vocabulary = Vocabulary::from_pretrained("openai-community/gpt2", None)?;

// Create a new `Index` from the regex and the given `Vocabulary`
let index = Index::new(&regex, &vocabulary)?;

let initial_state = index.initial_state();
let allowed_tokens = index.allowed_tokens(&initial_state).expect("Some allowed token ids");
let token_id = allowed_tokens.first().expect("First token id");
let next_state = index.next_state(&initial_state, token_id);
let final_states = index.final_states();
Python Bindings
Additionally, the project provides Python bindings to integrate the crate's functionality with Python.
import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core.guide import Guide, Index, Vocabulary
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {"date": {"type": "string", "format": "date"}},
}
regex = build_regex_from_schema(json.dumps(schema))
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")
index = Index(regex, vocabulary)
guide = Guide(index)
# Get current state of the Guide:
current_state = guide.get_state()
# Get allowed tokens for the current state of the Guide:
allowed_tokens = guide.get_tokens()
# Advance Guide to the next state via some token_id and return allowed tokens for that new state:
next_allowed_tokens = guide.advance(allowed_tokens[-1])
# To check if Guide is finished:
guide.is_finished()
# If it's finished then this assertion holds:
assert guide.get_tokens() == [vocabulary.get_eos_token_id()]
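During decoding, the allowed token ids returned by the Guide are typically used to mask the model's logits before sampling. A minimal numpy sketch of that masking step, assuming `allowed` is a list of token ids such as the one returned by `guide.get_tokens()` (the logits and vocabulary size here are invented for illustration):

```python
import numpy as np

def mask_logits(logits, allowed_tokens):
    """Set the logits of disallowed tokens to -inf so they can never be sampled."""
    masked = np.full_like(logits, -np.inf)
    masked[allowed_tokens] = logits[allowed_tokens]
    return masked

# Hypothetical logits over a 5-token vocabulary; only tokens 1 and 3 are allowed.
logits = np.array([0.5, 2.0, -1.0, 1.5, 0.0])
allowed = [1, 3]
masked = mask_logits(logits, allowed)

# Greedy decoding over the masked logits is now guaranteed to pick an allowed token.
print(int(np.argmax(masked)))  # 1
```

Applying the mask before softmax (rather than filtering after sampling) ensures disallowed tokens receive exactly zero probability.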
How to contribute?
Setup
Fork the repository on GitHub and clone the fork locally:
git clone git@github.com:YourUserName/outlines-core.git
cd outlines-core
Create a new virtual environment and install the dependencies in editable mode:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[test]"
pre-commit install
Before pushing your code
If you are working with the Python bindings, don't forget to build the Rust extension before testing, for example in debug mode:
make build-extension-debug
Run Python tests:
pytest
Run Rust tests:
cargo test
Or, alternatively, use the Makefile to run both:
make test
Finally, run the code style checks:
pre-commit run --all-files
Or using the Makefile:
make pcc
If necessary, you can run benchmarks locally:
make pybench