instructor 1.13.0
Structured outputs for LLMs
pip install instructor
Requires Python: >=3.9,<4.0
Dependencies

Core:
- aiohttp >=3.9.1,<4.0.0
- diskcache >=5.6.3
- docstring-parser >=0.16,<1.0
- jinja2 >=3.1.4,<4.0.0
- jiter >=0.6.1,<0.12
- openai >=2.0.0,<3.0.0
- pre-commit >=4.3.0
- pydantic-core >=2.18.0,<3.0.0
- pydantic >=2.8.0,<3.0.0
- requests >=2.32.3,<3.0.0
- rich >=13.7.0,<15.0.0
- tenacity >=8.2.3,<10.0.0
- ty >=0.0.1a23
- typer >=0.9.0,<1.0.0

Optional (extras):
- anthropic ==0.71.0; extra == "anthropic"
- xmltodict >=0.13,<1.1; extra == "anthropic"
- boto3 >=1.34.0,<2.0.0; extra == "bedrock"
- cerebras-cloud-sdk >=1.5.0,<2.0.0; extra == "cerebras-cloud-sdk"
- cohere >=5.1.8,<6.0.0; extra == "cohere"
- datasets >=3.0.1,<5.0.0; extra == "datasets"
- coverage >=7.3.2,<8.0.0; extra == "dev"
- jsonref >=1.1.0,<2.0.0; extra == "dev"
- pre-commit >=4.2.0; extra == "dev"
- pytest-asyncio >=0.24.0,<2.0.0; extra == "dev"
- pytest-examples >=0.0.15; extra == "dev"
- pytest-xdist >=3.8.0; extra == "dev"
- pytest >=8.3.3,<9.0.0; extra == "dev"
- python-dotenv >=1.0.1; extra == "dev"
- mkdocs-jupyter >=0.24.6,<0.26.0; extra == "docs"
- mkdocs-material-extensions >=1.3.1; extra == "docs"
- mkdocs-material >=9.6.14; extra == "docs"
- mkdocs-material[imaging] >=9.5.9,<10.0.0; extra == "docs"
- mkdocs-minify-plugin >=0.8.0,<1.0.0; extra == "docs"
- mkdocs-redirects >=1.2.1,<2.0.0; extra == "docs"
- mkdocs-rss-plugin >=1.12.0,<2.0.0; extra == "docs"
- mkdocs >=1.6.1,<2.0.0; extra == "docs"
- mkdocstrings-python >=1.12.2,<2.0.0; extra == "docs"
- mkdocstrings >=0.27.1,<0.31.0; extra == "docs"
- pytest-examples >=0.0.15; extra == "docs"
- fireworks-ai >=0.15.4,<1.0.0; extra == "fireworks-ai"
- google-genai >=1.5.0; extra == "google-genai"
- jsonref >=1.1.0,<2.0.0; extra == "google-genai"
- graphviz >=0.20.3,<1.0.0; extra == "graphviz"
- groq >=0.4.2,<0.34.0; extra == "groq"
- litellm >=1.35.31,<2.0.0; extra == "litellm"
- mistralai >=1.5.1,<2.0.0; extra == "mistral"
- openai >=2.0.0,<3.0.0; extra == "perplexity"
- phonenumbers >=8.13.33,<10.0.0; extra == "phonenumbers"
- pydub >=0.25.1,<1.0.0; extra == "pydub"
- sqlmodel >=0.0.22,<1.0.0; extra == "sqlmodel"
- diskcache >=5.6.3,<6.0.0; extra == "test-docs"
- fastapi >=0.109.2,<0.121.0; extra == "test-docs"
- litellm >=1.35.31,<2.0.0; extra == "test-docs"
- mistralai >=1.5.1,<2.0.0; extra == "test-docs"
- pandas >=2.2.0,<3.0.0; extra == "test-docs"
- pydantic-extra-types >=2.6.0,<3.0.0; extra == "test-docs"
- redis >=5.0.1,<8.0.0; extra == "test-docs"
- tabulate >=0.9.0,<1.0.0; extra == "test-docs"
- trafilatura >=1.12.2,<3.0.0; extra == "trafilatura"
- google-cloud-aiplatform >=1.53.0,<2.0.0; extra == "vertexai"
- jsonref >=1.1.0,<2.0.0; extra == "vertexai"
- writer-sdk >=2.2.0,<3.0.0; extra == "writer"
- python-dotenv >=1.0.0; extra == "xai"
- xai-sdk >=0.2.0; python_version >= "3.10" and extra == "xai"
Instructor: Structured Outputs for LLMs
Get reliable JSON from any LLM. Built on Pydantic for validation, type safety, and IDE support.
```python
import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
    name: str
    age: int

# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(user)  # User(name='John', age=25)
```
That's it. No JSON parsing, no error handling, no retries. Just define a model and get structured data.
Why Instructor?
Getting structured data from LLMs is hard. You need to:
- Write complex JSON schemas
- Handle validation errors
- Retry failed extractions
- Parse unstructured responses
- Deal with different provider APIs
Instructor handles all of this with one simple interface:
| Without Instructor | With Instructor |
|---|---|
| Write JSON schemas by hand | Define a Pydantic model |
| Parse and validate responses manually | Validation is built in |
| Write your own retry logic | Automatic retries on failure |
| Different code for each provider API | One interface for every provider |
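For contrast, here is a rough sketch of the manual call-parse-validate-retry loop described above. The `call_llm` function is a hypothetical stub standing in for a real provider call, and the hand-rolled type checks approximate what a Pydantic model would generate for you:

```python
import json

def call_llm(prompt, error=None):
    # Hypothetical stub for a real provider call; a real loop would
    # resend `error` to the model to request a corrected response.
    return '{"name": "John", "age": 25}'

def extract_user(prompt, max_retries=3):
    """The loop Instructor replaces: call, parse, validate, retry."""
    error = None
    for _ in range(max_retries):
        raw = call_llm(prompt, error)
        try:
            data = json.loads(raw)
            # Hand-rolled checks in place of a Pydantic schema:
            if not isinstance(data.get("name"), str):
                raise ValueError("name must be a string")
            if not isinstance(data.get("age"), int) or data["age"] < 0:
                raise ValueError("age must be a non-negative integer")
            return data
        except (json.JSONDecodeError, ValueError) as exc:
            error = str(exc)  # feed the error back on the next attempt
    raise RuntimeError(f"extraction failed after {max_retries} attempts: {error}")

user = extract_user("John is 25 years old")
```

Every piece of this boilerplate (and its edge cases) disappears behind a single `response_model=` argument.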
Install in seconds
```bash
pip install instructor
```

Or with your package manager:

```bash
uv add instructor
poetry add instructor
```
Works with every major provider
Use the same code with any LLM provider:
```python
# OpenAI
client = instructor.from_provider("openai/gpt-4o")

# Anthropic
client = instructor.from_provider("anthropic/claude-3-5-sonnet")

# Google
client = instructor.from_provider("google/gemini-pro")

# Ollama (local)
client = instructor.from_provider("ollama/llama3.2")

# With API keys directly (no environment variables needed)
client = instructor.from_provider("openai/gpt-4o", api_key="sk-...")
client = instructor.from_provider("anthropic/claude-3-5-sonnet", api_key="sk-ant-...")
client = instructor.from_provider("groq/llama-3.1-8b-instant", api_key="gsk_...")

# All use the same API!
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
```
Production-ready features
Automatic retries
Failed validations are automatically retried with the error message:
```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        if v < 0:
            raise ValueError('Age must be positive')
        return v

# Instructor automatically retries when validation fails
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=3,
)
```
Streaming support
Stream partial objects as they're generated:
```python
from instructor import Partial

for partial_user in client.chat.completions.create(
    response_model=Partial[User],
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(partial_user)
    # User(name=None, age=None)
    # User(name="John", age=None)
    # User(name="John", age=25)
```
Nested objects
Extract complex, nested data structures:
```python
from typing import List

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    age: int
    addresses: List[Address]

# Instructor handles nested objects automatically
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
```
Used in production by
Trusted by over 100,000 developers and companies building AI applications:
- 3M+ monthly downloads
- 10K+ GitHub stars
- 1000+ community contributors
Companies using Instructor include teams at OpenAI, Google, Microsoft, AWS, and many YC startups.
Get started
Basic extraction
Extract structured data from any text:
```python
from pydantic import BaseModel
import instructor

client = instructor.from_provider("openai/gpt-4o-mini")

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

product = client.chat.completions.create(
    response_model=Product,
    messages=[{"role": "user", "content": "iPhone 15 Pro, $999, available now"}],
)

print(product)
# Product(name='iPhone 15 Pro', price=999.0, in_stock=True)
```
Multiple languages
Instructor's simple API is available in many languages:
- Python - The original
- TypeScript - Full TypeScript support
- Ruby - Ruby implementation
- Go - Go implementation
- Elixir - Elixir implementation
- Rust - Rust implementation
Learn more
- Documentation - Comprehensive guides
- Examples - Copy-paste recipes
- Blog - Tutorials and best practices
- Discord - Get help from the community
Why use Instructor over alternatives?
vs Raw JSON mode: Instructor provides automatic validation, retries, streaming, and nested object support. No manual schema writing.
vs LangChain/LlamaIndex: Instructor is focused on one thing: structured extraction. It's lighter, faster, and easier to debug.
vs Custom solutions: Battle-tested by thousands of developers. Handles edge cases you haven't thought of yet.
Contributing
We welcome contributions! Check out our good first issues to get started.
License
MIT License - see LICENSE for details.
Built by the Instructor community. Special thanks to Jason Liu and all contributors.