arize-phoenix-evals 2.1.0
LLM Evaluations
pip install arize-phoenix-evals
Requires Python: >=3.8, <3.14
Dependencies
- jsonpath-ng
- openinference-instrumentation >=0.1.20
- openinference-semantic-conventions >=0.1.19
- opentelemetry-api
- pandas
- pydantic >=2.0.0
- pystache
- tqdm
- typing-extensions >=4.5,<5
Optional dependencies ("dev" extra)
- anthropic >0.18.0
- boto3
- litellm >=1.28.9
- mistralai >=1.0.0
- openai >=1.0.0
- vertexai
Optional dependencies ("test" extra)
- anthropic >=0.18.0
- boto3
- lameenc
- litellm >=1.28.9
- mistralai >=1.0.0
- nest-asyncio
- openai >=1.0.0
- openinference-semantic-conventions
- pandas
- pandas-stubs <=2.0.2.230605
- respx
- tqdm
- types-tqdm
- typing-extensions >=4.5,<5
- vertexai
arize-phoenix-evals
Phoenix Evals provides lightweight, composable building blocks for writing and running evaluations on LLM applications, including tools for relevance, toxicity, and hallucination detection, and much more.
Features
- Works with your preferred model SDKs via adapters (OpenAI, LiteLLM, LangChain); see the provider-swap sketch after this list
- Powerful input mapping and binding for working with complex data structures
- Several pre-built metrics for common evaluation tasks like hallucination detection
- Evaluators are natively instrumented via OpenTelemetry tracing for observability and dataset curation
- Blazing fast performance - achieve up to 20x speedup with built-in concurrency and batching
- Tons of convenience features to improve the developer experience!
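As a quick illustration of the adapter feature above, the same classifier can be bound to a different backend just by changing the arguments to the LLM constructor. This is a minimal sketch, not an exhaustive reference: the provider string "anthropic" and the model name below are assumptions for illustration (and the corresponding SDK must be installed, e.g. pip install anthropic); check the LLM adapter documentation for the identifiers your installation supports.
from phoenix.evals import create_classifier
from phoenix.evals.llm import LLM

# Assumption: "anthropic" is an accepted provider string for the LLM adapter,
# and the model identifier below is illustrative only.
llm = LLM(provider="anthropic", model="claude-3-5-sonnet-latest")

# The evaluator definition itself is unchanged; only the LLM binding differs.
evaluator = create_classifier(
    name="helpfulness",
    prompt_template="Rate the response as helpful or not:\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"helpful": 1.0, "not_helpful": 0.0},
)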
Installation
Install Phoenix Evals 2.0 using pip:
pip install 'arize-phoenix-evals>=2.0.0' openai
Quick Start
from phoenix.evals import create_classifier
from phoenix.evals.llm import LLM

# Create an LLM instance
llm = LLM(provider="openai", model="gpt-4o")

# Create an evaluator
evaluator = create_classifier(
    name="helpfulness",
    prompt_template="Rate the response to the user query as helpful or not:\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"helpful": 1.0, "not_helpful": 0.0},
)

# Simple evaluation
scores = evaluator.evaluate({"input": "How do I reset?", "output": "Go to settings > reset."})
scores[0].pretty_print()

# With input mapping for nested data
scores = evaluator.evaluate(
    {"data": {"query": "How do I reset?", "response": "Go to settings > reset."}},
    input_mapping={"input": "data.query", "output": "data.response"},
)
scores[0].pretty_print()
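The same evaluator can be applied to several records with a plain Python loop; this sketch reuses only the evaluate call shown above (the example records are made up). For larger batches, the dataframe helper in the next section is the batched alternative.
# Evaluate a handful of records one at a time.
records = [
    {"input": "How do I reset?", "output": "Go to settings > reset."},
    {"input": "Where can I find my invoices?", "output": "Open the billing page and select Invoices."},
]
for record in records:
    for score in evaluator.evaluate(record):
        score.pretty_print()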
Evaluating Dataframes
import pandas as pd

from phoenix.evals import create_classifier, evaluate_dataframe
from phoenix.evals.llm import LLM

# Create an LLM instance
llm = LLM(provider="openai", model="gpt-4o")

# Create multiple evaluators
relevance_evaluator = create_classifier(
    name="relevance",
    prompt_template="Is the response relevant to the query?\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"relevant": 1.0, "irrelevant": 0.0},
)

helpfulness_evaluator = create_classifier(
    name="helpfulness",
    prompt_template="Is the response helpful?\n\nQuery: {input}\nResponse: {output}",
    llm=llm,
    choices={"helpful": 1.0, "not_helpful": 0.0},
)

# Prepare your dataframe
df = pd.DataFrame([
    {"input": "How do I reset my password?", "output": "Go to settings > account > reset password."},
    {"input": "What's the weather like?", "output": "I can help you with password resets."},
])

# Evaluate the dataframe
results_df = evaluate_dataframe(
    dataframe=df,
    evaluators=[relevance_evaluator, helpfulness_evaluator],
)

print(results_df.head())
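Because evaluate_dataframe returns a pandas DataFrame, standard pandas tooling can be used to inspect or persist the results. A small sketch (the file name is arbitrary, and the exact names of the columns the evaluators add are not assumed here):
# Inspect which columns the evaluators added to the dataframe.
print(results_df.columns.tolist())

# Persist the scored dataframe for later review or dataset curation.
results_df.to_csv("eval_results.csv", index=False)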
Documentation
- Full Documentation - Complete API reference and guides
- Phoenix Docs - Detailed use-cases and examples
- OpenInference - Auto-instrumentation libraries for frameworks
Community
Join our community to connect with thousands of AI builders:
- Join our Slack community.
- Read the Phoenix documentation.
- Ask questions and provide feedback in the #phoenix-support channel.
- Leave a star on our GitHub.
- Report bugs with GitHub Issues.
- Follow us on X.
- Check out our roadmap to see where we're heading next.