evaluate 0.4.3
HuggingFace community-driven open-source library of evaluation
pip install evaluate
Requires Python: >=3.8.0
Dependencies
- datasets >=2.0.0
- numpy >=1.17
- dill
- pandas
- requests >=2.19.0
- tqdm >=4.62.1
- xxhash
- multiprocess
- fsspec[http] >=2021.05.0
- huggingface-hub >=0.7.0
- packaging
- importlib-metadata ; python_version < "3.8"
- absl-py ; extra == "dev"
- charcut >=1.1.1 ; extra == "dev"
- cer >=1.2.0 ; extra == "dev"
- nltk <3.9 ; extra == "dev"
- pytest ; extra == "dev"
- pytest-datadir ; extra == "dev"
- pytest-xdist ; extra == "dev"
- tensorflow !=2.6.0,!=2.6.1,<=2.10,>=2.3 ; extra == "dev"
- torch ; extra == "dev"
- accelerate ; extra == "dev"
- bert-score >=0.3.6 ; extra == "dev"
- rouge-score >=0.1.2 ; extra == "dev"
- sacrebleu ; extra == "dev"
- sacremoses ; extra == "dev"
- scipy >=1.10.0 ; extra == "dev"
- seqeval ; extra == "dev"
- scikit-learn ; extra == "dev"
- jiwer ; extra == "dev"
- sentencepiece ; extra == "dev"
- transformers ; extra == "dev"
- mauve-text ; extra == "dev"
- trectools ; extra == "dev"
- toml >=0.10.1 ; extra == "dev"
- requests-file >=1.5.1 ; extra == "dev"
- tldextract >=3.1.0 ; extra == "dev"
- texttable >=1.6.3 ; extra == "dev"
- unidecode >=1.3.4 ; extra == "dev"
- Werkzeug >=1.0.1 ; extra == "dev"
- six ~=1.15.0 ; extra == "dev"
- black ~=22.0 ; extra == "dev"
- flake8 >=3.8.3 ; extra == "dev"
- isort >=5.0.0 ; extra == "dev"
- pyyaml >=5.3.1 ; extra == "dev"
- s3fs ; extra == "docs"
- transformers ; extra == "evaluator"
- scipy >=1.7.1 ; extra == "evaluator"
- black ~=22.0 ; extra == "quality"
- flake8 >=3.8.3 ; extra == "quality"
- isort >=5.0.0 ; extra == "quality"
- pyyaml >=5.3.1 ; extra == "quality"
- cookiecutter ; extra == "template"
- gradio >=3.0.0 ; extra == "template"
- tensorflow !=2.6.0,!=2.6.1,>=2.2.0 ; extra == "tensorflow"
- tensorflow-gpu !=2.6.0,!=2.6.1,>=2.2.0 ; extra == "tensorflow-gpu"
- absl-py ; extra == "tests"
- charcut >=1.1.1 ; extra == "tests"
- cer >=1.2.0 ; extra == "tests"
- nltk <3.9 ; extra == "tests"
- pytest ; extra == "tests"
- pytest-datadir ; extra == "tests"
- pytest-xdist ; extra == "tests"
- tensorflow !=2.6.0,!=2.6.1,<=2.10,>=2.3 ; extra == "tests"
- torch ; extra == "tests"
- accelerate ; extra == "tests"
- bert-score >=0.3.6 ; extra == "tests"
- rouge-score >=0.1.2 ; extra == "tests"
- sacrebleu ; extra == "tests"
- sacremoses ; extra == "tests"
- scipy >=1.10.0 ; extra == "tests"
- seqeval ; extra == "tests"
- scikit-learn ; extra == "tests"
- jiwer ; extra == "tests"
- sentencepiece ; extra == "tests"
- transformers ; extra == "tests"
- mauve-text ; extra == "tests"
- trectools ; extra == "tests"
- toml >=0.10.1 ; extra == "tests"
- requests-file >=1.5.1 ; extra == "tests"
- tldextract >=3.1.0 ; extra == "tests"
- texttable >=1.6.3 ; extra == "tests"
- unidecode >=1.3.4 ; extra == "tests"
- Werkzeug >=1.0.1 ; extra == "tests"
- six ~=1.15.0 ; extra == "tests"
- torch ; extra == "torch"
🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.
It currently contains:
- implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like `accuracy = load("accuracy")`, any of these metrics is ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX); see the quick example after this list.
- comparisons and measurements: comparisons are used to measure the difference between models, and measurements are tools to evaluate datasets.
- an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space on the 🤗 Hub with `evaluate-cli create [metric name]`, which lets you easily compare different metrics and their outputs for the same sets of references and predictions.
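For example, a loaded metric can be used directly on plain Python lists or framework tensors. A minimal sketch using the `accuracy` metric:

```python
from evaluate import load

# load the accuracy metric from the Hub (downloaded on first use)
accuracy = load("accuracy")

# predictions/references can be Python lists, NumPy arrays, or framework tensors
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```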
Find a metric, comparison, or measurement on the Hub
Add a new evaluation module
🤗 Evaluate also has lots of useful features like:
- Type checking: the input types are checked to make sure that you are using the right input formats for each metric
- Metric cards: each metric comes with a card that describes its values, limitations and ranges, as well as examples of its usage and usefulness (see the snippet after this list)
- Community metrics: metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others
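For instance, a loaded module carries its card information and expected input types with it. A small illustration (the attributes shown are those exposed by loaded modules such as `accuracy`):

```python
import evaluate

accuracy = evaluate.load("accuracy")

# the metric card's contents are attached to the loaded module
print(accuracy.description)  # what the metric measures and how it is computed
print(accuracy.features)     # the expected input columns and their types

# compute() validates its inputs against these features,
# so passing the wrong format fails early with a clear error
```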
Installation
With pip
🤗 Evaluate can be installed from PyPI and has to be installed in a virtual environment (venv or conda, for instance):
pip install evaluate
Usage
🤗 Evaluate's main methods are:
- `evaluate.list_evaluation_modules()` to list the available metrics, comparisons and measurements
- `evaluate.load(module_name, **kwargs)` to instantiate an evaluation module
- `results = module.compute(**kwargs)` to compute the result of an evaluation module
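Putting the three together, a short sketch (the `exact_match` module used here is one of the metrics published on the Hub; the exact list returned by `list_evaluation_modules()` depends on what is currently available):

```python
import evaluate

# 1. list the available evaluation modules (metrics, comparisons, measurements)
modules = evaluate.list_evaluation_modules()
print(f"{len(modules)} modules available")

# 2. instantiate one of them by name
exact_match = evaluate.load("exact_match")

# 3. compute a result from predictions and references
results = exact_match.compute(
    predictions=["hello world", "good morning"],
    references=["hello world", "good evening"],
)
print(results)  # exact match score of 0.5
```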
Adding a new evaluation module
First install the necessary dependencies to create a new metric with the following command:
pip install evaluate[template]
Then you can get started with the following command which will create a new folder for your metric and display the necessary steps:
evaluate-cli create "Awesome Metric"
See this step-by-step guide in the documentation for detailed instructions.
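The generated folder contains a module script to fill in. Roughly, a custom metric subclasses `evaluate.Metric` and implements `_info()` and `_compute()`. The sketch below is a simplified, hypothetical example (the `AwesomeMetric` name and its exact-match scoring are placeholders, not the real template contents):

```python
import datasets
import evaluate


class AwesomeMetric(evaluate.Metric):
    def _info(self):
        # describes the metric and declares the expected input types
        return evaluate.MetricInfo(
            description="Fraction of predictions that exactly match the reference.",
            citation="",
            inputs_description="predictions and references are lists of integers",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("int64"),
                    "references": datasets.Value("int64"),
                }
            ),
        )

    def _compute(self, predictions, references):
        # called by compute() after the inputs are validated against `features`
        score = sum(p == r for p, r in zip(predictions, references)) / len(predictions)
        return {"awesome_metric": score}
```

While developing, `evaluate.load()` can also be pointed at the local module script to test it before pushing it to its Space on the Hub.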
Credits
Thanks to @marella for letting us use the `evaluate` namespace on PyPI, previously used by his library.