Oven logo

Oven

Published

A spaCy package for the Rust tokenizations library

pip install spacy-alignments

Package Downloads

Weekly DownloadsMonthly Downloads

Project URLs

Requires Python

>=3.7

Dependencies

    spacy-alignments: Align tokenizations for spaCy + transformers

    A spaCy package for Yohei Tamura's Rust tokenizations library with Python bindings.

    Installation

    pip install -U pip setuptools wheel
    pip install spacy-alignments
    

    If no binary wheel is available for your platform, you will need to install Rust in order to build spacy-alignments from source.

    spacy-alignments vs. pytokenizations

    The spacy_alignments module is a drop-in replacement for tokenizations:

    import spacy_alignments as tokenizations
    a2b, b2a = tokenizations.get_alignments(["å", "BC"], ["abc"])
    assert a2b == [[0], [0]]
    assert b2a == [[0, 1]]
    

    The only difference between this package and the original pytokenizations is that it switches the build system to setuptools-rust to make it easier for us at Explosion to build source and binary packages for a wider range of platforms.

    Bug reports and other issues

    Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.