nlpo3 1.3.1

Python binding for nlpO3 Thai language processing library in Rust

pip install nlpo3

Requires Python

>=3.7

    SPDX-FileCopyrightText: 2024 PyThaiNLP Project
    SPDX-License-Identifier: Apache-2.0

    nlpO3 Python binding

    Python binding for nlpO3, a Thai natural language processing library in Rust.

    To install:

    pip install nlpo3
    

    Features

    • Thai word tokenizer
      • segment() - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
        • 2.5x faster than a similar pure Python implementation (PyThaiNLP's newmm)
      • load_dict() - loads a dictionary from a plain text file (one word per line)
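
To make the idea of dictionary-based maximal matching concrete, here is a minimal pure Python sketch. It is not nlpO3's implementation: it ignores Thai Character Cluster boundaries, and the function name `maximal_matching` is hypothetical. It assumes "maximal matching" means choosing the segmentation with the fewest tokens, and it turns any character not covered by a dictionary word into a one-character token.

```python
from typing import List, Set


def maximal_matching(text: str, words: Set[str]) -> List[str]:
    """Split text into the fewest tokens, preferring dictionary words.

    Hypothetical helper for illustration; not part of nlpo3.
    """
    n = len(text)
    max_len = max((len(w) for w in words), default=1)
    # best[i] = (token count, start index of last token) for text[:i]
    best = [(0, 0)] + [(n + 1, 0)] * n
    for end in range(1, n + 1):
        # Fallback: the single character before `end` is its own token.
        best[end] = (best[end - 1][0] + 1, end - 1)
        # Try every dictionary word of length >= 2 that ends at `end`.
        for start in range(max(0, end - max_len), end - 1):
            if text[start:end] in words and best[start][0] + 1 < best[end][0]:
                best[end] = (best[start][0] + 1, start)
    # Follow the back-pointers from the end to recover the tokens.
    tokens = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        tokens.append(text[start:pos])
        pos = start
    return tokens[::-1]


print(maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"}))
```

A purely greedy longest-match scan would split "abcde" with the words "ab", "abc", "cde" into three tokens; the sketch above finds the two-token split instead.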

    Use

    Load the file path/to/dict.file into memory and assign it the name dict_name.

    Then tokenize a text with the dict_name dictionary:

    from nlpo3 import load_dict, segment
    
    load_dict("path/to/dict.file", "dict_name")
    segment("สวัสดีครับ", "dict_name")
    

    It returns a list of strings:

    ['สวัสดี', 'ครับ']
    

    (The result depends on the words included in the dictionary.)
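
Since nlpO3 expects a plain text dictionary with one word per line, a small end-to-end sketch may help. The helper `write_dict_file` and the file name `words.txt` are hypothetical, and writing the file as UTF-8 is an assumption. Only `load_dict()` and `segment()` come from nlpo3, and they are called only if the package is installed.

```python
import os
import tempfile
from typing import Iterable


def write_dict_file(words: Iterable[str], path: str) -> int:
    """Write unique, non-empty words to `path`, one per line (UTF-8 assumed).

    Hypothetical helper for illustration; returns the number of words written.
    """
    unique = sorted({w.strip() for w in words if w.strip()})
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(unique) + "\n")
    return len(unique)


dict_path = os.path.join(tempfile.mkdtemp(), "words.txt")
count = write_dict_file(["สวัสดี", "ครับ", "ครับ", " "], dict_path)

try:
    from nlpo3 import load_dict, segment
except ImportError:
    # nlpo3 is not installed; the dictionary file has still been written.
    pass
else:
    load_dict(dict_path, "dict_name")
    print(segment("สวัสดีครับ", "dict_name"))
```

The name passed to `load_dict()` must be the same one later passed to `segment()`.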

    To use multithreaded mode with the same dict_name dictionary:

    segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
    

    Use safe mode to avoid long wait times in edge cases where the text has many ambiguous word boundaries:

    segment("สวัสดีครับ", dict_name="dict_name", safe=True)
    

    Dictionary

    • To keep the library small, nlpO3 makes no assumption about which dictionary the user wants to use and does not ship with one.
    • A dictionary is needed for the dictionary-based word tokenizer.
    • For tokenization dictionary, try

    Build

    Requirements

    • Rust 2018 Edition
    • Python 3.7 or newer (PyO3's minimum supported version)
    • Python Development Headers
      • Ubuntu: sudo apt-get install python3-dev
      • macOS: No action needed
    • PyO3 - already included in Cargo.toml
    • setuptools-rust

    Steps

    python -m pip install --upgrade build
    python -m build
    

    This should generate a wheel file in the dist/ directory, which can be installed with pip.

    To install a wheel from a local directory:

    pip install dist/nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl 
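
The wheel filename encodes which interpreter and platform the wheel targets. As an illustration, the sketch below splits a filename into its components following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The function `parse_wheel_filename` is hypothetical and does no validation beyond counting the components.

```python
from typing import Dict


def parse_wheel_filename(filename: str) -> Dict[str, str]:
    """Split a wheel filename into its tags (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl")
print(info["python"], info["platform"])
```

Here `cp311` means CPython 3.11, so this particular wheel installs only on that interpreter on a compatible macOS x86_64 system.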
    

    Test

    To run the Python unit tests:

    cd tests
    python -m unittest
    

    Issues

    Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

    License

    nlpO3 Python binding is copyrighted by its authors and licensed under the terms of the Apache Software License 2.0 (Apache-2.0). See the LICENSE file for details.

    Binary wheels

    Pre-built binary packages are available from PyPI for these platforms:

    Python      OS          Architecture  Has binary wheel?
    3.13        Windows     x86
    3.13        Windows     AMD64
    3.13        macOS       x86_64
    3.13        macOS       arm64
    3.13        manylinux   x86_64
    3.13        manylinux   i686
    3.13        musllinux   x86_64
    3.12        Windows     x86
    3.12        Windows     AMD64
    3.12        macOS       x86_64
    3.12        macOS       arm64
    3.12        manylinux   x86_64
    3.12        manylinux   i686
    3.12        musllinux   x86_64
    3.11        Windows     x86
    3.11        Windows     AMD64
    3.11        macOS       x86_64
    3.11        macOS       arm64
    3.11        manylinux   x86_64
    3.11        manylinux   i686
    3.11        musllinux   x86_64
    3.10        Windows     x86
    3.10        Windows     AMD64
    3.10        macOS       x86_64
    3.10        macOS       arm64
    3.10        manylinux   x86_64
    3.10        manylinux   i686
    3.10        musllinux   x86_64
    3.9         Windows     x86
    3.9         Windows     AMD64
    3.9         macOS       x86_64
    3.9         macOS       arm64
    3.9         manylinux   x86_64
    3.9         manylinux   i686
    3.9         musllinux   x86_64
    3.8         Windows     x86
    3.8         Windows     AMD64
    3.8         macOS       x86_64
    3.8         macOS       arm64
    3.8         manylinux   x86_64
    3.8         manylinux   i686
    3.8         musllinux   x86_64
    3.7         Windows     x86
    3.7         Windows     AMD64
    3.7         macOS       x86_64
    3.7         macOS       arm64
    3.7         manylinux   x86_64
    3.7         manylinux   i686
    3.7         musllinux   x86_64
    PyPy 3.10   Windows     x86
    PyPy 3.10   Windows     AMD64
    PyPy 3.10   macOS       x86_64
    PyPy 3.10   macOS       arm64
    PyPy 3.10   manylinux   x86_64
    PyPy 3.10   manylinux   i686
    PyPy 3.9    Windows     x86
    PyPy 3.9    Windows     AMD64
    PyPy 3.9    macOS       x86_64
    PyPy 3.9    macOS       arm64
    PyPy 3.9    manylinux   x86_64
    PyPy 3.9    manylinux   i686
    PyPy 3.8    Windows     x86
    PyPy 3.8    Windows     AMD64
    PyPy 3.8    macOS       x86_64
    PyPy 3.8    macOS       arm64
    PyPy 3.8    manylinux   x86_64
    PyPy 3.8    manylinux   i686
    PyPy 3.7    Windows     x86
    PyPy 3.7    Windows     AMD64
    PyPy 3.7    macOS       x86_64
    PyPy 3.7    macOS       arm64
    PyPy 3.7    manylinux   x86_64
    PyPy 3.7    manylinux   i686