nlpo3 1.4.0

Python binding for nlpO3 Thai language processing library in Rust

pip install nlpo3

Requires Python

>=3.9

Dependencies

No dependencies


SPDX-FileCopyrightText: 2024-2026 PyThaiNLP Project
SPDX-License-Identifier: Apache-2.0

nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library written in Rust.

To install:

pip install nlpo3

Features

  • Thai word tokenizer
    • segment() - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
      • 2.5x faster than a similar pure-Python implementation (PyThaiNLP's newmm)
    • load_dict() - loads a dictionary from a plain text file (one word per line)
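
To illustrate what "maximal matching" means, here is a simplified pure-Python sketch. It is not nlpO3's implementation: it ignores Thai Character Cluster boundaries, emits every unknown character as its own token, and the function name maximal_matching is hypothetical. It uses dynamic programming to pick the segmentation with the fewest tokens.

```python
def maximal_matching(text, words):
    """Split text into the fewest tokens, preferring dictionary words.

    Simplified illustration only; not the algorithm as implemented in nlpO3.
    """
    n = len(text)
    max_len = max(1, max((len(w) for w in words), default=1))
    # An unknown character costs more than any all-dictionary segmentation.
    unknown_cost = n + 1
    best = [0] + [float("inf")] * n  # best[i]: lowest cost to segment text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last token
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in words:
                cost = best[start] + 1
            elif end - start == 1:
                cost = best[start] + unknown_cost
            else:
                continue
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


print(maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"}))
print(maximal_matching("abcd", {"a", "ab", "abc", "cd"}))
```

In the second call, a greedy longest-match-first approach would take "abc" and leave "d" unmatched, while the fewest-tokens approach returns "ab" and "cd".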

Use

Load a dictionary file and assign it a name (for example, dict_name).

Then tokenize text using the named dictionary:

from nlpo3 import load_dict, segment

load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")

The function returns a list of strings, for example:

['สวัสดี', 'ครับ']

The result depends on the words included in the dictionary.
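
For a quick experiment without a full dictionary, the sketch below writes a tiny word list to a temporary file and tokenizes with it. The helper names write_dictionary and tokenize_with_temp_dict, the dictionary name demo_dict, and the UTF-8 encoding are assumptions for illustration; only load_dict() and segment() come from nlpO3, called as shown above.

```python
import tempfile
from pathlib import Path


def write_dictionary(words, path):
    """Write a one-word-per-line dictionary file (UTF-8 assumed)."""
    unique = sorted({w.strip() for w in words if w.strip()})
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


def tokenize_with_temp_dict(text, words, dict_name="demo_dict"):
    """Tokenize text with a throwaway dictionary; None if nlpo3 is missing."""
    try:
        from nlpo3 import load_dict, segment
    except ImportError:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        dict_path = Path(tmp) / "words.txt"
        write_dictionary(words, dict_path)
        load_dict(str(dict_path), dict_name)
        return segment(text, dict_name)


if __name__ == "__main__":
    print(tokenize_with_temp_dict("สวัสดีครับ", ["สวัสดี", "ครับ"]))
```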

To tokenize in multithreaded mode with the dict_name dictionary:

segment("สวัสดีครับ", dict_name="dict_name", parallel=True)

Use safe mode to avoid long run times for inputs with many ambiguous word boundaries:

segment("สวัสดีครับ", dict_name="dict_name", safe=True)

Dictionary

  • To keep the library small, nlpO3 does not include a dictionary. Users must provide a dictionary when using the dictionary-based tokenizer.
  • For tokenization dictionaries, try

Build

Requirements

  • Rust 2018 Edition
  • Python 3.9 or newer (the minimum version declared in the package metadata)
  • Python Development Headers
    • Ubuntu: sudo apt-get install python3-dev
    • macOS: No action needed
  • PyO3 - already included in Cargo.toml
  • setuptools-rust

Steps

python -m pip install --upgrade build
python -m build

This should generate a wheel file in the dist/ directory, which can be installed with pip.

To install a wheel from a local directory (the exact file name depends on the package version, Python version, and platform):

pip install dist/nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl
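
The wheel file name encodes the package version and the Python, ABI, and platform tags, following the binary distribution format in the Python packaging specifications. As a small sketch (the function parse_wheel_filename is hypothetical and not part of nlpO3), the name can be split like this to check that a wheel matches your interpreter:

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl")
print(info["python"], info["platform"])
```

Here cp311 means CPython 3.11, and macosx_12_0_x86_64 means macOS 12.0 or newer on x86_64.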

Test

To run the Python unit tests:

cd tests
python -m unittest

Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

License

nlpO3 Python binding is copyrighted by its authors and licensed under the terms of the Apache Software License 2.0 (Apache-2.0). See the LICENSE file for details.

Binary wheels

Pre-built binary packages for CPython, GraalPy, and PyPy are available on PyPI for the platforms listed below. Versions with a "t" suffix indicate CPython with free threading.

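Python        OS         Architecture  Binary wheel
3.14          Windows    x86
3.14          Windows    AMD64
3.14          macOS      x86_64
3.14          macOS      arm64
3.14          manylinux  x86_64
3.14          manylinux  i686
3.14          musllinux  x86_64
3.14t         Windows    x86
3.14t         Windows    AMD64
3.14t         macOS      x86_64
3.14t         macOS      arm64
3.14t         manylinux  x86_64
3.14t         manylinux  i686
3.14t         musllinux  x86_64
3.13          Windows    x86
3.13          Windows    AMD64
3.13          macOS      x86_64
3.13          macOS      arm64
3.13          manylinux  x86_64
3.13          manylinux  i686
3.13          musllinux  x86_64
3.12          Windows    x86
3.12          Windows    AMD64
3.12          macOS      x86_64
3.12          macOS      arm64
3.12          manylinux  x86_64
3.12          manylinux  i686
3.12          musllinux  x86_64
3.11          Windows    x86
3.11          Windows    AMD64
3.11          macOS      x86_64
3.11          macOS      arm64
3.11          manylinux  x86_64
3.11          manylinux  i686
3.11          musllinux  x86_64
3.10          Windows    x86
3.10          Windows    AMD64
3.10          macOS      x86_64
3.10          macOS      arm64
3.10          manylinux  x86_64
3.10          manylinux  i686
3.10          musllinux  x86_64
3.9           Windows    x86
3.9           Windows    AMD64
3.9           macOS      x86_64
3.9           macOS      arm64
3.9           manylinux  x86_64
3.9           manylinux  i686
3.9           musllinux  x86_64
3.8           Windows    x86           ✓ (v1.3.1)
3.8           Windows    AMD64         ✓ (v1.3.1)
3.8           macOS      x86_64        ✓ (v1.3.1)
3.8           macOS      arm64         ✓ (v1.3.1)
3.8           manylinux  x86_64        ✓ (v1.3.1)
3.8           manylinux  i686          ✓ (v1.3.1)
3.8           musllinux  x86_64        ✓ (v1.3.1)
3.7           Windows    x86           ✓ (v1.3.1)
3.7           Windows    AMD64         ✓ (v1.3.1)
3.7           macOS      x86_64        ✓ (v1.3.1)
3.7           macOS      arm64
3.7           manylinux  x86_64        ✓ (v1.3.1)
3.7           manylinux  i686          ✓ (v1.3.1)
3.7           musllinux  x86_64        ✓ (v1.3.1)
GraalPy 3.12  Windows    x86
GraalPy 3.12  Windows    AMD64
GraalPy 3.12  macOS      x86_64
GraalPy 3.12  macOS      arm64
GraalPy 3.12  manylinux  x86_64
GraalPy 3.12  manylinux  i686
GraalPy 3.11  Windows    x86
GraalPy 3.11  Windows    AMD64
GraalPy 3.11  macOS      x86_64
GraalPy 3.11  macOS      arm64
GraalPy 3.11  manylinux  x86_64
GraalPy 3.11  manylinux  i686
PyPy 3.11     Windows    x86
PyPy 3.11     Windows    AMD64
PyPy 3.11     macOS      x86_64
PyPy 3.11     macOS      arm64
PyPy 3.11     manylinux  x86_64
PyPy 3.11     manylinux  i686
PyPy 3.10     Windows    x86
PyPy 3.10     Windows    AMD64         ✓ (v1.3.1)
PyPy 3.10     macOS      x86_64        ✓ (v1.3.1)
PyPy 3.10     macOS      arm64         ✓ (v1.3.1)
PyPy 3.10     manylinux  x86_64        ✓ (v1.3.1)
PyPy 3.10     manylinux  i686          ✓ (v1.3.1)
PyPy 3.9      Windows    x86
PyPy 3.9      Windows    AMD64         ✓ (v1.3.1)
PyPy 3.9      macOS      x86_64        ✓ (v1.3.1)
PyPy 3.9      macOS      arm64         ✓ (v1.3.1)
PyPy 3.9      manylinux  x86_64        ✓ (v1.3.1)
PyPy 3.9      manylinux  i686          ✓ (v1.3.1)
PyPy 3.8      Windows    x86
PyPy 3.8      Windows    AMD64         ✓ (v1.3.1)
PyPy 3.8      macOS      x86_64        ✓ (v1.3.1)
PyPy 3.8      macOS      arm64         ✓ (v1.3.1)
PyPy 3.8      manylinux  x86_64        ✓ (v1.3.1)
PyPy 3.8      manylinux  i686          ✓ (v1.3.1)
PyPy 3.7      Windows    x86
PyPy 3.7      Windows    AMD64         ✓ (v1.3.1)
PyPy 3.7      macOS      x86_64        ✓ (v1.3.1)
PyPy 3.7      macOS      arm64
PyPy 3.7      manylinux  x86_64        ✓ (v1.3.1)
PyPy 3.7      manylinux  i686          ✓ (v1.3.1)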
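
To find the row that applies to your environment, a sketch like the following reports the interpreter details using only the standard library. The function name describe_interpreter is hypothetical, and the mapping of its output to the table's labels is an assumption: for example, platform.system() reports "Darwin" on macOS and "Linux" for both manylinux and musllinux, and GraalPy may report its implementation under a different name.

```python
import platform
import sys
import sysconfig


def describe_interpreter():
    """Collect the interpreter details relevant to choosing a wheel."""
    # Py_GIL_DISABLED is set to 1 on free-threaded CPython builds ("t" suffix).
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return {
        "implementation": platform.python_implementation(),
        "python": version + ("t" if free_threaded else ""),
        "os": platform.system(),
        "architecture": platform.machine(),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```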