



Python binding for nlpO3 Thai language processing library in Rust





    nlpO3 Python binding

    Python binding for nlpO3, a Thai natural language processing library in Rust.


    • Thai word tokenizer
      • segment() - tokenizes text with a dictionary-based maximal-matching algorithm and honors Thai Character Cluster boundaries
        • About 2.5x faster than a similar pure Python implementation (PyThaiNLP's newmm)
      • load_dict() - loads a dictionary from a plain text file (one word per line)
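To give a feel for what dictionary-based maximal matching means, here is a minimal pure Python sketch. It is not nlpO3's implementation: it assumes that "maximal matching" can be approximated by picking the segmentation with the fewest tokens, it falls back to single characters for text not covered by the dictionary, and it ignores Thai Character Cluster boundaries entirely. The function name maximal_match is hypothetical.

```python
def maximal_match(text, words):
    """Split text into the fewest tokens, preferring dictionary words.

    Simplified illustration only; characters that no dictionary word
    covers become single-character tokens.
    """
    n = len(text)
    max_len = max(1, max((len(w) for w in words), default=1))
    # best[i] = (token count, start index of the last token) for text[:i]
    best = [(0, 0)] + [(n + 1, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            # A piece is usable if it is a dictionary word or a single character
            if piece in words or end - start == 1:
                cost = best[start][0] + 1
                if cost < best[end][0]:
                    best[end] = (cost, start)
    # Walk backwards through the recorded start indices to recover the tokens
    tokens = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        tokens.append(text[start:pos])
        pos = start
    return tokens[::-1]


print(maximal_match("สวัสดีครับ", {"สวัสดี", "ครับ"}))
```

As with segment(), the output depends entirely on which words the dictionary contains.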

    Dictionary file

    • In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use, so it does not ship with one. A dictionary is required for the dictionary-based word tokenizer.
    • For tokenization dictionary, try


    pip install nlpo3


    Load the file path/to/dict.file into memory and assign the name dict_name to it. Then tokenize a text with the dict_name dictionary:

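    from nlpo3 import load_dict, segment
    # Load the dictionary file and register it under the name "dict_name"
    load_dict("path/to/dict.file", "dict_name")
    # Tokenize using the dictionary registered above
    segment("สวัสดีครับ", "dict_name")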

    It returns a list of strings:

    ['สวัสดี', 'ครับ']

    (result depends on words included in the dictionary)
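Because the result depends on the dictionary, it can help to prepare a small dictionary file by hand when experimenting. The sketch below writes words to a plain text file, one word per line. The helper name write_dict_file is hypothetical, and UTF-8 encoding is an assumption (the text above only says "plain text file").

```python
from pathlib import Path


def write_dict_file(words, path):
    """Write words to a plain text file, one word per line.

    Blank entries and duplicates are dropped. UTF-8 is assumed here.
    Returns the number of words written.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Possible usage with the functions shown above (requires nlpo3):
#   write_dict_file(["สวัสดี", "ครับ"], "my_dict.txt")
#   load_dict("my_dict.txt", "dict_name")
#   segment("สวัสดีครับ", "dict_name")
```

The file name my_dict.txt is only a placeholder; any path accepted by load_dict() should work the same way.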

    To use multithreaded mode with the same dict_name dictionary:

    segment("สวัสดีครับ", dict_name="dict_name", parallel=True)

    Use safe mode to avoid long waiting times in some edge cases, such as text with many ambiguous word boundaries:

    segment("สวัสดีครับ", dict_name="dict_name", safe=True)



    • Rust 2018 Edition
    • Python 3.6 or newer
    • Python Development Headers
      • Ubuntu: sudo apt-get install python3-dev
      • macOS: No action needed
    • PyO3 - already included in Cargo.toml
    • setuptools-rust


    python -m pip install --upgrade build
    python -m build

    This should generate a wheel file in the dist/ directory, which can be installed with pip.


    Please report issues at https://github.com/PyThaiNLP/nlpo3/issues