    Phunspell

    A pure Python spell checker built on spylls, a port of Hunspell.

    NOTE: If you only need to support English, Russian, or Swedish, use spylls directly (pip install spylls).

    This library includes dictionaries for all languages supported by LibreOffice.

    Credit where it's due: spylls is a fantastic project that deserves all the credit (as does, of course, Hunspell itself). There is a corresponding blog entry which is a good read.

    Usage

    import phunspell
    
    pspell = phunspell.Phunspell('en_US')
    print(pspell.lookup("phunspell")) # False
    print(pspell.lookup("about")) # True
    
    misspelled = pspell.lookup_list("Bill's TV is borken".split(" "))
    print(misspelled) # ["borken"]
    
    for suggestion in pspell.suggest('phunspell'):
        print(suggestion) # Hunspell
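
    The calls above can be combined into a small report that maps each misspelled word to a few suggestions. This is a minimal sketch, not part of phunspell: misspellings_with_suggestions and max_suggestions are hypothetical names, and the helper only assumes a checker object exposing lookup_list and suggest as shown in the usage example.

```python
import re
from itertools import islice


def misspellings_with_suggestions(checker, text, max_suggestions=3):
    """Map each misspelled word in `text` to up to `max_suggestions` candidates.

    `checker` is any object with `lookup_list(words)` (returns the misspelled
    words) and `suggest(word)` (yields candidate corrections).
    """
    # Keep letters and inner apostrophes together so "Bill's" stays one token
    # and trailing punctuation is dropped.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)

    report = {}
    for word in checker.lookup_list(words):
        if word in report:
            continue  # already handled an earlier occurrence
        # suggest() may be a generator, so stop after the first few items.
        report[word] = list(islice(checker.suggest(word), max_suggestions))
    return report
```

    With the library installed, it could be called as misspellings_with_suggestions(phunspell.Phunspell('en_US'), "Bill's TV is borken."). Tokenizing with a regular expression instead of split(" ") keeps punctuation from being flagged as part of a word.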
    

    Installation

    pip install phunspell
    

    Supported Languages

    Language                 Language Code
    Afrikaans                af_ZA
    Aragonese                an_ES
    Arabic                   ar
    Belarusian               be_BY
    Bulgarian                bg_BG
    Breton                   br_FR
    Catalan                  ca_ES
    Czech                    cs_CZ
    Danish                   da_DK
    German                   de_AT
    German                   de_CH
    German                   de_DE
    Greek                    el_GR
    English (Australian)     en_AU
    English (Canada)         en_CA
    English (Great Britain)  en_GB
    English (US)             en_US
    English (South African)  en_ZA
    Spanish (all variants)   es
    Spanish                  es_AR
    Spanish                  es_BO
    Spanish                  es_CL
    Spanish                  es_CO
    Spanish                  es_CR
    Spanish                  es_CU
    Spanish                  es_DO
    Spanish                  es_EC
    Spanish                  es_ES
    Spanish                  es_GQ
    Spanish                  es_GT
    Spanish                  es_HN
    Spanish                  es_MX
    Spanish                  es_NI
    Spanish                  es_PA
    Spanish                  es_PE
    Spanish                  es_PH
    Spanish                  es_PR
    Spanish                  es_PY
    Spanish                  es_SV
    Spanish                  es_US
    Spanish                  es_UY
    Spanish                  es_VE
    Estonian                 et_EE
    French                   fr_FR
    Scottish Gaelic          gd_GB
    Gujarati                 gu_IN
    Guarani                  gug_PY
    Hebrew                   he_IL
    Hindi                    hi_IN
    Croatian                 hr_HR
    Hungarian                hu_HU (TODO)
    Icelandic                is
    Indonesian               id_ID
    Italian                  it_IT
    Kurdish (Turkey)         ku_TR
    Lithuanian               lt_LT
    Latvian                  lv_LV
    Mapudüngun               md (arn) (TODO)
    Netherlands              nl_NL
    Norwegian                nb_NO
    Norwegian                nn_NO
    Occitan                  oc_FR
    Polish                   pl_PL
    Brazilian Portuguese     pt_BR
    Portuguese               pt_PT
    Romanian                 ro_RO
    Sinhala                  si_LK
    Slovak                   sk_SK
    Slovenian                sl_SI
    Serbian (Cyrillic)       sr
    Serbian (Latin)          sr-Latn
    Swedish                  sv_SE
    Swahili                  sw_TZ
    Tamil                    Ta (TODO)
    Thai                     th_TH
    Turkish                  tr_TR
    Ukrainian                uk_UA
    Vietnamese               vi_VN

    Tests

    python -m unittest discover -s phunspell/tests -p "test_*.py"
    

    Experimental

    
    # Extended, optional usage.
    # Assumption: both classes are exported at the package level.
    from phunspell import Phunspell, PhunspellObjectStore

    # First-time usage: create a directory of dictionaries stored as
    # pickled objects, which makes loading and access much faster.
    storage_path = "/home/dvwright/data/phunspell/dictionary_objects"
    # Run once only:
    pspell_object_create = PhunspellObjectStore(path=storage_path)

    # Then, typical usage:
    pspell = Phunspell(object_storage=storage_path)

    dicts_words = {
        "an_ES": "vengar",
        "be_BY": "ідалапаклонніцкі",
        "bg_BG": "удържехме",
    }

    # Look up each word against its own locale.
    for loc, word in dicts_words.items():
        print(pspell.lookup(word, locs=loc))
    

    There is an option to build and store all the dictionaries as pickled data. Since there are security risks associated with pickled data, that data is not included in the distribution.
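
    The risk comes from how pickle works: loading a pickle can call any function the file names, so a store should only be loaded if you created it yourself. The sketch below illustrates this with the standard library only. Payload is a hypothetical stand-in and has nothing to do with phunspell's own classes; it uses the harmless built-in len where an attacker would choose something destructive.

```python
import pickle


class Payload:
    """Stand-in for a malicious object hidden inside a pickle file."""

    def __reduce__(self):
        # pickle records "call this function with these arguments" and
        # performs that call on load. `len` is harmless; an attacker could
        # name a function such as os.system instead.
        return (len, ("any callable runs here",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it calls len(...) and
# returns whatever that call returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```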

    To create your own local pickled dictionaries, enter a Python shell:

    $ python
    >>> from phunspell import PhunspellObjectStore  # assumes a package-level export
    >>> storage_path = "/home/dvwright/data/phunspell/dictionary_objects"
    >>> pspell = PhunspellObjectStore(path=storage_path)
    

    NOTE: This step is optional, and you only have to do it once before using the library. It will consume a lot of resources!

    Once completed, you should have a pickled object for every dictionary supported by this library.

    $ ls /home/dvwright/data/phunspell/dictionary_objects/
    af_ZA
    an_ES
    be_BY
    bg_BG
    bn_BD
    br_FR
    bs_BA
    cs_CZ
    da_DK
    de_AT
    de_CH
    ...
    ...
    ...
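
    Before relying on the store, it can help to confirm that the locales you need were actually written. A minimal sketch, assuming the store holds one entry per locale named after its code, as in the listing above. missing_locales is a hypothetical helper, not part of phunspell.

```python
from pathlib import Path


def missing_locales(storage_path, wanted):
    """Return the locale codes in `wanted` that have no entry in the store.

    Assumption: each stored dictionary is an entry named after its locale
    code (af_ZA, an_ES, ...) directly inside `storage_path`.
    """
    store = Path(storage_path)
    # A store that does not exist yet is treated as empty.
    present = {entry.name for entry in store.iterdir()} if store.is_dir() else set()
    return [loc for loc in wanted if loc not in present]
```

    For example, missing_locales(storage_path, ["en_US", "de_DE"]) returns an empty list when both dictionaries have been stored.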
    

    NOTE: The pickled dictionaries take up a lot of disk space (about 1.4 GB in this example).

    $ du -sh .
    1.4G
    

    For all future uses of the library, just pass the directory as an argument. It should find the dictionaries and load them quickly.

    from phunspell import Phunspell

    storage_path = "/home/dvwright/data/phunspell/dictionary_objects"
    pspell = Phunspell(object_storage=storage_path)

    # Load the specific locale on lookups
    pspell.lookup_list(['us-word1', 'us-word2'], locs='en_US')
    pspell.lookup('german-word', locs='de_DE')

    NOTE: If you ever update the dictionary data, you will need to create a new pickle store for it.

    Misc

    python, python3, hunspell, libreoffice, spell, spell checking