Helper functions to syntactically validate strings according to RFC 3987.

pip install rfc3987-syntax

Authors

Will Riley

Requires Python

>=3.9

rfc3987-syntax

Helper functions to parse and validate the syntax of terms defined in RFC 3987, the IETF standard for Internationalized Resource Identifiers (IRIs).

🎯 Purpose

The goal of rfc3987-syntax is to provide a lightweight, permissively licensed Python module for validating that strings conform to the ABNF grammar defined in RFC 3987. These helpers are:

  • ✅ Strictly aligned with the syntax rules of RFC 3987
  • ✅ Built using a permissive MIT license
  • ✅ Designed for both open source and proprietary use
  • ✅ Powered by Lark, a fast, EBNF-based parser

🧠 Note: This project focuses on syntax validation only. RFC 3987 specifies additional semantic rules (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.

📄 License, Attribution, and Citation

rfc3987-syntax is licensed under the MIT License, which allows reuse in both open source and commercial software.

This project:

  • ❌ Does not depend on the rfc3987 Python package (GPL-licensed)
  • ✅ Uses lark, licensed under MIT
  • ✅ Implements grammar from RFC 3987, using RFC 3986 where RFC 3987 delegates syntax

⚠️ This project is not affiliated with or endorsed by the authors of RFC 3987 or the rfc3987 Python package.

Please cite this software in accordance with the enclosed CITATION.cff file.

⚠️ Limitations

The grammar and parser enforce only the ABNF syntax defined in RFC 3987. The following are not validated and must be handled separately for full compliance:

  • ✅ Unicode Normalization Form C (NFC)
  • ✅ Bidirectional text (BiDi) constraints (RFC 3987 §4.1)
  • ✅ Port number ranges (must be 0–65535)
  • ✅ Valid IPv6 compression (only one ::, max segments)
  • ✅ Context-aware percent-encoding requirements
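Some of the checks in this list can be layered on top of syntax validation using only the standard library. The following is a minimal sketch and not part of rfc3987-syntax: the helper names `is_nfc`, `has_valid_port`, and `is_valid_ipv6_literal` are hypothetical, and the sketch covers only NFC, port range, and IPv6 form. BiDi and percent-encoding rules are not addressed here.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit


def is_nfc(value: str) -> bool:
    """Return True if value is already in Unicode Normalization Form C."""
    return unicodedata.is_normalized("NFC", value)


def has_valid_port(value: str) -> bool:
    """Return True if the port is absent or an integer within 0-65535.

    Also returns False if urlsplit itself rejects the authority.
    """
    try:
        urlsplit(value).port  # raises ValueError for bad or out-of-range ports
    except ValueError:
        return False
    return True


def is_valid_ipv6_literal(host: str) -> bool:
    """Return True if host (without brackets) is accepted by ipaddress."""
    try:
        ipaddress.IPv6Address(host)
    except ValueError:
        return False
    return True
```

These helpers are meant to run after a string has already passed syntax validation. Note that `ipaddress` follows its own acceptance rules, which may differ in detail from the RFC 3986 grammar.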

ChatGPT 4o was used during the original development process, so errors may exist as a result. Additional review, testing, and bug fixes by human experts are welcome.

📦 Installation

pip install rfc3987-syntax

🛠 Usage

List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987

from rfc3987_syntax import RFC3987_SYNTAX_TERMS

print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)

Syntactically validate a string using the general-purpose validator

from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

# 'bob' has no scheme, so it is a relative reference: valid as an IRI-reference
if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")

Alternatively, use term-specific helpers to validate RFC 3987 syntax.

from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")
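The string 'bob' fails as an IRI but passes as an IRI-reference because an IRI must begin with a scheme followed by ":", while an IRI-reference may also be a relative reference. The sketch below illustrates only that distinction, using the scheme rule from RFC 3986. `has_scheme` is a hypothetical helper, not part of rfc3987-syntax, and having a scheme is a necessary condition for a valid IRI, not a sufficient one.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def has_scheme(value: str) -> bool:
    """Return True if value starts with a syntactically valid scheme and ':'."""
    return _SCHEME_PREFIX.match(value) is not None


for candidate in ("http://github.com", "bob"):
    print(candidate, has_scheme(candidate))
```

For actual validation, the package's own validators remain the right tool. This check only explains why the two terms give different answers for the same input.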

Get the Lark parse tree for a syntax validation (useful for additional semantic validation)

from lark import ParseTree  # type alias used for the annotation below
from rfc3987_syntax import parse

ptree: ParseTree = parse(term="iri", value="http://github.com")

print(ptree)
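To illustrate how a parse tree might feed a semantic check, here is a minimal sketch that walks a tree and collects the text under a named rule. It assumes the returned tree follows Lark's usual shape (each subtree has `data` and `children`, and leaf tokens are strings) and that a rule such as `port` appears as a node in the tree. Neither assumption is confirmed by this README. `Node`, `find_rule`, and `text_of` are hypothetical names, and `Node` is only a stand-in so the example runs without Lark installed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Node:
    """Stand-in for a Lark-style subtree: a rule name plus child nodes/tokens."""
    data: str
    children: List[Union["Node", str]] = field(default_factory=list)


def find_rule(tree, name: str) -> Iterator:
    """Yield every subtree whose rule name equals name (depth-first, pre-order)."""
    if isinstance(tree, str):  # leaf token, nothing to descend into
        return
    if tree.data == name:
        yield tree
    for child in tree.children:
        yield from find_rule(child, name)


def text_of(tree) -> str:
    """Concatenate the leaf tokens under tree, in document order."""
    if isinstance(tree, str):
        return tree
    return "".join(text_of(child) for child in tree.children)


# Hand-built, illustrative tree for "example.com:8080";
# a real tree would come from parse() and may be shaped differently.
sample = Node(
    "iauthority",
    [Node("ihost", ["example.com"]), ":", Node("port", ["8", "0", "8", "0"])],
)
ports = [int(text_of(p)) for p in find_rule(sample, "port")]
print(ports)  # [8080]
```

With a real tree, the extracted port could then be checked against the 0–65535 range, which the grammar alone does not enforce.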

📚 Sources

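This grammar was derived from:

  • RFC 3987: Internationalized Resource Identifiers (IRIs)
  • RFC 3986: Uniform Resource Identifier (URI): Generic Syntax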

📝 When RFC 3986 is listed as the source, it is used in accordance with RFC 3987, which explicitly references it for foundational elements.

Rule-to-Source Mapping

| Rule/Component       | Source   | Notes                           |
|----------------------|----------|---------------------------------|
| iri                  | RFC 3987 | Top-level IRI rule              |
| iri_reference        | RFC 3987 | Top-level IRI Reference rule    |
| absolute_iri         | RFC 3987 | Top-level Absolute IRI rule     |
| scheme               | RFC 3986 | Referenced by RFC 3987 §2.2     |
| ihier_part           | RFC 3987 | IRI-specific hierarchy          |
| irelative_ref        | RFC 3987 | IRI-specific relative ref       |
| irelative_part       | RFC 3987 | IRI-specific relative part      |
| iauthority           | RFC 3986 | Standard URI authority          |
| ipath_abempty        | RFC 3986 | Path format variant             |
| ipath_absolute       | RFC 3986 | Absolute path                   |
| ipath_noscheme       | RFC 3986 | Path disallowing scheme prefix  |
| ipath_rootless       | RFC 3986 | Used in non-scheme contexts     |
| iquery               | RFC 3987 | Query extension to URI          |
| ifragment            | RFC 3987 | Fragment extension to URI       |
| ipchar, isegment     | RFC 3986 | Path characters and segments    |
| isegment_nz_nc       | RFC 3987 | IRI-specific path constraint    |
| iunreserved          | RFC 3987 | Includes ucschar                |
| ucschar, iprivate    | RFC 3987 | Unicode support                 |
| sub_delims           | RFC 3986 | Reserved characters             |
| ip_literal           | RFC 3986 | IPv6 or IPvFuture in []         |
| ipv6address          | RFC 3986 | Expanded forms only             |
| ipvfuture            | RFC 3986 | Forward-compatible              |
| ipv4address          | RFC 3986 | Dotted-decimal IPv4             |
| ls32                 | RFC 3986 | Final 32 bits of IPv6           |
| h16, dec_octet       | RFC 3986 | Hex and decimal chunks          |
| port                 | RFC 3986 | Optional numeric                |
| pct_encoded          | RFC 3986 | Percent encoding (e.g. %20)     |
| alpha, digit, hexdig | RFC 3986 | Character classes               |