


A wrapper around the stdlib `tokenize` which roundtrips.

pip install tokenize-rt






    The stdlib tokenize module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens ESCAPED_NL and UNIMPORTANT_WS, and a Token data type. Use src_to_tokens and tokens_to_src to roundtrip.

    This library is useful if you're writing a refactoring tool that works on Python's token stream.


    pip install tokenize-rt



    tokenize_rt.Offset(line=None, utf8_byte_offset=None)

    A token offset, useful as a key when cross referencing the ast and the tokenized source.
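A UTF-8 byte offset is a convenient key because the `ast` module reports `col_offset` in UTF-8 bytes, while the stdlib `tokenize` reports columns in characters. The following stdlib-only sketch shows the mismatch on a line containing a non-ASCII character; the sample source and variable names are illustrative and not part of tokenize-rt.

```python
import ast
import io
import tokenize

src = '"é"; x\n'

# ast reports col_offset as a UTF-8 byte offset within the line.
name_node = ast.parse(src).body[1].value
ast_col = name_node.col_offset

# The stdlib tokenizer reports the start column in characters.
name_tok = next(
    tok for tok in tokenize.generate_tokens(io.StringIO(src).readline)
    if tok.type == tokenize.NAME
)
tok_col = name_tok.start[1]

# Encoding the text before the token converts the character column
# into the byte offset that ast uses.
byte_col = len(src[:tok_col].encode('utf-8'))
print(ast_col, tok_col, byte_col)
```

Keying tokens by (line, UTF-8 byte offset) lets an `ast` node's position be looked up directly, without converting columns for every node.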

    tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)

    Construct a token

    • name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
    • src: token's source as text
    • line: the line number that this token appears on.
    • utf8_byte_offset: the utf8 byte offset that this token appears on in the line.


    tokenize_rt.Token.offset

    Retrieves an Offset for this token.

    converting to and from Token representations

    tokenize_rt.src_to_tokens(text: str) -> List[Token]

    tokenize_rt.tokens_to_src(Iterable[Token]) -> str

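    additional tokens added by tokenize-rt

    tokenize_rt.ESCAPED_NL

    tokenize_rt.UNIMPORTANT_WS

    helpers

    tokenize_rt.NON_CODING_TOKENS

    A frozenset containing tokens which may appear between others while not affecting control flow or code:

    • COMMENT
    • ESCAPED_NL
    • NL
    • UNIMPORTANT_WS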

    tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]

    parse a string literal into its prefix and string content

    >>> parse_string_literal('f"foo"')
    ('f', '"foo"')

    tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]

    yields (index, token) pairs in reverse order. Useful for rewriting source.
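Walking from the end matters when a rewrite inserts or deletes tokens: the indices still to be visited do not shift. The sketch below illustrates the idea with a stand-in generator over plain strings; `reversed_pairs` is a hypothetical helper written for this example, not the library's implementation.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar('T')


def reversed_pairs(seq: Sequence[T]) -> Iterator[Tuple[int, T]]:
    """Yield (index, item) pairs from the last item to the first."""
    for i in reversed(range(len(seq))):
        yield i, seq[i]


# Plain strings stand in for tokens here.
parts = ['print', '(', 'x', ',', ')']

# Drop a comma that sits directly before a closing paren.  Deleting at
# index i only moves items after i, which have already been visited.
for i, part in reversed_pairs(parts):
    if part == ',' and parts[i + 1] == ')':
        del parts[i]

result = ''.join(parts)
```

A forward loop doing the same deletion would have to adjust its index after every change.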

    tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]

    find the indices of the string parts of a (joined) string literal

    • i should start at the end of the string literal
    • returns () (an empty tuple) for things which are not string literals

    >>> tokens = src_to_tokens('"foo" "bar".capitalize()')
    >>> rfind_string_parts(tokens, 2)
    (0, 2)
    >>> tokens = src_to_tokens('("foo" "bar").capitalize()')
    >>> rfind_string_parts(tokens, 4)
    (1, 3)

    Differences from tokenize

    • tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token"
    • tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
    • tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see Token('STRING', "f'foo'", ...) even in python 2.
    • tokenize-rt normalizes python 2 long literals (4l / 4L) and octal literals (0755) in python 3 (for easier rewriting of python 2 code while running python 3).
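The first two differences can be observed with the stdlib alone. In this sketch (the sample source is illustrative), the stdlib token stream records positions but emits no token for inter-token whitespace or for the backslash continuation, so concatenating the token strings does not reproduce the source.

```python
import io
import tokenize

src = 'x = 1 + \\\n    2\n'

stdlib_tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))

# Joining the token strings loses the spaces and the backslash-newline,
# because the stdlib tokenizer has no token for either.
joined = ''.join(tok.string for tok in stdlib_tokens)

print(repr(src))
print(repr(joined))
```

ESCAPED_NL and UNIMPORTANT_WS exist to carry exactly the text that goes missing here.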

    Sample usage