tokenize-rt
The stdlib `tokenize` module does not properly roundtrip.  This wrapper
around the stdlib provides two additional tokens `ESCAPED_NL` and
`UNIMPORTANT_WS`, and a `Token` data type.  Use `src_to_tokens` and
`tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on Python's tokenization.
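For example, the roundtrip guarantee can be checked directly (a minimal sketch; the source string is arbitrary):

>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> src = 'x = 1  # comment\n'
>>> tokens_to_src(src_to_tokens(src)) == src
True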
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross-referencing the `ast` and the
tokenized source.
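For instance, `ast` reports node positions as a line number and a utf-8 byte offset, which pairs directly with these fields (a minimal sketch):

>>> import ast
>>> from tokenize_rt import Offset
>>> node = ast.parse('x = 5\n').body[0].value
>>> Offset(line=node.lineno, utf8_byte_offset=node.col_offset)
Offset(line=1, utf8_byte_offset=4)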
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token
- `name`: one of the token names listed in `token.tok_name` or `ESCAPED_NL` or `UNIMPORTANT_WS`
- `src`: token's source as text
- `line`: the line number that this token appears on.
- `utf8_byte_offset`: the utf8 byte offset that this token appears on in the line.
tokenize_rt.Token.offset
Retrieves an `Offset` for this token.
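For example:

>>> from tokenize_rt import Token
>>> Token('NAME', 'x', line=1, utf8_byte_offset=0).offset
Offset(line=1, utf8_byte_offset=0)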
converting to and from Token representations
tokenize_rt.src_to_tokens(text: str) -> List[Token]
tokenize_rt.tokens_to_src(Iterable[Token]) -> str
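A quick way to inspect the extra tokens (a sketch; the expected output below is abridged and positional fields are omitted):

from tokenize_rt import src_to_tokens

for token in src_to_tokens('x = 5\n'):
    print(token.name, repr(token.src))
# among the output, in order:
#   NAME 'x'
#   UNIMPORTANT_WS ' '
#   OP '='
#   UNIMPORTANT_WS ' '
#   NUMBER '5'
#   NEWLINE '\n'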
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
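A backslash continuation is where `ESCAPED_NL` appears, and both extra tokens still roundtrip (a minimal sketch):

>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> src = 'x = 1 + \\\n    2\n'
>>> tokens = src_to_tokens(src)
>>> any(token.name == 'ESCAPED_NL' for token in tokens)
True
>>> tokens_to_src(tokens) == src
True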
helpers
tokenize_rt.NON_CODING_TOKENS
A frozenset containing tokens which may appear between others while not
affecting control flow or code:
- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
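A common use is scanning past comments and whitespace to the next token that matters; `next_coding_index` below is a hypothetical helper, not part of this library:

from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

def next_coding_index(tokens, i):
    # advance past tokens that do not affect control flow or code
    while tokens[i].name in NON_CODING_TOKENS:
        i += 1
    return i

tokens = src_to_tokens('f(  # comment\n    1,\n)\n')
open_paren = next(i for i, token in enumerate(tokens) if token.src == '(')
assert tokens[next_coding_index(tokens, open_paren + 1)].src == '1'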
tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields (index, token) pairs.  Useful for rewriting source.
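For example, deleting tokens while iterating in reverse keeps the not-yet-visited indices valid (a minimal sketch that strips comments):

from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 1  # note\n')
for i, token in reversed_enumerate(tokens):
    if token.name == 'COMMENT':
        del tokens[i]
        # also drop the whitespace that padded the comment, if any
        if tokens[i - 1].name == 'UNIMPORTANT_WS':
            del tokens[i - 1]
assert tokens_to_src(tokens) == 'x = 1\n'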
tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]
find the indices of the string parts of a (joined) string literal
- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
Differences from tokenize
- `tokenize-rt` adds `ESCAPED_NL` for a backslash-escaped newline "token"
- `tokenize-rt` adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- `tokenize-rt` normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in python 2.
- `tokenize-rt` normalizes python 2 long literals (`4l` / `4L`) and octal literals (`0755`) in python 3 (for easier rewriting of python 2 code while running python 3).