Oven

xmltodict 0.13.0

Makes working with XML feel like you are working with JSON

pip install xmltodict

Requires Python >=3.4

Dependencies

    xmltodict

    xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

    >>> import json
    >>> import xmltodict
    >>> print(json.dumps(xmltodict.parse("""
    ...  <mydocument has="an attribute">
    ...    <and>
    ...      <many>elements</many>
    ...      <many>more elements</many>
    ...    </and>
    ...    <plus a="complex">
    ...      element as well
    ...    </plus>
    ...  </mydocument>
    ...  """), indent=4))
    {
        "mydocument": {
            "@has": "an attribute",
            "and": {
                "many": [
                    "elements",
                    "more elements"
                ]
            },
            "plus": {
                "@a": "complex",
                "#text": "element as well"
            }
        }
    }
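
The conventions in the output above can be exercised directly on the returned dict. The following is a minimal sketch that reuses the same document; the variable names (`root`, `attribute`, `repeated`, `text`) are illustrative only.

```python
import xmltodict

doc = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")

root = doc["mydocument"]
attribute = root["@has"]        # attributes are keyed with an "@" prefix
repeated = root["and"]["many"]  # repeated sibling tags become a list
text = root["plus"]["#text"]    # text next to attributes sits under "#text"

print(attribute)
print(repeated)
print(text)
```

Note that a tag that occurs only once maps to a single value rather than a one-element list, so code that walks the result may need to handle both shapes.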
    

    Namespace support

    By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

    >>> xml = """
    ... <root xmlns="http://defaultns.com/"
    ...       xmlns:a="http://a.com/"
    ...       xmlns:b="http://b.com/">
    ...   <x>1</x>
    ...   <a:y>2</a:y>
    ...   <b:z>3</b:z>
    ... </root>
    ... """
    >>> xmltodict.parse(xml, process_namespaces=True) == {
    ...     'http://defaultns.com/:root': {
    ...         'http://defaultns.com/:x': '1',
    ...         'http://a.com/:y': '2',
    ...         'http://b.com/:z': '3',
    ...     }
    ... }
    True
    

    It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

    >>> namespaces = {
    ...     'http://defaultns.com/': None, # skip this namespace
    ...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
    ... }
    >>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
    ...     'root': {
    ...         'x': '1',
    ...         'ns_a:y': '2',
    ...         'http://b.com/:z': '3',
    ...     },
    ... }
    True
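
For contrast, here is a small sketch of the default behaviour described at the start of this section, where no namespace processing takes place. The one-line document and the variable names are made up for illustration.

```python
import xmltodict

xml = '<root xmlns:a="http://a.com/"><a:y>2</a:y></root>'

# Without process_namespaces=True the declaration is kept as an ordinary
# attribute and the prefixed tag name is left exactly as written.
plain = xmltodict.parse(xml)

print(plain["root"]["@xmlns:a"])
print(plain["root"]["a:y"])
```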
    

    Streaming mode

    xmltodict is built on the Expat parser, so it is fast, and its streaming mode keeps a small memory footprint, which makes it suitable for big XML dumps like Discogs or Wikipedia:

    >>> from gzip import GzipFile
    >>> def handle_artist(_, artist):
    ...     print(artist['name'])
    ...     return True  # returning True tells the parser to keep going
    >>> 
    >>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
    ...     item_depth=2, item_callback=handle_artist)
    A Perfect Circle
    Fantômas
    King Crimson
    Chris Potter
    ...
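
The same callback pattern can be tried without downloading a dump. This is a minimal sketch on a small in-memory document; the `<artists>` markup and the `names` list are stand-ins invented for the example, not the real Discogs schema.

```python
import xmltodict

xml = """<artists>
  <artist><name>A Perfect Circle</name></artist>
  <artist><name>King Crimson</name></artist>
</artists>"""

names = []

def handle_artist(path, artist):
    # path lists the (tag, attributes) pairs leading down to this item;
    # artist is the dict built for one element at depth 2.
    names.append(artist["name"])
    return True  # keep parsing; a falsy value aborts the parse

xmltodict.parse(xml, item_depth=2, item_callback=handle_artist)
print(names)
```

Each item is handed to the callback as soon as its closing tag is seen, so the whole document never has to be held in memory as one dict.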
    

    It can also be used from the command line to pipe objects to a script like this:

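    import sys, marshal
    while True:
        try:
            # marshal needs the binary stream, not text-mode sys.stdin
            _, article = marshal.load(sys.stdin.buffer)
        except EOFError:
            break  # end of the piped input
        print(article['title'])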
    
    $ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
    AccessibleComputing
    Anarchism
    AfghanistanHistory
    AfghanistanGeography
    AfghanistanPeople
    AfghanistanCommunications
    Autism
    ...
    

    Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

    $ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz
    

    And you reuse the dicts with every script that needs them:

    $ gunzip -c enwiki.dicts.gz | script1.py
    $ gunzip -c enwiki.dicts.gz | script2.py
    ...
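
A cached stream of this kind can also be read from inside Python instead of through a shell pipe. The sketch below uses only the standard library and assumes the cache is a gzip-compressed sequence of marshalled `(path, item)` tuples, as the pipeline above suggests. The helper `iter_items` and the in-memory stand-in for `enwiki.dicts.gz` are hypothetical.

```python
import gzip
import io
import marshal

def iter_items(stream):
    """Yield objects from a binary stream of consecutive marshal dumps."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return  # no more objects in the stream

# Build a tiny stand-in for enwiki.dicts.gz in memory.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as out:
    for title in ("AccessibleComputing", "Anarchism"):
        path = [("mediawiki", None), ("page", None)]
        marshal.dump((path, {"title": title}), out)

# Read the items back one at a time, as a consuming script would.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as cached:
    titles = [item["title"] for _, item in iter_items(cached)]

print(titles)
```

The marshal format is specific to a Python version, so a cache like this is best treated as a local speed-up rather than a file to share between machines.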
    

    Roundtripping

    You can also convert in the other direction, using the unparse() function:

    >>> mydict = {
    ...     'response': {
    ...             'status': 'good',
    ...             'last_updated': '2014-02-16T23:10:12Z',
    ...     }
    ... }
    >>> print(xmltodict.unparse(mydict, pretty=True))
    <?xml version="1.0" encoding="utf-8"?>
    <response>
    	<status>good</status>
    	<last_updated>2014-02-16T23:10:12Z</last_updated>
    </response>
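
The round trip can be checked by feeding the generated XML back into `parse()`. This is a minimal sketch using the same dictionary; the names `xml` and `restored` are illustrative.

```python
import xmltodict

mydict = {
    "response": {
        "status": "good",
        "last_updated": "2014-02-16T23:10:12Z",
    }
}

xml = xmltodict.unparse(mydict)   # dict -> XML string
restored = xmltodict.parse(xml)   # XML string -> dict

print(restored == mydict)
```

The comparison holds here because every value is a string. Values of other types, such as integers, would come back from `parse()` as strings.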
    

    A node's text value is given under the cdata_key key of its dict, and a node's attributes are given by keys that start with attr_prefix. The default value for attr_prefix is @ and the default value for cdata_key is #text.

    >>> import xmltodict
    >>> 
    >>> mydict = {
    ...     'text': {
    ...         '@color':'red',
    ...         '@stroke':'2',
    ...         '#text':'This is a test'
    ...     }
    ... }
    >>> print(xmltodict.unparse(mydict, pretty=True))
    <?xml version="1.0" encoding="utf-8"?>
    <text color="red" stroke="2">This is a test</text>
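
Both defaults can be overridden. The sketch below passes a different `attr_prefix` and `cdata_key` to `unparse()` and then the same pair to `parse()`. The values `"!"` and `"_value"` are arbitrary choices made for this example.

```python
import xmltodict

mydict = {"text": {"!color": "red", "_value": "This is a test"}}

# Keys starting with "!" become attributes; "_value" holds the text.
xml = xmltodict.unparse(mydict, attr_prefix="!", cdata_key="_value")
print(xml)

# Parsing with the same settings maps the XML back to the same keys.
restored = xmltodict.parse(xml, attr_prefix="!", cdata_key="_value")
print(restored == mydict)
```

Using the same pair of settings in both directions is what keeps the conversion reversible.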
    

    Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does not have a parent key, for example if a list exists inside another list, it has no tag to use and its items are converted to a string, as shown in the first call below. To give tags to the items of nested lists, pass a tag name in the expand_iter keyword argument, as in the second call. Note that using expand_iter will break roundtripping.

    >>> mydict = {
    ...     "line": {
    ...         "points": [
    ...             [1, 5],
    ...             [2, 6],
    ...         ]
    ...     }
    ... }
    >>> print(xmltodict.unparse(mydict, pretty=True))
    <?xml version="1.0" encoding="utf-8"?>
    <line>
            <points>[1, 5]</points>
            <points>[2, 6]</points>
    </line>
    >>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
    <?xml version="1.0" encoding="utf-8"?>
    <line>
            <points>
                    <coord>1</coord>
                    <coord>5</coord>
            </points>
            <points>
                    <coord>2</coord>
                    <coord>6</coord>
            </points>
    </line>
    

    Ok, how do I get it?

    Using PyPI

    You just need to run:

    $ pip install xmltodict
    

    RPM-based distro (Fedora, RHEL, …)

    There is an official Fedora package for xmltodict.

    $ sudo yum install python-xmltodict
    

    Arch Linux

    There is an official Arch Linux package for xmltodict.

    $ sudo pacman -S python-xmltodict
    

    Debian-based distro (Debian, Ubuntu, …)

    There is an official Debian package for xmltodict.

    $ sudo apt install python3-xmltodict
    

    FreeBSD

    There is an official FreeBSD port for xmltodict.

    $ pkg install py36-xmltodict
    

    openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)

    There is an official openSUSE package for xmltodict.

    # Python2
    $ zypper in python2-xmltodict
    
    # Python3
    $ zypper in python3-xmltodict