
Parsing an XML file into a dictionary in Python: How to convert an XML string to a dictionary?

卫飞鹏
2023-12-01

How to convert an XML string to a dictionary?

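I have a program that reads an XML document from a socket. I store the XML document in a string, and I would like to convert it directly into a Python dictionary, the way Django's dic_xml library does.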

For example:

str = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
dic_xml = convert_to_dic(str)

then dic_xml would look like {'person': {'name': 'john', 'age': 20}}
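A minimal sketch of what `convert_to_dic` could look like using only the standard library's xml.etree.ElementTree. It assumes the document has no attributes and no repeated sibling tags; the function name is the hypothetical one from the question. XML carries no types, so `age` comes back as the string '20', not the integer 20:

```python
import xml.etree.ElementTree as ET

def convert_to_dic(xml_string):
    """Sketch: map each element to its text, or to a dict of its children.

    Assumes no attributes and no repeated sibling tags: a later sibling
    with the same tag would silently overwrite an earlier one.
    """
    def to_value(elem):
        if len(elem) == 0:      # leaf element: use its text (None if empty)
            return elem.text
        return {child.tag: to_value(child) for child in elem}

    root = ET.fromstring(xml_string)
    return {root.tag: to_value(root)}

xml_str = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
print(convert_to_dic(xml_str))
# {'person': {'name': 'john', 'age': '20'}}
```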

15 answers

226 votes

xmltodict (full disclosure: I wrote it) does exactly that:

xmltodict.parse("""<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

Martin Blech answered 2019-08-20T09:17:55Z

42 votes

This is a great module someone created. I've used it several times. [http://code.activestate.com/recipes/410469-xml-as-dictionary/]

Here is the code from the site, in case the link goes bad.

# cElementTree was removed in Python 3.9; the standard module is used instead
import xml.etree.ElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if len(element):  # element has child elements
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if len(element):  # element has child elements
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Usage example:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

# Or, if you want to use an XML string:
root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)
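The recipe chooses between a dict and a list by comparing only the first two child tags, as its own comments admit. A small sketch (the helper name `treated_as_list` is hypothetical, written only to isolate that test) shows how mixed children are misclassified, so a repeated tag that is not adjacent to its twin ends up overwritten:

```python
import xml.etree.ElementTree as ET

def treated_as_list(element):
    # Same test as the recipe: a container becomes a list only if its
    # first two children share a tag (single-child containers are dicts).
    return len(element) > 1 and element[0].tag == element[1].tag

mixed = ET.fromstring("<a><x>1</x><y>2</y><x>3</x></a>")
print(treated_as_list(mixed))  # False: handled as a dict

# Dict handling keys each child by tag, so the second <x> replaces the first
print({child.tag: child.text for child in mixed})  # {'x': '3', 'y': '2'}
```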

James answered 2019-08-20T09:17:29Z

36 votes

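The following XML-to-Python-dict snippet parses entities as well as attributes, following this XML-to-JSON "specification". It is the most general solution, handling all XML cases.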

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k: v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

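It is used like this:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))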

The output of this example (following the "specification" linked above) should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

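It isn't necessarily pretty, but it is unambiguous, and simpler XML input produces simpler JSON. :)

Update

If you want to go the other way, emitting an XML string from a JSON/dict, you can use: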

try:
    basestring
except NameError:  # python3
    basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k, v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

d = etree_to_dict(e)  # e.g. the dict from the example above
pprint(dict_to_etree(d))
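On Python 3, ET.tostring() returns bytes by default, so the pprint call above prints a b'...' literal. If a str is preferred, passing encoding="unicode" is one option. A small standalone sketch with a made-up element:

```python
import xml.etree.ElementTree as ET

node = ET.Element("root")
ET.SubElement(node, "e").text = "text"

print(ET.tostring(node))                      # b'<root><e>text</e></root>'
print(ET.tostring(node, encoding="unicode"))  # <root><e>text</e></root>
```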

K3---rnc answered 2019-08-20T09:18:53Z

22 votes

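This lightweight version, while not configurable, is easy to tailor to your needs and works on old Pythons. It is also rigid, meaning the structure of the result is the same whether or not attributes are present.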

import xml.etree.ElementTree as ET
from copy import copy

def dictify(r, root=True):
    if root:
        return {r.tag: dictify(r, False)}
    d = copy(r.attrib)
    if r.text:
        d["_text"] = r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag] = []
        d[x.tag].append(dictify(x, False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")
dictify(root)

results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

Erik Aronesty answered 2019-08-20T09:19:21Z

6 votes

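The most recent versions of the PicklingTools library (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

It can be downloaded here: PicklingTools 1.3.1

There is quite a bit of documentation on the converters: it describes in detail all the decisions and issues that arise when converting between XML and Python dictionaries (there are a number of edge cases, such as attributes, lists, anonymous lists, anonymous dicts, and eval, that most converters don't handle). In general, though, the converters are easy to use. If 'example.xml' contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = open('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expectXML()
>>> print(result)
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are conversion tools in both C++ and Python: the two perform the same conversion, but the C++ version is about 60 times faster.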

rts1 answered 2019-08-20T09:20:15Z

3 votes

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?>
Test1234
3455"""

print(xml_to_dict(xml_string))

The following variant preserves the parent key/element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    # use the module alias imported above (objectify itself was never imported)
    xml_obj = xml_objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you only want to return a subtree and convert it to a dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance
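The same select-then-convert idea works with the standard library's ElementTree, whose find() also accepts paths such as './/tag'. A sketch with a made-up document; the tag names and the one-level helper `leaf_dict` are hypothetical:

```python
import xml.etree.ElementTree as ET

def leaf_dict(elem):
    # Hypothetical helper: {child tag: child text}, one level deep only
    return {child.tag: child.text for child in elem}

doc = ET.fromstring(
    "<shop><order><id>17</id><total>9.50</total></order></shop>"
)
order = doc.find(".//order")  # first <order> anywhere below the root
print({order.tag: leaf_dict(order)})
# {'order': {'id': '17', 'total': '9.50'}}
```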

See the lxml documentation for more. I hope this helps!

radtek answered 2019-08-20T09:21:07Z

2 votes

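The easiest-to-use Python XML parser is ElementTree (since 2.5 it has been in the standard library as xml.etree.ElementTree). I don't think there is anything that does exactly what you want out of the box. It would be quite simple to write something with ElementTree that does what you want, but why convert to a dictionary at all? Why not just use ElementTree directly?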
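As a sketch of what "using ElementTree directly" can look like for the question's example: the values are read with findtext() and converted explicitly, with no intermediate dict:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<person><name>john</name><age>20</age></person>")

# Query the tree directly instead of converting it first
name = root.findtext("name")     # 'john'
age = int(root.findtext("age"))  # XML text is a string; convert it explicitly
print(name, age)                 # john 20
```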

answered 2019-08-20T09:21:33Z

2 votes

def xml_to_dict(node):
    u'''
    @param node: lxml_node
    @return: dict
    '''
    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib,
            'children': {child.tag: xml_to_dict(child) for child in node}}
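Because 'children' is built with a dict comprehension keyed on child.tag, siblings that share a tag collapse into a single entry and only the last one survives. A quick standard-library sketch of the collision, and of grouping values into a list per tag as one way around it:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<list><item>a</item><item>b</item></list>")

# Same shape as the comprehension above: one key per child tag
children = {child.tag: child.text for child in root}
print(children)  # {'item': 'b'}: the first <item> is lost

# Keeping a list per tag avoids the collision
grouped = {}
for child in root:
    grouped.setdefault(child.tag, []).append(child.text)
print(grouped)   # {'item': ['a', 'b']}
```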

dibrovsd answered 2019-08-20T09:21:50Z

2 votes

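The code from [http://code.activestate.com/recipes/410469-xml-as-dictionary/] works well, but if there are multiple identical elements at a given place in the hierarchy, it simply overwrites them.

I added a shim that runs before self.update() to check whether the element already exists. If it does, the shim pops the existing entry and creates a list out of the existing and the new values. Any subsequent duplicates are appended to that list.

Not sure whether this could be handled more elegantly, but it works: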

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                # empty elements have text None, so guard before strip()
                self.updateShim({element.tag: (element.text or '').strip()})

    def updateShim(self, aDict):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)
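The core of the shim is "store the value; on a repeated key, promote the existing value to a list". Isolated for a single key, it might look like the sketch below (`add_value` is a hypothetical name, not part of the answer). One caveat it shares with the shim: a value that is already a list cannot be told apart from an accumulated list.

```python
def add_value(d, key, value):
    """Store value under key; if key repeats, collect the values in a list.

    Hypothetical helper mirroring updateShim's logic for one key.
    """
    if key not in d:
        d[key] = value
    elif isinstance(d[key], list):
        d[key].append(value)
    else:
        d[key] = [d[key], value]

d = {}
for tag, text in [("item", "a"), ("item", "b"), ("item", "c"), ("name", "x")]:
    add_value(d, tag, text)
print(d)  # {'item': ['a', 'b', 'c'], 'name': 'x'}
```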

Adam Clark answered 2019-08-20T09:22:29Z

2 votes

Building on @K3---rnc's answer (the best one for me), I've made a few small modifications to get an OrderedDict from XML text (sometimes order matters):

from collections import OrderedDict

def etree_to_ordereddict(t):
    d = OrderedDict()
    d[t.tag] = OrderedDict() if t.attrib else None
    children = list(t)
    if children:
        dd = OrderedDict()
        for dc in map(etree_to_ordereddict, children):
            for k, v in dc.items():  # iteritems() in Python 2
                if k not in dd:
                    dd[k] = list()
                dd[k].append(v)
        d = OrderedDict()
        d[t.tag] = OrderedDict()
        for k, v in dd.items():
            if len(v) == 1:
                d[t.tag][k] = v[0]
            else:
                d[t.tag][k] = v
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

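Following @K3---rnc's example, you can use it like this:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))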

Hope it helps ;)
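On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so a regular dict filled while iterating over the children in document order already keeps that order. OrderedDict is still useful on older Pythons, or when order-sensitive equality between two mappings is wanted. A quick sketch with a made-up document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<r><z>1</z><a>2</a><m>3</m></r>")

# On Python 3.7+, a plain dict keeps keys in insertion (document) order
d = {child.tag: child.text for child in root}
print(list(d))  # ['z', 'a', 'm'], not alphabetical
```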

serfer2 answered 2019-08-20T09:23:08Z

2 votes

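Disclaimer: this modified XML parser was inspired by Adam Clark's. The original parser works for most simple cases, but it failed on some complicated XML files. I debugged the code line by line and finally fixed some of the issues. If you find any bugs, please let me know; I'm happy to fix them.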

class XmlDictConfig(dict):
    '''
    Note: a root element needs to be added if one does not exist.

    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                # if element.items():
                #     aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():  # items() is specifically for attributes
                # list() so append() also works when items() returns a view
                elementattrib = list(element.items())
                if element.text:
                    elementattrib.append((element.tag, element.text))  # add tag:text if it exists
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim(self, aDict):
        for key in aDict.keys():  # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key: aDict[key]})  # it was self.update(aDict)

tiger answered 2019-08-20T09:23:33Z

1 votes

Here is a link to an ActiveState solution, along with the code in case it disappears again.

==================================================

xmlreader.py:

==================================================

from xml.dom.minidom import parse

class NotTextNodeError(Exception):  # must derive from Exception to be raised in Python 3
    pass

def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t

def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
      and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
      then its children will be appended to a list and this
      list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
      the node's children (merging {nodeName:nodeToDic()} to
      the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            dic.update({n.nodeName: l})
            continue
        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue
        # text node
        dic.update({n.nodeName: text})
        continue
    return dic

def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)

def test():
    dic = readConfig("sample.xml")
    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])

test()

==================================================

sample.xml:

==================================================

My Config File

First Item

Value 1

Second Item

Value 2

==================================================

output:

==================================================

My Config File

Item's Name: First Item

Item's Value: Value 1

Item's Name: Second Item

Item's Value: Value 2
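The `nodeType != ELEMENT_NODE` checks in nodeToDic matter because minidom keeps the whitespace between tags as separate text nodes. A short sketch with an inline document (made up for illustration, not the recipe's sample.xml):

```python
from xml.dom.minidom import parseString

# Hypothetical inline document, not the recipe's sample.xml
dom = parseString("<Config>\n  <Name>My Config File</Name>\n</Config>")
config = dom.documentElement

# Whitespace between tags shows up as TEXT_NODE children
kinds = [n.nodeType == n.ELEMENT_NODE for n in config.childNodes]
print(kinds)  # [False, True, False]

# <Name> has a single text child, which getTextFromNode would collect
name = config.getElementsByTagName("Name")[0]
print(name.firstChild.nodeValue)  # My Config File
```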

tgray answered 2019-08-20T09:23:57Z

0 votes

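I once had to parse and write XML that consisted only of elements without attributes, so a 1:1 mapping from XML to dict was easily possible. This is what I came up with, in case someone else doesn't need attributes either: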

import xml.etree.ElementTree as ElementTree

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text
            # "in" rather than result.get(), so an empty first value is not overwritten
            if element.tag in result:
                if isinstance(result[element.tag], list):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}

def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, str):  # basestring in Python 2
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, (int, float)):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    root_tag = next(iter(element))  # the single root key (keys()[0] fails in Python 3)
    result = ElementTree.Element(root_tag)
    for key, value in element[root_tag].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    # fromstring() already returns the root Element (it has no getroot())
    return xmltodict(ElementTree.fromstring(xmlstring))

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

josch answered 2019-08-20T09:24:22Z

0 votes

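@dibrovsd: your solution won't work if the XML has more than one tag with the same name.

Following your line of thought, I've modified the code a bit and written it for a general node rather than the root: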

from collections import defaultdict

def xml2dict(node):
    d = defaultdict(dict)  # each key maps to a dict of text/attrib/children
    # number the children so repeated tags get distinct keys (tag_1, tag_2, ...)
    for count, i in enumerate(node, start=1):
        key = i.tag + "_" + str(count)
        d[key]['text'] = i.text
        d[key]['attrib'] = i.attrib  # attrib is a dict of attribute name -> value
        d[key]['children'] = xml2dict(i)  # it gives dict
    return d

pg2455 answered 2019-08-20T09:24:54Z

-1 votes

I have a recursive method for getting a dictionary from an lxml element:

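def recursive_dict(element):
    # split('}')[-1] drops a "{namespace}" prefix if present and keeps plain tags as-is;
    # list(element) replaces getchildren(), which was removed from ElementTree in Python 3.9
    return (element.tag.split('}')[-1],
            dict(map(recursive_dict, list(element)),
                 **element.attrib))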
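Both ElementTree and lxml report a namespaced tag in "{namespace-uri}localname" form, so indexing `split('}')[1]` only works when a namespace is present. A quick standard-library sketch with a made-up namespace showing that `[-1]` handles both cases:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<root xmlns="urn:example"><child/></root>')
print(doc.tag)                   # {urn:example}root
print(doc.tag.split('}')[-1])    # root

plain = ET.fromstring('<root/>')
print(plain.tag.split('}')[-1])  # root (split('}')[1] would raise IndexError here)
```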

moylop260 answered 2019-08-20T09:25:19Z
