<?xml version="1.0" encoding="utf-8"?>
<root>
<tag1 attrib11="" attrib12="" >text1</tag1>tail1
<tag2 attrib21="" attrib22="" >text2</tag2>tail2
</root>
# encoding=utf8
from lxml import etree
xml = etree.parse('filepath')
root = xml.getroot()
children = root.xpath('./*')  # direct children of root; '//*' would also match root itself
child = children[0]
etree.parse() returns an lxml.etree._ElementTree object, and xml.getroot() returns an Element object. Element is the object we mainly work with.
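As a quick check of the two object types, here is a minimal sketch. It assumes a small in-memory document (the `data` bytes are made up for illustration) in place of a real file path:

```python
import io
from lxml import etree

# Hypothetical in-memory document standing in for 'filepath'.
data = b'<root><tag1>text1</tag1></root>'

tree = etree.parse(io.BytesIO(data))   # parse() accepts a path or a file-like object
root = tree.getroot()

print(type(tree).__name__)             # _ElementTree
print(type(root).__name__)             # _Element
print(etree.iselement(root))           # True
```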
print(child.tag)
print(child.attrib)
print(child.text)
print(child.tail)
>>> tag1
>>> {'attrib11': '', 'attrib12': ''}
>>> text1
>>> tail1
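The same four properties can be tried without a file on disk. The sketch below inlines a document shaped like the sample above (without the XML declaration) and uses repr() so that the whitespace kept in tail is visible:

```python
from lxml import etree

# Same shape as the sample document, inlined so the snippet is self-contained.
data = b'''<root>
<tag1 attrib11="" attrib12="">text1</tag1>tail1
<tag2 attrib21="" attrib22="">text2</tag2>tail2
</root>'''

root = etree.fromstring(data)
first = root[0]              # direct children are also reachable by index

print(first.tag)             # tag1
print(dict(first.attrib))    # {'attrib11': '', 'attrib12': ''}
print(first.text)            # text1
print(repr(first.tail))      # 'tail1\n' (the tail keeps the line break)
```

Note that tail is the text between the end of an element and the next tag, so it includes the trailing newline here.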
child.tag = 'newtag'
newattrib = {}
child.attrib.update(newattrib)
The attrib property cannot be assigned directly; use the dict.update method to modify it.
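A small sketch of the ways to change attributes, using a throwaway element (`node` and its attribute names are invented for illustration). The direct assignment is wrapped in try/except because it is expected to be rejected:

```python
from lxml import etree

node = etree.Element('tag', attrib={'a': '1'})

try:
    node.attrib = {'b': '2'}        # direct assignment is expected to fail
except AttributeError as exc:
    print('cannot assign:', exc)

node.attrib.update({'b': '2'})      # add or overwrite through the mapping
node.set('c', '3')                  # set a single attribute
del node.attrib['a']                # remove one attribute

print(dict(node.attrib))            # {'b': '2', 'c': '3'}
```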
attrib = {}
element = etree.Element('tag', attrib=attrib)
print(root.xpath('.//*'))
root.append(child)
print(root.xpath('.//*'))
root.insert(0, child)
print(root.xpath('.//*'))
root.remove(child)
print(root.xpath('.//*'))
root.append(element)
print(root.xpath('.//*'))
>>> [<Element t8aaa at 0x52e8ac8>, <Element t98ab at 0x52e8b08>]
>>> [<Element t98ab at 0x52e8b08>, <Element t8aaa at 0x52e8ac8>]
>>> [<Element t8aaa at 0x52e8ac8>, <Element t98ab at 0x52e8b08>]
>>> [<Element t98ab at 0x52e8b08>]
>>> [<Element t98ab at 0x52e8b08>, <Element t7efa at 0x52e8bf8>]
Inserting the same Element again under the same parent only changes its position and does not add a node; only inserting a new Element adds a node.
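The move-versus-add behaviour can be seen on a two-child document. In the sketch below, copy.deepcopy is one possible way to get a real duplicate; it is an addition here, not something the original steps use:

```python
import copy
from lxml import etree

root = etree.fromstring(b'<root><a/><b/></root>')
a = root[0]

root.append(a)                       # moves <a/> to the end
print([e.tag for e in root])         # ['b', 'a']

root.insert(0, a)                    # moves it back to the front
print([e.tag for e in root])         # ['a', 'b']

root.append(copy.deepcopy(a))        # a copy is a new node, so the count grows
print([e.tag for e in root])         # ['a', 'b', 'a']
```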
etree.ElementTree(root).write('filename', encoding='utf-8', pretty_print=True,
                              xml_declaration=True)
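encoding controls the XML encoding; when it is not set, lxml defaults to ASCII. xml_declaration controls whether the XML carries a declaration (<?xml version="1.0" encoding="utf-8"?>). pretty_print controls whether the output is formatted. Note in particular that for pretty_print to take effect, the XML must be parsed with a parser created with remove_blank_text=True, that is:
xml = etree.parse('filepath', parser=etree.XMLParser(remove_blank_text=True))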
root = xml.getroot()
If the XML is generated from a root node created with etree.Element, pretty_print takes effect.
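To compare the three cases side by side, here is a rough sketch that serializes to a string with etree.tostring instead of writing a file. The input bytes are invented for illustration, and the exact layout of the first output may depend on the libxml2 version:

```python
from lxml import etree

# Hypothetical input that already contains whitespace between elements.
data = b'<root>\n<a><b/></a>\n</root>'

# Parsed with the default parser: the existing whitespace is kept as text.
plain = etree.fromstring(data)

# Parsed with remove_blank_text=True: ignorable whitespace is dropped.
stripped = etree.fromstring(data, parser=etree.XMLParser(remove_blank_text=True))

print(etree.tostring(plain, pretty_print=True).decode())
print(etree.tostring(stripped, pretty_print=True).decode())

# A tree built from scratch has no whitespace text nodes, so it is indented too.
built = etree.Element('root')
etree.SubElement(etree.SubElement(built, 'a'), 'b')
print(etree.tostring(built, pretty_print=True).decode())
```

The second and third outputs should be indented identically, while the first keeps the layout of the input.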