How do I find XML elements via XPath in Python in a namespace-agnostic way?

罗毅

2023-03-14

Question:

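Since I have run into this annoying issue for the second time, I thought asking would help.

Sometimes I have to get elements out of XML documents, but the ways to do this are awkward.

I would like to know of one of the following: a Python library that does what I want, an elegant way to formulate my XPaths, a way to register the document's namespaces under prefixes automatically, or a hidden option in the builtin XML implementations or in lxml that strips namespaces completely. Clarification follows, unless you already know what I want :)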

Example document:

<root xmlns="http://really-long-namespace.uri"
  xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>

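What I can do

The ElementTree API is the only builtin one (that I know of) that provides XPath queries. But it requires me to use "UNames", which look like this: /{http://really-long-namespace.uri}root/{http://with-ambivalent.end/#}elem

As you can see, these are quite verbose. I can shorten them by doing the following: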

default_ns = "http://really-long-namespace.uri"
other_ns   = "http://with-ambivalent.end/#"
doc.find("/{{{0}}}root/{{{1}}}elem".format(default_ns, other_ns))

But this is both {{{ugly}}} and fragile, since http…end/# ≅ http…end# ≅ http…end/ ≅ http…end, and who am I to know which variant will be used?
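As a point of comparison for the `format` approach above, the builtin ElementTree `find` also accepts a prefix-to-URI mapping as its second argument, which at least removes the brace escaping. The sketch below is illustrative only: the prefixes `d` and `o` and the name `NS` are assumptions chosen by the caller, not taken from the document, and the full URIs still have to be known in advance.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns="http://really-long-namespace.uri"
  xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>"""

# Caller-chosen prefixes (hypothetical); they need not match the document's.
NS = {
    "d": "http://really-long-namespace.uri",
    "o": "http://with-ambivalent.end/#",
}

root = ET.fromstring(DOC)
# find() on an element is relative to that element, so the path starts at its children.
elem = root.find("o:elem", NS)
print(elem.tag)
```

This keeps the query readable, but it does not address the fragility: the URIs in `NS` must still match the document character for character.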

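lxml also supports namespace prefixes, but it neither uses the ones declared in the document nor provides an automated way to deal with default namespaces. I would still need to get one element of each namespace in order to retrieve that namespace from the document. Namespace attributes are not preserved, so there is no way to retrieve the namespaces automatically from those either.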
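One way to recover the document's own prefixes with only the standard library is to listen for `start-ns` events while parsing with `ElementTree.iterparse`. This is a rough sketch, not a complete solution: `parse_with_prefixes` is a hypothetical helper name, and it keeps only the first declaration of each prefix, so a document that rebinds a prefix further down the tree would need more care.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<root xmlns="http://really-long-namespace.uri"
  xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>"""

def parse_with_prefixes(source):
    """Parse `source` and return (root, prefixes), where prefixes maps
    each prefix declared in the document to its namespace URI."""
    prefixes = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # the default namespace is reported with prefix ""
            prefixes.setdefault(prefix, uri)  # keep the first declaration only
        elif root is None:
            root = payload  # the first "start" event belongs to the root element
    return root, prefixes

root, prefixes = parse_with_prefixes(io.BytesIO(DOC))
elem = root.find("other:elem", prefixes)
print(prefixes)
print(elem.tag)
```

The resulting mapping can be passed straight to `find` or `findall`, so the query can use the prefixes exactly as they appear in the document.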

There is also a namespace-agnostic way to write XPath queries, but it is both verbose/ugly and unavailable in the builtin implementation: /*[local-name() = 'root']/*[local-name() = 'elem']
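For readers on newer interpreters: since Python 3.8 the builtin ElementTree path syntax accepts a `{*}` wildcard in place of a namespace, which covers the simple cases without `local-name()`. The sketch below assumes Python 3.8 or later; on older versions the same path does not match.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns="http://really-long-namespace.uri"
  xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>"""

root = ET.fromstring(DOC)
# "{*}elem" matches elements named "elem" in any namespace (Python 3.8+).
matches = root.findall(".//{*}elem")
print([el.tag for el in matches])
```

This is still not full XPath, so predicates and functions beyond the ElementTree subset remain unavailable.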

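What I want to do

I want to find a library, an option, or a generic XPath-morphing function that achieves the examples above by typing little more than the following…

Unnamespaced: /root/elem
Namespace prefixes from the document: /root/other:elem

…plus maybe some statement saying that I do indeed want to use the document's prefixes or strip the namespaces.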
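The "strip the namespaces" option can be approximated after parsing by rewriting each element's tag. This is only a sketch under stated assumptions: `strip_namespaces` is a hypothetical helper, it leaves attribute names untouched, and it is lossy, because elements that differ only by namespace become indistinguishable.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns="http://really-long-namespace.uri"
  xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>"""

def strip_namespaces(root):
    """Remove the "{uri}" part from every element tag, in place."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root

root = strip_namespaces(ET.fromstring(DOC))
print(root.tag)
print(root.find("elem").tag)
```

After this step the unnamespaced form of the query works with the builtin API, at the cost of discarding the namespace information for good.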

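Further clarification: although my current use case is as simple as that, I will have to handle more complex ones in the future.

Thanks for reading!

The user samplebias directed my attention to py-dom-xpath, which is exactly what I was looking for. My actual code now looks like this: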

import xml.dom.minidom
import xpath  # provided by py-dom-xpath

# parse the document into a DOM tree
rdf_tree = xml.dom.minidom.parse("install.rdf")
#read the default namespace and prefix from the root node
context = xpath.XPathContext(rdf_tree)

name    = context.findvalue("//em:id", rdf_tree)
version = context.findvalue("//em:version", rdf_tree)

#<Description/> inherits the default RDF namespace
resource_nodes = context.find("//Description/following-sibling::*", rdf_tree)

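Consistent with the document, simple, namespace-aware; perfect.

Answer:

The *[local-name() = "elem"] syntax should work, but to make it easier you can create a function that simplifies building partial or full "wildcard namespace" XPath expressions.

I'm using python-lxml 2.2.4 on Ubuntu 10.04, and the script below works for me. You will need to customize the behavior depending on how you want to specify the default namespace for each element, and handle any other XPath syntax you want to fold into the expression: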

import lxml.etree

def xpath_ns(tree, expr):
    "Parse a simple expression and prepend namespace wildcards where unspecified."
    qual = lambda n: n if not n or ':' in n else '*[local-name() = "%s"]' % n
    expr = '/'.join(qual(n) for n in expr.split('/'))
    nsmap = dict((k, v) for k, v in tree.nsmap.items() if k)
    return tree.xpath(expr, namespaces=nsmap)

doc = '''<root xmlns="http://really-long-namespace.uri"
    xmlns:other="http://with-ambivalent.end/#">
    <other:elem/>
</root>'''

tree = lxml.etree.fromstring(doc)
print(xpath_ns(tree, '/root'))
print(xpath_ns(tree, '/root/elem'))
print(xpath_ns(tree, '/root/other:elem'))

Output:

[<Element {http://really-long-namespace.uri}root at 23099f0>]
[<Element {http://with-ambivalent.end/#}elem at 2309a48>]
[<Element {http://with-ambivalent.end/#}elem at 2309a48>]
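As one example of the customization mentioned above, the expression-building step can be made a little more defensive. The `qual` lambda in the script wraps every unprefixed step, so steps such as `*`, `.`, `..`, or `@id` would be rewritten as well. The sketch below is a pure string transformation under a hypothetical name, `wildcard_ns`. It leaves those steps alone, along with anything containing a prefix, an axis, a function call, or a predicate. It is still not an XPath parser: a predicate that itself contains `/` will be split incorrectly.

```python
def wildcard_ns(expr):
    """Rewrite plain, unprefixed name steps as local-name() tests."""
    def qual(step):
        untouched = (
            not step                     # empty piece produced by "/" or "//"
            or ":" in step               # prefixed name or explicit axis
            or step in (".", "..", "*")  # abbreviations and the wildcard
            or step.startswith("@")      # attribute step
            or "(" in step               # node test or function such as text()
            or "[" in step               # step with a predicate, left as written
        )
        return step if untouched else '*[local-name() = "%s"]' % step
    return "/".join(qual(step) for step in expr.split("/"))

print(wildcard_ns("/root/elem"))
print(wildcard_ns("/root/other:elem"))
print(wildcard_ns("//elem/@id"))
```

The returned string is meant to be handed to an engine that supports `local-name()`, such as the `tree.xpath(...)` call used in the script above.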

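Update: if you find that you really do need to parse XPath, you can check out projects like py-dom-xpath, which is a pure-Python implementation of (most of) XPath 1.0. At the very least, it will give you some idea of how complex parsing XPath is.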
