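For a long time, handling XML in the shell has been a headache for me. Matching with regexes is possible, but it falls short as soon as the requirement gets complex. Take this problem: find the node whose child element item has the value value, then replace the value of another child element of that same node, item2, with a new value, value2.
For a task like this, regex processing with sed is painful. Even with sed's pattern space it stays cumbersome, and maintaining the result later becomes a problem of its own.
Yesterday I looked around and found a few candidates: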
xmllint
xml-coreutils
xmlstarlet
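xmllint felt too dated to me and is thinly documented, with nothing but a manual page, so I was reluctant to rely on it. Let's look at xml-coreutils first.
Its manual and documentation are reasonably complete. It follows the design of the coreutils package (see its documentation for details) and provides commands such as xml-ls and xml-cat. It looks good on paper, but using it is another story. The biggest problem is its incomplete XPath support: no attributes, no conditions (predicates), no functions. It also seems to have trouble with large files and frequently reports errors. I gave up on it in the end and do not recommend it. For example, with the XML below, xml-coreutils does not support an XPath like "/bookstore/book/@category", which means it cannot pick out the category attribute at all.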
books.xml
[XML listing lost its markup in extraction; surviving text of one book entry: Everyday Italian, Giada De Laurentiis, 2005, 30.00]
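Now for the proper introduction of xmlstarlet (see its official site). Unlike xml-coreutils, it does not follow the coreutils style of many small tools. It is a single large program that exposes different functions through subcommands. That is only a minor difference, though. The main one is that xmlstarlet's support for the XPath specification is quite thorough: it covers most of what I need day to day, and offers things I had not even thought of. That is the biggest difference between xmlstarlet and xml-coreutils.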
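To make the XPath point concrete, here is a minimal sketch of the three things xml-coreutils was said to lack: attributes, conditions, and functions. It assumes the binary is installed as xmlstarlet (some distributions name it xml). The books.xml it creates is hypothetical: the element names title, author, year and price, the category values, and the whole second book are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sample file: element names, category values and the second
# book are assumptions, not the original listing.
workdir=$(mktemp -d)
cat > "$workdir/books.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="example">
    <title>Hypothetical Second Book</title>
    <author>Jane Doe</author>
    <year>2010</year>
    <price>40.00</price>
  </book>
</bookstore>
EOF

# Attribute: iterate over every book (-m) and print its category attribute.
categories=$(xmlstarlet sel -t -m "/bookstore/book" -v "@category" -n "$workdir/books.xml")

# Condition: title of the book whose price is greater than 35.
expensive=$(xmlstarlet sel -t -v "/bookstore/book[price > 35]/title" -n "$workdir/books.xml")

# Function: count the book elements.
total=$(xmlstarlet sel -t -v "count(/bookstore/book)" -n "$workdir/books.xml")

printf 'categories:\n%s\n' "$categories"
printf 'expensive: %s\n' "$expensive"
printf 'total: %s\n' "$total"
```

The sel subcommand builds a template (-t) from the options that follow it, so -m, -v and -n can be combined freely within one query.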
XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.
This set of command line utilities can be used by those who deal with many XML documents on UNIX shell command prompt as well as for automated XML processing with shell scripts.
The toolkit’s feature set includes options to:
Check or validate XML files (simple well-formedness check, DTD, XSD, RelaxNG)
Calculate values of XPath expressions on XML files (such as running sums, etc)
Search XML files for matches to given XPath expressions
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
Query XML documents (ex. query for value of some elements of attributes, sorting, etc)
Modify or edit XML documents (ex. delete some elements)
Format or “beautify” XML documents (as changing indentation, etc)
Fetch XML documents using http:// or ftp:// URLs
Browse tree structure of XML documents (in similar way to ‘ls’ command for directories)
Include one XML document into another using XInclude
XML c14n canonicalization
Escape/unescape special XML characters in input text
Print directory as XML document
Convert XML into PYX format (based on ESIS – ISO 8879), and vice versa
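A few of the features listed above map directly onto subcommands. The following is only a rough sketch, assuming the binary is named xmlstarlet and using a tiny made-up sample.xml; it covers the well-formedness check (val), formatting (fo) and tree browsing (el).

```shell
#!/bin/sh
# Made-up sample document, for illustration only.
workdir=$(mktemp -d)
printf '<root><a>1</a><a>2</a></root>\n' > "$workdir/sample.xml"

# Well-formedness check: -w checks well-formedness only, -q suppresses the
# per-file report so that only the exit status is used.
if xmlstarlet val -w -q "$workdir/sample.xml"; then
  wellformed=yes
else
  wellformed=no
fi

# Format ("beautify") the document, indenting with two spaces.
formatted=$(xmlstarlet fo -s 2 "$workdir/sample.xml")

# Browse the element structure; -u prints each distinct path once, sorted.
paths=$(xmlstarlet el -u "$workdir/sample.xml")

printf 'well-formed: %s\n' "$wellformed"
printf '%s\n' "$formatted"
printf '%s\n' "$paths"
```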
XMLStarlet command line utility is written in C and uses libxml2 and libxslt from http://xmlsoft.org/.
Implementation of extensive choice of options for XMLStarlet utility was only possible because of rich feature set of libxml2 and libxslt (many thanks to the developers of those libraries for great work).
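Coming back to the problem from the beginning (find the node whose item equals value and set its item2 to value2), here is a minimal sketch using the ed subcommand. The data.xml layout, with a nodes root containing node children, is a hypothetical structure invented for this example, and the binary is again assumed to be installed as xmlstarlet.

```shell
#!/bin/sh
# Hypothetical input: the nodes/node wrapper elements are assumptions.
workdir=$(mktemp -d)
cat > "$workdir/data.xml" <<'EOF'
<?xml version="1.0"?>
<nodes>
  <node><item>value</item><item2>old</item2></node>
  <node><item>other</item><item2>untouched</item2></node>
</nodes>
EOF

# -u selects the nodes to update with an XPath predicate, -v gives the new value.
# ed writes the edited document to stdout; the input file is left unchanged.
xmlstarlet ed -u "/nodes/node[item='value']/item2" -v "value2" \
  "$workdir/data.xml" > "$workdir/updated.xml"

# Read both item2 values back to confirm that only the matching node changed.
changed=$(xmlstarlet sel -t -v "/nodes/node[item='value']/item2" "$workdir/updated.xml")
kept=$(xmlstarlet sel -t -v "/nodes/node[item='other']/item2" "$workdir/updated.xml")

printf 'changed: %s\nkept: %s\n' "$changed" "$kept"
```

Compared with a sed script, the predicate [item='value'] states the condition directly, which is the part that is hard to keep maintainable with regexes.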