12 XML


XML is a significant markup language mainly intended as a means of serialising data structures as a text document. Go has basic support for XML document processing.




XML is now a widespread way of representing complex data structures serialised into text format. It is used to describe documents such as DocBook and XHTML. It is used in specialised markup languages such as MathML and CML (Chemistry Markup Language). It is used to encode data as SOAP messages for Web Services, and the Web Service can be specified using WSDL (Web Services Description Language).


At the simplest level, XML allows you to define your own tags for use in text documents. Tags can be nested and can be interspersed with text. Each tag can also contain attributes with values. For example,


<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
  </email>
  <email type="work">
  </email>
</person>

The structure of any XML document can be described in a number of ways:


  • A document type definition (DTD) is good for describing structure
  • XML Schema is good for describing the data types used by an XML document
  • RELAX NG is proposed as an alternative to both

There is argument over the relative value of each way of defining the structure of an XML document. We won't buy into that, as Go does not support any of them. Go cannot check the validity of a document against a schema, only its well-formedness.
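To make the distinction concrete, here is a minimal sketch that checks well-formedness only. The helper name wellFormed is hypothetical (it is not part of the xml package); it simply reads tokens until the input ends and reports the first error. A document with mismatched or unclosed tags is rejected, while any well-formed document is accepted no matter what a schema might demand.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// wellFormed is a hypothetical helper: it reads tokens until the end of
// the input and returns the first error reported by the decoder, if any.
func wellFormed(doc string) error {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		_, err := d.Token()
		if err == io.EOF {
			return nil // reached the end without a syntax error
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	fmt.Println(wellFormed("<a><b>text</b></a>")) // properly nested
	fmt.Println(wellFormed("<a><b>text</a></b>")) // end tags in the wrong order
}
```

No schema is consulted at any point, so the element names themselves are never judged.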


Four topics are discussed in this chapter: parsing an XML stream, unmarshalling XML into Go data, marshalling Go data into XML, and XHTML.


Parsing XML


Go has an XML parser which is created using NewDecoder from the encoding/xml package. This takes an io.Reader as parameter and returns a pointer to Decoder. The main method of this type is Token, which returns the next token in the input stream. The token is one of the types StartElement, EndElement, CharData, Comment, ProcInst or Directive.


The types are



The type StartElement is a structure with two fields:

type StartElement struct {
    Name Name
    Attr []Attr
}

type Name struct {
    Space, Local string
}

type Attr struct {
    Name  Name
    Value string
}

EndElement is also a structure:


type EndElement struct {
    Name Name
}

CharData represents the text content enclosed by a tag and is a simple type:


type CharData []byte

Comment is similarly a simple type:


type Comment []byte

A ProcInst represents an XML processing instruction of the form <?target inst?>

type ProcInst struct {
    Target string
    Inst   []byte
}

A Directive represents an XML directive of the form <!text>. The bytes do not include the <! and > markers.

type Directive []byte
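As a small illustration of these types, the following sketch walks the tokens of a short fragment and collects the attributes carried by each StartElement. The function name listAttrs and the sample document (including the example.com address) are made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// listAttrs returns "element.attribute=value" for every attribute found
// on a start element. It is an illustrative helper, not a library call.
func listAttrs(doc string) ([]string, error) {
	var out []string
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		// Only StartElement tokens carry attributes.
		if start, ok := tok.(xml.StartElement); ok {
			for _, a := range start.Attr {
				out = append(out, start.Name.Local+"."+a.Name.Local+"="+a.Value)
			}
		}
	}
}

func main() {
	attrs, err := listAttrs(`<email type="work">someone@example.com</email>`)
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		return
	}
	fmt.Println(attrs)
}
```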

A program to print out the tree structure of an XML document is


/* Parse XML
 */

package main

import (
  "bytes"
  "encoding/xml"
  "fmt"
  "io"
  "os"
)

func main() {
  if len(os.Args) != 2 {
    fmt.Println("Usage: ", os.Args[0], "file")
    os.Exit(1)
  }
  file := os.Args[1]
  data, err := os.ReadFile(file)
  checkError(err)
  r := bytes.NewReader(data)

  parser := xml.NewDecoder(r)
  depth := 0
  for {
    token, err := parser.Token()
    if err != nil {
      // io.EOF marks the normal end of the document
      if err != io.EOF {
        checkError(err)
      }
      break
    }
    switch t := token.(type) {
    case xml.StartElement:
      printElmt(t.Name.Local, depth)
      depth++
    case xml.EndElement:
      // print the end tag at the same depth as its start tag
      depth--
      printElmt(t.Name.Local, depth)
    case xml.CharData:
      printElmt("\""+string(t)+"\"", depth)
    case xml.Comment:
      printElmt("Comment", depth)
    case xml.ProcInst:
      printElmt("ProcInst", depth)
    case xml.Directive:
      printElmt("Directive", depth)
    default:
      fmt.Println("Unknown")
    }
  }
}

// printElmt prints s indented by two spaces per nesting level
func printElmt(s string, depth int) {
  for n := 0; n < depth; n++ {
    fmt.Print("  ")
  }
  fmt.Println(s)
}

func checkError(err error) {
  if err != nil {
    fmt.Println("Fatal error ", err.Error())
    os.Exit(1)
  }
}

Note that the parser includes all CharData, including the whitespace between tags.


If we run this program against the person data structure given earlier, the output includes the lines

      " Newmarch "
      " Jan "

Note that as no DTD or other XML specification has been used, the tokenizer correctly prints out all the white space (a DTD may specify that the whitespace can be ignored, but without it that assumption cannot be made).

There is a potential trap in using this parser. It re-uses its internal buffer for token data, so the bytes in a token are only valid until the next call to Token. Once you see a token you need to copy its value if you want to refer to it later. Go has methods such as func (c CharData) Copy() CharData, and a function CopyToken, to make a copy of data.
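A minimal sketch of the copying step, assuming we want to keep all character data until the whole document has been read. The helper name collectText is hypothetical; xml.CopyToken is the package function that copies a token of any type.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// collectText keeps the character data of a document for later use.
// Each token is copied with xml.CopyToken before it is stored, because
// the decoder may overwrite the original bytes on the next call to Token.
func collectText(doc string) ([]string, error) {
	var saved []xml.Token
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if _, ok := tok.(xml.CharData); ok {
			saved = append(saved, xml.CopyToken(tok))
		}
	}

	// The saved copies are still intact after parsing has finished.
	var texts []string
	for _, tok := range saved {
		texts = append(texts, string(tok.(xml.CharData)))
	}
	return texts, nil
}

func main() {
	texts, err := collectText("<a><b>one</b><c>two</c></a>")
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		return
	}
	fmt.Println(texts)
}
```

Converting a CharData to a string straight away also makes a copy, so the explicit copy matters mainly when tokens are stored as they are.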


Unmarshalling XML


Go provides a function Unmarshal and a method func (*Decoder) Decode to unmarshal XML into Go data structures. The unmarshalling is not perfect: Go and XML are different languages.


We consider a simple example before looking at the details. We take the XML document given earlier:


<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
  </email>
  <email type="work">
  </email>
</person>

We would like to map this onto the Go structures


type Person struct {
  Name Name
  Email []Email
}

type Name struct {
  Family string
  Personal string
}

type Email struct {
  Type string
  Address string
}

This requires several comments:


  1. Unmarshalling uses the Go reflection package. This requires that all fields be public, i.e. start with a capital letter. Earlier versions of Go used case-insensitive matching to match fields such as the XML string "name" to the field Name. Now, though, case-sensitive matching is used. To match a lower-case XML name, the structure fields must be tagged to show the XML string that will be matched against. This changes Person to
    type Person struct {
      Name Name `xml:"name"`
      Email []Email `xml:"email"`
    }
  2. While tagging of fields can attach XML strings to fields, it can't do so with the names of the structures. An additional field is required, with field name "XMLName", conventionally of type xml.Name. This only affects the top-level struct, Person
    type Person struct {
      XMLName xml.Name `xml:"person"`
      Name Name `xml:"name"`
      Email []Email `xml:"email"`
    }
  3. Repeated tags in the XML map to a slice in Go
  4. Attributes within tags will match to fields in a structure only if the Go field has the tag ",attr". This occurs with the field Type of Email, where matching the attribute "type" of the "email" tag requires `xml:"type,attr"`
  5. If an XML tag has no attributes and only has character data, then it matches a string field tagged with the same name (the match is case-sensitive). So the tag `xml:"family"` with character data "Newmarch" maps to the string field Family
  6. But if the tag has attributes, then it must map to a structure. Go assigns the character data to the field with tag `xml:",chardata"`. This occurs with the "email" data and the field Address with tag `xml:",chardata"`

A program to unmarshal the document above is


/* Unmarshal
 */

package main

import (
  "encoding/xml"
  "fmt"
  "os"
)

type Person struct {
  XMLName xml.Name `xml:"person"`
  Name    Name     `xml:"name"`
  Email   []Email  `xml:"email"`
}

type Name struct {
  Family   string `xml:"family"`
  Personal string `xml:"personal"`
}

type Email struct {
  Type    string `xml:"type,attr"`
  Address string `xml:",chardata"`
}

func main() {
  str := `<?xml version="1.0" encoding="utf-8"?>
<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
  </email>
  <email type="work">
  </email>
</person>`

  var person Person

  err := xml.Unmarshal([]byte(str), &person)
  checkError(err)

  // now use the person structure e.g.
  fmt.Println("Family name: \"" + person.Name.Family + "\"")
  fmt.Println("Second email address: \"" + person.Email[1].Address + "\"")
}

func checkError(err error) {
  if err != nil {
    fmt.Println("Fatal error ", err.Error())
    os.Exit(1)
  }
}

(Note that the spaces inside the quotes are correct: they are part of the character data in the document.) The strict rules are given in the package documentation.
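When the XML arrives on a stream rather than as a byte slice, the Decode method of Decoder fills a structure following the same rules as Unmarshal. The sketch below assumes a made-up <contacts> document and a hypothetical helper decodeContacts; the addresses are placeholders. It also shows the effect of the XMLName tag: a document whose top-level element has a different name is rejected.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Email struct {
	Type    string `xml:"type,attr"`
	Address string `xml:",chardata"`
}

// Contacts is an illustrative top-level type, not taken from the chapter.
type Contacts struct {
	XMLName xml.Name `xml:"contacts"`
	Email   []Email  `xml:"email"`
}

// decodeContacts decodes one <contacts> element. A strings.Reader is used
// here, but any io.Reader (a file, a network connection) would do.
func decodeContacts(doc string) (Contacts, error) {
	var c Contacts
	err := xml.NewDecoder(strings.NewReader(doc)).Decode(&c)
	return c, err
}

func main() {
	c, err := decodeContacts(`<contacts>
  <email type="home">a@example.com</email>
  <email type="work">b@example.com</email>
</contacts>`)
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		return
	}
	for _, e := range c.Email {
		fmt.Println(e.Type, e.Address)
	}
}
```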


Marshalling XML


Go 1 also has support for marshalling data structures into an XML document. The function is


func Marshal(v interface{}) ([]byte, error)

This was used as a check in the last two lines of the previous program.
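As a rough illustration of Marshal, the sketch below serialises a tagged Person value of the kind used for unmarshalling. The helper marshalPerson and the example.com address are made up for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type Name struct {
	Family   string `xml:"family"`
	Personal string `xml:"personal"`
}

type Email struct {
	Type    string `xml:"type,attr"`
	Address string `xml:",chardata"`
}

type Person struct {
	XMLName xml.Name `xml:"person"`
	Name    Name     `xml:"name"`
	Email   []Email  `xml:"email"`
}

// marshalPerson is an illustrative wrapper around xml.Marshal.
func marshalPerson(p Person) (string, error) {
	out, err := xml.Marshal(p)
	return string(out), err
}

func main() {
	p := Person{
		Name:  Name{Family: "Newmarch", Personal: "Jan"},
		Email: []Email{{Type: "work", Address: "someone@example.com"}},
	}
	s, err := marshalPerson(p)
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		return
	}
	fmt.Println(s)
}
```

The same struct tags drive both directions: the attr field becomes an attribute, the chardata field becomes the element text, and each slice element becomes a repeated tag. xml.MarshalIndent can be used instead when indented output is wanted.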



Earlier versions of Go had no support for marshalling a Go data
structure into XML. The rest of this section presents a simple
marshalling function that gives a basic serialisation and shows
how such a function can be built. The result can be unmarshalled
using the Go function Unmarshal of the previous section.


A straightforward but naive approach would be to write code that
walks over your data structures, printing out results as it goes.
But if it is customised to your data types, then you will need to
change the code each time the types change.


A better approach, and one that is used by the Go serialisation
libraries, is to use the reflection package.
This is a package that allows you to examine data types and
data structures from within a running program. The idea of
reflection has been present in artificial intelligence
programming for many years, but is still seen as a rather arcane
technique for mainstream languages.


Go has two principal reflection types:
reflect.Type gives information about the Go types,
while reflect.Value gives information about a
particular data value. Value has a method
Type() that can return the type.


The simplest types and values correspond to primitive types.
For example, there are the kinds reflect.Int, reflect.Bool
etc, which can be used as cases in a switch on the result of
Kind() to determine the precise kind of a Type. A Value has
corresponding methods such as Int() and Bool() to return
the value.
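In Go 1 this is expressed through the Kind method of the reflect package. A minimal sketch, in which the helper name describe is hypothetical: it switches on the kind of a value and uses the matching accessor to read the value back.

```go
package main

import (
	"fmt"
	"reflect"
)

// describe reports the kind of a value and, for a few primitive kinds,
// its contents. Other kinds are reported by type name only.
func describe(x interface{}) string {
	v := reflect.ValueOf(x)
	switch v.Kind() {
	case reflect.Int:
		return fmt.Sprintf("int %d", v.Int())
	case reflect.Bool:
		return fmt.Sprintf("bool %t", v.Bool())
	case reflect.String:
		return fmt.Sprintf("string %q", v.String())
	default:
		return "other kind: " + v.Type().String()
	}
}

func main() {
	fmt.Println(describe(42))
	fmt.Println(describe(true))
	fmt.Println(describe("Jan"))
	fmt.Println(describe(3.5))
}
```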


A struct type is more complex. A Type whose kind is
reflect.Struct has methods to access the fields, such as

Field(i int) StructField

and a StructField has fields such as Name, which holds
the name of the struct field. This is useful for examining
the type structure.


A Value whose kind is reflect.Struct is useful for examining
the value of fields of a data value. It has a method

func (v Value) Field(i int) Value

which can be used to extract the value of each field.
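With the Go 1 reflect package, both lookups go through Field: on a reflect.Type it returns a StructField describing the field, and on a reflect.Value it returns the field's value. The sketch below pairs the two; the helper name fieldList is made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

type Name struct {
	Family   string
	Personal string
}

// fieldList returns "FieldName=value" for each field of a struct value,
// or nil if x is not a struct.
func fieldList(x interface{}) []string {
	v := reflect.ValueOf(x)
	t := v.Type()
	if t.Kind() != reflect.Struct {
		return nil
	}
	var out []string
	for i := 0; i < t.NumField(); i++ {
		// t.Field(i) describes the field; v.Field(i) holds its value.
		// fmt prints the concrete value held by a reflect.Value.
		out = append(out, fmt.Sprintf("%s=%v", t.Field(i).Name, v.Field(i)))
	}
	return out
}

func main() {
	fmt.Println(fieldList(Name{Family: "Newmarch", Personal: "Jan"}))
}
```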


The reflection process is basically started by calling
reflect.ValueOf on a data object, and then examining
its type and recursively walking through the values.
What we do with each value is to surround it by tags,
made of field names of the structures encountered.


There are two complexities: the first is that the initial
data value will typically be a structure, and this doesn't
have a field name as it is not itself part of a structure.
For this starting case, we use the type name of the structure
as the XML tag name.


The second complexity comes with arrays or slices. In this case
we need to work through each element of the array/slice,
each time repeating the field name from the enclosing structure.


We define three functions. The first, Marshal, takes an initial
data value. It prepares the XML document and creates the
top-level tag from the structure's type name.
The second function, MarshalValue, recurses through the
type values, switching on data types and writing tags from
field names and values as XML character data.
The third function, MarshalSliceValue, handles the special case
of slices, as the tag name needs to be kept for all of the
elements of the slice.


We ignore pointers, channels, etc. We also do not produce
attributes, just tags and character data.
The program is

/* Marshal
 */

package main

import (
  "bytes"
  "fmt"
  "io"
  "os"
  "reflect"
)

type Person struct {
  Name  Name
  Email []Email
}

type Name struct {
  Family   string
  Personal string
}

type Email struct {
  Kind    string "attr"
  Address string "chardata"
}

func main() {
  person := Person{
    Name: Name{Family: "Newmarch", Personal: "Jan"},
    Email: []Email{Email{Kind: "home", Address: "jan"},
      Email{Kind: "work", Address: "jan"}}}

  buff := bytes.NewBuffer(nil)
  Marshal(person, buff)
  fmt.Println(buff.String())
}

func Marshal(e interface{}, w io.Writer) {
  // make it a legal XML document
  // (version 1.0, since Go's own parser rejects other versions)
  _, err := w.Write([]byte("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"))
  checkError(err)

  // top-level e is a value and has no structure field,
  // so use its type
  typ := reflect.TypeOf(e)
  name := typ.Name()

  startTag(name, w)
  MarshalValue(reflect.ValueOf(e), w)
  endTag(name, w)
}

func MarshalValue(v reflect.Value, w io.Writer) {
  t := v.Type()
  switch t.Kind() {
  case reflect.Struct:
    for n := 0; n < t.NumField(); n++ {
      field := t.Field(n)

      // special case if it is a slice
      if v.Field(n).Kind() == reflect.Slice {
        // slice: repeat the field name as the tag of each element
        MarshalSliceValue(field.Name, v.Field(n), w)
      } else {
        // not a slice
        startTag(field.Name, w)
        MarshalValue(v.Field(n), w)
        endTag(field.Name, w)
      }
    }
  case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64, reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
    // numeric values are not written by this simple version
  case reflect.Bool:
    // boolean values are not written either
  case reflect.String:
    w.Write([]byte("   " + v.String() + "\n"))
  }
}

func MarshalSliceValue(tag string, v reflect.Value, w io.Writer) {
  for n := 0; n < v.Len(); n++ {
    startTag(tag, w)
    MarshalValue(v.Index(n), w)
    endTag(tag, w)
  }
}

func startTag(s string, w io.Writer) {
  w.Write([]byte("<" + s + ">\n"))
}

func endTag(s string, w io.Writer) {
  w.Write([]byte("</" + s + ">\n"))
}

func checkError(err error) {
  if err != nil {
    fmt.Println("Fatal error ", err.Error())
    os.Exit(1)
  }
}




XHTML


HTML does not conform to XML syntax. It has unterminated tags such as '<br>'. XHTML is a cleanup of HTML to make it compliant with XML. Documents in XHTML can be managed using the techniques above for XML.



There is some support in the XML package to handle HTML documents even though they are not XML-compliant. The XML parser discussed earlier can handle many HTML documents if it is configured as follows:


  parser := xml.NewDecoder(r)
  parser.Strict = false
  parser.AutoClose = xml.HTMLAutoClose
  parser.Entity = xml.HTMLEntity
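A minimal sketch of these settings in use, assuming a small HTML fragment with an unterminated <br> and an HTML entity. The helper name elementNames is hypothetical; it returns the names of the start elements the relaxed parser sees.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementNames returns the local names of all start elements in an HTML
// fragment, using the relaxed decoder settings.
func elementNames(html string) ([]string, error) {
	parser := xml.NewDecoder(strings.NewReader(html))
	parser.Strict = false
	parser.AutoClose = xml.HTMLAutoClose
	parser.Entity = xml.HTMLEntity

	var names []string
	for {
		tok, err := parser.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return names, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			names = append(names, start.Name.Local)
		}
	}
}

func main() {
	// <br> is never closed and &nbsp; is not a predefined XML entity.
	names, err := elementNames("<p>Hello<br>world&nbsp;!</p>")
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		return
	}
	fmt.Println(names)
}
```

With the default strict settings the same fragment is reported as a syntax error, because the end tag for p arrives while br is still open.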



Conclusion


Go has basic support for dealing with XML strings. It does not as yet have mechanisms for dealing with XML specification languages such as XML Schema or RELAX NG.
