4 Data serialisation
Communication between a client and a service requires the exchange of data. This data may be highly structured, but has to be serialised for transport. This chapter looks at the basics of serialisation and then considers several techniques supported by Go APIs.
客户端与服务之间通过数据交换来通信。因为数据可能是高度结构化的,所以在传输前必须进行序列化。这一章将研究序列化基础并介绍一些Go API提供的序列化技术。
Introduction
简介
A client and server need to exchange information via messages. TCP and UDP provide the transport mechanisms to do this. The two processes also have to have a protocol in place so that message exchange can take place meaningfully.
客户端与服务器需要通过消息来交换信息。TCP与UDP是消息传递的两种机制,在这两种机制之上就需要有合适的协议来约定传输的内容的含义。
Messages are sent across the network as a sequence of bytes, which has no structure except for a linear stream of bytes. We shall address the various possibilities for messages and the protocols that define them in the next chapter. In this chapter we concentrate on a component of messages - the data that is transferred.
在网络上,消息被当作字节序列来传输,它们是没有结构的,仅仅只是一串字节流。我们将在下一章讨论定义消息与协议涉及到的的各种问题。本章,我们只重点关注消息的一个方面 - 被传输的数据
A program will typically build complex data structures to hold the current program state. In conversing with a remote client or service, the program will be attempting to transfer such data structures across the network - that is, outside of the application's own address space.
程序通常构造一个复杂的数据结构来保存其自身当前的状态。在与远程的客户端或服务的交互中,程序会通过网络将这样的数据结构传输到 -应用程序所在的地址空间之外的地方
Programming languages use structured data such as
编程语言使用的结构化的数据类型有
- records/structures
- variant records
- array - fixed size or varying
- string - fixed size or varying
- tables - e.g. arrays of records
- non-linear structures such as
- circular linked list
- binary tree
- objects with references to other objects
- 记录/结构
- 可变记录
- 数组 - 固定大小或可变大小
- 字符串 - 固定大小或可变大小
- 表 - 例如:记录构成的数组
- 非线程结构,比如
- 循环链表
- 二叉树
- 含有其他对象引用的对象
None of IP, TCP or UDP packets know the meaning of any of these data types. All that they can contain is a sequence of bytes. Thus an application has to serialise any data into a stream of bytes in order to write it, and deserialise the stream of bytes back into suitable data structures on reading it. These two operations are known as marshalling and unmarshalling respectively.
IP,TCP或者UDP网络包并不知道这些数据类型的含义,它们只是字节序列的载体。因此,写入网络包的时候,应用需要将要传输的(有类型的)数据 序列化 成字节流,反之,读取网络包的时候,应用需要将字节流反序列化成合适的数据结构,这两个操作被分别称为编组和解组。
For example, consider sending the following variable length table of two columns of variable length strings:
例如:考虑发送如下这样一个由两列可变长度字符串构成的可变长度的表格
fred | programmer |
liping | analyst |
sureerat | manager |
This could be done in various ways. For example, suppose that it is known that the data will be an unknown number of rows in a two-column table. Then a marshalled form could be
这可以通过多种方式来完成。比如:假设知道数据是一个未知行数的两列表格,那么编组形式可能是:
3 // 3 rows, 2 columns assumed
4 fred // 4 char string,col 1
10 programmer // 10 char string,col 2
6 liping // 6 char string, col 1
7 analyst // 7 char string, col 2
8 sureerat // 8 char string, col 1
7 manager // 7 char string, col 2
Variable length things can alternatively have their length indicated by terminating them with an "illegal" value, such as '\0' for strings:
可变长度的事物都可以通过用一个“非法”的终结值,比如对于字符串来说的'\0',来间接获得它们的长度
3
fred\0
programmer\0
liping\0
analyst\0
sureerat\0
manager\0
Alternatively, it may be known that the data is a 3-row fixed table of two columns of strings of length 8 and 10 respectively. Then a serialisation could be
假设知道数据是一个三行两列且每列长度分别是8或10的表格,那么序列化的结果可能是:
fred\0\0\0\0
programmer
liping\0\0
analyst\0\0\0
sureerat
manager\0\0\0
Any of these formats is okay - but the message exchange protocol must specify which one is used, or allow it to be determined at runtime.
这些格式中的任意一种都是可行的 - 但是消息交换协议必须指定使用哪一种(格式),或者约定在运行期再做决定。
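The length-prefixed format above can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the row count and every string length are assumed to fit in a single byte, and the names Row, marshalTable and unmarshalTable are invented for this sketch rather than taken from any Go package.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Row is one line of the two-column table (hypothetical type for this sketch).
type Row [2]string

// marshalTable writes a one-byte row count, then each string as a
// one-byte length followed by its bytes. One-byte lengths are an assumption.
func marshalTable(rows []Row) ([]byte, error) {
	if len(rows) > 255 {
		return nil, errors.New("too many rows for a one-byte count")
	}
	var buf bytes.Buffer
	buf.WriteByte(byte(len(rows)))
	for _, r := range rows {
		for _, s := range r {
			if len(s) > 255 {
				return nil, errors.New("string too long for a one-byte length")
			}
			buf.WriteByte(byte(len(s)))
			buf.WriteString(s)
		}
	}
	return buf.Bytes(), nil
}

// unmarshalTable reverses marshalTable. It must know the format in advance.
func unmarshalTable(data []byte) ([]Row, error) {
	if len(data) == 0 {
		return nil, errors.New("empty input")
	}
	rows := make([]Row, int(data[0]))
	data = data[1:]
	for i := range rows {
		for j := range rows[i] {
			if len(data) == 0 {
				return nil, errors.New("truncated input")
			}
			n := int(data[0])
			if len(data) < 1+n {
				return nil, errors.New("truncated input")
			}
			rows[i][j] = string(data[1 : 1+n])
			data = data[1+n:]
		}
	}
	return rows, nil
}

func main() {
	table := []Row{{"fred", "programmer"}, {"liping", "analyst"}, {"sureerat", "manager"}}
	data, err := marshalTable(table)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	back, err := unmarshalTable(data)
	fmt.Println(len(data), back, err)
}
```

Both sides have to agree on every detail (one-byte counts, two columns, no terminators), which is exactly the agreement a protocol has to spell out.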
Mutual agreement
交互协议
The previous section gave an overview of the issue of data serialisation. In practice, the details can be considerably more complex. For example, consider the first possibility, marshalling a table into the stream
前一小节总结了在数据序列化过程中可能遇到的各种问题。而在实际操作中,需要考虑的细节还更多一些,例如:先考虑下面这个问题,如何将下面这个表编组成流.
3
4 fred
10 programmer
6 liping
7 analyst
8 sureerat
7 manager
Many questions arise. For example, how many rows are possible for the table - that is, how big an integer do we need to describe the row size? If it is 255 or less, then a single byte will do, but if it is more, then a short, integer or long may be needed. A similar problem occurs for the length of each string. With the characters themselves, to which character set do they belong? 7 bit ASCII? 16 bit Unicode? The question of character sets is discussed at length in a later chapter.
许多问题冒出来了。例如:这个表格可能有多少行?- 即我们需要多大的整数来表示表格的大小,如果它只有255行或者更少,那么一个字节就够了,如果更大一些,就可能需要short,integer或者long来表示了。对于字符串的长度也存在同样的问题,对字符本身来说,它们属于哪种字符集? 7位的ASCII?16位的Unicode?字符集的问题将会在后面的章节里详细讨论。
The above serialisation is opaque or implicit. If data is marshalled using the above format, then there is nothing in the serialised data to say how it should be unmarshalled. The unmarshalling side has to know exactly how the data is serialised in order to unmarshal it correctly. For example, if the number of rows is marshalled as an eight-bit integer, but unmarshalled as a sixteen-bit integer, then an incorrect result will occur as the receiver tries to unmarshal the bytes 3 and 4 as a single sixteen-bit integer, and the receiving program will almost certainly fail later.
上面的序列化是不透明的或者被称为隐式的,如果采用这种格式来编组数据,那么序列化后的数据中没有包含任何指示它应该被如何解组的信息。为了正确的解组,解组的一端需要精确的知晓编组的方式。如果数据的行数以8位整型数的方式编组,却以16位整型的方式解组,那么接收者将得到错误的解码结果。比如接受者尝试将3与4当作16位整型解组,在后续的程序运行的时候肯定会失败。
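The eight-bit versus sixteen-bit mismatch is easy to reproduce. The sketch below assumes the receiver reads the sixteen-bit value in big-endian (network) byte order; the function names are made up for the illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rowCountAs8 reads the first byte as the row count, as the sender intended.
func rowCountAs8(stream []byte) uint8 {
	return stream[0]
}

// rowCountAs16 wrongly treats the first two bytes as one big-endian
// 16-bit count, swallowing the length byte of the first string.
func rowCountAs16(stream []byte) uint16 {
	return binary.BigEndian.Uint16(stream)
}

func main() {
	// Start of the marshalled table: 3 rows, then "4 fred".
	stream := []byte{3, 4, 'f', 'r', 'e', 'd'}
	fmt.Println(rowCountAs8(stream))  // prints 3
	fmt.Println(rowCountAs16(stream)) // prints 772, that is 3*256 + 4
}
```

Nothing in the byte stream reveals the mistake; the receiver simply goes on to look for 772 rows.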
An early well-known serialisation method is XDR (external data representation) used by Sun's RPC, later known as ONC (Open Network Computing). XDR is defined by RFC 1832 and it is instructive to see how precise this specification is. Even so, XDR is inherently type-unsafe as serialised data contains no type information. The correctness of its use in ONC is ensured primarily by compilers generating code for both marshalling and unmarshalling.
早期比较出名的序列化方法是Sun公司的RPC中使用的XDR(外部资料表示法)。后来就是ONC(开放式网络运算)。XDR由 RFC 1832定义,阅读一下这个规范的详细定义是有意义的,即便如此,由于序列化的数据中不包含类型信息,XDR是天生不安全的。ONC中主要通过由编译器为编、解组生成额外的代码来确保数据的正确性。
Go contains no explicit support for marshalling or unmarshalling opaque serialised data. The RPC package in Go does not use XDR, but instead uses "gob" serialisation, described later in this chapter.
Go没有为编、解组不透明的序列化数据提供显式的支持,标准包中的RPC包也没有使用XDR,而是使用了这一章后面的小节中将要介绍的gob来作为替代方案。
Self-describing data
自描述数据
Self-describing data carries type information along with the data. For example, the previous data might get encoded as
自描述数据在最终的结果数据中附带了类型信息,例如,前面提到的数据可能被编码为:
table
uint8 3
uint 2
string
uint8 4
[]byte fred
string
uint8 10
[]byte programmer
string
uint8 6
[]byte liping
string
uint8 7
[]byte analyst
string
uint8 8
[]byte sureerat
string
uint8 7
[]byte manager
Of course, a real encoding would not normally be as cumbersome and verbose as in the example: small integers would be used as type markers and the whole data would be packed in as small a byte array as possible (XML provides a counter-example, though). However, the principle is that the marshaller will generate such type information in the serialised data. The unmarshaller will know the type-generation rules and will be able to use this to reconstruct the correct data structure.
当然,实际使用的编码方式不会如此啰嗦。小整数可能被用作类型标记,并且整个数据编码后的字节数组会尽量的小(XML是一个反例)。原则就是编组器会在序列化后的数据中包含类型信息。解组器知道类型生成的规则,并使用此规则重组出正确的数据结构。
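As a rough illustration of small integers used as type markers, here is a toy tag-length-value encoder. The tag values and the names encodeValue and decodeValue are arbitrary choices for this sketch and do not correspond to ASN.1 or to any real wire format.

```go
package main

import (
	"errors"
	"fmt"
)

// Type markers for this toy format (arbitrary values chosen for the sketch).
const (
	tagUint8  byte = 1
	tagString byte = 2
)

// encodeValue prefixes each value with a one-byte type marker.
// Strings also carry a one-byte length.
func encodeValue(v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case uint8:
		return []byte{tagUint8, x}, nil
	case string:
		if len(x) > 255 {
			return nil, errors.New("string too long for a one-byte length")
		}
		return append([]byte{tagString, byte(len(x))}, x...), nil
	}
	return nil, fmt.Errorf("unsupported type %T", v)
}

// decodeValue reads one value, using the marker to choose the Go type,
// and returns the remaining bytes.
func decodeValue(data []byte) (v interface{}, rest []byte, err error) {
	if len(data) < 2 {
		return nil, nil, errors.New("truncated input")
	}
	switch data[0] {
	case tagUint8:
		return data[1], data[2:], nil
	case tagString:
		n := int(data[1])
		if len(data) < 2+n {
			return nil, nil, errors.New("truncated string")
		}
		return string(data[2 : 2+n]), data[2+n:], nil
	}
	return nil, nil, fmt.Errorf("unknown type marker %d", data[0])
}

func main() {
	var stream []byte
	for _, v := range []interface{}{uint8(3), "fred", "programmer"} {
		b, err := encodeValue(v)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		stream = append(stream, b...)
	}
	for len(stream) > 0 {
		v, rest, err := decodeValue(stream)
		if err != nil {
			fmt.Println("decode failed:", err)
			return
		}
		fmt.Printf("%T %v\n", v, v)
		stream = rest
	}
}
```

The decoder needs to know the marker rules, but not the shape of the particular message: it discovers the types as it reads.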
ASN.1
抽象语法表示法
Abstract Syntax Notation One (ASN.1) was originally designed in 1984 for the telecommunications industry. ASN.1 is a complex standard, and a subset of it is supported by Go in the package "asn1". It builds self-describing serialised data from complex data structures. Its primary use in current networking systems is as the encoding for X.509 certificates which are heavily used in authentication systems. The support in Go is based on what is needed to read and write X.509 certificates.
抽象语法表示法/1(ASN.1)最初出现在1984年,它是一个为电信行业设计的复杂标准,Go的标准包asn1实现了它的一个子集,它可以将复杂的数据结构序列化成自描述的数据。在当前的网络系统中,它主要用于对认证系统中普遍使用的X.509证书的编码。Go对ASN.1的支持主要是X.509证书的读写上。
Two functions allow us to marshal and unmarshal data
以下两个函数用以对数据的编、解组
func Marshal(val interface{}) ([]byte, error)
func Unmarshal(b []byte, val interface{}) (rest []byte, err error)
The first marshals a data value into a serialised byte array, and the second unmarshals it. However, the argument of type interface{} deserves further examination. Given a variable of a type, we can marshal it by just passing its value. To unmarshal it, we need a variable of a named type that will match the serialised data. The precise details of this are discussed later. But we also need to make sure that memory has been allocated for a variable of that type, so that there is actually existing memory for the unmarshalling to write values into.
前一个将数据值编组成序列化的字节数组,后一个将其解组出来,需要对interface
类型的参数进行更多的类型检查。编组时,我们只需要传递某个类型的变量的值即可,解组它,则需要一个与被序列化过的数据匹配的确定类型的变量,我们将在后面讨论这部分的细节 。除了有确定类型的变量外,我们同时需要保证那个变量的内存已经被分配,以使被解组后的数据能有实际被写入的地址。
We illustrate with an almost trivial example of marshalling and unmarshalling an integer. We can pass an integer value to Marshal to return a byte array, and unmarshal the array into an integer variable as in this program:
我们将举一个整数编、解组的小例子。在这个例子中。我们先将一个整数传递给Marshal得到一个字节数组,然后又将此数组解组成一个整数。
/* ASN.1
*/
package main
import (
"encoding/asn1"
"fmt"
"os"
)
func main() {
mdata, err := asn1.Marshal(13)
checkError(err)
var n int
_, err1 := asn1.Unmarshal(mdata, &n)
checkError(err1)
fmt.Println("After marshal/unmarshal: ", n)
}
func checkError(err error) {
if err != nil {
fmt.Fprintf(os.Stderr, "Fatal error: %s", err.Error())
os.Exit(1)
}
}
The unmarshalled value is, of course, 13.
当然,被解组后的值,是13
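It can help to look at the bytes that Marshal actually produces. The following sketch (encodeInt is just a wrapper written for this illustration) prints the encoding of 13, which shows the self-describing layout: a type tag, a length, then the value.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// encodeInt returns the ASN.1 encoding of an int.
func encodeInt(n int) ([]byte, error) {
	return asn1.Marshal(n)
}

func main() {
	mdata, err := encodeInt(13)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	// 02 is the INTEGER tag, 01 the length in bytes, 0d the value 13.
	fmt.Printf("% x\n", mdata) // prints "02 01 0d"
}
```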
Once we move beyond this, things get harder. In order to manage more complex data types, we have to look more closely at the data structures supported by ASN.1, and how ASN.1 support is done in Go.
一旦我们越过了这个小关卡,事情开始变得复杂。为了管理更复杂的数据类型,我们需要更深入的了解ASN.1支持的数据类型,以及Go是如何支持ASN.1的。
Any serialisation method will be able to handle certain data types and not handle some others. So in order to determine the suitability of any serialisation such as ASN.1, you have to look at the possible data types supported versus those you wish to use in your application. The following ASN.1 types are taken from http://www.obj-sys.com/asn1tutorial/node4.html
任何序列化方法都只能处理某些数据类型,而对其他的数据类型无能为力。因此为了评估类似ASN.1等序列化方案的可行性,你必须先将要在程序中使用的数据类型与它们支持的数据类型做个比较,下面是ASN.1支持的数据类型,它们来自于http://www.obj-sys.com/asn1tutorial/node4.html
The simple types are
简单数据类型有:
- BOOLEAN: two-state variable values
- INTEGER: Model integer variable values
- BIT STRING: Model binary data of arbitrary length
- OCTET STRING: Model binary data whose length is a multiple of eight
- NULL: Indicate effective absence of a sequence element
- OBJECT IDENTIFIER: Name information objects
- REAL: Model real variable values
- ENUMERATED: Model values of variables with at least three states
- CHARACTER STRING: Models values that are strings of characters from a specified character set
- BOOLEAN:两态变量值
- INTEGER:表征整型变量值
- BIT STRING:表征任意长度的二进制数据
- OCT STRING:表征长度是8的倍数的二进制数据
- NULL:指示一个没有有效数据的序列
- OBJECT IDENTIFIER:命名信息对象
- REAL:表征一个real变量值
- ENUMERATED:表征一个至少有三个状态的变量值
- CHARACTER STRING:表征一个字符串值
Character strings can be from certain character sets
字符串可以来自于确定的字符集
- NumericString: 0,1,2,3,4,5,6,7,8,9, and space
- PrintableString: Upper and lower case letters, digits, space, apostrophe, left/right parenthesis, plus sign, comma, hyphen, full stop, solidus, colon, equal sign, question mark
- TeletexString (T61String): The Teletex character set in CCITT's T61, space, and delete
- VideotexString: The Videotex character set in CCITT's T.100 and T.101, space, and delete
- VisibleString (ISO646String): Printing character sets of international ASCII, and space
- IA5String: International Alphabet 5 (International ASCII)
- GraphicString: All registered G sets, and space
- NumericString: 0,1,2,3,4,5,6,7,8,9, 与空格(space)
- PrintableString: 大、小写字母,数字,空格,省略号,左、右小括号,加号,逗号,连字符,句号,斜线,冒号,等号,问号
- TeletexString(T61String): CCITT的Teletex字符集中的T61,空格和删除(delete)
- VideotexString:CCITT的Videotex字符集中的T.100与T.101, 空格和删除(delete)
- VisibleString (ISO646String):国际ASCII中的打印字符集和空格
- IA5String:国际字母表5(国际ASCII)
- GraphicString:所有被注册的G集和空格
And finally, there are the structured types:
最后,以下是结构化的类型:
- SEQUENCE: Models an ordered collection of variables of different type
- SEQUENCE OF: Models an ordered collection of variables of the same type
- SET: Model an unordered collection of variables of different types
- SET OF: Model an unordered collection of variables of the same type
- CHOICE: Specify a collection of distinct types from which to choose one type
- SELECTION: Select a component type from a specified CHOICE type
- ANY: Enable an application to specify the type Note: ANY is a deprecated ASN.1 Structured Type. It has been replaced with X.680 Open Type.
- SEQUENCE:表征不同类型变量构成的有序集合
- SEQUENCE OF: 表征相同类型的变量构成的有序集合
- SET: 表征不同类型的变量构成的无序集合
- SET OF:表征相同类型的变量构成的有序集合
- CHOICE:从一个不同类型构成的特定集合中选出一个类型
- SELECTION: 从一个特定的CHOICE类型中选取一个组件类型
- ANY:启用一个用以指定类型的应用. 注意:ANY是一个弃用的ASN.1结构类型,它被x.680的 Open Type所替代
Not all of these are supported by Go. Not all possible values are supported by Go. The rules as given in the Go "asn1" package documentation are
不是以上所有的类型、可能的值都被Go支持,在Go 'asn1'包文档中定义的规则如下:
- An ASN.1 INTEGER can be written to an int or int64. If the encoded value does not fit in the Go type, Unmarshal returns a parse error.
- An ASN.1 BIT STRING can be written to a BitString.
- An ASN.1 OCTET STRING can be written to a []byte.
- An ASN.1 OBJECT IDENTIFIER can be written to an ObjectIdentifier.
- An ASN.1 ENUMERATED can be written to an Enumerated.
- An ASN.1 UTCTIME or GENERALIZEDTIME can be written to a time.Time.
- An ASN.1 PrintableString or IA5String can be written to a string.
- Any of the above ASN.1 values can be written to an interface{}. The value stored in the interface has the corresponding Go type. For integers, that type is int64.
- An ASN.1 SEQUENCE OF x or SET OF x can be written to a slice if an x can be written to the slice's element type.
- An ASN.1 SEQUENCE or SET can be written to a struct if each of the elements in the sequence can be written to the corresponding element in the struct.
- ASN.1 INTEGER 可以被写入int或者int64中. 如果被编码的值与Go类型不匹配,Unmarshal将返回一个解析错误.
- ASN.1 BIT STRING 可以被写入BitString中.
- ASN.1 OCT STRING可以被写入[]byte中.
- ASN.1 OBJECT IDENTIFIER 可以被写入ObjectIdentifier中.
- ASN.1 ENUMERATED 可以被写入Enumerated中.
- ASN.1 UTCTIME或者GENERALIZEDTIME可以被写入*time.Time中.
- ASN.1 PrintableString 或者 IA5String可以被写入string中.
- 以上的任何ASN.1类型的值都可以作为对应的Go类型的值写入interface{}中。比如整数放入interface{}的话,它对应的类型是int64。
- 如果一个变量x可以被当做某个类型写入,那么ASN.1中的x构成的有序列或者集合就可以当做这个类型的slice写入了。
- 如果某个有序列或者集合中的所有元素都可以被写入到某个结构里与之对应的元素中,那么此ASN.1 SEQUENCE 或者SET就可以写入到这个结构中。
Go places real restrictions on ASN.1. For example, ASN.1 allows integers of any size, while the Go implementation will only allow up to signed 64-bit integers. On the other hand, Go distinguishes between signed and unsigned types, while ASN.1 doesn't. So for example, transmitting a value of uint64 may fail if it is too large for int64.
Go在实现上,为ASN.1添加了一些约束。例如ASN.1允许任意大小的整数,而GO只允许最大为64位有符号整数能表示的值.另一方面,Go区分有符号类型与无符号类型,而在ASN.1则没有分别.因此传递一个大于int64
最大值能表示的uint64
的值,则可能会失败。
In a similar vein, ASN.1 allows several different character sets. Go only supports PrintableString and IA5String (ASCII). ASN.1 does not support Unicode characters (which require the BMPString ASN.1 extension). The basic Unicode character set of Go is not supported, and if an application requires transport of Unicode characters, then an encoding such as UTF-7 will be needed. Such encodings are discussed in a later chapter on character sets.
同理,ASN.1允许多个不同的字符集,而Go只支持PrintableString和IA5String(ASCII). ASN.1不支持Unicode字符(它需要BMPString ASN.1扩展),连Go中的基本Unicode字符集它都不支持,如果应用程序需要传输Unicode字符,则可能需要类似UTF-7的编码。有关编码的内容将会在后边字符集相关的章节来讨论。
We have seen that a value such as an integer can be easily marshalled and unmarshalled. Other basic types such as booleans and reals can be similarly dealt with. Strings which are composed entirely of ASCII characters can be marshalled and unmarshalled. However, in the Go release this chapter was written against, a string such as "hello \u00bc", which contains the non-ASCII character '¼', causes an error: "ASN.1 structure error: PrintableString contains invalid character". (More recent releases of encoding/asn1 fall back to a UTF8String for such values.) This code works, as long as the string is only composed of printable characters:
我们已经看到,整型的值很容易被编、解组。类似的boolean与real等基本类型处理手法也类似。由ASCII字符构成的字符串也很容易。但当处理 "hello \u00bc"这种含有 '¼'这个非ASCII字符的字符串,则会出现错误:“ASN.1 结构错误:PrintableString包含非法字符”。以下的代码仅在处理由可打印字符(printable characters)构成的字符串时,工作良好。
s := "hello"
mdata, _ := asn1.Marshal(s)
var newstr string
asn1.Unmarshal(mdata, &newstr)
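The fragment above ignores the values returned by Marshal and Unmarshal. A slightly fuller sketch that checks them might look like this; roundTripString is a name invented for the example.

```go
package main

import (
	"encoding/asn1"
	"errors"
	"fmt"
)

// roundTripString marshals s and unmarshals it again, checking both
// errors and that no bytes were left over after decoding.
func roundTripString(s string) (string, error) {
	mdata, err := asn1.Marshal(s)
	if err != nil {
		return "", err
	}
	var newstr string
	rest, err := asn1.Unmarshal(mdata, &newstr)
	if err != nil {
		return "", err
	}
	if len(rest) != 0 {
		return "", errors.New("unexpected trailing data")
	}
	return newstr, nil
}

func main() {
	s, err := roundTripString("hello")
	fmt.Println(s, err)
}
```

The rest value returned by Unmarshal is worth checking: it holds any bytes that followed the first complete ASN.1 value.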
ASN.1 also includes some "useful types" not in the above list, such as UTC time. Go supports this UTC time type. This means that you can pass time values in a way that is not possible for other data values. In current Go, the function time.Now returns a time.Time value. The asn1 package has special code to marshal this, and it can be unmarshalled through a pointer to a time.Time variable. Thus this code works
ASN.1还包含一些未在上边列表中出现的“有用的类型(useful types)”, 比如UTC时间类型,GO支持此UTC时间类型。就是说你可以用一种特有的类型来传递时间值。ASN.1不支持指针,Go中却有指向时间值的指针。比如函数GetLocalTime返回*time.Time
。asn1包编组这个time结构,也使用这个包解组到一个time.Time
对象指针中。代码如下
t := time.Now()
mdata, err := asn1.Marshal(t)
var newtime = new(time.Time)
// newtime is already a pointer, so it is passed as-is
_, err1 := asn1.Unmarshal(mdata, newtime)
Here new returns a *time.Time, which is the pointer that Unmarshal needs as its destination, and Go looks after time values as a special case.
LocalTime
与new
函数都返回的是*time.Time类型的指针,GO将内部对这些特殊类型进行处理。
In general, you will probably want to marshal and unmarshal structures. Apart from the special case of time, Go will happily deal with structures, but not with pointers to structures. Operations such as new create pointers, so you have to dereference them before marshalling them. Go normally dereferences pointers for you when needed, but not in this case. Unmarshal, by contrast, always needs a pointer to write into. These both work for a type T:
除了time这种特殊情况外,你可能要编、解组结构类型。除了上面提到的Time结构外,其他的结构Go还是很好处理的。类以new
的操作将会创建指针,因此在编、解组之前,你需要解引用它。通常,Go会随需自动对指针进行解引用,但是下面这个例子并不是这么个情况。对于类型T,以下两种方式均可.
// using variables
var t1 T
t1 = ...
mdata1, _ := asn1.Marshal(t1)
var newT1 T
asn1.Unmarshal(mdata1, &newT1)
// using pointers
var t2 = new(T)
*t2 = ...
mdata2, _ := asn1.Marshal(*t2)
var newT2 = new(T)
asn1.Unmarshal(mdata2, newT2)
Any suitable mix of pointers and variables will work as well.
恰当地的使用指针与变量能让代码工作得更好。
The fields of a structure must all be exported, that is, field names must begin with an uppercase letter. Go uses the reflect package to marshal/unmarshal structures, so it must be able to examine all fields. This type cannot be marshalled:
结构的所有字段必须是公共的,即字段名必须以大写字母开头。Go内部实际是使用reflect
包来编、解组结构,因此reflect包必须能访问所有的字段。比如下面这个类型是不能被编组的:
type T struct {
Field1 int
field2 int // not exportable
}
ASN.1 only deals with the data types. It does not consider the names of structure fields. So the following type T1 can be marshalled/unmarshalled into type T2 as the corresponding fields are the same types:
ASN.1只处理数据类型,它并不关心结构字段的名字。因此只要对应的字段类型相同那么下面的T1类型将可以被解、解组到T2类型中。
type T1 struct {
F1 int
F2 string
}
type T2 struct {
FF1 int
FF2 string
}
Not only must the types of each field match, but the number of fields must match as well. These two types don't work:
不仅每个字段的类型必须匹配,而且字段数目也要相等,下面两个类型将不能互编、解码:
type T1 struct {
F1 int
}
type T2 struct {
F1 int
F2 string // too many fields
}
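The two cases above can be tried out with a small program. T1 and T2 are the matching pair from the text; Short and Long are names invented here for the mismatched pair, since a single program cannot declare T1 and T2 twice.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// T1 and T2 have different field names but the same field types.
type T1 struct {
	F1 int
	F2 string
}

type T2 struct {
	FF1 int
	FF2 string
}

// Short and Long (hypothetical names) differ in the number of fields.
type Short struct {
	F1 int
}

type Long struct {
	F1 int
	F2 string
}

// convert marshals a T1 and unmarshals the bytes into a T2.
func convert(t1 T1) (T2, error) {
	var t2 T2
	mdata, err := asn1.Marshal(t1)
	if err != nil {
		return t2, err
	}
	_, err = asn1.Unmarshal(mdata, &t2)
	return t2, err
}

// mismatch marshals a Short and tries to unmarshal it into a Long.
func mismatch() error {
	mdata, err := asn1.Marshal(Short{F1: 7})
	if err != nil {
		return err
	}
	var long Long
	_, err = asn1.Unmarshal(mdata, &long)
	return err
}

func main() {
	t2, err := convert(T1{F1: 42, F2: "fred"})
	fmt.Println(t2, err)
	fmt.Println("mismatch error:", mismatch())
}
```

The first call succeeds because only the sequence of types is transmitted; the second reports an error because the encoded sequence runs out before the second field is filled.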
ASN.1 daytime client and server
ASN.1 日期查询服务客户端与服务器
Now (finally) let us turn to using ASN.1 to transport data across the network.
现在(最后)让我们使用ASN.1来跨网络传输数据
We can write a TCP server that delivers the current time as an ASN.1 Time type, using the techniques of the last chapter. A server is
我们可以使用上一章的技术来编写一个将当前时间作为ASN.Time类型时间来传送的TCP服务器。服务器是:
/* ASN1 DaytimeServer
*/
package main
import (
"encoding/asn1"
"fmt"
"net"
"os"
"time"
)
func main() {
service := ":1200"
tcpAddr, err := net.ResolveTCPAddr("tcp", service)
checkError(err)
listener, err := net.ListenTCP("tcp", tcpAddr)
checkError(err)
for {
conn, err := listener.Accept()
if err != nil {
continue
}
daytime := time.Now()
// Ignore return network errors.
mdata, _ := asn1.Marshal(daytime)
conn.Write(mdata)
conn.Close() // we're finished
}
}
func checkError(err error) {
if err != nil {
fmt.Fprintf(os.Stderr, "Fatal error: %s", err.Error())
os.Exit(1)
}
}
which can be compiled to an executable such as ASN1DaytimeServer and run with no arguments. It will wait for connections and then send the time to the client as an ASN.1 encoded time value.
它可以被编译为一个诸如名为ASN1DaytimeServer的可执行程序,运行它不需要任何实际参数,(启动后)它将等待来自客户端的连接,当有新连接后它会将当前时间当作ASN.1字符串传回给客户端.
A client is
客户端代码是
/* ASN.1 DaytimeClient
*/
package main
import (
"bytes"
"encoding/asn1"
"fmt"
"io"
"net"
"os"
"time"
)
func main() {
if len(os.Args) != 2 {
fmt.Fprintf(os.Stderr, "Usage: %s host:port", os.Args[0])
os.Exit(1)
}
service := os.Args[1]
conn, err := net.Dial("tcp", service)
checkError(err)
result, err := readFully(conn)
checkError(err)
var newtime time.Time
_, err1 := asn1.Unmarshal(result, &newtime)
checkError(err1)
fmt.Println("After marshal/unmarshal: ", newtime.String())
os.Exit(0)
}
func checkError(err error) {
if err != nil {
fmt.Fprintf(os.Stderr, "Fatal error: %s", err.Error())
os.Exit(1)
}
}
func readFully(conn net.Conn) ([]byte, error) {
defer conn.Close()
result := bytes.NewBuffer(nil)
var buf [512]byte
for {
n, err := conn.Read(buf[0:])
result.Write(buf[0:n])
if err != nil {
if err == io.EOF {
break
}
return nil, err
}
}
return result.Bytes(), nil
}
This connects to the service given in a form such as localhost:1200, reads everything the server sends over the TCP connection and decodes the ASN.1 content back into a time.Time value, which it prints.
连接字符串形如:localhost:1200
。它将读取应答TCP包然后将ASN.1内容解码成字符串并输出。
We should note that neither of these two, the client or the server, is compatible with the text-based clients and servers of the last chapter. This client and server are exchanging ASN.1 encoded data values, not textual strings.
我们应当注意,无论是客户端还是服务器都不兼容前一章介绍的基于文本的客户端与服务器。此地的客户端与服务器交换的是ASN.1编码的数据值,而非文本串。
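To experiment with the exchange without starting two processes, both ends can be wired together in one program. The sketch below is an assumption-laden variation, not the programs above: it uses net.Pipe for an in-memory connection, io.ReadAll (available in recent Go releases) in place of readFully, and a fixed sample time so that the result is repeatable. The function names are invented for the example.

```go
package main

import (
	"encoding/asn1"
	"fmt"
	"io"
	"net"
	"time"
)

// sampleTime returns a fixed time so the exchange is repeatable.
func sampleTime() time.Time {
	return time.Date(2020, time.January, 2, 3, 4, 5, 0, time.UTC)
}

// serveTime plays the server's role: marshal the time, write it, close.
func serveTime(conn net.Conn, t time.Time) error {
	defer conn.Close()
	mdata, err := asn1.Marshal(t)
	if err != nil {
		return err
	}
	_, err = conn.Write(mdata)
	return err
}

// fetchTime plays the client's role: read until EOF, then unmarshal.
func fetchTime(conn net.Conn) (time.Time, error) {
	defer conn.Close()
	var t time.Time
	data, err := io.ReadAll(conn)
	if err != nil {
		return t, err
	}
	_, err = asn1.Unmarshal(data, &t)
	return t, err
}

// exchange connects both roles over an in-memory pipe.
func exchange(t time.Time) (time.Time, error) {
	server, client := net.Pipe()
	go serveTime(server, t)
	return fetchTime(client)
}

func main() {
	sent := sampleTime()
	got, err := exchange(sent)
	if err != nil {
		fmt.Println("exchange failed:", err)
		return
	}
	fmt.Println("same instant:", got.Equal(sent))
}
```

The client relies on the server closing the connection to mark the end of the message, just as the TCP version does.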
JSON
JSON
JSON stands for JavaScript Object Notation. It was designed to be a lightweight means of passing data between JavaScript systems. It uses a text-based format and is sufficiently general that it has become used as a general purpose serialisation method for many programming languages.
JSON全称是JavaScript Object Notation,它是一种应用于JavaScript系统之间传递数据的轻量级格式。它使用基于文本的格式,因为足够通用,现在已经成为了多种编程语言采用的通用的序列化方法了。
JSON serialises objects, arrays and basic values. The basic values include string, number, boolean values and the null value. Arrays are a comma-separated list of values that can represent arrays, vectors, lists or sequences of various programming languages. They are delimited by square brackets "[ ... ]". Objects are represented by a list of "field: value" pairs enclosed in curly braces "{ ... }".
JSON 序列化对象,数组和基本值。基本值包括:字符串,数字,布尔值和NULL值。数组是逗号分割的一组值的列表,可以用来表示各种编程语言中的数组、向量、列表或者序列。它们由方括号来界定,对象则由一个包含在大括号中的"field: values"对构成的列表来表示。
For example, the table of employees given earlier could be written as an array of employee objects:
例如.前面提到过的雇员表可以被编码成如下的一个雇员对象的数组.
[
  {"Name": "fred", "Occupation": "programmer"},
  {"Name": "liping", "Occupation": "analyst"},
  {"Name": "sureerat", "Occupation": "manager"}
]
There is no special support for complex data types such as dates, no distinction between number types, no recursive types, etc. JSON is a very simple language, but nevertheless can be quite useful. Its text-based format makes it easy for people to use, even though it has the overheads of string handling.
JSON没有为类似日期这这样的复杂数据类型提供特别的格式支持,不区分各种数字类型,也没有递归类型等。JSON是一个非常简单但却十分有用的语言,尽管他基于文本的格式在字符传递上开销过多,但是却很适合人类阅读和使用。
From the Go JSON package specification, marshalling uses the following type-dependent default encodings:
从Go JSON包的规范文档可知,JSON包将在编组时使用以下类型相关的默认编码方法:
- Boolean values encode as JSON booleans.
- Floating point and integer values encode as JSON numbers.
- String values encode as JSON strings, with each invalid UTF-8 sequence replaced by the encoding of the Unicode replacement character U+FFFD.
- Array and slice values encode as JSON arrays, except that []byte encodes as a base64-encoded string.
- Struct values encode as JSON objects. Each exported struct field becomes a member of the object. By default the object's key name is the struct field name. If the struct field has a tag that gives a name, that name will be used instead.
- Map values encode as JSON objects. The map's key type must be string; the object keys are used directly as map keys.
- Pointer values encode as the value pointed to. (Note: this allows trees, but not graphs!). A nil pointer encodes as the null JSON object.
- Interface values encode as the value contained in the interface. A nil interface value encodes as the null JSON object.
- Channel, complex, and function values cannot be encoded in JSON. Attempting to encode such a value causes Marshal to return an UnsupportedTypeError.
- JSON cannot represent cyclic data structures and Marshal does not handle them. Passing cyclic structures to Marshal results in an infinite recursion in older Go releases; recent releases detect the cycle and return an error.
- 布尔值被编码为JSON的布尔值。
- 浮点数与整数被编码为JSON的数字值。
- 字符串被编码为JSON的字符串,每一个非法的UTF-8序列将会被UTF8替换符U+FFFD替换。
- 数组与Slice会被编码为JSON数组,但是[]byte是会被编码为base64字符串。
- 结构体被编码为JSON对象。每一个结构体字段被编码为此对象的对应成员,默认情况下对象的key的名字是对应结构体字段名的小写。如果此字段含有tag,则此tag将是最终对象key的名字。
- map值被编码为JSON对象,此map的key的类型必须是string;map的key直接被当作JSON对象的key。
- 指针值被编码为指针所指向的值(注意:此处只允许出现树(tree),而不允许出现图(graph)!)。空指针被编码为空JSON对象.
- 接口值被编码为接口实际包含的值。空接口被编码为空JSON对象。
- 程道,复数,函数不能被编码为JSON格式。如果尝试这样做,Marshal将会返回一个InvalidTypeError错误。
- JSON不能表示环形数据结构。Go的Marshal函数也不处理它们,将一个环形结构传递给Marshal将会导致死循环。
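A few of these rules can be seen in one small example. The Record type below is invented for the illustration: it has a tagged field, a []byte field and a nil pointer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical type chosen to exercise several rules at once.
type Record struct {
	Name string  `json:"name"` // the tag replaces the default key "Name"
	Data []byte  // encoded as a base64 string
	Next *Record // a nil pointer encodes as null
}

// encodeRecord returns the JSON text for r.
func encodeRecord(r Record) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	s, err := encodeRecord(Record{Name: "fred", Data: []byte("hi")})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s) // prints {"name":"fred","Data":"aGk=","Next":null}
}
```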
A program to store JSON serialised data into a file is
将JSON数据存入文件的示例如下:
/* SaveJSON */
package main
import (
"encoding/json"
"fmt"
"os"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func main() {
person := Person{
Name: Name{Family: "Newmarch", Personal: "Jan"},
Email: []Email{Email{Kind: "home", Address: "jan@newmarch.name"},
Email{Kind: "work", Address: "j.newmarch@boxhill.edu.au"}}}
saveJSON("person.json", person)
}
func saveJSON(fileName string, key interface{}) {
outFile, err := os.Create(fileName)
checkError(err)
encoder := json.NewEncoder(outFile)
err = encoder.Encode(key)
checkError(err)
outFile.Close()
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
and to load it back into memory is
可以这样将之重新加载到内存中:
/* LoadJSON
*/
package main
import (
"encoding/json"
"fmt"
"os"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
func main() {
var person Person
loadJSON("person.json", &person)
fmt.Println("Person", person.String())
}
func loadJSON(fileName string, key interface{}) {
inFile, err := os.Open(fileName)
checkError(err)
decoder := json.NewDecoder(inFile)
err = decoder.Decode(key)
checkError(err)
inFile.Close()
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
The serialised form is (formatted nicely)
被序列化后的结果如:(经过了美化处理)
{"Name":{"Family":"Newmarch",
"Personal":"Jan"},
"Email":[{"Kind":"home","Address":"jan@newmarch.name"},
{"Kind":"work","Address":"j.newmarch@boxhill.edu.au"}
]
}
A client and server
客户端与服务器
A client to send a person's data and read it back ten times is
一个将person数据收发10次的客户端
/* JSON EchoClient
*/
package main
import (
"fmt"
"net"
"os"
"encoding/json"
"bytes"
"io"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
func main() {
person := Person{
Name: Name{Family: "Newmarch", Personal: "Jan"},
Email: []Email{Email{Kind: "home", Address: "jan@newmarch.name"},
Email{Kind: "work", Address: "j.newmarch@boxhill.edu.au"}}}
if len(os.Args) != 2 {
fmt.Println("Usage: ", os.Args[0], "host:port")
os.Exit(1)
}
service := os.Args[1]
conn, err := net.Dial("tcp", service)
checkError(err)
encoder := json.NewEncoder(conn)
decoder := json.NewDecoder(conn)
for n := 0; n < 10; n++ {
encoder.Encode(person)
var newPerson Person
decoder.Decode(&newPerson)
fmt.Println(newPerson.String())
}
os.Exit(0)
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
func readFully(conn net.Conn) ([]byte, error) {
defer conn.Close()
result := bytes.NewBuffer(nil)
var buf [512]byte
for {
n, err := conn.Read(buf[0:])
result.Write(buf[0:n])
if err != nil {
if err == io.EOF {
break
}
return nil, err
}
}
return result.Bytes(), nil
}
and the corresponding server is
对应的服务器
/* JSON EchoServer
*/
package main
import (
"fmt"
"net"
"os"
"encoding/json"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
func main() {
service := "0.0.0.0:1200"
tcpAddr, err := net.ResolveTCPAddr("tcp", service)
checkError(err)
listener, err := net.ListenTCP("tcp", tcpAddr)
checkError(err)
for {
conn, err := listener.Accept()
if err != nil {
continue
}
encoder := json.NewEncoder(conn)
decoder := json.NewDecoder(conn)
for n := 0; n < 10; n++ {
var person Person
decoder.Decode(&person)
fmt.Println(person.String())
encoder.Encode(person)
}
conn.Close() // we're finished
}
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
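The client and server above ignore the errors returned by Encode and Decode. A compact sketch of the same echo exchange with those errors checked is shown below. It makes several simplifying assumptions: both ends run in one process over net.Pipe, Person is cut down to two plain fields, and the function names are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// Person is a simplified version of the type used in the text.
type Person struct {
	Name  string
	Email []string
}

// echoServer decodes values and sends each one straight back,
// stopping at the first error (for example when the client closes).
func echoServer(conn net.Conn) {
	defer conn.Close()
	decoder := json.NewDecoder(conn)
	encoder := json.NewEncoder(conn)
	for {
		var p Person
		if err := decoder.Decode(&p); err != nil {
			return
		}
		if err := encoder.Encode(p); err != nil {
			return
		}
	}
}

// echoOnce sends one Person and reads the reply over an in-memory pipe.
func echoOnce(p Person) (Person, error) {
	server, client := net.Pipe()
	defer client.Close()
	go echoServer(server)

	var reply Person
	if err := json.NewEncoder(client).Encode(p); err != nil {
		return reply, err
	}
	err := json.NewDecoder(client).Decode(&reply)
	return reply, err
}

func main() {
	reply, err := echoOnce(Person{Name: "Jan Newmarch", Email: []string{"jan@newmarch.name"}})
	fmt.Println(reply, err)
}
```

Because JSON values are self-delimiting, the decoder can tell where each message ends without the connection being closed, which is what lets one connection carry many messages.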
The gob package
gob包
Gob is a serialisation technique specific to Go. It is designed to encode Go data types specifically and does not at present have support for or by any other languages. It supports all Go data types except for channels, functions and interfaces. It supports integers of all types and sizes, strings and booleans, structs, arrays and slices. At present it has some problems with circular structures such as rings, but that will improve over time.
gob是Go中特有的序列化技术。它只能编码Go的数据类型,目前它不支持其他语言,反之亦然。它支持除interface,function,channel外的所有的Go数据类型。它支持任何类型和任何大小的整数,还有字符串和布尔值,结构,数组与切片。目前它在处理ring等环型数据结构方面还存在一些问题,但假以时日,将会得到改善。
Gob encodes type information into its serialised forms. This is far more extensive than the type information in say an X.509 serialisation, but far more efficient than the type information contained in an XML document. Type information is only included once for each piece of data, but includes, for example, the names of struct fields.
Go将类型信息编码到序列化后的表单中,在扩展性方面这远比对应的X.509序列化方法要好。而同时与将类型信息包含在表单中的XML文档相比,则更加高效。对于每个数据,类型信息只包含一次。当然,包含的是字段名称这样的信息。
This inclusion of type information makes Gob marshalling and unmarshalling fairly robust to changes or differences between the marshaller and unmarshaller. For example, a struct
包含类型信息使得Gob在编、解组操作上,当marshaler与unmarshaler不同或者有变化时,具有相当高的健壮性。例如,如下这个结构:
type T struct {
	A int // gob only transmits exported fields
	B int
}
can be marshalled and then unmarshalled into a different struct
可以被编组并随需解组到不同的结构中.
type T struct {
	B int
	A int
}
where the order of fields has changed. It can also cope with missing fields (the values are ignored) or extra fields (the fields are left unchanged). It can cope with pointer types, so that the above struct could be unmarshalled into
此处变更了字段的顺序.它也可以处理缺少字段(值将被忽略)或多出字段(此字段原样保持)的情况。它也可以处理指针类型,因此上边的结构可以被解组到下面的结构中.
type T struct {
	A *int
	B **int
}
To some extent it can cope with type coercions, so that an int field can be broadened into an int64, but not with incompatible types such as int and uint.
在一定程度上，它也可以强制执行类型转换，比如int字段被扩展成为int64。而对于不兼容类型，比如int与uint，就无能为力了。
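The tolerance described above can be tried out with a short round trip through a bytes.Buffer. This is only a sketch: the types Sent and Received and the helper transfer are hypothetical names chosen for the illustration. The receiver reorders the fields, uses pointers, widens A to int64, lacks the sender's field C and has a field D that the sender never transmits.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Sent is the sender's view of the data (hypothetical type).
// Fields are exported because gob only transmits exported fields.
type Sent struct {
	A int
	B int
	C string // has no counterpart in Received, so its value is ignored
}

// Received reorders the fields, uses pointers, widens A to int64,
// and adds a field D that the sender never transmits.
type Received struct {
	B **int
	A *int64
	D string
}

// transfer encodes a Sent value and decodes it into a Received value.
func transfer(in Sent) (Received, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Received{}, err
	}
	// D is set before decoding to show that gob leaves it untouched.
	out := Received{D: "unchanged"}
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := transfer(Sent{A: 1, B: 2, C: "dropped"})
	if err != nil {
		fmt.Println("transfer failed:", err)
		return
	}
	fmt.Println(*out.A, **out.B, out.D)
}
```

Changing the type of A in Received to uint should make Decode return an error, since signed and unsigned integers are not compatible.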
To use Gob to marshal a data value, you first need to create an Encoder. This takes a Writer as parameter, and marshalling will be done to this write stream. The encoder has a method Encode which marshals the value to the stream. This method can be called multiple times on multiple pieces of data. Type information for each data type is only written once, though.
为了使用gob编组一个数据值，首先你得创建Encoder。它使用Writer作为参数，编组操作会将最终结果写入此流中。encoder有个Encode方法，它执行将值编组成流的操作。此方法可以在多份数据上被调用多次。但是对于每一种数据类型，类型信息却只会被写入一次。
You use a Decoder to unmarshal the serialised data stream. This takes a Reader, and each call to its Decode method reads the next value from the stream into the variable whose pointer you pass.
你将使用Decoder来执行解组序列化后的数据流的操作。它持有一个Reader参数，每次读取都将返回一个解组后的数据值。
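As a rough illustration of the Encoder and Decoder working on a stream of several values, the sketch below writes a few values to a bytes.Buffer with one encoder and reads them back with one decoder. The type Point and the helpers messageSizes and roundTrip are hypothetical names used only here. The byte counts show that the first Encode call costs more than later ones, because it also carries the type description.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

// Point is a hypothetical type used only for this illustration.
type Point struct{ X, Y int }

// messageSizes encodes each point on one stream and reports how many
// bytes each Encode call added to the stream.
func messageSizes(points []Point) ([]int, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	sizes := make([]int, 0, len(points))
	for _, p := range points {
		before := buf.Len()
		if err := enc.Encode(p); err != nil {
			return nil, err
		}
		sizes = append(sizes, buf.Len()-before)
	}
	return sizes, nil
}

// roundTrip encodes the points on one stream and decodes them again,
// calling Decode until the stream is exhausted.
func roundTrip(points []Point) ([]Point, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, p := range points {
		if err := enc.Encode(p); err != nil {
			return nil, err
		}
	}
	dec := gob.NewDecoder(&buf)
	var out []Point
	for {
		var p Point
		err := dec.Decode(&p)
		if err == io.EOF {
			return out, nil // a clean end of stream
		}
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
}

func main() {
	points := []Point{{1, 2}, {3, 4}, {5, 6}}
	sizes, err := messageSizes(points)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println("bytes per Encode call:", sizes)
	decoded, err := roundTrip(points)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("decoded:", decoded)
}
```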
A program to store gob serialised data into a file is
将gob序列化后的数据存入文件的示例程序如下:
/* SaveGob
*/
package main
import (
"fmt"
"os"
"encoding/gob"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func main() {
person := Person{
Name: Name{Family: "Newmarch", Personal: "Jan"},
Email: []Email{Email{Kind: "home", Address: "jan@newmarch.name"},
Email{Kind: "work", Address: "j.newmarch@boxhill.edu.au"}}}
saveGob("person.gob", person)
}
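// saveGob gob-encodes key into the named file, replacing any existing content.
func saveGob(fileName string, key interface{}) {
	outFile, err := os.Create(fileName)
	checkError(err)
	encoder := gob.NewEncoder(outFile)
	err = encoder.Encode(key)
	checkError(err)
	// Close can report a delayed write error, so check it as well.
	checkError(outFile.Close())
}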
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
and to load it back into memory is
将之重新加载回内存的操作如下:
/* LoadGob
*/
package main
import (
"fmt"
"os"
"encoding/gob"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
func main() {
var person Person
loadGob("person.gob", &person)
fmt.Println("Person", person.String())
}
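// loadGob decodes the gob data in the named file into key,
// which must be a pointer to the value to fill in.
func loadGob(fileName string, key interface{}) {
	inFile, err := os.Open(fileName)
	checkError(err)
	defer inFile.Close()
	decoder := gob.NewDecoder(inFile)
	err = decoder.Decode(key)
	checkError(err)
}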
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
A client and server
一个客户端与服务器的例子
A client to send a person's data and read it back ten times is
一个将person数据收发10次的客户端
/* Gob EchoClient
*/
package main
import (
"fmt"
"net"
"os"
"encoding/gob"
"bytes"
"io"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
func main() {
	person := Person{
		Name: Name{Family: "Newmarch", Personal: "Jan"},
		Email: []Email{
			{Kind: "home", Address: "jan@newmarch.name"},
			{Kind: "work", Address: "j.newmarch@boxhill.edu.au"},
		},
	}
	if len(os.Args) != 2 {
		fmt.Println("Usage: ", os.Args[0], "host:port")
		os.Exit(1)
	}
	service := os.Args[1]
	conn, err := net.Dial("tcp", service)
	checkError(err)
	// One encoder and one decoder serve the whole conversation, so type
	// information crosses the connection only once in each direction.
	encoder := gob.NewEncoder(conn)
	decoder := gob.NewDecoder(conn)
	for n := 0; n < 10; n++ {
		checkError(encoder.Encode(person))
		var newPerson Person
		checkError(decoder.Decode(&newPerson))
		fmt.Println(newPerson.String())
	}
	conn.Close()
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
func readFully(conn net.Conn) ([]byte, error) {
defer conn.Close()
result := bytes.NewBuffer(nil)
var buf [512]byte
for {
n, err := conn.Read(buf[0:])
result.Write(buf[0:n])
if err != nil {
if err == io.EOF {
break
}
return nil, err
}
}
return result.Bytes(), nil
}
and the corresponding server is
对应的服务器:
/* Gob EchoServer
*/
package main
import (
"fmt"
"net"
"os"
"encoding/gob"
)
type Person struct {
Name Name
Email []Email
}
type Name struct {
Family string
Personal string
}
type Email struct {
Kind string
Address string
}
func (p Person) String() string {
s := p.Name.Personal + " " + p.Name.Family
for _, v := range p.Email {
s += "\n" + v.Kind + ": " + v.Address
}
return s
}
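func main() {
	service := "0.0.0.0:1200"
	tcpAddr, err := net.ResolveTCPAddr("tcp", service)
	checkError(err)
	listener, err := net.ListenTCP("tcp", tcpAddr)
	checkError(err)
	for {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}
		handleClient(conn)
	}
}

// handleClient echoes up to ten Person values back to the client and
// stops early if the connection fails or the client closes it.
func handleClient(conn net.Conn) {
	defer conn.Close() // we're finished
	encoder := gob.NewEncoder(conn)
	decoder := gob.NewDecoder(conn)
	for n := 0; n < 10; n++ {
		var person Person
		if err := decoder.Decode(&person); err != nil {
			return
		}
		fmt.Println(person.String())
		if err := encoder.Encode(person); err != nil {
			return
		}
	}
}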
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
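The exchange between client and server can also be tried without opening a real socket. The sketch below is an assumption-laden illustration rather than part of the programs above: it uses net.Pipe from the standard library to create an in-memory connection, a simplified Person type, and two hypothetical helpers, echo and roundTrip.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// Person is a simplified stand-in for the Person type used above.
type Person struct {
	Name  string
	Email []string
}

// echo decodes Person values from conn and sends each one back until
// the peer closes the connection or an error occurs.
func echo(conn net.Conn) {
	defer conn.Close()
	dec := gob.NewDecoder(conn)
	enc := gob.NewEncoder(conn)
	for {
		var p Person
		if err := dec.Decode(&p); err != nil {
			return // the peer closed the connection or sent bad data
		}
		if err := enc.Encode(p); err != nil {
			return
		}
	}
}

// roundTrip sends p over an in-memory connection served by echo
// and returns the value that comes back.
func roundTrip(p Person) (Person, error) {
	client, server := net.Pipe()
	defer client.Close()
	go echo(server)

	var reply Person
	if err := gob.NewEncoder(client).Encode(p); err != nil {
		return reply, err
	}
	err := gob.NewDecoder(client).Decode(&reply)
	return reply, err
}

func main() {
	reply, err := roundTrip(Person{
		Name:  "Jan Newmarch",
		Email: []string{"jan@newmarch.name"},
	})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(reply.Name, reply.Email)
}
```

With a real listener, starting the handler as go echo(conn) after each Accept would let the server deal with several clients at once instead of one after another.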
Encoding binary data as strings
将二进制数据编码为字符串
Once upon a time, transmitting 8-bit data was problematic. It was often transmitted over noisy serial lines and could easily become corrupted. 7-bit data, on the other hand, could be transmitted more reliably because the 8th bit could be used as a check bit. For example, in an "even parity" scheme, the check bit would be set to one or zero to make an even number of 1s in a byte. This allows detection of single-bit errors in each byte.
以前,传输8-bit数据总是会出现各种问题。它通常由充满噪声的串行线来输入,因此会出错。因为第8个比特位可以被用来做数字检验,所以7-bit的传输要值得信任一些。例如在 “偶数奇偶校检”模式下,为了让一个字节中出现偶数个1,校检位可以被设置1或0,这将可侦测每个字节中的单个bit位出现的错误.
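The even parity scheme can be sketched in a few lines of Go. The function names addEvenParity and parityOK are made up for this illustration, and the check bit is assumed to occupy the top bit of each byte.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addEvenParity takes 7 bits of data and sets the 8th (top) bit so that
// the resulting byte contains an even number of 1 bits.
func addEvenParity(data byte) byte {
	data &= 0x7F // keep only the low 7 bits
	if bits.OnesCount8(data)%2 == 1 {
		data |= 0x80
	}
	return data
}

// parityOK reports whether b contains an even number of 1 bits.
func parityOK(b byte) bool {
	return bits.OnesCount8(b)%2 == 0
}

func main() {
	sent := addEvenParity('C') // 'C' is 1000011 in binary: three 1 bits
	fmt.Printf("%08b %v\n", sent, parityOK(sent))

	corrupted := sent ^ 0x04 // flip a single bit "in transit"
	fmt.Printf("%08b %v\n", corrupted, parityOK(corrupted))
}
```

Flipping two bits in the same byte would go unnoticed, which is why parity only detects single-bit errors.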
ASCII is a 7-bit character set. A number of schemes have been developed that are more sophisticated than simple parity checking, but which involve translating 8-bit binary data into 7-bit ASCII format. Essentially, the 8-bit data is stretched out in some way over the 7-bit bytes.
ASCII是一种7-bit字符集。很多比‘奇偶检验’精巧的模式被开发出来,但是本质上都是将8-bit二进制数据转化成7-bit ASCII格式。本质上8-bit数据是7-bit数据的延伸。
Binary data transmitted in HTTP responses and requests is often translated into an ASCII form. This makes it easy to inspect the HTTP messages with a simple text reader without worrying about what strange 8-bit bytes might do to your display!
在HTTP的请求与应答中,二进制数据常被转化为ASCII的形式。这使得通过一个简单的文本阅读器来检视HTTP消息变得容易,而不需要担心8-bit字节造成的显示乱码的问题!
One common format is Base64, which Go supports in the encoding/base64 package alongside other binary-to-text formats such as hex and Base32.
一个通用的格式是Base64,Go支持包括base64在内的多种binary-to-text格式.
There are two principal functions to use for Base64 encoding and decoding:
两个编、解码Base64的主要函数:
func NewEncoder(enc *Encoding, w io.Writer) io.WriteCloser
func NewDecoder(enc *Encoding, r io.Reader) io.Reader
A simple program just to encode and decode a sequence of eight bytes is
一个用以演示编解码8位二进制数的简单程序如下:
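/**
 * Base64
 */
package main
import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
)
func main() {
	eightBitData := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	bb := &bytes.Buffer{}
	encoder := base64.NewEncoder(base64.StdEncoding, bb)
	encoder.Write(eightBitData)
	// Close flushes any partially written final block, so it must be
	// called before the encoded text is read.
	encoder.Close()
	fmt.Println(bb)
	// Read until EOF rather than into a fixed-size buffer, so that only
	// the bytes actually decoded are printed.
	decoder := base64.NewDecoder(base64.StdEncoding, bb)
	decoded, err := io.ReadAll(decoder)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, ch := range decoded {
		fmt.Print(ch)
	}
	fmt.Println()
}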
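When the data is already in memory, the streaming encoder and decoder are not strictly needed. As a minimal sketch, the Encoding type also offers the methods EncodeToString and DecodeString. The helper names toText and fromText below are made up for this illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// toText converts arbitrary bytes into Base64 text.
func toText(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

// fromText converts Base64 text back into the original bytes.
func fromText(text string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(text)
}

func main() {
	text := toText([]byte{1, 2, 3, 4, 5, 6, 7, 8})
	// Every 3 input bytes become 4 output characters, with padding,
	// so 8 bytes are stretched to 12 characters.
	fmt.Println(text, len(text))

	data, err := fromText(text)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(data)
}
```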