Source Code Analysis of Buffer in Go's bytes Package

喻元龙

2023-12-01

bytes/buffer.go

Buffer provides a growable byte buffer. It wraps a []byte and offers read and write operations on it.

Struct

type Buffer struct {
   buf       []byte   // contents are the bytes buf[off : len(buf)]; the underlying storage
   off       int      // read at &buf[off], write at &buf[len(buf)]; the read offset
   bootstrap [64]byte // memory to hold first slice; helps small buffers avoid allocation.
   lastRead  readOp   // last read operation, so that Unread* can work correctly (UnreadRune and UnreadByte use it to undo the most recent read).

   // FIXME: it would be advisable to align Buffer to cachelines to avoid false
   // sharing.
}

Buffer is a variable-sized byte buffer with Read and Write methods. The zero value of Buffer is an empty buffer ready to use.
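To make the "zero value is ready to use" point concrete, here is a minimal sketch. The helper name zeroValueRoundTrip is made up for this illustration; only the bytes.Buffer methods come from the standard library.

```go
package main

import (
	"bytes"
	"fmt"
)

// zeroValueRoundTrip writes to a zero-value Buffer and reads part of it back.
func zeroValueRoundTrip() (string, int) {
	var b bytes.Buffer // no constructor call is needed
	b.WriteString("hello, ")
	b.Write([]byte("buffer"))

	p := make([]byte, 5)
	n, _ := b.Read(p) // consumes the first 5 bytes
	return string(p[:n]) + "|" + b.String(), b.Len()
}

func main() {
	s, n := zeroValueRoundTrip()
	fmt.Println(s, n)
}
```

Reading advances the offset, so String and Len only see what is still unread.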

Initialization

func NewBuffer(buf []byte) *Buffer { return &Buffer{buf: buf} }

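NewBuffer creates a Buffer that uses buf as its initial contents. The Buffer takes over buf, so the caller should not use buf afterwards.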

func NewBufferString(s string) *Buffer {
   return &Buffer{buf: []byte(s)}
}

NewBufferString creates and returns a Buffer whose initial contents are a copy of the string s.
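A short sketch of the two constructors in use. The function name constructors is hypothetical; it assumes the typical usage where NewBufferString prepares data for reading and NewBuffer with a zero-length slice pre-sizes a buffer for writing.

```go
package main

import (
	"bytes"
	"fmt"
)

// constructors builds one buffer for reading and one pre-sized for writing.
func constructors() (string, int, int) {
	r := bytes.NewBufferString("abc")         // existing data, ready to be read
	w := bytes.NewBuffer(make([]byte, 0, 64)) // empty, but with 64 bytes of capacity
	w.WriteString("xyz")
	return r.String() + w.String(), w.Len(), w.Cap()
}

func main() {
	fmt.Println(constructors())
}
```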

The readOp constants describe the last operation performed on the buffer.

type readOp int8

// Don't use iota for these, as the values need to correspond with the
// names and comments, which is easier to see when being explicit.
const (
   opRead      readOp = -1 // Any other read operation.
   opInvalid   readOp = 0  // Non-read operation.
   opReadRune1 readOp = 1  // Read rune of size 1 (a UTF-8 encoded rune takes 1 to 4 bytes).
   opReadRune2 readOp = 2  // Read rune of size 2.
   opReadRune3 readOp = 3  // Read rune of size 3.
   opReadRune4 readOp = 4  // Read rune of size 4.
)

func (b *Buffer) Bytes() []byte { return b.buf[b.off:] }

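Purpose: returns the unread portion of the buffer.
Bytes returns a slice of length b.Len() holding the unread portion of the buffer.
The slice is valid only until the next buffer modification (that is, until the next call to a method such as Read, Write, Reset, or Truncate). It aliases the buffer's storage, so no copy is made.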
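Because Bytes returns b.buf[b.off:] directly, the result shares memory with the buffer. The sketch below (bytesAliases is an illustrative name) shows that writing through the returned slice changes what the buffer holds.

```go
package main

import (
	"bytes"
	"fmt"
)

// bytesAliases mutates the slice returned by Bytes and reports the buffer contents.
func bytesAliases() string {
	b := bytes.NewBufferString("abc")
	view := b.Bytes() // no copy: view shares memory with the buffer
	view[0] = 'X'     // changing the view changes the buffer
	return b.String()
}

func main() {
	fmt.Println(bytesAliases())
}
```

If an independent copy is needed, copying the slice (for example with append([]byte(nil), b.Bytes()...)) avoids this coupling.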

func (b *Buffer) String() string {
   if b == nil {
      // Special case, useful in debugging.
      return "<nil>"
   }
   return string(b.buf[b.off:])
}

Purpose: returns the unread portion of the buffer as a string. A nil *Buffer yields "<nil>".

// empty returns whether the unread portion of the buffer is empty.
func (b *Buffer) empty() bool { return len(b.buf) <= b.off }

Purpose: reports whether the unread portion of the buffer is empty.

// Len returns the number of bytes of the unread portion of the buffer.
func (b *Buffer) Len() int { return len(b.buf) - b.off }

Purpose: returns the number of bytes in the unread portion of the buffer.

// Cap returns the capacity of the buffer's underlying byte slice, that is, the
// total space allocated for the buffer's data.
func (b *Buffer) Cap() int { return cap(b.buf) }

Purpose: returns the capacity of the buffer's underlying slice, that is, the total space allocated for the buffer's data.

// Truncate discards all but the first n unread bytes from the buffer
// but continues to use the same allocated storage.
// It panics if n is negative or greater than the length of the buffer.
func (b *Buffer) Truncate(n int) {
   if n == 0 {
      b.Reset()
      return
   }
   b.lastRead = opInvalid
   if n < 0 || n > b.Len() {
      panic("bytes.Buffer: truncation out of range")
   }
   b.buf = b.buf[:b.off+n]
}

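Purpose: discards all but the first n unread bytes from the buffer while continuing to use the same allocated storage. (Bytes that were already read are not removed.)
It panics if n is negative or greater than the unread length of the buffer.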
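A small sketch of Truncate, including the out-of-range panic. The helper truncateDemo is hypothetical and recovers the panic only so the behavior can be observed.

```go
package main

import (
	"bytes"
	"fmt"
)

// truncateDemo keeps the first 3 unread bytes, then triggers the range panic.
func truncateDemo() (string, bool) {
	b := bytes.NewBufferString("abcdef")
	b.Next(2)     // read "ab"; the unread part is now "cdef"
	b.Truncate(3) // keep only the first 3 unread bytes
	kept := b.String()

	panicked := false
	func() {
		defer func() { panicked = recover() != nil }()
		b.Truncate(10) // larger than b.Len(): panics
	}()
	return kept, panicked
}

func main() {
	fmt.Println(truncateDemo())
}
```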

// Reset resets the buffer to be empty,
// but it retains the underlying storage for use by future writes.
// Reset is the same as Truncate(0).
func (b *Buffer) Reset() {
   b.buf = b.buf[:0]
   b.off = 0
   b.lastRead = opInvalid
}

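Reset operation

Resets the buffer to be empty while retaining the underlying storage for future writes (the data is cleared; cap is unchanged).
The offset off is set to 0.
lastRead is set to opInvalid.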
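The following sketch checks that Reset empties the buffer but keeps its capacity. resetDemo is an illustrative name, and the 100-byte payload is arbitrary.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// resetDemo reports the length after Reset and whether the capacity survived.
func resetDemo() (int, bool) {
	var b bytes.Buffer
	b.WriteString(strings.Repeat("x", 100))
	before := b.Cap()
	b.Reset() // b.buf = b.buf[:0], off = 0
	return b.Len(), b.Cap() == before
}

func main() {
	fmt.Println(resetDemo())
}
```

This is why reusing one Buffer across iterations of a loop avoids repeated allocations.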

// tryGrowByReslice is a inlineable version of grow for the fast-case where the
// internal buffer only needs to be resliced.
// It returns the index where bytes should be written and whether it succeeded.
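// It applies to the fast case where the internal buffer only needs to be resliced (no copy, no allocation).
// It returns the index at which writing should start and whether it succeeded.
// It succeeds when the spare capacity is sufficient: len(b.buf)+n <= cap(b.buf).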
func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
	if l := len(b.buf); n <= cap(b.buf)-l {
		b.buf = b.buf[:l+n]
		return l, true
	}
	return 0, false
}

// grow grows the buffer to guarantee space for n more bytes.
// It returns the index where bytes should be written.
// If the buffer can't grow it will panic with ErrTooLarge.
func (b *Buffer) grow(n int) int {
   m := b.Len()
   // If buffer is empty, reset to recover space.
   if m == 0 && b.off != 0 {
      b.Reset()
   }
   // Try to grow by means of a reslice.
   if i, ok := b.tryGrowByReslice(n); ok {
      return i
   }
   // Check if we can make use of bootstrap array.
   if b.buf == nil && n <= len(b.bootstrap) {
      b.buf = b.bootstrap[:n]
      return 0
   }
   c := cap(b.buf)
   if n <= c/2-m {
      // We can slide things down instead of allocating a new
      // slice. We only need m+n <= c to slide, but
      // we instead let capacity get twice as large so we
      // don't spend all our time copying.
      copy(b.buf, b.buf[b.off:])
   } else if c > maxInt-c-n {
      panic(ErrTooLarge)
   } else {
      // Not enough space anywhere, we need to allocate.
      buf := makeSlice(2*c + n)
      copy(buf, b.buf[b.off:])
      b.buf = buf
   }
   // Restore b.off and len(b.buf).
   b.off = 0
   b.buf = b.buf[:m+n]
   return m
}

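Purpose: grows the buffer so that n more bytes fit, and returns the index at which those bytes should be written.
If nothing is unread but off != 0, the buffer is Reset first to reclaim space. The rules are then checked in this order:

1. len(b.buf) + n <= cap(b.buf): reslice to b.buf[:len(b.buf)+n] and return the old length.
2. b.buf == nil && n <= len(b.bootstrap): use b.bootstrap[:n] as the buffer and return 0.
3. n + m <= c/2 (m is the unread length, c the capacity): slide the unread data to the front with copy(b.buf, b.buf[b.off:]), which discards the bytes already read.
4. c > maxInt-c-n: the requested size would overflow int; the standard library panics with ErrTooLarge here.
5. Otherwise allocate buf := makeSlice(2*c + n), so the capacity becomes twice the old capacity plus n, and copy the unread data into it.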
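The branch order of grow can be modeled as a pure function, which makes the rules easier to experiment with. This is only a sketch: growStrategy and bootstrapLen are made-up names, not part of the bytes package, and the model assumes that a capacity of 0 stands in for b.buf == nil.

```go
package main

import "fmt"

const (
	bootstrapLen = 64                  // size of the bootstrap array in the version analyzed here
	maxInt       = int(^uint(0) >> 1) // largest int value
)

// growStrategy mirrors the branch order of grow for a buffer described by
// len(buf), off and cap(buf) that needs room for n more bytes.
func growStrategy(length, off, capacity, n int) string {
	m := length - off // unread bytes
	if m == 0 && off != 0 {
		length = 0 // Reset: everything was read, so the space is reused
	}
	switch {
	case n <= capacity-length:
		return "reslice"
	case capacity == 0 && n <= bootstrapLen: // assumption: cap 0 models buf == nil
		return "bootstrap"
	case n <= capacity/2-m:
		return "slide"
	case capacity > maxInt-capacity-n:
		return "panic"
	default:
		return "allocate"
	}
}

func main() {
	fmt.Println(growStrategy(10, 0, 64, 20))  // spare capacity is enough
	fmt.Println(growStrategy(0, 0, 0, 10))    // empty buffer, small request
	fmt.Println(growStrategy(60, 50, 64, 20)) // mostly read already
	fmt.Println(growStrategy(64, 0, 64, 20))  // full and unread
}
```

The slide case only triggers when the unread data plus the request fits in half the capacity, which keeps the buffer from copying on every write once it is nearly full.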

// makeSlice allocates a slice of size n. If the allocation fails, it panics
// with ErrTooLarge.
func makeSlice(n int) []byte {
   // If the make fails, give a known error.
   defer func() {
      if recover() != nil {
         panic(ErrTooLarge)
      }
   }()
   return make([]byte, n)
}

Allocates a byte slice of length n. If the allocation fails, the panic is converted into ErrTooLarge.

// Grow grows the buffer's capacity, if necessary, to guarantee space for
// another n bytes. After Grow(n), at least n bytes can be written to the
// buffer without another allocation.
// If n is negative, Grow will panic.
// If the buffer can't grow it will panic with ErrTooLarge.
func (b *Buffer) Grow(n int) {
   if n < 0 {
      panic("bytes.Buffer.Grow: negative count")
   }
   m := b.grow(n)
   b.buf = b.buf[:m]
}

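If necessary, Grow increases the buffer's capacity to guarantee space for another n bytes.
It calls grow() to do the work and then restores the original length, so only the capacity changes.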
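A quick check of the Grow contract: the length stays the same and at least n bytes of spare capacity are available afterwards. growDemo is a hypothetical helper, and 1000 is an arbitrary size.

```go
package main

import (
	"bytes"
	"fmt"
)

// growDemo reports the length after Grow and whether 1000 spare bytes exist.
func growDemo() (int, bool) {
	var b bytes.Buffer
	b.WriteString("abc")
	b.Grow(1000) // reserve room for 1000 more bytes
	return b.Len(), b.Cap()-b.Len() >= 1000
}

func main() {
	fmt.Println(growDemo())
}
```

Calling Grow once before a known amount of writing avoids several intermediate reallocations.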

// Write appends the contents of p to the buffer, growing the buffer as
// needed. The return value n is the length of p; err is always nil. If the
// buffer becomes too large, Write will panic with ErrTooLarge.
func (b *Buffer) Write(p []byte) (n int, err error) {
   b.lastRead = opInvalid
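   // m is the index where writing starts; ok reports whether reslicing alone was enough
   m, ok := b.tryGrowByReslice(len(p))
   if !ok {
      // Not enough spare capacity: grow, and m is again the index where writing starts
      m = b.grow(len(p))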
   }
   return copy(b.buf[m:], p), nil
}

Purpose: appends the contents of the slice p to the buffer.

// WriteString appends the contents of s to the buffer, growing the buffer as
// needed. The return value n is the length of s; err is always nil. If the
// buffer becomes too large, WriteString will panic with ErrTooLarge.
func (b *Buffer) WriteString(s string) (n int, err error) {
   b.lastRead = opInvalid
   m, ok := b.tryGrowByReslice(len(s))
   if !ok {
      m = b.grow(len(s))
   }
   return copy(b.buf[m:], s), nil
}

Purpose: appends the string s to the buffer.

// MinRead is the minimum slice size passed to a Read call by
// Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond
// what is required to hold the contents of r, ReadFrom will not grow the
// underlying buffer.
// Put differently, every Read call issued by ReadFrom is handed at least MinRead (512) bytes of free space.
const MinRead = 512


// ReadFrom reads data from r until EOF and appends it to the buffer, growing
// the buffer as needed. The return value n is the number of bytes read. Any
// error except io.EOF encountered during the read is also returned. If the
// buffer becomes too large, ReadFrom will panic with ErrTooLarge.
func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error) {
   b.lastRead = opInvalid
   for {
      i := b.grow(MinRead)
      m, e := r.Read(b.buf[i:cap(b.buf)])
      if m < 0 {
         panic(errNegativeRead)
      }

      b.buf = b.buf[:i+m]
      n += int64(m)
      if e == io.EOF {
         return n, nil // e is EOF, so return nil explicitly
      }
      if e != nil {
         return n, e
      }
   }
}

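Purpose: ReadFrom reads data from r until EOF and appends it to the buffer, growing the buffer as needed. The return value n is the number of bytes read. Any error other than io.EOF encountered during the read is also returned. If the buffer becomes too large, ReadFrom panics with ErrTooLarge.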
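A minimal usage sketch of ReadFrom, with strings.NewReader standing in for any io.Reader. The helper name readFromDemo is made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// readFromDemo appends everything from a reader to an existing buffer.
func readFromDemo() (int64, string, bool) {
	b := bytes.NewBufferString("head:")
	n, err := b.ReadFrom(strings.NewReader("body"))
	return n, b.String(), err == nil // io.EOF is swallowed, so err is nil
}

func main() {
	fmt.Println(readFromDemo())
}
```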

// WriteTo writes data to w until the buffer is drained or an error occurs.
// The return value n is the number of bytes written; it always fits into an
// int, but it is int64 to match the io.WriterTo interface. Any error
// encountered during the write is also returned.
func (b *Buffer) WriteTo(w io.Writer) (n int64, err error) {
   b.lastRead = opInvalid
   if nBytes := b.Len(); nBytes > 0 {
      m, e := w.Write(b.buf[b.off:])
      if m > nBytes {
         panic("bytes.Buffer.WriteTo: invalid Write count")
      }
      b.off += m
      n = int64(m)
      if e != nil {
         return n, e
      }
      // all bytes should have been written, by definition of
      // Write method in io.Writer
      if m != nBytes {
         return n, io.ErrShortWrite
      }
   }
   // Buffer is now empty; reset.
   b.Reset()
   return n, nil
}

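WriteTo writes the buffer's data to w until the buffer is drained or an error occurs. The return value n is the number of bytes written; it always fits in an int, but it is an int64 to match the io.WriterTo interface. Any error encountered during the write is also returned.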
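A short sketch of WriteTo draining a buffer. strings.Builder is used here only as a convenient io.Writer; writeToDemo is an illustrative name.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// writeToDemo drains src into a strings.Builder and reports what is left.
func writeToDemo() (int64, string, int) {
	src := bytes.NewBufferString("payload")
	var dst strings.Builder
	n, _ := src.WriteTo(&dst) // the buffer is reset once everything is written
	return n, dst.String(), src.Len()
}

func main() {
	fmt.Println(writeToDemo())
}
```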

// WriteByte appends the byte c to the buffer, growing the buffer as needed.
// The returned error is always nil, but is included to match bufio.Writer's
// WriteByte. If the buffer becomes too large, WriteByte will panic with
// ErrTooLarge.
func (b *Buffer) WriteByte(c byte) error {
   b.lastRead = opInvalid
   m, ok := b.tryGrowByReslice(1)
   if !ok {
      m = b.grow(1)
   }
   b.buf[m] = c
   return nil
}

WriteByte appends the byte c to the buffer, growing the buffer as needed. The returned error is always nil. If the buffer becomes too large, WriteByte panics with ErrTooLarge.

// WriteRune appends the UTF-8 encoding of Unicode code point r to the
// buffer, returning its length and an error, which is always nil but is
// included to match bufio.Writer's WriteRune. The buffer is grown as needed;
// if it becomes too large, WriteRune will panic with ErrTooLarge.
func (b *Buffer) WriteRune(r rune) (n int, err error) {
   // r fits in a single byte (ASCII), so delegate to WriteByte
   if r < utf8.RuneSelf {
      b.WriteByte(byte(r))
      return 1, nil
   }
   // Reserve utf8.UTFMax (4) bytes, the longest possible encoding of r, growing if necessary
   b.lastRead = opInvalid
   m, ok := b.tryGrowByReslice(utf8.UTFMax)
   if !ok {
      m = b.grow(utf8.UTFMax)
   }
   // Encode r into the buffer, then trim to the n bytes actually used
   n = utf8.EncodeRune(b.buf[m:m+utf8.UTFMax], r)
   b.buf = b.buf[:m+n]
   return n, nil
}

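WriteRune appends the UTF-8 encoding of the Unicode code point r to the buffer and returns its length along with an error that is always nil.
If the buffer becomes too large, WriteRune panics with ErrTooLarge.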
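The four write methods can be compared side by side. In this sketch (writeAll is a hypothetical helper) the returned counts show that WriteRune reports the encoded length of the rune, not 1.

```go
package main

import (
	"bytes"
	"fmt"
)

// writeAll exercises Write, WriteString, WriteByte and WriteRune.
func writeAll() (string, []int) {
	var b bytes.Buffer
	n1, _ := b.Write([]byte("ab"))
	n2, _ := b.WriteString("cd")
	_ = b.WriteByte('e')      // always returns a nil error
	n3, _ := b.WriteRune('é') // 2 bytes in UTF-8
	n4, _ := b.WriteRune('世') // 3 bytes in UTF-8
	return b.String(), []int{n1, n2, n3, n4, b.Len()}
}

func main() {
	fmt.Println(writeAll())
}
```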

// Read reads the next len(p) bytes from the buffer or until the buffer
// is drained. The return value n is the number of bytes read. If the
// buffer has no data to return, err is io.EOF (unless len(p) is zero);
// otherwise it is nil.
func (b *Buffer) Read(p []byte) (n int, err error) {
   b.lastRead = opInvalid
   // Check whether the buffer has any unread data
   if b.empty() {
      // Buffer is empty, reset to recover space.
      b.Reset()
      if len(p) == 0 {
         return 0, nil
      }
      return 0, io.EOF
   }
   // Copy the unread data into p
   n = copy(p, b.buf[b.off:])
   b.off += n
   if n > 0 {
      b.lastRead = opRead
   }
   return n, nil
}

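Purpose: reads up to len(p) bytes from the buffer into p. If fewer than len(p) bytes remain, the remaining bytes are read into p. The return value n is the number of bytes read. If the buffer has no data, err is io.EOF (unless len(p) is zero); otherwise it is nil.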
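The sketch below reads a five-byte buffer with a three-byte slice to show the short read followed by io.EOF. readDemo is an illustrative name.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readDemo reads "abcde" in chunks of at most 3 bytes until the buffer is empty.
func readDemo() (string, int, bool) {
	b := bytes.NewBufferString("abcde")
	p := make([]byte, 3)

	n1, _ := b.Read(p) // full read
	first := string(p[:n1])
	n2, _ := b.Read(p) // short read: only 2 bytes remain
	second := string(p[:n2])
	n3, err := b.Read(p) // nothing left
	return first + "," + second, n3, err == io.EOF
}

func main() {
	fmt.Println(readDemo())
}
```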

// Next returns a slice containing the next n bytes from the buffer,
// advancing the buffer as if the bytes had been returned by Read.
// If there are fewer than n bytes in the buffer, Next returns the entire buffer.
// The slice is only valid until the next call to a read or write method.
func (b *Buffer) Next(n int) []byte {
   b.lastRead = opInvalid
   // m is the number of unread bytes in the buffer
   m := b.Len()
   if n > m {
      n = m
   }

   data := b.buf[b.off : b.off+n]
   b.off += n
   if n > 0 {
      b.lastRead = opRead
   }
   return data
}

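Next returns the next n unread bytes from the buffer and advances the offset off, as if the bytes had been returned by Read. If fewer than n bytes remain, Next returns everything that is left. The slice is valid only until the next call to a read or write method. Next counts as a read operation because it advances the offset.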
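A brief sketch of Next, including the case where more bytes are requested than remain. nextDemo is a hypothetical helper; the results are converted to strings right away because the returned slices alias the buffer.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextDemo takes 4 bytes, then asks for 10 when only 2 are left.
func nextDemo() (string, string, int) {
	b := bytes.NewBufferString("abcdef")
	head := string(b.Next(4))  // copies the 4 bytes out of the aliasing slice
	tail := string(b.Next(10)) // clamped to the 2 remaining bytes
	return head, tail, b.Len()
}

func main() {
	fmt.Println(nextDemo())
}
```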

// ReadByte reads and returns the next byte from the buffer.
// If no byte is available, it returns error io.EOF.
func (b *Buffer) ReadByte() (byte, error) {
   // If empty, reset and return io.EOF
   if b.empty() {
      // Buffer is empty, reset to recover space.
      b.Reset()
      return 0, io.EOF
   }
   // Take the next unread byte, advance the offset by 1, and record the last read as opRead
   c := b.buf[b.off]
   b.off++
   b.lastRead = opRead
   return c, nil
}

Reads the next unread byte from the buffer. If no byte is available, it resets the buffer and returns io.EOF.

// ReadRune reads and returns the next UTF-8-encoded
// Unicode code point from the buffer.
// If no bytes are available, the error returned is io.EOF.
// If the bytes are an erroneous UTF-8 encoding, it
// consumes one byte and returns U+FFFD, 1.
func (b *Buffer) ReadRune() (r rune, size int, err error) {
   // Check whether the buffer is empty
   if b.empty() {
      // Buffer is empty, reset to recover space.
      b.Reset()
      return 0, 0, io.EOF
   }
   // Fast path: the next unread byte is ASCII
   c := b.buf[b.off]
   if c < utf8.RuneSelf {
      b.off++
      b.lastRead = opReadRune1
      return rune(c), 1, nil
   }
   // Decode the next UTF-8 rune; r is the rune and n its length in bytes
   r, n := utf8.DecodeRune(b.buf[b.off:])
   b.off += n
   b.lastRead = readOp(n)
   return r, n, nil
}

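Reads one UTF-8 encoded rune and returns the rune r and its length size in bytes. An invalid encoding consumes one byte and yields U+FFFD with size 1.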
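The sketch below feeds ReadRune an ASCII byte, a three-byte rune, and one invalid byte to show the sizes it reports. readRuneDemo is an illustrative name, and the input string is arbitrary.

```go
package main

import (
	"bytes"
	"fmt"
)

// readRuneDemo decodes runes until the buffer is empty.
func readRuneDemo() ([]rune, []int) {
	b := bytes.NewBuffer([]byte("a世\xff")) // \xff is not valid UTF-8
	var runes []rune
	var sizes []int
	for {
		r, size, err := b.ReadRune()
		if err != nil { // io.EOF once the buffer is drained
			return runes, sizes
		}
		runes = append(runes, r)
		sizes = append(sizes, size)
	}
}

func main() {
	runes, sizes := readRuneDemo()
	fmt.Printf("%q %v\n", runes, sizes)
}
```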

// UnreadRune unreads the last rune returned by ReadRune.
// If the most recent read or write operation on the buffer was
// not a successful ReadRune, UnreadRune returns an error.  (In this regard
// it is stricter than UnreadByte, which will unread the last byte
// from any read operation.)
func (b *Buffer) UnreadRune() error {
   // If the last operation was not a successful ReadRune, return an error
   if b.lastRead <= opInvalid {
      return errors.New("bytes.Buffer: UnreadRune: previous operation was not a successful ReadRune")
   }

   // Move off back to where it was before ReadRune
   if b.off >= int(b.lastRead) {
      b.off -= int(b.lastRead)
   }
   b.lastRead = opInvalid
   return nil
}

UnreadRune undoes the last ReadRune. If the most recent operation was not a successful ReadRune, it returns an error.
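A small sketch of both outcomes: UnreadRune succeeds right after ReadRune and fails after a different kind of read. unreadRuneDemo is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// unreadRuneDemo unreads one rune, then shows the error after ReadByte.
func unreadRuneDemo() (string, bool) {
	b := bytes.NewBufferString("世界")
	b.ReadRune()           // consumes "世" (3 bytes); lastRead becomes opReadRune3
	err1 := b.UnreadRune() // steps the offset back by 3
	restored := b.String()

	b.ReadByte()           // lastRead becomes opRead
	err2 := b.UnreadRune() // fails: the last operation was not ReadRune
	return restored, err1 == nil && err2 != nil
}

func main() {
	fmt.Println(unreadRuneDemo())
}
```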

// UnreadByte unreads the last byte returned by the most recent successful
// read operation that read at least one byte. If a write has happened since
// the last read, if the last read returned an error, or if the read read zero
// bytes, UnreadByte returns an error.
func (b *Buffer) UnreadByte() error {
   if b.lastRead == opInvalid {
      return errors.New("bytes.Buffer: UnreadByte: previous operation was not a successful read")
   }
   b.lastRead = opInvalid
   if b.off > 0 {
      b.off--
   }
   return nil
}

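UnreadByte unreads the last byte returned by the most recent successful read operation. Unlike UnreadRune it is not tied to one method: it works after any read (Read, ReadByte, ReadRune, Next, and so on) that consumed at least one byte. It returns an error if the last operation was not a successful read, for example after a write.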
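To illustrate that UnreadByte is not limited to ReadByte, the sketch below uses it after Read and then shows that an intervening write makes it fail. unreadByteDemo is an illustrative name.

```go
package main

import (
	"bytes"
	"fmt"
)

// unreadByteDemo unreads after Read, then fails to unread after a write.
func unreadByteDemo() (string, bool) {
	b := bytes.NewBufferString("abc")
	p := make([]byte, 2)
	b.Read(p)              // reads "ab"
	err1 := b.UnreadByte() // allowed after any successful read
	afterUnread := b.String()

	b.ReadByte()           // a successful read...
	b.WriteByte('d')       // ...followed by a write, which sets lastRead to opInvalid
	err2 := b.UnreadByte() // fails: the last operation was a write
	return afterUnread, err1 == nil && err2 != nil
}

func main() {
	fmt.Println(unreadByteDemo())
}
```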

// ReadBytes reads until the first occurrence of delim in the input,
// returning a slice containing the data up to and including the delimiter.
// If ReadBytes encounters an error before finding a delimiter,
// it returns the data read before the error and the error itself (often io.EOF).
// ReadBytes returns err != nil if and only if the returned data does not end in
// delim.
func (b *Buffer) ReadBytes(delim byte) (line []byte, err error) {
   slice, err := b.readSlice(delim)
   // return a copy of slice. The buffer's backing array may
   // be overwritten by later calls.
   line = append(line, slice...)
   return line, err
}

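ReadBytes reads up to and including the first occurrence of the byte delim. If delim is not in the remaining data, it reads everything that is left and returns io.EOF. The result is a copy, so it stays valid after later buffer operations.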

// readSlice is like ReadBytes but returns a reference to internal buffer data.
func (b *Buffer) readSlice(delim byte) (line []byte, err error) {
   i := IndexByte(b.buf[b.off:], delim)
   end := b.off + i + 1
   if i < 0 {
      // Delimiter not found: read up to len(b.buf), the end of the data
      end = len(b.buf)
      err = io.EOF
   }
   line = b.buf[b.off:end]
   b.off = end
   b.lastRead = opRead
   return line, err
}

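Reads the remaining data in the buffer up to and including the delim byte.
If delim is not found, it reads all remaining data and returns io.EOF.
Unlike ReadBytes, readSlice returns a reference to the buffer's internal data rather than a copy.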

// ReadString reads until the first occurrence of delim in the input,
// returning a string containing the data up to and including the delimiter.
// If ReadString encounters an error before finding a delimiter,
// it returns the data read before the error and the error itself (often io.EOF).
// ReadString returns err != nil if and only if the returned data does not end
// in delim.
func (b *Buffer) ReadString(delim byte) (line string, err error) {
   slice, err := b.readSlice(delim)
   return string(slice), err
}

ReadString works like ReadBytes but returns a string.
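A common use of ReadString is splitting buffered text into lines. In this sketch (readLines is a hypothetical helper) the last line has no trailing delimiter, so it arrives together with io.EOF.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readLines splits the buffer on '\n' and reports whether it stopped at io.EOF.
func readLines() ([]string, bool) {
	b := bytes.NewBufferString("one\ntwo\nthree")
	var lines []string
	for {
		line, err := b.ReadString('\n') // the delimiter is included when found
		if line != "" {
			lines = append(lines, line)
		}
		if err != nil {
			return lines, err == io.EOF
		}
	}
}

func main() {
	lines, eof := readLines()
	fmt.Printf("%q %v\n", lines, eof)
}
```

Checking the data before the error matters here, because the final fragment and io.EOF are returned by the same call.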
