google hash table ------稀疏hash表之sparse hashtable

钱宇

2023-12-01

A sparse hashtable is a particular implementation of
// a hashtable: one that is meant to minimize memory use.
// It does this by using a *sparse table* (cf sparsetable.h),
// which uses between 1 and 2 bits to store empty buckets

// (we may need another bit for hashtables that support deletion).

sparse hashtable 非常的节约内存，google通过了sparse table实现，具体的做法是将empty元素用一个bit（或者是两个bit来表示）

现在我们重点来分析sparse table

这个table里面的成员变量如下：

static const int kBlockSize = 32;

  int size_;
  std::vector<std::vector<T> > elements_;
  std::vector<uint32> masks_;

可以发现，这个元素是用的一个二维数组俩表示的。

我们以sparse table中的set方法为例

void set(int index, const T& elem) {
    const int offset = index / kBlockSize;
    const int pos = index % kBlockSize;
    if (elements_[offset].size() == 0) {
      elements_[offset].resize(kBlockSize);
    }
    elements_[offset][index % kBlockSize] = elem;
    masks_[offset] |= (1U << pos);
  }

就是说，当这个index对应的大约32范围内的元素几乎都会落到一个范围里面

如果在某个区间，比如有32个，都没有元素，那么有可能这个区间内都不分配vector空间

接下来我们看看如何某个index中是不是有元素的：

bool test(int index) const {
    const int offset = index / kBlockSize;
    const int pos = index % kBlockSize;
    return ((masks_[offset] & (1U << pos)) != 0);
  }

如上面的代码，在set后，会将指定的offset中对应的mask向左移动，也就说，bit位为1的mask对应的pos处是有元素的。

这也就是为什么“which uses between 1 and 2 bits to store empty buckets”开头这句话说，为空的bucket只需要一个位来存储。

综上所述，稀疏哈希表通过这个数据结构来节省空间，其余的计算方法都和稠密哈希表大同小异。

由于博主水平有限，不足请不吝指出。

google hash table ------稀疏hash表之sparse hashtable

相关阅读

相关文章

相关问答

相关文档