
Implementing secondary indexes in HBase: ihbase

冯良才
2023-12-01
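Generally, building an index for a database requires a separate data structure to hold the index data. When indexing HBase, one option is to create a separate index table: a query first reads the index table, then uses the result to read the data table.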

[img]http://dl2.iteye.com/upload/attachment/0099/4712/041675af-eed4-3f4d-9007-7367504ca6e5.png[/img]
In this figure, the index table is on the left and the data table is on the right.
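
The two-step lookup can be sketched as follows. This is only a toy model under stated assumptions: plain sorted maps stand in for the two HBase tables, each row has a single column, and the class and method names (IndexTableSketch, put, findByValue) are hypothetical, not part of HBase or ihbase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a table-level secondary index (hypothetical names, not HBase API). */
final class IndexTableSketch {
    // "Data table": rowkey -> column value (one column, for brevity).
    private final TreeMap<String, String> dataTable = new TreeMap<>();
    // "Index table": column value -> rowkeys that currently hold that value.
    private final TreeMap<String, List<String>> indexTable = new TreeMap<>();

    /** Writes a row and keeps the index table in step with the data table. */
    void put(String rowKey, String value) {
        String old = dataTable.put(rowKey, value);
        if (old != null) {
            // The row was overwritten: remove its stale index entry.
            List<String> rows = indexTable.get(old);
            rows.remove(rowKey);
            if (rows.isEmpty()) {
                indexTable.remove(old);
            }
        }
        indexTable.computeIfAbsent(value, v -> new ArrayList<>()).add(rowKey);
    }

    /** Step 1: query the index table. Step 2: fetch each hit from the data table. */
    Map<String, String> findByValue(String value) {
        Map<String, String> result = new TreeMap<>();
        List<String> rowKeys = indexTable.getOrDefault(value, Collections.<String>emptyList());
        for (String rowKey : rowKeys) {
            result.put(rowKey, dataTable.get(rowKey));
        }
        return result;
    }

    public static void main(String[] args) {
        IndexTableSketch table = new IndexTableSketch();
        table.put("row1", "blue");
        table.put("row2", "red");
        table.put("row3", "blue");
        System.out.println(table.findByValue("blue"));
    }
}
```

In a real cluster the two maps would be two separate tables, so step 2 may have to reach a different server than step 1.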

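For a distributed database like HBase, however, the biggest problem is keeping the index table and the data table local to each other. Load balancing, table splits, and similar events can easily spread the index data and the table data across different region servers. In the figure below, for example, the data table and the index table have ended up on different region servers.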


[img]http://dl2.iteye.com/upload/attachment/0099/4719/8d7d9d6c-6041-33a5-85d8-c16dae6c3e60.png[/img]

To solve this problem, the [url=https://issues.apache.org/jira/browse/HBASE-2037]ihbase[/url] project was created. Its main idea is to build the index at the region level rather than at the table level.

Its solution replaces the regular region implementation with IdxRegion, which builds the index for the region at flush time:
@Override
protected void internalPreFlashcacheCommit() throws IOException {
  rebuildIndexes();
  super.internalPreFlashcacheCommit();
}
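
To illustrate the idea of rebuilding a region-local index on every flush, here is a minimal sketch. It is not the ihbase code: the class RegionIndexSketch and its methods are hypothetical, a sorted map stands in for the region's stored rows, and rows that have not been flushed yet are simply invisible to the index, which is a simplification of this toy model.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy region that rebuilds an in-memory index whenever it flushes (hypothetical names). */
final class RegionIndexSketch {
    private final TreeMap<String, String> flushedRows = new TreeMap<>();
    private final TreeMap<String, String> memstore = new TreeMap<>();
    // Index: column value -> sorted rowkeys within this region only.
    private Map<String, SortedSet<String>> index = new HashMap<>();

    /** Buffers a write; the index is not touched until the next flush. */
    void put(String rowKey, String value) {
        memstore.put(rowKey, value);
    }

    /** Moves buffered rows into the flushed store, then rebuilds the index. */
    void flush() {
        flushedRows.putAll(memstore);
        memstore.clear();
        rebuildIndex();
    }

    private void rebuildIndex() {
        Map<String, SortedSet<String>> rebuilt = new HashMap<>();
        for (Map.Entry<String, String> row : flushedRows.entrySet()) {
            rebuilt.computeIfAbsent(row.getValue(), v -> new TreeSet<>()).add(row.getKey());
        }
        index = rebuilt; // swap in the fresh index in one step
    }

    /** Returns the matching rowkeys in sorted order, ready to drive a seek-based scan. */
    SortedSet<String> matchingRowKeys(String value) {
        SortedSet<String> keys = index.get(value);
        if (keys == null) {
            return Collections.<String>emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(keys);
    }
}
```

Because the index is derived only from the rows the region itself holds, it lives wherever the region lives, which is the locality property the region-level design is after.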


When scanning, it provides a special scanner:

@Override
protected InternalScanner instantiateInternalScanner(Scan scan,
    List<KeyValueScanner> additionalScanners) throws IOException {
  Expression expression = IdxScan.getExpression(scan);
  if (scan == null || expression == null) {
    totalNonIndexedScans.incrementAndGet();
    return super.instantiateInternalScanner(scan, additionalScanners);
  } else {
    totalIndexedScans.incrementAndGet();
    // Grab a new search context
    IdxSearchContext searchContext = indexManager.newSearchContext();
    // use the expression evaluator to determine the final set of ints
    IntSet matchedExpression = expressionEvaluator.evaluate(searchContext,
        expression);
    if (LOG.isDebugEnabled()) {
      LOG.debug(String.format("%s rows matched the index expression",
          matchedExpression.size()));
    }
    return new IdxRegionScanner(scan, searchContext, matchedExpression);
  }
}


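ihbase maintains an in-memory index for each region. A scan first looks up the data in the index, which supplies the matching rowkeys in order; the regular scan can then use the rowkey from that step to move forward and seek purposefully.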

When IdxRegionScanner scans, it uses the index to construct a keyProvider; each call to next then positions the scanner using the rowkey supplied by the keyProvider:

@Override
public boolean next(List<KeyValue> outResults) throws IOException {
  // Seek to the next key value
  seekNext();
  boolean result = super.next(outResults);

  // some code omitted

  return result;
}


The seekNext method takes the next rowkey from the keyProvider and then jumps to that rowkey:
protected void seekNext() throws IOException {
  KeyValue keyValue;
  do {
    keyValue = keyProvider.next();

    if (keyValue == null) {
      // out of results keys, nothing more to process
      super.getStoreHeap().close();
      return;
    } else if (lastKeyValue == null) {
      // first key returned from the key provider
      break;
    } else {
      // it's possible that the super nextInternal method progressed past the
      // keyProvider's next key. We need to keep calling next on the keyProvider
      // until the key returned is after the last key returned from the
      // next(List<KeyValue>) method.

      // determine which of the two keys is less than the other
      // when the keyValue is greater than the lastKeyValue then we're good
      int comparisonResult = comparator.compareRows(keyValue, lastKeyValue);
      if (comparisonResult > 0) {
        break;
      }
    }
  } while (true);

  // seek the store heap to the next key
  // (this is what makes the scanner faster)
  getStoreHeap().seek(keyValue);
}
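
The same seek pattern can be tried out in isolation with the sketch below. It makes several assumptions: a TreeMap stands in for the store heap, TreeMap.ceilingEntry plays the role of seek, string rowkeys replace KeyValue, and the class SeekScanSketch is hypothetical rather than taken from ihbase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy index-guided scan: jump between matched rowkeys instead of reading every row. */
final class SeekScanSketch {
    /**
     * @param store         sorted rows of one region (rowkey -> value)
     * @param sortedMatches rowkeys produced by the index, in ascending order
     */
    static List<String> scan(TreeMap<String, String> store, List<String> sortedMatches) {
        List<String> out = new ArrayList<>();
        Iterator<String> keyProvider = sortedMatches.iterator();
        String lastKey = null;
        while (keyProvider.hasNext()) {
            String key = keyProvider.next();
            // Skip provider keys that are not after the last row already returned.
            if (lastKey != null && key.compareTo(lastKey) <= 0) {
                continue;
            }
            // "Seek": position on the first stored row at or after the key.
            Map.Entry<String, String> row = store.ceilingEntry(key);
            if (row == null) {
                break; // nothing left in the store
            }
            out.add(row.getKey() + "=" + row.getValue());
            lastKey = row.getKey();
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            store.put("r" + i, "v" + i);
        }
        System.out.println(scan(store, Arrays.asList("r2", "r4")));
    }
}
```

Like the original, the sketch returns whatever row sits at the seek position, so it gives exact results only when every rowkey from the index actually exists in the store.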


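My impression is that the problem with this implementation is its high memory usage. It is also unclear whether the index and the data stay consistent when a region is load-balanced to another region server.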