1. Introduction to HBase Read Replicas:
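Without HBase read replicas, only one RegionServer services a read request from a client, regardless of whether other DataNodes with local access to the same block are colocated with RegionServers. This ensures consistency of the data being read. However, that RegionServer can become a bottleneck because of poor performance, network problems, or anything else that slows down reads.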
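With read replicas enabled, the HMaster distributes read-only copies of regions (replicas) to different RegionServers in the cluster. One RegionServer services the default, or primary, replica, which is the only replica that can service write requests. If the RegionServer servicing the primary replica is down, writes fail.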
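Other RegionServers serve the secondary replicas, follow the primary RegionServer, and see only committed updates. Secondary replicas are read-only and cannot service write requests. They can be kept up to date either by reading the primary replica's HFiles at a set interval or by replication. With the first approach, the secondary replicas may not reflect the most recent updates when data has been written but the primary RegionServer has not yet flushed the memstore to HDFS. If the client receives a read response from a secondary replica, the read is marked as "stale". Clients can detect whether a read result is stale and react accordingly.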
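Replicas are placed on different RegionServers, and on different racks when possible. This provides a measure of high availability (HA) as far as reads are concerned. If a RegionServer becomes unavailable, clients can still access the regions it was serving through one of the secondary replicas, even before another RegionServer takes over those regions. Reads may be stale until the new RegionServer for a given region has processed the entire WAL.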
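For any given read request, a client can ask for a faster result even if it comes from a secondary replica or, if consistency matters more than speed, it can ensure that the request is serviced by the primary RegionServer. Using timeline consistency semantics, you can therefore decide the relative importance of consistency and availability, in terms of the CAP theorem, for your application as a whole or for individual aspects of it.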
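Timeline consistency is a consistency model that allows a more flexible standard of consistency than the default HBase model of strong consistency. A client can indicate the level of consistency it requires for a given read (Get or Scan) operation. The default consistency level is STRONG, meaning that the read request is sent only to the RegionServer servicing the region. This is the same behavior as when read replicas are not used. The other possibility, TIMELINE, sends the request to all RegionServers with replicas, including the primary. The client accepts the first response, which indicates whether it came from the primary or a secondary RegionServer. If it came from a secondary, the client can choose to verify the read later or to treat it as non-definitive.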
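The race between the primary and the secondaries can be illustrated with a small simulation. This is a minimal sketch using only the Java standard library, not HBase client code: the classes `TimelineReadSketch`, `Replica`, and `ReadResult` are hypothetical, and it assumes the client contacts the primary first and fans out to the secondaries only after a timeout, as the description of hbase.client.primaryCallTimeout.get in the properties table suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of a TIMELINE read. Hypothetical classes, not HBase client code. */
class TimelineReadSketch {

    /** A simulated replica that answers after a fixed delay. Replica 0 is the primary. */
    static final class Replica {
        final int id;
        final String value;
        final long delayMillis;

        Replica(int id, String value, long delayMillis) {
            this.id = id;
            this.value = value;
            this.delayMillis = delayMillis;
        }
    }

    /** A read result; the stale flag plays the role of Result.isStale() in HBase. */
    static final class ReadResult {
        final String value;
        final int replicaId;
        final boolean stale;

        ReadResult(String value, int replicaId) {
            this.value = value;
            this.replicaId = replicaId;
            this.stale = replicaId != 0; // only the primary returns non-stale data
        }
    }

    /** Starts an asynchronous read against one simulated replica. */
    private static CompletableFuture<ReadResult> ask(Replica r, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(r.delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("read cancelled", e);
            }
            return new ReadResult(r.value, r.id);
        }, pool);
    }

    /** Asks the primary first; after the timeout, also asks the secondaries and takes the first answer. */
    static ReadResult timelineRead(List<Replica> replicas, long primaryTimeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
        try {
            List<CompletableFuture<ReadResult>> calls = new ArrayList<>();
            CompletableFuture<ReadResult> primary = ask(replicas.get(0), pool);
            calls.add(primary);
            try {
                return primary.get(primaryTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException slowPrimary) {
                // The primary is slow: fan out to the secondaries as well.
                for (Replica r : replicas.subList(1, replicas.size())) {
                    calls.add(ask(r, pool));
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("primary read failed", e);
            }
            // The first replica to answer wins, whether primary or secondary.
            return (ReadResult) CompletableFuture.anyOf(calls.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdownNow(); // cancel the reads that lost the race
        }
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica(0, "v2", 500),  // slow primary holding the newest value
            new Replica(1, "v1", 20),   // fast secondary that lags behind
            new Replica(2, "v1", 300));
        ReadResult r = timelineRead(replicas, 50);
        System.out.println("replica=" + r.replicaId + " value=" + r.value + " stale=" + r.stale);
    }
}
```

With the delays chosen in `main`, the fast secondary answers before the slow primary, so the caller gets the older value flagged as stale. Lowering the timeout trades extra calls for lower latency, which is the trade-off the timeout properties control.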
The read replica feature includes two different mechanisms for keeping replicas up to date:
Using a Timer
In this mode, replicas are refreshed at a time interval controlled by the configuration option hbase.regionserver.storefile.refresh.period.
Using Replication
In this mode, replicas are kept current between a source and sink cluster using HBase replication. This can potentially allow for faster synchronization than using a timer. Each time a flush occurs on the source cluster, a notification is pushed to the sink clusters for the table. To use replication to keep replicas current, you must first set the column family attribute REGION_MEMSTORE_REPLICATION to false, then set the HBase configuration property hbase.region.replica.replication.enabled to true.
Important: Read-replica updates using replication are not supported for the hbase:meta table. Columns of hbase:meta must always have their REGION_MEMSTORE_REPLICATION attribute set to false.
Important:
Before you enable read-replica support, make sure to account for the increased heap memory requirements. Although no additional copies of HFile data are created, read-only replica regions have the same memory footprint as normal regions and must be included when you calculate the additional heap memory required. For example, if your table requires 8 GB of heap memory and you enable three replicas, you need about 24 GB of heap memory.
To enable support for read replicas in HBase, you must set several properties.
Property Name | Default Value | Description |
---|---|---|
hbase.region.replica.replication.enabled | false | The mechanism for refreshing the secondary replicas. If set to false, secondary replicas are not guaranteed to be consistent at the row level. Secondary replicas are refreshed at intervals controlled by a timer (hbase.regionserver.storefile.refresh.period), and so are guaranteed to be at most that interval of milliseconds behind the primary RegionServer. Secondary replicas read from the HFile in HDFS, and have no access to writes that have not been flushed to the HFile by the primary RegionServer. If set to true, replicas are kept up to date using replication, provided that the column family has the attribute REGION_MEMSTORE_REPLICATION set to false. Using replication for read replication of hbase:meta is not supported, and REGION_MEMSTORE_REPLICATION must always be set to false on its column family. |
hbase.regionserver.storefile.refresh.period | 0 (disabled) | The period, in milliseconds, for refreshing the store files for the secondary replicas. The default value of 0 indicates that the feature is disabled. Secondary replicas update their store files from the primary RegionServer at this interval. If refreshes occur too often, this can create a burden for the NameNode. If refreshes occur too infrequently, secondary replicas will be less consistent with the primary RegionServer. |
hbase.master.loadbalancer.class | org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer | The Java class used for balancing the load of all HBase clients. The default implementation is the StochasticLoadBalancer, which is the only load balancer that supports reading data from secondary RegionServers. |
hbase.ipc.client.allowsInterrupt | true | Whether or not to enable interruption of RPC threads at the client. The default value of true enables primary RegionServers to access data from other regions' secondary replicas. |
hbase.client.primaryCallTimeout.get | 10 ms | The timeout period, in milliseconds, that an HBase client waits for a response before the read is submitted to a secondary replica, if the read request allows timeline consistency. The default value is 10. Lower values increase the number of remote procedure calls while lowering latency. |
hbase.client.primaryCallTimeout.multiget | 10 ms | The timeout period, in milliseconds, before an HBase client's multi-get request, such as HTable.get(List<Get>), is submitted to a secondary replica, if the multi-get request allows timeline consistency. Lower values increase the number of remote procedure calls while lowering latency. |
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>0</value>
</property>
<property>
  <name>hbase.ipc.client.allowsInterrupt</name>
  <value>true</value>
  <description>Whether to enable interruption of RPC threads at the client. The default value of true is required to enable Primary RegionServers to access other RegionServers in secondary mode.</description>
</property>
<property>
  <name>hbase.client.primaryCallTimeout.get</name>
  <value>10</value>
</property>
<property>
  <name>hbase.client.primaryCallTimeout.multiget</name>
  <value>10</value>
</property>
Rack awareness for read replicas is modeled after the mechanism used for rack awareness in Hadoop. Its purpose is to ensure that some replicas are on a different rack than the RegionServer servicing the table. The default implementation is ScriptBasedMapping, which uses a topology map and a topology script to enforce distribution of the replicas across racks; you can override it by setting hbase.util.ip.to.rack.determiner to a custom implementation. To use the default topology map and script for CDH, setting hbase.util.ip.to.rack.determiner to ScriptBasedMapping is sufficient. Add the following property to HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml if you use Cloudera Manager, or to hbase-site.xml otherwise.
<property>
  <name>hbase.util.ip.to.rack.determiner</name>
  <value>ScriptBasedMapping</value>
</property>
The topology map assigns hosts to racks. It is read by the topology script. A rack is a logical grouping, and does not necessarily correspond to physical hardware or location. Racks can be nested. If a host is not in the topology map, it is assumed to be a member of the default rack. The following map uses a nested structure, with two data centers which each have two racks. All services on a host that are rack-aware will be affected by the rack settings for the host.
If you use Cloudera Manager, do not create the map manually. Instead, go to Hosts, select the hosts to assign to a rack, and select Actions for Selected > Assign Rack.
<topology>
  <node name="host1.example.com" rack="/dc1/r1"/>
  <node name="host2.example.com" rack="/dc1/r1"/>
  <node name="host3.example.com" rack="/dc1/r2"/>
  <node name="host4.example.com" rack="/dc1/r2"/>
  <node name="host5.example.com" rack="/dc2/r1"/>
  <node name="host6.example.com" rack="/dc2/r1"/>
  <node name="host7.example.com" rack="/dc2/r2"/>
  <node name="host8.example.com" rack="/dc2/r2"/>
</topology>
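To make the host-to-rack lookup concrete, here is a minimal Java sketch of how a map in this format could be resolved, including the fallback for hosts that are not listed. It is an illustration only and is not the topology script that CDH ships: the class `TopologyMapSketch`, its methods, and the fallback rack name `/default` are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that resolves hosts to racks from a topology map. */
class TopologyMapSketch {

    /** Assumed name of the fallback rack; the real name depends on the topology script in use. */
    static final String DEFAULT_RACK = "/default";

    private final Map<String, String> rackByHost = new HashMap<>();

    /** Parses <node name="..." rack="..."/> entries from the given topology XML. */
    TopologyMapSketch(String topologyXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(topologyXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("node");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                rackByHost.put(node.getAttribute("name"), node.getAttribute("rack"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid topology map", e);
        }
    }

    /** Returns the rack of a host, or the default rack if the host is not in the map. */
    String rackOf(String host) {
        return rackByHost.getOrDefault(host, DEFAULT_RACK);
    }

    /** True when two hosts are on different racks, the placement that rack awareness aims for. */
    boolean onDifferentRacks(String hostA, String hostB) {
        return !rackOf(hostA).equals(rackOf(hostB));
    }

    public static void main(String[] args) {
        String xml = "<topology>"
            + "<node name=\"host1.example.com\" rack=\"/dc1/r1\"/>"
            + "<node name=\"host3.example.com\" rack=\"/dc1/r2\"/>"
            + "</topology>";
        TopologyMapSketch topo = new TopologyMapSketch(xml);
        System.out.println(topo.rackOf("host1.example.com"));
        System.out.println(topo.rackOf("unknown.example.com"));
        System.out.println(topo.onDifferentRacks("host1.example.com", "host3.example.com"));
    }
}
```

Because racks are just path-like strings, nesting such as `/dc1/r1` needs no special handling in the lookup itself; a placement policy could compare path prefixes if it also wanted to spread replicas across data centers.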
The topology script determines rack topology using the topology map. By default, CDH uses /etc/hadoop/conf.cloudera.YARN-1/topology.py. To use a different script, set net.topology.script.file.name to the absolute path of the topology script.
After enabling read replica support on your RegionServers, configure the tables for which you want read replicas to be created. Keep in mind that each replica increases the amount of storage used by HBase in HDFS.
To create a new table with read replication capabilities enabled, set the REGION_REPLICATION property on the table. Use a command like the following, in HBase Shell:
hbase> create 'myTable', 'myCF', {REGION_REPLICATION => '3'}
You can also alter an existing column family to enable or change the number of read replicas it propagates, using a command similar to the following. The change will take effect at the next major compaction.
hbase> disable 'myTable'
hbase> alter 'myTable', 'myCF', {REGION_REPLICATION => '3'}
hbase> enable 'myTable'
To request a timeline-consistent read in your application, call setConsistency(Consistency.TIMELINE) on the Get or Scan object before performing the operation.
To check whether the result is stale (comes from a secondary replica), use the isStale() method of the result object. Use the following examples for reference.
// Timeline-consistent Get: the result may come from a secondary replica.
Get get = new Get(key);
get.setConsistency(Consistency.TIMELINE);
Result result = table.get(get);
// Timeline-consistent Scan.
Scan scan = new Scan();
scan.setConsistency(Consistency.TIMELINE);
ResultScanner scanner = table.getScanner(scan);
Result result = scanner.next();
This example overrides the normal behavior of sending the read request to all known replicas, and only sends it to the replica specified by ID.
Scan scan = new Scan();
scan.setConsistency(Consistency.TIMELINE);
scan.setReplicaId(2); // read only from the replica with ID 2
ResultScanner scanner = table.getScanner(scan);
Result result = scanner.next();
Result result = table.get(get);
if (result.isStale()) { ... }
You can also request timeline consistency using HBase Shell, allowing the result to come from a secondary replica.
hbase> get 'myTable', 'myRow', {CONSISTENCY => 'TIMELINE'}
hbase> scan 'myTable', {CONSISTENCY => 'TIMELINE'}