Question:

Inserting and deleting values in HBase from an Apache Storm bolt

高宏峻

2023-03-14

o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
o.a.h.h.z.RecoverableZooKeeper [WARN] Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

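My idea is to reduce the number of connections to HBase by opening one connection per bolt instance in the prepare method and closing it in cleanup. However, according to the documentation, cleanup is not guaranteed to be called in distributed mode.

After that, I found storm-hbase, a framework for working with HBase from Storm. Unfortunately, there is almost no information about it, only the README in its GitHub repo.

So my first question is: is storm-hbase a good solution for Storm-HBase integration? What is the best way to do this?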

寇夜洛

2023-03-14

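Oh boy, my time to shine! I've done a ton of optimization on writing to HBase from Storm, so hopefully this will help you.

If you're just getting started, storm-hbase is a great way to start streaming data into HBase. You can just clone the project, do a maven install, and then reference it in your topology.

However, if you start getting more complicated logic, then creating your own classes to talk to HBase is probably the way to go. That's what I'm going to show in my answer here.

I'm assuming you're using maven and the maven-shade plugin. You'll need to reference hbase-client: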

<dependency>
   <groupId>org.apache.hbase</groupId>
   <artifactId>hbase-client</artifactId>
   <version>${hbase.version}</version>
</dependency>

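Also make sure to package hbase-site.xml in your topology JAR. You can download this file from your cluster and put it in src/main/resources. I also keep one for testing in dev, named hbase-site.dev.xml. Then just use the shade plugin to move it to the root of the JAR.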

<plugin>
   <groupId>org.apache.maven.plugins</groupId>
   <artifactId>maven-shade-plugin</artifactId>
   <version>2.4</version>
   <configuration>
      <createDependencyReducedPom>true</createDependencyReducedPom>
      <artifactSet>
         <excludes>
            <exclude>classworlds:classworlds</exclude>
            <exclude>junit:junit</exclude>
            <exclude>jmock:*</exclude>
            <exclude>*:xml-apis</exclude>
            <exclude>org.apache.maven:lib:tests</exclude>
            <exclude>log4j:log4j:jar:</exclude>
            <exclude>org.testng:testng</exclude>
         </excludes>
      </artifactSet>
   </configuration>
   <executions>
      <execution>
         <phase>package</phase>
         <goals>
            <goal>shade</goal>
         </goals>
         <configuration>
            <transformers>
               <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                  <resource>core-site.xml</resource>
                  <file>src/main/resources/core-site.xml</file>
               </transformer>
               <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                  <resource>hbase-site.xml</resource>
                  <file>src/main/resources/hbase-site.xml</file>
               </transformer>
               <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                  <resource>hdfs-site.xml</resource>
                  <file>src/main/resources/hdfs-site.xml</file>
               </transformer>
               <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
               <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass></mainClass>
               </transformer>
            </transformers>
            <filters>
               <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                     <exclude>META-INF/*.SF</exclude>
                     <exclude>META-INF/*.DSA</exclude>
                     <exclude>META-INF/*.RSA</exclude>
                     <exclude>junit/*</exclude>
                     <exclude>webapps/</exclude>
                     <exclude>testng*</exclude>
                     <exclude>*.js</exclude>
                     <exclude>*.png</exclude>
                     <exclude>*.css</exclude>
                     <exclude>*.json</exclude>
                     <exclude>*.csv</exclude>
                  </excludes>
               </filter>
            </filters>
         </configuration>
      </execution>
   </executions>
</plugin>
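
A quick way to check that the shade configuration above did what you wanted is to look the files up on the classpath from inside the built JAR. This is only a sketch in plain Java; the ClasspathCheck class and its locate helper are names made up for illustration and are not part of HBase or Storm.

```java
import java.net.URL;

// Hypothetical helper: reports whether config files sit at the classpath root.
class ClasspathCheck {

    // Returns the location of a resource at the classpath root, or null if it is missing.
    static URL locate(String name) {
        return ClasspathCheck.class.getClassLoader().getResource(name);
    }

    public static void main(String[] args) {
        String[] names = {"core-site.xml", "hbase-site.xml", "hdfs-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url));
        }
    }
}
```

HBaseConfiguration.create() reads hbase-site.xml from the classpath, so if the file is reported as missing the client will likely fall back to its defaults. That may be what is behind connection attempts to ZooKeeper on localhost:2181 like the ones in the question's log.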

// connecting to HBase; HBaseConfiguration.create() picks up hbase-site.xml from the classpath
Configuration config = HBaseConfiguration.create();
connection = HConnectionManager.createConnection(config);

// single put method
private HConnection connection;

@SuppressWarnings("rawtypes")
@Override
public void prepare(java.util.Map stormConf, backtype.storm.task.TopologyContext context) {
   Configuration config = HBaseConfiguration.create();
   connection = HConnectionManager.createConnection(config);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
   try {
      // do stuff
      // call putFruit
   } catch (Exception e) {
      LOG.error("bolt error", e);
      collector.reportError(e);
   }
}

// example put method you'd call from within execute somewhere
private void putFruit(String key, FruitResult data) throws IOException {
   // open a table handle on the shared connection for this one put
   HTableInterface table = connection.getTable(Constants.TABLE_FRUIT);
   try {
      Put p = new Put(key.getBytes());
      long ts = data.getTimestamp();
      p.add(Constants.FRUIT_FAMILY, Constants.COLOR, ts, data.getColor().getBytes());
      p.add(Constants.FRUIT_FAMILY, Constants.SIZE, ts, data.getSize().getBytes());
      p.add(Constants.FRUIT_FAMILY, Constants.WEIGHT, ts, Bytes.toBytes(data.getWeight()));
      table.put(p);
   } finally {
      // always release the table handle, even if the put fails
      table.close();
   }
}

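Notice that I'm reusing the connection here. I recommend starting with this approach because it's easier to work with and debug. Eventually it won't scale because of the number of requests you're sending over the network, and you'll need to start batching multiple PUTs together.

To batch PUTs, you'll need to open a table using your HConnection and keep it open. You'll also need to set auto flush to false. This means the table will automatically buffer requests until it reaches the "hbase.client.write.buffer" size (the default is 2097152 bytes).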

// batch put method
private static final boolean AUTO_FLUSH = false;
private static final boolean CLEAR_BUFFER_ON_FAIL = false;
private HConnection connection;
private HTableInterface fruitTable;

@SuppressWarnings("rawtypes")
@Override
public void prepare(java.util.Map stormConf, backtype.storm.task.TopologyContext context) {
   Configuration config = HBaseConfiguration.create();
   connection = HConnectionManager.createConnection(config);
   fruitTable = connection.getTable(Constants.TABLE_FRUIT);
   fruitTable.setAutoFlush(AUTO_FLUSH, CLEAR_BUFFER_ON_FAIL);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
   try {
      // do stuff
      // call putFruit
   } catch (Exception e) {
      LOG.error("bolt error", e);
      collector.reportError(e);
   }
}

// example put method you'd call from within execute somewhere
private void putFruit(String key, FruitResult data) throws IOException {
   Put p = new Put(key.getBytes());
   long ts = data.getTimestamp();
   p.add(Constants.FRUIT_FAMILY, Constants.COLOR, ts, data.getColor().getBytes());
   p.add(Constants.FRUIT_FAMILY, Constants.SIZE, ts, data.getSize().getBytes());
   p.add(Constants.FRUIT_FAMILY, Constants.WEIGHT, ts, Bytes.toBytes(data.getWeight()));
   fruitTable.put(p);
}
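
To make the buffering behaviour concrete, here is a toy model in plain Java. It is not HBase code and says nothing about how the client is implemented internally; WriteBufferSketch and its methods are hypothetical names. It only illustrates the idea described above: writes accumulate locally and go out as one batch once the buffered size passes the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a size-limited client-side write buffer (NOT HBase code).
class WriteBufferSketch {
    private final long maxBytes;                        // plays the role of hbase.client.write.buffer
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    WriteBufferSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Queue one serialized write; flush once the buffered size exceeds the limit.
    void put(byte[] payload) {
        pending.add(payload);
        pendingBytes += payload.length;
        if (pendingBytes > maxBytes) {
            flush();
        }
    }

    // Stand-in for sending everything buffered so far as a single batched request.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    int flushCount() { return flushCount; }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(2097152L); // 2 MiB limit
        byte[] row = new byte[1024];                                // pretend each write is 1 KiB
        for (int i = 0; i < 5000; i++) {
            buffer.put(row);
        }
        buffer.flush();                                             // push out the remainder
        System.out.println("writes=5000 flushes=" + buffer.flushCount()); // prints: writes=5000 flushes=3
    }
}
```

In this model, 1 KiB writes against a 2097152-byte limit mean roughly 2048 writes per round trip instead of one. The trade-off is that anything still sitting in the buffer has not reached HBase yet, so it can be lost if the worker dies before the next flush.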

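In either approach, it's still a good idea to try to close your HBase connection in cleanup. Just be aware that it may not get called before your worker is killed.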
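
A minimal sketch of what that cleanup could look like, written against java.io.Closeable so it runs without a cluster. I'm assuming here that your table and connection objects implement Closeable (check this against the HBase client version you use); CleanupSketch and closeAll are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a bolt's cleanup(): best-effort close of several resources.
class CleanupSketch {

    // Closes each resource in the order given; a failure on one does not skip the rest.
    static List<IOException> closeAll(Closeable... resources) {
        List<IOException> failures = new ArrayList<>();
        for (Closeable resource : resources) {
            if (resource == null) {
                continue; // prepare() may have failed before this field was assigned
            }
            try {
                resource.close();
            } catch (IOException e) {
                failures.add(e); // collect and keep going
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-ins for the table and connection fields of the bolt.
        Closeable table = () -> System.out.println("table closed");
        Closeable connection = () -> System.out.println("connection closed");
        List<IOException> failures = closeAll(table, connection);
        System.out.println("failures=" + failures.size());
    }
}
```

In the bolt, cleanup() would then call something like closeAll(fruitTable, connection), table first, so that a failure while closing the table does not leave the connection open. Because cleanup may never run, don't rely on it as the only place where buffered puts get flushed.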
