While integrating Spark 3.3.x with Hive 2.1.1-cdh6.3.2 I ran into a problem: the Hive version Spark officially supports is 2.3.x, but the Hive that ships with CDH is 2.1.x, and since the project planned to use spark-thrift-server, part of the build failed to compile. Specifically, the OperationLog class gained several new methods in Hive 2.3, and code that calls them breaks compilation against 2.1.x. At that point there are two ways out:

1. Modify the Spark thrift-server source so it no longer calls the methods missing from Hive 2.1.x.
2. Compile a patched OperationLog that carries the missing methods, and swap it into the CDH hive-exec jar.

In the end I went with the second option, to keep modifications to the Spark source to a minimum.
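For context, the dependency setup would look roughly like the sketch below. The hive.version property is the same one the build snippets later in this post reference; the Cloudera repository declaration is my assumption here (CDH artifacts are not published to Maven Central):

<!-- A minimal sketch; the repository URL is the public Cloudera repo and is
     an assumption of this example, not something taken from the build below -->
<properties>
  <hive.version>2.1.1-cdh6.3.2</hive.version>
</properties>
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
  </dependency>
</dependencies>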
Once packaged, the OperationLog class ends up inside hive-exec-{version}.jar, so the task is actually simple: strip that class out of the jar so the recompiled version can take its place. Implemented through Maven, I know of the following two ways to do it:

1. Re-shade hive-exec with maven-shade-plugin and filter the OperationLog classes out.
2. Exclude OperationLog from the project jar with maven-jar-plugin, then unpack hive-exec, swap the class files, and repackage it with maven-antrun-plugin.

The two methods each have their pros and cons.

The build section of the pom for the first method looks like this:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>3.3.0</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>copy-dependencies</goal>
          </goals>
          <configuration>
            <outputDirectory>${project.build.directory}/lib</outputDirectory>
            <!-- The shaded jar already contains every class from hive-exec,
                 so exclude it from the copied dependencies -->
            <excludeArtifactIds>hive-exec</excludeArtifactIds>
          </configuration>
        </execution>
      </executions>
    </plugin>
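    <!-- The shade plugin always merges the project's own classes into the shaded
         jar, so the recompiled OperationLog lands in it automatically while the
         original copies from hive-exec are filtered out below -->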
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <shadedArtifactAttached>false</shadedArtifactAttached>
            <artifactSet>
              <includes>
                <include>org.apache.hive:hive-exec</include>
              </includes>
            </artifactSet>
            <filters>
              <filter>
                <artifact>org.apache.hive:hive-exec</artifact>
                <excludes>
                  <exclude>META-INF/*.MF</exclude>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                  <exclude>org/apache/hadoop/hive/ql/session/OperationLog*</exclude>
                </excludes>
              </filter>
            </filters>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
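With shadedArtifactAttached set to false, the shaded jar replaces the project's regular artifact in target/. So after mvn package you end up with a single project jar that bundles everything from hive-exec except the OperationLog classes (the recompiled ones come in with the project's own classes), plus the remaining dependencies copied to target/lib.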
The build section of the pom for the second method looks like this:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <version>3.2.0</version>
      <executions>
        <execution>
          <!-- The id must be default-jar, otherwise the build fails: the default
               lifecycle already runs a default-jar execution first, so reusing
               this id replaces the default behavior instead of adding a second
               jar execution -->
          <id>default-jar</id>
          <phase>package</phase>
          <goals>
            <goal>jar</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <!-- Leave these classes out when packaging the project jar -->
        <excludes>
          <exclude>**/OperationLog*.class</exclude>
        </excludes>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>3.3.0</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>copy-dependencies</goal>
          </goals>
          <configuration>
            <outputDirectory>${project.build.directory}/lib</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
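    <!-- Deliberately declared after maven-dependency-plugin: both are bound to
         the package phase, and plugins bound to the same phase run in declaration
         order, so hive-exec is copied into lib/ before being repackaged here -->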
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-antrun-plugin</artifactId>
      <version>1.8</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>run</goal>
          </goals>
          <configuration>
            <target>
              <echo message="Repackage hive-exec."/>
              <!-- Unpack the original hive-exec.jar, skipping the OperationLog
                   class files during extraction -->
              <unjar src="${project.build.directory}/lib/hive-exec-${hive.version}.jar"
                     dest="${project.build.directory}/exploded/hive-exec">
                <patternset>
                  <exclude name="**/OperationLog*.class"/>
                </patternset>
              </unjar>
              <!-- Copy our own compiled OperationLog classes into the exploded directory -->
              <copy todir="${project.build.directory}/exploded/hive-exec">
                <fileset dir="${project.build.directory}/classes">
                  <include name="**/OperationLog*.class"/>
                </fileset>
              </copy>
              <!-- Repackage everything into a new hive-exec.jar -->
              <jar destfile="${project.build.directory}/lib/hive-exec-${hive.version}.jar"
                   basedir="${project.build.directory}/exploded/hive-exec"/>
            </target>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
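For this to work, the patched OperationLog source has to live in this project under the original package (typically src/main/java/org/apache/hadoop/hive/ql/session/OperationLog.java), so that its compiled .class files end up in target/classes where the copy task above picks them up. After mvn package, target/lib then holds a hive-exec jar with the patched class in place.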
Both methods get the job done; the first looks a bit more concise, while the second involves more steps. I still lean toward the second. First, it neither adds nor removes any jars, and the jar sizes barely change, so the output matches what you would expect the build to produce. Second, the result after the operation looks very clean, which makes it the better fit for perfectionists and sufferers of code OCD.