Download:
Apache Tez release downloads: Apache TEZ Releases
Alternate download location: Apache Tez
Extract and rename:
tar -xzvf apache-tez-0.9.1-bin.tar.gz -C /opt/module/
mv /opt/module/apache-tez-0.9.1-bin /opt/module/tez-0.9.1
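Note: Tez (client) must be installed on the same node as Hive (client).
Replace the Hadoop-related jars under tez/lib
This step avoids jar version conflicts. These jars are later added to HADOOP_CLASSPATH, so if the bundled versions are not replaced, jobs that Hive runs on the MR engine fail with version-conflict errors.
Delete the Hadoop-related jars under tez-0.9.1/lib: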
rm /opt/module/tez-0.9.1/lib/hadoop-mapreduce-client-core-2.7.0.jar
rm /opt/module/tez-0.9.1/lib/hadoop-mapreduce-client-common-2.7.0.jar
Copy the matching jars from the cluster's Hadoop installation into tez-0.9.1/lib (in testing, Tez also worked without them):
cp /opt/module/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar /opt/module/tez-0.9.1/lib
cp /opt/module/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.7.jar /opt/module/tez-0.9.1/lib
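The delete-and-copy steps above can be sketched as one loop. This is only a minimal sketch, not part of the original procedure: the function name `replace_mr_jars` and its two arguments are hypothetical, it assumes bash, and it assumes every jar to swap matches `hadoop-mapreduce-client-*.jar`. Unlike the manual steps, it removes a bundled jar only when a replacement is found in the cluster directory.

```shell
# Hypothetical helper: swap the Hadoop MapReduce client jars bundled in a
# Tez lib directory for the versions shipped with the cluster.
#   $1 = Tez lib directory
#   $2 = directory holding the cluster's MapReduce jars
replace_mr_jars() {
  local tez_lib=$1
  local hadoop_mr=$2
  local old artifact candidate
  for old in "$tez_lib"/hadoop-mapreduce-client-*.jar; do
    [ -e "$old" ] || continue                 # the glob matched nothing
    artifact=$(basename "$old")
    artifact=${artifact%-*.jar}               # strip the "-<version>.jar" suffix
    for candidate in "$hadoop_mr/$artifact"-[0-9]*.jar; do
      [ -e "$candidate" ] || continue         # no cluster jar for this artifact
      rm -f "$old"                            # drop the bundled version
      cp "$candidate" "$tez_lib"/             # add the cluster version
    done
  done
}

# Example call with the paths used in this article:
# replace_mr_jars /opt/module/tez-0.9.1/lib /opt/module/hadoop-2.7.7/share/hadoop/mapreduce
```

Matching on the artifact name rather than hard-coding `2.7.0` and `2.7.7` keeps the sketch usable when either version changes.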
Upload tez/share/tez.tar.gz to HDFS and set its permissions
hadoop fs -rm -R /apps/tez-0.9.1
hadoop fs -mkdir -p /apps/tez-0.9.1
hadoop fs -put -f /opt/module/tez-0.9.1/share/tez.tar.gz /apps/tez-0.9.1
hadoop fs -chmod -R 777 /apps
PS: If you built Tez from its Maven source tree, upload the archive tez-dist/target/tez-x.y.z-SNAPSHOT.tar.gz to HDFS instead.
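One way to confirm the upload is to list the archive through the same URI that tez.lib.uris will reference. The sketch below is illustrative only: `tez_lib_uri` is a hypothetical helper, and the cluster check is skipped when the `hdfs` and `hadoop` commands are not on the PATH.

```shell
# Hypothetical helper: build the tez.lib.uris value from the default
# filesystem URI and the HDFS directory used in this article.
tez_lib_uri() {
  default_fs=${1%/}                           # drop one trailing slash, if any
  printf '%s/apps/tez-0.9.1/tez.tar.gz\n' "$default_fs"
}

# Only talk to the cluster when the Hadoop CLI is actually installed.
if command -v hdfs >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  uri=$(tez_lib_uri "$(hdfs getconf -confKey fs.defaultFS)")
  hadoop fs -ls "$uri" || echo "tez.tar.gz not found at $uri" >&2
fi
```

If the listing fails, Tez would not be able to localize its libraries from that path either, so it is worth checking before starting Hive.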
Create tez-site.xml under hive/conf
Create the file tez-site.xml in the hive/conf directory and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Location of the jars Tez depends on: the HDFS path of the uploaded Tez archive -->
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/tez-0.9.1/tez.tar.gz</value>
<description>
String value to a file path.
The location of the Tez libraries which will be localized for DAGs.
</description>
<type>string</type>
</property>
<!-- Whether to use the Hadoop libraries deployed on the cluster. If false, the Hadoop dependencies packaged in tez.lib.uris are used -->
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>false</value>
<description>
Boolean value.
Specify whether hadoop libraries required to run Tez should be the ones deployed on the cluster.
This is disabled by default - with the expectation being that tez.lib.uris has a complete
tez-deployment which contains the hadoop libraries.
</description>
<type>boolean</type>
</property>
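<!-- Takes effect only when no Xmx/Xms is given in the launch options (for example tez.am.launch.cmd-opts).
Sets the fraction of a container's memory that the Tez JVM may use as heap.
If YARN containers are short on memory, lower this value; otherwise raise it. -->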
<property>
<name>tez.container.max.java.heap.fraction</name>
<value>0.2</value>
<description>
Double value. Tez automatically determines the Xmx for the JVMs used to run
Tez tasks and app masters. This feature is enabled if the user has not
specified Xmx or Xms values in the launch command opts. Doing automatic Xmx
calculation is preferred because Tez can determine the best value based on
actual allocation of memory to tasks the cluster. The value if used as a
fraction that is applied to the memory allocated Factor to size Xmx based
on container memory size. Value should be greater than 0 and less than 1.
Set this value to -1 to allow Tez to use different default max heap fraction
for different container memory size. Current policy is to use 0.7 for container
smaller than 4GB and use 0.8 for larger container.
</description>
<type>float</type>
</property>
<!-- Memory used by the Tez ApplicationMaster, in MB -->
<!-- The host has only 1.5 GB of memory available, so this value is kept low -->
<!-- Default: 1024 -->
<property>
<name>tez.am.resource.memory.mb</name>
<value>1024</value>
<description>
Int value. The amount of memory in MB to be used by the AppMaster
</description>
<type>integer</type>
</property>
<!-- Memory used by each Tez task, in MB -->
<!-- The host has only 1.5 GB of memory available, so this value is reduced -->
<!-- Default: 1024 -->
<property>
<name>tez.task.resource.memory.mb</name>
<value>512</value>
<description>
Int value. The amount of memory in MB to be used by tasks. This applies to
all tasks across all vertices. Setting it to the same value for all tasks
is helpful for container reuse and thus good for performance typically.
</description>
<type>integer</type>
</property>
</configuration>
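To get a feel for what these values mean in practice, the heap size implied by tez.container.max.java.heap.fraction can be estimated by hand. The sketch below assumes Tez multiplies the container size by the fraction and truncates to whole megabytes. The exact computation lives inside Tez and may differ in detail, and `tez_xmx_mb` is a hypothetical helper name.

```shell
# Hypothetical helper: approximate the -Xmx (in MB) derived from a container
# size and tez.container.max.java.heap.fraction, truncated to a whole number.
#   $1 = container memory in MB, $2 = heap fraction
tez_xmx_mb() {
  awk -v mem="$1" -v frac="$2" 'BEGIN { printf "%d\n", int(mem * frac) }'
}

echo "AM heap:   $(tez_xmx_mb 1024 0.2) MB"   # tez.am.resource.memory.mb
echo "task heap: $(tez_xmx_mb 512 0.2) MB"    # tez.task.resource.memory.mb
```

With the configuration above this gives roughly 204 MB of heap for the ApplicationMaster and 102 MB per task. A fraction of 0.2 leaves most of each container to non-heap memory, so out-of-memory errors in the JVM are a hint to raise the fraction or the container size.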
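Set the environment variables at the end of conf/hive-env.sh under the Hive installation path, so that the Tez settings are loaded automatically every time Hive starts.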
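TEZ_CONF_DIR: directory that contains the Tez configuration file tez-site.xml
TEZ_JARS: directory where the Tez archive was extracted
HADOOP_CLASSPATH: classpath used by Hadoop at runtime
# Tez classpath
# TEZ_CONF_DIR must be a directory: a classpath entry that points at tez-site.xml itself is ignored.
# If tez-site.xml exists only under hive/conf (as created above), use that directory instead.
TEZ_CONF_DIR=/opt/module/tez-0.9.1/conf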
TEZ_JARS=/opt/module/tez-0.9.1
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
# Extra jars can be supplied through the HIVE_AUX_JARS_PATH variable,
# e.g. the hadoop-lzo dependency. Here all extra jars are kept under /opt/libs/
export HIVE_AUX_JARS_PATH=/opt/libs/*
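Because a mistyped classpath entry fails silently, it can help to check each entry of HADOOP_CLASSPATH before starting Hive. The following is a rough sketch under stated assumptions: `check_classpath` is a hypothetical helper, bash is assumed, and a trailing `/*` is treated as the JVM wildcard for "all jars in this directory".

```shell
# Hypothetical helper: report classpath entries that cannot take effect.
#   $1 = colon-separated classpath string
check_classpath() {
  local entry path
  local IFS=:
  set -f                                      # keep "dir/*" entries literal while splitting
  for entry in $1; do
    [ -n "$entry" ] || continue               # skip empty segments such as a leading ":"
    path=${entry%/\*}                         # "dir/*" -> "dir"
    if [ ! -e "$path" ]; then
      echo "missing: $entry"
    elif [ -f "$path" ]; then
      case $path in
        *.jar|*.zip) ;;                       # archives are valid entries
        *) echo "not a directory or jar: $entry" ;;
      esac
    fi
  done
  set +f
}

check_classpath "${HADOOP_CLASSPATH:-}"
```

An entry reported as "not a directory or jar" is typically a configuration file listed directly. The directory that holds the file is what belongs on the classpath.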
hive> set hive.execution.engine=tez;
hive> SELECT deptno, avg(sal) as avg_sal FROM emp group by deptno;
Alternatively, set hive.execution.engine to tez in hive/conf/hive-site.xml so that Hive runs its jobs on Tez by default:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
<description>
Expects one of [mr, tez, spark].
Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
remains the default engine for historical reasons, it is itself a historical engine
and is deprecated in Hive 2 line. It may be removed without further warning.
</description>
</property>
hive> set hive.execution.engine=mr;
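Switching engines per session, as the `set` commands above do, can also be scripted to run the same query on both engines from the shell. This is a sketch only: `with_engine` is a hypothetical helper, the query is the one used earlier against the `emp` table, and the loop is skipped when the `hive` command is not available.

```shell
# Hypothetical helper: prefix a query with the engine switch so that a single
# `hive -e` call runs it on the chosen engine.
#   $1 = engine name (tez or mr), $2 = HiveQL statement
with_engine() {
  printf 'set hive.execution.engine=%s; %s\n' "$1" "$2"
}

query='SELECT deptno, avg(sal) AS avg_sal FROM emp GROUP BY deptno;'

# Only run when the Hive CLI is available on this node.
if command -v hive >/dev/null 2>&1; then
  for engine in tez mr; do
    echo "== $engine =="
    hive -e "$(with_engine "$engine" "$query")"
  done
fi
```

Running both engines back to back is a quick check that the MR engine still works after the Tez jars were added to HADOOP_CLASSPATH.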