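I. Requirements
1. Spring for Apache Hadoop 2.1 is built and tested with JDK 7 (minimum requirement: JDK 6 or later) and Spring Framework 4.1. By default it is built against Apache Hadoop 2.6.
2. Spring for Apache Hadoop 2.1 supports the following Hadoop distributions: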
Apache Hadoop 2.4.1
Apache Hadoop 2.5.2
Apache Hadoop 2.6.0
Pivotal HD 2.1
Cloudera CDH5(2.5.0-CDH5.3.0)
Hortonworks Data Platform 2.0
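Any distribution compatible with Apache Hadoop 2.2.x or later should work with Spring for Apache Hadoop 2.1. SHDP 2.1 also supports HBase 0.94.11, Hive 0.11.0 and Pig 0.11.0 and above.
When using Spring for Apache Hadoop, start from your Hadoop version and check which versions of the other frameworks are compatible with it.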
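As a rough illustration of the "Hadoop 2.2.x or later" rule above, the hypothetical helper below parses a distribution version string such as 2.5.0-cdh5.3.0 and checks that its major.minor is at least 2.2. It is only a sketch; it is not part of SHDP, and the supported list above remains the authority. In a real application the version string of the Hadoop jars on the classpath can be obtained from org.apache.hadoop.util.VersionInfo.getVersion().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HadoopVersionCheck {
    // Leading "major.minor" of strings such as "2.6.0" or "2.5.0-cdh5.3.0"
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    /** Returns true if the version is 2.2 or later (hypothetical helper, not an SHDP API). */
    public static boolean isAtLeast22(String version) {
        Matcher m = MAJOR_MINOR.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major > 2 || (major == 2 && minor >= 2);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.4.1", "2.5.0-cdh5.3.0", "2.6.0", "1.2.1"}) {
            System.out.println(v + " -> " + isAtLeast22(v));
        }
    }
}
```

Only the major and minor numbers are compared; vendor suffixes such as -cdh5.3.0 are ignored.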
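II. Spring and Hadoop
(1) Hadoop configuration
Whether you run against a local Hadoop installation or a remote Hadoop cluster, Hadoop must be configured correctly and bootstrapped before jobs can be submitted. The steps are as follows (Spring for Apache Hadoop is abbreviated below as SHDP).
Step 1: use the SHDP namespace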
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- regular Spring bean definitions go here -->

    <!-- SHDP elements use the hdp: prefix -->
    <hdp:configuration/>
</beans>
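Note the declarations above: the hdp prefix is bound to the http://www.springframework.org/schema/hadoop namespace. It is also possible to change the default namespace, for example from <beans> to <hdp>. This is convenient when the configuration consists mainly of Hadoop components: the Hadoop elements then need no prefix, while plain Spring beans use the beans: prefix: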
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/hadoop"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <beans:bean id ... >

    <configuration ...>
</beans:beans>
Step 2: SHDP JavaConfig style (an alternative to the XML namespace)
import org.springframework.context.annotation.Configuration;
import org.springframework.data.hadoop.config.annotation.EnableHadoop;
import org.springframework.data.hadoop.config.annotation.SpringHadoopConfigurerAdapter;
import org.springframework.data.hadoop.config.annotation.builders.HadoopConfigConfigurer;

// Declared as a static nested class; drop 'static' if it is a top-level class
@Configuration
@EnableHadoop
static class Config extends SpringHadoopConfigurerAdapter {

    @Override
    public void configure(HadoopConfigConfigurer config) throws Exception {
        // Sets the HDFS address (the fs.defaultFS property)
        config
            .fileSystemUri("hdfs://localhost:8021");
    }
}
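Here the HadoopConfigConfigurer parameter collects the Hadoop settings (in this example, the file system URI). @EnableHadoop must be placed on a class that is also annotated with @Configuration.
Step 3: configure Hadoop
To work with Hadoop you first need a Hadoop Configuration object. It holds information such as the job tracker address, the input and output formats, and the various other parameters of a MapReduce job; SHDP simplifies creating and wiring it.
<hdp:configuration> defines a Configuration bean (more precisely, a factory bean of type ConfigurationFactoryBean) named hadoopConfiguration by default.
When specific configuration resources are needed, list them in the resources attribute:
<hdp:configuration resources="classpath:/custom-site.xml, classpath:/hq-site.xml"/>
This adds both files to the Hadoop Configuration. Besides referencing resource files, Hadoop settings can also be declared directly as Java properties (key=value pairs) in the element body, which is handy when only a few options need to change: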
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<hdp:configuration>
fs.defaultFS=hdfs://localhost:8020
hadoop.tmp.dir=/tmp/hadoop
electric=sea
</hdp:configuration>
</beans>
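The element body above looks like the java.util.Properties key=value format. The sketch below is a simplified model under that assumption, not SHDP's actual parsing code: it loads such a body with Properties.load and copies it into a sorted map standing in for the Hadoop Configuration. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class InlinePropertiesSketch {
    /** Parses a key=value body (java.util.Properties syntax) into a sorted map. */
    public static Map<String, String> parse(String body) {
        Properties props = new Properties();
        try {
            // load() skips leading whitespace and '#' comment lines and accepts '=' or ':' separators
            props.load(new StringReader(body));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            settings.put(key, props.getProperty(key));
        }
        return settings;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
                "fs.defaultFS=hdfs://localhost:8020",
                "hadoop.tmp.dir=/tmp/hadoop",
                "electric=sea");
        System.out.println(parse(body));
        // {electric=sea, fs.defaultFS=hdfs://localhost:8020, hadoop.tmp.dir=/tmp/hadoop}
    }
}
```

Because the key ends at the first '=', the colons inside hdfs://localhost:8020 stay part of the value.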
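The frequently used properties fs.defaultFS, mapred.job.tracker and yarn.resourcemanager.address can also be set through the attributes file-system-uri, job-tracker-uri and rm-manager-uri respectively. Property values can also be parameterized with Spring property placeholders; in the example below they are resolved through context:property-placeholder (from hadoop.properties), and ${number:18} falls back to 18 when number is not defined: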
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<hdp:configuration>
fs.defaultFS=${hd.fs}
hadoop.tmp.dir=file://${java.io.tmpdir}
hangar=${number:18}
</hdp:configuration>
<context:property-placeholder location="classpath:hadoop.properties" />
</beans>
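To make the ${key} and ${key:default} forms above concrete, here is a hypothetical, simplified resolver written with a regex and java.util.Properties. It is not Spring's placeholder implementation (which also handles nesting, escaping and configurable lookup modes); it assumes the lookup order "given properties, then JVM system properties, then the default after ':'", which is how ${java.io.tmpdir} can resolve in the example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {
    // Matches ${key} or ${key:default}; nested placeholders are not supported in this sketch
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    public static String resolve(String value, Properties source) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // 1) given properties, 2) JVM system properties
            String resolved = source.getProperty(key, System.getProperty(key));
            if (resolved == null) {
                resolved = m.group(2); // 3) default after ':' (null if there is none)
            }
            if (resolved == null) {
                throw new IllegalArgumentException("Unresolvable placeholder: " + key);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties hadoopProps = new Properties();
        hadoopProps.setProperty("hd.fs", "hdfs://localhost:8020"); // as if read from hadoop.properties
        System.out.println(resolve("${hd.fs}", hadoopProps));      // hdfs://localhost:8020
        System.out.println(resolve("${number:18}", hadoopProps));  // 18
        System.out.println(resolve("file://${java.io.tmpdir}", hadoopProps));
    }
}
```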
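Properties can also be merged from several sources at once: a Properties bean referenced through properties-ref, the properties files listed in properties-location, and the local key=value pairs:
<?xml version="1.0" encoding="UTF-8"?>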
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:util="http://www.springframework.org/schema/util"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<!-- merge the local properties, the props bean and the two properties files -->
<hdp:configuration properties-ref="props" properties-location="cfg-1.properties, cfg-2.properties">
star=chasing
captain=eo
</hdp:configuration>
<util:properties id="props" location="props.properties"/>
</beans>
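The merge above combines three sources. SHDP's reference documentation states that locally declared properties win; the sketch below assumes the full order "local pairs, then the properties-ref bean, then the properties-location files" and models it with plain java.util.Properties. It is an approximation under that assumption (how cfg-1 and cfg-2 conflict with each other is not modeled), and all names and values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class PropertiesMergeSketch {
    /** Merges sources listed from lowest to highest precedence; later sources overwrite earlier ones. */
    public static Properties merge(List<Properties> lowestFirst) {
        Properties merged = new Properties();
        for (Properties source : lowestFirst) {
            merged.putAll(source);
        }
        return merged;
    }

    /** Small helper: builds Properties from alternating key/value arguments. */
    public static Properties of(String... keyValues) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            p.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties files = of("captain", "from-file", "color", "blue"); // stands in for cfg-1/cfg-2
        Properties propsBean = of("captain", "from-bean");             // stands in for the props bean
        Properties local = of("star", "chasing", "captain", "eo");     // the element body

        Properties merged = merge(Arrays.asList(files, propsBean, local));
        System.out.println(merged.getProperty("captain")); // eo (local wins)
        System.out.println(merged.getProperty("color"));   // blue (only defined in the files)
    }
}
```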
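A configuration can also inherit from another one through configuration-ref. Below, custom inherits every property of hadoopConfiguration and overrides only fs.defaultFS:
<!-- default name is 'hadoopConfiguration' -->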
<hdp:configuration>
fs.defaultFS=${hd.fs}
hadoop.tmp.dir=file://${java.io.tmpdir}
</hdp:configuration>
<hdp:configuration id="custom" configuration-ref="hadoopConfiguration">
fs.defaultFS=${custom.hd.fs}
</hdp:configuration>
In the configurations defined above, when the same key appears more than once, the later definition overrides the earlier one.
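The configuration-ref inheritance can be pictured with the defaults chain built into java.util.Properties: a child looks a key up locally first and falls back to its parent. This is only an analogy for the custom/hadoopConfiguration example (SHDP builds real Hadoop Configuration objects); the helper name and the host values are hypothetical.

```java
import java.util.Properties;

public class ConfigurationRefSketch {
    /** Builds a child whose lookups fall back to the parent; overrides are stored locally (analogy only). */
    public static Properties childOf(Properties parent, Properties overrides) {
        Properties child = new Properties(parent); // parent acts as the defaults table
        for (String key : overrides.stringPropertyNames()) {
            child.setProperty(key, overrides.getProperty(key));
        }
        return child;
    }

    public static void main(String[] args) {
        Properties hadoopConfiguration = new Properties();
        hadoopConfiguration.setProperty("fs.defaultFS", "hdfs://localhost:8020");
        hadoopConfiguration.setProperty("hadoop.tmp.dir", "file:///tmp");

        Properties overrides = new Properties();
        overrides.setProperty("fs.defaultFS", "hdfs://custom-host:8020"); // hypothetical ${custom.hd.fs} value

        Properties custom = childOf(hadoopConfiguration, overrides);
        System.out.println(custom.getProperty("fs.defaultFS"));              // hdfs://custom-host:8020
        System.out.println(custom.getProperty("hadoop.tmp.dir"));            // file:///tmp (inherited)
        System.out.println(hadoopConfiguration.getProperty("fs.defaultFS")); // hdfs://localhost:8020 (parent untouched)
    }
}
```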
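The register-url-handler option is off by default. When enabled, it registers a URL handler in the running VM so that URLs using the hdfs prefix can be resolved; without the handler, such URLs throw an exception because the VM does not know what hdfs means. Since a URL handler can be registered at most once per VM, a failed registration is only logged and no exception is thrown. If your hdfs URLs stop working, check this setting first.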
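The "at most once per VM" restriction comes from the JDK: java.net.URL.setURLStreamHandlerFactory may be called only once, and a second call throws a java.lang.Error. The sketch below uses only the JDK, with a do-nothing factory standing in for a real one (Hadoop provides org.apache.hadoop.fs.FsUrlStreamHandlerFactory for hdfs URLs), to show both behaviours described above: an hdfs URL is rejected when no handler is registered, and a failed registration is logged rather than propagated. Class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLStreamHandlerFactory;
import java.util.logging.Logger;

public class UrlHandlerSketch {
    private static final Logger LOG = Logger.getLogger(UrlHandlerSketch.class.getName());

    // Stand-in factory: returning null makes the JDK fall back to its built-in handlers,
    // so this registers nothing for "hdfs" (a real setup would use Hadoop's factory).
    public static final URLStreamHandlerFactory NO_OP_FACTORY = protocol -> null;

    /** Returns true if the JDK can build a URL for the given spec. */
    public static boolean canParse(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) { // e.g. "unknown protocol: hdfs"
            return false;
        }
    }

    /** Registers the factory; on failure logs a warning instead of throwing. */
    public static boolean registerQuietly(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) { // the JDK throws java.lang.Error once a factory is already set
            LOG.warning("URL handler not registered: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canParse("hdfs://localhost:8020/tmp")); // false: the VM has no hdfs handler
        System.out.println(registerQuietly(NO_OP_FACTORY));        // true in a fresh JVM
        System.out.println(registerQuietly(NO_OP_FACTORY));        // false: only one factory per JVM
    }
}
```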
hdp:configuration attributes
Name | Values | Description
configuration-ref | Bean reference | Reference to an existing Configuration bean.
properties-ref | Bean reference | Reference to an existing Properties bean.
properties-location | Comma-delimited list | List of Spring Resource paths.
resources | Comma-delimited list | List of Spring Resource paths.
register-url-handler | Boolean | Registers an HDFS URL handler in the running VM. Since this operation can be executed at most once in a given JVM, it defaults to false.
file-system-uri | String | The HDFS file system address. Equivalent to the fs.defaultFS property.
job-tracker-uri | String | Job tracker address for Hadoop v1. Equivalent to the mapred.job.tracker property.
rm-manager-uri | String | The YARN ResourceManager address for Hadoop v2. Equivalent to the yarn.resourcemanager.address property.
user-keytab | String | Security keytab.
user-principal | String | User security principal.
namenode-principal | String | NameNode security principal.
rm-manager-principal | String | ResourceManager security principal.
security-method | String | The security method for Hadoop.
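The "Equivalent to" entries in the table amount to a mapping from element attributes to Hadoop property names. As a hypothetical illustration (not SHDP code), the sketch below applies that mapping to a set of attribute values and produces the properties the attributes stand for; only the three mappings stated in the table are included, and the port numbers are example values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AttributeMappingSketch {
    // Attribute -> Hadoop property, as listed in the hdp:configuration table
    static final Map<String, String> ATTRIBUTE_TO_PROPERTY = new LinkedHashMap<>();
    static {
        ATTRIBUTE_TO_PROPERTY.put("file-system-uri", "fs.defaultFS");
        ATTRIBUTE_TO_PROPERTY.put("job-tracker-uri", "mapred.job.tracker");
        ATTRIBUTE_TO_PROPERTY.put("rm-manager-uri", "yarn.resourcemanager.address");
    }

    /** Translates attribute values into their equivalent Hadoop properties; unknown attributes are rejected. */
    public static Map<String, String> toProperties(Map<String, String> attributes) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            String property = ATTRIBUTE_TO_PROPERTY.get(e.getKey());
            if (property == null) {
                throw new IllegalArgumentException("No property mapping for attribute: " + e.getKey());
            }
            props.put(property, e.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("file-system-uri", "hdfs://localhost:8020");
        attrs.put("rm-manager-uri", "localhost:8032");
        System.out.println(toProperties(attrs));
        // {fs.defaultFS=hdfs://localhost:8020, yarn.resourcemanager.address=localhost:8032}
    }
}
```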
IV. Command-line support