In MapReduce, the map and reduce functions are easy to test in isolation. MRUnit is a testing library that makes it convenient to pass known inputs to a mapper and to check whether a reducer's output matches expectations. Since the goal here is just to get a feel for MRUnit, I use the simplest, classic WordCount example.
MRUnit is used together with JUnit.
(First, if you are testing on Windows, you also need winutils.exe and hadoop.dll; the GitHub repository is here:
https://github.com/steveloughran/winutils
Download the versions matching your own Hadoop release; on GitHub, use the "Find file" search in that repository and type these two file names to locate and download them.
After downloading, move them into Hadoop's /bin/ directory.)
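Besides the HADOOP_HOME environment variable, Hadoop's shell utilities also honor the hadoop.home.dir JVM system property, so you can point the tests at the directory containing bin\winutils.exe from code. A minimal sketch; the path is a placeholder, replace it with your own install location:

```java
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Hypothetical install path: use the directory whose bin\ holds winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```

Set the property before any Hadoop class is loaded, e.g. in a @BeforeClass method of the test.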
Option 1: in a Maven project, add to pom.xml:
<dependency>
<groupId>org.apache.mrunit</groupId>
<artifactId>mrunit</artifactId>
<version>0.8.0-incubating</version>
</dependency>
Option 2: download the jar and add it to the project yourself.
Download address: https://repository.apache.org/content/repositories/releases/org/apache/mrunit/mrunit/1.1.0/
Just download the jar, but mind the version: a jar whose name ends in hadoop1 matches Hadoop 1.x, and hadoop2 matches Hadoop 2.x.
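The same version-matching rule applies if you pull MRUnit 1.1.0 from Maven instead of downloading the jar: the Hadoop flavor is selected with a classifier. A sketch, assuming Hadoop 2.x:

```xml
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <!-- hadoop2 matches Hadoop 2.x; use hadoop1 for Hadoop 1.x -->
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>
```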
I used the second approach and imported the jar myself; everything else is managed by pom.xml. Below is my pom.xml. It contains little more than the packages Hadoop needs, plus the JUnit and mocking support that MRUnit requires.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mvnhadoop</groupId>
    <artifactId>hadoop</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.7.5</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.powermock</groupId>
            <artifactId>powermock-api-mockito</artifactId>
            <version>1.7.4</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

// WordCount mapper: emits (word, 1) for every space-separated token in the input line.
public class wordcount extends Mapper<IntWritable, Text, Text, IntWritable> {
    @Override
    protected void map(IntWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] arr = line.split(" ");
        for (String s : arr) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}
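Before wiring this mapper into MRUnit, it helps to see what the split(" ") call actually produces, since that determines exactly which output pairs the test must declare. Note that splitting on a single space means consecutive spaces yield empty tokens, which this mapper would emit as empty-string keys. A plain-Java sketch of just the tokenization step:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // The same call the mapper makes on each input line
        String[] tokens = "a good man".split(" ");
        System.out.println(Arrays.toString(tokens));   // [a, good, man]

        // Consecutive spaces produce an empty token, which the mapper
        // would dutifully emit as ("", 1)
        String[] messy = "a  good man".split(" ");
        System.out.println(Arrays.toString(messy));    // [a, , good, man]
    }
}
```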
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

import java.io.IOException;

public class test {
    @Test
    public void run() throws IOException, InterruptedException {
        Text value = new Text("a good man");
        // Feed one known record to the mapper and assert on every expected output pair.
        new MapDriver<IntWritable, Text, Text, IntWritable>()
                .withMapper(new wordcount())
                .withInput(new IntWritable(1), value)
                .withOutput(new Text("a"), new IntWritable(1))
                .withOutput(new Text("good"), new IntWritable(1))
                .withOutput(new Text("man"), new IntWritable(1))
                .runTest();
    }
}
A few points to note:
1> The class to import is org.apache.hadoop.mrunit.mapreduce.MapDriver, which targets the new org.apache.hadoop.mapreduce API, not org.apache.hadoop.mrunit.MapDriver, which targets the old mapred API.
2> Because the mapper ignores the input key, the input key can be set to any value.
3> The withOutput() calls must cover every key the mapper class emits, otherwise the MRUnit test fails. Here the mapper's input contains only the three words "a", "good", and "man", so each needs a corresponding withOutput() call; if the mapper emits a word with no matching withOutput(), the test fails.
If no error is reported, the test passes.