Hadoop 2.2 Programming: MRUnit Testing

Source: http://www.cnblogs.com/lucius/p/3442381.html


Overview

This document explains how to write unit tests for your MapReduce code, and how to test your mapper and reducer logic on your desktop without any Hadoop environment set up.

Let’s look at some code

For testing your map and reduce logic, we will need four blocks of code: the Mapper code, the Reducer code, the Driver code, and finally the unit-testing code.

Sample Mapper

In our sample Mapper code, we are simply counting the frequency of words and emitting <word, 1> for each word found.


package com.kodkast.analytics;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.JobConf;
import org.apache.log4j.Logger;

import java.io.IOException;

public class UnitTestDemoMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

    public static final Logger Log = Logger.getLogger(UnitTestDemoMapper.class.getName());
    private final static Text one = new Text("1");

    public void configure(JobConf conf) {
        // mapper initialization code, if needed
    }

    public void map(Object key, Text value, OutputCollector<Text, Text> collector, Reporter rep) throws IOException {
        // split the line into words and emit <word, 1> for each one
        String input = value.toString();
        String[] words = processInput(input);

        for (int i = 0; i < words.length; i++) {
            Text textInput = new Text(words[i]);
            collector.collect(textInput, one);
        }
    }

    private String[] processInput(String input) {
        return input.split(" ");
    }
}


Sample Reducer

In our sample Reducer code, we are simply adding all the word counts and emitting the final result as <word, totalFrequency> for each word.


package com.kodkast.analytics;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class UnitTestDemoReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // sum up the counts emitted by the mapper for this word
        int count = 0;
        while (values.hasNext()) {
            String value = values.next().toString();
            count += Integer.parseInt(value);
        }
        String countStr = "" + count;
        output.collect(key, new Text(countStr));
    }
}


Sample Driver

Simple invocation of Mapper and Reducer code.


package com.kodkast.analytics;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class UnitTestDemo {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(UnitTestDemo.class);
        conf.setJobName("unit-test-demo");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(UnitTestDemoMapper.class);
        conf.setReducerClass(UnitTestDemoReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}


Unit Testing Class

Now, this is the new class we are adding to test our mapper and reducer logic, using the MRUnit framework, which is built on top of JUnit.


package com.kodkast.analytics;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.apache.hadoop.mapred.JobConf;
import org.junit.Before;
import org.junit.Test;

public class UnitTestDemoTest {

    MapDriver<Object, Text, Text, Text> mapDriver;
    ReduceDriver<Text, Text, Text, Text> reduceDriver;

    @Before
    public void setUp() {

        // create mapper and reducer objects
        UnitTestDemoMapper mapper = new UnitTestDemoMapper();
        UnitTestDemoReducer reducer = new UnitTestDemoReducer();

        // call mapper initialization code
        mapper.configure(new JobConf());

        // create mapdriver and reducedriver objects for unit testing
        mapDriver = new MapDriver<Object, Text, Text, Text>();
        mapDriver.setMapper(mapper);
        reduceDriver = new ReduceDriver<Text, Text, Text, Text>();
        reduceDriver.setReducer(reducer);
    }

    @Test
    public void testMapper() {

        // prepare mapper input
        String input = "Hadoop is nice and Java is also very nice";

        // test mapper logic
        mapDriver.withInput(new LongWritable(1), new Text(input));
        mapDriver.withOutput(new Text("Hadoop"), new Text("1"));
        mapDriver.withOutput(new Text("is"), new Text("1"));
        mapDriver.withOutput(new Text("nice"), new Text("1"));
        mapDriver.withOutput(new Text("and"), new Text("1"));
        mapDriver.withOutput(new Text("Java"), new Text("1"));
        mapDriver.withOutput(new Text("is"), new Text("1"));
        mapDriver.withOutput(new Text("also"), new Text("1"));
        mapDriver.withOutput(new Text("very"), new Text("1"));
        mapDriver.withOutput(new Text("nice"), new Text("1"));
        mapDriver.runTest();
    }

    @Test
    public void testReducer() {

        // prepare mapper output values
        List<Text> values = new ArrayList<Text>();
        String[] mapperValues = "1,1".split(",");
        for (int i = 0; i < mapperValues.length; i++) {
            values.add(new Text(mapperValues[i]));
        }

        // test reducer logic
        reduceDriver.withInput(new Text("nice"), values);
        reduceDriver.withOutput(new Text("nice"), new Text("2"));
        reduceDriver.runTest();
    }
}
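One detail worth pointing out: the class imports MapReduceDriver but never uses it. As a sketch that is not part of the original post, a third test method along these lines would exercise the mapper and reducer together, with MapReduceDriver running the map step, an in-memory shuffle, and the reduce step:

    @Test
    public void testMapperAndReducerTogether() {
        // MapReduceDriver<K1, V1, K2, V2, K3, V3> = map input, shuffled pairs, reduce output
        MapReduceDriver<Object, Text, Text, Text, Text, Text> mapReduceDriver =
                new MapReduceDriver<Object, Text, Text, Text, Text, Text>();
        mapReduceDriver.setMapper(new UnitTestDemoMapper());
        mapReduceDriver.setReducer(new UnitTestDemoReducer());

        // "nice" appears twice in the input line, so the composed job should emit <nice, 2>
        mapReduceDriver.withInput(new LongWritable(1), new Text("nice nice"));
        mapReduceDriver.withOutput(new Text("nice"), new Text("2"));
        mapReduceDriver.runTest();
    }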


  • Add unit tests for the MapReduce logic

The use of this framework is quite straightforward, especially in our business case, so I will just show the unit-test code and add comments where necessary; how to use it should be fairly obvious.
The unit test for the Mapper ‘MapperTest’:


package net.pascalalma.hadoop;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

/**
 * Created with IntelliJ IDEA.
 * User: pascal
 */
public class MapperTest {

    MapDriver<Text, Text, Text, Text> mapDriver;

    @Before
    public void setUp() {
        WordMapper mapper = new WordMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
    }

    @Test
    public void testMapper() throws IOException {
        // define the mapper inputs and the expected outputs, then run the test
        mapDriver.withInput(new Text("a"), new Text("ein"));
        mapDriver.withInput(new Text("a"), new Text("zwei"));
        mapDriver.withInput(new Text("c"), new Text("drei"));
        mapDriver.withOutput(new Text("a"), new Text("ein"));
        mapDriver.withOutput(new Text("a"), new Text("zwei"));
        mapDriver.withOutput(new Text("c"), new Text("drei"));
        mapDriver.runTest();
    }
}


This test class is actually even simpler than the Mapper implementation itself. You just define the input of the mapper and the expected output, and then let the configured MapDriver run the test. In our case the Mapper doesn't do anything specific, but you can see how easy it is to set up a test case.
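The WordMapper source is not shown in this excerpt; judging from the test expectations, it simply forwards each (key, value) pair. A hypothetical reconstruction consistent with the test above could look like this:

package net.pascalalma.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical reconstruction: the test only requires that each
// (key, value) pair comes out exactly as it went in.
public class WordMapper extends Mapper<Text, Text, Text, Text> {

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value);
    }
}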
For completeness here is the test class of the Reducer:


package net.pascalalma.hadoop;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Created with IntelliJ IDEA.
 * User: pascal
 */
public class ReducerTest {

    ReduceDriver<Text, Text, Text, Text> reduceDriver;

    @Before
    public void setUp() {
        AllTranslationsReducer reducer = new AllTranslationsReducer();
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
    }

    @Test
    public void testReducer() throws IOException {
        // two values for key "a" should be concatenated into "|ein|zwei"
        List<Text> values = new ArrayList<Text>();
        values.add(new Text("ein"));
        values.add(new Text("zwei"));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new Text("|ein|zwei"));
        reduceDriver.runTest();
    }
}
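The AllTranslationsReducer is likewise not included in this excerpt; the expected output "|ein|zwei" implies that each value is appended to the result prefixed with a '|'. A hypothetical reconstruction:

package net.pascalalma.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reconstruction matching the expected output "|ein|zwei":
// every value is appended to the result, prefixed with a '|'.
public class AllTranslationsReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder translations = new StringBuilder();
        for (Text value : values) {
            translations.append("|").append(value.toString());
        }
        context.write(key, new Text(translations.toString()));
    }
}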


Debugging MapReduce Programs With MRUnit

The distributed nature of MapReduce programs makes debugging a challenge. Attaching a debugger to a remote process is cumbersome, and the lack of a single console makes it difficult to inspect what is occurring when several distributed copies of a mapper or reducer are running concurrently. Furthermore, operations that work on small amounts of input (e.g., saving the inputs to a reducer in an array) fail when running at scale, causing out-of-memory exceptions or other unintended effects.

A full discussion of how to debug MapReduce programs is beyond the scope of a single blog post, but I’d like to introduce you to a tool we designed at Cloudera to assist you with MapReduce debugging: MRUnit.

MRUnit helps bridge the gap between MapReduce programs and JUnit by providing a set of interfaces and test harnesses, which allow MapReduce programs to be more easily tested using standard tools and practices.

While this doesn’t solve the problem of distributed debugging, many common bugs in MapReduce programs can be caught and debugged locally. For this purpose, developers often try to use JUnit to test their MapReduce programs. The current state of the art often involves writing a set of tests that each create a JobConf object, which is configured to use a mapper and reducer, and then set to use the LocalJobRunner (via JobConf.set("mapred.job.tracker", "local")). A MapReduce job will then run in a single thread, reading its input from test files stored on the local filesystem and writing its output to another local directory.
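As a minimal sketch of such a LocalJobRunner-based test (reusing the word-count job from the first example and assuming the test lives alongside the com.kodkast.analytics classes; the local paths are made up for illustration), it might look like this:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.junit.Test;

public class LocalRunnerTest {

    @Test
    public void testWordCountEndToEnd() throws Exception {
        JobConf conf = new JobConf(UnitTestDemo.class);
        conf.set("mapred.job.tracker", "local");  // run the whole job in-process
        conf.set("fs.default.name", "file:///");  // read and write the local filesystem
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(UnitTestDemoMapper.class);
        conf.setReducerClass(UnitTestDemoReducer.class);
        // hypothetical fixture paths; the output directory must not exist yet
        FileInputFormat.setInputPaths(conf, new Path("src/test/resources/wordcount-input"));
        FileOutputFormat.setOutputPath(conf, new Path("target/wordcount-output"));
        JobClient.runJob(conf);
        // ...then open target/wordcount-output/part-00000 and parse it to verify the results
    }
}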

This process provides a solid mechanism for end-to-end testing, but has several drawbacks. Developing new tests requires adding test inputs to files that are stored alongside one’s program. Validating correct output also requires filesystem access and parsing of the emitted data files. This involves writing a great deal of test harness code, which itself may contain subtle bugs. Finally, this process is slow. Each test requires several seconds to run. Users often find themselves aggregating several unrelated inputs into a single test (violating a unit testing principle of isolating unrelated tests) or performing less exhaustive testing due to the high barriers to test authorship.

The easiest way to test MapReduce programs is to include as little Hadoop-specific code as possible in one’s application. Parsers can operate on instances of String instead of Text, and mappers should instantiate instances of MySpecificParser to tokenize input data rather than embed parsing code in the body of MyMapper.map(). Your MySpecificParser implementation can then be tested with ordinary JUnit tests. Another class or method could then be used to perform processing on parsed lines.
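For instance, a MySpecificParser of the kind the paragraph describes might look like the sketch below; the tokenize method name and its behavior are illustrative assumptions, not an API from the post:

import org.junit.Test;
import static org.junit.Assert.assertArrayEquals;

// Plain-Java parser: it works on String, not Text, so no Hadoop types are needed.
class MySpecificParser {
    String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }
}

// An ordinary JUnit test; no MRUnit or MapReduce machinery is required.
public class MySpecificParserTest {
    @Test
    public void testTokenize() {
        MySpecificParser parser = new MySpecificParser();
        assertArrayEquals(new String[] { "Hadoop", "is", "nice" },
                          parser.tokenize("  Hadoop is nice "));
    }
}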

But even with those components separately tested, your map() and reduce() calls should still be tested individually, as the composition of separate classes may cause unintended bugs to surface. MRUnit provides test drivers that accept programmatically specified inputs and outputs, which validate the correct behavior of mappers and reducers in isolation, as well as when composed in a MapReduce job. For instance, the following code checks whether the IdentityMapper emits the same (key, value) pair as output that it receives as input:


import junit.framework.TestCase;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TestExample extends TestCase {

  private Mapper mapper;
  private MapDriver driver;

  @Before
  public void setUp() {
    mapper = new IdentityMapper();
    driver = new MapDriver(mapper);
  }

  @Test
  public void testIdentityMapper() {
    driver.withInput(new Text("foo"), new Text("bar"))
          .withOutput(new Text("foo"), new Text("bar"))
          .runTest();
  }
}


The MapDriver orchestrates the test process, feeding the input ("foo" and "bar") record to the IdentityMapper when its runTest() method is called. It also passes a mock OutputCollector implementation to the mapper. The driver then validates the output received by the OutputCollector against the expected output ("foo" and "bar") record. If the actual and expected outputs mismatch, a JUnit assertion failure is raised, informing the developer of the error. More test drivers exist for testing individual reducers, as well as mapper/reducer compositions.
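In the same style, here is a sketch (not from the original post) of the old-API ReduceDriver checking the stock IdentityReducer, which passes every value of a key through unchanged:

import java.util.Arrays;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Test;

public class TestIdentityReducer {

    @Test
    public void testIdentityReducer() {
        ReduceDriver<Text, Text, Text, Text> driver =
                new ReduceDriver<Text, Text, Text, Text>();
        driver.setReducer(new IdentityReducer<Text, Text>());
        // every value for the key should reappear unchanged in the output
        driver.withInput(new Text("foo"), Arrays.asList(new Text("bar"), new Text("baz")))
              .withOutput(new Text("foo"), new Text("bar"))
              .withOutput(new Text("foo"), new Text("baz"))
              .runTest();
    }
}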

End-to-end tests involving JobConf configuration code, InputFormat and OutputFormat implementations, filesystem access, and larger scale testing are still necessary. But many errors can be quickly identified with small tests involving a single, well-chosen input record, and a suite of regression tests allows correct behavior to be assured in the face of ongoing changes to your data processing pipeline. We hope MRUnit helps your organization test code, find bugs, and improve its use of Hadoop by facilitating faster and more thorough test cycles.

MRUnit is open source and is included in Cloudera’s Distribution for Hadoop. For more information about MRUnit, including where to get it and how to use its API, see the MRUnit documentation page.

How to run MRUnit from the command line?

Note: you need to download and build MRUnit first. Then edit $HADOOP_HOME/libexec/hadoop-config.sh to add $MRUNIT_HOME/lib/*.jar to the classpath, source $HADOOP_HOME/libexec/hadoop-config.sh, and run the following:

javac -d class/ MaxTemperatureMapper.java MaxTemperatureMapperTest.java
jar -cvf test.jar -C class ./
java -cp test.jar:$CLASSPATH org.junit.runner.JUnitCore MaxTemperatureMapperTest
# or
yarn -cp test.jar:$CLASSPATH org.junit.runner.JUnitCore MaxTemperatureMapperTest