Hadoop GenericOptionsParser

ToolRunner uses GenericOptionsParser to parse the generic Hadoop options (-conf, -D, -fs, -jt, -files, -libjars, -archives) and apply them to the job configuration. Any remaining command-line arguments are then passed to our MapReduce job by calling run(toolArgs).

String[] toolArgs = parser.getRemainingArgs();  //get any remaining command options other than the generic options

return tool.run(toolArgs);

Because our MapReduce job implements the run(String[] args) method, it receives all the remaining command-line arguments and can use them to configure the job.
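To make the split between generic options and leftover arguments concrete, here is a minimal, Hadoop-free sketch. It is not Hadoop's implementation: the class GenericOptionsSketch and its getProperties() and getOptions() methods are hypothetical names invented for this illustration, and the parsing is deliberately simplified. It assumes the generic options come before the job's own arguments, following the documented syntax "command [genericOptions] [commandOptions]".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for GenericOptionsParser (hypothetical, for illustration only). */
class GenericOptionsSketch {
    // Generic options that take exactly one value.
    private static final Set<String> VALUE_OPTIONS = new HashSet<>(
            Arrays.asList("-conf", "-fs", "-jt", "-files", "-libjars", "-archives"));

    private final Map<String, String> properties = new LinkedHashMap<>();
    private final Map<String, String> options = new LinkedHashMap<>();
    private final String[] remainingArgs;

    GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.equals("-D") && i + 1 < args.length) {
                addProperty(args[i + 1]);          // form: -D key=value
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                addProperty(arg.substring(2));     // form: -Dkey=value
                i += 1;
            } else if (VALUE_OPTIONS.contains(arg) && i + 1 < args.length) {
                options.put(arg, args[i + 1]);     // e.g. -libjars extra.jar
                i += 2;
            } else {
                break;                             // first argument that is not a generic option
            }
        }
        // Everything from the first non-generic argument onward belongs to the tool.
        remainingArgs = Arrays.copyOfRange(args, i, args.length);
    }

    private void addProperty(String keyValue) {
        int eq = keyValue.indexOf('=');
        if (eq > 0) {                              // silently skip malformed pairs in this sketch
            properties.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
    }

    Map<String, String> getProperties() { return properties; }

    Map<String, String> getOptions() { return options; }

    String[] getRemainingArgs() { return remainingArgs; }

    public static void main(String[] args) {
        String[] cli = {"-D", "mapreduce.job.reduces=2", "-libjars", "extra.jar", "input", "output"};
        GenericOptionsSketch parser = new GenericOptionsSketch(cli);
        System.out.println(parser.getProperties());                      // {mapreduce.job.reduces=2}
        System.out.println(parser.getOptions());                         // {-libjars=extra.jar}
        System.out.println(Arrays.toString(parser.getRemainingArgs()));  // [input, output]
    }
}
```

In real Hadoop code the -D pairs end up in the Configuration object rather than in a plain map, and the job's run(String[] args) method would receive only "input" and "output" in this example.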

Hadoop ToolRunner and Tool

When you create a MapReduce job, you can implement the Tool interface's method

int run(String[] args) throws Exception;

This is the only method that the Tool interface declares itself (Tool also extends Configurable, which contributes getConf() and setConf()). Implementing it allows ToolRunner to invoke the run(String[]) method of our MapReduce job. In the main method, call ToolRunner.run(new Configuration(), new MapReduceJob(), args) as follows:

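public static void main(String[] args) throws Exception {
    // ToolRunner.run is declared to throw Exception, so main must declare or handle it
    int status = ToolRunner.run(new Configuration(), new MapReduceJob(), args);
    System.exit(status);  // pass the job's exit status back to the shell
}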

If you look at the ToolRunner class, you will find the following method. Because our MapReduceJob implements the Tool interface, ToolRunner in turn invokes our implementation of run(String[] args):

public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);

    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}