Use Oozie to run terasort

“Java Action” is a better choice than “MapReduce Action” for running the terasort test case in Oozie, because terasort needs to run

TeraInputFormat.writePartitionFile(job, partitionFile);

first and then load the partition file through “TotalOrderPartitioner”. It is not a simple MapReduce job that needs only a few properties.
The directory of this “TerasortApp”, which uses Oozie’s “Java Action”, looks like this:

TerasortApp/
├── job.properties
├── lib
│   └── hadoop-mapreduce-examples.jar
└── workflow.xml
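The workflow reads its parameters from job.properties. A minimal sketch of how the layout and a matching job.properties could be created is given here: the variable names are the ones workflow.xml references, but every value (the ResourceManager port 8032, the row count, the HDFS directories, the application path) is an assumption to be adapted to the cluster.

```shell
# Create the application layout locally. The examples jar has to be copied
# into lib/ by hand; its location depends on the Hadoop distribution.
mkdir -p TerasortApp/lib

# The quoted 'EOF' keeps ${...} literal so that Oozie, not the shell, expands it.
cat > TerasortApp/job.properties <<'EOF'
# All values below are assumptions for a single-node test cluster.
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
# TeraGen writes 100-byte rows, so 10000000 rows is about 1 GB.
numRows=10000000
inputDir=/user/${user.name}/terasort-input
outputDir=/user/${user.name}/terasort-output
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/TerasortApp
EOF
```

With oozie.use.system.libpath set to true, Oozie also puts the sharelib on the classpath of the launched job, while the jar in lib/ provides the TeraGen and TeraSort classes.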

The core of this App is “workflow.xml”:

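<!-- The node names, the schema version and the <prepare> paths are assumptions;
     ${inputDir} and ${outputDir} are taken to be absolute HDFS paths. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="TerasortApp">
  <start to="teragen"/>
  <action name="teragen">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}${inputDir}"/>
      </prepare>
      <main-class>org.apache.hadoop.examples.terasort.TeraGen</main-class>
      <arg>-Dmapred.map.tasks=96</arg>
      <arg>${numRows}</arg>
      <arg>${inputDir}</arg>
    </java>
    <ok to="terasort"/>
    <error to="fail"/>
  </action>
  <action name="terasort">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}${outputDir}"/>
      </prepare>
      <configuration>
        <property>
          <name>mapreduce.input.fileinputformat.split.minsize</name>
          <value>4294967296</value>
        </property>
      </configuration>
      <main-class>org.apache.hadoop.examples.terasort.TeraSort</main-class>
      <arg>${inputDir}</arg>
      <arg>${outputDir}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed to terasort!</message>
  </kill>
  <end name="end"/>
</workflow-app>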
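
Once the files are in place, the application is uploaded to HDFS and submitted with the Oozie command-line client. The following is a minimal sketch, assuming the Oozie server URL used in Note 1 and a job.properties whose oozie.wf.application.path points at the uploaded directory. The helper names `run` and `submit_terasort` and the `DRY_RUN` switch are hypothetical conveniences, not part of Oozie.

```shell
# Assumed Oozie server URL; adjust to the cluster.
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}
# Assumed HDFS home directory of the submitting user.
HDFS_HOME="/user/${USER:-$(id -un)}"

# Print each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

submit_terasort() {
  # Upload the application directory (workflow.xml and lib/) to HDFS.
  run hdfs dfs -put TerasortApp "$HDFS_HOME/"
  # Submit and start the workflow; the client prints the job id.
  run oozie job -oozie "$OOZIE_URL" -config TerasortApp/job.properties -run
}

# Preview the commands without touching the cluster.
DRY_RUN=1 submit_terasort
```

After a real submission, `oozie job -oozie "$OOZIE_URL" -info <job-id>` shows the status of the TeraGen and TeraSort actions.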

Note 1. In a Cloudera environment, the Web UI fails at its last step, creating the sharelib for the Oozie service. To fix this problem, create the sharelib manually and then list it to confirm that it is available:

$ sudo -u oozie /usr/lib/oozie/bin/oozie-setup.sh sharelib create -fs hdfs://localhost:8020 -locallib /usr/lib/oozie/oozie-sharelib-yarn/
$ sudo -u oozie oozie admin -shareliblist -oozie http://localhost:11000/oozie
[Available ShareLib]
oozie
hive
distcp
hcatalog
sqoop
mapreduce-streaming
spark
hive2
pig

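Note 2. The ‘mapred.map.tasks’ property cannot change the number of mappers in Terasort, because that number is determined by the input splits that ‘TeraInputFormat’ computes (‘TotalOrderPartitioner’ only decides which reducer each key goes to). Therefore I set the ‘mapreduce.input.fileinputformat.split.minsize’ property to 4294967296 bytes (4 GiB), which makes each split larger and so limits the number of mappers.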
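
The effect of the property can be estimated with a little arithmetic. FileInputFormat derives the split size as max(minSize, min(maxSize, blockSize)) and splits each input file separately. The sketch below is rough and rests on assumptions: 1 TB of TeraGen output spread evenly over 96 files, a 128 MiB HDFS block size, and no allowance for the small slack FileInputFormat permits on the last split of a file. The function names are hypothetical.

```shell
# Effective split size: max(minSize, min(maxSize, blockSize)).
split_size() {
  block=$1; min=$2; max=$3
  size=$block
  if [ "$max" -lt "$size" ]; then size=$max; fi
  if [ "$min" -gt "$size" ]; then size=$min; fi
  echo "$size"
}

# Number of splits for one file: ceil(fileBytes / splitSize).
splits_per_file() {
  file_bytes=$1; size=$2
  echo $(( (file_bytes + size - 1) / size ))
}

block=$((128 * 1024 * 1024))          # assumed HDFS block size
max=9223372036854775807               # default maximum split size (Long.MAX_VALUE)
files=96                              # one file per TeraGen mapper
file_bytes=$((1000000000000 / files)) # assumed 1 TB of input in total

default_size=$(split_size "$block" 1 "$max")
tuned_size=$(split_size "$block" 4294967296 "$max")

echo "default split size: $(( files * $(splits_per_file "$file_bytes" "$default_size") )) mappers"
echo "4 GiB minimum size: $(( files * $(splits_per_file "$file_bytes" "$tuned_size") )) mappers"
```

Under these assumptions the estimate drops from 7488 mappers to 288. Because a split never spans files, the count cannot fall below the number of input files.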

Robin on Linux
