For running the TeraSort test case in Oozie, “Java Action” is a better choice of “Action” than “MapReduce Action”, because TeraSort needs to run

TeraInputFormat.writePartitionFile(job, partitionFile);

first, and then load ‘partitionFile’ with “TotalOrderPartitioner”. It is not a simple MapReduce job that needs merely a few properties.

The directory of this “TerasortApp”, which uses the “Java Action” of Oozie, looks like this:

TerasortApp/
├── job.properties
├── lib
│   └── hadoop-mapreduce-examples.jar
└── workflow.xml
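The workflow refers to several variables (${jobTracker}, ${nameNode}, ${numRows}, ${inputDir}, ${outputDir}) that have to be defined in “job.properties”. The following is only a minimal sketch of creating the layout and that file. The host names, ports, HDFS paths and the row count are illustrative assumptions for a single-node setup, so adjust them to your cluster.

```shell
# Create the application layout locally; hadoop-mapreduce-examples.jar
# still has to be copied into lib/ by hand.
mkdir -p TerasortApp/lib

# All values below are assumptions, not settings taken from a real cluster:
#   nameNode / jobTracker : single-node addresses (8032 is the usual YARN RM port)
#   numRows               : 10^10 rows x 100 bytes per TeraGen row = 1 TB
#   inputDir / outputDir  : hypothetical HDFS paths
# The quoted 'EOF' keeps ${...} literal so that Oozie resolves it, not the shell.
cat > TerasortApp/job.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/TerasortApp
numRows=10000000000
inputDir=/user/${user.name}/terasort-input
outputDir=/user/${user.name}/terasort-output
EOF
```

Here “oozie.wf.application.path” tells Oozie where the application directory lives on HDFS, and “oozie.use.system.libpath” makes the job use the Oozie sharelib.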

The core of this App is “workflow.xml”:

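<workflow-app xmlns="..." name="...">
  <start to="..."/>
  <action name="...">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="..."/>
      </prepare>
      <main-class>org.apache.hadoop.examples.terasort.TeraGen</main-class>
      <arg>-Dmapred.map.tasks=96</arg>
      <arg>${numRows}</arg>
      <arg>${inputDir}</arg>
    </java>

    <ok to="..."/>
    <error to="..."/>
  </action>

  <action name="...">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="..."/>
      </prepare>
      <configuration>
        <property>
          <name>mapreduce.input.fileinputformat.split.minsize</name>
          <value>4294967296</value>
        </property>
      </configuration>
      <main-class>org.apache.hadoop.examples.terasort.TeraSort</main-class>
      <arg>${inputDir}</arg>
      <arg>${outputDir}</arg>
    </java>

    <ok to="..."/>
    <error to="..."/>
  </action>

  <kill name="...">
    <message>Failed to terasort!</message>
  </kill>

  <end name="..."/>
</workflow-app>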

Note 1. In a Cloudera environment, the Web UI fails at the last step, which creates the sharelib for the Oozie service. To fix this problem, create the sharelib manually and then list it to check the result:

$ sudo -u oozie /usr/lib/oozie/bin/oozie-setup.sh sharelib create -fs hdfs://localhost:8020 -locallib /usr/lib/oozie/oozie-sharelib-yarn/
$ sudo -u oozie oozie admin -shareliblist -oozie http://localhost:11000/oozie
[Available ShareLib]
oozie
hive
distcp
hcatalog
sqoop
mapreduce-streaming
spark
hive2
pig

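Note 2. We can’t use the ‘mapred.map.tasks’ property to change the number of mappers in TeraSort, because that number is actually determined by the input splits that ‘TeraInputFormat’ computes (‘TotalOrderPartitioner’ only decides which reducer each key goes to). Therefore I use the ‘mapreduce.input.fileinputformat.split.minsize’ property to limit the number of mappers: a larger minimum split size means fewer splits, and thus fewer mappers.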
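To see roughly how the minimum split size translates into a mapper count, here is a small estimate in shell arithmetic. It assumes that TeraSort's input format inherits the standard FileInputFormat logic, where the split size is max(minSize, min(maxSize, blockSize)) and the last split of a file may be up to 10% larger than the others. The helper names (split_size, count_splits) are hypothetical, and the input figures (1 TB written by TeraGen as 96 equal files, 128 MiB blocks) are assumptions for illustration only.

```shell
# Split size rule used by FileInputFormat: max(minSize, min(maxSize, blockSize)).
split_size() {
  local min=$1 max=$2 block=$3
  local s=$(( max < block ? max : block ))
  echo $(( min > s ? min : s ))
}

# Number of splits for one file. A new split is cut only while the remaining
# bytes exceed 1.1 x the split size; whatever is left becomes the last split.
count_splits() {
  local remaining=$1 split=$2 n=0
  while [ $(( remaining * 10 )) -gt $(( split * 11 )) ]; do
    n=$(( n + 1 ))
    remaining=$(( remaining - split ))
  done
  if [ "$remaining" -gt 0 ]; then
    n=$(( n + 1 ))
  fi
  echo "$n"
}

# Assumed inputs (not measured values).
FILES=96                                  # one file per TeraGen map task
FILE_BYTES=$(( 1000000000000 / FILES ))   # 1 TB spread evenly over the files
BLOCK=$(( 128 * 1024 * 1024 ))            # assumed HDFS block size
MAX_SPLIT=9223372036854775807             # no upper limit configured

with_min=$(count_splits "$FILE_BYTES" "$(split_size 4294967296 "$MAX_SPLIT" "$BLOCK")")
default=$(count_splits "$FILE_BYTES" "$(split_size 1 "$MAX_SPLIT" "$BLOCK")")

echo "minsize=4 GiB: $(( with_min * FILES )) mappers"
echo "default      : $(( default * FILES )) mappers"
```

Under these assumptions each file of about 9.7 GiB yields 3 splits with the 4 GiB minimum instead of 78 block-sized splits, so the estimate drops from 7488 mappers to 288.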