Category Archives: bigdata

Use MapReduce to join two datasets

The two datasets are:

To join the two tables above on “student id”, we can use Hadoop’s MultipleInputs class, which lets each input path be read by its own Mapper. The code is:
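For readers who want the idea behind this kind of join without a Hadoop cluster, here is a minimal plain-Java sketch of the reduce-side join pattern that MultipleInputs makes convenient. It is not the Hadoop code and does not use the Hadoop API. The class name, the "A"/"B" tags and the comma-separated record layout (student id first) are hypothetical assumptions for illustration. Each "mapper" tags its records with their source, a sorted map stands in for the shuffle that groups records by key, and the "reducer" pairs every record from one source with every record from the other that shares the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {

    // A value as it would leave a mapper: which input it came from, plus its fields.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // "Map" phase for one input (hypothetical CSV layout: id,rest...).
    // In Hadoop, MultipleInputs would route each input path to its own Mapper.
    static void map(String tag, List<String> lines, Map<String, List<Tagged>> shuffle) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) continue; // skip malformed lines
            String key = line.substring(0, comma);
            String rest = line.substring(comma + 1);
            shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(tag, rest));
        }
    }

    // "Reduce" phase: for each key, emit every (A record, B record) pair, i.e. an inner join.
    static List<String> reduce(Map<String, List<Tagged>> shuffle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.source.equals("A")) left.add(t.payload);
                else right.add(t.payload);
            }
            for (String l : left)
                for (String r : right)
                    out.add(e.getKey() + "\t" + l + "\t" + r);
        }
        return out;
    }

    static List<String> join(List<String> a, List<String> b) {
        // TreeMap keeps keys sorted, loosely imitating the sorted shuffle.
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map("A", a, shuffle);
        map("B", b, shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not the datasets from this post.
        List<String> students = List.of("1,Alice", "2,Bob", "3,Carol");
        List<String> scores = List.of("1,math,90", "2,math,85", "1,english,78");
        join(students, scores).forEach(System.out::println);
    }
}
```

With the sample records above, student 3 has no score records and therefore produces no output line, exactly as in an SQL inner join. A real MapReduce version would do the tagging in two Mapper classes registered via MultipleInputs and the pairing in one Reducer.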

Compile and run it:

And the result in /my is:

Some tips about Hive


Here are some tips about Hive that I found while learning it: 1. When I started “bin/hive” for the first time, it reported these errors:

The solution is simple:

In fact, for a multi-user environment we had better use MySQL instead of Derby for the metastore, because the default embedded Derby metastore accepts only one session at a time. 2. Control the number of mappers for SQL jobs. If a SQL job…
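As background to the mapper-count tip, here is a minimal sketch of how Hadoop’s FileInputFormat sizes input splits, since each split becomes one map task. It reproduces the split-size rule max(minSize, min(maxSize, blockSize)) and the 10% “slop” FileInputFormat allows on the last split; in Hadoop 2 the min and max come from `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`. The class and method names are hypothetical. Treat this as an approximation for Hive: it models a single splittable file, and Hive’s default CombineHiveInputFormat groups splits differently.

```java
class SplitCountSketch {
    // FileInputFormat lets the last chunk grow up to 1.1 x splitSize before cutting another split.
    static final double SPLIT_SLOP = 1.1;

    // Same rule as FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (map tasks) for one splittable file.
    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) splits++; // leftover bytes form the final split
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 1 GB file with 128 MB blocks and no min/max override.
        System.out.println(countSplits(1024 * mb, 128 * mb, 1, Long.MAX_VALUE));
        // Lowering maxSize to 64 MB produces more, smaller splits.
        System.out.println(countSplits(1024 * mb, 128 * mb, 1, 64 * mb));
        // Raising minSize to 256 MB produces fewer, larger splits.
        System.out.println(countSplits(1024 * mb, 128 * mb, 256 * mb, Long.MAX_VALUE));
    }
}
```

The three calls print 8, 16 and 4: lowering the maximum split size increases the number of mappers, and raising the minimum split size above the block size decreases it.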