Use Hive to join two datasets

In the previous article, I wrote Java code on the MapReduce framework to join two datasets, and then extended it to sort the scores for every student. The complete join-and-sort code is here. It takes more than 170 lines of Java to join two tables and sort the result. In a production environment, however, we usually use Hive to do the same work.
We use the same sample datasets:

Now we can join them:
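As a rough sketch of what such a join could look like: the table names `student` and `score`, their columns, and the descending sort order are all assumptions made for illustration, since the actual schema of the sample datasets is not shown here.

```sql
-- Hypothetical tables: student(id, name) and score(student_id, subject, score)
-- Join the two tables, then list each student's scores from highest to lowest
SELECT s.name, c.subject, c.score
FROM student s JOIN score c ON s.id = c.student_id
ORDER BY s.name, c.score DESC;
```

Hive compiles a query like this into MapReduce jobs itself, which is why no hand-written mapper or reducer is needed.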

That is just three lines of HQL (Hive Query Language) instead of 170 lines of Java code.

These two tables are very small, so we can run the Hive task in local mode:
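One way to do this, shown as a minimal sketch and not necessarily the exact setting used originally, is to let Hive switch to local mode automatically for small inputs through the `hive.exec.mode.local.auto` property:

```sql
-- Allow Hive to run small jobs in local mode instead of submitting them to the cluster
SET hive.exec.mode.local.auto=true;
```

With this enabled, Hive runs a job locally only when its input falls below configured thresholds such as `hive.exec.mode.local.auto.inputbytes.max`; larger jobs still go to the cluster.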
