I find two datasets: employee and salary for learning and practicing. After putting two files into HDFS, we just need to create tables:

```create external table employee (
employee_id INT,
birthday DATE,
first_name STRING,
family_name STRING,
gender CHAR(1),
work_day DATE)
with serdeproperties (
"separatorChar" = ",",
"quoteChar"     = "'"
)
stored as textfile
location '/employee/';
create external table salary (
employee_id INT,
salary INT,
start_date DATE,
end_date DATE)
with serdeproperties (
"separatorChar" = ",",
"quoteChar"     = "'"
)
stored as textfile
location '/salary/';
```

Now we could analyze the data.
Find the oldest 10 employees.

```select * from employee order by birthday asc limit 10;
```

Find all the employees joined the corporation in January 1990.

```select * from employee where work_day >= '1990-01-01' and work_day <= '1990-01-31';
```

Find the top 10 employees earned the highest average salary. Notice we use 'order by' here because 'sort by' only produce local order in reducer.

```select e.first_name, e.family_name, avg(s.salary) as avg_salary from
employee as e join salary as s on (e.employee_id == s.employee_id)
group by e.first_name, e.family_name order by avg_salary limit 10;
```

Let's find out whether this corporation has sex discrimination:

```  SELECT e.gender, AVG(s.salary) AS avg_salary
FROM employee AS e
JOIN salary AS s
ON (e.employee_id == s.employee_id)
GROUP BY e.gender;
```

The result is:

```F       63767.607741168045
M       63839.90097030445
```

Looks good ðŸ™‚