Tag Archives: Apache Spark

A problem about using DataFrame in Apache Spark

Here is the code for loading CSV file (table employee) to DataFrame of Apache Spark:

But after I run the jar in Spark, it report:

Seems data haven’t been correctly load. After reviewed the document for CSV format carefully, I noticed that the quote in my CSV file… Read more »

An example of using Spark Structured Streaming

This snippet will monitor two directories and join the data from them when there is a new CSV file in any directory.

The join operation is implemented by Spark SQL which is easy to use (for DBA), and also easy to maintain. Some articles said if the Spark process… Read more »

A problem of using Pyspark SQL

Here is the code:

It will report error after running ‘cat xxx.py|bin/pyspark’:

I used to think it was because ‘2’ is a string, so I changed ‘row’ to be ‘[2, 29, 29, 29]’. But the error also changed to:

Then I searched on google, and find this… Read more »