Here is the code for loading a CSV file (the employee table) into an Apache Spark DataFrame:
import org.apache.spark.sql.types._

// Explicit schema for the employee table
val schema = StructType(
  Seq(
    StructField("id", LongType),
    StructField("birthday", DateType),
    StructField("firstname", StringType),
    StructField("lastname", StringType),
    StructField("gender", StringType),
    StructField("workingdate", DateType)
  )
)
// ss is an existing SparkSession
val df = ss.read.format("csv")
  .option("header", "false")
  .schema(schema)
  .load("employees.csv")
df.show()
But after I ran the jar in Spark, it reported:
+----+--------+---------+--------+------+-----------+
| id|birthday|firstname|lastname|gender|workingdate|
+----+--------+---------+--------+------+-----------+
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
|null| null| null| null| null| null|
+----+--------+---------+--------+------+-----------+
It seems the data hasn't been loaded correctly.
After carefully reviewing the documentation for the CSV format, I noticed that the quote character in my CSV file is ' instead of the default ". So I added an option to my code to let Spark recognise the single quote:
val df = ss.read.format("csv")
  .option("header", "false")
  .option("quote", "'")
  .schema(schema)
  .load("employees.csv")
This time the CSV has been read properly:
+-----+----------+---------+-----------+------+-----------+
| id| birthday|firstname| lastname|gender|workingdate|
+-----+----------+---------+-----------+------+-----------+
|10001|1953-09-02| Georgi| Facello| M| 1986-06-26|
|10002|1964-06-02| Bezalel| Simmel| F| 1985-11-21|
|10003|1959-12-03| Parto| Bamford| M| 1986-08-28|
|10004|1954-05-01|Chirstian| Koblick| M| 1986-12-01|
|10005|1955-01-21| Kyoichi| Maliniak| M| 1989-09-12|
|10006|1953-04-20| Anneke| Preusig| F| 1989-06-02|
|10007|1957-05-23| Tzvetan| Zielinski| F| 1989-02-10|
|10008|1958-02-19| Saniya| Kalloufi| M| 1994-09-15|
|10009|1952-04-19| Sumant| Peac| F| 1985-02-18|
|10010|1963-06-01|Duangkaew| Piveteau| F| 1989-08-24|
|10011|1953-11-07| Mary| Sluis| F| 1990-01-22|
|10012|1960-10-04| Patricio| Bridgland| M| 1992-12-18|
|10013|1963-06-07|Eberhardt| Terkki| M| 1985-10-20|
|10014|1956-02-12| Berni| Genin| M| 1987-03-11|
|10015|1959-08-19| Guoxiang| Nooteboom| M| 1987-07-02|
|10016|1961-05-02| Kazuhito|Cappelletti| M| 1995-01-27|
|10017|1958-07-06|Cristinel| Bouloucos| F| 1993-08-03|
|10018|1954-06-19| Kazuhide| Peha| F| 1987-04-03|
|10019|1953-01-23| Lillian| Haddadi| M| 1999-04-30|
|10020|1952-12-24| Mayuko| Warwick| M| 1991-01-26|
+-----+----------+---------+-----------+------+-----------+
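A problem like this can be easier to spot with a quick diagnostic pass. Spark's default parse mode, PERMISSIVE, substitutes nulls for records it cannot parse instead of raising an error, which is presumably why the first run printed nulls without complaint. The sketch below is only one possible way to check, not the method used above. It first reads the file without a schema so every column comes back as a raw string, then reads it again with the "mode" option set to FAILFAST so a malformed record throws instead of turning into nulls. The object name CsvQuoteCheck and the local[*] master are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Hypothetical helper object, not part of the original job
object CsvQuoteCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting
    val ss = SparkSession.builder()
      .appName("csv-quote-check")
      .master("local[*]")
      .getOrCreate()

    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("birthday", DateType),
      StructField("firstname", StringType),
      StructField("lastname", StringType),
      StructField("gender", StringType),
      StructField("workingdate", DateType)
    ))

    // Step 1: no schema, so every column is a plain string and any
    // quote characters left in the values are visible in the output.
    ss.read.format("csv")
      .option("header", "false")
      .load("employees.csv")
      .show(5, truncate = false)

    // Step 2: FAILFAST makes Spark throw on a malformed record
    // instead of silently filling the row with nulls.
    ss.read.format("csv")
      .option("header", "false")
      .option("quote", "'")
      .option("mode", "FAILFAST")
      .schema(schema)
      .load("employees.csv")
      .show(5)

    ss.stop()
  }
}
```

If step 1 shows values still wrapped in single quotes, the quote option is the likely culprit. If step 2 throws, the exception should point at the offending record.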