Here is the code for loading a CSV file (the employee table) into an Apache Spark DataFrame:
import org.apache.spark.sql.types._

// ss is an existing SparkSession
val schema = StructType(
  Seq(
    StructField("id", LongType),
    StructField("birthday", DateType),
    StructField("firstname", StringType),
    StructField("lastname", StringType),
    StructField("gender", StringType),
    StructField("workingdate", DateType)
  )
)
val df = ss.read.format("csv")
  .option("header", "false")
  .schema(schema)
  .load("employees.csv")
df.show()
But when I ran the jar in Spark, it printed:
+----+--------+---------+--------+------+-----------+
|  id|birthday|firstname|lastname|gender|workingdate|
+----+--------+---------+--------+------+-----------+
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
|null|    null|     null|    null|  null|       null|
+----+--------+---------+--------+------+-----------+
It seemed the data had not been loaded correctly: every column of every row came back null.
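To see why the typed columns cannot be filled, here is a minimal Scala sketch (standard library only, no Spark) of the conversions the schema asks for. It assumes a hypothetical raw line, built from the first row of the corrected output, in which every field is wrapped in single quotes. With the default double-quote character those single quotes stay inside the tokens, so neither the Long nor the date conversion can succeed. This only imitates the conversion step; it is not Spark's own parser.

```scala
import java.time.LocalDate
import scala.util.Try

object WhyNulls {
  // Hypothetical raw line; assumes every field in employees.csv is single-quoted.
  val rawLine = "'10001','1953-09-02','Georgi','Facello','M','1986-06-26'"

  // With the default quote character ("), the single quotes are not stripped,
  // so each token still carries its surrounding ' characters.
  val tokens: Array[String] = rawLine.split(",")

  // Attempt the conversions requested by the schema (LongType, DateType).
  def idParses: Boolean = Try(tokens(0).toLong).isSuccess
  def birthdayParses: Boolean = Try(LocalDate.parse(tokens(1))).isSuccess

  def main(args: Array[String]): Unit = {
    println(s"id token ${tokens(0)} parses as Long: $idParses")
    println(s"birthday token ${tokens(1)} parses as a date: $birthdayParses")
  }
}
```

The sketch does not explain why the string columns are null as well; the output above suggests that, in the Spark version used, a single failed conversion caused the whole record to be treated as malformed under the default PERMISSIVE mode.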
After carefully reviewing the documentation for Spark's CSV format, I noticed that the quote character in my CSV file is a single quote (') instead of a double quote ("). So I added an option to make Spark recognise the single quote as the quote character:
val df = ss.read.format("csv")
  .option("header", "false")
  .option("quote", "'") // fields in employees.csv are wrapped in single quotes
  .schema(schema)
  .load("employees.csv")
This time the CSV file was read correctly:
+-----+----------+---------+-----------+------+-----------+
|   id|  birthday|firstname|   lastname|gender|workingdate|
+-----+----------+---------+-----------+------+-----------+
|10001|1953-09-02|   Georgi|    Facello|     M| 1986-06-26|
|10002|1964-06-02|  Bezalel|     Simmel|     F| 1985-11-21|
|10003|1959-12-03|    Parto|    Bamford|     M| 1986-08-28|
|10004|1954-05-01|Chirstian|    Koblick|     M| 1986-12-01|
|10005|1955-01-21|  Kyoichi|   Maliniak|     M| 1989-09-12|
|10006|1953-04-20|   Anneke|    Preusig|     F| 1989-06-02|
|10007|1957-05-23|  Tzvetan|  Zielinski|     F| 1989-02-10|
|10008|1958-02-19|   Saniya|   Kalloufi|     M| 1994-09-15|
|10009|1952-04-19|   Sumant|       Peac|     F| 1985-02-18|
|10010|1963-06-01|Duangkaew|   Piveteau|     F| 1989-08-24|
|10011|1953-11-07|     Mary|      Sluis|     F| 1990-01-22|
|10012|1960-10-04| Patricio|  Bridgland|     M| 1992-12-18|
|10013|1963-06-07|Eberhardt|     Terkki|     M| 1985-10-20|
|10014|1956-02-12|    Berni|      Genin|     M| 1987-03-11|
|10015|1959-08-19| Guoxiang|  Nooteboom|     M| 1987-07-02|
|10016|1961-05-02| Kazuhito|Cappelletti|     M| 1995-01-27|
|10017|1958-07-06|Cristinel|  Bouloucos|     F| 1993-08-03|
|10018|1954-06-19| Kazuhide|       Peha|     F| 1987-04-03|
|10019|1953-01-23|  Lillian|    Haddadi|     M| 1999-04-30|
|10020|1952-12-24|   Mayuko|    Warwick|     M| 1991-01-26|
+-----+----------+---------+-----------+------+-----------+
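What the quote option changes can be illustrated with a hypothetical, much-simplified line splitter in Scala. It is not Spark's CSV parser (it ignores escape characters, multi-line values and more); it only shows that once the configured quote character matches the one used in the file, the quotes are stripped from each field and a delimiter inside quotes no longer splits it. The input line is hypothetical, built from the first row above and assuming every field is single-quoted.

```scala
import java.time.LocalDate
import scala.collection.mutable.ListBuffer
import scala.util.Try

object QuoteDemo {
  // Hypothetical simplified splitter: toggles an "inside quotes" flag whenever it
  // sees the configured quote character and drops that character from the output.
  // Delimiters are only honoured outside quotes. No escape handling.
  def splitLine(line: String, delimiter: Char = ',', quote: Char = '"'): List[String] = {
    val fields = ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    for (c <- line) {
      if (c == quote) inQuotes = !inQuotes
      else if (c == delimiter && !inQuotes) {
        fields += current.toString
        current.clear()
      } else current += c
    }
    fields += current.toString
    fields.toList
  }

  // Hypothetical raw line from employees.csv, every field single-quoted.
  val rawLine = "'10001','1953-09-02','Georgi','Facello','M','1986-06-26'"

  def main(args: Array[String]): Unit = {
    val withDefault = splitLine(rawLine)               // default quote: "
    val withSingle  = splitLine(rawLine, quote = '\'') // quote set to '
    println(withDefault)
    println(withSingle)
    println(s"birthday parses as a date: ${Try(LocalDate.parse(withSingle(1))).isSuccess}")
  }
}
```

With the default quote character the tokens keep their single quotes; with the quote set to a single quote they come out clean and can be converted to the schema's types.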