Here is the code:
from pyspark.sql import SQLContext
from pyspark.context import SparkContext
from pyspark.sql.types import *

sc = SparkContext()
sqlContext = SQLContext.getOrCreate(sc)

schema = StructType([
    StructField('id', LongType(), True),
    StructField('gid', LongType(), True),
    StructField('pid', LongType(), True),
    StructField('firstlogin', IntegerType(), True)
])

row = ['2', '29', '29', '29']
df = sqlContext.createDataFrame(row, schema)
df.show()
Running it with 'cat xxx.py | bin/pyspark' reports this error:
TypeError: StructType can not accept object '2' in type <type 'str'>
At first I thought it was because '2' is a string, so I changed row to [2, 29, 29, 29]. But the error only changed to:
TypeError: StructType can not accept object 2 in type <type 'int'>
Then I searched on Google and found this article, which suggested that I had forgotten to convert my Python list into a Spark RDD.
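That RDD route would look roughly like this (a minimal sketch, not the article's exact code; the tuple holds made-up integer values chosen to match the declared column types, and sc, sqlContext, and schema are the ones defined above):

rdd = sc.parallelize([(2, 29, 29, 29)])  # an RDD holding a single row
df = sqlContext.createDataFrame(rdd, schema)
df.show()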
But at last I found the real reason: I just needed to wrap my list in another pair of brackets! createDataFrame expects a collection of rows, so my bare list was being read as four single-value rows instead of one four-value row, and each bare value failed the StructType check.
Here is the corrected code:
row = ['2', '29', '29', '29']
df = sqlContext.createDataFrame([row], schema)
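To make the list-of-rows shape concrete, here is a small sketch with a second, made-up row. I used plain ints here so the values match the declared LongType/IntegerType fields; depending on the Spark version, per-field schema verification can still reject string values even after the nesting fix:

rows = [
    [2, 29, 29, 29],   # one inner list = one row
    [3, 30, 30, 30],   # hypothetical second row, for illustration only
]
df = sqlContext.createDataFrame(rows, schema)
df.show()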