Create PySpark DataFrame with JSON string values and schema

I am trying to manually create a dummy PySpark DataFrame.

I did the following:

from pyspark.sql.types import StructType,StructField, StringType, IntegerType
data2 = [('{"applicationTimeStamp":"2020-08-01T08:14:20.650Z","version":null}')
            ]

schema = StructType([ \
    StructField("raw_json",StringType(),True)
  ])

df = spark.createDataFrame(data=data2,schema=schema)
df.printSchema()
df.show(truncate=False)

But I got the error:

TypeError: StructType can not accept object '[{"applicationTimeStamp":"2020-08-01T08:14:20.650Z","version":null}]' in type <class 'str'>

How can I put a JSON string into a PySpark DataFrame as a value?

My ideal result is:

+------------------------------------------------------------------+
|value                                                             |
+------------------------------------------------------------------+
|{"applicationTimeStamp":"2020-08-01T08:14:20.650Z","version":null}|
+------------------------------------------------------------------+
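A likely cause of the TypeError, sketched here under assumptions: in Python, parentheses around a single string do not create a tuple, so the list in the question holds a bare str instead of a one-column row, and Spark cannot match a str against a StructType. The names `raw`, `not_a_row`, `row`, and `build_df` are illustrative and not part of the original post, and `build_df` assumes an already running SparkSession is passed in.

```python
raw = '{"applicationTimeStamp":"2020-08-01T08:14:20.650Z","version":null}'

# Parentheses alone do not make a tuple: (raw) is still a plain str,
# so a list built from it contains a bare string rather than a row.
not_a_row = (raw)

# The trailing comma is what creates a one-element tuple (one row, one column).
row = (raw,)
data2 = [row]


def build_df(spark):
    """Build the single-column DataFrame from data2.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported inside the function so the plain-Python part above
    # can be run and inspected without Spark installed.
    from pyspark.sql.types import StructType, StructField, StringType

    schema = StructType([StructField("raw_json", StringType(), True)])
    return spark.createDataFrame(data=data2, schema=schema)
```

With a session available, `build_df(spark).show(truncate=False)` should display the JSON string in a column named raw_json. To get the header value from the ideal result, one option is to name the field "value" in the schema; another, as far as I know, is to pass the bare strings with an atomic type, `spark.createDataFrame([raw], StringType())`, which Spark wraps into a single field named value.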


Read more here: https://stackoverflow.com/questions/66270117/create-pyspark-dataframe-with-json-string-values-and-schema

Content Attribution

This content was originally published by sdwww at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
