Adding PySpark Avro dependency for writing a DataFrame to Hive

I'm trying to write a PySpark DataFrame to a Hive table in Avro format. From reading here, I understand that Avro is an external dependency, so I added this line to my SparkSession builder: .config('packages', 'org.apache.spark:spark-avro_2.12:3.0.2') \ which, from my understanding, should add that package to the spark-submit (a fuller sketch of my session setup is below). However, I am still getting this error:

Py4JJavaError: An error occurred while calling o1025.saveAsTable.
: org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;

when writing to the table with:

df.write.saveAsTable('my_table_name',
                    mode='append',
                    format='avro',
                    partitionBy=['year','month','day'])
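In case it helps, the session itself is created roughly like this (the app name, Hive options, and source table are placeholders for my actual setup; the .config('packages', ...) line is the one I added):

from pyspark.sql import SparkSession

# Rough sketch of my session setup; the app name and other options are
# placeholders. The .config('packages', ...) line is the one I added to
# try to pull in the Avro package.
spark = (
    SparkSession.builder
    .appName('my_app')
    .config('packages', 'org.apache.spark:spark-avro_2.12:3.0.2')
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table('my_source_table')  # placeholder for how df is actually built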

Spark version: 2.4.5
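
For what it's worth, the deployment section that the error points to passes the package to spark-submit with --packages at launch time. If I'm reading the configuration docs correctly, the in-code equivalent is the spark.jars.packages key, set before the session is first created, along the lines of the sketch below (the artifact coordinates are a placeholder and would have to match the installed Spark/Scala build):

from pyspark.sql import SparkSession

# Sketch only: spark.jars.packages mirrors spark-submit's --packages flag
# and must be set before the SparkSession is first created. The coordinates
# below are placeholders; they need to match the running Spark (2.4.5 here)
# and its Scala build.
spark = (
    SparkSession.builder
    .config('spark.jars.packages', 'org.apache.spark:spark-avro_2.11:2.4.5')
    .enableHiveSupport()
    .getOrCreate()
)

Should configuring it through the builder like this be enough, or does the package have to go on the spark-submit command line?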



Read more here: https://stackoverflow.com/questions/66321955/adding-pyspark-avro-dependency-for-writing-df-to-hive

