Adding pyspark avro dependency for writing df to hive

I'm trying to write a PySpark DataFrame to a Hive table in Avro format. From what I've read, Avro support is an external dependency, so I added this line to my Spark session builder:

.config('packages', 'org.apache.spark:spark-avro_2.12:3.0.2') \

As I understand it, this should add that package to spark-submit. However, I still get this error:

Py4JJavaError: An error occurred while calling o1025.saveAsTable.
: org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;

when writing to the table with:


Spark version: 2.4.5
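
A minimal sketch of how the dependency is usually wired in through the session builder, under stated assumptions. The Spark property that resolves Maven coordinates is `spark.jars.packages` (a bare `packages` key is not a Spark property), and the spark-avro artifact should match the running Spark and Scala versions. Pulling in a Spark 3.0.x module (`spark-avro_2.12:3.0.2`) on a Spark 2.4.5 runtime is likely to fail even with the correct key. The sketch assumes a stock Spark 2.4.5 build on Scala 2.11, hence `spark-avro_2.11:2.4.5`. The helper names, app name, and table name are hypothetical.

```python
# Sketch: load spark-avro via spark.jars.packages and write a DataFrame as an
# Avro-backed table. Versions, helper names, and table names are assumptions.


def avro_package(spark_version: str, scala_version: str = "2.11") -> str:
    """Return the Maven coordinate of spark-avro for a Spark/Scala build.

    The artifact version should match the Spark version you run. Spark 2.4.x
    binaries are built against Scala 2.11 by default (assumes no custom build).
    """
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def build_session(spark_version: str = "2.4.5", scala_version: str = "2.11"):
    """Create a Hive-enabled SparkSession with spark-avro on the classpath."""
    # Imported lazily so avro_package() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("avro-to-hive")  # hypothetical app name
        # The property is spark.jars.packages, not 'packages'. It only takes
        # effect if no SparkContext is already running in this process.
        .config("spark.jars.packages", avro_package(spark_version, scala_version))
        .enableHiveSupport()
        .getOrCreate()
    )


def write_avro_table(df, table_name: str) -> None:
    """Save df as a table stored in Avro format (table_name is hypothetical)."""
    df.write.format("avro").mode("overwrite").saveAsTable(table_name)


# Example usage (requires PySpark and network access to Maven Central):
# spark = build_session()
# df = spark.createDataFrame([(1, "a")], ["id", "value"])
# write_avro_table(df, "my_db.my_avro_table")
```

If the job is launched with spark-submit instead, the equivalent is passing `--packages org.apache.spark:spark-avro_2.11:2.4.5` on the command line. In a notebook where a session already exists, `getOrCreate()` returns that session and the new package setting is ignored, so the kernel or context has to be restarted first.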

Content Attribution

This content was originally published by hunterm at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.