I need to update a column in a PySpark DataFrame when its value contains a certain substring.
For example, df looks like this:
id address
1 spring-field_garden
2 spring-field_lane
3 new_berry place
If the address column contains spring-field_, I want to replace the whole value with spring-field.
Expected result:
id address
1 spring-field
2 spring-field
3 new_berry place
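What I tried:
from pyspark.sql import functions as F
df = df.withColumn('address', F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))
This does not give the expected result: rows 1 and 2 come out as spring-fieldgarden and spring-fieldlane instead of spring-field.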
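The attempt's pattern can be checked outside Spark with Python's standard re module. This is a swapped-in tool, assumed to be representative here because Spark's regexp_replace uses Java regular expressions, which treat these two constructs the same way. In spring-field_* the * applies only to the preceding underscore ("zero or more underscores"), not to "anything"; adding a dot, spring-field_.*, makes the pattern consume the rest of the value.

```python
import re

# Sample values taken from the example table.
addresses = ["spring-field_garden", "spring-field_lane", "new_berry place"]

# Original pattern: "_*" means "zero or more underscores", so only the
# "spring-field_" prefix is matched and the suffix survives.
original = [re.sub(r"spring-field_*", "spring-field", a) for a in addresses]

# Adjusted pattern: "_.*" means "an underscore followed by anything", so the
# whole tail after "spring-field" is consumed and replaced.
fixed = [re.sub(r"spring-field_.*", "spring-field", a) for a in addresses]

print(original)  # ['spring-fieldgarden', 'spring-fieldlane', 'new_berry place']
print(fixed)     # ['spring-field', 'spring-field', 'new_berry place']
```

Carried back to PySpark, this suggests F.regexp_replace(F.col('address'), 'spring-field_.*', 'spring-field'). It only rewrites from spring-field_ onward, so any text before it in a value would be kept; a pattern such as .*spring-field_.* would be needed to replace the whole value in that case.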
Read more here: https://stackoverflow.com/questions/66251932/replace-string-if-it-contains-certain-substring-in-pyspark
Content Attribution
This content was originally published by newleaf at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.