Replace a string if it contains a certain substring in PySpark

I need to update a column of a PySpark DataFrame when its value contains a certain substring.

For example, df looks like this:

id      address
1       spring-field_garden
2       spring-field_lane
3       new_berry place

If the address column contains spring-field_, the whole value should be replaced with spring-field.

Expected result:

id      address
1       spring-field
2       spring-field
3       new_berry place


My attempt:

from pyspark.sql import functions as F

df = df.withColumn('address', F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))

This does not seem to work.
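A likely cause is the pattern itself: in a regular expression, `_*` means "zero or more underscores", not "an underscore followed by anything", so only the underscore is removed and the rest of the value stays. A minimal sketch of one possible fix is below, assuming the goal is to drop everything from the underscore onward. The names `BROKEN`, `FIXED`, `REPLACEMENT` and `fix_address` are hypothetical and used only for illustration. Spark evaluates the pattern with Java regular expressions; the comparison here uses Python's `re` module, which treats these two simple patterns the same way.

```python
import re

BROKEN = r"spring-field_*"   # `_*` matches zero or more underscores only
FIXED = r"spring-field_.*"   # `_.*` matches an underscore and everything after it
REPLACEMENT = "spring-field"


def fix_address(df):
    """Apply the corrected pattern to the `address` column of a PySpark DataFrame."""
    # Imported inside the function so the plain-Python check below runs without Spark.
    from pyspark.sql import functions as F

    return df.withColumn(
        "address", F.regexp_replace(F.col("address"), FIXED, REPLACEMENT)
    )


# Plain-Python comparison of the two patterns on the sample values.
addresses = ["spring-field_garden", "spring-field_lane", "new_berry place"]
broken_result = [re.sub(BROKEN, REPLACEMENT, a) for a in addresses]
fixed_result = [re.sub(FIXED, REPLACEMENT, a) for a in addresses]

print(broken_result)  # ['spring-fieldgarden', 'spring-fieldlane', 'new_berry place']
print(fixed_result)   # ['spring-field', 'spring-field', 'new_berry place']
```

With a DataFrame like the one above, `df = fix_address(df)` would be the corresponding call. Rows that do not contain `spring-field_` are left unchanged because the pattern finds no match in them.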

Content Attribution

This content was originally published by newleaf at Recent Questions - Stack Overflow and is syndicated here via their RSS feed.
