Clean a dataset with many different words for one thing

At the moment I have a dataset of ingredients. The problem is that it is not very clean: it contains many different names for the same thing. Here are a few examples:

Mehl = Weizenmehl, Mehl Type360


Eier = Eier, Ei(er), Ei

I thought about removing the brackets and writing a series of if statements that look for terms like "Mehl". But then I would also have to check for terms like "Dinkel", because:

Dinkelmehl != Mehl
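
To make the idea concrete, here is a minimal sketch of the bracket-removal plus if-statement approach. The function names `strip_brackets` and `classify` are hypothetical, and the rules only cover the examples above. The important detail is that the more specific check ("Dinkelmehl") has to run before the more general one ("Mehl").

```python
import re

def strip_brackets(name):
    """Drop bracket characters but keep what is inside: 'Ei(er)' -> 'Eier'."""
    return re.sub(r"[()\[\]]", "", name).strip()

def classify(name):
    """Hypothetical rule chain; returns the original name if no rule matches."""
    cleaned = strip_brackets(name).lower()
    # Specific checks first, otherwise "Dinkelmehl" would be caught by "mehl".
    if "dinkelmehl" in cleaned:
        return "Dinkelmehl"
    if "mehl" in cleaned:
        return "Mehl"
    if cleaned in ("ei", "eier"):
        return "Eier"
    return name

print([classify(n) for n in ["Weizenmehl", "Mehl Type360", "Ei(er)", "Dinkelmehl"]])
```

This works for a handful of ingredients, but every new exception needs another branch in the right order, which is what makes it laborious for a big dataset.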

I could do it, but it would be very laborious because the dataset is big. Are there other methods, maybe using a dictionary or something similar? I hope you can help me, thank you!
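
For illustration, this is roughly what I imagine a dictionary-based version could look like. It is only a sketch: `SYNONYMS`, `normalize`, `canonical` and `clean` are hypothetical names, and the table contains nothing beyond the examples above. Names are normalized (lowercased, brackets dropped) before an exact lookup, so "Dinkelmehl" is never confused with "Mehl", and anything not in the table is collected for manual review.

```python
import re

# Hypothetical synonym table: canonical name -> known variants.
SYNONYMS = {
    "Mehl": ["Mehl", "Weizenmehl", "Mehl Type360"],
    "Eier": ["Eier", "Ei(er)", "Ei"],
    "Dinkelmehl": ["Dinkelmehl"],
}

def normalize(name):
    """Lowercase, drop parentheses, collapse whitespace."""
    name = re.sub(r"[()]", "", name.lower())
    return " ".join(name.split())

# Inverted table: normalized variant -> canonical name.
LOOKUP = {
    normalize(variant): canon
    for canon, variants in SYNONYMS.items()
    for variant in variants
}

def canonical(name):
    """Return the canonical name, or None if the name is not in the table."""
    return LOOKUP.get(normalize(name))

def clean(names):
    """Map names to canonical forms; unknown names are kept and reported."""
    cleaned, unknown = [], set()
    for name in names:
        canon = canonical(name)
        if canon is None:
            unknown.add(name)
            cleaned.append(name)
        else:
            cleaned.append(canon)
    return cleaned, unknown

cleaned, unknown = clean(["Weizenmehl", "EI(ER)", "Dinkelmehl", "Zucker"])
print(cleaned)
print(unknown)
```

The set of unknown names shows which entries still need to be added to the table, so the dictionary can grow step by step instead of being written all at once.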

Content Attribution

This content was originally published by Frederick at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.