Clean dataset with many different words for one thing

At the moment I have a dataset of ingredients. The problem is that it is not very clean because it contains many different names for the same thing. Here are a few examples:

Mehl = Weizenmehl, Mehl Type360

or

Eier = Eier, Ei(er), Ei

I thought of stripping the brackets and writing a lot of if statements that look for substrings such as "Mehl", but then I would also have to check for things like "Dinkel", because

Dinkelmehl != Mehl

I could do that, but it would be very laborious because the dataset is big. Are there other methods, maybe using a dictionary or something similar? I hope you can help me. Thank you!
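To make the dictionary idea concrete, here is a minimal sketch in Python (the question does not say which language is in use, so that is an assumption). The names `CANONICAL`, `LOOKUP`, `_key` and `normalize` are hypothetical, and the mapping only contains the examples from above. Because it matches whole names exactly rather than substrings, "Dinkelmehl" is never folded into "Mehl".

```python
import re

# Hypothetical mapping: canonical name -> known variants (examples from the question only).
CANONICAL = {
    "Mehl": ["Mehl", "Weizenmehl", "Mehl Type360"],
    "Eier": ["Eier", "Ei(er)", "Ei"],
    "Dinkelmehl": ["Dinkelmehl"],
}


def _key(name):
    """Lowercase, drop brackets, and collapse whitespace so spelling variants compare equal."""
    name = re.sub(r"[()\[\]]", "", name.lower())
    return " ".join(name.split())


# Reverse lookup: cleaned variant -> canonical name.
LOOKUP = {
    _key(variant): canon
    for canon, variants in CANONICAL.items()
    for variant in variants
}


def normalize(name):
    """Return the canonical name, or the input unchanged if the variant is unknown."""
    return LOOKUP.get(_key(name), name)


if __name__ == "__main__":
    for raw in ["Weizenmehl", "Ei(er)", "Dinkelmehl", "Zucker"]:
        print(raw, "->", normalize(raw))
```

Unknown names pass through unchanged, so they can be collected and added to the mapping step by step instead of being written out as if statements. If the data lives in a pandas DataFrame, something like `df["ingredient"].map(normalize)` should apply it to a whole column, where the column name `ingredient` is only a placeholder.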



Read more here: https://stackoverflow.com/questions/64894999/clean-dataset-with-many-dfferent-words-for-one-thing

Content Attribution

This content was originally published by Frederick at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
