Encoding a lot of categorical variables

I have 10 million categorical variables, each with 3 categories. What is the best way to encode these 10 million variables to train a deep learning model on them? If I use one-hot encoding, I will end up with 30 million variables. An embedding layer with one output makes no sense, since it is similar to integer encoding and there is no order between these categories, and an embedding layer with two outputs does not make much of a difference compared with one-hot encoding. (Usually, an embedding layer is used when the number of categories is large.) Please give me your opinion.
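
To make the size concern concrete, here is a minimal sketch of one-hot encoding on toy data, in plain Python. The helper `one_hot_row` and the sample values are hypothetical and only for illustration. It assumes each variable is already stored as an integer code 0, 1, or 2.

```python
# Toy illustration only: the data and the helper name are made up.
N_CATEGORIES = 3  # each variable has 3 categories, coded 0, 1, 2


def one_hot_row(row, n_categories=N_CATEGORIES):
    """Expand a row of integer category codes into one flat one-hot vector."""
    encoded = []
    for code in row:
        indicator = [0] * n_categories
        indicator[code] = 1  # mark the active category for this variable
        encoded.extend(indicator)
    return encoded


# 4 samples, 5 categorical variables each (instead of 10 million)
samples = [
    [0, 2, 1, 1, 0],
    [2, 2, 0, 1, 1],
    [1, 0, 0, 2, 2],
    [0, 1, 2, 0, 1],
]

encoded = [one_hot_row(row) for row in samples]
n_variables = len(samples[0])
width = len(encoded[0])
print(n_variables, width)  # 5 variables become 15 one-hot columns

# The same factor of 3 applies at the scale in the question.
full_width = 10_000_000 * N_CATEGORIES
print(full_width)  # 30000000
```

Each variable contributes exactly one 1 and two 0s, so the one-hot vector is three times as wide as the integer-coded input but two thirds of its entries are always zero.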



Read more here: https://stackoverflow.com/questions/64398841/encoding-a-lot-of-categorical-variables

Content Attribution

This content was originally published by hsn15051 at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
