Add zero-filled features to the training set when a new phrase contains unseen words, and retrain the model

I'm training a model to detect entities in phrases. My training set consists of 500 phrases with a vocabulary of 1000 words, so:

X_train.shape = (500, 1000)

I already have X_train in this form:

X_train = [[0. 0. 0. 0. ...], [0. 0. ...], ...]

Each column corresponds to a specific word, and the column order is very important.
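One way to keep each word pinned to its column is to store the vocabulary as a mapping from word to column index and build every row from that mapping. The sketch below assumes whitespace tokenization and uses a toy vocabulary and a hypothetical `vectorize` helper; words that are not in the vocabulary are simply skipped.

```python
import numpy as np

# Toy vocabulary (hypothetical): word -> column index.
# The stored index, not the position in the phrase, decides each word's column.
vocab = {"I": 0, "am": 1, "dark": 2, "Vader's": 3, "son": 4}


def vectorize(phrase, vocab):
    """Return a (1, len(vocab)) row with 1.0 in the column of each known word.

    Assumes whitespace tokenization; words missing from `vocab` are ignored.
    """
    row = np.zeros((1, len(vocab)))
    for word in phrase.split():
        if word in vocab:
            row[0, vocab[word]] = 1.0
    return row


x = vectorize("I am dark", vocab)
print(x)        # [[1. 1. 1. 0. 0.]]
print(x.shape)  # (1, 5)
```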

When I predict the entities of a new phrase, it may contain words the model has never seen. For example, suppose I receive the input "My shirt is yellow".

I need to convert this input into an np.array of shape (1, 1000). If the word "yellow" is not in the vocabulary, the array needs shape (1, 1001) instead, and I need to retrain the model with an extra column for that word in X_train (all zeros, of course). How can I do this?
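A minimal NumPy sketch of that procedure, assuming the vocabulary is kept as a word-to-column dict, whitespace tokenization, and a hypothetical helper named `extend_vocab`: each unseen word takes the next free column index, X_train gains one all-zeros column per new word (appended on the right so existing columns keep their positions), and the new phrase is then encoded against the enlarged vocabulary. The retraining step itself is left out because it depends on the model.

```python
import numpy as np


def extend_vocab(phrase, vocab, X_train):
    """Add unseen words of `phrase` to `vocab` (in place) and pad `X_train`.

    Hypothetical helper: assumes whitespace tokenization and that `vocab`
    maps word -> column index in 0..len(vocab)-1.
    Returns the padded X_train and the (1, len(vocab)) row for `phrase`.
    """
    for word in phrase.split():
        if word not in vocab:
            vocab[word] = len(vocab)  # new word takes the next column
    n_new = len(vocab) - X_train.shape[1]
    # One all-zeros column per new word, appended on the right.
    X_train = np.hstack([X_train, np.zeros((X_train.shape[0], n_new))])

    x_new = np.zeros((1, len(vocab)))
    for word in phrase.split():
        x_new[0, vocab[word]] = 1.0
    return X_train, x_new


# Stand-in for the real 1000-word vocabulary: "yellow" is the only unseen word.
vocab = {"My": 0, "shirt": 1, "is": 2}
vocab.update({f"word{i}": i for i in range(3, 1000)})
X_train = np.zeros((500, 1000))

X_train, x_new = extend_vocab("My shirt is yellow", vocab, X_train)
print(X_train.shape, x_new.shape)  # (500, 1001) (1, 1001)
```

Because the added column is all zeros in X_train, many common models (for example, linear models or decision trees) will learn essentially nothing from it after retraining, so the unseen word may have little or no effect on the prediction.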

Small example:

           "I" "am" "dark" "Vader's" "son"   (training vocabulary)
X_train = [[1,   1,   0,      0,      0],
           [1,   1,   1,      0,      0]]

New input to predict: "I am dark Vader's daughter"

Since "daughter" is not in the vocabulary, I need to retrain my model with:

           "I" "am" "dark" "Vader's" "son" "daughter"   (training vocabulary)
X_train = [[1,   1,   0,      0,      0,   0],
           [1,   1,   1,      0,      0,   0]]
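With the toy matrix from the example, the padding step amounts to appending one column of zeros with `np.hstack`. The vocabulary is kept here as a plain ordered list (an assumption for illustration), and "daughter" goes last so the existing columns keep their positions.

```python
import numpy as np

vocab = ["I", "am", "dark", "Vader's", "son"]
X_train = np.array([[1, 1, 0, 0, 0],
                    [1, 1, 1, 0, 0]])

# Add the unseen word at the end and pad X_train with one column of zeros.
vocab.append("daughter")
zeros = np.zeros((X_train.shape[0], 1), dtype=X_train.dtype)
X_train = np.hstack([X_train, zeros])

print(X_train)
# [[1 1 0 0 0 0]
#  [1 1 1 0 0 0]]
```

`np.pad(X_train, ((0, 0), (0, 1)))` produces the same padded matrix, since its default mode fills with zeros.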

After retraining, I can predict the new input:

X_predict = [[1, 1, 1, 1, 0, 1]]   (I also need to build this vector from the raw phrase.)
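Given the extended vocabulary as an ordered list, X_predict can be built by checking, column by column, whether each vocabulary word occurs in the phrase. This sketch assumes whitespace tokenization and exact, case-sensitive matching; like the example above, it records only which words are present, not where they appear in the phrase.

```python
import numpy as np

vocab = ["I", "am", "dark", "Vader's", "son", "daughter"]
phrase = "I am dark Vader's daughter"

words = set(phrase.split())  # set for fast membership checks
# One entry per vocabulary word, in vocabulary (column) order.
X_predict = np.array([[1 if w in words else 0 for w in vocab]])

print(X_predict)  # [[1 1 1 1 0 1]]
```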


Content Attribution

This content was originally published by memyselfandYone at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
