Train models over a list by combining different elements of the list, and return the combination that gives the best score

I am trying to train several models over a dataset using different combinations of the features list. So far, I have this:

from sklearn.cluster import KMeans
from sklearn.metrics import f1_score

features = ['ca','thal','slope','oldpeak','chol','fbs','thalach','exang']
Sbest = []  # Sbest should end up holding the feature list that gives the best score
for i in range(1, len(features) + 1):  # grow the feature list by one element each iteration
    input_f = features[:i]
    y = data['target']
    X = data[input_f]
    model = KMeans(n_clusters=2, random_state=0, init='k-means++', n_init=10, max_iter=100)
    labels = model.fit_predict(X)  # cluster on the selected columns
    fscore = f1_score(y, labels)   # score the cluster labels against the target
    print(input_f, ': {:.2f}'.format(fscore))

which gives the following output:

['ca'] : 0.62
['ca', 'thal'] : 0.62
['ca', 'thal', 'slope'] : 0.62
['ca', 'thal', 'slope', 'oldpeak'] : 0.71
['ca', 'thal', 'slope', 'oldpeak', 'chol'] : 0.42
['ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs'] : 0.42
['ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs', 'thalach'] : 0.56
['ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs', 'thalach', 'exang'] : 0.56

What I would like is to output only the list of features that gives the best result; as the output above shows, that is the one with an fscore of 0.71. So, instead of printing every result, I want only this output:

['ca', 'thal', 'slope', 'oldpeak'] : 0.71

And if different lists happened to produce the same score, the output should be the one with fewer elements. What is missing from my code?
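One way to get only the best subset is to track the best score seen so far and print it once after the loop; a strict `>` comparison keeps the earlier (and, since the lists grow by one feature per iteration, shorter) list on ties. A minimal sketch, where `score_subset` is a hypothetical stand-in for the fit-and-score step, here backed by the scores from the question's output rather than a real model:

```python
def score_subset(scores, input_f):
    # Stand-in for fitting KMeans on data[input_f] and computing the f-score;
    # here we just look the value up in a precomputed dict for illustration.
    return scores[tuple(input_f)]

features = ['ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs', 'thalach', 'exang']
# Scores copied from the question's printed output.
scores = {
    ('ca',): 0.62,
    ('ca', 'thal'): 0.62,
    ('ca', 'thal', 'slope'): 0.62,
    ('ca', 'thal', 'slope', 'oldpeak'): 0.71,
    ('ca', 'thal', 'slope', 'oldpeak', 'chol'): 0.42,
    ('ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs'): 0.42,
    ('ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs', 'thalach'): 0.56,
    ('ca', 'thal', 'slope', 'oldpeak', 'chol', 'fbs', 'thalach', 'exang'): 0.56,
}

Sbest, best_score = [], float('-inf')
for i in range(1, len(features) + 1):
    input_f = features[:i]
    fscore = score_subset(scores, input_f)
    # Strict '>' means ties keep the earlier, shorter feature list.
    if fscore > best_score:
        Sbest, best_score = input_f, fscore

print(Sbest, ': {:.2f}'.format(best_score))
# → ['ca', 'thal', 'slope', 'oldpeak'] : 0.71
```

In real use, `score_subset` would fit the model and return the f-score; the selection logic stays the same.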


Content Attribution

This content was originally published by DNZ at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
