I'm running a k-means algorithm (k=5) to cluster my data. To check the stability of my algorithm, I first run it once on my whole dataset, and afterwards I run it multiple times on 2/3 of my dataset (using a different random state for each split). I use the results to predict the cluster of the remaining 1/3 of my data. Finally, I want to compare the predicted clusters with the clusters I get when I run k-means on the whole dataset. This is where I get stuck.
Since k-means can assign different labels to (more or less) the same clusters on each run, I can't just compare them directly. I tried using .value_counts() to reassign the labels 0 to 4 based on their frequency. But because I run this check multiple times, I need something that works in a loop.
Basically, when I use .value_counts() I get something like this:
    PredictedCluster
    4    55555
    0    44444
    2    33333
    1    22222
    3    11111
If I could turn this into an array like this
a = [[4, 55555],[0,44444],...,[3,11111]]
I would be fine. Basically I want to get an array where the labels are sorted by size.
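To make the target concrete, here is a minimal sketch of the conversion, assuming the predictions live in a pandas Series. The helper name labels_by_size and the toy data are hypothetical, not part of my real code. It relies on .value_counts() sorting by count in descending order by default.

```python
import pandas as pd


def labels_by_size(labels: pd.Series):
    """Return [[label, count], ...] sorted by count (largest first),
    plus the labels renumbered by size rank (0 = largest cluster)."""
    counts = labels.value_counts()  # sorted descending by default
    pairs = [[int(label), int(count)] for label, count in counts.items()]
    # Map each original label to its position in the size ranking.
    rank_of = {label: rank for rank, label in enumerate(counts.index)}
    return pairs, labels.map(rank_of)


# Toy stand-in for the PredictedCluster column (hypothetical data).
predicted = pd.Series([4, 4, 4, 0, 0, 2], name="PredictedCluster")
a, relabelled = labels_by_size(predicted)
```

Because it is a plain function, it can be called once per split inside the loop. One caveat: ranking by size only lines the labels up if the cluster sizes are clearly different. If two clusters are close in size, their ranks may swap between runs.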
Can anyone please tell me how to do this or what other approach I could use to solve my problem?
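One other approach that might work, sketched here under assumptions: instead of ranking clusters by size, match the labels by how much the two labelings overlap on the same points. The function best_label_mapping and the toy lists below are hypothetical. The sketch tries every permutation of the k labels, which is feasible for k=5 (120 permutations). For larger k, scipy.optimize.linear_sum_assignment on the contingency table would be the more usual tool.

```python
from collections import Counter
from itertools import permutations


def best_label_mapping(reference, predicted, k):
    """Return a dict predicted_label -> reference_label that maximises
    the number of points on which the two labelings agree."""
    # overlap[(p, r)] = number of points labelled p by the subset model
    # and r by the full-data model.
    overlap = Counter(zip(predicted, reference))
    best_perm = max(
        permutations(range(k)),
        key=lambda perm: sum(overlap[(p, perm[p])] for p in range(k)),
    )
    return dict(enumerate(best_perm))


# Toy labels for the same six held-out points (hypothetical data).
reference = [0, 0, 1, 1, 2, 2]  # from k-means on the whole dataset
predicted = [2, 2, 0, 0, 1, 0]  # from k-means fitted on a 2/3 split

mapping = best_label_mapping(reference, predicted, k=3)
aligned = [mapping[p] for p in predicted]
# Share of points that get the same cluster after aligning the labels.
agreement = sum(a == r for a, r in zip(aligned, reference)) / len(reference)
```

If only a single stability score is needed rather than aligned labels, a permutation-invariant measure such as sklearn.metrics.adjusted_rand_score would avoid the relabeling step altogether.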