How to apply KMeans to get the centroids using a DataFrame with multiple features

I am following this detailed KMeans tutorial: which uses a dataset with 2 features.

But my DataFrame has 5 features (columns), so instead of the tutorial's euclidean_distance(x1, x2) function, I compute the Euclidean distance as below:

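```python
import numpy as np

def euclidean_distance(df):
    # Pairwise distances between samples (rows), using all feature columns;
    # df.shape[0] is the number of rows, df.shape[1] would be the number of features
    n = df.shape[0]
    distance_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            distance_matrix[i, j] = np.sqrt(np.sum((df.iloc[i, :] - df.iloc[j, :]) ** 2))
    return distance_matrix
```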
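For what it's worth, a two-argument Euclidean distance does not depend on the number of features, so it should work unchanged on 5-feature rows. The sketch below assumes the tutorial's function is the square root of the summed squared differences (my reconstruction, not copied from the tutorial). pairwise_row_distances is a hypothetical helper that builds the same row-by-row matrix with NumPy broadcasting instead of a double loop, and the random DataFrame only stands in for the real df.

```python
import numpy as np
import pandas as pd


def euclidean_distance(x1, x2):
    # Assumed two-argument form: works for vectors of any length,
    # so 2 features or 5 features make no difference.
    return np.sqrt(np.sum((x1 - x2) ** 2))


def pairwise_row_distances(df):
    # Hypothetical helper: sample-by-sample (row-by-row) distance matrix,
    # computed with NumPy broadcasting instead of a double loop.
    X = df.to_numpy(dtype=float)
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # shape (n, n, n_features)
    return np.sqrt((diff ** 2).sum(axis=2))


# Hypothetical 5-feature data standing in for the real df.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 5)), columns=list("abcde"))

row0 = df.iloc[0].to_numpy()
row1 = df.iloc[1].to_numpy()
print(euclidean_distance(row0, row1))
print(pairwise_row_distances(df).shape)  # (6, 6)
```

The point is that each row of the DataFrame is already a 5-element vector, so passing two rows to the two-argument function is enough.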

Next, I want to implement the part of the tutorial that finds the closest centroid for each sample:

```python
import numpy as np
def _closest_centroid(self, sample, centroids):
    distances = [euclidean_distance(sample, point) for point in centroids]
    return np.argmin(distances)  # index of the closest centroid
```
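As a minimal sketch, here is the same closest-centroid step written as a plain function (closest_centroid, a hypothetical stand-in for the tutorial's method) and applied to the rows of a small, made-up 5-feature DataFrame. It assumes the two-argument distance function; each sample and each centroid is simply a row with all 5 features.

```python
import numpy as np
import pandas as pd


def euclidean_distance(x1, x2):
    # Assumed two-argument form; works for any number of features.
    return np.sqrt(np.sum((x1 - x2) ** 2))


def closest_centroid(sample, centroids):
    # Distance from one sample (a row with all 5 features) to each centroid.
    distances = [euclidean_distance(sample, point) for point in centroids]
    # Index of the nearest centroid.
    return int(np.argmin(distances))


# Hypothetical data: 5 features, two obvious groups.
df = pd.DataFrame(
    [[0, 0, 0, 0, 0],
     [0.1, 0, 0, 0, 0.2],
     [5, 5, 5, 5, 5],
     [5.2, 4.9, 5, 5.1, 5]],
    columns=["f1", "f2", "f3", "f4", "f5"],
)
X = df.to_numpy(dtype=float)
centroids = X[[0, 2]]  # pick two rows as initial centroids

labels = [closest_centroid(sample, centroids) for sample in X]
print(labels)  # [0, 0, 1, 1]
```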

Since my euclidean_distance(df) function takes only one argument, df, how can I best adapt it to find the closest centroid and, from that, the centroids?
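One possible approach, sketched under my own assumptions rather than taken from the tutorial: the assignment step only needs distances from each sample to each centroid, so a one-argument, DataFrame-wide distance function is not required. The hypothetical kmeans function below runs the standard loop (assign each row to its nearest centroid, move each centroid to the mean of its rows, stop when nothing changes) on all columns of a DataFrame. Initialising from k random rows and keeping an empty cluster's old centroid are assumptions; the tutorial may handle both differently.

```python
import numpy as np
import pandas as pd


def assign(X, centroids):
    # Distances from every sample to every centroid: shape (n_samples, k).
    dists = np.linalg.norm(X[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
    return dists.argmin(axis=1)  # index of the closest centroid per sample


def kmeans(df, k, max_iters=100, seed=0):
    """Minimal k-means on all columns of df; returns (labels, centroids)."""
    X = df.to_numpy(dtype=float)  # shape (n_samples, n_features)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct random rows.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iters):
        labels = assign(X, centroids)
        # New centroid = mean of its assigned samples; keep the old one if empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    # Final labels always match the returned centroids.
    return assign(X, centroids), centroids


# Hypothetical 5-feature data with two obvious groups.
df = pd.DataFrame(
    [[0, 0, 0, 0, 0], [0.1, 0, 0, 0, 0.2], [5, 5, 5, 5, 5], [5.2, 4.9, 5, 5.1, 5]],
    columns=["f1", "f2", "f3", "f4", "f5"],
)
labels, centroids = kmeans(df, k=2)
print(labels)
print(pd.DataFrame(centroids, columns=df.columns))
```

Computing all sample-to-centroid distances at once with broadcasting plays the same role as calling the two-argument distance function inside the closest-centroid step, just without the Python-level loop.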

My sample dataset, df, is as below:



Content Attribution

This content was originally published by Gee at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
