Cloud Stack Ninja

I'm doing network analysis and have performed spectral clustering on a graph. I'd like to evaluate the clustering with the silhouette score, but I'm getting an error that makes me question whether I'm going in the right direction. First, I define a custom distance function between two nodes of my graph (I'm using networkx for this):

import networkx as nx

def compute_node_distance(nodeA, nodeB, graph):
  # Distance = number of hops on the shortest path between the two nodes
  try:
    return nx.shortest_path_length(graph, source=nodeA['ID'], target=nodeB['ID'])
  except nx.NetworkXNoPath:
    # Return a big value if there are no paths between nodeA and nodeB
    return 1000000

Then I try to compute the silhouette score using this custom function as my distance metric, like this:

from sklearn.metrics import silhouette_score

silhouette_score(
    labeled_nodes_df,
    labeled_nodes_df['cluster'],
    metric=lambda a, b: compute_node_distance(a, b, G))

Here, G is the graph I'm performing spectral clustering on and labeled_nodes_df is a pandas DataFrame with only two columns, 'ID' and 'cluster'. My custom function only needs the nodes' IDs to work, so I believe this should be enough.

However, when I run this code, silhouette_score raises an error saying that it couldn't convert one of my node IDs to float:

ValueError: could not convert string to float: '030714318X'

The thing is, I do NOT want this value to be converted to a float, because my own metric function is perfectly able to compute the distance between two nodes given their IDs as strings.
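One likely reading of the error, offered as an assumption about scikit-learn's input validation rather than something confirmed here: silhouette_score checks that its first argument is a numeric array before it ever calls the metric, so the string 'ID' column is cast to float and the custom function is never reached. The cast on its own reproduces the message:

```python
node_id = "030714318X"  # the ID quoted in the error message

try:
    float(node_id)       # same kind of conversion a numeric-array check performs
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If that is indeed the cause, one possible workaround is to pass integer row positions as the data and look the string IDs up inside the metric function. That is a suggestion only and has not been tried against this graph.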

I know I could precompute the distance matrix between my nodes and use metric='precomputed', but the graph I'm working on is really big (65000+ nodes) and storing such a matrix is very inefficient, so I'm trying to compute the distances as they are needed. Can someone please help me figure out what I'm doing wrong and why silhouette_score is trying to convert my IDs to floats? Thanks.
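One way to get distances "as they are needed" without an n-by-n matrix is to process one node at a time: a single breadth-first search gives that node's distances to every other node, which is all its silhouette value requires. The sketch below is plain Python under stated assumptions: the graph is unweighted and supplied as an adjacency dict, the names `bfs_distances`, `graph_silhouette`, and `NO_PATH` are hypothetical, and the IDs "A" to "D" are made up for illustration.

```python
from collections import deque

NO_PATH = 1_000_000  # same "big value" used above for unreachable pairs


def bfs_distances(adjacency, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def graph_silhouette(adjacency, labels):
    """Mean silhouette over all nodes, holding one row of distances at a time."""
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, []).append(node)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for node, own in labels.items():
        dist = bfs_distances(adjacency, node)
        # Mean distance from this node to each cluster, excluding the node itself.
        means = {}
        for label, members in clusters.items():
            others = [m for m in members if m != node]
            if others:
                means[label] = sum(dist.get(m, NO_PATH) for m in others) / len(others)
        if own not in means:
            continue  # singleton cluster: contributes 0 by convention
        a = means[own]                                      # cohesion
        b = min(v for k, v in means.items() if k != own)    # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(labels)


# Hypothetical path graph A - B - C - D with string IDs, split into two clusters.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
labels = {"A": 0, "B": 0, "C": 1, "D": 1}
score = graph_silhouette(adjacency, labels)
print(round(score, 4))
```

Memory stays proportional to the number of nodes, at the cost of one search per node, which may still be slow at 65000+ nodes. With networkx, `nx.single_source_shortest_path_length` could presumably stand in for the BFS helper.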



Read more here: https://stackoverflow.com/questions/64407071/how-can-i-compute-the-silhouette-score-of-a-spectral-clustering-using-a-custom-d

Content Attribution

This content was originally published by Vinícius Silva at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
