Show on which columns duplicate rows, with respect to one column, differ

I have a dataframe with around one million rows and 500 columns. I used the following code to find out how many rows are duplicates with respect to x:

    duplicate_x = df[df.duplicated(['x'])]
    print(len(duplicate_x))

That gives 26,000 duplicates.

I used the following code to find out how many rows are duplicates with respect to all columns:

    duplicates = df[df.duplicated()]
    print(len(duplicates))

That gives duplicates.

Now I would like to find out on which columns those 26,000 duplicates with respect to x differ. Any ideas? Thanks!
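One possible approach, as a rough sketch rather than a definitive answer: group the duplicated rows by x and count distinct values per column within each group. Any column with more than one distinct value is a column on which that group differs. The helper name `differing_columns` and the toy dataframe are made up for illustration. Note that `df.duplicated(['x'])` with the default `keep='first'` leaves out the first occurrence of each x, so the sketch uses `keep=False` to keep every row of each duplicate group for comparison.

```python
import pandas as pd


def differing_columns(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: one row per duplicated `key` value,
    True in every column whose values differ within that group."""
    # keep=False marks all rows of a duplicate group, including the first one
    dupes = df[df.duplicated([key], keep=False)]
    # dropna=False counts NaN as its own value, so NaN vs. 1.0 is a difference
    n_distinct = dupes.groupby(key).nunique(dropna=False)
    # some older pandas versions include the key column in the result; drop it if present
    n_distinct = n_distinct.drop(columns=key, errors="ignore")
    return n_distinct > 1


# Toy data, invented for illustration only
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3],
    "a": [10, 10, 20, 21, 30],
    "b": ["p", "q", "r", "r", "s"],
})

diff = differing_columns(df, "x")

# For each duplicated x, the names of the columns that differ
per_group = {k: row[row].index.tolist() for k, row in diff.iterrows()}

# For each column, the number of duplicate groups that differ on it
counts = diff.sum()

print(per_group)
print(counts[counts > 0].sort_values(ascending=False))
```

With 500 columns, `counts` is probably the more useful view, since it shows which columns account for most of the differences. One caveat: `groupby` drops rows where x is NaN by default, while `duplicated` treats NaN values as equal to each other, so the two can disagree if x has missing values.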



Read more here: https://stackoverflow.com/questions/66329691/show-on-which-columns-duplicate-rows-with-respect-to-one-column-differ

Content Attribution

This content was originally published by Flo at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
