I have a DataFrame with around one million rows and 500 columns.
I used the following code to find out how many rows are duplicates with respect to x:
duplicate_x = df[df.duplicated(['x'])]
print(len(duplicate_x))
That gives 26,000 duplicates.
I used the following code to find out how many rows are duplicates with respect to all columns:
duplicates = df[df.duplicated()]
print(len(duplicates))
That gives duplicates.
Now I would like to find out in which columns those 26,000 duplicates with respect to x differ. Any ideas? Thanks!
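To make the goal concrete, here is a minimal sketch of one possible approach, not a tested solution for the full data. It assumes that "differ" means a column takes more than one distinct value among rows sharing the same x. The toy DataFrame and the names dup_rows, n_distinct, differs, differing_columns and column_counts are made up for illustration. Note that duplicated(['x']) with the default keep='first' leaves out the first row of each group, so the sketch uses keep=False to get all rows of each group.

```python
import pandas as pd

# Hypothetical toy data standing in for the real one-million-row df.
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3],
    "a": [10, 10, 20, 21, 30],
    "b": ["p", "q", "r", "r", "s"],
})

# keep=False flags every row whose x value occurs more than once,
# including the first occurrence in each group.
dup_rows = df[df.duplicated(["x"], keep=False)]

# Number of distinct values per column within each x group.
# dropna=False counts NaN as a value, so NaN vs. non-NaN is a difference.
n_distinct = dup_rows.groupby("x").nunique(dropna=False)

# Some older pandas versions keep the grouping column in the result.
n_distinct = n_distinct.drop(columns="x", errors="ignore")

# True where a column has more than one distinct value for that x.
differs = n_distinct > 1

# For each duplicated x value, the columns in which its rows differ.
differing_columns = {
    key: [col for col in differs.columns if differs.loc[key, col]]
    for key in differs.index
}

# For each column, the number of x groups in which it differs.
column_counts = differs.sum()

print(differing_columns)
print(column_counts)
```

With 500 columns, column_counts is probably the more readable summary, since it shows which columns account for most of the differences. I have not measured how this performs on a frame of the real size.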
Read more here: https://stackoverflow.com/questions/66329691/show-on-which-columns-duplicate-rows-with-respect-to-one-column-differ
Content Attribution
This content was originally published by Flo at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.