Pandas: group different rows with the same values in different columns

I have a dataframe in which the same value can appear in different places: in different rows and in different columns. For example, the same email can appear in two different columns, and I want to get the ids of the two different rows that contain it (in the sample below, qqq1 is iii1's email and iii6's email2).

import pandas as pd

test1 = pd.DataFrame([{'id': 'iii1', 'phone': 'aaa1', 'email': 'qqq1', 'phone2': 'bbb1', 'email2': 'sss1'},
                     {'id': 'iii2', 'phone': 'aaa2', 'email': 'qqq2', 'phone2': 'aaa1', 'email2': 'sss2'},
                     {'id': 'iii3', 'phone': 'aaa3', 'email': 'qqq3', 'phone2': 'bbb3', 'email2': 'sss3'},
                     {'id': 'iii4', 'phone': 'aaa4', 'email': 'qqq4', 'phone2': 'bbb4', 'email2': 'qqq3'},
                     {'id': 'iii5', 'phone': 'aaa5', 'email': 'qqq5', 'phone2': 'bbb5', 'email2': 'sss5'},
                     {'id': 'iii6', 'phone': 'aaa6', 'email': 'qqq6', 'phone2': 'bbb6', 'email2': 'qqq1'}])


I tried to solve it with the following steps:

  1. Melt the columns:
test2 = pd.melt(
    test1, id_vars=['id'],
    value_vars=['phone', 'email', 'phone2', 'email2']
).sort_values(by=['id'], ascending=False).reset_index(drop=True)
# test2 has columns ['id', 'variable', 'value'] and 24 rows
# (6 ids x 4 melted columns), sorted by id in descending order.
  2. Group by the melted values:
def testf(ser):
    # Unique ids that share this value
    uniqs = pd.unique(ser.values.ravel()).tolist()
    if len(uniqs) > 1:
        return uniqs
    else:
        return 'only 1, not interesting'

test3 = test2.groupby('value')['id'].apply(testf).reset_index()
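
For example, aaa1 appears as iii1's phone and as iii2's phone2, so its group holds two ids (iii2 first, because of the descending sort on id):

# Ids sharing the value 'aaa1' in the melted frame
print(test2.loc[test2['value'] == 'aaa1', 'id'].tolist())  # ['iii2', 'iii1']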

So finally, after these steps, I got one row per value: the groups [iii2, iii1] (shared aaa1), [iii6, iii1] (shared qqq1) and [iii4, iii3] (shared qqq3), plus the 'only 1' placeholder for every value that occurs just once.

This is almost what I want, but the expected result should be:

[iii1, iii2, iii6]; [iii3, iii4]

i.e. the overlapping pairs have to be merged transitively: iii1 and iii2 share aaa1, and iii1 and iii6 share qqq1, so all three belong in one group. I think another way could be a merge, but I don't know how to implement that.
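
One way to realize the full merge is to treat it as a connected-components problem: link every id to every contact value in its row, and ids chained through shared values end up in the same component. A minimal sketch, assuming the networkx library is available and reusing test2 from step 1:

import networkx as nx

# One edge per (id, contact value) pair; this assumes no id string
# ever collides with a contact value.
G = nx.Graph()
G.add_edges_from(zip(test2['id'], test2['value']))

# Each component mixes id nodes and value nodes: keep only the ids
# and drop components that contain a single id.
ids = set(test1['id'])
groups = [sorted(comp & ids) for comp in nx.connected_components(G)]
groups = [g for g in groups if len(g) > 1]
print(groups)  # [['iii1', 'iii2', 'iii6'], ['iii3', 'iii4']] (order may vary)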



