Adding columns to a pandas dataframe based on external criteria

Given three dataframes, one containing category names, one containing the data bins, and one containing user data:

```python
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'],
                           [7, 'Regular'],
                           [13, 'Happy'],
                           [42, 'Magical']],
                          columns=['klass', 'mood'])

bins_df = pd.DataFrame([[0.0, 3.0, 1],
                        [3.0, 6.0, 7],
                        [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])

person_df = pd.DataFrame([['John', 1.5],
                          ['Mary', 3.6],
                          ['Paul', 7.2],
                          ['Josh', 5.7],
                          ['Phil', 9.9]],
                         columns=['name', 'feeling'])
```

I would like to extend person_df (or create a new dataframe) so that each person also gets the matching klass and mood. For example, in the first row of person_df, John's feeling is 1.5. Checking bins_df, we can see that this falls in the first range, [0, 3], so the klass is 1. Looking up klass 1 in klasses_df, we find that its mood is Sad. The final/new row for John should therefore be John, 1.5, 1, 'Sad'.
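The two-step lookup described above (feeling to bin to klass, then klass to mood) can be traced for John's row with plain boolean indexing. This is a minimal sketch that only illustrates the chain. The name `mood_by_klass` is an illustrative helper and does not appear in the original code.

```python
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'], [7, 'Regular'], [13, 'Happy'], [42, 'Magical']],
                          columns=['klass', 'mood'])
bins_df = pd.DataFrame([[0.0, 3.0, 1], [3.0, 6.0, 7], [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])

feeling = 1.5  # John's feeling

# Step 1: find the bin whose closed range [lower, upper] contains the feeling
in_bin = (bins_df['lower'] <= feeling) & (feeling <= bins_df['upper'])
klass = int(bins_df.loc[in_bin, 'klass'].iloc[0])

# Step 2: use klasses_df as a lookup table (index = klass, value = mood)
mood_by_klass = klasses_df.set_index('klass')['mood']
mood = mood_by_klass[klass]

print(klass, mood)  # 1 Sad
```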

To achieve that, I created two auxiliary functions:

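```python
def find_klass_from_feeling(feeling, bin_data):
    # Rows are (lower, upper, klass); .values upcasts them all to float
    values = bin_data.values
    # klass of every bin whose closed range [lower, upper] contains the feeling
    klass = values[(values[:, 0] <= feeling) & (feeling <= values[:, 1])][:, 2]
    if len(klass) == 0:
        return 0  # no bin matches
    # On a shared edge (e.g. 3.0) two bins match; keep the first one
    return int(klass[0])

def find_mood_from_class(klass, klasses_data):
    if klass == 0:
        return None
    # Use the klasses_data argument instead of the global klasses_df
    return klasses_data.loc[klasses_data['klass'] == klass, 'mood'].iloc[0]
```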
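The per-value search in find_klass_from_feeling can also be done for all feelings at once with NumPy broadcasting, keeping the same closed-range and first-match rules. This is a sketch: `klasses_for_feelings` and `mood_by_klass` are hypothetical names, and the default class 0 for unmatched feelings mirrors the original helper.

```python
import numpy as np
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'], [7, 'Regular'], [13, 'Happy'], [42, 'Magical']],
                          columns=['klass', 'mood'])
bins_df = pd.DataFrame([[0.0, 3.0, 1], [3.0, 6.0, 7], [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])
person_df = pd.DataFrame([['John', 1.5], ['Mary', 3.6], ['Paul', 7.2],
                          ['Josh', 5.7], ['Phil', 9.9]],
                         columns=['name', 'feeling'])

def klasses_for_feelings(feelings, bin_data, default=0):
    """Hypothetical vectorized counterpart of find_klass_from_feeling."""
    f = feelings.to_numpy()[:, np.newaxis]   # shape (n_people, 1)
    lower = bin_data['lower'].to_numpy()      # shape (n_bins,)
    upper = bin_data['upper'].to_numpy()
    # inside[i, j] is True when feeling i lies in the closed range of bin j
    inside = (lower <= f) & (f <= upper)      # shape (n_people, n_bins)
    # argmax gives the first True in each row (and 0 when a row has none)
    first_bin = inside.argmax(axis=1)
    matched = inside.any(axis=1)
    return np.where(matched, bin_data['klass'].to_numpy()[first_bin], default)

# Lookup table: index = klass, values = mood
mood_by_klass = klasses_df.set_index('klass')['mood']

final_df = person_df.copy()
final_df['klass'] = klasses_for_feelings(final_df['feeling'], bins_df)
# map() looks each klass up in mood_by_klass's index; unknown keys (like 0) give NaN
final_df['mood'] = final_df['klass'].map(mood_by_klass)
print(final_df)
```

This builds an n_people × n_bins boolean matrix, which is cheap when there are only a few bins. Unlike the original helper, unmatched rows get NaN rather than None as their mood.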

I then call them like this:

```python
final_df = person_df.copy()
klss = []
moods = []
# Look up the class and the mood one row at a time
for _, row in person_df.iterrows():
    kls = find_klass_from_feeling(row['feeling'], bins_df)
    mood = find_mood_from_class(kls, klasses_df)
    klss.append(kls)
    moods.append(mood)

final_df['klass'] = klss
final_df['mood'] = moods
```
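The row loop above can be replaced by Series.apply, which forwards extra keyword arguments to the function it calls. With that, each helper can receive its lookup table without an explicit iterrows loop. This is a minimal sketch that reuses the two helpers, with the global-variable bug in find_mood_from_class already fixed. It still calls Python code once per row, so it is tidier but not faster.

```python
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'], [7, 'Regular'], [13, 'Happy'], [42, 'Magical']],
                          columns=['klass', 'mood'])
bins_df = pd.DataFrame([[0.0, 3.0, 1], [3.0, 6.0, 7], [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])
person_df = pd.DataFrame([['John', 1.5], ['Mary', 3.6], ['Paul', 7.2],
                          ['Josh', 5.7], ['Phil', 9.9]],
                         columns=['name', 'feeling'])

def find_klass_from_feeling(feeling, bin_data):
    # First bin whose closed range [lower, upper] contains the feeling, else 0
    values = bin_data.values
    klass = values[(values[:, 0] <= feeling) & (feeling <= values[:, 1])][:, 2]
    return int(klass[0]) if len(klass) else 0

def find_mood_from_class(klass, klasses_data):
    if klass == 0:
        return None
    return klasses_data.loc[klasses_data['klass'] == klass, 'mood'].iloc[0]

final_df = person_df.copy()
# Series.apply passes bin_data=... / klasses_data=... through to the helper
final_df['klass'] = final_df['feeling'].apply(find_klass_from_feeling, bin_data=bins_df)
final_df['mood'] = final_df['klass'].apply(find_mood_from_class, klasses_data=klasses_df)
print(final_df)
```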

It works, but it feels wrong: I believe pandas has a more idiomatic way to handle this. I tried using apply and applymap without success.
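One more idiomatic pandas route is pd.cut for the binning step followed by a left merge for the mood. This sketch assumes the bins are sorted and contiguous, so each upper bound equals the next lower bound, as in the sample bins_df. With right=True the intervals become [0, 3], (3, 6] and (6, 8]. On shared edges such as 3.0, this agrees with the loop's "first matching bin" rule. The 0 default for out-of-range feelings is carried over from find_klass_from_feeling.

```python
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'], [7, 'Regular'], [13, 'Happy'], [42, 'Magical']],
                          columns=['klass', 'mood'])
bins_df = pd.DataFrame([[0.0, 3.0, 1], [3.0, 6.0, 7], [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])
person_df = pd.DataFrame([['John', 1.5], ['Mary', 3.6], ['Paul', 7.2],
                          ['Josh', 5.7], ['Phil', 9.9]],
                         columns=['name', 'feeling'])

# Assumes sorted, contiguous bins: edges = first lower bound + every upper bound
edges = [bins_df['lower'].iloc[0], *bins_df['upper']]  # [0.0, 3.0, 6.0, 8.0]

# Label each feeling with the klass of its bin; include_lowest keeps 0.0 in the first bin
klass = pd.cut(person_df['feeling'], bins=edges, labels=bins_df['klass'].tolist(),
               right=True, include_lowest=True)

# Feelings outside every bin come back as NaN; map them to 0 like the original helper
klass = klass.astype(float).fillna(0).astype(int)

# A left merge keeps every person in their original order and attaches the mood
final_df = person_df.assign(klass=klass).merge(klasses_df, on='klass', how='left')
print(final_df)
```

The whole lookup is done by vectorized pandas operations. Phil (9.9) gets klass 0 and a NaN mood because no bin covers his feeling.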

Any hints are welcome.



Read more here: https://stackoverflow.com/questions/64902127/adding-columns-to-a-pandas-dataframe-based-in-external-criteria

Content Attribution

This content was originally published by lin at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
