How do you sort a key-value pair using MapReduce?

I have been experimenting with MapReduce and am still very new to it, so I was hoping to get some help with a problem I'm stuck on. I have a text file of dates and counts, and I want to sort the dates in ascending order of their respective counts. The text file looks like this: postedDates

I have looked around and found some code like this:

import re

from mrjob.job import MRJob
from mrjob.step import MRStep

# Matches runs of word characters and apostrophes.
WORD_RE = re.compile(r"[\w']+")


class MRWordFrequencyCount(MRJob):

    def steps(self):
        return [
            # Step 1: count how often each word occurs.
            MRStep(
                mapper=self.mapper_extract_words,
                combiner=self.combine_word_counts,
                reducer=self.reducer_sum_word_counts
            ),
            # Step 2: sort the (count, word) pairs collected under one key.
            MRStep(
                reducer=self.reduce_sort_counts
            )
        ]

    def mapper_extract_words(self, _, line):
        for word in WORD_RE.findall(line):
            yield word.lower(), 1

    def combine_word_counts(self, word, counts):
        yield word, sum(counts)

    def reducer_sum_word_counts(self, key, values):
        # Emit everything under the key None so a single reducer sees all pairs.
        yield None, (sum(values), key)

    def reduce_sort_counts(self, _, word_counts):
        # Highest count first; the count is zero-padded to 20 digits.
        for count, key in sorted(word_counts, reverse=True):
            yield ('%020d' % int(count), key)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

But this seems too complex: as you can see from the postedDates txt file, I already have the keys and their respective counts. So do I just need to add a second step that consists only of a reducer function, which sorts the list of keys and values using "sorted(counts)"?
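To make the idea concrete, here is a minimal sketch in plain Python (no mrjob import, so it runs on its own) of a mapper and a sorting reducer for a file that already holds dates and counts. The line format is an assumption, since the postedDates file is not shown here: each line is taken to be a date, some whitespace, then an integer count. The function names and the `run_locally` helper are hypothetical and only stand in for the shuffle phase.

```python
def mapper_parse_line(_, line):
    """Emit each (count, date) pair under one shared key (None)."""
    # ASSUMPTION: a line looks like "<date><whitespace><count>",
    # e.g. "2021-04-01\t12". Adjust the parsing to the real file.
    date, count = line.strip().rsplit(None, 1)
    # mrjob's default JSON output wraps string keys in double quotes,
    # so strip them in case the file came from an earlier mrjob job.
    yield None, (int(count), date.strip('"'))


def reducer_sort_by_count(_, count_date_pairs):
    """Sort the pairs by count, ascending, and emit (date, count)."""
    # Tuples compare element by element, so ties on count fall back to date.
    for count, date in sorted(count_date_pairs):
        yield date, count


def run_locally(lines):
    """Hypothetical stand-in for the shuffle: every value shares one key."""
    values = [
        value
        for line in lines
        if line.strip()  # skip blank lines
        for _, value in mapper_parse_line(None, line)
    ]
    return list(reducer_sort_by_count(None, values))


if __name__ == "__main__":
    sample = ["2021-04-03\t7", "2021-04-01\t12", "2021-04-02\t3"]
    for date, count in run_locally(sample):
        print(date, count)
```

In an mrjob job, these two functions would become methods (taking self as the first argument) passed to a single MRStep(mapper=..., reducer=...), so the counting step is not needed. Because every pair is sent to the same key, one reducer receives the whole data set, which is fine for a small file but does not scale to large inputs.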

Thank you for your time.



Read more here: https://stackoverflow.com/questions/66996225/how-do-you-sort-a-key-value-pair-using-mapreduce

Content Attribution

This content was originally published by Kristo Savic at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
