I have been experimenting with MapReduce and am still very new to it, so I was hoping to get some help with a question I'm having trouble answering. I have a txt file of dates and counts, and I want to sort the dates in ascending order of their respective counts. The text file looks like this: postedDates
I have looked around and found some code like this:
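    import re

    from mrjob.job import MRJob
    from mrjob.step import MRStep

    WORD_RE = re.compile(r"[\w']+")


    # The class line was missing from the snippet; the name here is assumed.
    class MRWordFreqCount(MRJob):

        def steps(self):
            return [
                MRStep(
                    mapper=self.mapper_extract_words,
                    combiner=self.combine_word_counts,
                    reducer=self.reducer_sum_word_counts
                ),
                MRStep(
                    reducer=self.reduce_sort_counts
                )
            ]

        def mapper_extract_words(self, _, line):
            # Emit (word, 1) for every word on the line.
            for word in WORD_RE.findall(line):
                yield word.lower(), 1

        def combine_word_counts(self, word, counts):
            # Pre-sum the counts on each mapper node.
            yield word, sum(counts)

        def reducer_sum_word_counts(self, key, values):
            # Send every (total, word) pair to the same key so one reducer sees them all.
            yield None, (sum(values), key)

        def reduce_sort_counts(self, _, word_counts):
            # Sort all pairs by count, highest first.
            for count, key in sorted(word_counts, reverse=True):
                yield ('%020d' % int(count), key)


    if __name__ == '__main__':
        MRWordFreqCount.run()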
But this seems too complex for my case: as you can see from the postedDates txt file, I already have the keys and their respective counts. So do I just need a second step consisting of a single reducer function that sorts the keys and values using "sorted(counts)"?
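To make the idea concrete, here is a minimal sketch of what I have in mind, written as plain Python functions shaped like a mapper and a reducer so it runs without a cluster. It assumes each line of the file is "<date> <count>" separated by whitespace (I may be wrong about what the real job would need), and the function names, the `run_locally` helper, and the sample data are all made up for illustration. In an mrjob job these two functions would presumably become methods passed to a single `MRStep(mapper=..., reducer=...)`.

```python
def mapper_parse_line(_, line):
    """Parse one '<date> <count>' line and send the pair to a single shared key."""
    # Assumption: the count is the last whitespace-separated field on the line.
    date, count = line.rsplit(None, 1)
    # Put the count first so that sorting the pairs orders them by count.
    yield None, (int(count), date)


def reducer_sort_by_count(_, count_date_pairs):
    """Receive every (count, date) pair under one key and emit them in ascending count order."""
    for count, date in sorted(count_date_pairs):
        yield date, count


def run_locally(lines):
    """Hypothetical stand-in for the framework: map every line, then reduce once."""
    mapped = [pair for line in lines for _, pair in mapper_parse_line(None, line)]
    return list(reducer_sort_by_count(None, mapped))


# Made-up sample data, not taken from the real postedDates file.
sample = ["2015-01-03 7", "2015-01-01 12", "2015-01-02 3"]
result = run_locally(sample)
for date, count in result:
    print(date, count)
```

Because every pair is sent to the same key, one reducer ends up sorting the whole data set in memory, which seems acceptable for a small file but would not scale to a large one.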
Thanks for your time.