Cloud Stack Ninja

PySpark noob here. I have a dataset that looks like this (with thousands of different startIDs and endIDs):

startID   endID
1         1
1         2
1         3
2         3
1         1
...

And I need to count how many times (rows) each combination of startID and endID occurs, to get something like this:

startID   endID  count
1         1      2
1         2      1
...
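In PySpark this is usually done with a grouped aggregation: `df.groupBy("startID", "endID").count()` returns one row per distinct pair plus a `count` column. The sketch below only illustrates the same counting logic in plain Python on the sample rows above, assuming they fit in memory. It is not a Spark job, and the names `rows` and `pair_counts` are hypothetical.

```python
from collections import Counter

# Sample rows from the question, as (startID, endID) pairs (illustrative data only).
rows = [(1, 1), (1, 2), (1, 3), (2, 3), (1, 1)]

# Count how many rows contain each (startID, endID) combination.
# Conceptually this matches df.groupBy("startID", "endID").count() in PySpark.
pair_counts = Counter(rows)

# Print in the same layout as the desired output, sorted by startID, then endID.
print("startID   endID  count")
for (start_id, end_id), count in sorted(pair_counts.items()):
    print(f"{start_id:<10}{end_id:<7}{count}")
```

On a real Spark DataFrame, keep the aggregation in Spark instead of collecting rows to the driver; appending `.orderBy("startID", "endID")` sorts the grouped result.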


Read more here: https://stackoverflow.com/questions/64190673/pyspark-find-common-pairs-of-rows-with-column-values

Content Attribution

This content was originally published by b-ryce at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
