Cloud Stack Ninja

PySpark noob here. I have a dataset that looks like this (with thousands of different startID and endID values):

startID   endID
1         1
1         2
1         3
2         3
1         1

I need to count the number of rows in which each combination of startID and endID occurs, and get something like this:

startID   endID  count
1         1      2
1         2      1

Content Attribution

This content was originally published by b-ryce at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
