Use cumulative sum to assign a value in Python/PySpark

Using Python, I'd like to write some code that labels each row "IN" as long as the cumulative sum of the Miles column (taken in Rank order) is <= 2.5, and "OUT" for every row after that. Are there any suggestions on where to start?

Example Data set

Rank  Name  Miles
  1   A     0.5  
  2   A     1
  3   B     1
  4   B     1
  5   C     2

Desired Output

Rank  Name  Miles  Assign
  1   A     0.5     IN
  2   A     1       IN
  3   B     1       IN
  4   B     1       OUT
  5   C     2       OUT
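
One possible starting point, as a minimal sketch rather than a definitive answer: compute a running total of Miles and compare it against the threshold row by row. The example below uses only the standard library (`itertools.accumulate`). The names `rows`, `LIMIT`, and `assign_in_out` are made up for illustration, and the rows are assumed to be already sorted by Rank.

```python
from itertools import accumulate

# Sample rows from the question: (Rank, Name, Miles), assumed sorted by Rank.
rows = [
    (1, "A", 0.5),
    (2, "A", 1.0),
    (3, "B", 1.0),
    (4, "B", 1.0),
    (5, "C", 2.0),
]

LIMIT = 2.5  # hypothetical name for the threshold on the running total


def assign_in_out(rows, limit=LIMIT):
    """Return each row extended with "IN" or "OUT" based on cumulative Miles."""
    # accumulate() yields the running total: 0.5, 1.5, 2.5, 3.5, 5.5
    running_totals = accumulate(miles for _, _, miles in rows)
    return [
        (*row, "IN" if total <= limit else "OUT")
        for row, total in zip(rows, running_totals)
    ]


result = assign_in_out(rows)
for rank, name, miles, assign in result:
    print(rank, name, miles, assign)
```

The same idea should carry over to a DataFrame: in pandas, `df["Miles"].cumsum()` gives the running total to compare against 2.5, and in PySpark a running `sum` over a window ordered by Rank plays the same role. If the Miles values are not exactly representable as floats, the `<=` comparison at the boundary may need a small tolerance.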

Content Attribution

This content was originally published by steppermotor at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
