How to create new variables based on a binary variable in a grouped data set in R?

The data set has three columns: the first is id, the second is year, and the third is node, which is a binary variable. We need to fix the data errors in the node column according to the following rules.

1) Within each id, all values before the last node=1 should be equal to 1. No node=0 should occur before a node=1. Within an id, node should either stay at 1 for all years or go from node=1 to node=0 at some point.

2) Within each id, if all values of node from year 1 to year 8 are the same (all 0 or all 1), they must be kept without any change.
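
The two rules above can be sketched in base R as follows. This is only a minimal sketch that takes the rules literally: the helper name `fix_node`, the column name `node_fixed`, and the two-id sample data frame `df` are illustrative assumptions, not part of the original data set, and the rows are assumed to be sortable by year within each id.

```r
# Rule 1: within one id, every value up to the last node == 1 becomes 1.
# Rule 2: a group that is all 0 (or all 1) comes back unchanged.
# Assumes `node` is already ordered by year within the id.
fix_node <- function(node) {
  ones <- which(node == 1)
  if (length(ones) == 0) return(node)  # all zeros: nothing to fix
  node[seq_len(max(ones))] <- 1        # fill everything up to the last 1
  node
}

# Small illustrative subset (two ids, values copied from the question's data).
df <- data.frame(
  id   = rep(c(383100111, 383100666), each = 8),
  node = c(1, 1, 1, 0, 1, 0, 1, 0,
           0, 0, 0, 0, 0, 0, 0, 0),
  year = rep(1:8, times = 2)
)

df <- df[order(df$id, df$year), ]                     # years in order per id
df$node_fixed <- ave(df$node, df$id, FUN = fix_node)  # apply the rule per id
```

Under this literal reading, id 383100111 becomes 1 1 1 1 1 1 1 0 (the trailing 0 after the last 1 is left alone) and the all-zero id 383100666 is unchanged. If a different result is intended, only the body of `fix_node` would need adjusting; the grouping with `ave()` stays the same.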

To sum up, the corrected data set should look like this:

 id     node   year
383100111   1   1
383100111   1   2
383100111   1   3
383100111   1   4
383100111   1   5
383100111   1   6
383100111   1   7
383100111   1   8
383100222   0   1
383100222   1   2
383100222   1   3
383100222   1   4
383100222   1   5
383100222   1   6
383100222   1   7
383100222   1   8
383100333   1   1
383100333   1   2
383100333   1   3
383100333   1   4
383100333   1   5
383100333   1   6
383100333   1   7
383100333   1   8
383100444   1   1
383100444   1   2
383100444   1   3
383100444   1   4
383100444   1   5
383100444   1   6
383100444   1   7
383100444   1   8
383100555   0   1
383100555   1   2
383100555   1   3
383100555   1   4
383100555   1   5
383100555   1   6
383100555   1   7
383100555   1   8
383100666   1   1
383100666   0   2
383100666   0   3
383100666   0   4
383100666   0   5
383100666   0   6
383100666   0   7
383100666   0   8
383100777   0   1
383100777   1   2
383100777   1   3
383100777   1   4
383100777   1   5
383100777   1   6
383100777   1   7
383100777   1   8

The original data set with errors is structured as follows:

structure(list(id = c(383100111, 383100111, 383100111, 383100111, 
383100111, 383100111, 383100111, 383100111, 383100222, 383100222, 
383100222, 383100222, 383100222, 383100222, 383100222, 383100222, 
383100333, 383100333, 383100333, 383100333, 383100333, 383100333, 
383100333, 383100333, 383100444, 383100444, 383100444, 383100444, 
383100444, 383100444, 383100444, 383100444, 383100555, 383100555, 
383100555, 383100555, 383100555, 383100555, 383100555, 383100555, 
383100666, 383100666, 383100666, 383100666, 383100666, 383100666, 
383100666, 383100666, 383100777, 383100777, 383100777, 383100777, 
383100777, 383100777, 383100777, 383100777), node = c(1, 1, 1, 
0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 
0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1), year = c(1, 2, 3, 4, 5, 6, 
7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 
4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 
1, 2, 3, 4, 5, 6, 7, 8)), row.names = c(NA, 56L), class = "data.frame") -> dataframe

Thank you!



Read more here: https://stackoverflow.com/questions/64890800/how-to-create-new-variables-based-on-a-binary-variable-in-a-grouped-data-set-in

Content Attribution

This content was originally published by fan lin at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
