In my reprex below:
- RSA is the output of a process that is to be analyzed and whose results are to be grouped.
- Each RSA group spans a varying range of days (datenum). var1 varies less frequently; each var1 value is observed for the same 8 consecutive days.
- The RSA groups are to be numbered sequentially within the var1 group; when a new var1 is encountered, the RSA group numbering begins anew.
- idx_objective is the index that I am looking for.
Reprex:
var1 <- c("aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd")
RSA <- c(1,1,1,0,-1,-1,0,-1,
0,0,0,-1,-1,-1,1,1,
-1,-1,0,1,1,-1,-1,1,
1,-1,-1,1,1,0,-1,1)
idx_objective <- c(1,1,1,2,3,3,4,5,
1,1,1,2,2,2,3,3,
1,1,2,3,3,4,4,5,
1,2,2,3,3,4,5,6)
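library(dplyr)  # provides %>%, group_by(), mutate(), relocate()

objective.df <- data.frame(var1, RSA, idx_objective) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%           # day counter, restarts for each var1
  relocate(datenum, .after = var1)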
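For clarity, the numbering rule described above can be written out in base R. This is only a sketch of the target, assuming a new index starts whenever RSA differs from the previous row within the same var1 group. `run_index`, `var1_demo`, `RSA_demo` and `idx_demo` are hypothetical names used for illustration, and the data is just the first two var1 groups of the reprex.

```r
# Hypothetical helper: number consecutive runs of equal values, starting at 1.
run_index <- function(x) {
  # TRUE for the first element and wherever the value differs from the previous one
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

# First two var1 groups from the reprex
var1_demo <- rep(c("aaa", "bbb"), each = 8)
RSA_demo  <- c(1, 1, 1, 0, -1, -1, 0, -1,
               0, 0, 0, -1, -1, -1, 1, 1)

# ave() applies run_index separately within each var1 group
idx_demo <- ave(RSA_demo, var1_demo, FUN = run_index)
print(idx_demo)
# 1 1 1 2 3 3 4 5 1 1 1 2 2 2 3 3
```

The printed values agree with the first 16 entries of idx_objective.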
I have reviewed many SO posts that appear to be similar...
[1] dplyr: group variables then assign unique names based on unique grouping
revolves around correct use of cumsum, which I think I am using correctly
[https://stackoverflow.com/questions/40519129/how-to-assign-unique-id-for-group-of-duplicates]
[2] How to divide between groups of rows using dplyr
The last two don't seem applicable; the two others are referenced in the attempts that follow:
Approach #1: using a change flag and cumsum
objective.try1 <- objective.df %>%
  group_by(var1) %>%
  # flag rows where RSA differs from the previous row; the first row of each var1 gets 0
  mutate(chg_flg = ifelse(lag(RSA) != RSA, 1, 0) %>%
           coalesce(0)) %>%
  relocate(chg_flg, .after = RSA) %>%
  relocate(datenum, .after = var1) %>%
  group_by(var1, chg_flg) %>%
  mutate(idx_objective_try = cumsum(chg_flg) + 1)
Results:
objective.try1 <- c(1, 1, 1, 2, 3, 1, 4, 5,
                    1, 1, 1, 2, 1, 1, 3, 1,
                    1, 1, 2, 3, 1, 4, 1, 5,
                    1, 2, 1, 3, 1, 4, 5, 6)
objective.df <- data.frame(var1, RSA, idx_objective, objective.try1) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%
  relocate(datenum, .after = var1)
Observation for objective.try1: rows 1-5 work, but row 6 incorrectly restarts the idx numbering. The numbering then resumes correctly, reflecting chg_flg, until rows 13 and 14, where it is again incorrectly restarted. It is correct for one more row, then incorrect again at rows 16, 21, 23, 27, and 29.
Following the logic at row 6, for example: the previous idx_objective_try (row 5) is 3 and the chg_flg value at row 6 is zero, so idx_objective_try ought to be the correct value of 3. Why isn't it?
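The row 6 behavior can be reproduced outside dplyr. The following is a minimal base-R sketch on the first var1 group only, assuming that group_by(var1, chg_flg) causes cumsum() to run separately over the chg_flg == 0 rows and the chg_flg == 1 rows. `RSA_aaa`, `split_by_flag` and `whole_group` are hypothetical names used for illustration.

```r
# First var1 group ("aaa") from the reprex
RSA_aaa <- c(1, 1, 1, 0, -1, -1, 0, -1)

# Change flag: 1 where RSA differs from the previous row, 0 on the first row
chg_flg <- c(0, as.numeric(RSA_aaa[-1] != RSA_aaa[-length(RSA_aaa)]))
# chg_flg: 0 0 0 1 1 0 1 1

# cumsum computed separately for each chg_flg value
# (assumed equivalent of grouping by chg_flg within one var1)
split_by_flag <- ave(chg_flg, chg_flg, FUN = cumsum) + 1
print(split_by_flag)
# 1 1 1 2 3 1 4 5

# cumsum computed over the whole var1 group
whole_group <- cumsum(chg_flg) + 1
print(whole_group)
# 1 1 1 2 3 3 4 5
```

In this sketch the chg_flg == 0 rows only ever accumulate zeros, so they all come out as 1, which matches the value seen at row 6. The whole-group version matches idx_objective for this group.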
Approach #2: Using match and duplicated:
objective.try2 <- objective.df %>%
  group_by(var1) %>%  # var1 corresponds to "prop" in the SO post (both are the slower-moving variables)
  mutate(well_rep1 = match(RSA, unique(RSA)),      # "RSA" corresponds to "well" in the SO post (both are the faster-changing variables)
         well_rep2 = cumsum(!duplicated(RSA)))     # approach similar to above
Observation for objective.try2: most rows work, but again some rows do not, and the rows that fail are different from those that fail in the first try.
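To see which rows differ, the two expressions can be evaluated on the first var1 group alone. This is a minimal base-R sketch; `RSA_aaa`, `by_value` and `seen_so_far` are hypothetical names used for illustration.

```r
# First var1 group ("aaa") from the reprex
RSA_aaa <- c(1, 1, 1, 0, -1, -1, 0, -1)

# Position of each value among the distinct values, in order of first appearance
by_value <- match(RSA_aaa, unique(RSA_aaa))
print(by_value)
# 1 1 1 2 3 3 2 3

# Running count of distinct values seen so far
seen_so_far <- cumsum(!duplicated(RSA_aaa))
print(seen_so_far)
# 1 1 1 2 3 3 3 3

# Target for this group, copied from idx_objective
target <- c(1, 1, 1, 2, 3, 3, 4, 5)
print(which(by_value != target))
# 7 8
```

Both expressions agree with the target until an RSA value reappears after a gap (rows 7 and 8 here), because they depend on which values have occurred rather than on where consecutive runs begin.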
I would appreciate it if someone would point out what I am doing wrong.