Using the data.table package, we can number the groups while preserving the original row order (recommended):
library(data.table)
setDT(df1)[,new_col:=.GRP-1, by = c("A", "B","C")]
# If you want the column as a factor (one-liner; use it instead of the previous line)
setDT(df1)[,new_col:=.GRP-1, by = c("A", "B","C")][,new_col:=as.factor(new_col)]
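To see why this keeps the row order, here is a minimal sketch on a small made-up table (`dt` and the column `id` are illustrative names, not part of the question's data). With `by`, `.GRP` counts groups in order of first appearance, and `:=` adds the column in place without reordering rows.

```r
library(data.table)

# Small made-up table (illustration only); the pair (1, 0) appears first
dt <- data.table(A = c(1, 0, 1, 0), B = c(0, 0, 0, 1))

# .GRP is the running group counter; with `by`, groups are numbered in
# order of first appearance and the rows stay where they are
dt[, id := .GRP - 1L, by = .(A, B)]

print(dt)
```

Using `keyby` instead of `by` would sort the groups, so the ids would then follow the sorted keys rather than the order of appearance.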
Using dplyr, we can do something like the following
(Rui's solution implemented in dplyr, with a minimal modification to handle duplicate rows).
This also preserves the original row order:
library(dplyr)
df1 %>% mutate(mtemp = paste0(A, B, C)) %>%
  mutate(new_col = as.integer(factor(mtemp, levels = unique(.$mtemp))) - 1) %>%
  select(-mtemp)
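The factor trick can be tried in isolation with base R. The sketch below uses a made-up vector `key` (not part of the original data). It also shows why a separator in the pasted key may be safer when values can have more than one digit, which is an assumption beyond the 0/1 data in this answer.

```r
# Hypothetical key vector, for illustration only
key <- c("b", "a", "b", "c", "a")

# levels = unique(key) numbers the groups by first appearance
ids_appearance <- as.integer(factor(key, levels = unique(key))) - 1L

# Default factor() sorts the levels, so ids follow alphabetical order instead
ids_sorted <- as.integer(factor(key)) - 1L

# Without a separator, different combinations can collapse into one key
collide  <- paste0(1, 11) == paste0(11, 1)                      # both are "111"
distinct <- paste(1, 11, sep = "_") != paste(11, 1, sep = "_")  # "1_11" vs "11_1"

print(ids_appearance)
print(ids_sorted)
```

With columns that only hold 0 and 1, as here, `paste0(A, B, C)` is unambiguous; for wider value ranges, `paste(A, B, C, sep = "_")` is the more cautious choice.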
Alternatively, we can sort by a dummy variable and number the groups with cumsum (note that this reorders the rows):
df1 %>% mutate(mtemp = paste0(A, B, C)) %>%
  group_by(mtemp) %>% arrange(mtemp) %>% ungroup() %>%
  mutate(new_col = c(0, cumsum(lead(mtemp)[-n()] != lag(mtemp)[-1]))) %>% select(-mtemp)
# # A tibble: 8 x 5
#       A     B     C newCol new_col
#   <dbl> <dbl> <dbl>  <int>   <dbl>
# 1     0     0     0      0       0
# 2     0     0     0      0       0
# 3     0     0     1      3       1
# 4     0     1     0      2       2
# 5     0     1     1      5       3
# 6     1     0     0      1       4
# 7     1     1     0      4       5
# 8     1     1     1      6       6
Or, following the approach in this thread:
df1 %>%
mutate(group_id = group_indices(., paste0(A,B,C)))
Explanation of the dplyr solutions:
The cumsum solution creates a dummy variable, mtemp, by pasting the three variables of interest together, and then sorts the data by it. In the next step, each run of identical mtemp values gets a unique id (compare newCol to new_col in the output). Whenever mtemp changes between two consecutive rows, the comparison lead(mtemp)[-n()] != lag(mtemp)[-1] returns TRUE (numeric value 1), and cumsum adds it to the running id, so each distinct mtemp (that is, each combination of A, B, and C) ends up with its own id. Because this solution relies on sorting by the dummy variable, it does not preserve the original row order.
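The change-detection step can be sketched in base R on a made-up, already sorted vector (the values below are illustrative and not taken from df1):

```r
# Hypothetical, already sorted key vector (illustration only)
mtemp <- c("000", "000", "001", "010", "010", "111")

# TRUE wherever an element differs from the one before it
changed <- mtemp[-1] != mtemp[-length(mtemp)]

# The first row gets id 0; every change bumps the id by one
new_col <- c(0, cumsum(changed))

print(new_col)
```

If the vector were not sorted, the same key could reappear later and receive a second id, which is why this approach needs the arrange step.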
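For the group_indices solution, see ?group_indices. Note that it numbers groups from 1 in sorted key order rather than by first appearance, and that since dplyr 1.0.0 passing grouping expressions to group_indices() is deprecated in favor of group_by() followed by cur_group_id().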
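Assuming dplyr 1.0.0 or later, a similar result can be sketched with `group_by()` and `cur_group_id()`. The data frame `d` and the column name `group_id` below are made up for illustration; subtracting `1L` is only there to make the ids zero-based like the other solutions.

```r
library(dplyr)

# Small made-up data frame (illustration only)
d <- data.frame(A = c(1, 0, 1, 0), B = c(0, 0, 0, 1))

res <- d %>%
  group_by(A, B) %>%
  mutate(group_id = cur_group_id() - 1L) %>%  # ids follow the sorted group keys
  ungroup()

print(res)
```

The rows keep their original order, but the ids are assigned by sorted group key, so the first row does not necessarily get id 0.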
Data:
df1 <- structure(list(A = c(0, 1, 0, 0, 0, 1, 0, 1), B = c(0, 0, 0,
1, 0, 1, 1, 1), C = c(0, 0, 0, 0, 1, 0, 1, 1), newCol = c(0L,
1L, 0L, 2L, 3L, 4L, 5L, 6L)), class = "data.frame", row.names = c(NA,
-8L))