I have a data frame df (roughly 331,000 observations of 4 variables) with the columns Date (format = "%Y-%m-%d"), ID (factor with 19 levels), Station (factor with 18 levels), and Presence (1/0).
The data frame is set up so that, for each ID at each Station, there is a row for every date over a period of almost three years, recording whether that person was present (1/0) at that Station on that day.
If you subset/filter df by a single day and ID, you get the following dataset (I'll refer to this as a 'group' from now on):
filter(df, Date == "2016-01-03" & ID == "Fred")
Date ID Station Presence
<date> <fct> <fct> <dbl>
2016-01-03 Fred Station1 0
2016-01-03 Fred Station2 0
2016-01-03 Fred Station3 0
2016-01-03 Fred Station4 1
2016-01-03 Fred Station5 0
2016-01-03 Fred Station6 0
2016-01-03 Fred Station7 0
2016-01-03 Fred Station8 0
2016-01-03 Fred Station9 0
2016-01-03 Fred Station10 0
2016-01-03 Fred Station11 0
2016-01-03 Fred Station12 0
2016-01-03 Fred Station13 0
2016-01-03 Fred Station14 0
2016-01-03 Fred Station15 0
2016-01-03 Fred Station16 0
2016-01-03 Fred Station17 0
2016-01-03 Fred Station18 0
I would like to remove rows from a group according to the following rule:
Within each group, if any row has Presence == 1, remove the rows with Presence == 0 (a group can contain several rows with Presence == 1, e.g. Fred was at Station4, Station9 and Station15 on 2016-01-06). But if no row in the group has Presence == 1, don't remove any rows (so I can't simply drop all the Presence == 0 rows).
The above group would thus become:
Date ID Station Presence
<date> <fct> <fct> <dbl>
2016-01-03 Fred Station4 1
However, the following group would stay as it is (as there is no Presence == 1 within the group):
filter(df, Date == "2016-01-03" & ID == "Mark")
Date ID Station Presence
<date> <fct> <fct> <dbl>
2016-01-03 Mark Station1 0
2016-01-03 Mark Station2 0
2016-01-03 Mark Station3 0
2016-01-03 Mark Station4 0
2016-01-03 Mark Station5 0
2016-01-03 Mark Station6 0
2016-01-03 Mark Station7 0
2016-01-03 Mark Station8 0
2016-01-03 Mark Station9 0
2016-01-03 Mark Station10 0
2016-01-03 Mark Station11 0
2016-01-03 Mark Station12 0
2016-01-03 Mark Station13 0
2016-01-03 Mark Station14 0
2016-01-03 Mark Station15 0
2016-01-03 Mark Station16 0
2016-01-03 Mark Station17 0
2016-01-03 Mark Station18 0
I've thought of starting with:
df %>% group_by(Date, ID) %>%
However, I don't know how to proceed from there. I don't know how to deal with the contrasting conditions.
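For reference, the two conditions can be folded into one grouped test: keep a row when Presence == 1, or when its group has no Presence == 1 at all. The following is only a minimal sketch of that idea, assuming dplyr is loaded. The data frame toy is a made-up stand-in for df (three stations instead of 18), and it has not been tried on the full data.

```r
library(dplyr)

# Hypothetical toy data standing in for df: two groups on the same day,
# one with a Presence == 1 row (Fred) and one without (Mark).
toy <- data.frame(
  Date     = as.Date("2016-01-03"),
  ID       = factor(rep(c("Fred", "Mark"), each = 3)),
  Station  = factor(rep(paste0("Station", 1:3), times = 2)),
  Presence = c(0, 1, 0, 0, 0, 0)
)

result <- toy %>%
  group_by(Date, ID) %>%
  # Keep a row if it is a presence row, or if its group has no presence rows.
  filter(Presence == 1 | !any(Presence == 1)) %>%
  ungroup()

print(result)
```

With this toy input, Fred's group is reduced to its single Presence == 1 row, while Mark's group keeps all three rows. A group with several Presence == 1 rows would keep all of them.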