I am having trouble figuring out how to sum rows in a data.table while omitting the values of a certain group in the process.
Let's say I have a data.table of the following form:
library(data.table)
dt <- data.table(year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003),
name = c("Tom", "Tom", "Tom", "Tom", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Ann", "Ann", "Ann", "Ann"),
g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
setkey(dt, name, year)
where g1-g4 are indicator variables for games in which the players in name participated at time year.
What I want to do is to calculate the number of players for each game NPg1-NPg4 in which both players participated in the focal game, but only if they also played against each other in another game in the same year and this sum should exclude the player for whom it is being calculated.
I get close using this code modified from how to cumulatively add values in one vector in R e.g for NPg1:
dtg1 <- dt[,.SD[(g1==1) & (g2==1 | g3==1 | g4==1)][, NPg1:= sum(g1)], by=year]
This subsets the dt on my conditions and creates the sum, however, the sum includes the focal players. For example NPg1 in year==2000 is 1 for Tom, but it should be 0 because even though he played in g1 he did not play another player in another game in that year. Once I get the sum right, I can then do this for each game and combine the results back into a data.table. The main question is, how can I get the correct sum with these conditions.
The result for NPg1 should look like this
dtg1$NPg1result <- c(0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3)
Any ideas would be greatly appreciated.
After @Mike.Gahan's comment:
This is the sub-result for g1, maybe this does not become very clear form my post. Once I have that correctly I could easily join it back to the full data.table using:
library(plyr)
dt <- join(dt, dtg1)
or other merge/join operations but since my question is mainly concerned with the sub-result I did not want to bother everyone with the rest.
EDIT after @ Ricardo Saportas solution
The full desired result with all the games looks as follows:
dtresult <- data.table(year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003),
name = c("Ann", "Ann", "Ann", "Ann", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Tom", "Tom", "Tom", "Tom"),
NPg1 = c(0, 1, 3, 3, 0, 0, 3, 3, 0, 0, 3, 3, 0, 1, 3, 3),
NPg2 = c(0, 0, 2, 3, 0, 0, 2, 3, 1, 0, 0, 3, 1, 0, 2, 3),
NPg3 = c(0, 0, 3, 2, 0, 2, 3, 0, 1, 2, 3, 2, 1, 2, 3, 2),
NPg4 = c(0, 0, 2, 2, 0, 1, 0, 0, 0, 1, 2, 2, 0, 0, 2, 2))