I have a data frame df with several columns, but only the following ones are relevant:
node     | precedingWord
------------------------
A-bom    | de
A-bom    | die
A-bom    | de
A-bom    | een
A-bom    | n
A-bom    | de
acroniem | het
acroniem | t
acroniem | het
acroniem | n
acroniem | een
act      | de
act      | het
act      | die
act      | dat
act      | t
act      | n
I'd like to use these values to count the precedingWord values per node, grouped into subcategories: one column called neuter, another called non-neuter (nonNeuter), and a last one called rest. neuter should count all rows whose precedingWord is one of t, het or dat; non-neuter should count de and die; and rest should count everything that belongs to neither neuter nor non-neuter. Ideally rest would be dynamic: either defined as the complement of the words used for neuter and non-neuter, or simply computed by subtracting the neuter and non-neuter counts from the number of rows for that node.
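A minimal sketch of one way to do this, assuming the word lists are kept in two vectors (neuterWords and nonNeuterWords are hypothetical names, not from the question): label each row with its category, treating any word in neither list as rest, then cross-tabulate node against category with table(). Because rest is the complement of the two lists, adding a word to either list automatically removes it from rest.

```r
# Example data, copied from the table above
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)

# Hypothetical names for the two explicit word lists
neuterWords    <- c("t", "het", "dat")
nonNeuterWords <- c("de", "die")

# Label every row; anything in neither list becomes "rest" (exact word matches)
category <- ifelse(df$precedingWord %in% neuterWords, "neuter",
            ifelse(df$precedingWord %in% nonNeuterWords, "nonNeuter", "rest"))
# Fixed levels, so a category with no matches still gets a column of zeros
category <- factor(category, levels = c("neuter", "nonNeuter", "rest"))

# Rows = nodes, columns = categories, cells = counts
counts <- table(df$node, category)

freqDf <- data.frame(node      = rownames(counts),
                     neuter    = as.vector(counts[, "neuter"]),
                     nonNeuter = as.vector(counts[, "nonNeuter"]),
                     rest      = as.vector(counts[, "rest"]),
                     stringsAsFactors = FALSE)
print(freqDf)
```

table() orders the rows by the sorted node names, so the row order may differ from unique(df$node).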
The example output, in a new data frame (let's say freqDf), would look like this:
node     | neuter | nonNeuter | rest
------------------------------------
A-bom    |      0 |         4 |    2
acroniem |      3 |         0 |    2
act      |      3 |         2 |    1
To create freqDf$node, I can do this:
freqDf <- data.frame(node = unique(df$node), stringsAsFactors = FALSE)
But that's as far as I got; I don't know how to continue. I figured I could do something like the following, but unfortunately the ++ operator doesn't work the way I had hoped:
freqDf$neuter[grep("dat|het|t", df$precedingWord, perl=TRUE)] <- ++
freqDf$nonNeuter[grep("de|die", df$precedingWord, perl=TRUE)] <- ++
e <- table(df$node)
freqDf$rest <- as.numeric(e - freqDf$neuter - freqDf$nonNeuter)
Also, this doesn't count per node. I need some kind of loop that runs automatically for each distinct value in freqDf$node.
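A sketch of such a loop, assuming the same hypothetical word vectors (neuterWords, nonNeuterWords). It replaces the grep() patterns with %in%, which matches whole words only; the pattern "dat|het|t" would also match any word that merely contains a t. rest is computed by subtraction, as suggested in the question.

```r
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)

# Hypothetical names for the two explicit word lists
neuterWords    <- c("t", "het", "dat")
nonNeuterWords <- c("de", "die")

# One row per node, in order of first appearance; counts start at zero
freqDf <- data.frame(node = unique(df$node), stringsAsFactors = FALSE)
freqDf$neuter    <- 0L
freqDf$nonNeuter <- 0L
freqDf$rest      <- 0L

for (i in seq_len(nrow(freqDf))) {
  # All preceding words for the current node
  words <- df$precedingWord[df$node == freqDf$node[i]]
  freqDf$neuter[i]    <- sum(words %in% neuterWords)
  freqDf$nonNeuter[i] <- sum(words %in% nonNeuterWords)
  # Whatever is left over for this node
  freqDf$rest[i] <- length(words) - freqDf$neuter[i] - freqDf$nonNeuter[i]
}
print(freqDf)
```

The loop is the most direct translation of the question; tapply() over df$node, or a table() cross-tabulation, would avoid the explicit loop.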