I am trying to perform a (simplified!) query on colour and date differences on the following two databases (extract):
A B
A.COL A.TIME B.COL B.TIME
1 blue 2009-01-31 1 blue 2007-01-31
2 blue 2009-02-28 2 blue 2008-12-31
3 blue 2009-03-31 3 blue 2009-02-28
4 blue 2009-04-30 4 blue 2009-04-30
5 blue 2009-05-31 5 blue 2009-06-30
6 blue 2009-06-30 6 blue 2016-08-31
7 blue 2016-03-31
8 blue 2016-04-30
9 red ...
10 red ...
What I want to do: Merge the tables based on COL as well as the difference in TIME, that is the difference betweeen both times must not be larger or lower than 2 month (or in other words lie between -2 and +2, depending from which date one starts with).
# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28 (diff = -1)
# for obs 2 from A, that would imply
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)
etc.
I was thinking about some kind of Date difference function, either from lubridate, which was problematic in cases of months with less than 30 days and sometimes made NAs, or with as.yearmon from zoo, which worked to correctly calculate the differences at least. However, I wasn't able to implement this into sqldf properly (Error: error in statement: near "as": syntax error). The reason seems to be that one cannot use every R function withing sqldf.
Any ideas how it can be done in R? I was also looking for an elegant way of how to subtract months from each other. There is this issue with lubridate:
Add/subtract 6 months (bond time) in R using lubridate, but here was one proposed way how to accomplish it with zoo: Get the difference between dates in terms of weeks, months, quarters, and years
Get the data (thanks @bouncyball below for the code):
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)