I want to match 2 controls to every case, subject to two conditions:
- the `age` difference should be within ±2;
- the `income` difference should be within ±2.
If a case has more than 2 eligible controls, I just need to select 2 of them at random.
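
To make the two conditions concrete, here is a minimal base R sketch of the eligibility check. It only lists the candidate controls for each case; it does not do the random selection or stop two cases from sharing a control. The helper name `find_candidates` and its `tol` argument are made up for illustration, and the data frame repeats the values of the example `dat` in this question.

```r
# Example data (same values as the `dat` example in this question)
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = rep(c("case", "control"), times = c(4, 10))
)

cases    <- dat[dat$group == "case", ]
controls <- dat[dat$group == "control", ]

# Hypothetical helper: ids of all controls within +/- tol of one case
# on both age and income.
find_candidates <- function(case_row, controls, tol = 2) {
  ok <- abs(controls$age - case_row$age) <= tol &
    abs(controls$income - case_row$income) <= tol
  controls$id[ok]
}

# One vector of candidate control ids per case, named by case id
candidates <- lapply(seq_len(nrow(cases)), function(i) {
  find_candidates(cases[i, ], controls)
})
names(candidates) <- cases$id
candidates
```
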
There is an example:
EXAMPLE
DATA
dat = structure(list(id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666,
777, 888, 999, 1000),
age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
group = c("case", "case", "case", "case", "control", "control",
"control", "control", "control", "control", "control",
"control", "control", "control")),
row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
> dat
# A tibble: 14 x 4
      id   age income group
   <dbl> <dbl>  <dbl> <chr>
 1     1    10     35 case
 2     2    20     72 case
 3     3    44     11 case
 4     4    11     35 case
 5   111    12     37 control
 6   222    11     36 control
 7   333     8     33 control
 8   444    12     70 control
 9   555    11     34 control
10   666    22     74 control
11   777    21     70 control
12   888    18     44 control
13   999    21     76 control
14  1000    18     70 control
EXPECTED OUTCOME
For `id = 1`, the matched controls are listed in the table below; I just need to select 2 of them at random.
| id | age | income | group |
|---|---|---|---|
| 111 | 12 | 37 | control |
| 222 | 11 | 36 | control |
| 333 | 8 | 33 | control |
| 555 | 11 | 34 | control |
For `id = 2`, the matched controls are listed in the table below; I just need to select 2 of them at random.
| id | age | income | group |
|---|---|---|---|
| 666 | 22 | 74 | control |
| 777 | 21 | 70 | control |
| 1000 | 18 | 70 | control |
For `id = 3`, there are no matched controls in `dat`.
For `id = 4`, the matched controls are listed in the table below; again, I just need to select 2 of them at random.
One thing to note is that the candidate controls for `id = 1` and `id = 4` overlap. I don't want two cases to share a control. If `id = 1` chooses `id = 111` and `id = 222` as controls, then `id = 4` can only choose `id = 555`; and if `id = 1` chooses `id = 111` and `id = 333` as controls, then `id = 4` can only choose `id = 222` and `id = 555`.
| id | age | income | group |
|---|---|---|---|
| 111 | 12 | 37 | control |
| 222 | 11 | 36 | control |
| 555 | 11 | 34 | control |
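
The no-sharing rule described above could be expressed as a simple greedy pass: go through the cases in order, sample up to 2 eligible controls that have not been used yet, and mark them as used. This is only a sketch under that assumption, not necessarily a good matching method: the result depends on the order of the cases, and a later case can end up with fewer than 2 controls (or none). The function name `match_controls_greedy` and its arguments are hypothetical.

```r
# Example data (same values as the `dat` example in this question)
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = rep(c("case", "control"), times = c(4, 10))
)

# Hypothetical helper: greedy random matching without sharing controls
match_controls_greedy <- function(dat, n_controls = 2, tol = 2) {
  cases    <- dat[dat$group == "case", ]
  controls <- dat[dat$group == "control", ]
  used     <- c()      # ids of controls already assigned to some case
  pairs    <- list()
  for (i in seq_len(nrow(cases))) {
    ok <- abs(controls$age - cases$age[i]) <= tol &
      abs(controls$income - cases$income[i]) <= tol &
      !(controls$id %in% used)
    pool <- controls$id[ok]
    # Sample positions, not values, so a pool of length 1 is handled safely
    picked <- pool[sample.int(length(pool), min(n_controls, length(pool)))]
    used <- c(used, picked)
    pairs[[i]] <- data.frame(
      case_id    = rep(cases$id[i], length(picked)),
      control_id = picked
    )
  }
  do.call(rbind, pairs)
}

set.seed(1)  # only to make the random draw repeatable
matched <- match_controls_greedy(dat)

# All cases plus the controls that were selected
result <- dat[dat$group == "case" | dat$id %in% matched$control_id, ]
result
```

`matched` keeps the case-to-control pairing, which the combined table alone does not show.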
The final output might look like this (the control ids are randomly selected from the ids that meet the conditions):
| id | age | income | group |
|---|---|---|---|
| 1 | 10 | 35 | case |
| 2 | 20 | 72 | case |
| 3 | 44 | 11 | case |
| 4 | 11 | 35 | case |
| 111 | 12 | 37 | control |
| 222 | 11 | 36 | control |
| 333 | 8 | 33 | control |
| 555 | 11 | 34 | control |
| 777 | 21 | 70 | control |
| 1000 | 18 | 70 | control |
NOTE
I've looked at several pages (see the references), but none of them meets my needs, and I don't know how to implement these requirements in R.
Any help will be highly appreciated!
References:
1. https://stackoverflow.com/questions/56026700/is-there-any-package-for-case-control-matching-individual-1n-matching-in-r-n
2. Case control matching in R (or spss), based on age, sex and ethnicity?
3. Matching case-controls in R using the ccoptimalmatch package