I am new to R and struggling with grouping my dataset. This is an example of the data:
| sample | profile |
|---|---|
| 1 | A |
| 2 | A,B |
| 3 | A,B |
| 4 | A,C |
| 5 | C |
| 6 | A,C |
I am trying to label the samples so that identical profiles get the same group number:
| sample | profile | profile group/cluster |
|---|---|---|
| 1 | A | 1 |
| 2 | A,B | 2 |
| 3 | A,B | 2 |
| 4 | A,C | 3 |
| 5 | C | 4 |
| 6 | A,C | 3 |
Here, the two samples with profile A,B share group 2, and the two samples with profile A,C share group 3.
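I have tried the following, using the tidyverse and janitor packages:

    library(tidyverse)  # attaches dplyr and stringr, among others
    library(janitor)    # provides get_dupes()

    # `database` is my data frame with the columns sample and profile

    # List the rows whose profile occurs more than once
    dupes <- get_dupes(database, profile)
    dupes

    # Add a column n with the number of samples sharing each profile
    ll_by_outcome <- database %>%
      group_by(profile) %>%
      add_count() %>%
      as.data.frame()
    ll_by_outcome
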
But these only find and count duplicated profiles; they do not assign a group number to each distinct profile. I am not sure how to approach this. Any help is appreciated!
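
For reference, the labelling described above could be sketched as follows. This is only one possible approach, not a definitive solution: the `database` data frame is rebuilt here from the example table as a stand-in for the real data, and the column name `profile_group` is a made-up name for illustration. It numbers each distinct profile in order of first appearance using base R's `match()` and `unique()` inside `dplyr::mutate()`.

```r
library(dplyr)

# Stand-in for the real `database` (assumption: rebuilt from the example table)
database <- data.frame(
  sample  = 1:6,
  profile = c("A", "A,B", "A,B", "A,C", "C", "A,C")
)

# unique(profile) lists each distinct profile once, in order of first appearance;
# match() then returns the position of every row's profile in that list,
# so rows with identical profiles receive the same integer id
grouped <- database %>%
  mutate(profile_group = match(profile, unique(profile)))

grouped
```

With the example data this should reproduce the group numbers in the desired table. An alternative that may also work is `group_by(profile) %>% mutate(profile_group = cur_group_id())`, although that numbers the groups in sorted order of `profile` rather than in order of first appearance.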