I'm attempting to join two tables, one is a smaller table with a column of names of common food items (e.g. "Corn", "Peppers", "Squash"...etc...), and the other is a larger table with specific food names (e.g. "Sweet Corn", "Red Corn", "Baby Corn", "Zucchini Squash", "Orange Squash", "Squash , Large"...etc...). I'm trying to join based on a "fuzzy" condition in which I specify to join on the food names and pull the most frequent code in another column of the larger table (the mode) into a new column in the smaller table.
Here is an example of the smaller table:
| Food Name | Food Code |
|---|---|
| Corn | NA |
| Squash | NA |
| Peppers | NA |
Here is an example of the larger table:
| Food Name | Food Code |
|---|---|
| Sweet Corn | 532 |
| Red Corn | 532 |
| Baby Corns | 944 |
| Squash | 111 |
| Long Squash | 123 |
| Red Pepper | 654 |
| Green Pepper | 655 |
| Red Peppers | 654 |
I've tried the "left_join" function from the dplyr package, but this doesn't seem to work that well with the "fuzzy" string join feature. I know that the tidyverse also has a function to find the mode of grouped variables and I was hoping to use that function, but I am unsure how to incorporate that into the left_join statement. I also discovered the fuzzyjoin package in R, but I am not certain if this is the best option.
My desired output would look like:
| Food Name | Food Code |
|---|---|
| Corn | 532 |
| Squash | 111 |
| Peppers | 654 |