I have a data frame users with two columns, id and country:
id country
1 France
2 United States
3 France
I want to add a new column salary which depends on the average salary for a given country.
My first thought was to create a named lookup vector mapping each country to its salary, like this:
salary_country <- c(
  "France" = 45000,
  "United States" = 50000,
  ...)
Then I create the column like this (using dplyr):
tbl_df(users) %>%
  mutate(salary = ifelse(country %in% names(salary_country),
                         salary_country[country],
                         0))
It works like a charm: if the country is not in my salary_country vector, the salary is 0; otherwise it is the salary given for that country.
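To make the lookup behaviour concrete, here is a minimal base-R sketch of the same rule on the sample data. The "Germany" row is a hypothetical addition (it is not in my real data) included only to exercise the fallback, and I am assuming country is stored as character rather than factor. Indexing a named vector by a name it does not contain yields NA, which is why the ifelse guard is there.

```r
# Sample data from above, plus a hypothetical "Germany" row that is absent
# from the lookup vector. stringsAsFactors = FALSE keeps country as
# character, so the vector is indexed by name and not by factor code.
users <- data.frame(
  id = 1:4,
  country = c("France", "United States", "France", "Germany"),
  stringsAsFactors = FALSE
)

# Only the two entries shown above; the real vector has more.
salary_country <- c("France" = 45000, "United States" = 50000)

# Indexing a named vector by a missing name gives NA for that element.
raw_lookup <- unname(salary_country[users$country])

# Same rule as the mutate() call: 0 when the country is unknown.
users$salary <- ifelse(users$country %in% names(salary_country), raw_lookup, 0)
```

Here the three known rows get their country's salary and the hypothetical "Germany" row falls back to 0.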
However, it is quite slow on a very large data frame, and quite verbose.
Is there a better way to accomplish this?