I have a large data frame, main_df, with a company_name column and several other variables. Some of the company names are misspelled, contain typos, or otherwise need to be changed. Therefore, I am creating a vector of unique names using:
unique_names <- unique(levels(as.factor(main_df$company_name)))
This gives me a vector that looks something like this in the viewer window opened with View(unique_names):
V1:
Cosmonize Bulgaria Inc.
Crown One Foundation
Institut f�r Luft-und Raumfahrttechnik
Suppose, for instance, that Crown One Foundation changed its name to Crown Two Foundation. In this case, I would hard code the change in main_df for all instances:
main_df$company_name[which(main_df$company_name == "Crown One Foundation")] <- "Crown Two Foundation"
This approach has worked well for all entries except the ones that show a replacement character, like Institut f�r Luft-und Raumfahrttechnik.
I've tried copying the entry from the view window:
main_df$company_name[which(main_df$company_name == "Institut f�r Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
I've also tried extracting the relevant element with unique_names[100] and using the printed result:
main_df$company_name[which(main_df$company_name == "Institut f\xfcr Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
Neither approach worked. When I refresh unique_names <- unique(levels(as.factor(main_df$company_name))) nothing changes. Interestingly, when I search for Institute in the search window of the view window, the one in question does not appear.
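One way to check what such an entry really contains might be to look at its bytes instead of its printed form. Below is a minimal sketch, assuming the stored bytes are Latin-1; the three-byte toy string x is a made-up stand-in for the real entry, not taken from my data.

```r
# Toy stand-in for the problematic entry: "f", the Latin-1 byte 0xFC ("ü"), "r"
x <- rawToChar(as.raw(c(0x66, 0xfc, 0x72)))

is_valid <- validUTF8(x)  # checks the bytes, independent of the declared encoding
bytes    <- charToRaw(x)  # the raw bytes actually stored in the string

# Re-interpret the same bytes as Latin-1 and convert them to real UTF-8
y <- iconv(x, from = "latin1", to = "UTF-8")
bytes_utf8 <- charToRaw(y)  # "ü" should now be the two-byte sequence c3 bc
```

If validUTF8() returns FALSE for an entry, the replacement character in the viewer is probably just how an invalid byte gets displayed, which would also explain why text copied from the viewer never matches.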
Another idea I had was to work with Encoding(). I used Encoding(unique_names[100]) to find that the entry is marked as UTF-8. Using Encoding(unique_names[100]) <- 'latin1' changed the entry in the view window to Institut für Luft-und Raumfahrttechnik.
However, upon refreshing the unique entries using unique_names <- unique(levels(as.factor(main_df$company_name))), the entry is not updated.
Even then,
main_df$company_name[which(main_df$company_name == "Institut für Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
doesn't lead to a change either (removing the umlaut here).
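To make the steps above reproducible, here is a minimal sketch with toy data. The two-row data frame and the short string "für" are stand-ins I made up, and I am assuming the stored bytes are Latin-1 that were declared as UTF-8. It suggests that re-declaring the encoding on unique_names only changes that copy, so rebuilding unique_names from main_df brings the old entry back, whereas converting the affected elements of the column itself lets an ordinary comparison match.

```r
# Toy stand-in for main_df (hypothetical data, not the real set).
# The second name holds the Latin-1 byte 0xFC ("ü") but is declared as UTF-8.
bad <- rawToChar(as.raw(c(0x66, 0xfc, 0x72)))  # bytes of "für" in Latin-1
Encoding(bad) <- "UTF-8"
main_df <- data.frame(company_name = c("Crown One Foundation", bad),
                      stringsAsFactors = FALSE)

# Re-declaring the encoding on the vector of unique names changes only that copy
unique_names <- unique(main_df$company_name)
Encoding(unique_names[2]) <- "latin1"
enc_copy   <- Encoding(unique_names)          # the copy is now marked latin1
enc_column <- Encoding(main_df$company_name)  # the column keeps its old mark

# Convert only the invalid entries in the column itself, then match normally
idx <- !validUTF8(main_df$company_name)
main_df$company_name[idx] <- iconv(main_df$company_name[idx],
                                   from = "latin1", to = "UTF-8")
main_df$company_name[main_df$company_name == "f\u00fcr"] <- "fur"
```

Only the entries flagged by validUTF8() are touched here, so the rest of the column keeps whatever encoding it already had.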
Am I looking at this the wrong way? I know there is a lot of hard coding, and I've changed all entries except the ones with the replacement character. Therefore, I don't want to change the encoding of the entire vector but rather change these few dozen entries manually.
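For the manual changes themselves, a named lookup vector might be less repetitive than one assignment per name. This is only a sketch: the toy data frame and the fixes vector are illustrative assumptions, with the Crown One Foundation rename from above as the single entry.

```r
# Toy stand-in for main_df (hypothetical rows)
main_df <- data.frame(
  company_name = c("Crown One Foundation", "Aalto University",
                   "Crown One Foundation"),
  stringsAsFactors = FALSE
)

# Old name -> new name; further corrections would be added here
fixes <- c("Crown One Foundation" = "Crown Two Foundation")

# Replace only the rows whose current name appears in the lookup
hit <- main_df$company_name %in% names(fixes)
main_df$company_name[hit] <- unname(fixes[main_df$company_name[hit]])
```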
Thanks a lot in advance. I don't have a package preference and would appreciate any help.
Edit: Upon request, here is part of the output of dput(unique_names):
c("Aalborg University", "Aalto University", "Aarhus University", "ACDVE", "Aero LLC", "AgilitySpaceCorp", "Air Force Research Laboratory (AFRL)", "Airbus")
Here is dput(head(main_df$company_name)):
c("Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University")