I have a large dataset of addresses that I plan to geocode in ArcGIS (Google geolocating is too expensive). Examples of the addresses are below.
9999 ST PAUL ST BSMT
GARRISON BL & BOARMAN AVENUE REAR
1234 MAIN STREET 123
1234 MAIN ST UNIT1
ArcGIS doesn't recognize addresses that end with units or other trailing words, so I want to remove those words so that the addresses look like the examples below.
9999 ST PAUL ST
GARRISON BL & BOARMAN AVENUE
1234 MAIN STREET
1234 MAIN ST
The key challenges include:
- ST is used both to abbreviate streets and to indicate "SAINT" in street names.
- Addresses end in many different indicators, such as STREET and AVENUE.
- There are intersections (indicated with &) that might include indicators like ST and AVENUE twice.
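To make the three constraints concrete, here is a minimal sketch of one pattern that might satisfy them on the sample addresses. It keeps everything up to the last whole-word street indicator and drops the rest. The `suffixes` vector is a hypothetical, incomplete list and would need to be extended for real data.

```r
# Sample addresses from the examples above
addr <- c("9999 ST PAUL ST BSMT",
          "GARRISON BL & BOARMAN AVENUE REAR",
          "1234 MAIN STREET 123",
          "1234 MAIN ST UNIT1")

# Hypothetical, incomplete list of street indicators (an assumption, not exhaustive)
suffixes <- c("STREET", "ST", "AVENUE", "AVE", "BL", "BLVD", "ROAD", "RD")

# Greedy (.*) backtracks from the end of the string, so the LAST whole-word
# indicator is the one kept; \\b stops "ST" from matching inside "STREET" or "BSMT"
pattern <- paste0("^(.*\\b(?:", paste(suffixes, collapse = "|"), ")\\b).*$")

cleaned <- sub(pattern, "\\1", addr, perl = TRUE)
cleaned
# [1] "9999 ST PAUL ST"              "GARRISON BL & BOARMAN AVENUE"
# [3] "1234 MAIN STREET"             "1234 MAIN ST"
```

Because the match is anchored on the last indicator, the earlier ST in "ST PAUL" and the first indicator in an intersection are left alone. Addresses that contain none of the listed indicators are returned unchanged, and a trailing word that happens to be an indicator itself would be kept, so the results would still need spot-checking.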
Using R, I'm attempting to apply the sub() function to solve the problem but I have not had success. Below is my latest attempt.
sub("(.*)ST","\\1",df$Address,perl=T)
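For reference, a small reproduction of what this attempt returns on the sample addresses (using a standalone `addr` vector in place of `df$Address`). The greedy `(.*)` runs to the last literal "ST" anywhere in the string, including inside "STREET", and only those two characters are removed; everything after them is kept.

```r
# Sample addresses from the examples above, standing in for df$Address
addr <- c("9999 ST PAUL ST BSMT",
          "GARRISON BL & BOARMAN AVENUE REAR",
          "1234 MAIN STREET 123",
          "1234 MAIN ST UNIT1")

# Same pattern as the attempt; perl = TRUE is spelled out because T can be reassigned
attempt <- sub("(.*)ST", "\\1", addr, perl = TRUE)
attempt
# [1] "9999 ST PAUL  BSMT"                "GARRISON BL & BOARMAN AVENUE REAR"
# [3] "1234 MAIN REET 123"                "1234 MAIN  UNIT1"
```

The trailing words survive because nothing in the pattern consumes the text after "ST", and the second address is untouched because it contains no "ST" at all.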
I know that many similar questions have been asked, but none addresses this problem directly, and I suspect it is relevant to other users.