Regex/grep strings containing US currency

Question

I have a list of strings, some of which contain dollar figures. For example:

'$34232 foo    \n  bar'

Is there an R command that returns only the strings that contain dollar amounts?

Thank you!

Already answered here, more or less: http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex — Richard A., Jan 04 '13 at 15:11
by the way, if you are thinking of your example as "a list of strings" (it's not; it's a length-1 character vector) you may want to use `strsplit(z,"[[:space:]]+")[[1]]` to convert it to a character vector. — Ben Bolker, Jan 04 '13 at 16:23
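To illustrate the comment above, here is a minimal sketch of splitting the question's example on whitespace and then filtering the pieces. The variable names `z`, `tokens` and `dollars` are only for illustration.

```r
# The example string from the question: a length-1 character vector
z <- '$34232 foo    \n  bar'

# Split on runs of whitespace (spaces, tabs, newlines) and take the first
# (and only) list element to get a plain character vector
tokens <- strsplit(z, "[[:space:]]+")[[1]]

# Keep only the tokens that contain a dollar sign followed by digits
dollars <- grep("\\$[0-9]+", tokens, value = TRUE)
```

After the split, `tokens` has one element per whitespace-separated piece, so `grep` can pick out individual pieces rather than the whole original string.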

score 4 · Accepted Answer · edited May 23 '17 at 11:50

Use \\$ to protect the $ which otherwise means "end of string":

   grep("\\$[0-9]+",c("123","$567","abc $57","$abc"),value=TRUE)

This will select strings that contain a dollar sign followed by one or more digits (but not e.g. $abc). grep with value=FALSE returns the indices. grepl returns a logical vector. One R-specific point is that you need to specify \\$, not just \$ (i.e. an additional backslash is required for protection): \$ will give you an "unrecognized escape" error.
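As a rough sketch of the three variants mentioned above, run on the same test vector (the variable names are only for illustration; the bracket form `[$]` is an alternative I am adding here that avoids backslashes altogether):

```r
x <- c("123", "$567", "abc $57", "$abc")

# value = TRUE returns the matching elements themselves
hits <- grep("\\$[0-9]+", x, value = TRUE)

# The default (value = FALSE) returns the positions of the matches
idx <- grep("\\$[0-9]+", x)

# grepl returns one TRUE/FALSE per element
flags <- grepl("\\$[0-9]+", x)

# Inside a bracket expression, $ is a literal, so no escaping is needed
hits2 <- grep("[$][0-9]+", x, value = TRUE)
```

`flags` is convenient for subsetting, e.g. `x[flags]`, which should give the same result as `hits`.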

@Cerbrus's answer, '\\$[0-9,.]+', will match slightly more broadly (e.g. it will match $456.89 or $367,245,100). It will also match some implausible currency strings, e.g. $45.13.89 or $467.43,2,1 (commas should be allowed only between groups of 3 digits in the dollars segment, and there should be only one decimal point, separating dollars from cents). Both of our answers will (incorrectly?) match $45abc. If you're lucky, your data don't contain any of these tricky possibilities. Getting this right in general is hard; the answer referred to in the comments (What is "The Best" U.S. Currency RegEx?) tries to do this, and as a result has significantly more complex answers, but they could be useful if you adapt them to R by protecting $ appropriately.
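For what it's worth, here is one possible stricter pattern, compared against the loose one. This is my own sketch, not taken from the linked question, and it assumes the amount is delimited by whitespace or the ends of the string, so something like "$57." at the end of a sentence would be rejected.

```r
x <- c("$456.89", "$367,245,100", "$45.13.89",
       "$467.43,2,1", "$45abc", "abc $57")

# The loose pattern discussed above: $ followed by digits, commas or dots
loose <- "\\$[0-9,.]+"

# A stricter sketch (an assumption about what counts as "valid"):
strict <- paste0(
  "(^|[[:space:]])",                      # start of string or whitespace
  "\\$([0-9]{1,3}(,[0-9]{3})+|[0-9]+)",   # comma-grouped or plain digits
  "(\\.[0-9]{2})?",                       # optional cents: dot + 2 digits
  "($|[[:space:]])"                       # end of string or whitespace
)

loose_hits  <- grepl(loose, x)
strict_hits <- grepl(strict, x)
```

The loose pattern accepts all six strings, while the stricter one should keep only "$456.89", "$367,245,100" and "abc $57". Whether the extra strictness is worth the complexity depends on how messy your data are.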

If all you are doing is `grep`, i.e. looking for a match, then you don't need the `+` because if there is 1 number after the `$` then there is at least one. Leaving out the `+` could possibly speed things up (though the speedup may be too small to care about unless these are very long vectors). The `+` is important for substitutions or extracting the number. — Greg Snow, Jan 04 '13 at 16:09
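A small sketch of the point in this comment: for pure detection the `+` makes no difference, but for extraction it does. I use `regmatches` with `regexpr` here as one way to pull out the first match per string.

```r
x <- c("123", "$567", "abc $57", "$abc")

# Detection: the two patterns flag exactly the same elements
same_detection <- identical(grepl("\\$[0-9]", x), grepl("\\$[0-9]+", x))

# Extraction: with + the whole run of digits is captured ...
with_plus <- regmatches(x, regexpr("\\$[0-9]+", x))

# ... without it, only the first digit after the $ is captured
without_plus <- regmatches(x, regexpr("\\$[0-9]", x))
```

Note that `regmatches` drops the elements with no match, so the results here have length 2 rather than 4.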

score 3 · Answer 2 · Cerbrus · 2013-01-04T15:47:17.260

Sure there is:

'\\$[0-9,.]+'

\\$ // Dollar sign (escaped)
[0-9,.]+ // One or more digits, dots, or commas

edited Jan 04 '13 at 15:47

answered Jan 04 '13 at 15:15

Cerbrus


In R, one would have to use double `\\`. – Roman Luštrik Jan 04 '13 at 15:38
