Is my idea to just concentrate on parsing function and ignoring everything else (i.e. dumping them back) inherently flawed?
It's not inherently flawed. In fact, it is a common approach [Note 1]. However, your solution requires more work.
First, you need to make your lexer more robust. It should correctly identify comments and string literals. Otherwise, you risk matching false positives: apparent tokens hidden in such literals. Ideally you would also identify regexes but that is a lot more complicated since it requires cooperation from the parser, and a simplistic parser such as the one which you propose does not have enough context to distinguish division operators from the start of a regular expression. [Note 2]
You also need to recognize identifiers; otherwise, an identifier which happened to contain the characters function (such as compare_function) would also be a false match.
The problem arises because any cannot contain a FUNCTION token. So if your scanner produces a stray FUNCTION token, the parse will fail.
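As a rough illustration of the scanner changes described above, here is a minimal hand-written tokenizer in plain JavaScript, not Jison lexer syntax. The token names (COMMENT, STRING, IDENT, WHITESPACE) and the tokenize function are assumptions made for this sketch. It deliberately ignores regular expression literals and template strings, and its identifier rule is ASCII-only.

```javascript
// Hypothetical token rules, tried in order. Comments and strings come first
// so that anything inside them is never examined for keywords.
const RULES = [
  ["COMMENT",    /\/\/[^\n]*|\/\*[\s\S]*?\*\//y],
  ["STRING",     /"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'/y],
  ["IDENT",      /[A-Za-z_$][A-Za-z0-9_$]*/y],
  ["WHITESPACE", /\s+/y],
  ["BRACKET",    /[(){}]/y],
  ["ANYTHING",   /[\s\S]/y], // fallback: any other single character
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    for (const [rule, re] of RULES) {
      re.lastIndex = pos;            // sticky flag: match only at pos
      const m = re.exec(source);
      if (m === null) continue;
      const text = m[0];
      let type = rule;
      // Only a whole identifier equal to "function" becomes FUNCTION.
      if (rule === "IDENT" && text === "function") type = "FUNCTION";
      // Brackets use their own character as the token type, like '(' in a grammar.
      if (rule === "BRACKET") type = text;
      tokens.push({ type, text });
      pos += text.length;
      break;
    }
  }
  return tokens;
}
```

With this, compare_function is a single IDENT token, and the word function inside a string or comment never reaches the parser as a FUNCTION token.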
Also, remember that parentheses and braces are not ANYTHING tokens. Since a program will typically have many parentheses and braces which are not part of a function literal, you will need to add these to your any rules. Note that you don't want to add them as single tokens; rather, you need to add parenthesis-balanced sequences ('(' any ')', for example). Otherwise, you will have a shift-reduce conflict on the '}'. (function(){ var a = { };...: how does the parser know that the } does not close the function body?)
It will probably prove simpler to have two non-terminals, something like this [Note 3]:
    any
        : /* empty */        { $$ = ""; }
        | any any_object     { $$ = $1 + $2; }
        ;

    any_object
        : ANYTHING
        | fun
        | '(' any ')'        { $$ = $1 + $2 + $3; }
        | '{' any '}'        { $$ = $1 + $2 + $3; }
        ;
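To see what the two non-terminals do, it may help to read them as a pair of mutually recursive functions. The following is only a sketch in plain JavaScript, not what Jison generates: the names parseAny, parseAnyObject and toTokens are made up for the example, and the fun alternative is left out so that only the bracket balancing is shown.

```javascript
const CLOSER = new Map([["(", ")"], ["{", "}"]]);

// any: a possibly empty sequence of any_object, ending at a closer or end of input.
function parseAny(tokens, start) {
  let text = "";
  let i = start;
  while (i < tokens.length && tokens[i].type !== ")" && tokens[i].type !== "}") {
    const [piece, next] = parseAnyObject(tokens, i);
    text += piece;
    i = next;
  }
  return [text, i];
}

// any_object: a single opaque token, or a balanced '(' any ')' / '{' any '}'.
function parseAnyObject(tokens, i) {
  const open = tokens[i];
  const close = CLOSER.get(open.type);
  if (close === undefined) return [open.text, i + 1];   // ANYTHING
  const [inner, j] = parseAny(tokens, i + 1);
  if (j >= tokens.length || tokens[j].type !== close) {
    throw new SyntaxError("unbalanced " + open.text);
  }
  return [open.text + inner + tokens[j].text, j + 1];
}

// Toy tokenizer for the demo only: every character is its own token.
function toTokens(text) {
  return Array.from(text, ch => ({ type: "(){}".includes(ch) ? ch : "ANYTHING", text: ch }));
}
```

Because a '}' is only ever consumed by the call which consumed the matching '{', the brace which closes an object literal inside a function body cannot be mistaken for the end of the body.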
The other issue is that your scanner skips whitespace, so your parser never sees it. That means it won't be present in the semantic values, so your transformation will strip it. That will break any program which depends on automatic semicolon insertion, as well as certain other constructs (return 42; for example: return42; is quite different). You will probably want to recognize whitespace as a separate token, and add it both to your any rules (or the any_object rule above) and as an optional element in your fun rule between function and ( and between ) and {. (Since whitespace will be included in any, you must not add it beside an any non-terminal; that could cause a reduce-reduce conflict.)
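A quick way to convince yourself of the whitespace problem is to split a statement into tokens with and without a whitespace rule and then glue the pieces back together, which is effectively what the semantic actions do. The two regular expressions here are simplified stand-ins for a scanner, chosen only for this illustration.

```javascript
const source = "return 42;";

// Scanner that skips whitespace: identifiers, or any single non-space character.
const skipping = source.match(/[A-Za-z_$][\w$]*|\S/g);

// Scanner that keeps whitespace as a token of its own.
const keeping = source.match(/\s+|[A-Za-z_$][\w$]*|\S/g);

console.log(skipping.join("")); // return42;
console.log(keeping.join(""));  // return 42;
```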
Speaking of automatic semicolon insertion, you would be well-advised not to rely on it in your transformed program. You should put an explicit semicolon after the inserted console.log(...) statement.
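Here is a small sketch of why the explicit semicolon matters. The helper names makeLogCall and runs are invented for the example, and the function body is an assumed worst case: one which starts with a parenthesis on the next line, so no semicolon is inserted automatically.

```javascript
// Hypothetical helper: builds the statement to insert at the top of a function body.
function makeLogCall(name, withSemicolon) {
  return "console.log(" + JSON.stringify("enter " + name) + ")" +
         (withSemicolon ? ";" : "");
}

// A body whose first statement starts with '(' (an immediately invoked function).
const body = "\n(function () { return 1; })();";

const unsafe = makeLogCall("f", false) + body; // console.log(...)(function ...)()
const safe = makeLogCall("f", true) + body;    // console.log(...); (function ...)()

// Runs the code as a function body and reports whether it completed.
function runs(code) {
  try {
    new Function(code)();
    return true;
  } catch (e) {
    return false;
  }
}

console.log(runs(unsafe)); // false: the result of console.log gets called
console.log(runs(safe));   // true
```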
Notes
1. As Ira Baxter points out in a comment, this approach is generally called an "island parser", from the idea that you are trying to find "islands" in an ocean of otherwise uninteresting text. A useful paper which I believe popularized this term is Leon Moonen's 2001 contribution to WCRE, "Generating robust parsers using island grammars". (Google will find you full-text PDFs.) Google will also find you other information about this paradigm, including Ira Baxter's own more pessimistic answer here on SO.
2. This is probably the most serious objection to the basic idea. If you don't want to address it, you'll need to place the following restrictions on regular expressions in the programs you want to transform:
   - parentheses and braces must be balanced
   - the regular expression cannot contain the string function.

   The second restriction is relatively simple to satisfy, since you could replace function with the entirely equivalent [f]unction. The first one is more troublesome; you would need to replace /\(/ with something like /\x28/.
3. In your proposed grammar, there is an error because of a confusion about what any represents. The third production for any should not be a duplicate of the fun production; instead, it should allow fun to be added to an any sequence. (Perhaps you just left out any from that production. But even so, there is no need to repeat the fun production when you can just use the non-terminal.)
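If you go with the regular expression workarounds from Note 2, a short check like the following should show that the rewritten patterns match the same text while no longer containing the characters which confuse a simplistic scanner. The variable names are arbitrary.

```javascript
// Same matches, but the source of the second pattern does not contain "function".
const original = /function/;
const rewritten = /[f]unction/;

// Same matches, but the source of the second pattern has no unbalanced "(".
const paren = /\(/;
const parenHex = /\x28/;

const samples = ["a function here", "f(x)", "nothing special"];
for (const s of samples) {
  console.log(original.test(s) === rewritten.test(s), paren.test(s) === parenHex.test(s));
}

console.log(rewritten.source.includes("function")); // false
console.log(parenHex.source.includes("("));         // false
```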