I am trying to work out how to handle ambiguities in ANTLR.
I need to correctly parse identifiers that may optionally be preceded by a size prefix.
First I came up with this buggy grammar:
grammar PrefixProblem;
options
{
language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: LETTER+;
LETTER: 'A'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
I need B on its own to be parsed as an id, and B:B to be parsed as the id B with the size prefix B. It didn't work.
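A likely explanation, sketched in plain Java rather than with ANTLR itself: when two lexer rules match the same text, the rule listed first wins, so LETTER swallows every 'B' and the B token is never produced. The class and the tokenize method below are hypothetical stand-ins for the generated lexer, and the token name COLON is made up for the ':' literal; this is a minimal illustration under that assumption, not ANTLR's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstRuleWinsSketch {
    // Hypothetical stand-in for the generated lexer: rules are tried in
    // grammar order (LETTER before B) and the first matching rule wins.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                tokens.add("LETTER");   // 'B' ends up here because LETTER is listed first
            } else if (c == 'B') {
                tokens.add("B");        // never reached: LETTER already covers 'B'
            } else if (c == ':') {
                tokens.add("COLON");
            } else if (" \r\t\f\n".indexOf(c) >= 0) {
                // WSFULL goes to the hidden channel, so the parser never sees it
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("B:B")); // prints [LETTER, COLON, LETTER]
    }
}
```

With that token stream, size_prefix (which expects a B token) can never match, so the optional prefix is skipped and the parser is left with an unexpected ':' after the first LETTER.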
Then I found two solutions to this problem.
grammar PrefixSolution1;
options
{
language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: (LETTER | B)+;
LETTER: 'A' | 'C'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
In the grammar above, B was removed from the LETTER lexer rule and added as an alternative in the id parser rule.
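To make the effect of PrefixSolution1 concrete, here is a rough hand-written equivalent in Java. It is only a sketch: the class, tokenize, parseGoal and the COLON token name are hypothetical and do not come from ANTLR's generated code. The point is that B is now a distinct token, so the goal rule can look two tokens ahead to decide whether a prefix is present.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixSolution1Sketch {
    // Mirrors the lexer of PrefixSolution1: 'B' is its own token,
    // LETTER covers the remaining capitals, whitespace is dropped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == 'B') {
                tokens.add("B");
            } else if (c >= 'A' && c <= 'Z') {
                tokens.add("LETTER");
            } else if (c == ':') {
                tokens.add("COLON");
            } else if (" \r\t\f\n".indexOf(c) < 0) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // goal: (size_prefix ':')? id;  Returns true when a size prefix was consumed.
    static boolean parseGoal(List<String> tokens) {
        // Two tokens of lookahead: B followed by ':' means a prefix is present.
        boolean hasPrefix = tokens.size() >= 2
                && tokens.get(0).equals("B")
                && tokens.get(1).equals("COLON");
        int pos = hasPrefix ? 2 : 0;
        int idStart = pos;
        // id: (LETTER | B)+;
        while (pos < tokens.size()
                && (tokens.get(pos).equals("LETTER") || tokens.get(pos).equals("B"))) {
            pos++;
        }
        if (pos == idStart || pos != tokens.size()) {
            throw new IllegalArgumentException("input does not match goal");
        }
        return hasPrefix;
    }

    public static void main(String[] args) {
        System.out.println(parseGoal(tokenize("B")));     // prints false
        System.out.println(parseGoal(tokenize("B : B"))); // prints true
    }
}
```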
grammar PrefixSolution2;
options
{
language = Java;
}
goal: PREFIX_VAR;
PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID;
fragment ID: (LETTER)+;
fragment LETTER: 'A'..'Z' ;
fragment B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
Here I just moved the whole rule into the lexer, turning the smaller rules into fragments.
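Since PREFIX_VAR is now a single lexer rule, it behaves much like one regular expression over characters. The Java sketch below approximates it with java.util.regex; the class and method names are hypothetical, and a regex only approximates how the ANTLR lexer makes its decisions. It shows where the whitespace has to be spelled out by hand.

```java
import java.util.regex.Pattern;

final class PrefixSolution2Sketch {
    // Same character set as WSFULL: space, CR, tab, form feed, LF.
    static final String WS = "[ \\r\\t\\f\\n]*";

    // Approximates PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID
    static final Pattern PREFIX_VAR =
            Pattern.compile("(B" + WS + ":" + WS + ")?[A-Z]+");

    static boolean isPrefixVar(String text) {
        return PREFIX_VAR.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPrefixVar("B"));     // prints true
        System.out.println(isPrefixVar("B : B")); // prints true
        System.out.println(isPrefixVar("A B"));   // prints false
    }
}
```

Every place where whitespace may appear inside the token needs its own WS term, which is the cost of doing this in the lexer.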
The main drawback of PrefixSolution1 is that I need to split lexer rules into smaller pieces and then recombine them in parser rules.
The drawback of PrefixSolution2 is that I always have to account explicitly for whitespace characters that should simply be ignored.
It seems to me that both solutions will lead to a big mess when writing a grammar for a whole language. Is there any other solution? If not, which of the two approaches is better?
All source code is available here
