Optimal ambiguity resolving

Question

I am trying to work out how to handle ambiguities in ANTLR. I need to correctly parse identifiers, optionally preceded by a size prefix. I first came up with this buggy grammar:

grammar PrefixProblem;
options       
{   
    language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: LETTER+;
LETTER: 'A'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

I need B on its own to be parsed as an id, and B:B to be parsed as the id B with the size prefix B. This grammar didn't work.

Then I found two solutions to this problem.

grammar PrefixSolution1;
options       
{   
    language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: (LETTER | B)+;
LETTER: 'A' | 'C'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

In the grammar above, B was excluded from the LETTER lexer rule and added back as an alternative in the id parser rule.

grammar PrefixSolution2;
options       
{   
    language = Java;
}
goal: PREFIX_VAR;
PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID;
fragment ID: (LETTER)+;
fragment LETTER: 'A'..'Z' ;
fragment B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

Here I just moved the whole rule into the lexer.

PrefixSolution1 has the main drawback that I need to split lexer rules into smaller chunks and then concatenate them again in the parser.

PrefixSolution2 has the drawback that I always need to account explicitly for whitespace characters that should otherwise be ignored.

In my opinion, both solutions will lead to a big mess when writing a grammar for a whole language. Is there any other solution? If not, which approach is best?

All source code is available here

OK, so you need to accept something like "B" and something like "B:B", and in the AST "B" should be identified as an ID, while "B:B" should be identified as an ID with a prefix? Did I get that right? – sm13294, Apr 26 '12 at 11:35

score 1 · Accepted Answer · edited May 23 '17 at 12:01

I wouldn't go with either of them. I'd simply create ID tokens, and neither B tokens nor PREFIX_VAR tokens: the prefix handling belongs in the parser.
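To illustrate why a separate B token gets in the way, here is a rough Java sketch of a first-match lexer. It is not ANTLR's generated code; the class and method names are made up for illustration. It assumes that when two lexer rules match the same input, the rule listed first wins, which, as far as I know, is how ANTLR 3 resolves the LETTER/B overlap in the original grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lexer of the buggy grammar (not generated by ANTLR).
class FirstMatchLexer {

    // Rules are tried in declaration order: LETTER: 'A'..'Z' comes before B: 'B'.
    static String classify(char c) {
        if (c >= 'A' && c <= 'Z') return "LETTER"; // also matches 'B'
        if (c == 'B') return "B";                  // never reached: LETTER already matched
        if (c == ':') return "':'";
        throw new IllegalArgumentException("unexpected character: " + c);
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) continue; // whitespace goes to the hidden channel
            tokens.add(classify(c));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [LETTER, ':', LETTER]: no B token is ever produced,
        // so a parser rule such as size_prefix: B can never match.
        System.out.println(tokenize("B:B"));
    }
}
```

With only one kind of identifier token, this overlap disappears and the decision about what counts as a prefix moves to the parser.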

You can match a capital B (capB) using a disambiguating semantic predicate¹ in a parser rule like this:

grammar Test;

goal
 : (prefixVar | ID)+ EOF
 ;

prefixVar
 : capB ':' ID 
 ;

capB
 : {input.LT(1).getText().equals("B")}? ID
 ;

ID : LETTER+;
WS : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

fragment LETTER: 'A'..'Z' ;

which would parse the input B:B B B:C into the following parse tree:

[Image: parse tree for the input B:B B B:C]

¹ What is a 'semantic predicate' in ANTLR?
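For readers who want to see the idea without running ANTLR, the following is a minimal hand-written Java sketch of what the grammar above does. It is an approximation under stated assumptions, not the parser ANTLR generates: the class name, method names and the string form of the result are invented for illustration. The lexer produces only ID and ':' tokens, and the parser checks the token text (the role of the predicate in capB) when an ID is followed by ':'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of grammar Test (illustration only).
class PredicateSketch {

    // ID : LETTER+ ; plus the literal ':' ; whitespace is skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == ':') { tokens.add(":"); i++; continue; }
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            int start = i;
            while (i < input.length() && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') i++;
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    // goal : (prefixVar | ID)+ EOF
    static List<String> parse(String input) {
        List<String> toks = tokenize(input);
        if (toks.isEmpty()) throw new IllegalStateException("expected at least one ID");
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < toks.size()) {
            String t = toks.get(pos);
            if (t.equals(":")) throw new IllegalStateException("expected ID at token " + pos);
            boolean colonNext = pos + 1 < toks.size() && toks.get(pos + 1).equals(":");
            if (colonNext) {
                // prefixVar : capB ':' ID ; capB is an ID whose text equals "B"
                if (!t.equals("B")) throw new IllegalStateException("prefix must be B, got " + t);
                if (pos + 2 >= toks.size() || toks.get(pos + 2).equals(":")) {
                    throw new IllegalStateException("expected ID after ':'");
                }
                result.add("prefixVar(" + t + ":" + toks.get(pos + 2) + ")");
                pos += 3;
            } else {
                result.add("ID(" + t + ")");
                pos++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [prefixVar(B:B), ID(B), prefixVar(B:C)]
        System.out.println(parse("B:B B B:C"));
    }
}
```

The point of the design is that the lexer stays simple and whitespace-agnostic, while the meaning of "B before a colon" is decided from context in the parser.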

score 0 · Answer 2 · answered Apr 26 '12 at 11:42

Try this one:

grammar PrefixProblem;


options       
{   
language = Java;
}

 goal: (size_prefix ':')? (id|B);

size_prefix: B;

id: LETTER+;

LETTER: 'A'|'C'..'Z' ;

B: 'B';

WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

sm13294


Actually, it has the same drawback as PrefixSolution1: it requires splitting up the lexer rules, which I want to avoid – Overdose Apr 26 '12 at 11:45
Oh yes, sorry, I just realized that I came to the same solution as the 2nd one. I will try something else now – sm13294 Apr 26 '12 at 11:47
