
Add new syntax for negated condition lists. #200

Open
cshorler opened this issue Apr 8, 2018 · 9 comments



cshorler commented Apr 8, 2018

inclusive_condition.re.txt

inclusive_condition.c.txt

I've implemented a lexer using re2c that mirrors flex's use of a condition stack; it works well with a couple of workarounds. I'd like a bit of clarity on the default rule and direct goto shown below (I'm using this to implement inclusive conditions).

<STMT> ";" { --YYCURSOR; yy_pop_state(yyscanner); }
<STMT> * :=> P3

results in code like this:

p1cSTMT:
        yych = *YYCURSOR;
        switch (yych) {
        case ';':       goto p135;
        default:        goto p133;
        }
p133:
        ++YYCURSOR;
        YYSETCONDITION(yycP3);
        goto p1cP3;
p135:
        ++YYCURSOR;
#line 30 "inclusive_condition.re"
        { --YYCURSOR; yy_pop_state(yyscanner); }

In the case of the default action (goto p133), the cursor is advanced. In my case this means the ";" is eaten by the P3 state, and the p135 label never gets executed, because the P3 state also matches the ";" character.

My workaround is to customise the goto statement:
re2c:cond:goto = '--YYCURSOR; goto @@;' ;

If the above behaviour is the defined behaviour, then perhaps a warning should be issued that the rule <STMT> ";" is unreachable? Otherwise, is there a bug to fix?
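For readers unfamiliar with flex's condition stack, here is a minimal sketch of the push/pop/top operations that flex provides as yy_push_state, yy_pop_state and yy_top_state, written as a standalone C structure. The names cond_stack, cond_push, cond_pop, cond_top and the fixed depth are hypothetical; the lexer's actual implementation isn't shown in this thread. Unlike flex, which keeps the current start condition separately from the stack, this sketch treats the top entry as the current condition, which appears to be how this lexer uses yy_top_state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical condition identifiers; re2c generates a similar enum
 * (e.g. yycSTMT, yycP3) from the conditions used in the .re file. */
enum condition { COND_INITIAL, COND_STMT, COND_P3 };

#define COND_STACK_MAX 64

/* Minimal condition stack: the top entry is the current condition. */
struct cond_stack {
    enum condition items[COND_STACK_MAX];
    size_t depth;
};

static void cond_init(struct cond_stack *s, enum condition initial) {
    s->items[0] = initial;
    s->depth = 1;
}

/* Enter a nested condition (flex: yy_push_state). */
static void cond_push(struct cond_stack *s, enum condition c) {
    assert(s->depth < COND_STACK_MAX && "condition stack overflow");
    s->items[s->depth++] = c;
}

/* Leave the current condition and resume the enclosing one
 * (flex: yy_pop_state). The outermost condition is never popped. */
static void cond_pop(struct cond_stack *s) {
    assert(s->depth > 1 && "condition stack underflow");
    --s->depth;
}

/* The condition the lexer should dispatch on next. */
static enum condition cond_top(const struct cond_stack *s) {
    return s->items[s->depth - 1];
}
```

One design question with this approach is what YYSETCONDITION (which the generated :=> code calls, as in the output above) should do on a stack: replace the top entry or push a new one.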

skvadrik (Owner) commented Apr 8, 2018

Hi Chris, I think you should just use the empty string "" instead of the default rule *:

<STMT> "" :=> P3

This way the rule says exactly what you're trying to achieve: if none of the input could be matched, go to some other condition and resume from there. Note that you can use trailing context if you want to match some input without consuming it, as in http://re2c.org/examples/example_05.html at line 35:

<init> "" / [1-9]         :=> dec
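The difference between the two fallback rules comes down to where P3 starts reading. The sketch below is a hand-written model (not re2c output; the enum and function names are hypothetical) of the STMT condition on an input such as "x;z": with <STMT> * the default clause advances the cursor before :=> P3, as in the p133 block of the generated code; with the re2c:cond:goto = '--YYCURSOR; goto @@;' workaround the cursor steps back again; and with <STMT> "" nothing is consumed at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how condition STMT hands control to P3. */
enum stmt_fallback {
    FALLBACK_STAR,         /* <STMT> * :=> P3                            */
    FALLBACK_STAR_UNDO,    /* same, plus '--YYCURSOR;' in re2c:cond:goto */
    FALLBACK_EMPTY_STRING  /* <STMT> "" :=> P3                           */
};

/* Returns the offset at which P3 resumes lexing, or -1 if the
 * <STMT> ";" rule matched instead. */
static ptrdiff_t stmt_step(const char *input, enum stmt_fallback kind) {
    const char *cursor = input;
    assert(input != NULL);

    if (*cursor == ';')
        return -1;             /* <STMT> ";" wins: pop the condition */

    switch (kind) {
    case FALLBACK_STAR:
        ++cursor;              /* `*` matched one code unit */
        break;
    case FALLBACK_STAR_UNDO:
        ++cursor;              /* `*` matched one code unit ...           */
        --cursor;              /* ... and the customised goto undoes it */
        break;
    case FALLBACK_EMPTY_STRING:
        break;                 /* "" matches without consuming anything */
    }
    return cursor - input;
}
```

With FALLBACK_STAR on "x;z", P3 starts at offset 1, i.e. at the ';', which is why P3 rather than the <STMT> ";" action ends up handling it; the other two variants let P3 start at the 'x'.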

> If the above behaviour is the defined behaviour, then perhaps a warning should be issued that the rule <STMT> ";" is unreachable?

Unreachable rules are those that can never be triggered, on any input string. In your case, <STMT> ";" is reachable because it is triggered if you enter lex with yy_top_state(yyscanner) equal to yycSTMT and the character pointed to by YYCURSOR is ';'. Or did I misunderstand your idea?

cshorler (Author) commented Apr 8, 2018 (edited)
Regarding unreachable, I was thinking this: with YYCURSOR pointing to the x of the input "x;z", yych will be "x"; then the default clause under p133 above increments YYCURSOR to point at ";" and jumps to p1cP3. Then the ";" is consumed by <P3> SPECIAL. Of course, it would be unlikely for the token before ";" to be just a single character. In that case the beginning of the token is discarded one character at a time by the STMT default rule, because sacrificing the first character prevents a match in P3. So I end up looping in STMT until I reach the "x;z" situation, and then lose the character as outlined earlier. This was really why I asked whether it's defined behaviour: I was trying to envisage a use case where you'd want to discard a character at a time and always perform an unconditional goto.

I tried the empty string; I've reproduced re2c's output in the following comment (all warnings are enabled).

cshorler (Author) commented Apr 8, 2018

First attempt; I'm not sure whether this was a success or not (line 900 is a catch-all for bad input):

re2c: warning: line 556: rule in condition 'STMT' matches empty string [-Wmatch-empty-string]
re2c: warning: line 556: unreachable rule in condition 'STMT' (shadowed by rules at lines 555, 798, 900) [-Wunreachable-rules]
line 555:  <STMT> ";" { --YYCURSOR; yy_pop_state(yyscanner); continue; }
line 556:  <STMT> "" :=> P3
...
line 798:  <*> WS+ { continue; }
...
line 900:  <*> * { --YYCURSOR; yyerror("unrecognised input: '%.*s'\n", yylineno, 20, YYCURSOR); }

skvadrik (Owner) commented Apr 8, 2018

It's because of the rule <*> *: it is merged into all conditions and it matches a longer string than "", so the rule <STMT> "" is shadowed and becomes unreachable (the other rule, <*> WS+, is reported because some of the shadowing matches correspond to it). You'll need to specify the conditions for the default rule explicitly.
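Why <STMT> "" loses can be seen with a simplified model of rule selection in condition STMT after the <*> rules are merged in. This is a sketch, not re2c's implementation: each rule reports its match length (or -1 for no match), the longest match wins, and on a tie the rule that appears first wins. WS is assumed to be [ \t\n], since its real definition isn't shown, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns the match length at p, or -1 for no match. */
static int rule_semicolon(const char *p) { return *p == ';' ? 1 : -1; } /* line 555 */
static int rule_empty(const char *p)     { (void)p; return 0; }         /* line 556 */
static int rule_ws(const char *p) {                                     /* line 798 */
    int n = 0;                /* WS assumed to be [ \t\n] */
    while (p[n] == ' ' || p[n] == '\t' || p[n] == '\n')
        ++n;
    return n > 0 ? n : -1;
}
static int rule_any(const char *p)       { (void)p; return 1; }         /* line 900 */

typedef int (*rule_fn)(const char *);

static const rule_fn rules[]     = { rule_semicolon, rule_empty, rule_ws, rule_any };
static const int     rule_lines[] = { 555, 556, 798, 900 };

/* Longest match wins; on a tie the earlier rule is kept because only a
 * strictly longer match replaces it. Returns the winning rule's line. */
static int pick_rule(const char *p) {
    int best = -1, best_len = -1;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; ++i) {
        int len = rules[i](p);
        if (len > best_len) {
            best_len = len;
            best = (int)i;
        }
    }
    assert(best >= 0);        /* `*` always matches one code unit */
    return rule_lines[best];
}
```

Because `*` matches every code unit with length 1, the zero-length rule at line 556 can never be the longest match, which is what -Wunreachable-rules reports. Listing the default rule's conditions explicitly (leaving out STMT) removes that competitor from STMT.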

As for the -Wmatch-empty-string warning, re2c gives it because a nullable rule is sometimes unintentional. You can disable it with -Wno-match-empty-string, or just ignore it.

cshorler (Author) commented Apr 8, 2018

Okay, thanks. FYI, I updated my other comment because I had slightly misread your first response. I'll try the above now.

cshorler (Author) commented Apr 8, 2018

Confirmed that works, thanks!

It's unfortunate there's no syntax for exclusionary conditions. For example, there is the wildcard

    <*> *

and an explicit listing

    <P1,P3,COMMENT,SEARCH_SCOPE,ID_CUR_SCOPE,ENUM_ID_LIST,ENUM_REF,
         ENTITY_REF,DOT_ATTR,CONSTANT,LOCAL,INV_FOR,SUBTYPE_CONS> * ...

but it would be useful to be able to write

    <* - P2,STMT> * { }

(a kind of set-difference glob).
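The proposed <* - P2,STMT> is not existing re2c syntax; the sketch below only illustrates the set-difference semantics it would have, by expanding it into an explicit condition list with a bitmask. It assumes the lexer's full condition set is P2 and STMT plus the thirteen conditions in the explicit listing; all function names are hypothetical and this is not how re2c would necessarily implement the feature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed full set of conditions for this lexer; the real set may differ. */
static const char *const conditions[] = {
    "P1", "P2", "P3", "STMT", "COMMENT", "SEARCH_SCOPE", "ID_CUR_SCOPE",
    "ENUM_ID_LIST", "ENUM_REF", "ENTITY_REF", "DOT_ATTR", "CONSTANT",
    "LOCAL", "INV_FOR", "SUBTYPE_CONS",
};
enum { NCOND = sizeof conditions / sizeof conditions[0] };

/* Bit for a named condition, or 0 if the name is unknown. */
static unsigned long cond_bit(const char *name) {
    for (unsigned i = 0; i < NCOND; ++i)
        if (strcmp(conditions[i], name) == 0)
            return 1UL << i;
    return 0;
}

/* Expands the hypothetical "<* - A,B,...>": all conditions minus the
 * excluded ones. */
static unsigned long expand_difference(const char *const *excluded, size_t n) {
    unsigned long all = 0, minus = 0;
    assert(NCOND <= 32);                  /* unsigned long has >= 32 bits */
    for (unsigned i = 0; i < NCOND; ++i)
        all |= 1UL << i;
    for (size_t i = 0; i < n; ++i)
        minus |= cond_bit(excluded[i]);
    return all & ~minus;
}

/* Prints the set as an explicit re2c condition list, e.g. "<P1,P3>". */
static void print_condition_list(unsigned long set) {
    const char *sep = "";
    putchar('<');
    for (unsigned i = 0; i < NCOND; ++i) {
        if (set & (1UL << i)) {
            printf("%s%s", sep, conditions[i]);
            sep = ",";
        }
    }
    puts(">");
}
```

With { "P2", "STMT" } excluded, print_condition_list prints <P1,P3,COMMENT,SEARCH_SCOPE,ID_CUR_SCOPE,ENUM_ID_LIST,ENUM_REF,ENTITY_REF,DOT_ATTR,CONSTANT,LOCAL,INV_FOR,SUBTYPE_CONS>, the same list that currently has to be written by hand.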

skvadrik (Owner) commented Apr 8, 2018

> Regarding unreachable - I was thinking this: with YYCURSOR pointing to the x of the input "x;z". yych will be "x", then the default clause under p133 above increments YYCURSOR to point at ";" and goto/jump p1cP3. Then the ";" is consumed by <P3> SPECIAL

If you use <STMT> "" instead of <STMT> *, the default clause won't consume anything and P3 will still see x before ;, so the erroneous scenario above won't happen, right?

By the way, I just noticed that you use rules like this:

    <P3> 'END_' ('SCHEMA' | 'PART') { yy_pop_state(yyscanner); }

This is incorrect because you have a fallthrough at the end of the rule: re2c does not automatically generate any goto here (it only does so for the special shortcut rules :=>, not in the general case). You can confirm this by looking at the generated code. Change it to something like:

    <P3> 'END_' ('SCHEMA' | 'PART') { yy_pop_state(yyscanner); goto start; }

Or better, if you know the right condition, to something like:

    <P3> 'END_' ('SCHEMA' | 'PART') { yy_pop_state(yyscanner); goto p1cP3; }

Condition syntax is a bit deceptive: despite the way it looks, it's only semi-automated, and you still have to write most of the code yourself. This is because in each rule the programmer usually knows the next condition, and a direct goto to that condition is more efficient than a uniform goto start generated by re2c at the end of every rule, which would go the long way through the initial condition dispatch.
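To illustrate the cost difference described above, here is a hand-written C model (not re2c output; the enum, struct and label names are hypothetical) of a lexer with two conditions. Each action either jumps back to start, going through the condition dispatch every time, or jumps straight to the label of the condition it knows comes next, in the spirit of goto p1cP3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of re2c-style condition dispatch. */
enum cond { yycINIT, yycP3 };

struct lexer {
    const char *cursor;
    enum cond cond;
    int dispatches;   /* times control went through the dispatch switch */
};

/* Treats every character in P3 as one token; returns the token count. */
static int lex(struct lexer *lx, int direct_goto) {
    int tokens = 0;
    assert(lx->cursor != NULL);
start:
    ++lx->dispatches;
    switch (lx->cond) {           /* the initial condition dispatch */
    case yycINIT: goto cINIT;
    case yycP3:   goto cP3;
    }
cINIT:
    lx->cond = yycP3;             /* like "" :=> P3: no input consumed */
    if (direct_goto) goto cP3;
    goto start;
cP3:
    if (*lx->cursor == '\0')
        return tokens;
    ++lx->cursor;                 /* consume one token */
    ++tokens;
    if (direct_goto) goto cP3;    /* the next condition is already known */
    goto start;                   /* uniform: the long way round */
}
```

On the input "abc" both variants find 3 tokens, but the uniform version passes through the dispatch 5 times while the direct-goto version passes through it once. Forgetting the final goto (or continue/return in a loop-based driver) entirely would instead fall through into whatever code follows the action.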

skvadrik (Owner) commented Apr 8, 2018

> it would be useful to be able to do <* - P2,STMT> * { }

Yes, and not too hard to implement (the hardest bit is deciding on the syntax). I'll change the name of the issue and leave it open as a feature request for now.

skvadrik changed the title from ":=> condition (with a stack)" to "Add new syntax for negated condition lists." on Apr 8, 2018
cshorler (Author) commented Apr 8, 2018

Okay, that sounds great: a new feature, thanks!

And FYI, regarding the missing continue/return: I had in fact already spotted it. Now every rule ends with continue or return, depending on the lexical step (it's a 3-pass lexer), for example:

<P2> 'END_' ('SCHEMA'|'FUNCTION'|'PROCEDURE'|'RULE'|'ENTITY'|'TYPE') / BREAK {
    scope_pop(yyextra->scope_stack);
    continue;
}
