Fix Raw-Text Nodes: Support Alternative Syntaxes #40

Closed
8 tasks done
tajmone opened this issue Aug 21, 2022 · 11 comments
Labels
💎 raw-text PML Nodes » raw-text ⭐ PML syntax Topic: PML syntax definition for ST4

Comments

@tajmone
Owner

tajmone commented Aug 21, 2022

As it turned out (see Discussion #39), all PML raw-text block nodes accept multiple syntaxes, which are not documented in the PML Reference Guide (only in the PDML documentation).

So I'll have to fix all currently implemented raw-text nodes to ensure they correctly support all three variations:

As for the remaining raw-text nodes, I'll implement them with all three variations right away when I add them to the syntax:

  • [input
  • [output

References

@tajmone tajmone added ⭐ PML syntax Topic: PML syntax definition for ST4 💎 raw-text PML Nodes » raw-text labels Aug 21, 2022
@tajmone tajmone pinned this issue Aug 21, 2022
@tajmone
Owner Author

tajmone commented Dec 5, 2022

Other Nodes too!

Remember that the four nodes mentioned above (i.e. the "raw nodes") are not the only ones that need to support these alternative syntaxes; as mentioned in #39:

they apply to all nodes with is_raw_text_block=true

So this also includes the [table_data node, and possibly others. E.g. from the documentation:

[table_data (halign="C,L,R")
    ~~~
    Position, Product, Price
    -
    1, Organic food, 12.50
    2, Meditation lessons, 150.00
    -
    ,,Total: 162.50
    ~~~
]

So, right now I have no idea exactly how many other nodes support these three alternative syntaxes, beyond those explicitly mentioned in #39, but I'll only need to check the JSON tags file to find out. By the time I've finished with those four nodes, newer raw nodes will probably have been introduced in the syntax.
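
As a rough illustration of that check, here is a minimal Python sketch that collects every entry flagged as a raw-text block. Only the `is_raw_text_block` flag comes from the discussion in #39; the file layout, the `name` key, and the sample data below are assumptions made up for the example, so the real tags file may well need a different traversal.

```python
import json
from typing import Any, Iterator


def iter_raw_text_nodes(data: Any) -> Iterator[dict]:
    """Yield every JSON object whose is_raw_text_block flag is true.

    The walk is recursive, so it does not depend on where the node
    entries sit inside the file.
    """
    if isinstance(data, dict):
        if data.get("is_raw_text_block") is True:
            yield data
        for value in data.values():
            yield from iter_raw_text_nodes(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_raw_text_nodes(item)


# Hypothetical excerpt: the real tags file layout may differ.
SAMPLE = """
{"tags": [
  {"name": "code", "is_raw_text_block": true},
  {"name": "p", "is_raw_text_block": false},
  {"name": "table_data", "is_raw_text_block": true}
]}
"""

raw_names = sorted(
    node.get("name", "?") for node in iter_raw_text_nodes(json.loads(SAMPLE))
)
print(raw_names)  # ['code', 'table_data']
```

Walking the whole structure instead of indexing a fixed path keeps the sketch usable even if the entries are nested differently than assumed here.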

Top-Priority Fix!

This is going to be a serious setback for the syntax development, because unless all three syntaxes are correctly supported and handled, the editor will fail whenever it encounters them, breaking highlighting for the rest of the document.

I've already experienced this when working on the source files of the official PML documentation, which freely use all different syntaxes, and the results are disastrous: you simply can't work on the document, all editing functionality breaks.

So even though "in theory" only the "official variation" should be used, in practice it doesn't work like that since we encounter all variations in the official docs, as well as their examples.

Correctly implementing all these nodes should take precedence over implementing other, more common nodes right now, because unimplemented "simpler" nodes don't break the document: they are just highlighted as "unknown" nodes, which doesn't affect editing functionality.

On the other hand, until these alternative syntaxes are supported there will always be the (quite real) risk that a valid PML document will cause Sublime PML to break its internal state, rendering it useless (especially when working with the official PML docs).

@pdml-lang

I'll have to fix all currently implemented raw-text nodes to ensure they correctly support all three variation:

Please note that the Text Block Syntax has been deprecated and will be removed in an upcoming major version.

Hence, you only have to support the two remaining syntaxes: Delimited Text Syntax and Standard Text Syntax.

I have no idea of how many other nodes exactly support these three alternative syntaxes, beyond those explicitly mentioned in #39, but I'll only need to check the JSON tags file to find out

Yes, the JSON tags file is and will always be a reliable way to find out which PML nodes are of type raw_text. There are currently no plans to add new nodes of type raw_text, but in the future such nodes will very probably be added.

the source files of the official PML documentation, which freely use all different syntaxes

The Text Block Syntax will no longer be used in the next major version of the PML documentation.

@tajmone
Owner Author

tajmone commented Dec 7, 2022

Please note that the Text Block Syntax has been deprecated and will be removed in an upcoming major version.

Thanks for pointing it out! It had slipped by me, so it's good to know.

Also, it's good to have fewer variations in terms of editor support, because it somewhat reduces the implementation complexity.

NOTE: The currently implemented raw nodes are all in the Text Block Syntax variation.

Hence, you only have to support the two remaining syntaxes: Delimited Text Syntax and Standard Text Syntax.

Any estimate on when the next MAJOR version would be?

If we're talking a couple of months then it makes sense to skip the deprecated syntax, but if it's longer than that it might be worth implementing/keeping it until the next MAJOR because of the document-wrecking consequences of its absence, but only if it's not too much work to keep them all!

It's hard to estimate how complex their branching implementation is going to be until I get to work on them. The devil is always in the details in these cases. If the documentation is correct (i.e. if there are no parsing edge cases that contradict it) then the main branching point is between the Standard Text Syntax and the other two, since the former differs from the latter two in its opening line:

When this syntax variation is used, the text must start on the same line as the node, just after the node's name.

The Text Block Syntax is/was then identified by the absence of the fence delimiters, but other than that it is/was a close variation of the Delimited Text Syntax.

In practical terms, when it comes to editor syntaxes with one-line RE-based definitions, the three syntaxes branched out as:

    ___ Standard Text
___/    ___ Delimited Text
   \___/
       \___ Text Block

whereas by dropping the Text Block Syntax we'll now be left with a single branching point:

    ___ Standard Text
___/
   \___ Delimited Text

I think this can be done without backtracking, since the branching criterion is whether the opening tag is immediately followed by contents or a new line, which means that the branching decision can be resolved within a single source line. If this is the case, then most TextMate-like editors should be able to support this feature, not just ST4 (unless there are complications involved, e.g. due to edge cases, attributes, etc.).

@pml-lang
Collaborator

pml-lang commented Dec 7, 2022

Any estimate on when the next MAJOR version would be?
If we're talking a couple of months then it makes sense skipping the deprecated syntax

Today I posted the list of planned breaking changes in the next major version (and sent you an email too).
After agreeing on these changes I will start implementing version 4.0.0 (and fix all currently known bugs).
Right now I'm busy with a lot of end-of-year tasks (not related to PML), but version 4.0.0 should be available in February.

the branching criteria is whether the opening tag is immediately followed by contents or a new line

The branching criterion implemented in the PDML parser is as follows: If the node name is followed by (optional) spaces and/or tabs, followed by a new line character, then the 'Delimited Text Syntax' is used, otherwise it's the 'Standard Text Syntax'.

@tajmone
Owner Author

tajmone commented Dec 7, 2022

If the node name is followed by (optional) spaces and/or tabs, followed by a new line character

That's what I had in mind with "edge cases" here. But what about attributes? Don't some of these nodes also support attribute groups? If so, these need to be added to the equation too.

I think that the best approach to handle these is a lookahead RegEx, just to ensure that a single RE is able to discern between the two possible branches at once. Once you know exactly how to branch, the rest can be handled in a fairly straightforward way.

But, as always, things are easier said than done, because of the pervasive nature of some nodes, e.g. comments and constants, which can basically occur anywhere since they are handled by the preprocessor. So it's important that the branching lookahead RE doesn't mismatch due to a comment or a constant.

Pre-processor nodes are tricky to handle because it's hard to pinpoint and foresee all their possible occurrences. During my side tests, I noticed that there's a wide margin in their real use, leaving us with 100% valid PML sources that are tricky to highlight correctly by the editor. Surely, in most cases end users won't end up resorting to such exotic uses of comments and constants, but it's still a possibility within the realm of valid documents.

@pml-lang
Collaborator

what about attributes, don't some of these nodes also support attributes groups?

Yes. I should have said: If the node name (and optional attributes) is followed by (optional) spaces and/or tabs, followed by a new line character
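
To make the criterion concrete, here is a small Python sketch of a single lookahead-based regular expression that resolves the branch from the opening line alone. It is only an illustration written with Python's `re` module, not the Sublime PML syntax definition or the PDML parser's code. The node list is taken from the nodes named in this thread and is not exhaustive, and the sketch assumes that the attributes form one parenthesised group on the same line with no nested parentheses. Comments and constants handled by the preprocessor are not considered at all.

```python
import re
from typing import Optional, Tuple

# Assumption: raw-text node names mentioned in this thread (not exhaustive).
RAW_NODES = ("code", "html", "input", "output", "table_data")

OPENING = re.compile(
    r"\[(?P<name>" + "|".join(RAW_NODES) + r")\b"
    # Assumption: optional attributes are one "(...)" group on the same line.
    r"(?:[ \t]*\((?P<attrs>[^()\n]*)\))?"
    # Zero-width lookaheads: decide the branch without consuming any text.
    r"(?:(?P<delimited>(?=[ \t]*(?:\r?\n|\Z)))"  # only blanks, then end of line
    r"|(?P<standard>(?=[ \t]*\S)))"              # content on the same line
)


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (node name, syntax) for a raw-text opening line, else None."""
    match = OPENING.match(line)
    if match is None:
        return None
    syntax = "delimited" if match.group("delimited") is not None else "standard"
    return match.group("name"), syntax


print(classify("[code\n"))                         # ('code', 'delimited')
print(classify('[table_data (halign="C,L,R")\n'))  # ('table_data', 'delimited')
print(classify("[code print(1)]"))                 # ('code', 'standard')
print(classify("[p some text]"))                   # None
```

Because both alternatives are lookaheads, the match ends right after the node name (or its attributes), so whatever handles the chosen branch still sees the rest of the line untouched.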

@pml-lang
Collaborator

the Text Block Syntax has been deprecated and will be removed in an upcoming major version.

Done in version 4.0.0.

@tajmone
Owner Author

tajmone commented Feb 24, 2023

Almost Done!

@pml-lang, I've almost finished implementing both Standard and Delimited Text Syntax for the raw-nodes currently supported by Sublime PML (i.e. only [code and [html at the moment, since first I wanted to come up with a strategy/pattern to handle them, which I will now be able to replicate for the remaining nodes).

Support for Delimited Text Syntax has already been committed to master, so you should receive the automatic update next time you launch Sublime Text.

As for the Standard Text Syntax, I've completed its implementation for the [code node (on dev branch), and once I've finished working on the [html node I'll merge the changes to master — just wanted some room for last-minute changes, since I can rewrite history on dev but not on master (the latter would break the package for end users).
Right now I'm too tired to continue, so I'll finish the work in the next couple of days; this would be the final step to migrate Sublime PML to PML 4 (i.e. in terms of its currently supported features).

Anyhow, just wanted to let you know that handling both syntaxes turned out to be easier than expected, and didn't require context branching (so I assume it should be doable on any TextMate-like editor too).

@pml-lang
Collaborator

GREAT!

you should receive the automatic update next time you launch Sublime Text

Yes, I just tried it out and the [code block is now correctly highlighted.

@tajmone
Owner Author

tajmone commented Feb 24, 2023

Yes, I just tried it out and the [code block is now correctly highlighted.

Unfortunately, until I merge the fixes for the Standard Text Syntax, all non-fenced raw blocks will fail.

I was hoping to finish it tonight, but it's too late — the problem isn't tweaking the syntax, really, but updating the syntax tests, which need to cover all possible combinations, in order to catch any bugs before merging (if the syntax fix takes an hour, updating the tests takes two or three hours 😢).

@tajmone
Owner Author

tajmone commented Feb 24, 2023

Done!

Now Sublime PML correctly supports alternative Text Syntaxes in raw-text nodes [code and [html. Changes have been merged to master and will now be available via packages auto-update.

I've had a chance to further polish and optimize the contexts to handle the dual syntax support, which I now only need to replicate on the remaining raw-nodes which are missing in Sublime PML (i.e. [input and [output) — if I find the time, I'll work on them during the weekend.

@tajmone tajmone closed this as completed Feb 24, 2023
tajmone added a commit that referenced this issue Mar 3, 2023
Implement `[input` and `[output` raw-text
block nodes (see #40).