
Feature requiring: Adding source mapping information during macro expansion. #900

Open
ufo5260987423 opened this issue Jan 6, 2025 · 7 comments


@ufo5260987423

ufo5260987423 commented Jan 6, 2025

Hey, I don't know whether I'm describing this feature in Chez's terms, but scheme-langserver really wants source mapping information during expansion. By "source mapping information" I mean: after macro expansion, every element of the resulting expression can be traced back to the corresponding element of the original s-expression before expansion.

An example may help. In the following code:

;;a self-defined try-except macro here is used to handle possible exceptions
(try 
  ;;some works
  todo 
  ;;to catch exception c
  (except c 
  ;; a branch to handle
    [else c]))

Many programmers want to know that the c in the handling branch refers to the exception c bound after except. It's called "goto-definition" in LSP(Language Server Protocol). The definition of the try ... except macro is at the end of this post; answering this clearly requires establishing the correspondence between symbols before and after expansion.

Here are some more details about my requirements:

  1. I wrote a "fake" expander that skips the real expansion process entirely. For syntax-rules and syntax-case, it builds a chain from pattern to template to expansion purely by matching, and it is buggy and slow.
  2. Someone advised me to hack Chez's s/syntax.ss, or to look at racket7. So I wonder whether the Chez Scheme community could help.

The above is a request, not a demand. If the community could help, scheme-langserver and perhaps many other programmers would benefit. If not, we are still grateful for your work.

Happy new year.

try... except macro and macro expansion

The try ... except macro is usually defined as follows:

(define-syntax try
  (lambda (x)
    (syntax-case x (except)
      [(try body0 body1 ... (except condition clause0 clause1 ...))
       #`((call/1cc
            (lambda (escape)
              (with-exception-handler
                (lambda (c)
                  (let ([condition c])     ;; clauses may set! this
                    #,(let loop ([first #'clause0] [rest #'(clause1 ...)])
                        (if (null? rest)
                            (syntax-case first (else =>)
                              [(else h0 h1 ...) #'(escape (lambda () h0 h1 ...))]
                              [(tst) #'(let ([t tst]) (if t (escape (lambda () t)) (raise c)))]
                              [(tst => l) #'(let ([t tst]) (if t (escape (lambda () (l t))) (raise c)))]
                              [(tst h0 h1 ...) #'(if tst (escape (lambda () h0 h1 ...)) (raise c))])
                            (syntax-case first (=>)
                              [(tst) #`(let ([t tst]) (if t (escape (lambda () t)) #,(loop (car rest) (cdr rest))))]
                              [(tst => l) #`(let ([t tst]) (if t (escape (lambda () (l t))) #,(loop (car rest) (cdr rest))))]
                              [(tst h0 h1 ...) #`(if tst (escape (lambda () h0 h1 ...)) #,(loop (car rest) (cdr rest)))])))))
                (lambda ()
                  ;; cater for multiple return values
                  (call-with-values
                    (lambda () body0 body1 ...)
                    (lambda args
                      (escape (lambda ()
                                (apply values args))))))))))])))

And the expansion of the first code snippet usually looks like this:

((call/1cc
   (lambda (escape)
     (with-exception-handler
       (lambda (c) (let ([c c]) (escape (lambda () c))))
       (lambda ()
         (call-with-values
           (lambda () todo)
           (lambda args (escape (lambda () (apply values args))))))))))
@soegaard

soegaard commented Jan 6, 2025

Just a few questions to help interpret the question.

... but scheme-langserver really wants source mapping information during expansion.

Why do you need this information during expansion?
Part of the expansion process is to reveal binding information.

It's called "goto-definition" in LSP(Language Server Protocol).
Does this mean the overall goal is to implement 'goto-definition' for LSP?

If so, that seems a more approachable question.

To reveal binding information for a program written in Chez Scheme you need to run the expander.
The resulting syntax object does contain binding information, but you need to traverse it to
find "original identifiers".

If Chez Scheme could output a "source map" that would help you in implementing "goto-definition".

Note: If you want "goto-definition" to work while the program is "broken" (i.e. can't be expanded due to
the user editing it), then you need to keep track of older information - and how the program has changed.
This is somewhat tricky. In racket-mode a separate Racket instance (running in the background)
is used to expand program files and to keep cross references up to date.
Maybe you can take a similar approach?

https://github.com/greghendershott/racket-mode/

@ufo5260987423
Author

@soegaard

Scheme-langserver already has "goto-definition", but it doesn't work for user-defined macros that may introduce new identifiers during expansion. That is why I gave the example above.

but you need to traverse it to find "original identifiers"

As I understand it, your suggestion is to build a chain like macro callee -> pattern -> template -> expansion, where the last step involves complex comparisons and tree traversal. I call this "source mapping information", and I have actually implemented it.

I'm requesting this feature mainly because of efficiency. Nested macros can make the analysis take unpredictably long ("undecidable", if I'm using that word correctly), because "goto-definition" requires an abstract interpreter that evaluates all expansions and traces identifier declarations back from the expansion to the macro callee, interleaved with macro expansion.

I can't claim my code is bug-free. But I can say that getting source mapping information directly from the expander would help a lot.
Chez Scheme could generate "source mapping information" directly, without a separate traversal, since it inserts pattern variables directly into the expansion. This could be much more efficient than the approach we have discussed so far.

@soegaard

soegaard commented Jan 7, 2025

As I understand it, your suggestion is to build a chain like macro callee -> pattern -> template -> expansion.

No, I am suggesting that you feed the entire file (module) into the expander and get a fully expanded module back.
The identifiers in the fully expanded module contain both source location information and binding information.

@ufo5260987423
Author

I'm not sure. Do you mean:

> (expand '(try c (except c [else c])))
((#2%call/1cc
   (lambda (#{escape lge2hkfkseab2xecwg4knq562-5})
     (#2%with-exception-handler
       (lambda (#{c lge2hkfkseab2xecwg4knq562-6})
         (let ([#{c lge2hkfkseab2xecwg4knq562-7} #{c lge2hkfkseab2xecwg4knq562-6}])
           (#{escape lge2hkfkseab2xecwg4knq562-5}
             (lambda () #{c lge2hkfkseab2xecwg4knq562-7}))))
       (lambda ()
         (#2%call-with-values
           (lambda () c)
           (lambda #{args lge2hkfkseab2xecwg4knq562-8}
             (#{escape lge2hkfkseab2xecwg4knq562-5}
               (lambda ()
                 (#2%apply
                   #2%values
                   #{args lge2hkfkseab2xecwg4knq562-8}))))))))))

How could I know that the c after except is the one available to the branch?

@soegaard

soegaard commented Jan 7, 2025

Note: I know the specifics for Racket, but only the general workings of Chez Scheme,
so I hope others will give you some specific guidance.

How could I know that the c after except is the one available to the branch?

Your example was (with indices added):

(expand '(try c_0 (except c_1 [else c_2])))

Now you want to know something about c_2.
That is, c_2 must be located in the expansion.

However, in your example you are handing expand a raw s-expression
without any source location information. This means you can't
distinguish between c_0, c_1 and c_2.

So the first task is to add source locations to the input.

Use annotations to add source locations:

https://www.scheme.com/csug8/syntax.html#g103
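A minimal sketch of that first step, assuming the Chez Scheme 9.x reader API (`make-source-file-descriptor`, `get-datum/annotations`, `annotation-expression`, `annotation-source`, `source-object-bfp`, `source-object-efp`): read the program text with annotations instead of quoting it, and each `c` comes back as its own annotated object with a character span. The file name `example.ss` and the helpers `read-annotated` and `symbol-spans` are hypothetical, for illustration only.

```scheme
;; Sketch for Chez Scheme 9.x: read source text with annotations so that every
;; occurrence of `c` carries its own character span.
;; "example.ss" is a hypothetical file name used only to build the descriptor.
(define src "(try c (except c [else c]))")

(define (read-annotated text)
  (let ([sfd (make-source-file-descriptor "example.ss"
               (open-bytevector-input-port (string->utf8 text)))]
        [ip (open-string-input-port text)])
    ;; get-datum/annotations returns the annotated datum and the end position.
    (let-values ([(x end) (get-datum/annotations ip sfd 0)])
      x)))

;; Collect (name bfp efp) for every annotated symbol, in source order.
(define (symbol-spans x)
  (cond
    [(annotation? x)
     (let ([e (annotation-expression x)]
           [s (annotation-source x)])
       (if (symbol? e)
           (list (list e (source-object-bfp s) (source-object-efp s)))
           (symbol-spans e)))]
    [(pair? x) (append (symbol-spans (car x)) (symbol-spans (cdr x)))]
    [else '()]))

;; Keep only the occurrences of `c`: c_0, c_1 and c_2 now differ by position.
(define c-spans
  (filter (lambda (entry) (eq? (car entry) 'c))
          (symbol-spans (read-annotated src))))
```

The spans tell c_0, c_1 and c_2 apart by position (starting at characters 5, 15 and 23 of `src`); linking them to bindings is the separate second step.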

@ufo5260987423
Author

No, you don't understand what I mean.

(expand '(try c_0 (except c_1 [else c_2])))

I want to know that c_1 declares an identifier and c_2 references it. And I can only know this by recognizing that #{c lge2hkfkseab2xecwg4knq562-7} refers to c_1.

In the process above, annotations don't help.

@soegaard

soegaard commented Jan 8, 2025

No, you don't understand what I mean.
That's indeed possible.

(expand '(try c_0 (except c_1 [else c_2])))

I want to know that c_1 declares an identifier and c_2 references it. And I can only know this by recognizing that #{c lge2hkfkseab2xecwg4knq562-7} refers to c_1.

I see that as step 2.

In the source code both c_1 and c_2 are just "c". So the first step must be to find out what
identifiers in the fully expanded module correspond to the names in the source program.
This is solved by adding annotations.

In step 2 you want to examine the binding information in the fully expanded module.
To do this, you can use free-identifier=? and bound-identifier=?.
Whether these operations are enough, I am not sure.
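To illustrate what free-identifier=? answers, here is a small sketch (the macro names `same-binding?` and `introduce-c` are hypothetical): two identifiers compare equal only when they refer to the same binding, so a macro-introduced `c` and a user-written `c` stay apart even though both are spelled "c". Note that these predicates work on syntax objects while a transformer runs; the datum printed by `expand` in the REPL session above contains plain gensyms instead.

```scheme
;; Hypothetical helper macro: expands to #t when its two identifier arguments
;; refer to the same binding (free-identifier=?), and to #f otherwise.
(define-syntax same-binding?
  (lambda (x)
    (syntax-case x ()
      [(_ a b) (if (free-identifier=? #'a #'b) #'#t #'#f)])))

;; Hypothetical macro that introduces its own `c`, much like the `c` bound by
;; the handler lambda in the try macro's template.
(define-syntax introduce-c
  (syntax-rules ()
    [(_ user-c) (let ([c 2]) (same-binding? c user-c))]))

;; Both arguments are the user's `c`, bound by the same let.
(define same (let ([c 1]) (same-binding? c c)))

;; The macro-introduced `c` and the user's `c` are different bindings (hygiene).
(define different (let ([c 1]) (introduce-c c)))
```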

In the worst case, you can make a recursive descent of the fully expanded module
to recover just the information you need. Note that a fully expanded module
consists of a limited set of forms, and that is what makes a recursive descent possible
in the first place. But before going this route it might be worth looking at the
source of racket-mode to see how the problem was solved there.
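A rough sketch of such a recursive descent, assuming the output of Chez's `expand` looks like the REPL session above: each local variable is a distinct gensym, core forms appear as `lambda`, `let` and `quote`, and primitives as `#2%name` (which reads as a `$primitive` list). The helper `binding-table` is hypothetical and handles only those forms; a real tool would need at least `case-lambda`, `letrec` and `letrec*` as well. Because every binder is a unique gensym, matching a reference to its binder is a plain `eq?` lookup.

```scheme
;; Hypothetical helper: walk the datum returned by Chez's `expand` and return an
;; alist mapping each locally bound variable to its number of references.
;; Assumes every binder is a unique gensym, so eq? lookup is enough;
;; only lambda, let and quote are treated specially.
(define (binding-table expr)
  (let ([table '()])                      ; list of (variable . reference-count)
    (define (bind! v) (set! table (cons (cons v 0) table)))
    (define (formals->list f)             ; handles (a b), (a . rest) and rest
      (cond [(symbol? f) (list f)]
            [(pair? f) (cons (car f) (formals->list (cdr f)))]
            [else '()]))
    (define (walk e)
      (cond
        [(symbol? e)                      ; a reference: bump its binder's count
         (let ([entry (assq e table)])
           (when entry (set-cdr! entry (+ (cdr entry) 1))))]
        [(not (pair? e)) (void)]          ; constants
        [(eq? (car e) 'quote) (void)]     ; quoted data holds no references
        [(eq? (car e) 'lambda)
         (for-each bind! (formals->list (cadr e)))
         (for-each walk (cddr e))]
        [(eq? (car e) 'let)               ; inits are walked before binding
         (for-each (lambda (b) (walk (cadr b))) (cadr e))
         (for-each (lambda (b) (bind! (car b))) (cadr e))
         (for-each walk (cddr e))]
        [else (for-each walk e)]))        ; applications, #2%prim forms, etc.
    (walk expr)
    (reverse table)))
```

Applied to the expansion above, this should report `#{c lge2hkfkseab2xecwg4knq562-7}` (the variable bound by the `let` that the `condition` pattern variable produced) as referenced once, inside the else branch's thunk. Tying each gensym back to c_1 or c_2 in the source still depends on the source information from step 1.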
