Feature: handle tracef's %c as unicode code point #411

Open
wants to merge 1 commit into
base: main


zetanumbers
Contributor

@zetanumbers zetanumbers commented Feb 27, 2022

Previously, tracef converted the %c argument to a UTF-16 char code, so large Unicode characters were truncated. With this change, %c accepts a full Unicode code point.
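
To illustrate the difference, here is a minimal Rust sketch that models both conversions. It does not reproduce the runtime's TypeScript code: the function names are hypothetical, and the 16-bit truncation is modelled with a cast, on the assumption that the old path kept only a single UTF-16 code unit.

```rust
/// Models the old behaviour (an assumption): keep only the low 16 bits,
/// as a single UTF-16 code unit would. Returns None for a lone surrogate.
fn char_from_utf16_unit(arg: u32) -> Option<char> {
    // `as u16` discards everything above bit 15.
    char::from_u32(u32::from(arg as u16))
}

/// Models the proposed behaviour: treat the argument as a full code point.
/// Returns None if the value is not a valid Unicode scalar value.
fn char_from_code_point(arg: u32) -> Option<char> {
    char::from_u32(arg)
}
```

With these definitions, `char_from_code_point(0x1F600)` yields `Some('😀')`, while `char_from_utf16_unit(0x1F600)` yields `Some('\u{F600}')`, an unrelated private-use character.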

@aduros
Owner

aduros commented Mar 1, 2022

We should probably match the same behavior as C's printf, which I think truncates to 8 bits for %c.

For me this program:

printf("Hello %c\n", 12345678);

Prints Hello N.
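
That output follows from the C rule that `%c` converts its `int` argument to `unsigned char`: 12345678 is 0xBC614E, and the low byte 0x4E is `N`. A small Rust model of that rule, with a hypothetical function name, and assuming bytes above 0x7F are read as Latin-1 (in C the result for those depends on the output encoding):

```rust
/// Mimics C's `%c`: the int argument is converted to unsigned char,
/// so only the low 8 bits survive.
fn printf_c(arg: i32) -> char {
    // `as u8` is a truncating cast; char::from maps 0..=255 to U+0000..=U+00FF.
    char::from(arg as u8)
}
```

For example, `printf_c(12345678)` returns `'N'`.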

@zetanumbers
Contributor Author

> We should probably match the same behavior as C's printf, which I think truncates to 8 bits for %c.

But why? It's not like we are trying to implement libc. With this PR we would be able to pass Rust's char, for example.

@zetanumbers
Contributor Author

Btw if we truncate, should we truncate to 7 bits for ASCII, or truncate to 8 bits and allow some UTF-16 char codes? Aren't non-ASCII characters for printf OS dependent?
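
The two options only differ for values whose low byte is in the range 0x80 to 0xFF. A rough Rust sketch of both masks, using hypothetical helper names and assuming the surviving value is interpreted as a single UTF-16 code unit (which for 0x80 to 0xFF coincides with Latin-1):

```rust
/// Keep 7 bits: every result is plain ASCII.
fn truncate_to_7_bits(arg: u32) -> char {
    char::from((arg & 0x7f) as u8)
}

/// Keep 8 bits: results 0x80..=0xFF land on U+0080..=U+00FF.
fn truncate_to_8_bits(arg: u32) -> char {
    char::from((arg & 0xff) as u8)
}
```

For the argument 0xE9, the 7-bit variant gives `'i'` (0x69) and the 8-bit variant gives `'é'` (U+00E9); for ASCII input both agree.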

@aduros
Owner

aduros commented Mar 2, 2022

Could we truncate to 8 bits? libc printf semantics aren't perfect, but at least they're well-defined and we don't need to document our own special handling of certain features.

For printing unicode characters, isn't it possible to use %s instead of %c? Or just format the string directly in Rust.

@zetanumbers
Contributor Author

zetanumbers commented Mar 4, 2022

> Could we truncate to 8 bits? libc printf semantics aren't perfect, but at least they're well-defined and we don't need to document our own special handling of certain features.

Until we truncate to 8 bits, and even after that, could we handle non-ASCII chars as Unicode code points instead of UTF-16 char codes?

@zetanumbers
Contributor Author

> For printing unicode characters, isn't it possible to use %s instead of %c? Or just format the string directly in Rust.

The current %s implementation only works on ASCII null-terminated strings.

https://github.com/aduros/wasm4/blob/main/runtimes/web/src/runtime.ts#L272
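
To show why an ASCII-only reader is a problem for non-ASCII text, here is a hedged Rust sketch. It is not the code at the link above: the helper names are hypothetical, and the byte-per-character reader is an assumption about what "only works on ASCII" means in practice. The second reader decodes the same bytes as UTF-8 for comparison.

```rust
/// Bytes of the NUL-terminated string that starts at `start`
/// (runs to the end of memory if no NUL is found).
fn c_string_bytes(memory: &[u8], start: usize) -> &[u8] {
    let tail = memory.get(start..).unwrap_or(&[]);
    let len = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());
    &tail[..len]
}

/// Assumed ASCII-style reader: one byte becomes one character.
fn read_bytewise(memory: &[u8], start: usize) -> String {
    c_string_bytes(memory, start).iter().map(|&b| char::from(b)).collect()
}

/// UTF-8 reader: multi-byte sequences are decoded as single characters.
fn read_utf8(memory: &[u8], start: usize) -> String {
    String::from_utf8_lossy(c_string_bytes(memory, start)).into_owned()
}
```

Given the bytes `b"caf\xC3\xA9\0"`, `read_utf8` returns `"café"` while `read_bytewise` returns `"cafÃ©"`, because the two UTF-8 bytes of `é` are treated as two separate characters.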

To emulate tracef manually in Rust, you would:

  1. Create an empty string;
  2. Write substrings, numbers, etc. to this string piece by piece; meanwhile the String grows (reallocates), gradually increasing its capacity;
  3. Flush the whole string onto a single line via traceUtf8;
  4. Deallocate the string.
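
The steps above can be sketched roughly as follows. This is an illustration, not the WASM-4 binding: `trace_line` is a hypothetical stand-in for the call that ends in traceUtf8, and it records lines in a `Vec` so the sketch runs off-target.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the runtime's trace call; stores the line
/// instead of printing it.
fn trace_line(sink: &mut Vec<String>, line: &str) {
    sink.push(line.to_owned());
}

fn trace_score(sink: &mut Vec<String>, name: &str, score: u32, symbol: char) {
    // 1. Create an empty string.
    let mut line = String::new();
    // 2. Append the pieces; the String reallocates as it grows.
    line.push_str("player ");
    line.push_str(name);
    // Writing into a String cannot fail, so the result is ignored.
    let _ = write!(line, " scored {} {}", score, symbol);
    // 3. Flush the whole string as a single line.
    trace_line(sink, &line);
    // 4. `line` is dropped (deallocated) at the end of this scope.
}
```

The formatting machinery and the allocator pulled in by `String` are what end up in the binary.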

This brings some runtime (~7 KiB with all code optimizations) into the binary. It could be smaller (only ~2 KiB) if there were a way to flush the line in parts, which would require no allocations.
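
A possible shape for such a part-by-part flush, sketched under loud assumptions: no such runtime call exists in this thread, so the `emit` callback below is purely hypothetical and stands in for whatever partial-trace function the runtime might expose. The point is only that `core::fmt::Write` lets formatted pieces be forwarded as they are produced, without building a `String` first.

```rust
use core::fmt::{self, Write};

/// Forwards every formatted fragment to `emit` (hypothetical partial flush)
/// instead of collecting the fragments in a heap-allocated buffer.
struct PartTracer<F: FnMut(&str)> {
    emit: F,
}

impl<F: FnMut(&str)> Write for PartTracer<F> {
    fn write_str(&mut self, part: &str) -> fmt::Result {
        (self.emit)(part);
        Ok(())
    }
}
```

A caller would construct `PartTracer { emit: ... }` and use `write!` on it as usual; each literal and each formatted argument reaches `emit` as a separate fragment.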
