String length expressed as byte or character count for bencode #92
Hey, that's an interesting problem that I haven't run into personally, even though I've worked on a widely deployed bencode implementation (on a completely unrelated project). Do you know of a way to reliably determine whether a file should be read with the variant interpretation or not? Also, do you have any examples of bencode implementations that support this (even in other languages)?
Hello @jzelinskie. As a practical example of software using the interpretation I was referring to, see https://github.com/Zimbra/zm-mailbox/blob/develop/common/src/java/com/zimbra/common/util/BEncoding.java for the serialization functions used in the Java code of the Zimbra Communication Suite, i.e. the source of my annoyance ;D
Following up on this: my problem turned out to be an implementation that expresses string length as the count of UTF-16 code units used to represent the string. That is pretty far removed from the standard interpretation, yet it exists.
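To make the three interpretations in this thread concrete, here is a minimal Python sketch that computes each candidate length for a few sample strings. The function name `length_measures` is hypothetical and not part of faq or Zimbra. The emoji shows where UTF-16 code units and code points diverge: characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair). For what it's worth, Java's `String.length()` also counts UTF-16 code units, which may be how a Java implementation ends up with this variant.

```python
def length_measures(s: str) -> dict:
    """Return the three candidate bencode string lengths for s (hypothetical helper)."""
    return {
        # Byte count of the UTF-8 encoding: the standard, byte-oriented reading.
        "utf8_bytes": len(s.encode("utf-8")),
        # Python's len() on a str counts Unicode code points.
        "code_points": len(s),
        # UTF-16 code units are 2 bytes each; "utf-16-le" avoids the BOM that "utf-16" would add.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


if __name__ == "__main__":
    # "\u00e0" is precomposed "à"; "\U0001F600" is an emoji outside the BMP.
    for sample in ["a", "\u00e0", "\U0001F600"]:
        print(repr(sample), length_measures(sample))
```

For "à" this gives 2 bytes but 1 code point and 1 UTF-16 unit; for the emoji it gives 4 bytes, 1 code point and 2 UTF-16 units, so the code-point and UTF-16 variants are not interchangeable either.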
Hello.
First of all, thanks a lot for the tool.
I am, however, encountering problems when dealing with data encoded with bencode. It's a problem I've come across time and again, and hopefully one you can address.
From what I've seen, you interpret the string length as the number of bytes in the string's encoding, which should be fine.
I suspect that because the original specification of the format (if we can call it that) was less than crystal clear about what string length means, many implementations interpret the string length as the character count, which in Unicode terms is the number of code points in the string.
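On the reading side, the two interpretations differ only in how many bytes follow the `<length>:` prefix. Below is a minimal Python sketch, assuming UTF-8 payloads, of reading one bencoded string under either interpretation. `decode_string`, `_utf8_width` and the `mode` parameter are hypothetical names for illustration, not faq's API.

```python
def _utf8_width(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02x}")


def decode_string(data: bytes, pos: int, mode: str = "bytes"):
    """Decode one bencoded string starting at data[pos].

    mode="bytes": the length prefix counts bytes (standard reading).
    mode="code_points": the length prefix counts UTF-8 encoded code points (variant).
    Returns (raw_payload_bytes, position_after_string).
    Continuation bytes are not validated in this sketch.
    """
    colon = data.index(b":", pos)
    n = int(data[pos:colon])  # int() accepts ASCII digits given as bytes
    start = end = colon + 1
    if mode == "bytes":
        end = start + n
    elif mode == "code_points":
        for _ in range(n):
            if end >= len(data):
                raise ValueError("string truncated")
            end += _utf8_width(data[end])
    else:
        raise ValueError(f"unknown mode {mode!r}")
    if end > len(data):
        raise ValueError("string truncated")
    return data[start:end], end
```

For ASCII-only data both modes consume exactly the same bytes, so a mismatch between the two readings can only surface once non-ASCII text appears.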
Could you create a variant of the bencode format supported by faq that matches the variant interpretation of string length described above? It would make my life a lot easier when dealing with these sorts of systems.
Just as a reference, faq would encode (arguably correctly) the JSON
{ "a": "à" }
as the bencode d1:a2:àe, while the variant format would encode it as d1:a1:àe, assuming UTF-8 encoded strings.
Thanks in advance.
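The two encodings of { "a": "à" } from the issue can be reproduced with a minimal Python sketch of a bencode encoder whose string-length rule is pluggable. The names `bencode`, `utf8_length`, `code_point_length` and `utf16_length` are hypothetical, this is not how faq is necessarily structured, and the payload is always written as UTF-8; only the number in front of the colon changes.

```python
def utf8_length(s: str) -> int:
    return len(s.encode("utf-8"))  # standard: byte count of the UTF-8 payload


def code_point_length(s: str) -> int:
    return len(s)  # variant: number of Unicode code points


def utf16_length(s: str) -> int:
    return len(s.encode("utf-16-le")) // 2  # variant: number of UTF-16 code units


def bencode(value, count=utf8_length) -> bytes:
    """Encode str/int/list/dict as bencode, using count() for string length prefixes."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        return str(count(value)).encode("ascii") + b":" + value.encode("utf-8")
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v, count) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencode dictionaries are sorted by raw key bytes.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-8"))
        return b"d" + b"".join(bencode(k, count) + bencode(v, count) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


if __name__ == "__main__":
    doc = {"a": "\u00e0"}  # the JSON { "a": "à" } from the issue
    print(bencode(doc))                           # b'd1:a2:\xc3\xa0e'
    print(bencode(doc, count=code_point_length))  # b'd1:a1:\xc3\xa0e'
```

Passing `utf16_length` instead would cover the UTF-16 variant mentioned later in the thread.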