EUC-JP wrongly detected in this case that contains german umlaut #29
Comments
Yeah, this is really tricky. Encoding detection is not deterministic (for most cases) and relies on heuristic methods. This is why it will never be 100% reliable.
@aadsm here is the output:
AFAIK Visual Studio Code uses jschardet, and some of us users are experiencing the very same problem in VS Code: microsoft/vscode#4891
I was reporting this on behalf of VS Code.
This has been fixed in https://github.com/chardet/chardet at chardet/chardet@c0f1ab5, and in the original source at https://bugzilla.mozilla.org/show_bug.cgi?id=306272. It doesn't completely solve the problem reported in #29, because the fix just raises the limit for "sure detection" up to 3 frequent characters found. However, it does bring things to parity with the original chardet.
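To illustrate why raising that limit helps but is not a complete fix, here is a simplified sketch of a character-distribution confidence check. It is not the actual chardet or jschardet code: the function name, the parameter names, and the constant values (`SURE_YES`, `SURE_NO`, `typical_ratio`) are assumptions chosen for illustration.

```python
# Simplified, hypothetical model of a "frequent character" confidence check.
# Constant values below are assumptions, not taken from any library.
SURE_YES = 0.99
SURE_NO = 0.01


def distribution_confidence(total_chars, freq_chars,
                            min_freq_chars=0, typical_ratio=3.0):
    """Return a confidence that the text matches a given multi-byte charset.

    total_chars    -- multi-byte characters seen so far
    freq_chars     -- how many of those are "frequent" for the charset
    min_freq_chars -- no positive verdict unless freq_chars exceeds this
    """
    # Too little evidence: refuse to be confident.
    if total_chars <= 0 or freq_chars <= min_freq_chars:
        return SURE_NO
    if total_chars != freq_chars:
        ratio = freq_chars / ((total_chars - freq_chars) * typical_ratio)
        if ratio < SURE_YES:
            return ratio
    # Every character seen was a frequent one: treated as a sure match.
    return SURE_YES


# One stray two-byte sequence that happens to look like a frequent character:
before_fix = distribution_confidence(1, 1, min_freq_chars=0)
after_fix = distribution_confidence(1, 1, min_freq_chars=3)

# Four such sequences still produce a "sure" verdict, even with the limit.
still_sure = distribution_confidence(4, 4, min_freq_chars=3)

print(before_fix, after_fix, still_sure)
```

Under these assumptions, a single misread character no longer yields a "sure" result once the limit is in place, but a file with a handful of them can still be misdetected, which matches the caveat above.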
The following file detects as EUC-JP even though it is not. Seems to be caused by a single ü inside that file.
File: QuietLight.tmTheme.txt
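A possible explanation, sketched below with only the Python standard library: if the file is UTF-8 encoded (an assumption, the issue does not state the file's encoding), the two bytes of `ü` both fall into the range EUC-JP uses for two-byte characters, so they also form a valid EUC-JP sequence. The variable names are illustrative only.

```python
# Assumption: the file is UTF-8, so "ü" is stored as two bytes.
umlaut_utf8 = "ü".encode("utf-8")

# EUC-JP encodes JIS X 0208 characters as two bytes, each in 0xA1..0xFE.
in_euc_jp_range = all(0xA1 <= b <= 0xFE for b in umlaut_utf8)

# The same two bytes decode without error as EUC-JP, giving a single
# (different) character instead of "ü".
as_euc_jp = umlaut_utf8.decode("euc_jp")

print(umlaut_utf8.hex(), in_euc_jp_range, len(as_euc_jp))
```

Since the rest of such a file would be plain ASCII, which is identical in both encodings, this one sequence may be the only multi-byte evidence the detector sees.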