Differences with JS version #3
Comments
Thanks! This code was lagging behind the JS version; I have updated it, thanks for spotting that. Just out of curiosity, in what context do you use this code? ICU now has good collation rules for Tibetan (see blog post), so it might be better to use those directly.
I edit a French-Tibetan dictionary (LuaLaTeX-PDF) for a linguist, and I run your Python script between LaTeX compilations to order the entries in the index. Incidentally, I'm having a few problems with the following letters: བྷ and བྷེ... In fact, I found your script just after reading this blog post. I had seen the ICU rules (the XML file), but I have to admit that at the moment I'm not sure how I could use them (it is the first time I have needed an external tool like this one)...
Oh I see, thanks! Is it with Guillaume Jacques? (I see you've published with him in the past.) You can use the ICU rules with the example provided at https://github.com/eroux/tibetan-collation/blob/master/implementations/Unicode/test.py (which might need a few updates, I'm not sure). What problems do you have with བྷ and བྷེ?
Indeed, it could have been with Guillaume, since I'm currently working with him on a new version of the Japhug dictionary, but for today's purposes it's with Camille Simon! As for བྷ and བྷེ, since I'm not very familiar with the language, this might seem silly... I had 4 entries that were evidently ordered correctly among the others, but since I'm inserting lettrines in the index, I'm using the segments of the 30 or so blocks to build a regular expression that detects the block change and inserts the lettrine (like …). By the way, I also have to handle the entries starting with numbers separately. They seem to be ordered correctly too, but I'll have to add a regex for them manually. They seem simpler than the other glyphs; can I directly make a regex like this one: … Thanks for the link, I'll check it out when I get the chance!
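As a rough illustration of the digit case, here is a minimal Python sketch. It assumes that "numbers" means the Tibetan digits ༠ to ༩ (U+0F20 to U+0F29) and that an entry counts as numeric when its first character is one of them; the function name `starts_with_tibetan_digit` is hypothetical and not part of the script discussed here.

```python
import re

# Assumption: "numbers" means the Tibetan digits U+0F20 (༠) to U+0F29 (༩).
TIBETAN_DIGIT_START = re.compile(r"^[\u0F20-\u0F29]")


def starts_with_tibetan_digit(entry: str) -> bool:
    """Return True if the index entry begins with a Tibetan digit."""
    return TIBETAN_DIGIT_START.match(entry) is not None


print(starts_with_tibetan_digit("\u0F21\u0F20"))  # entry starting with ༡
print(starts_with_tibetan_digit("\u0F56\u0F7A"))  # entry starting with བ
```

A character class over the digit range is enough here because each Tibetan digit is a single code point with no combining marks.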
Oh I see. For བྷ, maybe it's because you're using the NFC representation (…). The numbers should work like that, yes.
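One way to check this is to look at the code points directly. The sketch below uses Python's standard `unicodedata` module to decompose the precomposed letter བྷ (U+0F57) into its base letter and subjoined part. Whether the sorting script or the regular expression expects the decomposed sequence is an assumption here; the helper name `decompose` is hypothetical.

```python
import unicodedata


def decompose(text: str) -> str:
    """Normalize to NFD so precomposed letters split into base + subjoined parts."""
    return unicodedata.normalize("NFD", text)


bha = "\u0F57"  # precomposed TIBETAN LETTER BHA (བྷ)
parts = decompose(bha)

# Show the individual code points after decomposition.
print([f"U+{ord(ch):04X}" for ch in parts])  # ['U+0F56', 'U+0FB7']
```

Applying the same normalization to every entry before sorting or matching would make བྷ start with བ (U+0F56), like the other entries in that block.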
That confirms what I suspected: I could see that this character seemed a little more complex than the others, already incorporating modifiers, while the ones around it seemed more decomposed. Thanks!
I noticed a few differences with the data from the JS version.
JS version:
Python version:
Some glyphs are missing, ['ཞ', 'གཞ', 'བཞ'] is repeated in the Python version, and there are some other differences… I couldn't do a test run comparing the two versions; I just wanted to ask whether all of this is on purpose!
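A comparison of the two data sets could be sketched as follows, assuming both versions can be loaded as lists of glyph groups. The sample data below is hypothetical apart from the ['ཞ', 'གཞ', 'བཞ'] group quoted above, and `compare_groups` is an illustrative helper, not a function from either implementation.

```python
from collections import Counter


def compare_groups(js_groups, py_groups):
    """Report groups duplicated in py_groups and groups present on one side only."""
    js_keys = [tuple(g) for g in js_groups]
    py_keys = [tuple(g) for g in py_groups]
    js_set, py_set = set(js_keys), set(py_keys)

    # Counter keeps first-seen order, so the report follows the data order.
    duplicated = [list(k) for k, n in Counter(py_keys).items() if n > 1]
    missing_in_py = [list(k) for k in js_keys if k not in py_set]
    extra_in_py = [list(k) for k in dict.fromkeys(py_keys) if k not in js_set]
    return duplicated, missing_in_py, extra_in_py


# Hypothetical sample data; only the first group comes from the report above.
js_sample = [["ཞ", "གཞ", "བཞ"], ["ཟ", "གཟ", "བཟ"]]
py_sample = [["ཞ", "གཞ", "བཞ"], ["ཞ", "གཞ", "བཞ"]]

duplicated, missing_in_py, extra_in_py = compare_groups(js_sample, py_sample)
print("duplicated in Python:", duplicated)
print("missing in Python:", missing_in_py)
print("only in Python:", extra_in_py)
```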