Extend is_kanji to recognise kanji in the CJK Unified Ideographs Extension blocks (or provide alternate function) #15

Heliozoa · 2024-01-12T12:36:21Z

Currently, is_kanji uses the Unicode range U+4E00-U+9FAF to recognise kanji, which roughly corresponds to the CJK Unified Ideographs block (U+4E00-U+9FFF). Unicode has additional "extension blocks" that contain less common kanji, such as CJK Unified Ideographs Extension B, which contains the kanji 𬵪.

Since these characters are quite obscure, and it may be difficult to determine which of them qualify as "kanji", I think it would be useful to include such functionality in a crate.
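To make the request concrete, here is a minimal sketch of what an extended check could look like. The name `is_kanji_extended` is hypothetical and not part of the crate. The ranges are the Unicode block boundaries for the main block and Extensions A through F; later extensions are left out and would need to be added. It treats every code point in these blocks as kanji, which sidesteps the question of which of them are actually used in Japanese.

```rust
/// Hypothetical helper, not part of the crate's API.
/// Returns true for any character in the CJK Unified Ideographs block
/// or in Extensions A through F. Newer extension blocks are not covered.
fn is_kanji_extended(c: char) -> bool {
    matches!(
        c,
        '\u{4E00}'..='\u{9FFF}'       // CJK Unified Ideographs
        | '\u{3400}'..='\u{4DBF}'     // Extension A
        | '\u{20000}'..='\u{2A6DF}'   // Extension B
        | '\u{2A700}'..='\u{2B73F}'   // Extension C
        | '\u{2B740}'..='\u{2B81F}'   // Extension D
        | '\u{2B820}'..='\u{2CEAF}'   // Extension E
        | '\u{2CEB0}'..='\u{2EBEF}'   // Extension F
    )
}
```

Keeping this as a separate function would leave the existing is_kanji behaviour unchanged for current users.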


PSeitz · 2024-01-12T14:43:35Z

Are you suggesting there should be a separate method for this?
This crate is mostly for hiragana and katakana, is_kanji is just for convenience.

Heliozoa · 2024-01-12T15:01:43Z

That's understandable, covering just the CJK Unified Ideographs block is enough for most purposes I imagine.

For added context, I was already using the crate for its other functionality, and started using is_kanji to pick out kanji from the words in the JMdict dictionary. JMdict has some words that contain kanji from the extension blocks, so those kanji were unexpectedly (to me) getting filtered out by is_kanji.
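A small sketch of the behaviour I ran into, assuming a check that only covers U+4E00-U+9FAF as described above. Both `is_kanji_basic` and `extract_kanji` are stand-ins written for this illustration, not the crate's code.

```rust
/// Stand-in for a check limited to the range named in this issue
/// (U+4E00-U+9FAF). Hypothetical, written for illustration only.
fn is_kanji_basic(c: char) -> bool {
    ('\u{4E00}'..='\u{9FAF}').contains(&c)
}

/// Collects the characters of `word` that the given predicate accepts.
fn extract_kanji(word: &str, pred: impl Fn(char) -> bool) -> Vec<char> {
    word.chars().filter(|&c| pred(c)).collect()
}
```

With these definitions, `extract_kanji("叱る", is_kanji_basic)` keeps 叱 (U+53F1), while `extract_kanji("𠮟る", is_kanji_basic)` returns nothing, because 𠮟 (U+20B9F) lies in Extension B and falls outside the checked range.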

You can close the issue if this is out of scope, or leave it up if this is something that may have a place in the crate in the future. Thanks for the quick response!
