Add absolute confidence metric based on unique and most common ngrams #419

Draft: wants to merge 78 commits into base: main

78 commits
137e71e
Enhance model for Afrikaans
pemistahl Dec 31, 2024
b758b54
Enhance model for Arabic
pemistahl Dec 31, 2024
2f1d525
Enhance model for Azerbaijani
pemistahl Dec 31, 2024
ac5feab
Enhance model for Belarusian
pemistahl Dec 31, 2024
f420a7b
Enhance model for Bulgarian
pemistahl Dec 31, 2024
2b5504f
Enhance model for Bengali
pemistahl Dec 31, 2024
5b98117
Enhance model for Bosnian
pemistahl Dec 31, 2024
48447ad
Enhance model for Catalan
pemistahl Dec 31, 2024
efcac94
Enhance model for Czech
pemistahl Dec 31, 2024
21dc1bb
Enhance model for Welsh
pemistahl Dec 31, 2024
2af82df
Enhance model for Danish
pemistahl Dec 31, 2024
4fe0221
Enhance model for German
pemistahl Dec 31, 2024
87e7b85
Enhance model for Greek
pemistahl Dec 31, 2024
39e0b8d
Enhance model for English
pemistahl Dec 31, 2024
f151d82
Enhance model for Esperanto
pemistahl Dec 31, 2024
7017457
Enhance model for Spanish
pemistahl Dec 31, 2024
cd05e25
Enhance model for Estonian
pemistahl Dec 31, 2024
519f6b2
Enhance model for Basque
pemistahl Dec 31, 2024
165f184
Enhance model for Persian
pemistahl Dec 31, 2024
60427fa
Enhance model for Finnish
pemistahl Dec 31, 2024
5d10b71
Enhance model for French
pemistahl Dec 31, 2024
ca88c95
Enhance model for Irish
pemistahl Dec 31, 2024
bf76575
Enhance model for Gujarati
pemistahl Dec 31, 2024
ffed05b
Enhance model for Hebrew
pemistahl Dec 31, 2024
00cd1a2
Enhance model for Hindi
pemistahl Dec 31, 2024
d2ea706
Enhance model for Croatian
pemistahl Dec 31, 2024
bb5a022
Enhance model for Hungarian
pemistahl Dec 31, 2024
297c112
Enhance model for Armenian
pemistahl Dec 31, 2024
2138f8a
Enhance model for Indonesian
pemistahl Dec 31, 2024
44970d9
Enhance model for Icelandic
pemistahl Dec 31, 2024
f1e8cb7
Enhance model for Italian
pemistahl Dec 31, 2024
743a741
Enhance model for Japanese
pemistahl Dec 31, 2024
ad26100
Enhance model for Georgian
pemistahl Dec 31, 2024
e5a3eac
Enhance model for Kazakh
pemistahl Dec 31, 2024
f0e7ccd
Enhance model for Korean
pemistahl Dec 31, 2024
570b995
Enhance model for Latin
pemistahl Dec 31, 2024
5eb5227
Enhance model for Ganda
pemistahl Dec 31, 2024
0f47f02
Enhance model for Lithuanian
pemistahl Dec 31, 2024
e193b8f
Enhance model for Latvian
pemistahl Dec 31, 2024
f2eb156
Enhance model for Maori
pemistahl Dec 31, 2024
f3e926c
Enhance model for Macedonian
pemistahl Dec 31, 2024
14d5b18
Enhance model for Mongolian
pemistahl Dec 31, 2024
7eb0da9
Enhance model for Marathi
pemistahl Dec 31, 2024
3e84974
Enhance model for Malay
pemistahl Dec 31, 2024
8ce97fe
Enhance model for Bokmal
pemistahl Dec 31, 2024
8b3ca7e
Enhance model for Dutch
pemistahl Dec 31, 2024
d4104ec
Enhance model for Nynorsk
pemistahl Dec 31, 2024
0848e05
Enhance model for Punjabi
pemistahl Dec 31, 2024
e38bd2a
Enhance model for Polish
pemistahl Dec 31, 2024
8c256c4
Enhance model for Portuguese
pemistahl Dec 31, 2024
ce4bd6b
Enhance model for Romanian
pemistahl Dec 31, 2024
3ab0235
Enhance model for Russian
pemistahl Dec 31, 2024
e84795e
Enhance model for Slovak
pemistahl Dec 31, 2024
864166d
Enhance model for Slovene
pemistahl Dec 31, 2024
d793e0f
Enhance model for Shona
pemistahl Dec 31, 2024
c48302c
Enhance model for Somali
pemistahl Dec 31, 2024
f70ed90
Enhance model for Albanian
pemistahl Dec 31, 2024
27f1335
Enhance model for Serbian
pemistahl Dec 31, 2024
2c8315f
Enhance model for Sotho
pemistahl Dec 31, 2024
24fd288
Enhance model for Swedish
pemistahl Dec 31, 2024
42e2715
Enhance model for Swahili
pemistahl Dec 31, 2024
a3ce31c
Enhance model for Tamil
pemistahl Dec 31, 2024
d806ee6
Enhance model for Telugu
pemistahl Dec 31, 2024
28e43cc
Enhance model for Thai
pemistahl Dec 31, 2024
18d0cfd
Enhance model for Tagalog
pemistahl Dec 31, 2024
bb8ac34
Enhance model for Tswana
pemistahl Dec 31, 2024
12baa1b
Enhance model for Turkish
pemistahl Dec 31, 2024
226deca
Enhance model for Tsonga
pemistahl Dec 31, 2024
5c9948a
Enhance model for Ukrainian
pemistahl Dec 31, 2024
ff2489f
Enhance model for Urdu
pemistahl Dec 31, 2024
9dae0eb
Enhance model for Vietnamese
pemistahl Dec 31, 2024
7af5b6c
Enhance model for Xhosa
pemistahl Dec 31, 2024
d4ddc2c
Enhance model for Yoruba
pemistahl Dec 31, 2024
36fbb7c
Enhance model for Chinese
pemistahl Dec 31, 2024
e7f7cd4
Enhance model for Zulu
pemistahl Dec 31, 2024
30c67ee
Add `Language::all_with_single_unique_script()`
pemistahl Dec 31, 2024
347aa3e
Remove struct `TestDataLanguageModel`
pemistahl Jan 2, 2025
449bb70
Refactor language model serialization
pemistahl Jan 3, 2025
150 changes: 75 additions & 75 deletions Cargo.lock

Large diffs are not rendered by default.

150 changes: 75 additions & 75 deletions Cargo.toml
@@ -63,81 +63,81 @@ serde = { version = "1.0.217", features = ["derive"] }
serde_json = "1.0.134"
strum = "0.26.3"
strum_macros = "0.26.4"
lingua-afrikaans-language-model = { path = "language-models/af", version = "1.1.0", optional = true }
lingua-albanian-language-model = { path = "language-models/sq", version = "1.1.0", optional = true }
lingua-arabic-language-model = { path = "language-models/ar", version = "1.1.0", optional = true }
lingua-armenian-language-model = { path = "language-models/hy", version = "1.1.0", optional = true }
lingua-azerbaijani-language-model = { path = "language-models/az", version = "1.1.0", optional = true }
lingua-basque-language-model = { path = "language-models/eu", version = "1.1.0", optional = true }
lingua-belarusian-language-model = { path = "language-models/be", version = "1.1.0", optional = true }
lingua-bengali-language-model = { path = "language-models/bn", version = "1.1.0", optional = true }
lingua-bokmal-language-model = { path = "language-models/nb", version = "1.1.0", optional = true }
lingua-bosnian-language-model = { path = "language-models/bs", version = "1.1.0", optional = true }
lingua-bulgarian-language-model = { path = "language-models/bg", version = "1.1.0", optional = true }
lingua-catalan-language-model = { path = "language-models/ca", version = "1.1.0", optional = true }
lingua-chinese-language-model = { path = "language-models/zh", version = "1.1.0", optional = true }
lingua-croatian-language-model = { path = "language-models/hr", version = "1.1.0", optional = true }
lingua-czech-language-model = { path = "language-models/cs", version = "1.1.0", optional = true }
lingua-danish-language-model = { path = "language-models/da", version = "1.1.0", optional = true }
lingua-dutch-language-model = { path = "language-models/nl", version = "1.1.0", optional = true }
lingua-english-language-model = { path = "language-models/en", version = "1.1.0", optional = true }
lingua-esperanto-language-model = { path = "language-models/eo", version = "1.1.0", optional = true }
lingua-estonian-language-model = { path = "language-models/et", version = "1.1.0", optional = true }
lingua-finnish-language-model = { path = "language-models/fi", version = "1.1.0", optional = true }
lingua-french-language-model = { path = "language-models/fr", version = "1.1.0", optional = true }
lingua-ganda-language-model = { path = "language-models/lg", version = "1.1.0", optional = true }
lingua-georgian-language-model = { path = "language-models/ka", version = "1.1.0", optional = true }
lingua-german-language-model = { path = "language-models/de", version = "1.1.0", optional = true }
lingua-greek-language-model = { path = "language-models/el", version = "1.1.0", optional = true }
lingua-gujarati-language-model = { path = "language-models/gu", version = "1.1.0", optional = true }
lingua-hebrew-language-model = { path = "language-models/he", version = "1.1.0", optional = true }
lingua-hindi-language-model = { path = "language-models/hi", version = "1.1.0", optional = true }
lingua-hungarian-language-model = { path = "language-models/hu", version = "1.1.0", optional = true }
lingua-icelandic-language-model = { path = "language-models/is", version = "1.1.0", optional = true }
lingua-indonesian-language-model = { path = "language-models/id", version = "1.1.0", optional = true }
lingua-irish-language-model = { path = "language-models/ga", version = "1.1.0", optional = true }
lingua-italian-language-model = { path = "language-models/it", version = "1.1.0", optional = true }
lingua-japanese-language-model = { path = "language-models/ja", version = "1.1.0", optional = true }
lingua-kazakh-language-model = { path = "language-models/kk", version = "1.1.0", optional = true }
lingua-korean-language-model = { path = "language-models/ko", version = "1.1.0", optional = true }
lingua-latin-language-model = { path = "language-models/la", version = "1.1.0", optional = true }
lingua-latvian-language-model = { path = "language-models/lv", version = "1.1.0", optional = true }
lingua-lithuanian-language-model = { path = "language-models/lt", version = "1.1.0", optional = true }
lingua-macedonian-language-model = { path = "language-models/mk", version = "1.1.0", optional = true }
lingua-malay-language-model = { path = "language-models/ms", version = "1.1.0", optional = true }
lingua-maori-language-model = { path = "language-models/mi", version = "1.1.0", optional = true }
lingua-marathi-language-model = { path = "language-models/mr", version = "1.1.0", optional = true }
lingua-mongolian-language-model = { path = "language-models/mn", version = "1.1.0", optional = true }
lingua-nynorsk-language-model = { path = "language-models/nn", version = "1.1.0", optional = true }
lingua-persian-language-model = { path = "language-models/fa", version = "1.1.0", optional = true }
lingua-polish-language-model = { path = "language-models/pl", version = "1.1.0", optional = true }
lingua-portuguese-language-model = { path = "language-models/pt", version = "1.1.0", optional = true }
lingua-punjabi-language-model = { path = "language-models/pa", version = "1.1.0", optional = true }
lingua-romanian-language-model = { path = "language-models/ro", version = "1.1.0", optional = true }
lingua-russian-language-model = { path = "language-models/ru", version = "1.1.0", optional = true }
lingua-serbian-language-model = { path = "language-models/sr", version = "1.1.0", optional = true }
lingua-shona-language-model = { path = "language-models/sn", version = "1.1.0", optional = true }
lingua-slovak-language-model = { path = "language-models/sk", version = "1.1.0", optional = true }
lingua-slovene-language-model = { path = "language-models/sl", version = "1.1.0", optional = true }
lingua-somali-language-model = { path = "language-models/so", version = "1.1.0", optional = true }
lingua-sotho-language-model = { path = "language-models/st", version = "1.1.0", optional = true }
lingua-spanish-language-model = { path = "language-models/es", version = "1.1.0", optional = true }
lingua-swahili-language-model = { path = "language-models/sw", version = "1.1.0", optional = true }
lingua-swedish-language-model = { path = "language-models/sv", version = "1.1.0", optional = true }
lingua-tagalog-language-model = { path = "language-models/tl", version = "1.1.0", optional = true }
lingua-tamil-language-model = { path = "language-models/ta", version = "1.1.0", optional = true }
lingua-telugu-language-model = { path = "language-models/te", version = "1.1.0", optional = true }
lingua-thai-language-model = { path = "language-models/th", version = "1.1.0", optional = true }
lingua-tsonga-language-model = { path = "language-models/ts", version = "1.1.0", optional = true }
lingua-tswana-language-model = { path = "language-models/tn", version = "1.1.0", optional = true }
lingua-turkish-language-model = { path = "language-models/tr", version = "1.1.0", optional = true }
lingua-ukrainian-language-model = { path = "language-models/uk", version = "1.1.0", optional = true }
lingua-urdu-language-model = { path = "language-models/ur", version = "1.1.0", optional = true }
lingua-vietnamese-language-model = { path = "language-models/vi", version = "1.1.0", optional = true }
lingua-welsh-language-model = { path = "language-models/cy", version = "1.1.0", optional = true }
lingua-xhosa-language-model = { path = "language-models/xh", version = "1.1.0", optional = true }
lingua-yoruba-language-model = { path = "language-models/yo", version = "1.1.0", optional = true }
lingua-zulu-language-model = { path = "language-models/zu", version = "1.1.0", optional = true }
lingua-afrikaans-language-model = { path = "language-models/af", version = "1.2.0", optional = true }
lingua-albanian-language-model = { path = "language-models/sq", version = "1.2.0", optional = true }
lingua-arabic-language-model = { path = "language-models/ar", version = "1.2.0", optional = true }
lingua-armenian-language-model = { path = "language-models/hy", version = "1.2.0", optional = true }
lingua-azerbaijani-language-model = { path = "language-models/az", version = "1.2.0", optional = true }
lingua-basque-language-model = { path = "language-models/eu", version = "1.2.0", optional = true }
lingua-belarusian-language-model = { path = "language-models/be", version = "1.2.0", optional = true }
lingua-bengali-language-model = { path = "language-models/bn", version = "1.2.0", optional = true }
lingua-bokmal-language-model = { path = "language-models/nb", version = "1.2.0", optional = true }
lingua-bosnian-language-model = { path = "language-models/bs", version = "1.2.0", optional = true }
lingua-bulgarian-language-model = { path = "language-models/bg", version = "1.2.0", optional = true }
lingua-catalan-language-model = { path = "language-models/ca", version = "1.2.0", optional = true }
lingua-chinese-language-model = { path = "language-models/zh", version = "1.2.0", optional = true }
lingua-croatian-language-model = { path = "language-models/hr", version = "1.2.0", optional = true }
lingua-czech-language-model = { path = "language-models/cs", version = "1.2.0", optional = true }
lingua-danish-language-model = { path = "language-models/da", version = "1.2.0", optional = true }
lingua-dutch-language-model = { path = "language-models/nl", version = "1.2.0", optional = true }
lingua-english-language-model = { path = "language-models/en", version = "1.2.0", optional = true }
lingua-esperanto-language-model = { path = "language-models/eo", version = "1.2.0", optional = true }
lingua-estonian-language-model = { path = "language-models/et", version = "1.2.0", optional = true }
lingua-finnish-language-model = { path = "language-models/fi", version = "1.2.0", optional = true }
lingua-french-language-model = { path = "language-models/fr", version = "1.2.0", optional = true }
lingua-ganda-language-model = { path = "language-models/lg", version = "1.2.0", optional = true }
lingua-georgian-language-model = { path = "language-models/ka", version = "1.2.0", optional = true }
lingua-german-language-model = { path = "language-models/de", version = "1.2.0", optional = true }
lingua-greek-language-model = { path = "language-models/el", version = "1.2.0", optional = true }
lingua-gujarati-language-model = { path = "language-models/gu", version = "1.2.0", optional = true }
lingua-hebrew-language-model = { path = "language-models/he", version = "1.2.0", optional = true }
lingua-hindi-language-model = { path = "language-models/hi", version = "1.2.0", optional = true }
lingua-hungarian-language-model = { path = "language-models/hu", version = "1.2.0", optional = true }
lingua-icelandic-language-model = { path = "language-models/is", version = "1.2.0", optional = true }
lingua-indonesian-language-model = { path = "language-models/id", version = "1.2.0", optional = true }
lingua-irish-language-model = { path = "language-models/ga", version = "1.2.0", optional = true }
lingua-italian-language-model = { path = "language-models/it", version = "1.2.0", optional = true }
lingua-japanese-language-model = { path = "language-models/ja", version = "1.2.0", optional = true }
lingua-kazakh-language-model = { path = "language-models/kk", version = "1.2.0", optional = true }
lingua-korean-language-model = { path = "language-models/ko", version = "1.2.0", optional = true }
lingua-latin-language-model = { path = "language-models/la", version = "1.2.0", optional = true }
lingua-latvian-language-model = { path = "language-models/lv", version = "1.2.0", optional = true }
lingua-lithuanian-language-model = { path = "language-models/lt", version = "1.2.0", optional = true }
lingua-macedonian-language-model = { path = "language-models/mk", version = "1.2.0", optional = true }
lingua-malay-language-model = { path = "language-models/ms", version = "1.2.0", optional = true }
lingua-maori-language-model = { path = "language-models/mi", version = "1.2.0", optional = true }
lingua-marathi-language-model = { path = "language-models/mr", version = "1.2.0", optional = true }
lingua-mongolian-language-model = { path = "language-models/mn", version = "1.2.0", optional = true }
lingua-nynorsk-language-model = { path = "language-models/nn", version = "1.2.0", optional = true }
lingua-persian-language-model = { path = "language-models/fa", version = "1.2.0", optional = true }
lingua-polish-language-model = { path = "language-models/pl", version = "1.2.0", optional = true }
lingua-portuguese-language-model = { path = "language-models/pt", version = "1.2.0", optional = true }
lingua-punjabi-language-model = { path = "language-models/pa", version = "1.2.0", optional = true }
lingua-romanian-language-model = { path = "language-models/ro", version = "1.2.0", optional = true }
lingua-russian-language-model = { path = "language-models/ru", version = "1.2.0", optional = true }
lingua-serbian-language-model = { path = "language-models/sr", version = "1.2.0", optional = true }
lingua-shona-language-model = { path = "language-models/sn", version = "1.2.0", optional = true }
lingua-slovak-language-model = { path = "language-models/sk", version = "1.2.0", optional = true }
lingua-slovene-language-model = { path = "language-models/sl", version = "1.2.0", optional = true }
lingua-somali-language-model = { path = "language-models/so", version = "1.2.0", optional = true }
lingua-sotho-language-model = { path = "language-models/st", version = "1.2.0", optional = true }
lingua-spanish-language-model = { path = "language-models/es", version = "1.2.0", optional = true }
lingua-swahili-language-model = { path = "language-models/sw", version = "1.2.0", optional = true }
lingua-swedish-language-model = { path = "language-models/sv", version = "1.2.0", optional = true }
lingua-tagalog-language-model = { path = "language-models/tl", version = "1.2.0", optional = true }
lingua-tamil-language-model = { path = "language-models/ta", version = "1.2.0", optional = true }
lingua-telugu-language-model = { path = "language-models/te", version = "1.2.0", optional = true }
lingua-thai-language-model = { path = "language-models/th", version = "1.2.0", optional = true }
lingua-tsonga-language-model = { path = "language-models/ts", version = "1.2.0", optional = true }
lingua-tswana-language-model = { path = "language-models/tn", version = "1.2.0", optional = true }
lingua-turkish-language-model = { path = "language-models/tr", version = "1.2.0", optional = true }
lingua-ukrainian-language-model = { path = "language-models/uk", version = "1.2.0", optional = true }
lingua-urdu-language-model = { path = "language-models/ur", version = "1.2.0", optional = true }
lingua-vietnamese-language-model = { path = "language-models/vi", version = "1.2.0", optional = true }
lingua-welsh-language-model = { path = "language-models/cy", version = "1.2.0", optional = true }
lingua-xhosa-language-model = { path = "language-models/xh", version = "1.2.0", optional = true }
lingua-yoruba-language-model = { path = "language-models/yo", version = "1.2.0", optional = true }
lingua-zulu-language-model = { path = "language-models/zu", version = "1.2.0", optional = true }

[target.'cfg(not(target_family = "wasm"))'.dependencies]
ahash = "0.8.11"
4 changes: 2 additions & 2 deletions language-models/af/Cargo.toml
@@ -14,7 +14,7 @@

[package]
name = "lingua-afrikaans-language-model"
version = "1.1.0"
version = "1.2.0"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
The Afrikaans language model for Lingua, an accurate natural language detection library
@@ -34,4 +34,4 @@ keywords = [
]

[dependencies]
include_dir = "0.7.3"
include_dir = "0.7.4"
6 changes: 6 additions & 0 deletions language-models/af/README.md
@@ -6,6 +6,12 @@ the most accurate natural language detection library in the Rust ecosystem.

### Changelog

#### Version 1.2.0

- The language model has been enhanced by including unique and most common
ngrams to support an absolute confidence metric which is independent of
other languages.
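
As a rough illustration of what a confidence value "independent of other languages" could look like, here is a minimal Rust sketch. It is not the implementation from this pull request: the `NgramModel` struct, the `char_ngrams` and `absolute_confidence` functions, and the scoring rule itself (a unique ngram is treated as decisive; otherwise the score is the share of the text's ngrams found among the most common ones) are all assumptions made for this example.

```rust
use std::collections::HashSet;

/// Hypothetical per-language data; the field names are illustrative only
/// and do not mirror Lingua's internal types.
struct NgramModel {
    /// Ngrams assumed to occur in this language only.
    unique: HashSet<String>,
    /// Ngrams assumed to be the most frequent ones in this language.
    most_common: HashSet<String>,
}

/// Splits `text` into overlapping character ngrams of length `n`.
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars.windows(n).map(|w| w.iter().collect()).collect()
}

/// Returns a score in [0.0, 1.0] that consults only this language's data.
fn absolute_confidence(model: &NgramModel, text: &str, n: usize) -> f64 {
    let ngrams = char_ngrams(text, n);
    if ngrams.is_empty() {
        return 0.0;
    }
    // Assumed rule: a single ngram unique to the language is decisive.
    if ngrams.iter().any(|g| model.unique.contains(g)) {
        return 1.0;
    }
    // Otherwise: fraction of the text's ngrams that are "most common" ones.
    let hits = ngrams
        .iter()
        .filter(|g| model.most_common.contains(*g))
        .count();
    hits as f64 / ngrams.len() as f64
}
```

Because the score in this sketch reads only one language's own ngram sets, adding or removing other languages from a detector would not change it, which appears to be the property the changelog entry describes.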

#### Version 1.1.0

- The language model files are now compressed with the Brotli algorithm which
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added language-models/af/models/unique_bigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
6 changes: 6 additions & 0 deletions language-models/af/models/unique_trigrams.json.br
Binary file not shown.
1 change: 1 addition & 0 deletions language-models/af/models/unique_unigrams.json.br
@@ -0,0 +1 @@
{"language": "AFRIKAANS", "ngrams": ["ȇ", "ȅ", "ʼn"]}
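
The file contents shown above have a simple shape: a language name and a list of ngrams. A minimal sketch of how such data might be held in memory and matched against text follows. `UniqueNgramFile`, `afrikaans_unique_unigrams`, and `matching_ngrams` are hypothetical names chosen for illustration and are not part of Lingua's API; the values are copied from the file contents above.

```rust
/// Mirrors the JSON shape shown above: a language name plus a list of ngrams.
/// Hypothetical type, not taken from Lingua's source.
struct UniqueNgramFile {
    language: String,
    ngrams: Vec<String>,
}

/// Builds the Afrikaans unigram data by hand, using the values from the diff.
/// (A real loader would decompress and parse the file instead.)
fn afrikaans_unique_unigrams() -> UniqueNgramFile {
    UniqueNgramFile {
        language: "AFRIKAANS".to_string(),
        ngrams: vec!["ȇ".to_string(), "ȅ".to_string(), "ʼn".to_string()],
    }
}

/// Returns every listed ngram that occurs somewhere in `text`.
/// Substring matching is used so the check works for any ngram length.
fn matching_ngrams<'a>(file: &'a UniqueNgramFile, text: &str) -> Vec<&'a str> {
    file.ngrams
        .iter()
        .map(String::as_str)
        .filter(|ngram| text.contains(*ngram))
        .collect()
}
```

A non-empty result from `matching_ngrams` would be one possible signal that the text is in the file's language, under the assumption that the listed ngrams really occur in no other supported language.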
4 changes: 2 additions & 2 deletions language-models/ar/Cargo.toml
@@ -14,7 +14,7 @@

[package]
name = "lingua-arabic-language-model"
version = "1.1.0"
version = "1.2.0"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
The Arabic language model for Lingua, an accurate natural language detection library
@@ -34,4 +34,4 @@ keywords = [
]

[dependencies]
include_dir = "0.7.3"
include_dir = "0.7.4"
6 changes: 6 additions & 0 deletions language-models/ar/README.md
@@ -6,6 +6,12 @@ the most accurate natural language detection library in the Rust ecosystem.

### Changelog

#### Version 1.2.0

- The language model has been enhanced by including unique and most common
ngrams to support an absolute confidence metric which is independent of
other languages.

#### Version 1.1.0

- The language model files are now compressed with the Brotli algorithm which
Binary file added language-models/ar/models/mostcommon_bigrams.json.br
Binary file not shown.
Binary file not shown.
3 changes: 3 additions & 0 deletions language-models/ar/models/mostcommon_quadrigrams.json.br
Binary file not shown.
2 changes: 2 additions & 0 deletions language-models/ar/models/mostcommon_trigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file added language-models/ar/models/unique_bigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added language-models/ar/models/unique_unigrams.json.br
Binary file not shown.
4 changes: 2 additions & 2 deletions language-models/az/Cargo.toml
@@ -14,7 +14,7 @@

[package]
name = "lingua-azerbaijani-language-model"
version = "1.1.0"
version = "1.2.0"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
The Azerbaijani language model for Lingua, an accurate natural language detection library
@@ -34,4 +34,4 @@ keywords = [
]

[dependencies]
include_dir = "0.7.3"
include_dir = "0.7.4"
6 changes: 6 additions & 0 deletions language-models/az/README.md
@@ -6,6 +6,12 @@ the most accurate natural language detection library in the Rust ecosystem.

### Changelog

#### Version 1.2.0

- The language model has been enhanced by including unique and most common
ngrams to support an absolute confidence metric which is independent of
other languages.

#### Version 1.1.0

- The language model files are now compressed with the Brotli algorithm which
Binary file added language-models/az/models/mostcommon_bigrams.json.br
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions language-models/az/models/mostcommon_quadrigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
4 changes: 4 additions & 0 deletions language-models/az/models/unique_bigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added language-models/az/models/unique_trigrams.json.br
Binary file not shown.
4 changes: 2 additions & 2 deletions language-models/be/Cargo.toml
@@ -14,7 +14,7 @@

[package]
name = "lingua-belarusian-language-model"
version = "1.1.0"
version = "1.2.0"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
The Belarusian language model for Lingua, an accurate natural language detection library
@@ -34,4 +34,4 @@ keywords = [
]

[dependencies]
include_dir = "0.7.3"
include_dir = "0.7.4"
6 changes: 6 additions & 0 deletions language-models/be/README.md
@@ -6,6 +6,12 @@ the most accurate natural language detection library in the Rust ecosystem.

### Changelog

#### Version 1.2.0

- The language model has been enhanced by including unique and most common
ngrams to support an absolute confidence metric which is independent of
other languages.

#### Version 1.1.0

- The language model files are now compressed with the Brotli algorithm which
Binary file added language-models/be/models/mostcommon_bigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions language-models/be/models/mostcommon_trigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file added language-models/be/models/unique_bigrams.json.br
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added language-models/be/models/unique_trigrams.json.br
Binary file not shown.
4 changes: 2 additions & 2 deletions language-models/bg/Cargo.toml
@@ -14,7 +14,7 @@

[package]
name = "lingua-bulgarian-language-model"
version = "1.1.0"
version = "1.2.0"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
The Bulgarian language model for Lingua, an accurate natural language detection library
@@ -34,4 +34,4 @@ keywords = [
]

[dependencies]
include_dir = "0.7.3"
include_dir = "0.7.4"