
WARNING: Can not find mwt: default from official model list. Ignoring it. #297

Closed
rodriguesfas opened this issue May 8, 2020 · 6 comments


@rodriguesfas

Hello, I'm running a pipeline with Stanza and I get a warning for the MWT step:
WARNING: Can not find mwt: default from official model list. Ignoring it.
Do you know what it means? For some reason this model is not present in stanza_resources.

@yuhaozhang
Member

Can you provide more details on how you initialize the pipeline? A code snippet will help us reproduce the issue.

@twang18

twang18 commented May 13, 2020

I am getting the same warnings for both en and zh models.

The code and logs are as follows:

stanza.download('zh-hans')
nlp = stanza.Pipeline(lang='zh-hans', processors='tokenize,mwt,pos', use_gpu=False)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.0.0.json: 116kB [00:00, 1.25MB/s]
2020-05-13 08:25:16 INFO: Downloading default packages for language: zh-hans (Simplified_Chinese)...
2020-05-13 08:25:18 INFO: File exists: C:\Users\WT.YX\stanza_resources\zh-hans\default.zip.
2020-05-13 08:25:25 INFO: Finished downloading models and saved to C:\Users\WT.YX\stanza_resources.
2020-05-13 08:25:25 WARNING: Can not find mwt: default from official model list. Ignoring it.
2020-05-13 08:25:25 INFO: Loading these models for language: zh-hans (Simplified_Chinese):

2020-05-13 08:25:25 INFO: Use device: cpu
2020-05-13 08:25:25 INFO: Loading: tokenize
2020-05-13 08:25:25 INFO: Loading: pos
2020-05-13 08:25:28 INFO: Done loading processors!

stanza.download('en')
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos', use_gpu=False)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.0.0.json: 116kB [00:00, 1.28MB/s]
2020-05-13 08:23:02 INFO: Downloading default packages for language: en (English)...
2020-05-13 08:23:02 INFO: File exists: C:\Users\WT.YX\stanza_resources\en\default.zip.
2020-05-13 08:23:07 INFO: Finished downloading models and saved to C:\Users\WT.YX\stanza_resources.
2020-05-13 08:23:07 WARNING: Can not find mwt: default from official model list. Ignoring it.
2020-05-13 08:23:07 INFO: Loading these models for language: en (English):

Are there any tricks I am missing here? Also, according to the tutorials the mwt processor is required before pos, so will the absence of the mwt processor affect the subsequent pos performance? Thanks for any enlightenment!

@yuhaozhang
Member

According to the Universal Dependencies tokenization guidelines, many languages do not have multi-word token (MWT) expansions; English and Chinese are among them. So we do not ship MWT models for English and Chinese, and you do not need the MWT processor to produce accurate UD parsing for these languages.
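To make the distinction concrete, here is a toy illustration of what MWT expansion does in languages that have it. The lookup table below is hand-written for illustration (it is not Stanza's data, and the `expand` helper is not part of the Stanza API):

```python
# Hand-picked UD-style MWT expansions for two languages that have them.
# A multi-word token is one surface token that corresponds to several
# syntactic words, e.g. French "du" = "de" + "le".
MWT_EXPANSIONS = {
    "fr": {"du": ["de", "le"], "aux": ["à", "les"]},
    "de": {"zum": ["zu", "dem"], "im": ["in", "dem"]},
}

def expand(lang, token):
    """Return the word sequence for a token; tokens (or languages)
    with no entry in the table pass through unexpanded."""
    return MWT_EXPANSIONS.get(lang, {}).get(token, [token])

print(expand("fr", "du"))     # ['de', 'le']
print(expand("de", "zum"))    # ['zu', 'dem']
print(expand("en", "apple"))  # ['apple'] -- no expansion table for en here
```

For a language with no expansion table, every token maps to itself, which is why an MWT model adds nothing for such languages.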

@yuhaozhang
Member

To add to the above answer: since this is only a warning message, you can simply ignore it, and it should not affect the actual running of the pipeline at all. But removing mwt from the processors list will get rid of the warning message.
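One way to apply that advice without hand-editing every call site is a small helper that strips `mwt` from the processors string for languages known to lack MWT models. This helper and its language set are hypothetical (not part of Stanza); the set below lists only the two languages discussed in this thread:

```python
# Hypothetical helper, not part of Stanza: drop "mwt" from a processors
# string for languages whose official packages ship no MWT model, so that
# stanza.Pipeline(...) is not asked to load a processor that does not exist.
LANGS_WITHOUT_MWT = {"en", "zh-hans"}  # illustrative subset, not exhaustive

def trim_processors(lang, processors):
    steps = [p.strip() for p in processors.split(",")]
    if lang in LANGS_WITHOUT_MWT:
        steps = [p for p in steps if p != "mwt"]
    return ",".join(steps)

print(trim_processors("zh-hans", "tokenize,mwt,pos"))  # tokenize,pos
print(trim_processors("fr", "tokenize,mwt,pos"))       # tokenize,mwt,pos
```

You would then call, for example, `stanza.Pipeline(lang='zh-hans', processors=trim_processors('zh-hans', 'tokenize,mwt,pos'), use_gpu=False)` and the warning should disappear.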

@argosopentech

argosopentech commented Mar 13, 2024

I'm also seeing this warning after upgrading my Stanza version to 1.8.1:

2024-03-13 11:51:25 WARNING: Language en package default expects mwt, which has been added

I have this for my processors list: processors="tokenize"

@AngledLuffa
Collaborator

This is expected. English does have MWTs, such as won't, gonna, and Jennifer's.

It has caused some amount of irritation, though, such as:

#1366
#1361

We are working through those issues.
