Incorrect forms #62

Closed
leoalenc opened this issue Feb 5, 2020 · 5 comments

leoalenc commented Feb 5, 2020

Out of the following entries, the last two are spurious:

~/MorphoBr$ grep -E "sagueir[oa]" nouns/*.dict adjectives/*.dict
nouns/nouns.gfl.dict:sagueiro   sagueiro+N+M+SG
nouns/nouns.gfl.dict:sagueiros  sagueiro+N+M+PL
nouns/q-z.delaf.dict:sagueiões  sagueiro+N+F+PL
nouns/q-z.delaf.dict:sagueiro   sagueiro+N+F+SG

In Portuguese, only the masculine noun "sagueiro" (the name of a palm tree) exists; see https://www.infopedia.pt/dicionarios/lingua-portuguesa/sagueiro

The last three forms of the lexeme "comentarista" are spurious:

~/MorphoBr/nouns$ grep -E "[[:space:]]comentarista\+N" *.dict
a-c.delaf.dict:comentarista     comentarista+N+F+SG
a-c.delaf.dict:comentarista     comentarista+N+M+SG
a-c.delaf.dict:comentaristas    comentarista+N+F+PL
a-c.delaf.dict:comentaristas    comentarista+N+M+PL
nouns.gfl.dict:comentaristaa    comentarista+N+F+SG
nouns.gfl.dict:comentaristaas   comentarista+N+F+PL
nouns.gfl.dict:comentaristaes   comentarista+N+M+PL
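
One way to surface cases like this automatically is to group entries by analysis (lemma plus tags) and report every analysis realised by more than one distinct form. The following is only a minimal sketch, assuming each dictionary line has the shape `form<whitespace>lemma+TAGS`; the function name `conflicting_forms` is hypothetical and not part of MorphoBr. Legitimate variants (for example, nouns with two accepted plurals) are reported too, so the output is a list of candidates for manual review, not a list of errors.

```python
from collections import defaultdict

def conflicting_forms(lines):
    """Map each analysis (lemma+tags) to its surface forms, keeping only
    analyses that are realised by more than one distinct form."""
    forms_by_analysis = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        form, analysis = parts
        forms_by_analysis[analysis].add(form)
    return {a: sorted(f) for a, f in forms_by_analysis.items() if len(f) > 1}

# Entries copied from the grep output above (file-name prefix dropped).
entries = [
    "comentarista\tcomentarista+N+F+SG",
    "comentarista\tcomentarista+N+M+SG",
    "comentaristas\tcomentarista+N+F+PL",
    "comentaristas\tcomentarista+N+M+PL",
    "comentaristaa\tcomentarista+N+F+SG",
    "comentaristaas\tcomentarista+N+F+PL",
    "comentaristaes\tcomentarista+N+M+PL",
]
for analysis, forms in sorted(conflicting_forms(entries).items()):
    print(analysis, forms)
```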

Extracting possible spurious noun forms:

~/MorphoBr/nouns$ grep -E ".+a[ae]s?[[:space:]]" *.dict
nouns.gfl.dict:cagarretaa       cagarreta+N+F+SG
nouns.gfl.dict:cagarretaas      cagarreta+N+F+PL
nouns.gfl.dict:cagarretaes      cagarreta+N+M+PL
nouns.gfl.dict:comentaristaa    comentarista+N+F+SG
nouns.gfl.dict:comentaristaas   comentarista+N+F+PL
nouns.gfl.dict:comentaristaes   comentarista+N+M+PL
nouns.gfl.dict:folha-sebrae     folha-sebrae+N+F+SG
nouns.gfl.dict:hediondezaes     hediondeza+N+F+PL
nouns.gfl.dict:reflectânciaa    reflectância+N+F+SG
nouns.gfl.dict:reflectânciaas   reflectância+N+F+PL
nouns.gfl.dict:reflectânciaes   reflectância+N+M+PL
nouns.gfl.dict:retransmissoraes retransmissora+N+F+PL
nouns.gfl.dict:talassaes        talassa+N+F+PL
nouns.gfl.dict:talassaes        talassa+N+M+PL
nouns.gfl.dict:venezuelaas      venezuelao+N+F+PL
nouns.gfl.dict:venezuelaa       venezuelao+N+F+SG
q-z.delaf.dict:reggae   reggae+N+M+SG
q-z.delaf.dict:reggaes  reggae+N+M+PL
q-z.delaf.dict:sundaes  sundae+N+M+PL
q-z.delaf.dict:sundae   sundae+N+M+SG

Of these forms, the following are correct, because the lemma itself
ends in "ae":

nouns.gfl.dict:folha-sebrae     folha-sebrae+N+F+SG
q-z.delaf.dict:reggae   reggae+N+M+SG
q-z.delaf.dict:reggaes  reggae+N+M+PL
q-z.delaf.dict:sundaes  sundae+N+M+PL
q-z.delaf.dict:sundae   sundae+N+M+SG
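
The two steps above (match the suspicious ending, then set aside lemmas that really end in "ae") can be combined in one pass. This is a sketch under assumptions, not the procedure actually used in the issue: it assumes one `form<whitespace>lemma+TAGS` entry per line without the file-name prefix, and `suspicious_entries` is a hypothetical helper name.

```python
import re

# Same idea as the grep pattern above: a form ending in "aa", "ae", "aas" or "aes",
# preceded by at least one other character.
SUSPICIOUS_ENDING = re.compile(r".+a[ae]s?")

def suspicious_entries(lines):
    """Yield (form, analysis) pairs whose form matches the pattern but whose
    lemma does not itself end in "ae" (which would make the form regular)."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        form, analysis = parts
        lemma = analysis.split("+", 1)[0]
        if SUSPICIOUS_ENDING.fullmatch(form) and not lemma.endswith("ae"):
            yield form, analysis

sample = [
    "reggae\treggae+N+M+SG",
    "sundaes\tsundae+N+M+PL",
    "talassaes\ttalassa+N+F+PL",
    "comentaristas\tcomentarista+N+M+PL",
    "venezuelaa\tvenezuelao+N+F+SG",
]
for form, analysis in suspicious_entries(sample):
    print(form, analysis)
```

The filter only narrows the candidate list; whether a flagged lemma exists at all still has to be checked against a reference dictionary.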

In addition, the following entries have a spurious lemma or analysis:

nouns.gfl.dict:venezuelaas      venezuelao+N+F+PL
nouns.gfl.dict:reflectânciaes   reflectância+N+M+PL

The former does not seem to exist; the latter exists only as a feminine
noun, see https://www.infopedia.pt/dicionarios/lingua-portuguesa/reflectância

Applying the same procedure to adjectives:

~/MorphoBr/adjectives$ grep -E ".+a[ae]s?[[:space:]]" *.dict

The results are classified into groups and commented on below:

  1. spurious final "a" in the inflected form:
adjs.gfl.dict:cagarretaa        cagarreta+A+F+SG
adjs.gfl.dict:cagarretaas       cagarreta+A+F+PL
adjs.gfl.dict:cagarretaes       cagarreta+A+M+PL
  2. missing "d" between final vowels, e.g. it should read "calejado"
    instead of "calejao":
adjs.gfl.dict:calejaa   calejao+A+F+SG
adjs.gfl.dict:calejaas  calejao+A+F+PL
adjs.gfl.dict:camuflaa  camuflao+A+F+SG
adjs.gfl.dict:camuflaas camuflao+A+F+PL
adjs.gfl.dict:camurçaa  camurçao+A+F+SG
adjs.gfl.dict:camurçaas camurçao+A+F+PL
adjs.gfl.dict:canaliculaa       canaliculao+A+F+SG
adjs.gfl.dict:canaliculaas      canaliculao+A+F+PL
adjs.gfl.dict:caparazonaa       caparazonao+A+F+SG
adjs.gfl.dict:caparazonaas      caparazonao+A+F+PL
adjs.gfl.dict:firmamentaa       firmamentao+A+F+SG
adjs.gfl.dict:firmamentaas      firmamentao+A+F+PL
  3. spurious "e" before the plural morpheme -s:
adjs.gfl.dict:cavernícolaes     cavernícola+A+F+PL
adjs.gfl.dict:cavernícolaes     cavernícola+A+M+PL
adjs.gfl.dict:hospitaes hospita+A+F+PL
adjs.gfl.dict:hospitaes hospita+A+M+PL
adjs.gfl.dict:talassícolaes     talassícola+A+F+PL
adjs.gfl.dict:talassícolaes     talassícola+A+M+PL
  4. the following lemma is a Latin noun in the genitive singular; in
    Portuguese it occurs only in the MWE "curriculum vitae":
adjs.gfl.dict:vitae     vitae+A+F+PL
adjs.gfl.dict:vitae     vitae+A+F+SG
adjs.gfl.dict:vitae     vitae+A+M+PL
adjs.gfl.dict:vitae     vitae+A+M+SG
  5. an "n" is missing in the following forms:
~/MorphoBr/adjectives$ grep -E "plutómao" *.dict
adjs.gfl.dict:plutómaa  plutómao+A+F+SG
adjs.gfl.dict:plutómaas plutómao+A+F+PL
adjs.gfl.dict:plutómao  plutómao+A+M+SG
adjs.gfl.dict:plutómaos plutómao+A+M+PL

As shown by the file names, almost all the above errors were inherited from GFL.

@leoalenc leoalenc added the bug Something isn't working label Feb 5, 2020
@lucasrct

Removed entries:

Nouns:

nouns/q-z.delaf.dict:sagueiões  sagueiro+N+F+PL
nouns/q-z.delaf.dict:sagueiro   sagueiro+N+F+SG
nouns.gfl.dict:comentaristaa    comentarista+N+F+SG
nouns.gfl.dict:comentaristaas   comentarista+N+F+PL
nouns.gfl.dict:comentaristaes   comentarista+N+M+PL
nouns.gfl.dict:aa	a+N+M+PL
nouns.gfl.dict:cagarretaa	cagarreta+N+F+SG
nouns.gfl.dict:reflectânciaa	reflectância+N+F+SG
nouns.gfl.dict:venezuelaa	venezuelao+N+F+SG
nouns.gfl.dict:cagarretaa       cagarreta+N+F+SG
nouns.gfl.dict:cagarretaas      cagarreta+N+F+PL
nouns.gfl.dict:cagarretaes      cagarreta+N+M+PL
nouns.gfl.dict:comentaristaa    comentarista+N+F+SG
nouns.gfl.dict:comentaristaas   comentarista+N+F+PL
nouns.gfl.dict:comentaristaes   comentarista+N+M+PL
nouns.gfl.dict:hediondezaes     hediondeza+N+F+PL
nouns.gfl.dict:reflectânciaa    reflectância+N+F+SG
nouns.gfl.dict:reflectânciaas   reflectância+N+F+PL
nouns.gfl.dict:reflectânciaes   reflectância+N+M+PL
nouns.gfl.dict:retransmissoraes retransmissora+N+F+PL
nouns.gfl.dict:talassaes        talassa+N+F+PL
nouns.gfl.dict:talassaes        talassa+N+M+PL
nouns.gfl.dict:venezuelaas      venezuelao+N+F+PL
nouns.gfl.dict:venezuelaa       venezuelao+N+F+SG

Adjectives:

adjs.gfl.dict:cagarretaa	cagarreta+A+F+SG
adjs.gfl.dict:cagarretaas	cagarreta+A+F+PL
adjs.gfl.dict:cagarretaes	cagarreta+A+M+PL
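
A removal like the one listed above could be scripted roughly as follows. This is a hedged sketch rather than the script actually used for the commits: it assumes entries are `form<whitespace>lemma+TAGS` lines with the file-name prefix already stripped, and `remove_entries` is a hypothetical helper.

```python
def remove_entries(dict_lines, spurious_lines):
    """Return dict_lines without any entry that also appears in spurious_lines.
    Entries are compared as (form, analysis) pairs, so tabs vs. spaces do not matter."""
    spurious = {tuple(line.split()) for line in spurious_lines}
    return [line for line in dict_lines if tuple(line.split()) not in spurious]

dictionary = [
    "comentarista\tcomentarista+N+F+SG",
    "comentaristaa\tcomentarista+N+F+SG",
    "comentaristas\tcomentarista+N+F+PL",
    "comentaristaas\tcomentarista+N+F+PL",
]
to_remove = [
    "comentaristaa    comentarista+N+F+SG",
    "comentaristaas   comentarista+N+F+PL",
]
cleaned = remove_entries(dictionary, to_remove)
print("\n".join(cleaned))
```

Comparing parsed pairs instead of raw strings also makes the duplicates in the removal list harmless, since the set keeps each pair once.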

@arademaker

Thanks. I'll wait for the PR before closing the issue. Note: could #59 perhaps reintroduce some of these forms, or derivations of them?

@lucasrct

> Thanks. I'll wait for the PR before closing the issue. Note: could #59 perhaps reintroduce some of these forms, or derivations of them?

@arademaker I'll finish the adjectives and then open the PR. That's true, but perhaps the best option now is to rerun Hélio's scripts on this updated base to avoid that.

lucasrct added a commit that referenced this issue Feb 12, 2020
@arademaker

True. In that case we would be abandoning the idea, from the conversation with @leoalenc, of directly reusing the dict files in Hélio's repository. We would have to wait for his scripts that generate the entries and then run the scripts that are already there to produce new dict files.

lucasrct added a commit that referenced this issue Feb 12, 2020
@lucasrct lucasrct mentioned this issue Feb 12, 2020
@lucasrct

'd' added to the words; all of them have entries in at least one of the dictionaries with pedigree:

adjs.gfl.dict:calejaa   calejao+A+F+SG
adjs.gfl.dict:calejaas  calejao+A+F+PL
adjs.gfl.dict:camuflaa  camuflao+A+F+SG
adjs.gfl.dict:camuflaas camuflao+A+F+PL
adjs.gfl.dict:camurçaa  camurçao+A+F+SG
adjs.gfl.dict:camurçaas camurçao+A+F+PL
adjs.gfl.dict:canaliculaa       canaliculao+A+F+SG
adjs.gfl.dict:canaliculaas      canaliculao+A+F+PL
adjs.gfl.dict:caparazonaa       caparazonao+A+F+SG
adjs.gfl.dict:caparazonaas      caparazonao+A+F+PL
adjs.gfl.dict:firmamentaa       firmamentao+A+F+SG
adjs.gfl.dict:firmamentaas      firmamentao+A+F+PL
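
The "d" repair described here can be sketched as a small string transformation. This is an illustration under assumptions, not the change made in the referenced commits: the helper names `restore_d` and `fix_entry` are hypothetical, and the functions are only meaningful for the hand-verified entries listed above, not for the dictionary as a whole.

```python
def restore_d(word):
    """Insert "d" before the final vowel, keeping an optional plural "s",
    e.g. "calejao" -> "calejado", "calejaas" -> "calejadas"."""
    stem, plural = (word[:-1], "s") if word.endswith("s") else (word, "")
    return stem[:-1] + "d" + stem[-1] + plural

def fix_entry(line):
    """Repair both the inflected form and the lemma of one dictionary entry."""
    form, analysis = line.split()
    lemma, tags = analysis.split("+", 1)
    return f"{restore_d(form)}\t{restore_d(lemma)}+{tags}"

print(fix_entry("calejaa   calejao+A+F+SG"))
print(fix_entry("calejaas  calejao+A+F+PL"))
```

Each repaired form would still need to be checked against the reference dictionaries, as was done in the issue.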

'e' removed from the words:

adjs.gfl.dict:cavernícolaes     cavernícola+A+F+PL
adjs.gfl.dict:cavernícolaes     cavernícola+A+M+PL
adjs.gfl.dict:talassícolaes     talassícola+A+F+PL
adjs.gfl.dict:talassícolaes     talassícola+A+M+PL
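
For the spurious "e", a conservative sketch is to rewrite a form only when it is exactly the lemma plus "es" and the lemma ends in "a"; lemmas that really end in "ae" (such as "sundae") and regular "-es" plurals of consonant-final lemmas are then left alone. The helper name `fix_plural` is hypothetical, and the entry format `form<whitespace>lemma+TAGS` is an assumption.

```python
def fix_plural(line):
    """Replace a plural built as lemma + "es" with lemma + "s" when the
    lemma ends in "a"; return every other entry unchanged (tab-separated)."""
    form, analysis = line.split()
    lemma = analysis.split("+", 1)[0]
    if lemma.endswith("a") and form == lemma + "es":
        form = lemma + "s"
    return f"{form}\t{analysis}"

print(fix_plural("cavernícolaes     cavernícola+A+M+PL"))
print(fix_plural("sundaes  sundae+N+M+PL"))
```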

The word "hospita" does not exist in any of the dictionaries, so I removed these entries:

adjs.gfl.dict:hospitaes hospita+A+F+PL
adjs.gfl.dict:hospitaes hospita+A+M+PL

Removed:

adjs.gfl.dict:vitae     vitae+A+F+PL
adjs.gfl.dict:vitae     vitae+A+F+SG
adjs.gfl.dict:vitae     vitae+A+M+PL
adjs.gfl.dict:vitae     vitae+A+M+SG

"n" added to the words:

adjs.gfl.dict:plutómaa  plutómao+A+F+SG
adjs.gfl.dict:plutómaas plutómao+A+F+PL
adjs.gfl.dict:plutómao  plutómao+A+M+SG
adjs.gfl.dict:plutómaos plutómao+A+M+PL

arademaker pushed a commit that referenced this issue Feb 12, 2020
* resolving issue-62
arademaker pushed a commit that referenced this issue Jul 8, 2020
* resolving issue-62

* resolving issue #62

* Adding Hélio's work, contained in the suffixes folder

* Adding Hélio's work, in the suffixes folder

Co-authored-by: lucasrct <lucasribeiro@gmail.com>