Can we improve the final score by encoding each word and masking numeric entries before classification, rather than classifying at the character level?
#11
Open
shreeshiv opened this issue
May 3, 2020
· 1 comment
I wonder whether we could improve the final score for task 3 by encoding each word and masking numeric entries before classification, rather than classifying at the character level?
Thank you @shreeshiv! Constructing a dictionary is indeed a valid approach and, I believe, a common practice in NLP. And yes, there is a solid chance that it would improve performance. However, it also comes with some disadvantages: we would not be able to detect a word outside the constructed dictionary, and it puts more of the heavy lifting on the encoding step.
In our case, we thought it very likely that non-dictionary words, such as abbreviations, shop names, or menu entries, would appear in the test set. Characters, on the other hand, are easy to encode, can deal with new words, and have yielded satisfactory results.
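To make the out-of-dictionary concern concrete, here is a minimal sketch of a word-level encoder with numeric masking. It is not code from this repository: the `<num>` and `<unk>` tokens, the function names, and the sample strings are all hypothetical, and the regular expression is only one possible definition of a "numeric entry".

```python
import re
from collections import Counter

NUM_TOKEN = "<num>"  # hypothetical placeholder for masked numeric entries
UNK_TOKEN = "<unk>"  # hypothetical placeholder for out-of-dictionary words

def tokenize(text):
    """Lowercase, split on whitespace, and mask tokens that look numeric."""
    tokens = []
    for tok in text.lower().split():
        # Assumption: a numeric entry is digits plus common separators.
        if re.fullmatch(r"[\d.,/:-]*\d[\d.,/:-]*", tok):
            tokens.append(NUM_TOKEN)
        else:
            tokens.append(tok)
    return tokens

def build_vocab(texts, min_count=1):
    """Map each sufficiently frequent training token to an integer id."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = {UNK_TOKEN: 0, NUM_TOKEN: 1}
    for tok, count in counts.most_common():
        if count >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    """Encode text as ids; unseen words collapse to the <unk> id."""
    return [vocab.get(tok, vocab[UNK_TOKEN]) for tok in tokenize(text)]

# Made-up training strings, for illustration only.
train = ["TOTAL 12.50", "CASH 20.00", "TOTAL 7.90"]
vocab = build_vocab(train)

known = encode("TOTAL 3.10", vocab)      # word seen in training
unseen = encode("STARBKS 3.10", vocab)   # abbreviation never seen
```

Here every unseen word receives the same id as any other unseen word, so the classifier has nothing to distinguish one new shop name from another.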
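As a rough illustration of why characters cope with new words, the sketch below encodes text one character at a time. The character set, the reserved index 0, and the function name are assumptions made for this example, not the encoding used in this repository.

```python
import string

# Hypothetical character set; id 0 is reserved for anything outside it.
CHARS = string.ascii_uppercase + string.digits + string.punctuation + " "
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARS)}

def encode_chars(text):
    """Encode text as one id per character, after uppercasing."""
    return [CHAR_TO_ID.get(ch, 0) for ch in text.upper()]

# A made-up abbreviation that no word dictionary would contain.
ids = encode_chars("Starbks 3.10")
```

Every character of the unseen abbreviation still maps to a known id, so nothing is lost at the encoding step. The trade-off is a longer sequence for the classifier to process.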
However, I do encourage you to explore a word-based approach if you would like!