This repo is amazing for quickly finding recent papers on a particular NLP problem. Thanks a lot for creating it! However, the leaderboards would be even more useful if they distinguished between methods that use additional data and those that use only the supplied dataset. See this blog post by Anna Rogers for a more detailed discussion of why this distinction matters.
For instance, consider the IMDB leaderboard on http://nlpprogress.com/english/sentiment_analysis.html. The top few methods use 100GB of text, whereas virtual adversarial training (Miyato et al., 2016) at #7 uses only the 25K labeled and 50K unlabeled observations supplied as part of the IMDB task. That is very useful to know.
Pushing the bigger models to the limit is important, as they can teach us how far the current paradigm can be taken and where its fundamental limitations lie, but it's similarly valuable to know the best approaches for training in a low-resource setting w/o a large pretrained model. Perhaps a simple solution would be to split the leaderboards into an "all tricks allowed" section and an "only supplied dataset" section?
Hi Bjarke, thanks a lot for the suggestion! I think that's a really important distinction to make and I'd be really happy if we could surface it going forward. Alternatively, for some tasks people have already been using an asterisk next to the method name to highlight particular settings of a method. It might make sense to go with that convention, since it's also helpful to know exactly which additional resources a method employed, and it's less obtrusive.
The asterisk approach makes sense, though for some benchmarks it would require many different symbols. On the IMDB dataset of the sentiment analysis leaderboard, for instance, XLNet uses Wikipedia, BooksCorpus, Giga5, ClueWeb, and Common Crawl, whereas BERT_large and BERT_base use Wikipedia and BooksCorpus. If this repo went with asterisk-style annotations, perhaps the simplest approach would be to use a different symbol for each resource?
These symbols could be reused across methods, similar to the author affiliation symbols in research publications. That would make it easy to compare methods w/o having to read a footnote for each one; see the rough sketch below. What do you think?
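For concreteness, here is a rough sketch of what that could look like in one of the repo's markdown tables. The column layout and the specific symbols are just illustrative, and the scores are left as placeholders:

```markdown
| Model                        | Accuracy | Paper / Source       |
| ---------------------------- | -------- | -------------------- |
| XLNet † ‡ § ¶ ♦              | …        | Yang et al. (2019)   |
| BERT_large † ‡               | …        | Devlin et al. (2018) |
| Virtual adversarial training | …        | Miyato et al. (2016) |

† Wikipedia ‡ BooksCorpus § Giga5 ¶ ClueWeb ♦ Common Crawl
```

A short legend at the bottom of each table would then document the extra resources once, rather than repeating them in per-method footnotes.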