What are search_pubs() best practices and expectations? #430
-
There's a small chance that … run `python -m unittest test_module.TestLuminati` and see if it passes. You should have …
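In case it helps, a minimal smoke test along those lines might look like the sketch below. It assumes scholarly's `ProxyGenerator` still exposes a `Luminati()` helper that takes BrightData (formerly Luminati) credentials and returns True on success; the credentials, port, and query are placeholders.

```python
from scholarly import scholarly, ProxyGenerator

# Placeholder credentials for a BrightData/Luminati data-center proxy.
pg = ProxyGenerator()
ok = pg.Luminati(usr="your-username", passwd="your-password", proxy_port=22225)
if not ok:
    raise RuntimeError("Proxy setup failed; check credentials and port")

scholarly.use_proxy(pg)

# Smoke test: any small query that should return at least one result.
first_hit = next(scholarly.search_pubs("machine learning"), None)
print("Proxy seems to work" if first_hit else "No results came back")
```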
-
Coming from issue #291. So I tried ScraperAPI like @stanleyrhodes and it worked so well. Bottom line: this won't be enough, but in my case I have no budget for paying for a scraping API. One Python-noob side question: I saved my results with

```python
with open(filename, "w", encoding="utf-8") as f:
    for pub in scholarly.search_pubs(query="my query"):
        f.write(str(pub) + "\n")
```

but it is not that easy to reuse these lines now (if I ever want to try …
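If the goal is to load those results back later, writing one JSON object per line (instead of `str(pub)`) keeps them reusable. A rough sketch, assuming each result from `search_pubs()` behaves like a plain, JSON-serializable dict; the filename and query are placeholders.

```python
import json

from scholarly import scholarly

filename = "results.jsonl"

# Write one JSON record per line so the file can be parsed again later.
with open(filename, "w", encoding="utf-8") as f:
    for pub in scholarly.search_pubs("my query"):
        f.write(json.dumps(pub) + "\n")

# Reload the saved records without hitting Google Scholar again.
with open(filename, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
```

Each saved line can then be rebuilt with `json.loads`, which the `str(pub)` output does not support directly.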
-
One alternative would be to have multiple accounts, say one for each person in your research group, and pool all your credits together. If …
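One way to pool credits in code would be to keep a list of API keys (one per group member) and fall back to the next key whenever the current one stops working. A rough sketch, assuming `ProxyGenerator.ScraperAPI()` returns a truthy value when setup succeeds; the keys are placeholders.

```python
from scholarly import scholarly, ProxyGenerator

# Hypothetical pooled keys, one per group member.
API_KEYS = ["key-alice", "key-bob", "key-carol"]

def use_next_working_key(keys):
    """Try each ScraperAPI key in turn until one sets up successfully."""
    for key in keys:
        pg = ProxyGenerator()
        if pg.ScraperAPI(key):
            scholarly.use_proxy(pg)
            return key
    raise RuntimeError("No working ScraperAPI key left in the pool")

active_key = use_next_working_key(API_KEYS)
print(f"Using key: {active_key}")
```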
-
The captchas option might be interesting, but I am not sure in which context you suggest using it.
-
Good to know about that program. As you say, its built-in rate limiting for GS might be the key point for this issue.
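Even without that program, a crude form of client-side rate limiting can be wrapped around `search_pubs()`. A sketch, assuming a fixed pause between results is enough to slow the request rate; the delay and result cap are arbitrary placeholders, not recommendations.

```python
import time

from scholarly import scholarly

DELAY_SECONDS = 30   # arbitrary placeholder; tune to taste
MAX_RESULTS = 20     # stop after a small batch

results = []
for pub in scholarly.search_pubs("my query"):
    results.append(pub)
    if len(results) >= MAX_RESULTS:
        break
    # Results arrive in pages, so not every iteration triggers a new request,
    # but sleeping here still caps the overall request rate.
    time.sleep(DELAY_SECONDS)
```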
-
I have some general questions about search_pubs() that are more discussion prompts for people who use search_pubs than precise QnA-style questions. It would be useful to know what others have been able to do, to help set new users' expectations (including mine).
Background: I've had little success (or luck) using search_pubs to build a computationally aided lit review. In my case, I was trying to find papers containing particular phrases that defined a technical term whose usage varied enough that it wasn't quite yet a technical term. Terms like these are places where researchers tend to talk past one another, so working on them holds promise for advancing research in that area. Mapping their usage requires a lot of searching and a lot of papers: depending on the combination of phrases used and phrases disallowed, I could get roughly 5,000 to 30,000 results. I didn't expect to do all of that in one search; I was planning to do smaller sets (~500-1000) and then combine them, but that's still far too much to do by hand. That's why I started investigating and playing with Scholarly.

Although I did once succeed in retrieving ~500 results from a test search, I usually fail to get anything, even for small searches with fewer than 20 results. I went from roughly an 80% failure rate to a 99% failure rate. It may be that I'm doing a lot of things wrong, or it may be that GS is just too good at blocking anything and everything search_pubs() does. I don't recall having a single success with FreeProxies. I've been using BrightData (formerly Luminati) data center proxies, and I've tried both rotating and longer-term IPs. I've tried limiting the data centers to a particular country or leaving it global. I've had the most luck with global, rotating IPs.
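To make the "smaller sets" idea concrete, the sketch below is roughly what I have in mind: pull one batch, checkpointing to disk as it goes, so a blocked request partway through doesn't lose what was already retrieved. It assumes each result is JSON-serializable and that any exception from the iterator means GS or the proxy has cut the run off; the query, filename, and sizes are placeholders.

```python
import json

from scholarly import scholarly

QUERY = '"my exact phrase" -"excluded phrase"'  # placeholder query
BATCH_SIZE = 500         # target size for one run
CHECKPOINT_EVERY = 25    # flush to disk this often

def run_batch(query, out_path):
    """Pull up to BATCH_SIZE results for one query, checkpointing as we go."""
    saved = 0
    try:
        with open(out_path, "a", encoding="utf-8") as f:
            for pub in scholarly.search_pubs(query):
                f.write(json.dumps(pub) + "\n")
                saved += 1
                if saved % CHECKPOINT_EVERY == 0:
                    f.flush()
                if saved >= BATCH_SIZE:
                    break
    except Exception as exc:  # GS block, proxy failure, etc.
        print(f"Stopped early after {saved} results: {exc}")
    return saved

print(run_batch(QUERY, "batch_01.jsonl"))
```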
What is your success rate with search_pubs searches? Roughly how big is your expected number of results (e.g. 10, 100, 1000)? Do you use a proxy, and if so, which one, and with what settings?