The current implementation of concurrency using `Promise` is causing errors such as `Error checking robots.txt for {URL}: fetch failed`, which is likely due to too many concurrent requests, and in turn causes `Skipping {URL}: Not allowed by robots.txt`. Attaching a screenshot of the error. Upon manual inspection of the failed cases, I found that `robots.txt` does exist.

After some investigation, I believe that limiting the number of concurrent requests in the `checkRobotsAndEnqueue` function with `const limit = pLimit(10);` yields better results and fewer failed cases. @in-c0 I will be opening a PR soon.
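For reference, a minimal sketch of how the `p-limit` wrapper could be applied. Only the function name `checkRobotsAndEnqueue` and the `const limit = pLimit(10);` line come from this issue; the function body and the `crawl` helper here are illustrative assumptions, not the repo's actual code:

```js
import pLimit from 'p-limit';

// Cap concurrent robots.txt checks at 10 instead of firing them all at once.
const limit = pLimit(10);

async function checkRobotsAndEnqueue(url) {
  // Illustrative body; the real function parses robots.txt and enqueues the URL.
  try {
    const robotsUrl = new URL('/robots.txt', url).href;
    const res = await fetch(robotsUrl);
    if (!res.ok) return; // non-200 handling elided
    // ... parse res and enqueue url if allowed ...
  } catch (err) {
    console.error(`Error checking robots.txt for ${url}:`, err);
  }
}

// Hypothetical driver showing the change:
async function crawl(urls) {
  // Before: await Promise.all(urls.map(checkRobotsAndEnqueue)) — unbounded concurrency.
  // After: each call goes through the limiter, so at most 10 fetches are in flight.
  await Promise.all(urls.map((url) => limit(() => checkRobotsAndEnqueue(url))));
}
```

The limiter keeps the full `Promise.all` fan-out shape but queues any call beyond the tenth, which is what appears to stop the `fetch failed` errors.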