
[BUGS] Current implementation of concurrency is causing errors #6

Open
pradhanhitesh opened this issue Jan 14, 2025 · 0 comments · May be fixed by #7

pradhanhitesh (Contributor) commented Jan 14, 2025

The current implementation of concurrency using Promise is causing errors such as Error checking robots.txt for {URL}: fetch failed, which is likely due to too many concurrent requests, and in turn produces Skipping {URL}: Not allowed by robots.txt. Attaching a screenshot of the error. Upon manual inspection of the failed cases, I found that robots.txt does exist for those URLs.

[Screenshot of the error output]

After some investigation, I believe that limiting the number of concurrent requests in the checkRobotsAndEnqueue function using const limit = pLimit(10); yields better results and fewer failed cases. @in-c0 I will be adding a PR soon.
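For context, a minimal sketch of the idea: instead of firing every robots.txt fetch at once via Promise.all, each fetch is wrapped in a limiter so that at most 10 are in flight at a time. The pLimit function below is a hand-rolled stand-in for the npm p-limit package mentioned above, and checkRobotsAndEnqueue here is a hypothetical placeholder for the project's actual function, not its real signature.

```javascript
// Minimal stand-in for p-limit: allow at most `max` promise-returning
// tasks to run concurrently; the rest wait in a FIFO queue.
function pLimit(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => {
      active--;
      next(); // a slot freed up, start the next queued task
    });
  };
  // Returns a wrapper: call it with a () => Promise task.
  return (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}

// Hypothetical usage mirroring the fix: cap robots.txt checks at 10
// concurrent requests instead of launching them all simultaneously.
const limit = pLimit(10);
async function checkAllRobots(urls, checkRobotsAndEnqueue) {
  return Promise.all(urls.map((url) => limit(() => checkRobotsAndEnqueue(url))));
}
```

With this pattern, Promise.all still waits for every URL, but the fetches trickle through in batches of at most 10, which avoids overwhelming the network stack or tripping remote rate limits.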

@pradhanhitesh pradhanhitesh linked a pull request Jan 15, 2025 that will close this issue