maxmwang/jobet

jobet

Distributed web/API scraping implementation, migrated from goscrape.

New features:

  • Pub-sub output: ZeroMQ is the primary form of output for scrape results, decoupling all handlers from the scraping daemon.
  • Priority-rated companies: Higher-priority companies are scraped more frequently and lower-priority companies less frequently, which reduces the outbound request rate.
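
As an illustration of the priority-rated scheduling described above, here is a minimal sketch that uses a min-heap keyed on each company's next due time. It is not jobet's actual scheduler: the `INTERVALS` table, the `Job` type, the `simulate` function, and the company names are all hypothetical, chosen only to show how a per-priority interval lowers the number of outbound requests.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical mapping from priority level to scrape interval in seconds.
# Priority 1 is the highest priority and therefore has the shortest interval.
INTERVALS = {1: 60, 2: 300, 3: 900}


@dataclass(order=True)
class Job:
    due: float                              # time of the next scrape; the only sort key
    company: str = field(compare=False)
    priority: int = field(compare=False)


def simulate(companies, horizon):
    """Return the (time, company) scrapes that fall in [0, horizon) seconds."""
    queue = [Job(0.0, name, prio) for name, prio in companies.items()]
    heapq.heapify(queue)
    runs = []
    while queue and queue[0].due < horizon:
        job = heapq.heappop(queue)
        runs.append((job.due, job.company))
        # Reschedule the same company one interval later.
        next_due = job.due + INTERVALS[job.priority]
        heapq.heappush(queue, Job(next_due, job.company, job.priority))
    return runs


if __name__ == "__main__":
    for when, company in simulate({"acme": 1, "globex": 3}, horizon=300):
        print(f"t={when:>5.0f}s scrape {company}")
```

A real daemon would sleep until the earliest due time rather than simulate a clock, but the queue discipline would be the same.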

Design

Excalidraw Link.

design.png

Technologies

  • SQLite: Lightweight SQL database
  • gRPC: Lightweight communication between services
  • ZeroMQ: Brokerless message queue
  • Supabase: Open-source cloud platform
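
To show why pub-sub output decouples handlers from the scraping daemon, the sketch below is an in-process stand-in for a publisher with prefix-filtered subscribers. It does not use ZeroMQ itself. It only mimics the way ZeroMQ SUB sockets filter messages by topic prefix. The `Bus` class, the topic names, and the payload shape are assumptions made for illustration and are not taken from jobet.

```python
class Bus:
    """In-process stand-in for a pub-sub socket pair with prefix filtering."""

    def __init__(self):
        self._subs = []  # list of (prefix, handler) pairs

    def subscribe(self, prefix, handler):
        # An empty prefix matches every topic.
        self._subs.append((prefix, handler))

    def publish(self, topic, payload):
        """Deliver payload to every matching handler; return the delivery count."""
        delivered = 0
        for prefix, handler in self._subs:
            if topic.startswith(prefix):
                handler(topic, payload)
                delivered += 1
        return delivered


if __name__ == "__main__":
    bus = Bus()
    # Two independent handlers; the publisher knows nothing about either.
    bus.subscribe("jobs.", lambda t, p: print("notifier got", t, p))
    bus.subscribe("", lambda t, p: print("logger got", t, p))
    bus.publish("jobs.acme", {"title": "Engineer"})
```

The publisher never references a handler directly, so handlers can be added or removed without touching the scraping code.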
