Creepy Crawler is a full-stack search engine application inspired by popular search engines. It lets the user submit queries, review their search history, and set a theme.
- Queries from the frontend are received asynchronously by Flask, which uses the Crochet library to process them and dispatch them to the Scrapy spiders:
```python
import re

import crochet
from pydispatch import dispatcher
from scrapy import signals

crochet.setup()


@crochet.wait_for(timeout=200.0)
def scrape_with_crochet(raw_query):
    """Run the crawl inside Crochet's reactor thread; block until it finishes."""
    partitioned_query = ...
    query_regex = re.compile(...)
    # Collect items from every spider as they are scraped.
    dispatcher.connect(_crawler_result, signal=signals.item_scraped)
    spiders = [...]
    if len(partitioned_query):
        for spider in spiders:
            crawl_runner.crawl(spider, query_regex=query_regex)
    # wait_for() needs the Deferred back so it can block on it.
    return crawl_runner.join()
```
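The `_crawler_result` callback connected above fires once per scraped item. A minimal sketch of what it might look like; the shared `output_data` buffer is an illustrative name, not necessarily the app's:

```python
# Shared buffer that Flask reads after scrape_with_crochet() returns
# (illustrative name; the real app may store results differently).
output_data = []


def _crawler_result(item, response, spider):
    """Receive each item_scraped signal and stash a plain-dict copy."""
    output_data.append(dict(item))
```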
- Settings are passed from the Flask backend to the Scrapy framework through a configuration object:
```python
import json

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings

# Start from the project's Scrapy settings, then overlay the user's values.
settings = get_project_settings()
with open('app/api/routes/settings.json') as f:
    settings_dict = json.load(f)
settings.update(settings_dict)
crawl_runner = CrawlerRunner(settings)
```
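Which settings the app exposes in `settings.json` is up to the backend; an illustrative guess at the kind of keys it might carry (the names are genuine Scrapy settings, the values and selection are assumptions):

```python
# Illustrative contents of settings.json once parsed -- the real file's
# keys and values are assumptions, but these are genuine Scrapy settings.
settings_dict = {
    'DEPTH_LIMIT': 2,              # how many links deep a broad crawl may go
    'CLOSESPIDER_PAGECOUNT': 100,  # stop each spider after this many pages
    'CONCURRENT_REQUESTS': 32,     # global request parallelism
}
```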
- Each spider runs a broad crawl through the web, starting from a seed URL.
```python
import re

import scrapy


class BroadCrawler2(scrapy.Spider):
    """Broad crawling spider."""
    name = 'broad_crawler_2'
    start_urls = ['https://example.com/']

    def parse(self, response):
        """Yield matching text nodes, then follow every link on the page."""
        try:
            # All visible text nodes, skipping script and style contents.
            all_text = response.css('*:not(script):not(style)::text')
            for text in all_text:
                if re.search(self.query_regex, text.get()):
                    yield {
                        'url': response.request.url,
                        'text': text.get(),
                    }
        except Exception:
            self.logger.warning('End of the line error for %s.', self.name)
        yield from response.follow_all(css='a::attr(href)', callback=self.parse)
```
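Because `crawl_runner.crawl(spider, query_regex=query_regex)` passes `query_regex` as a spider argument, Scrapy sets it as an instance attribute, which is how `self.query_regex` becomes available in `parse()`. A quick, self-contained way to exercise the parse logic against a canned response (the HTML body and query are made up):

```python
import re

from scrapy.http import HtmlResponse, Request

url = 'https://example.com/'
body = b'<html><body><p>creepy crawler test</p><a href="/next">next</a></body></html>'
response = HtmlResponse(url=url, body=body, request=Request(url))

spider = BroadCrawler2(query_regex=re.compile(r'crawler'))
for result in spider.parse(response):
    print(result)  # one scraped item dict, then a Request for /next
```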
- AWS integration lets users upload custom backgrounds and profile images of their choice.
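The AWS wiring isn't shown in this section; a minimal sketch assuming the images are stored in S3 via boto3 (the bucket name, key scheme, and helper are hypothetical):

```python
import boto3

s3 = boto3.client('s3')


def upload_profile_image(user_id, file_obj, content_type='image/png'):
    """Store a user-supplied image in S3 and return its key (hypothetical helper)."""
    key = f'profile-images/{user_id}.png'
    s3.upload_fileobj(
        file_obj,
        'creepy-crawler-uploads',  # hypothetical bucket name
        key,
        ExtraArgs={'ContentType': content_type},
    )
    return key
```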
- The user can conveniently switch between 24-hour and 12-hour time formats.
- Moreover, NATO timezone abbreviations are parsed specially for users whose native settings differ from the default; see the sketch below.
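A rough illustration of both time features: mapping NATO zone letters to UTC offsets and toggling between 24-hour and 12-hour display (the function name and exact behavior are assumptions, not the app's actual implementation):

```python
from datetime import datetime, timedelta, timezone

# NATO letters: Z is UTC, A-I are +1..+9, K-M are +10..+12 (J is skipped,
# meaning local time), and N-Y are -1..-12.
NATO_OFFSETS = {'Z': 0}
NATO_OFFSETS.update({chr(ord('A') + i): i + 1 for i in range(9)})
NATO_OFFSETS.update({'K': 10, 'L': 11, 'M': 12})
NATO_OFFSETS.update({chr(ord('N') + i): -(i + 1) for i in range(12)})


def format_time(dt_utc, nato_letter='Z', use_24_hour=True):
    """Shift a UTC datetime by a NATO zone letter, then format it."""
    tz = timezone(timedelta(hours=NATO_OFFSETS[nato_letter.upper()]))
    return dt_utc.astimezone(tz).strftime('%H:%M' if use_24_hour else '%I:%M %p')


print(format_time(datetime.now(timezone.utc), 'A', use_24_hour=False))
```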