Hey, I'm newer to Elixir and very new to Crawly, and I'm trying to figure out what's going on. I have all the crawling logic working, and I have the data I want to save to an Ecto model (currently the variable `head` holds the data I'd be saving). I'm just confused about why Crawly keeps outputting the current crawl speed. Is this process supposed to end at some point, or is it crawling every minute? I don't know why it logs the crawl speed each minute but only outputs what I need once. Do I need to kill the process or something? Ideally, I'd like to crawl the site every x amount of time. I tried starting the same process again, but it errors saying the spider is already started. So if the spider is already started and the process never ends, how do I get updated data from scraping the website again? Thanks in advance!
Replies: 1 comment
Probably very late, but for the record: you must stop the crawler yourself, or set a limit so it stops on its own, for instance with
`closespider_timeout` or `closespider_itemcount`.
More info in the Crawly configuration documentation.
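A minimal sketch of how those limits might look in `config/config.exs` — the numeric values here are placeholder assumptions, not recommendations:

```elixir
# config/config.exs
import Config

config :crawly,
  # Stop the spider once this many items have passed the item pipeline.
  closespider_itemcount: 500,
  # Stop the spider when the crawl speed drops below this many items
  # per minute, which is a reasonable signal that the crawl is done.
  closespider_timeout: 10
```

For re-crawling on a schedule, one option is to stop the spider explicitly and start it again, e.g. `Crawly.Engine.stop_spider(MySpider)` followed later by `Crawly.Engine.start_spider(MySpider)` from a scheduled process (here `MySpider` is a hypothetical spider module name).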