Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retriable errors and backoffs #898

Open
jsha opened this issue Apr 4, 2022 · 2 comments
Open

Retriable errors and backoffs #898

jsha opened this issue Apr 4, 2022 · 2 comments

Comments

@jsha
Copy link
Contributor

jsha commented Apr 4, 2022

I'm working on a tool that uses fetcher.go, and also calls GetProofByHash. I'd like to configure it to do backoffs and retries. I see that fetcher.go uses a Backoff struct provided internally:

bo := &backoff.Backoff{
Min: 1 * time.Second,
Max: 30 * time.Second,
Factor: 2,
Jitter: true,
}
var resp *ct.GetEntriesResponse
// TODO(pavelkalinnikov): Report errors in a LogClient decorator on failure.
if err := bo.Retry(ctx, func() error {
var err error
resp, err = f.client.GetRawEntries(ctx, r.start, r.end)
return err
}); err != nil {
glog.Errorf("%s: GetRawEntries() failed: %v", f.uri, err)
// There is no error reporting yet for this worker, so just retry again.
continue
}
fn(EntryBatch{Start: r.start, Entries: resp.Entries})
r.start += int64(len(resp.Entries))

But I think that usage might be broken. Backoff will only retry on grpc error codes that are retriable, plus the RetriableError struct. The GetRawEntries call I linked to is implemented by jsonclient.GetAndParse, which returns RspErrors. Those RspErrors don't match any of the conditions for retries.

I'd like to propose this: In addition to / instead of backoff.RetriableError, the backoff package should check for some interface, e.g. Retryable() bool. Then jsonclient.RspError can implement Retryable() as: a RspError is retryable if its wrapped error is a net.Timeout error OR the HTTP status code is >500.

What do you think?

@pav-kv
Copy link
Contributor

pav-kv commented May 11, 2022

Good catch, I think you are right. The retry will still happen because of the continue in line 289, but there will be no backoff in this case.

@rgdd
Copy link

rgdd commented Jan 6, 2025

Hi there, I just noticed this as well! I think the intent is that the backoff
should be working out-of-the-box for get-entries (CC @mhutchinson )? See:

#747

But as mentioned by @jsha, client.LogClient wraps errors as
jsonclient.RspError which are not retry:able.

Here are some options I can think of wrt. getting this fixed:

  1. Update the comments to clearly state how the backoff does (not) work. In
    particular, it doesn't work with client.LogClient. But it is possible to
    define your own implementation of fetcher.LogClient that returns the
    appropriate retry:able errors. I verified that this works
    here.
  2. Update func (f *Fetcher) runWorker so that it actually does retries by
    wrapping the errors similar to how it's done for the get-sth endpoint.
    This behavior would be more consistent and easier to understand.
  3. See also the ideas by @jsha.

(I omitted a few options that would clearly be breaking changes.)

On retry behavior: I agree that backoff and retry on 429 and 5XX makes sense in
general. I opted for 4XX and 5XX because I just want to avoid the silentct
software from hammering things "silently"; now it will poke slowly and
eventually the user will get a notification that it's behind on downloading.

It's worth noting that the current behavior is somewhat confusing. That is,
backoff does work for the get-sth endpoint; and it does print things that are
hard to silence for get-entries unless it's HTTP status 429
; but it does not
do any rate-limiting on 429 despite the comments and code snippets for this.

I don't have a strong opinion on what the fix for this issue should be, but I
think it would be good to commit something that makes it less likely that people
mis-use this library thinking they run with backoff when in fact they're not!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants