Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add configurable retry with exponential backoff to HTTP requests #34847

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

jannatkhandev
Copy link
Contributor

@jannatkhandev jannatkhandev commented Dec 30, 2024

Proposed changes

Add HTTP Retry Mechanism

The HTTP accessor in the Apps Engine now supports automatic retries with exponential backoff for failed requests. This feature helps apps handle transient network failures and API rate limits more gracefully.

Configuration

The retry mechanism is disabled by default and can be enabled per request through the retry configuration option in the HTTP request options.

Default Configuration

{
  enabled: false,    // Retry mechanism disabled by default
  maxAttempts: 3,    // Maximum number of attempts including the initial request
  initialDelay: 1000, // Initial delay in milliseconds before first retry
  statusCodesToRetry: [
            HttpStatusCode.SERVICE_UNAVAILABLE,
            HttpStatusCode.INTERNAL_SERVER_ERROR,
            HttpStatusCode.TOO_MANY_REQUESTS,
            HttpStatusCode.GATEWAY_TIMEOUT,
            HttpStatusCode.BAD_GATEWAY
        ]
}

Configuration Options

Option Type Default Description
enabled boolean false Enables/disables the retry mechanism
maxAttempts number 3 Maximum number of attempts including the initial request
initialDelay number 1000 Initial delay in milliseconds before first retry
statusCodesToRetry number[] [503, 500, 429, 504, 502] Status codes that trigger a retry

Usage Examples

Basic Usage with Default Settings

import { IHttp } from '@rocket.chat/apps-engine/definition/accessors';

export async function makeRequest({ http }: { http: IHttp }) {
    const response = await http.get('https://api.example.com/data', {
        retry: {
            enabled: true // Uses default maxAttempts and initialDelay
        }
    });
    return response;
}

Custom Configuration

import { IHttp } from '@rocket.chat/apps-engine/definition/accessors';

export async function makeRequest({ http }: { http: IHttp }) {
    const response = await http.get('https://api.example.com/data', {
        retry: {
            enabled: true,
            maxAttempts: 5,     // Try up to 5 times
            initialDelay: 2000  // Start with 2 second delay
        }
    });
    return response;
}

How It Works

When enabled, the retry mechanism will:

  1. Attempt the HTTP request
  2. If the request fails:
    • If retries are disabled or max attempts reached: throw the error
    • Otherwise: wait for the calculated delay and retry
  3. For each retry:
    • The delay increases exponentially: initialDelay * (2 ^ attemptNumber)
    • Example with initialDelay = 1000:
      • First retry: 1000ms delay
      • Second retry: 2000ms delay
      • Third retry: 4000ms delay

Error Handling

The final error from the last retry attempt will be thrown after all retries are exhausted. This error maintains the original structure and stack trace from the failed request.

try {
    await http.get('https://api.example.com/data', {
        retry: {
            enabled: true,
            maxAttempts: 3
        }
    });
} catch (error) {
    // Error from the last failed attempt
    console.error('All retries failed:', error);
}

Best Practices

  1. Selective Usage: Enable retries only for operations that are safe to retry (idempotent operations)
  2. Timeout Configuration: Consider setting appropriate request timeouts along with retry configuration
  3. Monitoring: Monitor retry patterns to identify recurring issues with external services

Interface Definition

interface IHttpRetryConfig {
    /** Whether retry functionality is enabled */
    enabled?: boolean;
    /** Maximum number of retry attempts */
    maxAttempts?: number;
    /** Initial delay in milliseconds before first retry */
    initialDelay?: number;
    /** Status codes that trigger a retry */
    statusCodesToRetry?: HttpStatusCode[];
}

interface IHttpRequest {
    // ... existing options ...
    retry?: IHttpRetryConfig;
}

Implementation Details

The retry mechanism is implemented in the Http class of the Apps Engine. It preserves all existing functionality including:

  • Pre-request handlers
  • Pre-response handlers
  • Default headers and parameters
  • Error handling

Each retry attempt is not treated as a new request, all pre-request and pre-response handlers run once to prevent possible side effects.

@jannatkhandev jannatkhandev requested a review from a team as a code owner December 30, 2024 14:38
Copy link
Contributor

dionisio-bot bot commented Dec 30, 2024

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

Copy link

changeset-bot bot commented Dec 30, 2024

⚠️ No Changeset found

Latest commit: e71db48

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@Gustrb
Copy link
Contributor

Gustrb commented Dec 30, 2024

Thanks for the contribution!!
This seems like a really interesting approach, I have some concerns about:

  • When should we retry, I think we should retry if and only if the response is a 503 or a 500
  • Should we (potentially) call the pre http request handlers more than once(?) won't that be problematic?

@jannatkhandev
Copy link
Contributor Author

@Gustrb
Thank you for the valuable feedback! I completely agree with your points.
Yes, my initial intention was actually to be more selective with retries. The retry mechanism should primarily handle transient failures like service unavailability (503) and network errors, rather than retrying on all error types.
Here's what I propose for the retry conditions:

  1. Network/connection errors (e.g., timeout, connection refused)
  2. HTTP 503 (Service Unavailable) - perfect candidate for retries as it indicates temporary unavailability
  3. HTTP 500 (Internal Server Error) - could be included as these might be transient server issues
  4. Rate limit responses (HTTP 429) - could also be a candidate with proper delay handling
  5. or more 4xx and 5xx responses?

Regarding the pre-request handlers - point to be discussed. While the current implementation maintains consistency by running them on each attempt, we should discuss if this is the desired behavior or if we should preserve the handlers' result from the initial attempt?

@jannatkhandev
Copy link
Contributor Author

@Gustrb
I have made changes addressing your review, can you PTAL?
Thanks.

@jannatkhandev jannatkhandev requested a review from Gustrb January 1, 2025 08:17
@Gustrb
Copy link
Contributor

Gustrb commented Jan 1, 2025

Hey @jannatkhandev I looked at the good and it mostly seems good to me, here is the course of actions I will take(and have been taking since Yesterday);

  • Discuss internally with the more experienced developers in the apps engine
  • Check if approach is the best one (we are thinking of not having this embedded into the bridge, but instead have it as an available wrapper so that apps can use it, but are not forced to it)

The issue of calling the pre-http handlers concerns me, it seems like something that could end up breaking some external integration and I am skeptical of not having it as a breaking change (it is not really a breaking change, but we would be potentially changing the behaviour of those hooks which is not ideal).

I'll keep you updated on this one, thanks for the good work :)

import type { AppBridges } from '../bridges/AppBridges';
import type { AppAccessorManager } from '../managers/AppAccessorManager';

export class Http implements IHttp {
private static readonly DEFAULT_RETRY_CONFIG: IHttpRetryConfig = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of having this hardcoded in here, we should have settings so the user can configure that in their workspace, wdyt @d-gubert ??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even a retry policy inside the app's settings, telling how should it retry. But I am not sure if it would not be overkill for now though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the app's settings, like having default retry config for the app.
For the workspace setting as these changes revolve around apps using Http API, I feel it as an overkill.
Happy to make changes as advised :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I think having it as a workspace setting would be the easier way off.

@jannatkhandev
Copy link
Contributor Author

@Gustrb thanks for your prompt reply.
Requesting you to please add me to the group conversation with @d-gubert @rodrigok sir on our open server? (on open.rocket.chat, my username is jannat.khan)
I have several improvement ideas for the apps engine, I will be happy to create sprints for the same and work on the tasks.

@Gustrb
Copy link
Contributor

Gustrb commented Jan 2, 2025

Hey, I don't think we need to create sprints &/or tasks to implement those.
But I'll connect with you on open so we can discuss your ideas wdyt?

@jannatkhandev
Copy link
Contributor Author

@Gustrb Moved retry logic to higher order function as advised, PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants