Building Resilient Systems: A Practical Guide to Circuit Breakers, Retries, and Backpressure

In distributed systems, failures are inevitable. Whether the cause is a network glitch, a slow downstream service, or a complete outage, resilient systems are designed to handle these failures gracefully. In this post, we'll explore three key patterns for building resilient systems: Circuit Breakers, Retries, and Backpressure. Each pattern comes with a code example in JavaScript.


1. Circuit Breakers

A circuit breaker is a design pattern that stops a system from sending requests to a failing service. It acts like an electrical circuit breaker: when failures cross a threshold (a failure rate or a count of consecutive failures), the circuit "opens," and requests are no longer sent to the failing service.

Why Use Circuit Breakers?

  • To avoid overwhelming a failing service.
  • To fail fast and provide fallback responses.
  • To allow the failing service time to recover.

How Circuit Breakers Work

Circuit breakers typically operate in three states:

  1. Closed: Requests flow normally. Failures are tracked.
  2. Open: Requests are rejected immediately, without reaching the service, for a set period to allow the service to recover.
  3. Half-Open: A limited number of requests are allowed to test if the service has recovered.

If the test requests succeed, the circuit transitions back to Closed. If they fail, the circuit returns to Open and the recovery period starts again.
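
The state transitions described above can be sketched as a small lookup table. This is only an illustration of the state machine, not a full breaker: the function name `nextState` and the event names (`thresholdReached`, `recoveryTimeElapsed`, `success`, `failure`) are assumptions introduced here.

```javascript
// Hypothetical transition table for the three breaker states.
// Event names are illustrative and not taken from any library.
const TRANSITIONS = {
  CLOSED: { thresholdReached: 'OPEN' },
  OPEN: { recoveryTimeElapsed: 'HALF-OPEN' },
  'HALF-OPEN': { success: 'CLOSED', failure: 'OPEN' },
};

// Returns the next state, or the current state when the event causes no transition.
function nextState(state, event) {
  if (!Object.hasOwn(TRANSITIONS, state)) {
    throw new Error(`Unknown state: ${state}`);
  }
  const table = TRANSITIONS[state];
  return Object.hasOwn(table, event) ? table[event] : state;
}

console.log(nextState('CLOSED', 'thresholdReached')); // OPEN
console.log(nextState('OPEN', 'recoveryTimeElapsed')); // HALF-OPEN
console.log(nextState('HALF-OPEN', 'success')); // CLOSED
console.log(nextState('HALF-OPEN', 'failure')); // OPEN
```

Keeping the transitions in one table makes it easy to check that every state has a way out and that a failed trial request always leads back to Open.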

Example: Circuit Breaker in JavaScript

class CircuitBreaker {
  constructor({ failureThreshold, recoveryTime }) {
    this.failureThreshold = failureThreshold; // Consecutive failures that open the circuit
    this.recoveryTime = recoveryTime; // Time (ms) to stay open before allowing a trial request
    this.failures = 0;
    this.lastFailureTime = null;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF-OPEN
  }

  async execute(requestFn) {
    if (this.state === 'OPEN') {
      const now = Date.now();
      if (now - this.lastFailureTime > this.recoveryTime) {
        this.state = 'HALF-OPEN';
      } else {
        throw new Error('Circuit is OPEN. Request blocked.');
      }
    }

    try {
      const response = await requestFn();
      this.reset();
      return response;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  recordFailure() {
    this.failures++;
    this.lastFailureTime = Date.now();
    if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  reset() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
}

// Usage
const breaker = new CircuitBreaker({ failureThreshold: 3, recoveryTime: 5000 });

async function fetchData() {
  // Simulate a request
  if (Math.random() > 0.7) {
    return 'Success!';
  } else {
    throw new Error('Service failed');
  }
}

(async () => {
  try {
    const data = await breaker.execute(fetchData);
    console.log(data);
  } catch (error) {
    console.error(error.message);
  }
})();

2. Retries

Retries are a simple yet powerful mechanism to handle transient failures. Instead of failing immediately, the system retries the operation a few times before giving up.

Best Practices for Retries

  • Use exponential backoff to avoid overwhelming the service.
  • Set a maximum retry limit to prevent infinite loops.
  • Combine with a Circuit Breaker to avoid retrying during outages.
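
To make the first two practices concrete, here is a rough sketch of the wait times an exponential backoff policy produces. The helper name `backoffDelays` and the optional cap `maxDelayMs` are assumptions added for illustration; the cap is a common addition rather than something the list above requires.

```javascript
// Hypothetical helper: the wait (in ms) before each retry.
// The delay doubles on every attempt and is capped at maxDelayMs.
function backoffDelays(maxRetries, baseDelayMs, maxDelayMs = Infinity) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(baseDelayMs * 2 ** attempt, maxDelayMs));
  }
  return delays;
}

console.log(backoffDelays(5, 500)); // [ 500, 1000, 2000, 4000, 8000 ]
console.log(backoffDelays(5, 500, 3000)); // [ 500, 1000, 2000, 3000, 3000 ]
```

The retry limit bounds the length of the schedule, and the doubling gives a struggling service progressively more room between attempts.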

When to Use Retries

Retries are most effective for transient issues, such as:

  • Temporary network glitches.
  • Rate-limiting errors (e.g., HTTP 429), ideally waiting as long as any Retry-After header asks.
  • Timeout errors, provided the operation is safe to repeat.

However, retries should not be used for permanent failures, such as invalid input or authentication errors.
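
One way to encode this distinction is a small classifier that the retry loop consults before trying again. The sketch below assumes errors carry either an HTTP `status` property or a Node.js-style `code` property; that error shape, the function name `isRetryable`, and the exact status list are illustrative choices, not a standard.

```javascript
// Hypothetical classifier: assumes errors expose `status` (HTTP) or `code` (Node.js system error).
const RETRYABLE_STATUS = new Set([408, 429, 502, 503, 504]);
const RETRYABLE_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'EAI_AGAIN']);

function isRetryable(error) {
  if (error.status !== undefined) {
    // Client errors such as 400 (invalid input) or 401 (authentication) are permanent.
    return RETRYABLE_STATUS.has(error.status);
  }
  return RETRYABLE_CODES.has(error.code);
}

console.log(isRetryable({ status: 429 })); // true
console.log(isRetryable({ status: 401 })); // false
console.log(isRetryable({ code: 'ETIMEDOUT' })); // true
```

Anything the classifier does not recognize is treated as permanent, which errs on the side of not retrying.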

Example: Retry Logic in JavaScript

async function retry(fn, retries = 3, delay = 1000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === retries) {
        throw new Error(`Failed after ${retries} attempts: ${error.message}`);
      }
      console.log(`Retrying... (${attempt}/${retries})`);
      await new Promise((resolve) => setTimeout(resolve, delay * 2 ** (attempt - 1))); // Exponential backoff: delay, 2x, 4x, ...
    }
  }
}

// Usage
async function fetchData() {
  if (Math.random() > 0.7) {
    return 'Data fetched successfully!';
  } else {
    throw new Error('Temporary failure');
  }
}

(async () => {
  try {
    const data = await retry(fetchData, 5, 500);
    console.log(data);
  } catch (error) {
    console.error(error.message);
  }
})();

3. Backpressure

Backpressure is a mechanism that keeps a system from being overwhelmed by too many requests. When a consumer cannot keep up, it signals producers to slow down or rejects excess work, so that requests are processed at a manageable rate.

Why Use Backpressure?

  • To avoid resource exhaustion.
  • To maintain system stability under high load.

Techniques for Implementing Backpressure

  1. Queues: Buffer incoming requests and process them at a controlled rate.
  2. Rate Limiting: Limit the number of requests a client can make within a time window.
  3. Throttling: Dynamically adjust the rate of processing based on system load.
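
As an illustration of the second technique, here is a minimal fixed-window rate limiter. It is a sketch under simplifying assumptions: state lives in memory in a single process, the class name `FixedWindowRateLimiter` and its options are made up for this example, and the `now` option exists only so the clock can be replaced in a demo or test.

```javascript
// Hypothetical fixed-window limiter: at most `limit` requests per `windowMs` per client.
class FixedWindowRateLimiter {
  constructor({ limit, windowMs, now = () => Date.now() }) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock
    this.windows = new Map(); // clientId -> { start, count }
  }

  // Returns true if the request is allowed, false if the client should back off.
  tryAcquire(clientId) {
    const t = this.now();
    let w = this.windows.get(clientId);
    if (!w || t - w.start >= this.windowMs) {
      w = { start: t, count: 0 }; // start a fresh window
      this.windows.set(clientId, w);
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

// Usage with a fake clock so the behavior is deterministic.
let fakeTime = 0;
const limiter = new FixedWindowRateLimiter({ limit: 2, windowMs: 1000, now: () => fakeTime });
console.log(limiter.tryAcquire('client-a')); // true
console.log(limiter.tryAcquire('client-a')); // true
console.log(limiter.tryAcquire('client-a')); // false (limit reached)
fakeTime = 1000;
console.log(limiter.tryAcquire('client-a')); // true (new window)
```

A fixed window is the simplest variant; it allows short bursts at window boundaries, which sliding-window or token-bucket limiters smooth out.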

Example: Backpressure with a Queue

class Queue {
  constructor(maxSize) {
    this.queue = [];
    this.maxSize = maxSize;
  }

  enqueue(item) {
    if (this.queue.length >= this.maxSize) {
      throw new Error('Queue is full! Applying backpressure...');
    }
    this.queue.push(item);
  }

  dequeue() {
    return this.queue.shift();
  }

  size() {
    return this.queue.length;
  }
}

// Usage
const queue = new Queue(5);

try {
  for (let i = 0; i < 10; i++) {
    queue.enqueue(`Task ${i}`);
    console.log(`Enqueued Task ${i}`);
  }
} catch (error) {
  console.error(error.message);
}

while (queue.size() > 0) {
  console.log(`Processing: ${queue.dequeue()}`);
}

Conclusion

Building resilient systems is essential in today's distributed architectures. By implementing Circuit Breakers, Retries, and Backpressure, you can keep your system stable and responsive, even in the face of failures.

Key Takeaways

  • Circuit Breakers prevent cascading failures by stopping requests to failing services.
  • Retries handle transient failures but should be used judiciously to avoid overwhelming services.
  • Backpressure protects systems from overload by controlling the flow of requests.

These patterns are not mutually exclusive. In fact, they work best when combined. For example, you can use retries with exponential backoff alongside a circuit breaker, so that transient failures are retried while a service that is down is left alone to recover.
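
A rough sketch of that combination follows. It uses compact, hypothetical versions of the two patterns (`SimpleBreaker` and `retryThroughBreaker` are names invented for this example), and it assumes the breaker tags its rejection with a `circuitOpen` flag so the retry loop knows to stop early.

```javascript
// Compact, hypothetical breaker: opens after `failureThreshold` consecutive failures.
class SimpleBreaker {
  constructor({ failureThreshold, recoveryTime, now = () => Date.now() }) {
    this.failureThreshold = failureThreshold;
    this.recoveryTime = recoveryTime; // ms to stay open
    this.now = now;
    this.failures = 0;
    this.openedAt = null;
  }

  get isOpen() {
    return this.openedAt !== null && this.now() - this.openedAt < this.recoveryTime;
  }

  async execute(fn) {
    if (this.isOpen) {
      const err = new Error('Circuit is OPEN. Request blocked.');
      err.circuitOpen = true; // marker so callers can tell this apart from a service error
      throw err;
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw error;
    }
  }
}

// Retries with exponential backoff, but gives up as soon as the breaker is open.
async function retryThroughBreaker(breaker, fn, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await breaker.execute(fn);
    } catch (error) {
      if (error.circuitOpen || attempt >= retries) throw error;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage: the service always fails, so the breaker opens after two calls.
const demoBreaker = new SimpleBreaker({ failureThreshold: 2, recoveryTime: 5000 });
let calls = 0;
const alwaysFails = async () => {
  calls++;
  throw new Error('Service failed');
};

retryThroughBreaker(demoBreaker, alwaysFails, { retries: 5, baseDelayMs: 10 })
  .catch((error) => console.log(`${error.message} (service called ${calls} times)`));
// Circuit is OPEN. Request blocked. (service called 2 times)
```

Although five attempts were allowed, the failing service is only called twice: once the breaker opens, the retry loop stops instead of adding load.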

Start incorporating these patterns into your systems today, and you'll be well on your way to building robust and reliable applications!
