Modern backends almost always rely on external APIs—payments, messaging (SMS/WhatsApp), maps, identity providers, shipping services, or even LLMs. The moment you integrate with these services, you also inherit their instability: outages, latency spikes, throttling, and unpredictable behaviour.
No matter how well your own system is designed, you have zero control over the reliability of third-party vendors.
This is exactly where the Circuit Breaker pattern comes in. It is one of the most effective architectural tools for preventing external failures from spreading and taking down your entire backend.
What Is the Circuit Breaker Pattern?
A circuit breaker is a resilience mechanism that protects your system from repeatedly calling an unhealthy external dependency.
In practice, it works by:
- Tracking requests made to an external service
- Detecting when failures or slow responses exceed acceptable thresholds
- Temporarily blocking further calls to that service (opening the circuit)
- Waiting for a cooldown period before cautiously retrying requests
The idea comes from electrical engineering. When a circuit is overloaded, the breaker trips to prevent damage or fire. In software, the goal is similar: fail fast instead of letting failures drag on, consume resources, and escalate into a system-wide incident.
Why Circuit Breakers Matter for Third-Party APIs
Third-party APIs are uniquely risky because:
- They are completely outside your control
- Their failure modes are often unpredictable
- They may slow down instead of fully going offline
- They strictly enforce rate limits
Without a circuit breaker, your backend continues to wait, retry, and accumulate pending work. Over time, this exhausts threads, database connections, and request queues. Eventually, your application looks “down” to users—even though the real issue lies with an external service.
A circuit breaker acts as a protective boundary, shielding your backend from external latency and failure, much like a firewall for reliability.
How Circuit Breakers Work: The Three States
A typical circuit breaker operates in three distinct states.
1. Closed (Normal Operation)
In the closed state, requests flow normally to the external service. The system silently monitors failures and latency in the background.
2. Open (Fail Fast)
Once failure thresholds are exceeded, the breaker trips and enters the open state. Requests are rejected immediately without contacting the vendor. Optional fallback logic—such as cached responses or default data—can be executed instead.
3. Half-Open (Recovery Testing)
After a predefined timeout, the breaker allows a small number of test requests through.
- If they succeed, the circuit closes and normal traffic resumes
- If they fail, the circuit opens again and the cooldown restarts
This approach prevents overwhelming a recovering service while still allowing recovery detection.
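The three states above can be sketched in a few dozen lines. This is a toy illustration of the state machine, not a production implementation — the ToyBreaker name and its thresholds are invented for this sketch:

```javascript
// Toy three-state circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED.
class ToyBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 10_000 } = {}) {
    this.state = "CLOSED";
    this.failures = 0;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.openedAt = 0;
  }

  async fire(fn) {
    if (this.state === "OPEN") {
      // After the cooldown, allow a probe request (half-open).
      if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = "HALF_OPEN";
      } else {
        throw new Error("Circuit is open — failing fast");
      }
    }
    try {
      const result = await fn();
      // Success (including a successful half-open probe) resets the breaker.
      this.state = "CLOSED";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed probe, or too many consecutive failures, opens the circuit.
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

A real library tracks failure percentages over a rolling window rather than a simple consecutive count, but the state transitions are the same.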
Common Failure Scenarios (and How Circuit Breakers Help)
1. The “Zombie” Latency Problem
Scenario: A vendor’s p99 latency jumps from 300ms to 8 seconds. Promises stack up, and database connections remain open while waiting.
Solution: The circuit breaker detects the rising timeout rate and opens. Your system fails fast, releasing resources instead of waiting seconds per request.
2. Rate-Limit Meltdowns (HTTP 429)
Scenario: Traffic spikes trigger retries, which multiply requests and hit vendor rate limits. API keys get temporarily blocked.
Solution: Repeated 429 responses trip the breaker. Traffic stops briefly, giving the vendor time to recover and rate-limit buckets to refill.
3. Partial Outages (“Brownouts”)
Scenario: The vendor is partially operational, but 20% of requests return 5xx errors.
Solution: The circuit breaker detects the rising error percentage and isolates the dependency, preventing inconsistent user experiences and noisy logs.
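Percentage-based tripping is what catches brownouts that a consecutive-failure counter would miss. A minimal sketch of the rolling-window idea (the same concept behind opossum's errorThresholdPercentage and volumeThreshold options — class and method names here are illustrative):

```javascript
// Track the last N call outcomes and compute a failure percentage.
class RollingErrorRate {
  constructor(windowSize = 20) {
    this.windowSize = windowSize;
    this.results = []; // true = failure, false = success
  }

  record(failed) {
    this.results.push(failed);
    if (this.results.length > this.windowSize) this.results.shift();
  }

  // Failure percentage, but only once enough volume has accumulated —
  // otherwise one early failure would read as a 100% error rate.
  failureRate() {
    if (this.results.length < this.windowSize) return 0;
    const failures = this.results.filter(Boolean).length;
    return (failures / this.results.length) * 100;
  }

  shouldTrip(thresholdPct = 50) {
    return this.failureRate() >= thresholdPct;
  }
}
```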
What a Production-Ready Setup Looks Like
A circuit breaker alone is helpful. Combined with other resilience patterns, it becomes production-grade.
Timeouts
Long timeouts silently kill systems. For synchronous user requests, 2–3 seconds is often safer than the default 30-second timeouts.
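A hard upper bound can be sketched with Promise.race — the withTimeout name is illustrative, not from any library:

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeoutPromise = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeoutPromise]).finally(() => clearTimeout(timer));
}
```

Note that Promise.race only abandons the slow promise — the underlying work keeps running. For real HTTP calls, prefer AbortController (as in the opossum example below) so the request itself is cancelled and its socket released.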
Retries (Limited)
Retry only once or twice—and only for safe, transient errors such as network hiccups. Never blindly retry 4xx responses.
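That rule can be made concrete with a small helper — a sketch, assuming errors carry a numeric status property as in the vendor-call examples below; the retryOnce and isTransient names are invented here:

```javascript
// Only network-level errors (no status) and 5xx responses are worth retrying.
// 4xx means the request itself is wrong — retrying it will not help.
function isTransient(err) {
  const status = err && err.status;
  if (status == null) return true; // network hiccup / timeout
  return status >= 500;            // server-side failure
}

async function retryOnce(fn, { maxRetries = 1, delayMs = 200 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!isTransient(err) || attempt === maxRetries) throw err;
      // Small, growing pause between attempts.
      await new Promise((r) => setTimeout(r, delayMs * (attempt + 1)));
    }
  }
  throw lastErr;
}
```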
Fallbacks
Always define behaviour for when the circuit is open:
- Payments: Queue the intent and notify the user later
- Maps: Return cached or approximate location data
- Search: Show popular or default results
Implementing Circuit Breakers in Node.js
You don’t need to build this from scratch. Mature libraries already handle the complexity.
Option A: Opossum (Simple and Widely Used)
Opossum is the de facto standard for Node.js. It wraps async functions and manages circuit state automatically.
Key features include built-in timeouts, failure thresholds, reset timers, and fallback support—making it a strong default choice for most applications.
import CircuitBreaker from "opossum";
async function callVendor(userId) {
const controller = new AbortController();
// Internal request timeout (independent from opossum timeout)
const internalTimeoutMs = 2000;
const timer = setTimeout(() => controller.abort(), internalTimeoutMs);
try {
const res = await fetch(`https://api.vendor.com/users/${userId}`, {
signal: controller.signal,
headers: { Authorization: `Bearer ${process.env.VENDOR_TOKEN}` },
});
if (!res.ok) {
const err = new Error(`Vendor HTTP ${res.status}`);
err.status = res.status;
throw err;
}
return await res.json();
} finally {
clearTimeout(timer);
}
}
// Configure the breaker
const breaker = new CircuitBreaker(callVendor, {
timeout: 2500, // If the function takes longer than this, opossum treats it as a failure
errorThresholdPercentage: 50, // Trip if >= 50% of recent calls fail
resetTimeout: 15_000, // Stay open for 15s, then try half-open
volumeThreshold: 20, // Don't evaluate/trip until at least 20 requests have happened
// Decide which errors should NOT count as "breaker failures"
// Example: ignore "bad request" (4xx) except rate limiting (429)
errorFilter: (err) => {
const status = err?.status;
if (!status) return false; // network/timeout errors should count as failures
if (status === 429) return false; // rate limits should count as failures
return status >= 400 && status < 500; // ignore other 4xx (caller/client mistakes)
},
});
// Fallback when open (or when failures occur)
breaker.fallback((userId) => ({
data: null,
degraded: true,
reason: "Vendor temporarily unavailable",
userId,
}));
export function getUserFromVendor(userId) {
return breaker.fire(userId);
}Option B: Cockatiel (Policy-Based and TypeScript-Friendly)
Cockatiel shines when you want explicit control and composable policies. It allows you to layer retries, timeouts, and circuit breakers in a clear, declarative way—ideal for complex TypeScript systems.
import {
  circuitBreaker,
  ConsecutiveBreaker,
  retry,
  handleAll,
  timeout,
  TimeoutStrategy,
  wrap,
} from "cockatiel";
async function callVendor(userId: string) {
  const res = await fetch(`https://api.vendor.com/users/${userId}`, {
    headers: { Authorization: `Bearer ${process.env.VENDOR_TOKEN}` },
  });
  if (!res.ok) {
    const err = new Error(`Vendor HTTP ${res.status}`) as Error & { status?: number };
    err.status = res.status;
    throw err;
  }
  return (await res.json()) as unknown;
}
// Policies
const retryPolicy = retry(handleAll, { maxAttempts: 2 }); // up to 2 retries after the initial attempt
const timeoutPolicy = timeout(2000, TimeoutStrategy.Aggressive); // reject after 2s (the underlying call is abandoned, not cancelled)
const breakerPolicy = circuitBreaker(handleAll, {
  breaker: new ConsecutiveBreaker(5), // open after 5 consecutive failures
  halfOpenAfter: 15_000, // wait 15s before allowing a probe request
});
// Compose: retry (outermost) wraps timeout wraps breaker (innermost),
// so each attempt gets its own timeout and is counted by the breaker
const resilientPolicy = wrap(retryPolicy, timeoutPolicy, breakerPolicy);
// Usage
export function getUserFromVendor(userId: string) {
return resilientPolicy.execute(() => callVendor(userId));
}

Configuration Cheat Sheet (Safe Defaults)
Most circuit breaker issues come from poor configuration. These are reliable starting points:
- Timeout: 1–3 seconds for real-time requests
- Failure Threshold: ~50%
- Minimum Volume: At least 20 requests before tripping
- Open Duration: 10–30 seconds
- Half-Open Probes: 1–5 test requests
Observability: Don’t Fly Blind
You can’t manage what you don’t measure. At minimum, track:
- Circuit state changes (Closed → Open → Half-Open)
- Failure causes (timeouts, 5xx errors, 429 rate limits)
- Fallback frequency and degraded user experiences
Alerting tip: Individual failures are noise. A circuit staying open for more than five minutes is a real signal.
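Opossum breakers are event emitters — they emit events including "open", "halfOpen", "close", "failure", "timeout", and "fallback" — so wiring up counters is a few lines. A sketch of that wiring, using a minimal hand-rolled emitter as a stand-in for a real breaker so it runs without opossum installed:

```javascript
// Minimal emitter stand-in so the wiring is runnable without opossum.
function miniEmitter() {
  const handlers = {};
  return {
    on(event, fn) { (handlers[event] = handlers[event] || []).push(fn); },
    emit(event, ...args) { (handlers[event] || []).forEach((fn) => fn(...args)); },
  };
}

// Attach a counter per breaker event; feed these into your metrics system.
function instrument(breaker, metrics = {}) {
  for (const event of ["open", "halfOpen", "close", "failure", "timeout", "fallback"]) {
    metrics[event] = 0;
    breaker.on(event, () => {
      metrics[event] += 1;
      if (event === "open") {
        console.warn("circuit opened — alert if it stays open");
      }
    });
  }
  return metrics;
}

// With real opossum, you would call instrument(breaker) on the breaker itself.
const fakeBreaker = miniEmitter();
const metrics = instrument(fakeBreaker);
fakeBreaker.emit("failure");
fakeBreaker.emit("open");
```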
Bottom Line
If your backend depends on third-party APIs, circuit breakers are not optional—they are essential guardrails.
They protect your system from vendor slowdowns, rate-limit cascades, and resource exhaustion. In Node.js, Opossum works well for most teams, while Cockatiel offers precision for complex TypeScript architectures.
Tune your thresholds, design meaningful fallbacks, and stop letting external dependencies dictate the stability of your internal systems.
