50 FRONTEND LESSONS – HARD-EARNED EXPERIENCES

Blog #49: Surviving Black Box API – When you have to live with 'Instability'

Analyzing the decision to implement Resilience Patterns when integrating untrustworthy third-party APIs.

Truong Pham, Software Engineer
Published September 15, 2024

I once worked on a project integrating a local partner's mapping and positioning system. A team of four, around 50,000 visits per day. The biggest problem wasn't in our code but in the partner's API: it was an extremely temperamental "black box." Sometimes it returned 500 errors, sometimes it slowed to 30-second responses, and sometimes it... disappeared without a trace.

The Problem: Surviving in Chaos

Our system relied 100% on this API to display shipper locations. Every time the partner's API "sneezed," our application caught a "cold": users saw error screens, shippers didn't receive orders, and our support line was flooded with complaints.

We couldn't fix the partner's code. The question was: How could our Frontend survive and still maintain a minimum user experience while the foundation (the API) was shaking?

Options Considered

We weighed two strategies:

Option 1: Retry Logic (Persistence Strategy)

  • Solution: If an API call fails, automatically retry after 1s, then 2s, then 5s, following an "Exponential Backoff" strategy.
  • Pros: Handles transient errors caused by network congestion. Easy to implement with libraries like axios-retry.
  • Cons: If the partner's API is truly down (long downtime), continuous retrying only wastes resources and adds load on both sides.
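Option 1 can be sketched in a few lines. This is a minimal, hand-rolled version for illustration (in practice we used a library); the `retryWithBackoff` name and the default delay schedule are my assumptions, mirroring the 1s/2s/5s sequence above.

```typescript
// Sketch: retry with exponential backoff (1s, 2s, 5s as described above).
// `fetchFn` is any async call; delays and attempt count are illustrative.
async function retryWithBackoff<T>(
  fetchFn: () => Promise<T>,
  delaysMs: number[] = [1000, 2000, 5000]
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus one retry per configured delay
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await fetchFn();
    } catch (error) {
      lastError = error;
      if (attempt < delaysMs.length) {
        // Wait before the next attempt
        await new Promise((resolve) => setTimeout(resolve, delaysMs[attempt]));
      }
    }
  }
  throw lastError; // All attempts exhausted: surface the final error
}
```

Note how the loop stops retrying once the delay schedule is exhausted; this is exactly the "Cons" case, where a long outage still burns three retries per call.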

Option 2: Circuit Breaker & Fallback (Disconnection Strategy)

  • Solution: If the API fails more than 5 times consecutively within 1 minute, immediately "break the circuit" (stop calling that API for the next 5 minutes). Instead, display cached data or a friendly notification: "The positioning service is undergoing maintenance; we will use your last known location."
  • Pros: Protects our system from freezing along with the partner's. Gives users a "graceful degradation" experience.
  • Cons: Managing the closed/open circuit states on the Frontend is fairly complex.

Final Decision and Analysis

I decided to combine both but prioritized Option 2.

// Pseudo-code of a simple circuit breaker logic
const fetchDataWithResilience = async () => {
  if (circuitBreaker.isOpen()) {
    return getCachedData(); // Return old data if API is down
  }

  try {
    const data = await apiClient.get('/partner/location');
    circuitBreaker.recordSuccess();
    return data;
  } catch (error) {
    circuitBreaker.recordFailure();
    throw error;
  }
};
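The pseudo-code above assumes a `circuitBreaker` object exists. Here is one minimal way it could look; the thresholds (5 failures, 5-minute cooldown) follow the text, but the class shape itself is my sketch, not a specific library's API.

```typescript
// Minimal sketch of the breaker the pseudo-code above relies on.
// Thresholds are illustrative defaults matching the article's numbers.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private cooldownMs = 5 * 60 * 1000
  ) {}

  isOpen(): boolean {
    if (this.failures < this.failureThreshold) return false;
    // "Half-open": after the cooldown, let one trial request through
    if (Date.now() - this.openedAt >= this.cooldownMs) {
      this.failures = this.failureThreshold - 1; // one more failure re-trips it
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0; // any success fully closes the circuit
  }

  recordFailure(): void {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = Date.now(); // trip the breaker, start the cooldown
    }
  }
}
```

A production version would also track the "within 1 minute" window (e.g. a sliding window of failure timestamps), which this sketch omits for brevity.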

Impact on Performance: Responsiveness improved markedly when the partner API crashed. Instead of waiting 30 seconds for a timeout before reporting an error, our system reported an error or served the cache within 100ms.

Impact on Maintainability: The code became more "defensive." We had to manage an additional cache layer (LocalStorage or IndexedDB) to ensure backup data was always available.
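That backup-cache layer can be as simple as a thin wrapper over `localStorage`. A minimal sketch, where the key name, entry shape, and `savedAt` timestamp (so the UI can say how stale a fallback location is) are all assumptions for illustration:

```typescript
// Sketch: the backup-data layer behind getCachedData() in the snippet above.
// Stores the last good API response so the circuit-open path has something to show.
interface CachedEntry<T> {
  data: T;
  savedAt: number; // lets the UI tell users how old the fallback data is
}

function saveToCache<T>(key: string, data: T): void {
  const entry: CachedEntry<T> = { data, savedAt: Date.now() };
  localStorage.setItem(key, JSON.stringify(entry));
}

function getCachedData<T>(key: string): CachedEntry<T> | null {
  const raw = localStorage.getItem(key);
  return raw ? (JSON.parse(raw) as CachedEntry<T>) : null;
}
```

On every successful API call you would call `saveToCache("shipper-location", data)` right after `circuitBreaker.recordSuccess()`, so the fallback is never older than the last healthy response.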

Impact on Team: Juniors learned a hard lesson about "never fully trusting anything outside your own system."

Self-Reflection: Was it Over-engineering?

I asked myself: Why not let the Backend handle this for the Frontend? In reality, our Backend would also freeze without a similar mechanism. The Frontend proactively handling "Circuit Breaking" saved us from thousands of hopeless requests to the server, salvaged users' phone batteries, and kept the UI always in a controllable state.

If I were starting over, would I choose differently? No. In the modern web world, where we integrate dozens of Microservices and 3rd-party APIs, Resilience is not an option; it is survival.


Notes on building fortresses in the middle of a storm.

Series • Part 49 of 50


Written by Truong Pham

Software Engineer passionate about building high-performance systems and meaningful experiences.
