Blog #45: The Brutal Difference Between Staging and Production

When bugs only appear on the customer's machine and a lesson in consistency between environments.

Truong Pham, Software Engineer
Published August 25, 2024

To the me from before a customer ever scolded me over an "it still works fine on my machine" error,

Do you remember that confident feeling when you hit the "Merge" button after seeing everything running smoothly in the Staging environment? You were certain that Staging was 99% the same as Production, so if Staging was OK, Prod would be OK.

However, just 5 minutes after deploying to Production, the payment system froze. The bug only appeared on Production. You went crazy because you couldn't reproduce it locally or on Staging. "Impossible, the code is identical!" you exclaimed.

The Problem: The Illusion of Homogeneity

Put simply: Staging and Production are never 100% identical. They differ in:

  1. Data: Staging has 100 fake users; Production has 1 million real users.
  2. Infrastructure Config: CDNs, Caching, Load Balancers, SSL... are usually only fully configured on Production.
  3. User Behavior: Real users always find "weird" ways to use things that your QA would never think of.

Current Perspective: Why "It works on my machine" is a Meaningless Excuse

Now I understand that errors appearing only on Production are often due to issues of Scale and Edge Cases.

Example: You write a function to filter a product list. On Staging there are only 10 products; the function runs in 1ms. On Production there are 50,000 products; that same function freezes the user's browser.

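// Junior mistake: not anticipating the scale of the data.
// expensiveCalculation stands in for any costly per-item check.
const filterProducts = (list) => {
  // Runs synchronously over the whole list: fine with 10 items, freezes the UI with 50,000.
  return list.filter((p) => expensiveCalculation(p));
};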
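
One possible way to soften this, shown only as a minimal sketch: process the list in chunks and yield to the event loop between chunks, so the browser can still paint and handle input. The names filterProductsInChunks, yieldToEventLoop, and makeProducts, the chunk size of 500, and the price-based expensiveCalculation are all assumptions made for illustration, not part of the original code.

```javascript
// Hypothetical stand-in for the article's expensiveCalculation.
const expensiveCalculation = (product) => product.price > 100;

// Yield to the event loop so rendering and input handling can run between chunks.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Filter a large list in fixed-size chunks instead of one blocking pass.
// Order of the matching items is preserved.
async function filterProductsInChunks(list, predicate, chunkSize = 500) {
  const result = [];
  for (let start = 0; start < list.length; start += chunkSize) {
    const chunk = list.slice(start, start + chunkSize);
    for (const item of chunk) {
      if (predicate(item)) result.push(item);
    }
    // Pause between chunks, but not after the last one.
    if (start + chunkSize < list.length) await yieldToEventLoop();
  }
  return result;
}

// Synthetic data at production-like scale (no real customer data involved).
const makeProducts = (count) =>
  Array.from({ length: count }, (_, i) => ({ id: i, price: i % 200 }));
```

Calling filterProductsInChunks(makeProducts(50000), expensiveCalculation) returns a promise of the filtered list. Generating the 50,000 items locally also gives you a way to feel Production scale on Staging without touching real data. Chunking does not make the total work cheaper; it only keeps the page responsive while the work runs.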

Comparison of Solutions:

  • Quick fix (and wrong): Copy part of the Production database into Staging for testing (very dangerous, because it can leak customers' personal information). Or patch Prod at random and watch what happens (a blind hotfix).
  • Sustainable way:
    1. Feature Flags: Deploy the new code to Production but keep it "off." Only turn it on for 1% of users or for your own team to test on real accounts.
    2. Error Monitoring: Use tools like Sentry or LogRocket to record exactly what happened on the customer's machine.
    3. Production Parity: Try to configure Staging as close to Prod as possible (using the same type of DB, same CDN config).
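
The feature flag idea in the list above can be sketched roughly as follows. This is a simplified illustration, not a real flag service: the flags object, the newCheckout flag, the allowList field, and the function names are all hypothetical. The hash is a plain 32-bit FNV-1a, chosen here only so that the same user always lands in the same bucket.

```javascript
// Deterministic 32-bit FNV-1a hash of a string.
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash >>> 0;
}

// Hypothetical flag config; in practice this would come from a flag service or remote config.
const flags = {
  newCheckout: { enabled: true, rolloutPercent: 1, allowList: ["team-member-1"] },
};

// Decide whether a given user should see the new code path.
function isFeatureEnabled(flagName, userId, config = flags) {
  const flag = config[flagName];
  if (!flag || !flag.enabled) return false; // unknown or switched off: old code path
  if ((flag.allowList || []).includes(userId)) return true; // internal testers always see it
  const bucket = hashString(`${flagName}:${userId}`) % 100; // stable bucket in 0..99
  return bucket < flag.rolloutPercent;
}
```

With this shape, the new code ships to Production dark, your own team is on the allow list, and raising rolloutPercent from 1 to 100 is a config change rather than a deploy. Setting enabled to false acts as a fast off switch if something goes wrong.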

Practical Lesson

Your code is only truly "finished" when it runs stably on the end user's machine, not your own. Never be complacent when you see Staging is green. Always prepare a Rollback plan and real-time error monitoring tools.
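
To make the monitoring part a bit more concrete, here is a minimal sketch of attaching environment context to an error before reporting it. It does not use any real monitoring SDK: sendReport, reportedErrors, buildErrorReport, and withErrorReporting are hypothetical names, and the in-memory array merely stands in for whatever service you actually send reports to.

```javascript
// Hypothetical in-memory transport; a real setup would send this to a monitoring service.
const reportedErrors = [];
const sendReport = (report) => reportedErrors.push(report);

// Build a report with the context that usually differs between Staging and Production.
function buildErrorReport(error, context = {}) {
  const isError = error instanceof Error;
  return {
    message: isError ? error.message : String(error),
    stack: isError ? error.stack : undefined,
    environment: context.environment ?? "unknown",
    release: context.release ?? "unknown",
    userAgent: typeof navigator !== "undefined" ? navigator.userAgent : "non-browser",
    timestamp: new Date().toISOString(),
  };
}

// Wrap a synchronous function so failures are reported with context, then rethrown.
// Rejections from async functions are not caught by this wrapper.
function withErrorReporting(fn, context) {
  return (...args) => {
    try {
      return fn(...args);
    } catch (error) {
      sendReport(buildErrorReport(error, context));
      throw error; // the caller still sees the failure
    }
  };
}
```

Recording the environment and release alongside each error is what lets you say "this only happens on Production, starting with this release", which in turn tells you which version to roll back to.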

The difference between a coder and an engineer lies in the ability to anticipate what will happen when code leaves its "safe zone."


Notes on the day I learned to doubt every environment.

Series • Part 45 of 50

50 FRONTEND LESSONS – HARD-EARNED EXPERIENCES
