On the morning of May 9th, 2026, we got an alert that stopped us cold. Our primary database cluster — hosted on servers in a regional data centre — had gone dark. No failover. No graceful shutdown. Just silence. The Iran-USA conflict had escalated overnight, and the physical infrastructure we'd relied on for three years was caught in the crossfire of network shutdowns and power grid disruptions across the region.
We had client data, project files, user accounts, and three years of operational records sitting in a system we could no longer reach. What followed was 48 of the most focused hours this team has ever worked. This is the honest account of what happened, what we did, and what we changed so it can never happen this way again.
What we lost — and what we didn't
The good news, such as it was: we ran daily automated backups to a separate cloud storage bucket in a geographically distant region. The most recent snapshot was from 11 PM the night before the outage, roughly nine hours before we lost access. That window hurt. Some client deliverable notes and a handful of in-progress project updates were in that gap.
We reached out to every active client within the first two hours to be transparent about the situation. Not a single one asked us to compensate them. Several sent messages of support. That trust, earned over years of reliability, was the only cushion we had.
"Resilience isn't the absence of disruption. It's the speed at which you recover from it." — Internal post-mortem, May 2026
The 48-hour migration
We had three priorities, in order: restore read access for clients, restore write access for the team, then rebuild the full stack on infrastructure that didn't share any geographic risk with the conflict zone.
Hour 0–6: Triage and backup verification
The first step was confirming the backup was intact and not corrupted. We spun up a temporary read-only instance from the snapshot on a cloud provider in Europe. It was not our production stack, just enough to verify data integrity and let clients access their project histories. By hour four, we had a status page live and the read instance accessible.
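A cheap first check, before restoring anything, is to confirm the backup file still matches a checksum recorded when it was written. The sketch below assumes (hypothetically) that a SHA-256 value was saved alongside each backup; `sha256_of_file` and `verify_backup` are illustrative names, not part of any tool mentioned here. It complements, rather than replaces, an actual trial restore into a throwaway instance.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in fixed-size chunks so large dumps never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_path: Path, expected_sha256: str) -> bool:
    """True only if the backup exists, is non-empty, and matches the recorded hash.

    Assumption: expected_sha256 was recorded when the backup was written.
    """
    if not backup_path.is_file() or backup_path.stat().st_size == 0:
        return False
    return sha256_of_file(backup_path) == expected_sha256.lower()
```

A matching hash only proves the file wasn't truncated or altered after it was written. It can't prove the dump was complete in the first place, which is why restoring it into a scratch instance is still worth doing.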
Hour 6–24: Standing up the new environment
We made the call to migrate our full stack to a multi-region setup across two providers: one primary in Europe, one failover in Southeast Asia. No single region, no single provider. We used pg_restore to hydrate the new primary from the backup snapshot, then ran integrity checks against every table. Ninety-four percent of the data came across clean. The remaining six percent, the records created or changed during the nine-hour gap, we reconstructed from email threads, Notion notes, and client communications.
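Comparing row counts is one coarse way to run a per-table integrity check after a restore. This sketch assumes (hypothetically) that a manifest of expected counts per table was written when the backup was taken, and that restored counts are gathered by running `SELECT count(*)` against each table on the new primary; `mismatched_tables` is an illustrative helper, not part of pg_restore.

```python
def mismatched_tables(
    expected: dict[str, int], restored: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {table: (expected_rows, restored_rows)} for every table whose counts differ.

    A table missing after the restore is reported with 0 restored rows; a table
    that only exists on the restored side is reported with 0 expected rows.
    """
    mismatches: dict[str, tuple[int, int]] = {}
    for table in sorted(expected.keys() | restored.keys()):
        want = expected.get(table, 0)
        got = restored.get(table, 0)
        if want != got:
            mismatches[table] = (want, got)
    return mismatches
```

For example, a hypothetical manifest of `{"accounts": 1200, "projects": 340}` checked against restored counts of `{"accounts": 1200, "projects": 338}` reports only `projects`. Matching counts don't prove matching contents; per-table checksums are a stronger, slower alternative.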
Hour 24–48: Full restoration and hardening
Write access was restored to the team at hour 26. By hour 40 we had streaming replication running between the primary and failover nodes. By hour 48 the new stack was fully live, monitored, and running faster than our old setup. We'd been meaning to upgrade the database version for six months, and the forced migration finally made us do it.
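Streaming replication only protects you if someone notices when the standby falls behind. On the primary, PostgreSQL's `pg_stat_replication` view reports each standby's positions (such as `replay_lsn`), `pg_current_wal_lsn()` returns the primary's own position, and `pg_wal_lsn_diff()` subtracts two LSNs directly in SQL. The Python sketch below only shows the same arithmetic for an external monitoring script; the function names and the 16 MiB alert threshold are assumptions for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit byte position.

    The text form is the high 32 bits in hex, a slash, then the low 32 bits in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replayed_lsn: str) -> int:
    """Bytes of WAL the standby has not yet replayed (0 if it is caught up)."""
    return max(lsn_to_int(primary_lsn) - lsn_to_int(replayed_lsn), 0)


def lag_alert(primary_lsn: str, replayed_lsn: str, threshold_bytes: int = 16 * 1024 * 1024) -> bool:
    """Hypothetical alert rule: flag the standby once it is more than threshold_bytes behind."""
    return replication_lag_bytes(primary_lsn, replayed_lsn) > threshold_bytes
```

An LSN is a byte position in the write-ahead log, so the difference is a lag in bytes, not seconds. A quiet database can show zero byte lag for long stretches, while a busy one can briefly show megabytes of lag without anything being wrong.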
The rule we broke: never treat single-region infrastructure as safe. It's acceptable until it isn't, and if your database lives in one place, you're one bad day away from this situation. Multi-region replication costs a fraction of what a 48-hour outage costs in reputation.
What we changed permanently
The migration forced us to fix things we'd been deferring. Here's the short list of what's now non-negotiable in our infrastructure:
- Multi-region, multi-provider replication. Primary in EU-West, warm standby in Asia-Pacific. Failover tested monthly, not quarterly.
- Hourly incremental backups. Daily snapshots weren't enough. We now run full snapshots every 24 hours and incremental backups every hour. The maximum data loss window is now 60 minutes, down from up to 24 hours with daily snapshots alone (nine hours in this incident).
- Runbook for every failure mode. We wrote out, step by step, exactly what to do if we lose the primary, if we lose the backup, if we lose both. It lives in a static document that doesn't depend on the infrastructure it describes.
- No single geographic dependency. Any vendor, any service, any DNS provider — if it has a single physical region, we run a parallel alternative. Geopolitical risk is real and it doesn't announce itself.
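The backup schedule in the list above determines both what a restore needs and how much data can be lost. The sketch below assumes chained incrementals, where each hourly backup holds only the changes since the previous backup, so a restore replays the newest full snapshot plus every incremental taken after it. The function names and the timestamps in the usage note are illustrative, not taken from our tooling.

```python
from datetime import datetime, timedelta


def restore_chain(fulls: list[datetime], incrementals: list[datetime], target: datetime) -> list[datetime]:
    """Backups to apply, in order, to restore as close to `target` as possible.

    Assumes chained incrementals: every incremental after the chosen full
    snapshot (and no later than the target) must be replayed in time order.
    """
    usable_fulls = [f for f in fulls if f <= target]
    if not usable_fulls:
        raise ValueError("no full snapshot taken at or before the target time")
    base = max(usable_fulls)
    chain = sorted(i for i in incrementals if base < i <= target)
    return [base] + chain


def data_loss_window(chain: list[datetime], failure: datetime) -> timedelta:
    """Time between the newest backup in the chain and the moment access was lost."""
    return failure - chain[-1]
```

With an illustrative full snapshot at 23:00 and a failure at 08:30 the next morning, having no incrementals leaves a 9.5-hour gap, while hourly incrementals from 00:00 to 08:00 shrink it to 30 minutes. The worst case is a failure just before the next incremental runs, which is just under the 60-minute interval.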
The bigger lesson
We build digital products for businesses. Our entire value proposition is that we make companies more resilient, more scalable, more technology-driven. It would be embarrassing if we didn't apply that same standard to our own infrastructure. The conflict was a forcing function — an ugly, costly one — but it made us better engineers and a more reliable company.
If you're running any kind of business on a single-region server, anywhere in the world, take this as your warning. The geopolitical situation doesn't have to touch you directly. Submarine cable cuts, BGP route hijacks, collateral power outages: there are more ways to lose access to a server than most people consider. Build for the failure before the failure builds itself into your story.
We're sharing this because we believe in being transparent with our clients and the wider tech community in Pakistan. Infrastructure resilience isn't a luxury — for a company that handles client data, it's the most basic form of professional responsibility. We fell short, we fixed it, and we've written it down so you don't have to go through the same 48 hours.