Cloud migration gets sold as a tidy process. Move your stuff to the cloud, save money, go home early. Having done more than a dozen of these now, I can tell you it never works like that. Not because anything is fundamentally broken, but because every organisation has years of accumulated decisions, workarounds, and systems that only one person truly understands.
So I thought I’d walk through what happens when we do one of these.
First, we figure out what you’ve got
This is the bit that always takes longer than anyone expects. We need a full picture: every server, every database, every integration, every weird thing running in the corner that nobody remembers setting up.
We once found a critical reporting service running on an ancient t2.micro EC2 instance in a personal AWS account. It had been there for four years, paid for on someone’s personal credit card. The engineer who set it up had left the company in 2020. Three business-critical dashboards depended on it, and nobody on the current team even knew the account existed. That’s the kind of thing you find during discovery.
We map dependencies, work out what talks to what, identify compliance requirements, and figure out which systems are still needed. Every organisation has at least a few things running purely because nobody’s been brave enough to switch them off. Migration is a good excuse to finally ask the question.
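Once the inventory exists, the "what talks to what" question is really a dependency-ordering problem: a system shouldn't move before the things it relies on. As a minimal sketch (the service names and dependencies here are made up; real data would come from network scans, config review, and interviews), Python's standard-library `graphlib` can turn a dependency map into a safe migration order:

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each system maps to the systems it depends on.
dependencies = {
    "reporting-dashboard": {"reporting-db", "auth-service"},
    "reporting-db": {"file-share"},
    "auth-service": set(),
    "file-share": set(),
}

# A topological sort guarantees dependencies appear before their dependents,
# so systems with nothing unmigrated beneath them can move first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The useful output isn't the list itself so much as the failure mode: if the discovery data contains a cycle, `TopologicalSorter` raises an error, which is usually a sign two systems are more entangled than anyone admitted.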
Then we plan
Once we know what exists, we decide what goes where and how. Not everything moves the same way. Some things get lifted and shifted as-is (rehosting). Some get moved onto managed services (replatforming). Some get properly redesigned (refactoring). And some just get retired.
We work out the sequence: least critical things first, so we can validate our approach before touching anything important. We set cutover criteria, agree on acceptable downtime, and make sure everyone knows what success looks like before we start.
The architectural decisions happen here too. Which regions, how networking works, disaster recovery, security groups, backup strategy. It’s not glamorous work, but skipping it is how migrations go sideways.
Then we start moving things
The early moves are usually the easy ones. Stateless services, standard databases, systems with clean boundaries. We’re proving the process works.
And things will go wrong. Performance behaves differently in the cloud. A dependency you thought was optional turns out to be load-bearing. Networking does something unexpected. That’s fine. We expect it. Each problem we hit early makes the later moves smoother.
Some systems get replatformed along the way. That old SQL Server on a physical box becomes a managed RDS instance. The file share becomes S3. We’re not just copying things. We’re taking the opportunity to reduce operational overhead where it makes sense.
Then we cut over and hold our breath
Before the actual switch, we run both environments in parallel with real traffic. We reduce DNS TTLs well in advance and have rollback plans ready.
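"Well in advance" has a precise meaning here: resolvers may cache the old record for up to the old TTL, so the TTL drop has to land at least that long before the switch. A small sketch of the arithmetic (the dates, old TTL, and low TTL are illustrative assumptions):

```python
from datetime import datetime, timedelta

# Rule of thumb: resolvers can serve a cached record for up to the OLD TTL,
# so lower the TTL at least that long before the cutover, then raise it
# again once things have settled.
old_ttl = timedelta(seconds=86400)   # 24h, a common default
low_ttl = timedelta(seconds=300)     # 5 minutes during the cutover window

cutover = datetime(2024, 6, 1, 2, 0)      # illustrative 2am Saturday window
lower_ttl_by = cutover - old_ttl          # latest safe moment to drop the TTL
fully_propagated = cutover + low_ttl      # earliest most resolvers see the new record

print(f"Lower TTL no later than: {lower_ttl_by}")
print(f"Expect stragglers until at least: {fully_propagated}")
```

Even then, some resolvers ignore TTLs entirely, which is how you end up with the Glasgow phone call below.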
I remember a DNS cutover at 2am on a Saturday. Everything looked fine on our end. Monitoring was green. Then a customer in Glasgow rang at 7am because their ISP was still caching the old address. You plan for this, but it still catches you out. We had someone ready to walk them through a cache flush, and it was sorted in minutes, but it’s a good reminder that “done” doesn’t mean “done for everyone at the same time.”
The actual flip is usually anticlimactic. Update DNS, monitor for 24-48 hours, sleep badly anyway. That’s normal.
After that, we optimise
Migration isn’t finished when everything’s moved. You’ve probably over-provisioned during the move because you weren’t sure what you’d need. Now you have real data. Scale down the instances sitting at 5% CPU. Move to reserved instances for predictable workloads. Start using managed services properly.
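The right-sizing pass is mostly a sorting exercise over utilisation data. A hedged sketch of the logic (instance names, prices, CPU figures, and the 30% reserved-instance discount are all made up for illustration; real numbers come from your billing and monitoring data):

```python
# Hypothetical post-migration cost check: flag near-idle instances for
# downsizing, and estimate the saving from reserving steady workloads.
instances = [
    {"name": "web-1", "avg_cpu": 0.05, "on_demand_monthly": 140.0},
    {"name": "web-2", "avg_cpu": 0.04, "on_demand_monthly": 140.0},
    {"name": "db-1",  "avg_cpu": 0.55, "on_demand_monthly": 420.0},
]

for inst in instances:
    if inst["avg_cpu"] < 0.10:
        print(f"{inst['name']}: averaging {inst['avg_cpu']:.0%} CPU - downsizing candidate")

# Rough reserved-instance maths at an illustrative ~30% discount:
ri_discount = 0.30
steady = [i for i in instances if i["avg_cpu"] >= 0.10]
saving = sum(i["on_demand_monthly"] * ri_discount for i in steady)
print(f"Estimated monthly saving from reserving steady workloads: ${saving:.2f}")
```

The point of doing this with real post-migration data rather than pre-migration estimates is the whole reason to wait: you finally know what "predictable workload" actually means for your systems.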
This is also when we tackle technical debt. The new environment is stable, so now we can containerise things that benefit from it, improve monitoring, and clean up the rough edges.
Things that catch people out
A few patterns I see on nearly every migration:
Data transfer takes longer than you think. Network bandwidth isn’t magic. If you’re moving terabytes, budget the time properly and consider something like AWS DataSync or even physical transfer.
Legacy systems have hidden dependencies. That standalone application needs a specific Active Directory user, or an obscure environment variable, or a file in a very particular location. You find these during testing, not before.
DNS propagation isn’t instant everywhere. Your team sees success, but someone on the other side of the country is still hitting the old system. Plan for a tail of stragglers.
Licensing surprises. Software that was included in your on-premises contract might need separate cloud licensing. Budget for it early.
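On the data-transfer point, the back-of-envelope maths is worth doing before committing to a timeline. A small sketch (the 70% link-efficiency figure is an assumption; real throughput is usually well below line rate once protocol overhead and contention bite):

```python
# Rough estimate: days to move a dataset over a given uplink.
def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Days to move `terabytes` over a `link_mbps` uplink at `efficiency` utilisation."""
    bits = terabytes * 8 * 1000**4                       # TB -> bits (decimal units)
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 86400

# 50 TB over a 1 Gbps line at 70% efficiency: roughly 6.6 days.
print(f"{transfer_days(50, 1000):.1f} days")
```

When that number comes out in weeks rather than days, that's your cue to look at AWS DataSync or a physical transfer option instead of the wire.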
Cloud migration is worth doing, but it’s not a weekend project. It takes proper planning, people who’ve done it before, and realistic expectations about timelines.