Nobody does.
It usually starts innocent. One team likes AWS because that is what they know. Another team is deep in Azure because of Microsoft everything. Then someone says Google Cloud has the best data tools, and suddenly your analytics stack is there. And then you wake up one day and realize you are running production systems across three clouds, paying three sets of bills, reading three different status pages, and translating the same security requirement into three different languages.
It can work. Plenty of companies do it.
But the mental load is real. The death by a thousand console clicks is also real.
So this is a practical, slightly battle worn guide to keeping your sanity while managing AWS, Azure, and GCP at the same time. Not in theory. In the stuff you actually do on a Tuesday.
Why three clouds get messy so fast
If you are thinking, “Infra is infra, how different can it be?”, the answer is: very.
The names differ, the defaults differ, the identity models differ, and the network boundaries and quotas and billing dimensions all differ. Even the way support works. Even the way a “project” is defined. Even how logs are stored and priced.
And the worst part is not learning each cloud. You can learn each cloud. The worst part is context switching.
You are in AWS thinking in accounts, IAM roles, VPCs, security groups, CloudWatch. Then you jump to Azure and now it is subscriptions, resource groups, managed identities, VNets, NSGs, Monitor. Then GCP where it is projects, folders, IAM bindings, VPC networks, firewall rules, Cloud Logging.
Same goals, different mechanics, different footguns.
So the main job is not memorizing features. It is reducing the amount of unique thinking you have to do per cloud.
That is the theme of this whole thing.
First rule: decide what “multi cloud” actually means for you
A lot of orgs claim they are multi cloud, but what they really have is “some stuff over there”. That is fine. It is normal.
You need to be brutally clear about which one you are:
- Primary cloud plus satellites: one cloud runs the core platform, and the other two host specific services. Maybe one data product in GCP. Maybe AD integration stuff in Azure. Good. This is manageable.
- Active active across clouds: the same app, same stack, running in multiple clouds for redundancy. This is the hard mode. Expensive too.
- Portfolio split: different products in different clouds. This happens with acquisitions or independent teams.
Your operating model depends on which one you are. If you do not define it, you end up accidentally building toward the hardest version while telling yourself you are doing the easier one.
Write it down. Literally. Put it in your internal docs. Make it a thing people can point to when someone suggests moving a random service to a new cloud because they read a blog post.
Standardize the boring stuff and stop debating it every week
You cannot standardize everything across three clouds. You will go insane trying.
But you can standardize the boring fundamentals. The stuff that eats 80 percent of your time.
Here is what is worth standardizing:
Naming, tagging, and ownership
If you only fix one thing, fix this. Because when something breaks at 2:07 AM, the question is always:
“Who owns this thing and why does it exist?”
Create a minimal tagging policy that applies everywhere:
- owner (team or on call group)
- service (app name)
- env (prod, staging, dev)
- cost_center (whatever finance needs)
- data_classification (optional, but helpful)
Then enforce it.
Not with a polite wiki page. With policy. AWS has tag policies and SCP patterns. Azure has Policy. GCP has organization policy constraints and you can enforce labels via deployment tooling. Use whatever lever you have. Even if enforcement is “no merge to main unless tags exist”.
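The “no merge to main unless tags exist” lever can be a tiny CI check. Here is a minimal sketch, assuming your pipeline can hand it resources as plain dicts with a `tags` map; the field names match the tagging policy above, everything else is illustrative:

```python
# Hypothetical CI gate: fail the pipeline if any resource lacks required tags.
REQUIRED_TAGS = {"owner", "service", "env", "cost_center"}

def missing_tags(resource: dict) -> set:
    """Return the required tag keys absent from a resource's tag map."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))

def check_resources(resources: list) -> list:
    """Collect (name, missing_tags) pairs for every non-compliant resource.
    An empty result means the merge can proceed."""
    failures = []
    for resource in resources:
        gap = missing_tags(resource)
        if gap:
            failures.append((resource.get("name", "<unnamed>"), sorted(gap)))
    return failures
```

Wire the non-empty result to a non-zero exit code and the pipeline does the nagging for you.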
And keep names consistent too. If it is payments-api in AWS, do not call it paymentsservice-prod in Azure and pay-api in GCP. That sounds minor. It adds up fast.
Environments and account structure
The easiest way to lose your mind is mixing dev and prod in the same boundary.
You want consistent environment boundaries:
- AWS: separate accounts for prod and non prod, ideally more
- Azure: separate subscriptions at least
- GCP: separate projects, and usually separate folders if you are more mature
Do not try to be clever here. Hard boundaries prevent accidents. And they make billing and access reviews way simpler.
Golden paths for common deployments
Most teams deploy the same shapes of things:
- a containerized API
- a background worker
- a scheduled job
- a database
- a queue
- a bucket
- an internal service behind private networking
Create “golden path” templates for those. In one internal repo. With docs that are actually short.
If someone can deploy a new service without having to invent IAM policies from scratch, you just saved yourself weeks of future support tickets.
Identity: pick one control plane, and make everything else defer to it
Identity is where three clouds become three separate kingdoms. And if you let each kingdom evolve independently, you will spend your life doing access reviews.
The sanity move is to pick a single source of truth for human identity. Usually this is your corporate IdP.
Common patterns:
- Microsoft Entra ID (Azure AD) as the IdP for everything
- Okta as the IdP for everything
- Google Workspace identity, sometimes, depending on the org
Then:
- Use SSO for AWS (IAM Identity Center or federation)
- Use SSO for Azure (native)
- Use SSO for GCP (workforce identity federation or SSO via your IdP)
The goal: nobody has permanent cloud local users. No random IAM users hanging around. No personal access keys for humans.
For workloads, do the same thing conceptually, even if the mechanics differ:
- AWS: IAM roles for service accounts, IRSA for EKS, instance profiles
- Azure: Managed Identities
- GCP: Service Accounts with Workload Identity for GKE
And then establish a standard: workloads use short lived credentials, not static keys. Period.
If you have static keys today, do not shame yourself. Just start a migration plan and chip away at it. Static keys are like glitter. Once you have them, they get everywhere.
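Chipping away at static keys starts with knowing how old they are. A minimal sketch of the “flag stale keys” step, working on key metadata you have already fetched (the input shape is modeled loosely on what cloud IAM APIs return, and the 90 day window is an assumption, not a standard):

```python
from datetime import datetime, timedelta, timezone

# Assumption: your rotation policy; adjust to whatever you actually enforce.
MAX_KEY_AGE = timedelta(days=90)

def stale_keys(keys, now=None):
    """Return the ids of keys whose age exceeds the rotation window.

    `keys` is a list of dicts like {"id": ..., "created": datetime},
    a hypothetical normalized shape, not a real API response.
    """
    now = now or datetime.now(timezone.utc)
    return [k["id"] for k in keys if now - k["created"] > MAX_KEY_AGE]
```

Run it per cloud on whatever inventory you have, and the migration plan writes its own backlog.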
Networking: stop building three separate “special snowflake” network designs
Multi cloud networking can get complicated fast. Especially if you try to make it all feel like one flat network.
My preference, for sanity, is this:
- Each cloud has its own clean, well segmented network.
- Intercloud traffic is explicit and limited.
- Shared services are either duplicated per cloud or centralized with clear boundaries.
You have a few connectivity options:
- Site to site VPN between clouds
- Dedicated interconnects (AWS Direct Connect, Azure ExpressRoute, GCP Interconnect) via a colocation or partner
- Transit hubs (AWS Transit Gateway, Azure Virtual WAN, GCP Cloud Router plus hub VPC patterns)
The key is not the technology. It is governance.
Write down allowed paths. Like:
- prod to prod only
- dev cannot reach prod
- only specific shared services can be reached cross cloud
- all cross cloud traffic is logged and measured
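Those allowed paths are simple enough to express as data, which makes them checkable in review tooling instead of living in someone’s head. A sketch, with hypothetical environment and service names:

```python
# Policy as data: cross cloud paths are denied unless explicitly listed.
# The env and service names here are illustrative.
ALLOWED_PATHS = {
    ("prod", "prod"),        # prod to prod only
    ("dev", "dev"),
    ("staging", "staging"),
}

# Shared services reachable from any environment, per explicit exception.
SHARED_SERVICES = {"dns", "artifact-registry"}

def path_allowed(src_env, dst_env, dst_service=None):
    """Default deny: a path is allowed only via the allow list
    or the shared services exception."""
    if dst_service in SHARED_SERVICES:
        return True
    return (src_env, dst_env) in ALLOWED_PATHS
```

The point is not the function, it is that the rules exist in one reviewable place.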
If you do not define this, you end up with random peering and ad hoc VPNs because someone “just needed to test something”.
Also. DNS. Do not ignore DNS.
Pick a strategy for internal DNS resolution and stick to it. Hybrid DNS setups can become a haunted house if you let every team add records in a different place.
Observability: one pane of glass is nice, but one way of working is the real win
People love the phrase “single pane of glass”. Usually it means “we bought a tool”.
Tools help. But the sanity saver is consistency in:
- how you name metrics
- how you structure logs
- how you set alerts
- how you do incident response
You can keep native tools per cloud, but you need a common layer so on call does not have to think too hard.
What works well:
- Centralize logs into one system (Elastic, Splunk, Datadog, Grafana Loki, whatever you use)
- Standardize JSON log format across apps
- Standardize trace propagation and service naming (OpenTelemetry helps a lot)
- Use one alerting and on call tool (PagerDuty, Opsgenie, etc)
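What a standardized JSON log line might look like in practice, as a small sketch; the key names (`ts`, `service`, `level`, `message`) are an illustrative convention, not any tool’s required schema:

```python
import json
from datetime import datetime, timezone

def log_line(service, level, message, **fields):
    """Emit one JSON log line with a fixed set of standard keys,
    plus arbitrary extra fields (trace ids, request ids, etc)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "level": level,
        "message": message,
        **fields,
    }
    return json.dumps(record, sort_keys=True)
```

Once every app emits the same shape, the central log system’s queries stop caring which cloud a line came from.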
Then define a small set of global SLOs and alert policies that apply to all services, regardless of cloud.
Example:
- latency p95
- error rate
- saturation signals (CPU, memory, queue depth)
- dependency health
When an alert fires, the first steps should be identical whether the service is on ECS, AKS, or GKE.
That is how you keep your brain from melting.
FinOps: three clouds will quietly eat your budget if you do not set rules
Cloud billing is already confusing in one cloud. In three, it is basically a hobby.
You need a cost operating rhythm.
Not a once a quarter “cost optimization initiative”. An actual weekly habit.
Here is a simple setup that works:
- Weekly cost review meeting, 30 minutes, same agenda
- Each cloud has an owner who can explain anomalies
- Top 10 services by spend are tracked
- Idle resources report is tracked
- Commitments are managed intentionally (Savings Plans, Reserved Instances, Azure Reservations, GCP CUDs)
A couple of rules that help:
- No untagged spend in prod
- No public IPs by default unless approved
- Default to autoscaling and rightsizing
- Put dev and preview environments on stricter budgets and schedules
- Create “sandbox” areas with hard quotas so experiments do not become permanent bills
Also, build a habit of looking at unit cost. Cost per request, per customer, per job. Not just total bill. Total bill is too easy to shrug at until it doubles.
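The unit cost math is trivial, which is exactly why it should live in a dashboard instead of being recomputed by hand each week. A minimal sketch:

```python
def unit_cost(total_cost, units):
    """Cost per unit of work (request, customer, job).
    Returns None when there were no units, so an idle bill
    still shows up instead of dividing by zero."""
    if units == 0:
        return None
    return total_cost / units
```

Track it over time: a flat total bill with rising unit cost means you are paying more per customer, which the total alone hides.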
Security: unify the policy, not the tooling
Trying to make every cloud use the exact same security toolset can backfire. You will spend years integrating.
Instead, unify the policy and the outcomes.
Define your baseline controls:
- MFA enforced for all humans
- least privilege access reviews on a schedule
- encryption at rest everywhere
- TLS everywhere in transit
- no public storage buckets
- vulnerability scanning for images
- patching policy for VMs
- centralized secrets management approach
- audit logging enabled and retained
Then map those controls to each cloud’s native capabilities and your chosen tools.
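That mapping is worth keeping as a small, versioned artifact rather than tribal knowledge. A sketch using well known native service names (verify against each cloud’s current docs before relying on it):

```python
# Baseline control -> native implementation per cloud.
# Names are the commonly known services, kept here for illustration.
CONTROL_MAP = {
    "audit_logging": {
        "aws": "CloudTrail",
        "azure": "Activity Log",
        "gcp": "Cloud Audit Logs",
    },
    "secrets": {
        "aws": "Secrets Manager",
        "azure": "Key Vault",
        "gcp": "Secret Manager",
    },
    "encryption_keys": {
        "aws": "KMS",
        "azure": "Key Vault keys",
        "gcp": "Cloud KMS",
    },
}

def implementation(control, cloud):
    """Look up how a baseline control is implemented in a given cloud."""
    return CONTROL_MAP[control][cloud]
```

When an auditor asks “how do you do X on cloud Y”, the answer is a lookup, not a meeting.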
A pragmatic pattern is:
- Use CSPM to get a consistent view (Wiz, Prisma, Defender for Cloud, etc)
- Use cloud native security where it is strong and easy
- Centralize findings into one place (SIEM) so security does not chase three dashboards
Most importantly, keep a single exception process. One form, one workflow, one place to track it.
Otherwise you end up with “temporary exceptions” that live forever because nobody knows where they were approved.
Deployment and IaC: pick one approach and enforce it, gently but firmly
If teams are clicking around in consoles across three clouds, you are going to suffer.
Infrastructure as Code is not optional in multi cloud. It is how you keep things repeatable.
Pick your primary IaC approach:
- Terraform / OpenTofu is common because it spans clouds
- Pulumi if you want code based IaC
- Cloud native templates can exist, but then you have three systems
You can still allow cloud specific modules. That is fine. The goal is a consistent workflow:
- PR based changes
- code review
- plan output visible
- apply via CI
- state handled properly
- drift detection, at least for critical stuff
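For the drift detection piece, Terraform already gives you a hook: `terraform plan -detailed-exitcode` exits 0 for no changes, 1 for errors, and 2 for pending changes. A small sketch of interpreting that in a scheduled job (the status labels are our own convention):

```python
# Interpret `terraform plan -detailed-exitcode` results for a drift report.
# Per Terraform's documented behavior: 0 = no changes, 1 = error, 2 = changes.
def drift_status(exit_code: int) -> str:
    """Map the plan exit code to a human readable drift status."""
    return {0: "clean", 1: "error", 2: "drift"}.get(exit_code, "unknown")
```

Run the plan nightly per critical stack, feed the status into your alerting, and drift stops being a surprise found during an incident.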
Then add platform guardrails:
- approved modules for common things
- policy as code (OPA, Sentinel, Azure Policy as gates, etc)
- automated tests for IaC where possible
And please, keep modules boring. The more magic you put in them, the more you will be the only person who can debug them.
Governance without becoming the “no” team
Three clouds makes governance more necessary. It also makes governance easier to mess up.
If your central platform team becomes a gate for everything, teams will route around you. They will open tickets. They will create shadow infrastructure. They will do weird things with credit cards.
So aim for paved roads, not roadblocks.
- Give teams self service templates that are safe
- Enforce a few hard rules (identity, logging, tagging, network boundaries)
- Let teams move fast inside those boundaries
A good sign you have it right: engineers complain a little, but they still use the templates because it is faster than DIY.
The on call reality: make “where is it running” irrelevant
When you are on call, you should not have to answer these questions from scratch:
- which cloud is this service in?
- where are its logs?
- where are its dashboards?
- what is its runbook?
- who owns it?
- what dependencies does it have?
You want a service catalog. Not fancy. Just accurate.
At minimum, every production service should have:
- owner and escalation
- cloud and region
- repo link
- deploy pipeline link
- dashboard link
- log search link
- runbook link
- dependency list
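“Accurate, not fancy” is easy to enforce with a check that refuses catalog entries missing the minimum fields. A sketch, with field names taken from the list above:

```python
# Minimum fields every production service catalog entry must carry.
REQUIRED_FIELDS = {
    "owner", "escalation", "cloud", "region", "repo",
    "pipeline", "dashboard", "logs", "runbook", "dependencies",
}

def catalog_gaps(entry: dict) -> set:
    """Return the required fields a service entry is still missing.
    Empty set means the entry is incident-ready."""
    return REQUIRED_FIELDS - set(entry)
```

Run it on every catalog change and the 2 AM “who owns this” question answers itself.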
This sounds like bureaucracy until you are in an incident and you realize nobody knows which team owns the thing that is failing.
Also, do game days. Even small ones. Once a month, break something in a controlled way. Practice cross cloud failure scenarios if you actually depend on cross cloud traffic.
Because the first time you discover your DNS failover does not work should not be during a real outage.
A simple mental model that helps: “global standards, local implementations”
This one sentence has saved me a lot of pointless arguments.
You define global standards:
- how identity works
- what tagging is required
- what logs must exist
- what encryption is required
- what environments exist
- what SLOs look like
- what the incident process is
Then each cloud implements those standards with its own native constructs.
You stop trying to make AWS look exactly like Azure. You stop trying to make GCP projects behave like AWS accounts. You accept the differences, but you make them feel consistent from the perspective of the engineer doing daily work.
That is the trick.
The sanity checklist (print this, seriously)
If you are managing three clouds, and you want a quick gut check, here you go:
- One IdP, SSO everywhere, no long lived human keys
- Clear environment boundaries in all clouds
- Minimal tagging policy enforced
- IaC as the default path, PR based
- Centralized logging and a consistent log format
- One alerting/on call system, consistent runbooks
- Weekly cost review, top spend tracked, commitments managed
- Baseline security controls mapped per cloud, exceptions tracked
- Documented network connectivity rules, DNS strategy not ad hoc
- Service catalog exists and is actually maintained
If half of these are missing, that is not a moral failure. It just means your future stress is already scheduled.
Wrapping up
Managing three clouds is not automatically “bad”. Sometimes it is the right call. Sometimes it is the result of reality. Mergers, customer requirements, team expertise, compliance, geography. All valid.
But three clouds will punish inconsistency.
So the way you keep your sanity is not by becoming an expert in every console. It is by reducing variation. Standardizing the boring stuff. Automating the repeatable stuff. Centralizing identity and observability. And making it easy for teams to do the right thing without asking permission every time.
Then, weirdly, it starts to feel calm again. Not effortless. But calm.
And that is kind of the best you can ask for when you are running three different clouds at once.
FAQs (Frequently Asked Questions)
Why does managing multiple clouds like AWS, Azure, and GCP get messy so fast?
Managing multiple clouds gets messy quickly because each cloud provider has different names, defaults, identity models, network boundaries, quotas, billing dimensions, support processes, project definitions, and log storage/pricing. The biggest challenge isn’t learning each cloud but dealing with constant context switching between different terminologies and mechanics, which increases mental load and complexity.
What are the common multi-cloud strategies organizations use?
Organizations typically fall into one of three multi-cloud categories: 1) Primary cloud plus satellites—one main cloud runs core platforms while others host specific services; 2) Active-active across clouds—running the same app and stack redundantly across multiple clouds (hardest and most expensive); 3) Portfolio split—different products run independently in different clouds often due to acquisitions or team independence. Defining your strategy clearly is essential for effective management.
What is the first rule to manage a multi-cloud environment effectively?
The first rule is to decide what ‘multi-cloud’ actually means for your organization. Be brutally clear about your multi-cloud model—whether it’s primary plus satellites, active-active redundancy, or portfolio split—and document it internally. This clarity prevents accidental complexity and helps guide decisions about where to deploy new services.
Which aspects should be standardized across multiple clouds to reduce complexity?
Standardize the boring fundamentals that consume most time: naming conventions, tagging policies (including owner, service name, environment, cost center), ownership clarity, environment boundaries (separate prod/non-prod accounts or projects), and golden path templates for common deployments like APIs, workers, databases, queues, and storage buckets. Enforce these standards with policies rather than just documentation.
How should identity management be handled in a multi-cloud setup?
Pick one identity control plane—usually your corporate Identity Provider (IdP) such as Microsoft Entra ID (Azure AD), Okta, or Google Workspace—and make all cloud environments defer to it using Single Sign-On (SSO). Avoid permanent local cloud users or personal access keys for humans by federating identities via SSO for AWS IAM Identity Center or federation, Azure native SSO, and GCP workforce identity federation to maintain centralized access control and simplify reviews.
Why is separating environments like production and development important in multi-cloud architectures?
Separating environments using hard boundaries—such as separate AWS accounts for prod/non-prod, separate Azure subscriptions, and distinct GCP projects/folders—prevents accidental cross-environment impacts. This separation simplifies billing management and access reviews while reducing risks associated with mixing dev/test workloads with production systems.

