Nobody does.
It usually starts innocent. One team likes AWS because that is what they know. Another team is deep in Azure because of Microsoft everything. Then someone says Google Cloud has the best data tools, and suddenly your analytics stack is there. And then you wake up one day and realize you are running production systems across three clouds, paying three sets of bills, reading three different status pages, and translating the same security requirement into three different languages.
It can work. Plenty of companies do it.
But the mental load is real. The death by a thousand console clicks is also real.
So this is a practical, slightly battle worn guide to keeping your sanity while managing AWS, Azure, and GCP at the same time. Not in theory. In the stuff you actually do on a Tuesday.
Why three clouds get messy so fast
If you are thinking, “Infra is infra, how different can it be?”, the answer is: very.
The names differ, the defaults differ, the identity models differ, and the network boundaries and quotas and billing dimensions all differ. Even the way support works. Even the way a “project” is defined. Even how logs are stored and priced.
And the worst part is not learning each cloud. You can learn each cloud. The worst part is context switching.
You are in AWS thinking in accounts, IAM roles, VPCs, security groups, CloudWatch. Then you jump to Azure and now it is subscriptions, resource groups, managed identities, VNets, NSGs, Monitor. Then GCP where it is projects, folders, IAM bindings, VPC networks, firewall rules, Cloud Logging.
Same goals, different mechanics, different footguns.
So the main job is not memorizing features. It is reducing the amount of unique thinking you have to do per cloud.
That is the theme of this whole thing.
First rule: decide what “multi cloud” actually means for you
A lot of orgs claim they are multi cloud, but what they really have is “some stuff over there”. That is fine. It is normal.
You need to be brutally clear about which one you are:
- Primary cloud plus satellites: one cloud runs the core platform, and the other two host specific services. Maybe one data product in GCP. Maybe AD integration stuff in Azure. Good. This is manageable.
- Active active across clouds: the same app, same stack, running in multiple clouds for redundancy. This is the hard mode. Expensive too.
- Portfolio split: different products in different clouds. This happens with acquisitions or independent teams.
Your operating model depends on which one you are. If you do not define it, you end up accidentally building toward the hardest version while telling yourself you are doing the easier one.
Write it down. Literally. Put it in your internal docs. Make it a thing people can point to when someone suggests moving a random service to a new cloud because they read a blog post.
Standardize the boring stuff and stop debating it every week
You cannot standardize everything across three clouds. You will go insane trying.
But you can standardize the boring fundamentals. The stuff that eats 80 percent of your time.
Here is what is worth standardizing:
Naming, tagging, and ownership
If you only fix one thing, fix this. Because when something breaks at 2:07 AM, the question is always:
“Who owns this thing and why does it exist?”
Create a minimal tagging policy that applies everywhere:
- owner (team or on call group)
- service (app name)
- env (prod, staging, dev)
- cost_center (whatever finance needs)
- data_classification (optional, but helpful)
Then enforce it.
Not with a polite wiki page. With policy. AWS has tag policies and SCP patterns. Azure has Policy. GCP has organization policy constraints and you can enforce labels via deployment tooling. Use whatever lever you have. Even if enforcement is “no merge to main unless tags exist”.
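The “no merge to main unless tags exist” lever can be a tiny CI check. Here is a minimal sketch, assuming your pipeline can hand it resources as plain dicts with a `tags` map; the field names match the tagging policy above, everything else is illustrative:

```python
# Hypothetical CI gate: fail the pipeline if any resource lacks required tags.
REQUIRED_TAGS = {"owner", "service", "env", "cost_center"}

def missing_tags(resource: dict) -> set:
    """Return the required tag keys absent from a resource's tag map."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))

def check_resources(resources: list) -> list:
    """Collect (name, missing_tags) pairs for every non-compliant resource.
    An empty result means the merge can proceed."""
    failures = []
    for resource in resources:
        gap = missing_tags(resource)
        if gap:
            failures.append((resource.get("name", "<unnamed>"), sorted(gap)))
    return failures
```

Wire the non-empty result to a non-zero exit code and the pipeline does the nagging for you.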
And keep names consistent too. If it is payments-api in AWS, do not call it paymentsservice-prod in Azure and pay-api in GCP. That sounds minor. It adds up fast.
Environments and account structure
The easiest way to lose your mind is mixing dev and prod in the same boundary.
You want consistent environment boundaries:
- AWS: separate accounts for prod and non prod, ideally more
- Azure: separate subscriptions at least
- GCP: separate projects, and usually separate folders if you are more mature
Do not try to be clever here. Hard boundaries prevent accidents. And they make billing and access reviews way simpler.
Golden paths for common deployments
Most teams deploy the same shapes of things:
- a containerized API
- a background worker
- a scheduled job
- a database
- a queue
- a bucket
- an internal service behind private networking
Create “golden path” templates for those. In one internal repo. With docs that are actually short.
If someone can deploy a new service without having to invent IAM policies from scratch, you just saved yourself weeks of future support tickets.
Identity: pick one control plane, and make everything else defer to it
Identity is where three clouds become three separate kingdoms. And if you let each kingdom evolve independently, you will spend your life doing access reviews.
The sanity move is to pick a single source of truth for human identity. Usually this is your corporate IdP.
Common patterns:
- Microsoft Entra ID (Azure AD) as the IdP for everything
- Okta as the IdP for everything
- Google Workspace identity, sometimes, depending on the org
Then:
- Use SSO for AWS (IAM Identity Center or federation)
- Use SSO for Azure (native)
- Use SSO for GCP (workforce identity federation or SSO via your IdP)
The goal: nobody has permanent cloud local users. No random IAM users hanging around. No personal access keys for humans.
For workloads, do the same thing conceptually, even if the mechanics differ:
- AWS: IAM roles for service accounts, IRSA for EKS, instance profiles
- Azure: Managed Identities
- GCP: Service Accounts with Workload Identity for GKE
And then establish a standard: workloads use short lived credentials, not static keys. Period.
If you have static keys today, do not shame yourself. Just start a migration plan and chip away at it. Static keys are like glitter. Once you have them, they get everywhere.
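Chipping away at static keys starts with knowing how old they are. A minimal sketch of the “flag stale keys” step, working on key metadata you have already fetched (the input shape is modeled loosely on what cloud IAM APIs return, and the 90 day window is an assumption, not a standard):

```python
from datetime import datetime, timedelta, timezone

# Assumption: your rotation policy; adjust to whatever you actually enforce.
MAX_KEY_AGE = timedelta(days=90)

def stale_keys(keys, now=None):
    """Return the ids of keys whose age exceeds the rotation window.

    `keys` is a list of dicts like {"id": ..., "created": datetime},
    a hypothetical normalized shape, not a real API response.
    """
    now = now or datetime.now(timezone.utc)
    return [k["id"] for k in keys if now - k["created"] > MAX_KEY_AGE]
```

Run it per cloud on whatever inventory you have, and the migration plan writes its own backlog.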
Networking: stop building three separate “special snowflake” network designs
Multi cloud networking can get complicated fast. Especially if you try to make it all feel like one flat network.
My preference, for sanity, is this:
- Each cloud has its own clean, well segmented network.
- Intercloud traffic is explicit and limited.
- Shared services are either duplicated per cloud or centralized with clear boundaries.
You have a few connectivity options:
- Site to site VPN between clouds
- Dedicated interconnects (AWS Direct Connect, Azure ExpressRoute, GCP Interconnect) via a colocation or partner
- Transit hubs (AWS Transit Gateway, Azure Virtual WAN, GCP Cloud Router plus hub VPC patterns)
The key is not the technology. It is governance.
Write down allowed paths. Like:
- prod to prod only
- dev cannot reach prod
- only specific shared services can be reached cross cloud
- all cross cloud traffic is logged and measured
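Those allowed paths are simple enough to express as data, which makes them checkable in review tooling instead of living in someone’s head. A sketch, with hypothetical environment and service names:

```python
# Policy as data: cross cloud paths are denied unless explicitly listed.
# The env and service names here are illustrative.
ALLOWED_PATHS = {
    ("prod", "prod"),        # prod to prod only
    ("dev", "dev"),
    ("staging", "staging"),
}

# Shared services reachable from any environment, per explicit exception.
SHARED_SERVICES = {"dns", "artifact-registry"}

def path_allowed(src_env, dst_env, dst_service=None):
    """Default deny: a path is allowed only via the allow list
    or the shared services exception."""
    if dst_service in SHARED_SERVICES:
        return True
    return (src_env, dst_env) in ALLOWED_PATHS
```

The point is not the function, it is that the rules exist in one reviewable place.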
If you do not define this, you end up with random peering and ad hoc VPNs because someone “just needed to test something”.
Also. DNS. Do not ignore DNS.
Pick a strategy for internal DNS resolution and stick to it. Hybrid DNS setups can become a haunted house if you let every team add records in a different place.
Observability: one pane of glass is nice, but one way of working is the real win
People love the phrase “single pane of glass”. Usually it means “we bought a tool”.
Tools help. But the sanity saver is consistency in:
- how you name metrics
- how you structure logs
- how you set alerts
- how you do incident response
You can keep native tools per cloud, but you need a common layer so on call does not have to think too hard.
What works well:
- Centralize logs into one system (Elastic, Splunk, Datadog, Grafana Loki, whatever you use)
- Standardize JSON log format across apps
- Standardize trace propagation and service naming (OpenTelemetry helps a lot)
- Use one alerting and on call tool (PagerDuty, Opsgenie, etc)
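What a standardized JSON log line might look like in practice, as a small sketch; the key names (`ts`, `service`, `level`, `message`) are an illustrative convention, not any tool’s required schema:

```python
import json
from datetime import datetime, timezone

def log_line(service, level, message, **fields):
    """Emit one JSON log line with a fixed set of standard keys,
    plus arbitrary extra fields (trace ids, request ids, etc)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "level": level,
        "message": message,
        **fields,
    }
    return json.dumps(record, sort_keys=True)
```

Once every app emits the same shape, the central log system’s queries stop caring which cloud a line came from.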
Then define a small set of global SLOs and alert policies that apply to all services, regardless of cloud.
Example:
- latency p95
- error rate
- saturation signals (CPU, memory, queue depth)
- dependency health
When an alert fires, the first steps should be identical whether the service is on ECS, AKS, or GKE.
That is how you keep your brain from melting.
FinOps: three clouds will quietly eat your budget if you do not set rules
Cloud billing is already confusing in one cloud. In three, it is basically a hobby.
You need a cost operating rhythm.
Not a once a quarter “cost optimization initiative”. An actual weekly habit.
Here is a simple setup that works:
- Weekly cost review meeting, 30 minutes, same agenda
- Each cloud has an owner who can explain anomalies
- Top 10 services by spend are tracked
- Idle resources report is tracked
- Commitments are managed intentionally (Savings Plans, Reserved Instances, Azure Reservations, GCP CUDs)
A couple of rules that help:
- No untagged spend in prod
- No public IPs by default unless approved
- Default to autoscaling and rightsizing
- Put dev and preview environments on stricter budgets and schedules
- Create “sandbox” areas with hard quotas so experiments do not become permanent bills
Also, build a habit of looking at unit cost. Cost per request, per customer, per job. Not just total bill. Total bill is too easy to shrug at until it doubles.
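The unit cost math is trivial, which is exactly why it should live in a dashboard instead of being recomputed by hand each week. A minimal sketch:

```python
def unit_cost(total_cost, units):
    """Cost per unit of work (request, customer, job).
    Returns None when there were no units, so an idle bill
    still shows up instead of dividing by zero."""
    if units == 0:
        return None
    return total_cost / units
```

Track it over time: a flat total bill with rising unit cost means you are paying more per customer, which the total alone hides.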
Security: unify the policy, not the tooling
Trying to make every cloud use the exact same security toolset can backfire. You will spend years integrating.
Instead, unify the policy and the outcomes.
Define your baseline controls:
- MFA enforced for all humans
- least privilege access reviews on a schedule
- encryption at rest everywhere
- TLS everywhere in transit
- no public storage buckets
- vulnerability scanning for images
- patching policy for VMs
- centralized secrets management approach
- audit logging enabled and retained
Then map those controls to each cloud’s native capabilities and your chosen tools.
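That mapping is worth keeping as a small, versioned artifact rather than tribal knowledge. A sketch using well known native service names (verify against each cloud’s current docs before relying on it):

```python
# Baseline control -> native implementation per cloud.
# Names are the commonly known services, kept here for illustration.
CONTROL_MAP = {
    "audit_logging": {
        "aws": "CloudTrail",
        "azure": "Activity Log",
        "gcp": "Cloud Audit Logs",
    },
    "secrets": {
        "aws": "Secrets Manager",
        "azure": "Key Vault",
        "gcp": "Secret Manager",
    },
    "encryption_keys": {
        "aws": "KMS",
        "azure": "Key Vault keys",
        "gcp": "Cloud KMS",
    },
}

def implementation(control, cloud):
    """Look up how a baseline control is implemented in a given cloud."""
    return CONTROL_MAP[control][cloud]
```

When an auditor asks “how do you do X on cloud Y”, the answer is a lookup, not a meeting.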
A pragmatic pattern is:
- Use CSPM to get a consistent view (Wiz, Prisma, Defender for Cloud, etc)
- Use cloud native security where it is strong and easy
- Centralize findings into one place (SIEM) so security does not chase three dashboards
Most importantly, keep a single exception process. One form, one workflow, one place to track it.
Otherwise you end up with “temporary exceptions” that live forever because nobody knows where they were approved.
Deployment and IaC: pick one approach and enforce it, gently but firmly
If teams are clicking around in consoles across three clouds, you are going to suffer.
Infrastructure as Code is not optional in multi cloud. It is how you keep things repeatable.
Pick your primary IaC approach:
- Terraform / OpenTofu is common because it spans clouds
- Pulumi if you want code based IaC
- Cloud native templates can exist, but then you have three systems
You can still allow cloud specific modules. That is fine. The goal is a consistent workflow:
- PR based changes
- code review
- plan output visible
- apply via CI
- state handled properly
- drift detection, at least for critical stuff
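For the drift detection piece, Terraform already gives you a hook: `terraform plan -detailed-exitcode` exits 0 for no changes, 1 for errors, and 2 for pending changes. A small sketch of interpreting that in a scheduled job (the status labels are our own convention):

```python
# Interpret `terraform plan -detailed-exitcode` results for a drift report.
# Per Terraform's documented behavior: 0 = no changes, 1 = error, 2 = changes.
def drift_status(exit_code: int) -> str:
    """Map the plan exit code to a human readable drift status."""
    return {0: "clean", 1: "error", 2: "drift"}.get(exit_code, "unknown")
```

Run the plan nightly per critical stack, feed the status into your alerting, and drift stops being a surprise found during an incident.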
Then add platform guardrails:
- approved modules for common things
- policy as code (OPA, Sentinel, Azure Policy as gates, etc)
- automated tests for IaC where possible
And please, keep modules boring. The more magic you put in them, the more you will be the only person who can debug them.
Governance without becoming the “no” team
Three clouds makes governance more necessary. It also makes governance easier to mess up.
If your central platform team becomes a gate for everything, teams will route around you. They will open tickets. They will create shadow infrastructure. They will do weird things with credit cards.
So aim for paved roads, not roadblocks.
- Give teams self service templates that are safe
- Enforce a few hard rules (identity, logging, tagging, network boundaries)
- Let teams move fast inside those boundaries
A good sign you have it right: engineers complain a little, but they still use the templates because it is faster than DIY.
The on call reality: make “where is it running” irrelevant
When you are on call, you should not have to answer these questions from scratch:
- which cloud is this service in?
- where are its logs?
- where are its dashboards?
- what is its runbook?
- who owns it?
- what dependencies does it have?
You want a service catalog. Not fancy. Just accurate.
At minimum, every production service should have:
- owner and escalation
- cloud and region
- repo link
- deploy pipeline link
- dashboard link
- log search link
- runbook link
- dependency list
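“Accurate, not fancy” is easy to enforce with a check that refuses catalog entries missing the minimum fields. A sketch, with field names taken from the list above:

```python
# Minimum fields every production service catalog entry must carry.
REQUIRED_FIELDS = {
    "owner", "escalation", "cloud", "region", "repo",
    "pipeline", "dashboard", "logs", "runbook", "dependencies",
}

def catalog_gaps(entry: dict) -> set:
    """Return the required fields a service entry is still missing.
    Empty set means the entry is incident-ready."""
    return REQUIRED_FIELDS - set(entry)
```

Run it on every catalog change and the 2 AM “who owns this” question answers itself.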
This sounds like bureaucracy until you are in an incident and you realize nobody knows which team owns the thing that is failing.
Also, do game days. Even small ones. Once a month, break something in a controlled way. Practice cross cloud failure scenarios if you actually depend on cross cloud traffic.
Because the first time you discover your DNS failover does not work should not be during a real outage.
A simple mental model that helps: “global standards, local implementations”
This one sentence has saved me a lot of pointless arguments.
You define global standards:
- how identity works
- what tagging is required
- what logs must exist
- what encryption is required
- what environments exist
- what SLOs look like
- what the incident process is
Then each cloud implements those standards with its own native constructs.
You stop trying to make AWS look exactly like Azure. You stop trying to make GCP projects behave like AWS accounts. You accept the differences, but you make them feel consistent from the perspective of the engineer doing daily work.
That is the trick.
The sanity checklist (print this, seriously)
If you are managing three clouds, and you want a quick gut check, here you go:
- One IdP, SSO everywhere, no long lived human keys
- Clear environment boundaries in all clouds
- Minimal tagging policy enforced
- IaC as the default path, PR based
- Centralized logging and a consistent log format
- One alerting/on call system, consistent runbooks
- Weekly cost review, top spend tracked, commitments managed
- Baseline security controls mapped per cloud, exceptions tracked
- Documented network connectivity rules, DNS strategy not ad hoc
- Service catalog exists and is actually maintained
If half of these are missing, that is not a moral failure. It just means your future stress is already scheduled.
Wrapping up
Managing three clouds is not automatically “bad”. Sometimes it is the right call. Sometimes it is the result of reality. Mergers, customer requirements, team expertise, compliance, geography. All valid.
But three clouds will punish inconsistency.
So the way you keep your sanity is not by becoming an expert in every console. It is by reducing variation. Standardizing the boring stuff. Automating the repeatable stuff. Centralizing identity and observability. And making it easy for teams to do the right thing without asking permission every time.
Then, weirdly, it starts to feel calm again. Not effortless. But calm.
And that is kind of the best you can ask for when you are running three different clouds at once.
FAQs (Frequently Asked Questions)
Why does managing multiple clouds like AWS, Azure, and GCP get messy so fast?
Managing multiple clouds gets messy quickly because each cloud provider has different names, defaults, identity models, network boundaries, quotas, billing dimensions, support processes, project definitions, and log storage/pricing. The biggest challenge isn’t learning each cloud but dealing with constant context switching between different terminologies and mechanics, which increases mental load and complexity.
What are the common multi-cloud strategies organizations use?
Organizations typically fall into one of three multi-cloud categories: 1) Primary cloud plus satellites—one main cloud runs core platforms while others host specific services; 2) Active-active across clouds—running the same app and stack redundantly across multiple clouds (hardest and most expensive); 3) Portfolio split—different products run independently in different clouds often due to acquisitions or team independence. Defining your strategy clearly is essential for effective management.
What is the first rule to manage a multi-cloud environment effectively?
The first rule is to decide what ‘multi-cloud’ actually means for your organization. Be brutally clear about your multi-cloud model—whether it’s primary plus satellites, active-active redundancy, or portfolio split—and document it internally. This clarity prevents accidental complexity and helps guide decisions about where to deploy new services.
Which aspects should be standardized across multiple clouds to reduce complexity?
Standardize the boring fundamentals that consume most time: naming conventions, tagging policies (including owner, service name, environment, cost center), ownership clarity, environment boundaries (separate prod/non-prod accounts or projects), and golden path templates for common deployments like APIs, workers, databases, queues, and storage buckets. Enforce these standards with policies rather than just documentation.
How should identity management be handled in a multi-cloud setup?
Pick one identity control plane—usually your corporate Identity Provider (IdP) such as Microsoft Entra ID (Azure AD), Okta, or Google Workspace—and make all cloud environments defer to it using Single Sign-On (SSO). Avoid permanent local cloud users or personal access keys for humans by federating identities via SSO for AWS IAM Identity Center or federation, Azure native SSO, and GCP workforce identity federation to maintain centralized access control and simplify reviews.
Why is separating environments like production and development important in multi-cloud architectures?
Separating environments using hard boundaries—such as separate AWS accounts for prod/non-prod, separate Azure subscriptions, and distinct GCP projects/folders—prevents accidental cross-environment impacts. This separation simplifies billing management and access reviews while reducing risks associated with mixing dev/test workloads with production systems.

