Cloud Architecture Operating Principles

Architecture Philosophy

These principles guide how I design, assess, and operate modern Microsoft cloud environments. They reflect 20+ years of lessons from enterprise infrastructure, hybrid identity, and multi-tenant deployments.

Daniel Lepel

Principal Microsoft Cloud Architect

daniellepel.com

1. Identity
2. Least Privilege
3. Consistency
4. Security Ops
5. Observability
6. Documentation
7. Automation

Principle One

Identity Is the Security Boundary

Zero Trust Foundation

›Centralized identity through Microsoft Entra ID
Microsoft Entra ID: Microsoft's cloud identity platform (formerly Azure Active Directory). It manages who can sign in, what they can access, and under what conditions.

Why it matters
When identity is fragmented across systems, there is no single place to enforce policy or detect compromise. Centralizing in Entra ID gives you one control plane for every access decision across M365, Azure, and connected applications.
›Privileged access with just-in-time elevation

Why it matters
Persistent admin accounts are a primary attack target. Just-in-time elevation through PIM (Privileged Identity Management) means there is nothing standing to compromise outside of an active session.
PIM: Grants admin access only when needed, for a defined time window, with optional approval, then removes it automatically.
›Strong authentication - FIDO2 or certificate-based

Why it matters
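SMS and app-based MFA are better than passwords alone but are still vulnerable to phishing and SIM-swap attacks. Hardware-backed authentication closes off those vectors: the credential is bound to the device, so there is no code to intercept and nothing reusable to hand to a phishing page.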
›Continuous evaluation of identity risk signals

Why it matters
A valid login at 9am doesn't mean the session is safe at 2pm. Conditional Access with continuous access evaluation re-checks risk throughout a session, not just at authentication time.
Conditional Access: A policy engine that evaluates sign-in risk, device health, and user context continuously, not just at login, to decide whether to allow, block, or challenge each request.
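
To make the point about continuous evaluation concrete, here is a minimal Python sketch of a decision function that is re-run on every request rather than once at login. It is not the Conditional Access engine or any Microsoft API; `SessionSignals`, `evaluate`, and the three risk levels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionSignals:
    """Hypothetical snapshot of the signals known about one session."""
    sign_in_risk: str        # assumed levels: "low", "medium", "high"
    device_compliant: bool
    account_disabled: bool


def evaluate(signals: SessionSignals) -> str:
    """Return "block", "challenge", or "allow" for the current request."""
    if signals.account_disabled or signals.sign_in_risk == "high":
        return "block"
    if signals.sign_in_risk == "medium" or not signals.device_compliant:
        return "challenge"
    return "allow"


# The same session, evaluated at two points in time.
at_login = SessionSignals("low", True, False)
mid_session = SessionSignals("high", True, False)
decisions = [evaluate(at_login), evaluate(mid_session)]
```

The session that was allowed at login is blocked later because the decision depends on current signals, not on the fact that authentication once succeeded.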

"Identity is the new perimeter. Everything else depends on getting this right first."

1. Network perimeters are gone - every request must be authenticated and authorized regardless of where it originates
2. Zero standing access - admin privilege is granted only when needed and removed automatically when tasks are done
3. Risk is continuous - authentication is a point-in-time check; trust must be re-evaluated throughout every session

Principle Two

Least Privilege Is an Operational Discipline

Minimize Blast Radius

›Role-based access design across all workloads

Why it matters
When everyone has broad access "just in case," a single compromised account can cause catastrophic damage. Role-based design limits what any one account can touch to only what it legitimately needs.
›Privileged Identity Management (PIM) for elevation workflows
PIM: Entra ID feature that enables just-in-time privileged access. Users request elevation, it is approved, they get access for a defined window, then it is removed automatically.

Why it matters
Standing Global Admin accounts are one of the most common audit findings and one of the most targeted attack vectors. PIM eliminates them without eliminating the ability to perform privileged tasks when legitimately needed.
›Regular review cycles for administrative assignments

Why it matters
Access tends to accumulate. People change roles, projects end, contractors leave - but their access often stays. Scheduled access reviews catch this before it becomes a security or compliance issue.
›Elimination of standing Global Administrator access

Why it matters
A Global Admin account that exists all the time is a high-value target at all times. With PIM, that target only exists during an active, time-limited, logged session - dramatically reducing the attack surface.
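
The idea of a time-limited elevation can be sketched in a few lines of Python. This is an illustrative model only, not how PIM is implemented; the `Elevation` type, the `activate` and `is_active` helpers, and the 60-minute default are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Elevation:
    """Hypothetical record of one just-in-time role activation."""
    user: str
    role: str
    expires_at: datetime


def activate(user: str, role: str, now: datetime, minutes: int = 60) -> Elevation:
    """Grant the role for a fixed window starting at `now`."""
    return Elevation(user, role, now + timedelta(minutes=minutes))


def is_active(elevation: Elevation, now: datetime) -> bool:
    """The privilege exists only until the window closes."""
    return now < elevation.expires_at


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = activate("alice", "Global Administrator", start, minutes=60)
```

Because expiry is part of the grant itself, nobody has to remember to revoke it; outside the window there is simply no privilege to steal.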

"Privilege should exist only long enough to do the job - then disappear."

1. Scope access tightly - minimum required permissions for the specific task, nothing broader
2. Time-limit everything elevated - no access should persist past the window it was needed for
3. Review and recertify regularly - access accumulates silently; scheduled reviews catch it before it becomes a liability
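
A scheduled review can start from something as simple as flagging assignments nobody has used recently. The Python sketch below assumes a hypothetical export of assignments with a `last_used` date and an arbitrary 90-day threshold; neither comes from a specific Microsoft tool.

```python
from datetime import date, timedelta


def stale_assignments(assignments, today, max_idle_days=90):
    """Return assignments never used, or unused for longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        a for a in assignments
        if a["last_used"] is None or a["last_used"] < cutoff
    ]


# Hypothetical sample data for illustration only.
assignments = [
    {"user": "alice", "role": "Exchange Administrator", "last_used": date(2024, 5, 20)},
    {"user": "bob", "role": "User Administrator", "last_used": date(2023, 11, 2)},
    {"user": "contractor1", "role": "SharePoint Administrator", "last_used": None},
]
flagged = [a["user"] for a in stale_assignments(assignments, today=date(2024, 6, 1))]
```

The output is a candidate list for a human reviewer, not an automatic removal; some rarely used roles are legitimately needed.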

Principle Three

Platform Consistency Reduces Risk

Govern Before You Grow

›Azure landing zone structure and management groups
Azure Landing Zone: A pre-configured Azure environment with defined management groups, policies, networking, and governance built in before workloads are deployed. It is the foundation for consistent, governable cloud operations.

Why it matters
Starting without a landing zone means every workload team makes its own infrastructure decisions. The result is a patchwork of inconsistent configurations that is expensive to secure and nearly impossible to audit.
›Policy-driven configuration baselines across environments

Why it matters
Azure Policy enforces the baseline continuously, not just at deployment time. Resources that drift from the standard are flagged or automatically remediated before they cause problems.
Azure Policy: Enforces organizational rules on Azure resources. It can prevent non-compliant configurations from being created, audit existing ones, or automatically remediate deviations.
›Standardized network and identity patterns

Why it matters
When every workload uses the same network topology and identity integration pattern, troubleshooting is faster, security reviews are simpler, and new team members can get up to speed without decoding a different architecture every time.
›Consistent monitoring and logging across all workloads

Why it matters
You can't correlate a security event across environments that log differently. Consistent log format, retention, and destination are what make investigation possible and compliance audits manageable.
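
As a rough illustration of a policy-driven baseline, the Python sketch below checks a resource against a small rule set and reports which rules it violates and with what effect. The rule format, field names, and sample resource are hypothetical and are not the Azure Policy definition schema; only the general idea of "deny" versus "audit" effects is borrowed.

```python
def evaluate_resource(resource, rules):
    """Return (rule name, effect) for every rule the resource violates."""
    findings = []
    for rule in rules:
        if resource.get(rule["field"]) != rule["required"]:
            findings.append((rule["name"], rule["effect"]))
    return findings


# Hypothetical baseline: one blocking rule and one audit-only rule.
RULES = [
    {"name": "https-only", "field": "https_only", "required": True, "effect": "deny"},
    {"name": "diagnostic-logs", "field": "diagnostics_enabled", "required": True, "effect": "audit"},
]

storage = {"name": "st-demo", "https_only": True, "diagnostics_enabled": False}
findings = evaluate_resource(storage, RULES)
```

Running the same check on a schedule, not only at deployment, is what turns a baseline into continuous enforcement.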

"Configuration drift is slow, quiet, and one of the most common root causes of cloud security incidents."

1. Build governance in early - a landing zone established before workloads arrive costs a fraction of retrofitting it later
2. Enforce continuously - policy that runs at deployment time only misses everything that changes afterward
3. Consistency enables speed - teams move faster when they don't have to re-solve infrastructure decisions that have already been made

Principle Four

Security Must Be Integrated into Operations

Security as Daily Practice

›Unified security telemetry through Microsoft Defender XDR
Microsoft Defender XDR: Extended Detection and Response platform that correlates signals across endpoints, identity, email, and cloud apps into a single investigation and response experience.

Why it matters
Security tools that don't talk to each other force analysts to manually correlate signals across consoles. Defender XDR surfaces the full attack chain in one place - from the initial phishing email through lateral movement to the target resource.
›Centralized logging and investigation workflows

Why it matters
When logs live in different systems with different formats and different retention windows, incident response slows to a crawl. Centralized logging through Microsoft Sentinel turns hours of manual log hunting into minutes.
Microsoft Sentinel: Microsoft's cloud-native SIEM and SOAR platform. It ingests signals from across the environment and enables automated detection, investigation, and response workflows.
›Automated alert correlation where appropriate

Why it matters
Security teams are drowning in alerts. Correlation rules that group related low-fidelity signals into high-fidelity incidents dramatically reduce alert fatigue and surface what actually matters - without requiring human review of every individual event.
›Clear incident response procedures, tested regularly

Why it matters
An incident response plan that has never been tested is a document, not a capability. Tabletop exercises and regular reviews of actual incident timelines are what build the muscle memory that matters when something real happens.
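
The alert-correlation idea above can be sketched as grouping alerts that involve the same entity and occur close together in time. This Python example is a simplified illustration, not how Defender XDR or Sentinel correlate; the `correlate` function, the 30-minute window, and the alert fields are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(alerts, window=timedelta(minutes=30), min_alerts=2):
    """Group alerts per entity into incidents when they occur close together."""
    by_entity = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        by_entity[alert["entity"]].append(alert)

    incidents = []
    for entity, items in by_entity.items():
        groups = [[items[0]]]
        for alert in items[1:]:
            # Extend the current group if this alert closely follows the previous one.
            if alert["time"] - groups[-1][-1]["time"] <= window:
                groups[-1].append(alert)
            else:
                groups.append([alert])
        for group in groups:
            if len(group) >= min_alerts:
                incidents.append({"entity": entity, "alerts": [a["title"] for a in group]})
    return incidents


# Hypothetical alerts, deliberately out of order.
alerts = [
    {"entity": "alice", "title": "New inbox forwarding rule", "time": datetime(2024, 3, 4, 10, 25)},
    {"entity": "bob", "title": "Failed sign-in", "time": datetime(2024, 3, 4, 14, 0)},
    {"entity": "alice", "title": "Phishing link clicked", "time": datetime(2024, 3, 4, 10, 0)},
    {"entity": "alice", "title": "Unfamiliar sign-in location", "time": datetime(2024, 3, 4, 10, 10)},
]
incidents = correlate(alerts)
```

Four individual alerts collapse into one incident that reads as a sequence, while the isolated failed sign-in stays below the incident threshold.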

"Security that lives in a separate lane from operations will always be too slow to matter."

1. Visibility across the full stack - identity, endpoints, email, cloud apps, and infrastructure in one correlated view
2. Reduce signal noise - automated correlation surfaces real incidents from thousands of individual alerts
3. Response must be practiced - plans that have never been tested are not plans, they are intentions

Principle Five

Infrastructure Must Be Observable

You Cannot Fix What You Cannot See

›Centralized logging and diagnostics across all layers

Why it matters
Problems in cloud environments rarely announce themselves cleanly. Centralized diagnostics across identity, network, compute, and application layers let you see what actually happened - not just what the affected service reported.
›Monitoring of configuration drift from defined baselines

Why it matters
Environments change - sometimes deliberately, sometimes not. Drift monitoring alerts when a setting deviates from the approved baseline before it becomes a vulnerability, a compliance finding, or an outage root cause.
›Operational dashboards for key platform services

Why it matters
Dashboards aren't just for executives. An operations team with a clear daily view of platform health - identity, endpoint compliance, security posture, resource utilization - catches problems proactively rather than reactively.
›Log retention sufficient for investigation and compliance

Why it matters
Incidents are rarely discovered in real time. When investigation begins weeks or months later, retention policies determine whether the data you need still exists. Compliance frameworks specify minimums; security best practice often requires more.
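
The retention point comes down to simple date arithmetic, shown here as a small Python sketch. The function name and the 90-day and 180-day figures are illustrative assumptions, not values taken from any compliance framework.

```python
from datetime import date, timedelta


def logs_still_available(event_date, investigation_date, retention_days):
    """True if data written on event_date is still inside the retention window."""
    return investigation_date - event_date <= timedelta(days=retention_days)


# Hypothetical timeline: compromise in January, discovered in May.
compromise = date(2024, 1, 10)
discovered = date(2024, 5, 1)
gap_days = (discovered - compromise).days

with_90_days = logs_still_available(compromise, discovered, retention_days=90)
with_180_days = logs_still_available(compromise, discovered, retention_days=180)
```

With a 112-day gap between compromise and discovery, a 90-day policy has already purged the earliest evidence, while a 180-day policy still holds it.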

"Observability isn't about collecting everything. It's about knowing where to look when something goes wrong."

1. Visibility across all layers - identity, endpoints, applications, and cloud resources, not just uptime dashboards
2. Ongoing drift detection - catch configuration changes before they become incidents or audit findings
3. Retain for investigation - you cannot investigate an incident with logs that have already been purged

Principle Six

Documentation Enables Continuity

Architecture Decisions Have a Memory

›Architecture diagrams and current-state design standards

Why it matters
Architecture documentation that reflects aspirational state rather than actual state is actively harmful - it leads teams to make decisions based on a picture of the environment that doesn't exist. Current-state documentation is the only kind that has operational value.
›Security and governance policies in accessible form

Why it matters
Policies that exist only as PDFs in a SharePoint library are not operational. Security and governance policies need to be findable, readable, and maintained - otherwise teams make judgment calls that contradict them without realizing it.
›Operational runbooks for routine and emergency procedures

Why it matters
I write runbooks the team can actually follow under pressure - not architecture documents that require deep context to interpret. If the on-call engineer has never seen the system before, the runbook should still get them through it.
›Incident review summaries that inform future decisions

Why it matters
Incidents that are resolved without a written review get repeated. A concise post-incident summary - what happened, why, what changed - turns a painful event into an institutional learning that prevents the next one.

"Documentation that reflects what was planned, not what was built, is worse than no documentation at all."

1. Current state only - aspirational architecture diagrams mislead; accurate ones enable
2. Runbooks over reference docs - operational documentation must work under pressure, not just on a good day
3. Incidents are a curriculum - every unreviewed outage is a lesson the organization paid for but never learned

Principle Seven

Automation Should Assist Human Operators

Speed With Judgment

›Infrastructure deployment through templates and policy

Why it matters
Manual infrastructure deployments introduce variability. Every environment deployed from a validated template or policy baseline starts in a known-good state - reducing the surface area for configuration errors from the first moment.
›Automated configuration validation and drift detection

Why it matters
Checking 25 tenants manually for configuration drift is not operationally viable. PowerShell-driven validation against a defined baseline runs in minutes and surfaces deviations before they compound into larger problems.
›AI-assisted investigation and incident summarization

Why it matters
Tools like Security Copilot can synthesize a complex incident timeline in seconds. That doesn't replace analyst judgment - it eliminates the data-gathering work so analysts can focus on decision-making.
Microsoft Security Copilot: AI-powered security analysis tool that synthesizes signals from Defender, Sentinel, and Entra ID to accelerate investigation by summarizing incidents, explaining findings, and suggesting remediation steps.
›Human review preserved for decisions with material impact

Why it matters
Automation is a force multiplier, not a replacement for judgment. Routine, well-understood tasks are candidates for automation. Decisions with significant security, compliance, or operational consequences require a human in the loop - full stop.
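
The text above describes PowerShell-driven validation; the same baseline comparison is sketched here in Python purely for illustration. The setting names, tenant names, and `find_drift` helper are hypothetical, and a real check would read each tenant's configuration from an API rather than from a literal dictionary.

```python
# Hypothetical approved baseline for every tenant.
BASELINE = {
    "legacy_auth_blocked": True,
    "mfa_required_for_admins": True,
    "audit_log_enabled": True,
}


def find_drift(baseline, actual):
    """Return each setting whose actual value differs from the baseline."""
    return {
        key: {"expected": expected, "actual": actual.get(key)}
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }


# Hypothetical tenant configurations.
tenants = {
    "tenant-a": {"legacy_auth_blocked": True, "mfa_required_for_admins": True, "audit_log_enabled": True},
    "tenant-b": {"legacy_auth_blocked": False, "mfa_required_for_admins": True, "audit_log_enabled": True},
    "tenant-c": {"legacy_auth_blocked": True, "mfa_required_for_admins": True},
}

report = {}
for name, config in tenants.items():
    drift = find_drift(BASELINE, config)
    if drift:
        report[name] = drift
```

The report lists only tenants that deviate, including settings that are missing entirely, so a reviewer sees exceptions rather than scanning every tenant by hand.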

"Automation should make good operators faster, not replace the judgment that makes them good."

1. Deploy from baselines - every environment starts in a known-good state, not wherever manual steps happened to land
2. AI as analyst support - Security Copilot and Defender AI accelerate investigation; they don't replace the analyst making the call
3. Humans own material decisions - full transparency and traceability in every automated action, with human review where consequences are significant
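
One way to picture "humans own material decisions" is an approval gate in front of automated actions. The Python sketch below is an assumption-laden illustration: the `Action` type, the `material_impact` flag, and the log format are invented here and do not correspond to any specific product feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical automated action and whether its impact is material."""
    name: str
    material_impact: bool


def execute(action, audit_log, approved_by=None):
    """Run routine actions directly; hold material ones until a human approves."""
    if action.material_impact and approved_by is None:
        audit_log.append(f"HELD {action.name}: awaiting human approval")
        return False
    actor = approved_by or "automation"
    audit_log.append(f"RAN {action.name}: authorized by {actor}")
    return True


audit_log = []
ran_routine = execute(Action("rotate-log-storage-key", False), audit_log)
ran_unapproved = execute(Action("disable-executive-account", True), audit_log)
ran_approved = execute(Action("disable-executive-account", True), audit_log, approved_by="analyst-on-call")
```

Every path writes to the audit log, so held, automated, and human-approved actions are all traceable afterward.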

In Practice

How These Principles Work Together

These principles are not a checklist. They are a lens for evaluating tradeoffs. When an organization is under pressure to move fast, they are what prevent speed from becoming debt. When an environment is mature, they are what keep it that way.

Identity - the security boundary everything else depends on
Least Privilege - limits the damage any single compromise can cause
Consistency - prevents the drift that erodes both security and manageability
Security Ops - makes threats visible and usable in daily operations
Observability - makes the environment knowable and investigations possible
Documentation - preserves institutional knowledge across people and time
Automation - amplifies human capability without replacing human judgment

Daniel Lepel

Principal Microsoft Cloud Architect

(212) 252-9200
daniellepel.com

"Good architecture isn't about the technology - it's about whether the people relying on it can do their jobs without thinking about it."