azure-infra-engineer
PassExpert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).
(0)
6stars
0downloads
0views
Install Skill
Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.
Install globally (user-level):
npx skillhub install 404kidwiz/claude-supercode-skills/azure-infra-engineerInstall in current project:
npx skillhub install 404kidwiz/claude-supercode-skills/azure-infra-engineer --projectSuggested path: ~/.claude/skills/azure-infra-engineer/
SKILL.md Content
---
name: azure-infra-engineer
description: Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).
---
# Azure Infrastructure Engineer
## Purpose
Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.
## When to Use
- Deploying Azure resources using Bicep or ARM templates
- Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
- Implementing Azure Policy and Management Groups (Governance)
- Migrating workloads to Azure (ASR, Azure Migrate)
- Automating Azure DevOps pipelines for infrastructure
- Configuring Azure Active Directory (Entra ID) RBAC and PIM
---
---
## 2. Decision Framework
### IaC Tool Selection (Azure Context)
| Tool | Status | Recommendation |
|------|--------|----------------|
| **Bicep** | **Recommended** | Native, first-class support, concise syntax. |
| **Terraform** | **Alternative** | Best for multi-cloud strategies. |
| **ARM Templates** | **Legacy** | Verbose JSON. Avoid for new projects (compile Bicep instead). |
| **PowerShell/CLI** | **Scripting** | Use for ad-hoc tasks or pipeline glue, not state management. |
### Networking Architecture
```
What is the connectivity need?
│
├─ **Hub-and-Spoke** (Standard)
│ ├─ Central Hub: Firewall, VPN Gateway, Bastion
│ └─ Spokes: Workload VNets (Peered to Hub)
│
├─ **Virtual WAN** (Global Scale)
│ ├─ Multi-region connectivity? → **Yes**
│ └─ Branch-to-Branch (SD-WAN)? → **Yes**
│
└─ **Private Access**
├─ PaaS Services? → **Private Link / Private Endpoints**
└─ Service Endpoints? → Legacy (Use Private Link where possible)
```
### Governance Strategy (CAF)
1. **Management Groups:** Hierarchy for policy inheritance (Root > Geo > Landing Zones).
2. **Azure Policy:** "Deny" non-compliant resources (e.g., only East US region).
3. **RBAC:** Least privilege access via Entra ID Groups.
4. **Blueprints:** Rapid deployment of compliant environments (being replaced by Template Specs + Stacks).
**Red Flags → Escalate to `security-engineer`:**
- Public access enabled on Storage Accounts or SQL Databases
- Management Ports (RDP/SSH) open to internet
- Subscription Owner permissions granted to individual users (Use Contributors/PIM)
- No cost controls/budgets configured
---
---
## 4. Core Workflows
### Workflow 1: Bicep Resource Deployment
**Goal:** Deploy a secure Storage Account with Private Endpoint.
**Steps:**
1. **Define Bicep Module (`storage.bicep`)**
```bicep
param location string = resourceGroup().location
param name string
resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
name: name
location: location
sku: { name: 'Standard_LRS' }
kind: 'StorageV2'
properties: {
minimumTlsVersion: 'TLS1_2'
supportsHttpsTrafficOnly: true
publicNetworkAccess: 'Disabled' // Secure by default
}
}
output id string = stg.id
```
2. **Main Deployment (`main.bicep`)**
```bicep
module storage './modules/storage.bicep' = {
name: 'deployStorage'
params: {
name: 'stappprod001'
}
}
```
3. **Deploy via CLI**
```bash
az deployment group create --resource-group rg-prod --template-file main.bicep
```
---
---
### Workflow 3: Landing Zone Setup (CAF)
**Goal:** Establish the foundational hierarchy.
**Steps:**
1. **Create Management Groups**
- `MG-Root`
- `MG-Platform` (Identity, Connectivity, Management)
- `MG-LandingZones` (Online, Corp)
- `MG-Sandbox` (Playground)
2. **Assign Policies**
- Assign "Allowed Locations" to `MG-Root`.
- Assign "Enable Azure Monitor" to `MG-LandingZones`.
3. **Deploy Hub Network**
- Deploy VNet in connectivity subscription.
- Deploy Azure Firewall and VPN Gateway.
---
---
## 5. Anti-Patterns & Gotchas
### ❌ Anti-Pattern 1: "ClickOps"
**What it looks like:**
- Creating resources manually in the Azure Portal.
**Why it fails:**
- Unrepeatable.
- Configuration drift.
- Disaster recovery is impossible (no code to redeploy).
**Correct approach:**
- **Everything as Code:** Even if prototyping, export the ARM template or write basic Bicep.
### ❌ Anti-Pattern 2: One Giant Resource Group
**What it looks like:**
- `rg-production` contains VNets, VMs, Databases, and Web Apps for 5 different projects.
**Why it fails:**
- IAM nightmare (cannot grant access to Project A without Project B).
- Tagging and cost analysis becomes difficult.
- Risk of accidental deletion.
**Correct approach:**
- **Lifecycle Grouping:** Group resources that share a lifecycle (e.g., `rg-network`, `rg-app1-prod`, `rg-app1-dev`).
### ❌ Anti-Pattern 3: Ignoring Naming Conventions
**What it looks like:**
- `myvm1`, `test-storage`, `sql-server`.
**Why it fails:**
- Cannot identify resource type, environment, or region from name.
- Name collisions (Storage accounts must be globally unique).
**Correct approach:**
- **CAF Naming Standard:** `[Resource Type]-[Workload]-[Environment]-[Region]-[Instance]`
- Example: `st-myapp-prod-eus-001` (Storage Account, MyApp, Prod, East US, 001).
---
---
## 7. Quality Checklist
**Governance:**
- [ ] **Naming:** Resources follow CAF naming conventions.
- [ ] **Tagging:** Resources tagged with `CostCenter`, `Environment`, `Owner`.
- [ ] **Policies:** Azure Policy enforces compliance (e.g., allowed SKUs).
**Security:**
- [ ] **Network:** No public IPs on backend resources (VMs, DBs).
- [ ] **Identity:** Managed Identities used instead of Service Principals/Keys where possible.
- [ ] **Encryption:** CMK (Customer Managed Keys) enabled for sensitive data.
**Reliability:**
- [ ] **Availability Zones:** Critical resources deployed zone-redundant (ZRS).
- [ ] **Backup:** Azure Backup enabled for VMs and SQL.
- [ ] **Locks:** Resource Locks (`CanNotDelete`) on critical production resources.
**Cost:**
- [ ] **Sizing:** Resources right-sized based on metrics.
- [ ] **Reservations:** Reserved Instances purchased for steady workloads.
- [ ] **Cleanup:** Unused resources (orphaned disks/NICs) deleted.
## Examples
### Example 1: Multi-Subscription Landing Zone Setup
**Scenario:** A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).
**Architecture:**
1. **Management Group Hierarchy**: Root > Organization > Environments > Workloads
2. **Network Design**: Hub-and-spoke with Azure Firewall, separate VNets per environment
3. **Policy Enforcement**: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
4. **CI/CD Pipeline**: Azure DevOps pipeline with approval gates for prod deployments
**Key Components:**
- Azure Firewall Manager for centralized policy
- Private DNS Zones for app-internal resolution
- Azure Backup with immutable vaults for compliance
- Cost Management tags for departmental chargebacks
### Example 2: Zero-Trust Network Architecture
**Scenario:** A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.
**Implementation:**
1. **Private Endpoints**: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
2. **Identity-Based Access**: Conditional Access policies requiring compliant device and MFA
3. **Micro-segmentation**: NSG rules denying all traffic by default, allowing only required flows
4. **Monitoring**: Azure Sentinel for security analytics and anomaly detection
**Security Controls:**
- Azure AD Conditional Access with device compliance
- Just-In-Time VM access for administration
- Azure Defender for Cloud threat protection
- Comprehensive audit logging to Log Analytics
### Example 3: Cost-Optimized Dev/Test Environment
**Scenario:** A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.
**Optimization Strategy:**
1. **Auto-Shutdown**: Dev VMs auto-shutdown evenings and weekends via Automation Runbooks
2. **Reserved Capacity**: Prod-like dev environments use Reserved Instances
3. **Dev-Optimized SKUs**: Development uses Dev/Test SKUs where available
4. **Tagging and Governance**: Required tags for cost allocation, orphaned resource cleanup
**Cost Savings Results:**
- 65% reduction in dev/test compute costs
- Automated cleanup of unused resources saving $2K/month
- Reserved Instance savings for stable environments
- Developer productivity maintained with auto-start capabilities
## Best Practices
### Infrastructure as Code
- **Everything as Code**: Every resource defined in Bicep, never manual portal changes
- **Module Library**: Create reusable Bicep modules for common patterns
- **Parameter Files**: Separate parameter files per environment (dev, staging, prod)
- **GitOps Workflow**: Infrastructure changes via PR and approval process
- **State Management**: Use AzDO stateful pipelines or Terraform backend
### Networking Excellence
- **Hub-and-Spoke Default**: Standard architecture for most workloads
- **Private by Default**: All PaaS access via Private Endpoints
- **DNS Planning**: Private DNS Zones with VNet links, avoid host file modifications
- **Firewall Integration**: Centralized threat protection with Azure Firewall
- **Hybrid Connectivity**: ExpressRoute for production, VPN for secondary
### Security Hardening
- **Least Privilege**: RBAC with specific roles, avoid Subscription Owner
- **Managed Identities**: Prefer over Service Principals with secrets
- **Secrets Management**: Key Vault for all secrets, never environment variables
- **Encryption Everywhere**: CMK for sensitive data, TLS 1.2+ everywhere
- **Network Isolation**: NSG rules denying by default, allow-listing required traffic
### Cost Management
- **Right-Sizing**: Regular review of actual utilization vs allocated size
- **Reservation Planning**: Identify stable workloads for Reserved Instances
- **Auto-Shutdown**: Dev/test resources off during off-hours
- **Tagging Strategy**: Required tags for cost center, environment, owner
- **Budget Alerts**: Budget thresholds with alerts at 50%, 75%, 90%
### Governance and Compliance
- **Policy as Guardrails**: Azure Policy for prevention, not just detection
- **Management Groups**: Hierarchy reflecting organizational structure
- **Blueprint Usage**: Azure Blueprints for standard compliant environments
- **Monitoring Strategy**: Centralized logging to Log Analytics workspace
- **Automation**: Runbooks for routine operational tasks