azure-infra-engineer


Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).

@404kidwiz
MIT, 2/22/2026

Install Skill

Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.

Install globally (user-level):

npx skillhub install 404kidwiz/claude-supercode-skills/azure-infra-engineer

Install in current project:

npx skillhub install 404kidwiz/claude-supercode-skills/azure-infra-engineer --project

Suggested path: ~/.claude/skills/azure-infra-engineer/

SKILL.md Content

---
name: azure-infra-engineer
description: Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).
---

# Azure Infrastructure Engineer

## Purpose

Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.

## When to Use

- Deploying Azure resources using Bicep or ARM templates
- Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
- Implementing Azure Policy and Management Groups (Governance)
- Migrating workloads to Azure (Azure Site Recovery, Azure Migrate)
- Automating Azure DevOps pipelines for infrastructure
- Configuring Microsoft Entra ID (formerly Azure Active Directory) RBAC and PIM

---

## 2. Decision Framework

### IaC Tool Selection (Azure Context)

| Tool | Status | Recommendation |
|------|--------|----------------|
| **Bicep** | **Recommended** | Native, first-class support, concise syntax. |
| **Terraform** | **Alternative** | Best for multi-cloud strategies. |
| **ARM Templates** | **Legacy** | Verbose JSON. Avoid hand-authoring for new projects; write Bicep, which compiles to ARM JSON. |
| **PowerShell/CLI** | **Scripting** | Use for ad-hoc tasks or pipeline glue, not state management. |

### Networking Architecture

```
What is the connectivity need?
│
├─ **Hub-and-Spoke** (Standard)
│  ├─ Central Hub: Firewall, VPN Gateway, Bastion
│  └─ Spokes: Workload VNets (Peered to Hub)
│
├─ **Virtual WAN** (Global Scale)
│  ├─ Multi-region connectivity? → **Yes**
│  └─ Branch-to-Branch (SD-WAN)? → **Yes**
│
└─ **Private Access**
   ├─ PaaS Services? → **Private Link / Private Endpoints**
   └─ Service Endpoints? → Older option (prefer Private Link where possible)
```
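
As a rough illustration of the Hub-and-Spoke branch above, the hub side of a peering might look like the sketch below. The parameter names (`hubVnetName`, `spokeVnetId`, `hubHasGateway`) and the peering name are assumptions made for illustration and do not come from the original skill. A matching spoke-to-hub peering is also required and is omitted here.

```bicep
// Sketch only: hub-to-spoke half of a VNet peering.
// All parameter names below are hypothetical.
param hubVnetName string
param spokeVnetId string

// Set to true only when the hub already hosts a VPN or ExpressRoute gateway.
param hubHasGateway bool = false

// Reference the existing hub VNet in the current resource group.
resource hub 'Microsoft.Network/virtualNetworks@2023-05-01' existing = {
  name: hubVnetName
}

resource hubToSpoke 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-05-01' = {
  parent: hub
  name: 'peer-hub-to-spoke'
  properties: {
    remoteVirtualNetwork: {
      id: spokeVnetId
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true        // Lets traffic routed through the hub firewall reach the spoke
    allowGatewayTransit: hubHasGateway // Spokes may then use the hub's gateway
    useRemoteGateways: false           // The hub never borrows a spoke gateway
  }
}
```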

### Governance Strategy (CAF)

1.  **Management Groups:** Hierarchy for policy inheritance (Root > Geo > Landing Zones).
2.  **Azure Policy:** Use the `Deny` effect to block non-compliant resources (e.g., allow deployments only in East US).
3.  **RBAC:** Least-privilege access via Entra ID groups.
4.  **Blueprints:** Rapid deployment of compliant environments (being replaced by Template Specs + Deployment Stacks).
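
A minimal sketch of the "only East US" policy idea from item 2, assuming the built-in "Allowed locations" policy definition and a management-group-scoped deployment. The assignment name and the `allowedLocations` parameter are illustrative choices, not part of the original skill.

```bicep
// Sketch only: assign the built-in "Allowed locations" policy at management group scope.
targetScope = 'managementGroup'

// Hypothetical parameter: regions in which deployments are permitted.
param allowedLocations array = [
  'eastus'
]

resource allowedLocationsAssignment 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'allowed-locations'
  properties: {
    displayName: 'Allowed locations'
    // Built-in definition ID for "Allowed locations"; its effect is deny.
    policyDefinitionId: tenantResourceId('Microsoft.Authorization/policyDefinitions', 'e56962a6-4747-49cd-b67b-bf8b01975c4c')
    parameters: {
      listOfAllowedLocations: {
        value: allowedLocations
      }
    }
  }
}
```

A template like this would typically be deployed with `az deployment mg create`, targeting the management group that should inherit the restriction.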

**Red Flags → Escalate to `security-engineer`:**
- Public access enabled on Storage Accounts or SQL Databases
- Management Ports (RDP/SSH) open to internet
- Subscription Owner permissions granted to individual users (use Contributor and PIM-eligible assignments instead)
- No cost controls/budgets configured

---

## 4. Core Workflows

### Workflow 1: Bicep Resource Deployment

**Goal:** Deploy a secure Storage Account with public network access disabled, ready to be reached through a Private Endpoint.

**Steps:**

1.  **Define Bicep Module (`modules/storage.bicep`)**
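    ```bicep
    // Region defaults to the resource group's location.
    param location string = resourceGroup().location

    // Storage account names: 3-24 characters, lowercase letters and digits only.
    @minLength(3)
    @maxLength(24)
    param name string

    resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
      name: name
      location: location
      sku: { name: 'Standard_LRS' }
      kind: 'StorageV2'
      properties: {
        minimumTlsVersion: 'TLS1_2'
        supportsHttpsTrafficOnly: true
        allowBlobPublicAccess: false     // No anonymous blob access
        publicNetworkAccess: 'Disabled'  // Secure by default
      }
    }

    // Resource ID, for use by dependent resources such as a Private Endpoint.
    output id string = stg.id
    ```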

2.  **Main Deployment (`main.bicep`)**
    ```bicep
    module storage './modules/storage.bicep' = {
      name: 'deployStorage'
      params: {
        name: 'stappprod001'
      }
    }
    ```

3.  **Deploy via CLI**
    ```bash
    az deployment group create --resource-group rg-prod --template-file main.bicep
    ```
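
The steps above disable public access but do not create the Private Endpoint itself. A possible companion module is sketched below, assuming the caller passes in an existing subnet ID and the storage account ID (for example `storage.outputs.id` from `main.bicep`). The file name, parameter names, and resource names are hypothetical.

```bicep
// Sketch only: hypothetical modules/private-endpoint.bicep
param location string = resourceGroup().location
param name string

// Hypothetical inputs supplied by the caller.
param subnetId string
param storageAccountId string

resource pe 'Microsoft.Network/privateEndpoints@2023-05-01' = {
  name: 'pe-${name}'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'plsc-${name}'
        properties: {
          privateLinkServiceId: storageAccountId
          groupIds: [
            'blob' // Sub-resource to expose; other storage sub-resources need their own endpoint
          ]
        }
      }
    ]
  }
}

output id string = pe.id
```

For clients to resolve the account's blob hostname to the private IP, a Private DNS Zone (`privatelink.blob.core.windows.net`) linked to the VNet is normally needed as well; that part is left out of this sketch.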

---

### Workflow 3: Landing Zone Setup (CAF)

**Goal:** Establish the foundational hierarchy.

**Steps:**

1.  **Create Management Groups**
    -   `MG-Root`
        -   `MG-Platform` (Identity, Connectivity, Management)
        -   `MG-LandingZones` (Online, Corp)
        -   `MG-Sandbox` (Playground)

2.  **Assign Policies**
    -   Assign "Allowed Locations" to `MG-Root`.
    -   Assign "Enable Azure Monitor" to `MG-LandingZones`.

3.  **Deploy Hub Network**
    -   Deploy VNet in connectivity subscription.
    -   Deploy Azure Firewall and VPN Gateway.
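
Step 1 of this workflow can be expressed in Bicep at tenant scope. The sketch below uses the management group names from the list above; the loop variable and symbolic names are illustrative, and the deploying identity is assumed to have permission to create management groups.

```bicep
// Sketch only: create the management group hierarchy from step 1.
targetScope = 'tenant'

// Top-level group; with no parent specified it is created under the tenant root group.
resource mgRoot 'Microsoft.Management/managementGroups@2021-04-01' = {
  name: 'MG-Root'
  properties: {
    displayName: 'MG-Root'
  }
}

// Child groups placed directly under MG-Root.
var childGroups = [
  'MG-Platform'
  'MG-LandingZones'
  'MG-Sandbox'
]

resource mgChildren 'Microsoft.Management/managementGroups@2021-04-01' = [for mgName in childGroups: {
  name: mgName
  properties: {
    displayName: mgName
    details: {
      parent: {
        id: mgRoot.id
      }
    }
  }
}]
```

A tenant-scoped template like this would be deployed with `az deployment tenant create` rather than `az deployment group create`.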

---

## 5. Anti-Patterns & Gotchas

### ❌ Anti-Pattern 1: "ClickOps"

**What it looks like:**
-   Creating resources manually in the Azure Portal.

**Why it fails:**
-   Unrepeatable.
-   Configuration drift.
-   Disaster recovery is slow and error-prone (no code to redeploy).

**Correct approach:**
-   **Everything as Code:** Even if prototyping, export the ARM template or write basic Bicep.

### ❌ Anti-Pattern 2: One Giant Resource Group

**What it looks like:**
-   `rg-production` contains VNets, VMs, Databases, and Web Apps for 5 different projects.

**Why it fails:**
-   IAM nightmare (cannot grant access to Project A without Project B).
-   Tagging and cost analysis becomes difficult.
-   Risk of accidental deletion.

**Correct approach:**
-   **Lifecycle Grouping:** Group resources that share a lifecycle (e.g., `rg-network`, `rg-app1-prod`, `rg-app1-dev`).

### ❌ Anti-Pattern 3: Ignoring Naming Conventions

**What it looks like:**
-   `myvm1`, `test-storage`, `sql-server`.

**Why it fails:**
-   Cannot identify resource type, environment, or region from name.
-   Name collisions (Storage accounts must be globally unique).

**Correct approach:**
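-   **CAF Naming Standard:** `[Resource Type]-[Workload]-[Environment]-[Region]-[Instance]`
-   Example: `vnet-myapp-prod-eus-001` (Virtual Network, MyApp, Prod, East US, 001).
-   Storage account names allow only lowercase letters and digits (3-24 characters), so drop the hyphens there: `stmyappprodeus001`.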
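
One way to keep names consistent is to compose them in Bicep instead of typing them by hand. The following is a small sketch; the parameter names (`workload`, `env`, `regionCode`, `instance`) are assumptions for illustration.

```bicep
// Sketch only: derive resource names from shared naming components.
param workload string = 'myapp'
param env string = 'prod'
param regionCode string = 'eus'
param instance string = '001'

// Hyphenated pattern for resource types that allow hyphens (here: a virtual network).
var vnetName = 'vnet-${workload}-${env}-${regionCode}-${instance}'

// Storage accounts: lowercase letters and digits only, at most 24 characters.
var storageName = take(toLower('st${workload}${env}${regionCode}${instance}'), 24)

output vnetName string = vnetName       // 'vnet-myapp-prod-eus-001' with the defaults
output storageName string = storageName // 'stmyappprodeus001' with the defaults
```

Note that `take` silently truncates long names, which can cut off the instance suffix; shorter workload abbreviations are safer than relying on truncation.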

---

## 7. Quality Checklist

**Governance:**
-   [ ] **Naming:** Resources follow CAF naming conventions.
-   [ ] **Tagging:** Resources tagged with `CostCenter`, `Environment`, `Owner`.
-   [ ] **Policies:** Azure Policy enforces compliance (e.g., allowed SKUs).

**Security:**
-   [ ] **Network:** No public IPs on backend resources (VMs, DBs).
-   [ ] **Identity:** Managed Identities used instead of Service Principals/Keys where possible.
-   [ ] **Encryption:** CMK (Customer Managed Keys) enabled for sensitive data.

**Reliability:**
-   [ ] **Availability Zones:** Critical resources deployed zone-redundant (e.g., ZRS for storage).
-   [ ] **Backup:** Azure Backup enabled for VMs and SQL.
-   [ ] **Locks:** Resource Locks (`CanNotDelete`) on critical production resources.
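
The `CanNotDelete` lock from the last item can be declared next to the resource it protects. A minimal sketch, assuming an existing storage account whose name is passed in as `storageAccountName` (a hypothetical parameter):

```bicep
// Sketch only: apply a CanNotDelete lock to an existing storage account.
param storageAccountName string

resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

resource stgLock 'Microsoft.Authorization/locks@2020-05-01' = {
  name: 'lock-${storageAccountName}'
  scope: stg // Lock applies to this resource only, not the whole resource group
  properties: {
    level: 'CanNotDelete'
    notes: 'Protects a critical production resource from accidental deletion.'
  }
}
```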

**Cost:**
-   [ ] **Sizing:** Resources right-sized based on metrics.
-   [ ] **Reservations:** Reserved Instances purchased for steady workloads.
-   [ ] **Cleanup:** Unused resources (orphaned disks/NICs) deleted.

## Examples

### Example 1: Multi-Subscription Landing Zone Setup

**Scenario:** A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).

**Architecture:**
1. **Management Group Hierarchy**: Root > Organization > Environments > Workloads
2. **Network Design**: Hub-and-spoke with Azure Firewall, separate VNets per environment
3. **Policy Enforcement**: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
4. **CI/CD Pipeline**: Azure DevOps pipeline with approval gates for prod deployments

**Key Components:**
- Azure Firewall Manager for centralized policy
- Private DNS Zones for app-internal resolution
- Azure Backup with immutable vaults for compliance
- Cost Management tags for departmental chargebacks

### Example 2: Zero-Trust Network Architecture

**Scenario:** A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.

**Implementation:**
1. **Private Endpoints**: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
2. **Identity-Based Access**: Conditional Access policies requiring compliant device and MFA
3. **Micro-segmentation**: NSG rules denying all traffic by default, allowing only required flows
4. **Monitoring**: Microsoft Sentinel (formerly Azure Sentinel) for security analytics and anomaly detection

**Security Controls:**
- Microsoft Entra Conditional Access with device compliance
- Just-In-Time VM access for administration
- Microsoft Defender for Cloud threat protection
- Comprehensive audit logging to Log Analytics
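
The micro-segmentation step (deny by default, allow only required flows) could be sketched as an NSG like the one below. The NSG name, the `hubAddressPrefix` parameter, and the single HTTPS allow rule are assumptions chosen for illustration; real rule sets depend on the workload.

```bicep
// Sketch only: NSG that allows one required inbound flow and denies everything else.
param location string = resourceGroup().location

// Hypothetical parameter: address range from which HTTPS traffic is expected.
param hubAddressPrefix string = '10.0.0.0/24'

resource nsg 'Microsoft.Network/networkSecurityGroups@2023-05-01' = {
  name: 'nsg-app-prod'
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-Https-From-Hub'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: hubAddressPrefix
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '443'
        }
      }
      {
        // Lowest-priority custom rule; it takes precedence over the platform default allow rules.
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

Because the explicit deny also blocks intra-VNet and load balancer probe traffic that the default rules would allow, any such flows the workload needs must be added as their own allow rules.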

### Example 3: Cost-Optimized Dev/Test Environment

**Scenario:** A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.

**Optimization Strategy:**
1. **Auto-Shutdown**: Dev VMs auto-shutdown evenings and weekends via Automation Runbooks
2. **Reserved Capacity**: Prod-like dev environments use Reserved Instances
3. **Dev-Optimized SKUs**: Development uses Dev/Test SKUs where available
4. **Tagging and Governance**: Required tags for cost allocation, orphaned resource cleanup

**Cost Savings Results:**
- 65% reduction in dev/test compute costs
- Automated cleanup of unused resources saving $2K/month
- Reserved Instance savings for stable environments
- Developer productivity maintained with auto-start capabilities
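
The scenario above uses Automation Runbooks for shutdown. As a simpler alternative that fits in a few lines of Bicep, the built-in VM auto-shutdown schedule can be sketched as follows. This is a different technique from runbooks: it runs daily and does not distinguish weekends or start VMs again. The `vmName` parameter, shutdown time, and time zone are assumptions.

```bicep
// Sketch only: daily auto-shutdown schedule for an existing dev VM.
param vmName string
param location string = resourceGroup().location

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' existing = {
  name: vmName
}

resource autoShutdown 'Microsoft.DevTestLab/schedules@2018-09-15' = {
  // The schedule name must follow this pattern for VM auto-shutdown.
  name: 'shutdown-computevm-${vmName}'
  location: location
  properties: {
    status: 'Enabled'
    taskType: 'ComputeVmShutdownTask'
    dailyRecurrence: {
      time: '1900' // 24-hour HHmm, interpreted in timeZoneId
    }
    timeZoneId: 'UTC'
    targetResourceId: vm.id
  }
}
```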

## Best Practices

### Infrastructure as Code

- **Everything as Code**: Every resource defined in Bicep, never manual portal changes
- **Module Library**: Create reusable Bicep modules for common patterns
- **Parameter Files**: Separate parameter files per environment (dev, staging, prod)
- **GitOps Workflow**: Infrastructure changes via PR and approval process
- **State Management**: Bicep keeps no state file (Azure Resource Manager is the source of truth); for Terraform, use a remote backend such as an Azure Storage account
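
For the per-environment parameter files mentioned above, a `.bicepparam` file is one option. The sketch below assumes `main.bicep` declares a `storageName` parameter, which the `main.bicep` shown earlier in this skill does not; both the parameter and the file name are hypothetical.

```bicep
// Sketch only: hypothetical main.prod.bicepparam
// Assumes main.bicep declares: param storageName string
using './main.bicep'

param storageName = 'stappprod001'
```

A sibling file such as `main.dev.bicepparam` would carry the dev values, so the template itself stays identical across environments.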

### Networking Excellence

- **Hub-and-Spoke Default**: Standard architecture for most workloads
- **Private by Default**: All PaaS access via Private Endpoints
- **DNS Planning**: Private DNS Zones with VNet links, avoid host file modifications
- **Firewall Integration**: Centralized threat protection with Azure Firewall
- **Hybrid Connectivity**: ExpressRoute for production, site-to-site VPN as the backup path

### Security Hardening

- **Least Privilege**: RBAC with specific roles, avoid Subscription Owner
- **Managed Identities**: Prefer over Service Principals with secrets
- **Secrets Management**: Key Vault for all secrets, never plaintext environment variables or source code
- **Encryption Everywhere**: CMK for sensitive data, TLS 1.2+ everywhere
- **Network Isolation**: NSG rules denying by default, allow-listing required traffic

### Cost Management

- **Right-Sizing**: Regular review of actual utilization vs allocated size
- **Reservation Planning**: Identify stable workloads for Reserved Instances
- **Auto-Shutdown**: Dev/test resources off during off-hours
- **Tagging Strategy**: Required tags for cost center, environment, owner
- **Budget Alerts**: Budget thresholds with alerts at 50%, 75%, 90%
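
The 50%, 75%, and 90% alert thresholds could be declared as a subscription-scoped budget, roughly as sketched below. The budget name, `amount`, `startDate`, and `contactEmails` parameters are assumptions; `startDate` is expected to be the first day of a month.

```bicep
// Sketch only: monthly cost budget with alerts at 50%, 75%, and 90% of actual spend.
targetScope = 'subscription'

// Hypothetical parameters.
param amount int = 1000
param startDate string // e.g. '2026-03-01'
param contactEmails array

resource budget 'Microsoft.Consumption/budgets@2021-10-01' = {
  name: 'budget-monthly'
  properties: {
    category: 'Cost'
    amount: amount
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: startDate
    }
    notifications: {
      actual50: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 50
        thresholdType: 'Actual'
        contactEmails: contactEmails
      }
      actual75: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 75
        thresholdType: 'Actual'
        contactEmails: contactEmails
      }
      actual90: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 90
        thresholdType: 'Actual'
        contactEmails: contactEmails
      }
    }
  }
}
```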

### Governance and Compliance

- **Policy as Guardrails**: Azure Policy for prevention, not just detection
- **Management Groups**: Hierarchy reflecting organizational structure
- **Blueprint Usage**: Template Specs and Deployment Stacks (the successors to Azure Blueprints) for standard compliant environments
- **Monitoring Strategy**: Centralized logging to Log Analytics workspace
- **Automation**: Runbooks for routine operational tasks