kubernetes-specialist

تایید شده

Expert Kubernetes Specialist with deep expertise in container orchestration, cluster management, and cloud-native applications. Proficient in Kubernetes architecture, Helm charts, operators, and multi-cluster management across EKS, AKS, GKE, and on-premises deployments.

@404kidwiz•MIT•۱۴۰۴/۱۲/۳

(0)

۶ستاره

۰دانلود

۱بازدید

نصب مهارت

مهارت‌ها کدهای شخص ثالث از مخازن عمومی GitHub هستند. SkillHub الگوهای مخرب شناخته‌شده را اسکن می‌کند اما نمی‌تواند امنیت را تضمین کند. قبل از نصب، کد منبع را بررسی کنید.

نصب با CLI

نصب سراسری (سطح کاربر):

npx skillhub install 404kidwiz/claude-supercode-skills/kubernetes-specialist

نصب در پروژه فعلی:

npx skillhub install 404kidwiz/claude-supercode-skills/kubernetes-specialist --project

مسیر پیشنهادی: ~/.claude/skills/kubernetes-specialist/

محتوای SKILL.md

---
name: kubernetes-specialist
description: "Expert Kubernetes Specialist with deep expertise in container orchestration, cluster management, and cloud-native applications. Proficient in Kubernetes architecture, Helm charts, operators, and multi-cluster management across EKS, AKS, GKE, and on-premises deployments."
---

# Kubernetes Specialist

## Purpose

Provides expert Kubernetes orchestration and cloud-native application expertise with deep knowledge of container orchestration, cluster management, and production-grade deployments. Specializes in Kubernetes architecture, Helm charts, operators, multi-cluster management, and GitOps workflows across EKS, AKS, GKE, and on-premises deployments.

## When to Use

- Designing Kubernetes cluster architecture for production workloads
- Implementing Helm charts, operators, or GitOps workflows (ArgoCD, Flux)
- Troubleshooting cluster issues (networking, storage, performance)
- Planning Kubernetes upgrades or multi-cluster strategies
- Optimizing resource utilization and cost in Kubernetes environments
- Setting up service mesh (Istio, Linkerd) and observability
- Implementing Kubernetes security and RBAC policies

## Quick Start

**Invoke this skill when:**
- Designing Kubernetes cluster architecture for production workloads
- Implementing Helm charts, operators, or GitOps workflows
- Troubleshooting cluster issues (networking, storage, performance)
- Planning Kubernetes upgrades or multi-cluster strategies
- Optimizing resource utilization and cost in Kubernetes environments

**Do NOT invoke when:**
- Simple Docker container needs (use docker commands directly)
- Cloud infrastructure provisioning (use cloud-architect instead)
- Application code debugging (use backend-developer/frontend-developer)
- Database-specific issues (use database-administrator instead)

## Decision Framework

### Deployment Strategy Selection

```
├─ Zero downtime required?
│   ├─ Instant rollback needed → Blue-Green Deployment
│   │   Pros: Instant switch, easy rollback
│   │   Cons: 2x resources during deployment
│   │
│   ├─ Gradual rollout → Canary Deployment
│   │   Pros: Test with subset of traffic
│   │   Cons: Complex routing setup
│   │
│   └─ Simple updates → Rolling Update (default)
│       Pros: Built-in, no extra resources
│       Cons: Rollback takes time
│
├─ Stateful application?
│   ├─ Database → StatefulSet + PVC
│   │   Pros: Stable network IDs, ordered deployment
│   │   Cons: Complex scaling
│   │
│   └─ Stateless → Deployment
│       Pros: Easy scaling, self-healing
│
└─ Batch processing?
    ├─ One-time → Job
    ├─ Scheduled → CronJob
    └─ Parallel processing → Job with parallelism
```

### Resource Configuration Matrix

| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---------------|-------------|-----------|----------------|--------------|
| **Web API** | 100m-500m | 1000m | 256Mi-512Mi | 1Gi |
| **Worker** | 500m-1000m | 2000m | 512Mi-1Gi | 2Gi |
| **Database** | 1000m-2000m | 4000m | 2Gi-4Gi | 8Gi |
| **Cache** | 100m-250m | 500m | 1Gi-4Gi | 8Gi |
| **Batch Job** | 500m-2000m | 4000m | 1Gi-4Gi | 8Gi |

### Node Pool Strategy

| Use Case | Instance Type | Scaling | Cost |
|----------|--------------|---------|------|
| **System pods** | t3.large (3 nodes) | Fixed | Low |
| **Applications** | m5.xlarge | Auto 3-20 | Medium |
| **Batch/Spot** | m5.large-2xlarge | Auto 0-50 | Very Low |
| **GPU workloads** | p3.2xlarge | Manual | High |

### Red Flags → Escalate

**STOP and escalate if:**
- Cluster upgrade with breaking API changes (deprecated versions)
- Multi-region active-active requirements
- Compliance requirements (PCI-DSS, HIPAA) need validation
- Custom scheduler or controller development needed
- etcd corruption or cluster state issues

## Quality Checklist

### Cluster Configuration
- [ ] Multi-AZ deployment (nodes spread across availability zones)
- [ ] Node autoscaling configured (Cluster Autoscaler or Karpenter)
- [ ] System node pool with taints (separate critical addons from apps)
- [ ] Encryption enabled (secrets at rest with KMS)
- [ ] Audit logging enabled (API server logs)

### Security
- [ ] Pod Security Standards enforced (restricted or baseline)
- [ ] Network policies configured (default deny + explicit allow)
- [ ] RBAC configured (least privilege for all service accounts)
- [ ] Image scanning enabled (scan for vulnerabilities)
- [ ] Private container registry configured

### Resource Management
- [ ] All pods have resource requests and limits
- [ ] HorizontalPodAutoscalers configured for scalable workloads
- [ ] PodDisruptionBudgets defined (prevent too many pods down)
- [ ] ResourceQuotas set per namespace
- [ ] LimitRanges defined (default limits for pods)

### High Availability
- [ ] Deployments have ≥2 replicas
- [ ] Anti-affinity rules prevent pod co-location
- [ ] Readiness and liveness probes configured
- [ ] PodDisruptionBudgets allow for rolling updates
- [ ] Multi-region cluster (if global scale required)

### Observability
- [ ] Metrics server installed (kubectl top works)
- [ ] Prometheus monitoring application metrics
- [ ] Centralized logging (CloudWatch, Elasticsearch, Loki)
- [ ] Distributed tracing (Jaeger, Tempo)
- [ ] Dashboards for cluster and application health

### Disaster Recovery
- [ ] Velero installed for cluster backups
- [ ] Backup schedule configured (daily minimum)
- [ ] Restore tested (annual drill)
- [ ] etcd backups automated (cloud-managed clusters)

## Additional Resources

- **Detailed Technical Reference**: See [REFERENCE.md](REFERENCE.md)
- **Code Examples & Patterns**: See [EXAMPLES.md](EXAMPLES.md)

لینک‌ها

مخزن