data-analysis

Verified

Comprehensive data analysis skill for CSV files using Python and pandas

@vstorm-co · v1.0 · MIT · 1404/11/30

43 out of 100 · 289 stars · 104 downloads · 116 views

Install Skill

Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.

Install with the CLI

Global install (user level):

npx skillhub install vstorm-co/pydantic-deepagents/data-analysis

Install in the current project:

npx skillhub install vstorm-co/pydantic-deepagents/data-analysis --project

Suggested path: ~/.claude/skills/data-analysis/

AI Review

Instruction quality: 50

Description accuracy: 20

Practicality: 50

Technical correctness: 55

Scored 43: an organized collection of code templates that adds minimal value. Every template (read_csv, groupby, plt.bar) is something Claude already knows. The hardcoded /workspace/ paths make it platform-specific. The description is very weak and names no triggers.

SKILL.md contents

---
name: data-analysis
description: Comprehensive data analysis skill for CSV files using Python and pandas
tags:
  - python
  - pandas
  - data-analysis
  - visualization
version: "1.0"
author: pydantic-deep
---

# Data Analysis Skill

You are a data analysis expert. When this skill is loaded, follow these guidelines for analyzing data.

## Workflow

1. **Load the data**: Use pandas to read CSV files
2. **Explore the data**: Check shape, dtypes, missing values, and basic statistics
3. **Clean if needed**: Handle missing values, duplicates, and outliers
4. **Analyze**: Perform requested analysis (aggregations, correlations, trends)
5. **Visualize**: Create charts using matplotlib when appropriate
6. **Report**: Summarize findings clearly
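
Steps 2 and 3 mention duplicates and outliers, which the templates in this skill do not otherwise show. The following is a minimal sketch of those two steps, assuming a small made-up DataFrame in place of a loaded CSV and the common 1.5 × IQR rule for outliers (one convention among several, not something the skill prescribes):

```python
import pandas as pd

# Hypothetical sample data standing in for a loaded CSV
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "c"],
    "value": [10.0, 10.0, 12.0, None, 11.0, 500.0],
})

# Step 2: explore shape, dtypes, and missing values
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Step 3: drop exact duplicate rows, then rows with no value
deduped = df.drop_duplicates()
cleaned = deduped.dropna(subset=["value"])

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
q1, q3 = cleaned["value"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = cleaned["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = cleaned[~in_range]
cleaned = cleaned[in_range]

print(f"Removed {len(outliers)} outlier row(s)")
```

Keeping `outliers` as a separate frame rather than discarding it silently makes it easier to report what was removed and why.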

## Code Templates

### Loading Data
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load CSV
df = pd.read_csv('/uploads/filename.csv')

# Basic info
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(df.dtypes)
print(df.describe())
```

### Handling Missing Values
```python
# Check missing values
print(df.isnull().sum())

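# Choose ONE of the two options below; running both makes the second a no-op

# Option A: drop rows that contain any missing value
df_dropped = df.dropna()

# Option B: fill missing numeric values with each column's mean
# (numeric_only=True avoids a TypeError when text columns are present)
df_filled = df.fillna(df.mean(numeric_only=True))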
```
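
Mean imputation only applies to numeric columns and is sensitive to extreme values. As an alternative sketch (an assumption on my part, not part of the original skill), the loop below fills numeric columns with the median and non-numeric columns with the most frequent value. The sample data is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data with gaps in a numeric and a text column
df = pd.DataFrame({
    "value": [1.0, None, 3.0, 100.0],
    "category": ["a", "a", None, "b"],
})

filled = df.copy()
for col in filled.columns:
    if filled[col].isna().sum() == 0:
        continue  # nothing to fill in this column
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Median is less affected by extreme values than the mean
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        # mode() can return several values; take the first if any exist
        modes = filled[col].mode()
        if not modes.empty:
            filled[col] = filled[col].fillna(modes.iloc[0])

print(filled)
```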

### Basic Analysis
```python
# Group by and aggregate
summary = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other_col': 'first'
})

# Correlation
correlation = df.select_dtypes(include='number').corr()
```
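
The dict-of-lists form of `agg` returns MultiIndex columns such as `('value', 'mean')`, which can be awkward to print or plot. A possible alternative is pandas named aggregation, sketched here with made-up data and the same placeholder column names:

```python
import pandas as pd

# Hypothetical sample data using the placeholder column names
df = pd.DataFrame({
    "category": ["a", "a", "b"],
    "value": [1.0, 3.0, 10.0],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("category").agg(
    value_mean=("value", "mean"),
    value_sum=("value", "sum"),
    value_count=("value", "count"),
).reset_index()

print(summary)
```

The result has flat column names, so `summary["value_mean"]` works directly.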

### Visualization with Matplotlib

Always save charts to the `/workspace/` directory so they can be viewed in the app.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better looking charts
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
```
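
Each chart template repeats the same save-and-close steps with a hardcoded path. One way to centralize that is a small helper, sketched below. `save_chart` and the `CHART_DIR` environment variable are hypothetical names introduced for illustration, not part of the skill or of matplotlib. The default remains `/workspace`:

```python
import os
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

# CHART_DIR is a hypothetical override; the skill itself only names /workspace
OUTPUT_DIR = Path(os.environ.get("CHART_DIR", "/workspace"))


def save_chart(fig, filename, output_dir=OUTPUT_DIR):
    """Save a figure under output_dir, close it, and return the file path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free the figure's memory once it is on disk
    return path
```

A template would then create its figure with `fig, ax = plt.subplots(figsize=(10, 6))`, draw on `ax`, and finish with `save_chart(fig, "bar_chart.png")`.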

#### Bar Chart
```python
plt.figure(figsize=(10, 6))
df.groupby('category')['value'].sum().plot(kind='bar', color='steelblue', edgecolor='black')
plt.title('Value by Category', fontsize=14, fontweight='bold')
plt.xlabel('Category')
plt.ylabel('Total Value')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('/workspace/bar_chart.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Line Chart (Time Series)
```python
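plt.figure(figsize=(12, 6))
# Parse dates and sort chronologically so the line is drawn in time order
ts = df.assign(date=pd.to_datetime(df['date'])).sort_values('date')
plt.plot(ts['date'], ts['value'], marker='o', linewidth=2, markersize=4)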
plt.title('Value Over Time', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('/workspace/line_chart.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Pie Chart
```python
plt.figure(figsize=(8, 8))
data = df.groupby('category')['value'].sum()
plt.pie(data, labels=data.index, autopct='%1.1f%%', startangle=90,
        colors=sns.color_palette('pastel'))
plt.title('Distribution by Category', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('/workspace/pie_chart.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Histogram
```python
plt.figure(figsize=(10, 6))
plt.hist(df['value'], bins=20, color='steelblue', edgecolor='black', alpha=0.7)
plt.title('Value Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.axvline(df['value'].mean(), color='red', linestyle='--', label=f'Mean: {df["value"].mean():.2f}')
plt.legend()
plt.tight_layout()
plt.savefig('/workspace/histogram.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Scatter Plot
```python
plt.figure(figsize=(10, 6))
plt.scatter(df['x'], df['y'], alpha=0.6, c=df['category'].astype('category').cat.codes, cmap='viridis')
plt.title('X vs Y Relationship', fontsize=14, fontweight='bold')
plt.xlabel('X')
plt.ylabel('Y')
plt.colorbar(label='Category')
plt.tight_layout()
plt.savefig('/workspace/scatter.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Heatmap (Correlation Matrix)
```python
plt.figure(figsize=(10, 8))
correlation = df.select_dtypes(include='number').corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0,
            fmt='.2f', square=True, linewidths=0.5)
plt.title('Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('/workspace/heatmap.png', dpi=150, bbox_inches='tight')
plt.close()
```

#### Multiple Subplots
```python
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Bar chart
df.groupby('category')['value'].sum().plot(kind='bar', ax=axes[0, 0], color='steelblue')
axes[0, 0].set_title('Total by Category')
axes[0, 0].tick_params(axis='x', rotation=45)

# Plot 2: Line chart
df.groupby('date')['value'].mean().plot(ax=axes[0, 1], marker='o')
axes[0, 1].set_title('Average Over Time')

# Plot 3: Histogram
axes[1, 0].hist(df['value'], bins=15, color='green', alpha=0.7)
axes[1, 0].set_title('Value Distribution')

# Plot 4: Box plot
df.boxplot(column='value', by='category', ax=axes[1, 1])
axes[1, 1].set_title('Value by Category')
plt.suptitle('')  # Remove auto-generated title

plt.tight_layout()
plt.savefig('/workspace/dashboard.png', dpi=150, bbox_inches='tight')
plt.close()
```

### Interactive HTML Charts (Plotly)

For interactive charts that can be viewed in the browser:

```python
import plotly.express as px

# Interactive bar chart
fig = px.bar(df, x='category', y='value', color='category',
             title='Value by Category')
fig.write_html('/workspace/interactive_bar.html')

# Interactive line chart
fig = px.line(df, x='date', y='value', title='Value Over Time',
              markers=True)
fig.write_html('/workspace/interactive_line.html')

# Interactive scatter with hover
fig = px.scatter(df, x='x', y='y', color='category', size='value',
                 hover_data=['name'], title='Interactive Scatter')
fig.write_html('/workspace/interactive_scatter.html')

# Interactive pie chart
fig = px.pie(df, values='value', names='category', title='Distribution')
fig.write_html('/workspace/interactive_pie.html')
```

## Best Practices

1. **Always show the first few rows** with `df.head()` to verify data loaded correctly
2. **Check data types** before operations - convert if necessary
3. **Handle edge cases** - empty data, single values, etc.
4. **Use descriptive variable names** in analysis code
5. **Save visualizations** to the `/workspace/` directory
6. **Print intermediate results** so the user can follow along
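
Practices 2 and 3 can be sketched as follows. The helper names `coerce_types` and `safe_mean` and the sample data are hypothetical, introduced only to illustrate type conversion with `errors="coerce"` and a guard for empty input:

```python
import pandas as pd

# Hypothetical raw data where every column arrived as text
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "not a date"],
    "value": ["1.5", "n/a", "3"],
})


def coerce_types(df):
    """Return a copy with 'date' and 'value' converted; bad cells become NaT/NaN."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["value"] = pd.to_numeric(out["value"], errors="coerce")
    return out


def safe_mean(series):
    """Mean that returns None for empty or all-missing input instead of NaN."""
    valid = series.dropna()
    if valid.empty:
        return None
    return float(valid.mean())


typed = coerce_types(raw)
print(typed.dtypes)
print(safe_mean(typed["value"]))
```

Coercing instead of raising keeps the analysis running, but the count of cells that became `NaN` or `NaT` is worth printing so the user knows how much data was unparseable.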

## Output Format

When presenting results:
- Use clear section headers
- Include relevant statistics
- Explain what the numbers mean
- Provide actionable insights when possible
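
As a rough illustration of this format, the sketch below assembles a short Markdown summary from a DataFrame. `build_report` is a hypothetical helper and the section layout is only one possible arrangement. It assumes a non-empty frame with a non-zero total:

```python
import pandas as pd


def build_report(df, value_col, group_col):
    """Return a short Markdown report with headers, statistics, and one finding."""
    stats = df[value_col].describe()
    totals = df.groupby(group_col)[value_col].sum().sort_values(ascending=False)
    top_group = totals.index[0]
    share = totals.iloc[0] / totals.sum() * 100

    lines = [
        "## Overview",
        f"- Rows: {len(df)}",
        f"- Mean {value_col}: {stats['mean']:.2f} (std {stats['std']:.2f})",
        "",
        "## Key finding",
        f"- '{top_group}' accounts for {share:.1f}% of total {value_col}.",
    ]
    return "\n".join(lines)


# Hypothetical sample data
sample = pd.DataFrame({"category": ["a", "b", "b"], "value": [10.0, 30.0, 60.0]})
report = build_report(sample, "value", "category")
print(report)
```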