[
  {
    "path": ".claude-plugin/marketplace.json",
    "content": "{\n  \"name\": \"karpathy-skills\",\n  \"id\": \"karpathy-skills\",\n  \"owner\": {\n    \"name\": \"forrestchang\"\n  },\n  \"metadata\": {\n    \"description\": \"Behavioral guidelines to reduce common LLM coding mistakes, derived from Andrej Karpathy's observations\",\n    \"version\": \"1.0.0\"\n  },\n  \"plugins\": [\n    {\n      \"name\": \"andrej-karpathy-skills\",\n      \"source\": \"./\",\n      \"description\": \"Behavioral guidelines to reduce common LLM coding mistakes: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution\",\n      \"version\": \"1.0.0\",\n      \"author\": {\n        \"name\": \"forrestchang\"\n      },\n      \"keywords\": [\n        \"guidelines\",\n        \"best-practices\",\n        \"coding\",\n        \"karpathy\"\n      ],\n      \"category\": \"workflow\"\n    }\n  ]\n}\n"
  },
  {
    "path": ".claude-plugin/plugin.json",
    "content": "{\n  \"name\": \"andrej-karpathy-skills\",\n  \"description\": \"Behavioral guidelines to reduce common LLM coding mistakes, derived from Andrej Karpathy's observations on LLM coding pitfalls\",\n  \"version\": \"1.0.0\",\n  \"author\": {\n    \"name\": \"forrestchang\"\n  },\n  \"license\": \"MIT\",\n  \"keywords\": [\"guidelines\", \"best-practices\", \"coding\", \"karpathy\"],\n  \"skills\": [\"./skills/karpathy-guidelines\"]\n}\n"
  },
  {
    "path": "CLAUDE.md",
    "content": "# CLAUDE.md\n\nBehavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.\n\n**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.\n\n## 1. Think Before Coding\n\n**Don't assume. Don't hide confusion. Surface tradeoffs.**\n\nBefore implementing:\n- State your assumptions explicitly. If uncertain, ask.\n- If multiple interpretations exist, present them - don't pick silently.\n- If a simpler approach exists, say so. Push back when warranted.\n- If something is unclear, stop. Name what's confusing. Ask.\n\n## 2. Simplicity First\n\n**Minimum code that solves the problem. Nothing speculative.**\n\n- No features beyond what was asked.\n- No abstractions for single-use code.\n- No \"flexibility\" or \"configurability\" that wasn't requested.\n- No error handling for impossible scenarios.\n- If you write 200 lines and it could be 50, rewrite it.\n\nAsk yourself: \"Would a senior engineer say this is overcomplicated?\" If yes, simplify.\n\n## 3. Surgical Changes\n\n**Touch only what you must. Clean up only your own mess.**\n\nWhen editing existing code:\n- Don't \"improve\" adjacent code, comments, or formatting.\n- Don't refactor things that aren't broken.\n- Match existing style, even if you'd do it differently.\n- If you notice unrelated dead code, mention it - don't delete it.\n\nWhen your changes create orphans:\n- Remove imports/variables/functions that YOUR changes made unused.\n- Don't remove pre-existing dead code unless asked.\n\nThe test: Every changed line should trace directly to the user's request.\n\n## 4. Goal-Driven Execution\n\n**Define success criteria. 
Loop until verified.**\n\nTransform tasks into verifiable goals:\n- \"Add validation\" → \"Write tests for invalid inputs, then make them pass\"\n- \"Fix the bug\" → \"Write a test that reproduces it, then make it pass\"\n- \"Refactor X\" → \"Ensure tests pass before and after\"\n\nFor multi-step tasks, state a brief plan:\n```\n1. [Step] → verify: [check]\n2. [Step] → verify: [check]\n3. [Step] → verify: [check]\n```\n\nStrong success criteria let you loop independently. Weak criteria (\"make it work\") require constant clarification.\n\n---\n\n**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.\n"
  },
  {
    "path": "EXAMPLES.md",
    "content": "# Examples\n\nReal-world code examples demonstrating the four principles. Each example shows what LLMs commonly do wrong and how to fix it.\n\n---\n\n## 1. Think Before Coding\n\n### Example 1: Hidden Assumptions\n\n**User Request:** \"Add a feature to export user data\"\n\n**❌ What LLMs Do (Wrong Assumptions)**\n\n```python\ndef export_users(format='json'):\n    \"\"\"Export all users to a file.\"\"\"\n    users = User.query.all()\n    \n    if format == 'json':\n        with open('users.json', 'w') as f:\n            json.dump([u.to_dict() for u in users], f)\n    elif format == 'csv':\n        with open('users.csv', 'w') as f:\n            writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])\n            writer.writeheader()\n            writer.writerows([u.to_dict() for u in users])\n    \n    return f\"Exported {len(users)} users\"\n```\n\n**Problems:**\n- Assumed it should export ALL users (what about pagination? privacy?)\n- Assumed file location without asking\n- Assumed which fields to include\n- Assumed CSV fieldnames without checking actual data structure\n\n**✅ What Should Happen (Surface Assumptions)**\n\n```\nBefore implementing, I need to clarify:\n\n1. **Scope**: Export all users or filtered subset? (privacy implications)\n2. **Format**: You mentioned \"export\" - did you mean:\n   - Download file in browser?\n   - Background job with email notification?\n   - API endpoint returning data?\n3. **Fields**: Which user fields? (some might be sensitive)\n4. **Volume**: How many users typically? 
(affects approach)\n\nSimplest approach: Add an API endpoint that returns paginated JSON.\nWould need more info for file-based exports.\n\nWhat's your preference?\n```\n\n### Example 2: Multiple Interpretations\n\n**User Request:** \"Make the search faster\"\n\n**❌ What LLMs Do (Pick Silently)**\n\n```python\n# Adds caching, database indexes, and async processing\n# without asking which \"faster\" matters\n@lru_cache(maxsize=1000)\nasync def search(query: str) -> List[Result]:\n    # ... 200 lines of optimization code\n```\n\n**✅ What Should Happen**\n\n```\n\"Make the search faster\" could mean:\n\n1. **Faster response time** (< 100ms instead of 500ms)\n   - Add database indexes\n   - Cache frequent queries\n   - Estimated effort: 2 hours\n\n2. **Handle more concurrent searches** (throughput)\n   - Add async/await\n   - Connection pooling\n   - Estimated effort: 4 hours\n\n3. **Faster perceived speed** (UX)\n   - Show partial results immediately\n   - Progressive loading\n   - Estimated effort: 3 hours\n\nCurrent search takes ~500ms for typical queries. \nWhich aspect matters most for your use case?\n```\n\n---\n\n## 2. 
Simplicity First\n\n### Example 1: Over-abstraction\n\n**User Request:** \"Add a function to calculate discount\"\n\n**❌ What LLMs Do (Overengineered)**\n\n```python\nfrom abc import ABC, abstractmethod\nfrom enum import Enum\nfrom typing import Protocol, Union\nfrom dataclasses import dataclass\n\nclass DiscountStrategy(ABC):\n    @abstractmethod\n    def calculate(self, amount: float) -> float:\n        pass\n\nclass PercentageDiscount(DiscountStrategy):\n    def __init__(self, percentage: float):\n        self.percentage = percentage\n    \n    def calculate(self, amount: float) -> float:\n        return amount * (self.percentage / 100)\n\nclass FixedDiscount(DiscountStrategy):\n    def __init__(self, fixed_amount: float):\n        self.fixed_amount = fixed_amount\n    \n    def calculate(self, amount: float) -> float:\n        return min(self.fixed_amount, amount)\n\n@dataclass\nclass DiscountConfig:\n    strategy: DiscountStrategy\n    min_purchase: float = 0.0\n    max_discount: float = float('inf')\n\nclass DiscountCalculator:\n    def __init__(self, config: DiscountConfig):\n        self.config = config\n    \n    def apply_discount(self, amount: float) -> float:\n        if amount < self.config.min_purchase:\n            return 0\n        discount = self.config.strategy.calculate(amount)\n        return min(discount, self.config.max_discount)\n\n# Usage requires 30+ lines of setup for a simple calculation\n```\n\n**✅ What Should Happen (Simple)**\n\n```python\ndef calculate_discount(amount: float, percent: float) -> float:\n    \"\"\"Calculate discount amount. percent should be 0-100.\"\"\"\n    return amount * (percent / 100)\n\n# Usage\ndiscount = calculate_discount(100.0, 10.0)  # $10 off\n```\n\n**When to add complexity:** Only when you actually need multiple discount types. 
If that requirement comes later, refactor then.\n\n### Example 2: Speculative Features\n\n**User Request:** \"Save user preferences to database\"\n\n**❌ What LLMs Do (Unnecessary Features)**\n\n```python\nclass PreferenceManager:\n    def __init__(self, db, cache=None, validator=None):\n        self.db = db\n        self.cache = cache or InMemoryCache()\n        self.validator = validator or DefaultValidator()\n    \n    def save(self, user_id: int, prefs: dict, \n             merge: bool = True, \n             validate: bool = True,\n             notify: bool = False) -> bool:\n        \"\"\"Save preferences with optional merging, validation, and notifications.\"\"\"\n        \n        if validate:\n            errors = self.validator.validate(prefs)\n            if errors:\n                raise ValidationError(errors)\n        \n        existing = self.get(user_id) if merge else {}\n        \n        if merge:\n            existing.update(prefs)\n            prefs = existing\n        \n        self.db.execute(\n            \"UPDATE users SET preferences = ? WHERE id = ?\",\n            (json.dumps(prefs), user_id)\n        )\n        \n        if self.cache:\n            self.cache.set(f\"prefs:{user_id}\", prefs)\n        \n        if notify:\n            self.notify_preference_change(user_id, prefs)\n        \n        return True\n    \n    def notify_preference_change(self, user_id: int, prefs: dict):\n        # Another 30 lines for a feature nobody asked for\n        pass\n```\n\n**✅ What Should Happen (Just What Was Asked)**\n\n```python\ndef save_preferences(db, user_id: int, preferences: dict):\n    \"\"\"Save user preferences to database.\"\"\"\n    db.execute(\n        \"UPDATE users SET preferences = ? WHERE id = ?\",\n        (json.dumps(preferences), user_id)\n    )\n```\n\n**Add later if needed:** Caching (when performance matters), validation (when bad data appears), merging (when requirement emerges).\n\n---\n\n## 3. 
Surgical Changes\n\n### Example 1: Drive-by Refactoring\n\n**User Request:** \"Fix the bug where empty emails crash the validator\"\n\n**❌ What LLMs Do (Too Much)**\n\n```diff\n  def validate_user(user_data):\n-     # Check email format\n-     if not user_data.get('email'):\n+     \"\"\"Validate user data.\"\"\"\n+     email = user_data.get('email', '').strip()\n+     \n+     # Validate email\n+     if not email:\n          raise ValueError(\"Email required\")\n-     \n-     # Basic email validation\n-     if '@' not in user_data['email']:\n+     if '@' not in email or '.' not in email.split('@')[1]:\n          raise ValueError(\"Invalid email\")\n      \n-     # Check username\n-     if not user_data.get('username'):\n+     # Validate username\n+     username = user_data.get('username', '').strip()\n+     if not username:\n          raise ValueError(\"Username required\")\n+     if len(username) < 3:\n+         raise ValueError(\"Username too short\")\n+     if not username.isalnum():\n+         raise ValueError(\"Username must be alphanumeric\")\n      \n      return True\n```\n\n**Problems:**\n- \"Improved\" email validation beyond the bug fix\n- Added username validation nobody asked for\n- Changed comments\n- Added docstring\n\n**✅ What Should Happen (Surgical)**\n\n```diff\n  def validate_user(user_data):\n      # Check email format\n-     if not user_data.get('email'):\n+     email = user_data.get('email', '')\n+     if not email or not email.strip():\n          raise ValueError(\"Email required\")\n      \n      # Basic email validation\n-     if '@' not in user_data['email']:\n+     if '@' not in email:\n          raise ValueError(\"Invalid email\")\n      \n      # Check username\n      if not user_data.get('username'):\n          raise ValueError(\"Username required\")\n      \n      return True\n```\n\n**Only changed:** The specific lines that fix empty email handling.\n\n### Example 2: Style Drift\n\n**User Request:** \"Add logging to the upload 
function\"\n\n**❌ What LLMs Do (Reformat Everything)**\n\n```diff\n- def upload_file(file_path, destination):\n+ def upload_file(file_path: str, destination: str) -> bool:\n+     \"\"\"Upload file to destination with logging.\"\"\"\n+     logger.info(f\"Uploading {file_path} to {destination}\")\n+     \n      try:\n-         with open(file_path, 'rb') as f:\n-             data = f.read()\n-         \n-         response = requests.post(destination, files={'file': data})\n-         \n-         if response.status_code == 200:\n-             return True\n-         else:\n-             return False\n+         with open(file_path, \"rb\") as f:\n+             data = f.read()\n+         response = requests.post(destination, files={\"file\": data})\n+         success = response.status_code == 200\n+         \n+         if success:\n+             logger.info(f\"Upload successful: {file_path}\")\n+         else:\n+             logger.error(f\"Upload failed: {file_path}, status={response.status_code}\")\n+         \n+         return success\n      except Exception as e:\n-         print(f\"Error: {e}\")\n+         logger.exception(f\"Upload error for {file_path}: {e}\")\n          return False\n```\n\n**Problems:**\n- Changed quote style ('' to \"\")\n- Added type hints nobody asked for\n- Added docstring\n- Reformatted whitespace\n- Changed boolean return logic\n\n**✅ What Should Happen (Match Existing Style)**\n\n```diff\n+ import logging\n+ \n+ logger = logging.getLogger(__name__)\n+ \n  def upload_file(file_path, destination):\n+     logger.info(f'Starting upload: {file_path}')\n      try:\n          with open(file_path, 'rb') as f:\n              data = f.read()\n          \n          response = requests.post(destination, files={'file': data})\n          \n          if response.status_code == 200:\n+             logger.info(f'Upload successful: {file_path}')\n              return True\n          else:\n+             logger.error(f'Upload failed: {file_path}, 
status={response.status_code}')\n              return False\n      except Exception as e:\n-         print(f\"Error: {e}\")\n+         logger.exception(f'Upload error: {file_path}')\n          return False\n```\n\n**Matched:** Single quotes, no type hints, existing boolean pattern, spacing style.\n\n---\n\n## 4. Goal-Driven Execution\n\n### Example 1: Vague vs. Verifiable\n\n**User Request:** \"Fix the authentication system\"\n\n**❌ What LLMs Do (Vague Approach)**\n\n```\nI'll fix the authentication system by:\n1. Reviewing the code\n2. Identifying issues\n3. Making improvements\n4. Testing the changes\n\n[Proceeds to make changes without clear success criteria]\n```\n\n**✅ What Should Happen (Verifiable Goals)**\n\n```\nTo \"fix authentication,\" I need to define success criteria. \nWhat specific issue are we solving?\n\nIf the issue is \"users stay logged in after password change\":\n\nPlan:\n1. Write test: Change password → verify old session invalidated\n   Verify: Test fails (reproduces bug)\n\n2. Implement: Invalidate sessions on password change\n   Verify: Test passes\n\n3. Check edge cases: Multiple active sessions, concurrent changes\n   Verify: Additional tests pass\n\n4. Verify no regression: Existing auth tests still pass\n   Verify: Full test suite green\n\nCurrent test coverage for auth: [shows coverage]\nWhat's the specific auth issue you're seeing?\n```\n\n### Example 2: Multi-Step with Verification\n\n**User Request:** \"Add rate limiting to the API\"\n\n**❌ What LLMs Do (All at Once)**\n\n```python\n# Implements full rate limiting with Redis, multiple strategies,\n# configuration system, and monitoring in one 300-line commit\n# No clear verification steps\n```\n\n**✅ What Should Happen (Incremental with Verification)**\n\n```\nPlan for rate limiting:\n\n1. Add basic in-memory rate limiting (single endpoint)\n   Verify: \n   - Test: 100 requests → first 10 succeed, rest get 429\n   - Manual: curl endpoint 11 times, see rate limit error\n\n2. 
Extract to middleware (apply to all endpoints)\n   Verify:\n   - Test: Rate limits apply to /users and /posts\n   - Existing endpoint tests still pass\n\n3. Add Redis backend (for multi-server)\n   Verify:\n   - Test: Rate limit persists across app restarts\n   - Test: Two app instances share rate limit counter\n\n4. Add configuration (rates per endpoint)\n   Verify:\n   - Test: /search allows 10/min, /users allows 100/min\n   - Config file parsed correctly\n\nEach step is independently verifiable and deployable.\nStart with step 1?\n```\n\n### Example 3: Test-First Verification\n\n**User Request:** \"The sorting breaks when there are duplicate scores\"\n\n**❌ What LLMs Do (Fix Without Reproducing)**\n\n```python\n# Immediately changes sort logic without confirming the bug\ndef sort_scores(scores):\n    return sorted(scores, key=lambda x: (-x['score'], x['name']))\n```\n\n**✅ What Should Happen (Reproduce First)**\n\n```python\n# 1. First, write a test that reproduces the issue\ndef test_sort_with_duplicate_scores():\n    \"\"\"Test sorting when multiple items have same score.\"\"\"\n    scores = [\n        {'name': 'Bob', 'score': 100},\n        {'name': 'Alice', 'score': 100},\n        {'name': 'Charlie', 'score': 90},\n    ]\n    \n    result = sort_scores(scores)\n    \n    # The bug: order is non-deterministic for duplicates\n    # Assert the tie order too, otherwise this test cannot fail\n    assert [r['score'] for r in result] == [100, 100, 90]\n    assert [r['name'] for r in result] == ['Alice', 'Bob', 'Charlie']\n\n# Verify: Run the test → fails when tied names come back out of order\n\n# 2. Now fix with a deterministic tie-breaker\ndef sort_scores(scores):\n    \"\"\"Sort by score descending, then name ascending for ties.\"\"\"\n    return sorted(scores, key=lambda x: (-x['score'], x['name']))\n\n# Verify: Test passes consistently\n```\n\n---\n\n## Anti-Patterns Summary\n\n| Principle | Anti-Pattern | Fix |\n|-----------|-------------|-----|\n| Think Before Coding | Silently assumes file format, fields, scope | List assumptions explicitly, ask for clarification |\n| Simplicity First | Strategy pattern for single discount calculation | One function until complexity is actually needed |\n| Surgical Changes | Reformats quotes, adds type hints while fixing bug | Only change lines that fix the reported issue |\n| Goal-Driven Execution | \"I'll review and improve the code\" | \"Write test for bug X → make it pass → verify no regressions\" |\n\n## Key Insight\n\nThe \"overcomplicated\" examples aren't obviously wrong; they follow design patterns and best practices. The problem is **timing**: they add complexity before it's needed, which:\n\n- Makes code harder to understand\n- Introduces more bugs\n- Takes longer to implement\n- Makes testing harder\n\nThe \"simple\" versions are:\n- Easier to understand\n- Faster to implement\n- Easier to test\n- Easy to refactor later when complexity is actually needed\n\n**Good code is code that solves today's problem simply, not tomorrow's problem prematurely.**\n"
  },
  {
    "path": "README.md",
    "content": "# Karpathy-Inspired Claude Code Guidelines\n\nA single `CLAUDE.md` file to improve Claude Code behavior, derived from [Andrej Karpathy's observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls.\n\n## The Problems\n\nFrom Andrej's post:\n\n> \"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should.\"\n\n> \"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do.\"\n\n> \"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task.\"\n\n## The Solution\n\nFour principles in one file that directly address these issues:\n\n| Principle | Addresses |\n|-----------|-----------|\n| **Think Before Coding** | Wrong assumptions, hidden confusion, missing tradeoffs |\n| **Simplicity First** | Overcomplication, bloated abstractions |\n| **Surgical Changes** | Orthogonal edits, touching code you shouldn't |\n| **Goal-Driven Execution** | Leverage through tests-first, verifiable success criteria |\n\n## The Four Principles in Detail\n\n### 1. Think Before Coding\n\n**Don't assume. Don't hide confusion. Surface tradeoffs.**\n\nLLMs often pick an interpretation silently and run with it. This principle forces explicit reasoning:\n\n- **State assumptions explicitly** — If uncertain, ask rather than guess\n- **Present multiple interpretations** — Don't pick silently when ambiguity exists\n- **Push back when warranted** — If a simpler approach exists, say so\n- **Stop when confused** — Name what's unclear and ask for clarification\n\n### 2. Simplicity First\n\n**Minimum code that solves the problem. 
Nothing speculative.**\n\nCombat the tendency toward overengineering:\n\n- No features beyond what was asked\n- No abstractions for single-use code\n- No \"flexibility\" or \"configurability\" that wasn't requested\n- No error handling for impossible scenarios\n- If 200 lines could be 50, rewrite it\n\n**The test:** Would a senior engineer say this is overcomplicated? If yes, simplify.\n\n### 3. Surgical Changes\n\n**Touch only what you must. Clean up only your own mess.**\n\nWhen editing existing code:\n\n- Don't \"improve\" adjacent code, comments, or formatting\n- Don't refactor things that aren't broken\n- Match existing style, even if you'd do it differently\n- If you notice unrelated dead code, mention it — don't delete it\n\nWhen your changes create orphans:\n\n- Remove imports/variables/functions that YOUR changes made unused\n- Don't remove pre-existing dead code unless asked\n\n**The test:** Every changed line should trace directly to the user's request.\n\n### 4. Goal-Driven Execution\n\n**Define success criteria. Loop until verified.**\n\nTransform imperative tasks into verifiable goals:\n\n| Instead of... | Transform to... |\n|--------------|-----------------|\n| \"Add validation\" | \"Write tests for invalid inputs, then make them pass\" |\n| \"Fix the bug\" | \"Write a test that reproduces it, then make it pass\" |\n| \"Refactor X\" | \"Ensure tests pass before and after\" |\n\nFor multi-step tasks, state a brief plan:\n\n```\n1. [Step] → verify: [check]\n2. [Step] → verify: [check]\n3. [Step] → verify: [check]\n```\n\nStrong success criteria let the LLM loop independently. 
Weak criteria (\"make it work\") require constant clarification.\n\n## Install\n\n**Option A: Claude Code Plugin (recommended)**\n\nFrom within Claude Code, first add the marketplace:\n```\n/plugin marketplace add forrestchang/andrej-karpathy-skills\n```\n\nThen install the plugin:\n```\n/plugin install andrej-karpathy-skills@karpathy-skills\n```\n\nThis installs the guidelines as a Claude Code plugin, making the skill available across all your projects.\n\n**Option B: CLAUDE.md (per-project)**\n\nNew project:\n```bash\ncurl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md\n```\n\nExisting project (append):\n```bash\necho \"\" >> CLAUDE.md\ncurl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md\n```\n\n## Key Insight\n\nFrom Andrej:\n\n> \"LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go.\"\n\nThe \"Goal-Driven Execution\" principle captures this: transform imperative instructions into declarative goals with verification loops.\n\n## How to Know It's Working\n\nThese guidelines are working if you see:\n\n- **Fewer unnecessary changes in diffs** — Only requested changes appear\n- **Fewer rewrites due to overcomplication** — Code is simple the first time\n- **Clarifying questions come before implementation** — Not after mistakes\n- **Clean, minimal PRs** — No drive-by refactoring or \"improvements\"\n\n## Customization\n\nThese guidelines are designed to be merged with project-specific instructions. Add them to your existing `CLAUDE.md` or create a new one.\n\nFor project-specific rules, add sections like:\n\n```markdown\n## Project-Specific Guidelines\n\n- Use TypeScript strict mode\n- All API endpoints must have tests\n- Follow the existing error handling patterns in `src/utils/errors.ts`\n```\n\n## Tradeoff Note\n\nThese guidelines bias toward **caution over speed**. 
For trivial tasks (simple typo fixes, obvious one-liners), use judgment — not every change needs the full rigor.\n\nThe goal is reducing costly mistakes on non-trivial work, not slowing down simple tasks.\n\n## License\n\nMIT\n"
  },
  {
    "path": "skills/karpathy-guidelines/SKILL.md",
    "content": "---\nname: karpathy-guidelines\ndescription: Behavioral guidelines to reduce common LLM coding mistakes. Use when writing, reviewing, or refactoring code to avoid overcomplication, make surgical changes, surface assumptions, and define verifiable success criteria.\nlicense: MIT\n---\n\n# Karpathy Guidelines\n\nBehavioral guidelines to reduce common LLM coding mistakes, derived from [Andrej Karpathy's observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls.\n\n**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.\n\n## 1. Think Before Coding\n\n**Don't assume. Don't hide confusion. Surface tradeoffs.**\n\nBefore implementing:\n- State your assumptions explicitly. If uncertain, ask.\n- If multiple interpretations exist, present them - don't pick silently.\n- If a simpler approach exists, say so. Push back when warranted.\n- If something is unclear, stop. Name what's confusing. Ask.\n\n## 2. Simplicity First\n\n**Minimum code that solves the problem. Nothing speculative.**\n\n- No features beyond what was asked.\n- No abstractions for single-use code.\n- No \"flexibility\" or \"configurability\" that wasn't requested.\n- No error handling for impossible scenarios.\n- If you write 200 lines and it could be 50, rewrite it.\n\nAsk yourself: \"Would a senior engineer say this is overcomplicated?\" If yes, simplify.\n\n## 3. Surgical Changes\n\n**Touch only what you must. 
Clean up only your own mess.**\n\nWhen editing existing code:\n- Don't \"improve\" adjacent code, comments, or formatting.\n- Don't refactor things that aren't broken.\n- Match existing style, even if you'd do it differently.\n- If you notice unrelated dead code, mention it - don't delete it.\n\nWhen your changes create orphans:\n- Remove imports/variables/functions that YOUR changes made unused.\n- Don't remove pre-existing dead code unless asked.\n\nThe test: Every changed line should trace directly to the user's request.\n\n## 4. Goal-Driven Execution\n\n**Define success criteria. Loop until verified.**\n\nTransform tasks into verifiable goals:\n- \"Add validation\" → \"Write tests for invalid inputs, then make them pass\"\n- \"Fix the bug\" → \"Write a test that reproduces it, then make it pass\"\n- \"Refactor X\" → \"Ensure tests pass before and after\"\n\nFor multi-step tasks, state a brief plan:\n```\n1. [Step] → verify: [check]\n2. [Step] → verify: [check]\n3. [Step] → verify: [check]\n```\n\nStrong success criteria let you loop independently. Weak criteria (\"make it work\") require constant clarification.\n"
  }
]