[
  {
    "path": ".gitignore",
    "content": "config/agent_config.yaml\ntool/add_header.sh"
  },
  {
    "path": "CHANGELOG.md",
    "content": "0.0.1 (April 17, 2025)\n\n### First Version\n\nInclude web UI, CLI for DocAgent."
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, we as\ncontributors and maintainers pledge to make participation in our project and\nour community a harassment-free experience for everyone, regardless of age, body\nsize, disability, ethnicity, sex characteristics, gender identity and expression,\nlevel of experience, education, socio-economic status, nationality, personal\nappearance, race, religion, or sexual identity and orientation.\n\n## Our Standards\n\nExamples of behavior that contributes to creating a positive environment\ninclude:\n\n* Using welcoming and inclusive language\n* Being respectful of differing viewpoints and experiences\n* Gracefully accepting constructive criticism\n* Focusing on what is best for the community\n* Showing empathy towards other community members\n\nExamples of unacceptable behavior by participants include:\n\n* The use of sexualized language or imagery and unwelcome sexual attention or\nadvances\n* Trolling, insulting/derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or electronic\naddress, without explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\nprofessional setting\n\n## Our Responsibilities\n\nProject maintainers are responsible for clarifying the standards of acceptable\nbehavior and are expected to take appropriate and fair corrective action in\nresponse to any instances of unacceptable behavior.\n\nProject maintainers have the right and responsibility to remove, edit, or\nreject comments, commits, code, wiki edits, issues, and other contributions\nthat are not aligned to this Code of Conduct, or to ban temporarily or\npermanently any contributor for other behaviors that they deem inappropriate,\nthreatening, offensive, or harmful.\n\n## Scope\n\nThis Code of Conduct applies within all project spaces, and it also 
applies when\nan individual is representing the project or its community in public spaces.\nExamples of representing a project or community include using an official\nproject e-mail address, posting via an official social media account, or acting\nas an appointed representative at an online or offline event. Representation of\na project may be further defined and clarified by project maintainers.\n\nThis Code of Conduct also applies outside the project spaces when there is a\nreasonable belief that an individual's behavior may have a negative impact on\nthe project or its community.\n\n## Enforcement\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported by contacting the project team at <opensource-conduct@meta.com>. All\ncomplaints will be reviewed and investigated and will result in a response that\nis deemed necessary and appropriate to the circumstances. The project team is\nobligated to maintain confidentiality with regard to the reporter of an incident.\nFurther details of specific enforcement policies may be posted separately.\n\nProject maintainers who do not follow or enforce the Code of Conduct in good\nfaith may face temporary or permanent repercussions as determined by other\nmembers of the project's leadership.\n\n## Attribution\n\nThis Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,\navailable at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html\n\n[homepage]: https://www.contributor-covenant.org\n\nFor answers to common questions about this code of conduct, see\nhttps://www.contributor-covenant.org/faq"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to DocAgent\nWe want to make contributing to this project as easy and transparent as\npossible.\n\n## Pull Requests\nWe actively welcome your pull requests.\n\n1. Fork the repo and create your branch from `main`.\n2. If you've added code that should be tested, add tests.\n3. If you've changed APIs, update the documentation.\n4. Ensure the test suite passes.\n5. Make sure your code lints.\n6. If you haven't already, complete the Contributor License Agreement (\"CLA\").\n\n## Contributor License Agreement (\"CLA\")\nIn order to accept your pull request, we need you to submit a CLA. You only need\nto do this once to work on any of Meta's open source projects.\n\nComplete your CLA here: <https://code.facebook.com/cla>\n\n## Issues\nWe use GitHub issues to track public bugs. Please ensure your description is\nclear and has sufficient instructions to be able to reproduce the issue.\n\nMeta has a [bounty program](https://bugbounty.meta.com/) for the safe\ndisclosure of security bugs. In those cases, please go through the process\noutlined on that page and do not file a public issue.\n\n## Coding Style\n* 2 spaces for indentation rather than tabs\n* 80 character line length\n* Use [Black](https://github.com/psf/black) for code formatting.\n* Use [Flake8](https://flake8.pycqa.org/en/latest/) for linting.\n* Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guidelines.\n* Use snake_case for variable and function names.\n* Use PascalCase for class names.\n* Write docstrings for all public modules, classes, functions, and methods using Google style.\n* Use type hints for function signatures.\n* Keep imports organized: standard library first, then third-party libraries, then local application/library specific imports, each group separated by a blank line. 
Use [isort](https://pycqa.github.io/isort/) to automate this.\n\n## License\nBy contributing to DocAgent, you agree that your contributions will be licensed\nunder the LICENSE file in the root directory of this source tree."
  },
  {
    "path": "INSTALL.md",
    "content": "# Installation Guide\n\nThis guide details how to set up the environment for DocAgent.\n\n## Option 1: Installation with pip (Recommended)\n\n### Basic Installation\nTo install the basic package with core dependencies:\n\n```bash\n# For all dependencies\npip install -e \".[all]\"\n```\n\n\n\n## Development Setup\n\nFor development, we recommend installing in editable mode with dev dependencies:\n\n```bash\n# Install the package in editable mode with dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n```\n\n## Troubleshooting\n\n### GraphViz Dependencies\n\nFor visualization components, you may need to install system-level dependencies for GraphViz:\n\n```bash\n# Ubuntu/Debian\nsudo apt-get install graphviz graphviz-dev\n\n# CentOS/RHEL\nsudo yum install graphviz graphviz-devel\n\n# macOS\nbrew install graphviz\n```\n\n### CUDA Support\n\nIf you're using CUDA for accelerated processing, ensure you have the correct CUDA toolkit installed that matches your PyTorch version. "
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) Meta Platforms, Inc. and affiliates.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "README.md",
    "content": "# DocAgent: Agentic Hierarchical Docstring Generation System\n\n<p align=\"center\">\n  <img src=\"assets/meta_logo_white.png\" width=\"20%\" alt=\"Meta Logo\">\n</p>\n\nDocAgent is a system designed to generate high-quality, context-aware docstrings for Python codebases using a multi-agent approach and hierarchical processing.\n\n## Citation\n\nIf you use DocAgent in your research, please cite our paper:\n\n```bibtex\n@misc{yang2025docagent,\n      title={DocAgent: A Multi-Agent System for Automated Code Documentation Generation}, \n      author={Dayu Yang and Antoine Simoulin and Xin Qian and Xiaoyi Liu and Yuwei Cao and Zhaopu Teng and Grey Yang},\n      year={2025},\n      eprint={2504.08725},\n      archivePrefix={arXiv},\n      primaryClass={cs.SE}\n}\n```\n\nYou can find the paper on arXiv: [https://arxiv.org/abs/2504.08725](https://arxiv.org/abs/2504.08725)\n\n## Table of Contents\n\n- [Motivation](#motivation)\n- [Methodology](#methodology)\n- [Installation](#installation)\n- [Components](#components)\n- [Configuration](#configuration)\n- [Usage](#usage)\n- [Running the Evaluation System](#running-the-evaluation-system)\n- [Optional: Using a Local LLM](#optional-using-a-local-llm)\n\n## Motivation\n\nHigh-quality docstrings are crucial for code readability, usability, and maintainability, especially in large repositories. They should explain the purpose, parameters, returns, exceptions, and usage within the broader context. Current LLMs often struggle with this, producing superficial or redundant comments and failing to capture essential context or rationale. DocAgent aims to address these limitations by generating informative, concise, and contextually aware docstrings.\n\n## Methodology\n\nDocAgent employs two key strategies:\n\n1.  **Hierarchical Traversal**: Processes code components by analyzing dependencies, starting with files having fewer dependencies. 
This builds a documented foundation before tackling more complex code, addressing the challenge of documenting context that itself lacks documentation.\n2.  **Agentic System**: Utilizes a team of specialized agents (`Reader`, `Searcher`, `Writer`, `Verifier`) coordinated by an `Orchestrator`. This system gathers context (internal and external), drafts docstrings according to standards, and verifies their quality in an iterative process.\n\n<img src=\"assets/system.png\" width=\"100%\" alt=\"System Overview\">\n\nFor more details on the agentic framework, see the [Agent Component README](./src/agent/README.md).\n\n## Installation\n\n1.  Clone the repository:\n    ```bash\n    git clone <repository_url>\n    cd DocAgent\n    ```\n2.  Install the necessary dependencies. It's recommended to use a virtual environment:\n    ```bash\n    python -m venv venv\n    source venv/bin/activate # if you use venv, you can also use conda\n    pip install -e .\n    ```\n    *Note: For optional features like development tools, web UI components, or specific hardware support (e.g., CUDA), refer to the comments in `setup.py` and install extras as needed (e.g., `pip install -e \".[dev,web]\"`).*\n\n## Components\n\nDocAgent is composed of several key parts:\n\n- **[Core Agent Framework](./src/agent/README.md)**: Implements the multi-agent system (Reader, Searcher, Writer, Verifier, Orchestrator) responsible for the generation logic.\n- **[Docstring Evaluator](./src/evaluator/README.md)**: Provides tools for evaluating docstring quality, primarily focusing on completeness based on static code analysis (AST). *Note: Evaluation is run separately, see its README.*\n- **[Generation Web UI](./src/web/README.md)**: A web interface for configuring, running, and monitoring the docstring *generation* process in real-time.\n\n## Configuration\n\nBefore running DocAgent, you **must** create a configuration file named `config/agent_config.yaml`. 
This file specifies crucial parameters for the agents, such as the LLM endpoints, API keys (if required), model names, and generation settings.\n\n1.  **Copy the Example**: An example configuration file is provided at `config/example_config.yaml`. Copy this file to `config/agent_config.yaml`:\n    ```bash\n    cp config/example_config.yaml config/agent_config.yaml\n    ```\n2.  **Edit the Configuration**: Open `config/agent_config.yaml` in a text editor and modify the settings according to your environment and requirements. Pay close attention to the LLM provider, model selection, and any necessary API credentials.\n\n## Usage\n\nYou can run the docstring generation process using either the command line or the web UI.\n\n**1. Command Line Interface (CLI)**\n\nThis is the primary method for running the generation process directly.\n\n```bash\n# Example: Run on a test repo (remove existing docstrings first if desired)\n./test/tool/remove_docstrings.sh data/raw_test_repo\npython generate_docstrings.py --repo-path data/raw_test_repo\n```\nUse `python generate_docstrings.py --help` to see available options, such as specifying different configurations or test modes.\n\n**2. Generation Web UI**\n\nThe web UI provides a graphical interface to configure, run, and monitor the process.\n\n- Note: when entering the repo path in the UI, always provide the complete absolute path.\n\n```bash\n# Launch the web UI server\npython run_web_ui.py --host 0.0.0.0 --port 5000\n```\n\nThen, access the UI in your web browser, typically at `http://localhost:5000`. 
If running the server remotely, you might need to set up SSH tunneling (see instructions below or the [Web UI README](./src/web/README.md)).\n\n*Basic SSH Tunneling (if running server remotely):*\n```bash\n# In your local terminal\nssh -L 5000:localhost:5000 <your_remote_username>@<your_remote_host>\n# Then access http://localhost:5000 in your local browser\n```\n\n## Running the Evaluation System\n\nDocAgent includes a separate web-based interface for evaluating the quality of generated docstrings.\n\n**1. Running Locally**\n\nTo run the evaluation system on your local machine:\n\n```bash\npython src/web_eval/app.py\n```\n\nThen, access the evaluation UI in your web browser at `http://localhost:5001`.\n\n**2. Running on a Remote Server**\n\nTo run the evaluation system on a remote server:\n\n```bash\npython src/web_eval/app.py --host 0.0.0.0 --port 5001\n```\n\nThen, set up SSH tunneling to access the remote server from your local machine:\n\n```bash\nssh -L 5001:localhost:5001 <your_remote_username>@<your_remote_host>\n```\n\nOnce the tunnel is established, access the evaluation UI in your local web browser at `http://localhost:5001`.\n\n## Optional: Using a Local LLM\n\nIf you prefer to use a local LLM (e.g., one hosted via Hugging Face), you can configure DocAgent to interact with it via an API endpoint.\n\n1.  **Serve the Local LLM**: Use a tool like `vllm` to serve your model. A convenience script is provided:\n    ```bash\n    # Ensure vllm is installed: pip install vllm\n    bash tool/serve_local_llm.sh\n    ```\n    This script will likely start an OpenAI-compatible API server (check the script details). Note the URL where the model is served (e.g., `http://localhost:8000/v1`).\n\n2.  **Configure DocAgent**: Update your `config/agent_config.yaml` to point to the local LLM API endpoint. 
You'll typically need to set:\n    - The `provider` to `openai` (if using an OpenAI-compatible server like vllm's default).\n    - The `api_base` or equivalent URL parameter to your local server address (e.g., `http://localhost:8000/v1`).\n    - The `model_name` to the appropriate identifier for your local model.\n    - The `api_key` to `None` or an empty string if your local server does not require a key.\n\n3.  **Run DocAgent**: Run the generation process as usual (CLI or Web UI). DocAgent will now send requests to your local LLM.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n"
  },
  {
    "path": "config/example_config.yaml",
    "content": "# Example configuration file for DocAgent\n# Copy this file to agent_config.yaml and add your own API keys\n\n# LLM configuration for all agents\nllm:\n  # Choose ONE of the following LLM provider configurations by uncommenting\n  \n  # Option 1: Claude (Anthropic)\n  type: \"claude\"  \n  api_key: \"your-anthropic-api-key-here\"  \n  model: \"claude-3-5-haiku-latest\"  # Options: claude-3-5-sonnet, claude-3-opus, etc.\n  temperature: 0.1\n  max_output_tokens: 4096\n  max_input_tokens: 100000  # Maximum number of tokens for input context\n  \n  # Option 2: OpenAI\n  # type: \"openai\"\n  # api_key: \"your-openai-api-key-here\"\n  # model: \"gpt-4o\"  # Options: gpt-4o, gpt-4-turbo, gpt-3.5-turbo, etc.\n  # temperature: 0.1\n  # max_output_tokens: 4096\n  # max_input_tokens: 100000\n\n  # Option 3: Gemini\n  # type: \"gemini\"\n  # api_key: \"your-gemini-api-key-here\"\n  # model: \"gemini-1.5-pro\"\n  # temperature: 0.1\n  # max_output_tokens: 4096\n  # max_input_tokens: 100000\n\n  # Option 4: HuggingFace (for local models)\n  # type: \"huggingface\"\n  # model: \"codellama/CodeLlama-34b-Instruct-hf\"\n  # api_base: \"http://localhost:8000/v1\"  # Local API endpoint\n  # api_key: \"EMPTY\"  # Can be empty for local models\n  # device: \"cuda\"  # Options: cuda, cpu\n  # torch_dtype: \"float16\"\n  # temperature: 0.1\n  # max_output_tokens: 4096\n  # max_input_tokens: 32000\n\n# Rate limit settings for different LLM providers\n# These are default values - adjust based on your specific API tier\nrate_limits:\n  # Claude rate limits\n  claude:\n    requests_per_minute: 50\n    input_tokens_per_minute: 20000\n    output_tokens_per_minute: 8000\n    input_token_price_per_million: 3.0\n    output_token_price_per_million: 15.0\n\n  # OpenAI rate limits\n  openai:\n    requests_per_minute: 500\n    input_tokens_per_minute: 200000\n    output_tokens_per_minute: 100000\n    input_token_price_per_million: 0.15\n    output_token_price_per_million: 0.60\n\n  # 
Gemini rate limits\n  gemini:\n    requests_per_minute: 60\n    input_tokens_per_minute: 30000\n    output_tokens_per_minute: 10000\n    input_token_price_per_million: 0.125\n    output_token_price_per_million: 0.375\n\n# Flow control parameters\nflow_control:\n  max_reader_search_attempts: 2  # Maximum times reader can call searcher\n  max_verifier_rejections: 1     # Maximum times verifier can reject a docstring\n  status_sleep_time: 1           # Time to sleep between status updates (seconds)\n\n# Docstring generation options\ndocstring_options:\n  overwrite_docstrings: false  # Whether to overwrite existing docstrings (default: false)\n\n# Perplexity API configuration (for web search capability)\nperplexity:\n  api_key: \"your-perplexity-api-key-here\"  # Replace with your actual Perplexity API key\n  model: \"sonar\"  # Default model\n  temperature: 0.1\n  max_output_tokens: 250 "
  },
  {
    "path": "data/raw_test_repo/README.md",
    "content": "# Vending Machine Test Repository\n\nA comprehensive vending machine implementation in Python that demonstrates various programming concepts, design patterns, and documentation styles. This repository serves as a test bed for docstring generation systems and code documentation analysis.\n\n## Project Structure\n\n```\ntest_repo_vm/\n├── __init__.py              # Main package initialization\n├── example.py              # Example usage demonstration\n├── vending_machine.py      # Main vending machine implementation\n├── models/                 # Data models\n│   ├── __init__.py\n│   └── product.py         # Product class definition\n├── payment/               # Payment processing\n│   ├── __init__.py\n│   └── payment_processor.py # Payment-related classes\n└── inventory/            # Inventory management\n    ├── __init__.py\n    └── inventory_manager.py # Inventory tracking system\n```\n\n## Components\n\n### 1. Product Management (`models/product.py`)\n- `Product` class with attributes like ID, name, price, quantity, and expiry date\n- Methods for checking availability and managing stock\n\n### 2. Payment Processing (`payment/payment_processor.py`)\n- Abstract `PaymentMethod` base class for different payment types\n- `CashPayment` implementation for handling cash transactions\n- `PaymentTransaction` class for tracking payment status\n- `PaymentStatus` enum for transaction states\n\n### 3. Inventory Management (`inventory/inventory_manager.py`)\n- `InventoryManager` class for product storage and retrieval\n- Slot-based product organization\n- Stock level tracking\n- Product availability checking\n\n### 4. Main Vending Machine (`vending_machine.py`)\n- `VendingMachine` class that coordinates all components\n- Product selection and purchase workflow\n- Payment processing and change calculation\n- Exception handling for error cases\n\n## Code Features\n\nThis repository demonstrates various Python programming features:\n\n1. 
**Object-Oriented Design**\n   - Abstract base classes\n   - Inheritance\n   - Encapsulation\n   - Interface definitions\n\n2. **Modern Python Features**\n   - Type hints\n   - Dataclasses\n   - Enums\n   - Optional types\n   - Package organization\n\n3. **Documentation**\n   - Comprehensive docstrings\n   - Type annotations\n   - Code organization\n   - Exception documentation\n\n4. **Best Practices**\n   - SOLID principles\n   - Clean code architecture\n   - Error handling\n   - Modular design\n\n## Usage Example\n\n```python\nfrom decimal import Decimal\nfrom vending_machine import VendingMachine\nfrom models.product import Product\n\n# Create a vending machine\nvm = VendingMachine()\n\n# Add products\nproduct = Product(\n    id=\"COLA001\",\n    name=\"Cola Classic\",\n    price=1.50,\n    quantity=10,\n    category=\"drinks\"\n)\nvm.inventory.add_product(product, slot=0)\n\n# Insert money\nvm.insert_money(Decimal('2.00'))\n\n# Purchase product\nproduct, change = vm.purchase_product(slot=0)\nprint(f\"Purchased: {product.name}\")\nprint(f\"Change: ${change:.2f}\")\n```\n\n## Running the Example\n\nTo run the example implementation:\n\n```bash\npython example.py\n```\n\nThis will demonstrate:\n1. Creating a vending machine\n2. Adding products to inventory\n3. Displaying available products\n4. Making a purchase\n5. Handling change\n6. Updating inventory\n\n## Testing Documentation Generation\n\nThis repository is structured to test various aspects of documentation generation:\n\n1. **Complex Imports**\n   - Cross-module dependencies\n   - Package-level imports\n   - Relative imports\n\n2. **Documentation Styles**\n   - Function documentation\n   - Class documentation\n   - Module documentation\n   - Package documentation\n\n3. 
**Code Complexity**\n   - Multiple inheritance\n   - Abstract classes\n   - Type annotations\n   - Exception hierarchies\n\n## Requirements\n\n- Python 3.7+\n- No external dependencies required\n\n## License\n\nThis project is open source and available under the MIT License. "
  },
  {
    "path": "data/raw_test_repo/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nVending Machine Package\n\nA comprehensive vending machine implementation with:\n- Product management\n- Inventory tracking\n- Payment processing\n- Transaction handling\n\"\"\"\n"
  },
  {
    "path": "data/raw_test_repo/example.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom decimal import Decimal\nfrom datetime import datetime, timedelta\nfrom models.product import Item\nfrom vending_machine import Sys, SysErr\n\n\ndef main():\n    s = Sys()\n    items = [Item(code='D1', label='Drink1', val=1.5, count=10, grp='d',\n        exp=datetime.now() + timedelta(days=90)), Item(code='S1', label=\n        'Snack1', val=1.0, count=15, grp='s', exp=datetime.now() +\n        timedelta(days=30)), Item(code='S2', label='Snack2', val=2.0, count\n        =8, grp='s', exp=datetime.now() + timedelta(days=60))]\n    for i, item in enumerate(items):\n        s.store.put(item, i)\n    try:\n        print('Items:')\n        for pos, item in s.ls():\n            print(f'Pos {pos}: {item.label} - ${item.val:.2f}')\n        pos = 0\n        print('\\nAdding $2.00...')\n        s.add_money(Decimal('2.00'))\n        item, ret = s.buy(pos)\n        print(f'\\nBought: {item.label}')\n        if ret:\n            print(f'Return: ${ret:.2f}')\n        print('\\nUpdated Items:')\n        for pos, item in s.ls():\n            print(\n                f'Pos {pos}: {item.label} - ${item.val:.2f} (Count: {item.count})'\n                )\n    except SysErr as e:\n        print(f'Err: {str(e)}')\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "data/raw_test_repo/inventory/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"Inventory management package for product stock tracking.\"\"\"\n"
  },
  {
    "path": "data/raw_test_repo/inventory/inventory_manager.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, List, Optional\nfrom ..models.product import Item\n\n\nclass Store:\n\n    def __init__(self, cap: int=20):\n        self.cap = cap\n        self._data: Dict[str, Item] = {}\n        self._map: Dict[int, str] = {}\n\n    def put(self, obj: Item, pos: Optional[int]=None) ->bool:\n        if obj.code in self._data:\n            curr = self._data[obj.code]\n            curr.count += obj.count\n            return True\n        if pos is not None:\n            if pos < 0 or pos >= self.cap:\n                return False\n            if pos in self._map:\n                return False\n            self._map[pos] = obj.code\n        else:\n            for i in range(self.cap):\n                if i not in self._map:\n                    self._map[i] = obj.code\n                    break\n            else:\n                return False\n        self._data[obj.code] = obj\n        return True\n\n    def rm(self, code: str) ->bool:\n        if code not in self._data:\n            return False\n        for k, v in list(self._map.items()):\n            if v == code:\n                del self._map[k]\n        del self._data[code]\n        return True\n\n    def get(self, code: str) ->Optional[Item]:\n        return self._data.get(code)\n\n    def get_at(self, pos: int) ->Optional[Item]:\n        if pos not in self._map:\n            return None\n        code = self._map[pos]\n        return self._data.get(code)\n\n    def ls(self) ->List[Item]:\n        return [obj for obj in self._data.values() if obj.check()]\n\n    def find(self, code: str) ->Optional[int]:\n        for k, v in self._map.items():\n            if v == code:\n                return k\n        return None\n"
  },
  {
    "path": "data/raw_test_repo/models/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"Models package for data structures used in the vending machine.\"\"\"\n"
  },
  {
    "path": "data/raw_test_repo/models/product.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom dataclasses import dataclass\nfrom typing import Optional\nfrom datetime import datetime\n\n@dataclass\nclass Item:\n    \"\"\"\n    Summary:\n    Represents an item with associated attributes for tracking and management in various contexts.\n\n    Description:\n    This class serves as a blueprint for creating items that can be tracked and managed within a system. Each item has attributes such as a unique code, a label, a value, a count, an optional expiration date, and a group classification. The primary motivation behind this class is to facilitate resource management, inventory tracking, or any scenario where items need to be monitored for validity and availability.\n\n    Use this class when you need to represent items that may have a limited lifespan or quantity, such as in inventory systems, gaming resources, or token management. It provides methods to check the validity of an item and to modify its count, ensuring that operations on the item are safe and consistent.\n\n    The class fits into larger systems by allowing for easy integration with resource management workflows, enabling developers to track item states and manage their lifecycle effectively.\n\n    Example:\n    ```python\n    from datetime import datetime, timedelta\n\n    # Create an item with a specific expiration date\n    item = Item(code='A123', label='Sample Item', val=10.0, count=5, exp=datetime.now() + timedelta(days=1))\n\n    # Check if the item is valid\n    is_valid = item.check()  # Returns True if count > 0 and not expired\n\n    # Modify the count of the item\n    item.mod(2)  # Decreases count by 2, returns True\n    ```\n\n    Parameters:\n    - code (str): A unique identifier for the item.\n    - label (str): A descriptive name for the item.\n    - val (float): The value associated with the item, representing its worth.\n    - count (int): The quantity of the item available. 
Must be a non-negative integer.\n    - exp (Optional[datetime]): An optional expiration date for the item. If set, the item will be considered invalid after this date.\n    - grp (str): A classification group for the item, defaulting to 'misc'.\n\n    Attributes:\n    - code (str): The unique identifier for the item.\n    - label (str): The name or description of the item.\n    - val (float): The monetary or functional value of the item.\n    - count (int): The current quantity of the item available, must be non-negative.\n    - exp (Optional[datetime]): The expiration date of the item, if applicable.\n    - grp (str): The group classification of the item, useful for categorization.\n    \"\"\"\n    code: str\n    label: str\n    val: float\n    count: int\n    exp: Optional[datetime] = None\n    grp: str = 'misc'\n\n    def check(self) -> bool:\n        \"\"\"\n        Validates the current object's state based on count and expiration.\n\n        Checks whether the object is still valid by verifying two key conditions:\n        1. The object's count is greater than zero\n        2. The object has not exceeded its expiration timestamp\n\n        This method is typically used to determine if an object is still usable\n        or has become stale/invalid. 
It provides a quick state validation check\n        that can be used in resource management, token validation, or lifecycle\n        tracking scenarios.\n\n        Returns:\n            bool: True if the object is valid (count > 0 and not expired),\n                  False otherwise.\n        \"\"\"\n        if self.count <= 0:\n            return False\n        if self.exp and datetime.now() > self.exp:\n            return False\n        return True\n\n    def mod(self, n: int=1) -> bool:\n        \"\"\"\n        Summary:\n        Determines if the current count can be decremented by a specified value.\n\n        Description:\n        This method checks if the `count` attribute is greater than or equal to the provided integer `n`. If so, it decrements `count` by `n` and returns `True`. If `count` is less than `n`, it returns `False`, indicating that the operation could not be performed.\n\n        Use this function when managing resources or operations that require a controlled decrement of a count, ensuring that the count does not drop below zero. This is particularly useful in scenarios such as resource allocation, gaming mechanics, or iterative processes.\n\n        The method is integral to classes that require precise control over a count, allowing for safe decrements while maintaining the integrity of the count value.\n\n        Args:\n        n (int, optional): The value to decrement from `count`. Must be a positive integer that does not exceed the current `count`. Default is 1.\n\n        Returns:\n        bool: Returns `True` if the decrement was successful (i.e., `count` was greater than or equal to `n`), otherwise returns `False`.\n\n        Raises:\n        No exceptions are raised by this method. 
Ensure that `n` is a positive integer and does not exceed the current `count` to avoid logical errors.\n\n        Examples:\n        ```python\n        obj = Item(code='A1', label='Soda', val=1.5, count=5)\n        result = obj.mod(2)  # result will be True, obj.count will be 3\n        result = obj.mod(4)  # result will be False, obj.count remains 3\n        result = obj.mod(0)  # result will be False, as n should be greater than 0\n        result = obj.mod(-1) # result will be False, as n should be a positive integer\n        ```\n        \"\"\"\n        if n > 0 and self.count >= n:\n            self.count -= n\n            return True\n        return False"
  },
  {
    "path": "data/raw_test_repo/payment/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"Payment processing package for handling different payment methods.\"\"\"\n"
  },
  {
    "path": "data/raw_test_repo/payment/payment_processor.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom abc import ABC, abstractmethod\nfrom dataclasses import dataclass\nfrom enum import Enum\nfrom typing import Optional\nfrom decimal import Decimal\n\n\nclass TxStatus(Enum):\n    WAIT = 'pending'\n    DONE = 'completed'\n    ERR = 'failed'\n    RET = 'refunded'\n\n\n@dataclass\nclass Tx:\n    id: str\n    amt: Decimal\n    st: TxStatus\n    mth: str\n    msg: Optional[str] = None\n\n\nclass Handler(ABC):\n\n    @abstractmethod\n    def proc(self, amt: Decimal) ->Tx:\n        pass\n\n    @abstractmethod\n    def rev(self, tx: Tx) ->bool:\n        pass\n\n\nclass Cash(Handler):\n\n    def __init__(self):\n        self.bal: Decimal = Decimal('0.00')\n\n    def add(self, amt: Decimal) ->None:\n        self.bal += amt\n\n    def proc(self, amt: Decimal) ->Tx:\n        if self.bal >= amt:\n            self.bal -= amt\n            return Tx(id=f'C_{id(self)}', amt=amt, st=TxStatus.DONE, mth='cash'\n                )\n        return Tx(id=f'C_{id(self)}', amt=amt, st=TxStatus.ERR, mth='cash',\n            msg='insufficient')\n\n    def rev(self, tx: Tx) ->bool:\n        if tx.st == TxStatus.DONE:\n            self.bal += tx.amt\n            tx.st = TxStatus.RET\n            return True\n        return False\n\n    def ret(self) ->Decimal:\n        tmp = self.bal\n        self.bal = Decimal('0.00')\n        return tmp\n"
  },
  {
    "path": "data/raw_test_repo/vending_machine.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom decimal import Decimal\nfrom typing import Optional, List, Tuple\nfrom .models.product import Item\nfrom .payment.payment_processor import Handler, Tx, TxStatus, Cash\nfrom .inventory.inventory_manager import Store\n\n\nclass SysErr(Exception):\n    pass\n\n\nclass Sys:\n\n    def __init__(self, h: Optional[Handler]=None):\n        self.store = Store()\n        self.h = h or Cash()\n        self._tx: Optional[Tx] = None\n\n    def ls(self) ->List[Tuple[int, Item]]:\n        items = []\n        for item in self.store.ls():\n            pos = self.store.find(item.code)\n            if pos is not None:\n                items.append((pos, item))\n        return sorted(items, key=lambda x: x[0])\n\n    def pick(self, pos: int) ->Optional[Item]:\n        item = self.store.get_at(pos)\n        if not item:\n            raise SysErr('invalid pos')\n        if not item.check():\n            raise SysErr('unavailable')\n        return item\n\n    def add_money(self, amt: Decimal) ->None:\n        if not isinstance(self.h, Cash):\n            raise SysErr('cash not supported')\n        self.h.add(amt)\n\n    def buy(self, pos: int) ->Tuple[Item, Optional[Decimal]]:\n        item = self.pick(pos)\n        tx = self.h.proc(Decimal(str(item.val)))\n        self._tx = tx\n        if tx.st != TxStatus.DONE:\n            raise SysErr(tx.msg or 'tx failed')\n        if not item.mod():\n            self.h.rev(tx)\n            raise SysErr('dispense failed')\n        ret = None\n        if isinstance(self.h, Cash):\n            ret = self.h.ret()\n        return item, ret\n\n    def cancel(self) ->Optional[Decimal]:\n        if not self._tx:\n            raise SysErr('no tx')\n        ok = self.h.rev(self._tx)\n        if not ok:\n            raise SysErr('rev failed')\n        ret = None\n        if isinstance(self.h, Cash):\n            ret = self.h.ret()\n        self._tx = None\n        return ret\n"
  },
  {
    "path": "data/raw_test_repo_simple/helper.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nclass HelperClass:\n    \"\"\"\n    Represents a utility for managing and processing data.\n\n    The `HelperClass` is designed to facilitate data processing tasks by leveraging the `DataProcessor` class. It serves as an intermediary that manages the workflow of data processing, making it easier to handle data updates and retrievals within a system. This class is particularly useful in scenarios where data needs to be processed and accessed in a structured manner.\n\n    The `HelperClass` fits into the larger system architecture as a component that coordinates data processing tasks. It achieves its purpose by using the `DataProcessor` to perform the actual data processing and then managing the processed data internally.\n\n    Example:\n        # Initialize the HelperClass\n        helper = HelperClass()\n\n        # Process data using the helper\n        helper.process_data()\n\n        # Retrieve the processed data result\n        result = helper.get_result()\n        print(result)  # Output: '[1, 2, 3]'\n\n    Attributes:\n        data (list): Stores the processed data, initially an empty list.\n    \"\"\"\n\n    def __init__(self):\n        self.data = []\n\n    def process_data(self):\n        \"\"\"\n        Processes and updates the internal data.\n\n        This method orchestrates the data processing workflow by invoking the `DataProcessor.process()` method to perform the main data processing task. It then calls `_internal_process()` to finalize the processing and update the internal `data` attribute. 
Use this method when you need to refresh or initialize the data within the `HelperClass` instance.\n\n        Returns:\n            None: This method updates the internal state and does not return a value.\n        \"\"\"\n        self.data = DataProcessor.process()\n        self._internal_process()\n\n    def _internal_process(self):\n        \"\"\"\n        No docstring provided.\n        \"\"\"\n        return self.data\n\n    def get_result(self):\n        \"\"\"\n        No docstring provided.\n        \"\"\"\n        return str(self.data)\n\nclass DataProcessor:\n    '''\n    \"\"\"Handles basic data processing tasks within a system.\n\n        This class is designed to perform simple data processing operations, providing\n        utility methods that can be used in various scenarios where basic data manipulation\n        is required. It is particularly useful in contexts where a straightforward list of\n        integers is needed for further processing or testing.\n\n        The `DataProcessor` class fits into the larger system architecture as a utility\n        component, offering static and internal methods to handle specific processing tasks.\n        It achieves its purpose by providing a static method for general use and an internal\n        method for class-specific operations.\n\n        Example:\n            # Initialize the DataProcessor class\n            processor = DataProcessor()\n\n            # Use the static method to process data\n            result = DataProcessor.process()\n            print(result)  # Output: [1, 2, 3]\n\n            # Use the internal method for internal processing\n            internal_result = processor._internal_process()\n            print(internal_result)  # Output: 'processed'\n    \"\"\"\n    '''\n\n    @staticmethod\n    def process():\n        '''\n        \"\"\"Processes data and returns a list of integers.\n\n            This static method is designed to perform a basic data processing task\n            and 
return a predefined list of integers. It can be used whenever a simple\n            list of integers is required for further operations or testing purposes.\n\n            Returns:\n                list of int: A list containing the integers [1, 2, 3].\n        \"\"\"\n        '''\n        return [1, 2, 3]\n\n    def _internal_process(self):\n        '''\n        \"\"\"Processes internal data and returns a status message.\n\n            This method is used internally within the `DataProcessor` class to perform\n            specific data processing tasks that are not exposed publicly. It is typically\n            called by other methods within the class to handle intermediate processing\n            steps.\n\n            Returns:\n                str: A string indicating the processing status, specifically 'processed'.\n            \"\"\"\n        '''\n        return 'processed'"
  },
  {
    "path": "data/raw_test_repo_simple/inner/inner_functions.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\ndef inner_function():\n    \"\"\"\n    Returns a greeting message from an inner function.\n\n    This function is designed to return a simple greeting message, which can be used in nested or internal function calls to verify execution flow or for debugging purposes. It is typically used in development environments where confirming the execution of specific code paths is necessary.\n\n    Returns:\n        str: A greeting message stating 'Hello from inner function!'\n\n    Example:\n        >>> message = inner_function()\n        >>> print(message)\n        Hello from inner function!\n    \"\"\"\n    return 'Hello from inner function!'\n\ndef get_random_quote():\n    \"\"\"\n    Fetches a predefined inspirational quote.\n\n    This function is designed to provide users with a motivational quote, which can be used in applications that aim to inspire or uplift users. It is particularly useful in scenarios where a quick, positive message is needed to enhance user experience.\n\n    Returns:\n        str: A quote string stating 'The best way to predict the future is to create it.'\n\n    Example:\n        >>> quote = get_random_quote()\n        >>> print(quote)\n        The best way to predict the future is to create it.\n    \"\"\"\n    return 'The best way to predict the future is to create it.'\n\ndef generate_timestamp():\n    \"\"\"\n    Generates and returns a static timestamp.\n\n    This function provides a hardcoded timestamp string, which can be used in scenarios where a consistent and predictable timestamp is required for testing or logging purposes. 
It fits into workflows where a fixed date and time representation is needed without relying on the current system time.\n\n    Returns:\n        str: A string representing the static timestamp '2023-05-15 14:30:22'.\n    \"\"\"\n    return '2023-05-15 14:30:22'\n\ndef get_system_status():\n    \"\"\"\n    Provides a static message indicating the operational status of systems.\n\n    This function is used to retrieve a fixed status message that confirms all systems are functioning correctly. It is useful in monitoring dashboards or status pages where a quick confirmation of system health is required.\n\n    Returns:\n        str: A status message stating 'All systems operational'\n\n    Example:\n        >>> status = get_system_status()\n        >>> print(status)\n        All systems operational\n    \"\"\"\n    return 'All systems operational'\n\ndef fetch_user_message():\n    '''\n    \"\"\"Fetches a predefined user message indicating notifications.\n\n        This function is used to retrieve a static message that informs the user about the number of notifications they have. It is typically used in scenarios where a quick status update is needed for user engagement.\n\n        Returns:\n            str: A message string stating 'Welcome back! You have 3 notifications.'\n\n        Example:\n            >>> message = fetch_user_message()\n            >>> print(message)\n            Welcome back! You have 3 notifications.\n        \"\"\"\n    '''\n    return 'Welcome back! You have 3 notifications.'"
  },
  {
    "path": "data/raw_test_repo_simple/main.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom helper import HelperClass\nfrom inner.inner_functions import inner_function, get_random_quote, generate_timestamp, get_system_status, fetch_user_message\n\ndef main_function():\n    \"\"\"\n    Executes data processing and utility operations, returning the processed data as a string.\n\n    This function initializes a `HelperClass` instance to manage and process data, invokes a utility function to provide a placeholder value, and generates a static timestamp for consistency in logging or testing scenarios. The function is useful when a complete data processing sequence is needed, integrating utility operations to produce a final result.\n\n    Returns:\n        str: The processed data result as a string, derived from the `HelperClass` instance after executing the data processing and utility functions.\n\n    Example:\n        # Execute the main function to process data and retrieve the result\n        result = main_function()\n        print(result)  # Output: '[1, 2, 3]'\n    \"\"\"\n    helper = HelperClass()\n    helper.process_data()\n    utility_function()\n    generate_timestamp()\n    return helper.get_result()\n\ndef utility_function():\n    \"\"\"\n    Returns a utility string.\n\n    This function provides a simple utility string, which can be used in various contexts where a placeholder or a generic return value is needed. It is typically used within workflows that require a consistent return value for testing or demonstration purposes.\n\n    Returns:\n        str: The string 'utility', serving as a generic utility value.\n    \"\"\"\n    return 'utility'"
  },
  {
    "path": "data/raw_test_repo_simple/processor.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom helper import HelperClass, DataProcessor\nfrom main import utility_function\n\nclass AdvancedProcessor:\n    \"\"\"\n    Facilitates advanced data processing by coordinating multiple processing components.\n\n    The `AdvancedProcessor` class is designed to manage and execute complex data processing workflows by integrating the functionalities of `HelperClass` and `DataProcessor`. It is ideal for scenarios where a comprehensive processing sequence is needed, providing a streamlined approach to handle data operations and produce a final result.\n\n    This class fits into the larger system architecture as a high-level orchestrator of data processing tasks, ensuring that each component's capabilities are effectively utilized to achieve the desired outcome.\n\n    Example:\n        # Initialize the AdvancedProcessor\n        processor = AdvancedProcessor()\n\n        # Execute the processing workflow\n        result = processor.run()\n        print(result)  # Output: 'utility'\n\n    Attributes:\n        helper (HelperClass): An instance of `HelperClass` used to manage data processing tasks.\n        data_processor (DataProcessor): An instance of `DataProcessor` used to perform specific data processing operations.\n    \"\"\"\n\n    def __init__(self):\n        self.helper = HelperClass()\n        self.data_processor = DataProcessor()\n\n    def run(self):\n        \"\"\"\n        Executes the complete data processing workflow and returns the result.\n\n        This method coordinates the data processing tasks by utilizing both the `HelperClass` and `DataProcessor` to perform necessary operations. 
It is designed to be used when a full processing sequence is required, culminating in a final result that indicates the completion of these tasks.\n\n        Returns:\n            str: The result of the processing workflow, typically a utility string indicating successful completion.\n\n        Example:\n            # Create an instance of AdvancedProcessor\n            processor = AdvancedProcessor()\n\n            # Run the processing workflow\n            result = processor.run()\n            print(result)  # Output: 'utility'\n        \"\"\"\n        self.helper.process_data()\n        self.data_processor._internal_process()\n        return self.process_result()\n\n    def process_result(self):\n        \"\"\"\n        Returns a utility string as the result of processing.\n\n        This method is part of the `AdvancedProcessor` class workflow, providing a consistent utility value after processing operations. It is typically used when a placeholder or generic result is needed following the execution of data processing tasks within the class.\n\n        Returns:\n            str: The string 'utility', serving as a generic utility value to indicate the completion of processing tasks.\n        \"\"\"\n        return utility_function()"
  },
  {
    "path": "data/raw_test_repo_simple/test_file.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\ndef test_function():\n    \"\"\"\n    Returns a boolean value indicating a successful test condition.\n\n    This function is typically used in scenarios where a simple, consistent boolean value is required to represent a successful outcome or condition. It can be integrated into workflows that need a straightforward pass/fail indicator for testing or validation purposes.\n\n    Returns:\n        bool: The boolean value `True`, indicating a successful or positive condition.\n    \"\"\"\n    return True"
  },
  {
    "path": "eval_completeness.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport ast\nimport os\nfrom typing import Dict, Any, List, Union\nfrom pathlib import Path\nfrom evaluator.completeness import ClassCompletenessEvaluator, FunctionCompletenessEvaluator\nfrom tabulate import tabulate\n\ndef run_docstring_tests(source_file: str) -> Dict[str, Any]:\n    \"\"\"\n    Run comprehensive docstring evaluation tests on a Python source file.\n    \n    This function reads a Python file and evaluates docstrings for all classes,\n    functions, and methods found within. It provides detailed evaluation results\n    using different evaluators.\n    \n    Args:\n        source_file: Path to the Python file to analyze\n        \n    Returns:\n        Dictionary containing evaluation results for each found element\n        \n    Example:\n        >>> results = run_docstring_tests('my_module.py')\n        >>> print(results['functions'][0]['completeness_score'])\n        1.0\n    \"\"\"\n    with open(source_file, 'r', encoding='utf-8') as f:\n        source = f.read()\n    \n    try:\n        tree = ast.parse(source)\n    except SyntaxError as e:\n        return {\n            'status': 'error',\n            'message': f'Failed to parse {source_file}: {str(e)}'\n        }\n    \n    results = {\n        'status': 'success',\n        'file': source_file,\n        'classes': [],\n        'functions': [],\n        'debug_info': {}\n    }\n    \n    # Instantiate evaluators\n    class_evaluator = ClassCompletenessEvaluator()\n    func_evaluator = FunctionCompletenessEvaluator()\n    \n    # Process all nodes in the AST\n    for node in ast.iter_child_nodes(tree):\n        if isinstance(node, ast.ClassDef):\n            class_result = {\n                'name': node.name,\n                'type': 'class',\n                'completeness_score': class_evaluator.evaluate(node),\n                'completeness_elements': class_evaluator.element_scores,\n                'element_required': 
class_evaluator.element_required\n            }\n            results['classes'].append(class_result)\n            \n            # Evaluate methods within the class\n            for method in [n for n in ast.iter_child_nodes(node) if isinstance(n, ast.FunctionDef)]:\n                # Skip __init__ methods\n                if method.name == '__init__':\n                    continue\n                    \n                method_result = {\n                    'name': f\"{node.name}.{method.name}\",\n                    'type': 'method',\n                    'completeness_score': func_evaluator.evaluate(method),\n                    'completeness_elements': func_evaluator.element_scores,\n                    'element_required': func_evaluator.element_required\n                }\n                results['functions'].append(method_result)\n                \n        elif isinstance(node, ast.FunctionDef):\n            # Only process top-level functions\n            func_result = {\n                'name': node.name,\n                'type': 'function',\n                'completeness_score': func_evaluator.evaluate(node),\n                'completeness_elements': func_evaluator.element_scores,\n                'element_required': func_evaluator.element_required\n            }\n            results['functions'].append(func_result)\n    \n    # Add overall statistics\n    results['statistics'] = {\n        'total_classes': len(results['classes']),\n        'total_functions': len(results['functions']),\n        'average_class_score': sum(r['completeness_score'] for r in results['classes']) / \n                             max(1, len(results['classes'])),\n        'average_function_score': sum(r['completeness_score'] for r in results['functions']) / \n                                max(1, len(results['functions']))\n    }\n    \n    return results\n\ndef process_directory(directory_path: str) -> Dict[str, Any]:\n    \"\"\"\n    Process all Python files in a directory and its 
subdirectories.\n    \n    Args:\n        directory_path: Path to the directory to analyze\n        \n    Returns:\n        Dictionary containing aggregated evaluation results for all files\n    \"\"\"\n    directory = Path(directory_path)\n    \n    # Initialize aggregate results\n    aggregate_results = {\n        'status': 'success',\n        'directory': str(directory),\n        'files': [],\n        'file_results': [],\n        'classes': [],\n        'functions': [],\n        'statistics': {\n            'total_files': 0,\n            'successful_files': 0,\n            'failed_files': 0,\n            'total_classes': 0,\n            'total_functions': 0,\n            'average_class_score': 0.0,\n            'average_function_score': 0.0,\n            'overall_average_score': 0.0\n        }\n    }\n    \n    # Find all Python files recursively\n    python_files = []\n    for root, _, files in os.walk(directory):\n        for file in files:\n            if file.endswith('.py'):\n                python_files.append(os.path.join(root, file))\n    \n    if not python_files:\n        aggregate_results['status'] = 'error'\n        aggregate_results['message'] = f'No Python files found in {directory_path}'\n        return aggregate_results\n    \n    aggregate_results['statistics']['total_files'] = len(python_files)\n    \n    # Process each Python file\n    all_class_scores = []\n    all_function_scores = []\n    \n    for py_file in python_files:\n        file_result = run_docstring_tests(py_file)\n        \n        if file_result['status'] == 'success':\n            aggregate_results['statistics']['successful_files'] += 1\n            aggregate_results['file_results'].append(file_result)\n            aggregate_results['files'].append(py_file)\n            \n            # Accumulate classes and functions with file path context\n            for class_result in file_result['classes']:\n                class_result['file'] = 
py_file\n                aggregate_results['classes'].append(class_result)\n                all_class_scores.append(class_result['completeness_score'])\n            \n            for func_result in file_result['functions']:\n                func_result['file'] = py_file\n                aggregate_results['functions'].append(func_result)\n                all_function_scores.append(func_result['completeness_score'])\n                \n            # Update statistics\n            aggregate_results['statistics']['total_classes'] += file_result['statistics']['total_classes']\n            aggregate_results['statistics']['total_functions'] += file_result['statistics']['total_functions']\n        else:\n            aggregate_results['statistics']['failed_files'] += 1\n    \n    # Calculate average scores\n    if all_class_scores:\n        aggregate_results['statistics']['average_class_score'] = sum(all_class_scores) / len(all_class_scores)\n    \n    if all_function_scores:\n        aggregate_results['statistics']['average_function_score'] = sum(all_function_scores) / len(all_function_scores)\n    \n    # Calculate overall average score (classes and functions combined)\n    all_scores = all_class_scores + all_function_scores\n    if all_scores:\n        aggregate_results['statistics']['overall_average_score'] = sum(all_scores) / len(all_scores)\n    \n    return aggregate_results\n\ndef print_evaluation_results(results: Dict[str, Any]) -> None:\n    \"\"\"\n    Pretty print the evaluation results in a readable format with colors.\n    \n    Args:\n        results: Dictionary containing evaluation results from run_docstring_tests\n    \"\"\"\n    # ANSI color codes\n    GREEN = '\\033[92m'\n    RED = '\\033[91m'\n    BLUE = '\\033[94m'\n    YELLOW = '\\033[93m'\n    BOLD = '\\033[1m'\n    ENDC = '\\033[0m'\n    \n    # Check if this is a directory result or a file result\n    is_directory = 'directory' in results\n    \n    if is_directory:\n        # Print directory path\n 
       print(f\"\\n{BOLD}Evaluating Python files in directory: {results['directory']}{ENDC}\")\n        print(\"=\" * 80)\n        \n        # Print file summary\n        print(f\"\\n{BLUE}{BOLD}FILE SUMMARY:{ENDC}\")\n        stats_data = [\n            ['Total Files', results['statistics']['total_files']],\n            ['Successfully Processed Files', results['statistics']['successful_files']],\n            ['Failed Files', results['statistics']['failed_files']]\n        ]\n        print(tabulate(stats_data, tablefmt='simple'))\n        \n        # Print overall statistics\n        print(f\"\\n{BLUE}{BOLD}OVERALL STATISTICS:{ENDC}\")\n        \n        # Add colored statistics\n        class_score = results['statistics']['average_class_score']\n        if class_score >= 0.8:\n            class_score_str = f\"{GREEN}{class_score:.2f}{ENDC}\"\n        elif class_score >= 0.5:\n            class_score_str = f\"{YELLOW}{class_score:.2f}{ENDC}\"\n        else:\n            class_score_str = f\"{RED}{class_score:.2f}{ENDC}\"\n            \n        func_score = results['statistics']['average_function_score']\n        if func_score >= 0.8:\n            func_score_str = f\"{GREEN}{func_score:.2f}{ENDC}\"\n        elif func_score >= 0.5:\n            func_score_str = f\"{YELLOW}{func_score:.2f}{ENDC}\"\n        else:\n            func_score_str = f\"{RED}{func_score:.2f}{ENDC}\"\n            \n        overall_score = results['statistics']['overall_average_score']\n        if overall_score >= 0.8:\n            overall_score_str = f\"{GREEN}{overall_score:.2f}{ENDC}\"\n        elif overall_score >= 0.5:\n            overall_score_str = f\"{YELLOW}{overall_score:.2f}{ENDC}\"\n        else:\n            overall_score_str = f\"{RED}{overall_score:.2f}{ENDC}\"\n        \n        stats_data = [\n            ['Total Classes', results['statistics']['total_classes']],\n            ['Total Functions/Methods', results['statistics']['total_functions']],\n            ['Average Class 
Score', class_score_str],\n            ['Average Function Score', func_score_str],\n            ['Overall Average Score', overall_score_str]\n        ]\n        print(tabulate(stats_data, tablefmt='simple'))\n        \n        # Ask if the user wants to see details for individual files\n        print(f\"\\nUse python {os.path.basename(__file__)} <specific_file_path> to see detailed results for a specific file.\")\n        \n    else:\n        # Original single file display logic\n        # Print file path\n        print(f\"\\n{BOLD}Evaluating Python file: {results['file']}{ENDC}\")\n        print(\"=\" * 80)\n        \n        # Print class results table\n        if results['classes']:\n            print(f\"\\n{BLUE}{BOLD}CLASSES:{ENDC}\")\n            \n            headers = ['Class Name', 'Score']\n            elements = list(results['classes'][0]['completeness_elements'].keys())\n            headers.extend(elements)\n            \n            table_data = []\n            for class_result in results['classes']:\n                row = [class_result['name']]\n                score = class_result['completeness_score']\n                # Color the score based on value\n                if score >= 0.8:\n                    score_str = f\"{GREEN}{score:.2f}{ENDC}\"\n                elif score >= 0.5:\n                    score_str = f\"{YELLOW}{score:.2f}{ENDC}\"\n                else:\n                    score_str = f\"{RED}{score:.2f}{ENDC}\"\n                row.append(score_str)\n                \n                for element in elements:\n                    required = class_result['element_required'][element]\n                    has_element = class_result['completeness_elements'][element]\n                    if has_element:\n                        check = f\"{GREEN}✓{ENDC}\"\n                    else:\n                        check = f\"{RED}✗{ENDC}\"\n                    cell = f\"{YELLOW if required else '-'}{'R' if required else ''}{ENDC if required else 
''} | {check}\"\n                    row.append(cell)\n                \n                table_data.append(row)\n                \n            print(tabulate(table_data, headers=headers, tablefmt='grid'))\n        \n        # Print function/method results table\n        if results['functions']:\n            print(f\"\\n{BLUE}{BOLD}FUNCTIONS/METHODS:{ENDC}\")\n            \n            headers = ['Function Name', 'Type', 'Score']\n            elements = list(results['functions'][0]['completeness_elements'].keys())\n            headers.extend(elements)\n            \n            table_data = []\n            for func_result in results['functions']:\n                row = [func_result['name'], func_result['type']]\n                score = func_result['completeness_score']\n                # Color the score based on value\n                if score >= 0.8:\n                    score_str = f\"{GREEN}{score:.2f}{ENDC}\"\n                elif score >= 0.5:\n                    score_str = f\"{YELLOW}{score:.2f}{ENDC}\"\n                else:\n                    score_str = f\"{RED}{score:.2f}{ENDC}\"\n                row.append(score_str)\n                \n                for element in elements:\n                    required = func_result['element_required'][element]\n                    has_element = func_result['completeness_elements'][element]\n                    if has_element:\n                        check = f\"{GREEN}✓{ENDC}\"\n                    else:\n                        check = f\"{RED}✗{ENDC}\"\n                    cell = f\"{YELLOW if required else '-'}{'R' if required else ''}{ENDC if required else ''} | {check}\"\n                    row.append(cell)\n                \n                table_data.append(row)\n                \n            print(tabulate(table_data, headers=headers, tablefmt='grid'))\n        \n        # Print overall statistics\n        print(f\"\\n{BLUE}{BOLD}OVERALL STATISTICS:{ENDC}\")\n        stats_data = []\n        \n        # 
Add colored statistics\n        class_score = results['statistics']['average_class_score']\n        if class_score >= 0.8:\n            class_score_str = f\"{GREEN}{class_score:.2f}{ENDC}\"\n        elif class_score >= 0.5:\n            class_score_str = f\"{YELLOW}{class_score:.2f}{ENDC}\"\n        else:\n            class_score_str = f\"{RED}{class_score:.2f}{ENDC}\"\n            \n        func_score = results['statistics']['average_function_score']\n        if func_score >= 0.8:\n            func_score_str = f\"{GREEN}{func_score:.2f}{ENDC}\"\n        elif func_score >= 0.5:\n            func_score_str = f\"{YELLOW}{func_score:.2f}{ENDC}\"\n        else:\n            func_score_str = f\"{RED}{func_score:.2f}{ENDC}\"\n            \n        stats_data = [\n            ['Total Classes', results['statistics']['total_classes']],\n            ['Total Functions/Methods', results['statistics']['total_functions']],\n            ['Average Class Score', class_score_str],\n            ['Average Function Score', func_score_str]\n        ]\n        print(tabulate(stats_data, tablefmt='simple'))\n\nif __name__ == \"__main__\":\n    # Example usage\n    import sys\n    \n    if len(sys.argv) < 2:\n        print(\"Usage: python eval_completeness.py <path_to_python_file_or_directory>\")\n        sys.exit(1)\n    \n    path = sys.argv[1]\n    if not Path(path).exists():\n        print(f\"Error: Path not found: {path}\")\n        sys.exit(1)\n    \n    if Path(path).is_dir():\n        # Process directory\n        results = process_directory(path)\n        if results['status'] == 'success':\n            print_evaluation_results(results)\n        else:\n            print(f\"Error: {results['message']}\")\n    else:\n        # Process single file\n        results = run_docstring_tests(path)\n        if results['status'] == 'success':\n            print_evaluation_results(results)\n        else:\n            print(f\"Error: {results['message']}\")"
  },
  {
    "path": "generate_docstrings.py",
    "content": "#!/usr/bin/env python3\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nDocstring Generator with Dependency-Based Ordering\n\nThis script generates docstrings for Python code components (functions, classes, methods)\nusing a DFS-based approach that starts from components with no dependencies.\n\nKey features:\n1. Parses Python code to identify components and their dependencies\n2. Builds a dependency graph where A→B means \"A depends on B\"\n3. Performs DFS traversal starting from components with no dependencies\n4. Processes dependencies before the components that depend on them\n5. Ensures classes depend on their methods, not vice versa\n6. Skips __init__ methods as they typically don't need separate docstrings\n7. Provides visual representation of progress in the terminal\n\nUsage:\n    python generate_docstrings.py --repo-path PATH --config-path PATH [--test-mode]\n\"\"\"\n\nimport os\nimport sys\nimport time\nimport ast\nimport json\nimport argparse\nimport logging\nimport random\nfrom pathlib import Path\nfrom typing import Dict, List, Set, Optional, Any\nfrom collections import defaultdict\nimport tiktoken  # Add this import for token counting\n\n# Setup logging\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',\n    handlers=[\n        logging.StreamHandler(sys.stdout)\n    ]\n)\nlogger = logging.getLogger(\"docstring_generator\")\n\n# Import dependency analyzer modules\nfrom src.dependency_analyzer import (\n    CodeComponent, \n    DependencyParser, \n    dependency_first_dfs, \n    build_graph_from_components\n)\nfrom src.visualizer import ProgressVisualizer\nfrom src.agent.orchestrator import Orchestrator\n\n\ndef generate_test_docstring(component: CodeComponent) -> str:\n    \"\"\"\n    Generate a placeholder docstring for test mode.\n    \n    Args:\n        component: The code component to generate a placeholder docstring for.\n        \n    Returns:\n        
A placeholder docstring based on the component type.\n    \"\"\"\n    comp_type = component.component_type\n    name = component.id.split(\".\")[-1]\n    \n    if comp_type == \"function\":\n        return f\"\"\"\n        Test docstring for function '{name}'.\n        \n        This is a placeholder docstring generated in test mode.\n        In normal mode, this would be replaced with an AI-generated docstring.\n        \n        Args:\n            arg1: Description of first argument\n            arg2: Description of second argument\n            \n        Returns:\n            Description of return value\n        \"\"\"\n    elif comp_type == \"class\":\n        return f\"\"\"\n        Test docstring for class '{name}'.\n        \n        This is a placeholder docstring generated in test mode.\n        In normal mode, this would be replaced with an AI-generated docstring.\n        \n        Attributes:\n            attr1: Description of first attribute\n            attr2: Description of second attribute\n        \"\"\"\n    elif comp_type == \"method\":\n        class_name = component.id.split(\".\")[-2]\n        return f\"\"\"\n        Test docstring for method '{name}' in class '{class_name}'.\n        \n        This is a placeholder docstring generated in test mode.\n        In normal mode, this would be replaced with an AI-generated docstring.\n        \n        Args:\n            arg1: Description of first argument\n            arg2: Description of second argument\n            \n        Returns:\n            Description of return value\n        \"\"\"\n    else:\n        return f\"\"\"\n        Test docstring for {comp_type} '{name}'.\n        \n        This is a placeholder docstring generated in test mode.\n        \"\"\"\n\n\ndef generate_docstring_for_component(component: CodeComponent, orchestrator: Optional[Orchestrator], test_mode: str = 'none',\n                                     dependency_graph: Optional[Dict[str, List[str]]] = None) -> str:\n    
\"\"\"\n    Generate a docstring for a single component.\n    \n    Args:\n        component: The component to generate a docstring for.\n        orchestrator: The orchestrator instance.\n        test_mode: The test mode to use.\n        dependency_graph: Optional dependency graph.\n        \n    Returns:\n        The generated docstring.\n    \"\"\"\n\n    # do not use try/except here, we want to fail if there is an error\n    if not orchestrator:\n        return \"\"\n    \n    file_path = component.file_path\n    \n    # Get the component code\n    component_code = component.source_code\n    \n    # Estimate token count of the focal component\n    encoding = tiktoken.get_encoding(\"cl100k_base\")  # Default OpenAI encoding\n    token_consume_focal = len(encoding.encode(component_code))\n    \n    # Skip if the component is too large (> 10000 tokens)\n    if token_consume_focal > 10000:\n        # truncate the component code to 10000 tokens\n        component_code = encoding.decode(encoding.encode(component_code)[:10000])\n    \n    # Parse the file\n    with open(file_path, \"r\", encoding=\"utf-8\") as f:\n        file_content = f.read()\n    \n    ast_tree = ast.parse(file_content)\n    ast_node = None\n    \n    # Locate the AST node for the component\n    component_parts = component.id.split(\".\")\n    component_name = component_parts[-1]\n    \n    if component.component_type == \"function\":\n        # Find top-level function\n        for node in ast.iter_child_nodes(ast_tree):\n            if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                    and node.name == component_name):\n                ast_node = node\n                break\n            \n    elif component.component_type == \"class\":\n        # Find class\n        for node in ast.iter_child_nodes(ast_tree):\n            if isinstance(node, ast.ClassDef) and node.name == component_name:\n                ast_node = node\n                break\n            \n    elif 
component.component_type == \"method\":\n        # Find method inside class\n        class_name, method_name = component_parts[-2:]\n        for node in ast.iter_child_nodes(ast_tree):\n            if isinstance(node, ast.ClassDef) and node.name == class_name:\n                for item in node.body:\n                    if (isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                            and item.name == method_name):\n                        ast_node = item\n                        break\n                break\n    \n    try:\n        # Pass component.id as the focal_node_dependency_path\n        docstring = orchestrator.process(\n            focal_component=component_code,\n            file_path=file_path,\n            ast_node=ast_node,\n            ast_tree=ast_tree,\n            dependency_graph=dependency_graph,\n            focal_node_dependency_path=component.id,\n            token_consume_focal=token_consume_focal  # Pass token count to orchestrator\n        )\n        return docstring\n    except Exception as e:\n        # Report through the module logger, like the rest of this script\n        logger.error(f\"Error generating docstring for {component.id}: {e}\")\n        return \"\"\n\n\ndef set_docstring_in_file(file_path: str, component: CodeComponent, docstring: str) -> bool:\n    \"\"\"\n    Update a Python file with a newly generated docstring for a component.\n    \n    Args:\n        file_path: Path to the file to update.\n        component: The component to update with a docstring.\n        docstring: The docstring to insert.\n        \n    Returns:\n        True if the file was updated, False if the component could not be\n        found or the AST could not be unparsed.\n    \"\"\"\n    # Do not use try/except here; file I/O and parse errors should propagate to the caller\n    # Read the file\n    with open(file_path, \"r\", encoding=\"utf-8\") as f:\n        source = f.read()\n    \n    # Parse the file\n    tree = ast.parse(source)\n    \n    # Find the component in the parsed AST\n    component_node = None\n    component_parts = component.id.split(\".\")\n    component_name = 
component_parts[-1]\n    \n    if component.component_type == \"function\":\n        # Find top-level function\n        for node in ast.iter_child_nodes(tree):\n            if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                    and node.name == component_name):\n                component_node = node\n                break\n                \n    elif component.component_type == \"class\":\n        # Find class\n        for node in ast.iter_child_nodes(tree):\n            if isinstance(node, ast.ClassDef) and node.name == component_name:\n                component_node = node\n                break\n                \n    elif component.component_type == \"method\":\n        # Find method inside class\n        class_name, method_name = component_parts[-2:]\n        for node in ast.iter_child_nodes(tree):\n            if isinstance(node, ast.ClassDef) and node.name == class_name:\n                for item in node.body:\n                    if (isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                            and item.name == method_name):\n                        component_node = item\n                        break\n                break\n    \n    if not component_node:\n        logger.error(f\"Could not find component {component.id} in {file_path}\")\n        return False\n    \n    # Set the docstring\n    set_node_docstring(component_node, docstring)\n    \n    # Unparse the AST back to source code\n    if hasattr(ast, \"unparse\"):\n        new_source = ast.unparse(tree)\n    else:\n        try:\n            import astor\n            new_source = astor.to_source(tree)\n        except ImportError:\n            logger.error(\n                \"Error: You need to install 'astor' or use Python 3.9+ to unparse the AST. 
\"\n                f\"Skipping file: {file_path}\"\n            )\n            return False\n    \n    # Write back to the file\n    with open(file_path, \"w\", encoding=\"utf-8\") as f:\n        f.write(new_source)\n    \n    return True\n\n\ndef set_node_docstring(node: ast.AST, docstring: str):\n    \"\"\"\n    Safely set or update the docstring on an AST node (ClassDef, FunctionDef, etc.).\n    Also adjusts indentation relative to the node's existing indentation level,\n    ensuring both the opening and closing triple quotes are properly aligned.\n\n    Args:\n        node: The AST node to modify (ClassDef, FunctionDef, etc.).\n        docstring: The new docstring (as a plain string) to insert.\n    \"\"\"\n    import textwrap\n    \n    # 1. Strip leading/trailing empty lines in the provided docstring\n    #    to avoid spurious blank lines.\n    stripped_docstring = docstring.strip('\\n')\n    if not stripped_docstring:\n        # If empty or all whitespace, provide a placeholder\n        stripped_docstring = \"No docstring provided.\"\n\n    # 2. Dedent possible indentation in docstring (so a multiline docstring\n    #    doesn't carry undesired left margins).\n    dedented = textwrap.dedent(stripped_docstring)\n\n    # 3. Determine how many spaces to indent for doc lines plus triple quotes.\n    existing_indent = getattr(node, 'col_offset', 0)\n    doc_indent_str = ' ' * (existing_indent + 4)\n\n    # 4. Build the final string: \n    #    - Start with a newline (so triple quotes appear on a new line).\n    #    - Indent all docstring lines.\n    #    - End with a newline+same indentation (so the closing triple quotes\n    #      line also has the doc_indent_str).\n    prepared_docstring = (\n        \"\\n\"\n        + textwrap.indent(dedented, doc_indent_str)\n        + \"\\n\"\n        + doc_indent_str\n    )\n\n    # 5. 
Create an AST Expr node to store this docstring as a constant.\n    docstring_node = ast.Expr(value=ast.Constant(value=prepared_docstring, kind=None))\n\n    # If there's no body, just make one containing our new docstring.\n    if not hasattr(node, \"body\") or not isinstance(node.body, list) or len(node.body) == 0:\n        node.body = [docstring_node]\n    else:\n        # If the first statement is an existing docstring, replace it;\n        # otherwise, insert the new docstring as the first statement.\n        first_stmt = node.body[0]\n        if (\n            isinstance(first_stmt, ast.Expr)\n            and isinstance(first_stmt.value, ast.Constant)\n            and isinstance(first_stmt.value.value, str)\n        ):\n            node.body[0] = docstring_node\n        else:\n            node.body.insert(0, docstring_node)\n\n\ndef main():\n    \"\"\"\n    Main entry point for the docstring generation script with flexible component ordering.\n    \n    The script supports different ordering modes through the --order-mode flag:\n    - 'topo' (default): Dependency-based ordering using a DFS-based approach:\n        1. If A depends on B, the graph has an edge A→B (meaning \"A depends on B\")\n        2. Root nodes (nodes with no dependencies) are processed first\n        3. Dependencies are always processed before the components that depend on them\n        4. This ensures proper docstring generation order\n    - 'random_node': Randomly shuffles all Python components, ignoring dependencies\n    - 'random_file': Processes files in random order, but preserves component order within files\n    \n    Class methods are processed before the classes that depend on them (not vice versa) in 'topo' mode,\n    ensuring proper docstring generation order. 
Special __init__ methods are skipped as\n    they typically don't need separate docstrings.\n    \n    The script provides options to skip or overwrite existing docstrings:\n    - By default, components with existing docstrings are skipped\n    - With --overwrite-docstrings flag, existing docstrings will be overwritten\n    - This behavior can also be configured in the config.yaml file under docstring_options.overwrite_docstrings\n    \n    Web interface integration:\n    - With --enable-web flag, the script enables integration with the web UI\n    - This allows visualization of the docstring generation process in a web browser\n    - Run the web UI separately using the run_web_ui.py script\n    \"\"\"\n    # Parse command line arguments\n    parser = argparse.ArgumentParser(\n        description='Generate docstrings for Python components in dependency order.'\n    )\n    parser.add_argument(\n        '--repo-path', \n        type=str, \n        default='data/raw_test_repo',\n        help='Path to the repository (default: data/raw_test_repo)'\n    )\n    parser.add_argument(\n        '--config-path', \n        type=str, \n        default='config/agent_config.yaml',\n        help='Path to the configuration file (default: config/agent_config.yaml)'\n    )\n    parser.add_argument(\n        '--test-mode',\n        type=str,\n        choices=['placeholder', 'context_print', 'none'],\n        default='none',\n        help='Test mode to run: \"placeholder\" for placeholder docstrings (no LLM calls), \"context_print\" to print context before writer calls, \"none\" for normal operation'\n    )\n    parser.add_argument(\n        '--order-mode',\n        type=str,\n        choices=['topo', 'random_node', 'random_file'],\n        default='topo',\n        help='Order mode for docstring generation: \"topo\" follows dependency order (default), \"random_node\" selects random Python nodes, \"random_file\" processes files in random order'\n    )\n    parser.add_argument(\n        
'--enable-web',\n        action='store_true',\n        help='Enable integration with the web interface'\n    )\n    parser.add_argument(\n        '--overwrite-docstrings',\n        action='store_true',\n        help='Overwrite existing docstrings instead of skipping them (default: False)'\n    )\n    \n    args = parser.parse_args()\n    repo_path = args.repo_path\n    config_path = args.config_path\n    test_mode = args.test_mode\n    order_mode = args.order_mode\n    overwrite_docstrings = args.overwrite_docstrings\n    \n    # Create output directory for dependency graph\n    output_dir = os.path.join(\"output\", \"dependency_graphs\")\n    os.makedirs(output_dir, exist_ok=True)\n    \n    # Extract repository name from path for creating a unique filename\n    repo_name = os.path.basename(os.path.normpath(repo_path))\n    # Create a sanitized version of the repo name (remove special characters)\n    sanitized_repo_name = ''.join(c if c.isalnum() else '_' for c in repo_name)\n    dependency_graph_path = os.path.join(output_dir, f\"{sanitized_repo_name}_dependency_graph.json\")\n    \n    # Initialize the orchestrator for docstring generation\n    orchestrator = None\n    \n    # Initialize orchestrator unless we're in placeholder test mode\n    if test_mode != 'placeholder':\n        logger.info(f\"Initializing orchestrator with config: {config_path}\")\n        # Pass the test_mode to the orchestrator if it's \"context_print\"\n        orchestrator_test_mode = test_mode if test_mode != 'none' else None\n        orchestrator = Orchestrator(repo_path=repo_path, config_path=config_path, test_mode=orchestrator_test_mode)\n        \n        # Check if the overwrite_docstrings option is in the config file\n        # If it's there, it overrides the command-line argument\n        if hasattr(orchestrator, 'config'):\n            docstring_options = orchestrator.config.get('docstring_options', {})\n            config_overwrite = 
docstring_options.get('overwrite_docstrings')\n            if config_overwrite is not None:\n                overwrite_docstrings = config_overwrite\n                logger.info(f\"Using config file setting for overwrite_docstrings: {overwrite_docstrings}\")\n    else:\n        logger.info(\"Running in PLACEHOLDER TEST MODE with placeholder docstrings (no LLM calls)\")\n    \n    # Parse the repository to build the dependency graph\n    logger.info(f\"Parsing repository: {repo_path}\")\n    parser = DependencyParser(repo_path)\n    components = parser.parse_repository()\n    \n    # Save the dependency graph for future reference\n    parser.save_dependency_graph(dependency_graph_path)\n    logger.info(f\"Dependency graph saved to: {dependency_graph_path}\")\n    \n    # Build the graph for traversal\n    graph = build_graph_from_components(components)\n    \n    # Create a dependency graph in the format expected by the orchestrator\n    # Dictionary mapping component paths to their dependencies\n    dependency_graph = {}\n    for component_id, deps in graph.items():\n        dependency_graph[component_id] = list(deps)\n    \n    # Perform DFS-based traversal\n    logger.info(\"Performing DFS traversal on the dependency graph (starting from nodes with no dependencies)\")\n    sorted_components = dependency_first_dfs(graph)\n    logger.info(f\"Sorted {len(sorted_components)} components for processing\")\n    \n    # Apply the selected ordering mode\n    if order_mode == 'random_node':\n        # Randomly shuffle all components\n        logger.info(\"Using random node ordering mode - shuffling all components\")\n        random.shuffle(sorted_components)\n    elif order_mode == 'random_file':\n        # Group components by file path\n        logger.info(\"Using random file ordering mode - processing files in random order\")\n        # Group components by file\n        file_to_components = defaultdict(list)\n        for component_id in sorted_components:\n            
component = components.get(component_id)\n            if component:\n                file_to_components[component.file_path].append(component_id)\n        \n        # Randomly shuffle the file order but maintain the order of components within each file\n        file_paths = list(file_to_components.keys())\n        random.shuffle(file_paths)\n        \n        # Create a new ordering based on randomly shuffled files\n        sorted_components = []\n        for file_path in file_paths:\n            sorted_components.extend(file_to_components[file_path])\n    else:\n        # Default to topological order (already set in sorted_components)\n        logger.info(\"Using topological ordering mode - processing components based on dependencies\")\n    \n    # Check if web interface is enabled\n    if args.enable_web:\n        try:\n            from src.visualizer.web_bridge import patch_visualizers\n            logger.info(\"Web interface integration enabled\")\n            patch_visualizers()\n        except ImportError as e:\n            logger.warning(f\"Failed to enable web interface integration: {e}\")\n            logger.warning(\"Make sure you have installed the required web dependencies\")\n\n    # Initialize the progress visualizer\n    visualizer = ProgressVisualizer(components, sorted_components)\n    visualizer.initialize()\n    \n    # Show dependency statistics\n    visualizer.show_dependency_stats()\n    \n    # Process components in order determined by DFS traversal\n    for component_id in sorted_components:\n        component = components.get(component_id)\n        if not component:\n            logger.warning(f\"Component {component_id} not found in parsed components\")\n            continue\n        \n        # Skip __init__ methods as they don't need docstrings\n        if component.component_type == \"method\" and component_id.endswith(\".__init__\"):\n            logger.info(f\"Skipping {component_id} - __init__ methods don't need docstrings\")\n      
      visualizer.update(component_id, \"completed\")\n            continue\n        \n        # Compute the length of the existing docstring, if any, in whitespace-delimited words\n        docstring_length = len(component.docstring.split()) if component.has_docstring else 0\n        # Skip components whose existing docstring is longer than 10 words\n        # (unless overwrite_docstrings is True); shorter docstrings are regenerated\n        if component.has_docstring and not overwrite_docstrings and docstring_length > 10:\n            logger.info(f\"Skipping {component_id} - already has docstring\")\n            visualizer.update(component_id, \"completed\")\n            continue\n        elif component.has_docstring and overwrite_docstrings:\n            logger.info(f\"Overwriting existing docstring for {component_id}\")\n        \n        # Update the visualizer\n        visualizer.update(component_id, \"processing\")\n        \n        # Log the component type\n        comp_type = component.component_type\n        logger.info(f\"Processing {comp_type}: {component_id}\")\n        \n        # Generate the docstring\n        logger.info(f\"Generating docstring for {component_id}\")\n        if test_mode == 'placeholder':\n            # No orchestrator exists in placeholder mode, so use the placeholder text (no LLM calls)\n            docstring = generate_test_docstring(component)\n        else:\n            docstring = generate_docstring_for_component(component, orchestrator, test_mode, dependency_graph)\n        \n        # Update the file with the new docstring\n        file_path = component.file_path\n        success = set_docstring_in_file(file_path, component, docstring)\n        \n        if success:\n            logger.info(f\"Successfully updated docstring for {component_id}\")\n            visualizer.update(component_id, \"completed\")\n        else:\n            logger.error(f\"Failed to update docstring for {component_id}\")\n            visualizer.update(component_id, \"error\")\n        \n        # Re-parse the file in case the line numbers changed due to docstring insertion\n        # This is only necessary if there are more components from the same file\n        same_file_components = [\n            comp_id for comp_id in sorted_components 
\n            if comp_id != component_id and components[comp_id].file_path == file_path\n        ]\n        \n        if same_file_components:\n            logger.info(f\"Re-parsing file {file_path} for updated line numbers\")\n            parser = DependencyParser(repo_path)\n            updated_components = parser.parse_repository()\n            \n            # Update the components dictionary with new line numbers\n            for comp_id, comp in updated_components.items():\n                if comp_id in components:\n                    components[comp_id] = comp\n    \n    # Finalize the visualization\n    visualizer.finalize()\n    \n    # Create a more descriptive mode message based on the test mode\n    if test_mode == 'placeholder':\n        mode_str = \"PLACEHOLDER TEST MODE (no LLM calls)\"\n    elif test_mode == 'context_print':\n        mode_str = \"CONTEXT PRINT TEST MODE (with context debugging)\"\n    else:\n        mode_str = \"normal mode\"\n    \n    # Add ordering mode to the completion message\n    order_mode_str = {\n        'topo': 'topological ordering',\n        'random_node': 'random node ordering',\n        'random_file': 'random file ordering'\n    }.get(order_mode, 'unknown ordering')\n    \n    logger.info(f\"Docstring generation complete ({mode_str}, {order_mode_str})\")\n    \n    # Print usage statistics for LLM providers if available\n    if orchestrator:\n        try:\n            # Access the rate limiters from agents\n            rate_limiters = []\n            \n            for agent_name in ['reader', 'writer', 'verifier']:\n                agent = getattr(orchestrator, agent_name, None)\n                if agent and hasattr(agent, 'llm') and hasattr(agent.llm, 'rate_limiter'):\n                    rate_limiters.append(agent.llm.rate_limiter)\n            \n            # Print statistics for each rate limiter\n            if rate_limiters:\n                logger.info(\"=\" * 50)\n                logger.info(\"TOKEN USAGE AND 
COST STATISTICS\")\n                logger.info(\"=\" * 50)\n                \n                for limiter in rate_limiters:\n                    limiter.print_usage_stats()\n                \n                # Calculate total cost across all limiters\n                total_cost = sum(limiter.total_cost for limiter in rate_limiters)\n                logger.info(\"=\" * 50)\n                logger.info(f\"TOTAL COST: ${total_cost:.6f}\")\n                logger.info(\"=\" * 50)\n        except Exception as e:\n            logger.warning(f\"Could not print token usage statistics: {e}\")\n\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "output/dependency_graphs/raw_test_repo_dependency_graph.json",
    "content": "{\n  \"helper.HelperClass\": {\n    \"id\": \"helper.HelperClass\",\n    \"component_type\": \"class\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [\n      \"helper.HelperClass.get_result\",\n      \"helper.DataProcessor\",\n      \"helper.HelperClass._internal_process\",\n      \"helper.HelperClass.process_data\"\n    ],\n    \"start_line\": 1,\n    \"end_line\": 14,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"helper.HelperClass.__init__\": {\n    \"id\": \"helper.HelperClass.__init__\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [],\n    \"start_line\": 3,\n    \"end_line\": 4,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"helper.HelperClass.process_data\": {\n    \"id\": \"helper.HelperClass.process_data\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [\n      \"helper.DataProcessor\"\n    ],\n    \"start_line\": 6,\n    \"end_line\": 8,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"helper.HelperClass._internal_process\": {\n    \"id\": \"helper.HelperClass._internal_process\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [],\n    \"start_line\": 10,\n    \"end_line\": 11,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"helper.HelperClass.get_result\": {\n    \"id\": \"helper.HelperClass.get_result\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [],\n    
\"start_line\": 13,\n    \"end_line\": 14,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"helper.DataProcessor\": {\n    \"id\": \"helper.DataProcessor\",\n    \"component_type\": \"class\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [\n      \"helper.DataProcessor.process\",\n      \"helper.DataProcessor._internal_process\"\n    ],\n    \"start_line\": 16,\n    \"end_line\": 72,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n    \\\"\\\"\\\"Handles basic data processing tasks within a system.\\n\\n        This class is designed to perform simple data processing operations, providing\\n        utility methods that can be used in various scenarios where basic data manipulation\\n        is required. It is particularly useful in contexts where a straightforward list of\\n        integers is needed for further processing or testing.\\n\\n        The `DataProcessor` class fits into the larger system architecture as a utility\\n        component, offering static and internal methods to handle specific processing tasks.\\n        It achieves its purpose by providing a static method for general use and an internal\\n        method for class-specific operations.\\n\\n        Example:\\n            # Initialize the DataProcessor class\\n            processor = DataProcessor()\\n\\n            # Use the static method to process data\\n            result = DataProcessor.process()\\n            print(result)  # Output: [1, 2, 3]\\n\\n            # Use the internal method for internal processing\\n            internal_result = processor._internal_process()\\n            print(internal_result)  # Output: 'processed'\\n    \\\"\\\"\\\"\\n    \"\n  },\n  \"helper.DataProcessor.process\": {\n    \"id\": \"helper.DataProcessor.process\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    
\"relative_path\": \"helper.py\",\n    \"depends_on\": [],\n    \"start_line\": 45,\n    \"end_line\": 57,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n        \\\"\\\"\\\"Processes data and returns a list of integers.\\n\\n            This static method is designed to perform a basic data processing task\\n            and return a predefined list of integers. It can be used whenever a simple\\n            list of integers is required for further operations or testing purposes.\\n\\n            Returns:\\n                list of int: A list containing the integers [1, 2, 3].\\n        \\\"\\\"\\\"\\n        \"\n  },\n  \"helper.DataProcessor._internal_process\": {\n    \"id\": \"helper.DataProcessor._internal_process\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/helper.py\",\n    \"relative_path\": \"helper.py\",\n    \"depends_on\": [],\n    \"start_line\": 59,\n    \"end_line\": 72,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n        \\\"\\\"\\\"Processes internal data and returns a status message.\\n\\n            This method is used internally within the `DataProcessor` class to perform\\n            specific data processing tasks that are not exposed publicly. 
It is typically\\n            called by other methods within the class to handle intermediate processing\\n            steps.\\n\\n            Returns:\\n                str: A string indicating the processing status, specifically 'processed'.\\n            \\\"\\\"\\\"\\n        \"\n  },\n  \"main.main_function\": {\n    \"id\": \"main.main_function\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/main.py\",\n    \"relative_path\": \"main.py\",\n    \"depends_on\": [\n      \"helper.HelperClass\",\n      \"inner.inner_functions.generate_timestamp\",\n      \"main.utility_function\"\n    ],\n    \"start_line\": 5,\n    \"end_line\": 10,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"main.utility_function\": {\n    \"id\": \"main.utility_function\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/main.py\",\n    \"relative_path\": \"main.py\",\n    \"depends_on\": [],\n    \"start_line\": 13,\n    \"end_line\": 14,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"processor.AdvancedProcessor\": {\n    \"id\": \"processor.AdvancedProcessor\",\n    \"component_type\": \"class\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/processor.py\",\n    \"relative_path\": \"processor.py\",\n    \"depends_on\": [\n      \"processor.AdvancedProcessor.run\",\n      \"helper.HelperClass\",\n      \"processor.AdvancedProcessor.process_result\",\n      \"main.utility_function\",\n      \"processor.DataProcessor\"\n    ],\n    \"start_line\": 6,\n    \"end_line\": 18,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"processor.AdvancedProcessor.__init__\": {\n    \"id\": \"processor.AdvancedProcessor.__init__\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/processor.py\",\n    \"relative_path\": \"processor.py\",\n    \"depends_on\": [\n      
\"helper.HelperClass\",\n      \"processor.DataProcessor\"\n    ],\n    \"start_line\": 8,\n    \"end_line\": 10,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"processor.AdvancedProcessor.run\": {\n    \"id\": \"processor.AdvancedProcessor.run\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/processor.py\",\n    \"relative_path\": \"processor.py\",\n    \"depends_on\": [],\n    \"start_line\": 12,\n    \"end_line\": 15,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"processor.AdvancedProcessor.process_result\": {\n    \"id\": \"processor.AdvancedProcessor.process_result\",\n    \"component_type\": \"method\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/processor.py\",\n    \"relative_path\": \"processor.py\",\n    \"depends_on\": [\n      \"main.utility_function\"\n    ],\n    \"start_line\": 17,\n    \"end_line\": 18,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"test_file.test_function\": {\n    \"id\": \"test_file.test_function\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/test_file.py\",\n    \"relative_path\": \"test_file.py\",\n    \"depends_on\": [],\n    \"start_line\": 1,\n    \"end_line\": 2,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"inner.inner_functions.inner_function\": {\n    \"id\": \"inner.inner_functions.inner_function\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/inner/inner_functions.py\",\n    \"relative_path\": \"inner/inner_functions.py\",\n    \"depends_on\": [],\n    \"start_line\": 1,\n    \"end_line\": 15,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n    Returns a greeting message from an inner function.\\n\\n    This function is designed to return a simple greeting message, which can be used in nested or internal function calls to verify execution 
flow or for debugging purposes. It is typically used in development environments where confirming the execution of specific code paths is necessary.\\n\\n    Returns:\\n        str: A greeting message stating 'Hello from inner function!'\\n\\n    Example:\\n        >>> message = inner_function()\\n        >>> print(message)\\n        'Hello from inner function!'\\n    \"\n  },\n  \"inner.inner_functions.get_random_quote\": {\n    \"id\": \"inner.inner_functions.get_random_quote\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/inner/inner_functions.py\",\n    \"relative_path\": \"inner/inner_functions.py\",\n    \"depends_on\": [],\n    \"start_line\": 17,\n    \"end_line\": 31,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n    Fetches a predefined inspirational quote.\\n\\n    This function is designed to provide users with a motivational quote, which can be used in applications that aim to inspire or uplift users. It is particularly useful in scenarios where a quick, positive message is needed to enhance user experience.\\n\\n    Returns:\\n        str: A quote string stating 'The best way to predict the future is to create it.'\\n\\n    Example:\\n        >>> quote = get_random_quote()\\n        >>> print(quote)\\n        'The best way to predict the future is to create it.'\\n    \"\n  },\n  \"inner.inner_functions.generate_timestamp\": {\n    \"id\": \"inner.inner_functions.generate_timestamp\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/inner/inner_functions.py\",\n    \"relative_path\": \"inner/inner_functions.py\",\n    \"depends_on\": [],\n    \"start_line\": 33,\n    \"end_line\": 34,\n    \"has_docstring\": false,\n    \"docstring\": \"\"\n  },\n  \"inner.inner_functions.get_system_status\": {\n    \"id\": \"inner.inner_functions.get_system_status\",\n    \"component_type\": \"function\",\n    \"file_path\": 
\"/home/dayuyang/DocAgent/data/raw_test_repo/inner/inner_functions.py\",\n    \"relative_path\": \"inner/inner_functions.py\",\n    \"depends_on\": [],\n    \"start_line\": 36,\n    \"end_line\": 50,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n    Provides a static message indicating the operational status of systems.\\n\\n    This function is used to retrieve a fixed status message that confirms all systems are functioning correctly. It is useful in monitoring dashboards or status pages where a quick confirmation of system health is required.\\n\\n    Returns:\\n        str: A status message stating 'All systems operational.'\\n\\n    Example:\\n        >>> status = get_system_status()\\n        >>> print(status)\\n        'All systems operational'\\n    \"\n  },\n  \"inner.inner_functions.fetch_user_message\": {\n    \"id\": \"inner.inner_functions.fetch_user_message\",\n    \"component_type\": \"function\",\n    \"file_path\": \"/home/dayuyang/DocAgent/data/raw_test_repo/inner/inner_functions.py\",\n    \"relative_path\": \"inner/inner_functions.py\",\n    \"depends_on\": [],\n    \"start_line\": 52,\n    \"end_line\": 67,\n    \"has_docstring\": true,\n    \"docstring\": \"\\n    \\\"\\\"\\\"Fetches a predefined user message indicating notifications.\\n\\n        This function is used to retrieve a static message that informs the user about the number of notifications they have. It is typically used in scenarios where a quick status update is needed for user engagement.\\n\\n        Returns:\\n            str: A message string stating 'Welcome back! You have 3 notifications.'\\n\\n        Example:\\n            >>> message = fetch_user_message()\\n            >>> print(message)\\n            'Welcome back! You have 3 notifications.'\\n        \\\"\\\"\\\"\\n    \"\n  }\n}"
  },
  {
    "path": "run_web_ui.py",
    "content": "#!/usr/bin/env python3\nimport eventlet\neventlet.monkey_patch()  \n# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nWeb UI Launcher for DocAgent Docstring Generator\n\nThis script launches the web-based user interface for the docstring generation tool.\nThe UI provides a more interactive and visual way to use the docstring generator,\nwith real-time feedback and progress tracking.\n\nUsage:\n    python run_web_ui.py [--host HOST] [--port PORT] [--debug]\n\"\"\"\n\nimport argparse\nimport os\nimport sys\nimport logging\nfrom pathlib import Path\n\n# Configure logging\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',\n    handlers=[\n        logging.StreamHandler(sys.stdout)\n    ]\n)\nlogger = logging.getLogger(\"docstring_web\")\n\n# Add the current directory to the path\nsys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))\n\ndef check_dependencies():\n    \"\"\"Check if all required dependencies are installed.\"\"\"\n    try:\n        import flask\n        import flask_socketio\n        import eventlet\n        import yaml\n        import tabulate\n        import colorama\n        return True\n    except ImportError as e:\n        missing_module = str(e).split(\"'\")[1]\n        logger.error(f\"Missing dependency: {missing_module}\")\n        logger.error(\"Please install all required dependencies with:\")\n        logger.error(\"pip install -r requirements-web.txt\")\n        return False\n\ndef main():\n    \"\"\"Parse command line arguments and start the web UI.\"\"\"\n    parser = argparse.ArgumentParser(description='Launch the DocAgent Web UI')\n    parser.add_argument('--host', default='127.0.0.1', help='Host to bind the server to')\n    parser.add_argument('--port', type=int, default=5000, help='Port to bind the server to')\n    parser.add_argument('--debug', action='store_true', help='Run in debug mode')\n    \n    args = parser.parse_args()\n    \n    
# Check dependencies\n    if not check_dependencies():\n        return 1\n    \n    # Print banner\n    print(\"\\n\" + \"=\" * 80)\n    print(\"DocAgent Web Interface\".center(80))\n    print(\"=\" * 80)\n    \n    # Import and run the web app\n    try:\n        # First try to import eventlet to ensure it's properly initialized\n        import eventlet\n        eventlet.monkey_patch()\n        \n        from src.web.app import create_app\n        \n        app, socketio = create_app(debug=args.debug)\n        \n        logger.info(f\"Starting DocAgent Web UI at: http://{args.host}:{args.port}\")\n        logger.info(\"Press Ctrl+C to stop the server\")\n        \n        # Start the server\n        socketio.run(app, host=args.host, port=args.port, debug=args.debug, allow_unsafe_werkzeug=True)\n        \n        return 0\n    except ImportError as e:\n        logger.error(f\"Error importing web application: {e}\")\n        logger.error(\"Make sure the src/web directory exists and contains the necessary files.\")\n        return 1\n    except Exception as e:\n        logger.error(f\"Error running web application: {e}\")\n        return 1\n\nif __name__ == '__main__':\n    try:\n        sys.exit(main())\n    except KeyboardInterrupt:\n        print(\"\\nServer stopped.\")\n        sys.exit(0) "
  },
  {
    "path": "setup.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom setuptools import setup, find_packages\n\n# Read the contents of README file\nfrom pathlib import Path\nthis_directory = Path(__file__).parent\nlong_description = (this_directory / \"README.md\").read_text()\n\n# Prepare all extras\ndev_requires = [\n    \"pytest>=8.3.4\",\n    \"pytest-cov>=2.0\",\n    \"black>=22.0\",\n    \"flake8>=3.9\",\n]\n\nweb_requires = [\n    \"flask>=3.1.0\",\n    \"flask-socketio>=5.5.1\",\n    \"eventlet>=0.39.0\",\n    \"python-socketio>=5.12.1\",\n    \"python-engineio>=4.11.2\",\n    \"bidict>=0.23.0\",\n    \"dnspython>=2.7.0\",\n    \"six>=1.16.0\",\n]\n\nvisualization_requires = [\n    \"matplotlib>=3.10.0\",\n    \"pygraphviz>=1.14\",\n    \"networkx>=3.4.2\",\n]\n\ncuda_requires = [\n    \"torch>=2.0.0\",\n    \"accelerate>=1.4.0\",\n]\n\n# Combine all extras for the 'all' option\nall_requires = dev_requires + web_requires + visualization_requires + cuda_requires\n\nsetup(\n    name=\"DocstringGenerator\",\n    version=\"0.1.0\",\n    author=\"Dayu Yang\",\n    author_email=\"dayuyang@meta.com\",\n    description=\"DocAgent for High-quality docstring generation in Large-scale Python projects\",\n    long_description=long_description,\n    long_description_content_type=\"text/markdown\",\n    packages=find_packages(where=\"src\"),\n    package_dir={\"\": \"src\"},\n    classifiers=[\n        \"Development Status :: 3 - Alpha\",\n        \"Intended Audience :: Developers\",\n        \"License :: OSI Approved :: MIT License\",\n        \"Programming Language :: Python :: 3\",\n        \"Programming Language :: Python :: 3.8\",\n        \"Programming Language :: Python :: 3.9\",\n        \"Programming Language :: Python :: 3.10\",\n    ],\n    python_requires=\">=3.8\",\n    install_requires=[\n        # Core dependencies\n        \"numpy>=1.23.5\",\n        \"pyyaml>=6.0\",\n        \"jinja2>=3.1.5\",\n        \"requests>=2.32.0\",\n        
\"urllib3>=2.3.0\",\n        \n        # Code analysis tools\n        \"astor>=0.8.1\",\n        \"code2flow>=2.5.1\",\n        \"pydeps>=3.0.0\",\n        \n        # AI/LLM related dependencies\n        \"anthropic>=0.45.0\",\n        \"openai>=1.60.1\",\n        \"langchain-anthropic>=0.3.4\",\n        \"langchain-openai>=0.3.2\",\n        \"langchain-core>=0.3.31\",\n        \"langgraph>=0.2.67\",\n        \"tiktoken>=0.8.0\",\n        \"transformers>=4.48.0\",\n        \"huggingface-hub>=0.28.0\",\n        \"google-generativeai>=0.6.0\",\n        \n        # Utility packages\n        \"tqdm>=4.67.1\",\n        \"tabulate>=0.9.0\",\n        \"colorama>=0.4.6\",\n        \"termcolor>=2.5.0\",\n        \"pydantic>=2.10.0\",\n\n        # Web requirements \n        \"flask>=3.1.0\",\n        \"flask-socketio>=5.5.1\",\n        \"eventlet>=0.39.0\",\n        \"python-socketio>=5.12.1\",\n        \"python-engineio>=4.11.2\",\n        \"bidict>=0.23.0\",\n        \"dnspython>=2.7.0\",\n        \"six>=1.16.0\",\n\n        # CUDA requirements \n        \"torch>=2.0.0\",\n        \"accelerate>=1.4.0\",\n    ],\n    extras_require={\n        \"dev\": dev_requires,\n        \"web\": web_requires,  # Keep for potential compatibility, now included in core\n        \"visualization\": visualization_requires,\n        \"cuda\": cuda_requires, # Keep for potential compatibility, now included in core\n        \"all\": all_requires,\n    }\n)"
  },
  {
    "path": "src/DocstringGenerator.egg-info/PKG-INFO",
    "content": "Metadata-Version: 2.2\nName: DocstringGenerator\nVersion: 0.1.0\nSummary: DocAgent for High-quality docstring generation in Large-scale Python projects\nAuthor: Dayu Yang\nAuthor-email: dayuyang@meta.com\nClassifier: Development Status :: 3 - Alpha\nClassifier: Intended Audience :: Developers\nClassifier: License :: OSI Approved :: MIT License\nClassifier: Programming Language :: Python :: 3\nClassifier: Programming Language :: Python :: 3.8\nClassifier: Programming Language :: Python :: 3.9\nClassifier: Programming Language :: Python :: 3.10\nRequires-Python: >=3.8\nDescription-Content-Type: text/markdown\nRequires-Dist: numpy>=1.23.5\nRequires-Dist: pyyaml>=6.0\nRequires-Dist: jinja2>=3.1.5\nRequires-Dist: requests>=2.32.0\nRequires-Dist: urllib3>=2.3.0\nRequires-Dist: astor>=0.8.1\nRequires-Dist: code2flow>=2.5.1\nRequires-Dist: pydeps>=3.0.0\nRequires-Dist: anthropic>=0.45.0\nRequires-Dist: openai>=1.60.1\nRequires-Dist: langchain-anthropic>=0.3.4\nRequires-Dist: langchain-openai>=0.3.2\nRequires-Dist: langchain-core>=0.3.31\nRequires-Dist: langgraph>=0.2.67\nRequires-Dist: tiktoken>=0.8.0\nRequires-Dist: transformers>=4.48.0\nRequires-Dist: huggingface-hub>=0.28.0\nRequires-Dist: google-generativeai>=0.6.0\nRequires-Dist: tqdm>=4.67.1\nRequires-Dist: tabulate>=0.9.0\nRequires-Dist: colorama>=0.4.6\nRequires-Dist: termcolor>=2.5.0\nRequires-Dist: pydantic>=2.10.0\nRequires-Dist: flask>=3.1.0\nRequires-Dist: flask-socketio>=5.5.1\nRequires-Dist: eventlet>=0.39.0\nRequires-Dist: python-socketio>=5.12.1\nRequires-Dist: python-engineio>=4.11.2\nRequires-Dist: bidict>=0.23.0\nRequires-Dist: dnspython>=2.7.0\nRequires-Dist: six>=1.16.0\nRequires-Dist: torch>=2.0.0\nRequires-Dist: accelerate>=1.4.0\nProvides-Extra: dev\nRequires-Dist: pytest>=8.3.4; extra == \"dev\"\nRequires-Dist: pytest-cov>=2.0; extra == \"dev\"\nRequires-Dist: black>=22.0; extra == \"dev\"\nRequires-Dist: flake8>=3.9; extra == \"dev\"\nProvides-Extra: web\nRequires-Dist: 
flask>=3.1.0; extra == \"web\"\nRequires-Dist: flask-socketio>=5.5.1; extra == \"web\"\nRequires-Dist: eventlet>=0.39.0; extra == \"web\"\nRequires-Dist: python-socketio>=5.12.1; extra == \"web\"\nRequires-Dist: python-engineio>=4.11.2; extra == \"web\"\nRequires-Dist: bidict>=0.23.0; extra == \"web\"\nRequires-Dist: dnspython>=2.7.0; extra == \"web\"\nRequires-Dist: six>=1.16.0; extra == \"web\"\nProvides-Extra: visualization\nRequires-Dist: matplotlib>=3.10.0; extra == \"visualization\"\nRequires-Dist: pygraphviz>=1.14; extra == \"visualization\"\nRequires-Dist: networkx>=3.4.2; extra == \"visualization\"\nProvides-Extra: cuda\nRequires-Dist: torch>=2.0.0; extra == \"cuda\"\nRequires-Dist: accelerate>=1.4.0; extra == \"cuda\"\nProvides-Extra: all\nRequires-Dist: pytest>=8.3.4; extra == \"all\"\nRequires-Dist: pytest-cov>=2.0; extra == \"all\"\nRequires-Dist: black>=22.0; extra == \"all\"\nRequires-Dist: flake8>=3.9; extra == \"all\"\nRequires-Dist: flask>=3.1.0; extra == \"all\"\nRequires-Dist: flask-socketio>=5.5.1; extra == \"all\"\nRequires-Dist: eventlet>=0.39.0; extra == \"all\"\nRequires-Dist: python-socketio>=5.12.1; extra == \"all\"\nRequires-Dist: python-engineio>=4.11.2; extra == \"all\"\nRequires-Dist: bidict>=0.23.0; extra == \"all\"\nRequires-Dist: dnspython>=2.7.0; extra == \"all\"\nRequires-Dist: six>=1.16.0; extra == \"all\"\nRequires-Dist: matplotlib>=3.10.0; extra == \"all\"\nRequires-Dist: pygraphviz>=1.14; extra == \"all\"\nRequires-Dist: networkx>=3.4.2; extra == \"all\"\nRequires-Dist: torch>=2.0.0; extra == \"all\"\nRequires-Dist: accelerate>=1.4.0; extra == \"all\"\nDynamic: author\nDynamic: author-email\nDynamic: classifier\nDynamic: description\nDynamic: description-content-type\nDynamic: provides-extra\nDynamic: requires-dist\nDynamic: requires-python\nDynamic: summary\n\n# DocAgent: Agentic Hierarchical Docstring Generation System\n\n<p align=\"center\">\n  <img src=\"assets/meta_logo_white.png\" width=\"20%\" alt=\"Meta 
Logo\">\n</p>\n\nDocAgent is a system designed to generate high-quality, context-aware docstrings for Python codebases using a multi-agent approach and hierarchical processing.\n\n## Table of Contents\n\n- [Motivation](#motivation)\n- [Methodology](#methodology)\n- [Installation](#installation)\n- [Components](#components)\n- [Usage](#usage)\n- [Data Handling](#data-handling)\n- [Baselines](#baselines)\n- [Development Notes](#development-notes)\n\n## Motivation\n\nHigh-quality docstrings are crucial for code readability, usability, and maintainability, especially in large repositories. They should explain the purpose, parameters, returns, exceptions, and usage within the broader context. Current LLMs often struggle with this, producing superficial or redundant comments and failing to capture essential context or rationale. DocAgent aims to address these limitations by generating informative, concise, and contextually aware docstrings.\n\n## Methodology\n\nDocAgent employs two key strategies:\n\n1.  **Hierarchical Traversal**: Processes code components by analyzing dependencies, starting with files having fewer dependencies. This builds a documented foundation before tackling more complex code, addressing the challenge of documenting context that itself lacks documentation.\n2.  **Agentic System**: Utilizes a team of specialized agents (`Reader`, `Searcher`, `Writer`, `Verifier`) coordinated by an `Orchestrator`. 
This system gathers context (internal and external), drafts docstrings according to standards, and verifies their quality in an iterative process.\n\n<img src=\"assets/system.png\" width=\"100%\" alt=\"System Overview\">\n\nFor more details on the agentic framework, see the [Agent Component README](./src/agent/README.md).\n\n## Installation\n\nDetailed installation instructions using `pip` or `conda`, including optional dependencies and troubleshooting tips, can be found in [INSTALL.md](./INSTALL.md).\n\n## Components\n\nDocAgent is composed of several key parts:\n\n- **[Core Agent Framework](./src/agent/README.md)**: Implements the multi-agent system (Reader, Searcher, Writer, Verifier, Orchestrator) responsible for the generation logic.\n- **[Docstring Evaluator](./src/evaluator/README.md)**: Provides tools for evaluating docstring quality, primarily focusing on completeness based on static code analysis (AST).\n- **[Generation Web UI](./src/web/README.md)**: A web interface for configuring, running, and monitoring the docstring *generation* process in real-time, visualizing agent activity and repository structure.\n- **[Evaluation Web UI](./src/web_eval/README.md)**: A separate web interface for configuring and running docstring *evaluations*, assessing completeness and helpfulness (using LLMs).\n\n## Usage\n\nThe primary ways to interact with DocAgent are:\n\n1.  **Generation Web UI**: Recommended for visualizing the generation process. Launch via:\n    ```bash\n    python run_web_ui.py\n    ```\n    Then access `http://localhost:5000` (or as configured). See the [Generation Web UI README](./src/web/README.md) for details.\n\n2.  **Evaluation Web UI**: Recommended for assessing docstring quality. Launch via:\n    ```bash\n    cd src/web_eval\n    ./start_server.sh # Or python app.py\n    ```\n    Then access `http://localhost:5000` (or as configured). See the [Evaluation Web UI README](./src/web_eval/README.md) for details.\n\n3.  
**Command Line (Generation)**: Run the generation process directly:\n    ```bash\n    # Example: Run on a test repo, removing existing docstrings first\n    ./tool/remove_docstrings.sh data/raw_test_repo\n    python generate_docstrings.py --repo-path data/raw_test_repo\n    ```\n    Use `--help` for more options.\n\n## Data Handling\n\nTools are included for managing datasets for evaluation:\n\n- **GitHub Repository Downloader (`src/data/parse/downloader.py`)**: Finds and downloads GitHub repositories based on configurable criteria (language, stars, size, etc.).\n- **Repository Selection (`experiments/select_repos.py`)**: Selects a diverse subset of downloaded repositories based on metrics like code size and complexity.\n\n(See original README sections for detailed usage if needed).\n\n## Baselines\n\nA simple \"copy and paste\" baseline is implemented (`experiments/generate_docstrings_copy_and_paste.py`) for comparison. It sends isolated code components to an LLM without context.\n\n```bash\n# Example: Run baseline on a test repo\n./tool/remove_docstrings.sh data/raw_test_repo\npython experiments/generate_docstrings_copy_and_paste.py --repo-path data/raw_test_repo\n```\n\n## Development Notes\n\n- Remember to activate your chosen environment (`pip` or `conda`).\n- Use `pip install -e \".[dev]\"` for development dependencies.\n- Run tests using `pytest`.\n- See [INSTALL.md](./INSTALL.md) for setting up system dependencies like GraphViz if needed for visualizations.\n\n---\n*This README provides a high-level overview. 
Please refer to the linked component READMEs and `INSTALL.md` for specific details.*\n\n# Todo\n\n- repo-level eval script\n- argument vs parameter\n- \"Need more information\" seems does not work for codellama34B and gemini.\n- some repo depends on \"not-install\" package(ask you to install autogen after download the repo)\n- Query should also search internally\n- truncated \"called by\", especially Class (too long)\n- Overkill issue\n\n# For ACL Experiments\n\n## Note\n\nclass evaluate:\n- really means eval the init function (if has init)\n\n\n## Data\n\n### GitHub Repository Downloader\n\nThe project includes a GitHubRepoDownloader that automates the process of finding and downloading repositories for docstring generation tasks. This tool allows you to specify various criteria to target repositories that match your requirements.\n\n#### Features:\n\n- **Configurable Search Criteria**: Filter repositories by owner, creation date, language, stars, forks, size, and license.\n- **Python Language Filtering**: Ensures downloaded repositories contain a minimum percentage of Python code (default: 80%).\n- **Repository Metadata**: Automatically saves metadata about each downloaded repository.\n- **Rate Limit Handling**: Respects GitHub API rate limits to avoid throttling.\n- **Logging**: Comprehensive logging of the download process.\n\n#### Usage:\n\nTo download repositories, create a configuration file and run the downloader:\n\n```bash\npython -m src.data.parse.downloader\n```\n\n#### Configuration:\n\nCreate a YAML configuration file with the following structure:\n\n```yaml\n# GitHub authentication\nGITHUB_TOKEN: \"your-github-token\"\n\n# Output directory\noutput_directory: \"data/downloaded_repos\"\n\n# Repository limits\nmax_repos: 10\nskip_archived: true\nskip_forks: true\nmin_python_percentage: 80  # Minimum percentage of Python code required\n\n# Search criteria\nsearch_criteria:\n  language: \"python\"\n  stars:\n    min: 100\n  forks:\n    min: 10\n  dates:\n 
   created_after: \"2020-01-01\"\n  owners:\n    - \"username1\"\n    - \"org_name\"\n```\n\nThe downloader will:\n1. Search GitHub repositories matching your criteria\n2. Check if each repository meets the Python percentage requirement\n3. Clone qualifying repositories to the specified output directory\n4. Save repository metadata for further analysis\n\n### Repository Selection\n\nAfter downloading repositories, you may want to select a diverse subset for analysis. The project includes a repository selection tool that helps you choose repositories with varying characteristics:\n\n#### Features:\n\n- **Diversity-Based Selection**: Select repositories based on code size and structural complexity.\n- **Code Size Metrics**: Calculates the number of Python files and total lines of code.\n- **Topological Complexity**: Measures the depth of the repository directory structure.\n- **Visualization**: Generates scatter plots showing the distribution of selected repositories.\n\n#### Usage:\n\nTo select repositories from your downloaded collection:\n\n```bash\npython -m experiments.select_repos\n```\n\n#### Process:\n\nThe selection process follows these steps:\n1. Analyzes each repository to extract metrics (Python files count, total lines, directory depth)\n2. Normalizes the metrics to ensure fair comparison\n3. Creates clusters of repositories with similar characteristics\n4. Selects representatives from each cluster to ensure diversity\n5. Generates a visualization of the selection results\n\nThis approach ensures that your analysis includes repositories with varying sizes and complexity levels, providing a more comprehensive evaluation of docstring generation techniques.\n\n##  Baseline\n\n### Copy and Paste\nWe implemented a simple \"copy and paste\" baseline system that mimics the approach of users copying code components and pasting them directly to an LLM interface. This baseline:\n\n1. 
Extracts individual code components (functions, classes, methods) from Python files\n2. Sends only the component's source code to an LLM without any surrounding context\n3. Asks the LLM to generate a docstring based solely on that isolated component\n4. Inserts the generated docstring back into the code\n\nThis baseline serves as a comparison point to demonstrate the effectiveness of our full agentic hierarchical system, which considers dependency relationships and broader context when generating docstrings.\n\nTo run the baseline system:\n\n```bash\nclear\n./tool/remove_docstrings.sh data/raw_test_repo\npython experiments/generate_docstrings_copy_and_paste.py --repo-path data/raw_test_repo\n```\n\nThe baseline uses the same configuration file (agent_config.yaml) as the main system, so it can work with any supported LLM (Claude, OpenAI, HuggingFace).\n\nTo run in placeholder mode (no actual LLM calls):\n\n```bash\npython experiments/generate_docstrings_copy_and_paste.py --repo-path data/raw_test_repo --test-mode placeholder\n```\n\nTo overwrite existing docstrings:\n\n```bash\npython experiments/generate_docstrings_copy_and_paste.py --repo-path data/raw_test_repo --overwrite-docstrings\n```\n\n\n### Main Experiments\n\n\n\n## Motivation:\n\nIn the realm of large-scale software repositories, the presence of high-quality, user-oriented docstrings is crucial for maintaining code readability, usability, and maintainability. A well-crafted docstring should not only provide comprehensive details about parameters, return values, exceptions, and usage examples but also clearly articulate the purpose of the function or class within the broader context of the repository. This includes explaining when and how to use the function or class, as well as its relationship to other components in the codebase.\nDespite the importance of such documentation, current large language models (LLMs) often fall short in generating docstrings that meet these expectations. 
Common issues include the production of redundant or superficial commentary, a failure to highlight the underlying rationale behind implementation choices, and the omission of crucial constraints and assumptions. These shortcomings can lead to misunderstandings and inefficiencies for developers who rely on these docstrings for guidance.\nThe challenge, therefore, is to develop methods that enable the generation of high-quality docstrings that are both informative and concise, avoiding redundancy by not reiterating information that can be inferred from the code itself, such as parameter types when type hints are present. Addressing these challenges is essential for enhancing the utility of docstrings in large-scale repositories, ultimately contributing to more efficient and effective software development processes.\n\n## Challenges and Limitations of Existing Docstring Generation System:\n\nThe task of generating high-quality docstrings in large-scale repositories presents several significant challenges. One of the primary difficulties lies in the evaluation of docstring quality. There is inherent ambiguity in assessing what constitutes a \"good\" docstring, as gold-standard data is scarce. Even highly-rated repositories often contain docstrings that are either inadequate or only partially effective, complicating the establishment of reliable benchmarks for quality assessment.\nAnother challenge is the limitation imposed by the context window of large language models (LLMs). It is impractical to include an entire repository in a single prompt, necessitating a focus on selecting and summarizing relevant information. Determining what is \"relevant\" is crucial for providing the LLM with a comprehensive understanding of the purpose of the focal function or class. 
This involves discerning which aspects of the codebase should be included to give the LLM a \"global sense\" of the function's or class's role and significance.\nFurthermore, there is a \"chicken and egg\" problem inherent in this task. Generating high-quality docstrings requires a well-rounded understanding of the context in which the focal function or class operates. However, the context itself often lacks sufficient documentation to clearly convey its purpose and interrelations. This lack of existing high-quality docstrings in the surrounding code complicates the process of generating new ones, as the foundational understanding needed to inform the generation process is itself incomplete.\nAddressing these challenges is essential for advancing the capability of LLMs to produce docstrings that are not only informative and concise but also contextually aware and aligned with the broader objectives of the codebase.\n\n## Methodology:\n\nTo address the challenges of generating high-quality docstrings in large-scale repositories, we propose a hierarchical traversal approach combined with an agentic system composed of specialized roles: reader, searcher, writer, and verifier. This methodology is designed to systematically and efficiently produce comprehensive and contextually aware docstrings.\n\nHierarchical Traversal\n\nThe hierarchical traversal principle is central to our approach. By prioritizing the generation of docstrings for source code files with fewer dependencies, we aim to build a solid foundation of well-documented base classes and utility functions before tackling more complex implementations. This strategy effectively addresses the \"chicken and egg\" problem by ensuring that the foundational components of the codebase are well-understood and documented first. 
Unlike existing systems that generate docstrings in a random order, our method provides a structured and logical progression through the codebase.\n\nAgentic System\n\nOur agentic system is designed to facilitate the docstring generation process through a series of coordinated roles:\n\n- Reader: The reader initiates the process by examining the focal code component and identifying any additional internal or external information needed to understand its purpose and context. If further information is required, the reader sends a request to the searcher.\n- Searcher: The searcher traverses the dependency graph to gather relevant information, both from within the codebase and from open-internet sources if necessary. This information is then used to update the context state, providing a more comprehensive understanding of the focal component.\n- Writer: Once the context is deemed sufficient, the reader passes the focal code component and its context to the writer. The writer drafts the docstring, ensuring it adheres to the specified quality and instructional guidelines.\n- Verifier: The verifier conducts a final quality check of the drafted docstring. If formatting issues are detected, the docstring is returned to the writer for revision. If additional context is needed to enhance informativeness, the process returns to the reader for further information gathering.\n\nThis iterative and collaborative approach ensures that each docstring is not only accurate and informative but also contextually aligned with the broader objectives of the codebase. By leveraging the strengths of each agent, our methodology provides a robust framework for generating high-quality documentation in large-scale repositories.\n\n\n# For Test\n\nThe easiest way to interact with DocAgent is through the Web App. 
This assumes the Web App is hosted on a remote server.\n\n## Docstring Generation System (Agentic + Hierarchical)\n\n\nWithout Web UI:\n```bash\nclear\n./tool/remove_docstrings.sh data/raw_test_repo\npython generate_docstrings.py --repo-path data/raw_test_repo --test-mode context_print\n```\n\nWith Web UI:\n```bash\nclear\n./tool/remove_docstrings.sh data/raw_test_repo\npython run_web_ui.py --host 0.0.0.0 --port 5000\n```\n\n## Docstring Eval System\n\nWithout Web UI: manually run the test files under `test/evaluator`.\n\nWith Web UI:\n```bash\npython src/web_eval/app.py --host 0.0.0.0 --port 5001\n```\n\n## Test Hierarchical Generation Only (no LLM call)\n\nTo test hierarchical generation only (no LLM call), run the following commands (tested on `data/test_repo_vm` and `data/downloaded_repos/AutoSurvey`):\n```bash\n./tool/remove_docstrings.sh data/downloaded_repos/AutoSurvey\nclear\npython generate_docstrings.py --repo-path data/downloaded_repos/AutoSurvey --test-mode\nbash tool/visualize.sh output/dependency_graphs/dependency_graph.json output/dependency_graphs/dependency_graph_visualization.png\n```\n\n## Deprecated Tests\n\nTest completeness.\n```bash\npython test/evaluator/test_completeness.py data/downloaded_repos/AutoSurvey/src/agents/judge.py\n```\n\nTest reader-searcher communication.\n\n```bash\npython test/agent/depreciated_test_orchestrator.py --mode reader-searcher --verbose-context\n```\n\nRemove all docstrings from a repository.\n```bash\n./tool/remove_docstrings.sh <repo_path>\n```\n\nTest hierarchical generation.\n```bash\npython generate_docstrings.py --repo-path data/test_repo_vm --test-mode\n```\n\nVisualize dependency graph.\n```bash\nbash tool/visualize.sh\n```\n\n\n\n# Installation\n\n\n## Create Config File\n\nCreate a config folder `config/` and a config file `agent_config.yaml` under `config/`, e.g. 
`config/agent_config.yaml`:\n\nThe structure of the config is as follows:\n```yaml\nllm:\n  type: \"claude\"  # Options: openai, claude, gemini, huggingface\n  api_key: \"your-anthropic-api-key-here\"  # Replace with your Anthropic API key\n  model: \"claude-3-5-haiku-latest\"\n  temperature: 0.1\n  max_output_tokens: 4096\n\n# Flow control parameters\nflow_control:\n  max_reader_search_attempts: 2  # Maximum times reader can call searcher\n  max_verifier_rejections: 3     # Maximum times verifier can reject a docstring\n  status_sleep_time: 3           # Time to sleep between status updates (seconds)\n\n# Perplexity API configuration\nperplexity:\n  api_key: \"your-perplexity-api-key-here\"  # Replace with your Perplexity API key\n  model: \"sonar\"  # Default model\n  temperature: 0.1\n  max_output_tokens: 4096\n\n```\n\n## Installation\n\n### Basic Installation\nTo install the basic package with core dependencies:\n\n```bash\npip install -e .\n```\n\n### Install with Additional Features\n\nYou can install the package with additional optional dependencies:\n\n```bash\n# For development tools (pytest, black, flake8)\npip install -e \".[dev]\"\n\n# For web UI components\npip install -e \".[web]\"\n\n# For visualization tools\npip install -e \".[visualization]\"\n\n# For CUDA support\npip install -e \".[cuda]\"\n\n# For all optional dependencies\npip install -e \".[all]\"\n```\n\nYou can also combine multiple optional dependencies:\n\n```bash\npip install -e \".[web,visualization]\"\n```\n\n## Access Web UI from Local (if running on remote server)\n\n## Running Docstring Generation System\n\nOn the remote server:\n```bash\npython run_web_ui.py --host 0.0.0.0 --port 5000\n```\nThis tells Flask to listen on all network interfaces, not just the loopback interface.\n\nOn the local machine:\n```bash\nssh -L 5000:localhost:5000 <remote_host>\n```\n\nFor example, for devserver, `ssh -L 5000:localhost:5000 dayuyang@devgpu003.rva5.facebook.com`.\nThis command creates a tunnel from your local port 5000 to 
port 5000 on the remote server. After running this command, you can open your browser and go to http://localhost:5000 to access the web interface running on the remote server.\n\n\nKill any program running on port 5000:\n```bash\nlsof -i :5000 | awk 'NR>1 {print $2}' | sort -u | xargs -r kill\n```\n\n## Running Docstring Eval System\n\nOn the remote server:\n```bash\npython src/web_eval/app.py --host 0.0.0.0 --port 5001\n```\n\nOn the local machine:\n```bash\nssh -L 5001:localhost:5001 dayuyang@devgpu003.rva5.facebook.com\n\n```\n\n\nIf running both:\n\nOn the local machine, run:\n```bash\nssh -L 5000:localhost:5000 -L 5001:localhost:5001 -L 5002:localhost:5002 dayuyang@devgpu003.rva5.facebook.com\n\n# 5002 for backup\n```\n\n\n## Serve local LLM\n\nFirst install vllm:\n```bash\npip install vllm\n```\n\nThen run `bash serve_local_llm.sh`.\n\n\n# Concerns\n\n- Hierarchical generation\n    - Circular dependencies\n\n- Import from external source?\n    - Assuming the \"external source\" is a well-known library, should the LLM already know about it?\n\n# TO FIX/ADD/IMPROVE\n\n- If a component already has a docstring, skip it (or at least give an option to skip).\n\n- Improve generation instructions (the LLM follows the instructions strictly, which usually makes the generated docstring too long).\n    - A high-quality docstring also needs to be concise.\n\n- Add a timeout warning (in case the system gets stuck).\n\n- Claude has a rate limit: `50,000 input tokens per minute per organization`\n\n\n- Add system-wide error handling.\n\n- Add price calculation.\n\n- Simple code does not need this tool.\n    - Before invoking the system, use a small LLM or a deterministic check to decide whether the system is needed 
(balance between efficiency and effectiveness)\n\nNow the logic is file-level, function-level, method-level, class-level.\n\n\n# Evaluator\n\n\n\n## Completeness\n\n\n## Helpfulness\n\nFor Summary, Description, Arguments, Parameters, and Attributes, each docstring component is evaluated on a 1-5 scale (POOR to EXCELLENT).\n\nFor Example, the docstring is evaluated on a binary scale (0 or 1).\n- Evaluates whether docstring examples enable users to use the code correctly by comparing predicted usage against ground truth.\n\n\n\n# Vulnerability\n\n1. Helpfulness description: when a class is too long, it may need to be truncated.\n2. When evaluating parameters/arguments/attributes, the input context (class/function signatures) should be reasonably sized to avoid LLM token limits.\n\n\n\n# Logic Control flow (process function under orchestrator.py)\n\nOnce the searcher is called, the reader's memory is refreshed.\n\nOnce the verifier judges that more context is needed, the writer's and verifier's memories are refreshed.\n\n# Note\n\nWhen evaluating examples, the signature must contain the decorator `@staticmethod` or `@classmethod`.\n\nWhen writing the docstring for a class, first write the docstring for the __init__ method, then the docstrings for the other methods, and finally the docstring for the class. (Provide the full class code as the code component when writing the docstring for the class.)\n\nMethods are extremely difficult to handle. Currently only self.method(), instance.method() and ClassName.method() are supported. See `get_child_method` under `CallGraphBuilder` for more details.\n\n\nError handling: instruct the reader that if it receives \"XXX is not accessible\", it should not ask for the same code component again. When a lookup is unsuccessful, callgraphbuilder returns something like \"XXX is not accessible\".\n\nFor LLM-generated docstrings, no triple quotes (\\\"\\\"\\\") are added originally.\n\n\nFor generate_docstrings.py. 
Add features:\nMultiple Passes (\"Category\" Approach):\n• We split docstring generation for each file into three passes, in this order:\n(a) Top-level functions (i.e., \"function\")\n(b) Methods inside classes (i.e., \"method\")\n(c) Classes (i.e., \"class\")\nEach pass visits all .py files in the repo.\nImmediate File Rewrite After Each Code Component:\n• In each pass, we repeatedly parse a file, gather all code components of the chosen category in ascending line order, pick off the first component, generate a docstring, and immediately rewrite the file. Then we re-parse the updated file before moving on to the next component.\n• This ensures that each code component's docstring is written into the file before the next code component's docstring is generated. It also meets the request to \"refresh the python file after each generation for a single code component.\"\n\nThis approach is more computationally expensive than generating all docstrings in memory and rewriting once per file, but it achieves the desired incremental rewriting and strict function → method → class ordering.\n\n\nWhy is the Python file not updated after the docstring is generated for each code component?\n- Because updating the file requires re-parsing it and rebuilding the AST, which is expensive.\n\n\n\nDependency clarification for\n\n# Future Work\n\nHuman in the loop:\n- A human can act as the judge and can provide more information to the system.\n"
  },
  {
    "path": "src/DocstringGenerator.egg-info/SOURCES.txt",
    "content": "README.md\nsetup.py\nsrc/DocstringGenerator.egg-info/PKG-INFO\nsrc/DocstringGenerator.egg-info/SOURCES.txt\nsrc/DocstringGenerator.egg-info/dependency_links.txt\nsrc/DocstringGenerator.egg-info/requires.txt\nsrc/DocstringGenerator.egg-info/top_level.txt\nsrc/agent/__init__.py\nsrc/agent/base.py\nsrc/agent/orchestrator.py\nsrc/agent/reader.py\nsrc/agent/searcher.py\nsrc/agent/verifier.py\nsrc/agent/workflow.py\nsrc/agent/writer.py\nsrc/agent/llm/__init__.py\nsrc/agent/llm/base.py\nsrc/agent/llm/claude_llm.py\nsrc/agent/llm/factory.py\nsrc/agent/llm/gemini_llm.py\nsrc/agent/llm/huggingface_llm.py\nsrc/agent/llm/openai_llm.py\nsrc/agent/llm/rate_limiter.py\nsrc/dependency_analyzer/__init__.py\nsrc/dependency_analyzer/ast_parser.py\nsrc/dependency_analyzer/topo_sort.py\nsrc/evaluator/__init__.py\nsrc/evaluator/base.py\nsrc/evaluator/completeness.py\nsrc/evaluator/evaluation_common.py\nsrc/evaluator/helpfulness_attributes.py\nsrc/evaluator/helpfulness_description.py\nsrc/evaluator/helpfulness_evaluator.py\nsrc/evaluator/helpfulness_evaluator_ablation.py\nsrc/evaluator/helpfulness_examples.py\nsrc/evaluator/helpfulness_parameters.py\nsrc/evaluator/helpfulness_summary.py\nsrc/evaluator/segment.py\nsrc/evaluator/truthfulness.py\nsrc/visualizer/__init__.py\nsrc/visualizer/progress.py\nsrc/visualizer/status.py\nsrc/visualizer/web_bridge.py\nsrc/web/__init__.py\nsrc/web/app.py\nsrc/web/config_handler.py\nsrc/web/process_handler.py\nsrc/web/run.py\nsrc/web/visualization_handler.py"
  },
  {
    "path": "src/DocstringGenerator.egg-info/dependency_links.txt",
    "content": "\n"
  },
  {
    "path": "src/DocstringGenerator.egg-info/requires.txt",
    "content": "numpy>=1.23.5\npyyaml>=6.0\njinja2>=3.1.5\nrequests>=2.32.0\nurllib3>=2.3.0\nastor>=0.8.1\ncode2flow>=2.5.1\npydeps>=3.0.0\nanthropic>=0.45.0\nopenai>=1.60.1\nlangchain-anthropic>=0.3.4\nlangchain-openai>=0.3.2\nlangchain-core>=0.3.31\nlanggraph>=0.2.67\ntiktoken>=0.8.0\ntransformers>=4.48.0\nhuggingface-hub>=0.28.0\ngoogle-generativeai>=0.6.0\ntqdm>=4.67.1\ntabulate>=0.9.0\ncolorama>=0.4.6\ntermcolor>=2.5.0\npydantic>=2.10.0\nflask>=3.1.0\nflask-socketio>=5.5.1\neventlet>=0.39.0\npython-socketio>=5.12.1\npython-engineio>=4.11.2\nbidict>=0.23.0\ndnspython>=2.7.0\nsix>=1.16.0\ntorch>=2.0.0\naccelerate>=1.4.0\n\n[all]\npytest>=8.3.4\npytest-cov>=2.0\nblack>=22.0\nflake8>=3.9\nflask>=3.1.0\nflask-socketio>=5.5.1\neventlet>=0.39.0\npython-socketio>=5.12.1\npython-engineio>=4.11.2\nbidict>=0.23.0\ndnspython>=2.7.0\nsix>=1.16.0\nmatplotlib>=3.10.0\npygraphviz>=1.14\nnetworkx>=3.4.2\ntorch>=2.0.0\naccelerate>=1.4.0\n\n[cuda]\ntorch>=2.0.0\naccelerate>=1.4.0\n\n[dev]\npytest>=8.3.4\npytest-cov>=2.0\nblack>=22.0\nflake8>=3.9\n\n[visualization]\nmatplotlib>=3.10.0\npygraphviz>=1.14\nnetworkx>=3.4.2\n\n[web]\nflask>=3.1.0\nflask-socketio>=5.5.1\neventlet>=0.39.0\npython-socketio>=5.12.1\npython-engineio>=4.11.2\nbidict>=0.23.0\ndnspython>=2.7.0\nsix>=1.16.0\n"
  },
  {
    "path": "src/DocstringGenerator.egg-info/top_level.txt",
    "content": "agent\ndependency_analyzer\nevaluator\nvisualizer\nweb\n"
  },
  {
    "path": "src/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n"
  },
  {
    "path": "src/agent/README.md",
    "content": "# Agent Framework for Docstring Generation\n\nThis directory contains the core components of the multi-agent system responsible for generating high-quality docstrings for code components.\n\n## Overview\n\nThe system employs a collaborative workflow involving several specialized agents, managed by an Orchestrator. The goal is to analyze code, gather necessary context (both internal and external), generate a docstring, and verify its quality before finalizing.\n\nThe main workflow is initiated via the `generate_docstring` function in `workflow.py`.\n\n## Agents\n\n1.  **`BaseAgent` (`base.py`)**\n    *   **Role:** Abstract base class for all agents.\n    *   **Functionality:** Provides common infrastructure including LLM initialization (using `LLMFactory`), configuration loading, memory management (storing conversation history), and basic LLM interaction (`generate_response`). Ensures consistency across agents.\n\n2.  **`Reader` (`reader.py`)**\n    *   **Role:** Contextual Analysis and Information Needs Assessment.\n    *   **Functionality:** Analyzes the input code component (`focal_component`) and any existing context. Determines if additional information is required to write a comprehensive docstring. If more information is needed, it generates a structured request specifying whether internal codebase details (e.g., callers, callees) or external web search results are required.\n\n3.  **`Searcher` (`searcher.py`)**\n    *   **Role:** Information Retrieval.\n    *   **Functionality:** Acts upon the requests generated by the `Reader`. It retrieves the specified information by:\n        *   Querying the internal codebase using AST analysis (`ASTNodeAnalyzer`) and dependency graphs.\n        *   Performing external web searches via APIs (e.g., `PerplexityAPI`).\n    *   Returns the gathered context in a structured format.\n\n4.  
**`Writer` (`writer.py`)**\n    *   **Role:** Docstring Generation.\n    *   **Functionality:** Takes the original code component and the accumulated context (provided by the `Orchestrator` after `Reader` and `Searcher` steps) as input. Uses its configured LLM and detailed prompts (tailored for classes vs. functions/methods, adhering to Google style guide) to generate the docstring. Outputs the generated docstring within specific XML tags (`<DOCSTRING>`).\n\n5.  **`Verifier` (`verifier.py`)**\n    *   **Role:** Quality Assurance.\n    *   **Functionality:** Evaluates the docstring produced by the `Writer` against the original code and the context used. Checks for clarity, accuracy, completeness, information value (avoiding redundancy), and appropriate level of detail. Determines if the docstring meets quality standards or requires revision. If revision is needed, it specifies whether more context is required or provides direct suggestions for improvement.\n\n6.  **`Orchestrator` (`orchestrator.py`)**\n    *   **Role:** Workflow Management.\n    *   **Functionality:** Coordinates the entire process. 
It manages the sequence of agent interactions:\n        *   Calls `Reader` to assess context needs.\n        *   Calls `Searcher` iteratively if more context is requested (up to a limit).\n        *   Calls `Writer` to generate the docstring.\n        *   Calls `Verifier` to evaluate the docstring.\n        *   Manages revision loops based on `Verifier` feedback, potentially involving further searches or refinement by the `Writer` (up to a limit).\n    *   Handles context accumulation, token limit constraints, and status visualization.\n\n## Supporting Files\n\n*   **`workflow.py`:** Provides the primary entry point function `generate_docstring` to initiate the docstring generation process for a given code component.\n*   **`__init__.py`:** Makes the `agent` directory a Python package.\n*   **`llm/`:** Contains LLM-related code, including the `LLMFactory` and base LLM classes.\n*   **`tool/`:** Contains tools used by agents, such as the `ASTNodeAnalyzer` for internal code traversal and the `PerplexityAPI` wrapper for external search. "
  },
  {
    "path": "src/agent/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n# Import only essential components to avoid circular imports\nfrom .reader import CodeComponentType\n\n# Explicitly list what should be accessible, but don't import until needed\n# to prevent circular imports\n__all__ = ['generate_docstring', 'CodeComponentType']\n\n# Lazy load generate_docstring when it's actually needed\ndef __getattr__(name):\n    if name == 'generate_docstring':\n        from .workflow import generate_docstring\n        return generate_docstring\n    raise AttributeError(f\"module '{__name__}' has no attribute '{name}'\") "
  },
  {
    "path": "src/agent/base.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom abc import ABC, abstractmethod\nfrom typing import Any, Dict, Optional, List\nimport os\nfrom pathlib import Path\n\nfrom .llm.factory import LLMFactory\nfrom .llm.base import BaseLLM\n\nclass BaseAgent(ABC):\n    \"\"\"Base class for all agents in the docstring generation system.\"\"\"\n    \n    def __init__(self, name: str, config_path: Optional[str] = None):\n        \"\"\"Initialize the base agent.\n        \n        Args:\n            name: The name of the agent\n            config_path: Optional path to the configuration file\n        \"\"\"\n        self.name = name\n        self._memory: list[Dict[str, Any]] = []\n        \n        # Initialize LLM and parameters from config\n        self.llm, self.llm_params = self._initialize_llm(name, config_path)\n\n    \n    def _initialize_llm(self, agent_name: str, config_path: Optional[str] = None) -> tuple[BaseLLM, Dict[str, Any]]:\n        \"\"\"Initialize the LLM for this agent.\n        \n        Args:\n            agent_name: Name of the agent\n            config_path: Optional path to the configuration file\n            \n        Returns:\n            Tuple of (Initialized LLM instance, LLM parameters dictionary)\n        \"\"\"\n        # Load configuration\n        if config_path is None:\n            config_path = \"config/agent_config.yaml\"\n            print(f\"Using default config from {config_path}\")\n            \n        config = LLMFactory.load_config(config_path)\n        \n        # Check for agent-specific configuration\n        agent_config = config.get(\"agent_llms\", {}).get(agent_name.lower())\n        \n        # Use agent-specific config if available, otherwise use default\n        llm_config = agent_config if agent_config else config.get(\"llm\", {})\n        \n        # Verify api_key is provided in config\n        if (\"api_key\" not in llm_config or not llm_config[\"api_key\"]) and (llm_config[\"type\"] not in 
[\"huggingface\", \"local\"]):\n            raise ValueError(\"API key must be specified directly in the config file\")\n\n        # Extract LLM parameters\n        llm_params = {\n            \"max_output_tokens\": llm_config.get(\"max_output_tokens\", 4096),\n            \"temperature\": llm_config.get(\"temperature\", 0.1),\n            \"model\": llm_config.get(\"model\")\n        }\n\n        return LLMFactory.create_llm(llm_config), llm_params\n    \n    def add_to_memory(self, role: str, content: str) -> None:\n        \"\"\"Add a message to the agent's memory.\n        \n        Args:\n            role: The role of the message sender (e.g., 'system', 'user', 'assistant')\n            content: The content of the message\n        \"\"\"\n        assert content is not None and content != \"\", \"Content cannot be empty\"\n        self._memory.append(self.llm.format_message(role, content))\n    \n    def refresh_memory(self, new_memory: list[Dict[str, Any]]) -> None:\n        \"\"\"Replace the current memory with new memory.\n        \n        Args:\n            new_memory: The new memory to replace the current memory\n        \"\"\"\n        self._memory = [\n            self.llm.format_message(msg[\"role\"], msg[\"content\"])\n            for msg in new_memory\n        ]\n    \n    def clear_memory(self) -> None:\n        \"\"\"Clear the agent's memory.\"\"\"\n        self._memory = []\n    \n    @property\n    def memory(self) -> list[Dict[str, Any]]:\n        \"\"\"Get the agent's memory.\n        \n        Returns:\n            The agent's memory as a list of message dictionaries\n        \"\"\"\n        return self._memory.copy()\n    \n    def generate_response(self, messages: Optional[List[Dict[str, Any]]] = None) -> str:\n        \"\"\"Generate a response using the agent's LLM and memory.\n        \n        Args:\n            messages: Optional list of messages to use instead of memory\n            \n        Returns:\n            Generated response 
text\n        \"\"\"\n        return self.llm.generate(\n            messages=messages if messages is not None else self._memory,\n            temperature=self.llm_params[\"temperature\"],\n            max_tokens=self.llm_params[\"max_output_tokens\"]\n        )\n    \n    @abstractmethod\n    def process(self, *args, **kwargs) -> Any:\n        \"\"\"Process the input and generate output.\n        \n        This method should be implemented by each specific agent.\n        \"\"\"\n        pass "
  },
  {
    "path": "src/agent/llm/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom .base import BaseLLM\nfrom .openai_llm import OpenAILLM\nfrom .claude_llm import ClaudeLLM\nfrom .huggingface_llm import HuggingFaceLLM\nfrom .gemini_llm import GeminiLLM\nfrom .factory import LLMFactory\n\n__all__ = [\n    'BaseLLM',\n    'OpenAILLM',\n    'ClaudeLLM',\n    'HuggingFaceLLM',\n    'GeminiLLM',\n    'LLMFactory'\n] "
  },
  {
    "path": "src/agent/llm/base.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom abc import ABC, abstractmethod\nfrom typing import List, Dict, Any, Optional\n\nclass BaseLLM(ABC):\n    \"\"\"Base class for LLM wrappers.\"\"\"\n    \n    @abstractmethod\n    def generate(\n        self,\n        messages: List[Dict[str, str]],\n        temperature: float = 0.7,\n        max_output_tokens: Optional[int] = None\n    ) -> str:\n        \"\"\"Generate a response from the LLM.\n        \n        Args:\n            messages: List of message dictionaries with 'role' and 'content' keys\n            temperature: Sampling temperature (0.0 to 1.0)\n            max_output_tokens: Maximum number of tokens to generate\n            \n        Returns:\n            The generated response text\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def format_message(self, role: str, content: str) -> Dict[str, str]:\n        \"\"\"Format a message for the specific LLM API.\n        \n        Args:\n            role: The role of the message sender\n            content: The content of the message\n            \n        Returns:\n            Formatted message dictionary\n        \"\"\"\n        pass "
  },
  {
    "path": "src/agent/llm/claude_llm.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import List, Dict, Any, Optional\nimport anthropic\nfrom .base import BaseLLM\nfrom .rate_limiter import RateLimiter\nimport logging\n\nclass ClaudeLLM(BaseLLM):\n    \"\"\"Anthropic Claude API wrapper.\"\"\"\n    \n    def __init__(\n        self,\n        api_key: str,\n        model: str,\n        rate_limits: Optional[Dict[str, Any]] = None\n    ):\n        \"\"\"Initialize Claude LLM.\n        \n        Args:\n            api_key: Anthropic API key\n            model: Model identifier (e.g., \"claude-3-sonnet-20240229\")\n            rate_limits: Optional dictionary with rate limit settings\n        \"\"\"\n        self.client = anthropic.Anthropic(api_key=api_key)\n        self.model = model\n        \n        # Default rate limits for Claude 3.7 Sonnet\n        default_limits = {\n            \"requests_per_minute\": 50,\n            \"input_tokens_per_minute\": 20000,\n            \"output_tokens_per_minute\": 8000,\n            \"input_token_price_per_million\": 3.0,\n            \"output_token_price_per_million\": 15.0\n        }\n        \n        # Use provided rate limits or defaults\n        limits = rate_limits or default_limits\n        \n        # Initialize rate limiter\n        self.rate_limiter = RateLimiter(\n            provider=\"Claude\",\n            requests_per_minute=limits.get(\"requests_per_minute\", default_limits[\"requests_per_minute\"]),\n            input_tokens_per_minute=limits.get(\"input_tokens_per_minute\", default_limits[\"input_tokens_per_minute\"]),\n            output_tokens_per_minute=limits.get(\"output_tokens_per_minute\", default_limits[\"output_tokens_per_minute\"]),\n            input_token_price_per_million=limits.get(\"input_token_price_per_million\", default_limits[\"input_token_price_per_million\"]),\n            output_token_price_per_million=limits.get(\"output_token_price_per_million\", 
default_limits[\"output_token_price_per_million\"])\n        )\n    \n    def _count_tokens(self, text: str) -> int:\n        \"\"\"Count tokens in a string using Claude's tokenizer.\n        \n        Args:\n            text: Text to count tokens for\n            \n        Returns:\n            Token count\n        \"\"\"\n        if not text:\n            return 0\n            \n        try:\n            # Format text as a message for token counting\n            count = self.client.beta.messages.count_tokens(\n                model=self.model,\n                messages=[\n                    {\"role\": \"user\", \"content\": text}\n                ]\n            )\n            return count.input_tokens\n        except Exception as e:\n            # Log the error but don't fail\n            logging.warning(f\"Failed to count tokens with Claude tokenizer: {e}\")\n            # Fallback: rough estimate if tokenizer fails (cast to int to match the return type)\n            return int(len(text.split()) * 1.3)\n    \n    def _count_messages_tokens(self, messages: List[Dict[str, str]], system_message: Optional[str] = None) -> int:\n        \"\"\"Count tokens in message list with optional system message.\n        \n        Args:\n            messages: List of message dictionaries\n            system_message: Optional system message\n            \n        Returns:\n            Total token count\n        \"\"\"\n        if not messages:\n            return 0\n            \n        # Convert messages to Claude format\n        claude_messages = [self._convert_to_claude_message(msg) for msg in messages \n                          if msg[\"role\"] != \"system\"]\n        \n        # Format system message if provided\n        system_content = None\n        if system_message:\n            system_content = system_message\n        \n        try:\n            # Use the API to count tokens for all messages at once\n            count = self.client.beta.messages.count_tokens(\n                model=self.model,\n                
messages=claude_messages,\n                system=system_content\n            )\n            return count.input_tokens\n        except Exception as e:\n            # Log the error but don't fail\n            logging.warning(f\"Failed to count tokens with Claude tokenizer: {e}\")\n            \n            # Fallback: count tokens individually\n            total_tokens = 0\n            for msg in claude_messages:\n                if \"content\" in msg and msg[\"content\"]:\n                    total_tokens += self._count_tokens(msg[\"content\"])\n            \n            # Add system message tokens if provided\n            if system_message:\n                total_tokens += self._count_tokens(system_message)\n                \n            # Add overhead for message formatting\n            total_tokens += 10 * len(claude_messages)  # Add ~10 tokens per message for formatting\n            \n            return total_tokens\n    \n    def generate(\n        self,\n        messages: List[Dict[str, str]],\n        temperature: float,\n        max_tokens: Optional[int]\n    ) -> str:\n        \"\"\"Generate a response using Claude API with rate limiting.\n        \n        Args:\n            messages: List of message dictionaries\n            temperature: Sampling temperature\n            max_output_tokens: Maximum tokens to generate\n            \n        Returns:\n            Generated response text\n        \"\"\"\n        # Extract system message if present\n        system_message = None\n        chat_messages = []\n        \n        for msg in messages:\n            if msg[\"role\"] == \"system\":\n                system_message = msg[\"content\"]\n            else:\n                chat_messages.append(self._convert_to_claude_message(msg))\n        \n        # Count input tokens\n        input_tokens = self._count_messages_tokens(messages, system_message)\n        \n        # Wait if we're approaching rate limits (estimate output tokens as max_output_tokens)\n       
 self.rate_limiter.wait_if_needed(input_tokens, max_tokens)\n        \n        # Make the API call\n        response = self.client.messages.create(\n            model=self.model,\n            messages=chat_messages,\n            system=system_message,\n            temperature=temperature,\n            max_tokens=max_tokens\n        )\n        \n        result_text = response.content[0].text\n        \n        # Count output tokens and record request\n        output_tokens = self._count_tokens(result_text)\n        self.rate_limiter.record_request(input_tokens, output_tokens)\n        \n        return result_text\n    \n    def format_message(self, role: str, content: str) -> Dict[str, str]:\n        \"\"\"Format message for Claude API.\n        \n        Args:\n            role: Message role (system, user, assistant)\n            content: Message content\n            \n        Returns:\n            Formatted message dictionary\n        \"\"\"\n        # Store in standard format, conversion happens in generate()\n        return {\"role\": role, \"content\": content}\n    \n    def _convert_to_claude_message(self, message: Dict[str, str]) -> Dict[str, str]:\n        \"\"\"Convert standard message format to Claude's format.\n        \n        Args:\n            message: Standard format message\n            \n        Returns:\n            Claude format message\n        \"\"\"\n        role_mapping = {\n            \"user\": \"user\",\n            \"assistant\": \"assistant\"\n        }\n        \n        role = role_mapping[message[\"role\"]]\n        content = message[\"content\"]\n        \n        return {\"role\": role, \"content\": content} "
  },
  {
    "path": "src/agent/llm/factory.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, Optional\nfrom pathlib import Path\nimport yaml\n\nfrom .base import BaseLLM\nfrom .openai_llm import OpenAILLM\nfrom .claude_llm import ClaudeLLM\nfrom .huggingface_llm import HuggingFaceLLM\nfrom .gemini_llm import GeminiLLM\n\nclass LLMFactory:\n    \"\"\"Factory class for creating LLM instances.\"\"\"\n    \n    @staticmethod\n    def create_llm(config: Dict[str, Any]) -> BaseLLM:\n        \"\"\"Create an LLM instance based on configuration.\n        \n        Args:\n            config: Configuration dictionary containing LLM settings\n            \n        Returns:\n            An instance of BaseLLM\n            \n        Raises:\n            ValueError: If the LLM type is not supported\n        \"\"\"\n        llm_type = config[\"type\"].lower()\n        model = config.get(\"model\")\n        \n        if not model:\n            raise ValueError(\"Model must be specified in the config file\")\n        \n        # Extract rate limit settings from config\n        # First check if there are specific rate limits in the LLM config\n        rate_limits = config.get(\"rate_limits\", {})\n        \n        # If not, check if there are global rate limits for this provider type\n        global_config = LLMFactory.load_config()\n        if not rate_limits and \"rate_limits\" in global_config:\n            # Map LLM types to provider names in rate_limits section\n            provider_map = {\n                \"openai\": \"openai\",\n                \"claude\": \"claude\",\n                \"gemini\": \"gemini\"\n            }\n            provider_key = provider_map.get(llm_type, llm_type)\n            provider_limits = global_config.get(\"rate_limits\", {}).get(provider_key, {})\n            if provider_limits:\n                rate_limits = provider_limits\n        \n        if llm_type == \"openai\":\n            return OpenAILLM(\n                
api_key=config[\"api_key\"],\n                model=model,\n                rate_limits=rate_limits\n            )\n        elif llm_type == \"claude\":\n            return ClaudeLLM(\n                api_key=config[\"api_key\"],\n                model=model,\n                rate_limits=rate_limits\n            )\n        elif llm_type == \"gemini\":\n            return GeminiLLM(\n                api_key=config[\"api_key\"],\n                model=model,\n                rate_limits=rate_limits\n            )\n        elif llm_type == \"huggingface\":\n            return HuggingFaceLLM(\n                model_name=model,\n                device=config.get(\"device\", \"cuda\"),\n                torch_dtype=config.get(\"torch_dtype\", \"float16\")\n            )\n        else:\n            raise ValueError(f\"Unsupported LLM type: {llm_type}\")\n    \n    @staticmethod\n    def load_config(config_path: Optional[str] = None) -> Dict[str, Any]:\n        \"\"\"Load LLM configuration from file.\n        \n        Args:\n            config_path: Path to the configuration file. If None, uses default path.\n            \n        Returns:\n            Configuration dictionary\n            \n        Raises:\n            FileNotFoundError: If the configuration file doesn't exist\n        \"\"\"\n        if config_path is None:\n            config_path = str(Path(__file__).parent.parent.parent.parent / \"config\" / \"agent_config.yaml\")\n        \n        if not Path(config_path).exists():\n            raise FileNotFoundError(f\"Configuration file not found: {config_path}\")\n        \n        with open(config_path, 'r') as f:\n            config = yaml.safe_load(f)\n        \n        return config "
  },
  {
    "path": "src/agent/llm/gemini_llm.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import List, Dict, Any, Optional\nimport tiktoken\nimport google.generativeai as genai\nfrom .base import BaseLLM\nfrom .rate_limiter import RateLimiter\n\nclass GeminiLLM(BaseLLM):\n    \"\"\"Google Gemini API wrapper.\"\"\"\n    \n    def __init__(\n        self,\n        api_key: str,\n        model: str,\n        rate_limits: Optional[Dict[str, Any]] = None\n    ):\n        \"\"\"Initialize Gemini LLM.\n        \n        Args:\n            api_key: Google API key\n            model: Model identifier (e.g., \"gemini-1.5-flash\", \"gemini-1.5-pro\")\n            rate_limits: Optional dictionary with rate limit settings\n        \"\"\"\n        genai.configure(api_key=api_key)\n        self.model_name = model\n        self.model = genai.GenerativeModel(model)\n        \n        try:\n            # Initialize tokenizer for token counting\n            # Gemini doesn't have a direct tokenizer in the public API\n            # Using tiktoken cl100k_base as a reasonable approximation\n            self.tokenizer = tiktoken.get_encoding(\"cl100k_base\")\n        except:\n            # Fallback to basic word counting if tokenizer fails\n            self.tokenizer = None\n        \n        # Default rate limits for Gemini (adjust based on actual API limits)\n        default_limits = {\n            \"requests_per_minute\": 60,\n            \"input_tokens_per_minute\": 100000,\n            \"output_tokens_per_minute\": 50000,\n            \"input_token_price_per_million\": 0.125,  # Approximate for gemini-1.5-flash\n            \"output_token_price_per_million\": 0.375  # Approximate for gemini-1.5-flash\n        }\n        \n        # Use provided rate limits or defaults\n        limits = rate_limits or default_limits\n        \n        # Initialize rate limiter\n        self.rate_limiter = RateLimiter(\n            provider=\"Gemini\",\n            
requests_per_minute=limits.get(\"requests_per_minute\", default_limits[\"requests_per_minute\"]),\n            input_tokens_per_minute=limits.get(\"input_tokens_per_minute\", default_limits[\"input_tokens_per_minute\"]),\n            output_tokens_per_minute=limits.get(\"output_tokens_per_minute\", default_limits[\"output_tokens_per_minute\"]),\n            input_token_price_per_million=limits.get(\"input_token_price_per_million\", default_limits[\"input_token_price_per_million\"]),\n            output_token_price_per_million=limits.get(\"output_token_price_per_million\", default_limits[\"output_token_price_per_million\"])\n        )\n    \n    def _count_tokens(self, text: str) -> int:\n        \"\"\"Count tokens in a string using the model's tokenizer.\n        \n        Args:\n            text: Text to count tokens for\n            \n        Returns:\n            Token count\n        \"\"\"\n        if not text:\n            return 0\n            \n        try:\n            if self.tokenizer:\n                return len(self.tokenizer.encode(text))\n            else:\n                # Fallback: rough estimate if tokenizer not available\n                return len(text.split()) * 1.3\n        except Exception as e:\n            # Log the error but don't fail\n            import logging\n            logging.warning(f\"Failed to count tokens for Gemini: {e}\")\n            # Fallback: rough estimate if tokenizer fails\n            return len(text.split()) * 1.3\n    \n    def _count_messages_tokens(self, messages: List[Dict[str, str]]) -> int:\n        \"\"\"Count tokens in all messages.\n        \n        Args:\n            messages: List of message dictionaries\n            \n        Returns:\n            Total token count\n        \"\"\"\n        if not messages:\n            return 0\n            \n        total_tokens = 0\n        \n        # Count tokens in each message\n        for message in messages:\n            if \"content\" in message and 
message[\"content\"]:\n                total_tokens += self._count_tokens(message[\"content\"])\n            \n        # Add overhead for message formatting (estimated)\n        total_tokens += 4 * len(messages)\n        \n        return total_tokens\n    \n    def _convert_messages_to_gemini_format(self, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:\n        \"\"\"Convert standard message format to Gemini-specific format.\n        \n        A leading system message is folded into the first user message,\n        because this format has no separate system role.\n        \n        Args:\n            messages: List of message dictionaries with 'role' and 'content' keys\n            \n        Returns:\n            List of Gemini-formatted messages\n        \"\"\"\n        gemini_messages = []\n        \n        # Gemini uses \"user\" and \"model\" for roles\n        role_mapping = {\n            \"user\": \"user\",\n            \"assistant\": \"model\",\n            \"system\": \"user\"  # No system role here; system text is sent as user text\n        }\n        \n        # Split off a leading system message so it can be prepended as a prefix\n        system_content = \"\"\n        remaining = messages\n        if messages and messages[0].get(\"role\") == \"system\":\n            system_content = messages[0].get(\"content\", \"\")\n            remaining = messages[1:]\n        \n        for message in remaining:\n            role = role_mapping.get(message.get(\"role\", \"user\"), \"user\")\n            content = message.get(\"content\", \"\")\n            if system_content and role == \"user\":\n                # Prefix the first user message with the system instructions\n                content = f\"{system_content}\\n\\n{content}\"\n                system_content = \"\"\n            gemini_messages.append({\"role\": role, \"parts\": content})\n        \n        # No user message carried the system text: send it as its own user message\n        if system_content:\n            gemini_messages.insert(0, {\"role\": \"user\", \"parts\": system_content})\n        \n        return gemini_messages\n    \n    def generate(\n        self,\n        messages: 
List[Dict[str, str]],\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None\n    ) -> str:\n        \"\"\"Generate a response using Gemini API with rate limiting.\n        \n        Args:\n            messages: List of message dictionaries\n            temperature: Sampling temperature\n            max_tokens: Maximum tokens to generate (None uses the API default)\n            \n        Returns:\n            Generated response text\n        \"\"\"\n        # Count input tokens\n        input_tokens = self._count_messages_tokens(messages)\n        \n        # Wait if we're approaching rate limits\n        self.rate_limiter.wait_if_needed(input_tokens, max_tokens if max_tokens else 1000)\n        \n        # Format messages for Gemini API\n        gemini_messages = self._convert_messages_to_gemini_format(messages)\n        \n        # Build the generation settings once so both code paths honor them.\n        # Gemini names the output cap \"max_output_tokens\", not \"max_tokens\".\n        generation_config = {\"temperature\": temperature}\n        if max_tokens:\n            generation_config[\"max_output_tokens\"] = max_tokens\n        \n        # Check if we need to start a chat or just generate\n        if len(gemini_messages) > 1:\n            # Start a chat with history\n            history = gemini_messages[:-1]  # All but the last message\n            last_message = gemini_messages[-1]  # The last message to send\n            \n            chat = self.model.start_chat(\n                history=history,\n            )\n            \n            # Send the last message to get a response\n            response = chat.send_message(\n                last_message.get(\"parts\", \"\"),\n                generation_config=generation_config\n            )\n            result_text = response.text\n        else:\n            # Single message, use generate_content\n            content = gemini_messages[0].get(\"parts\", \"\") if gemini_messages else \"\"\n        \n            response = self.model.generate_content(\n                content,\n                generation_config=generation_config\n            )\n            \n            result_text = response.text\n        \n        # Estimate output tokens (Gemini API doesn't provide usage 
stats)\n        output_tokens = self._count_tokens(result_text)\n        \n        # Record the request\n        self.rate_limiter.record_request(input_tokens, output_tokens)\n        \n        return result_text\n    \n    def format_message(self, role: str, content: str) -> Dict[str, str]:\n        \"\"\"Format message for standard API.\n        \n        Args:\n            role: Message role (system, user, assistant)\n            content: Message content\n            \n        Returns:\n            Formatted message dictionary\n        \"\"\"\n        # Standard format - conversion to Gemini format happens in generate method\n        return {\"role\": role, \"content\": content}"
  },
  {
    "path": "src/agent/llm/huggingface_llm.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import List, Dict, Any, Optional\nfrom openai import OpenAI\nimport torch\nimport tiktoken\nfrom .base import BaseLLM\n\nclass HuggingFaceLLM(BaseLLM):\n    \"\"\"HuggingFace model wrapper using vLLM's OpenAI-compatible API.\"\"\"\n    \n    def __init__(\n        self,\n        model_name: str,\n        api_base: str = \"http://localhost:8000/v1\",\n        api_key: str = \"EMPTY\",\n        device: str = None,  # Kept for backward compatibility\n        torch_dtype: torch.dtype = None,  # Kept for backward compatibility\n        max_input_tokens: int = 10000  # Maximum input tokens allowed\n    ):\n        \"\"\"Initialize HuggingFace LLM via vLLM API.\n        \n        Args:\n            model_name: Name of the model\n            api_base: Base URL for the vLLM API endpoint\n            api_key: API key (typically \"EMPTY\" for local vLLM deployments)\n            device: Ignored (handled by vLLM server)\n            torch_dtype: Ignored (handled by vLLM server)\n            max_input_tokens: Maximum number of input tokens allowed\n        \"\"\"\n        self.model_name = model_name\n        self.client = OpenAI(\n            api_key=api_key,\n            base_url=api_base,\n        )\n        self.max_input_tokens = max_input_tokens\n        # Initialize tokenizer based on model\n        try:\n            self.tokenizer = tiktoken.encoding_for_model(model_name)\n        except KeyError:\n            # Fall back to cl100k_base for unknown models (used by GPT-4, GPT-3.5-turbo)\n            self.tokenizer = tiktoken.get_encoding(\"cl100k_base\")\n    \n    def _count_tokens(self, messages: List[Dict[str, str]]) -> int:\n        \"\"\"Count the number of tokens in a list of messages.\n        \n        Args:\n            messages: List of message dictionaries\n            \n        Returns:\n            Total token count\n        \"\"\"\n        token_count = 0\n        \n        for 
message in messages:\n            # Count tokens in content\n            token_count += len(self.tokenizer.encode(message[\"content\"]))\n            # Add overhead for message format (role, etc.)\n            token_count += 4  # Approximate tokens for message formatting\n            \n        # Add tokens for the formatting between messages\n        token_count += 2  # Final assistant message tokens\n        \n        return token_count\n    \n    def _truncate_messages(self, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:\n        \"\"\"Truncate messages to stay within the token limit.\n        \n        Args:\n            messages: List of message dictionaries\n            \n        Returns:\n            Truncated list of message dictionaries\n        \"\"\"\n        if not messages:\n            return []\n            \n        system_messages = [m for m in messages if m[\"role\"].lower() == \"system\"]\n        non_system_messages = [m for m in messages if m[\"role\"].lower() != \"system\"]\n        \n        # Always keep system messages intact\n        result = system_messages.copy()\n        token_budget = self.max_input_tokens - self._count_tokens(result)\n        \n        # Process non-system messages from newest to oldest\n        for message in reversed(non_system_messages):\n            message_tokens = self._count_tokens([message])\n            \n            if message_tokens <= token_budget:\n                # We can include the entire message\n                result.insert(len(system_messages), message)\n                token_budget -= message_tokens\n            elif message[\"role\"].lower() == \"user\" and token_budget > 20:\n                # For user messages, we can truncate content if needed\n                # Keep enough tokens for comprehension (at least some portion)\n                content = message[\"content\"]\n                # Estimate how much content to keep\n                keep_ratio = token_budget / message_tokens\n    
            # Truncate from beginning to keep most recent content\n                if keep_ratio < 0.5:\n                    # If we need to cut more than half, add indicator of truncation\n                    truncated_content = f\"[...truncated...] {content[int(len(content) * (1 - keep_ratio + 0.1)):].strip()}\"\n                else:\n                    truncated_content = content[int(len(content) * (1 - keep_ratio)):].strip()\n                \n                truncated_message = {\n                    \"role\": message[\"role\"],\n                    \"content\": truncated_content\n                }\n                \n                # Verify the truncated message fits\n                truncated_tokens = self._count_tokens([truncated_message])\n                if truncated_tokens <= token_budget:\n                    result.insert(len(system_messages), truncated_message)\n                    token_budget -= truncated_tokens\n            \n            # If we can't fit any more messages, stop\n            if token_budget <= 20:  # Keep some buffer\n                break\n                \n        # Ensure the messages are in the correct order (system first, then chronological)\n        result.sort(key=lambda m: 0 if m[\"role\"].lower() == \"system\" else 1)\n        \n        return result\n    \n    def generate(\n        self,\n        messages: List[Dict[str, str]],\n        temperature: float,\n        max_tokens: Optional[int]\n    ) -> str:\n        \"\"\"Generate a response using the vLLM API.\n        \n        Args:\n            messages: List of message dictionaries\n            temperature: Sampling temperature\n            max_tokens: Maximum tokens to generate (None defers to the server default)\n            \n        Returns:\n            Generated response text\n        \"\"\"\n        # __init__ never sets self.max_output_tokens, so read it defensively;\n        # when it is absent, None lets the server apply its own default\n        max_output_tokens = max_tokens if max_tokens is not None else getattr(self, \"max_output_tokens\", None)\n        # Check token count and truncate if needed\n        total_tokens = self._count_tokens(messages)\n 
       if total_tokens > self.max_input_tokens:\n            messages = self._truncate_messages(messages)\n            \n        # vLLM expects strictly alternating user/assistant roles with an optional system message at the beginning\n        # Prepare the messages with the proper format\n        formatted_messages = []\n        \n        # First, check for a system message to include at the beginning\n        system_messages = [m for m in messages if m[\"role\"].lower() == \"system\"]\n        if system_messages:\n            # Use the last system message if multiple exist\n            formatted_messages.append({\n                \"role\": \"system\",\n                \"content\": system_messages[-1][\"content\"]\n            })\n        \n        # Filter out system messages and process the rest\n        user_assistant_messages = [m for m in messages if m[\"role\"].lower() != \"system\"]\n        \n        # Ensure messages alternate between user and assistant\n        current_role = \"user\"  # Start with user message\n        \n        for message in user_assistant_messages:\n            role = message[\"role\"].lower()\n            \n            # Map roles to either user or assistant\n            if role in [\"user\", \"human\"]:\n                mapped_role = \"user\"\n            else:\n                mapped_role = \"assistant\"\n            \n            # If this message would create consecutive messages with the same role,\n            # skip adding it to avoid the alternating pattern error\n            if formatted_messages and mapped_role == formatted_messages[-1][\"role\"]:\n                continue\n            \n            # Add the properly mapped message\n            formatted_messages.append({\n                \"role\": mapped_role,\n                \"content\": message[\"content\"]\n            })\n        \n        # Make sure the last message is from the user, so the model will respond as assistant\n        if not formatted_messages or 
formatted_messages[-1][\"role\"] != \"user\":\n            # If we don't have any messages or the last one isn't from user, we need to add a user message\n            # Use an empty message or the last assistant message as context\n            formatted_messages.append({\n                \"role\": \"user\",\n                \"content\": \"Please continue.\" if not formatted_messages else \n                           f\"Based on your last response: '{formatted_messages[-1]['content']}', please continue.\"\n            })\n        \n        # Call the API\n        response = self.client.chat.completions.create(\n            model=self.model_name,\n            messages=formatted_messages,\n            temperature=temperature,\n            max_tokens=max_output_tokens\n        )\n        \n        # Extract the generated text\n        return response.choices[0].message.content\n    \n    def format_message(self, role: str, content: str) -> Dict[str, str]:\n        \"\"\"Format message for OpenAI API compatible format.\n        \n        Args:\n            role: Message role (system, user, assistant)\n            content: Message content\n            \n        Returns:\n            Formatted message dictionary\n        \"\"\"\n        # Map to standard OpenAI roles if needed\n        if role.lower() not in [\"system\", \"user\", \"assistant\"]:\n            if role.lower() in [\"human\"]:\n                role = \"user\"\n            elif role.lower() in [\"ai\", \"assistant\"]:\n                role = \"assistant\"\n            else:\n                # Default unexpected roles to user\n                role = \"user\"\n                \n        return {\"role\": role, \"content\": content}\n    \n    def _messages_to_prompt(self, messages: List[Dict[str, str]]) -> str:\n        \"\"\"Convert messages to a single prompt string.\n        \n        This method is kept for backward compatibility but is not used\n        in the API-based implementation.\n        \n        
Args:\n            messages: List of message dictionaries\n            \n        Returns:\n            Formatted prompt string\n        \"\"\"\n        prompt_parts = []\n        \n        for message in messages:\n            role = message[\"role\"]\n            content = message[\"content\"]\n            \n            if role == \"system\":\n                prompt_parts.append(f\"System: {content}\")\n            elif role == \"user\":\n                prompt_parts.append(f\"Human: {content}\")\n            elif role == \"assistant\":\n                prompt_parts.append(f\"Assistant: {content}\")\n        \n        prompt_parts.append(\"Assistant: \")  # Add final prompt for generation\n        return \"\\n\".join(prompt_parts) "
  },
  {
    "path": "src/agent/llm/openai_llm.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import List, Dict, Any, Optional\nimport openai\nimport tiktoken\nfrom .base import BaseLLM\nfrom .rate_limiter import RateLimiter\n\nclass OpenAILLM(BaseLLM):\n    \"\"\"OpenAI API wrapper.\"\"\"\n    \n    def __init__(\n        self,\n        api_key: str,\n        model: str,\n        rate_limits: Optional[Dict[str, Any]] = None\n    ):\n        \"\"\"Initialize OpenAI LLM.\n        \n        Args:\n            api_key: OpenAI API key\n            model: Model identifier (e.g., \"gpt-4\", \"gpt-3.5-turbo\")\n            rate_limits: Optional dictionary with rate limit settings\n        \"\"\"\n        self.client = openai.OpenAI(api_key=api_key)\n        self.model = model\n        \n        try:\n            # Initialize tokenizer for the model\n            self.tokenizer = tiktoken.encoding_for_model(model)\n        except:\n            # Fallback to cl100k_base for new models\n            self.tokenizer = tiktoken.get_encoding(\"cl100k_base\")\n        \n        # Default rate limits for GPT-4o-mini\n        default_limits = {\n            \"requests_per_minute\": 500,\n            \"input_tokens_per_minute\": 200000,\n            \"output_tokens_per_minute\": 100000,\n            \"input_token_price_per_million\": 0.15,\n            \"output_token_price_per_million\": 0.60\n        }\n        \n        # Use provided rate limits or defaults\n        limits = rate_limits or default_limits\n        \n        # Initialize rate limiter\n        self.rate_limiter = RateLimiter(\n            provider=\"OpenAI\",\n            requests_per_minute=limits.get(\"requests_per_minute\", default_limits[\"requests_per_minute\"]),\n            input_tokens_per_minute=limits.get(\"input_tokens_per_minute\", default_limits[\"input_tokens_per_minute\"]),\n            output_tokens_per_minute=limits.get(\"output_tokens_per_minute\", default_limits[\"output_tokens_per_minute\"]),\n            
input_token_price_per_million=limits.get(\"input_token_price_per_million\", default_limits[\"input_token_price_per_million\"]),\n            output_token_price_per_million=limits.get(\"output_token_price_per_million\", default_limits[\"output_token_price_per_million\"])\n        )\n    \n    def _count_tokens(self, text: str) -> int:\n        \"\"\"Count tokens in a string using the model's tokenizer.\n        \n        Args:\n            text: Text to count tokens for\n            \n        Returns:\n            Token count\n        \"\"\"\n        if not text:\n            return 0\n            \n        try:\n            return len(self.tokenizer.encode(text))\n        except Exception as e:\n            # Log the error but don't fail\n            import logging\n            logging.warning(f\"Failed to count tokens with OpenAI tokenizer: {e}\")\n            # Fallback: rough estimate (~1.3 tokens per word), cast to int to match the return type\n            return int(len(text.split()) * 1.3)\n    \n    def _count_messages_tokens(self, messages: List[Dict[str, str]]) -> int:\n        \"\"\"Count tokens in all messages.\n        \n        Args:\n            messages: List of message dictionaries\n            \n        Returns:\n            Total token count\n        \"\"\"\n        if not messages:\n            return 0\n            \n        total_tokens = 0\n        \n        # Count tokens in each message\n        for message in messages:\n            if \"content\" in message and message[\"content\"]:\n                total_tokens += self._count_tokens(message[\"content\"])\n            \n        # Add overhead for message formatting (varies by model, but ~4 tokens per message)\n        total_tokens += 4 * len(messages)\n        \n        # Add tokens for model overhead (varies by model)\n        total_tokens += 3  # Every reply is primed with <|start|>assistant<|message|>\n        \n        return total_tokens\n    \n    def generate(\n        self,\n        messages: List[Dict[str, str]],\n        
temperature: float,\n        max_tokens: Optional[int]\n    ) -> str:\n        \"\"\"Generate a response using OpenAI API with rate limiting.\n        \n        Args:\n            messages: List of message dictionaries\n            temperature: Sampling temperature\n            max_tokens: Maximum tokens to generate (None uses the API default)\n            \n        Returns:\n            Generated response text\n        \"\"\"\n        # Count input tokens\n        input_tokens = self._count_messages_tokens(messages)\n        \n        # Wait if we're approaching rate limits (max_tokens serves as the output estimate)\n        self.rate_limiter.wait_if_needed(input_tokens, max_tokens)\n        \n        # Make the API call\n        response = self.client.chat.completions.create(\n            model=self.model,\n            messages=messages,\n            temperature=temperature,\n            max_tokens=max_tokens if max_tokens else None\n        )\n        \n        result_text = response.choices[0].message.content\n        \n        # Prefer API-reported usage; fall back to local estimates when it is missing or None\n        usage = getattr(response, 'usage', None)\n        output_tokens = usage.completion_tokens if usage else self._count_tokens(result_text)\n        input_tokens = usage.prompt_tokens if usage else input_tokens\n        \n        self.rate_limiter.record_request(input_tokens, output_tokens)\n        \n        return result_text\n    \n    def format_message(self, role: str, content: str) -> Dict[str, str]:\n        \"\"\"Format message for OpenAI API.\n        \n        Args:\n            role: Message role (system, user, assistant)\n            content: Message content\n            \n        Returns:\n            Formatted message dictionary\n        \"\"\"\n        # OpenAI uses standard role names\n        return {\"role\": role, \"content\": content} "
  },
  {
    "path": "src/agent/llm/rate_limiter.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport time\nfrom typing import Dict, List, Optional\nfrom collections import deque\nimport threading\nimport logging\n\n# Configure logging\nlogging.basicConfig(level=logging.INFO, \n                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(\"RateLimiter\")\n\nclass RateLimiter:\n    \"\"\"\n    Rate limiter for LLM API calls.\n    Tracks requests, input tokens, and output tokens per minute.\n    Also tracks cost based on token pricing.\n    \"\"\"\n    \n    def __init__(\n        self,\n        provider: str,\n        requests_per_minute: int,\n        input_tokens_per_minute: int,\n        output_tokens_per_minute: int,\n        input_token_price_per_million: float,\n        output_token_price_per_million: float,\n        buffer_percentage: float = 0.1  # Buffer to avoid hitting exact limits\n    ):\n        \"\"\"\n        Initialize the rate limiter.\n        \n        Args:\n            provider: LLM provider name (\"openai\" or \"claude\")\n            requests_per_minute: Maximum requests per minute\n            input_tokens_per_minute: Maximum input tokens per minute\n            output_tokens_per_minute: Maximum output tokens per minute\n            input_token_price_per_million: Price per million input tokens\n            output_token_price_per_million: Price per million output tokens\n            buffer_percentage: Percentage buffer to avoid hitting exact limits\n        \"\"\"\n        self.provider = provider\n        self.requests_per_minute = requests_per_minute * (1 - buffer_percentage)\n        self.input_tokens_per_minute = input_tokens_per_minute * (1 - buffer_percentage)\n        self.output_tokens_per_minute = output_tokens_per_minute * (1 - buffer_percentage)\n        \n        # Pricing\n        self.input_token_price = input_token_price_per_million / 1_000_000\n        self.output_token_price = 
output_token_price_per_million / 1_000_000\n        \n        # Track usage within a sliding window (1 minute)\n        self.request_timestamps = deque()\n        self.input_token_usage = deque()  # Tuples of (timestamp, token_count)\n        self.output_token_usage = deque()  # Tuples of (timestamp, token_count)\n        \n        # Total usage stats\n        self.total_requests = 0\n        self.total_input_tokens = 0\n        self.total_output_tokens = 0\n        self.total_cost = 0.0\n        \n        # Thread lock for thread safety\n        self.lock = threading.Lock()\n    \n    def _clean_old_entries(self, usage_queue: deque, current_time: float):\n        \"\"\"Remove entries older than 1 minute from the queue.\"\"\"\n        one_minute_ago = current_time - 60\n        \n        # Handle different queue formats (timestamps vs. (timestamp, value) tuples)\n        if usage_queue and isinstance(usage_queue[0], tuple):\n            # For token usage queues that store (timestamp, count) tuples\n            while usage_queue and usage_queue[0][0] < one_minute_ago:\n                usage_queue.popleft()\n        else:\n            # For request_timestamps queue that stores timestamp floats directly\n            while usage_queue and usage_queue[0] < one_minute_ago:\n                usage_queue.popleft()\n    \n    def _get_usage_count(self, usage_queue: deque):\n        \"\"\"Get the total count from a usage queue.\"\"\"\n        return sum(count for _, count in usage_queue)\n    \n    def wait_if_needed(self, input_tokens: int, estimated_output_tokens: Optional[int] = None):\n        \"\"\"\n        Check if we're about to exceed rate limits and wait if necessary.\n        This improved version uses a while loop instead of recursion to\n        avoid potential infinite waiting scenarios.\n        \n        Args:\n            input_tokens: Number of input tokens for the upcoming request\n            estimated_output_tokens: Estimated number of output tokens\n     
   \"\"\"\n        with self.lock:\n            if estimated_output_tokens is None:\n                estimated_output_tokens = input_tokens // 2  # Rough fallback estimate\n            \n            # If this single request is bigger than the entire capacity, warn or handle\n            if input_tokens > self.input_tokens_per_minute or estimated_output_tokens > self.output_tokens_per_minute:\n                logger.warning(\n                    f\"Request uses more tokens ({input_tokens} in / {estimated_output_tokens} out) \"\n                    f\"than the configured per-minute capacity. This request may never succeed.\"\n                )\n            \n            while True:\n                current_time = time.time()\n                \n                # Clean up old entries\n                self._clean_old_entries(self.request_timestamps, current_time)\n                self._clean_old_entries(self.input_token_usage, current_time)\n                self._clean_old_entries(self.output_token_usage, current_time)\n                \n                # Calculate current usage\n                current_requests = len(self.request_timestamps)\n                current_input_tokens = self._get_usage_count(self.input_token_usage)\n                current_output_tokens = self._get_usage_count(self.output_token_usage)\n                \n                # Check if adding this request would exceed limits\n                if ((current_requests + 1) <= self.requests_per_minute and\n                    (current_input_tokens + input_tokens) <= self.input_tokens_per_minute and\n                    (current_output_tokens + estimated_output_tokens) <= self.output_tokens_per_minute):\n                    # We can proceed now\n                    break\n                \n                # Otherwise, compute how long to wait\n                wait_time = 0\n                if self.request_timestamps:\n                    wait_time = max(wait_time, 60 - (current_time - 
self.request_timestamps[0]))\n                if self.input_token_usage:\n                    wait_time = max(wait_time, 60 - (current_time - self.input_token_usage[0][0]))\n                if self.output_token_usage:\n                    wait_time = max(wait_time, 60 - (current_time - self.output_token_usage[0][0]))\n                \n                # If wait_time is still <= 0, we won't fix usage by waiting\n                if wait_time <= 0:\n                    logger.warning(\n                        \"Waiting cannot reduce usage enough to allow this request; \"\n                        \"request exceeds per-minute capacity or usage remains too high.\"\n                    )\n                    break\n                \n                logger.info(f\"Rate limit approaching for {self.provider}. Waiting {wait_time:.2f} seconds...\")\n                time.sleep(wait_time)\n    \n    def record_request(self, input_tokens: int, output_tokens: int):\n        \"\"\"\n        Record an API request and its token usage.\n        \n        Args:\n            input_tokens: Number of input tokens used\n            output_tokens: Number of output tokens generated\n        \"\"\"\n        with self.lock:\n            current_time = time.time()\n            \n            # Record request and token usage\n            self.request_timestamps.append(current_time)\n            self.input_token_usage.append((current_time, input_tokens))\n            self.output_token_usage.append((current_time, output_tokens))\n            \n            # Update total stats\n            self.total_requests += 1\n            self.total_input_tokens += input_tokens\n            self.total_output_tokens += output_tokens\n            \n            # Calculate cost\n            input_cost = input_tokens * self.input_token_price\n            output_cost = output_tokens * self.output_token_price\n            total_cost = input_cost + output_cost\n            self.total_cost += total_cost\n            \n 
           # Log usage and cost\n            logger.info(\n                f\"{self.provider} Request: {self.total_requests} | \"\n                f\"Tokens: {input_tokens}in/{output_tokens}out | \"\n                f\"Cost: ${total_cost:.6f} | \"\n                f\"Total Cost: ${self.total_cost:.6f}\"\n            )\n    \n    def print_usage_stats(self):\n        \"\"\"Print current usage statistics.\"\"\"\n        with self.lock:\n            logger.info(f\"{self.provider} Usage Statistics:\")\n            logger.info(f\"  Total Requests: {self.total_requests}\")\n            logger.info(f\"  Total Input Tokens: {self.total_input_tokens}\")\n            logger.info(f\"  Total Output Tokens: {self.total_output_tokens}\")\n            logger.info(f\"  Total Cost: ${self.total_cost:.6f}\")"
  },
  {
    "path": "src/agent/orchestrator.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, Optional, List\nimport time\nfrom .base import BaseAgent\nfrom .reader import Reader\nfrom .searcher import Searcher\nfrom .writer import Writer\nfrom .verifier import Verifier\nfrom visualizer import StatusVisualizer\nimport re\nimport yaml\nimport ast\nimport tiktoken\n\n# Dummy visualizer class that mimics StatusVisualizer but does nothing\nclass DummyVisualizer:\n    \"\"\"A no-op visualizer that implements the same interface as StatusVisualizer but does nothing.\"\"\"\n    \n    def reset(self):\n        \"\"\"Do nothing.\"\"\"\n        pass\n    \n    def set_current_component(self, component, file_path):\n        \"\"\"Do nothing.\"\"\"\n        pass\n    \n    def update(self, agent_name, status):\n        \"\"\"Do nothing.\"\"\"\n        pass\n\nclass Orchestrator(BaseAgent):\n    \"\"\"Agent responsible for managing the workflow between all other agents.\"\"\"\n    \n    def __init__(self, repo_path: str, config_path: Optional[str] = None, test_mode: Optional[str] = None):\n        \"\"\"Initialize the Orchestrator agent and its sub-agents.\n        \n        Args:\n            repo_path: Path to the repository being analyzed\n            config_path: Optional path to the configuration file\n            test_mode: Optional test mode to run only specific components. 
Values: \"reader_searcher\", \"context_print\" or None\n        \"\"\"\n        super().__init__(\"Orchestrator\")\n        self.repo_path = repo_path\n        self.context = \"\"\n        self.test_mode = test_mode\n        \n        # Load configuration\n        self.config = {}\n        if config_path:\n            with open(config_path, 'r') as f:\n                # yaml.safe_load returns None for an empty file, so fall back to an empty dict\n                self.config = yaml.safe_load(f) or {}\n        \n        # Get flow control parameters with defaults\n        flow_config = self.config.get('flow_control', {})\n        self.max_reader_search_attempts = flow_config.get('max_reader_search_attempts', 4)\n        self.max_verifier_rejections = flow_config.get('max_verifier_rejections', 3)\n        self.status_sleep_time = flow_config.get('status_sleep_time', 3)\n        \n        # Check model type for context constraints\n        llm_config = self.config.get('llm', {})\n        self.model_type = llm_config.get('type', 'openai')\n        \n        # Add max_input_tokens to config for context length constraint\n        if 'max_input_tokens' not in self.config:\n            self.config['max_input_tokens'] = llm_config.get('max_input_tokens', 10000)\n        \n        # Initialize visualization - use dummy visualizer for \"context_print\" test mode\n        if test_mode == \"context_print\":\n            self.visualizer = DummyVisualizer()\n        else:\n            self.visualizer = StatusVisualizer()\n        \n        # Initialize all sub-agents\n        self.reader = Reader(config_path=config_path)\n        self.searcher = Searcher(repo_path, config_path=config_path)\n        \n        # Only initialize writer and verifier if not in reader_searcher test mode\n        if test_mode != \"reader_searcher\":\n            self.writer = Writer(config_path=config_path)\n            self.verifier = Verifier(config_path=config_path)\n\n    def _parse_verifier_response(self, response: str) -> Dict[str, Any]:\n        \"\"\"Parse the verifier's XML response into a 
structured format.\n        \n        Args:\n            response: The XML response from the verifier\n            \n        Returns:\n            Dictionary containing parsed verification results with structure:\n            {\n                'needs_revision': bool,\n                'needs_context': bool,\n                'suggestion': str,\n                'context_suggestion': str\n            }\n        \"\"\"\n        result = {\n            'needs_revision': False,\n            'needs_context': False,\n            'suggestion': '',\n            'context_suggestion': ''\n        }\n        \n        # Parse NEED_REVISION\n        need_revision_match = re.search(r'<NEED_REVISION>(.*?)</NEED_REVISION>', response, re.DOTALL)\n        if need_revision_match:\n            result['needs_revision'] = need_revision_match.group(1).strip().lower() == 'true'\n            \n            if result['needs_revision']:\n                # Parse MORE_CONTEXT\n                more_context_match = re.search(r'<MORE_CONTEXT>(.*?)</MORE_CONTEXT>', response, re.DOTALL)\n                if more_context_match:\n                    result['needs_context'] = more_context_match.group(1).strip().lower() == 'true'\n                    \n                    if result['needs_context']:\n                        # Extract context suggestion\n                        context_suggestion_match = re.search(r'<SUGGESTION_CONTEXT>(.*?)</SUGGESTION_CONTEXT>', response, re.DOTALL)\n                        if context_suggestion_match:\n                            result['context_suggestion'] = context_suggestion_match.group(1).strip()\n                    else:\n                        # Extract improvement suggestion\n                        suggestion_match = re.search(r'<SUGGESTION>(.*?)</SUGGESTION>', response, re.DOTALL)\n                        if suggestion_match:\n                            result['suggestion'] = suggestion_match.group(1).strip()\n        \n        return result\n\n    def 
process(\n        self,\n        focal_component: str,\n        file_path: str,\n        ast_node: Optional[ast.AST] = None,\n        ast_tree: Optional[ast.AST] = None,\n        dependency_graph: Optional[Dict[str, List[str]]] = None,\n        focal_node_dependency_path: Optional[str] = None,\n        token_consume_focal: int = 0\n    ) -> str:\n        \"\"\"Process a docstring generation request through the entire agent workflow.\n        \n        Args:\n            focal_component: The code component needing a docstring (full code snippet)\n            file_path: Path to the file containing the component, relative to the repository root\n            ast_node: Optional AST node representing the focal component\n            ast_tree: Optional AST tree for the entire file\n            dependency_graph: Optional dictionary mapping component paths to their dependencies\n            focal_node_dependency_path: Optional dependency path of the focal component\n            token_consume_focal: Number of tokens consumed by the focal component itself\n            \n        Returns:\n            The generated and verified docstring, or reader response in test mode\n        \"\"\"\n        # Reset visualization and set current component\n        self.visualizer.reset()\n        self.visualizer.set_current_component(focal_component, file_path)\n        # context should be reset to empty string\n        self.context = \"\"\n        # Initialize attempt counters\n        reader_search_attempts = 0\n        verifier_rejection_count = 0\n        \n        while True:\n            # Step 1: Reader determines if more context is needed\n            self.visualizer.update('reader', \"Analyzing code component...\")\n            reader_response = self.reader.process(\n                focal_component,\n                self.context\n            )\n            # add reader_response to reader's memory (assistant)\n            self.reader.add_to_memory(\"assistant\", reader_response)\n            \n            # Step 2: Check if more information is needed\n            match = re.search(r'<INFO_NEED>(.*?)</INFO_NEED>', reader_response, re.DOTALL)\n            needs_info = match and match.group(1).strip().lower() == 'true'\n            \n            if needs_info and 
reader_search_attempts < self.max_reader_search_attempts:\n                reader_search_attempts += 1\n                self.visualizer.update('reader', f\"Need more information (attempt {reader_search_attempts}/{self.max_reader_search_attempts}), ask Searcher to search additional context...\")\n                if self.test_mode != \"context_print\":\n                    time.sleep(self.status_sleep_time)\n                # Use Searcher to gather more information\n                self.visualizer.update('searcher', \"Searching for additional context...\")\n                if self.test_mode != \"context_print\":\n                    time.sleep(self.status_sleep_time)\n                search_results = self.searcher.process(reader_response, ast_node, ast_tree, dependency_graph, focal_node_dependency_path)\n                self._update_context(search_results, token_consume_focal)\n                # Refresh reader's memory with new context\n                self.reader.refresh_memory([\n                    {\"role\": \"system\", \"content\": self.reader.system_prompt},\n                    {\"role\": \"user\", \"content\": f\"Current context:\\n{self.context}\"}\n                ])\n                self.visualizer.update('reader', \"Search complete, Context updated, restarting analysis...\")\n                if self.test_mode != \"context_print\":\n                    time.sleep(self.status_sleep_time)\n                continue\n            elif needs_info:\n                self.visualizer.update('reader', f\"Max search attempts ({self.max_reader_search_attempts}) reached, proceeding with current context...\")\n                if self.test_mode != \"context_print\":\n                    time.sleep(self.status_sleep_time)\n\n            self.visualizer.update('reader', \"No additional context needed, starting docstring generation...\")\n            if self.test_mode != \"context_print\":\n                time.sleep(self.status_sleep_time)\n            \n            # If in 
reader_searcher test mode, return after context gathering\n            if self.test_mode == \"reader_searcher\":\n                return reader_response\n            \n            while True:  # Inner loop for writer-verifier cycle\n                # Step 3: When enough context is gathered, use Writer to generate docstring\n                self.visualizer.update('writer', \"Generating docstring...\")\n                \n                # Print context if in context_print test mode\n                if self.test_mode == \"context_print\":\n                    print(\"\\n=== CONTEXT BEFORE WRITER CALL ===\")\n                    print(self.context)\n                    print(\"=== END OF CONTEXT ===\\n\")\n                \n                docstring = self.writer.process(\n                    focal_component,\n                    self.context\n                )\n                # assert docstring is not empty\n                # add writer_response to writer's memory (assistant)\n                self.writer.add_to_memory(\"assistant\", docstring)\n\n                # Step 4: Use Verifier to check the quality\n                self.visualizer.update('verifier', \"Verifying docstring quality...\")\n                verification_response = self.verifier.process(\n                    focal_component,\n                    docstring,\n                    self.context\n                )\n                \n                # Step 5: Parse and process verification results\n                verification_result = self._parse_verifier_response(verification_response)\n                \n                if not verification_result['needs_revision'] or verifier_rejection_count >= self.max_verifier_rejections:\n                    if verifier_rejection_count >= self.max_verifier_rejections:\n                        self.visualizer.update('verifier', f\"Max rejection attempts ({self.max_verifier_rejections}) reached, accepting current docstring.\")\n                    else:\n                 
       self.visualizer.update('verifier', \"Docstring generated successfully! No need for revision.\")\n                    if self.test_mode != \"context_print\":\n                        time.sleep(self.status_sleep_time)\n                    return docstring\n                # needs_revision is true: hand back to the reader (more context) or the writer (rewrite)\n                else:\n                    verifier_rejection_count += 1\n                    # clear verifier's memory\n                    self.verifier.clear_memory()\n                    if verification_result['needs_context'] and reader_search_attempts < self.max_reader_search_attempts:\n                        self.visualizer.update('verifier', f\"Need more context (rejection {verifier_rejection_count}/{self.max_verifier_rejections}), handing back to reader...\")\n                        if self.test_mode != \"context_print\":\n                            time.sleep(self.status_sleep_time)\n                        # Add context suggestion to reader's memory and break inner loop to get more context\n                        self.reader.add_to_memory(\n                            \"user\",\n                            f\"Additional context needed: {verification_result['context_suggestion']}\"\n                        )\n\n                        # clear writer's memory (verifier's memory was already cleared above)\n                        self.writer.clear_memory()\n                        \n                        break  # Break inner loop to return to reader-searcher cycle\n                    else:\n                        self.visualizer.update('verifier', f\"Content is not good enough (rejection {verifier_rejection_count}/{self.max_verifier_rejections}), handing back to writer...\")\n                        if self.test_mode != \"context_print\":\n                            time.sleep(self.status_sleep_time)\n                        # Add improvement suggestion to writer's memory and continue inner loop\n                        self.writer.add_to_memory(\n      
                      \"user\",\n                            f\"Please improve the docstring based on this suggestion: {verification_result['suggestion']}\"\n                        )\n                        # Continue inner loop to generate new docstring\n\n    def _update_context(self, search_results: Dict[str, Any], token_consume_focal: int) -> None:\n        \"\"\"Update the context with new search results by merging content within existing XML tags.\n        \n        Args:\n            search_results: Dictionary containing new context information structured as:\n                {\n                    'internal': {\n                        'calls': {\n                            'class': {'class1': 'content1', ...},\n                            'function': {'func1': 'content1', ...},\n                            'method': {'method1': 'content1', ...},\n                        },\n                        'called_by': ['code snippet1', ...]\n                    },\n                    'external': {\n                        'query1': 'result1',\n                        'query2': 'result2'\n                    }\n                }\n            token_consume_focal: Number of tokens consumed by the focal component itself\n        \"\"\"\n        if not self.context:\n            # Initialize empty context structure if none exists\n            self.context = \"\"\"<CONTEXT>\n<INTERNAL_INFO>\n<CLASS>\n</CLASS>\n<FUNCTION>\n</FUNCTION>\n<METHOD>\n</METHOD>\n<CALL_BY>\n</CALL_BY>\n</INTERNAL_INFO>\n<EXTERNAL_RETRIEVAL_INFO>\n</EXTERNAL_RETRIEVAL_INFO>\n</CONTEXT>\"\"\"\n\n        # Helper function to safely update XML content. It is defined before the\n        # branches below so that the 'called_by' and 'external' updates can use it\n        # even when 'internal' or 'calls' is missing from search_results.\n        def update_xml_section(tag: str, content_list: list) -> None:\n            if not content_list:\n                return\n    
        pattern = f'<{tag}>(.*?)</{tag}>'\n            match = re.search(pattern, self.context, re.DOTALL)\n            if not match:\n                # If pattern doesn't exist, something is wrong with context structure\n                return\n            existing_text = match.group(1).strip()\n            new_content = existing_text + \"\\n\" + \"\\n\".join(content_list) if existing_text else \"\\n\".join(content_list)\n            # Escape backslashes in new_content to prevent regex interpretation issues\n            new_content = new_content.replace('\\\\', '\\\\\\\\')\n            self.context = re.sub(pattern, f'<{tag}>\\n{new_content}\\n</{tag}>', self.context, flags=re.DOTALL)\n\n        if 'internal' in search_results:\n            internal_info = search_results['internal']\n            \n            # Handle calls (class, function, method)\n            if 'calls' in internal_info:\n                calls = internal_info['calls']\n                \n                # Update class calls\n                if 'class' in calls:\n                    class_content = [f\"<{class_name}>{content}</{class_name}>\" for class_name, content in calls['class'].items()]\n                    update_xml_section('CLASS', class_content)\n\n                # Update function calls\n                if 'function' in calls:\n                    func_content = [f\"<{func_name}>{content}</{func_name}>\" for func_name, content in calls['function'].items()]\n                    update_xml_section('FUNCTION', func_content)\n\n                # Update method calls\n                if 'method' in calls:\n                    method_content = [f\"<{method_name}>{content}</{method_name}>\" for method_name, content in calls['method'].items()]\n                    update_xml_section('METHOD', method_content)\n\n            # Update called_by\n            if 'called_by' in internal_info:\n                called_by_content = internal_info['called_by']\n                update_xml_section('CALL_BY', called_by_content)\n\n        # Update external info\n        if 'external' in search_results:\n            external_content = []\n            for query, result in 
search_results['external'].items():\n                external_content.append(f\"<QUERY>{query}</QUERY>\")\n                external_content.append(f\"<r>{result}</r>\")\n            update_xml_section('EXTERNAL_RETRIEVAL_INFO', external_content) \n        \n        # Apply context length constraint for all models\n        if hasattr(self, 'config') and 'max_input_tokens' in self.config:\n            max_input_tokens = self.config.get('max_input_tokens', 10000)\n        else:\n            max_input_tokens = 10000  # Default fallback\n            \n        self._constrain_context_length(max_input_tokens=max_input_tokens, token_consume_focal=token_consume_focal)\n    \n    def _constrain_context_length(self, max_input_tokens: int = 10000, token_consume_focal: int = 0) -> None:\n        \"\"\"Constrain context length for models by truncating the longest component.\n        \n        Args:\n            max_input_tokens: Maximum number of tokens allowed in the input context\n            token_consume_focal: Number of tokens consumed by the focal component itself\n        \"\"\"\n        try:\n            # Use tiktoken to count tokens\n            encoding = tiktoken.get_encoding(\"cl100k_base\")  # Using a common encoding\n            current_tokens = len(encoding.encode(self.context))\n            \n            # Check if we need to truncate considering both context and focal component tokens\n            if current_tokens + token_consume_focal <= max_input_tokens:\n                return  # No need to truncate\n            \n            # Find the XML section with the most tokens to truncate\n            component_tokens = {}\n            components = [\n                ('CODE_CONTEXT', r'<CODE_CONTEXT>(.*?)</CODE_CONTEXT>'),\n                ('FOCAL_COMPONENT', r'<FOCAL_COMPONENT>(.*?)</FOCAL_COMPONENT>'),\n                ('RELATED_COMPONENTS', r'<RELATED_COMPONENTS>(.*?)</RELATED_COMPONENTS>'),\n                ('FOCAL_DEPENDENCIES', 
r'<FOCAL_DEPENDENCIES>(.*?)</FOCAL_DEPENDENCIES>'),\n                ('EXTERNAL_RETRIEVAL_INFO', r'<EXTERNAL_RETRIEVAL_INFO>(.*?)</EXTERNAL_RETRIEVAL_INFO>')\n            ]\n            \n            for name, pattern in components:\n                match = re.search(pattern, self.context, re.DOTALL)\n                if match:\n                    content = match.group(1)\n                    tokens = len(encoding.encode(content))\n                    component_tokens[name] = (content, tokens)\n            \n            # Find the component with the most tokens\n            if not component_tokens:\n                return  # No components found\n                \n            longest_component = max(component_tokens.items(), key=lambda x: x[1][1])\n            component_name = longest_component[0]\n            content = longest_component[1][0]\n            component_token_count = longest_component[1][1]\n            \n            # Calculate tokens to remove, considering focal component\n            tokens_to_remove = current_tokens + token_consume_focal - max_input_tokens\n            \n            if tokens_to_remove <= 0:\n                return  # No need to truncate\n                \n            # Print information about truncation\n            print(f\"Truncating {component_name}: removing {tokens_to_remove} tokens from {component_token_count} tokens. 
 Current total: {current_tokens} tokens\")\n                \n            if tokens_to_remove >= component_token_count:\n                # If removing the entire component isn't enough, we'll just remove it and deal with the rest later\n                new_content = \"\"\n            else:\n                # Truncate the content by removing tokens from the end\n                encoded_content = encoding.encode(content)\n                truncated_encoded = encoded_content[:-tokens_to_remove]\n                new_content = encoding.decode(truncated_encoded)\n            \n            # Update the context with truncated content.\n            # A callable replacement is used so that backslashes in new_content\n            # are inserted literally instead of being parsed as regex escapes.\n            pattern = f'<{component_name}>(.*?)</{component_name}>'\n            replacement = f'<{component_name}>\\n{new_content}\\n</{component_name}>'\n            self.context = re.sub(pattern, lambda _match: replacement, self.context, flags=re.DOTALL)\n            \n        except Exception as e:\n            print(f\"Error constraining context length: {e}\") \n        "
  },
  {
    "path": "src/agent/reader.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom dataclasses import dataclass\nfrom enum import Enum\nfrom typing import Any, Dict, List, Optional, Tuple\n\nfrom .base import BaseAgent\n\n\nclass CodeComponentType(Enum):\n    \"\"\"Enum for different types of code components.\"\"\"\n\n    FUNCTION = \"function\"\n    METHOD = \"method\"\n    CLASS = \"class\"\n\n\n@dataclass\nclass InformationRequest:\n    \"\"\"Data class for structured information requests.\"\"\"\n\n    internal_requests: List[str]\n    external_requests: List[str]\n\n\nclass Reader(BaseAgent):\n    \"\"\"Agent responsible for determining if more context is needed for docstring generation.\"\"\"\n\n    def __init__(self, config_path: Optional[str] = None):\n        \"\"\"Initialize the Reader agent.\n\n        Args:\n            config_path: Optional path to the configuration file\n        \"\"\"\n        super().__init__(\"Reader\", config_path)\n        self.system_prompt = \"\"\"You are a Reader agent responsible for determining if more context\n        is needed to generate a high-quality docstring. You should analyze the code component and\n        current context to make this determination.\n\n        You have access to two types of information sources:\n\n        1. Internal Codebase Information (from local code repository):\n            For Functions:\n            - Code components called within the function body\n            - Places where this function is called\n\n            For Methods:\n            - Code components called within the method body\n            - Places where this method is called\n            - The class this method belongs to\n\n            For Classes:\n            - Code components called in the __init__ method\n            - Places where this class is instantiated\n            - Complete class implementation beyond __init__\n\n        2. External Open Internet retrieval Information:\n            - External Retrieval is extremely expensive. 
Only request external open internet retrieval information if the component involves a novel, state of the art, recently-proposed algorithms or techniques.\n              (e.g. computing a novel loss function (NDCG Loss, Alignment and Uniformity Loss, etc), certain novel metrics (Cohen's Kappa, etc), specialized novel ideas)\n            - Each query should be a clear, natural language question\n\n        Your response should:\n        1. First provide a free text analysis of the current code and context\n        2. Explain what additional information might be needed (if any)\n        3. Include an <INFO_NEED>true</INFO_NEED> tag if more information is needed,\n           or <INFO_NEED>false</INFO_NEED> if current context is sufficient\n        4. If more information is needed, end your response with a structured request in XML format:\n\n        <REQUEST>\n            <INTERNAL>\n                <CALLS>\n                    <CLASS>class1,class2</CLASS>\n                    <FUNCTION>func1,func2</FUNCTION>\n                    <METHOD>self.method1,instance.method2,class.method3</METHOD>\n                </CALLS>\n                <CALL_BY>true/false</CALL_BY>\n            </INTERNAL>\n            <RETRIEVAL>\n                <QUERY>query1,query2</QUERY>\n            </RETRIEVAL>\n        </REQUEST>\n\n        Important rules for structured request:\n        1. For CALLS sections, only include names that are explicitly needed\n        2. If no items exist for a category, use empty tags (e.g., <CLASS></CLASS>)\n        3. CALL_BY should be \"true\" only if you need to know what calls/uses a component\n        4. Each external QUERY should be a concise, clear, natural language search query\n        5. Use comma-separated values without spaces for multiple items\n        6. For METHODS, keep dot notation in the same format as the input.\n        7. Only first-level calls of the focal code component are accessible. 
Do not request information on code components that are not directly called by the focal component.\n        8. External Open-Internet Retrieval is extremely expensive. Only request external open internet retrieval information if the component involves a novel, state of the art, recently-proposed algorithms or techniques.\n              (e.g. computing a novel loss function (NDCG Loss, Alignment and Uniformity Loss, etc), certain novel metrics (Cohen's Kappa, etc), specialized novel ideas)\n\n\n        Important rules:\n        1. Only request internal codebase information that you think is necessary for docstring generation task. For some components that is simple and obvious, you do not need any other information for docstring generation.\n        2. External Open-Internet retrieval request is extremely expensive. Only request information that you think is absolutely necessary for docstring generation task.\n\n        <Example_response>\n        The current code shows a database connection function. To write a comprehensive docstring, we need to understand:\n        1. Where this function is called - this will reveal the expected input patterns and common use cases\n        2. 
What internal database functions it relies on - this will help document any dependencies or prerequisites\n\n        This additional context is necessary because database connections often have specific setup requirements and usage patterns that should be documented for proper implementation.\n\n        <INFO_NEED>true</INFO_NEED>\n\n        <REQUEST>\n            <INTERNAL>\n                <CALLS>\n                    <CLASS></CLASS>\n                    <FUNCTION>execute_query,connect_db</FUNCTION>\n                    <METHOD>self.process_data,data_processor._internal_process</METHOD>\n                </CALLS>\n                <CALL_BY>true</CALL_BY>\n            </INTERNAL>\n            <RETRIEVAL>\n                <QUERY></QUERY>\n            </RETRIEVAL>\n        </REQUEST>\n\n        </Example_response>\n\n        Keep in mind that:\n\n        3. You do not need to generate docstring for the component. Just determine if more information is needed.\n        \"\"\"\n        self.add_to_memory(\"system\", self.system_prompt)\n\n    def process(self, focal_component: str, context: str = \"\") -> str:\n        \"\"\"Process the input and determine if more context is needed.\n\n        Args:\n            focal_component: The code component needing a docstring (full code snippet)\n            context: Current context information (if any)\n\n        Returns:\n            A string containing the analysis and <INFO_NEED> tag indicating if more information is needed\n        \"\"\"\n        # Add the current task to memory\n        task_description = f\"\"\"\n        <context>\n        Current context:\n        {context if context else 'No context provided yet.'}\n        </context>\n\n        <component>\n        Analyze the following code component:\n\n        {focal_component}\n        </component>\n        
\"\"\"\n        self.add_to_memory(\"user\", task_description)\n\n        # Generate response using LLM\n        response = self.generate_response()\n        return response\n"
  },
  {
    "path": "src/agent/searcher.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, List, Any, Optional\nfrom .base import BaseAgent\nfrom .reader import InformationRequest\nfrom .tool.internal_traverse import ASTNodeAnalyzer  # Updated import to use only ASTNodeAnalyzer\nfrom .tool.perplexity_api import PerplexityAPI, PerplexityResponse\nimport re\nfrom dataclasses import dataclass, field\nimport xml.etree.ElementTree as ET\nfrom io import StringIO\nimport ast  # Keep for type annotations\n\n@dataclass\nclass ParsedInfoRequest:\n    \"\"\"Structured format for parsed information requests.\n    \n    Attributes:\n        internal_requests: Dictionary containing:\n            - call: Dictionary with keys 'class', 'function', 'method', each containing\n                   a list of code component names that are called\n            - call_by: Boolean indicating if caller information is needed\n        external_requests: List of query strings for external information search\n    \"\"\"\n    internal_requests: Dict[str, Any] = field(default_factory=lambda: {\n        'call': {\n            'class': [],\n            'function': [], \n            'method': []\n        },\n        'call_by': False\n    })\n    external_requests: List[str] = field(default_factory=list)\n\nclass Searcher(BaseAgent):\n    \"\"\"Agent responsible for gathering requested information from internal and external sources.\"\"\"\n    \n    def __init__(self, repo_path: str, config_path: Optional[str] = None):\n        \"\"\"Initialize the Searcher agent.\n        \n        Args:\n            repo_path: Path to the repository being analyzed\n            config_path: Optional path to the configuration file\n        \"\"\"\n        super().__init__(\"Searcher\", config_path=config_path)\n        self.repo_path = repo_path\n        self.ast_analyzer = ASTNodeAnalyzer(repo_path)\n\n    def process(\n        self, \n        reader_response: str, \n        ast_node: ast.AST,\n        ast_tree: 
ast.AST,\n        dependency_graph: Dict[str, List[str]],\n        focal_node_dependency_path: str\n    ) -> Dict[str, Any]:\n        \"\"\"Process the reader's response and gather the requested information.\n        \n        Args:\n            reader_response: Response from the Reader agent containing\n                           information requests in structured XML format\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_graph: Dictionary mapping component paths to their dependencies\n            focal_node_dependency_path: Dependency path of the focal component\n                        \n        Returns:\n            A dictionary containing the gathered information, structured as:\n            {\n                'internal': {\n                    'calls': {\n                        'class': {'class1': 'content1', 'class2': 'content2', ...},\n                        'function': {'func1': 'content1', 'func2': 'content2', ...},\n                        'method': {'method1': 'content1', 'method2': 'content2', ...}\n                    },\n                    'called_by': ['code snippet1', 'code snippet2', ...]\n                },\n                'external': {\n                    'query1': 'result1',\n                    'query2': 'result2'\n                }\n            }\n        \"\"\"\n        # Parse the reader's response into structured format\n        parsed_request = self._parse_reader_response(reader_response)\n\n        # Gather internal information using dependency graph and AST analyzer\n        internal_info = self._gather_internal_info(\n            ast_node,\n            ast_tree,\n            focal_node_dependency_path,\n            dependency_graph,\n            parsed_request\n        )\n\n        # Gather external information using Perplexity API\n        external_info = self._gather_external_info(parsed_request.external_requests)\n        \n       
 return {\n            'internal': internal_info,\n            'external': external_info\n        }\n\n    def _parse_reader_response(self, reader_response: str) -> ParsedInfoRequest:\n        \"\"\"Parse the reader's structured XML response.\n        \n        Args:\n            reader_response: Response from Reader agent containing XML\n            \n        Returns:\n            ParsedInfoRequest object containing structured requests\n        \"\"\"\n        # Extract the XML content between REQUEST tags\n        xml_match = re.search(r'<REQUEST>(.*?)</REQUEST>', \n                            reader_response, re.DOTALL)\n        if not xml_match:\n            # Return empty request if no valid XML found\n            return ParsedInfoRequest()\n            \n        xml_content = f'<REQUEST>{xml_match.group(1)}</REQUEST>'\n        \n        try:\n            # Parse XML\n            root = ET.fromstring(xml_content)\n            \n            # Parse internal requests\n            internal = root.find('INTERNAL')\n            calls = internal.find('CALLS')\n            internal_requests = {\n                'call': {\n                    'class': self._parse_comma_list(calls.find('CLASS').text),\n                    'function': self._parse_comma_list(calls.find('FUNCTION').text),\n                    'method': self._parse_comma_list(calls.find('METHOD').text)\n                },\n                'call_by': internal.find('CALL_BY').text.lower() == 'true'\n            }\n            \n            # Parse external requests\n            external = root.find('RETRIEVAL')\n            external_requests = self._parse_comma_list(external.find('QUERY').text)\n            \n            return ParsedInfoRequest(\n                internal_requests=internal_requests,\n                external_requests=external_requests\n            )\n            \n        except (ET.ParseError, AttributeError) as e:\n            print(f\"Error parsing XML: {e}\")\n            # Return empty 
request if XML parsing fails\n            return ParsedInfoRequest()\n    \n    def _parse_comma_list(self, text: str | None) -> List[str]:\n        \"\"\"Parse comma-separated text into list of strings.\n        \n        Args:\n            text: Comma-separated text or None\n            \n        Returns:\n            List of non-empty strings\n        \"\"\"\n        if not text:\n            return []\n        return [item.strip() for item in text.split(',') if item.strip()]\n\n    def _gather_internal_info(\n        self, \n        ast_node: ast.AST,\n        ast_tree: ast.AST,\n        focal_dependency_path: str,\n        dependency_graph: Dict[str, List[str]],\n        parsed_request: ParsedInfoRequest\n    ) -> Dict[str, Any]:\n        \"\"\"Gather internal information using the dependency graph and AST analyzer.\n        \n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            focal_dependency_path: Dependency path of the focal component\n            dependency_graph: Dictionary mapping component paths to their dependencies\n            parsed_request: Structured format of information requests\n            \n        Returns:\n            Dictionary containing gathered internal information structured as:\n            {\n                'calls': {\n                    'class': {'class_name': 'code_content', ...},\n                    'function': {'function_name': 'code_content', ...},\n                    'method': {'method_name': 'code_content', ...}\n                },\n                'called_by': ['code_snippet1', 'code_snippet2', ...]\n            }\n        \"\"\"\n        result = {\n            'calls': {\n                'class': {},\n                'function': {},\n                'method': {}\n            },\n            'called_by': []\n        }\n        \n        # Get dependencies of the focal component from the dependency graph\n        
component_dependencies = dependency_graph.get(focal_dependency_path, [])\n        \n        # Process class dependencies\n        if parsed_request.internal_requests['call']['class']:\n            requested_classes = parsed_request.internal_requests['call']['class']\n            for dependency_path in component_dependencies:\n                # Check if this is a class dependency by looking at capitalization of the last part\n                path_parts = dependency_path.split('.')\n                if path_parts and path_parts[-1][0].isupper():\n                    # This looks like a class dependency\n                    class_name = path_parts[-1]\n                    \n                    # Check if this class is in the requested classes\n                    # Use flexible matching for partial class names or with prefixes\n                    for requested_class in requested_classes:\n                        # Match by exact name, or as part of a path\n                        if (requested_class == class_name or \n                            requested_class in dependency_path or \n                            class_name.endswith(requested_class)):\n                            \n                            # Get the class initialization code\n                            class_code = self.ast_analyzer.get_component_by_path(\n                                ast_node, \n                                ast_tree, \n                                dependency_path\n                            )\n                            \n                            if class_code:\n                                result['calls']['class'][requested_class] = class_code\n                                break\n        \n        # Process function dependencies\n        if parsed_request.internal_requests['call']['function']:\n            requested_functions = parsed_request.internal_requests['call']['function']\n            for dependency_path in component_dependencies:\n                # 
Check if this is likely a function (last part starts with lowercase)\n                path_parts = dependency_path.split('.')\n                if path_parts and path_parts[-1][0].islower():\n                    # This looks like a function or method, differentiate by checking if it's in a class\n                    # If the second-to-last part starts with uppercase, it's likely a method\n                    if len(path_parts) >= 2 and path_parts[-2][0].isupper():\n                        # This is likely a method, skip for now\n                        continue\n                        \n                    function_name = path_parts[-1]\n                    \n                    # Check if this function is in the requested functions\n                    for requested_function in requested_functions:\n                        # Match by exact name, or as part of a path\n                        if (requested_function == function_name or \n                            requested_function in dependency_path or \n                            function_name.endswith(requested_function)):\n                            \n                            # Get the function code\n                            function_code = self.ast_analyzer.get_component_by_path(\n                                ast_node, \n                                ast_tree, \n                                dependency_path\n                            )\n                            \n                            if function_code:\n                                result['calls']['function'][requested_function] = function_code\n                                break\n        \n        # Process method dependencies\n        if parsed_request.internal_requests['call']['method']:\n            requested_methods = parsed_request.internal_requests['call']['method']\n            for dependency_path in component_dependencies:\n                # Check if this is likely a method (part after a part that starts with 
uppercase)\n                path_parts = dependency_path.split('.')\n                if len(path_parts) >= 2 and path_parts[-1][0].islower() and path_parts[-2][0].isupper():\n                    method_name = path_parts[-1]\n                    class_name = path_parts[-2]\n                    full_method_name = f\"{class_name}.{method_name}\"\n                    \n                    # Check if this method is in the requested methods\n                    for requested_method in requested_methods:\n                        # Match by exact name, class.method, or just method name\n                        if (requested_method == full_method_name or \n                            requested_method == method_name or \n                            requested_method in dependency_path or\n                            method_name.endswith(requested_method)):\n                            \n                            # Get the method code\n                            method_code = self.ast_analyzer.get_component_by_path(\n                                ast_node, \n                                ast_tree, \n                                dependency_path\n                            )\n                            \n                            if method_code:\n                                result['calls']['method'][requested_method] = method_code\n                                break\n        \n        # Handle call_by (what calls this component)\n        if parsed_request.internal_requests['call_by']:\n            parent_components = self.ast_analyzer.get_parent_components(\n                ast_node, \n                ast_tree, \n                focal_dependency_path,\n                dependency_graph\n            )\n            \n            if parent_components:\n                result['called_by'].extend(parent_components)\n            else:\n                result['called_by'].append(\"This component is never called by any other component.\")\n        \n        return 
result\n\n    def _gather_external_info(self, queries: List[str]) -> Dict[str, str]:\n        \"\"\"Gather external information using Perplexity API.\n        \n        Args:\n            queries: List of search queries\n            \n        Returns:\n            Dictionary mapping queries to their responses\n        \"\"\"\n        if not queries:\n            return {}\n            \n        try:\n            perplexity = PerplexityAPI()\n            responses = perplexity.batch_query(\n                questions=queries,\n                system_prompt=\"You are a helpful assistant providing concise and accurate information about programming concepts and code. Focus on technical accuracy and clarity.\",\n                temperature=0.1\n            )\n            \n            # Create mapping of queries to responses\n            results = {}\n            for query, response in zip(queries, responses):\n                if response is not None:\n                    results[query] = response.content\n                else:\n                    results[query] = \"Error: Failed to get response from Perplexity API\"\n                    \n            return results\n            \n        except Exception as e:\n            print(f\"Error using Perplexity API: {str(e)}\")\n            return {query: f\"Error: {str(e)}\" for query in queries}"
  },
  {
    "path": "src/agent/tool/README.md",
    "content": "# AST Call Graph Analysis Tool\n\nThis tool provides functionality to analyze Python codebases by building and querying call graphs using Abstract Syntax Tree (AST) parsing. It helps in understanding code relationships and dependencies between functions, methods, and classes.\n\n## Features\n\n### Call Graph Building\n- Automatically builds a complete call graph for a Python repository\n- Tracks relationships between functions, methods, and classes\n- Handles cross-file dependencies\n- Caches AST parsing results for better performance\n\n### Code Component Analysis\n\nThe tool provides six main functionalities for analyzing code relationships:\n\n1. **Child Function Analysis** (`get_child_function`)\n   - Input: Component signature, file path, and child function name\n   - Output: Full code of the function being called\n   - Use case: Finding implementation of functions called within your code\n\n2. **Child Method Analysis** (`get_child_method`)\n   - Input: Component signature, file path, and child method name\n   - Output: Full code of the method being called\n   - Use case: Finding implementation of methods called on objects\n\n3. **Child Class Analysis** (`get_child_class`)\n   - Input: Component signature, file path, and child class name\n   - Output: Class signature and initialization code\n   - Use case: Finding class definitions for instantiated objects\n\n4. **Parent Function Analysis** (`get_parent_function`)\n   - Input: Component signature, file path, and parent function name\n   - Output: Full code of the function that calls the component\n   - Use case: Finding where a function is being used\n\n5. **Parent Method Analysis** (`get_parent_method`)\n   - Input: Component signature, file path, and parent method name\n   - Output: Full code of the method that calls the component\n   - Use case: Finding where a method is being called\n\n6. 
**Parent Class Analysis** (`get_parent_class`)\n   - Input: Component signature, file path, and parent class name\n   - Output: Full code of the class that uses the component\n   - Use case: Finding classes that depend on other classes\n\n## Usage Example\n\n```python\nfrom agent.tool.ast import CallGraphBuilder\n\n# Initialize the builder with the repository path; every .py file under it is parsed up front\nbuilder = CallGraphBuilder(\"/path/to/repo\")\n\n# Get the code of `main_function`, a function that calls `process_data`\nparent_code = builder.get_parent_function(\n    \"def process_data(self):\",\n    \"src/data/processor.py\",\n    \"main_function\"\n)\n\n# Get the code of `transform_data`, a method called from within `DataProcessor`\nchild_code = builder.get_child_method(\n    \"class DataProcessor:\",\n    \"src/data/processor.py\",\n    \"transform_data\"\n)\n```\n\n## Implementation Details\n\n- Uses Python's built-in `ast` module for code parsing\n- Maintains parent-child relationships in AST nodes\n- Handles various Python constructs:\n  - Function definitions and calls\n  - Class definitions and instantiations\n  - Method calls (both direct and through objects)\n  - Static methods\n  - Internal methods\n  - Cross-file dependencies\n\n## Limitations\n\n- Currently supports only Python files\n- Requires valid Python syntax in source files\n- Does not handle dynamic code execution (`eval`, `exec`)\n- Method resolution is name-based (does not handle complex inheritance)\n- Does not track calls through variables or complex expressions"
  },
  {
    "path": "src/agent/tool/ast.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport ast\nfrom typing import Dict, List, Optional, Set, Tuple, Union\nfrom pathlib import Path\nimport os\nfrom abc import ABC, abstractmethod\n\nclass ASTUtility(ABC):\n    \"\"\"Abstract base class for AST utilities.\"\"\"\n    \n    @abstractmethod\n    def _get_component_name_from_code(self, code_snippet: str) -> Optional[str]:\n        \"\"\"Extract component name from a code snippet.\n        \n        Args:\n            code_snippet (str): The full code snippet of a function/method/class\n            \n        Returns:\n            Optional[str]: The name of the component if found, None otherwise\n            \n        Example:\n            >>> builder = CallGraphBuilder(\"repo_path\")\n            >>> builder._get_component_name_from_code(\"def process_data(self):\\\\n    return data\")\n            'process_data'\n            >>> builder._get_component_name_from_code(\"class DataProcessor:\\\\n    def __init__(self):\")\n            'DataProcessor'\n        \"\"\"\n        pass\n\n    def _is_code_similar(self, code1: str, code2: str, threshold: float = 0.9) -> bool:\n        \"\"\"Check if two code snippets are similar using fuzzy matching.\n        \n        Args:\n            code1 (str): First code snippet\n            code2 (str): Second code snippet\n            threshold (float): Similarity threshold (0.0 to 1.0). 
Default is 0.9\n            \n        Returns:\n            bool: True if similarity score is above threshold\n        \"\"\"\n        # Special handling for class components\n        if code1.lstrip().startswith('class ') and code2.lstrip().startswith('class '):\n            # For classes, just compare the class names\n            class1_name = self._get_component_name_from_code(code1)\n            class2_name = self._get_component_name_from_code(code2)\n            return class1_name == class2_name\n\n        # Normalize whitespace and remove empty lines\n        def normalize(code: str) -> str:\n            return '\\n'.join(line.strip() for line in code.split('\\n') if line.strip())\n            \n        code1_norm = normalize(code1)\n        code2_norm = normalize(code2)\n        \n        # Simple length-based early check\n        if abs(len(code1_norm) - len(code2_norm)) / max(len(code1_norm), len(code2_norm)) > (1 - threshold):\n            return False\n            \n        # Character-based similarity score\n        matches = sum(a == b for a, b in zip(code1_norm, code2_norm))\n        similarity = matches / max(len(code1_norm), len(code2_norm))\n        \n        return similarity >= threshold\n\ndef _get_component_name_from_code(code_snippet: str) -> Optional[str]:\n    \"\"\"Extract component name from a code snippet.\n    \n    Args:\n        code_snippet (str): The full code snippet of a function/method/class\n        \n    Returns:\n        Optional[str]: The name of the component if found, None otherwise\n        \n    Example:\n        >>> _get_component_name_from_code(\"def process_data(self):\\\\n    return data\")\n        'process_data'\n        >>> _get_component_name_from_code(\"class DataProcessor:\\\\n    def __init__(self):\")\n        'DataProcessor'\n    \"\"\"\n    # Remove leading whitespace and get first line\n    first_line = code_snippet.lstrip().split('\\n')[0]\n    \n    # Check if it's a class\n    if 
first_line.startswith('class '):\n        # Find the class name - it's between 'class ' and either '(' or ':'\n        class_decl = first_line[6:].strip()  # Remove 'class ' prefix\n        class_name = class_decl.split('(')[0].split(':')[0].strip()\n        return class_name\n        \n    # Check if it's a function/method\n    elif first_line.startswith('def '):\n        # Find the function name - it's between 'def ' and '('\n        func_decl = first_line[4:].strip()  # Remove 'def ' prefix\n        func_name = func_decl.split('(')[0].strip()\n        return func_name\n    \n    return None\n\nclass ParentNodeTransformer(ast.NodeTransformer):\n    \"\"\"AST transformer that adds parent references to each node.\"\"\"\n    def visit(self, node):\n        for child in ast.iter_child_nodes(node):\n            child.parent = node\n        return super().visit(node)\n\nclass CallGraphBuilder(ASTUtility):\n    \"\"\"A class to build and analyze call graphs for Python code.\n    \n    This class helps analyze function calls, method calls, and class relationships\n    within a Python repository.\n    \"\"\"\n    \n    def __init__(self, repo_path: str):\n        \"\"\"Initialize the CallGraphBuilder with a repository path.\n        \n        Args:\n            repo_path (str): Path to the Python repository to analyze\n        \"\"\"\n        self.repo_path = Path(repo_path)\n        self.call_graph = {}\n        self.class_info = {}\n        self.method_info = {}\n        self.function_info = {}\n        self.file_asts = {}\n        self._build_call_graph()\n    \n    def _parse_file(self, file_path: str) -> ast.AST:\n        \"\"\"Parse a Python file and return its AST.\n        \n        Args:\n            file_path (str): Path to the file relative to repo_path\n        \"\"\"\n        if file_path in self.file_asts:\n            return self.file_asts[file_path]\n        \n        # Construct absolute path by joining repo_path with file_path\n        abs_path = 
self.repo_path / file_path\n        \n        with open(abs_path) as f:\n            content = f.read()\n        tree = ast.parse(content)\n        # Add parent references\n        transformer = ParentNodeTransformer()\n        tree = transformer.visit(tree)\n        self.file_asts[file_path] = tree\n        return tree\n\n    def _get_signature_from_code(self, code: str, is_class: bool = False) -> str:\n        \"\"\"Extract signature from code.\n        For functions/methods: signature ends with first ':' after first matching ')'\n        For classes: signature is the class definition line ending with ':'\"\"\"\n        lines = code.split('\\n')\n        first_line = lines[0].strip()\n        \n        if is_class:\n            return first_line\n            \n        # For functions/methods\n        # Find the closing parenthesis\n        paren_count = 0\n        end_paren_idx = -1\n        for i, char in enumerate(first_line):\n            if char == '(':\n                paren_count += 1\n            elif char == ')':\n                paren_count -= 1\n                if paren_count == 0:\n                    end_paren_idx = i\n                    break\n                    \n        if end_paren_idx == -1:\n            return first_line\n            \n        # Find the first : after the closing parenthesis\n        colon_idx = first_line.find(':', end_paren_idx)\n        if colon_idx == -1:\n            return first_line\n            \n        return first_line[:colon_idx+1]\n\n    def _get_node_code(self, file_path: str, node: ast.AST) -> str:\n        \"\"\"Get the source code for a node.\n        \n        Args:\n            file_path (str): Path to the file relative to repo_path\n            node (ast.AST): The AST node to get code for\n        \"\"\"\n        abs_path = self.repo_path / file_path\n        with open(abs_path) as f:\n            content = f.readlines()\n        return ''.join(content[node.lineno-1:node.end_lineno])\n\n    def 
_is_method(self, node: ast.FunctionDef) -> bool:\n        \"\"\"Check if a function definition is a method.\"\"\"\n        parent = getattr(node, 'parent', None)\n        while parent is not None:\n            if isinstance(parent, ast.ClassDef):\n                return True\n            parent = getattr(parent, 'parent', None)\n        return False\n\n    def _build_call_graph(self):\n        \"\"\"Build the complete call graph for the repository.\"\"\"\n        for root, _, files in os.walk(self.repo_path):\n            for file in files:\n                if not file.endswith('.py'):\n                    continue\n                \n                abs_file_path = Path(root) / file\n                # Convert absolute path to relative path\n                rel_file_path = str(abs_file_path.relative_to(self.repo_path))\n                tree = self._parse_file(rel_file_path)\n                \n                for node in ast.walk(tree):\n                    if isinstance(node, ast.ClassDef):\n                        # Store class info\n                        class_code = self._get_node_code(rel_file_path, node)\n                        self.class_info[(rel_file_path, class_code)] = node\n                        \n                        # Store method info\n                        for item in node.body:\n                            if isinstance(item, ast.FunctionDef):\n                                method_code = self._get_node_code(rel_file_path, item)\n                                self.method_info[(rel_file_path, method_code)] = item\n                                \n                    elif isinstance(node, ast.FunctionDef):\n                        if not self._is_method(node):\n                            # Store function info\n                            func_code = self._get_node_code(rel_file_path, node)\n                            self.function_info[(rel_file_path, func_code)] = node\n\n    def _get_component_name_from_code(self, code_snippet: str) 
-> Optional[str]:\n        \"\"\"Extract component name from a code snippet.\n        \n        Args:\n            code_snippet (str): The full code snippet of a function/method/class\n            \n        Returns:\n            Optional[str]: The name of the component if found, None otherwise\n        \"\"\"\n        return _get_component_name_from_code(code_snippet)\n\n    def get_child_function(self, code_component: str, file_path: str, child_function: str) -> Optional[str]:\n        \"\"\"Get the code of a child function that is called by the component.\n        \n        Args:\n            code_component (str): The full code snippet of the calling component. This is used to\n                                uniquely identify the component in case of name collisions.\n            file_path (str): Path to the file containing the component\n            child_function (str): Name of the function being called\n            \n        Returns:\n            Optional[str]: The code of the child function if found, None otherwise\n            \n        Example:\n            >>> builder = CallGraphBuilder(\"repo_path\")\n            >>> builder.get_child_function(\n            ...     \"def main_function():\\\\n    result = utility_function()\\\\n    return result\",\n            ...     \"main.py\",\n            ...     \"utility_function\"\n            ... 
)\n            'def utility_function():\\\\n    return \"utility\"'\n        \"\"\"\n        tree = self._parse_file(file_path)\n        target_node = None\n        \n        component_name = self._get_component_name_from_code(code_component)\n        if not component_name:\n            return None\n        \n        # Find the target node\n        for node in ast.walk(tree):\n            if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and node.name == component_name:\n                # Get the code of this node and verify it matches using fuzzy matching\n                node_code = self._get_node_code(file_path, node)\n                if self._is_code_similar(node_code, code_component):\n                    target_node = node\n                    break\n        \n        if not target_node:\n            return None\n            \n        # Look for calls to the child function\n        for node in ast.walk(target_node):\n            if isinstance(node, ast.Call):\n                if isinstance(node.func, ast.Name) and node.func.id == child_function:\n                    # Find the function definition\n                    for func_file, func_code in self.function_info:\n                        func_node = self.function_info[(func_file, func_code)]\n                        if func_node.name == child_function:\n                            return func_code\n        return None\n\n    def _resolve_instance_type(self, node: ast.AST, instance_name: str) -> Optional[str]:\n        \"\"\"Resolve the class type of an instance variable by looking at assignments.\n        \n        Args:\n            node: The AST node to start searching from (usually a function/method)\n            instance_name: The name of the instance variable to resolve\n            \n        Returns:\n            Optional[str]: The name of the class if found, None otherwise\n        \"\"\"\n        # First check local assignments in the current function/method\n        for n in ast.walk(node):\n     
       if isinstance(n, ast.Assign):\n                for target in n.targets:\n                    if isinstance(target, ast.Name) and target.id == instance_name:\n                        if isinstance(n.value, ast.Call) and isinstance(n.value.func, ast.Name):\n                            return n.value.func.id\n                            \n        # If not found locally and we're in a method, check class __init__\n        if isinstance(node, ast.FunctionDef):\n            class_node = self._get_class_node(node)\n            if class_node:\n                for method in class_node.body:\n                    if isinstance(method, ast.FunctionDef) and method.name == '__init__':\n                        for n in ast.walk(method):\n                            if isinstance(n, ast.Assign):\n                                for target in n.targets:\n                                    if isinstance(target, ast.Attribute) and \\\n                                       isinstance(target.value, ast.Name) and \\\n                                       target.value.id == 'self' and \\\n                                       target.attr == instance_name and \\\n                                       isinstance(n.value, ast.Call) and \\\n                                       isinstance(n.value.func, ast.Name):\n                                        return n.value.func.id\n        return None\n\n    def _get_class_node(self, method_node: ast.FunctionDef) -> Optional[ast.ClassDef]:\n        \"\"\"Get the ClassDef node that contains this method.\"\"\"\n        parent = getattr(method_node, 'parent', None)\n        while parent is not None:\n            if isinstance(parent, ast.ClassDef):\n                return parent\n            parent = getattr(parent, 'parent', None)\n        return None\n\n    def get_child_method(self, code_component: str, file_path: str, \n                        method_name: str, prefix: Optional[str] = None, find_all: bool = False) -> 
Union[Optional[str], Dict[str, str]]:\n        \"\"\"Get the code of a child method that is called by the component.\n        \n        Args:\n            code_component (str): The full code snippet of the calling component. This is used to\n                                uniquely identify the component in case of name collisions.\n            file_path (str): Path to the file containing the component\n            method_name (str): Name of the method being called\n            prefix (Optional[str]): Optional prefix before method name (e.g., 'self', instance name, or class name)\n            find_all (bool): Whether to find all methods with this name across classes\n            \n        Returns:\n            If find_all=False:\n                Optional[str]: The code of the child method if found, None otherwise\n            If find_all=True:\n                Dict[str, str]: Dictionary mapping class names to method code for all matching methods\n                \n        Note:\n            This method handles three types of method calls:\n            1. self.method() - method in same class\n            2. ClassName.method() - direct class method call\n            3. 
instance.method() - method call through instance variable\n            \n            If prefix is provided:\n            - If prefix is 'self': looks for method in the same class\n            - If prefix starts with uppercase: treats it as a class name\n            - If prefix starts with lowercase: treats it as an instance variable\n        \"\"\"\n        tree = self._parse_file(file_path)\n        target_node = None\n        \n        component_name = self._get_component_name_from_code(code_component)\n        if not component_name:\n            return {} if find_all else None\n        \n        # Find the target node\n        for node in ast.walk(tree):\n            if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and node.name == component_name:\n                # Get the code of this node and verify it matches using fuzzy matching\n                node_code = self._get_node_code(file_path, node)\n                if self._is_code_similar(node_code, code_component):\n                    target_node = node\n                    break\n        \n        if not target_node:\n            return {} if find_all else None\n\n        if find_all:\n            # Find all methods with this name across all classes\n            results = {}\n            for method_file, method_code in self.method_info:\n                method_node = self.method_info[(method_file, method_code)]\n                if method_node.name == method_name:\n                    class_node = self._get_class_node(method_node)\n                    if class_node:\n                        results[class_node.name] = method_code\n            return results\n            \n        # If prefix is provided, use it to narrow down the search\n        if prefix is not None:\n            target_class = None\n            \n            if prefix == 'self':\n                # Case 1: self.method()\n                target_class = self._get_class_of_method(target_node)\n            elif prefix[0].isupper():\n          
      # Case 2: ClassName.method()\n                target_class = prefix\n            else:\n                # Case 3: instance.method()\n                target_class = self._resolve_instance_type(target_node, prefix)\n                \n            if target_class:\n                for method_file, method_code in self.method_info:\n                    method_node = self.method_info[(method_file, method_code)]\n                    if method_node.name == method_name:\n                        # Verify this method belongs to the target class\n                        method_class = self._get_class_of_method(method_node)\n                        if method_class == target_class:\n                            return method_code\n                return None\n            \n        # If no prefix or target class not found, fall back to original behavior\n        # Look for method calls\n        for node in ast.walk(target_node):\n            if isinstance(node, ast.Call):\n                if isinstance(node.func, ast.Attribute) and node.func.attr == method_name:\n                    target_class = None\n                    \n                    if isinstance(node.func.value, ast.Name):\n                        if node.func.value.id == 'self':\n                            # Case 1: self.method()\n                            target_class = self._get_class_of_method(target_node)\n                        else:\n                            # Case 2: ClassName.method() or Case 3: instance.method()\n                            # Try as class name first\n                            for class_file, class_code in self.class_info:\n                                class_node = self.class_info[(class_file, class_code)]\n                                if class_node.name == node.func.value.id:\n                                    target_class = class_node.name\n                                    break\n                            \n                            # If not found as class name, 
try as instance variable\n                            if not target_class:\n                                target_class = self._resolve_instance_type(target_node, node.func.value.id)\n                    \n                    elif isinstance(node.func.value, ast.Attribute):\n                        # Handle nested attributes like self.processor.process()\n                        if isinstance(node.func.value.value, ast.Name):\n                            if node.func.value.value.id == 'self':\n                                # Get type of self.processor\n                                instance_var = node.func.value.attr\n                                target_class = self._resolve_instance_type(target_node, instance_var)\n                    \n                    # If we found the target class, find the method\n                    if target_class:\n                        for method_file, method_code in self.method_info:\n                            method_node = self.method_info[(method_file, method_code)]\n                            if method_node.name == method_name:\n                                # Verify this method belongs to the target class\n                                method_class = self._get_class_of_method(method_node)\n                                if method_class == target_class:\n                                    return method_code\n        return None\n\n    def get_child_class(self, code_component: str, file_path: str, child_class: str) -> Optional[str]:\n        \"\"\"Get the class signature and init function of a child class used by the component.\n        \n        Args:\n            code_component (str): The full code snippet of the calling component. 
This is used to\n                                uniquely identify the component in case of name collisions.\n            file_path (str): Path to the file containing the calling component\n            child_class (str): Name of the class being used\n            \n        Returns:\n            Optional[str]: The code of the child class and its __init__ if found, None otherwise\n            \n        Example:\n            >>> builder = CallGraphBuilder(\"repo_path\")\n            >>> builder.get_child_class(\n            ...     \"def main_function():\\\\n    helper = HelperClass()\\\\n    return helper.data\",\n            ...     \"main.py\",\n            ...     \"HelperClass\"\n            ... )\n            'class HelperClass:\\\\n    def __init__(self):\\\\n        self.data = []'\n        \"\"\"\n        tree = self._parse_file(file_path)\n        target_node = None\n        \n        component_name = self._get_component_name_from_code(code_component)\n        if not component_name:\n            return None\n        \n        # Find the target node\n        for node in ast.walk(tree):\n            if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and node.name == component_name:\n                # Get the code of this node and verify it matches using fuzzy matching\n                node_code = self._get_node_code(file_path, node)\n                if self._is_code_similar(node_code, code_component):\n                    target_node = node\n                    break\n        \n        if not target_node:\n            return None\n            \n        # Look for class usage\n        for node in ast.walk(target_node):\n            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):\n                if node.func.id == child_class:\n                    # Find the class definition\n                    for class_file, class_code in self.class_info:\n                        class_node = self.class_info[(class_file, class_code)]\n                     
   if class_node.name == child_class:\n                            # Get class signature and __init__\n                            init_method = None\n                            for item in class_node.body:\n                                if isinstance(item, ast.FunctionDef) and item.name == '__init__':\n                                    init_method = self._get_node_code(class_file, item)\n                                    break\n                            if init_method:\n                                return f\"{class_code}\\n{init_method}\"\n                            return class_code\n        return None\n\n    def get_child_class_init(self, code_component: str, file_path: str, child_class: str) -> Optional[str]:\n        \"\"\"Get the class signature and init function of a child class used by the component.\n        Similar to get_child_class but only returns up to the end of __init__ if it exists.\n        \n        Args:\n            code_component (str): The full code snippet of the calling component. This is used to\n                                uniquely identify the component in case of name collisions.\n            file_path (str): Path to the file containing the calling component\n            child_class (str): Name of the class being used\n            \n        Returns:\n            Optional[str]: The code of the child class up to the end of __init__ if found,\n                         or the full class code if __init__ doesn't exist, None if class not found\n            \n        Example:\n            >>> builder = CallGraphBuilder(\"repo_path\")\n            >>> builder.get_child_class_init(\n            ...     \"def main_function():\\\\n    helper = HelperClass()\\\\n    return helper.data\",\n            ...     \"main.py\",\n            ...     \"HelperClass\"\n            ... 
)\n            'class HelperClass:\\\\n    def __init__(self):\\\\n        self.data = []'\n        \"\"\"\n        # Get the full class code first using existing method\n        full_code = self.get_child_class(code_component, file_path, child_class)\n        if not full_code:\n            return None\n            \n        # Split into lines for analysis\n        lines = full_code.split('\\n')\n        \n        # Find the __init__ method\n        init_start = -1\n        for i, line in enumerate(lines):\n            if line.strip().startswith('def __init__'):\n                init_start = i\n                break\n                \n        # If no __init__, return full code\n        if init_start == -1:\n            return full_code\n            \n        # Find the next method definition after __init__\n        next_method_start = -1\n        for i, line in enumerate(lines[init_start + 1:], start=init_start + 1):\n            if line.strip().startswith('def '):\n                next_method_start = i\n                break\n                \n        # If no next method found, return up to the end\n        if next_method_start == -1:\n            return full_code\n            \n        # Return code up to the start of next method\n        return '\\n'.join(lines[:next_method_start])\n\n    def _get_class_of_method(self, method_node: ast.FunctionDef) -> Optional[str]:\n        \"\"\"Get the name of the class that contains this method.\"\"\"\n        parent = getattr(method_node, 'parent', None)\n        while parent is not None:\n            if isinstance(parent, ast.ClassDef):\n                return parent.name\n            parent = getattr(parent, 'parent', None)\n        return None\n\n    def get_parent(self, code_component: str, file_path: str, class_name: Optional[str] = None) -> List[str]:\n        \"\"\"Get the code of any components that use the focal component.\n        \n        Args:\n            code_component: String representation of the 
component\n            file_path: Path to the file containing the component\n            class_name: If the component is a method, specify its class name to avoid\n                     false matches with methods of same name in other classes\n            \n        Returns:\n            List[str]: List of code blocks of parent components that use this component\n        \"\"\"\n        results = []\n        \n        component_name = self._get_component_name_from_code(code_component)\n        if not component_name:\n            return []\n        \n        \n        tree = self._parse_file(file_path)\n        found_target = False\n        for node in ast.walk(tree):\n            if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and node.name == component_name:\n                node_code = self._get_node_code(file_path, node)\n                if self._is_code_similar(node_code, code_component):\n                    found_target = True\n                    break\n        \n        if not found_target:\n            return []\n        \n        # Check functions\n        for func_file, func_code in self.function_info:\n            func_node = self.function_info[(func_file, func_code)]\n            # Check if this function calls our component\n            for node in ast.walk(func_node):\n                if isinstance(node, ast.Call):\n                    if isinstance(node.func, ast.Name) and node.func.id == component_name:\n                        results.append(func_code)\n                        break  # Found usage in this function, move to next\n                    \n        # Check methods\n        for method_file, method_code in self.method_info:\n            method_node = self.method_info[(method_file, method_code)]\n            # Skip __init__ methods\n            if method_node.name == '__init__':\n                continue\n            # Check if this method calls our component\n            for node in ast.walk(method_node):\n                if 
isinstance(node, ast.Call):\n                    if isinstance(node.func, ast.Attribute) and node.func.attr == component_name:\n                        # If class_name is specified, verify the method belongs to that class\n                        if class_name:\n                            # Get the class of the target method\n                            target_class = None\n                            if isinstance(node.func.value, ast.Name):\n                                # For self.method() calls\n                                if node.func.value.id == 'self':\n                                    target_class = self._get_class_of_method(method_node)\n                                # For ClassName.method() calls\n                                else:\n                                    target_class = node.func.value.id\n                            # For instance.method() calls through instance variables\n                            elif isinstance(node.func.value, ast.Attribute):\n                                # Try to find the instance variable in __init__\n                                method_class = self._get_class_of_method(method_node)\n                                if method_class:\n                                    # Look up the class definition\n                                    for class_file, class_code in self.class_info:\n                                        class_node = self.class_info[(class_file, class_code)]\n                                        if class_node.name == method_class:\n                                            # Find __init__ method\n                                            for init_node in class_node.body:\n                                                if isinstance(init_node, ast.FunctionDef) and init_node.name == '__init__':\n                                                    # Look for assignments to this instance variable\n                                                    instance_var = 
getattr(node.func.value.value, 'id', None)  # e.g., 'self' from self.data_processor; None if the base is not a plain name\n                                                    var_name = node.func.value.attr  # e.g., 'data_processor' from self.data_processor\n                                                    if instance_var == 'self':\n                                                        for n in ast.walk(init_node):\n                                                            if isinstance(n, ast.Assign):\n                                                                for target in n.targets:\n                                                                    if isinstance(target, ast.Attribute) and \\\n                                                                       isinstance(target.value, ast.Name) and \\\n                                                                       target.value.id == 'self' and \\\n                                                                       target.attr == var_name and \\\n                                                                       isinstance(n.value, ast.Call):\n                                                                        # Found the initialization\n                                                                        if isinstance(n.value.func, ast.Name):\n                                                                            target_class = n.value.func.id\n                                                                            break\n                            if target_class == class_name:\n                                results.append(method_code)\n                        else:\n                            results.append(method_code)\n                        break  # Found usage in this method, move to next\n                    elif isinstance(node.func, ast.Name) and node.func.id == component_name:\n                        results.append(method_code)\n                        break  # Found usage in this method, move to next\n  
                      \n        # Check class __init__ methods\n        for class_file, class_code in self.class_info:\n            class_node = self.class_info[(class_file, class_code)]\n            # Look for __init__ method\n            for node in class_node.body:\n                if isinstance(node, ast.FunctionDef) and node.name == '__init__':\n                    # Check if __init__ uses our component\n                    for call_node in ast.walk(node):\n                        if isinstance(call_node, ast.Call):\n                            if isinstance(call_node.func, ast.Name) and call_node.func.id == component_name:\n                                # Get class signature and init method\n                                class_sig = self._get_node_code(class_file, class_node).split('\\n')[0]\n                                init_code = self._get_node_code(class_file, node)\n                                results.append(f\"{class_sig}\\n{init_code}\")\n                                break  # Found usage in this class, move to next\n                                \n        return results\n\n\nclass ASTNodeAnalyzer:\n    \"\"\"A class to analyze AST nodes directly without string matching.\n    \n    This class works directly with AST nodes to analyze function calls, method calls,\n    and class relationships within a Python repository, avoiding the need to re-parse\n    files that have already been parsed.\n    \"\"\"\n    \n    def __init__(self, repo_path: str):\n        \"\"\"Initialize the ASTNodeAnalyzer with a repository path.\n        \n        Args:\n            repo_path (str): Path to the Python repository to analyze\n        \"\"\"\n        self.repo_path = Path(repo_path)\n        # Build a CallGraphBuilder for this repository so its function, method,\n        # and class info can be reused by the lookups below\n        self.call_graph_builder = CallGraphBuilder(repo_path)\n        \n    def get_child_function(self, focal_node: ast.AST, file_tree: 
ast.AST, \n                          file_path: str, child_function: str) -> Optional[str]:\n        \"\"\"Get the code of a child function that is called by the component.\n        \n        Args:\n            focal_node: The AST node representing the focal component\n            file_tree: The AST tree for the entire file\n            file_path: Path to the file containing the component\n            child_function: Name of the function being called\n            \n        Returns:\n            Optional[str]: The code of the child function if found, None otherwise\n        \"\"\"\n        # Look for calls to the child function in the focal node\n        for node in ast.walk(focal_node):\n            if isinstance(node, ast.Call):\n                if isinstance(node.func, ast.Name) and node.func.id == child_function:\n                    # Find the function definition in the function_info dictionary\n                    for func_file, func_code in self.call_graph_builder.function_info:\n                        func_node = self.call_graph_builder.function_info[(func_file, func_code)]\n                        if func_node.name == child_function:\n                            return func_code\n        return None\n    \n    def get_child_method(self, focal_node: ast.AST, file_tree: ast.AST,\n                        file_path: str, method_name: str,\n                        prefix: Optional[str] = None,\n                        find_all: bool = False) -> Union[Optional[str], Dict[str, str]]:\n        \"\"\"Get the code of a child method that is called by the component.\n        \n        Args:\n            focal_node: The AST node representing the focal component\n            file_tree: The AST tree for the entire file\n            file_path: Path to the file containing the component\n            method_name: Name of the method being called\n            prefix: Optional prefix before method name (e.g., 'self', instance name, or class name)\n            find_all: Whether 
to find all methods with this name across classes\n            \n        Returns:\n            If find_all=False:\n                Optional[str]: The code of the child method if found, None otherwise\n            If find_all=True:\n                Dict[str, str]: Dictionary mapping class names to method code for all matching methods\n        \"\"\"\n        if find_all:\n            # Find all methods with this name across all classes\n            results = {}\n            for method_file, method_code in self.call_graph_builder.method_info:\n                method_node = self.call_graph_builder.method_info[(method_file, method_code)]\n                if method_node.name == method_name:\n                    class_node = self.call_graph_builder._get_class_node(method_node)\n                    if class_node:\n                        results[class_node.name] = method_code\n            return results\n        \n        # If prefix is provided, use it to narrow down the search\n        if prefix is not None:\n            target_class = None\n            \n            if prefix == 'self':\n                # Case 1: self.method()\n                target_class = self.call_graph_builder._get_class_of_method(focal_node)\n            elif prefix[0].isupper():\n                # Case 2: ClassName.method()\n                target_class = prefix\n            else:\n                # Case 3: instance.method()\n                target_class = self.call_graph_builder._resolve_instance_type(focal_node, prefix)\n                \n            if target_class:\n                for method_file, method_code in self.call_graph_builder.method_info:\n                    method_node = self.call_graph_builder.method_info[(method_file, method_code)]\n                    if method_node.name == method_name:\n                        # Verify this method belongs to the target class\n                        method_class = self.call_graph_builder._get_class_of_method(method_node)\n                    
    if method_class == target_class:\n                            return method_code\n                return None\n        \n        # If no prefix or target class not found, fall back to searching in the AST\n        for node in ast.walk(focal_node):\n            if isinstance(node, ast.Call):\n                if isinstance(node.func, ast.Attribute) and node.func.attr == method_name:\n                    target_class = None\n                    \n                    if isinstance(node.func.value, ast.Name):\n                        if node.func.value.id == 'self':\n                            # Case 1: self.method()\n                            target_class = self.call_graph_builder._get_class_of_method(focal_node)\n                        else:\n                            # Case 2: ClassName.method() or Case 3: instance.method()\n                            # Try as class name first\n                            for class_file, class_code in self.call_graph_builder.class_info:\n                                class_node = self.call_graph_builder.class_info[(class_file, class_code)]\n                                if class_node.name == node.func.value.id:\n                                    target_class = class_node.name\n                                    break\n                            \n                            # If not found as class name, try as instance variable\n                            if not target_class:\n                                target_class = self.call_graph_builder._resolve_instance_type(focal_node, node.func.value.id)\n                    \n                    elif isinstance(node.func.value, ast.Attribute):\n                        # Handle nested attributes like self.processor.process()\n                        if isinstance(node.func.value.value, ast.Name):\n                            if node.func.value.value.id == 'self':\n                                # Get type of self.processor\n                                
instance_var = node.func.value.attr\n                                target_class = self.call_graph_builder._resolve_instance_type(focal_node, instance_var)\n                    \n                    # If we found the target class, find the method\n                    if target_class:\n                        for method_file, method_code in self.call_graph_builder.method_info:\n                            method_node = self.call_graph_builder.method_info[(method_file, method_code)]\n                            if method_node.name == method_name:\n                                # Verify this method belongs to the target class\n                                method_class = self.call_graph_builder._get_class_of_method(method_node)\n                                if method_class == target_class:\n                                    return method_code\n        return None\n    \n    def get_child_class_init(self, focal_node: ast.AST, file_tree: ast.AST,\n                            file_path: str, child_class: str) -> Optional[str]:\n        \"\"\"Get the class signature and init function of a child class used by the component.\n        \n        Args:\n            focal_node: The AST node representing the focal component\n            file_tree: The AST tree for the entire file\n            file_path: Path to the file containing the component\n            child_class: Name of the class being used\n            \n        Returns:\n            Optional[str]: The code of the child class up to the end of __init__ if found,\n                         or the full class code if __init__ doesn't exist, None if class not found\n        \"\"\"\n        # Look for calls to the child class in the focal node\n        for node in ast.walk(focal_node):\n            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):\n                if node.func.id == child_class:\n                    # Find the class definition\n                    for class_file, class_code in 
self.call_graph_builder.class_info:\n                        class_node = self.call_graph_builder.class_info[(class_file, class_code)]\n                        if class_node.name == child_class:\n                            # Get class signature and __init__\n                            init_method = None\n                            for item in class_node.body:\n                                if isinstance(item, ast.FunctionDef) and item.name == '__init__':\n                                    init_method = self.call_graph_builder._get_node_code(class_file, item)\n                                    break\n                            \n                            if init_method:\n                                return f\"{class_code}\\n{init_method}\"\n                            return class_code\n        return None\n    \n    def get_parent_components(self, focal_node: ast.AST, file_tree: ast.AST,\n                             file_path: str, class_name: Optional[str] = None) -> List[str]:\n        \"\"\"Get the code of any components that use the focal component.\n        \n        Args:\n            focal_node: The AST node representing the focal component\n            file_tree: The AST tree for the entire file\n            file_path: Path to the file containing the component\n            class_name: If the component is a method, specify its class name to avoid\n                     false matches with methods of same name in other classes\n            \n        Returns:\n            List[str]: List of code blocks of parent components that use the focal component\n        \"\"\"\n        # Check what type of node this is\n        component_name = None\n        if isinstance(focal_node, ast.FunctionDef):\n            component_name = focal_node.name\n        elif isinstance(focal_node, ast.ClassDef):\n            component_name = focal_node.name\n        else:\n            return []\n            \n        # Get the source code of the focal node\n        
focal_code = self.call_graph_builder._get_node_code(file_path, focal_node)\n        \n        # Now use the existing implementation from CallGraphBuilder\n        return self.call_graph_builder.get_parent(focal_code, file_path, class_name) "
  },
  {
    "path": "src/agent/tool/internal_traverse.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport ast\nimport os\nfrom typing import List, Optional, Dict, Any, Tuple\n\n\nclass ASTNodeAnalyzer:\n    \"\"\"\n    Tool for analyzing AST nodes to find relationships between components in code.\n    Used to identify calls (child components) and called_by (parent components).\n    \"\"\"\n\n    def __init__(self, repo_path: str):\n        \"\"\"\n        Initialize the AST Node Analyzer.\n\n        Args:\n            repo_path: Path to the repository being analyzed\n        \"\"\"\n        self.repo_path = repo_path\n\n    def get_component_by_path(\n        self, \n        ast_node: ast.AST, \n        ast_tree: ast.AST, \n        dependency_path: str\n    ) -> Optional[str]:\n        \"\"\"\n        Universal function to get any code component (class, function, method) by its dependency path.\n\n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.component_name\n                         or: folder1.folder2.file.class_name.method_name\n\n        Returns:\n            The code of the component if found, None otherwise\n        \"\"\"\n        path_parts = dependency_path.split('.')\n        if len(path_parts) < 2:\n            return None\n            \n        # Determine the component type based on the path structure\n        if len(path_parts) >= 3 and path_parts[-2] != 'self':\n            # This could be a method: folder1.folder2.file.class_name.method_name\n            last_part = path_parts[-1]\n            second_last_part = path_parts[-2]\n            \n            # Check if this is likely a method\n            if last_part[0].islower() and second_last_part[0].isupper():\n                # This looks like a method\n                return self._get_method_component(ast_node, ast_tree, dependency_path)\n        \n        # 
Check if this is a class (typically starts with uppercase)\n        if path_parts[-1][0].isupper():\n            # This looks like a class\n            return self._get_class_component(ast_node, ast_tree, dependency_path)\n        \n        # Default to function (or could be a module)\n        return self._get_function_component(ast_node, ast_tree, dependency_path)\n    \n    def _get_class_component(self, ast_node: ast.AST, ast_tree: ast.AST, dependency_path: str) -> Optional[str]:\n        \"\"\"\n        Get a class component by its dependency path.\n        \n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.ClassName\n            \n        Returns:\n            The code of the class if found, None otherwise\n        \"\"\"\n        path_parts = dependency_path.split('.')\n        class_name = path_parts[-1]\n        file_name = path_parts[-2] + '.py'\n        folder_path = os.path.join(*path_parts[:-2]) if len(path_parts) > 2 else ''\n        \n        # Special case for 'self' which refers to the current component\n        if class_name == 'self':\n            if isinstance(ast_node, ast.ClassDef):\n                return self._get_node_source(file_path=os.path.relpath(ast_tree.file_path, self.repo_path) if hasattr(ast_tree, 'file_path') else \"\", node=ast_node)\n            return None\n        \n        # First check if the class is used in the current file\n        local_class_info = self._find_class_init_in_node(ast_node, class_name)\n        if local_class_info:\n            return local_class_info\n            \n        # Try to find the file in the repository\n        target_file_path = os.path.join(folder_path, file_name)\n        full_file_path = os.path.join(self.repo_path, target_file_path)\n        \n        # If file doesn't exist, return None\n        if not 
os.path.exists(full_file_path):\n            return None\n            \n        # Parse the target file and find the class\n        try:\n            with open(full_file_path, 'r') as f:\n                file_content = f.read()\n                target_ast = ast.parse(file_content)\n                \n            # Find the class in the target file\n            for node in ast.walk(target_ast):\n                if isinstance(node, ast.ClassDef) and node.name == class_name:\n                    return self._get_node_source(target_file_path, node)\n        except Exception as e:\n            return f\"Error retrieving class {class_name}: {e}\"\n            \n        return None\n        \n    def _get_function_component(self, ast_node: ast.AST, ast_tree: ast.AST, dependency_path: str) -> Optional[str]:\n        \"\"\"\n        Get a function component by its dependency path.\n        \n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.function_name\n            \n        Returns:\n            The code of the function if found, None otherwise\n        \"\"\"\n        path_parts = dependency_path.split('.')\n        function_name = path_parts[-1]\n        file_name = path_parts[-2] + '.py'\n        folder_path = os.path.join(*path_parts[:-2]) if len(path_parts) > 2 else ''\n        \n        # Special case for 'self' which refers to the current component\n        if function_name == 'self':\n            if isinstance(ast_node, ast.FunctionDef):\n                return self._get_node_source(file_path=os.path.relpath(ast_tree.file_path, self.repo_path) if hasattr(ast_tree, 'file_path') else \"\", node=ast_node)\n            return None\n        \n        # Try to find the file in the repository\n        target_file_path = os.path.join(folder_path, file_name)\n        full_file_path = 
os.path.join(self.repo_path, target_file_path)\n        \n        # If file doesn't exist, check the current file\n        if not os.path.exists(full_file_path):\n            # Look for the function in the current file\n            for node in ast.walk(ast_tree):\n                if isinstance(node, ast.FunctionDef) and node.name == function_name:\n                    return self._get_node_source(file_path=os.path.relpath(ast_tree.file_path, self.repo_path) if hasattr(ast_tree, 'file_path') else \"\", node=node)\n            return None\n        \n        # Parse the target file and find the function\n        try:\n            with open(full_file_path, 'r') as f:\n                file_content = f.read()\n                target_ast = ast.parse(file_content)\n                \n            # Find the function in the target file\n            for node in ast.walk(target_ast):\n                if isinstance(node, ast.FunctionDef) and node.name == function_name:\n                    return self._get_node_source(target_file_path, node)\n        except Exception as e:\n            return f\"Error retrieving function {function_name}: {e}\"\n            \n        return None\n        \n    def _get_method_component(self, ast_node: ast.AST, ast_tree: ast.AST, dependency_path: str) -> Optional[str]:\n        \"\"\"\n        Get a method component by its dependency path.\n        \n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.ClassName.method_name\n            \n        Returns:\n            The code of the method if found, None otherwise\n        \"\"\"\n        path_parts = dependency_path.split('.')\n        if len(path_parts) < 3:  # Need at least file.class.method\n            return None\n            \n        method_name = path_parts[-1]\n        class_name = path_parts[-2]\n        file_name = 
path_parts[-3] + '.py'\n        folder_path = os.path.join(*path_parts[:-3]) if len(path_parts) > 3 else ''\n        \n        # Special case for 'self' which refers to the current component\n        if class_name == 'self':\n            # Find the method in the current node if it's a class\n            if isinstance(ast_node, ast.ClassDef):\n                for item in ast_node.body:\n                    if isinstance(item, ast.FunctionDef) and item.name == method_name:\n                        return self._get_node_source(file_path=os.path.relpath(ast_tree.file_path, self.repo_path) if hasattr(ast_tree, 'file_path') else \"\", node=item)\n            return None\n        \n        # Try to find the file in the repository\n        target_file_path = os.path.join(folder_path, file_name)\n        full_file_path = os.path.join(self.repo_path, target_file_path)\n        \n        # If file doesn't exist, check the current file\n        if not os.path.exists(full_file_path):\n            # Look for the class and method in the current file\n            for node in ast.walk(ast_tree):\n                if isinstance(node, ast.ClassDef) and node.name == class_name:\n                    for item in node.body:\n                        if isinstance(item, ast.FunctionDef) and item.name == method_name:\n                            return self._get_node_source(file_path=os.path.relpath(ast_tree.file_path, self.repo_path) if hasattr(ast_tree, 'file_path') else \"\", node=item)\n            return None\n        \n        # Parse the target file and find the class and method\n        try:\n            with open(full_file_path, 'r') as f:\n                file_content = f.read()\n                target_ast = ast.parse(file_content)\n                \n            # Find the class in the target file\n            for node in ast.walk(target_ast):\n                if isinstance(node, ast.ClassDef) and node.name == class_name:\n                    # Find the method in the class\n        
            for item in node.body:\n                        if isinstance(item, ast.FunctionDef) and item.name == method_name:\n                            return self._get_node_source(target_file_path, item)\n        except Exception as e:\n            return f\"Error retrieving method {class_name}.{method_name}: {e}\"\n            \n        return None\n\n    def get_child_class_init(\n        self, \n        ast_node: ast.AST, \n        ast_tree: ast.AST, \n        dependency_path: str\n    ) -> Optional[str]:\n        \"\"\"\n        Get the class signature and init function of a child class used by the component.\n        Returns up to the end of __init__ if it exists (to save tokens).\n\n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.ClassName\n\n        Returns:\n            The code of the class initialization if found, None otherwise\n        \"\"\"\n        class_code = self.get_component_by_path(ast_node, ast_tree, dependency_path)\n        if not class_code:\n            return None\n            \n        # Parse the class code to find the __init__ method if it exists\n        try:\n            class_ast = ast.parse(class_code)\n            for node in ast.walk(class_ast):\n                if isinstance(node, ast.ClassDef):\n                    # Look for the __init__ method\n                    init_method = None\n                    for item in node.body:\n                        if isinstance(item, ast.FunctionDef) and item.name == \"__init__\":\n                            init_method = item\n                            break\n                    \n                    if init_method:\n                        # Get the class signature and everything up to the end of __init__\n                        class_lines = class_code.split('\\n')\n                        init_end_line = 
init_method.end_lineno - node.lineno + 1\n                        \n                        # Ensure init_end_line doesn't exceed the total lines\n                        init_end_line = min(init_end_line, len(class_lines))\n                        \n                        # Return class signature through the end of __init__\n                        return '\\n'.join(class_lines[:init_end_line])\n        except Exception:\n            # If we can't parse the class code, just return it as is\n            pass\n            \n        return class_code\n\n    def get_child_function(\n        self, \n        ast_node: ast.AST, \n        ast_tree: ast.AST, \n        dependency_path: str\n    ) -> Optional[str]:\n        \"\"\"\n        Find a function that is called by the focal component.\n\n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.function_name\n\n        Returns:\n            The code of the function if found, None otherwise\n        \"\"\"\n        return self.get_component_by_path(ast_node, ast_tree, dependency_path)\n\n    def get_child_method(\n        self, \n        ast_node: ast.AST, \n        ast_tree: ast.AST, \n        dependency_path: str\n    ) -> Optional[str]:\n        \"\"\"\n        Find a method that is called by the focal component.\n\n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the dependency in format: folder1.folder2.file.ClassName.method_name\n\n        Returns:\n            The code of the method if found, None otherwise\n        \"\"\"\n        return self.get_component_by_path(ast_node, ast_tree, dependency_path)\n\n    def get_parent_components(\n        self, \n        ast_node: ast.AST, \n        ast_tree: ast.AST, \n        dependency_path: str,\n        
dependency_graph: Optional[Dict[str, List[str]]] = None\n    ) -> List[str]:\n        \"\"\"\n        Find components that call/depend on the focal component by looking at the dependency graph.\n\n        Args:\n            ast_node: AST node representing the focal component\n            ast_tree: AST tree for the entire file\n            dependency_path: Path to the focal component in format: folder1.folder2.file.component_name\n            dependency_graph: Optional dictionary mapping component ids to their dependencies.\n                              If not provided, will only check the current file.\n\n        Returns:\n            List of code snippets of components that call/depend on the focal component\n        \"\"\"\n        parent_components = []\n        \n        # If no dependency graph provided, fall back to checking just the current file\n        if not dependency_graph:\n            component_name = self._get_component_name(ast_node)\n            if not component_name:\n                return parent_components\n                \n            # Parse the dependency path to get the file path for the current file\n            path_parts = dependency_path.split('.')\n            if len(path_parts) < 2:\n                return parent_components\n                \n            file_name = path_parts[-2] + '.py'\n            folder_path = os.path.join(*path_parts[:-2]) if len(path_parts) > 2 else ''\n            target_file_path = os.path.join(folder_path, file_name)\n            \n            # Check for calls in the current file\n            for node in ast.walk(ast_tree):\n                # Skip the component itself\n                if node == ast_node:\n                    continue\n                # Check if this is a function, async function, or class definition\n                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):\n                    if self._contains_call_to(node, component_name):\n                        
parent_components.append(self._get_node_source(target_file_path, node))\n            \n            return parent_components\n        \n        # With dependency graph, we can find all components that depend on this component\n        parent_ids = []\n        for component_id, dependencies in dependency_graph.items():\n            if dependency_path in dependencies:\n                parent_ids.append(component_id)\n        \n        # Now retrieve the source code for each parent component\n        for parent_id in parent_ids:\n            parent_code = self.get_component_by_path(ast_node, ast_tree, parent_id)\n            if parent_code:\n                parent_components.append(parent_code)\n        \n        return parent_components\n        \n    def _find_class_init_in_node(self, ast_node: ast.AST, class_name: str) -> Optional[str]:\n        \"\"\"\n        Find class instantiation in the given node.\n\n        Args:\n            ast_node: AST node to search in\n            class_name: Name of the class to find\n\n        Returns:\n            The code of the class instantiation if found, None otherwise\n        \"\"\"\n        for node in ast.walk(ast_node):\n            if isinstance(node, ast.Call) and self._get_call_name(node) == class_name:\n                return self._format_call_node(node)\n        return None\n\n    def _find_function_call_in_node(self, ast_node: ast.AST, function_name: str) -> bool:\n        \"\"\"\n        Check if a function is called in the given node.\n\n        Args:\n            ast_node: AST node to search in\n            function_name: Name of the function to find\n\n        Returns:\n            True if the function is called, False otherwise\n        \"\"\"\n        for node in ast.walk(ast_node):\n            if isinstance(node, ast.Call):\n                call_name = self._get_call_name(node)\n                if call_name == function_name:\n                    return True\n        return False\n\n    def 
_find_method_call_in_node(\n        self, \n        ast_node: ast.AST, \n        method_name: str, \n        prefix: Optional[str] = None\n    ) -> bool:\n        \"\"\"\n        Check if a method is called in the given node.\n\n        Args:\n            ast_node: AST node to search in\n            method_name: Name of the method to find\n            prefix: Optional prefix (object name) of the method\n\n        Returns:\n            True if the method is called, False otherwise\n        \"\"\"\n        for node in ast.walk(ast_node):\n            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):\n                if node.func.attr == method_name:\n                    if prefix is None or (\n                        isinstance(node.func.value, ast.Name) and node.func.value.id == prefix\n                    ):\n                        return True\n        return False\n\n    def _find_class_for_prefix(self, ast_tree: ast.AST, prefix: Optional[str]) -> Optional[str]:\n        \"\"\"\n        Try to determine the class name for a given object prefix.\n        This is a naive approach that checks for:\n            prefix = ClassName()\n        or\n            prefix: ClassName\n\n        Args:\n            ast_tree: AST tree for the entire file\n            prefix: The object name to find the class for\n\n        Returns:\n            Name of the class if found, None otherwise\n        \"\"\"\n        if not prefix:\n            return None\n\n        # Look for prefix = ClassName()\n        for node in ast.walk(ast_tree):\n            if isinstance(node, ast.Assign):\n                for target in node.targets:\n                    if isinstance(target, ast.Name) and target.id == prefix:\n                        if (\n                            isinstance(node.value, ast.Call)\n                            and isinstance(node.value.func, ast.Name)\n                        ):\n                            return node.value.func.id\n\n        # Look 
for prefix: ClassName\n        for node in ast.walk(ast_tree):\n            if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):\n                if node.target.id == prefix and isinstance(node.annotation, ast.Name):\n                    return node.annotation.id\n\n        return None\n\n    def _get_component_name(self, ast_node: ast.AST) -> Optional[str]:\n        \"\"\"\n        Get the name of a component (function, async function, or class).\n\n        Args:\n            ast_node: AST node representing the component\n\n        Returns:\n            Name of the component if present, None otherwise\n        \"\"\"\n        if isinstance(ast_node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):\n            return ast_node.name\n        return None\n\n    def _contains_call_to(self, ast_node: ast.AST, component_name: str) -> bool:\n        \"\"\"\n        Check if ast_node contains a call to the specified component name.\n\n        Args:\n            ast_node: AST node to check\n            component_name: Name of the component to look for\n\n        Returns:\n            True if the node contains a call to the component, False otherwise\n        \"\"\"\n        for node in ast.walk(ast_node):\n            if isinstance(node, ast.Call):\n                call_name = self._get_call_name(node)\n                if call_name == component_name:\n                    return True\n        return False\n\n    def _get_call_name(self, call_node: ast.Call) -> Optional[str]:\n        \"\"\"\n        Get the name being called in a Call node.\n\n        Args:\n            call_node: AST Call node\n\n        Returns:\n            Name being called, or None if it cannot be determined\n        \"\"\"\n        if isinstance(call_node.func, ast.Name):\n            return call_node.func.id\n        elif isinstance(call_node.func, ast.Attribute):\n            return call_node.func.attr\n        return None\n\n    def _format_call_node(self, call_node: 
ast.Call) -> str:\n        \"\"\"\n        Format a call node as a string for demonstration.\n\n        Args:\n            call_node: AST Call node\n\n        Returns:\n            String representation of the call\n        \"\"\"\n        call_name = self._get_call_name(call_node)\n        return f\"{call_name}(...)\"\n\n    def _get_node_source(self, file_path: str, node: ast.AST) -> str:\n        \"\"\"\n        Get the source code for an AST node from the original file.\n\n        Args:\n            file_path: Path to the file containing the node\n            node: AST node to get the source for\n\n        Returns:\n            Source code for the node, or an error message\n        \"\"\"\n        try:\n            full_path = os.path.join(self.repo_path, file_path)\n            with open(full_path, 'r') as f:\n                file_content = f.read()\n\n            start_line = node.lineno\n            end_line = self._get_end_line(node, file_content)\n            lines = file_content.split('\\n')\n\n            # A docstring, if present, is the first statement in the body, so it\n            # already falls inside the lineno..end_lineno range and needs no\n            # separate handling.\n\n            # Safeguard: ensure end_line does not exceed total line count\n            end_line = min(end_line, len(lines))\n            return '\\n'.join(lines[start_line - 1:end_line])\n        except Exception as e:\n            return f\"Error retrieving source for {type(node).__name__}: {e}\"\n\n    def _get_end_line(self, node: ast.AST, file_content: str) -> int:\n        \"\"\"\n        Get the end line number for an AST node, using end_lineno if present.\n\n        Args:\n        
    node: AST node\n            file_content: Content of the file\n\n        Returns:\n            End line number of the node\n        \"\"\"\n        if hasattr(node, 'end_lineno') and node.end_lineno:\n            return node.end_lineno\n        if hasattr(node, 'body') and node.body:\n            last_subnode = node.body[-1]\n            return self._get_end_line(last_subnode, file_content)\n        return node.lineno"
  },
  {
    "path": "src/agent/tool/perplexity_api.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport os\nimport requests\nfrom typing import List, Dict, Any\nfrom dataclasses import dataclass\nimport yaml\n\n@dataclass\nclass PerplexityResponse:\n    \"\"\"Structured response from Perplexity API\"\"\"\n    content: str\n    raw_response: Dict[str, Any]\n\nclass PerplexityAPI:\n    \"\"\"Wrapper for Perplexity API interactions\"\"\"\n    \n    def __init__(self, api_key: str | None = None, config_path: str = \"config/agent_config.yaml\"):\n        \"\"\"Initialize the API wrapper.\n        \n        Args:\n            api_key: Perplexity API key. If None, will try to get from config.\n            config_path: Path to the configuration file\n        \"\"\"\n        self.config = self._load_config(config_path)\n        self.api_key = api_key or self.config.get('api_key')\n        if not self.api_key:\n            raise ValueError(\"Perplexity API key not provided and not found in config\")\n            \n        self.base_url = \"https://api.perplexity.ai/chat/completions\"\n        self.headers = {\n            \"Authorization\": f\"Bearer {self.api_key}\",\n            \"Content-Type\": \"application/json\"\n        }\n    \n    def _load_config(self, config_path: str) -> Dict[str, Any]:\n        \"\"\"Load configuration from yaml file.\"\"\"\n        try:\n            with open(config_path, 'r') as f:\n                config = yaml.safe_load(f)\n                return config.get('perplexity', {})\n        except Exception as e:\n            print(f\"Warning: Could not load config file: {e}\")\n            return {}\n    \n    def query(self, \n             question: str,\n             system_prompt: str = \"Be precise and concise.\",\n             temperature: float | None = None,\n             model: str | None = None,\n             max_output_tokens: int | None = 4096) -> PerplexityResponse:\n        \"\"\"Send a single query to Perplexity API.\n        \n        Args:\n            
question: The question to ask\n            system_prompt: System prompt to guide the response\n            temperature: Temperature for response generation (0.0-1.0)\n            model: Model to use for generation\n            max_output_tokens: Maximum tokens in response\n            \n        Returns:\n            PerplexityResponse containing the response content and raw API response\n            \n        Raises:\n            requests.exceptions.RequestException: If API request fails\n            ValueError: If API response is invalid\n        \"\"\"\n        payload = {\n            \"model\": model or self.config.get('model', 'sonar'),\n            \"messages\": [\n                {\n                    \"role\": \"system\",\n                    \"content\": system_prompt\n                },\n                {\n                    \"role\": \"user\",\n                    \"content\": question\n                }\n            ],\n            # Compare against None so an explicit temperature of 0.0 is honored\n            \"temperature\": temperature if temperature is not None else self.config.get('temperature', 0.1),\n            \"max_tokens\": max_output_tokens or self.config.get('max_output_tokens', 200),\n            \"top_p\": 0.9,\n            \"return_images\": False,\n            \"return_related_questions\": False\n        }\n        \n        response = requests.post(self.base_url, json=payload, headers=self.headers)\n        response.raise_for_status()\n        \n        response_data = response.json()\n        if \"choices\" not in response_data or not response_data[\"choices\"]:\n            raise ValueError(\"Invalid API response: missing choices\")\n            \n        content = response_data[\"choices\"][0].get(\"message\", {}).get(\"content\", \"\")\n        if not content:\n            raise ValueError(\"Invalid API response: missing content\")\n            \n        return PerplexityResponse(content=content, raw_response=response_data)\n    \n    def batch_query(self, \n                   questions: List[str],\n                   system_prompt: str 
= \"Be precise and concise.\",\n                   temperature: float | None = None,\n                   model: str | None = None,\n                   max_output_tokens: int | None = None) -> List[PerplexityResponse | None]:\n        \"\"\"Send multiple queries to Perplexity API.\n        \n        Args:\n            questions: List of questions to ask\n            system_prompt: System prompt to guide the responses\n            temperature: Temperature for response generation (0.0-1.0)\n            model: Model to use for generation\n            max_output_tokens: Maximum tokens in response\n            \n        Returns:\n            List of PerplexityResponse objects in the same order as the input\n            questions; an entry is None if its query failed\n        \"\"\"\n        responses = []\n        for question in questions:\n            try:\n                response = self.query(\n                    question=question,\n                    system_prompt=system_prompt,\n                    temperature=temperature,\n                    model=model,\n                    max_output_tokens=max_output_tokens\n                )\n                responses.append(response)\n            except Exception as e:\n                # If a query fails, add None to maintain order with input questions\n                print(f\"Error querying Perplexity API: {str(e)}\")\n                responses.append(None)\n        \n        return responses "
  },
  {
    "path": "src/agent/verifier.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\nfrom typing import Optional, List\nfrom .base import BaseAgent\n\n\nclass Verifier(BaseAgent):\n    \"\"\"Agent responsible for verifying the quality of generated docstrings.\"\"\"\n    \n    def __init__(self, config_path: Optional[str] = None):\n        \"\"\"Initialize the Verifier agent.\n        \n        Args:\n            config_path: Optional path to the configuration file\n        \"\"\"\n        super().__init__(\"Verifier\", config_path=config_path)\n        self.system_prompt = \"\"\"You are a Verifier agent responsible for ensuring the quality of generated docstrings. \n        Your role is to evaluate docstrings from the perspective of a first-time user encountering the code component.\n        \n        Analysis Process:\n        1. First read the code component as if you're seeing it for the first time\n        2. Read the docstring and analyze how well it helps you understand the code\n        3. Evaluate if the docstring provides the right level of abstraction and information\n        \n        Verification Criteria:\n        1. Information Value:\n           - Identify parts that merely repeat the code without adding value\n           - Flag docstrings that state the obvious without providing insights\n           - Check if explanations actually help understand the purpose and usage\n        \n        2. Appropriate Detail Level:\n           - Flag overly detailed technical explanations of implementation\n           - Ensure focus is on usage and purpose, not line-by-line explanation\n           - Check if internal implementation details are unnecessarily exposed\n        \n        3. 
Completeness Check:\n           - Verify all required sections are present (summary, args, returns, etc.)\n           - Check if each section provides meaningful information\n           - Ensure critical usage information is not missing\n        \n        Output Format:\n        Your analysis must include:\n        1. <NEED_REVISION>true/false</NEED_REVISION>\n           - Indicates if docstring needs improvement\n        \n        2. If revision needed:\n           <MORE_CONTEXT>true/false</MORE_CONTEXT>\n           - Indicates if additional context is required for improvement\n           - Keep in mind that collecting context is very expensive and may fail, so only use it when absolutely necessary\n        \n        3. Based on MORE_CONTEXT, provide suggestions at the end of your response:\n           If true:\n           <SUGGESTION_CONTEXT>explain why and what specific context is needed</SUGGESTION_CONTEXT>\n           \n           If false:\n           <SUGGESTION>specific improvement suggestions</SUGGESTION>\n        \n        Do not generate other things after </SUGGESTION> or </SUGGESTION_CONTEXT>.\n        \"\"\"\n        self.add_to_memory(\"system\", self.system_prompt)\n\n    def process(\n        self,\n        focal_component: str,\n        docstring: str,\n        context: str = \"\"\n    ) -> str:\n        \"\"\"Verify the quality of a generated docstring.\n        \n        Args:\n            focal_component: The code component with the docstring\n            docstring: The generated docstring to verify\n            context: The context used to generate the docstring\n            \n        Returns:\n            The verifier's full response text, containing the NEED_REVISION tag and,\n            when revision is needed, the MORE_CONTEXT tag and a suggestion\n        \"\"\"\n        task_description = f\"\"\"\n        Context Used:\n        {context if context else 'No context was 
used.'}\n\n        Verify the quality of the following docstring for the following Code Component:\n        \n        Code Component:\n        {focal_component}\n        \n        Generated Docstring:\n        {docstring}\n\n        \"\"\"\n        self.add_to_memory(\"user\", task_description)\n        \n        full_response = self.generate_response()\n        return full_response\n    "
  },
  {
    "path": "src/agent/workflow.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Optional\nfrom pathlib import Path\nfrom .orchestrator import Orchestrator\nfrom .reader import CodeComponentType\n\ndef generate_docstring(\n    repo_path: str,\n    file_path: str,\n    focal_component: str,\n    component_type: CodeComponentType,\n    instruction: Optional[str] = None\n) -> str:\n    \"\"\"Generate a high-quality docstring for a code component using the multi-agent system.\n    \n    Args:\n        repo_path: Path to the repository containing the code\n        file_path: Path to the file containing the component\n        focal_component: The code component needing a docstring\n        component_type: The type of the code component (function, method, or class)\n        instruction: Optional specific instructions for docstring generation\n        \n    Returns:\n        The generated and verified docstring\n        \n    Raises:\n        FileNotFoundError: If the repository or file path doesn't exist\n        ValueError: If the component type is invalid\n    \"\"\"\n    # Validate inputs\n    repo_path = str(Path(repo_path).resolve())\n    file_path = str(Path(file_path).resolve())\n    \n    if not Path(repo_path).exists():\n        raise FileNotFoundError(f\"Repository path does not exist: {repo_path}\")\n    if not Path(file_path).exists():\n        raise FileNotFoundError(f\"File path does not exist: {file_path}\")\n    \n    # Use default instruction if none provided\n    if instruction is None:\n        instruction = \"\"\"Generate a comprehensive and helpful docstring that includes:\n        1. A clear description of what the component does\n        2. All parameters and their types\n        3. Return value and type\n        4. Any exceptions that may be raised\n        5. 
Usage examples where appropriate\n        The docstring should follow PEP 257 style guidelines.\"\"\"\n    \n    # Create orchestrator and generate docstring\n    orchestrator = Orchestrator(repo_path)\n    return orchestrator.process(\n        instruction=instruction,\n        focal_component=focal_component,\n        component_type=component_type,\n        file_path=file_path\n    ) "
  },
  {
    "path": "src/agent/writer.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, Optional\nfrom abc import abstractmethod\nfrom .base import BaseAgent\nfrom .reader import CodeComponentType\n\nclass Writer(BaseAgent):\n    \"\"\"Agent responsible for generating high-quality docstrings based on the code and context.\"\"\"\n    \n    def __init__(self, config_path: Optional[str] = None):\n        \"\"\"Initialize the Writer agent.\n        \n        Args:\n            config_path: Optional path to the configuration file\n        \"\"\"\n        super().__init__(\"Writer\", config_path=config_path)\n        \n        # Base prompt that applies to all documentation\n        self.base_prompt = \"\"\"You are a Writer agent responsible for generating high-quality \n        docstrings that are both complete and helpful. Accessible context is provided to you for \n        generating the docstring.\n        \n        General Guidelines:\n        1. Make docstrings actionable and specific:\n           - Focus on practical usage\n           - Highlight important considerations\n           - Include warnings or gotchas\n        \n        2. Use clear, concise language:\n           - Avoid jargon unless necessary\n           - Use active voice\n           - Be direct and specific\n        \n        3. Type Information:\n           - Include precise type hints\n           - Note any type constraints\n           - Document generic type parameters\n        \n        4. Context and Integration:\n           - Explain component relationships\n           - Note any dependencies\n           - Describe side effects\n        \n        5. Follow Google docstring format:\n           - Use consistent indentation\n           - Maintain clear section separation\n           - Keep related information grouped\"\"\"\n\n        self.add_to_memory(\"system\", self.base_prompt)\n\n        # Class-specific prompt\n        self.class_prompt = \"\"\"You are documenting a CLASS. 
Focus on describing the object it represents \n        and its role in the system.\n\n        Required sections:\n        1. Summary: \n           - One-line description focusing on WHAT the class represents\n           - Avoid repeating the class name or obvious terms\n           - Focus on the core purpose or responsibility\n        \n        2. Description: \n           - WHY: Explain the motivation and purpose behind this class\n           - WHEN: Describe scenarios or conditions where this class should be used\n           - WHERE: Explain how it fits into the larger system architecture\n           - HOW: Provide a high-level overview of how it achieves its purpose\n        \n        3. Example: \n           - Show a practical, real-world usage scenario\n           - Include initialization and common method calls\n           - Demonstrate typical workflow\n\n        Conditional sections:\n        1. Parameters (if class's __init__ has parameters):\n           - Focus on explaining the significance of each parameter\n           - Include valid value ranges or constraints\n           - Explain parameter relationships if they exist\n        \n        2. Attributes:\n           - Explain the purpose and significance of each attribute\n           - Include type information and valid values\n           - Note any dependencies between attributes\"\"\"\n\n        # Function/Method-specific prompt\n        self.function_prompt = \"\"\"You are documenting a FUNCTION or METHOD. Focus on describing \n        the action it performs and its effects.\n\n        Required sections:\n        1. Summary:\n           - One-line description focusing on WHAT the function does\n           - Avoid repeating the function name\n           - Emphasize the outcome or effect\n        \n        2. 
Description:\n           - WHY: Explain the purpose and use cases\n           - WHEN: Describe when to use this function\n           - WHERE: Explain how it fits into the workflow\n           - HOW: Provide high-level implementation approach\n\n        Conditional sections:\n        1. Args (if present):\n           - Explain the significance of each parameter\n           - Include valid value ranges or constraints\n           - Note any parameter interdependencies\n        \n        2. Returns:\n           - Explain what the return value represents\n           - Include possible return values or ranges\n           - Note any conditions affecting the return value\n        \n        3. Raises:\n           - List specific conditions triggering each exception\n           - Explain how to prevent or handle exceptions\n        \n        4. Examples (if public and not abstract):\n           - Show practical usage scenarios\n           - Include common parameter combinations\n           - Demonstrate error handling if relevant\"\"\"\n\n    @staticmethod\n    def is_class_component(code: str) -> bool:\n        \"\"\"Determine if the given code component is a class definition.\n        \n        Args:\n            code: The code component to analyze\n            \n        Returns:\n            bool: True if the component is a class definition, False otherwise\n        \"\"\"\n        return \"class \" in code.split('\\n')[0]\n\n    def get_custom_prompt(self, code: str) -> str:\n        \"\"\"Get the appropriate system prompt based on the component type.\n        \n        Args:\n            code: The code component to analyze\n            \n        Returns:\n            str: The appropriate system prompt for the component type\n        \"\"\"\n        is_class = Writer.is_class_component(code)\n        specific_prompt = self.class_prompt if is_class else self.function_prompt\n        return specific_prompt\n\n    def extract_docstring(self, response: str) -> str:\n        \"\"\"Extract the 
docstring from the LLM response.\n        \n        Args:\n            response: The full response from the LLM containing the docstring between XML tags\n            \n        Returns:\n            str: The extracted docstring, or the full response if no DOCSTRING tags are found\n        \"\"\"\n        start_tag = \"<DOCSTRING>\"\n        end_tag = \"</DOCSTRING>\"\n        \n        try:\n            start_idx = response.index(start_tag) + len(start_tag)\n            end_idx = response.index(end_tag)\n            return response[start_idx:end_idx].strip()\n        except ValueError:\n            import logging\n            logger = logging.getLogger(__name__)\n            logger.warning(\"\\033[93mError parsing: no DOCSTRING XML tags found in response; returning the full response as the docstring\\033[0m\")\n            return response\n\n    def process(\n        self,\n        focal_component: str,\n        context: Dict[str, Any],\n    ) -> str:\n        \"\"\"Generate a docstring for the given code component.\n        \n        Args:\n            focal_component: The code component needing a docstring\n            context: Dictionary containing gathered context information\n            \n        Returns:\n            str: The generated docstring following the specified format\n        \"\"\"\n        \n        task_description = f\"\"\"\n        Available context:\n        {context}\n\n        {self.get_custom_prompt(focal_component)}\n\n        Now, generate a high-quality docstring for the following Code Component based on the Available context:\n        \n        <FOCAL_CODE_COMPONENT>\n        {focal_component}\n        </FOCAL_CODE_COMPONENT>\n\n        Keep in mind:\n        1. Generate docstring between XML tags: <DOCSTRING> and </DOCSTRING>\n        2. First analyze the code component and then generate the docstring at the end based on the context.\n        3. Do not add triple quotes (\\\"\\\"\\\") to your generated docstring.\n        4. 
Always double check if the generated docstring is within the XML tags: <DOCSTRING> and </DOCSTRING>. This is critical for parsing the docstring.\n        \"\"\"\n        self.add_to_memory(\"user\", task_description)\n        \n        # Generate response using LLM\n        full_response = self.generate_response()\n        \n        # Extract and return just the docstring part\n        return self.extract_docstring(full_response)\n    \n"
  },
  {
    "path": "src/analyze_helpfulness_significance.py",
    "content": "#!/usr/bin/env python\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nScript to analyze statistical significance between docstring helpfulness scores\nof different systems.\n\nUsage:\n    conda activate docstringgen\n    python src/analyze_helpfulness_significance.py\n\"\"\"\n\nimport json\nimport os\nimport argparse\nimport numpy as np\nfrom scipy import stats\nimport pandas as pd\nfrom typing import Dict, List, Tuple, Any\n\n\ndef load_results(filepath: str) -> Dict[str, Any]:\n    \"\"\"Load the helpfulness evaluation results from JSON file.\"\"\"\n    with open(filepath, 'r') as f:\n        return json.load(f)\n\n\ndef get_system_scores(results: Dict[str, Any], system: str) -> Dict[str, List[int]]:\n    \"\"\"\n    Extract scores for a specific system, organized by aspect.\n    \n    Returns:\n        Dictionary mapping aspect to list of scores\n    \"\"\"\n    system_results = [r for r in results[\"results\"] if r[\"system\"] == system]\n    scores_by_aspect = {}\n    \n    for result in system_results:\n        aspect = result[\"aspect\"]\n        score = result[\"score\"]\n        \n        if aspect not in scores_by_aspect:\n            scores_by_aspect[aspect] = []\n        \n        scores_by_aspect[aspect].append(score)\n    \n    return scores_by_aspect\n\n\ndef get_paired_scores(results: Dict[str, Any], system1: str, system2: str) -> Dict[str, Tuple[List[int], List[int]]]:\n    \"\"\"\n    Extract paired scores for two systems, organized by aspect.\n    Only includes components that have scores for both systems.\n    \n    Returns:\n        Dictionary mapping aspect to tuple of (system1_scores, system2_scores)\n    \"\"\"\n    # Get all component IDs evaluated by both systems\n    system1_results = [r for r in results[\"results\"] if r[\"system\"] == system1]\n    system2_results = [r for r in results[\"results\"] if r[\"system\"] == system2]\n    \n    system1_components = {(r[\"component_id\"], r[\"aspect\"]): r for r in 
system1_results}\n    system2_components = {(r[\"component_id\"], r[\"aspect\"]): r for r in system2_results}\n    \n    # Find common component-aspect pairs\n    common_pairs = set(system1_components.keys()).intersection(system2_components.keys())\n    \n    # Organize paired scores by aspect\n    paired_scores = {}\n    for component_id, aspect in common_pairs:\n        if aspect not in paired_scores:\n            paired_scores[aspect] = ([], [])\n        \n        paired_scores[aspect][0].append(system1_components[(component_id, aspect)][\"score\"])\n        paired_scores[aspect][1].append(system2_components[(component_id, aspect)][\"score\"])\n    \n    return paired_scores\n\n\ndef run_significance_tests(results: Dict[str, Any]) -> Dict[str, Any]:\n    \"\"\"\n    Run statistical significance tests between specified system pairs.\n    \n    Returns:\n        Dictionary with test results\n    \"\"\"\n    system_pairs = [\n        (\"copy_paste_codellama34b\", \"docassist-codellama34b\"),\n        (\"copy_paste_gpt4o_mini\", \"docassist-gpt4o_mini\"),\n        (\"fim-codellama13b\", \"docassist-codellama34b\")\n    ]\n    \n    significance_results = {}\n    \n    for system1, system2 in system_pairs:\n        pair_key = f\"{system1} vs {system2}\"\n        significance_results[pair_key] = {}\n        \n        # Get paired scores for the two systems\n        paired_scores = get_paired_scores(results, system1, system2)\n        \n        # Calculate overall paired scores across all aspects\n        all_scores_sys1 = []\n        all_scores_sys2 = []\n        \n        for aspect, (scores1, scores2) in paired_scores.items():\n            all_scores_sys1.extend(scores1)\n            all_scores_sys2.extend(scores2)\n            \n            # Run tests for each aspect\n            if len(scores1) >= 5:  # Only run tests if we have enough samples\n                # Perform Wilcoxon signed-rank test (non-parametric paired test)\n                try:\n                 
   w_stat, p_value = stats.wilcoxon(scores1, scores2)\n                    is_significant = p_value < 0.05\n                    better_system = system2 if np.mean(scores2) > np.mean(scores1) else system1\n                    \n                    significance_results[pair_key][aspect] = {\n                        \"mean_1\": np.mean(scores1),\n                        \"mean_2\": np.mean(scores2),\n                        \"p_value\": p_value,\n                        \"is_significant\": is_significant,\n                        \"better_system\": better_system if is_significant else \"No significant difference\",\n                        \"n_samples\": len(scores1)\n                    }\n                except ValueError as e:\n                    # This can happen if the differences are all zero\n                    significance_results[pair_key][aspect] = {\n                        \"mean_1\": np.mean(scores1),\n                        \"mean_2\": np.mean(scores2),\n                        \"p_value\": 1.0,\n                        \"is_significant\": False,\n                        \"better_system\": \"No significant difference\",\n                        \"n_samples\": len(scores1),\n                        \"note\": \"Test could not be performed: \" + str(e)\n                    }\n        \n        # Run test for overall scores\n        if len(all_scores_sys1) >= 5:\n            try:\n                w_stat, p_value = stats.wilcoxon(all_scores_sys1, all_scores_sys2)\n                is_significant = p_value < 0.05\n                better_system = system2 if np.mean(all_scores_sys2) > np.mean(all_scores_sys1) else system1\n                \n                significance_results[pair_key][\"overall\"] = {\n                    \"mean_1\": np.mean(all_scores_sys1),\n                    \"mean_2\": np.mean(all_scores_sys2),\n                    \"p_value\": p_value,\n                    \"is_significant\": is_significant,\n                    \"better_system\": 
better_system if is_significant else \"No significant difference\",\n                    \"n_samples\": len(all_scores_sys1)\n                }\n            except ValueError as e:\n                significance_results[pair_key][\"overall\"] = {\n                    \"mean_1\": np.mean(all_scores_sys1),\n                    \"mean_2\": np.mean(all_scores_sys2),\n                    \"p_value\": 1.0,\n                    \"is_significant\": False,\n                    \"better_system\": \"No significant difference\",\n                    \"n_samples\": len(all_scores_sys1),\n                    \"note\": \"Test could not be performed: \" + str(e)\n                }\n    \n    return significance_results\n\n\ndef format_significance_markdown(significance_results: Dict[str, Any]) -> str:\n    \"\"\"Format significance test results as markdown.\"\"\"\n    md = \"## Statistical Significance Tests\\n\\n\"\n    md += \"Statistical significance was assessed using the Wilcoxon signed-rank test with a significance level of p < 0.05.\\n\\n\"\n    \n    for pair_key, pair_results in significance_results.items():\n        md += f\"### {pair_key}\\n\\n\"\n        \n        # Create a table for this pair\n        md += \"| Aspect | System 1 Mean | System 2 Mean | p-value | Significant? 
| Better System | n |\\n\"\n        md += \"| ------ | ------------ | ------------ | ------- | ------------ | ------------- | --- |\\n\"\n        \n        # Add overall results first\n        if \"overall\" in pair_results:\n            overall = pair_results[\"overall\"]\n            md += f\"| Overall | {overall['mean_1']:.2f} | {overall['mean_2']:.2f} | {overall['p_value']:.4f} | {overall['is_significant']} | {overall['better_system']} | {overall['n_samples']} |\\n\"\n        \n        # Add results for each aspect\n        for aspect, results in pair_results.items():\n            if aspect != \"overall\":\n                md += f\"| {aspect.capitalize()} | {results['mean_1']:.2f} | {results['mean_2']:.2f} | {results['p_value']:.4f} | {results['is_significant']} | {results['better_system']} | {results['n_samples']} |\\n\"\n        \n        md += \"\\n\"\n    \n    return md\n\n\ndef update_markdown_report(stats_path: str, significance_md: str):\n    \"\"\"Update the markdown report to include significance test results.\"\"\"\n    with open(stats_path, 'r') as f:\n        content = f.read()\n    \n    # Append significance test results\n    updated_content = content + \"\\n\" + significance_md\n    \n    with open(stats_path, 'w') as f:\n        f.write(updated_content)\n\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Analyze statistical significance of docstring helpfulness\")\n    parser.add_argument(\"--results-path\", type=str, \n                        default=\"experiments/eval/results/helpfulness/helpfulness_evaluation_results.json\",\n                        help=\"Path to the helpfulness evaluation results JSON\")\n    parser.add_argument(\"--stats-path\", type=str, \n                        default=\"experiments/eval/results/helpfulness/helpfulness_evaluation_stats.md\",\n                        help=\"Path to the helpfulness evaluation stats markdown file\")\n    parser.add_argument(\"--output-dir\", type=str, \n                   
     default=\"experiments/eval/results/helpfulness\",\n                        help=\"Directory to store significance test results\")\n    args = parser.parse_args()\n    \n    # Check if result file exists\n    if not os.path.exists(args.results_path):\n        print(f\"Error: Results file not found at {args.results_path}\")\n        return\n    \n    # Load results\n    results = load_results(args.results_path)\n    \n    # Run significance tests\n    significance_results = run_significance_tests(results)\n    \n    # Format results as markdown\n    significance_md = format_significance_markdown(significance_results)\n    \n    # Save significance test results as separate file\n    significance_path = os.path.join(args.output_dir, \"significance_tests.md\")\n    with open(significance_path, 'w') as f:\n        f.write(significance_md)\n    \n    # Update the stats markdown file\n    if os.path.exists(args.stats_path):\n        update_markdown_report(args.stats_path, significance_md)\n    \n    print(f\"Significance test results saved to {significance_path}\")\n    if os.path.exists(args.stats_path):\n        print(f\"Updated stats report with significance tests at {args.stats_path}\")\n\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "src/data/parse/data_process.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport os\nimport ast\nimport json\nfrom tqdm import tqdm\nimport argparse\nimport re\nfrom langdetect import detect\n\ndef is_english(text):\n    \"\"\"Check if text contains only English using langdetect.\"\"\"\n    try:\n        return detect(text) == 'en' and text.isascii()\n    except Exception:\n        # langdetect raises on empty or undetectable input; treat that as non-English\n        return False\n\ndef is_high_quality_file_docstring(docstring):\n    \"\"\"Heuristic for file-level docstrings: \n    - At least one meaningful sentence and length ≥ 10 chars.\"\"\"\n    if not docstring or len(docstring.strip()) < 10:\n        return False\n    else:\n        return True\n\ndef is_high_quality_class_docstring(docstring):\n    \"\"\"Heuristic for class docstrings:\n    - At least 2 lines\n    - Possibly mentions common docstring sections (Attributes, Args, Returns)\"\"\"\n    if not docstring:\n        return False\n    lines = docstring.strip().split('\\n')\n    if len(lines) < 2:\n        return False\n    keywords = [\"Attributes\", \"Args\", \"Returns\", \"Example\", \"Methods\", \"Param\", \"arguments\", \"Parameters\"]\n    if any(kw in docstring for kw in keywords):\n        return True\n    # If at least moderately long, consider it acceptable\n    if len(docstring.strip()) > 30:\n        return True\n    return False\n\ndef is_high_quality_function_docstring(docstring):\n    \"\"\"Heuristic for function or class method docstrings:\n    - At least 3 lines\n    - Mention parameters, args, or returns\n    \"\"\"\n    if not docstring:\n        return False\n    lines = docstring.strip().split('\\n')\n    if len(lines) < 3:\n        return False\n    keywords = [\"Parameters\", \"Args\", \"Returns\", \"Param\", \"arguments\"]\n    if any(kw.lower() in docstring.lower() for kw in keywords):\n        return True\n    # If reasonably long (>30 chars), consider it good\n    if len(docstring.strip()) > 30:\n        return True\n    return 
False\n\ndef is_high_quality_docstring(docstring, doc_type):\n    \"\"\"Check if docstring meets quality criteria and is in English.\"\"\"\n    if not docstring:\n        return False\n        \n    # First check if it's English\n    if not is_english(docstring):\n        return False\n        \n    # Then apply other quality checks\n    if doc_type == \"file\":\n        return is_high_quality_file_docstring(docstring)\n    elif doc_type == \"class\":\n        return is_high_quality_class_docstring(docstring)\n    elif doc_type in (\"function\", \"class_method\"):\n        return is_high_quality_function_docstring(docstring)\n    return False\n\ndef get_repo_name_from_path(path):\n    \"\"\"Extract repo name from path like: data/downloaded_repos/USERNAME/REPO_NAME\"\"\"\n    parts = path.split(os.sep)\n    try:\n        # Find the index where the username starts (after downloaded_repos)\n        for i, part in enumerate(parts):\n            if part == \"downloaded_repos\":\n                # Return username/repo_name format\n                return f\"{parts[i+1]}/{parts[i+2]}\"\n    except IndexError:\n        pass\n    return None\n\ndef extract_docstrings_from_file(file_path):\n    \"\"\"\n    Parse a single Python file with AST and extract:\n    - file-level docstring\n    - class-level docstrings\n    - function-level docstrings (including class methods)\n    \"\"\"\n    with open(file_path, \"r\", encoding=\"utf-8\", errors='replace') as f:\n        source = f.read()\n    \n    try:\n        tree = ast.parse(source)\n    except SyntaxError:\n        return []\n    \n    # Attach parent references; the top-level function check below reads node.parent\n    add_parent_references(tree)\n    \n    docstrings_info = []\n    repo_name = get_repo_name_from_path(file_path)\n\n    # File-level docstring\n    module_docstring = ast.get_docstring(tree)\n    if is_high_quality_docstring(module_docstring, \"file\"):\n        signature = f\"File: {os.path.basename(file_path)}\"\n        docstrings_info.append({\n            \"type\": \"file\",\n            \"location\": file_path,\n            
\"repo_name\": repo_name,\n            \"content\": module_docstring.strip(),\n            \"signature\": signature\n        })\n    \n    # Classes and functions\n    for node in ast.walk(tree):\n        if isinstance(node, ast.ClassDef):\n            class_docstring = ast.get_docstring(node)\n            if hasattr(ast, \"unparse\"):\n                bases = [ast.unparse(base) for base in node.bases]\n            else:\n                # fallback for older python versions: just get the name of base classes if simple\n                bases = []\n                for base in node.bases:\n                    if isinstance(base, ast.Name):\n                        bases.append(base.id)\n                    else:\n                        # If complex base, just ignore\n                        bases.append(\"Base\")\n\n            class_signature = f\"class {node.name}\"\n            if bases:\n                class_signature += f\"({', '.join(bases)})\"\n            \n            if is_high_quality_docstring(class_docstring, \"class\"):\n                docstrings_info.append({\n                    \"type\": \"class\",\n                    \"location\": file_path,\n                    \"repo_name\": repo_name,\n                    \"content\": class_docstring.strip(),\n                    \"signature\": class_signature\n                })\n            \n            # Class methods\n            for body_item in node.body:\n                if isinstance(body_item, ast.FunctionDef):\n                    func_docstring = ast.get_docstring(body_item)\n                    args_list = [arg.arg for arg in body_item.args.args]\n                    func_signature = f\"def {body_item.name}({', '.join(args_list)})\"\n                    if is_high_quality_docstring(func_docstring, \"class_method\"):\n                        docstrings_info.append({\n                            \"type\": \"class_method\",\n                            \"location\": file_path,\n                       
     \"repo_name\": repo_name,\n                            \"content\": func_docstring.strip(),\n                            \"signature\": func_signature\n                        })\n        elif isinstance(node, ast.FunctionDef):\n            # Top-level functions\n            if isinstance(node.parent, ast.Module):  # We'll add a small hack to set parents\n                func_docstring = ast.get_docstring(node)\n                args_list = [arg.arg for arg in node.args.args]\n                func_signature = f\"def {node.name}({', '.join(args_list)})\"\n                if is_high_quality_docstring(func_docstring, \"function\"):\n                    docstrings_info.append({\n                        \"type\": \"function\",\n                        \"location\": file_path,\n                        \"repo_name\": repo_name,\n                        \"content\": func_docstring.strip(),\n                        \"signature\": func_signature\n                    })\n\n    return docstrings_info\n\ndef add_parent_references(tree):\n    \"\"\"Add parent references to nodes, so we can distinguish top-level functions from class methods easily.\"\"\"\n    for node in ast.walk(tree):\n        for child in ast.iter_child_nodes(node):\n            child.parent = node\n\ndef gather_python_files(top_dir):\n    py_files = []\n    for root, dirs, files in os.walk(top_dir):\n        for file in files:\n            if file.endswith(\".py\"):\n                py_files.append(os.path.join(root, file))\n    return py_files\n\ndef process_all_repos(top_dir, output_file):\n    \"\"\"Process all repositories and extract docstrings.\n    \n    Args:\n        top_dir (str): Path to directory containing downloaded repos\n        output_file (str): Path where to save the output JSONL file\n    \"\"\"\n    py_files = gather_python_files(top_dir)\n    # Setup output file\n    # We'll write each docstring object as a single JSON line.\n    # This allows incremental updates without invalidating 
JSON format.\n    with open(output_file, \"w\", encoding=\"utf-8\") as out_f:\n        # Using tqdm to show progress over Python files\n        for file_path in tqdm(py_files, desc=\"Processing files\"):\n            # Parse the file and extract docstrings\n            with open(file_path, \"r\", encoding=\"utf-8\", errors='replace') as f:\n                source = f.read()\n            try:\n                tree = ast.parse(source)\n                add_parent_references(tree)\n            except SyntaxError:\n                # Skip files that have syntax errors\n                continue\n\n            docstrings = []\n            # File-level docstring\n            repo_name = get_repo_name_from_path(file_path)\n            module_docstring = ast.get_docstring(tree)\n            if is_high_quality_docstring(module_docstring, \"file\"):\n                docstrings.append({\n                    \"type\": \"file\",\n                    \"location\": file_path,\n                    \"repo_name\": repo_name,\n                    \"content\": module_docstring.strip(),\n                    \"signature\": f\"File: {os.path.basename(file_path)}\"\n                })\n            \n            for node in ast.walk(tree):\n                if isinstance(node, ast.ClassDef):\n                    class_docstring = ast.get_docstring(node)\n                    if hasattr(ast, \"unparse\"):\n                        bases = [ast.unparse(base) for base in node.bases]\n                    else:\n                        bases = []\n                        for base in node.bases:\n                            if isinstance(base, ast.Name):\n                                bases.append(base.id)\n                            else:\n                                bases.append(\"Base\")\n\n                    class_signature = f\"class {node.name}\"\n                    if bases:\n                        class_signature += f\"({', '.join(bases)})\"\n                    \n                    
if is_high_quality_docstring(class_docstring, \"class\"):\n                        docstrings.append({\n                            \"type\": \"class\",\n                            \"location\": file_path,\n                            \"repo_name\": repo_name,\n                            \"content\": class_docstring.strip(),\n                            \"signature\": class_signature\n                        })\n                    \n                    # Class methods\n                    for body_item in node.body:\n                        if isinstance(body_item, ast.FunctionDef):\n                            func_docstring = ast.get_docstring(body_item)\n                            args_list = [arg.arg for arg in body_item.args.args]\n                            func_signature = f\"def {body_item.name}({', '.join(args_list)})\"\n                            if is_high_quality_docstring(func_docstring, \"class_method\"):\n                                docstrings.append({\n                                    \"type\": \"class_method\",\n                                    \"location\": file_path,\n                                    \"repo_name\": repo_name,\n                                    \"content\": func_docstring.strip(),\n                                    \"signature\": func_signature\n                                })\n                elif isinstance(node, ast.FunctionDef):\n                    # Check if top-level (parent is module)\n                    if isinstance(node.parent, ast.Module):\n                        func_docstring = ast.get_docstring(node)\n                        args_list = [arg.arg for arg in node.args.args]\n                        func_signature = f\"def {node.name}({', '.join(args_list)})\"\n                        if is_high_quality_docstring(func_docstring, \"function\"):\n                            docstrings.append({\n                                \"type\": \"function\",\n                                
\"location\": file_path,\n                                \"repo_name\": repo_name,\n                                \"content\": func_docstring.strip(),\n                                \"signature\": func_signature\n                            })\n            \n            # Write each docstring as a separate JSON line immediately\n            for d in docstrings:\n                out_f.write(json.dumps(d, ensure_ascii=False) + \"\\n\")\n                out_f.flush()\n\ndef main():\n    parser = argparse.ArgumentParser(description='Process Python files for docstrings')\n    parser.add_argument('--input-dir', type=str,  default=\"data/downloaded_repos\",\n                      help='Input directory containing downloaded repos')\n    parser.add_argument('--output-file', type=str,  default=\"data/parsed_downloaded_repos/docstrings.jsonl\",\n                      help='Output JSONL file path')\n    args = parser.parse_args()\n\n    process_all_repos(top_dir=args.input_dir, output_file=args.output_file)\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/data/parse/downloader.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport yaml\nimport os\nimport logging\nfrom github import Github\nfrom pathlib import Path\nimport git\nfrom typing import Dict, Any, List\nimport time\nfrom datetime import datetime\nfrom tqdm import tqdm\nimport json\n\nclass GitHubRepoDownloader:\n    def __init__(self, config_path: str):\n        self.config = self._load_config(config_path)\n        self.token = self.config.get('GITHUB_TOKEN')\n        if not self.token:\n            raise ValueError(\"GITHUB_TOKEN not found in config file\")\n        self.gh = Github(self.token)\n        self.setup_logging()\n\n    def _load_config(self, config_path: str) -> Dict[str, Any]:\n        try:\n            with open(config_path, 'r') as f:\n                config = yaml.safe_load(f) or {}\n            if 'search_criteria' not in config:\n                config['search_criteria'] = {}\n            return config\n        except yaml.YAMLError as e:\n            logging.error(f\"Error parsing YAML file: {e}\")\n            raise\n        except FileNotFoundError:\n            logging.error(f\"Config file not found: {config_path}\")\n            raise\n\n    def setup_logging(self):\n        log_filename = f\"github_downloader_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log\"\n        logging.basicConfig(\n            level=logging.INFO,\n            format='%(asctime)s - %(levelname)s - %(message)s',\n            handlers=[\n                logging.FileHandler(log_filename),\n                logging.StreamHandler()\n            ]\n        )\n\n    def build_query(self) -> str:\n        \"\"\"Build GitHub search query from config.\"\"\"\n        criteria = self.config.get('search_criteria', {})\n        query_parts = []\n        \n        # Handle owners/users\n        if owners := criteria.get('owners'):\n            if isinstance(owners, list):\n                query_parts.extend(f\"user:{owner}\" for owner in owners)\n            else:\n             
   query_parts.append(f\"user:{owners}\")\n        \n        # Handle dates - ensure proper date format and use created: qualifier\n        dates = criteria.get('dates', {})\n        if created_after := dates.get('created_after'):\n            # GitHub's search API requires YYYY-MM-DD format\n            if isinstance(created_after, datetime):\n                created_after = created_after.strftime('%Y-%m-%d')\n            query_parts.append(f\"created:>{created_after}\")\n        \n        if created_before := dates.get('created_before'):\n            if isinstance(created_before, datetime):\n                created_before = created_before.strftime('%Y-%m-%d')\n            query_parts.append(f\"created:<{created_before}\")\n        \n        # Handle language\n        if language := criteria.get('language'):\n            if isinstance(language, list):\n                query_parts.append(f\"language:{language[0]}\")  # GitHub API limitation: one language at a time\n            else:\n                query_parts.append(f\"language:{language}\")\n        \n        # Handle stars\n        if stars := criteria.get('stars'):\n            if isinstance(stars, dict):\n                if min_stars := stars.get('min'):\n                    query_parts.append(f\"stars:>{min_stars}\")\n                if max_stars := stars.get('max'):\n                    query_parts.append(f\"stars:<{max_stars}\")\n            else:\n                query_parts.append(f\"stars:>{stars}\")\n        \n        # Handle forks\n        if forks := criteria.get('forks'):\n            if isinstance(forks, dict):\n                if min_forks := forks.get('min'):\n                    query_parts.append(f\"forks:>{min_forks}\")\n                if max_forks := forks.get('max'):\n                    query_parts.append(f\"forks:<{max_forks}\")\n            else:\n                query_parts.append(f\"forks:>{forks}\")\n        \n        # Handle size\n        if size := criteria.get('size'):\n          
  if isinstance(size, dict):\n                if min_size := size.get('min'):\n                    query_parts.append(f\"size:>{min_size}\")\n                if max_size := size.get('max'):\n                    query_parts.append(f\"size:<{max_size}\")\n            else:\n                query_parts.append(f\"size:>{size}\")\n        \n        # Handle license\n        if license_type := criteria.get('license'):\n            if isinstance(license_type, list):\n                query_parts.append(f\"license:{license_type[0]}\")  # GitHub API limitation: one license at a time\n            else:\n                query_parts.append(f\"license:{license_type}\")\n        \n        query = ' '.join(query_parts) if query_parts else \"is:public\"\n        logging.info(f\"Search query: {query}\")\n        return query\n\n    def clone_repository(self, repo, output_dir: Path) -> bool:\n        \"\"\"Clone a repository using GitPython.\"\"\"\n        repo_dir = output_dir / repo.full_name\n        if repo_dir.exists():\n            logging.info(f\"Repository directory already exists: {repo_dir}\")\n            return False\n        \n        try:\n            # Create clone URL with token\n            clone_url = f\"https://{self.token}@github.com/{repo.full_name}.git\"\n            \n            # Clone the repository\n            git.Repo.clone_from(clone_url, str(repo_dir))\n            \n            # Save repository metadata\n            metadata = {\n                'name': repo.name,\n                'full_name': repo.full_name,\n                'description': repo.description,\n                'stars': repo.stargazers_count,\n                'forks': repo.forks_count,\n                'language': repo.language,\n                'license': repo.license.name if repo.license else None,\n                'created_at': repo.created_at.isoformat() if repo.created_at else None,\n                'updated_at': repo.updated_at.isoformat() if repo.updated_at else None,\n            
    'topics': repo.get_topics(),\n                'size': repo.size,\n                'clone_time': datetime.now().isoformat(),\n            }\n            \n            with open(repo_dir / 'repo_metadata.yaml', 'w') as f:\n                yaml.dump(metadata, f)\n            \n            logging.info(f\"Successfully cloned: {repo.full_name}\")\n            return True\n        except Exception as e:\n            logging.error(f\"Error cloning repository {repo.full_name}: {e}\")\n            return False\n\n    def run(self):\n        output_dir = Path(self.config.get('output_directory', 'downloaded_repos'))\n        output_dir.mkdir(parents=True, exist_ok=True)\n        \n        # Initialize or load existing metadata file\n        meta_file = output_dir / 'repositories_metadata.json'\n        if meta_file.exists():\n            with open(meta_file, 'r') as f:\n                all_metadata = json.load(f)\n        else:\n            all_metadata = {\n                'download_session': datetime.now().isoformat(),\n                'search_query': self.build_query(),\n                'repositories': {}\n            }\n        \n        max_repos = self.config.get('max_repos', 5)\n        skip_archived = self.config.get('skip_archived', True)\n        skip_forks = self.config.get('skip_forks', True)\n        min_python_percentage = self.config.get('min_python_percentage', 80)  # Default to 80% if not specified\n        \n        # Get date filters from config\n        dates = self.config.get('search_criteria', {}).get('dates', {})\n        created_after = dates.get('created_after')\n        if isinstance(created_after, str):\n            created_after = datetime.fromisoformat(created_after.replace('Z', '+00:00'))\n        \n        created_before = dates.get('created_before')\n        if isinstance(created_before, str):\n            created_before = datetime.fromisoformat(created_before.replace('Z', '+00:00'))\n        \n        query = self.build_query()\n        
logging.info(f\"Starting repository search with query: {query}\")\n        \n        try:\n            repos = self.gh.search_repositories(\n                query=query,\n                sort=self.config.get('search_criteria', {}).get('sort', 'stars'),\n                order=self.config.get('search_criteria', {}).get('order', 'desc')\n            )\n            \n            total_count = repos.totalCount\n            logging.info(f\"Found {total_count} repositories matching the search criteria\")\n            \n            downloaded = 0\n            pbar = tqdm(total=max_repos, desc=\"Downloading repositories\")\n            \n            for repo in repos:\n                if downloaded >= max_repos:\n                    break\n                \n                if skip_archived and repo.archived:\n                    logging.info(f\"Skipping archived repository: {repo.full_name}\")\n                    continue\n                \n                if skip_forks and repo.fork:\n                    logging.info(f\"Skipping forked repository: {repo.full_name}\")\n                    continue\n                \n                # Check Python language percentage\n                try:\n                    languages = repo.get_languages()\n                    total_bytes = sum(languages.values())\n                    python_bytes = languages.get('Python', 0)\n                    \n                    if total_bytes > 0:\n                        python_percentage = (python_bytes / total_bytes) * 100\n                        if python_percentage < min_python_percentage:\n                            logging.info(f\"Skipping repository {repo.full_name}: Python code is only {python_percentage:.2f}% (required: {min_python_percentage}%)\")\n                            continue\n                        logging.info(f\"Repository {repo.full_name} has {python_percentage:.2f}% Python code\")\n                    elif min_python_percentage > 0:\n                        
logging.info(f\"Skipping repository {repo.full_name}: No language data available\")\n                        continue\n                except Exception as e:\n                    logging.warning(f\"Couldn't check language stats for {repo.full_name}: {e}\")\n                    # Continue even if we can't check language stats, to avoid missing potentially valid repositories\n                \n                if self.clone_repository(repo, output_dir):\n                    # Add repository metadata to the collective metadata\n                    metadata = {\n                        'name': repo.name,\n                        'full_name': repo.full_name,\n                        'description': repo.description,\n                        'stars': repo.stargazers_count,\n                        'forks': repo.forks_count,\n                        'language': repo.language,\n                        'license': repo.license.name if repo.license else None,\n                        'created_at': repo.created_at.isoformat() if repo.created_at else None,\n                        'updated_at': repo.updated_at.isoformat() if repo.updated_at else None,\n                        'topics': repo.get_topics(),\n                        'size': repo.size,\n                        'clone_time': datetime.now().isoformat(),\n                        'local_path': str(output_dir / repo.full_name)\n                    }\n                    all_metadata['repositories'][repo.full_name] = metadata\n                    \n                    # Update the metadata file after each successful download\n                    with open(meta_file, 'w') as f:\n                        json.dump(all_metadata, f, indent=2)\n                    \n                    downloaded += 1\n                    pbar.update(1)\n                \n                # Respect GitHub API rate limits\n                time.sleep(1)\n            \n            pbar.close()\n            logging.info(f\"Successfully downloaded 
{downloaded} repositories\")\n            logging.info(f\"Metadata file created at: {meta_file}\")\n        \n        except Exception as e:\n            logging.error(f\"Error during repository download process: {e}\")\n            raise\n\nif __name__ == \"__main__\":\n    try:\n        downloader = GitHubRepoDownloader(\"config/download_repo_config.yaml\")\n        downloader.run()\n    except Exception as e:\n        logging.error(f\"Fatal error: {e}\")\n        raise"
  },
  {
    "path": "src/data/parse/repo_tree.py",
    "content": "#!/usr/bin/env python3\n# Copyright (c) Meta Platforms, Inc. and affiliates\nimport os\nimport argparse\nfrom pathlib import Path\nimport json\nfrom typing import Dict, List, Optional\n\nclass ProjectStructureGenerator:\n    def __init__(self, ignore_patterns: List[str] = None):\n        self.ignore_patterns = ignore_patterns or [\n            '.git', '__pycache__', '.pytest_cache',\n            '.env', 'venv', 'node_modules', '.DS_Store',\n            '*.pyc', '*.pyo', '*.pyd', '.Python', '*.so'\n        ]\n    \n    def should_ignore(self, path: str) -> bool:\n        \"\"\"Check if the path should be ignored based on patterns.\"\"\"\n        path_obj = Path(path)\n        return any(\n            path_obj.match(pattern) or\n            any(parent.match(pattern) for parent in path_obj.parents)\n            for pattern in self.ignore_patterns\n        )\n    \n    def generate_structure(self, root_path: str, max_depth: Optional[int] = None) -> Dict:\n        \"\"\"Generate a hierarchical structure of the project.\"\"\"\n        root_path = os.path.abspath(root_path)\n        root_name = os.path.basename(root_path)\n        \n        def explore_directory(current_path: str, current_depth: int = 0) -> Dict:\n            if max_depth is not None and current_depth > max_depth:\n                return {\"type\": \"directory\", \"name\": os.path.basename(current_path), \"truncated\": True}\n            \n            structure = {\n                \"type\": \"directory\",\n                \"name\": os.path.basename(current_path),\n                \"contents\": []\n            }\n            \n            try:\n                for item in sorted(os.listdir(current_path)):\n                    item_path = os.path.join(current_path, item)\n                    \n                    if self.should_ignore(item_path):\n                        continue\n                    \n                    if os.path.isfile(item_path):\n                        file_info = 
{\n                            \"type\": \"file\",\n                            \"name\": item,\n                            \"extension\": os.path.splitext(item)[1][1:] or \"none\"\n                        }\n                        structure[\"contents\"].append(file_info)\n                    elif os.path.isdir(item_path):\n                        subdir = explore_directory(item_path, current_depth + 1)\n                        if subdir.get(\"contents\") or not subdir.get(\"truncated\"):\n                            structure[\"contents\"].append(subdir)\n            \n            except PermissionError:\n                structure[\"error\"] = \"Permission denied\"\n            \n            return structure\n        \n        return explore_directory(root_path)\n    \n    def format_structure(self, structure: Dict, indent: int = 0) -> str:\n        \"\"\"Format the structure in a hierarchical text format.\"\"\"\n        output = []\n        prefix = \"│   \" * (indent - 1) + \"├── \" if indent > 0 else \"\"\n        \n        if structure.get(\"truncated\"):\n            output.append(f\"{prefix}{structure['name']} [...]\")\n            return \"\\n\".join(output)\n        \n        output.append(f\"{prefix}{structure['name']}/\")\n        \n        if \"contents\" in structure:\n            for i, item in enumerate(structure[\"contents\"]):\n                is_last = i == len(structure[\"contents\"]) - 1\n                if item[\"type\"] == \"file\":\n                    item_prefix = \"│   \" * indent + (\"└── \" if is_last else \"├── \")\n                    output.append(f\"{item_prefix}{item['name']}\")\n                else:\n                    output.append(self.format_structure(item, indent + 1))\n        \n        return \"\\n\".join(output)\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=\"Generate a project structure in LLM-friendly format\"\n    )\n    parser.add_argument(\n        \"path\",\n        nargs=\"?\",\n      
  default=\".\",\n        help=\"Path to the project directory (default: current directory)\"\n    )\n    parser.add_argument(\n        \"--max-depth\",\n        type=int,\n        help=\"Maximum depth to traverse (default: no limit)\"\n    )\n    parser.add_argument(\n        \"--output\",\n        choices=[\"text\", \"json\"],\n        default=\"text\",\n        help=\"Output format (default: text)\"\n    )\n    parser.add_argument(\n        \"--ignore\",\n        nargs=\"+\",\n        help=\"Additional patterns to ignore\"\n    )\n    \n    args = parser.parse_args()\n    \n    generator = ProjectStructureGenerator()\n    if args.ignore:\n        generator.ignore_patterns.extend(args.ignore)\n    \n    structure = generator.generate_structure(args.path, args.max_depth)\n    \n    if args.output == \"json\":\n        print(json.dumps(structure, indent=2))\n    else:\n        print(generator.format_structure(structure))\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "src/dependency_analyzer/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nDependency analyzer module for building and processing import dependency graphs \nbetween Python code components.\n\"\"\"\n\nfrom .ast_parser import CodeComponent, DependencyParser\nfrom .topo_sort import topological_sort, resolve_cycles, build_graph_from_components, dependency_first_dfs\n\n__all__ = [\n    'CodeComponent', \n    'DependencyParser',\n    'topological_sort',\n    'resolve_cycles',\n    'build_graph_from_components',\n    'dependency_first_dfs'\n]"
  },
  {
    "path": "src/dependency_analyzer/ast_parser.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nAST-based Python code parser that extracts dependency information between code components.\n\nThis module identifies imports and references between Python code components (functions, classes, methods)\nand builds a dependency graph for topological sorting.\n\"\"\"\n\nimport ast\nimport os\nimport json\nimport logging\nimport builtins\nfrom dataclasses import dataclass, field\nfrom typing import Dict, List, Set, Tuple, Optional, Any, Union\nfrom pathlib import Path\n\nlogger = logging.getLogger(__name__)\n\n# Built-in Python types and modules that should be excluded from dependencies\nBUILTIN_TYPES = {name for name in dir(builtins)}\nSTANDARD_MODULES = {\n    'abc', 'argparse', 'array', 'asyncio', 'base64', 'collections', 'copy', \n    'csv', 'datetime', 'enum', 'functools', 'glob', 'io', 'itertools', \n    'json', 'logging', 'math', 'os', 'pathlib', 'random', 're', 'shutil', \n    'string', 'sys', 'time', 'typing', 'uuid', 'warnings', 'xml'\n}\nEXCLUDED_NAMES = {'self', 'cls'}\n\n@dataclass\nclass CodeComponent:\n    \"\"\"\n    Represents a single code component (function, class, or method) in a Python codebase.\n    \n    Stores the component's identifier, AST node, dependencies, and other metadata.\n    \"\"\"\n    # Unique identifier for the component, format: module_path.ClassName.method_name\n    id: str\n    \n    # AST node representing this component\n    node: ast.AST\n    \n    # Type of component: 'class', 'function', or 'method'\n    component_type: str\n    \n    # Full path to the file containing this component\n    file_path: str\n    \n    # Relative path within the repo\n    relative_path: str\n    \n    # Set of component IDs this component depends on\n    depends_on: Set[str] = field(default_factory=set)\n    \n    # Original source code of the component\n    source_code: Optional[str] = None\n    \n    # Line numbers in the file (1-indexed)\n    start_line: int = 0\n   
 end_line: int = 0\n    \n    # Whether the component already has a docstring\n    has_docstring: bool = False\n    \n    # Content of the docstring if it exists, empty string otherwise\n    docstring: str = \"\"\n\n    def to_dict(self) -> Dict[str, Any]:\n        \"\"\"Convert this component to a dictionary representation for JSON serialization.\"\"\"\n        return {\n            'id': self.id,\n            'component_type': self.component_type,\n            'file_path': self.file_path,\n            'relative_path': self.relative_path,\n            'depends_on': list(self.depends_on),\n            'start_line': self.start_line,\n            'end_line': self.end_line,\n            'has_docstring': self.has_docstring,\n            'docstring': self.docstring\n        }\n\n    @staticmethod\n    def from_dict(data: Dict[str, Any]) -> 'CodeComponent':\n        \"\"\"Create a CodeComponent from a dictionary representation.\"\"\"\n        component = CodeComponent(\n            id=data['id'],\n            node=None,  # AST node is not serialized\n            component_type=data['component_type'],\n            file_path=data['file_path'],\n            relative_path=data['relative_path'],\n            depends_on=set(data.get('depends_on', [])),\n            start_line=data.get('start_line', 0),\n            end_line=data.get('end_line', 0),\n            has_docstring=data.get('has_docstring', False),\n            docstring=data.get('docstring', \"\")\n        )\n        return component\n\n\nclass ImportCollector(ast.NodeVisitor):\n    \"\"\"Collects import statements from Python code.\"\"\"\n    \n    def __init__(self):\n        self.imports = set()\n        self.from_imports = {}  # module -> [names]\n        \n    def visit_Import(self, node: ast.Import):\n        \"\"\"Process 'import x' statements.\"\"\"\n        for name in node.names:\n            self.imports.add(name.name)\n        self.generic_visit(node)\n    \n    def visit_ImportFrom(self, node: 
ast.ImportFrom):\n        \"\"\"Process 'from x import y' statements.\"\"\"\n        if node.module is not None:\n            module = node.module\n            if module not in self.from_imports:\n                self.from_imports[module] = []\n            \n            for name in node.names:\n                if name.name != '*':\n                    self.from_imports[module].append(name.name)\n        \n        self.generic_visit(node)\n\n\nclass MethodDependencyCollector(ast.NodeVisitor):\n    \"\"\"\n    Special dependency collector for methods that also tracks 'self.XXX' references\n    as potential dependencies.\n    \"\"\"\n    \n    def __init__(self, class_id: str, method_id: str, class_methods: Dict[str, str]):\n        self.class_id = class_id\n        self.method_id = method_id\n        self.class_methods = class_methods  # method_name -> full_method_id\n        self.self_attr_refs = set()  # Set of attributes accessed via self.XXX\n        \n    def visit_Attribute(self, node: ast.Attribute):\n        \"\"\"Process attribute access, specifically looking for self.XXX references.\"\"\"\n        if (isinstance(node.value, ast.Name) and \n            node.value.id == 'self' and \n            isinstance(node.ctx, ast.Load)):\n            \n            # Found a self.XXX reference\n            attr_name = node.attr\n            self.self_attr_refs.add(attr_name)\n        \n        self.generic_visit(node)\n    \n    def get_method_dependencies(self) -> Set[str]:\n        \"\"\"\n        Get the set of methods that this method depends on based on self.XXX references.\n        \n        Returns:\n            A set of method IDs that this method depends on\n        \"\"\"\n        dependencies = set()\n        \n        # Check if any self.attr references match method names\n        for attr in self.self_attr_refs:\n            if attr in self.class_methods:\n                # This is a reference to another method in the class\n                
dependencies.add(self.class_methods[attr])\n        \n        return dependencies\n\n\nclass DependencyCollector(ast.NodeVisitor):\n    \"\"\"\n    Collects dependencies between code components by analyzing\n    attribute access, function calls, and class references.\n    \"\"\"\n    \n    def __init__(self, imports, from_imports, current_module, repo_modules):\n        self.imports = imports\n        self.from_imports = from_imports\n        self.current_module = current_module\n        self.repo_modules = repo_modules\n        self.dependencies = set()\n        self._current_class = None\n        # Track local variables defined in the current context\n        self.local_variables = set()\n    \n    def visit_ClassDef(self, node: ast.ClassDef):\n        \"\"\"Process class definitions.\"\"\"\n        old_class = self._current_class\n        self._current_class = node.name\n        \n        # Check for base classes dependencies\n        for base in node.bases:\n            if isinstance(base, ast.Name):\n                # Simple name reference, could be an imported class\n                self._add_dependency(base.id)\n            elif isinstance(base, ast.Attribute):\n                # Module.Class reference\n                self._process_attribute(base)\n        \n        self.generic_visit(node)\n        self._current_class = old_class\n    \n    def visit_Assign(self, node: ast.Assign):\n        \"\"\"Track local variable assignments.\"\"\"\n        for target in node.targets:\n            if isinstance(target, ast.Name):\n                # Add to local variables\n                self.local_variables.add(target.id)\n        self.generic_visit(node)\n    \n    def visit_Call(self, node: ast.Call):\n        \"\"\"Process function calls.\"\"\"\n        if isinstance(node.func, ast.Name):\n            # Direct function call\n            self._add_dependency(node.func.id)\n        elif isinstance(node.func, ast.Attribute):\n            # Method call or 
module.function call\n            self._process_attribute(node.func)\n        \n        self.generic_visit(node)\n    \n    def visit_Name(self, node: ast.Name):\n        \"\"\"Process name references.\"\"\"\n        if isinstance(node.ctx, ast.Load):\n            self._add_dependency(node.id)\n        self.generic_visit(node)\n    \n    def visit_Attribute(self, node: ast.Attribute):\n        \"\"\"Process attribute access.\"\"\"\n        self._process_attribute(node)\n        self.generic_visit(node)\n    \n    def _process_attribute(self, node: ast.Attribute):\n        \"\"\"Process an attribute node to extract potential dependencies.\"\"\"\n        parts = []\n        current = node\n        \n        # Traverse the attribute chain (e.g., module.submodule.Class.method)\n        while isinstance(current, ast.Attribute):\n            parts.insert(0, current.attr)\n            current = current.value\n        \n        if isinstance(current, ast.Name):\n            parts.insert(0, current.id)\n            \n            # Skip if the first part is a local variable\n            if parts[0] in self.local_variables:\n                return\n                \n            # Skip if the first part is in our excluded names\n            if parts[0] in EXCLUDED_NAMES:\n                return\n                \n            # Check if the first part is an imported module\n            if parts[0] in self.imports:\n                module_path = parts[0]\n                # Skip standard library modules\n                if module_path in STANDARD_MODULES:\n                    return\n                    \n                # If it's a repo module, add as dependency\n                if module_path in self.repo_modules:\n                    if len(parts) > 1:\n                        # Example: module.Class or module.function\n                        self.dependencies.add(f\"{module_path}.{parts[1]}\")\n            \n            # Check from imports\n            elif parts[0] in 
self.from_imports.keys():\n                # Skip standard library modules\n                if parts[0] in STANDARD_MODULES:\n                    return\n                    \n                # Check if the name is in the imported names\n                if len(parts) > 1 and parts[1] in self.from_imports[parts[0]]:\n                    self.dependencies.add(f\"{parts[0]}.{parts[1]}\")\n    \n    def _add_dependency(self, name):\n        \"\"\"Add a potential dependency based on a name reference.\"\"\"\n        # Skip built-in types\n        if name in BUILTIN_TYPES:\n            return\n            \n        # Skip excluded names\n        if name in EXCLUDED_NAMES:\n            return\n            \n        # Skip local variables\n        if name in self.local_variables:\n            return\n            \n        # Check if name is directly imported from a module\n        for module, imported_names in self.from_imports.items():\n            # Skip standard library modules\n            if module in STANDARD_MODULES:\n                continue\n                \n            if name in imported_names and module in self.repo_modules:\n                self.dependencies.add(f\"{module}.{name}\")\n                return\n                \n        # Check if name refers to a component in the current module\n        local_component_id = f\"{self.current_module}.{name}\"\n        self.dependencies.add(local_component_id)\n\n\ndef add_parent_to_nodes(tree: ast.AST) -> None:\n    \"\"\"\n    Add a 'parent' attribute to each node in the AST.\n    \n    Args:\n        tree: The AST to process\n    \"\"\"\n    for node in ast.walk(tree):\n        for child in ast.iter_child_nodes(node):\n            child.parent = node\n\n\nclass DependencyParser:\n    \"\"\"\n    Parses Python code to build a dependency graph between code components.\n    \"\"\"\n    \n    def __init__(self, repo_path: str):\n        self.repo_path = os.path.abspath(repo_path)\n        self.components: Dict[str, 
CodeComponent] = {}\n        self.dependency_graph: Dict[str, List[str]] = {}\n        self.modules: Set[str] = set()\n        \n    def parse_repository(self):\n        \"\"\"\n        Parse all Python files in the repository to build the dependency graph.\n        \"\"\"\n        logger.info(f\"Parsing repository at {self.repo_path}\")\n        \n        # First pass: collect all modules and code components\n        for root, _, files in os.walk(self.repo_path):\n            for file in files:\n                if not file.endswith(\".py\"):\n                    continue\n                \n                file_path = os.path.join(root, file)\n                relative_path = os.path.relpath(file_path, self.repo_path)\n                \n                # Convert file path to module path\n                module_path = self._file_to_module_path(relative_path)\n                self.modules.add(module_path)\n                \n                # Parse the file to collect components\n                self._parse_file(file_path, relative_path, module_path)\n        \n        # Second pass: resolve dependencies\n        self._resolve_dependencies()\n        \n        # Third pass: add class dependencies on methods\n        self._add_class_method_dependencies()\n        \n        logger.info(f\"Found {len(self.components)} code components\")\n        return self.components\n    \n    def _file_to_module_path(self, file_path: str) -> str:\n        \"\"\"Convert a file path to a Python module path.\"\"\"\n        # Remove .py extension and convert / to .\n        path = file_path[:-3] if file_path.endswith(\".py\") else file_path\n        return path.replace(os.path.sep, \".\")\n    \n    def _parse_file(self, file_path: str, relative_path: str, module_path: str):\n        \"\"\"Parse a single Python file to collect code components.\"\"\"\n        try:\n            with open(file_path, \"r\", encoding=\"utf-8\") as f:\n                source = f.read()\n            \n            
tree = ast.parse(source)\n            \n            # Add parent field to AST nodes for easier traversal\n            add_parent_to_nodes(tree)\n            \n            # Collect imports\n            import_collector = ImportCollector()\n            import_collector.visit(tree)\n            \n            # Collect code components\n            self._collect_components(tree, file_path, relative_path, module_path, source)\n            \n        except (SyntaxError, UnicodeDecodeError) as e:\n            logger.warning(f\"Error parsing {file_path}: {e}\")\n    \n    def _collect_components(self, tree: ast.AST, file_path: str, relative_path: str, \n                          module_path: str, source: str):\n        \"\"\"Collect all code components (functions, classes, methods) from an AST.\"\"\"\n        for node in ast.walk(tree):\n            if isinstance(node, ast.ClassDef):\n                # Class definition\n                class_id = f\"{module_path}.{node.name}\"\n                \n                # Check if the class has a docstring\n                has_docstring = (\n                    len(node.body) > 0 \n                    and isinstance(node.body[0], ast.Expr) \n                    and isinstance(node.body[0].value, ast.Constant)\n                    and isinstance(node.body[0].value.value, str)\n                )\n                \n                # Extract docstring if it exists\n                docstring = self._get_docstring(source, node) if has_docstring else \"\"\n                \n                component = CodeComponent(\n                    id=class_id,\n                    node=node,\n                    component_type=\"class\",\n                    file_path=file_path,\n                    relative_path=relative_path,\n                    source_code=self._get_source_segment(source, node),\n                    start_line=node.lineno,\n                    end_line=getattr(node, \"end_lineno\", node.lineno),\n                    
has_docstring=has_docstring,\n                    docstring=docstring\n                )\n                \n                self.components[class_id] = component\n                \n                # Collect methods within the class\n                for item in node.body:\n                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):\n                        method_id = f\"{class_id}.{item.name}\"\n                        \n                        # Check if the method has a docstring\n                        method_has_docstring = (\n                            len(item.body) > 0 \n                            and isinstance(item.body[0], ast.Expr) \n                            and isinstance(item.body[0].value, ast.Constant)\n                            and isinstance(item.body[0].value.value, str)\n                        )\n                        \n                        # Extract docstring if it exists\n                        method_docstring = self._get_docstring(source, item) if method_has_docstring else \"\"\n                        \n                        method_component = CodeComponent(\n                            id=method_id,\n                            node=item,\n                            component_type=\"method\",\n                            file_path=file_path,\n                            relative_path=relative_path,\n                            source_code=self._get_source_segment(source, item),\n                            start_line=item.lineno,\n                            end_line=getattr(item, \"end_lineno\", item.lineno),\n                            has_docstring=method_has_docstring,\n                            docstring=method_docstring\n                        )\n                        \n                        self.components[method_id] = method_component\n            \n            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):\n                # Only collect top-level functions\n          
      if hasattr(node, 'parent') and isinstance(node.parent, ast.Module):\n                    func_id = f\"{module_path}.{node.name}\"\n                    \n                    # Check if the function has a docstring\n                    has_docstring = (\n                        len(node.body) > 0 \n                        and isinstance(node.body[0], ast.Expr) \n                        and isinstance(node.body[0].value, ast.Constant)\n                        and isinstance(node.body[0].value.value, str)\n                    )\n                    \n                    # Extract docstring if it exists\n                    docstring = self._get_docstring(source, node) if has_docstring else \"\"\n                    \n                    component = CodeComponent(\n                        id=func_id,\n                        node=node,\n                        component_type=\"function\",\n                        file_path=file_path,\n                        relative_path=relative_path,\n                        source_code=self._get_source_segment(source, node),\n                        start_line=node.lineno,\n                        end_line=getattr(node, \"end_lineno\", node.lineno),\n                        has_docstring=has_docstring,\n                        docstring=docstring\n                    )\n                    \n                    self.components[func_id] = component\n    \n    def _resolve_dependencies(self):\n        \"\"\"\n        Second pass to resolve dependencies between components.\n        \"\"\"\n        for component_id, component in self.components.items():\n            file_path = component.file_path\n            \n            try:\n                with open(file_path, \"r\", encoding=\"utf-8\") as f:\n                    source = f.read()\n                \n                # Parse file to get imports\n                tree = ast.parse(source)\n                \n                # Add parent field to AST nodes for easier traversal\n    
            add_parent_to_nodes(tree)\n                \n                # Collect imports\n                import_collector = ImportCollector()\n                import_collector.visit(tree)\n                \n                # Find the component node in the tree\n                component_node = None\n                module_path = self._file_to_module_path(component.relative_path)\n                \n                if component.component_type == \"function\":\n                    # Find top-level function\n                    for node in ast.iter_child_nodes(tree):\n                        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                                and node.name == component.id.split(\".\")[-1]):\n                            component_node = node\n                            break\n                \n                elif component.component_type == \"class\":\n                    # Find class\n                    for node in ast.iter_child_nodes(tree):\n                        if isinstance(node, ast.ClassDef) and node.name == component.id.split(\".\")[-1]:\n                            component_node = node\n                            break\n                \n                elif component.component_type == \"method\":\n                    # Find method inside class\n                    class_name, method_name = component.id.split(\".\")[-2:]\n                    class_node = None\n                    \n                    for node in ast.iter_child_nodes(tree):\n                        if isinstance(node, ast.ClassDef) and node.name == class_name:\n                            class_node = node\n                            for item in node.body:\n                                if (isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) \n                                        and item.name == method_name):\n                                    component_node = item\n                                    break\n                  
          break\n                \n                if component_node:\n                    # Collect dependencies for this specific component\n                    dependency_collector = DependencyCollector(\n                        import_collector.imports,\n                        import_collector.from_imports,\n                        module_path,\n                        self.modules\n                    )\n                    \n                    # For functions and methods, collect variables defined in the function\n                    if isinstance(component_node, (ast.FunctionDef, ast.AsyncFunctionDef)):\n                        # Add function parameters to local variables\n                        for arg in component_node.args.args:\n                            dependency_collector.local_variables.add(arg.arg)\n                            \n                    dependency_collector.visit(component_node)\n                    \n                    # Add dependencies to the component\n                    component.depends_on.update(dependency_collector.dependencies)\n                    \n                    # Filter out non-existent dependencies\n                    component.depends_on = {\n                        dep for dep in component.depends_on \n                        if dep in self.components or dep.split(\".\", 1)[0] in self.modules\n                    }\n                \n            except (SyntaxError, UnicodeDecodeError) as e:\n                logger.warning(f\"Error analyzing dependencies in {file_path}: {e}\")\n    \n    def _add_class_method_dependencies(self):\n        \"\"\"\n        Third pass to make classes dependent on their methods (except __init__).\n        \"\"\"\n        # Group components by class\n        class_methods = {}\n        \n        # Collect all methods for each class\n        for component_id, component in self.components.items():\n            if component.component_type == \"method\":\n                parts = 
component_id.split(\".\")\n                if len(parts) >= 2:\n                    method_name = parts[-1]\n                    class_id = \".\".join(parts[:-1])\n                    \n                    if class_id not in class_methods:\n                        class_methods[class_id] = []\n                    \n                    # Don't include __init__ methods as dependencies of the class\n                    if method_name != \"__init__\":\n                        class_methods[class_id].append(component_id)\n        \n        # Add method dependencies to their classes\n        for class_id, method_ids in class_methods.items():\n            if class_id in self.components:\n                class_component = self.components[class_id]\n                for method_id in method_ids:\n                    class_component.depends_on.add(method_id)\n    \n    def _get_source_segment(self, source: str, node: ast.AST) -> str:\n        \"\"\"Get source code segment for an AST node.\"\"\"\n        try:\n            if hasattr(ast, \"get_source_segment\"):\n                segment = ast.get_source_segment(source, node)\n                if segment is not None:\n                    return segment\n            \n            # Fallback to manual extraction\n            lines = source.split(\"\\n\")\n            start_line = node.lineno - 1\n            end_line = getattr(node, \"end_lineno\", node.lineno) - 1\n            return \"\\n\".join(lines[start_line:end_line + 1])\n        \n        except Exception as e:\n            logger.warning(f\"Error getting source segment: {e}\")\n            return \"\"\n    \n    def _get_docstring(self, source: str, node: ast.AST) -> str:\n        \"\"\"Get the docstring for a given AST node.\"\"\"\n        try:\n            if isinstance(node, ast.FunctionDef) or isinstance(node, ast.AsyncFunctionDef):\n                for item in node.body:\n                    if isinstance(item, ast.Expr) and isinstance(item.value, ast.Constant):\n    
                    if isinstance(item.value.value, str):\n                            return item.value.value\n            elif isinstance(node, ast.ClassDef):\n                for item in node.body:\n                    if isinstance(item, ast.Expr) and isinstance(item.value, ast.Constant):\n                        if isinstance(item.value.value, str):\n                            return item.value.value\n            return \"\"\n        except Exception as e:\n            logger.warning(f\"Error getting docstring: {e}\")\n            return \"\"\n    \n    def save_dependency_graph(self, output_path: str):\n        \"\"\"Save the dependency graph to a JSON file.\"\"\"\n        # Convert to serializable format\n        serializable_components = {\n            comp_id: component.to_dict()\n            for comp_id, component in self.components.items()\n        }\n        \n        # Create directories if they don't exist\n        os.makedirs(os.path.dirname(output_path), exist_ok=True)\n        \n        with open(output_path, \"w\", encoding=\"utf-8\") as f:\n            json.dump(serializable_components, f, indent=2)\n        \n        logger.info(f\"Saved dependency graph to {output_path}\")\n    \n    def load_dependency_graph(self, input_path: str):\n        \"\"\"Load the dependency graph from a JSON file.\"\"\"\n        with open(input_path, \"r\", encoding=\"utf-8\") as f:\n            serialized_components = json.load(f)\n        \n        # Convert back to CodeComponent objects\n        self.components = {\n            comp_id: CodeComponent.from_dict(comp_data)\n            for comp_id, comp_data in serialized_components.items()\n        }\n        \n        logger.info(f\"Loaded {len(self.components)} components from {input_path}\")\n        return self.components "
  },
  {
    "path": "src/dependency_analyzer/topo_sort.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nTopological sorting utilities for dependency graphs with cycle handling.\n\nThis module provides functions to perform topological sorting on a dependency graph,\nincluding detection and resolution of dependency cycles.\n\"\"\"\n\nimport logging\nfrom typing import Dict, List, Set, Tuple, Any, Optional\nfrom collections import defaultdict, deque\n\nlogger = logging.getLogger(__name__)\n\ndef detect_cycles(graph: Dict[str, Set[str]]) -> List[List[str]]:\n    \"\"\"\n    Detect cycles in a dependency graph using Tarjan's algorithm to find\n    strongly connected components.\n    \n    Args:\n        graph: A dependency graph represented as adjacency lists\n               (node -> set of dependencies)\n    \n    Returns:\n        A list of lists, where each inner list contains the nodes in a cycle\n    \"\"\"\n    # Implementation of Tarjan's algorithm\n    index_counter = [0]\n    index = {}  # node -> index\n    lowlink = {}  # node -> lowlink value\n    onstack = set()  # nodes currently on the stack\n    stack = []  # stack of nodes\n    result = []  # list of cycles (strongly connected components)\n    \n    def strongconnect(node):\n        # Set the depth index for node\n        index[node] = index_counter[0]\n        lowlink[node] = index_counter[0]\n        index_counter[0] += 1\n        stack.append(node)\n        onstack.add(node)\n        \n        # Consider successors\n        for successor in graph.get(node, set()):\n            if successor not in index:\n                # Successor has not yet been visited; recurse on it\n                strongconnect(successor)\n                lowlink[node] = min(lowlink[node], lowlink[successor])\n            elif successor in onstack:\n                # Successor is on the stack and hence in the current SCC\n                lowlink[node] = min(lowlink[node], index[successor])\n        \n        # If node is a root node, pop the stack and 
generate an SCC\n        if lowlink[node] == index[node]:\n            # Start a new strongly connected component\n            scc = []\n            while True:\n                successor = stack.pop()\n                onstack.remove(successor)\n                scc.append(successor)\n                if successor == node:\n                    break\n            \n            # Only include SCCs with more than one node (actual cycles)\n            if len(scc) > 1:\n                result.append(scc)\n    \n    # Visit each node\n    for node in graph:\n        if node not in index:\n            strongconnect(node)\n    \n    return result\n\ndef resolve_cycles(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:\n    \"\"\"\n    Resolve cycles in a dependency graph by identifying strongly connected\n    components and breaking cycles.\n    \n    Args:\n        graph: A dependency graph represented as adjacency lists\n               (node -> set of dependencies)\n    \n    Returns:\n        A new acyclic graph with the same nodes but with cycles broken\n    \"\"\"\n    # Detect cycles (SCCs)\n    cycles = detect_cycles(graph)\n    \n    if not cycles:\n        logger.info(\"No cycles detected in the dependency graph\")\n        return graph\n    \n    logger.info(f\"Detected {len(cycles)} cycles in the dependency graph\")\n    \n    # Create a copy of the graph to modify\n    new_graph = {node: deps.copy() for node, deps in graph.items()}\n    \n    # Process each cycle\n    for i, cycle in enumerate(cycles):\n        logger.info(f\"Cycle {i+1}: {' -> '.join(cycle)}\")\n        \n        # Strategy: Break the cycle by removing the \"weakest\" dependency\n        # Here, we just arbitrarily remove the last edge to make the graph acyclic\n        # In a real-world scenario, you might use heuristics to determine which edge to break\n        # For example, removing edges between different modules before edges within the same module\n        for j in range(len(cycle) - 
1):\n            current = cycle[j]\n            next_node = cycle[j + 1]\n            \n            if next_node in new_graph[current]:\n                logger.info(f\"Breaking cycle by removing dependency: {current} -> {next_node}\")\n                new_graph[current].remove(next_node)\n                break\n    \n    return new_graph\n\ndef topological_sort(graph: Dict[str, Set[str]]) -> List[str]:\n    \"\"\"\n    Perform a topological sort on a dependency graph.\n    \n    Args:\n        graph: A dependency graph represented as adjacency lists\n               (node -> set of dependencies)\n    \n    Returns:\n        A list of nodes in topological order (dependencies first)\n    \"\"\"\n    # First, check for and resolve cycles\n    acyclic_graph = resolve_cycles(graph)\n    \n    # Initialize in-degree counter for all nodes\n    in_degree = {node: 0 for node in acyclic_graph}\n    \n    # Count in-degrees: edges point from a node to its dependencies,\n    # so a node's in-degree is the number of nodes that depend on it\n    for node, dependencies in acyclic_graph.items():\n        for dep in dependencies:\n            if dep in in_degree:\n                in_degree[dep] += 1\n    \n    # Queue of nodes that nothing depends on (in-degree of 0)\n    queue = deque([node for node, degree in in_degree.items() if degree == 0])\n    \n    # Result list to store the order (dependents first, reversed before returning)\n    result = []\n    \n    # Process nodes with Kahn's algorithm\n    while queue:\n        node = queue.popleft()\n        result.append(node)\n        \n        # Reduce in-degree for each dependency of the current node\n        for dep in acyclic_graph[node]:\n            if dep in in_degree:\n                in_degree[dep] -= 1\n                if in_degree[dep] == 0:\n                    queue.append(dep)\n    \n    # Check if the sort was successful (all nodes included)\n    if len(result) != len(acyclic_graph):\n        logger.warning(\"Topological sort failed: graph has cycles that weren't resolved\")\n        # Return all nodes in some order to avoid breaking 
the process\n        return list(acyclic_graph.keys())\n    \n    # Reverse the result to get dependencies first\n    return result[::-1]\n\ndef dependency_first_dfs(graph: Dict[str, Set[str]]) -> List[str]:\n    \"\"\"\n    Perform a depth-first traversal of the dependency graph, starting from root nodes\n    that no other node depends on.\n    \n    The graph uses natural dependency direction:\n    - If A depends on B, the graph has an edge A → B\n    - This means an edge from X to Y represents \"X depends on Y\"\n    - Root nodes (nodes with no incoming edges, i.e. nothing depends on them) are the\n      DFS starting points; each node is emitted only after all of its dependencies\n    \n    Args:\n        graph: A dependency graph with natural direction (A→B if A depends on B)\n    \n    Returns:\n        A list of nodes in an order where dependencies come before their dependents\n    \"\"\"\n    # First, resolve cycles to ensure we have a DAG\n    acyclic_graph = resolve_cycles(graph)\n    \n    # Find root nodes (nodes that no other node depends on)\n    root_nodes = []\n    # Track which nodes have incoming edges (i.e. are a dependency of some node)\n    has_incoming_edge = {node: False for node in acyclic_graph}\n    \n    for node, deps in acyclic_graph.items():\n        for dep in deps:\n            has_incoming_edge[dep] = True\n    \n    # Nodes with no incoming edges are root nodes\n    for node in acyclic_graph:\n        if not has_incoming_edge[node]:\n            root_nodes.append(node)\n    \n    if not root_nodes:\n        logger.warning(\"No root nodes found in the graph, using arbitrary starting point\")\n        root_nodes = list(acyclic_graph.keys())[:1]  # Use the first node as starting point\n    \n    # Track visited nodes\n    visited = set()\n    result = []\n    \n    # DFS function that processes dependencies first\n    def dfs(node):\n        if node in visited:\n            return\n        visited.add(node)\n        \n        # Visit all dependencies first\n    
    for dep in sorted(acyclic_graph.get(node, set())):\n            dfs(dep)\n        \n        # Add this node to the result after all its dependencies\n        result.append(node)\n    \n    # Start DFS from each root node\n    for root in sorted(root_nodes):\n        dfs(root)\n    \n    # Check if all nodes were visited\n    if len(result) != len(acyclic_graph):\n        # Some nodes weren't visited - try to visit remaining nodes\n        for node in sorted(acyclic_graph.keys()):\n            if node not in visited:\n                dfs(node)\n    \n    return result\n\ndef build_graph_from_components(components: Dict[str, Any]) -> Dict[str, Set[str]]:\n    \"\"\"\n    Build a dependency graph from a collection of code components.\n    \n    The graph uses the natural dependency direction:\n    - If A depends on B, we create an edge A → B\n    - This means an edge from node X to node Y represents \"X depends on Y\"\n    - Root nodes (nodes with no dependencies) are components that don't depend on anything\n    \n    Args:\n        components: A dictionary of code components, where each component\n                   has a 'depends_on' attribute\n    \n    Returns:\n        A dependency graph with natural dependency direction\n    \"\"\"\n    graph = {}\n    \n    for comp_id, component in components.items():\n        # Initialize the node's adjacency list\n        if comp_id not in graph:\n            graph[comp_id] = set()\n        \n        # Add dependencies\n        for dep_id in component.depends_on:\n            # Only include dependencies that are actual components in our repository\n            if dep_id in components:\n                graph[comp_id].add(dep_id)\n    \n    return graph "
  },
  {
    "path": "src/evaluate_helpfulness.py",
    "content": "#!/usr/bin/env python\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nScript to evaluate the helpfulness of docstrings generated by different systems.\n\nUsage:\n    conda activate docstringgen\n    python src/evaluate_helpfulness.py\n\"\"\"\n\nimport os\nimport yaml\nimport argparse\nimport sys\nfrom pathlib import Path\n\n# Add the src directory to the path so we can import modules\nsrc_dir = Path(__file__).parent.parent\nsys.path.insert(0, str(src_dir))\n\nfrom src.evaluator.helpfulness_evaluator import DocstringHelpfulnessEvaluator\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Evaluate docstring helpfulness\")\n    parser.add_argument(\"--data-path\", type=str, \n                        default=\"experiments/eval/results/completeness_evaluation_cleaned.json\",\n                        help=\"Path to the completeness evaluation data\")\n    parser.add_argument(\"--output-dir\", type=str, \n                        default=\"experiments/eval/results/helpfulness\",\n                        help=\"Directory to store evaluation results\")\n    parser.add_argument(\"--n-samples\", type=int, default=50,\n                        help=\"Number of components to sample\")\n    parser.add_argument(\"--seed\", type=int, default=42,\n                        help=\"Random seed for reproducibility\")\n    parser.add_argument(\"--model\", type=str, default=None,\n                        help=\"LLM model to use (defaults to model in config)\")\n    args = parser.parse_args()\n    \n    # Create output directory if it doesn't exist\n    os.makedirs(args.output_dir, exist_ok=True)\n    \n    # Get configuration\n    config_path = \"config/agent_config.yaml\"\n    with open(config_path, 'r') as f:\n        config = yaml.safe_load(f)\n    \n    # Get API key and model from config\n    api_key = config[\"llm\"][\"api_key\"]\n    model = args.model or config[\"llm\"][\"model\"]\n    \n    print(f\"Using model: {model}\")\n    
print(f\"Sampling {args.n_samples} components with seed {args.seed}\")\n    \n    # Initialize evaluator\n    evaluator = DocstringHelpfulnessEvaluator(\n        data_path=args.data_path,\n        output_dir=args.output_dir,\n        api_key=api_key,\n        model=model\n    )\n    \n    # Run evaluation\n    results = evaluator.run_evaluation(\n        n_samples=args.n_samples,\n        seed=args.seed\n    )\n    \n    # Print summary\n    print(\"\\n=== Evaluation Complete ===\")\n    print(f\"Results saved to {args.output_dir}\")\n    print(f\"Total evaluations: {len(results['results'])}\")\n    \n    # Calculate average score\n    scores = [r[\"score\"] for r in results[\"results\"]]\n    avg_score = sum(scores) / len(scores) if scores else 0\n    print(f\"Overall average score: {avg_score:.2f}\")\n    \n    # Calculate average by system\n    systems = evaluator.SYSTEMS\n    for system in systems:\n        system_scores = [r[\"score\"] for r in results[\"results\"] if r[\"system\"] == system]\n        if system_scores:\n            avg = sum(system_scores) / len(system_scores)\n            print(f\"{system}: {avg:.2f} (n={len(system_scores)})\")\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "src/evaluator/README.md",
    "content": "# Docstring Quality Evaluator\n\nThis module provides a robust framework for evaluating the quality of Python docstrings. It uses static analysis through the Abstract Syntax Tree (AST) to examine docstrings in Python code and assess their completeness based on established documentation standards.\n\n## Architecture Overview\n\nThe project follows a hierarchical design with clear separation of concerns:\n\n### Base Evaluator\n\nThe foundation of the evaluation system is the `BaseEvaluator` abstract class. This class establishes the core interface that all evaluators must implement:\n\n```python\nclass BaseEvaluator(ABC):\n    def __init__(self, name: str, description: str):\n        self._score: float = 0.0\n        self._name = name\n        self._description = description\n```\n\nEvery evaluator derives from this base class, ensuring consistent scoring behavior and interface across the system. The base evaluator enforces score validation (must be between 0 and 1) and provides the abstract `evaluate` method that all concrete evaluators must implement.\n\n### Completeness Evaluation\n\nThe completeness evaluation system consists of a base class and two specializations:\n\n1. `CompletenessEvaluator`: The base class for completeness evaluation\n2. `ClassCompletenessEvaluator`: Specializes in evaluating class docstrings\n3. `FunctionCompletenessEvaluator`: Specializes in evaluating function/method docstrings\n\n#### Class Docstring Evaluation\n\nThe `ClassCompletenessEvaluator` examines up to five elements of class documentation:\n\n1. **Summary** (required)\n   - A one-line description at the start of the docstring\n   - Must be the first non-empty line\n   - Should provide a quick overview of the class's purpose\n\n2. **Description** (required)\n   - Detailed explanation following the summary\n   - Multiple lines describing the class's functionality\n   - Appears before any special sections (Attributes, Examples, etc.)\n   \n3. 
**Attributes** (required if class has attributes)\n   - Documentation of class attributes\n   - Must start with \"Attributes:\" section\n   - Lists each attribute with type information and description\n   - Required if class has class variables, instance variables in __init__, or enum values\n\n4. **Parameters** (required if class has __init__ parameters)\n   - Documentation of constructor parameters\n   - Must start with \"Parameters:\" section\n   - Lists each parameter with type information and description\n   - Required if __init__ has parameters beyond self\n\n5. **Examples** (required for public classes)\n   - Usage examples showing how to use the class\n   - Must start with \"Example:\" or \"Examples:\" section\n   - Should include executable code snippets\n   - Only required for classes not starting with underscore (_)\n\nEach element is evaluated independently through dedicated methods:\n```python\n@staticmethod\ndef evaluate_summary(docstring: str) -> float:\n    \"\"\"Evaluates if a proper one-liner summary exists.\"\"\"\n\n@staticmethod\ndef evaluate_description(docstring: str) -> float:\n    \"\"\"Evaluates if a proper description section exists.\"\"\"\n\n@staticmethod\ndef evaluate_attributes(docstring: str) -> float:\n    \"\"\"Evaluates if attribute documentation exists.\"\"\"\n\n@staticmethod\ndef evaluate_examples(docstring: str) -> float:\n    \"\"\"Evaluates if usage examples exist.\"\"\"\n```\n\n#### Function Docstring Evaluation\n\nThe `FunctionCompletenessEvaluator` examines up to six elements, with required elements determined dynamically based on the function's characteristics:\n\n1. **Summary** (required for all functions)\n   - One-line description at the start\n   - Concise explanation of function's purpose\n\n2. **Description** (required for all functions)\n   - Detailed explanation of functionality\n   - Implementation details and usage notes\n\n3. 
**Arguments** (required if function has parameters)\n   - Documentation for each parameter\n   - Must start with \"Args:\" or \"Arguments:\"\n   - Includes type information and description\n\n4. **Returns** (required if function has return statement)\n   - Documentation of return value\n   - Must start with \"Returns:\"\n   - Includes type information and description\n\n5. **Raises** (required if function has raise statements)\n   - Documentation of exceptions\n   - Must start with \"Raises:\"\n   - Lists each exception type and trigger condition\n\n6. **Examples** (required for public functions)\n   - Usage examples\n   - Must start with \"Example:\" or \"Examples:\"\n   - Not required for private methods (starting with underscore)\n\nThe evaluator automatically determines required sections through AST analysis:\n```python\ndef _get_required_sections(self, node: ast.FunctionDef) -> List[str]:\n    \"\"\"Determines which sections are required based on function characteristics.\"\"\"\n```\n\n### Scoring System\n\nBoth evaluators use a normalized scoring system:\n\n1. Each required element contributes equally to the final score\n2. Scores are always between 0.0 and 1.0\n3. Individual element scores are stored in `element_scores` dictionary\n4. 
Final score is the average of all required element scores\n\nFor example, if a class docstring has all elements except examples:\n```python\nelement_scores = {\n    'summary': 1.0,\n    'description': 1.0,\n    'attributes': 1.0,\n    'examples': 0.0\n}\nfinal_score = 0.75  # (1.0 + 1.0 + 1.0 + 0.0) / 4\n```\n\n## Usage Examples\n\n### Evaluating a Class Docstring\n\n```python\nfrom docstring_evaluator import ClassCompletenessEvaluator\nimport ast\n\n# Create evaluator\nevaluator = ClassCompletenessEvaluator()\n\n# Define class with docstring\nclass_code = '''\nclass MyClass:\n    \"\"\"\n    A demonstration class.\n\n    This class shows proper docstring formatting.\n\n    Attributes:\n        name (str): The class name.\n\n    Example:\n        >>> obj = MyClass()\n    \"\"\"\n    pass\n'''\n\n# Parse and evaluate\nnode = ast.parse(class_code).body[0]\nscore = evaluator.evaluate(node)\nprint(f\"Overall score: {score}\")\nprint(\"Element scores:\", evaluator.element_scores)\n```\n\n### Evaluating a Function Docstring\n\n```python\nfrom docstring_evaluator import FunctionCompletenessEvaluator\nimport ast\n\n# Create evaluator\nevaluator = FunctionCompletenessEvaluator()\n\n# Define function with docstring\nfunction_code = '''\ndef process_data(data: List[str]) -> Dict[str, int]:\n    \"\"\"\n    Process a list of strings and return word frequencies.\n\n    This function takes a list of strings and returns a dictionary\n    containing the frequency of each word.\n\n    Args:\n        data (List[str]): List of strings to process.\n\n    Returns:\n        Dict[str, int]: Dictionary mapping words to their frequencies.\n\n    Raises:\n        ValueError: If input list is empty.\n\n    Example:\n        >>> process_data([\"hello\", \"world\", \"hello\"])\n        {'hello': 2, 'world': 1}\n    \"\"\"\n    if not data:\n        raise ValueError(\"Empty input list\")\n    return Counter(data)\n'''\n\n# Parse and evaluate\nnode = ast.parse(function_code).body[0]\nscore = 
evaluator.evaluate(node)\nprint(f\"Overall score: {score}\")\nprint(\"Element scores:\", evaluator.element_scores)\n```\n\n### Exception Handling Guidelines\n\nThe evaluator checks for uncaught exceptions in two ways:\n\n1. Direct raise statements:\n   - Walks through all raise statements in the function\n   - Checks if each raise is inside a try-except block\n   - If a raise is not caught by any except handler, it's considered to bubble up\n\n2. Function calls:\n   - Walks through all function call nodes\n   - Assumes any uncaught function call could potentially raise\n   - Checks if the call is inside a try-except block\n   - If not caught, considers it as a potential exception source\n\nThe evaluator uses AST traversal to track parent-child relationships and determine if exceptions are properly handled within the function scope.\n\n### Function Analysis Limitations\n\n- Nested functions (functions defined inside other functions) are skipped during analysis and are not evaluated.\n\n### Other Notes\n\n- The `__init__` method is not evaluated on its own; its parameters are considered when the enclosing class is evaluated.\n\n## Best Practices for Documentation\n\nTo achieve high scores, follow these guidelines:\n\n1. Always start with a clear, one-line summary\n2. Provide a detailed description in subsequent paragraphs\n3. Document all attributes for classes\n4. Include practical usage examples\n5. For functions:\n   - Document all parameters under \"Args:\"\n   - Specify return type and value under \"Returns:\"\n   - List all possible exceptions under \"Raises:\"\n   - Provide examples for public functions\n\n## Development\n\n### Adding New Evaluators\n\nTo create new evaluators:\n\n1. Inherit from `BaseEvaluator`\n2. Implement the `evaluate` method\n3. Define specific evaluation criteria\n4. 
Add unit tests\n\nExample:\n```python\nclass StyleEvaluator(BaseEvaluator):\n    \"\"\"Evaluates docstring style consistency.\"\"\"\n\n    def evaluate(self, node: ast.AST) -> float:\n        # Implementation here\n        pass\n```\n\n## Limitations\n\n- Each element must start with one of the labels recognized by the evaluators (see the evaluator definitions); otherwise the evaluator cannot detect it.\n    - Summary and description are the exception: the summary is the first non-empty line, and the description is the next paragraph after a blank line.\n- Elements must be separated by at least one empty line."
  },
  {
    "path": "src/evaluator/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom .base import BaseEvaluator\nfrom .completeness import (  # Remove 'evaluators.' from the path\n    CompletenessEvaluator,\n    ClassCompletenessEvaluator,\n    FunctionCompletenessEvaluator\n)\n\n__all__ = [\n    'BaseEvaluator',\n    'CompletenessEvaluator',\n    'ClassCompletenessEvaluator',\n    'FunctionCompletenessEvaluator'\n]"
  },
  {
    "path": "src/evaluator/base.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom abc import ABC, abstractmethod\nimport ast\nfrom typing import Optional, Dict, Any\n\nclass BaseEvaluator(ABC):\n    \"\"\"\n    Base class for all docstring evaluators.\n    \n    This class provides the foundation for implementing various docstring quality \n    evaluators. Each evaluator should focus on a specific aspect of docstring \n    quality such as completeness, helpfulness, or redundancy.\n    \n    Attributes:\n        score (float): The evaluation score, ranging from 0 to 1.\n        name (str): The name of the evaluator.\n        description (str): A description of what this evaluator checks.\n    \"\"\"\n    \n    def __init__(self, name: str, description: str):\n        self._score: float = 0.0\n        self._name = name\n        self._description = description\n    \n    @property\n    def score(self) -> float:\n        \"\"\"\n        Returns the current evaluation score.\n        \n        Returns:\n            float: A score between 0 and 1 indicating the quality measure.\n        \"\"\"\n        return self._score\n    \n    @score.setter\n    def score(self, value: float) -> None:\n        \"\"\"\n        Sets the evaluation score.\n        \n        Args:\n            value (float): The score to set, must be between 0 and 1.\n            \n        Raises:\n            ValueError: If the score is not between 0 and 1.\n        \"\"\"\n        if not 0 <= value <= 1:\n            raise ValueError(\"Score must be between 0 and 1\")\n        self._score = value\n    \n    @abstractmethod\n    def evaluate(self, node: ast.AST) -> float:\n        \"\"\"\n        Evaluates the quality of a docstring based on specific criteria.\n        \n        Args:\n            node (ast.AST): The AST node containing the docstring to evaluate.\n            \n        Returns:\n            float: The evaluation score between 0 and 1.\n        \"\"\"\n        pass"
  },
  {
    "path": "src/evaluator/completeness.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport ast\nimport re\nfrom typing import Dict, List, Optional\n\nfrom evaluator.base import BaseEvaluator\n\n\nclass CompletenessEvaluator(BaseEvaluator):\n    \"\"\"\n    Base class for evaluating docstring completeness.\n\n    This evaluator examines whether a docstring contains all necessary elements\n    according to common documentation standards.\n\n    Attributes:\n        score (float): The completeness score from 0 to 1.\n        element_scores (Dict[str, bool]): Individual scores for each docstring element.\n        element_required (Dict[str, bool]): Whether each element is required.\n        weights (List[float]): Weights for each element in scoring.\n    \"\"\"\n\n    def __init__(self, name: str, description: str):\n        super().__init__(name=name, description=description)\n        self.element_scores: Dict[str, bool] = {}\n        self.element_required: Dict[str, bool] = {}\n        self.weights: List[float] = []\n\n    def evaluate(self, node: ast.AST) -> float:\n        \"\"\"\n        Evaluates the completeness of a docstring.\n\n        This method determines which specific evaluator to use based on the\n        AST node type and delegates the evaluation accordingly.\n\n        Args:\n            node (ast.AST): The AST node to evaluate.\n\n        Returns:\n            float: The completeness score between 0 and 1.\n\n        Raises:\n            ValueError: If the node type is not supported.\n        \"\"\"\n        if isinstance(node, ast.ClassDef):\n            evaluator = ClassCompletenessEvaluator()\n            self.score = evaluator.evaluate(node)\n        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):\n            evaluator = FunctionCompletenessEvaluator()\n            self.score = evaluator.evaluate(node)\n        else:\n            raise ValueError(f\"Unsupported node type: {type(node)}\")\n\n        return self.score\n\n\nclass 
ClassCompletenessEvaluator(CompletenessEvaluator):\n    \"\"\"\n    Evaluator for class docstring completeness.\n\n    This evaluator checks for the presence of required elements in class\n    docstrings including summary, description, attributes, parameters, and examples.\n\n    Attributes:\n        score (float): The overall completeness score from 0 to 1.\n        element_scores (Dict[str, bool]): Individual scores for each docstring element.\n        element_required (Dict[str, bool]): Whether each element is required.\n        weights (List[float]): Weights for each element in scoring.\n        required_sections (List[str]): List of required sections for the current class.\n    \"\"\"\n\n    # Valid section labels (case-insensitive)\n    ATTRIBUTE_LABELS = {\n        \"attributes:\",\n        \"members:\",\n        \"member variables:\",\n        \"instance variables:\",\n        \"properties:\",\n    }\n    EXAMPLE_LABELS = {\n        \"example:\",\n        \"examples:\",\n        \"usage:\",\n        \"usage example:\",\n        \"usage examples:\",\n    }\n    PARAMETER_LABELS = {\"parameters:\", \"params:\", \"args:\", \"arguments:\"}\n\n    def __init__(self):\n        super().__init__(\n            name=\"Class Completeness Evaluator\",\n            description=\"Evaluates the completeness of class docstrings\",\n        )\n\n        # Initialize element scores and requirements\n        elements = [\"summary\", \"description\", \"parameters\", \"attributes\", \"examples\"]\n\n        self.element_scores = {el: False for el in elements}\n        self.element_required = {\n            el: False for el in elements\n        }  # Will be set during evaluation\n        self.weights = [0.2] * len(elements)  # Equal weights by default\n\n        # Verify dictionaries have same keys in same order\n        assert list(self.element_scores.keys()) == list(self.element_required.keys())\n        assert len(self.element_scores) == len(self.weights)\n\n        
self.required_sections: List[str] = []\n\n    @staticmethod\n    def evaluate_summary(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has a proper one-liner summary.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if summary exists, False otherwise.\n        \"\"\"\n        lines = docstring.strip().split(\"\\n\")\n        return bool(lines and lines[0].strip())\n\n    @staticmethod\n    def evaluate_description(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has a proper description.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if description exists, False otherwise.\n        \"\"\"\n        # Split docstring into chunks by empty lines\n        chunks = []\n        current_chunk = []\n\n        for line in docstring.strip().split(\"\\n\"):\n            if not line.strip():\n                if current_chunk:\n                    chunks.append(current_chunk)\n                    current_chunk = []\n            else:\n                current_chunk.append(line.strip())\n\n        if current_chunk:\n            chunks.append(current_chunk)\n\n        # Need at least 2 chunks (summary and description)\n        if len(chunks) < 2:\n            return False\n\n        # Check if second chunk starts with any other section label\n        description_chunk = chunks[1]\n        if not description_chunk:\n            return False\n\n        first_line = description_chunk[0].lower()\n        for labels in [\n            ClassCompletenessEvaluator.ATTRIBUTE_LABELS,\n            ClassCompletenessEvaluator.PARAMETER_LABELS,\n            ClassCompletenessEvaluator.EXAMPLE_LABELS,\n        ]:\n            if any(first_line.startswith(label.lower()) for label in labels):\n                return False\n\n        return True\n\n    @staticmethod\n    def evaluate_attributes(docstring: 
str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has attribute documentation.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if attributes section exists, False otherwise.\n        \"\"\"\n        # Check if any attribute label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in ClassCompletenessEvaluator.ATTRIBUTE_LABELS\n        )\n\n    @staticmethod\n    def evaluate_parameters(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has constructor parameter documentation.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if parameters section exists, False otherwise.\n        \"\"\"\n        # Check if any parameter label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in ClassCompletenessEvaluator.PARAMETER_LABELS\n        )\n\n    @staticmethod\n    def evaluate_examples(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has usage examples.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if examples section exists, False otherwise.\n        \"\"\"\n        # Check if any example label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in ClassCompletenessEvaluator.EXAMPLE_LABELS\n        )\n\n    def _has_attributes(self, node: ast.ClassDef) -> bool:\n        \"\"\"\n        Checks if the class has attributes by looking for class variables, instance variables in __init__, or enum values.\n\n        Args:\n            node (ast.ClassDef): The class definition node.\n\n        Returns:\n            bool: True if class has attributes, False otherwise.\n        
\"\"\"\n        # Check for class variables\n        has_class_vars = any(\n            isinstance(item, (ast.AnnAssign, ast.Assign)) for item in node.body\n        )\n\n        # Check for instance variables in __init__\n        has_instance_vars = False\n        for item in node.body:\n            if isinstance(item, ast.FunctionDef) and item.name == \"__init__\":\n                has_instance_vars = any(\n                    isinstance(stmt, ast.Assign)\n                    and isinstance(stmt.targets[0], ast.Attribute)\n                    and isinstance(stmt.targets[0].value, ast.Name)\n                    and stmt.targets[0].value.id == \"self\"\n                    for stmt in ast.walk(item)\n                )\n                break\n\n        # Check if it's an Enum\n        is_enum = (\n            hasattr(node, \"bases\")\n            and node.bases\n            and any(\n                isinstance(base, ast.Name) and base.id == \"Enum\" for base in node.bases\n            )\n        )\n\n        return has_class_vars or has_instance_vars or is_enum\n\n    def _get_required_sections(self, node: ast.ClassDef) -> List[str]:\n        \"\"\"\n        Determines which sections are required for the class docstring.\n\n        Args:\n            node (ast.ClassDef): The class definition node.\n\n        Returns:\n            List[str]: List of required section names.\n        \"\"\"\n        required = [\"summary\", \"description\"]\n\n        if self._has_attributes(node):\n            required.append(\"attributes\")\n\n        # Check if __init__ has parameters beyond self\n        if self._has_init_parameters(node):\n            required.append(\"parameters\")\n\n        # Examples are required for public classes\n        if not node.name.startswith(\"_\"):\n            required.append(\"examples\")\n\n        return required\n\n    def _has_init_parameters(self, node: ast.ClassDef) -> bool:\n        \"\"\"\n        Checks if the class __init__ method has 
parameters beyond self.\n\n        Args:\n            node (ast.ClassDef): The class definition node.\n\n        Returns:\n            bool: True if __init__ has parameters beyond self.\n        \"\"\"\n        for item in node.body:\n            if isinstance(item, ast.FunctionDef) and item.name == \"__init__\":\n                args = [arg for arg in item.args.args if arg.arg != \"self\"]\n                return bool(args or item.args.kwonlyargs)\n        return False\n\n    def evaluate(self, node: ast.ClassDef) -> float:\n        \"\"\"\n        Evaluates the completeness of a class docstring.\n\n        Checks for:\n        1. One-liner summary\n        2. Description\n        3. Attributes documentation\n        4. Parameters documentation (if __init__ has parameters beyond self)\n        5. Usage examples\n\n        Args:\n            node (ast.ClassDef): The class definition node to evaluate.\n\n        Returns:\n            float: The completeness score between 0 and 1.\n        \"\"\"\n        # Get required sections for this class first\n        self.required_sections = self._get_required_sections(node)\n\n        # Reset scores and update requirements\n        self.element_scores = {key: False for key in self.element_scores}\n        self.element_required = {\n            key: key in self.required_sections for key in self.element_scores\n        }\n\n        docstring = ast.get_docstring(node)\n        if not docstring:\n            self.score = 0.0\n            return self.score\n\n        # Evaluate each element\n        if \"summary\" in self.required_sections:\n            self.element_scores[\"summary\"] = self.evaluate_summary(docstring)\n        if \"description\" in self.required_sections:\n            self.element_scores[\"description\"] = self.evaluate_description(docstring)\n        if \"parameters\" in self.required_sections:\n            self.element_scores[\"parameters\"] = self.evaluate_parameters(docstring)\n        if \"attributes\" in 
self.required_sections:\n            self.element_scores[\"attributes\"] = self.evaluate_attributes(docstring)\n        if \"examples\" in self.required_sections:\n            self.element_scores[\"examples\"] = self.evaluate_examples(docstring)\n\n        # Calculate weighted score considering requirements\n        total_weight = 0.0\n        weighted_score = 0.0\n\n        for (key, score), weight, required in zip(\n            self.element_scores.items(), self.weights, self.element_required.values()\n        ):\n            if required:\n                total_weight += weight\n                if score:\n                    weighted_score += weight\n\n        self.score = weighted_score / total_weight if total_weight > 0 else 0.0\n        return self.score\n\n    def evaluate_using_string(self, docstring: str, element_required: Dict) -> Dict:\n        \"\"\"\n        Evaluates a raw docstring against a caller-supplied set of elements.\n\n        Args:\n            docstring (str): The docstring text to evaluate.\n            element_required (Dict): Mapping whose keys name the elements to check.\n\n        Returns:\n            Dict: Mapping from each element name to whether it was found.\n        \"\"\"\n        # Start with every requested element marked as missing\n        element_scores = {key: False for key in element_required}\n\n        if not docstring:\n            return element_scores\n\n        # Evaluate each element\n        for key in element_required:\n            if key == \"summary\":\n                element_scores[key] = self.evaluate_summary(docstring)\n            elif key == \"description\":\n                element_scores[key] = self.evaluate_description(docstring)\n            elif key == \"parameters\":\n                element_scores[key] = self.evaluate_parameters(docstring)\n            elif key == \"attributes\":\n                element_scores[key] = self.evaluate_attributes(docstring)\n            elif key == \"examples\":\n                element_scores[key] = self.evaluate_examples(docstring)\n\n        return element_scores\n\n\nclass FunctionCompletenessEvaluator(CompletenessEvaluator):\n    \"\"\"\n    Evaluator for function/method docstring completeness.\n\n    This evaluator checks for the presence 
of required elements in function\n    docstrings including summary, description, arguments, returns, raises,\n    and examples.\n\n    Attributes:\n        score (float): The overall completeness score from 0 to 1.\n        element_scores (Dict[str, bool]): Individual scores for each docstring element.\n        element_required (Dict[str, bool]): Whether each element is required.\n        weights (List[float]): Weights for each element in scoring.\n        required_sections (List[str]): List of required sections for the current function.\n    \"\"\"\n\n    # Valid section labels (case-insensitive)\n    ARGS_LABELS = {\"args:\", \"arguments:\", \"parameters:\", \"params:\"}\n    RETURNS_LABELS = {\n        \"returns:\",\n        \"return:\",\n        \"return value:\",\n        \"return type:\",\n        \"yields:\",\n        \"yield:\",\n    }\n    RAISES_LABELS = {\"raises:\", \"exceptions:\", \"throws:\"}\n    EXAMPLE_LABELS = {\n        \"example:\",\n        \"examples:\",\n        \"usage:\",\n        \"usage example:\",\n        \"usage examples:\",\n    }\n\n    def __init__(self):\n        super().__init__(\n            name=\"Function Completeness Evaluator\",\n            description=\"Evaluates the completeness of function docstrings\",\n        )\n\n        # Initialize element scores and requirements\n        elements = [\"summary\", \"description\", \"args\", \"returns\", \"raises\", \"examples\"]\n\n        self.element_scores = {el: False for el in elements}\n        self.element_required = {\n            el: False for el in elements\n        }  # Will be set during evaluation\n        self.weights = [1 / len(elements)] * len(elements)  # Equal weights by default\n\n        # Verify dictionaries have same keys in same order\n        assert list(self.element_scores.keys()) == list(self.element_required.keys())\n        assert len(self.element_scores) == len(self.weights)\n\n        self.required_sections: List[str] = []\n\n    @staticmethod\n    def 
evaluate_summary(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has a proper one-liner summary.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if summary exists, False otherwise.\n        \"\"\"\n        lines = docstring.strip().split(\"\\n\")\n        return bool(lines and lines[0].strip())\n\n    @staticmethod\n    def evaluate_description(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has a proper description.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if description exists, False otherwise.\n        \"\"\"\n        # Split docstring into chunks by empty lines\n        chunks = []\n        current_chunk = []\n\n        for line in docstring.strip().split(\"\\n\"):\n            if not line.strip():\n                if current_chunk:\n                    chunks.append(current_chunk)\n                    current_chunk = []\n            else:\n                current_chunk.append(line.strip())\n\n        if current_chunk:\n            chunks.append(current_chunk)\n\n        # Need at least 2 chunks (summary and description)\n        if len(chunks) < 2:\n            return False\n\n        # Check if second chunk starts with any other section label\n        description_chunk = chunks[1]\n        if not description_chunk:\n            return False\n\n        first_line = description_chunk[0].lower()\n        for labels in [\n            FunctionCompletenessEvaluator.ARGS_LABELS,\n            FunctionCompletenessEvaluator.RETURNS_LABELS,\n            FunctionCompletenessEvaluator.RAISES_LABELS,\n            FunctionCompletenessEvaluator.EXAMPLE_LABELS,\n        ]:\n            if any(first_line.startswith(label.lower()) for label in labels):\n                return False\n\n        return True\n\n    @staticmethod\n    def evaluate_args(docstring: str) -> bool:\n    
    \"\"\"\n        Evaluates if the docstring has argument documentation.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if arguments section exists, False otherwise.\n        \"\"\"\n        # Check if any argument label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in FunctionCompletenessEvaluator.ARGS_LABELS\n        )\n\n    @staticmethod\n    def evaluate_returns(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has return value or yield documentation.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if returns/yields section exists, False otherwise.\n        \"\"\"\n        # Check if any return label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in FunctionCompletenessEvaluator.RETURNS_LABELS\n        )\n\n    @staticmethod\n    def evaluate_raises(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has exception documentation.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if raises section exists, False otherwise.\n        \"\"\"\n        # Check if any raise label appears anywhere in the docstring\n        return any(\n            label.lower() in docstring.lower()\n            for label in FunctionCompletenessEvaluator.RAISES_LABELS\n        )\n\n    @staticmethod\n    def evaluate_examples(docstring: str) -> bool:\n        \"\"\"\n        Evaluates if the docstring has usage examples.\n\n        Args:\n            docstring (str): The docstring to evaluate.\n\n        Returns:\n            bool: True if examples section exists, False otherwise.\n        \"\"\"\n        # Check if any example label appears anywhere in the docstring\n      
  return any(\n            label.lower() in docstring.lower()\n            for label in FunctionCompletenessEvaluator.EXAMPLE_LABELS\n        )\n\n    def evaluate(self, node: ast.FunctionDef) -> float:\n        \"\"\"\n        Evaluates the completeness of a function docstring.\n\n        Checks for:\n        1. One-liner summary\n        2. Description\n        3. Arguments documentation (if has arguments)\n        4. Returns documentation (if has return)\n        5. Raises documentation (if has raise statements)\n        6. Examples (if not private)\n\n        Args:\n            node (ast.FunctionDef): The function definition node to evaluate.\n\n        Returns:\n            float: The completeness score between 0 and 1.\n        \"\"\"\n        # Skip __init__ methods\n        if node.name == \"__init__\":\n            self.score = 1.0\n            return self.score\n\n        # Get required sections for this function first\n        self.required_sections = self._get_required_sections(node)\n\n        # Reset scores and update requirements\n        self.element_scores = {key: False for key in self.element_scores}\n        self.element_required = {\n            key: key in self.required_sections for key in self.element_scores\n        }\n\n        docstring = ast.get_docstring(node)\n        if not docstring:\n            self.score = 0.0\n            return self.score\n\n        # Evaluate each element\n        if \"summary\" in self.required_sections:\n            self.element_scores[\"summary\"] = self.evaluate_summary(docstring)\n        if \"description\" in self.required_sections:\n            self.element_scores[\"description\"] = self.evaluate_description(docstring)\n        if \"args\" in self.required_sections:\n            self.element_scores[\"args\"] = self.evaluate_args(docstring)\n        if \"returns\" in self.required_sections:\n            self.element_scores[\"returns\"] = self.evaluate_returns(docstring)\n        if \"raises\" in 
self.required_sections:\n            self.element_scores[\"raises\"] = self.evaluate_raises(docstring)\n        if \"examples\" in self.required_sections:\n            self.element_scores[\"examples\"] = self.evaluate_examples(docstring)\n\n        # Calculate weighted score considering requirements\n        total_weight = 0.0\n        weighted_score = 0.0\n\n        for (key, score), weight, required in zip(\n            self.element_scores.items(), self.weights, self.element_required.values()\n        ):\n            if required:\n                total_weight += weight\n                if score:\n                    weighted_score += weight\n\n        self.score = weighted_score / total_weight if total_weight > 0 else 0.0\n        return self.score\n\n    def evaluate_using_string(self, docstring: str, element_required: Dict) -> Dict:\n        \"\"\"\n        Evaluates a raw docstring against a caller-supplied set of elements.\n\n        Args:\n            docstring (str): The docstring text to evaluate.\n            element_required (Dict): Mapping whose keys name the elements to check.\n\n        Returns:\n            Dict: Mapping from each element name to whether it was found.\n        \"\"\"\n        # Start with every requested element marked as missing\n        element_scores = {key: False for key in element_required}\n\n        if not docstring:\n            return element_scores\n\n        # Evaluate each element\n        for key in element_required:\n            if key == \"summary\":\n                element_scores[key] = self.evaluate_summary(docstring)\n            elif key == \"description\":\n                element_scores[key] = self.evaluate_description(docstring)\n            elif key == \"args\":\n                element_scores[key] = self.evaluate_args(docstring)\n            elif key == \"returns\":\n                element_scores[key] = self.evaluate_returns(docstring)\n            elif key == \"raises\":\n                element_scores[key] = self.evaluate_raises(docstring)\n            elif key == \"examples\":\n                element_scores[key] = self.evaluate_examples(docstring)\n\n        return element_scores\n\n    def _get_required_sections(self, node: ast.FunctionDef) -> List[str]:\n        \"\"\"\n        Determines which 
sections are required for the function docstring.\n\n        Args:\n            node (ast.FunctionDef): The function definition node.\n\n        Returns:\n            List[str]: List of required section names.\n        \"\"\"\n        required = [\"summary\", \"description\"]\n\n        # Check if function has arguments beyond just 'self'\n        args = [arg for arg in node.args.args if arg.arg != \"self\"]\n        if args or node.args.kwonlyargs:\n            required.append(\"args\")\n\n        # Check if function has returns\n        if self._has_return_statement(node):\n            required.append(\"returns\")\n\n        # Check if function has raise statements\n        if self._has_raise_statement(node):\n            required.append(\"raises\")\n\n        # Check if function is public (not starting with _)\n        if not node.name.startswith(\"_\"):\n            required.append(\"examples\")\n\n        return required\n\n    def _has_return_statement(self, node: ast.FunctionDef) -> bool:\n        \"\"\"\n        Checks if the function has any meaningful return statements or yields.\n\n        A return statement is considered meaningful if it:\n        1. Returns a value other than None\n        2. Uses yield or yield from (generator function)\n        3. 
Has an explicit return None statement\n\n        Args:\n            node (ast.FunctionDef): The function definition node.\n\n        Returns:\n            bool: True if the function has a meaningful return value or is a generator.\n        \"\"\"\n        has_explicit_return = False\n\n        for child in ast.walk(node):\n            if isinstance(child, ast.Return):\n                if child.value is not None:\n                    # Return with any value (including None)\n                    has_explicit_return = True\n                    if (\n                        not isinstance(child.value, ast.Constant)\n                        or child.value.value is not None\n                    ):\n                        return True\n            elif isinstance(child, (ast.Yield, ast.YieldFrom)):\n                # Function is a generator\n                return True\n\n        return has_explicit_return\n\n    def _has_raise_statement(self, node: ast.FunctionDef) -> bool:\n        \"\"\"\n        Checks if the function has any uncaught raise statements that bubble up to caller.\n\n        Args:\n            node (ast.FunctionDef): The function definition node.\n\n        Returns:\n            bool: True if the function has any uncaught raise statements.\n        \"\"\"\n        for child in ast.walk(node):\n            if isinstance(child, ast.Raise):\n                # Check if this raise is inside a try-except block\n                parent = child\n                while parent != node:\n                    if isinstance(parent, ast.ExceptHandler):\n                        # Exception is caught, skip this raise\n                        break\n                    parent = next(\n                        p\n                        for p in ast.walk(node)\n                        if any(\n                            isinstance(c, type(parent)) and c is parent\n                            for c in ast.iter_child_nodes(p)\n                        )\n                    )\n  
              else:\n                    # No except handler found, exception bubbles up\n                    return True\n\n        # Also check any function calls that may raise\n        for child in ast.walk(node):\n            if isinstance(child, ast.Call):\n                # Here we could recursively check called functions\n                # but for now we'll assume any uncaught function call\n                # could potentially raise\n                try:\n                    parent = child\n                    while parent != node:\n                        if isinstance(parent, ast.ExceptHandler):\n                            break\n                        parent = next(\n                            p\n                            for p in ast.walk(node)\n                            if any(\n                                isinstance(c, type(parent)) and c is parent\n                                for c in ast.iter_child_nodes(p)\n                            )\n                        )\n                    else:\n                        return True\n                except StopIteration:\n                    continue\n\n        return False\n"
  },
  {
    "path": "src/evaluator/evaluation_common.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"Common utilities and classes for docstring evaluation.\"\"\"\n\nfrom typing import Dict, Any, List, Optional, Tuple\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nclass ScoreLevel(Enum):\n    \"\"\"Defines the possible score levels for docstring evaluation.\"\"\"\n    POOR = 1\n    FAIR = 2\n    GOOD = 3\n    VERY_GOOD = 4\n    EXCELLENT = 5\n\n@dataclass\nclass SummaryEvaluationExample:\n    \"\"\"Stores an example of docstring summary evaluation with different quality levels.\"\"\"\n    function_signature: str\n    summaries: Dict[ScoreLevel, str]\n    explanations: Dict[ScoreLevel, str]\n\n@dataclass\nclass DescriptionEvaluationExample:\n    \"\"\"Stores an example of docstring description evaluation with different quality levels.\"\"\"\n    function_signature: str\n    descriptions: Dict[ScoreLevel, str]\n    explanations: Dict[ScoreLevel, str]\n\n@dataclass\nclass ParameterEvaluationExample:\n    \"\"\"Stores an example of docstring parameter evaluation with different quality levels.\"\"\"\n    parameters: Dict[str, str]\n    quality_examples: Dict[ScoreLevel, Dict[str, str]]\n    explanations: Dict[ScoreLevel, str] "
  },
  {
    "path": "src/evaluator/helper/context_finder.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import List, Dict, Optional, Tuple\nimport os\nimport ast\nimport json\nfrom pathlib import Path\nimport re\n\nclass UsageLocation:\n    \"\"\"Represents a location where a function/class/method is used.\"\"\"\n    def __init__(self, file_path: str, line_number: int, usage_type: str):\n        self.file_path = file_path\n        self.line_number = line_number\n        self.usage_type = usage_type  # 'function', 'method', 'staticmethod', or 'class'\n        # Filled in later by ContextSearcher.find_usages; default to None so\n        # that to_dict() does not raise AttributeError on a fresh instance.\n        self.repo_path: Optional[str] = None\n        self.signature: Optional[str] = None\n    \n    def to_dict(self) -> Dict:\n        \"\"\"Convert to dictionary for JSON serialization.\"\"\"\n        return {\n            'file_path': self.file_path,\n            'line_number': self.line_number,\n            'usage_type': self.usage_type,\n            'repo_path': self.repo_path,\n            'signature': self.signature\n        }\n    \n    @classmethod\n    def from_dict(cls, data: Dict) -> 'UsageLocation':\n        \"\"\"Create from dictionary.\"\"\"\n        loc = cls(data['file_path'], data['line_number'], data['usage_type'])\n        # Restore the optional fields so a cached entry round-trips through to_dict()\n        loc.repo_path = data.get('repo_path')\n        loc.signature = data.get('signature')\n        return loc\n\nclass ContextSearcher:\n    \"\"\"\n    Searches for usage of functions, classes, and methods in a Python project.\n    Caches results to avoid repeated searches.\n    \"\"\"\n    \n    def __init__(self, repo_path: str):\n        \"\"\"\n        Initialize the searcher.\n        \n        Args:\n            repo_path: Path to the repository root\n        \"\"\"\n        self.repo_path = Path(repo_path)\n        self.cache_dir = os.path.join('data', 'evaluator' , 'search_cache')\n        os.makedirs(self.cache_dir, exist_ok=True)\n    \n    def _get_cache_key(self, file_path: str, signature: str) -> str:\n        \"\"\"Generate a cache key for the search.\"\"\"\n        import hashlib\n        # Create a unique key based on file path and signature\n        key = f\"{file_path}:{signature}\"\n        return hashlib.md5(key.encode()).hexdigest()\n    \n    def _load_from_cache(self, cache_key: str) -> 
Optional[List[UsageLocation]]:\n        \"\"\"Load search results from cache if available.\"\"\"\n        cache_file = self.cache_dir + f\"/{cache_key}.json\"\n        if os.path.exists(cache_file):\n            with open(cache_file) as f:\n                data = json.load(f)\n                return [UsageLocation.from_dict(loc) for loc in data]\n        return None\n    \n    def _save_to_cache(self, cache_key: str, locations: List[UsageLocation]):\n        \"\"\"Save search results to cache.\"\"\"\n        cache_file = self.cache_dir + f\"/{cache_key}.json\"\n        with open(cache_file, 'w') as f:\n            json.dump([loc.to_dict() for loc in locations], f, indent=2)\n    \n    def find_usages(self, target_file: str, signature: str) -> List[UsageLocation]:\n        \"\"\"\n        Find all usages of a function/class/method in the repository.\n        \n        Args:\n            target_file: Relative path to the file containing the target\n            signature: The signature of the function/class/method\n            \n        Returns:\n            List of UsageLocation objects\n        \"\"\"\n        cache_key = self._get_cache_key(target_file, signature)\n        \n        # Try to load from cache first\n        cached_results = self._load_from_cache(cache_key)\n        if cached_results is not None:\n            return cached_results\n        \n        # Parse signature to get name and type\n        name, usage_type = self._parse_signature(signature)\n        \n        locations = []\n        \n        # Walk through all Python files in the repo\n        for root, _, files in os.walk(self.repo_path):\n            for file in files:\n                if not file.endswith('.py'):\n                    continue\n                    \n                file_path = Path(root) / file\n                rel_path = file_path.relative_to(self.repo_path)\n                \n                # Skip the target file itself\n                if str(rel_path) == target_file:\n  
                  continue\n                \n                try:\n                    with open(file_path) as f:\n                        content = f.read()\n                    \n                    # Find all usages in this file\n                    file_locations = self._find_usages_in_file(\n                        content, str(rel_path), name, usage_type\n                    )\n                    \n                    # Add repo path and signature to each location\n                    for loc in file_locations:\n                        loc.repo_path = str(self.repo_path)\n                        loc.signature = signature\n                        \n                    locations.extend(file_locations)\n                    \n                except Exception as e:\n                    print(f\"Error processing {file_path}: {e}\")\n        \n        # Cache the results\n        self._save_to_cache(cache_key, locations)\n        \n        return locations\n    \n    def _parse_signature(self, signature: str) -> Tuple[str, str]:\n        \"\"\"Parse a signature to get name and type.\"\"\"\n        signature = signature.strip()\n        \n        # Split into lines to check for decorators\n        is_static = '@staticmethod' in signature\n        # remove @staticmethod decorator\n        if is_static:\n            signature = signature.replace('@staticmethod', '').strip()\n        \n        if signature.startswith('class '):\n            return signature.split()[1].split('(')[0].split(':')[0], 'class'\n        elif signature.startswith('def '):\n            name = signature.split()[1].split('(')[0]\n            if name == '__init__':\n                return None, 'method'  # Skip __init__ methods\n            if is_static:\n                return name, 'staticmethod'\n            return name, 'function' if '(self' not in signature else 'method'\n        \n        raise ValueError(f\"Invalid signature: {signature}\")\n    \n    def _find_usages_in_file(self, 
content: str, file_path: str, name: str, \n                            usage_type: str) -> List[UsageLocation]:\n        \"\"\"Find all usages in a single file.\"\"\"\n        locations = []\n        tree = ast.parse(content)\n        \n        for node in ast.walk(tree):\n            # For function calls and static methods\n            if usage_type in ('function', 'method', 'staticmethod'):\n                if usage_type == 'staticmethod':\n                    if isinstance(node, ast.Assign):\n                        if isinstance(node.value, ast.Call):\n                            if isinstance(node.value.func, ast.Attribute) and node.value.func.attr == name:\n                                locations.append(UsageLocation(\n                                    file_path, node.lineno, usage_type\n                                ))\n                elif isinstance(node, ast.Call):\n                    if usage_type == 'function' and isinstance(node.func, ast.Name):\n                        if node.func.id == name:\n                            locations.append(UsageLocation(\n                                file_path, node.lineno, usage_type\n                            ))\n                    elif usage_type == 'method' and isinstance(node.func, ast.Attribute):\n                        if node.func.attr == name:\n                            locations.append(UsageLocation(\n                                file_path, node.lineno, usage_type\n                            ))\n            \n            # For class instantiation\n            elif usage_type == 'class':\n                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):\n                    if node.func.id == name:\n                        locations.append(UsageLocation(\n                            file_path, node.lineno, usage_type\n                        ))\n        \n        return locations\n\nclass ContextPreparer:\n    \"\"\"\n    Prepares context for example evaluation by 
extracting relevant code\n    from usage locations.\n    \"\"\"\n    \n    def __init__(self, repo_path: str):\n        \"\"\"\n        Initialize the preparer.\n        \n        Args:\n            repo_path: Path to the repository root\n        \"\"\"\n        self.repo_path = Path(repo_path)\n        self.searcher = ContextSearcher(repo_path)\n    \n    def prepare_contexts(self, target_file: str, signature: str) -> List[Tuple[str, str]]:\n        \"\"\"\n        Prepare context for all usages of a function/class/method.\n        \n        Args:\n            target_file: Relative path to the file containing the target\n            signature: The signature of the function/class/method\n            \n        Returns:\n            List of tuples (context_code, ground_truth) where:\n            - context_code is the code leading up to the usage\n            - ground_truth is the actual usage line\n        \"\"\"\n        locations = self.searcher.find_usages(target_file, signature)\n        contexts = []\n        \n        for location in locations:\n            context, ground_truth = self._prepare_single_context(location)\n            if context and ground_truth:\n                contexts.append((context, ground_truth))\n        \n        return contexts\n    \n    def _prepare_single_context(self, location: UsageLocation) -> Tuple[Optional[str], Optional[str]]:\n        \"\"\"Prepare context for a single usage location.\"\"\"\n        file_path = self.repo_path / location.file_path\n        \n        with open(file_path) as f:\n            lines = f.readlines()\n        \n        # Get the ground truth lines\n        ground_truth_lines = []\n        i = location.line_number - 1\n        \n        # Keep adding lines until one contains a closing parenthesis,\n        # which is taken as the end of the (possibly multi-line) call\n        while i < len(lines):\n            line = lines[i].strip()\n            ground_truth_lines.append(line)\n            if ')' in line:\n                break\n            i += 1\n   
         \n        ground_truth = '\\n'.join(ground_truth_lines)\n        \n        # Get the context (all lines up to the usage)\n        context_lines = lines[:location.line_number - 1]\n        \n        # Remove trailing empty lines\n        while context_lines and not context_lines[-1].strip():\n            context_lines.pop()\n        \n        context = ''.join(context_lines)\n        \n        return context, ground_truth"
  },
  {
    "path": "src/evaluator/helpfulness_attributes.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, List, Optional, Tuple\nimport re\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nclass ScoreLevel(Enum):\n    \"\"\"Defines the possible score levels for docstring evaluation.\"\"\"\n    POOR = 1\n    FAIR = 2\n    GOOD = 3\n    VERY_GOOD = 4\n    EXCELLENT = 5\n\n@dataclass\nclass EvaluationExample:\n    \"\"\"Stores an example of docstring attribute evaluation with different quality levels.\"\"\"\n    class_signature: str\n    init_function: str\n    attributes: Dict[str, str]\n    quality_examples: Dict[ScoreLevel, Dict[str, str]]\n    explanations: Dict[ScoreLevel, str]\n\nclass DocstringAttributeEvaluator:\n    \"\"\"\n    Evaluates the quality of Python docstring attribute descriptions using predefined criteria.\n    \n    This class assesses how well attribute descriptions in docstrings convey the purpose,\n    lifecycle, and usage context of class attributes, going beyond mere type information\n    to provide meaningful guidance about attribute roles and behaviors.\n    \"\"\"\n\n    def __init__(self):\n        \"\"\"Initialize the evaluator with predefined criteria and examples.\"\"\"\n        self.criteria = self._initialize_criteria()\n        self.examples = self._initialize_examples()\n\n    def _initialize_criteria(self) -> Dict[str, Any]:\n        \"\"\"\n        Set up the evaluation criteria for attribute descriptions.\n        \n        The criteria define five quality levels, from mere type repetition (1) \n        to excellent usage guidance and context (5).\n        \n        Returns:\n            Dict containing the evaluation criteria and descriptions for each score level.\n        \"\"\"\n        return {\n            'description': (\n                'Evaluate how effectively the attribute descriptions convey the purpose, '\n                'lifecycle, and usage context of class attributes. 
High-quality descriptions '\n                'should go beyond type information to provide meaningful guidance about '\n                'attribute roles, initialization, modification patterns, and relationships '\n                'with class behavior.'\n            ),\n            'score_criteria': {\n                ScoreLevel.POOR: (\n                    'The attribute descriptions merely restate the attribute types or '\n                    'convert the type hints to natural language without adding any '\n                    'meaningful information about purpose or lifecycle.'\n                ),\n                ScoreLevel.FAIR: (\n                    'The descriptions provide basic information about attribute purpose '\n                    'but lack details about initialization, modification, or usage patterns. '\n                    'They may use vague language or miss important details.'\n                ),\n                ScoreLevel.GOOD: (\n                    'The descriptions explain attribute purpose and include some key '\n                    'information about initialization or usage patterns, but might miss '\n                    'important lifecycle details or relationships with class behavior.'\n                ),\n                ScoreLevel.VERY_GOOD: (\n                    'The descriptions clearly explain purpose, initialization, and common '\n                    'usage patterns. They may note important relationships with class '\n                    'methods and document any special handling or constraints.'\n                ),\n                ScoreLevel.EXCELLENT: (\n                    'The descriptions provide comprehensive guidance including purpose, '\n                    'initialization, modification patterns, relationships with class '\n                    'behavior, and any special considerations. 
They help users understand '\n                    'both how and when to interact with the attributes.'\n                )\n            }\n        }\n\n    def _initialize_examples(self) -> List[EvaluationExample]:\n        \"\"\"\n        Set up concrete examples of attribute descriptions at different quality levels.\n        \n        Each example includes class and __init__ signatures with corresponding attribute\n        descriptions at different quality levels, along with explanations of the ratings.\n        \n        Returns:\n            List of EvaluationExample objects containing the example cases.\n        \"\"\"\n        return [\n            EvaluationExample(\n                class_signature=\"class DataProcessor:\",\n                init_function='''def __init__(self, config: Dict[str, Any]):\n    \"\"\"Initialize the data processor.\n    \n    Args:\n        config: Configuration dictionary for the processor\n    \"\"\"\n    self.config = config\n    self.data_cache = {}\n    self.is_initialized = False\n    self.stats = defaultdict(int)\n    self._lock = threading.Lock()''',\n                attributes={\n                    \"config\": \"Configuration settings for the processor\",\n                    \"data_cache\": \"Cache for processed data\",\n                    \"is_initialized\": \"Whether the processor is initialized\",\n                    \"stats\": \"Processing statistics\",\n                    \"_lock\": \"Thread synchronization lock\"\n                },\n                quality_examples={\n                    ScoreLevel.POOR: {\n                        \"config\": \"Dictionary of configuration\",\n                        \"data_cache\": \"Dictionary for cache\",\n                        \"is_initialized\": \"Boolean flag\",\n                        \"stats\": \"Dictionary of statistics\",\n                        \"_lock\": \"Threading lock object\"\n                    },\n                    ScoreLevel.FAIR: {\n                     
   \"config\": \"Configuration settings for processing\",\n                        \"data_cache\": \"Cache storage for processed items\",\n                        \"is_initialized\": \"Tracks initialization status\",\n                        \"stats\": \"Counts of processed items\",\n                        \"_lock\": \"Lock for thread safety\"\n                    },\n                    ScoreLevel.GOOD: {\n                        \"config\": \"Configuration dictionary controlling processing behavior. Set at initialization\",\n                        \"data_cache\": \"Cache of processed items to avoid recomputation. Cleared with reset()\",\n                        \"is_initialized\": \"Flag indicating if setup() has been called successfully\",\n                        \"stats\": \"Counters tracking number of items processed, errors, cache hits etc\",\n                        \"_lock\": \"Thread lock ensuring thread-safe access to shared resources\"\n                    },\n                    ScoreLevel.VERY_GOOD: {\n                        \"config\": \"Configuration dictionary controlling processing behavior. Set at initialization and accessed by all processing methods. Read-only after initialization\",\n                        \"data_cache\": \"Cache of processed items to avoid recomputation. Cleared with reset(). Keys are item IDs, values are processed results\",\n                        \"is_initialized\": \"Flag indicating if setup() has been called successfully. Methods will raise RuntimeError if called before initialization\",\n                        \"stats\": \"Counters tracking processing metrics (items processed, errors, cache hits etc). Updated by process() and reset by clear_stats()\",\n                        \"_lock\": \"Thread lock ensuring thread-safe access to cache and stats. 
Used internally by all public methods\"\n                    },\n                    ScoreLevel.EXCELLENT: {\n                        \"config\": \"Configuration dictionary controlling processing behavior. Set at initialization and accessed by all processing methods. Read-only after initialization. Must contain 'batch_size' and 'max_cache_size' keys. See CONFIG_SCHEMA for full specification\",\n                        \"data_cache\": \"Cache of processed items to avoid recomputation. Cleared with reset(). Keys are item IDs, values are processed results. Limited to max_cache_size items with LRU eviction. Thread-safe access via _lock\",\n                        \"is_initialized\": \"Flag indicating if setup() has been called successfully. Methods will raise RuntimeError if called before initialization. Set to True by setup() and False by reset(). Thread-safe access via _lock\",\n                        \"stats\": \"Counters tracking processing metrics (items processed, errors, cache hits etc). Updated by process() and reset by clear_stats(). Access via get_stats() for thread-safe snapshot. Used for monitoring and auto-scaling decisions\",\n                        \"_lock\": \"Thread lock ensuring thread-safe access to cache and stats. Used internally by all public methods. Reentrant lock allowing nested acquisition by same thread. 
Consider using async methods for high-concurrency scenarios\"\n                    }\n                },\n                explanations={\n                    ScoreLevel.POOR: \"These descriptions merely restate the attribute types without adding value\",\n                    ScoreLevel.FAIR: \"Provides basic purpose but lacks lifecycle and usage guidance\",\n                    ScoreLevel.GOOD: \"Includes initialization context and some usage patterns but could be more comprehensive\",\n                    ScoreLevel.VERY_GOOD: \"Clear purpose, initialization, and usage patterns with thread-safety context\",\n                    ScoreLevel.EXCELLENT: \"Comprehensive guidance including constraints, thread-safety, and practical usage tips\"\n                }\n            )\n        ]\n\n    def get_evaluation_prompt(self, class_signature: str, init_function: str,\n                            attribute_descriptions: Dict[str, str]) -> str:\n        \"\"\"\n        Generates a prompt for LLM evaluation of attribute descriptions.\n        \n        Args:\n            class_signature: The complete class signature.\n            init_function: The complete __init__ function including docstring.\n            attribute_descriptions: Dict mapping attribute names to their descriptions.\n            \n        Returns:\n            A formatted prompt string that can be sent to an LLM for evaluation.\n        \"\"\"\n        example = self.examples[0]  # Use first example as reference\n        \n        prompt = [\n            \"Please evaluate the following Python docstring attribute descriptions based on these criteria:\",\n            \"\",\n            \"<class_info>\",\n            f\"Class signature:\\n{class_signature}\",\n            \"\",\n            f\"Init function:\\n{init_function}\",\n            \"</class_info>\",\n            \"\",\n            \"<attributes_to_evaluate>\",\n            \"Attribute descriptions to evaluate:\",\n        ]\n        \n        for 
attr, desc in attribute_descriptions.items():\n            prompt.append(f\"{attr}: {desc}\")\n        prompt.append(\"</attributes_to_evaluate>\")\n        \n        prompt.extend([\n            \"\",\n            \"<evaluation_criteria>\",\n            \"Evaluation criteria:\",\n            self.criteria['description'],\n            \"\",\n            \"Score levels:\",\n        ])\n        \n        # Add criteria for each score level\n        for level in ScoreLevel:\n            prompt.append(f\"{level.value}. {self.criteria['score_criteria'][level]}\")\n        prompt.append(\"</evaluation_criteria>\")\n        \n        # Add example\n        prompt.extend([\n            \"\",\n            \"<reference_example>\",\n            \"Example for reference:\",\n            f\"Class: {example.class_signature}\",\n            f\"Init:\\n{example.init_function}\",\n            \"\",\n            \"Attribute descriptions at different quality levels:\",\n        ])\n        \n        for level in ScoreLevel:\n            prompt.extend([\n                f\"Level {level.value}:\",\n                *[f\"{attr}: {desc}\" for attr, desc in example.quality_examples[level].items()],\n                f\"Explanation: {example.explanations[level]}\",\n                \"\"\n            ])\n        prompt.append(\"</reference_example>\")\n        \n        prompt.extend([\n            \"\",\n            \"<analysis_instructions>\",\n            \"IMPORTANT INSTRUCTIONS FOR ANALYSIS:\",\n            \"1. Analyze how well each attribute description provides meaningful information beyond type hints\",\n            \"2. Consider completeness of lifecycle documentation (initialization, modification, access patterns)\",\n            \"3. Look for helpful context about relationships with class behavior\",\n            \"4. 
Check for thread-safety and special handling documentation where relevant\",\n            \"</analysis_instructions>\",\n            \"\",\n            \"<response_format>\",\n            \"Please structure your response as follows:\",\n            \"1. Analyze each attribute description's strengths and weaknesses\",\n            \"2. Compare against the criteria and example quality levels\",\n            \"3. Suggest specific improvements for weaker descriptions\",\n            \"4. Provide your score (1-5) enclosed in <score></score> tags\",\n            \"</response_format>\",\n            \"\",\n            \"Remember: Do not rush to assign a score. Take time to analyze thoroughly and justify your reasoning.\",            \n            \"No need to provide Suggestions for Improvement\",\n            \"The score should reflect your careful analysis and should be the last part of your response.\"\n        ])\n        \n        return \"\\n\".join(prompt)\n    \n    def parse_llm_response(self, response: str) -> Tuple[int, str]:\n        \"\"\"\n        Extracts the numerical score and full analysis from an LLM's response.\n        \n        Args:\n            response: The complete response text from the LLM.\n            \n        Returns:\n            A tuple containing:\n            - The numerical score (1-5)\n            - The full analysis text\n            \n        Raises:\n            ValueError: If no valid score is found or if multiple scores are found.\n        \"\"\"\n        # Extract score from XML tags\n        score_matches = re.findall(r'<score>(\\d)</score>', response)\n        \n        if not score_matches:\n            raise ValueError(\"No valid score found in LLM response. Response must include a score in <score></score> tags.\")\n        \n        if len(score_matches) > 1:\n            raise ValueError(\"Multiple scores found in LLM response. 
Expected exactly one score.\")\n            \n        score = int(score_matches[0])\n        if score < 1 or score > 5:\n            raise ValueError(f\"Invalid score value: {score}. Score must be between 1 and 5.\")\n        \n        # Remove the score tags from the analysis text\n        analysis = re.sub(r'<score>\\d</score>', '', response).strip()\n        \n        return score, analysis\n\n    def get_criteria_description(self) -> str:\n        \"\"\"Returns the main criteria description.\"\"\"\n        return self.criteria['description']\n\n    def get_score_criteria(self, level: ScoreLevel) -> str:\n        \"\"\"Returns the criteria description for a specific score level.\"\"\"\n        return self.criteria['score_criteria'][level]\n\n    def get_examples(self) -> List[EvaluationExample]:\n        \"\"\"Returns all evaluation examples.\"\"\"\n        return self.examples "
  },
  {
    "path": "src/evaluator/helpfulness_description.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, List, Tuple\nfrom dataclasses import dataclass\nfrom enum import Enum\nimport re\n\nfrom src.evaluator.evaluation_common import ScoreLevel\n\nclass DescriptionAspect(Enum):\n    \"\"\"Defines the different aspects of docstring description evaluation.\"\"\"\n    MOTIVATION = \"motivation\"\n    USAGE_SCENARIOS = \"usage_scenarios\"\n    INTEGRATION = \"integration\"\n    FUNCTIONALITY = \"functionality\"\n\n@dataclass\nclass AspectCriteria:\n    \"\"\"Stores criteria for a single evaluation aspect.\"\"\"\n    description: str\n    score_criteria: Dict[ScoreLevel, str]\n    example_good: str\n    example_poor: str\n\nclass DocstringDescriptionEvaluator:\n    \"\"\"\n    Evaluates the quality of Python docstring descriptions across multiple aspects.\n    \n    This evaluator analyzes docstring descriptions based on four key aspects:\n    1. Motivation/Purpose explanation\n    2. Usage scenarios and conditions\n    3. System integration and interactions\n    4. 
Functionality overview\n    \n    Each aspect is scored independently on a scale of 1-5, providing a comprehensive\n    assessment of the description's effectiveness.\n    \"\"\"\n\n    def __init__(self):\n        \"\"\"Initialize the evaluator with predefined criteria for each aspect.\"\"\"\n        self.criteria = self._initialize_criteria()\n\n    def _initialize_criteria(self) -> Dict[DescriptionAspect, AspectCriteria]:\n        \"\"\"\n        Set up the evaluation criteria for each aspect of docstring descriptions.\n        \n        Returns:\n            Dictionary mapping aspects to their evaluation criteria.\n        \"\"\"\n        return {\n            DescriptionAspect.MOTIVATION: AspectCriteria(\n                description=\"How well does the description explain the reason or motivation behind the code?\",\n                score_criteria={\n                    ScoreLevel.POOR: \"No explanation of why the code exists or its purpose\",\n                    ScoreLevel.FAIR: \"Basic purpose stated but without context or reasoning\",\n                    ScoreLevel.GOOD: \"Clear explanation of purpose with some context\",\n                    ScoreLevel.VERY_GOOD: \"Thorough explanation of purpose with business/technical context\",\n                    ScoreLevel.EXCELLENT: \"Comprehensive explanation of purpose, context, and value proposition\"\n                },\n                example_good=(\n                    \"This cache manager addresses the performance bottleneck in our API \"\n                    \"responses by reducing database load during peak hours, while ensuring \"\n                    \"data freshness for critical operations.\"\n                ),\n                example_poor=\"This is a cache manager for storing data.\"\n            ),\n            \n            DescriptionAspect.USAGE_SCENARIOS: AspectCriteria(\n                description=\"How effectively does it describe when and how to use the code?\",\n                
score_criteria={\n                    ScoreLevel.POOR: \"No information about usage scenarios\",\n                    ScoreLevel.FAIR: \"Basic usage information without specific scenarios\",\n                    ScoreLevel.GOOD: \"Some key usage scenarios described\",\n                    ScoreLevel.VERY_GOOD: \"Detailed usage scenarios with common cases\",\n                    ScoreLevel.EXCELLENT: \"Comprehensive coverage of use cases, including edge cases\"\n                },\n                example_good=(\n                    \"Use this validator when processing user-submitted data, especially \"\n                    \"for high-stakes operations like financial transactions. It handles \"\n                    \"various edge cases including partial submissions and legacy formats.\"\n                ),\n                example_poor=\"Validates data according to rules.\"\n            ),\n            \n            DescriptionAspect.INTEGRATION: AspectCriteria(\n                description=\"How well does it explain integration with other system components?\",\n                score_criteria={\n                    ScoreLevel.POOR: \"No mention of system integration\",\n                    ScoreLevel.FAIR: \"Minimal reference to other components\",\n                    ScoreLevel.GOOD: \"Basic explanation of main interactions\",\n                    ScoreLevel.VERY_GOOD: \"Clear description of integration points and dependencies\",\n                    ScoreLevel.EXCELLENT: \"Comprehensive overview of system interactions and data flow\"\n                },\n                example_good=(\n                    \"This service interfaces with the UserAuth system for validation, \"\n                    \"writes logs to CloudWatch, and triggers notifications through SNS. 
\"\n                    \"It serves as a crucial link between the frontend and payment processor.\"\n                ),\n                example_poor=\"Processes data and sends it to other services.\"\n            ),\n            \n            DescriptionAspect.FUNCTIONALITY: AspectCriteria(\n                description=\"How clearly does it explain the functionality without excessive technical detail?\",\n                score_criteria={\n                    ScoreLevel.POOR: \"No explanation of functionality\",\n                    ScoreLevel.FAIR: \"Overly technical or vague explanation\",\n                    ScoreLevel.GOOD: \"Basic explanation of main functionality\",\n                    ScoreLevel.VERY_GOOD: \"Clear, balanced explanation of functionality\",\n                    ScoreLevel.EXCELLENT: \"Perfect balance of clarity and technical detail\"\n                },\n                example_good=(\n                    \"Processes incoming customer data by first validating format and required fields, \"\n                    \"then enriching with relevant historical data, and finally \"\n                    \"generating risk scores using configurable criteria.\"\n                ),\n                example_poor=\"Processes data using various functions and algorithms.\"\n            )\n        }\n\n    def get_evaluation_prompt(self, code_implementation: str, docstring: str, eval_type: str = None) -> str:\n        \"\"\"\n        Generates a prompt for LLM evaluation of docstring descriptions.\n        \n        Args:\n            code_implementation: The function or class implementation\n            docstring: The docstring to evaluate\n            eval_type: The type of code component (class, function, method). 
\n                       If not provided, it will be determined from code_implementation.\n            \n        Returns:\n            Prompt for LLM evaluation\n        \"\"\"\n        # Determine eval_type if not provided\n        if eval_type is None:\n            if code_implementation.strip().startswith(\"class \"):\n                eval_type = \"class\"\n            else:\n                # A method is recognized by \"self\" as its first parameter\n                params = code_implementation.partition(\"(\")[2].partition(\")\")[0]\n                first_param = params.split(\",\")[0].split(\":\")[0].strip()\n                eval_type = \"method\" if first_param == \"self\" else \"function\"\n        \n        # Extract description from docstring (everything after the summary)\n        description = self._extract_description(docstring)\n        \n        if not description:\n            return \"The docstring does not have a description section to evaluate.\"\n        \n        prompt = [\"# Docstring Description Evaluation\", \"\"]\n        \n        prompt.extend([\n            \"## Code Component\",\n            f\"```python\",\n            f\"{code_implementation}\",\n            f\"```\",\n            \"\",\n        ])\n        \n        prompt.extend([\n            \"## Docstring Description to Evaluate\",\n            f\"```\",\n            f\"{description}\",\n            f\"```\",\n            \"\",\n        ])\n        \n        # Add evaluation criteria\n        prompt.extend([\n            \"## Evaluation Criteria\",\n            \"Please evaluate the above docstring description across these four aspects:\",\n            \"\"\n        ])\n        \n        for aspect in DescriptionAspect:\n            criteria = self.criteria[aspect]\n            prompt.extend([\n                # Render \"usage_scenarios\" as \"Usage Scenarios\" to match the output format below\n                f\"### {aspect.value.replace('_', ' ').title()}\",\n                f\"{criteria.description}\",\n                \"\",\n                \"Score levels:\",\n                \"\",\n            ])\n            \n            for level in ScoreLevel:\n                prompt.append(f\"{level.value}. 
{criteria.score_criteria[level]}\")\n            \n            prompt.extend([\n                \"\",\n                \"Examples:\",\n                f\"Good: \\\"{criteria.example_good}\\\"\",\n                f\"Poor: \\\"{criteria.example_poor}\\\"\",\n                \"\",\n            ])\n        \n        # Add output format instructions\n        prompt.extend([\n            \"## Output Format\",\n            \"Please evaluate the description and provide your assessment in this format:\",\n            \"\",\n            \"```\",\n            \"Motivation: [score 1-5]\",\n            \"Usage Scenarios: [score 1-5]\",\n            \"Integration: [score 1-5]\",\n            \"Functionality: [score 1-5]\",\n            \"\",\n            \"Overall: [average of the scores, rounded to nearest integer]\",\n            \"\",\n            \"Suggestions: [2-3 concrete suggestions for improvement focusing on the weakest aspects]\",\n            \"```\",\n        ])\n        \n        return \"\\n\".join(prompt)\n\n    def parse_llm_response(self, response: str) -> Tuple[int, str]:\n        \"\"\"\n        Extracts the overall score and suggestions from an LLM's response.\n        \n        Args:\n            response: The complete response text from the LLM.\n            \n        Returns:\n            Tuple of (overall_score, suggestions). If no score or suggestion can be\n            parsed, a default score of 3 and a generic suggestion are returned.\n        \"\"\"\n        # Default score if we can't find explicit scores\n        default_score = 3\n        \n        # If the response indicates no description section\n        if \"docstring does not have a description section\" in response:\n            return default_score, \"Add a description section to the docstring.\"\n        \n        # Try to extract an overall score first (easiest)\n        overall_pattern = r\"Overall:\\s*\\[?(\\d)\\.?\\d*\\]?\"\n        overall_matches = re.findall(overall_pattern, response, re.IGNORECASE)\n     
   \n        if overall_matches:\n            overall_score = int(overall_matches[0])\n        else:\n            # If we can't find an explicit overall score, use a default\n            overall_score = default_score\n        \n        # Extract suggestions\n        # Look for several common patterns\n        suggestion_patterns = [\n            r\"Suggestions:\\s*(.+?)(?:\\n\\n|\\Z)\",  # Format in prompt\n            r\"<suggestions>(.*?)</suggestions>\",  # XML tags\n            r\"suggestions?:?\\s*\\n\\s*(.+?)(?:\\n\\n|\\Z)\",  # Common formats\n        ]\n        \n        for pattern in suggestion_patterns:\n            suggestion_matches = re.findall(pattern, response, re.DOTALL | re.IGNORECASE)\n            if suggestion_matches:\n                suggestion = suggestion_matches[0].strip()\n                break\n        else:\n            # Default suggestion if none found\n            suggestion = \"Consider adding more detail to the description section.\"\n        \n        return overall_score, suggestion\n\n    def _extract_description(self, docstring: str) -> str:\n        \"\"\"\n        Extract the description part from a docstring.\n        \n        The description is everything after the summary line (first line)\n        and before any parameter sections, return sections, etc.\n        \n        Args:\n            docstring: The complete docstring\n            \n        Returns:\n            The extracted description, or empty string if none found\n        \"\"\"\n        if not docstring:\n            return \"\"\n            \n        # Split into lines and remove empty lines at start/end\n        lines = [line.strip() for line in docstring.strip().split('\\n')]\n        if not lines:\n            return \"\"\n            \n        # Skip the first line (summary)\n        lines = lines[1:]\n        \n        # Find where the parameters section or other sections begin\n        section_markers = ['Args:', 'Parameters:', 'Arguments:', 'Returns:', 
'Raises:', 'Yields:', 'Examples:']\n        \n        description_lines = []\n        for line in lines:\n            # Stop if we hit a section marker\n            if any(line.strip().startswith(marker) for marker in section_markers):\n                break\n            description_lines.append(line)\n        \n        # Join and strip to get the description\n        description = '\\n'.join(description_lines).strip()\n        return description"
  },
  {
    "path": "src/evaluator/helpfulness_evaluator.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport json\nimport random\nimport os\nimport sys\nfrom pathlib import Path\nfrom typing import Dict, Any, List, Optional, Tuple\nfrom dataclasses import dataclass\n\n# Add the project root to path\nproject_root = Path(__file__).parent.parent.parent\nsys.path.insert(0, str(project_root))\n\nfrom src.evaluator.helpfulness_summary import DocstringSummaryEvaluator\nfrom src.evaluator.helpfulness_description import DocstringDescriptionEvaluator  \nfrom src.evaluator.helpfulness_parameters import DocstringParametersEvaluator\nfrom src.agent.llm.openai_llm import OpenAILLM\n\n@dataclass\nclass EvaluationResult:\n    \"\"\"Store the results of a single evaluation.\"\"\"\n    system: str\n    component_id: str\n    aspect: str\n    score: int\n    suggestion: str\n\nclass DocstringHelpfulnessEvaluator:\n    \"\"\"Evaluates the helpfulness of docstrings generated by different systems.\"\"\"\n    \n    SYSTEMS = [\n        \"copy_paste_codellama34b\",\n        \"copy_paste_gpt4o_mini\",\n        \"docassist-codellama34b\",\n        \"docassist-gpt4o_mini\",\n        \"fim-codellama13b\",\n    ]\n    \n    ASPECTS = [\"summary\", \"description\", \"parameters\"]\n    \n    def __init__(self, data_path: str, output_dir: str, api_key: str, model: str = \"gpt-4o\"):\n        \"\"\"Initialize the evaluator.\n        \n        Args:\n            data_path: Path to the completeness evaluation data\n            output_dir: Directory to store evaluation results\n            api_key: OpenAI API key\n            model: LLM model to use for evaluation\n        \"\"\"\n        self.data_path = data_path\n        self.output_dir = output_dir\n        self.llm = OpenAILLM(api_key=api_key, model=model)\n        \n        # Initialize evaluators for each aspect\n        self.evaluators = {\n            \"summary\": DocstringSummaryEvaluator(),\n            \"description\": DocstringDescriptionEvaluator(),\n            
\"parameters\": DocstringParametersEvaluator()\n        }\n        \n        # Load evaluation data\n        with open(self.data_path, 'r') as f:\n            self.data = json.load(f)\n            \n        # Create output directory if it doesn't exist\n        os.makedirs(self.output_dir, exist_ok=True)\n    \n    def sample_components(self, n: int = 50, seed: int = 42) -> List[str]:\n        \"\"\"Randomly sample code components where all systems have valid docstrings.\n        \n        Args:\n            n: Number of components to sample\n            seed: Random seed for reproducibility\n            \n        Returns:\n            List of component IDs\n        \"\"\"\n        random.seed(seed)\n        \n        # Filter components where all systems have valid docstrings\n        valid_components = []\n        for component_id, component_data in self.data.items():\n            # Check if all systems have docstrings\n            has_all_docstrings = True\n            for system in self.SYSTEMS:\n                if system not in component_data.get(\"docstrings\", {}):\n                    has_all_docstrings = False\n                    break\n                \n                # Check if docstring is not empty\n                docstring = component_data[\"docstrings\"].get(system, {}).get(\"docstring\", \"\")\n                if not docstring or docstring == \"example string\":\n                    has_all_docstrings = False\n                    break\n            \n            if has_all_docstrings:\n                valid_components.append(component_id)\n        \n        # Sample n components\n        if len(valid_components) < n:\n            print(f\"Warning: Only {len(valid_components)} components have valid docstrings for all systems\")\n            return valid_components\n        \n        return random.sample(valid_components, n)\n    \n    def evaluate_component(self, component_id: str) -> List[EvaluationResult]:\n        \"\"\"Evaluate docstrings from 
all systems for a given component.\n        \n        Args:\n            component_id: Component ID\n            \n        Returns:\n            List of evaluation results\n        \"\"\"\n        component_data = self.data[component_id]\n        results = []\n        \n        component_type = component_data.get(\"type\", \"function\")\n        source_code = component_data.get(\"source_code\", \"\")\n        \n        for system in self.SYSTEMS:\n            if system not in component_data.get(\"docstrings\", {}):\n                continue\n                \n            system_data = component_data[\"docstrings\"][system]\n            docstring = system_data.get(\"docstring\", \"\")\n            \n            # Skip if docstring is empty or the example placeholder\n            if not docstring or docstring == \"example string\":\n                continue\n            \n            print(f\"  Evaluating system: {system}\")\n            # Evaluate each aspect\n            for aspect in self.ASPECTS:\n                # Check if the aspect is present in the docstring\n                element_scores = system_data.get(\"element_scores\", {})\n                if aspect not in element_scores or not element_scores[aspect]:\n                    print(f\"    Skipping aspect '{aspect}' - not present in docstring\")\n                    continue\n                \n                print(f\"    Evaluating aspect: {aspect}\")\n                try:\n                    # Get the evaluator for this aspect\n                    evaluator = self.evaluators[aspect]\n                    \n                    # Create prompt for evaluation\n                    prompt = evaluator.get_evaluation_prompt(source_code, docstring, component_type)\n                    \n                    # Call LLM for evaluation\n                    messages = [\n                        self.llm.format_message(\"system\", \"You are an expert docstring quality evaluator.\"),\n                        
self.llm.format_message(\"user\", prompt)\n                    ]\n                    \n                    response = self.llm.generate(messages, temperature=0.1, max_tokens=1024)\n                    \n                    # Parse response\n                    score, suggestion = evaluator.parse_llm_response(response)\n                    \n                    print(f\"      Score: {score}\")\n                    \n                    # Store result\n                    result = EvaluationResult(\n                        system=system,\n                        component_id=component_id,\n                        aspect=aspect,\n                        score=score,\n                        suggestion=suggestion\n                    )\n                    \n                    results.append(result)\n                except Exception as e:\n                    print(f\"    Error evaluating {aspect}: {str(e)}\")\n                    # Continue with other evaluations\n        \n        return results\n    \n    def run_evaluation(self, n_samples: int = 50, seed: int = 42) -> Dict[str, Any]:\n        \"\"\"Run the helpfulness evaluation on sampled components.\n        \n        Args:\n            n_samples: Number of components to sample\n            seed: Random seed for reproducibility\n            \n        Returns:\n            Evaluation results\n        \"\"\"\n        # Sample components\n        component_ids = self.sample_components(n_samples, seed)\n        \n        # Evaluate each component\n        all_results = []\n        for component_id in component_ids:\n            print(f\"Evaluating component: {component_id}\")\n            results = self.evaluate_component(component_id)\n            all_results.extend(results)\n        \n        # Organize results\n        results_dict = {\n            \"metadata\": {\n                \"n_samples\": len(component_ids),\n                \"seed\": seed,\n                \"systems\": self.SYSTEMS,\n                
\"aspects\": self.ASPECTS\n            },\n            \"component_ids\": component_ids,\n            \"results\": [\n                {\n                    \"system\": r.system,\n                    \"component_id\": r.component_id,\n                    \"aspect\": r.aspect,\n                    \"score\": r.score,\n                    \"suggestion\": r.suggestion\n                }\n                for r in all_results\n            ]\n        }\n        \n        # Save results to file\n        output_path = os.path.join(self.output_dir, \"helpfulness_evaluation_results.json\")\n        with open(output_path, 'w') as f:\n            json.dump(results_dict, f, indent=2)\n        \n        # Generate statistics\n        stats = self.calculate_statistics(results_dict)\n        \n        # Save statistics to file\n        stats_path = os.path.join(self.output_dir, \"helpfulness_evaluation_stats.md\")\n        with open(stats_path, 'w') as f:\n            f.write(self.format_statistics_markdown(stats))\n        \n        return results_dict\n    \n    def calculate_statistics(self, results: Dict[str, Any]) -> Dict[str, Any]:\n        \"\"\"Calculate statistics from evaluation results.\n        \n        Args:\n            results: Evaluation results\n            \n        Returns:\n            Statistics\n        \"\"\"\n        stats = {\n            \"overall\": {},\n            \"by_system\": {},\n            \"by_aspect\": {},\n            \"by_system_and_aspect\": {}\n        }\n        \n        # Calculate overall average\n        scores = [r[\"score\"] for r in results[\"results\"]]\n        stats[\"overall\"][\"average_score\"] = sum(scores) / len(scores) if scores else 0\n        stats[\"overall\"][\"count\"] = len(scores)\n        \n        # Calculate average by system\n        for system in self.SYSTEMS:\n            system_scores = [r[\"score\"] for r in results[\"results\"] if r[\"system\"] == system]\n            stats[\"by_system\"][system] = {\n      
          \"average_score\": sum(system_scores) / len(system_scores) if system_scores else 0,\n                \"count\": len(system_scores)\n            }\n        \n        # Calculate average by aspect\n        for aspect in self.ASPECTS:\n            aspect_scores = [r[\"score\"] for r in results[\"results\"] if r[\"aspect\"] == aspect]\n            stats[\"by_aspect\"][aspect] = {\n                \"average_score\": sum(aspect_scores) / len(aspect_scores) if aspect_scores else 0,\n                \"count\": len(aspect_scores)\n            }\n        \n        # Calculate average by system and aspect\n        for system in self.SYSTEMS:\n            stats[\"by_system_and_aspect\"][system] = {}\n            for aspect in self.ASPECTS:\n                scores = [r[\"score\"] for r in results[\"results\"] \n                         if r[\"system\"] == system and r[\"aspect\"] == aspect]\n                stats[\"by_system_and_aspect\"][system][aspect] = {\n                    \"average_score\": sum(scores) / len(scores) if scores else 0,\n                    \"count\": len(scores)\n                }\n        \n        return stats\n    \n    def format_statistics_markdown(self, stats: Dict[str, Any]) -> str:\n        \"\"\"Format statistics as markdown.\n        \n        Args:\n            stats: Statistics\n            \n        Returns:\n            Markdown representation of statistics\n        \"\"\"\n        md = \"# Docstring Helpfulness Evaluation Results\\n\\n\"\n        \n        # Overall statistics\n        md += \"## Overall Statistics\\n\\n\"\n        md += f\"- Average Score: {stats['overall']['average_score']:.2f}\\n\"\n        md += f\"- Number of Evaluations: {stats['overall']['count']}\\n\\n\"\n        \n        # By system\n        md += \"## Results by System\\n\\n\"\n        md += \"| System | Average Score | Count |\\n\"\n        md += \"| ------ | ------------- | ----- |\\n\"\n        for system, system_stats in 
stats[\"by_system\"].items():\n            md += f\"| {system} | {system_stats['average_score']:.2f} | {system_stats['count']} |\\n\"\n        md += \"\\n\"\n        \n        # By aspect\n        md += \"## Results by Aspect\\n\\n\"\n        md += \"| Aspect | Average Score | Count |\\n\"\n        md += \"| ------ | ------------- | ----- |\\n\"\n        for aspect, aspect_stats in stats[\"by_aspect\"].items():\n            md += f\"| {aspect} | {aspect_stats['average_score']:.2f} | {aspect_stats['count']} |\\n\"\n        md += \"\\n\"\n        \n        # By system and aspect\n        md += \"## Results by System and Aspect\\n\\n\"\n        md += \"| System | Aspect | Average Score | Count |\\n\"\n        md += \"| ------ | ------ | ------------- | ----- |\\n\"\n        for system, aspects in stats[\"by_system_and_aspect\"].items():\n            for aspect, aspect_stats in aspects.items():\n                md += f\"| {system} | {aspect} | {aspect_stats['average_score']:.2f} | {aspect_stats['count']} |\\n\"\n        \n        return md\n\ndef main():\n    \"\"\"Run the docstring helpfulness evaluation.\"\"\"\n    # Imported here so that main() also works when this module is imported\n    import yaml\n\n    # Configuration\n    data_path = \"experiments/eval/results/completeness_evaluation_cleaned.json\"\n    output_dir = \"experiments/eval/results/helpfulness\"\n    \n    # Get API key and model name from config\n    with open(\"config/agent_config.yaml\", 'r') as f:\n        config = yaml.safe_load(f)\n        api_key = config[\"llm\"][\"api_key\"]\n        model = config[\"llm\"][\"model\"]\n    \n    # Run evaluation\n    evaluator = DocstringHelpfulnessEvaluator(data_path, output_dir, api_key, model)\n    evaluator.run_evaluation(n_samples=50, seed=42)\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/evaluator/helpfulness_evaluator_ablation.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport json\nimport random\nimport os\nimport sys\nfrom pathlib import Path\nfrom typing import Dict, Any, List, Optional, Tuple\nfrom dataclasses import dataclass\n\n# Add the project root to path\nproject_root = Path(__file__).parent.parent.parent\nsys.path.insert(0, str(project_root))\n\nfrom src.evaluator.helpfulness_summary import DocstringSummaryEvaluator\nfrom src.evaluator.helpfulness_description import DocstringDescriptionEvaluator  \nfrom src.evaluator.helpfulness_parameters import DocstringParametersEvaluator\nfrom src.agent.llm.openai_llm import OpenAILLM\n\n@dataclass\nclass EvaluationResult:\n    \"\"\"Store the results of a single evaluation.\"\"\"\n    system: str\n    component_id: str\n    aspect: str\n    score: int\n    suggestion: str\n\nclass DocstringHelpfulnessEvaluatorAblation:\n    \"\"\"Evaluates the helpfulness of docstrings generated by different systems.\"\"\"\n    \n    SYSTEMS = [\n        \"docassist-codellama34b-random-file\",\n        \"docassist-codellama34b-random-node\",\n        \"docassist-gpt4o_mini-random-file\",\n        \"docassist-gpt4o_mini-random-node\",\n        \"docassist-codellama34b\",\n        \"docassist-gpt4o_mini\",\n    ]\n    \n    ASPECTS = [\"summary\", \"description\", \"parameters\"]\n    \n    def __init__(self, data_path: str, output_dir: str, api_key: str, model: str = \"gpt-4o\"):\n        \"\"\"Initialize the evaluator.\n        \n        Args:\n            data_path: Path to the completeness evaluation data\n            output_dir: Directory to store evaluation results\n            api_key: OpenAI API key\n            model: LLM model to use for evaluation\n        \"\"\"\n        self.data_path = data_path\n        self.output_dir = output_dir\n        self.llm = OpenAILLM(api_key=api_key, model=model)\n        \n        # Initialize evaluators for each aspect\n        self.evaluators = {\n            \"summary\": 
DocstringSummaryEvaluator(),\n            \"description\": DocstringDescriptionEvaluator(),\n            \"parameters\": DocstringParametersEvaluator()\n        }\n        \n        # Load evaluation data\n        with open(self.data_path, 'r') as f:\n            self.data = json.load(f)\n            \n        # Create output directory if it doesn't exist\n        os.makedirs(self.output_dir, exist_ok=True)\n    \n    def sample_components(self, n: Optional[int] = 50, seed: int = 42) -> List[str]:\n        \"\"\"Randomly sample code components where all systems have valid docstrings.\n        \n        Args:\n            n: Number of components to sample. If None, return all valid components.\n            seed: Random seed for reproducibility\n            \n        Returns:\n            List of component IDs\n        \"\"\"\n        random.seed(seed)\n        \n        # Filter components where all systems have valid docstrings\n        valid_components = []\n        for component_id, component_data in self.data.items():\n            # Check if all systems have docstrings\n            has_all_docstrings = True\n            for system in self.SYSTEMS:\n                if system not in component_data.get(\"docstrings\", {}):\n                    has_all_docstrings = False\n                    break\n                \n                # Check if docstring is not empty\n                docstring = component_data[\"docstrings\"].get(system, {}).get(\"docstring\", \"\")\n                if not docstring or docstring == \"example string\":\n                    has_all_docstrings = False\n                    break\n            \n            if has_all_docstrings:\n                valid_components.append(component_id)\n        \n        # If n is None, return all valid components\n        if n is None:\n            print(f\"Using all {len(valid_components)} valid components\")\n            return valid_components\n            \n        # Sample n components\n        if 
len(valid_components) < n:\n            print(f\"Warning: Only {len(valid_components)} components have valid docstrings for all systems\")\n            return valid_components\n        \n        return random.sample(valid_components, n)\n    \n    def evaluate_component(self, component_id: str) -> List[EvaluationResult]:\n        \"\"\"Evaluate docstrings from all systems for a given component.\n        \n        Args:\n            component_id: Component ID\n            \n        Returns:\n            List of evaluation results\n        \"\"\"\n        component_data = self.data[component_id]\n        results = []\n        \n        component_type = component_data.get(\"type\", \"function\")\n        source_code = component_data.get(\"source_code\", \"\")\n        \n        for system in self.SYSTEMS:\n            if system not in component_data.get(\"docstrings\", {}):\n                continue\n                \n            system_data = component_data[\"docstrings\"][system]\n            docstring = system_data.get(\"docstring\", \"\")\n            \n            # Skip if docstring is empty or the example placeholder\n            if not docstring or docstring == \"example string\":\n                continue\n            \n            print(f\"  Evaluating system: {system}\")\n            # Evaluate each aspect\n            for aspect in self.ASPECTS:\n                # Check if the aspect is present in the docstring\n                element_scores = system_data.get(\"element_scores\", {})\n                if aspect not in element_scores or not element_scores[aspect]:\n                    print(f\"    Skipping aspect '{aspect}' - not present in docstring\")\n                    continue\n                \n                print(f\"    Evaluating aspect: {aspect}\")\n                try:\n                    # Get the evaluator for this aspect\n                    evaluator = self.evaluators[aspect]\n                    \n                    # Create prompt for 
evaluation\n                    prompt = evaluator.get_evaluation_prompt(source_code, docstring, component_type)\n                    \n                    # Call LLM for evaluation\n                    messages = [\n                        self.llm.format_message(\"system\", \"You are an expert docstring quality evaluator.\"),\n                        self.llm.format_message(\"user\", prompt)\n                    ]\n                    \n                    response = self.llm.generate(messages, temperature=0.1, max_tokens=1024)\n                    \n                    # Parse response\n                    score, suggestion = evaluator.parse_llm_response(response)\n                    \n                    print(f\"      Score: {score}\")\n                    \n                    # Store result\n                    result = EvaluationResult(\n                        system=system,\n                        component_id=component_id,\n                        aspect=aspect,\n                        score=score,\n                        suggestion=suggestion\n                    )\n                    \n                    results.append(result)\n                except Exception as e:\n                    print(f\"    Error evaluating {aspect}: {str(e)}\")\n                    # Continue with other evaluations\n        \n        return results\n    \n    def run_evaluation(self, n_samples: int = 50, seed: int = 42) -> Dict[str, Any]:\n        \"\"\"Run the helpfulness evaluation on sampled components.\n        \n        Args:\n            n_samples: Number of components to sample\n            seed: Random seed for reproducibility\n            \n        Returns:\n            Evaluation results\n        \"\"\"\n        # Sample components\n        component_ids = self.sample_components(n_samples, seed)\n        \n        # Evaluate each component\n        all_results = []\n        for component_id in component_ids:\n            print(f\"Evaluating component: 
{component_id}\")\n            results = self.evaluate_component(component_id)\n            all_results.extend(results)\n        \n        # Organize results\n        results_dict = {\n            \"metadata\": {\n                \"n_samples\": len(component_ids),\n                \"seed\": seed,\n                \"systems\": self.SYSTEMS,\n                \"aspects\": self.ASPECTS\n            },\n            \"component_ids\": component_ids,\n            \"results\": [\n                {\n                    \"system\": r.system,\n                    \"component_id\": r.component_id,\n                    \"aspect\": r.aspect,\n                    \"score\": r.score,\n                    \"suggestion\": r.suggestion\n                }\n                for r in all_results\n            ]\n        }\n        \n        # Save results to file\n        output_path = os.path.join(self.output_dir, \"helpfulness_evaluation_results.json\")\n        with open(output_path, 'w') as f:\n            json.dump(results_dict, f, indent=2)\n        \n        # Generate statistics\n        stats = self.calculate_statistics(results_dict)\n        \n        # Save statistics to file\n        stats_path = os.path.join(self.output_dir, \"helpfulness_evaluation_stats.md\")\n        with open(stats_path, 'w') as f:\n            f.write(self.format_statistics_markdown(stats))\n        \n        return results_dict\n    \n    def calculate_statistics(self, results: Dict[str, Any]) -> Dict[str, Any]:\n        \"\"\"Calculate statistics from evaluation results.\n        \n        Args:\n            results: Evaluation results\n            \n        Returns:\n            Statistics\n        \"\"\"\n        stats = {\n            \"overall\": {},\n            \"by_system\": {},\n            \"by_aspect\": {},\n            \"by_system_and_aspect\": {}\n        }\n        \n        # Calculate overall average\n        scores = [r[\"score\"] for r in results[\"results\"]]\n        
stats[\"overall\"][\"average_score\"] = sum(scores) / len(scores) if scores else 0\n        stats[\"overall\"][\"count\"] = len(scores)\n        \n        # Calculate average by system\n        for system in self.SYSTEMS:\n            system_scores = [r[\"score\"] for r in results[\"results\"] if r[\"system\"] == system]\n            stats[\"by_system\"][system] = {\n                \"average_score\": sum(system_scores) / len(system_scores) if system_scores else 0,\n                \"count\": len(system_scores)\n            }\n        \n        # Calculate average by aspect\n        for aspect in self.ASPECTS:\n            aspect_scores = [r[\"score\"] for r in results[\"results\"] if r[\"aspect\"] == aspect]\n            stats[\"by_aspect\"][aspect] = {\n                \"average_score\": sum(aspect_scores) / len(aspect_scores) if aspect_scores else 0,\n                \"count\": len(aspect_scores)\n            }\n        \n        # Calculate average by system and aspect\n        for system in self.SYSTEMS:\n            stats[\"by_system_and_aspect\"][system] = {}\n            for aspect in self.ASPECTS:\n                scores = [r[\"score\"] for r in results[\"results\"] \n                         if r[\"system\"] == system and r[\"aspect\"] == aspect]\n                stats[\"by_system_and_aspect\"][system][aspect] = {\n                    \"average_score\": sum(scores) / len(scores) if scores else 0,\n                    \"count\": len(scores)\n                }\n        \n        return stats\n    \n    def format_statistics_markdown(self, stats: Dict[str, Any]) -> str:\n        \"\"\"Format statistics as markdown.\n        \n        Args:\n            stats: Statistics\n            \n        Returns:\n            Markdown representation of statistics\n        \"\"\"\n        md = \"# Docstring Helpfulness Evaluation Results\\n\\n\"\n        \n        # Overall statistics\n        md += \"## Overall Statistics\\n\\n\"\n        md += f\"- Average Score: 
{stats['overall']['average_score']:.2f}\\n\"\n        md += f\"- Number of Evaluations: {stats['overall']['count']}\\n\\n\"\n        \n        # By system\n        md += \"## Results by System\\n\\n\"\n        md += \"| System | Average Score | Count |\\n\"\n        md += \"| ------ | ------------- | ----- |\\n\"\n        for system, system_stats in stats[\"by_system\"].items():\n            md += f\"| {system} | {system_stats['average_score']:.2f} | {system_stats['count']} |\\n\"\n        md += \"\\n\"\n        \n        # By aspect\n        md += \"## Results by Aspect\\n\\n\"\n        md += \"| Aspect | Average Score | Count |\\n\"\n        md += \"| ------ | ------------- | ----- |\\n\"\n        for aspect, aspect_stats in stats[\"by_aspect\"].items():\n            md += f\"| {aspect} | {aspect_stats['average_score']:.2f} | {aspect_stats['count']} |\\n\"\n        md += \"\\n\"\n        \n        # By system and aspect\n        md += \"## Results by System and Aspect\\n\\n\"\n        md += \"| System | Aspect | Average Score | Count |\\n\"\n        md += \"| ------ | ------ | ------------- | ----- |\\n\"\n        for system, aspects in stats[\"by_system_and_aspect\"].items():\n            for aspect, aspect_stats in aspects.items():\n                md += f\"| {system} | {aspect} | {aspect_stats['average_score']:.2f} | {aspect_stats['count']} |\\n\"\n        \n        return md\n\ndef main():\n    \"\"\"Run the docstring helpfulness evaluation.\"\"\"\n    # Configuration\n    data_path = \"experiments/eval/results/completeness_evaluation_ablation_cleaned.json\"\n    output_dir = \"experiments/eval/results/helpfulness_ablation\"\n    \n    # Get API key from config\n    with open(\"config/agent_config.yaml\", 'r') as f:\n        config = yaml.safe_load(f)\n        api_key = config[\"llm\"][\"api_key\"]\n        model = config[\"llm\"][\"model\"]\n    \n    # Run evaluation\n    evaluator = DocstringHelpfulnessEvaluatorAblation(data_path, output_dir, api_key, 
model)\n    evaluator.run_evaluation(n_samples=50, seed=42)\n\nif __name__ == \"__main__\":\n    import yaml\n    main() "
  },
  {
    "path": "src/evaluator/helpfulness_examples.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, List, Optional, Tuple, Union\nfrom dataclasses import dataclass\nfrom abc import ABC, abstractmethod\nimport ast\nimport re\n\n\n\ndef get_callable_name(node: Union[ast.Name, ast.Attribute]) -> str:\n    \"\"\"\n    Extract the name of a callable whether it's an ast.Name or ast.Attribute.\n    \"\"\"\n    if isinstance(node, ast.Name):\n        # e.g., \"my_function\"\n        return node.id\n    elif isinstance(node, ast.Attribute):\n        # e.g., \"some_module.my_function\"\n        # node.value.id -> \"some_module\", node.attr -> \"my_function\"\n        return node.attr\n    else:\n        raise ValueError(f\"Unsupported node type for function/class: {type(node)}\")\n\n\n@dataclass\nclass FunctionCallExample:\n    \"\"\"Stores an example of function usage with context and expected output.\"\"\"\n    context_code: str  # Code leading up to the function call\n    function_signature: str  # The complete function signature\n    docstring_example: str  # Only the example part of the docstring\n    expected_call: str  # The expected function call line(s)\n\n@dataclass\nclass ClassCallExample:\n    \"\"\"Stores an example of class instantiation with context and expected output.\"\"\"\n    context_code: str  # Code leading up to class instantiation\n    class_signature: str  # The class signature\n    init_signature: str  # The __init__ method signature\n    docstring_example: str  # Only the example part of the docstring\n    expected_call: str  # The expected instantiation line(s)\n\n@dataclass\nclass MethodCallExample:\n    \"\"\"Stores an example of method usage with context and expected output.\"\"\"\n    context_code: str  # Code leading up to method call\n    method_signature: str  # The method signature\n    docstring_example: str  # Only the example part of the docstring\n    expected_call: str  # The expected method call line(s)\n\nclass 
BaseExampleEvaluator(ABC):\n    \"\"\"\n    Base class for evaluating docstring examples.\n    \n    This class provides the foundation for evaluating how well docstring examples\n    enable users to correctly use the code without needing to understand its implementation.\n    \"\"\"\n    \n    @abstractmethod\n    def get_evaluation_prompt(self, context_code: str, signature: str, example: str) -> str:\n        \"\"\"\n        Generates a prompt for LLM to predict the next line(s) of code.\n        \n        Args:\n            context_code: The code leading up to where the prediction should be made\n            signature: The complete signature of the function/class/method\n            example: The example part of the docstring\n            \n        Returns:\n            A formatted prompt string for the LLM\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def evaluate_prediction(self, prediction: str, ground_truth: str) -> Tuple[bool, str]:\n        \"\"\"\n        Evaluates if the predicted usage matches the ground truth.\n        \n        Args:\n            prediction: The LLM's predicted line(s) of code\n            ground_truth: The expected line(s) of code\n            \n        Returns:\n            A tuple containing:\n            - Boolean indicating if the prediction is correct\n            - String explaining the evaluation result\n        \"\"\"\n        pass\n\nclass FunctionExampleEvaluator(BaseExampleEvaluator):\n    \"\"\"\n    Evaluates the quality of function docstring examples by testing if they enable\n    correct function usage prediction.\n    \"\"\"\n    \n    def get_evaluation_prompt(self, context_code: str, signature: str, example: str) -> str:\n        \"\"\"\n        Generates a prompt for LLM to predict the next line of function usage.\n        \n        Args:\n            context_code: The code leading up to the function call\n            signature: The complete function signature\n            example: The example part 
of the docstring\n            \n        Returns:\n            A formatted prompt string that can be sent to an LLM for prediction\n        \"\"\"\n        prompt = [\n            \"Given the following context, predict ONLY the next line of code that calls the function.\",\n            \"Your prediction should be based solely on the function signature and example provided.\",\n            \"\",\n            \"Function signature:\",\n            signature,\n            \"\",\n            \"Example from docstring:\",\n            example,\n            \"\",\n            \"Context code leading up to function call:\",\n            context_code,\n            \"\",\n            \"IMPORTANT INSTRUCTIONS:\",\n            \"1. Predict ONLY the next line(s) that calls the function\",\n            \"2. Base your prediction solely on the signature and example\",\n            \"3. Include ONLY the function call, no additional explanation\",\n            \"4. If the function call spans multiple lines, include all necessary lines\",\n            \"5. Ensure the prediction is valid Python syntax\",\n            \"\",\n            \"Your prediction should be enclosed in <prediction></prediction> tags\",\n        ]\n        \n        return \"\\n\".join(prompt)\n    \n    def evaluate_prediction(self, prediction: str, ground_truth: str) -> Tuple[bool, str]:\n        \"\"\"\n        Evaluates if the predicted function call matches the ground truth.\n        \n        Performs robust parsing of both prediction and ground truth to compare:\n        1. Function name\n        2. Argument names and their order\n        3. 
Argument values (when they are literals)\n        \n        Args:\n            prediction: The LLM's predicted function call\n            ground_truth: The expected function call\n            \n        Returns:\n            Tuple containing:\n            - Boolean indicating if the prediction is correct\n            - String explaining why the prediction was correct or incorrect\n        \"\"\"\n        # Parse both prediction and ground truth into AST\n        pred_ast = ast.parse(prediction.strip()).body[0].value\n        truth_ast = ast.parse(ground_truth.strip()).body[0].value\n        \n        # Verify it's a function call\n        if not isinstance(pred_ast, ast.Call) or not isinstance(truth_ast, ast.Call):\n            return False, \"Not a valid function call\"\n        \n        # Check function name\n        pred_name = get_callable_name(pred_ast.func)\n        truth_name = get_callable_name(truth_ast.func)\n\n        if pred_name != truth_name:\n            return False, f\"Mismatch: expected '{truth_name}', got '{pred_name}'\"\n        \n        # Get argument information\n        pred_args = {\n            kw.arg: kw.value for kw in pred_ast.keywords\n        }\n        truth_args = {\n            kw.arg: kw.value for kw in truth_ast.keywords\n        }\n        \n        # Check positional arguments\n        if len(pred_ast.args) != len(truth_ast.args):\n            return False, \"Mismatched number of positional arguments\"\n        \n        # Check keyword arguments\n        if set(pred_args.keys()) != set(truth_args.keys()):\n            return False, \"Mismatched keyword argument names\"\n        \n        # Check argument order for positional args\n        for i, (p_arg, t_arg) in enumerate(zip(pred_ast.args, truth_ast.args)):\n            if not self._compare_ast_nodes(p_arg, t_arg):\n                return False, f\"Positional argument {i+1} mismatch\"\n        \n        # Check keyword argument values\n        for arg_name, t_value in 
truth_args.items():\n            p_value = pred_args[arg_name]\n            if not self._compare_ast_nodes(p_value, t_value):\n                return False, f\"Keyword argument '{arg_name}' value mismatch\"\n        \n        return True, \"Function call matches expected usage\"\n    \n    def _compare_ast_nodes(self, node1: ast.AST, node2: ast.AST) -> bool:\n        \"\"\"\n        Helper method to compare two AST nodes.\n        \n        Args:\n            node1: First AST node\n            node2: Second AST node\n            \n        Returns:\n            Boolean indicating if the nodes are equivalent\n        \"\"\"\n        # For literals (strings, numbers, booleans, None).\n        # ast.Str, ast.Num and ast.NameConstant were removed in Python 3.12;\n        # ast.Constant covers all of them.\n        if isinstance(node1, ast.Constant):\n            return isinstance(node2, ast.Constant) and node1.value == node2.value\n        \n        # For variable names\n        if isinstance(node1, ast.Name) and isinstance(node2, ast.Name):\n            return node1.id == node2.id\n        \n        # For attribute access (e.g., obj.attr)\n        if isinstance(node1, ast.Attribute) and isinstance(node2, ast.Attribute):\n            return node1.attr == node2.attr and self._compare_ast_nodes(node1.value, node2.value)\n        \n        # For lists/tuples\n        if isinstance(node1, (ast.List, ast.Tuple)) and isinstance(node2, type(node1)):\n            if len(node1.elts) != len(node2.elts):\n                return False\n            return all(self._compare_ast_nodes(e1, e2) for e1, e2 in zip(node1.elts, node2.elts))\n        \n        return False\n\nclass ClassExampleEvaluator(BaseExampleEvaluator):\n    \"\"\"\n    Evaluates the quality of class docstring examples by testing if they enable\n    correct class instantiation prediction.\n    \"\"\"\n    \n    def get_evaluation_prompt(self, context_code: str, signature: str, example: str) -> str:\n        \"\"\"\n        Generates a prompt for LLM to predict the class instantiation line.\n        \n        Args:\n            
context_code: The code leading up to class instantiation\n            signature: Combined class and __init__ signatures\n            example: The example part of the docstring\n            \n        Returns:\n            A formatted prompt string that can be sent to an LLM for prediction\n        \"\"\"\n        prompt = [\n            \"Given the following context, predict ONLY the next line of code that creates a class instance.\",\n            \"Your prediction should be based solely on the class signature and example provided.\",\n            \"\",\n            \"Class and __init__ signatures:\",\n            signature,\n            \"\",\n            \"Example from docstring:\",\n            example,\n            \"\",\n            \"Context code leading up to class instantiation:\",\n            context_code,\n            \"\",\n            \"IMPORTANT INSTRUCTIONS:\",\n            \"1. Predict ONLY the next line(s) that creates the class instance\",\n            \"2. Base your prediction solely on the signatures and example\",\n            \"3. Include ONLY the instantiation code, no additional explanation\",\n            \"4. If the instantiation spans multiple lines, include all necessary lines\",\n            \"5. Ensure the prediction is valid Python syntax\",\n            \"\",\n            \"Your prediction should be enclosed in <prediction></prediction> tags\",\n        ]\n        \n        return \"\\n\".join(prompt)\n\n    def _get_func_name(self, node: Union[ast.Name, ast.Attribute]) -> str:\n        \"\"\"\n        Extract the function/class name whether it's `Name` or `Attribute`.\n        - ast.Name: directly has `node.id`\n        - ast.Attribute: the class name is in `node.attr` (e.g. `some_module.MyClass`)\n        \"\"\"\n        if isinstance(node, ast.Name):\n            return node.id\n        elif isinstance(node, ast.Attribute):\n            return node.attr\n        else:\n            # If your code can handle more node types, add logic here\n            raise ValueError(f\"Unsupported node type for function/class: {type(node)}\")\n\n    def evaluate_prediction(self, prediction: str, ground_truth: str) -> Tuple[bool, str]:\n        \"\"\"\n        Evaluates if the predicted class instantiation matches the ground truth.\n\n        Performs robust parsing of both prediction and ground truth to compare:\n        1. Class name\n        2. Constructor argument names and order\n        3. 
Argument values (when literals)\n        \n        Args:\n            prediction: The LLM's predicted instantiation code\n            ground_truth: The expected instantiation code\n            \n        Returns:\n            Tuple containing:\n            - Boolean indicating if the prediction is correct\n            - String explaining why the prediction was correct or incorrect\n        \"\"\"\n        # Parse both prediction and ground truth into AST\n        pred_ast = ast.parse(prediction.strip()).body[0].value\n        truth_ast = ast.parse(ground_truth.strip()).body[0].value\n        \n        # Verify it's a class instantiation\n        if not isinstance(pred_ast, ast.Call) or not isinstance(truth_ast, ast.Call):\n            return False, \"Not a valid class instantiation\"\n        \n        # Safely extract the class name from both\n        pred_func_name = self._get_func_name(pred_ast.func)\n        truth_func_name = self._get_func_name(truth_ast.func)\n        \n        # Check class name\n        if pred_func_name != truth_func_name:\n            return False, f\"Class name mismatch: expected {truth_func_name}, got {pred_func_name}\"\n        \n        # Get argument information (keyword args)\n        pred_args = {kw.arg: kw.value for kw in pred_ast.keywords}\n        truth_args = {kw.arg: kw.value for kw in truth_ast.keywords}\n        \n        # Check positional arguments\n        if len(pred_ast.args) != len(truth_ast.args):\n            return False, \"Mismatched number of positional arguments\"\n        \n        # Check keyword arguments\n        if set(pred_args.keys()) != set(truth_args.keys()):\n            return False, \"Mismatched keyword argument names\"\n        \n        # Check argument order and values for positional args\n        for i, (p_arg, t_arg) in enumerate(zip(pred_ast.args, truth_ast.args)):\n            if not self._compare_ast_nodes(p_arg, t_arg):\n                return False, f\"Positional argument {i+1} mismatch\"\n   
     \n        # Check keyword argument values\n        for arg_name, t_value in truth_args.items():\n            p_value = pred_args[arg_name]\n            if not self._compare_ast_nodes(p_value, t_value):\n                return False, f\"Keyword argument '{arg_name}' value mismatch\"\n        \n        return True, \"Class instantiation matches expected usage\"\n    \n    def _compare_ast_nodes(self, node1: ast.AST, node2: ast.AST) -> bool:\n        \"\"\"Helper method to compare two AST nodes.\"\"\"\n        # Reuse the same implementation as FunctionExampleEvaluator\n        return FunctionExampleEvaluator._compare_ast_nodes(self, node1, node2)\n\nclass MethodExampleEvaluator(BaseExampleEvaluator):\n    \"\"\"\n    Evaluates the quality of class method docstring examples by testing if they enable\n    correct method call prediction.\n    \"\"\"\n    \n    def get_evaluation_prompt(self, context_code: str, signature: str, example: str) -> str:\n        \"\"\"\n        Generates a prompt for LLM to predict the method call line.\n        \n        Args:\n            context_code: The code leading up to method call\n            signature: The method signature\n            example: The example part of the docstring\n            \n        Returns:\n            A formatted prompt string that can be sent to an LLM for prediction\n        \"\"\"\n        prompt = [\n            \"Given the following context, predict ONLY the next line of code that calls the class method.\",\n            \"Your prediction should be based solely on the method signature and example provided.\",\n            \"\",\n            \"Method signature:\",\n            \"<method_signature>\",\n            signature,\n            \"</method_signature>\",\n            \"\",\n            \"Example from docstring:\", \n            \"<docstring_example>\",\n            example,\n            \"</docstring_example>\",\n            \"\",\n            \"Context code leading up to method call:\",\n         
   \"<context_code>\",\n            context_code,\n            \"</context_code>\",\n            \"\",\n            \"IMPORTANT INSTRUCTIONS:\",\n            \"1. Predict ONLY the next line(s) that calls the method\",\n            \"2. Base your prediction solely on the signature and example\",\n            \"3. Include ONLY the method call, no additional explanation\", \n            \"4. If the method call spans multiple lines, include all necessary lines\",\n            \"5. Ensure the prediction is valid Python syntax\",\n            \"\",\n            \"Your prediction should be enclosed in <prediction></prediction> tags\",\n        ]\n        \n        return \"\\n\".join(prompt)\n    \n    def evaluate_prediction(self, prediction: str, ground_truth: str) -> Tuple[bool, str]:\n        \"\"\"\n        Evaluates if the predicted method call matches the ground truth.\n        \n        Performs robust parsing of both prediction and ground truth to compare:\n        1. Object and method names\n        2. Argument names and order\n        3. 
Argument values (when literals)\n        \n        Args:\n            prediction: The LLM's predicted method call\n            ground_truth: The expected method call\n            \n        Returns:\n            Tuple containing:\n            - Boolean indicating if the prediction is correct\n            - String explaining why the prediction was correct or incorrect\n        \"\"\"\n        # Parse both prediction and ground truth into AST\n        pred_ast = ast.parse(prediction.strip()).body[0].value\n        truth_ast = ast.parse(ground_truth.strip()).body[0].value\n        \n        # Verify it's a method call\n        if not isinstance(pred_ast, ast.Call) or not isinstance(truth_ast, ast.Call):\n            return False, \"Not a valid method call\"\n        \n        # For method calls, we need to check both object and method names\n        if not isinstance(pred_ast.func, ast.Attribute) or not isinstance(truth_ast.func, ast.Attribute):\n            return False, \"Not a valid method call (missing object reference)\"\n        \n        # Check object name\n        if not self._compare_ast_nodes(pred_ast.func.value, truth_ast.func.value):\n            return False, \"Object reference mismatch\"\n        \n        # Check method name\n        if pred_ast.func.attr != truth_ast.func.attr:\n            return False, f\"Method name mismatch: expected {truth_ast.func.attr}, got {pred_ast.func.attr}\"\n        \n        # Get argument information\n        pred_args = {\n            kw.arg: kw.value for kw in pred_ast.keywords\n        }\n        truth_args = {\n            kw.arg: kw.value for kw in truth_ast.keywords\n        }\n        \n        # Check positional arguments\n        if len(pred_ast.args) != len(truth_ast.args):\n            return False, \"Mismatched number of positional arguments\"\n        \n        # Check keyword arguments\n        if set(pred_args.keys()) != set(truth_args.keys()):\n            return False, \"Mismatched keyword argument 
names\"\n        \n        # Check argument order for positional args\n        for i, (p_arg, t_arg) in enumerate(zip(pred_ast.args, truth_ast.args)):\n            if not self._compare_ast_nodes(p_arg, t_arg):\n                return False, f\"Positional argument {i+1} mismatch\"\n        \n        # Check keyword argument values\n        for arg_name, t_value in truth_args.items():\n            p_value = pred_args[arg_name]\n            if not self._compare_ast_nodes(p_value, t_value):\n                return False, f\"Keyword argument '{arg_name}' value mismatch\"\n        \n        return True, \"Method call matches expected usage\"\n    \n    def _compare_ast_nodes(self, node1: ast.AST, node2: ast.AST) -> bool:\n        \"\"\"Helper method to compare two AST nodes.\"\"\"\n        # Reuse the same implementation as FunctionExampleEvaluator\n        return FunctionExampleEvaluator._compare_ast_nodes(self, node1, node2) "
  },
  {
    "path": "src/evaluator/helpfulness_parameters.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, List, Optional, Tuple\nimport re\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nfrom src.evaluator.evaluation_common import ScoreLevel, ParameterEvaluationExample\n\nclass DocstringParametersEvaluator:\n    \"\"\"\n    Evaluates the quality of Python docstring parameter descriptions using predefined criteria.\n    \n    This class assesses how well parameter descriptions in docstrings convey the purpose,\n    constraints, and usage context of class initialization parameters, going beyond mere\n    type information to provide meaningful guidance to users.\n    \"\"\"\n\n    def __init__(self):\n        \"\"\"Initialize the evaluator with predefined criteria and examples.\"\"\"\n        self.criteria = self._initialize_criteria()\n        self.examples = self._initialize_examples()\n\n    def _initialize_criteria(self) -> Dict[str, Any]:\n        \"\"\"\n        Set up the evaluation criteria for parameter descriptions.\n        \n        The criteria define five quality levels, from mere type repetition (1) \n        to excellent usage guidance and context (5).\n        \n        Returns:\n            Dict containing the evaluation criteria and descriptions for each score level.\n        \"\"\"\n        return {\n            'description': (\n                'Evaluate how effectively the parameter descriptions convey the purpose, '\n                'constraints, and usage context of class initialization parameters. 
'\n                'High-quality descriptions should go beyond type information to provide '\n                'meaningful guidance about parameter usage, valid values, and impact '\n                'on class behavior.'\n            ),\n            'score_criteria': {\n                ScoreLevel.POOR: (\n                    'The parameter descriptions merely restate the parameter types or '\n                    'convert the type hints to natural language without adding any '\n                    'meaningful information about usage or purpose.'\n                ),\n                ScoreLevel.FAIR: (\n                    'The descriptions provide basic information about parameter purpose '\n                    'but lack details about constraints, valid values, or usage context. '\n                    'They may use vague language or miss important details.'\n                ),\n                ScoreLevel.GOOD: (\n                    'The descriptions explain parameter purpose and include some key '\n                    'constraints or valid value ranges, but might miss edge cases or '\n                    'lack examples where helpful.'\n                ),\n                ScoreLevel.VERY_GOOD: (\n                    'The descriptions clearly explain purpose, constraints, and common '\n                    'usage patterns. They may include examples for complex parameters '\n                    'and note important edge cases or default behaviors.'\n                ),\n                ScoreLevel.EXCELLENT: (\n                    'The descriptions provide comprehensive guidance including purpose, '\n                    'constraints, examples, edge cases, and impact on class behavior. 
They stay concise and focus on the most important information. '\n                    'They help users make informed decisions about parameter values.'\n                )\n            }\n        }\n    \n    def _initialize_examples(self) -> List[ParameterEvaluationExample]:\n        \"\"\"\n        Set up concrete examples of parameter descriptions at different quality levels.\n        \n        Each example includes class and __init__ signatures with corresponding parameter\n        descriptions at different quality levels, along with explanations of the ratings.\n        \n        Returns:\n            List of ParameterEvaluationExample objects containing the example cases.\n        \"\"\"\n        return [\n            ParameterEvaluationExample(\n                parameters={\n                    \"Model_entity_id\": \"Numeric identifier for the model entity\",\n                    \"Dist_pg\": \"Distributed process group for coordination\",\n                    \"Checkpoint_config\": \"Defines checkpoint saving intervals and retention\",\n                    \"Runtime_config\": \"Specifies resource or environmental constraints\",\n                    \"Train_module\": \"Orchestrates training steps and interfaces with checkpoints\"\n                },\n                quality_examples={\n                    ScoreLevel.POOR: {\n                        \"Model_entity_id\": \"the model entity ID\",\n                        \"Dist_pg\": \"The Process group\",\n                        \"Checkpoint_config\": \"The checkpoint Configuration\",\n                        \"Runtime_config\": \"The Runtime configuration\",\n                        \"Train_module\": \"The Training module\"\n                    },\n                    ScoreLevel.FAIR: {\n                        \"Model_entity_id\": \"A number that identifies the model\",\n                        \"Dist_pg\": \"Process group for distributed operations\",\n                        \"Checkpoint_config\": 
\"Settings for checkpoint management\",\n                        \"Runtime_config\": \"Configuration for runtime behavior\",\n                        \"Train_module\": \"Module that manages the training process\"\n                    },\n                    ScoreLevel.GOOD: {\n                        \"Model_entity_id\": \"identifier for the model entity.\",\n                        \"Dist_pg\": \"PyTorch distributed process group that handles communication between processes\",\n                        \"Checkpoint_config\": \"Configuration that determines when checkpoints are saved and how many are kept\",\n                        \"Runtime_config\": \"Specifies runtime parameters like memory limits and timeout settings\",\n                        \"Train_module\": \"Module that implements training logic and interacts with the checkpoint system\"\n                    },\n                    ScoreLevel.VERY_GOOD: {\n                        \"Model_entity_id\": \"Unique numeric identifier for the model entity in the registry. Must be a valid registered model ID\",\n                        \"Dist_pg\": \"PyTorch distributed process group that coordinates operations across GPUs/nodes during training. Should match your distributed setup\",\n                        \"Checkpoint_config\": \"Controls checkpoint frequency, storage locations, and retention policies. Important for balancing disk usage with recovery capabilities\",\n                        \"Runtime_config\": \"Defines resource constraints and operational parameters. Must be configured appropriately for your hardware to avoid performance issues\",\n                        \"Train_module\": \"Orchestrates the training workflow, manages state transitions, and defines what model components get checkpointed\"\n                    },\n                    ScoreLevel.EXCELLENT: {\n                        \"Model_entity_id\": \"Unique integer ID for the model entity (e.g., 1014925). 
Should always be a 7 digits number. Must exist in the model registry before checkpointing, otherwise will hit CheckpointNotFoundError and fail to load the checkpoint.\",\n                        \"Dist_pg\": \"Distributed process group that handles collective operations for multi-GPU or multi-node setups. This setup must be consistent with the training configuration 'distributed_training_config'.\",\n                        \"Checkpoint_config\": \"Specifies saving intervals, naming formats, and retention. Supports advanced features like asynchronous checkpointing. See examples in 'https://fb.workplace.com/groups/652446422242/preview'.\",\n                        \"Runtime_config\": \"Contains environment constraints (e.g., memory, disk I/O) and concurrency policies. Ensures checkpointing does not stall training under restricted resources, otherwise will hit CheckpointAccessError and fail to load the checkpoint.\",\n                        \"Train_module\": \"Manages end-to-end training flow, triggers checkpoint saving at appropriate intervals, and provides context on what states/parameters to store.\"\n                    },\n                },\n                explanations={\n                    ScoreLevel.POOR: \"Descriptions recite minimal type info, lacking usage or constraints\",\n                    ScoreLevel.FAIR: \"Provides a basic sense of the purpose for each parameter, but lacks detail\",\n                    ScoreLevel.GOOD: \"Covers core constraints and a bit of context, but some usage details are still missing\",\n                    ScoreLevel.VERY_GOOD: \"Explains relevant usage patterns, constraints, and environment needs\",\n                    ScoreLevel.EXCELLENT: \"Comprehensive coverage including resource impact, advanced usage scenarios, and constraints\"\n                }\n            )\n        ]\n\n    def get_evaluation_prompt(self, code_component: str, docstring: str, eval_type: str = None) -> str:\n        \"\"\"\n        Generates a 
prompt for LLM evaluation of parameter descriptions.\n\n        Args:\n            code_component: The code implementation (class or function/method)\n            docstring: The docstring to evaluate\n            eval_type: The type of code component (class, function, method).\n                      If not provided, it will be determined from code_component.\n            \n        Returns:\n            Prompt for LLM evaluation\n        \"\"\"\n        # Determine eval_type if not provided\n        if eval_type is None:\n            if code_component.strip().startswith(\"class \"):\n                eval_type = \"class\"\n            else:\n                # A method takes `self` as its first parameter. The text before the first\n                # \"(\" is only \"def name\", so inspect the parameter list instead.\n                is_method = re.search(r'def\\s+\\w+\\s*\\(\\s*self\\b', code_component)\n                eval_type = \"method\" if is_method else \"function\"\n        \n        assert eval_type in [\"class\", \"function\", \"method\"], \"eval_type must be one of 'class', 'function', or 'method'\"\n\n        example = self.examples[0]  # Use first example as reference\n\n        # system prompt    \n        prompt = [\n            \"Please evaluate the parameter description section for a docstring of a \" + eval_type + \" based on these criteria:\"]\n\n        # second part, the evaluation criteria\n        prompt.extend([\n            \"\",\n            \"<evaluation_criteria>\",\n            \"Evaluation criteria:\",\n            self.criteria['description'],\n            \"\",\n            \"Score levels:\",\n        ])\n        \n        # Add criteria for each score level\n        for level in ScoreLevel:\n            prompt.append(f\"{level.value}. 
{self.criteria['score_criteria'][level]}\")\n        prompt.append(\"</evaluation_criteria>\")\n        \n        # Add example\n        prompt.extend([\n            \"\",\n            \"<reference_example>\",\n            \"Parameter descriptions at different quality levels:\",\n        ])\n        \n        for level in ScoreLevel:\n            prompt.extend([\n                f\"Level {level.value}:\",\n                *[f\"{param}: {desc}\" for param, desc in example.quality_examples[level].items()],\n                f\"Explanation: {example.explanations[level]}\",\n                \"\"\n            ])\n        prompt.append(\"</reference_example>\")\n        \n\n        # add focal code component and docstring\n        prompt.extend([\n            \"\",\n            \"<original_code_component>\",\n            f\"{code_component}\",\n            \"</original_code_component>\",\n            \"\",\n            \"<parameters_to_evaluate>\",\n            \"Parameter descriptions to evaluate:\",\n            f\"{docstring}\",\n            \"</parameters_to_evaluate>\"\n        ])\n\n        prompt.extend([\n            \"\",\n            \"<analysis_instructions>\",\n            \"IMPORTANT INSTRUCTIONS FOR ANALYSIS:\",\n            \"1. Analyze how well each parameter description provides meaningful information beyond type hints\",\n            \"2. Consider completeness of constraint and valid value documentation\",\n            \"3. Look for helpful context about parameter impact on code component's behavior\",\n            \"4. Check for clear examples or guidance where appropriate\",\n            \"</analysis_instructions>\",\n            \"\",\n            \"<response_format>\",\n            \"Please structure your response as follows:\",\n            \"1. Compare against the criteria and example quality levels\",\n            \"2. Suggest specific improvements for weaker descriptions. Include your suggestions in <suggestions></suggestions> tags. 
No need to provide suggestions for excellent descriptions.\",\n            \"3. Provide your score (1-5) enclosed in <score></score> tags\",\n            \"</response_format>\",\n            \"\",\n            \"Remember: Do not rush to assign a score. Take time to analyze thoroughly and justify your reasoning.\",\n            \"The score should reflect your careful analysis and should be the last part of your response.\",\n        ])\n        \n        return \"\\n\".join(prompt)\n    \n    def parse_llm_response(self, response: str) -> Tuple[int, str]:\n        \"\"\"\n        Extracts the numerical score and suggestions from an LLM's response.\n        \n        Args:\n            response: The complete response text from the LLM.\n            \n        Returns:\n            A tuple containing:\n            - The numerical score (1-5); defaults to 3 when no valid score is found\n            - The suggestions for improvement; a generic suggestion when none is found\n        \"\"\"\n        # Extract score - try several common formats, most specific first\n        score_patterns = [\n            r'<score>(\\d)</score>',  # XML tags\n            r'score:\\s*(\\d)',  # Common format\n            r'score\\s*=\\s*(\\d)',  # Alternative format\n            r'(\\d)\\s*/\\s*5',  # Rating format\n        ]\n        \n        # Try each pattern\n        for pattern in score_patterns:\n            score_matches = re.findall(pattern, response, re.IGNORECASE)\n            if score_matches:\n                score = int(score_matches[0])\n                if 1 <= score <= 5:\n                    break\n        else:\n            # No pattern yielded a score in the 1-5 range, so use a default\n            score = 3\n        \n        # Extract suggestions - look for several common patterns\n        suggestion_patterns = [\n            r'<suggestions>(.*?)</suggestions>',  # XML tags\n            r'suggestions?:\\s*(.+?)(?:\\n\\n|\\Z)',  # Common format\n            r'improve?:?\\s*(.+?)(?:\\n\\n|\\Z)',     # Alternative format\n        ]\n      
  \n        # Try each pattern\n        for pattern in suggestion_patterns:\n            suggestion_matches = re.findall(pattern, response, re.DOTALL | re.IGNORECASE)\n            if suggestion_matches:\n                suggestion = suggestion_matches[0].strip()\n                break\n        else:\n            # Try to find any text that looks like suggestions\n            lines = response.split('\\n')\n            for i, line in enumerate(lines):\n                if \"suggest\" in line.lower() and i < len(lines) - 1:\n                    suggestion = lines[i+1].strip()\n                    break\n            else:\n                suggestion = \"Consider adding more detailed parameter descriptions.\"\n        \n        return score, suggestion\n\n    def get_criteria_description(self) -> str:\n        \"\"\"Returns the main criteria description.\"\"\"\n        return self.criteria['description']\n\n    def get_score_criteria(self, level: ScoreLevel) -> str:\n        \"\"\"Returns the criteria description for a specific score level.\"\"\"\n        return self.criteria['score_criteria'][level]\n\n    def get_examples(self) -> List[ParameterEvaluationExample]:\n        \"\"\"Returns all evaluation examples.\"\"\"\n        return self.examples "
  },
  {
    "path": "src/evaluator/helpfulness_summary.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Any, List, Optional, Tuple\nimport re\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nfrom src.evaluator.evaluation_common import ScoreLevel, SummaryEvaluationExample\n\nclass DocstringSummaryEvaluator:\n    \"\"\"\n    Evaluates the quality of Python docstring summaries using predefined criteria and examples.\n    \n    This class provides a structured way to assess how well a docstring's summary line conveys\n    the purpose and value of a function or class. It includes detailed criteria for different\n    quality levels and concrete examples to guide the evaluation process.\n    \"\"\"\n\n    def __init__(self):\n        \"\"\"Initialize the evaluator with predefined criteria and examples.\"\"\"\n        self.criteria = self._initialize_criteria()\n        self.examples = self._initialize_examples()\n\n    def _initialize_criteria(self) -> Dict[str, Any]:\n        \"\"\"\n        Set up the evaluation criteria for docstring summaries.\n        \n        The criteria define five quality levels, from mere signature repetition (1) \n        to excellent context and purpose explanation (5).\n        \n        Returns:\n            Dict containing the evaluation criteria and descriptions for each score level.\n        \"\"\"\n        return {\n                'description': (\n                    'Evaluate how effectively the one-line summary conveys '\n                    'the purpose and value of the function/class while providing additional '\n                    'context beyond what is apparent from the signature. 
A high-quality '\n                    'summary should be concise yet informative, avoiding mere signature '\n                    'repetition while adding meaningful context about the \"why\" or '\n                    'higher-level purpose.'\n                    ),\n                'score_criteria': {\n                        ScoreLevel.POOR: (\n                            'The summary merely restates the function signature in natural '\n                            'language or is completely unrelated to the function purpose. '\n                            'The summary provides no additional information beyond what is '\n                            'already obvious from the function name and parameters.'\n                        ),\n                        ScoreLevel.FAIR: (\n                            'The summary provides minimal information beyond the signature, '\n                            'perhaps adding one minor detail but still failing to convey '\n                            'meaningful context or purpose. It may use vague or overly '\n                            'technical language that doesn\\'t help understanding.'\n                        ),\n                        ScoreLevel.GOOD: (\n                            'The summary provides some useful context beyond the signature, '\n                            'touching on either the \"why\" or a key use case, but could be '\n                            'more specific or comprehensive. It gives readers a general idea '\n                            'but may leave out important context.'\n                        ),\n                        ScoreLevel.VERY_GOOD: (\n                            'The summary effectively communicates both what the function does '\n                            'and its higher-level purpose, using clear language that helps '\n                            'readers understand when/why to use it. 
It avoids technical '\n                            'jargon unless necessary.'\n                        ),\n                        ScoreLevel.EXCELLENT: (\n                            'The summary excellently balances conciseness with informativeness, '\n                            'clearly conveying the function\\'s purpose, value, and context in '\n                            'business/practical terms. It helps readers immediately understand '\n                            'both what the function does and why it matters.'\n                        )\n                        }\n                }\n\n    def _initialize_examples(self) -> List[SummaryEvaluationExample]:\n        \"\"\"\n        Set up concrete examples of docstring summaries at different quality levels.\n        \n        Each example includes a function signature and corresponding summaries at\n        different quality levels, along with explanations of the ratings.\n        \n        Returns:\n            List of SummaryEvaluationExample objects containing the example cases.\n        \"\"\"\n        return [\n            SummaryEvaluationExample(\n                function_signature=(\n                    \"def calculate_user_metrics(user_id: str, start_date: datetime, \"\n                    \"end_date: datetime) -> Dict[str, float]\"\n                ),\n                summaries={\n                    ScoreLevel.POOR: \"Calculates metrics for a user between two dates.\",\n                    ScoreLevel.FAIR: \"Processes user metrics data through various calculation methods.\",\n                    ScoreLevel.GOOD: \"Analyzes user engagement patterns by computing daily interaction statistics.\",\n                    ScoreLevel.VERY_GOOD: (\n                        \"Generates user engagement insights for quarterly reporting by \"\n                        \"processing daily interaction metrics.\"\n                    ),\n                    ScoreLevel.EXCELLENT: (\n                        
\"Identifies at-risk users by analyzing engagement patterns \"\n                        \"against historical churn indicators.\"\n                    )\n                },\n                explanations={\n                    ScoreLevel.POOR: \"This summary merely converts the function signature into a sentence, providing no additional value.\",\n                    ScoreLevel.FAIR: \"While this adds slightly more information than the signature, it remains vague and unhelpful.\",\n                    ScoreLevel.GOOD: (\n                        \"This provides some context about the purpose (engagement analysis) \"\n                        \"but could be more specific about why we track this.\"\n                    ),\n                    ScoreLevel.VERY_GOOD: (\n                        \"This effectively communicates both what it does and why \"\n                        \"(quarterly reporting), giving clear context for its use.\"\n                    ),\n                    ScoreLevel.EXCELLENT: (\n                        \"This excellently conveys both the technical function and its \"\n                        \"business purpose (preventing churn) in a clear, meaningful way.\"\n                    )\n                }\n            ),\n            SummaryEvaluationExample(\n                function_signature=(\n                    \"class DatasetLoader:\"\n                ),\n                summaries={\n                    ScoreLevel.POOR: \"A class that loads datasets.\",\n                    ScoreLevel.FAIR: \"Handles loading of data from various sources.\",\n                    ScoreLevel.GOOD: \"Provides unified interface for loading and validating datasets from multiple sources.\",\n                    ScoreLevel.VERY_GOOD: (\n                        \"Streamlines dataset ingestion by providing a consistent interface \"\n                        \"for loading and validating data from diverse sources.\"\n                    ),\n                    
ScoreLevel.EXCELLENT: (\n                        \"Ensures data quality and consistency by providing a unified interface \"\n                        \"for loading, validating, and preprocessing datasets across multiple \"\n                        \"formats and sources while handling common edge cases.\"\n                    )\n                },\n                explanations={\n                    ScoreLevel.POOR: \"Simply restates the class name without adding value.\",\n                    ScoreLevel.FAIR: \"Adds minimal information, remains vague about capabilities.\",\n                    ScoreLevel.GOOD: (\n                        \"Provides context about key functionality but could better explain \"\n                        \"benefits and use cases.\"\n                    ),\n                    ScoreLevel.VERY_GOOD: (\n                        \"Clearly communicates purpose and value while highlighting key \"\n                        \"features and benefits.\"\n                    ),\n                    ScoreLevel.EXCELLENT: (\n                        \"Excellently balances technical capabilities with practical benefits, \"\n                        \"while highlighting key differentiators and value proposition.\"\n                    )\n                }\n            )\n        ]\n\n    def get_evaluation_prompt(self, code_component: str, docstring: str, eval_type: str = None) -> str:\n        \"\"\"\n        Generates a prompt for LLM evaluation of docstring summaries.\n        \n        Args:\n            code_component: The code implementation (class or function/method)\n            docstring: The docstring to evaluate\n            eval_type: The type of code component (class, function, method).\n                      If not provided, it will be determined from code_component.\n        \n        Returns:\n            Prompt for LLM evaluation\n        \"\"\"\n        # Determine eval_type if not provided\n        if eval_type is None:\n            if 
code_component.strip().startswith(\"class \"):\n                eval_type = \"class\"\n            else:\n                # A method takes `self` as its first parameter. The text before the first\n                # \"(\" is only \"def name\", so inspect the parameter list instead.\n                is_method = re.search(r'def\\s+\\w+\\s*\\(\\s*self\\b', code_component)\n                eval_type = \"method\" if is_method else \"function\"\n        \n        # Determine if input is a class or function signature\n        is_class = eval_type == \"class\"\n        \n        # Select relevant example based on signature type\n        relevant_example = next(\n            example for example in self.examples \n            if (example.function_signature.startswith('class') == is_class)\n        )\n        \n        prompt = [\n            \"Please evaluate the summary part of a docstring of a \" + eval_type + \" based on these criteria:\",\n        ]\n        \n        # Open the criteria block so the closing tag below has a matching opening tag\n        prompt.extend([\n            \"\",\n            \"<evaluation_criteria>\",\n            \"Evaluation criteria:\",\n            self.criteria['description'],\n            \"\",\n            \"Score levels:\",\n        ])\n        \n        # Add criteria for each score level\n        for level in ScoreLevel:\n            prompt.append(f\"{level.value}. {self.criteria['score_criteria'][level]}\")\n        prompt.append(\"</evaluation_criteria>\")\n        \n        # Add single relevant example\n        prompt.extend([\n            \"\",\n            \"<reference_example>\",\n            \"Summaries at different levels:\",\n        ])\n        \n        for level in ScoreLevel:\n            prompt.extend([\n                f\"Level {level.value}: {relevant_example.summaries[level]}\",\n                f\"Explanation: {relevant_example.explanations[level]}\",\n                \"\"\n            ])\n        prompt.append(\"</reference_example>\")\n\n        # add the code component and the docstring\n        prompt.extend([\n            \"\",\n            \"<original_code_component>\",\n            f\"{code_component}\",\n            \"</original_code_component>\",\n        ])\n\n        prompt.extend([\n            \"\",\n            \"<docstring_to_evaluate>\",\n            f\"{docstring}\",\n            \"</docstring_to_evaluate>\",\n        ])\n        \n        prompt.extend([\n            \"\",\n            \"<analysis_instructions>\",\n            \"IMPORTANT 
INSTRUCTIONS FOR ANALYSIS:\",\n            \"1. Take your time to analyze the relationship between the focal code component and the summary part of the docstring.\",\n            \"2. Consider how much additional context and value the summary provides beyond the signature.\",\n            \"3. Compare the summary against each score level's criteria methodically.\",\n            \"4. Look for similarities with the provided example at each quality level.\",\n            \"</analysis_instructions>\",\n            \"\",\n            \"<response_format>\",\n            \"Please structure your response as follows:\",\n            \"1. First explain your reasoning by comparing against the criteria\",\n            \"2. If applicable, suggest specific improvements. Include your suggestions in <suggestions></suggestions> tags. No need to provide suggestions for excellent summaries.\",\n            \"3. Finally, provide your score (1-5) enclosed in <score></score> tags\",\n            \"</response_format>\",\n            \"\",\n            \"Remember: Do not rush to assign a score. 
Take time to analyze thoroughly and justify your reasoning.\",\n            \"The score should reflect your careful analysis and should be the last part of your response.\"\n        ])\n        \n        return \"\\n\".join(prompt)\n    \n    def parse_llm_response(self, response: str) -> Tuple[int, str]:\n        \"\"\"\n        Extracts the numerical score and suggestions from an LLM's response.\n        \n        Args:\n            response: The complete response text from the LLM.\n            \n        Returns:\n            A tuple containing:\n            - The numerical score (1-5); defaults to 3 when no valid score is found\n            - The suggestions for improvement; a generic suggestion when none is found\n        \"\"\"\n        # Extract score - try several common formats, most specific first\n        score_patterns = [\n            r'<score>(\\d)</score>',  # XML tags\n            r'score:\\s*(\\d)',        # Common format\n            r'score\\s*=\\s*(\\d)',     # Alternative format\n            r'(\\d)\\s*/\\s*5',         # Rating format\n            r'level\\s*(\\d)',         # Level references\n        ]\n        \n        # Try each pattern\n        for pattern in score_patterns:\n            score_matches = re.findall(pattern, response, re.IGNORECASE)\n            if score_matches:\n                score = int(score_matches[0])\n                if 1 <= score <= 5:\n                    break\n        else:\n            # No pattern yielded a score in the 1-5 range, so default to 3\n            score = 3\n        \n        # Extract suggestions - look for several common patterns\n        suggestion_patterns = [\n            r'<suggestions>(.*?)</suggestions>',    # XML tags\n            r'suggestions?:\\s*(.+?)(?:\\n\\n|\\Z)',    # Common format\n            r'could be improved by:?\\s*(.+?)(?:\\n\\n|\\Z)', # Alternative phrasing\n            r'improvement:?\\s*(.+?)(?:\\n\\n|\\Z)',    # Another alternative\n        ]\n        \n        # Try each pattern\n        for pattern in suggestion_patterns:\n  
          suggestion_matches = re.findall(pattern, response, re.DOTALL | re.IGNORECASE)\n            if suggestion_matches:\n                suggestion = suggestion_matches[0].strip()\n                break\n        else:\n            # If we can't find a suggestion, extract sentences that seem like suggestions\n            suggestion_sentences = []\n            for sentence in re.split(r'[.!?]\\s+', response):\n                if any(word in sentence.lower() for word in ['could', 'should', 'might', 'consider', 'suggest', 'improve', 'better']):\n                    suggestion_sentences.append(sentence.strip())\n            \n            if suggestion_sentences:\n                suggestion = ' '.join(suggestion_sentences) + '.'\n            else:\n                # Default suggestion\n                suggestion = \"Consider adding more context and purpose to the summary.\"\n        \n        return score, suggestion\n\n    def get_criteria_description(self) -> str:\n        \"\"\"Returns the main criteria description.\"\"\"\n        return self.criteria['description']\n\n    def get_score_criteria(self, level: ScoreLevel) -> str:\n        \"\"\"Returns the criteria description for a specific score level.\"\"\"\n        return self.criteria['score_criteria'][level]\n\n    def get_examples(self) -> List[SummaryEvaluationExample]:\n        \"\"\"Returns all evaluation examples.\"\"\"\n        return self.examples"
  },
  {
    "path": "src/evaluator/segment.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport re\n\ndef parse_google_style_docstring(docstring):\n    \"\"\"\n    A robust parser for Google-style docstrings that have multiple possible\n    labels for each section.\n\n    For example, any of the lines in EXAMPLE_LABELS indicates the start of the \"examples\" section.\n    \"\"\"\n\n    # Define all recognized sections. The key is the canonical name (lowercase).\n    # The value is a set of synonyms (also lowercase).\n    SECTION_LABELS = {\n        \"summary\":        {\"summary:\", \"short description:\", \"brief:\", \"overview:\"},\n        \"description\":    {\"description:\", \"desc:\", \"details:\", \"detailed description:\", \"long description:\"},\n        \"parameters\":     {\"parameters:\", \"params:\", \"args:\", \"arguments:\", \"keyword args:\", \"keyword arguments:\", \"**kwargs:\"},\n        \"attributes\":     {\"attributes:\", \"members:\", \"member variables:\", \"instance variables:\", \"properties:\", \"vars:\", \"variables:\"},\n        \"returns\":        {\"returns:\", \"return:\", \"return value:\", \"return values:\"},\n        \"raises\":         {\"raises:\", \"exceptions:\", \"throws:\", \"raise:\", \"exception:\", \"throw:\"},\n        \"examples\":       {\"example:\", \"examples:\", \"usage:\", \"usage example:\", \"usage examples:\", \"example usage:\"},\n    }\n\n    # Prepare a dictionary to hold the parsed content for each canonical key\n    parsed_content = {key: [] for key in SECTION_LABELS.keys()}\n\n    # Split by lines; if docstring uses Windows line endings, .splitlines() handles that gracefully\n    lines = docstring.strip().splitlines()\n\n    # -- 1) Fallback: no explicit sections at all in the entire docstring --\n    #    If no recognized label appears anywhere, treat the first line as summary, rest as description.\n    has_section_labels = False\n    for line in lines:\n        line_lower = line.strip().lower()\n        for labels in 
SECTION_LABELS.values():\n            for label in labels:\n                if line_lower.startswith(label):\n                    has_section_labels = True\n                    break\n            if has_section_labels:\n                break\n        if has_section_labels:\n            break\n            \n    if len(lines) > 0 and not has_section_labels:\n        parsed_content[\"summary\"] = [lines[0]]\n        if len(lines) > 1:\n            parsed_content[\"description\"] = lines[1:]\n        # Convert lists to single strings\n        return {key: \"\\n\".join(value).strip() for key, value in parsed_content.items()}\n\n    # -- 2) Partial Fallback for the first line only --\n    #    If the first line doesn't match any known label, treat it as summary and then\n    #    switch to \"description\" until an explicit label is found.\n    current_section = None  # keep track of which section we're in\n    \n    first_line = lines[0].strip().lower() if lines else \"\"\n    if not any(first_line.startswith(label) for labels in SECTION_LABELS.values() for label in labels):\n        if lines:\n            # Save first line as summary\n            parsed_content[\"summary\"] = [lines[0]]\n            # Make the current section \"description\"\n            current_section = \"description\"\n            lines = lines[1:]  # We'll handle the rest below\n\n    for line in lines:\n        # We'll do a trimmed, lowercase version of the line to check for a header\n        # but keep original_line if you want to preserve original indentation or case.\n        trimmed_line = line.strip().lower()\n\n        # Check if the trimmed line (minus trailing colon, if present) matches a known section\n        # We'll also handle any trailing colon, extra spaces, etc.\n        # e.g. 
\"  Parameters:  \" -> \"parameters:\"\n        # We only match a line if it starts exactly with that label.\n        # If you want more flexible matching (like partial lines), you can adapt this.\n        matched_section = None\n        for canonical_name, synonyms in SECTION_LABELS.items():\n            # Each synonym might be \"parameters:\", \"args:\", etc.\n            # We'll see if the trimmed_line starts exactly with one of them.\n            for synonym in synonyms:\n                # If line starts with the synonym, we treat it as a new section.\n                # Example: \"PARAMETERS:\" -> synonyms might contain \"parameters:\" in lowercase\n                if trimmed_line.startswith(synonym):\n                    matched_section = canonical_name\n                    # Extract leftover text on the same line, after the label\n                    leftover = line.strip()[len(synonym):].strip()\n                    if leftover:\n                        parsed_content[matched_section].append(leftover)\n                    break\n\n            if matched_section:\n                break\n\n        # If matched_section is not None, we found a new section header\n        if matched_section is not None:\n            # Switch to that section\n            current_section = matched_section\n            # No need to append the header line to content - we've already handled any content after the label\n        else:\n            # Otherwise, accumulate this line under the current section if we have one\n            if current_section is not None:\n                parsed_content[current_section].append(line)\n\n    # Convert list of lines to a single string for each section, \n    # with consistent line breaks, and strip extra whitespace\n    for section in parsed_content:\n        parsed_content[section] = \"\\n\".join(parsed_content[section]).strip()\n\n    return parsed_content\n\n\n# ------------------------------ Example Usage ------------------------------\nif 
__name__ == \"__main__\":\n    sample_docstring = \"\"\"\nSummary:\n    Provides a utility for processing and managing data through a structured workflow.\n\nDescription:\n    This class is designed to facilitate data processing tasks by integrating with the `DataProcessor` class.\n    It retrieves and manipulates data.\n\nParameters:\n    param1: This is the first parameter.\n    param2: This is the second parameter.\n\nAttributes:\n    data: Stores the current data.\n\nExample:\n    ```python\n    helper = HelperClass()\n    helper.process_data()\n    print(helper.data)\n    ```\n    \"\"\"\n\n    result = parse_google_style_docstring(sample_docstring)\n\n    # Print out each section\n    for section_name, content in result.items():\n        print(\"SECTION:\", section_name.upper())\n        print(\"CONTENT:\\n\", content)\n        print(\"-\" * 40)\n"
  },
  {
    "path": "src/evaluator/truthfulness.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport json\nimport os\nimport re\nimport sys\nfrom typing import List, Dict, Any, Set, Tuple\nimport google.generativeai as genai\nfrom tqdm import tqdm\nimport pandas as pd\nfrom collections import defaultdict\n\n# Constants\nSYSTEMS = [\n    \"copy_paste_codellama34b\",\n    \"copy_paste_gpt4o_mini\",\n    \"docassist-codellama34b\",\n    \"docassist-gpt4o_mini\",\n    \"fim-codellama13b\",\n]\n\nGEMINI_API_KEY = os.environ.get(\"GEMINI_API_KEY\")\nif not GEMINI_API_KEY:\n    raise ValueError(\"GEMINI_API_KEY is not set\")\n\n# Configure Gemini API\ngenai.configure(api_key=GEMINI_API_KEY)\nmodel = genai.GenerativeModel(\"gemini-2.0-flash\")\n\ndef extract_components_from_docstring(docstring: str) -> List[str]:\n    \"\"\"\n    Extract code components (classes, methods, functions) mentioned in a docstring\n    using Gemini API.\n    \n    Args:\n        docstring: The docstring text to analyze\n        \n    Returns:\n        List of code component names mentioned in the docstring\n    \"\"\"\n    prompt = f\"\"\"\n    Please extract all the non-common (very likely to be newly-defined in the repository) code components (classes, methods, functions) mentioned in \n    the following docstring. 
\n\n    Ignore the example part of the docstring if it exists (the code component you extract should not come from the example code).\n    \n    For example, \"List\" is a very common class, so it should not be included.\n    On the other hand, \"InMemoryCache\" is not a common class, so it should be included.\n\n    Return only a Python list of strings with the exact names.\n    If no code components are mentioned, return an empty list.\n    \n    Docstring:\n    ```\n    {docstring}\n    ```\n    \n    Format your response as a Python list wrapped in XML tags like this:\n    <python_list>[\"ClassA\", \"method_b\", \"function_c\"]</python_list>\n    \"\"\"\n    \n    try:\n        response = model.generate_content(prompt)\n        response_text = response.text.strip()\n        \n        # Extract list from XML tags\n        match = re.search(r'<python_list>(.*?)</python_list>', response_text, re.DOTALL)\n        if match:\n            list_str = match.group(1)\n            try:\n                # Parse the list as JSON; model output must never be passed to eval()\n                components = json.loads(list_str)\n                if isinstance(components, list):\n                    return components\n            except ValueError:\n                # If parsing fails, extract double-quoted strings manually\n                components = re.findall(r'\"([^\"]*)\"', list_str)\n                return components\n        \n        # Fallback: try to extract using regex for regular list\n        match = re.search(r'\\[.*?\\]', response_text, re.DOTALL)\n        if match:\n            list_str = match.group(0)\n            try:\n                # Parse the list as JSON; model output must never be passed to eval()\n                components = json.loads(list_str)\n                if isinstance(components, list):\n                    return components\n            except ValueError:\n                # If parsing fails, extract double-quoted strings manually\n                components = re.findall(r'\"([^\"]*)\"', list_str)\n                return components\n        \n        # Fallback: try to 
find any mention of code looking elements\n        components = re.findall(r'`([^`]+)`', docstring)\n        return [c for c in components if not c.startswith('(') and not c.endswith(')')]\n    \n    except Exception as e:\n        print(f\"Error calling Gemini API: {e}\")\n        # Fallback: try to find any mention of code looking elements\n        components = re.findall(r'`([^`]+)`', docstring)\n        return [c for c in components if not c.startswith('(') and not c.endswith(')')]\n\ndef load_dependency_graph(repo_name: str) -> Dict[str, Any]:\n    \"\"\"\n    Load the dependency graph for a given repository.\n    \n    Args:\n        repo_name: Repository name\n        \n    Returns:\n        Dependency graph data\n    \"\"\"\n    file_path = f\"output/dependency_graphs/{repo_name}_dependency_graph.json\"\n    try:\n        with open(file_path, 'r') as f:\n            return json.load(f)\n    except FileNotFoundError:\n        print(f\"Dependency graph not found: {file_path}\")\n        return {}\n    \ndef check_component_existence(\n    component_name: str, \n    dependency_graph: Dict[str, Any],\n    docstring_path: str\n) -> Tuple[bool, bool]:\n    \"\"\"\n    Check if a component exists in the dependency graph and if it's a cross-file reference.\n    \n    Args:\n        component_name: Name of the component to check\n        dependency_graph: Dependency graph data\n        docstring_path: Path of the docstring's component\n        \n    Returns:\n        Tuple of (exists, is_cross_file)\n    \"\"\"\n    exists = False\n    is_cross_file = False\n    \n    docstring_relative_path = None\n    if \"/\" in docstring_path:\n        # Extract the relative path from the docstring path\n        parts = docstring_path.split(\"/\")\n        repo_name = parts[1]\n        relative_path = \"/\".join(parts[1:-1])\n        docstring_relative_path = relative_path\n    \n    for comp_id, comp_data in dependency_graph.items():\n        # Check if the component name is in 
the ID\n        if component_name in comp_id.split(\".\")[-1]:\n            exists = True\n            \n            # Check if it's a cross-file reference\n            if docstring_relative_path and \"relative_path\" in comp_data:\n                comp_relative_path = comp_data[\"relative_path\"]\n                if docstring_relative_path != comp_relative_path:\n                    is_cross_file = True\n            \n            break\n    \n    return exists, is_cross_file\n\ndef main():\n    # Load completeness evaluation data\n    print(\"Loading completeness evaluation data...\")\n    with open(\"experiments/eval/results/completeness_evaluation_cleaned.json\", 'r') as f:\n        completeness_data = json.load(f)\n    \n    results = {}\n    # Cache of loaded dependency graphs, keyed by repo name\n    dependency_graphs = {}\n    \n    # Process each component in the completeness data\n    for component_path, component_data in tqdm(completeness_data.items()):\n        if \"docstrings\" not in component_data:\n            continue\n        \n        # Extract repo name\n        parts = component_path.split(\"/\")\n        repo_name = parts[1]\n        # Replace every '-' in the repo name with '_'\n        repo_name = repo_name.replace(\"-\", \"_\")\n        \n        # Load the dependency graph for this repo once, then reuse the cached copy\n        # so that interleaved repos never pick up another repo's graph\n        if repo_name not in dependency_graphs:\n            print(f\"Loading dependency graph for {repo_name}...\")\n            dependency_graphs[repo_name] = load_dependency_graph(repo_name)\n            results[repo_name] = {}\n        dependency_graph = dependency_graphs[repo_name]\n        \n        # For each system, analyze the docstring\n        for system in SYSTEMS:\n            if system not in component_data[\"docstrings\"]:\n                continue\n                \n            docstring = component_data[\"docstrings\"][system][\"docstring\"]\n            \n            # Extract mentioned components from docstring\n            components = extract_components_from_docstring(docstring)\n            \n            # Check existence of each component in the dependency graph\n            component_results = []\n            
for comp in components:\n                exists, is_cross_file = check_component_existence(\n                    comp, dependency_graph, component_path\n                )\n                \n                component_results.append({\n                    \"name\": comp,\n                    \"exists\": exists,\n                    \"is_cross_file\": is_cross_file\n                })\n            \n            # Store results\n            if component_path not in results[repo_name]:\n                results[repo_name][component_path] = {}\n            \n            results[repo_name][component_path][system] = {\n                \"mentioned_components\": component_results,\n                \"total_mentions\": len(components),\n                \"existing_mentions\": sum(1 for c in component_results if c[\"exists\"]),\n                \"cross_file_mentions\": sum(1 for c in component_results if c[\"is_cross_file\"])\n            }\n    \n    # Save detailed results\n    with open(\"experiments/eval/results/docstring_truthfulness_evaluation.json\", 'w') as f:\n        json.dump(results, f, indent=2)\n    \n    # Generate summary report\n    generate_summary_report(results)\n\ndef generate_summary_report(results: Dict[str, Dict[str, Dict[str, Any]]]):\n    \"\"\"\n    Generate a summary report comparing the five systems.\n    \n    Args:\n        results: The evaluation results\n    \"\"\"\n    # Aggregate statistics\n    stats = {\n        system: {\n            \"total_components_mentioned\": 0,\n            \"existing_components\": 0,\n            \"cross_file_mentions\": 0,\n            \"docstrings_analyzed\": 0\n        }\n        for system in SYSTEMS\n    }\n    \n    # Calculate statistics\n    for repo_name, repo_data in results.items():\n        for component_path, comp_data in repo_data.items():\n            for system, system_data in comp_data.items():\n                if system in SYSTEMS:\n                    stats[system][\"total_components_mentioned\"] += 
system_data[\"total_mentions\"]\n                    stats[system][\"existing_components\"] += system_data[\"existing_mentions\"]\n                    stats[system][\"cross_file_mentions\"] += system_data[\"cross_file_mentions\"]\n                    stats[system][\"docstrings_analyzed\"] += 1\n    \n    # Calculate ratios\n    for system in SYSTEMS:\n        total = stats[system][\"total_components_mentioned\"]\n        if total > 0:\n            stats[system][\"existence_ratio\"] = stats[system][\"existing_components\"] / total\n        else:\n            stats[system][\"existence_ratio\"] = 0\n            \n        if stats[system][\"existing_components\"] > 0:\n            stats[system][\"cross_file_ratio\"] = stats[system][\"cross_file_mentions\"] / stats[system][\"existing_components\"]\n        else:\n            stats[system][\"cross_file_ratio\"] = 0\n            \n        if stats[system][\"docstrings_analyzed\"] > 0:\n            stats[system][\"avg_mentions_per_doc\"] = total / stats[system][\"docstrings_analyzed\"]\n        else:\n            stats[system][\"avg_mentions_per_doc\"] = 0\n    \n    # Create markdown report\n    report = \"# Docstring Truthfulness Evaluation Report\\n\\n\"\n    \n    # Table 1: Component Existence\n    report += \"## Component Existence Ratio (higher is better)\\n\\n\"\n    report += \"| System | Components Mentioned | Existing Components | Existence Ratio |\\n\"\n    report += \"|--------|---------------------|---------------------|-----------------|\\n\"\n    \n    for system in SYSTEMS:\n        report += f\"| {system} | {stats[system]['total_components_mentioned']} | {stats[system]['existing_components']} | {stats[system]['existence_ratio']:.2%} |\\n\"\n    \n    # Table 2: Component Mentions\n    report += \"\\n## Component Mention Frequency (higher is better)\\n\\n\"\n    report += \"| System | Docstrings Analyzed | Total Components | Avg Mentions Per Doc |\\n\"\n    report += 
\"|--------|---------------------|------------------|-----------------------|\\n\"\n    \n    for system in SYSTEMS:\n        report += f\"| {system} | {stats[system]['docstrings_analyzed']} | {stats[system]['total_components_mentioned']} | {stats[system]['avg_mentions_per_doc']:.2f} |\\n\"\n    \n    # Table 3: Cross-file References\n    report += \"\\n## Cross-file References (higher is better)\\n\\n\"\n    report += \"| System | Existing Components | Cross-file References | Cross-file Ratio |\\n\"\n    report += \"|--------|---------------------|----------------------|-----------------|\\n\"\n    \n    for system in SYSTEMS:\n        report += f\"| {system} | {stats[system]['existing_components']} | {stats[system]['cross_file_mentions']} | {stats[system]['cross_file_ratio']:.2%} |\\n\"\n    \n    # Save the report\n    with open(\"experiments/eval/results/docstring_truthfulness_report.md\", 'w') as f:\n        f.write(report)\n        \n    print(\"Summary report generated: docstring_truthfulness_report.md\")\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "src/visualizer/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom .status import StatusVisualizer\nfrom .progress import ProgressVisualizer\n\n__all__ = ['StatusVisualizer', 'ProgressVisualizer'] "
  },
  {
    "path": "src/visualizer/progress.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nTerminal-based progress visualization for docstring generation.\n\nThis module provides a class for visualizing the progress of generating docstrings\nusing a topologically sorted dependency graph.\n\"\"\"\n\nimport sys\nimport time\nimport os\nfrom typing import Any, Dict, List, Set, Optional\nfrom colorama import Fore, Back, Style, init\nfrom tqdm import tqdm\n\nclass ProgressVisualizer:\n    \"\"\"Visualizes the progress of docstring generation in the terminal.\"\"\"\n    \n    def __init__(self, components: Dict[str, Any], sorted_order: List[str]):\n        \"\"\"\n        Initialize the progress visualizer.\n        \n        Args:\n            components: Dictionary of code components\n            sorted_order: List of component IDs in topological order\n        \"\"\"\n        init()  # Initialize colorama\n        self.components = components\n        self.sorted_order = sorted_order\n        self.processed = set()  # Set of processed component IDs\n        self.current = None  # Current component being processed\n        self.progress_bar = None\n        self.start_time = time.time()\n    \n    def initialize(self):\n        \"\"\"Initialize the visualization and show the initial state.\"\"\"\n        self._clear_screen()\n        self._print_header()\n        \n        # Create progress bar\n        self.progress_bar = tqdm(\n            total=len(self.sorted_order),\n            desc=\"Generating docstrings\",\n            bar_format=\"{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}]\"\n        )\n        \n        # Print initial component status\n        self._print_component_status()\n    \n    def update(self, component_id: str = None, status: str = \"processing\"):\n        \"\"\"\n        Update the visualization with the current component status.\n        \n        Args:\n            component_id: ID of the component being processed (or None)\n            status: 
Status of the component ('processing', 'completed', or 'error')\n        \"\"\"\n        if component_id is not None:\n            self.current = component_id\n            \n            if status == \"completed\":\n                self.processed.add(component_id)\n                self.progress_bar.update(1)\n                \n        # Update the visualization\n        self._print_component_status()\n    \n    def finalize(self):\n        \"\"\"Finalize the visualization and show summary statistics.\"\"\"\n        if self.progress_bar:\n            self.progress_bar.close()\n        \n        # Calculate elapsed time\n        elapsed = time.time() - self.start_time\n        minutes, seconds = divmod(elapsed, 60)\n        hours, minutes = divmod(minutes, 60)\n        \n        self._clear_screen()\n        self._print_header()\n        \n        # Print summary\n        print(f\"\\n{Fore.GREEN}Docstring Generation Complete!{Style.RESET_ALL}\")\n        print(f\"Total components processed: {len(self.processed)}/{len(self.sorted_order)}\")\n        print(f\"Time elapsed: {int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}\")\n        print(\"\\nComponents by type:\")\n        \n        # Count components by type\n        type_counts = {\"function\": 0, \"method\": 0, \"class\": 0}\n        for comp_id in self.processed:\n            comp_type = self.components[comp_id].component_type\n            type_counts[comp_type] += 1\n        \n        for comp_type, count in type_counts.items():\n            print(f\"  {comp_type.capitalize()}: {count}\")\n        \n        print(\"\\nGeneration complete. 
Results saved to repository files.\")\n    \n    def _clear_screen(self):\n        \"\"\"Clear the terminal screen.\"\"\"\n        sys.stdout.write(\"\\033[2J\\033[H\")\n        sys.stdout.flush()\n    \n    def _print_header(self):\n        \"\"\"Print the header with title and information.\"\"\"\n        title = \"Topological Docstring Generator\"\n        print(f\"\\n{Fore.CYAN}{Style.BRIGHT}{title}{Style.RESET_ALL}\\n\")\n        print(f\"Generating docstrings for {len(self.sorted_order)} code components in dependency order\")\n        print(f\"Components will be processed in topological order to ensure all dependencies\")\n        print(f\"have docstrings before dependent components.\")\n    \n    def _print_component_status(self):\n        \"\"\"Print the current status of components in the dependency graph.\"\"\"\n        if not self.current:\n            return\n        \n        # Get the current component and its info\n        current_comp = self.components.get(self.current)\n        if not current_comp:\n            return\n            \n        # Print current component information\n        comp_type = current_comp.component_type.capitalize()\n        file_path = current_comp.relative_path\n        \n        # Create a simplified name for display\n        parts = self.current.split('.')\n        if len(parts) > 2 and current_comp.component_type == \"method\":\n            # For methods, show Class.method\n            name = f\"{parts[-2]}.{parts[-1]}\"\n        else:\n            # For functions and classes, show just the name\n            name = parts[-1]\n        \n        # Print status line\n        print(f\"\\n{Fore.YELLOW}Currently processing: {Style.RESET_ALL}{comp_type} '{name}' in {file_path}\")\n        \n        # Print dependency information\n        if current_comp.depends_on:\n            deps = [dep_id for dep_id in current_comp.depends_on if dep_id in self.components]\n            if deps:\n                
print(f\"{Fore.CYAN}Dependencies:{Style.RESET_ALL}\")\n                for dep_id in deps:\n                    dep = self.components.get(dep_id)\n                    if not dep:\n                        continue\n                        \n                    # Format the dependency name similarly\n                    parts = dep_id.split('.')\n                    if len(parts) > 2 and dep.component_type == \"method\":\n                        dep_name = f\"{parts[-2]}.{parts[-1]}\"\n                    else:\n                        dep_name = parts[-1]\n                    \n                    # Color based on processing status\n                    if dep_id in self.processed:\n                        status_color = Fore.GREEN\n                        status_text = \"(processed)\"\n                    else:\n                        status_color = Fore.RED\n                        status_text = \"(not yet processed)\"\n                        \n                    print(f\"  {status_color}{dep.component_type.capitalize()} '{dep_name}' {status_text}{Style.RESET_ALL}\")\n        \n        # Add some space after the component status\n        print()\n    \n    def show_dependency_stats(self):\n        \"\"\"Show statistics about the dependency graph.\"\"\"\n        # Calculate dependency metrics\n        total_deps = sum(len(self.components[comp_id].depends_on) for comp_id in self.components)\n        max_deps = max((len(self.components[comp_id].depends_on), comp_id) for comp_id in self.components)\n        \n        avg_deps = total_deps / len(self.components) if self.components else 0\n        \n        # Count components by type\n        types = {\"function\": 0, \"method\": 0, \"class\": 0}\n        for comp_id in self.components:\n            comp_type = self.components[comp_id].component_type\n            types[comp_type] += 1\n        \n        print(f\"\\n{Fore.CYAN}Dependency Graph Statistics:{Style.RESET_ALL}\")\n        print(f\"Total components: 
{len(self.components)}\")\n        print(f\"  Functions: {types['function']}\")\n        print(f\"  Methods: {types['method']}\")\n        print(f\"  Classes: {types['class']}\")\n        print(f\"Average dependencies per component: {avg_deps:.2f}\")\n        print(f\"Max dependencies: {max_deps[0]} (in component '{max_deps[1]}')\")\n        \n        # Print information about cycles if available\n        print(f\"\\nComponents will be processed in topological order.\")\n        print() "
  },
  {
    "path": "src/visualizer/status.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nfrom typing import Dict, Set\nfrom colorama import Fore, Back, Style, init\nimport sys\nimport time\nimport ast\nfrom agent.tool.ast import _get_component_name_from_code\nclass StatusVisualizer:\n    \"\"\"Visualizes the workflow status of DocAssist agents in the terminal.\"\"\"\n    \n    def __init__(self):\n        \"\"\"Initialize the status visualizer.\"\"\"\n        init()  # Initialize colorama\n        self.active_agent = None  # Track only the currently active agent\n        self._agent_art = {\n            'reader': [\n                \"┌─────────┐\",\n                \"│ READER  │\",\n                \"└─────────┘\"\n            ],\n            'searcher': [\n                \"┌─────────┐\",\n                \"│SEARCHER │\",\n                \"└─────────┘\"\n            ],\n            'writer': [\n                \"┌─────────┐\",\n                \"│ WRITER  │\",\n                \"└─────────┘\"\n            ],\n            'verifier': [\n                \"┌─────────┐\",\n                \"│VERIFIER │\",\n                \"└─────────┘\"\n            ]\n        }\n        self._status_message = \"\"\n        self._current_component = \"\"\n        self._current_file = \"\"\n    \n    def _clear_screen(self):\n        \"\"\"Clear the terminal screen.\"\"\"\n        sys.stdout.write(\"\\033[2J\\033[H\")\n        sys.stdout.flush()\n    \n    def _get_agent_color(self, agent: str) -> str:\n        \"\"\"Get the color for an agent based on its state.\"\"\"\n        return Fore.GREEN if agent == self.active_agent else Fore.WHITE\n    \n    def set_current_component(self, focal_component: str, file_path: str):\n        \"\"\"Set the current component being processed and display its information.\n        \n        Args:\n            focal_component: The code component being processed\n            file_path: Relative path to the file containing the component\n        \"\"\"\n        # Try to 
extract the component name from the code\n        try:\n            self._current_component = _get_component_name_from_code(focal_component)\n        except:\n            # If parsing fails, just use a generic name\n            self._current_component = \"unknown component\"\n        \n        self._current_file = file_path\n        self._display_component_info()\n    \n    def _display_component_info(self):\n        \"\"\"Display information about the current component being processed.\"\"\"\n        # print(f\"\\n{Fore.CYAN}Currently Processing:{Style.RESET_ALL}\")\n        print(f\"Component: {self._current_component}\")\n        print(f\"File: {self._current_file}\\n\")\n    \n    def update(self, active_agent: str, status_message: str = \"\"):\n        \"\"\"Update the visualization with the current active agent and status.\n        \n        Args:\n            active_agent: Name of the currently active agent\n            status_message: Current status message to display\n        \"\"\"\n        self.active_agent = active_agent  # Update the single active agent\n        self._status_message = status_message\n        self._clear_screen()\n        \n        # Build the visualization\n        lines = []\n        \n        # Add header\n        # lines.append(f\"{Fore.CYAN}DocAssist Workflow Status{Style.RESET_ALL}\")\n        # lines.append(\"\")\n        \n        # Display current component info if available\n        if self._current_component and self._current_file:\n            lines.append(f\"Processing: {self._current_component}\")\n            lines.append(f\"File: {self._current_file}\")\n            lines.append(\"\")\n        \n        # Input arrow to Reader\n        # lines.append(\"     Input\")\n        # lines.append(\"       ↓\")\n        \n        # First row: Reader and Searcher with loop\n        for i in range(3):\n            line = (f\"{self._get_agent_color('reader')}{self._agent_art['reader'][i]}\"\n                   f\"  ←→  \"\n         
          f\"{self._get_agent_color('searcher')}{self._agent_art['searcher'][i]}\"\n                   f\"{Style.RESET_ALL}\")\n            lines.append(line)\n        \n        # Arrow from Reader to Writer\n        # lines.append(\"       ↓\")\n        \n        # Second row: Writer\n        for i in range(3):\n            line = (f\"    {self._get_agent_color('writer')}{self._agent_art['writer'][i]}{Style.RESET_ALL}\")\n            lines.append(line)\n        \n        # Arrow from Writer to Verifier\n        # lines.append(\"       ↓\")\n        \n        # Third row: Verifier with output\n        for i in range(3):\n            if i == 1:\n                line = (f\"    {self._get_agent_color('verifier')}{self._agent_art['verifier'][i]}{Style.RESET_ALL}  →  Output\")\n            else:\n                line = (f\"    {self._get_agent_color('verifier')}{self._agent_art['verifier'][i]}{Style.RESET_ALL}\")\n            lines.append(line)\n        \n        # # Feedback arrows from Verifier\n        # lines.append(\"       ↑\")\n        # lines.append(\"    ↗  ↑\")\n        \n        # Add status message\n        if self._status_message:\n            lines.append(\"\")\n            lines.append(f\"{Fore.YELLOW}Status: {self._status_message}{Style.RESET_ALL}\")\n        \n        # Print the visualization\n        print(\"\\n\".join(lines))\n        sys.stdout.flush()\n    \n    def reset(self):\n        \"\"\"Reset the visualization state.\"\"\"\n        self.active_agent = None\n        self._status_message = \"\"\n        self._current_component = \"\"\n        self._current_file = \"\"\n        self._clear_screen() "
  },
  {
    "path": "src/visualizer/web_bridge.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nWeb bridge for the docstring generation visualizers.\n\nThis module provides adapters that connect the existing terminal-based\nvisualizers to the web interface. When enabled, the visualizers will send\nupdates to the web interface in addition to their normal terminal output.\n\"\"\"\n\nimport threading\nimport time\nimport functools\nfrom typing import Dict, Any, Optional\n\n# Singleton pattern for the web socket manager\nclass WebSocketManager:\n    \"\"\"Manages the connection to the web socket for sending visualization updates.\"\"\"\n    \n    _instance = None\n    _socket = None\n    _enabled = False\n    \n    def __new__(cls):\n        if cls._instance is None:\n            cls._instance = super(WebSocketManager, cls).__new__(cls)\n        return cls._instance\n    \n    @classmethod\n    def set_socket(cls, socket):\n        \"\"\"Set the socket.io instance for sending updates.\"\"\"\n        cls._socket = socket\n        cls._enabled = True\n    \n    @classmethod\n    def is_enabled(cls):\n        \"\"\"Check if web visualization is enabled.\"\"\"\n        return cls._enabled and cls._socket is not None\n    \n    @classmethod\n    def emit(cls, event, data):\n        \"\"\"Emit an event to the web interface.\"\"\"\n        if cls.is_enabled():\n            try:\n                cls._socket.emit(event, data)\n            except Exception as e:\n                print(f\"Error sending web update: {e}\")\n                \n    @classmethod\n    def disable(cls):\n        \"\"\"Disable web visualization.\"\"\"\n        cls._enabled = False\n\n\nclass WebStatusAdapter:\n    \"\"\"Adapter for the StatusVisualizer to send updates to the web interface.\"\"\"\n    \n    def __init__(self, original_visualizer):\n        \"\"\"\n        Initialize the web status adapter.\n        \n        Args:\n            original_visualizer: The original StatusVisualizer instance\n        \"\"\"\n      
  self.original = original_visualizer\n        self.socket_manager = WebSocketManager()\n        \n        # Store original methods to avoid recursion\n        self._original_set_active_agent = original_visualizer.set_active_agent\n        self._original_set_status_message = original_visualizer.set_status_message\n        self._original_set_current_component = original_visualizer.set_current_component\n    \n    def set_active_agent(self, agent_name):\n        \"\"\"\n        Set the active agent and send update to web interface.\n        \n        Args:\n            agent_name: Name of the active agent\n        \"\"\"\n        # Call the original method directly\n        result = self._original_set_active_agent(agent_name)\n        \n        # Send update to web interface\n        if self.socket_manager.is_enabled():\n            self.socket_manager.emit('status_update', {\n                'status': {\n                    'active_agent': agent_name,\n                    'status_message': self.original._status_message,\n                    'current_component': self.original._current_component,\n                    'current_file': self.original._current_file\n                }\n            })\n        \n        return result\n    \n    def set_status_message(self, message):\n        \"\"\"\n        Set the status message and send update to web interface.\n        \n        Args:\n            message: The status message\n        \"\"\"\n        # Call the original method directly\n        result = self._original_set_status_message(message)\n        \n        # Send update to web interface\n        if self.socket_manager.is_enabled():\n            self.socket_manager.emit('status_update', {\n                'status': {\n                    'active_agent': self.original.active_agent,\n                    'status_message': message,\n                    'current_component': self.original._current_component,\n                    'current_file': 
self.original._current_file\n                }\n            })\n        \n        return result\n    \n    def set_current_component(self, focal_component, file_path):\n        \"\"\"\n        Set the current component being processed and send update to web interface.\n        \n        Args:\n            focal_component: The component being processed\n            file_path: The path to the file containing the component\n        \"\"\"\n        # Call the original method directly\n        result = self._original_set_current_component(focal_component, file_path)\n        \n        # Send update to web interface\n        if self.socket_manager.is_enabled():\n            self.socket_manager.emit('status_update', {\n                'status': {\n                    'active_agent': self.original.active_agent,\n                    'status_message': self.original._status_message,\n                    'current_component': focal_component,\n                    'current_file': file_path\n                }\n            })\n            \n            # Special message format for the web interface to parse\n            print(f\"COMPONENT: {focal_component} in file {file_path}\")\n        \n        return result\n\n\nclass WebProgressAdapter:\n    \"\"\"Adapter for the ProgressVisualizer to send updates to the web interface.\"\"\"\n    \n    def __init__(self, original_visualizer):\n        \"\"\"\n        Initialize the web progress adapter.\n        \n        Args:\n            original_visualizer: The original ProgressVisualizer instance\n        \"\"\"\n        self.original = original_visualizer\n        self.socket_manager = WebSocketManager()\n        \n        # Store original methods to avoid recursion\n        self._original_update = original_visualizer.update\n        if hasattr(original_visualizer, 'mark_complete'):\n            self._original_mark_complete = original_visualizer.mark_complete\n    \n    def update(self, component_id=None, status=\"processing\"):\n      
  \"\"\"\n        Update the progress visualization and send update to web interface.\n        \n        Args:\n            component_id: ID of the component being processed\n            status: Status of the component\n        \"\"\"\n        # Call the original method directly\n        result = self._original_update(component_id, status)\n        \n        # Send update to web interface\n        if self.socket_manager.is_enabled():\n            # Get the component status from the original visualizer\n            component_status = {}\n            for comp_id in self.original.components:\n                if comp_id in self.original.processed:\n                    component_status[comp_id] = \"complete\"\n                elif comp_id == self.original.current:\n                    component_status[comp_id] = \"in_progress\"\n                else:\n                    component_status[comp_id] = \"not_started\"\n            \n            self.socket_manager.emit('status_update', {\n                'progress': {\n                    'total_components': len(self.original.sorted_order),\n                    'processed_components': len(self.original.processed),\n                    'current_component': self.original.current,\n                    'component_status': component_status\n                }\n            })\n            \n            # Special message format for the web interface to parse\n            print(f\"PROGRESS: {len(self.original.processed)}/{len(self.original.sorted_order)} components processed\")\n        \n        return result\n    \n    def mark_complete(self, component_id):\n        \"\"\"\n        Mark a component as complete and send update to web interface.\n        \n        Args:\n            component_id: ID of the component to mark as complete\n        \"\"\"\n        # Check if the original visualizer has mark_complete\n        if not hasattr(self, '_original_mark_complete'):\n            # Fall back to update\n            return 
self.update(component_id, \"complete\")\n            \n        # Call the original method directly\n        result = self._original_mark_complete(component_id)\n        \n        # Update web interface\n        if self.socket_manager.is_enabled():\n            # Use the update method to send progress\n            self.update(component_id, \"complete\")\n        \n        return result\n\n\ndef patch_visualizers():\n    \"\"\"\n    Patch the existing visualizer classes to add web interface support.\n    \n    This function should be called before creating any visualizer instances\n    to ensure they have web support.\n    \"\"\"\n    from . import StatusVisualizer, ProgressVisualizer\n    \n    # Check if already patched to avoid double patching\n    if hasattr(StatusVisualizer, '_web_patched'):\n        return\n    \n    # Mark as patched\n    StatusVisualizer._web_patched = True\n    ProgressVisualizer._web_patched = True\n    \n    # Store the original __init__ methods\n    original_status_init = StatusVisualizer.__init__\n    original_progress_init = ProgressVisualizer.__init__\n    \n    # Create patched __init__ methods\n    def patched_status_init(self, *args, **kwargs):\n        original_status_init(self, *args, **kwargs)\n        # Create adapter and store original methods\n        adapter = WebStatusAdapter(self)\n        # Replace methods with adapter methods\n        self.set_active_agent = adapter.set_active_agent\n        self.set_status_message = adapter.set_status_message\n        self.set_current_component = adapter.set_current_component\n    \n    def patched_progress_init(self, *args, **kwargs):\n        original_progress_init(self, *args, **kwargs)\n        # Create adapter and store original methods\n        adapter = WebProgressAdapter(self)\n        # Replace methods with adapter methods\n        self.update = adapter.update\n        if hasattr(self, 'mark_complete'):\n            self.mark_complete = adapter.mark_complete\n    \n    # Apply 
the patches\n    StatusVisualizer.__init__ = patched_status_init\n    ProgressVisualizer.__init__ = patched_progress_init "
  },
  {
    "path": "src/web/README.md",
    "content": "# DocAgent Web Interface\n\nA real-time web visualization system for the DocAgent docstring generation tool.\n\n## Overview\n\nThe DocAgent Web Interface provides a modern, interactive web UI for generating and tracking Python docstring generation. The application visualizes the agent-based docstring generation process in real-time, allowing users to monitor progress, view code structure, track completeness metrics, and manage the configuration.\n\n## Features\n\n- **Configuration Management**: Easily configure all aspects of the docstring generation process (Repository Path, LLM settings, Flow Control, Docstring Options) through a user-friendly web form. Test LLM API connectivity before starting.\n- **Real-time Visualization**: Observe the docstring generation process as it happens.\n- **Agent Status Tracking**: View which agent (Reader, Searcher, Writer, Verifier) is currently active in the generation workflow via a visual graph.\n- **Repository Structure Visualization**: Interactive tree visualization of your Python codebase, highlighting files as they are processed (White: unprocessed, Yellow: processing, Green: completed).\n- **Dynamic Progress Tracking**: Real-time progress bars and component completion tracking.\n- **Completeness Metrics Visualization**: Visual representation of docstring completeness across your codebase, updated as the generation progresses (visible in the left sidebar).\n- **Log Viewer**: Consolidated view of the generation process logs.\n- **Process Control**: Start and stop the generation process via UI buttons.\n\n## Architecture\n\n### Backend\n\nThe web application is built using:\n\n- **Flask**: Web framework for the backend server\n- **Socket.IO**: Real-time bidirectional communication between client and server\n- **Eventlet**: Asynchronous networking library for handling concurrent connections\n\n### Frontend\n\nThe frontend uses:\n\n- **Bootstrap 5**: CSS framework for responsive design\n- **D3.js**: Data 
visualization library for interactive repository and agent visualizations\n- **Socket.IO Client**: Real-time communication with the backend\n- **jQuery**: DOM manipulation and event handling\n\n### Directory Structure\n\n```\nsrc/web/\n├── __init__.py            - Package entry point; exports create_app\n├── app.py                 - Main Flask application\n├── config_handler.py      - Handles configuration loading/saving\n├── process_handler.py     - Manages the docstring generation process\n├── run.py                 - Command-line entry point that starts the server\n├── visualization_handler.py - Handles visualization state management\n├── static/                - Static assets\n│   ├── css/               - CSS stylesheets\n│   │   └── style.css      - Custom styling\n│   └── js/                - JavaScript files\n│       ├── completeness.js     - Completeness visualization\n│       ├── config.js           - Configuration handling\n│       ├── log-handler.js      - Log display handling\n│       ├── main.js             - Main application logic\n│       ├── repo-structure.js   - Repository structure visualization\n│       └── status-visualizer.js - Agent status visualization\n└── templates/             - HTML templates\n    └── index.html         - Main application page\n```\n\n## Data Flow\n\n1.  User configures settings via the web form.\n2.  User clicks \"Start Generation\".\n3.  Flask backend spawns a subprocess running the `generate_docstrings.py` script (expected in the project root).\n4.  Process output (status updates, logs, metrics) is captured and parsed in real time by the backend.\n5.  Parsed events are emitted via Socket.IO to the frontend.\n6.  Frontend components (Agent Status, Repo Structure, Logs, Progress, Completeness) update dynamically based on the received events.\n7.  User receives real-time feedback on the generation process.\n8.  User can stop the process using the \"Stop Generation\" button.\n\n## Usage Guide\n\n### 1. 
Starting the Web Interface\n\nRun the web application from the project root directory:\n\n```bash\npython run_web_ui.py\n```\n\nBy default, the web interface will be available at `http://127.0.0.1:5000`.\n\nYou can customize the host and port:\n\n```bash\n# Example: Run on port 8080, accessible externally\npython run_web_ui.py --host 0.0.0.0 --port 8080\n```\n\n### 2. Configuration\n\nThe initial screen presents configuration options:\n\n- **Repository Path**: Path to the Python codebase for docstring generation.\n- **LLM Configuration**: Settings for the language model (Type, API Key, Model, Temperature, Max Tokens). Use the \"Test API\" button to verify credentials.\n- **Flow Control**: Advanced settings for the generation process.\n- **Docstring Options**: Control options like overwriting existing docstrings.\n\n### 3. Starting the Generation Process\n\n1.  Fill in the configuration form accurately.\n2.  Click \"Start Generation\".\n3.  The interface will switch to the monitoring/visualization view.\n\n### 4. Monitoring the Generation Process\n\nThe visualization interface consists of several panels:\n\n- **Agent Status Panel**: Shows the current active agent in the workflow graph.\n- **Repository Structure Panel**: Displays the interactive codebase tree, highlighting the currently processed file.\n- **Logs and Progress Panel**: Shows real-time logs and overall progress.\n- **Completeness Panel (Sidebar)**: Shows statistics about docstring completeness.\n\n### 5. Stopping the Process\n\nClick the \"Stop Generation\" button in the header to terminate the process early.\n"
  },
  {
    "path": "src/web/__init__.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nWeb application for docstring generation visualization.\n\nThis module provides a web-based interface for configuring and visualizing\nthe progress of docstring generation in a Python codebase.\n\"\"\"\n\nfrom .app import create_app\n\n__all__ = ['create_app'] "
  },
  {
    "path": "src/web/app.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nMain Flask application for the docstring generation visualization.\n\nThis module defines the Flask application, routes, and event handlers for\nthe web-based docstring generation visualization system.\n\"\"\"\n\nimport os\nimport json\nimport yaml\nimport threading\nimport eventlet\nfrom pathlib import Path\nfrom flask import Flask, render_template, request, jsonify, send_from_directory\nfrom flask_socketio import SocketIO\n\n# Patch standard library for async support with eventlet\neventlet.monkey_patch()\n\nfrom . import config_handler\nfrom . import visualization_handler\nfrom . import process_handler\n\ndef create_app(debug=True):\n    \"\"\"\n    Create and configure the Flask application.\n    \n    Args:\n        debug: Whether to run the application in debug mode\n        \n    Returns:\n        The configured Flask application instance\n    \"\"\"\n    app = Flask(__name__, \n                static_folder='static',\n                template_folder='templates')\n    app.config['SECRET_KEY'] = 'docstring-generator-secret!'\n    app.config['DEBUG'] = debug\n    \n    # Initialize SocketIO for real-time updates with async mode\n    socketio = SocketIO(app, cors_allowed_origins=\"*\", async_mode='eventlet')\n    \n    # Store application state\n    app.config['APP_STATE'] = {\n        'is_running': False,\n        'config': {},\n        'repo_path': '',\n        'process': None\n    }\n    \n    # Routes\n    @app.route('/')\n    def index():\n        \"\"\"Render the main application page.\"\"\"\n        return render_template('index.html')\n    \n    @app.route('/api/default_config')\n    def get_default_config():\n        \"\"\"Get the default configuration from agent_config.yaml.\"\"\"\n        return jsonify(config_handler.get_default_config())\n    \n    @app.route('/api/test_api', methods=['POST'])\n    def test_api():\n        \"\"\"Test the LLM API connection with a simple 
query.\"\"\"\n        data = request.json\n        \n        if not data or 'api_key' not in data or not data['api_key']:\n            return jsonify({\n                'status': 'error',\n                'message': 'API key is required'\n            })\n        \n        # Get the configuration\n        llm_type = data.get('llm_type', 'claude')\n        api_key = data.get('api_key', '')\n        model = data.get('model', 'claude-3-5-haiku-latest')\n        \n        try:\n            # Import the appropriate LLM client based on type\n            if llm_type.lower() == 'claude':\n                try:\n                    import anthropic\n                    client = anthropic.Anthropic(api_key=api_key)\n                    \n                    # Send a simple test message\n                    response = client.messages.create(\n                        model=model,\n                        max_tokens=100,\n                        messages=[\n                            {\"role\": \"user\", \"content\": \"Who are you? 
Please keep your answer very brief.\"}\n                        ]\n                    )\n                    \n                    # Extract the response text\n                    if response and hasattr(response, 'content') and len(response.content) > 0:\n                        model_response = response.content[0].text\n                    else:\n                        model_response = \"No response content\"\n                    \n                    return jsonify({\n                        'status': 'success',\n                        'message': 'Successfully connected to Claude API',\n                        'model_response': model_response\n                    })\n                    \n                except Exception as e:\n                    return jsonify({\n                        'status': 'error',\n                        'message': f'Error connecting to Claude API: {str(e)}'\n                    })\n                    \n            elif llm_type.lower() == 'openai':\n                try:\n                    import openai\n                    client = openai.OpenAI(api_key=api_key)\n                    \n                    # Send a simple test message\n                    response = client.chat.completions.create(\n                        model=model,\n                        max_tokens=100,\n                        messages=[\n                            {\"role\": \"user\", \"content\": \"Who are you? 
Please keep your answer very brief.\"}\n                        ]\n                    )\n                    \n                    # Extract the response text\n                    if response and hasattr(response, 'choices') and len(response.choices) > 0:\n                        model_response = response.choices[0].message.content\n                    else:\n                        model_response = \"No response content\"\n                    \n                    return jsonify({\n                        'status': 'success',\n                        'message': 'Successfully connected to OpenAI API',\n                        'model_response': model_response\n                    })\n                    \n                except Exception as e:\n                    return jsonify({\n                        'status': 'error',\n                        'message': f'Error connecting to OpenAI API: {str(e)}'\n                    })\n            \n            else:\n                return jsonify({\n                    'status': 'error',\n                    'message': f'Unsupported LLM type: {llm_type}'\n                })\n                \n        except ImportError as e:\n            return jsonify({\n                'status': 'error',\n                'message': f'Missing required dependency: {str(e)}'\n            })\n    \n    @app.route('/api/start', methods=['POST'])\n    def start_generation():\n        \"\"\"Start the docstring generation process.\"\"\"\n        if app.config['APP_STATE']['is_running']:\n            return jsonify({'status': 'error', 'message': 'Generation already in progress'})\n        \n        data = request.json or {}\n        \n        # Reject requests that lack the required fields instead of raising KeyError\n        repo_path = data.get('repo_path')\n        if not repo_path or 'config' not in data:\n            return jsonify({'status': 'error', 'message': 'Both repo_path and config are required'})\n        \n        # Validate repo path\n        if not os.path.exists(repo_path):\n            return jsonify({'status': 'error', 'message': f'Repository path not found: {repo_path}'})\n        \n        # Save configuration\n        try:\n            config_path = 
config_handler.save_config(data['config'])\n        except ValueError as e:\n            return jsonify({'status': 'error', 'message': str(e)})\n        \n        # Store in application state\n        app.config['APP_STATE']['config'] = data['config']\n        app.config['APP_STATE']['repo_path'] = repo_path\n        app.config['APP_STATE']['is_running'] = True\n        \n        # Start the generation process\n        thread = socketio.start_background_task(\n            process_handler.start_generation_process,\n            socketio, repo_path, config_path\n        )\n        \n        app.config['APP_STATE']['process'] = thread\n        \n        return jsonify({'status': 'success', 'message': 'Generation started'})\n    \n    @app.route('/api/stop', methods=['POST'])\n    def stop_generation():\n        \"\"\"Stop the docstring generation process.\"\"\"\n        if not app.config['APP_STATE']['is_running']:\n            return jsonify({'status': 'error', 'message': 'No generation in progress'})\n        \n        process_handler.stop_generation_process()\n        app.config['APP_STATE']['is_running'] = False\n        \n        return jsonify({'status': 'success', 'message': 'Generation stopped'})\n    \n    @app.route('/api/status')\n    def get_status():\n        \"\"\"Get the current status of the generation process.\"\"\"\n        return jsonify({\n            'is_running': app.config['APP_STATE']['is_running'],\n            'repo_path': app.config['APP_STATE']['repo_path']\n        })\n    \n    @app.route('/api/completeness')\n    def get_completeness():\n        \"\"\"Get the current completeness evaluation of the repository.\"\"\"\n        if not app.config['APP_STATE']['repo_path']:\n            return jsonify({'status': 'error', 'message': 'No repository selected'})\n        \n        results = visualization_handler.get_completeness_data(app.config['APP_STATE']['repo_path'])\n        return jsonify(results)\n    \n    # Socket.IO event handlers\n    
@socketio.on('connect')\n    def handle_connect():\n        \"\"\"Handle client connection to Socket.IO.\"\"\"\n        if app.config['APP_STATE']['is_running']:\n            # Send current state only to the newly connected client\n            socketio.emit('status_update', visualization_handler.get_current_status(), room=request.sid)\n    \n    # Additional routes and event handlers can be added here\n    \n    return app, socketio "
  },
  {
    "path": "src/web/config_handler.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nConfiguration handler for the docstring generation web interface.\n\nThis module handles reading, writing, and validating the configuration for\nthe docstring generation process.\n\"\"\"\n\nimport os\nimport yaml\nimport json\nimport tempfile\nfrom pathlib import Path\n\ndef get_default_config():\n    \"\"\"\n    Get the default configuration from agent_config.yaml.\n    \n    Returns:\n        Dictionary containing the default configuration\n    \"\"\"\n    default_config_path = Path('config/agent_config.yaml')\n    \n    if not default_config_path.exists():\n        return {\n            'llm': {\n                'type': 'claude',\n                'api_key': '',\n                'model': 'claude-3-5-haiku-latest',\n                'temperature': 0.1,\n                'max_tokens': 4096\n            },\n            'flow_control': {\n                'max_reader_search_attempts': 2,\n                'max_verifier_rejections': 1,\n                'status_sleep_time': 1\n            },\n            'docstring_options': {\n                'overwrite_docstrings': False\n            }\n        }\n    \n    with open(default_config_path, 'r') as f:\n        config = yaml.safe_load(f)\n    \n    return config\n\ndef validate_config(config):\n    \"\"\"\n    Validate that the configuration has the required fields.\n    \n    Args:\n        config: Dictionary containing the configuration to validate\n        \n    Returns:\n        Tuple of (is_valid, error_message)\n    \"\"\"\n    required_keys = ['llm', 'flow_control', 'docstring_options']\n    \n    for key in required_keys:\n        if key not in config:\n            return False, f\"Missing required configuration section: {key}\"\n    \n    # Check specific required fields in llm section\n    llm_required = ['type', 'api_key', 'model']\n    for key in llm_required:\n        if key not in config['llm']:\n            return False, f\"Missing 
required field in llm section: {key}\"\n    \n    return True, \"\"\n\ndef save_config(config):\n    \"\"\"\n    Save the configuration to a temporary file for use by the generation process.\n    \n    Args:\n        config: Dictionary containing the configuration to save\n        \n    Returns:\n        Path to the saved configuration file\n    \"\"\"\n    # Validate configuration\n    is_valid, error_message = validate_config(config)\n    if not is_valid:\n        raise ValueError(f\"Invalid configuration: {error_message}\")\n    \n    # Create a temporary file\n    temp_dir = tempfile.gettempdir()\n    config_file = os.path.join(temp_dir, 'docstring_generator_config.yaml')\n    \n    with open(config_file, 'w') as f:\n        yaml.dump(config, f, default_flow_style=False)\n    \n    return config_file "
  },
  {
    "path": "src/web/process_handler.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nProcess handler for running the docstring generation.\n\nThis module handles starting, monitoring, and stopping the docstring generation\nprocess, as well as capturing its output and sending it to the web interface.\n\"\"\"\n\nimport os\nimport sys\nimport subprocess\nimport threading\nimport tempfile\nimport signal\nimport re\nfrom pathlib import Path\nfrom typing import Optional, Dict, Any\n\nfrom . import visualization_handler\n\n# Global variables to track the process\nprocess = None\nshould_stop = False\n\n# Custom output handler to intercept and parse the output\nclass OutputHandler(threading.Thread):\n    \"\"\"Thread to handle output from the docstring generation process.\"\"\"\n    \n    def __init__(self, process, socketio):\n        \"\"\"\n        Initialize the output handler.\n        \n        Args:\n            process: The subprocess.Popen object for the docstring generation process\n            socketio: The Flask-SocketIO instance for sending updates to clients\n        \"\"\"\n        threading.Thread.__init__(self)\n        self.process = process\n        self.socketio = socketio\n        self.daemon = True\n    \n    def run(self):\n        \"\"\"Read output from the process and update the visualization state.\"\"\"\n        global should_stop\n        \n        # Regular expressions for parsing different types of output\n        status_regex = re.compile(r'STATUS: Agent: (\\w+), Message: (.+)')\n        component_regex = re.compile(r'COMPONENT: (.+) in file (.+)')\n        progress_regex = re.compile(r'PROGRESS: (\\d+)/(\\d+) components processed')\n        log_regex = re.compile(r'\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2} - (\\w+) - (\\w+) - (.+)')\n        \n        # Additional regex to detect agent activity from regular logs\n        agent_activity_regex = re.compile(r'(reader|writer|searcher|verifier)', re.IGNORECASE)\n        docstring_update_regex = 
re.compile(r'Successfully updated docstring for (.+)|Completed docstring generation for (.+)', re.IGNORECASE)\n        \n        # Patterns to filter out visualization-related output from logs\n        visualization_patterns = [\n            r'┌─+┐',     # Box top\n            r'│.*│',     # Box content\n            r'└─+┘',     # Box bottom\n            r'Agent:',   # Agent status\n            r'Status:',  # Status message\n            r'Component:',  # Component info\n            r'╔═+╗',     # Double-line box top\n            r'║.*║',     # Double-line box content\n            r'╚═+╝',     # Double-line box bottom\n            r'▶ ',       # Progress indicators\n            r'→ ',       # Arrow indicators\n            r'⦿',        # Bullet indicators\n            r'Processing component \\d+/\\d+',  # Progress messages\n            r'╡.*╞',     # Table separators\n            r'═+',       # Table lines\n            r'DocAgent (?:Workflow )?Status',  # Workflow status header\n            r'Processing: ',    # Processing status line\n            r'File: ',          # File status line\n            r'Active Agent: ',  # Agent status line\n            r'Status: ',        # Status message line\n            r'Workflow Input:',  # Input section\n            r'Component Name:',  # Input component name\n            r'File Path:',       # Input file path\n            r'Dependencies:',    # Input dependencies\n            r'Code:',            # Input code\n            r'^Input:',          # Input header\n            r'\\[.*?\\]',          # Status messages in brackets\n        ]\n        visualization_filter = re.compile('|'.join(visualization_patterns))\n        \n        # Read each line from the process output\n        for line in iter(self.process.stdout.readline, b''):\n            if should_stop:\n                break\n                \n            # Decode the line\n            try:\n                line = line.decode('utf-8').rstrip()\n            except 
UnicodeDecodeError:\n                continue\n            \n            # Process workflow status lines separately to update agent status\n            if 'Processing:' in line or 'File:' in line:\n                if 'Processing:' in line:\n                    component = line.split('Processing:')[1].strip()\n                    if component:\n                        visualization_handler.update_component_focus(component, \"\")\n                if 'File:' in line:\n                    file_path = line.split('File:')[1].strip()\n                    if file_path:\n                        # Update the current file without changing the component\n                        current_status = visualization_handler.get_current_status()\n                        if 'status' in current_status and current_status['status'].get('current_component'):\n                            visualization_handler.update_component_focus(\n                                current_status['status']['current_component'], \n                                file_path\n                            )\n                self.socketio.emit('status_update', visualization_handler.get_current_status())\n            \n            # Add to log messages - filter out visualization\n            if not visualization_filter.search(line):\n                visualization_handler.add_log_message(line)\n                self.socketio.emit('log_line', line)\n            \n            # Check for status updates\n            status_match = status_regex.search(line)\n            if status_match:\n                agent, message = status_match.groups()\n                visualization_handler.update_agent_status(agent, message)\n                self.socketio.emit('status_update', visualization_handler.get_current_status())\n                continue\n            \n            # Check for agent activity in regular logs\n            if not status_match:  # Only check if we didn't already match a status\n                agent_match = 
agent_activity_regex.search(line)\n                if agent_match and ('active' in line.lower() or 'using' in line.lower() or 'processing' in line.lower()):\n                    # Extract agent name from logs\n                    agent = agent_match.group(1).capitalize()\n                    visualization_handler.update_agent_status(agent, \"Processing\")\n                    self.socketio.emit('status_update', visualization_handler.get_current_status())\n            \n            # Check for component updates\n            component_match = component_regex.search(line)\n            if component_match:\n                component, file_path = component_match.groups()\n                visualization_handler.update_component_focus(component, file_path)\n                visualization_handler.update_file_status(file_path, 'in_progress')\n                self.socketio.emit('status_update', visualization_handler.get_current_status())\n                continue\n            \n            # Check for progress updates\n            progress_match = progress_regex.search(line)\n            if progress_match:\n                processed, total = progress_match.groups()\n                # We don't have the current component or component status from this regex,\n                # so we'll just update the counts\n                visualization_handler.update_progress(int(total), int(processed), '', {})\n                self.socketio.emit('status_update', visualization_handler.get_current_status())\n                continue\n            \n            # Also check for progress updates in normal log lines\n            progress_in_log = re.search(r'Processing component (\\d+)/(\\d+)', line)\n            if progress_in_log:\n                current, total = progress_in_log.groups()\n                visualization_handler.update_progress(int(total), int(current), '', {})\n                self.socketio.emit('status_update', visualization_handler.get_current_status())\n            \n            
# Check for docstring updates\n            docstring_update_match = docstring_update_regex.search(line)\n            if docstring_update_match:\n                component = docstring_update_match.group(1) or docstring_update_match.group(2)\n                # If this is a file path, extract it\n                if component and '/' in component:\n                    file_path = component\n                    visualization_handler.update_file_status(file_path, 'complete')\n                    self.socketio.emit('status_update', visualization_handler.get_current_status())\n                    # Emit a special event for docstring updates\n                    self.socketio.emit('docstring_updated', {'component': component})\n            \n            # Try to extract component information from other log lines\n            if 'Processing' in line and ':' in line and 'file' in line:\n                parts = line.split('file')\n                if len(parts) > 1:\n                    file_path = parts[1].strip()\n                    component = parts[0].split('Processing')[-1].strip()\n                    if component and file_path:\n                        visualization_handler.update_component_focus(component, file_path)\n                        visualization_handler.update_file_status(file_path, 'in_progress')\n                        self.socketio.emit('status_update', visualization_handler.get_current_status())\n            \n            # Check for log messages\n            log_match = log_regex.search(line)\n            if log_match:\n                _, level, message = log_match.groups()\n                # If the message indicates completion of a file, update the file status\n                if 'Completed docstring generation for' in message or 'Successfully updated docstring for' in message:\n                    # Try to extract the file path from the message\n                    file_match = re.search(r'for file (.+)$|for (.+)', message)\n                    if 
file_match:\n                        file_path = file_match.group(1) or file_match.group(2)\n                        if file_path and '.' in file_path:  # Simple check to ensure it looks like a filename\n                            visualization_handler.update_file_status(file_path, 'complete')\n                            self.socketio.emit('status_update', visualization_handler.get_current_status())\n                            # Emit a special event for docstring updates\n                            self.socketio.emit('docstring_updated', {'component': file_path})\n                \n                self.socketio.emit('log_message', {'level': level, 'message': message})\n\ndef start_generation_process(socketio, repo_path: str, config_path: str):\n    \"\"\"\n    Start the docstring generation process.\n    \n    Args:\n        socketio: The Flask-SocketIO instance for sending updates to clients\n        repo_path: Path to the repository to generate docstrings for\n        config_path: Path to the configuration file\n    \"\"\"\n    global process, should_stop\n    \n    should_stop = False\n    \n    # Set an initial status to show we're starting\n    visualization_handler.update_agent_status(\"System\", \"Starting docstring generation...\")\n    socketio.emit('status_update', visualization_handler.get_current_status())\n    \n    # Connect the socket to the web bridge\n    try:\n        from src.visualizer.web_bridge import WebSocketManager\n        WebSocketManager.set_socket(socketio)\n    except ImportError:\n        socketio.emit('log_message', {\n            'level': 'warning',\n            'message': 'Web bridge not available. 
Some features may not work correctly.'\n        })\n    \n    # Get the repository structure and update the visualization state\n    try:\n        structure = visualization_handler.get_repo_structure(repo_path)\n        socketio.emit('status_update', visualization_handler.get_current_status())\n        socketio.emit('log_message', {\n            'level': 'info',\n            'message': f'Repository structure loaded with {len(structure[\"children\"])} top-level items'\n        })\n    except Exception as e:\n        socketio.emit('log_message', {\n            'level': 'error',\n            'message': f'Error loading repository structure: {str(e)}'\n        })\n    \n    # Find the generate_docstrings.py script\n    script_path = Path(__file__).parent.parent.parent / 'generate_docstrings.py'\n    \n    if not script_path.exists():\n        socketio.emit('error', {\n            'message': f'Could not find docstring generation script at {script_path}'\n        })\n        return\n    \n    # Start the process\n    try:\n        # Launch the generator with stderr merged into stdout. The pipe is\n        # opened in binary mode (OutputHandler decodes each line itself), so\n        # line buffering (bufsize=1) does not apply and the default is used.\n        process = subprocess.Popen(\n            [sys.executable, str(script_path), \n             '--repo-path', repo_path, \n             '--config-path', config_path,\n             '--enable-web'],  # Enable web integration\n            stdout=subprocess.PIPE,\n            stderr=subprocess.STDOUT,\n            universal_newlines=False\n        )\n        \n        # Start the output handler\n        handler = OutputHandler(process, socketio)\n        handler.start()\n        \n        # Wait for the process to complete\n        return_code = process.wait()\n        \n        if return_code == 0:\n            socketio.emit('complete', {\n                'message': 'Docstring generation completed successfully'\n            })\n        else:\n            socketio.emit('error', {\n                'message': f'Docstring generation failed with return code 
{return_code}'\n            })\n    \n    except Exception as e:\n        socketio.emit('error', {\n            'message': f'Error starting docstring generation process: {str(e)}'\n        })\n    \n    finally:\n        process = None\n\ndef stop_generation_process():\n    \"\"\"\n    Stop the docstring generation process.\n    \n    Returns:\n        True if the process was stopped, False otherwise\n    \"\"\"\n    global process, should_stop\n    \n    if process is None:\n        return False\n    \n    should_stop = True\n    \n    try:\n        # Disconnect from the web bridge\n        try:\n            from src.visualizer.web_bridge import WebSocketManager\n            WebSocketManager.disable()\n        except ImportError:\n            pass\n        \n        # Try to terminate the process gracefully first\n        process.terminate()\n        \n        # Wait for up to 5 seconds for the process to terminate\n        try:\n            process.wait(timeout=5)\n        except subprocess.TimeoutExpired:\n            # If the process didn't terminate, kill it\n            process.kill()\n        \n        return True\n    \n    except Exception as e:\n        print(f\"Error stopping process: {e}\")\n        return False "
  },
  {
    "path": "src/web/run.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nEntry point for running the docstring generation visualization web application.\n\nThis script creates and starts the Flask application for visualizing the\ndocstring generation process.\n\"\"\"\n\nimport os\nimport sys\nimport argparse\nfrom pathlib import Path\n\nfrom .app import create_app\n\ndef main():\n    \"\"\"\n    Parse command line arguments and start the web application.\n    \"\"\"\n    parser = argparse.ArgumentParser(description='Start the docstring generation visualization web application')\n    parser.add_argument('--host', default='127.0.0.1', help='Host to bind the server to')\n    parser.add_argument('--port', type=int, default=5000, help='Port to bind the server to')\n    parser.add_argument('--debug', action='store_true', help='Run the application in debug mode')\n    \n    args = parser.parse_args()\n    \n    # Create the Flask application\n    app, socketio = create_app(debug=args.debug)\n    \n    print(f\"Starting docstring generation visualization web application on http://{args.host}:{args.port}\")\n    print(\"Press Ctrl+C to stop the server\")\n    \n    # Start the server\n    socketio.run(app, host=args.host, port=args.port, debug=args.debug, allow_unsafe_werkzeug=True)\n\nif __name__ == '__main__':\n    # Add the parent directory to the path so we can import the module\n    sys.path.insert(0, str(Path(__file__).parent.parent.parent))\n    main() "
  },
  {
    "path": "src/web/static/css/style.css",
    "content": "/* Copyright (c) Meta Platforms, Inc. and affiliates */\n/* Main layout styles */\nbody {\n    overflow-x: hidden;\n}\n\n.sidebar {\n    transition: width 0.3s ease;\n    box-shadow: 2px 0 5px rgba(0, 0, 0, 0.1);\n}\n\n/* Header logo styles */\n.header-logo {\n    max-height: 30px;\n    margin-right: 10px;\n}\n\n/* Transition for main content when sidebar changes */\n.main-content-transition {\n    transition: all 0.3s ease;\n}\n\n/* Status visualizer styles */\n.agent-box {\n    border: 1px solid #ccc;\n    border-radius: 5px;\n    padding: 10px;\n    margin-bottom: 10px;\n    text-align: center;\n    transition: all 0.3s ease;\n}\n\n.agent-box.active {\n    border-color: #198754;\n    box-shadow: 0 0 5px rgba(25, 135, 84, 0.5);\n    background-color: rgba(25, 135, 84, 0.1);\n}\n\n.agent-box h3 {\n    margin-top: 5px;\n    font-size: 1.2rem;\n}\n\n.component-info {\n    margin-top: 20px;\n    padding: 10px;\n    background-color: #f8f9fa;\n    border-radius: 5px;\n    border-left: 3px solid #007bff;\n}\n\n/* Agent workflow visualization styles */\n#agent-workflow {\n    min-height: 200px;\n}\n\n.workflow-node circle {\n    fill: #ffffff;  /* White background by default */\n    stroke: #6c757d;\n    stroke-width: 1.5px;\n    transition: all 0.3s ease;\n}\n\n.workflow-node.active circle {\n    fill: #198754;  /* Green background when active */\n    stroke: #0d6efd;\n    stroke-width: 2px;\n}\n\n.workflow-link {\n    stroke: #adb5bd;\n    stroke-width: 2px;\n    fill: none;\n    marker-end: url(#arrowhead);\n}\n\n.workflow-label {\n    font-size: 12px;\n    text-anchor: middle;\n    dominant-baseline: middle;\n    fill: #212529;\n    pointer-events: none;\n    transition: all 0.3s ease;\n}\n\n.workflow-node.active .workflow-label {\n    fill: #fff;\n    font-weight: bold;\n}\n\n.workflow-text-label {\n    font-size: 14px;\n    text-anchor: middle;\n    dominant-baseline: middle;\n    fill: #666;\n    font-weight: bold;\n}\n\n/* Repository structure 
styles */\n.repo-node {\n    cursor: pointer;\n    transition: all 0.2s ease;\n}\n\n.repo-node:hover {\n    filter: brightness(0.9);\n}\n\n.repo-node-label {\n    font-size: 0.9rem;\n    overflow: hidden;\n    text-overflow: ellipsis;\n    white-space: nowrap;\n}\n\n.repo-node-complete {\n    fill: #198754;  /* Green */\n}\n\n.repo-node-in-progress {\n    fill: #ffc107;  /* Yellow */\n}\n\n.repo-node-not-started {\n    fill: #f8f9fa;  /* Light grey */\n}\n\n.repo-node-focus {\n    stroke: #dc3545;  /* Red */\n    stroke-width: 2;\n}\n\n/* Log container styles */\n#log-container {\n    font-family: monospace;\n    font-size: 0.85rem;\n    line-height: 1.5;\n    background-color: #212529;\n    color: #f8f9fa;\n    border-radius: 5px;\n    height: 250px;\n    max-height: 250px;\n}\n\n.log-line {\n    margin-bottom: 2px;\n    white-space: pre-wrap;\n    word-break: break-word;\n}\n\n.log-info {\n    color: #f8f9fa;\n}\n\n.log-warning {\n    color: #ffc107;\n}\n\n.log-error {\n    color: #dc3545;\n}\n\n.log-debug {\n    color: #6c757d;\n}\n\n/* Completeness table styles */\n.completeness-table {\n    font-size: 0.9rem;\n}\n\n.progress-cell {\n    width: 100px;\n}\n\n.progress-bar-mini {\n    height: 10px;\n    margin-top: 5px;\n    border-radius: 5px;\n}\n\n/* Animation for focus transitions */\n@keyframes pulse {\n    0% {\n        transform: scale(1);\n        opacity: 1;\n    }\n    50% {\n        transform: scale(1.05);\n        opacity: 0.8;\n    }\n    100% {\n        transform: scale(1);\n        opacity: 1;\n    }\n}\n\n.highlight-focus {\n    animation: pulse 1s;\n} "
  },
  {
    "path": "src/web/static/js/completeness.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Completeness visualization for the docstring generation web application.\n * \n * This file provides functions for rendering and updating the completeness\n * visualization in the web interface.\n */\n\n/**\n * Update the completeness view with the evaluation results.\n * \n * @param {Object} completenessData - The completeness evaluation data from the server\n */\nfunction updateCompletenessView(completenessData) {\n    if (!completenessData || !completenessData.files) {\n        $('#completeness-data').html(`\n            <div class=\"alert alert-warning mb-0\">\n                No completeness data available\n            </div>\n        `);\n        return;\n    }\n    \n    // Calculate overall statistics\n    const totalFiles = completenessData.files.length;\n    let totalClasses = 0;\n    let totalClassesWithDocs = 0;\n    let totalFunctions = 0;\n    let totalFunctionsWithDocs = 0;\n    \n    completenessData.files.forEach(file => {\n        if (file.classes) {\n            totalClasses += file.classes.length;\n            totalClassesWithDocs += file.classes.filter(c => c.has_docstring).length;\n        }\n        if (file.functions) {\n            totalFunctions += file.functions.length;\n            totalFunctionsWithDocs += file.functions.filter(f => f.has_docstring).length;\n        }\n    });\n    \n    const classCompleteness = totalClasses > 0 ? Math.round((totalClassesWithDocs / totalClasses) * 100) : 0;\n    const functionCompleteness = totalFunctions > 0 ? Math.round((totalFunctionsWithDocs / totalFunctions) * 100) : 0;\n    const totalComponents = totalClasses + totalFunctions;\n    const totalComponentsWithDocs = totalClassesWithDocs + totalFunctionsWithDocs;\n    const overallCompleteness = totalComponents > 0 ? 
Math.round((totalComponentsWithDocs / totalComponents) * 100) : 0;\n    \n    // Create the HTML for the completeness view\n    let html = `\n        <div class=\"mb-3\">\n            <h5>Overall Completeness: ${overallCompleteness}%</h5>\n            <div class=\"progress mb-2\">\n                <div class=\"progress-bar bg-success\" role=\"progressbar\" style=\"width: ${overallCompleteness}%;\" aria-valuenow=\"${overallCompleteness}\" aria-valuemin=\"0\" aria-valuemax=\"100\">${overallCompleteness}%</div>\n            </div>\n            <div class=\"row\">\n                <div class=\"col-6\">\n                    <small>Classes: ${totalClassesWithDocs}/${totalClasses} (${classCompleteness}%)</small>\n                </div>\n                <div class=\"col-6\">\n                    <small>Functions: ${totalFunctionsWithDocs}/${totalFunctions} (${functionCompleteness}%)</small>\n                </div>\n            </div>\n        </div>\n        \n        <h5>Files (${totalFiles})</h5>\n        <div class=\"table-responsive\">\n            <table class=\"table table-sm completeness-table\">\n                <thead>\n                    <tr>\n                        <th>File</th>\n                        <th class=\"text-center\">Classes</th>\n                        <th class=\"text-center\">Functions</th>\n                        <th class=\"progress-cell\">Completeness</th>\n                    </tr>\n                </thead>\n                <tbody>\n    `;\n    \n    // Sort files by completeness (lowest first)\n    const sortedFiles = [...completenessData.files].sort((a, b) => {\n        const aTotal = (a.classes?.length || 0) + (a.functions?.length || 0);\n        const aWithDocs = (a.classes?.filter(c => c.has_docstring).length || 0) + \n                        (a.functions?.filter(f => f.has_docstring).length || 0);\n        const aPercentage = aTotal > 0 ? 
(aWithDocs / aTotal) : 1;\n        \n        const bTotal = (b.classes?.length || 0) + (b.functions?.length || 0);\n        const bWithDocs = (b.classes?.filter(c => c.has_docstring).length || 0) + \n                        (b.functions?.filter(f => f.has_docstring).length || 0);\n        const bPercentage = bTotal > 0 ? (bWithDocs / bTotal) : 1;\n        \n        return aPercentage - bPercentage;\n    });\n    \n    // Add rows for each file\n    sortedFiles.forEach(file => {\n        const classes = file.classes || [];\n        const functions = file.functions || [];\n        const classesWithDocs = classes.filter(c => c.has_docstring).length;\n        const functionsWithDocs = functions.filter(f => f.has_docstring).length;\n        const totalInFile = classes.length + functions.length;\n        const totalWithDocsInFile = classesWithDocs + functionsWithDocs;\n        const fileCompleteness = totalInFile > 0 ? Math.round((totalWithDocsInFile / totalInFile) * 100) : 100;\n        \n        // Determine the row color based on completeness\n        let rowClass = '';\n        if (fileCompleteness === 100) {\n            rowClass = 'table-success';\n        } else if (fileCompleteness >= 50) {\n            rowClass = 'table-warning';\n        } else {\n            rowClass = 'table-danger';\n        }\n        \n        html += `\n            <tr class=\"${rowClass}\">\n                <td><small>${file.file.split('/').pop()}</small></td>\n                <td class=\"text-center\"><small>${classesWithDocs}/${classes.length}</small></td>\n                <td class=\"text-center\"><small>${functionsWithDocs}/${functions.length}</small></td>\n                <td>\n                    <div class=\"progress progress-bar-mini\">\n                        <div class=\"progress-bar bg-success\" role=\"progressbar\" style=\"width: ${fileCompleteness}%;\" aria-valuenow=\"${fileCompleteness}\" aria-valuemin=\"0\" aria-valuemax=\"100\"></div>\n                    </div>\n        
            <small class=\"d-block text-end\">${fileCompleteness}%</small>\n                </td>\n            </tr>\n        `;\n    });\n    \n    html += `\n                </tbody>\n            </table>\n        </div>\n    `;\n    \n    // Update the completeness data container\n    $('#completeness-data').html(html);\n} "
  },
  {
    "path": "src/web/static/js/config.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Configuration handling for the docstring generation web application.\n * \n * This file provides functions for loading and saving configuration for the\n * docstring generation process.\n */\n\n/**\n * Load the default configuration from the server.\n */\nfunction loadDefaultConfig() {\n    $.ajax({\n        url: '/api/default_config',\n        type: 'GET',\n        success: function(config) {\n            applyConfigToForm(config);\n        },\n        error: function(xhr, status, error) {\n            console.error('Error loading default configuration:', error);\n            showMessage('warning', 'Failed to load default configuration. Using fallback values.');\n        }\n    });\n}\n\n/**\n * Apply a configuration object to the form inputs.\n * \n * @param {Object} config - The configuration object to apply\n */\nfunction applyConfigToForm(config) {\n    // Set LLM configuration\n    if (config.llm) {\n        $('#llm-type').val(config.llm.type || 'claude');\n        $('#llm-api-key').val(config.llm.api_key || '');\n        $('#llm-model').val(config.llm.model || 'claude-3-5-haiku-latest');\n        $('#llm-temperature').val(config.llm.temperature || 0.1);\n        $('#llm-max-tokens').val(config.llm.max_tokens || 4096);\n    }\n    \n    // Set flow control configuration\n    if (config.flow_control) {\n        $('#max-reader-search-attempts').val(config.flow_control.max_reader_search_attempts || 2);\n        $('#max-verifier-rejections').val(config.flow_control.max_verifier_rejections || 1);\n        $('#status-sleep-time').val(config.flow_control.status_sleep_time || 1);\n    }\n    \n    // Set docstring options\n    if (config.docstring_options) {\n        $('#overwrite-docstrings').prop('checked', config.docstring_options.overwrite_docstrings || false);\n    }\n}\n\n/**\n * Build a configuration object from the form inputs.\n * \n * @returns {Object} The configuration object\n 
*/\nfunction buildConfigFromForm() {\n    return {\n        llm: {\n            type: $('#llm-type').val(),\n            api_key: $('#llm-api-key').val(),\n            model: $('#llm-model').val(),\n            temperature: parseFloat($('#llm-temperature').val()),\n            max_tokens: parseInt($('#llm-max-tokens').val(), 10)\n        },\n        flow_control: {\n            max_reader_search_attempts: parseInt($('#max-reader-search-attempts').val(), 10),\n            max_verifier_rejections: parseInt($('#max-verifier-rejections').val(), 10),\n            status_sleep_time: parseFloat($('#status-sleep-time').val())\n        },\n        docstring_options: {\n            overwrite_docstrings: $('#overwrite-docstrings').is(':checked')\n        }\n    };\n} "
  },
  {
    "path": "src/web/static/js/log-handler.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Log message handler for the docstring generation web application.\n * \n * This file provides functions for displaying and managing log messages\n * in the web interface.\n */\n\n// Maximum number of log lines to keep in the UI\nconst MAX_LOG_LINES = 5000;\n\n/**\n * Add a log message to the log container.\n * \n * @param {string} level - The log level (info, warning, error, debug)\n * @param {string} message - The log message to display\n */\nfunction addLogMessage(level, message) {\n    // Create a CSS class based on the log level\n    let logClass = 'log-info';\n    switch (level.toLowerCase()) {\n        case 'warning':\n        case 'warn':\n            logClass = 'log-warning';\n            break;\n        case 'error':\n        case 'critical':\n            logClass = 'log-error';\n            break;\n        case 'debug':\n            logClass = 'log-debug';\n            break;\n    }\n    \n    // Create the log line element\n    const logLine = $(`<div class=\"log-line ${logClass}\"></div>`);\n    logLine.text(message);\n    \n    // Add the log line to the log content\n    $('#log-content').append(logLine);\n    \n    // Trim log lines if necessary\n    const logLines = $('#log-content .log-line');\n    if (logLines.length > MAX_LOG_LINES) {\n        // Remove the oldest lines\n        logLines.slice(0, logLines.length - MAX_LOG_LINES).remove();\n    }\n    \n    // Scroll to the bottom of the log container\n    const logContainer = $('#log-container');\n    logContainer.scrollTop(logContainer[0].scrollHeight);\n} "
  },
  {
    "path": "src/web/static/js/main.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Main JavaScript for the docstring generation web application.\n * \n * This file provides the main functionality for the web interface, including\n * event handling, configuration, and communication with the server.\n */\n\n// Global state variables\nlet socket = null;\nlet processRunning = false;\nlet startTime = 0;\nlet timerInterval = null;\nlet apiTestModal = null;\n\n// Document ready handler\n$(document).ready(function() {\n    // Load default configuration\n    loadDefaultConfig();\n    \n    // Set up form submission handler\n    $('#config-form').on('submit', function(e) {\n        e.preventDefault();\n        startGeneration();\n    });\n    \n    // Set up test API button handler\n    $('#test-api-button').on('click', function() {\n        testApiConnection();\n    });\n    \n    // Initialize the API test modal\n    apiTestModal = new bootstrap.Modal(document.getElementById('api-test-modal'));\n    \n    // Check if a process is already running\n    checkProcessStatus();\n    \n    // Initialize the agent workflow visualization\n    initAgentWorkflow();\n    \n    // Handle window resize\n    $(window).on('resize', function() {\n        initAgentWorkflow();\n    });\n});\n\n/**\n * Test the API connection with the configured settings.\n */\nfunction testApiConnection() {\n    // Show the modal\n    apiTestModal.show();\n    \n    // Set the modal content to loading state\n    $('#api-test-result').html(`\n        <div class=\"text-center\">\n            <div class=\"spinner-border text-primary\" role=\"status\">\n                <span class=\"visually-hidden\">Testing API...</span>\n            </div>\n            <p class=\"mt-2\">Testing API connection...</p>\n        </div>\n    `);\n    \n    // Get the API configuration\n    const config = {\n        llm_type: $('#llm-type').val(),\n        api_key: $('#llm-api-key').val(),\n        model: $('#llm-model').val()\n    };\n  
  \n    // Send a test request to the server\n    $.ajax({\n        url: '/api/test_api',\n        type: 'POST',\n        contentType: 'application/json',\n        data: JSON.stringify(config),\n        success: function(response) {\n            if (response.status === 'success') {\n                $('#api-test-result').html(`\n                    <div class=\"alert alert-success\">\n                        <h5><i class=\"fas fa-check-circle\"></i> API Connection Successful</h5>\n                        <p>${response.message || 'The API connection is working correctly.'}</p>\n                        <hr>\n                        <div class=\"card p-2 bg-light\">\n                            <small class=\"text-muted\">Response from model:</small>\n                            <p class=\"mb-0\">${response.model_response || 'No response provided.'}</p>\n                        </div>\n                    </div>\n                `);\n            } else {\n                $('#api-test-result').html(`\n                    <div class=\"alert alert-danger\">\n                        <h5><i class=\"fas fa-exclamation-circle\"></i> API Connection Failed</h5>\n                        <p>${response.message || 'Failed to connect to the API.'}</p>\n                        <hr>\n                        <p class=\"mb-0\">Please check your API key and other settings.</p>\n                    </div>\n                `);\n            }\n        },\n        error: function(xhr, status, error) {\n            $('#api-test-result').html(`\n                <div class=\"alert alert-danger\">\n                    <h5><i class=\"fas fa-exclamation-circle\"></i> API Connection Failed</h5>\n                    <p>Error: ${error}</p>\n                    <hr>\n                    <p class=\"mb-0\">Please check your API key and other settings.</p>\n                </div>\n            `);\n        }\n    });\n}\n\n/**\n * Check if a process is already running.\n */\nfunction checkProcessStatus() 
{\n    $.ajax({\n        url: '/api/status',\n        type: 'GET',\n        success: function(data) {\n            processRunning = data.is_running;\n            \n            if (processRunning) {\n                // Process is running, switch to the running view\n                showRunningView();\n                \n                // Connect to Socket.IO\n                setupSocketHandlers();\n                \n                // Start the timer\n                startTimer();\n                \n                // Load completeness data initially\n                loadCompletenessData();\n            } else {\n                // Show the configuration view\n                showConfigView();\n            }\n        },\n        error: function(xhr, status, error) {\n            console.error('Error checking process status:', error);\n            showMessage('error', 'Error checking process status: ' + error);\n        }\n    });\n}\n\n/**\n * Set up Socket.IO event handlers.\n */\nfunction setupSocketHandlers() {\n    // Create Socket.IO connection if it doesn't exist\n    if (!socket) {\n        socket = io();\n        \n        // Status update handler\n        socket.on('status_update', function(data) {\n            console.log('Status update received:', data);\n            \n            if (data.status) {\n                updateStatusVisualizer(data.status);\n            }\n            \n            if (data.repo_structure) {\n                updateRepoStructure(data.repo_structure);\n            }\n        });\n        \n        // Log message handler\n        socket.on('log_message', function(data) {\n            addLogMessage(data.level, data.message);\n            \n            // If this is a docstring generation success message, refresh completeness\n            if (data.message && (\n                data.message.includes('Successfully updated docstring for') || \n                data.message.includes('Completed docstring generation for')\n            )) 
{\n                // Wait a brief moment for file changes to be detected\n                setTimeout(loadCompletenessData, 500);\n            }\n        });\n        \n        // Raw log message handler (for system prints)\n        socket.on('log_line', function(data) {\n            addLogMessage('info', data);\n            \n            // Check if this is a message about docstring generation\n            if (typeof data === 'string' && (\n                data.includes('Successfully updated docstring') ||\n                data.includes('Completed docstring generation')\n            )) {\n                // Refresh the completeness data\n                setTimeout(loadCompletenessData, 500);\n            }\n        });\n        \n        // Error handler\n        socket.on('error', function(data) {\n            addLogMessage('error', data.message);\n            showMessage('error', data.message);\n        });\n        \n        // Completion handler\n        socket.on('complete', function(data) {\n            processRunning = false;\n            $('#start-button').prop('disabled', false).text('Start Generation');\n            addLogMessage('info', data.message);\n            showMessage('success', 'Docstring generation completed');\n            stopTimer();\n            \n            // Final completeness refresh\n            loadCompletenessData();\n        });\n        \n        // Disconnection handler\n        socket.on('disconnect', function() {\n            addLogMessage('warning', 'Connection to server lost');\n        });\n    }\n}\n\n/**\n * Start the docstring generation process.\n */\nfunction startGeneration() {\n    if (processRunning) {\n        showMessage('warning', 'Generation already in progress');\n        return;\n    }\n    \n    // Get the repository path\n    const repoPath = $('#repo-path').val();\n    if (!repoPath) {\n        showMessage('error', 'Please enter a repository path');\n        return;\n    }\n    \n    // Disable the start 
button\n    $('#start-button').prop('disabled', true).text('Starting...');\n    \n    // Get the configuration\n    const config = buildConfigFromForm();\n    \n    // Clear log content\n    $('#log-content').empty();\n    \n    // Send the request to start generation\n    $.ajax({\n        url: '/api/start',\n        type: 'POST',\n        contentType: 'application/json',\n        data: JSON.stringify({\n            repo_path: repoPath,\n            config: config\n        }),\n        success: function(data) {\n            if (data.status === 'success') {\n                // Mark as running\n                processRunning = true;\n                \n                // Show the running view\n                showRunningView();\n                \n                // Connect to Socket.IO\n                setupSocketHandlers();\n                \n                // Start the timer\n                startTimer();\n                \n                // Make the completeness section visible and load initial data\n                $('#completeness-section').removeClass('d-none');\n                loadCompletenessData();\n                \n                // Show success message\n                showMessage('success', data.message);\n            } else {\n                // Show error message\n                showMessage('error', data.message);\n                $('#start-button').prop('disabled', false).text('Start Generation');\n            }\n        },\n        error: function(xhr, status, error) {\n            showMessage('error', 'Error starting generation: ' + error);\n            $('#start-button').prop('disabled', false).text('Start Generation');\n        }\n    });\n}\n\n/**\n * Stop the docstring generation process.\n */\nfunction stopGeneration() {\n    if (!processRunning) {\n        showMessage('warning', 'No generation in progress');\n        return;\n    }\n    \n    // Confirm stop\n    if (!confirm('Are you sure you want to stop the docstring generation 
process?')) {\n        return;\n    }\n    \n    // Send the request to stop generation\n    $.ajax({\n        url: '/api/stop',\n        type: 'POST',\n        success: function(data) {\n            if (data.status === 'success') {\n                processRunning = false;\n                $('#start-button').prop('disabled', false).text('Start Generation');\n                showMessage('success', data.message);\n                stopTimer();\n                \n                // Add log message\n                addLogMessage('warning', 'Generation process stopped by user');\n            } else {\n                showMessage('error', data.message);\n            }\n        },\n        error: function(xhr, status, error) {\n            showMessage('error', 'Error stopping generation: ' + error);\n        }\n    });\n}\n\n/**\n * Show the configuration view.\n */\nfunction showConfigView() {\n    $('#main-content').addClass('d-none');\n    $('#sidebar').removeClass('col-md-3').addClass('col-md-12');\n    $('#config-section').removeClass('d-none');\n    $('#completeness-section').addClass('d-none');\n}\n\n/**\n * Show the running view.\n */\nfunction showRunningView() {\n    $('#config-section').addClass('d-none');\n    $('#completeness-section').removeClass('d-none');\n    $('#sidebar').removeClass('col-md-12').addClass('col-md-3');\n    $('#main-content').removeClass('d-none');\n    \n    // Make sure the agent workflow is initialized\n    setTimeout(function() {\n        initAgentWorkflow();\n    }, 100);\n    \n    // Add a stop button to the header\n    if ($('#stop-button').length === 0) {\n        $('header').append(`\n            <button id=\"stop-button\" class=\"btn btn-danger btn-sm position-absolute\" style=\"right: 1rem; top: 1rem;\">\n                <i class=\"fas fa-stop\"></i> Stop Generation\n            </button>\n        `);\n        \n        // Add click handler\n        $('#stop-button').on('click', function() {\n            stopGeneration();\n     
   });\n    }\n}\n\n/**\n * Show a message to the user.\n * \n * @param {string} type - The type of message (success, error, warning, info)\n * @param {string} message - The message to show\n */\nfunction showMessage(type, message) {\n    // Create the alert container if it doesn't exist\n    if ($('#alert-container').length === 0) {\n        $('body').append(`\n            <div id=\"alert-container\" style=\"position: fixed; top: 20px; right: 20px; z-index: 9999;\"></div>\n        `);\n    }\n    \n    // Create a unique ID for the alert\n    const id = 'alert-' + Date.now();\n    \n    // Bootstrap has no 'alert-error' class; map 'error' to 'danger'\n    const alertType = type === 'error' ? 'danger' : type;\n    \n    // Add the alert to the container\n    $('#alert-container').append(`\n        <div id=\"${id}\" class=\"alert alert-${alertType} alert-dismissible fade show\" role=\"alert\">\n            ${message}\n            <button type=\"button\" class=\"btn-close\" data-bs-dismiss=\"alert\" aria-label=\"Close\"></button>\n        </div>\n    `);\n    \n    // Automatically remove the alert after 5 seconds\n    setTimeout(() => {\n        $(`#${id}`).alert('close');\n    }, 5000);\n}\n\n/**\n * Start the timer.\n */\nfunction startTimer() {\n    // Set the start time\n    startTime = Date.now();\n    \n    // Clear any existing timer\n    if (timerInterval) {\n        clearInterval(timerInterval);\n    }\n    \n    // Update every second\n    timerInterval = setInterval(() => {\n        const elapsedSeconds = Math.floor((Date.now() - startTime) / 1000);\n        const minutes = Math.floor(elapsedSeconds / 60);\n        const seconds = elapsedSeconds % 60;\n        \n        // Format as MM:SS\n        const formattedTime = `${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`;\n        \n        // Update the display\n        $('#progress-time').text(`Elapsed: ${formattedTime}`);\n    }, 1000);\n}\n\n/**\n * Stop the timer.\n */\nfunction stopTimer() {\n    if (timerInterval) {\n        clearInterval(timerInterval);\n        timerInterval = null;\n    }\n}\n\n/**\n * Load completeness 
data from the server.\n */\nfunction loadCompletenessData() {\n    // Only load data if the completeness section is visible\n    if ($('#completeness-section').hasClass('d-none')) {\n        return;\n    }\n    \n    $.ajax({\n        url: '/api/completeness',\n        type: 'GET',\n        success: function(response) {\n            if (response.status === 'success' && response.data) {\n                updateCompletenessView(response.data);\n            } else {\n                $('#completeness-data').html(`\n                    <div class=\"alert alert-warning mb-0\">\n                        ${response.message || 'Failed to load completeness data'}\n                    </div>\n                `);\n            }\n        },\n        error: function(xhr, status, error) {\n            console.error('Error loading completeness data:', error);\n            $('#completeness-data').html(`\n                <div class=\"alert alert-danger mb-0\">\n                    Error loading completeness data: ${error}\n                </div>\n            `);\n        }\n    });\n} "
  },
  {
    "path": "src/web/static/js/repo-structure.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Repository structure visualization for the docstring generation web application.\n * \n * This file provides functions for rendering and updating the repository structure\n * visualization using D3.js.\n */\n\n// Store the current repository structure\nlet currentRepoStructure = null;\n\n// Keep track of the current focus path\nlet currentFocusPath = null;\n\n// D3 visualization settings\nconst margin = { top: 20, right: 20, bottom: 20, left: 20 };\nlet width = 600;\nlet height = 500;\nlet nodeRadius = 7;\nlet maxLabelLength = 20;\n\n/**\n * Update the repository structure visualization.\n * \n * @param {Object} repoStructure - The repository structure object from the server\n */\nfunction updateRepoStructure(repoStructure) {\n    // If there's no repo structure, show placeholder\n    if (!repoStructure || !repoStructure.tree || Object.keys(repoStructure.tree).length === 0) {\n        $('#repo-structure').html(`\n            <div class=\"text-center py-4\">\n                <p>No repository structure available</p>\n            </div>\n        `);\n        return;\n    }\n    \n    // Store the previous focus path\n    const prevFocusPath = currentFocusPath;\n    \n    // Update the current state\n    currentRepoStructure = repoStructure;\n    currentFocusPath = repoStructure.focus_path;\n    \n    // Update dimensions based on container size\n    const container = document.getElementById('repo-structure');\n    width = container.clientWidth - margin.left - margin.right;\n    height = container.clientHeight - margin.top - margin.bottom;\n    \n    // Clear existing visualization\n    $('#repo-structure').empty();\n    \n    // Create SVG container\n    const svg = d3.select('#repo-structure')\n        .append('svg')\n        .attr('width', width + margin.left + margin.right)\n        .attr('height', height + margin.top + margin.bottom)\n        .append('g')\n        .attr('transform', 
`translate(${margin.left},${margin.top})`);\n    \n    // Create hierarchy from the data\n    const root = d3.hierarchy(repoStructure.tree);\n    \n    // Set node size based on number of nodes to avoid overlapping\n    const nodeCount = root.descendants().length;\n    const dynamicRadius = Math.max(3, Math.min(7, 10 - Math.log(nodeCount)));\n    nodeRadius = dynamicRadius;\n    \n    // Create tree layout\n    const treeLayout = d3.tree()\n        .size([height, width - 160]);\n    \n    // Compute the tree layout\n    treeLayout(root);\n    \n    // Add links between nodes\n    svg.selectAll('.link')\n        .data(root.links())\n        .enter()\n        .append('path')\n        .attr('class', 'link')\n        .attr('d', d => {\n            return `M${d.source.y},${d.source.x}\n                   C${(d.source.y + d.target.y) / 2},${d.source.x}\n                    ${(d.source.y + d.target.y) / 2},${d.target.x}\n                    ${d.target.y},${d.target.x}`;\n        })\n        .attr('fill', 'none')\n        .attr('stroke', '#ccc')\n        .attr('stroke-width', 1.5);\n    \n    // Add nodes\n    const nodes = svg.selectAll('.node')\n        .data(root.descendants())\n        .enter()\n        .append('g')\n        .attr('class', 'node')\n        .attr('transform', d => `translate(${d.y},${d.x})`)\n        .attr('id', d => `node-${d.data.path.replace(/[\\/\\.]/g, '_')}`); // Add ID for easier selection\n    \n    // Add node circles\n    nodes.append('circle')\n        .attr('r', nodeRadius)\n        .attr('class', d => {\n            let classes = 'repo-node ';\n            \n            // Add status class\n            if (d.data.type === 'file') {\n                classes += `repo-node-${d.data.status || 'not-started'}`;\n            } else {\n                // For directories, determine status based on children\n                const hasCompleteChildren = d.descendants().slice(1).some(node => \n                    node.data.type === 'file' && 
node.data.status === 'complete');\n                const hasInProgressChildren = d.descendants().slice(1).some(node => \n                    node.data.type === 'file' && node.data.status === 'in_progress');\n                \n                if (hasCompleteChildren && !hasInProgressChildren) {\n                    classes += 'repo-node-complete';\n                } else if (hasInProgressChildren) {\n                    classes += 'repo-node-in-progress';\n                } else {\n                    classes += 'repo-node-not-started';\n                }\n            }\n            \n            // Add focus class if this is the focused node\n            if (d.data.path === currentFocusPath) {\n                classes += ' repo-node-focus';\n            }\n            \n            return classes;\n        })\n        .style('fill', d => {\n            if (d.data.type === 'dir') {\n                // Check children status for directory coloring\n                const completeCount = d.descendants().slice(1).filter(node => \n                    node.data.type === 'file' && node.data.status === 'complete').length;\n                const totalFiles = d.descendants().slice(1).filter(node => \n                    node.data.type === 'file').length;\n                const progress = totalFiles > 0 ? completeCount / totalFiles : 0;\n                \n                // Use color gradient based on completion percentage\n                if (progress === 1) return '#198754';  // All complete - green\n                if (progress > 0) return '#ffc107';    // Some complete - yellow\n                return '#6c757d';  // None complete - grey\n            } else {\n                // Colors for files based on status\n                return d.data.status === 'complete' ? '#198754' : \n                       d.data.status === 'in_progress' ? '#ffc107' : '#f8f9fa';\n            }\n        })\n        .style('stroke', d => d.data.path === currentFocusPath ? 
'#dc3545' : '#6c757d')\n        .style('stroke-width', d => d.data.path === currentFocusPath ? 2 : 1);\n    \n    // Add node labels\n    nodes.append('text')\n        .attr('dy', 3)\n        .attr('x', d => d.children ? -nodeRadius * 1.5 : nodeRadius * 1.5)\n        .attr('text-anchor', d => d.children ? 'end' : 'start')\n        .attr('class', 'repo-node-label')\n        .text(d => {\n            const name = d.data.name;\n            if (name.length > maxLabelLength) {\n                return name.substring(0, maxLabelLength - 3) + '...';\n            }\n            return name;\n        })\n        .append('title')  // Add tooltip with full name\n        .text(d => d.data.name);\n    \n    // Find the focused node if it exists\n    if (currentFocusPath) {\n        const focusedNode = root.descendants().find(d => d.data.path === currentFocusPath);\n        if (focusedNode) {\n            // If focus has changed, trigger the zoom animation\n            if (prevFocusPath !== currentFocusPath) {\n                zoomToNode(svg, focusedNode, width, height);\n            }\n        }\n    }\n}\n\n/**\n * Zoom to a specific node in the visualization.\n * \n * @param {Object} svg - The D3 SVG selection\n * @param {Object} node - The node to zoom to\n * @param {number} width - The width of the container\n * @param {number} height - The height of the container\n */\nfunction zoomToNode(svg, node, width, height) {\n    // Calculate the scale factor based on how deep the node is in the tree\n    const depth = node.depth;\n    const scale = Math.max(1, Math.min(2, 1 + depth * 0.2));\n    \n    // Calculate translation to center the node\n    const x = node.x;\n    const y = node.y;\n    const tx = width/2 - y * scale;\n    const ty = height/2 - x * scale;\n    \n    // Apply the zoom transformation\n    svg.transition()\n        .duration(750)\n        .attr('transform', `translate(${margin.left + tx},${margin.top + ty}) scale(${scale})`);\n    \n    // Add a highlight 
animation to the node\n    const nodeId = `#node-${node.data.path.replace(/[\\/\\.]/g, '_')} circle`;\n    d3.select(nodeId)\n        .classed('highlight-focus', true)\n        .transition()\n        .duration(750)\n        .on('end', function() {\n            d3.select(this).classed('highlight-focus', false);\n        });\n}\n\n/**\n * Update the status of a file in the repository structure.\n * \n * @param {string} file_path - The path of the file to update\n * @param {string} status - The new status (not_started, in_progress, complete)\n */\nfunction updateFileStatus(file_path, status) {\n    if (!currentRepoStructure) return;\n    \n    // Find the file in the tree\n    function updateNodeStatus(node) {\n        if (node.path === file_path) {\n            // Only update if the status is actually changing\n            if (node.status !== status) {\n                node.status = status;\n                return true;\n            }\n            return false;\n        }\n        \n        if (node.children) {\n            for (const child of node.children) {\n                if (updateNodeStatus(child)) {\n                    return true;\n                }\n            }\n        }\n        \n        return false;\n    }\n    \n    // Update the node status\n    if (updateNodeStatus(currentRepoStructure.tree)) {\n        // If the file status has changed, update the visualization\n        if (status === 'in_progress') {\n            currentRepoStructure.focus_path = file_path;\n        }\n        updateRepoStructure(currentRepoStructure);\n    }\n}\n\n// Initialize the visualization when the document is ready\n$(document).ready(function() {\n    // If we receive a docstring_updated event, update the repository structure.\n    // The typeof guard avoids a ReferenceError if no global socket has been declared yet.\n    if (typeof socket !== 'undefined' && socket) {\n        socket.on('docstring_updated', function(data) {\n            if (data.component && currentRepoStructure) {\n                updateFileStatus(data.component, 'complete');\n            }\n        });\n    }\n});\n\n// 
Handle window resize to update visualization\n$(window).on('resize', function() {\n    if (currentRepoStructure) {\n        updateRepoStructure(currentRepoStructure);\n    }\n}); "
  },
  {
    "path": "src/web/static/js/status-visualizer.js",
    "content": "// Copyright (c) Meta Platforms, Inc. and affiliates\n/**\n * Status visualizer for the docstring generation web application.\n * \n * This file provides functions for rendering and updating the agent status\n * visualization in the web interface.\n */\n\n// Define the agent workflow structure\nconst agentWorkflow = {\n    nodes: [\n        { id: \"reader\", label: \"Reader\", x: 150, y: 80, isAgent: true },\n        { id: \"searcher\", label: \"Searcher\", x: 350, y: 80, isAgent: true },\n        { id: \"writer\", label: \"Writer\", x: 150, y: 200, isAgent: true },\n        { id: \"verifier\", label: \"Verifier\", x: 350, y: 200, isAgent: true }\n    ],\n    labels: [\n        { id: \"input\", label: \"Input\", x: 30, y: 140 },\n        { id: \"output\", label: \"Output\", x: 470, y: 140 }\n    ],\n    links: [\n        { source: \"input\", target: \"reader\" },\n        { source: \"reader\", target: \"searcher\" },\n        { source: \"searcher\", target: \"reader\" },\n        { source: \"reader\", target: \"writer\" },\n        { source: \"writer\", target: \"verifier\" },\n        { source: \"verifier\", target: \"output\" },\n        { source: \"verifier\", target: \"reader\" }\n    ]\n};\n\n// Keep track of the current active agent\nlet currentActiveAgent = null;\n\n// Initialize the agent workflow visualization\nfunction initAgentWorkflow() {\n    const container = document.getElementById('agent-workflow');\n    if (!container) return;\n\n    // Check if container is visible and has dimensions\n    const width = container.clientWidth || 600;\n    const height = container.clientHeight || 200;\n\n    // Clear any existing content\n    d3.select(container).selectAll(\"*\").remove();\n\n    // Create SVG container\n    const svg = d3.select(container)\n        .append(\"svg\")\n        .attr(\"width\", width)\n        .attr(\"height\", height)\n        .append(\"g\")\n        .attr(\"transform\", `translate(${Math.max(0, (width - 500) / 2)}, 
0)`);\n\n    // Add arrowhead marker definition\n    svg.append(\"defs\").append(\"marker\")\n        .attr(\"id\", \"arrowhead\")\n        .attr(\"viewBox\", \"0 -5 10 10\")\n        .attr(\"refX\", 20)\n        .attr(\"refY\", 0)\n        .attr(\"markerWidth\", 6)\n        .attr(\"markerHeight\", 6)\n        .attr(\"orient\", \"auto\")\n        .append(\"path\")\n        .attr(\"d\", \"M0,-5L10,0L0,5\")\n        .attr(\"fill\", \"#adb5bd\");\n\n    // Helper function to get node coordinates by id\n    function getNodeCoords(id) {\n        const agentNode = agentWorkflow.nodes.find(n => n.id === id);\n        if (agentNode) return { x: agentNode.x, y: agentNode.y };\n        \n        const labelNode = agentWorkflow.labels.find(n => n.id === id);\n        if (labelNode) return { x: labelNode.x, y: labelNode.y };\n        \n        return null;\n    }\n\n    // Draw links\n    svg.selectAll(\".workflow-link\")\n        .data(agentWorkflow.links)\n        .enter()\n        .append(\"path\")\n        .attr(\"class\", \"workflow-link\")\n        .attr(\"d\", d => {\n            const source = getNodeCoords(d.source);\n            const target = getNodeCoords(d.target);\n            \n            if (!source || !target) return \"\";\n            \n            // Create curved paths\n            const dx = target.x - source.x;\n            const dy = target.y - source.y;\n            const dr = Math.sqrt(dx * dx + dy * dy) * 1.5;\n            \n            return `M${source.x},${source.y}A${dr},${dr} 0 0,1 ${target.x},${target.y}`;\n        });\n\n    // Draw agent nodes (circles)\n    const nodes = svg.selectAll(\".workflow-node\")\n        .data(agentWorkflow.nodes)\n        .enter()\n        .append(\"g\")\n        .attr(\"class\", d => `workflow-node ${d.id}`)\n        .attr(\"transform\", d => `translate(${d.x}, ${d.y})`);\n\n    // Add node circles for agents\n    nodes.append(\"circle\")\n        .attr(\"r\", 35);\n\n    // Add node labels for agents\n    
nodes.append(\"text\")\n        .attr(\"class\", \"workflow-label\")\n        .attr(\"dy\", \".35em\")\n        .text(d => d.label);\n\n    // Add non-agent labels (input/output)\n    const textLabels = svg.selectAll(\".workflow-text\")\n        .data(agentWorkflow.labels)\n        .enter()\n        .append(\"g\")\n        .attr(\"class\", d => `workflow-text ${d.id}`)\n        .attr(\"transform\", d => `translate(${d.x}, ${d.y})`);\n    \n    // Add text for non-agent nodes\n    textLabels.append(\"text\")\n        .attr(\"class\", \"workflow-text-label\")\n        .attr(\"dy\", \".35em\")\n        .attr(\"text-anchor\", \"middle\")\n        .style(\"font-size\", \"14px\")\n        .style(\"font-weight\", \"bold\")\n        .style(\"fill\", \"#444\")\n        .text(d => d.label);\n\n    // Add event listeners to highlight nodes on hover\n    nodes.on(\"mouseover\", function() {\n        d3.select(this).style(\"opacity\", 0.8);\n    }).on(\"mouseout\", function() {\n        d3.select(this).style(\"opacity\", 1);\n    });\n    \n    // If we have a stored active agent, highlight it\n    if (currentActiveAgent) {\n        updateAgentWorkflow(currentActiveAgent);\n    }\n    \n    console.log(\"Agent workflow initialized with dimensions:\", width, \"x\", height);\n}\n\n// Ensure the workflow is initialized as soon as the document is ready\n$(document).ready(function() {\n    // Delay initialization slightly to ensure DOM is fully ready\n    setTimeout(initAgentWorkflow, 100);\n    \n    // Also handle window resize\n    $(window).on('resize', function() {\n        initAgentWorkflow();\n    });\n    \n    // Poll to ensure the graph is visible (workaround for tabs/containers that might be hidden initially)\n    let checkCount = 0;\n    const checkInterval = setInterval(function() {\n        const container = document.getElementById('agent-workflow');\n        if (container && container.clientWidth > 0 && container.clientHeight > 0) {\n            initAgentWorkflow();\n 
           clearInterval(checkInterval);\n        } else if (++checkCount >= 20) { // Stop after 20 attempts (10 seconds)\n            clearInterval(checkInterval);\n        }\n    }, 500);\n});\n\n/**\n * Update the status visualizer with the current status.\n * \n * @param {Object} status - The status object from the server\n */\nfunction updateStatusVisualizer(status) {\n    console.log(\"Updating status visualizer with:\", status);\n    \n    // Update the agent workflow visualization\n    updateAgentWorkflow(status.active_agent);\n    \n    // If there's no active agent, show placeholder\n    if (!status.active_agent) {\n        $('#status-visualizer').html(`\n            <div class=\"text-center py-2\">\n                <p>No active agent</p>\n            </div>\n        `);\n        return;\n    }\n    \n    // Update component info and status message\n    let statusHtml = `<div class=\"text-center mb-2\">Processing with <strong>${status.active_agent}</strong></div>`;\n    \n    if (status.status_message) {\n        statusHtml += `<div class=\"alert alert-info py-2 mb-2\">${status.status_message}</div>`;\n    }\n    \n    if (status.current_component) {\n        statusHtml += `\n            <div class=\"component-info\">\n                <div><strong>Current Processing Component:</strong> ${status.current_component}</div>\n                <div class=\"text-muted mt-1\"><small>Current Processing File: ${status.current_file || ''}</small></div>\n            </div>\n        `;\n    }\n    \n    $('#status-visualizer').html(statusHtml);\n}\n\n/**\n * Update the agent workflow visualization with the active agent.\n * \n * @param {string} activeAgent - The name of the active agent\n */\nfunction updateAgentWorkflow(activeAgent) {\n    // Store the active agent\n    currentActiveAgent = activeAgent;\n    \n    // Make sure the workflow is initialized \n    if ($('#agent-workflow svg').length === 0) {\n        initAgentWorkflow();\n        return; // 
The initialization will handle setting the active agent\n    }\n    \n    console.log(\"Updating agent workflow with active agent:\", activeAgent);\n    \n    // Remove active class from all nodes\n    d3.selectAll(\".workflow-node\").classed(\"active\", false);\n    \n    if (!activeAgent) {\n        return;\n    }\n    \n    // Skip non-agent entities\n    if (activeAgent.toLowerCase() === 'system' || \n        activeAgent.toLowerCase() === 'input' || \n        activeAgent.toLowerCase() === 'output') {\n        return;\n    }\n    \n    // Normalize the agent name to lowercase\n    const agentLower = activeAgent.toLowerCase();\n    \n    // Map certain agent names to our workflow nodes\n    let nodeId = null;\n    if (agentLower.includes('reader')) nodeId = 'reader';\n    else if (agentLower.includes('searcher')) nodeId = 'searcher';\n    else if (agentLower.includes('writer')) nodeId = 'writer';\n    else if (agentLower.includes('verifier')) nodeId = 'verifier';\n    \n    // Add active class to the current agent node\n    if (nodeId) {\n        const node = d3.select(`.workflow-node.${nodeId}`);\n        if (!node.empty()) {\n            node.classed(\"active\", true);\n            console.log(\"Activated node:\", nodeId);\n            // Briefly animate the node to draw attention\n            node.select(\"circle\")\n                .transition()\n                .duration(300)\n                .attr(\"r\", 40)\n                .transition()\n                .duration(300)\n                .attr(\"r\", 35);\n        } else {\n            console.warn(\"Could not find node for agent:\", activeAgent, \"mapped to:\", nodeId);\n        }\n    } else {\n        console.warn(\"Could not map agent name to a node:\", activeAgent);\n    }\n}\n\n/**\n * Update the progress information.\n * \n * @param {Object} progress - The progress object from the server\n */\nfunction updateProgress(progress) {\n    // Calculate percentage\n    const total = progress.total_components 
|| 0;\n    const processed = progress.processed_components || 0;\n    const percentage = total > 0 ? Math.floor((processed / total) * 100) : 0;\n    \n    // Update progress bar\n    $('#progress-bar').css('width', `${percentage}%`);\n    $('#progress-bar').attr('aria-valuenow', percentage);\n    $('#progress-bar').text(`${percentage}%`);\n    \n    // Update progress text\n    $('#progress-text').text(`${processed}/${total} components processed`);\n} "
  },
  {
    "path": "src/web/templates/index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>DocAgent - Docstring Generation</title>\n    \n    <!-- Bootstrap CSS -->\n    <link rel=\"stylesheet\" href=\"https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css\">\n    \n    <!-- Font Awesome -->\n    <link rel=\"stylesheet\" href=\"https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css\">\n    \n    <!-- Custom CSS -->\n    <link rel=\"stylesheet\" href=\"{{ url_for('static', filename='css/style.css') }}\">\n</head>\n<body>\n    <div class=\"container-fluid vh-100 p-0 d-flex flex-column\">\n        <!-- Header -->\n        <header class=\"bg-dark text-white p-3 d-flex justify-content-between align-items-center\">\n            <h1 class=\"h3 mb-0\">DocAgent </h1>\n            <img src=\"{{ url_for('static', filename='assets/meta_logo_white.png') }}\" alt=\"Meta Logo\" class=\"header-logo\" height=\"30\">\n        </header>\n        \n        <div class=\"row flex-grow-1 m-0\">\n            <!-- Sidebar for configuration and completeness -->\n            <div id=\"sidebar\" class=\"col-md-3 bg-light p-3 sidebar\">\n                <div id=\"config-section\">\n                    <h2 class=\"h4 mb-3\">Configuration</h2>\n                    <form id=\"config-form\">\n                        <div class=\"mb-3\">\n                            <label for=\"repo-path\" class=\"form-label\">Repository Path</label>\n                            <input type=\"text\" class=\"form-control\" id=\"repo-path\" placeholder=\"e.g., data/raw_test_repo\">\n                        </div>\n                        \n                        <div class=\"mb-3\">\n                            <label class=\"form-label\">LLM Configuration</label>\n                            <div class=\"card p-2 mb-2\">\n                                <div class=\"mb-2\">\n               
                     <label for=\"llm-type\" class=\"form-label\">Type</label>\n                                    <select class=\"form-select\" id=\"llm-type\">\n                                        <option value=\"claude\">Claude</option>\n                                        <option value=\"openai\">OpenAI</option>\n                                        <option value=\"huggingface\">HuggingFace</option>\n                                    </select>\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"llm-api-key\" class=\"form-label\">API Key</label>\n                                    <input type=\"password\" class=\"form-control\" id=\"llm-api-key\">\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"llm-model\" class=\"form-label\">Model</label>\n                                    <input type=\"text\" class=\"form-control\" id=\"llm-model\" placeholder=\"e.g., claude-3-5-haiku-latest\">\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"llm-temperature\" class=\"form-label\">Temperature</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"llm-temperature\" min=\"0\" max=\"1\" step=\"0.1\" value=\"0.1\">\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"llm-max-tokens\" class=\"form-label\">Max Tokens</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"llm-max-tokens\" value=\"4096\">\n                                </div>\n                                <div class=\"text-end\">\n                                    <button type=\"button\" class=\"btn btn-outline-primary btn-sm\" 
id=\"test-api-button\">\n                                        <i class=\"fas fa-vial\"></i> Test API\n                                    </button>\n                                </div>\n                            </div>\n                        </div>\n                        \n                        <div class=\"mb-3\">\n                            <label class=\"form-label\">Flow Control</label>\n                            <div class=\"card p-2 mb-2\">\n                                <div class=\"mb-2\">\n                                    <label for=\"max-reader-search-attempts\" class=\"form-label\">Max Reader Search Attempts</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"max-reader-search-attempts\" value=\"2\">\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"max-verifier-rejections\" class=\"form-label\">Max Verifier Rejections</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"max-verifier-rejections\" value=\"1\">\n                                </div>\n                                <div class=\"mb-2\">\n                                    <label for=\"status-sleep-time\" class=\"form-label\">Status Sleep Time (s)</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"status-sleep-time\" value=\"1\">\n                                </div>\n                            </div>\n                        </div>\n                        \n                        <div class=\"mb-3\">\n                            <label class=\"form-label\">Docstring Options</label>\n                            <div class=\"card p-2 mb-2\">\n                                <div class=\"form-check\">\n                                    <input class=\"form-check-input\" type=\"checkbox\" id=\"overwrite-docstrings\">\n                          
          <label class=\"form-check-label\" for=\"overwrite-docstrings\">\n                                        Overwrite Existing Docstrings\n                                    </label>\n                                </div>\n                            </div>\n                        </div>\n                        \n                        <button type=\"submit\" class=\"btn btn-primary w-100\" id=\"start-button\">Start Generation</button>\n                    </form>\n                </div>\n                \n                <div id=\"completeness-section\" class=\"mt-4 d-none\">\n                    <h2 class=\"h4 mb-3\">Completeness</h2>\n                    <div id=\"completeness-data\" class=\"card p-3\">\n                        <div class=\"text-center\">\n                            <div class=\"spinner-border text-primary\" role=\"status\">\n                                <span class=\"visually-hidden\">Loading...</span>\n                            </div>\n                            <p class=\"mt-2\">Loading completeness data...</p>\n                        </div>\n                    </div>\n                </div>\n            </div>\n            \n            <!-- Main content -->\n            <div id=\"main-content\" class=\"col-md-9 p-0 d-none\">\n                <div class=\"row h-100 m-0\">\n                    <!-- Left Top: Status Visualizer -->\n                    <div class=\"col-md-6 p-2 h-50\">\n                        <div class=\"card h-100\">\n                            <div class=\"card-header bg-primary text-white\">\n                                Agent Status\n                            </div>\n                            <div class=\"card-body overflow-auto d-flex flex-column\">\n                                <div id=\"agent-workflow\" class=\"flex-grow-1\">\n                                    <!-- Agent workflow visualization will be rendered here -->\n                                </div>\n                           
     <div id=\"status-visualizer\" class=\"mt-2 p-2 border-top\">\n                                    <div class=\"text-center py-2\">\n                                        <p>No active agent</p>\n                                    </div>\n                                </div>\n                            </div>\n                        </div>\n                    </div>\n                    \n                    <!-- Right Top: Repository Structure -->\n                    <div class=\"col-md-6 p-2 h-50\">\n                        <div class=\"card h-100\">\n                            <div class=\"card-header bg-primary text-white\">\n                                Repository Structure\n                            </div>\n                            <div class=\"card-body overflow-auto\">\n                                <div id=\"repo-structure\">\n                                    <div class=\"text-center py-4\">\n                                        <p>No repository selected</p>\n                                    </div>\n                                </div>\n                            </div>\n                        </div>\n                    </div>\n                    \n                    <!-- Bottom: Logs -->\n                    <div class=\"col-md-12 p-2 h-50\">\n                        <div class=\"card h-100\">\n                            <div class=\"card-header bg-primary text-white d-flex justify-content-between align-items-center\">\n                                <span>Logs</span>\n                                <small id=\"progress-time\" class=\"text-white\">Elapsed: 00:00</small>\n                            </div>\n                            <div class=\"card-body d-flex flex-column\">\n                                <div id=\"log-container\" class=\"flex-grow-1 overflow-auto bg-dark text-light p-2 font-monospace\">\n                                    <div id=\"log-content\">\n                                        
<!-- Log messages will be appended here -->\n                                    </div>\n                                </div>\n                            </div>\n                        </div>\n                    </div>\n                </div>\n            </div>\n        </div>\n    </div>\n\n    <!-- API Test Modal -->\n    <div class=\"modal fade\" id=\"api-test-modal\" tabindex=\"-1\" aria-labelledby=\"api-test-modal-label\" aria-hidden=\"true\">\n        <div class=\"modal-dialog\">\n            <div class=\"modal-content\">\n                <div class=\"modal-header\">\n                    <h5 class=\"modal-title\" id=\"api-test-modal-label\">API Test Result</h5>\n                    <button type=\"button\" class=\"btn-close\" data-bs-dismiss=\"modal\" aria-label=\"Close\"></button>\n                </div>\n                <div class=\"modal-body\" id=\"api-test-result\">\n                    <div class=\"text-center\">\n                        <div class=\"spinner-border text-primary\" role=\"status\">\n                            <span class=\"visually-hidden\">Testing API...</span>\n                        </div>\n                        <p class=\"mt-2\">Testing API connection...</p>\n                    </div>\n                </div>\n                <div class=\"modal-footer\">\n                    <button type=\"button\" class=\"btn btn-secondary\" data-bs-dismiss=\"modal\">Close</button>\n                </div>\n            </div>\n        </div>\n    </div>\n    \n    <!-- Bootstrap and jQuery -->\n    <script src=\"https://code.jquery.com/jquery-3.6.0.min.js\"></script>\n    <script src=\"https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js\"></script>\n    \n    <!-- Socket.IO -->\n    <script src=\"https://cdn.socket.io/4.6.0/socket.io.min.js\"></script>\n    \n    <!-- D3.js for visualization -->\n    <script src=\"https://d3js.org/d3.v7.min.js\"></script>\n    \n    <!-- Custom JavaScript -->\n    <script src=\"{{ 
url_for('static', filename='js/config.js') }}\"></script>\n    <script src=\"{{ url_for('static', filename='js/status-visualizer.js') }}\"></script>\n    <script src=\"{{ url_for('static', filename='js/repo-structure.js') }}\"></script>\n    <script src=\"{{ url_for('static', filename='js/log-handler.js') }}\"></script>\n    <script src=\"{{ url_for('static', filename='js/completeness.js') }}\"></script>\n    <script src=\"{{ url_for('static', filename='js/main.js') }}\"></script>\n</body>\n</html> "
  },
  {
    "path": "src/web/visualization_handler.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nVisualization handler for the docstring generation web interface.\n\nThis module provides functions to collect and format data for visualization\nin the web interface, including status updates, progress tracking, and \nrepository structure visualization.\n\"\"\"\n\nimport os\nimport json\nimport sys\nimport subprocess\nfrom pathlib import Path\nfrom typing import Dict, List, Any\n\n# Singleton pattern to store current state\nclass VisualizationState:\n    \"\"\"Singleton class to store the current visualization state.\"\"\"\n    \n    _instance = None\n    \n    def __new__(cls):\n        if cls._instance is None:\n            cls._instance = super(VisualizationState, cls).__new__(cls)\n            cls._instance.status = {\n                'active_agent': None,\n                'status_message': '',\n                'current_component': '',\n                'current_file': ''\n            }\n            cls._instance.progress = {\n                'total_components': 0,\n                'processed_components': 0,\n                'current_component': '',\n                'component_status': {}\n            }\n            cls._instance.repo_structure = {\n                'tree': {},\n                'focus_path': ''\n            }\n            cls._instance.log_messages = []\n        return cls._instance\n\n# Initialize the state\nstate = VisualizationState()\n\ndef get_current_status():\n    \"\"\"\n    Get the current status of the docstring generation process.\n    \n    Returns:\n        Dictionary with the current status information\n    \"\"\"\n    return {\n        'status': state.status,\n        'progress': state.progress,\n        'repo_structure': state.repo_structure\n    }\n\ndef update_agent_status(active_agent: str, status_message: str):\n    \"\"\"\n    Update the current agent status.\n    \n    Args:\n        active_agent: The currently active agent (reader, searcher, 
writer, verifier)\n        status_message: Status message describing what the agent is doing\n    \"\"\"\n    state.status['active_agent'] = active_agent\n    state.status['status_message'] = status_message\n\ndef update_component_focus(component_path: str, file_path: str):\n    \"\"\"\n    Update the current component being processed.\n    \n    Args:\n        component_path: The path to the component being processed\n        file_path: The path to the file containing the component\n    \"\"\"\n    state.status['current_component'] = component_path\n    state.status['current_file'] = file_path\n    state.repo_structure['focus_path'] = file_path\n\ndef update_progress(total: int, processed: int, current: str, components_status: Dict[str, str]):\n    \"\"\"\n    Update the progress of the docstring generation process.\n    \n    Args:\n        total: Total number of components to process\n        processed: Number of components processed so far\n        current: The component currently being processed\n        components_status: Dictionary mapping component paths to their status\n    \"\"\"\n    state.progress['total_components'] = total\n    state.progress['processed_components'] = processed\n    state.progress['current_component'] = current\n    state.progress['component_status'] = components_status\n\ndef add_log_message(message: str):\n    \"\"\"\n    Add a log message to the visualization state.\n    \n    Args:\n        message: The log message to add\n    \"\"\"\n    state.log_messages.append(message)\n    # Keep only the latest 1000 messages\n    if len(state.log_messages) > 1000:\n        state.log_messages = state.log_messages[-1000:]\n\ndef get_repo_structure(repo_path: str) -> Dict[str, Any]:\n    \"\"\"\n    Get the structure of the repository as a tree.\n    \n    Args:\n        repo_path: Path to the repository\n        \n    Returns:\n        Dictionary representing the repository structure\n    \"\"\"\n    tree = {'name': 
os.path.basename(repo_path), 'path': repo_path, 'type': 'dir', 'children': []}\n    \n    def build_tree(path, node):\n        \"\"\"Recursively build the tree structure.\"\"\"\n        for item in os.listdir(path):\n            item_path = os.path.join(path, item)\n            \n            # Skip hidden files and directories\n            if item.startswith('.'):\n                continue\n                \n            # Skip __pycache__ and other common non-Python directories\n            if item in ['__pycache__', 'venv', 'env', '.git', '.idea', '.vscode']:\n                continue\n                \n            if os.path.isdir(item_path):\n                child = {'name': item, 'path': item_path, 'type': 'dir', 'children': []}\n                build_tree(item_path, child)\n                node['children'].append(child)\n            elif item.endswith('.py'):\n                node['children'].append({\n                    'name': item,\n                    'path': item_path,\n                    'type': 'file',\n                    'status': 'not_started'  # Possible values: not_started, in_progress, complete\n                })\n    \n    try:\n        build_tree(repo_path, tree)\n    except Exception as e:\n        print(f\"Error building repo structure: {e}\")\n    \n    state.repo_structure['tree'] = tree\n    return tree\n\ndef update_file_status(file_path: str, status: str):\n    \"\"\"\n    Update the status of a file in the repository structure.\n    \n    Args:\n        file_path: Path to the file\n        status: New status of the file (not_started, in_progress, complete)\n    \"\"\"\n    def update_status(node):\n        \"\"\"Recursively update the status of the file in the tree.\"\"\"\n        if node['type'] == 'file' and node['path'] == file_path:\n            node['status'] = status\n            return True\n            \n        if node['type'] == 'dir' and 'children' in node:\n            for child in node['children']:\n                if 
update_status(child):\n                    return True\n        \n        return False\n    \n    update_status(state.repo_structure['tree'])\n\ndef get_completeness_data(repo_path: str) -> Dict[str, Any]:\n    \"\"\"\n    Get the completeness evaluation data for the repository.\n    \n    Args:\n        repo_path: Path to the repository\n        \n    Returns:\n        Dictionary containing the completeness evaluation results\n    \"\"\"\n    try:\n        # Run the eval_completeness.py script to get the results\n        eval_script_path = Path(__file__).parent.parent.parent / 'eval_completeness.py'\n        \n        if not eval_script_path.exists():\n            return {\n                'status': 'error',\n                'message': f'Evaluation script not found at {eval_script_path}'\n            }\n        \n        # Create a simplified mock result for testing or when the script fails\n        mock_results = {\n            'status': 'success',\n            'files': []\n        }\n        \n        # Get Python files in the repository\n        all_python_files = []\n        for root, _, files in os.walk(repo_path):\n            for file in files:\n                if file.endswith('.py'):\n                    file_path = os.path.join(root, file)\n                    rel_path = os.path.relpath(file_path, repo_path)\n                    \n                    # Count functions and classes with simple parsing\n                    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:\n                        content = f.read()\n                    \n                    # Simple counting of functions and classes\n                    functions = []\n                    classes = []\n                    \n                    function_count = content.count('def ')\n                    class_count = content.count('class ')\n                    \n                    # Simple docstring check (very basic)\n                    doc_count = 
content.count('\"\"\"') // 2  # Rough estimate\n                    \n                    # Create mock function and class objects\n                    for i in range(function_count):\n                        has_doc = i < doc_count\n                        functions.append({\n                            'name': f'function_{i}',\n                            'has_docstring': has_doc\n                        })\n                    \n                    for i in range(class_count):\n                        has_doc = i < (doc_count - function_count if doc_count > function_count else 0)\n                        classes.append({\n                            'name': f'class_{i}',\n                            'has_docstring': has_doc\n                        })\n                    \n                    mock_results['files'].append({\n                        'file': rel_path,\n                        'functions': functions,\n                        'classes': classes\n                    })\n        \n        # Try to run the actual script\n        try:\n            cmd = [sys.executable, str(eval_script_path), '--repo-path', repo_path]\n            result = subprocess.run(\n                cmd,\n                capture_output=True,\n                text=True,\n                timeout=30  # Add timeout to prevent hanging\n            )\n            \n            if result.returncode == 0 and result.stdout.strip():\n                try:\n                    data = json.loads(result.stdout)\n                    if 'files' in data and isinstance(data['files'], list):\n                        return {\n                            'status': 'success',\n                            'data': data\n                        }\n                except json.JSONDecodeError:\n                    pass  # Fall back to mock data\n            \n            # If script execution fails, use mock data but log the error\n            print(f\"Warning: Using mock completeness data. 
Script error: {result.stderr}\")\n            return {\n                'status': 'success',\n                'data': mock_results\n            }\n            \n        except (subprocess.TimeoutExpired, subprocess.SubprocessError) as e:\n            print(f\"Error running completeness script: {e}\")\n            # Fall back to mock data\n            return {\n                'status': 'success',\n                'data': mock_results\n            }\n    \n    except Exception as e:\n        print(f\"Error evaluating completeness: {e}\")\n        return {\n            'status': 'error',\n            'message': f'Error evaluating completeness: {str(e)}'\n        } "
  },
  {
    "path": "src/web_eval/README.md",
    "content": "# DocAgent - Docstring Evaluation System\n\nA web application for evaluating the quality of Python docstrings in your codebase, providing objective metrics and actionable feedback.\n\n\n## Overview\n\nDocAgent is a powerful tool that analyzes Python docstrings in a repository and evaluates them based on two key metrics:\n\n1. **Completeness**: Automatically checks if docstrings contain all required components (summary, description, arguments, returns, etc.)\n2. **Helpfulness**: Uses LLM-based evaluation to assess how helpful and informative each docstring component is on a scale of 1-5\n\nThe system provides an intuitive web interface for configuring evaluation settings, viewing results, and getting actionable feedback to improve your codebase documentation.\n\n## Features\n\n- **Configuration Interface**: User-friendly setup for LLM API (OpenAI or Claude) and repository path\n- **API Connection Testing**: Verify API credentials before running evaluations\n- **Automated Completeness Evaluation**: Scan all Python files in a repository to check for required docstring components\n- **Interactive Results Dashboard**: View completeness scores for all classes and functions with detailed breakdowns\n- **On-demand Helpfulness Assessment**: Use LLM-powered evaluation for specific docstring components\n- **Visual Status Indicators**: Clear visual feedback for required vs. 
optional components and their quality\n- **Component-specific Evaluations**: Different criteria for evaluating summaries, descriptions, parameters, etc.\n- **Refresh Functionality**: Re-run evaluation after making code changes\n- **Detailed Explanations**: Get specific feedback on why a component received its score and how to improve it\n\n## System Architecture\n\nDocAgent's web evaluation system consists of several key components:\n\n```\nsrc/web_eval/\n│\n├── app.py                     # Main Flask application \n├── helpers.py                 # Utility functions (parsing, extraction, etc.)\n├── requirements.txt           # Python dependencies\n├── start_server.sh            # Convenience script for starting the server\n├── test_docstring_parser.py   # Tests for the docstring parser\n│\n├── templates/                 # HTML templates\n│   ├── index.html             # Configuration page\n│   └── results.html           # Results display page\n│\n└── static/                    # Static assets\n    ├── css/                   # CSS stylesheets\n    ├── js/                    # JavaScript files\n    └── assets/                # Images and other assets\n```\n\nThe system follows a Model-View-Controller architecture:\n\n- **Model**: Evaluation logic in the imported evaluator modules and parsing functions in helpers.py\n- **View**: HTML templates with Jinja2 for rendering the UI\n- **Controller**: Flask routes in app.py that handle requests and connect the model with views\n\nThe application integrates with two key external components:\n\n1. **DocAgent Evaluator Modules**: Core evaluation logic for assessing docstring quality\n2. **LLM APIs**: OpenAI or Anthropic Claude for helpfulness evaluation\n\n"
  },
  {
    "path": "src/web_eval/app.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nimport os\nimport sys\nimport ast\nimport json\nimport argparse\nfrom flask import Flask, render_template, request, jsonify, redirect, url_for\nfrom typing import Dict, Any, List\n\n# Add parent directory to path to import from src\nsys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))\n\n# Import evaluation modules\nfrom evaluator.completeness import ClassCompletenessEvaluator, FunctionCompletenessEvaluator\nfrom evaluator.helpfulness_summary import DocstringSummaryEvaluator\nfrom evaluator.helpfulness_description import DocstringDescriptionEvaluator\n# from evaluator.helpfulness_arguments import DocstringArgumentEvaluator\nfrom evaluator.helpfulness_parameters import DocstringParametersEvaluator\nfrom evaluator.helpfulness_attributes import DocstringAttributeEvaluator\n# from evaluator.helpfulness_examples import DocstringExampleEvaluator\n\n# Import our helpers\nfrom src.web_eval.helpers import parse_llm_score_from_text, extract_docstring_component\n\n# Initialize Flask app\napp = Flask(__name__)\napp.config['SECRET_KEY'] = 'DocAgent-evaluation-system'\n\n# Add template filter for extracting docstring components\n@app.template_filter('extract_component')\ndef extract_component_filter(docstring, component):\n    \"\"\"\n    Jinja2 template filter for extracting docstring components.\n    \n    Args:\n        docstring: The full docstring\n        component: The component to extract (summary, description, etc.)\n        \n    Returns:\n        The extracted component, or empty string if not found\n    \"\"\"\n    result = extract_docstring_component(docstring, component)\n    return result or \"\"\n\n# Global variable to store evaluation results\nevaluation_results = {}\nconfig = {}\n\n@app.route('/')\ndef index():\n    \"\"\"\n    Renders the configuration page (entry page).\n    \n    This page allows users to configure LLM settings and repository path.\n    
\"\"\"\n    return render_template('index.html')\n\n@app.route('/test_api', methods=['POST'])\ndef test_api():\n    \"\"\"\n    Tests the LLM API connection by sending a simple query.\n    \n    Returns:\n        JSON response with success/failure and any error message\n    \"\"\"\n    data = request.get_json()\n    \n    # Save config for later use\n    global config\n    config = {\n        'llm_type': data.get('llm_type'),\n        'api_key': data.get('api_key'),\n        'model': data.get('model'),\n        'temperature': float(data.get('temperature', 0.1)),\n        'max_output_tokens': int(data.get('max_output_tokens', 4096))\n    }\n    \n    # Test API connection based on LLM type\n    try:\n        if config['llm_type'] == 'openai':\n            import openai\n            openai.api_key = config['api_key']\n            response = openai.chat.completions.create(\n                model=config['model'],\n                messages=[{\"role\": \"user\", \"content\": \"Who are you?\"}],\n                temperature=config['temperature'],\n                max_tokens=100\n            )\n            return jsonify({\"success\": True, \"response\": response.choices[0].message.content})\n            \n        elif config['llm_type'] == 'claude':\n            from anthropic import Anthropic\n            client = Anthropic(api_key=config['api_key'])\n            response = client.messages.create(\n                model=config['model'],\n                max_tokens=100,\n                temperature=config['temperature'],\n                messages=[{\"role\": \"user\", \"content\": \"Who are you?\"}]\n            )\n            return jsonify({\"success\": True, \"response\": response.content[0].text})\n            \n        else:\n            return jsonify({\"success\": False, \"error\": f\"Unsupported LLM type: {config['llm_type']}\"})\n            \n    except Exception as e:\n        return jsonify({\"success\": False, \"error\": str(e)})\n\n@app.route('/evaluate', 
methods=['POST'])\ndef evaluate():\n    \"\"\"\n    Initiates the evaluation process for the specified repository.\n    \n    Returns:\n        Redirects to the results page\n    \"\"\"\n    data = request.get_json()\n    repo_path = data.get('repo_path')\n    \n    if not os.path.exists(repo_path):\n        return jsonify({\"success\": False, \"error\": f\"Repository path does not exist: {repo_path}\"})\n    \n    try:\n        # Start evaluation\n        global evaluation_results\n        evaluation_results = process_directory(repo_path)\n        return jsonify({\"success\": True, \"redirect\": url_for('results')})\n    except Exception as e:\n        return jsonify({\"success\": False, \"error\": str(e)})\n\n@app.route('/results')\ndef results():\n    \"\"\"\n    Renders the evaluation results page.\n    \"\"\"\n    return render_template('results.html', results=evaluation_results)\n\n@app.route('/evaluate_helpfulness', methods=['POST'])\ndef evaluate_helpfulness():\n    \"\"\"\n    Evaluates the helpfulness of a specific docstring component.\n    \n    Returns:\n        JSON response with the helpfulness score\n    \"\"\"\n    data = request.get_json()\n    component_type = data.get('component_type')  # class or function\n    component_name = data.get('component_name')\n    docstring_part = data.get('docstring_part')  # summary, description, etc.\n    docstring_content = data.get('docstring_content')\n    signature = data.get('signature', '')\n    \n    try:\n        # Select appropriate evaluator based on docstring part\n        evaluator = None\n        if docstring_part == 'summary':\n            evaluator = DocstringSummaryEvaluator()\n        elif docstring_part == 'description':\n            evaluator = DocstringDescriptionEvaluator()\n        # elif docstring_part == 'arguments':\n        #     evaluator = DocstringArgumentsEvaluator()\n        elif docstring_part == 'parameters':\n            evaluator = DocstringParametersEvaluator()\n        elif 
docstring_part == 'attributes':\n            evaluator = DocstringAttributeEvaluator()\n        elif docstring_part == 'examples':\n            evaluator = DocstringExamplesEvaluator()\n        else:\n            return jsonify({\"success\": False, \"error\": f\"Unsupported docstring part: {docstring_part}\"})\n        \n        # Generate prompt\n        prompt = evaluator.get_evaluation_prompt(signature, docstring_content)\n        \n        # Call LLM API based on configured type\n        if config['llm_type'] == 'openai':\n            import openai\n            openai.api_key = config['api_key']\n            response = openai.chat.completions.create(\n                model=config['model'],\n                messages=[{\"role\": \"user\", \"content\": prompt}],\n                temperature=config['temperature'],\n                max_tokens=config['max_output_tokens']\n            )\n            llm_response = response.choices[0].message.content\n            \n        elif config['llm_type'] == 'claude':\n            from anthropic import Anthropic\n            client = Anthropic(api_key=config['api_key'])\n            response = client.messages.create(\n                model=config['model'],\n                max_tokens=config['max_output_tokens'],\n                temperature=config['temperature'],\n                messages=[{\"role\": \"user\", \"content\": prompt}]\n            )\n            llm_response = response.content[0].text\n            \n        else:\n            return jsonify({\"success\": False, \"error\": f\"Unsupported LLM type: {config['llm_type']}\"})\n        \n        # Parse LLM response to get score\n        score, explanation = parse_llm_score_from_text(llm_response)\n        \n        # Update evaluation results with helpfulness score\n        if component_type == 'class':\n            for cls in evaluation_results['classes']:\n                if cls['name'] == component_name:\n                    if 'helpfulness_scores' not in cls:\n    
                    cls['helpfulness_scores'] = {}\n                    cls['helpfulness_scores'][docstring_part] = {\n                        'score': score,\n                        'explanation': explanation\n                    }\n                    break\n        else:  # function or method\n            for func in evaluation_results['functions']:\n                if func['name'] == component_name:\n                    if 'helpfulness_scores' not in func:\n                        func['helpfulness_scores'] = {}\n                    func['helpfulness_scores'][docstring_part] = {\n                        'score': score,\n                        'explanation': explanation\n                    }\n                    break\n        \n        return jsonify({\n            \"success\": True, \n            \"score\": score, \n            \"explanation\": explanation\n        })\n        \n    except Exception as e:\n        return jsonify({\"success\": False, \"error\": str(e)})\n\n@app.route('/refresh', methods=['POST'])\ndef refresh_evaluation():\n    \"\"\"\n    Refreshes the completeness evaluation results.\n    \n    Returns:\n        Redirects to the updated results page\n    \"\"\"\n    data = request.get_json()\n    repo_path = data.get('repo_path')\n    \n    try:\n        # Re-run evaluation\n        global evaluation_results\n        evaluation_results = process_directory(repo_path)\n        return jsonify({\"success\": True})\n    except Exception as e:\n        return jsonify({\"success\": False, \"error\": str(e)})\n\ndef run_docstring_tests(source_file: str) -> Dict[str, Any]:\n    \"\"\"\n    Run comprehensive docstring evaluation tests on a Python source file.\n    \n    This function reads a Python file and evaluates docstrings for all classes,\n    functions, and methods found within. 
It provides detailed evaluation results.\n    \n    Args:\n        source_file: Path to the Python file to analyze\n        \n    Returns:\n        Dictionary containing evaluation results for each found element\n    \"\"\"\n    with open(source_file, 'r', encoding='utf-8') as f:\n        source = f.read()\n    \n    try:\n        tree = ast.parse(source)\n    except SyntaxError as e:\n        return {\n            'status': 'error',\n            'message': f'Failed to parse {source_file}: {str(e)}'\n        }\n    \n    results = {\n        'status': 'success',\n        'file': source_file,\n        'classes': [],\n        'functions': [],\n        'debug_info': {}\n    }\n    \n    # Instantiate evaluators\n    class_evaluator = ClassCompletenessEvaluator()\n    func_evaluator = FunctionCompletenessEvaluator()\n    \n    # Process all nodes in the AST\n    for node in ast.iter_child_nodes(tree):\n        if isinstance(node, ast.ClassDef):\n            # Get actual docstring content\n            class_docstring = ast.get_docstring(node) or \"\"\n            \n            class_result = {\n                'name': node.name,\n                'type': 'class',\n                'docstring': class_docstring,\n                'signature': f\"class {node.name}:\",\n                'completeness_score': class_evaluator.evaluate(node),\n                'completeness_elements': class_evaluator.element_scores.copy(),\n                'element_required': class_evaluator.element_required.copy()\n            }\n            results['classes'].append(class_result)\n            \n            # Evaluate methods within the class\n            for method in [n for n in ast.iter_child_nodes(node) if isinstance(n, ast.FunctionDef)]:\n                # Skip __init__ methods for display purposes\n                if method.name == '__init__':\n                    continue\n                \n                # Get actual method docstring content\n                method_docstring = 
ast.get_docstring(method) or \"\"\n                \n                method_result = {\n                    'name': f\"{node.name}.{method.name}\",\n                    'type': 'method',\n                    'docstring': method_docstring,\n                    'signature': f\"def {method.name}():\",  # Simplified signature\n                    'completeness_score': func_evaluator.evaluate(method),\n                    'completeness_elements': func_evaluator.element_scores.copy(),\n                    'element_required': func_evaluator.element_required.copy()\n                }\n                results['functions'].append(method_result)\n                \n        elif isinstance(node, ast.FunctionDef):\n            # Get actual function docstring content\n            func_docstring = ast.get_docstring(node) or \"\"\n            \n            # Only process top-level functions\n            func_result = {\n                'name': node.name,\n                'type': 'function',\n                'docstring': func_docstring,\n                'signature': f\"def {node.name}():\",  # Simplified signature\n                'completeness_score': func_evaluator.evaluate(node),\n                'completeness_elements': func_evaluator.element_scores.copy(),\n                'element_required': func_evaluator.element_required.copy()\n            }\n            results['functions'].append(func_result)\n    \n    # Add overall statistics\n    results['statistics'] = {\n        'total_classes': len(results['classes']),\n        'total_functions': len(results['functions']),\n        'average_class_score': sum(r['completeness_score'] for r in results['classes']) / \n                             max(1, len(results['classes'])),\n        'average_function_score': sum(r['completeness_score'] for r in results['functions']) / \n                                max(1, len(results['functions']))\n    }\n    \n    return results\n\ndef process_directory(directory_path: str) -> Dict[str, 
Any]:\n    \"\"\"\n    Process all Python files in a directory and its subdirectories.\n    \n    Args:\n        directory_path: Path to the directory to analyze\n        \n    Returns:\n        Dictionary containing aggregated evaluation results for all files\n    \"\"\"\n    # Initialize aggregate results\n    aggregate_results = {\n        'status': 'success',\n        'directory': directory_path,\n        'files': [],\n        'file_results': [],\n        'classes': [],\n        'functions': [],\n        'statistics': {\n            'total_files': 0,\n            'successful_files': 0,\n            'failed_files': 0,\n            'total_classes': 0,\n            'total_functions': 0,\n            'average_class_score': 0.0,\n            'average_function_score': 0.0,\n            'overall_average_score': 0.0\n        }\n    }\n    \n    # Find all Python files recursively\n    python_files = []\n    for root, _, files in os.walk(directory_path):\n        for file in files:\n            if file.endswith('.py'):\n                python_files.append(os.path.join(root, file))\n    \n    if not python_files:\n        aggregate_results['status'] = 'error'\n        aggregate_results['message'] = f'No Python files found in {directory_path}'\n        return aggregate_results\n    \n    aggregate_results['statistics']['total_files'] = len(python_files)\n    \n    # Process each Python file\n    all_class_scores = []\n    all_function_scores = []\n    \n    for py_file in python_files:\n        file_result = run_docstring_tests(py_file)\n        \n        if file_result['status'] == 'success':\n            aggregate_results['statistics']['successful_files'] = aggregate_results['statistics'].get('successful_files', 0) + 1\n            aggregate_results['file_results'].append(file_result)\n            aggregate_results['files'].append(py_file)\n            \n            # Accumulate classes and functions with file path context\n            for class_result in 
file_result['classes']:\n                class_result['file'] = py_file\n                aggregate_results['classes'].append(class_result)\n                all_class_scores.append(class_result['completeness_score'])\n            \n            for func_result in file_result['functions']:\n                func_result['file'] = py_file\n                aggregate_results['functions'].append(func_result)\n                all_function_scores.append(func_result['completeness_score'])\n                \n            # Update statistics\n            aggregate_results['statistics']['total_classes'] += file_result['statistics']['total_classes']\n            aggregate_results['statistics']['total_functions'] += file_result['statistics']['total_functions']\n        else:\n            aggregate_results['statistics']['failed_files'] = aggregate_results['statistics'].get('failed_files', 0) + 1\n    \n    # Calculate average scores\n    if all_class_scores:\n        aggregate_results['statistics']['average_class_score'] = sum(all_class_scores) / len(all_class_scores)\n    \n    if all_function_scores:\n        aggregate_results['statistics']['average_function_score'] = sum(all_function_scores) / len(all_function_scores)\n    \n    # Calculate overall average score (classes and functions combined)\n    all_scores = all_class_scores + all_function_scores\n    if all_scores:\n        aggregate_results['statistics']['overall_average_score'] = sum(all_scores) / len(all_scores)\n    \n    return aggregate_results\n\nif __name__ == '__main__':\n    # Parse command line arguments\n    parser = argparse.ArgumentParser(description='Docstring Evaluation Web App')\n    parser.add_argument('--host', type=str, default='0.0.0.0', \n                        help='Host address to bind to (default: 0.0.0.0 - accessible from outside)')\n    parser.add_argument('--port', type=int, default=5000, \n                        help='Port to run the server on (default: 5000)')\n    
parser.add_argument('--debug', action='store_true', \n                        help='Run in debug mode (default: False)')\n    \n    args = parser.parse_args()\n    \n    # Print access information\n    if args.host == '0.0.0.0':\n        print(f\"\\n🚀 DocAgent web server starting!\")\n        print(f\"💻 Local access: http://localhost:{args.port}\")\n        print(f\"🌐 Network access: http://<server-ip>:{args.port}\")\n        print(f\"   (Replace <server-ip> with your server's IP address)\")\n        if args.debug:\n            print(f\"⚠️  Running in debug mode - not recommended for production use\")\n        print(\"\\nPress CTRL+C to stop the server\\n\")\n    \n    # Run the Flask app\n    app.run(host=args.host, port=args.port, debug=args.debug) "
  },
  {
    "path": "src/web_eval/helpers.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nHelper functions for the DocAgent web application\n\"\"\"\n\nimport re\nfrom typing import Tuple, Optional, Dict, List\n\ndef parse_llm_score_from_text(text: str) -> Tuple[int, str]:\n    \"\"\"\n    Parse score and explanation from LLM response text.\n    \n    Args:\n        text: The raw LLM response text\n        \n    Returns:\n        Tuple containing (score, explanation)\n    \"\"\"\n    # Try to extract score from <score> tags\n    score_match = re.search(r'<score>(\\d+)</score>', text)\n    if score_match:\n        score = int(score_match.group(1))\n    else:\n        # Try looking for the score in various formats\n        score_patterns = [\n            r'score:?\\s*(\\d+)/5',\n            r'score:?\\s*(\\d+)',\n            r'rating:?\\s*(\\d+)/5',\n            r'rating:?\\s*(\\d+)',\n            r'(\\d+)/5',\n            r'I would rate this as a (\\d+)',\n            r'I would give this a (\\d+)'\n        ]\n        \n        for pattern in score_patterns:\n            match = re.search(pattern, text, re.IGNORECASE)\n            if match:\n                score = int(match.group(1))\n                break\n        else:\n            # Default score if we can't find one\n            score = 3\n    \n    # Limit score to 1-5 range\n    score = max(1, min(5, score))\n    \n    # Extract explanation (everything except the score tags)\n    explanation = re.sub(r'<score>\\d+</score>', '', text).strip()\n    \n    # If explanation is very long, truncate it\n    if len(explanation) > 500:\n        explanation = explanation[:497] + \"...\"\n    \n    return score, explanation\n\nfrom typing import Dict\n\ndef parse_google_style_docstring(docstring: str) -> Dict[str, str]:\n    \"\"\"\n    A robust parser for Google-style docstrings that handles multiple possible\n    labels for each section.\n    \n    Args:\n        docstring: The docstring to parse\n        \n    Returns:\n        
Dictionary with canonical section names as keys and their content as values\n    \"\"\"\n    # If docstring is empty or None, return empty sections\n    if not docstring:\n        return {key: \"\" for key in ['summary', 'description', 'parameters', 'attributes', 'returns', 'raises', 'examples']}\n\n    # Define all recognized sections. The key is the canonical name (lowercase).\n    # The value is a set of synonyms (also lowercase).\n    SECTION_LABELS = {\n        \"summary\":        {\"summary:\", \"brief:\", \"overview:\"},\n        \"description\":    {\"description:\", \"desc:\", \"details:\", \"long description:\"},\n        \"parameters\":     {\"parameters:\", \"params:\", \"args:\", \"arguments:\", \"keyword args:\", \"keyword arguments:\", \"**kwargs:\"},\n        \"attributes\":     {\"attributes:\", \"members:\", \"member variables:\", \"instance variables:\", \"properties:\", \"vars:\", \"variables:\"},\n        \"returns\":        {\"returns:\", \"return:\", \"return value:\", \"return values:\"},\n        \"raises\":         {\"raises:\", \"exceptions:\", \"throws:\", \"raise:\", \"exception:\", \"throw:\"},\n        \"examples\":       {\"example:\", \"examples:\", \"usage:\", \"usage example:\", \"usage examples:\", \"example usage:\"},\n    }\n\n    # Prepare a dictionary to hold the parsed content for each canonical key\n    parsed_content = {key: [] for key in SECTION_LABELS.keys()}\n\n    # Split by lines; if docstring uses Windows line endings, .splitlines() handles that gracefully\n    lines = docstring.strip().splitlines()\n\n    # -- 1) Fallback: no explicit sections at all in the entire docstring --\n    #    If no recognized label appears anywhere, treat the first line as summary, rest as description.\n    has_section_labels = False\n    for line in lines:\n        line_lower = line.strip().lower()\n        for labels in SECTION_LABELS.values():\n            for label in labels:\n                if line_lower.startswith(label):\n         
           has_section_labels = True\n                    break\n            if has_section_labels:\n                break\n        if has_section_labels:\n            break\n            \n    if len(lines) > 0 and not has_section_labels:\n        parsed_content[\"summary\"] = [lines[0]]\n        if len(lines) > 1:\n            parsed_content[\"description\"] = lines[1:]\n        # Convert lists to single strings\n        return {key: \"\\n\".join(value).strip() for key, value in parsed_content.items()}\n\n    # We'll track the current section as we parse line by line\n    current_section = None\n\n    # -- 2) Partial Fallback for the first line only --\n    #    If the first line doesn't match any known label, treat it as summary and then\n    #    switch to \"description\" until an explicit label is found.\n    first_line = lines[0].strip().lower() if lines else \"\"\n    if not any(first_line.startswith(label) for labels in SECTION_LABELS.values() for label in labels):\n        if lines:\n            # Save first line as summary\n            parsed_content[\"summary\"] = [lines[0]]\n            # Make the current section \"description\"\n            current_section = \"description\"\n            lines = lines[1:]  # We'll handle the rest below\n\n    # -- 3) Main Parsing Loop --\n    for line in lines:\n        trimmed_line = line.strip().lower()\n        matched_section = None\n\n        # Check if this line begins with a known label (case-insensitive)\n        # If so, we identify that as a new section.\n        for canonical_name, synonyms in SECTION_LABELS.items():\n            for synonym in synonyms:\n                if trimmed_line.startswith(synonym):\n                    matched_section = canonical_name\n                    # Extract leftover text on the same line, after the label\n                    leftover = line.strip()[len(synonym):].strip()\n                    if leftover:\n                        
parsed_content[matched_section].append(leftover)\n                    break\n            if matched_section:\n                break\n\n        if matched_section is not None:\n            # We found a new section header on this line\n            current_section = matched_section\n            # No need to append the header line to content - we've already handled any content after the label\n        else:\n            # Otherwise, continue appending lines to the current section\n            if current_section is not None:\n                parsed_content[current_section].append(line)\n\n    # -- 4) Convert list of lines to single string, preserving line breaks --\n    for section in parsed_content:\n        parsed_content[section] = \"\\n\".join(parsed_content[section]).strip()\n\n    return parsed_content\n\n\ndef extract_docstring_component(docstring: str, component: str) -> Optional[str]:\n    \"\"\"\n    Extract a specific component from a docstring using the robust parser.\n    \n    Args:\n        docstring: The full docstring text\n        component: The component to extract (summary, description, etc.)\n        \n    Returns:\n        The extracted component text, or None if not found\n    \"\"\"\n    if not docstring:\n        return None\n        \n    # Map component name to canonical name used in the parser\n    component_map = {\n        'summary': 'summary',\n        'description': 'description',\n        # 'arguments': 'parameters',\n        'params': 'parameters',\n        'parameters': 'parameters',\n        'attributes': 'attributes',\n        'returns': 'returns',\n        'raises': 'raises',\n        'examples': 'examples'\n    }\n    \n    canonical_component = component_map.get(component.lower(), component.lower())\n    \n    # Parse the docstring\n    parsed = parse_google_style_docstring(docstring)\n    \n    # Return the requested component\n    if canonical_component in parsed:\n        return parsed[canonical_component] or None\n    \n    
return None "
  },
  {
    "path": "src/web_eval/requirements.txt",
    "content": "flask>=2.0.0\nopenai>=1.0.0\nanthropic>=0.5.0\ntabulate>=0.8.0 "
  },
  {
    "path": "src/web_eval/start_server.sh",
    "content": "#!/bin/bash\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\n# Default values\nHOST=\"0.0.0.0\"\nPORT=\"8080\"\nDEBUG=\"\"\n\n# Show help function\nshow_help() {\n  echo \"Usage: ./start_server.sh [options]\"\n  echo \"\"\n  echo \"Options:\"\n  echo \"  -h, --host HOST     Host address to bind to (default: 0.0.0.0)\"\n  echo \"  -p, --port PORT     Port to run the server on (default: 8080)\"\n  echo \"  -d, --debug         Run in debug mode\"\n  echo \"  --help              Show this help message\"\n  echo \"\"\n  echo \"Examples:\"\n  echo \"  ./start_server.sh                   # Run on default host:port (0.0.0.0:8080)\"\n  echo \"  ./start_server.sh -p 9090           # Run on port 9090\"\n  echo \"  ./start_server.sh -h 127.0.0.1      # Run on localhost only\"\n  echo \"  ./start_server.sh -d                # Run in debug mode\"\n  echo \"\"\n}\n\n# Parse command line arguments\nwhile [[ $# -gt 0 ]]; do\n  case \"$1\" in\n    -h|--host)\n      HOST=\"$2\"\n      shift 2\n      ;;\n    -p|--port)\n      PORT=\"$2\"\n      shift 2\n      ;;\n    -d|--debug)\n      DEBUG=\"--debug\"\n      shift\n      ;;\n    --help)\n      show_help\n      exit 0\n      ;;\n    *)\n      echo \"Unknown option: $1\"\n      show_help\n      exit 1\n      ;;\n  esac\ndone\n\n# Display startup message\necho \"Starting DocAgent Web Server...\"\necho \"Host: $HOST\"\necho \"Port: $PORT\"\nif [ -n \"$DEBUG\" ]; then\n  echo \"Mode: DEBUG (not recommended for production)\"\nelse\n  echo \"Mode: Production\"\nfi\necho \"\"\n\n# Run the Flask app with the specified options\npython app.py --host \"$HOST\" --port \"$PORT\" $DEBUG "
  },
  {
    "path": "src/web_eval/static/css/style.css",
    "content": "/* Copyright (c) Meta Platforms, Inc. and affiliates */\n/* DocAgent - Docstring Evaluation System Styles */\n\n/* General Styles */\nbody {\n    background-color: #f8f9fa;\n}\n\n.card {\n    border-radius: 0.5rem;\n    overflow: hidden;\n}\n\n.card-header {\n    border-bottom: none;\n}\n\n/* Table Styles */\n.table {\n    font-size: 0.9rem;\n}\n\n.table th {\n    font-weight: 600;\n}\n\n.table-responsive {\n    max-height: 70vh;\n    overflow-y: auto;\n}\n\n/* Button Styles */\n.evaluate-btn {\n    font-size: 0.75rem;\n    padding: 0.2rem 0.5rem;\n}\n\n/* Modal Styles */\n.modal-content {\n    border-radius: 0.5rem;\n    overflow: hidden;\n}\n\n.modal-header {\n    border-bottom: none;\n}\n\n.modal-footer {\n    border-top: none;\n}\n\n/* Docstring content display */\npre#docstringContent {\n    max-height: 300px;\n    overflow-y: auto;\n    font-size: 0.9rem;\n    white-space: pre-wrap;\n}\n\n/* Badges */\n.badge {\n    font-weight: 500;\n    padding: 0.35rem 0.65rem;\n}\n\n/* Alert Styles */\n.alert {\n    border-radius: 0.5rem;\n}\n\n/* Responsive Adjustments */\n@media (max-width: 992px) {\n    .table {\n        font-size: 0.8rem;\n    }\n    \n    .evaluate-btn {\n        font-size: 0.7rem;\n        padding: 0.15rem 0.4rem;\n    }\n    \n    .badge {\n        font-size: 0.7rem;\n        padding: 0.25rem 0.5rem;\n    }\n}\n\n/* Custom scrollbar */\n::-webkit-scrollbar {\n    width: 8px;\n    height: 8px;\n}\n\n::-webkit-scrollbar-track {\n    background: #f1f1f1;\n    border-radius: 4px;\n}\n\n::-webkit-scrollbar-thumb {\n    background: #888;\n    border-radius: 4px;\n}\n\n::-webkit-scrollbar-thumb:hover {\n    background: #555;\n} "
  },
  {
    "path": "src/web_eval/templates/index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>DocAgent - Docstring Evaluation System</title>\n    <link href=\"https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css\" rel=\"stylesheet\">\n    <link rel=\"stylesheet\" href=\"{{ url_for('static', filename='css/style.css') }}\">\n</head>\n<body>\n    <div class=\"container my-5\">\n        <div class=\"row justify-content-center\">\n            <div class=\"col-md-8\">\n                <div class=\"card shadow\">\n                    <div class=\"bg-dark text-white p-3 d-flex justify-content-between align-items-center\">\n                        <h2 class=\"mb-0\">DocAgent - Docstring Evaluation System</h2>\n                        <img src=\"{{ url_for('static', filename='assets/meta_logo_white.png') }}\" alt=\"Meta Logo\" class=\"header-logo\" height=\"30\">\n                    </div>\n                    <div class=\"card-body\">\n                        <form id=\"configForm\">\n                            <h4 class=\"mb-4\">LLM Configuration</h4>\n                            \n                            <div class=\"mb-3\">\n                                <label for=\"llm_type\" class=\"form-label\">LLM Type</label>\n                                <select class=\"form-select\" id=\"llm_type\" name=\"llm_type\" required>\n                                    <option value=\"openai\">OpenAI</option>\n                                    <option value=\"claude\">Claude (Anthropic)</option>\n                                </select>\n                            </div>\n                            \n                            <div class=\"mb-3\">\n                                <label for=\"api_key\" class=\"form-label\">API Key</label>\n                                <input type=\"password\" class=\"form-control\" id=\"api_key\" name=\"api_key\" 
required>\n                            </div>\n                            \n                            <div class=\"mb-3\">\n                                <label for=\"model\" class=\"form-label\">Model</label>\n                                <input type=\"text\" class=\"form-control\" id=\"model\" name=\"model\" placeholder=\"e.g., gpt-4, claude-3-opus-20240229\" required>\n                            </div>\n                            \n                            <div class=\"row\">\n                                <div class=\"col-md-6 mb-3\">\n                                    <label for=\"temperature\" class=\"form-label\">Temperature</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"temperature\" name=\"temperature\" min=\"0\" max=\"1\" step=\"0.1\" value=\"0.1\" required>\n                                </div>\n                                \n                                <div class=\"col-md-6 mb-3\">\n                                    <label for=\"max_tokens\" class=\"form-label\">Max Tokens</label>\n                                    <input type=\"number\" class=\"form-control\" id=\"max_tokens\" name=\"max_tokens\" min=\"100\" step=\"1\" value=\"4096\" required>\n                                </div>\n                            </div>\n                            \n                            <div class=\"mb-4\">\n                                <button type=\"button\" id=\"testApiBtn\" class=\"btn btn-outline-primary\">\n                                    <span id=\"testApiSpinner\" class=\"spinner-border spinner-border-sm d-none\" role=\"status\" aria-hidden=\"true\"></span>\n                                    Test API Connection\n                                </button>\n                                <div id=\"apiTestResult\" class=\"mt-2\"></div>\n                            </div>\n                            \n                            <hr class=\"my-4\">\n                       
     \n                            <h4 class=\"mb-4\">Repository Configuration</h4>\n                            <div class=\"mb-3\">\n                                <label for=\"repo_path\" class=\"form-label\">Repository Path</label>\n                                <input type=\"text\" class=\"form-control\" id=\"repo_path\" name=\"repo_path\" placeholder=\"e.g., /path/to/repository\" required>\n                            </div>\n                            \n                            <div class=\"d-grid\">\n                                <button type=\"button\" id=\"evaluateBtn\" class=\"btn btn-primary\">\n                                    <span id=\"evaluateSpinner\" class=\"spinner-border spinner-border-sm d-none\" role=\"status\" aria-hidden=\"true\"></span>\n                                    Start Evaluation\n                                </button>\n                            </div>\n                        </form>\n                    </div>\n                </div>\n            </div>\n        </div>\n    </div>\n\n    <script src=\"https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/js/bootstrap.bundle.min.js\"></script>\n    <script>\n        document.addEventListener('DOMContentLoaded', function() {\n            // Test API Connection button\n            document.getElementById('testApiBtn').addEventListener('click', function() {\n                const testApiBtn = this;\n                const spinner = document.getElementById('testApiSpinner');\n                const resultDiv = document.getElementById('apiTestResult');\n                \n                // Get form data\n                const llmType = document.getElementById('llm_type').value;\n                const apiKey = document.getElementById('api_key').value;\n                const model = document.getElementById('model').value;\n                const temperature = document.getElementById('temperature').value;\n                const maxTokens = 
document.getElementById('max_tokens').value;\n                \n                // Validate form\n                if (!llmType || !apiKey || !model || !temperature || !maxTokens) {\n                    resultDiv.innerHTML = '<div class=\"alert alert-danger\">Please fill in all LLM configuration fields</div>';\n                    return;\n                }\n                \n                // Show spinner\n                spinner.classList.remove('d-none');\n                testApiBtn.disabled = true;\n                resultDiv.innerHTML = '';\n                \n                // Send request to test API\n                fetch('/test_api', {\n                    method: 'POST',\n                    headers: {\n                        'Content-Type': 'application/json'\n                    },\n                    body: JSON.stringify({\n                        llm_type: llmType,\n                        api_key: apiKey,\n                        model: model,\n                        temperature: temperature,\n                        max_tokens: maxTokens\n                    })\n                })\n                .then(response => response.json())\n                .then(data => {\n                    if (data.success) {\n                        resultDiv.innerHTML = `<div class=\"alert alert-success\">\n                            <strong>Success!</strong> API connection works.\n                            <p class=\"mt-2\"><strong>Response:</strong> ${data.response}</p>\n                        </div>`;\n                    } else {\n                        resultDiv.innerHTML = `<div class=\"alert alert-danger\">\n                            <strong>Error:</strong> ${data.error}\n                        </div>`;\n                    }\n                })\n                .catch(error => {\n                    resultDiv.innerHTML = `<div class=\"alert alert-danger\">\n                        <strong>Error:</strong> ${error.message}\n                    
</div>`;\n                })\n                .finally(() => {\n                    // Hide spinner\n                    spinner.classList.add('d-none');\n                    testApiBtn.disabled = false;\n                });\n            });\n            \n            // Start Evaluation button\n            document.getElementById('evaluateBtn').addEventListener('click', function() {\n                const evaluateBtn = this;\n                const spinner = document.getElementById('evaluateSpinner');\n                \n                // Get repository path\n                const repoPath = document.getElementById('repo_path').value;\n                \n                // Validate repository path\n                if (!repoPath) {\n                    alert('Please enter a repository path');\n                    return;\n                }\n                \n                // Validate LLM configuration\n                const llmType = document.getElementById('llm_type').value;\n                const apiKey = document.getElementById('api_key').value;\n                const model = document.getElementById('model').value;\n                const temperature = document.getElementById('temperature').value;\n                const maxTokens = document.getElementById('max_tokens').value;\n                \n                if (!llmType || !apiKey || !model || !temperature || !maxTokens) {\n                    alert('Please fill in all LLM configuration fields');\n                    return;\n                }\n                \n                // Show spinner\n                spinner.classList.remove('d-none');\n                evaluateBtn.disabled = true;\n                \n                // Send request to evaluate repository\n                fetch('/evaluate', {\n                    method: 'POST',\n                    headers: {\n                        'Content-Type': 'application/json'\n                    },\n                    body: JSON.stringify({\n                
        repo_path: repoPath\n                    })\n                })\n                .then(response => response.json())\n                .then(data => {\n                    if (data.success) {\n                        window.location.href = data.redirect;\n                    } else {\n                        alert(`Error: ${data.error}`);\n                        // Hide spinner\n                        spinner.classList.add('d-none');\n                        evaluateBtn.disabled = false;\n                    }\n                })\n                .catch(error => {\n                    alert(`Error: ${error.message}`);\n                    // Hide spinner\n                    spinner.classList.add('d-none');\n                    evaluateBtn.disabled = false;\n                });\n            });\n        });\n    </script>\n</body>\n</html> "
  },
  {
    "path": "src/web_eval/templates/results.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>DocAgent - Evaluation Results</title>\n    <link href=\"https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css\" rel=\"stylesheet\">\n    <link rel=\"stylesheet\" href=\"{{ url_for('static', filename='css/style.css') }}\">\n</head>\n<body>\n    <div class=\"container-fluid my-3\">\n        <div class=\"row mb-3\">\n            <div class=\"col-12\">\n                <div class=\"card shadow\">\n                    <div class=\"bg-dark text-white p-3 d-flex justify-content-between align-items-center\">\n                        <h2 class=\"mb-0\">DocAgent - Evaluation Results</h2>\n                        <div>\n                            <button id=\"refreshBtn\" class=\"btn btn-light me-2\">\n                                <span id=\"refreshSpinner\" class=\"spinner-border spinner-border-sm d-none\" role=\"status\" aria-hidden=\"true\"></span>\n                                Refresh Evaluation\n                            </button>\n                            <a href=\"{{ url_for('index') }}\" class=\"btn btn-outline-light\">Return to Config</a>\n                        </div>\n                    </div>\n                    <div class=\"card-body\">\n                        <div class=\"alert alert-info\">\n                            <h5>Repository: {{ results.directory }}</h5>\n                            <div class=\"row mt-3\">\n                                <div class=\"col-md-3\">\n                                    <strong>Total Files:</strong> {{ results.statistics.total_files }}\n                                </div>\n                                <div class=\"col-md-3\">\n                                    <strong>Total Classes:</strong> {{ results.statistics.total_classes }}\n                                </div>\n                          
      <div class=\"col-md-3\">\n                                    <strong>Total Functions/Methods:</strong> {{ results.statistics.total_functions }}\n                                </div>\n                                <div class=\"col-md-3\">\n                                    <strong>Overall Score:</strong> \n                                    <span class=\"badge {{ 'bg-success' if results.statistics.overall_average_score >= 0.8 else 'bg-warning' if results.statistics.overall_average_score >= 0.5 else 'bg-danger' }}\">\n                                        {{ '%.2f'|format(results.statistics.overall_average_score) }}\n                                    </span>\n                                </div>\n                            </div>\n                        </div>\n\n                        <!-- Classes Section -->\n                        <h4 class=\"mt-4 mb-3\">Classes</h4>\n                        <div class=\"table-responsive\">\n                            <table class=\"table table-striped table-hover\">\n                                <thead class=\"table-dark\">\n                                    <tr>\n                                        <th>Class Name</th>\n                                        <th>Score</th>\n                                        <th>Summary</th>\n                                        <th>Description</th>\n                                        <th>Parameters</th>\n                                        <th>Attributes</th>\n                                        <th>Examples</th>\n                                        <th>File</th>\n                                    </tr>\n                                </thead>\n                                <tbody>\n                                    {% for class in results.classes %}\n                                    <tr>\n                                        <td>{{ class.name }}</td>\n                                        <td>\n                           
                 <span class=\"badge {{ 'bg-success' if class.completeness_score >= 0.8 else 'bg-warning' if class.completeness_score >= 0.5 else 'bg-danger' }}\">\n                                                {{ '%.2f'|format(class.completeness_score) }}\n                                            </span>\n                                        </td>\n                                        <td>\n                                            {% if class.completeness_elements.summary %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if class.element_required.summary %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"class\"\n                                                            data-component-name=\"{{ class.name }}\"\n                                                            data-docstring-part=\"summary\"\n                                                            data-signature=\"{{ class.signature }}\"\n                                                            data-docstring-content=\"{{ class.docstring|extract_component('summary') }}\">\n                                                        {% if class.helpfulness_scores and class.helpfulness_scores.summary %}\n                                                            <span class=\"badge bg-info\">{{ class.helpfulness_scores.summary.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                   
                             <span class=\"text-danger\">✗</span>\n                                                {% if class.element_required.summary %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if class.completeness_elements.description %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if class.element_required.description %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"class\"\n                                                            data-component-name=\"{{ class.name }}\"\n                                                            data-docstring-part=\"description\"\n                                                            data-signature=\"{{ class.signature }}\"\n                                                            data-docstring-content=\"{{ class.docstring|extract_component('description') }}\">\n                                                        {% if class.helpfulness_scores and class.helpfulness_scores.description %}\n                                                            <span class=\"badge bg-info\">{{ class.helpfulness_scores.description.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                
{% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if class.element_required.description %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if class.completeness_elements.parameters %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if class.element_required.parameters %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"class\"\n                                                            data-component-name=\"{{ class.name }}\"\n                                                            data-docstring-part=\"parameters\"\n                                                            data-signature=\"{{ class.signature }}\"\n                                                            data-docstring-content=\"{{ class.docstring|extract_component('parameters') }}\">\n                                                        {% if class.helpfulness_scores and class.helpfulness_scores.parameters %}\n                                                            <span class=\"badge bg-info\">{{ class.helpfulness_scores.parameters.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                          
                          </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if class.element_required.parameters %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if class.completeness_elements.attributes %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if class.element_required.attributes %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"class\"\n                                                            data-component-name=\"{{ class.name }}\"\n                                                            data-docstring-part=\"attributes\"\n                                                            data-signature=\"{{ class.signature }}\"\n                                                            data-docstring-content=\"{{ class.docstring|extract_component('attributes') }}\">\n                                                        {% if class.helpfulness_scores and class.helpfulness_scores.attributes %}\n                                                            <span class=\"badge bg-info\">{{ class.helpfulness_scores.attributes.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n           
                                             {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if class.element_required.attributes %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if class.completeness_elements.examples %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if class.element_required.examples %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"class\"\n                                                            data-component-name=\"{{ class.name }}\"\n                                                            data-docstring-part=\"examples\"\n                                                            data-signature=\"{{ class.signature }}\"\n                                                            data-docstring-content=\"{{ class.docstring|extract_component('examples') }}\">\n                                                        {% if class.helpfulness_scores and class.helpfulness_scores.examples %}\n                                                            <span class=\"badge bg-info\">{{ class.helpfulness_scores.examples.score }}/5</span>\n                                                        {% else %}\n         
                                                   Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if class.element_required.examples %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>{{ class.file.split('/')[-1] }}</td>\n                                    </tr>\n                                    {% endfor %}\n                                </tbody>\n                            </table>\n                        </div>\n\n                        <!-- Functions/Methods Section -->\n                        <h4 class=\"mt-5 mb-3\">Functions/Methods</h4>\n                        <div class=\"table-responsive\">\n                            <table class=\"table table-striped table-hover\">\n                                <thead class=\"table-dark\">\n                                    <tr>\n                                        <th>Function Name</th>\n                                        <th>Type</th>\n                                        <th>Score</th>\n                                        <th>Summary</th>\n                                        <th>Description</th>\n                                        <th>Returns</th>\n                                        <th>Raises</th>\n                                        <th>Examples</th>\n                                        <th>File</th>\n                                    </tr>\n                  
              </thead>\n                                <tbody>\n                                    {% for func in results.functions %}\n                                    <tr>\n                                        <td>{{ func.name }}</td>\n                                        <td>{{ func.type }}</td>\n                                        <td>\n                                            <span class=\"badge {{ 'bg-success' if func.completeness_score >= 0.8 else 'bg-warning' if func.completeness_score >= 0.5 else 'bg-danger' }}\">\n                                                {{ '%.2f'|format(func.completeness_score) }}\n                                            </span>\n                                        </td>\n                                        <td>\n                                            {% if func.completeness_elements.summary %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if func.element_required.summary %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"function\"\n                                                            data-component-name=\"{{ func.name }}\"\n                                                            data-docstring-part=\"summary\"\n                                                            data-signature=\"{{ func.signature }}\"\n                                                            data-docstring-content=\"{{ func.docstring|extract_component('summary') }}\">\n                                                        {% if func.helpfulness_scores and func.helpfulness_scores.summary %}\n                                                            <span class=\"badge bg-info\">{{ func.helpfulness_scores.summary.score }}/5</span>\n                             
                           {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if func.element_required.summary %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if func.completeness_elements.description %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if func.element_required.description %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"function\"\n                                                            data-component-name=\"{{ func.name }}\"\n                                                            data-docstring-part=\"description\"\n                                                            data-signature=\"{{ func.signature }}\"\n                                                            data-docstring-content=\"{{ func.docstring|extract_component('description') }}\">\n                                                        {% if func.helpfulness_scores and func.helpfulness_scores.description %}\n                                                            <span class=\"badge bg-info\">{{ 
func.helpfulness_scores.description.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if func.element_required.description %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if func.completeness_elements.returns %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if func.element_required.returns %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"function\"\n                                                            data-component-name=\"{{ func.name }}\"\n                                                            data-docstring-part=\"returns\"\n                                                            data-signature=\"{{ func.signature }}\"\n                                                            data-docstring-content=\"{{ func.docstring|extract_component('returns') }}\">\n                                                        {% if func.helpfulness_scores and func.helpfulness_scores.returns %}\n                            
                                <span class=\"badge bg-info\">{{ func.helpfulness_scores.returns.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if func.element_required.returns %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if func.completeness_elements.raises %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if func.element_required.raises %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"function\"\n                                                            data-component-name=\"{{ func.name }}\"\n                                                            data-docstring-part=\"raises\"\n                                                            data-signature=\"{{ func.signature }}\"\n                                                            data-docstring-content=\"{{ func.docstring|extract_component('raises') }}\">\n                                                        {% if func.helpfulness_scores and 
func.helpfulness_scores.raises %}\n                                                            <span class=\"badge bg-info\">{{ func.helpfulness_scores.raises.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if func.element_required.raises %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>\n                                            {% if func.completeness_elements.examples %}\n                                                <span class=\"text-success\">✓</span>\n                                                {% if func.element_required.examples %}\n                                                    <button class=\"btn btn-sm btn-outline-primary evaluate-btn ms-2\" \n                                                            data-component-type=\"function\"\n                                                            data-component-name=\"{{ func.name }}\"\n                                                            data-docstring-part=\"examples\"\n                                                            data-signature=\"{{ func.signature }}\"\n                                                            data-docstring-content=\"{{ func.docstring|extract_component('examples') }}\">\n                                
                        {% if func.helpfulness_scores and func.helpfulness_scores.examples %}\n                                                            <span class=\"badge bg-info\">{{ func.helpfulness_scores.examples.score }}/5</span>\n                                                        {% else %}\n                                                            Evaluate\n                                                        {% endif %}\n                                                    </button>\n                                                {% endif %}\n                                            {% else %}\n                                                <span class=\"text-danger\">✗</span>\n                                                {% if func.element_required.examples %}\n                                                    <span class=\"badge bg-warning\">Required</span>\n                                                {% endif %}\n                                            {% endif %}\n                                        </td>\n                                        <td>{{ func.file.split('/')[-1] }}</td>\n                                    </tr>\n                                    {% endfor %}\n                                </tbody>\n                            </table>\n                        </div>\n                    </div>\n                </div>\n            </div>\n        </div>\n    </div>\n\n    <!-- Evaluation Modal -->\n    <div class=\"modal fade\" id=\"evaluationModal\" tabindex=\"-1\" aria-labelledby=\"evaluationModalLabel\" aria-hidden=\"true\">\n        <div class=\"modal-dialog modal-lg\">\n            <div class=\"modal-content\">\n                <div class=\"modal-header bg-primary text-white\">\n                    <h5 class=\"modal-title\" id=\"evaluationModalLabel\">Evaluating Helpfulness</h5>\n                    <button type=\"button\" class=\"btn-close btn-close-white\" data-bs-dismiss=\"modal\" 
aria-label=\"Close\"></button>\n                </div>\n                <div class=\"modal-body\">\n                    <div class=\"mb-3\">\n                        <p><strong>Component:</strong> <span id=\"componentName\"></span></p>\n                        <p><strong>Part:</strong> <span id=\"docstringPart\"></span></p>\n                        <p><strong>Content:</strong></p>\n                        <pre id=\"docstringContent\" class=\"p-3 bg-light rounded\"></pre>\n                    </div>\n                    <div id=\"evaluationResult\" class=\"d-none\">\n                        <div class=\"alert alert-info\">\n                            <h5 class=\"mb-3\">Evaluation Result: <span id=\"evaluationScore\" class=\"badge bg-info\"></span></h5>\n                            <p><strong>Explanation:</strong></p>\n                            <p id=\"evaluationExplanation\"></p>\n                        </div>\n                    </div>\n                    <div id=\"evaluationLoading\" class=\"text-center d-none\">\n                        <div class=\"spinner-border text-primary\" role=\"status\">\n                            <span class=\"visually-hidden\">Loading...</span>\n                        </div>\n                        <p class=\"mt-2\">Evaluating helpfulness with LLM... 
This may take a minute.</p>\n                    </div>\n                    <div id=\"evaluationError\" class=\"alert alert-danger d-none\"></div>\n                </div>\n                <div class=\"modal-footer\">\n                    <button type=\"button\" class=\"btn btn-secondary\" data-bs-dismiss=\"modal\">Close</button>\n                    <button type=\"button\" id=\"startEvaluationBtn\" class=\"btn btn-primary\">\n                        <span id=\"evaluationBtnSpinner\" class=\"spinner-border spinner-border-sm d-none\" role=\"status\" aria-hidden=\"true\"></span>\n                        Evaluate\n                    </button>\n                </div>\n            </div>\n        </div>\n    </div>\n\n    <script src=\"https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/js/bootstrap.bundle.min.js\"></script>\n    <script>\n        document.addEventListener('DOMContentLoaded', function() {\n            // Initialize modal\n            const evaluationModal = new bootstrap.Modal(document.getElementById('evaluationModal'));\n            \n            // Event listener for evaluate buttons\n            document.querySelectorAll('.evaluate-btn').forEach(btn => {\n                btn.addEventListener('click', function() {\n                    // Get data attributes\n                    const componentType = this.getAttribute('data-component-type');\n                    const componentName = this.getAttribute('data-component-name');\n                    const docstringPart = this.getAttribute('data-docstring-part');\n                    const signature = this.getAttribute('data-signature');\n                    const docstringContent = this.getAttribute('data-docstring-content');\n                    \n                    // Set modal content\n                    document.getElementById('componentName').textContent = componentName;\n                    document.getElementById('docstringPart').textContent = docstringPart;\n                    
document.getElementById('docstringContent').textContent = docstringContent;\n                    \n                    // Store data for evaluation\n                    document.getElementById('startEvaluationBtn').setAttribute('data-component-type', componentType);\n                    document.getElementById('startEvaluationBtn').setAttribute('data-component-name', componentName);\n                    document.getElementById('startEvaluationBtn').setAttribute('data-docstring-part', docstringPart);\n                    document.getElementById('startEvaluationBtn').setAttribute('data-signature', signature);\n                    document.getElementById('startEvaluationBtn').setAttribute('data-docstring-content', docstringContent);\n                    \n                    // Reset UI elements\n                    document.getElementById('evaluationResult').classList.add('d-none');\n                    document.getElementById('evaluationLoading').classList.add('d-none');\n                    document.getElementById('evaluationError').classList.add('d-none');\n                    document.getElementById('startEvaluationBtn').classList.remove('d-none');\n                    // Clear any explanation left over from a previously evaluated component\n                    document.getElementById('evaluationExplanation').textContent = '';\n                    \n                    // Check if already evaluated\n                    if (this.querySelector('.badge')) {\n                        // If already evaluated, show result\n                        const score = this.querySelector('.badge').textContent.split('/')[0];\n                        document.getElementById('evaluationScore').textContent = score + '/5';\n                        document.getElementById('evaluationResult').classList.remove('d-none');\n                        document.getElementById('startEvaluationBtn').classList.add('d-none');\n                    }\n                    \n                    // Show modal\n                    evaluationModal.show();\n                });\n            });\n            \n            // Event listener for evaluation button in modal\n            
document.getElementById('startEvaluationBtn').addEventListener('click', function() {\n                const evaluationBtn = this;\n                const spinner = document.getElementById('evaluationBtnSpinner');\n                const loadingDiv = document.getElementById('evaluationLoading');\n                const resultDiv = document.getElementById('evaluationResult');\n                const errorDiv = document.getElementById('evaluationError');\n                \n                // Get data attributes\n                const componentType = this.getAttribute('data-component-type');\n                const componentName = this.getAttribute('data-component-name');\n                const docstringPart = this.getAttribute('data-docstring-part');\n                const signature = this.getAttribute('data-signature');\n                const docstringContent = this.getAttribute('data-docstring-content');\n                \n                // Show loading UI\n                spinner.classList.remove('d-none');\n                evaluationBtn.disabled = true;\n                loadingDiv.classList.remove('d-none');\n                resultDiv.classList.add('d-none');\n                errorDiv.classList.add('d-none');\n                \n                // Send request to evaluate helpfulness\n                fetch('/evaluate_helpfulness', {\n                    method: 'POST',\n                    headers: {\n                        'Content-Type': 'application/json'\n                    },\n                    body: JSON.stringify({\n                        component_type: componentType,\n                        component_name: componentName,\n                        docstring_part: docstringPart,\n                        signature: signature,\n                        docstring_content: docstringContent\n                    })\n                })\n                .then(response => response.json())\n                .then(data => {\n                    if (data.success) {\n     
                   // Update UI with result\n                        document.getElementById('evaluationScore').textContent = data.score + '/5';\n                        document.getElementById('evaluationExplanation').textContent = data.explanation;\n                        resultDiv.classList.remove('d-none');\n                        \n                        // Update button in table\n                        document.querySelector(`.evaluate-btn[data-component-type=\"${componentType}\"][data-component-name=\"${componentName}\"][data-docstring-part=\"${docstringPart}\"]`)\n                            .innerHTML = `<span class=\"badge bg-info\">${data.score}/5</span>`;\n                    } else {\n                        // Show error\n                        errorDiv.textContent = 'Error: ' + data.error;\n                        errorDiv.classList.remove('d-none');\n                    }\n                })\n                .catch(error => {\n                    // Show error\n                    errorDiv.textContent = 'Error: ' + error.message;\n                    errorDiv.classList.remove('d-none');\n                })\n                .finally(() => {\n                    // Hide loading UI\n                    spinner.classList.add('d-none');\n                    evaluationBtn.disabled = false;\n                    loadingDiv.classList.add('d-none');\n                });\n            });\n            \n            // Refresh button\n            document.getElementById('refreshBtn').addEventListener('click', function() {\n                const refreshBtn = this;\n                const spinner = document.getElementById('refreshSpinner');\n                \n                // Show spinner\n                spinner.classList.remove('d-none');\n                refreshBtn.disabled = true;\n                \n                // Get repository path from results; tojson emits a safely escaped JS string literal\n                // (handles backslashes and quotes in the path)\n                const repoPath = {{ results.directory|tojson }};\n                \n                // Send 
request to refresh evaluation\n                fetch('/refresh', {\n                    method: 'POST',\n                    headers: {\n                        'Content-Type': 'application/json'\n                    },\n                    body: JSON.stringify({\n                        repo_path: repoPath\n                    })\n                })\n                .then(response => response.json())\n                .then(data => {\n                    if (data.success) {\n                        // Reload page to show updated results\n                        window.location.reload();\n                    } else {\n                        alert(`Error: ${data.error}`);\n                        // Hide spinner\n                        spinner.classList.add('d-none');\n                        refreshBtn.disabled = false;\n                    }\n                })\n                .catch(error => {\n                    alert(`Error: ${error.message}`);\n                    // Hide spinner\n                    spinner.classList.add('d-none');\n                    refreshBtn.disabled = false;\n                });\n            });\n        });\n    </script>\n</body>\n</html> "
  },
  {
    "path": "src/web_eval/test_docstring_parser.py",
    "content": "#!/usr/bin/env python\n# Copyright (c) Meta Platforms, Inc. and affiliates\n# -*- coding: utf-8 -*-\n\"\"\"Test script for the parse_google_style_docstring function.\"\"\"\n\nfrom helpers import parse_google_style_docstring, extract_docstring_component\nimport json\nfrom typing import Dict, Any, Optional\n\n\ndef test_and_print_result(test_name: str, docstring: str) -> Dict[str, Any]:\n    \"\"\"\n    Run a test case and print results in a formatted way.\n    \n    Args:\n        test_name: The name of the test\n        docstring: The docstring to parse\n        \n    Returns:\n        The parsed docstring components\n    \"\"\"\n    print(f\"\\n{'=' * 80}\")\n    print(f\"TEST: {test_name}\")\n    print(f\"{'-' * 80}\")\n    print(\"INPUT DOCSTRING:\")\n    print(f\"{'-' * 40}\")\n    print(docstring)\n    print(f\"{'-' * 40}\")\n    \n    # Parse the docstring\n    result = parse_google_style_docstring(docstring)\n    \n    # Print the result in a formatted way\n    print(\"PARSED RESULT:\")\n    print(f\"{'-' * 40}\")\n    for section, content in result.items():\n        if content:\n            print(f\"{section.upper()}:\")\n            print(f\"{content!r}\")\n            print()\n    print(f\"{'-' * 40}\")\n\n    return result\n\n\ndef test_extract_component(docstring: str) -> None:\n    \"\"\"\n    Test the extract_docstring_component function with a given docstring.\n    \n    Args:\n        docstring: The docstring to test with\n    \"\"\"\n    print(f\"\\n{'=' * 80}\")\n    print(\"TESTING extract_docstring_component\")\n    print(f\"{'-' * 80}\")\n    print(\"INPUT DOCSTRING:\")\n    print(f\"{'-' * 40}\")\n    print(docstring)\n    print(f\"{'-' * 40}\")\n    \n    # Test extracting different components\n    components = [\"summary\", \"description\", \"parameters\", \"arguments\", \"returns\", \"raises\", \"examples\"]\n    \n    print(\"EXTRACTED COMPONENTS:\")\n    print(f\"{'-' * 40}\")\n    for component in components:\n        
result = extract_docstring_component(docstring, component)\n        print(f\"{component.upper()}: {result!r}\")\n    print(f\"{'-' * 40}\")\n\n\ndef main():\n    \"\"\"Run all tests for the docstring parser.\"\"\"\n    # Test 1: Standard Google-style docstring\n    test_and_print_result(\n        \"Standard Google-style docstring\",\n        \"\"\"This is the summary line.\n\nThis is the extended description that spans\nmultiple lines.\n\nArgs:\n    param1: Description of param1\n    param2: Description of param2\n\nReturns:\n    Description of the return value\n\nRaises:\n    ValueError: If something goes wrong\n    \nExamples:\n    >>> example_function(1, 2)\n    3\n\"\"\"\n    )\n\n    # Test 2: Docstring with Google-style section markers and colons\n    test_and_print_result(\n        \"Docstring with explicit Google-style section markers\",\n        \"\"\"Summary: This is a summary on the same line as the marker.\n\nDescription:\n    This is a multi-line\n    description.\n\nArgs:\n    param1: Description of param1\n    param2: Description of param2\n\nReturns:\n    Description of the return value\n\nExamples:\n    Example 1\n    Example 2\n\"\"\"\n    )\n\n    # Test 3: Docstring with content on the same line as section headers\n    test_and_print_result(\n        \"Docstring with content on the same line as section headers\",\n        \"\"\"Summary: This is a summary on the same line.\n\nDescription: This is a description on the same line.\n\nArgs: These are args on the same line.\n    param1: Description of param1\n    param2: Description of param2\n\nReturns: This is the return value on the same line.\n\nRaises: These are exceptions on the same line.\n    ValueError: If something goes wrong\n    \nExamples: This is an example on the same line.\n    >>> example_function(1, 2)\n    3\n\"\"\"\n    )\n\n    # Test 4: Docstring with alternative labels\n    test_and_print_result(\n        \"Docstring with alternative section labels\",\n        \"\"\"Brief: This 
is the summary with alternative label.\n\nDetailed Description:\n    This is the description.\n\nArguments:\n    param1: Description of param1\n    param2: Description of param2\n\nReturn Value:\n    Description of the return value\n\nExceptions:\n    ValueError: If something goes wrong\n    \nUsage:\n    >>> example_function(1, 2)\n    3\n\"\"\"\n    )\n\n    # Test 5: Docstring with no explicit section markers\n    test_and_print_result(\n        \"Docstring with no explicit section markers\",\n        \"\"\"This is just a simple docstring with no section markers.\n\nIt has a second paragraph, but no explicit Args, Returns, etc.\n\"\"\"\n    )\n\n    # Test 6: Empty docstring\n    test_and_print_result(\n        \"Empty docstring\",\n        \"\"\n    )\n\n    # Test 7: Single line docstring\n    test_and_print_result(\n        \"Single line docstring\",\n        \"This is a single line docstring.\"\n    )\n\n    # Test 8: Docstring with unusual indentation\n    test_and_print_result(\n        \"Docstring with unusual indentation\",\n        \"\"\"\n        This is an indented summary.\n        \n            This description has extra indentation.\n        \n        Args:\n                param1: Indented param\n                param2: Indented param\n        \n        Returns:\n                Indented return value\n        \"\"\"\n    )\n\n    # Test 9: Incomplete docstring with some sections missing\n    test_and_print_result(\n        \"Incomplete docstring with some sections missing\",\n        \"\"\"Summary: This is the summary.\n\nArgs:\n    param1: First parameter\n    param2: Second parameter\n\"\"\"\n    )\n\n    # Test 10: Docstring with uppercase section labels\n    test_and_print_result(\n        \"Docstring with uppercase section labels\",\n        \"\"\"SUMMARY: This is the summary.\n\nDESCRIPTION: This is the description.\n\nARGS:\n    param1: First parameter\n    param2: Second parameter\n\nRETURNS: The return value.\n\"\"\"\n    )\n\n    # Test 
11: Docstring with mixed case section labels\n    test_and_print_result(\n        \"Docstring with mixed case section labels\",\n        \"\"\"Summary: This is the summary.\n\nDescription: This is the description.\n\nArguments:\n    param1: First parameter\n    param2: Second parameter\n\nReTuRnS: The return value.\n\"\"\"\n    )\n\n    # Test 12: Docstring with complex examples section\n    test_and_print_result(\n        \"Docstring with complex examples section\",\n        \"\"\"Summary: This function does something.\n\nExamples:\n    >>> example_function(1, 2)\n    3\n    \n    More complex example:\n    \n    ```python\n    result = example_function(\n        a=1,\n        b=2\n    )\n    assert result == 3\n    ```\n\"\"\"\n    )\n\n    # Test 13: Docstring with parameters that look like section labels\n    test_and_print_result(\n        \"Docstring with parameters that look like section labels\",\n        \"\"\"Validates input parameters.\n\nArgs:\n    summary: A parameter named \"summary\"\n    description: A parameter named \"description\"\n    returns: A parameter named \"returns\"\n    examples: A parameter named \"examples\"\n\"\"\"\n    )\n\n    # Test 14: Docstring with non-standard sections\n    test_and_print_result(\n        \"Docstring with non-standard sections\",\n        \"\"\"Summary: This is the summary.\n\nDescription: This is the description.\n\nNote:\n    This is an important note.\n\nWarning:\n    This is a warning.\n\nArgs:\n    param1: First parameter\n\"\"\"\n    )\n\n    # Test 15: Docstring with section labels with extra spaces\n    test_and_print_result(\n        \"Docstring with section labels with extra spaces\",\n        \"\"\"Summary  :   This is the summary with extra spaces around the colon.\n\nDescription   :  \n    This is the description.\n\nArgs   :  \n    param1: First parameter\n\"\"\"\n    )\n\n    # Test 16: Docstring with section label on a line by itself (no colon)\n    # This is a tricky case!\n    
test_and_print_result(\n        \"Docstring with section label on a line by itself (no colon)\",\n        \"\"\"This is the summary.\n\nDescription\n    This is the description.\n\nArguments\n    param1: First parameter\n    param2: Second parameter\n\nReturns\n    The return value.\n\"\"\"\n    )\n\n    # Test 17: Docstring with Summary section without a colon\n    test_and_print_result(\n        \"Docstring with Summary section without a colon\",\n        \"\"\"Summary\nThis is a summary without a colon after the section label.\n\nDescription:\n    This is the description.\n\"\"\"\n    )\n\n    # Test 18: Docstring with multiple colons in the summary line\n    test_and_print_result(\n        \"Docstring with multiple colons in the summary line\",\n        \"\"\"Summary: This is a summary: with another colon in it.\n\nDescription:\n    This is the description with: a colon.\n\"\"\"\n    )\n\n    # Test 19: Docstring with summary containing special characters\n    test_and_print_result(\n        \"Docstring with summary containing special characters\",\n        \"\"\"Summary: This summary has *special* characters like: [], (), {}\n\nArgs:\n    param1: Description with `code` and *formatting*\n\"\"\"\n    )\n\n    # Test 20: Docstring with only a summary section\n    test_and_print_result(\n        \"Docstring with only a summary section\",\n        \"\"\"Summary: This is only a summary section without other sections.\n\"\"\"\n    )\n\n    # Test 21: Docstring with summary containing multiple paragraphs\n    test_and_print_result(\n        \"Docstring with summary containing multiple paragraphs\",\n        \"\"\"Summary: \n    This is a multi-paragraph summary.\n    \n    It has more than one paragraph.\n    \nDescription:\n    This is the description.\n\"\"\"\n    )\n\n    # Test 22: Docstring with extra spacing between sections\n    test_and_print_result(\n        \"Docstring with extra spacing between sections\",\n        \"\"\"Summary: This is the 
summary.\n\n\n\nDescription: This is the description.\n\n\n\nArgs:\n    param1: First parameter\n\"\"\"\n    )\n\n    # Test 23: Docstring with no content after section label\n    test_and_print_result(\n        \"Docstring with no content after section label\",\n        \"\"\"Summary:\n\nDescription:\n\nArgs:\n    param1: This parameter has a description\n\nReturns:\n\"\"\"\n    )\n\n    # Test 24: Docstring with inconsistent indentation\n    test_and_print_result(\n        \"Docstring with inconsistent indentation\",\n        \"\"\"Summary: This is a summary.\n\n    Description: \n        This description has inconsistent indentation.\n  Args:\n      param1: Indented 6 spaces\n   param2: Indented differently\n\"\"\"\n    )\n\n    # Test 25: Real-world complex docstring example\n    test_and_print_result(\n        \"Real-world complex docstring example\",\n        '''\"\"\"\nProcess and analyze data from multiple sources.\n\nThis utility function combines data from different sources,\nperforms advanced analytics, and returns a processed result.\nIt handles various edge cases and data inconsistencies.\n\nArgs:\n    data_source (str or Path): Path to the main data source\n    secondary_sources (List[str], optional): Additional data sources to include\n    config (Dict[str, Any]): Configuration parameters with the following structure:\n        {\n            \"preprocessing\": {\n                \"normalize\": bool,\n                \"fill_missing\": str\n            },\n            \"analysis\": {\n                \"method\": str,\n                \"parameters\": Dict[str, Any]\n            }\n        }\n    callback (Callable, optional): Function to call with progress updates\n\nReturns:\n    Dict[str, Any]: Processed results with the following structure:\n        {\n            \"summary\": {\n                \"total_records\": int,\n                \"processed_records\": int,\n                \"anomalies\": int\n            },\n            \"detailed_results\": 
List[Dict[str, Any]]\n        }\n\nRaises:\n    FileNotFoundError: If any data source cannot be found\n    ValueError: If the configuration is invalid\n    ProcessingError: If analysis fails during execution\n\nExamples:\n    Basic usage:\n    \n    >>> result = process_data(\"data.csv\", config={\"preprocessing\": {\"normalize\": True}})\n    >>> print(result[\"summary\"][\"total_records\"])\n    1000\n    \n    Advanced usage with multiple sources:\n    \n    ```python\n    sources = [\"secondary1.csv\", \"secondary2.csv\"]\n    config = {\n        \"preprocessing\": {\"normalize\": True, \"fill_missing\": \"mean\"},\n        \"analysis\": {\"method\": \"advanced\", \"parameters\": {\"iterations\": 100}}\n    }\n    \n    def progress(percent):\n        print(f\"Processed: {percent}%\")\n        \n    result = process_data(\"main.csv\", sources, config, callback=progress)\n    ```\n\"\"\"'''\n    )\n\n    # New: Test the extract_docstring_component function specifically\n    print(\"\\n\\n\")\n    print(\"*\" * 100)\n    print(\"TESTING extract_docstring_component FUNCTION\")\n    print(\"*\" * 100)\n    \n    # Test Case 1: Standard docstring\n    test_extract_component(\n        \"\"\"This is a standard docstring summary.\n\nThis is the description.\n\nArgs:\n    param1: First parameter\n    param2: Second parameter\n\nReturns:\n    The return value\n\"\"\"\n    )\n    \n    # Test Case 2: Google-style docstring with explicit section markers\n    test_extract_component(\n        \"\"\"Summary: This is a summary with explicit section marker.\n\nDescription: This is a description.\n\nArgs:\n    param1: First parameter\n    param2: Second parameter\n\nReturns:\n    The return value\n\"\"\"\n    )\n    \n    # Test Case 3: Docstring with content on the same line as section headers\n    test_extract_component(\n        \"\"\"Summary: This is a summary on the same line.\n\nDescription: This is a description on the same line.\n\nArgs: These are arguments on the same 
line.\n    param1: First parameter\n    param2: Second parameter\n\nReturns: This is the return value on the same line.\n\"\"\"\n    )\n    \n    # Test Case 4: Real-world docstring that might be causing issues\n    test_extract_component(\n        \"\"\"Parses a Google-style docstring into its components.\n\nThis function takes a docstring and extracts the summary, description,\nparameters, returns, raises, and examples sections.\n\nArgs:\n    docstring: The docstring to parse\n\nReturns:\n    A dictionary containing the parsed components\n\"\"\"\n    )\n    \n    # Test Case 5: Empty docstring\n    test_extract_component(\"\")\n    \n    # Specific cases reported as problematic\n    print(\"\\n\\n\")\n    print(\"*\" * 100)\n    print(\"TESTING SPECIFIC PROBLEM CASES\")\n    print(\"*\" * 100)\n    \n    # Problem Case: Summary followed immediately by content\n    test_extract_component(\n        \"\"\"Summary:This is a summary with no space after the colon.\n\nDescription:\n    This is a description.\n\"\"\"\n    )\n    \n    # Problem Case: Summary with line break before content\n    test_extract_component(\n        \"\"\"Summary:\n    This is a summary after a line break.\n\nDescription:\n    This is a description.\n\"\"\"\n    )\n\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "tool/remove_docstrings.py",
    "content": "#!/usr/bin/env python3\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\"\"\"\nTool to remove docstrings from Python files in a repository.\n\"\"\"\n\nimport os\nimport ast\nimport astor\nimport argparse\nfrom typing import List, Tuple\n\n\nclass DocstringRemover(ast.NodeTransformer):\n    \"\"\"\n    AST NodeTransformer that removes docstrings from classes, methods, and functions.\n    \"\"\"\n    \n    def visit_ClassDef(self, node):\n        \"\"\"Remove docstrings from class definitions.\"\"\"\n        # Process class body first (recursive)\n        node = self.generic_visit(node)\n        \n        # Remove docstring if present\n        if (node.body and isinstance(node.body[0], ast.Expr) and \n                isinstance(node.body[0].value, ast.Str)):\n            node.body = node.body[1:]\n        \n        return node\n    \n    def visit_FunctionDef(self, node):\n        \"\"\"Remove docstrings from function/method definitions.\"\"\"\n        # Process function body first (recursive)\n        node = self.generic_visit(node)\n        \n        # Remove docstring if present\n        if (node.body and isinstance(node.body[0], ast.Expr) and \n                isinstance(node.body[0].value, ast.Str)):\n            node.body = node.body[1:]\n        \n        return node\n    \n    def visit_AsyncFunctionDef(self, node):\n        \"\"\"Remove docstrings from async function/method definitions.\"\"\"\n        # Process function body first (recursive)\n        node = self.generic_visit(node)\n        \n        # Remove docstring if present\n        if (node.body and isinstance(node.body[0], ast.Expr) and \n                isinstance(node.body[0].value, ast.Str)):\n            node.body = node.body[1:]\n        \n        return node\n\n\ndef find_python_files(directory: str) -> List[str]:\n    \"\"\"Find all Python files in the given directory and its subdirectories.\"\"\"\n    python_files = []\n    \n    for root, _, files in 
os.walk(directory):\n        for file in files:\n            if file.endswith('.py'):\n                python_files.append(os.path.join(root, file))\n    \n    return python_files\n\n\ndef remove_docstrings_from_file(file_path: str, dry_run: bool = False) -> Tuple[bool, str]:\n    \"\"\"\n    Remove docstrings from a Python file.\n    \n    Args:\n        file_path: Path to the Python file\n        dry_run: If True, don't actually write changes to file\n        \n    Returns:\n        Tuple of (success, message)\n    \"\"\"\n    try:\n        with open(file_path, 'r', encoding='utf-8') as f:\n            source = f.read()\n        \n        # Parse the source code into an AST\n        tree = ast.parse(source)\n        \n        # Remove docstrings\n        transformer = DocstringRemover()\n        new_tree = transformer.visit(tree)\n        \n        # Generate the modified source code\n        new_source = astor.to_source(new_tree)\n        \n        if not dry_run:\n            with open(file_path, 'w', encoding='utf-8') as f:\n                f.write(new_source)\n            \n            return True, f\"Successfully removed docstrings from {file_path}\"\n        else:\n            return True, f\"Would remove docstrings from {file_path} (dry run)\"\n    \n    except Exception as e:\n        return False, f\"Error processing {file_path}: {str(e)}\"\n\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Remove docstrings from Python files in a repository\")\n    parser.add_argument(\"directory\", help=\"Directory containing Python files to process\")\n    parser.add_argument(\"--dry-run\", action=\"store_true\", help=\"Don't actually modify files, just show what would be done\")\n    args = parser.parse_args()\n    \n    # Find all Python files\n    python_files = find_python_files(args.directory)\n    print(f\"Found {len(python_files)} Python files to process\")\n    \n    # Process each file\n    success_count = 0\n    for file_path in 
python_files:\n        success, message = remove_docstrings_from_file(file_path, args.dry_run)\n        print(message)\n        if success:\n            success_count += 1\n    \n    # Summary\n    print(f\"\\nProcessed {len(python_files)} files, {success_count} successful\")\n\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "tool/remove_docstrings.sh",
    "content": "#!/bin/bash\n# Copyright (c) Meta Platforms, Inc. and affiliates\n\n# Shell script wrapper for the remove_docstrings.py tool\n\nset -e\n\n# Script directory\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\n\n# Show usage\nfunction show_usage {\n    echo \"Usage: $(basename $0) [options] DIRECTORY\"\n    echo \"\"\n    echo \"Options:\"\n    echo \"  -h, --help     Show this help message\"\n    echo \"  -d, --dry-run  Perform a dry run (no changes are made)\"\n    echo \"  -b, --backup   Create backup files before making changes\"\n    echo \"\"\n    echo \"Example:\"\n    echo \"  $(basename $0) ~/my-python-project\"\n    echo \"  $(basename $0) --dry-run ~/my-python-project\"\n    exit 1\n}\n\n# Parse arguments\nDRY_RUN=\"\"\nBACKUP=false\nDIRECTORY=\"\"\n\nwhile [[ $# -gt 0 ]]; do\n    case $1 in\n        -h|--help)\n            show_usage\n            ;;\n        -d|--dry-run)\n            DRY_RUN=\"--dry-run\"\n            shift\n            ;;\n        -b|--backup)\n            BACKUP=true\n            shift\n            ;;\n        *)\n            if [[ -z \"$DIRECTORY\" ]]; then\n                DIRECTORY=\"$1\"\n            else\n                echo \"Error: Too many arguments\"\n                show_usage\n            fi\n            shift\n            ;;\n    esac\ndone\n\n# Check if directory is provided\nif [[ -z \"$DIRECTORY\" ]]; then\n    echo \"Error: No directory specified\"\n    show_usage\nfi\n\n# Check if directory exists\nif [[ ! -d \"$DIRECTORY\" ]]; then\n    echo \"Error: Directory does not exist: $DIRECTORY\"\n    exit 1\nfi\n\n# Create backups if requested\nif [[ \"$BACKUP\" = true ]]; then\n    echo \"Creating backups of Python files...\"\n    find \"$DIRECTORY\" -name \"*.py\" -type f -exec cp {} {}.bak \\;\n    echo \"Backups created with .bak extension\"\nfi\n\n# Run the Python script\npython3 \"$SCRIPT_DIR/remove_docstrings.py\" $DRY_RUN \"$DIRECTORY\"\n\necho \"Done!\" "
  },
  {
    "path": "tool/serve_local_llm.sh",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates\nCUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \\\n  --model Your-Model-Name \\\n  --tensor-parallel-size 8 \\\n  --quantization fp8 \\\n  --gpu-memory-utilization 0.9 \\\n  --dtype bfloat16 \\\n  --host 0.0.0.0 \\\n  --port 8000"
  }
]