[
  {
    "path": ".devcontainer/devcontainer.json",
    "content": "{\n  \"name\": \"Python 3\",\n  // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile\n  \"image\": \"mcr.microsoft.com/devcontainers/python:1-3.11-bullseye\",\n  \"customizations\": {\n    \"codespaces\": {\n      \"openFiles\": [\n        \"README.md\",\n        \"app.py\"\n      ]\n    },\n    \"vscode\": {\n      \"settings\": {},\n      \"extensions\": [\n        \"ms-python.python\",\n        \"ms-python.vscode-pylance\"\n      ]\n    }\n  },\n  \"updateContentCommand\": \"[ -f packages.txt ] && sudo apt update && sudo apt upgrade -y && sudo xargs apt install -y <packages.txt; [ -f requirements.txt ] && pip3 install --user -r requirements.txt; pip3 install --user streamlit; echo '✅ Packages installed and Requirements met'\",\n  \"postAttachCommand\": {\n    \"server\": \"streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false\"\n  },\n  \"portsAttributes\": {\n    \"8501\": {\n      \"label\": \"Application\",\n      \"onAutoForward\": \"openPreview\"\n    }\n  },\n  \"forwardPorts\": [\n    8501\n  ]\n}"
  },
  {
    "path": ".gitignore",
    "content": "# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Python cache files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# Streamlit\n.streamlit/\n\n# OS generated files\n.DS_Store\n.DS_Store?\n._*\n.Spotlight-V100\n.Trashes\nehthumbs.db\nThumbs.db\n\n# IDE files\n.vscode/\n.idea/\n\n# Logs\n*.log"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2024 dwillowtree\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# DIANA: Detection and Intelligence Analysis for New Alerts\n\nDIANA automates the creation of detections from threat intelligence using Large Language Models (LLMs).\n\nNote: Providing high-quality example detections, logs, and your detection writing process is critical for optimal results.\n\n![DIANA Screenshot](assets/diana_main_1.gif)\n*Select an LLM provider, security log source and detection language*\n\n### If you liked the tool, head over to --> [seiber.ai](https://www.seiber.ai) to stay updated on what we're doing!\n\n## Table of Contents\n\n- [How To Use](#how-to-use)\n- [Threat Research Agents](#threat-research-agents)\n- [Features](#features)\n- [Roadmap](#roadmap)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Configuration](#configuration)\n- [Contributing](#contributing)\n- [License](#license)\n\n## How To Use\n\n1. **Select LLM Provider and Model:**\n   - Choose the LLM provider and model you want to use.\n2. **Choose Security Data/Log Type(s):**\n   - Focus on specific security data or log types.\n3. **Select Detection Language:**\n   - Choose your preferred detection language.\n4. **Input Threat Information:**\n   - Enter a website URL, write a description of threat TTP(s), or upload a document.\n5. **Provide Example Detections:**\n   - *Important:* Provide 3-7 diverse, high-quality example detections for the chosen log source.\n6. **Provide Example Log Sources:**\n   - *Important:* Provide 3-7 example log sources.\n7. **Outline Detection Writing Steps:**\n   - *Recommended:* Outline your typical detection writing steps to help DIANA follow your workflow.\n8. **Describe Alert Triage/Investigation Steps:**\n   - Describe steps for alert triage and investigation.\n9. **Process Threat Intel:**\n   - Click 'Process Threat Intel' to generate detection logic.\n\n*Remember: The quality and diversity of your inputs directly impact DIANA's output. 
Take time to provide comprehensive examples and follow your standard workflow for the best results.*\n\n![DIANA Screenshot](assets/diana_main_2.gif)\n*DIANA will convert the threat description into a detection, investigation steps and perform a QA check*\n\n## Threat Research Agents\n\n![DIANA Workflow](assets/research_crew.gif)\n*Spin up a crew of autonomous agents to perform threat detection research*\n\nThis feature spins up a crew of autonomous AI agents that perform threat detection research on your topic of choice. They are maxed out at 5 iterations each, so no need to worry about them going rogue and taking over the world.\nThese agents use Exa, which employs semantic search (embeddings) to search the web, providing more contextually relevant results than traditional keyword-based search engines like Google.\n        \n**Examples of research topics:**\n- Threat hunting in Okta logs\n- Most common TTPs used by attackers in AWS\n- Latest detection strategies for ransomware in Windows environments\n\n## Features\n\n- Automates the creation of detections from threat intelligence\n- Supports models accessed via OpenAI API, Anthropic API, and other major LLM providers\n- Converts threat intelligence from natural language descriptions, documents, or website URLs into high-quality detection logic, investigation steps, and response procedures\n- Allows selection of LLM provider, security log source, and detection language to customize outputs\n- Performs quality assurance checks on generated detection logic to ensure syntax accuracy\n- Spin up a crew of AI agents for enhanced threat research (Crew AI, EXA AI, Firecrawl)\n- Requires diverse, high-quality example detections and example log sources for optimal results\n- Follows user-defined detection writing steps and workflows just like a new teammate would\n- Generates comprehensive alert triage and investigation steps using Palantir's ADS framework\n- Runs locally on the user's machine as a Streamlit app\n\n## 
Roadmap\n\n- [X] Multi-modal support (upload slides from your favorite cons or presentations, diagrams, images of incidents, TTPs)\n- [X] Amazon Bedrock integration (data security and privacy)\n- [ ] Docker container (host Diana yourself in your environment)\n- [ ] Personalized prompts (when you're happy with your results, save your custom prompts so you don't have to keep copy/pasting example detections and logs)\n- [ ] Auto prompt optimization (paste your examples and instructions and your prompt will be optimized for you to get the best possible results)\n- [X] Metrics & Monitoring (view how many tokens you use and your cost in $)\n- [ ] RLHF (reinforcement learning from human feedback, thumbs up and down your answers to improve the quality of your results)\n- [ ] Asynchronous/batch processing (convert 10 TTPs all at once in parallel)\n- [ ] Customizable alerting & notification (send results to Slack, Google Chat or a Jira ticket)\n- [ ] Subscribe to a threat intel resource of choice (e.g. your favorite blog or open-source detection content repo)\n- [ ] Enhanced User Documentation and Tutorials: comprehensive user guides, video tutorials, and example use cases to help users get started and make the most out of Diana.\n- [ ] Front End migration (TBD)\n- [ ] Search & Tuning Agent (automatically search your SIEM/XDR/security data lake with your converted detection logic and correct for benign positives)\n- [ ] Add RouteLLM to route prompts to the ideal model to save cost and improve performance\n\n\n## Installation\n\n1. Clone the repository:\n   ```\n   git clone https://github.com/dwillowtree/diana.git\n   cd diana\n   ```\n2. Create a virtual environment and activate it:\n   ```\n   python3.10 -m venv venv\n   source venv/bin/activate  # On Windows use `venv\\Scripts\\activate`\n   ```\n3. Install the required dependencies:\n   ```\n   pip install -r requirements.txt\n   pip install 'crewai[tools]' # you will need to manually install this library\n   ```\n4. 
Set up your environment variables:\n   - Copy the `.env.example` file to `.env`\n   - Edit the `.env` file and add your OpenAI, Anthropic, EXA AI and Firecrawl API keys\n\n## Usage\n\nTo run the Streamlit app locally:\n```\nstreamlit run app.py\n```\nThen, open your web browser and go to `http://localhost:8501`.  \nPRO TIP: Use Claude 3 Haiku (fast, cheap and smart)\n\n## Configuration\n\n1. Obtain API keys:\n   - For OpenAI: Visit https://platform.openai.com/account/api-keys\n   - For Anthropic: Visit https://www.anthropic.com or follow their documentation\n   - For EXA AI (only needed for the threat research agents): Visit https://exa.ai to obtain your API key. Exa searches the web based on the meaning of your search, as opposed to keyword search with Google. See https://exa.ai/faq\n   - For Firecrawl: Visit https://www.firecrawl.dev/; you can scrape 500 pages per month for free\n\n2. Add your API keys to the `.env` file:\n   ```\n   OPENAI_API_KEY=your_openai_api_key_here\n   ANTHROPIC_API_KEY=your_anthropic_api_key_here\n   EXA_API_KEY=your_exa_api_key_here\n   FIRECRAWL_API_KEY=your_firecrawl_api_key_here\n   GROQ_API_KEY=your_groq_api_key_here\n   AWS_ACCESS_KEY_ID=your_aws_access_key_id_here\n   AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here\n   AWS_REGION_NAME=your_aws_region_name_here\n   ```\n\n## Contributing\n\n1. Fork the repository\n2. Create a new branch (`git checkout -b feature/your-feature-name`)\n3. Make your changes\n4. Commit your changes (`git commit -am 'Add some feature'`)\n5. Push to the branch (`git push origin feature/your-feature-name`)\n6. Create a new Pull Request\n\nPlease ensure that your code follows the existing style and includes appropriate tests and documentation.\n\n**If you have any feedback on the tool, or just want to talk AI or security, shoot an email to dwilliams@seiber.ai.**\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details."
  },
  {
    "path": "app.py",
    "content": "import streamlit as st\nimport os\nfrom dotenv import load_dotenv\nfrom litellm import completion\nimport litellm\nfrom threat_research import perform_threat_research\nfrom ui import render_ui\nfrom config import prompts\n\n# Load environment variables\nload_dotenv()\n\n# Initialize session state for cost tracking\nif 'total_cost' not in st.session_state:\n    st.session_state.total_cost = 0\n\n# Shared variable for cost tracking\nshared_cost = 0\n\n# Define the callback function\ndef track_cost_callback(kwargs, completion_response, start_time, end_time):\n    global shared_cost\n    try:\n        response_cost = kwargs.get(\"response_cost\", 0)\n        shared_cost += response_cost\n        print(f\"Streaming response cost: ${response_cost:.6f}\")\n    except Exception as e:\n        print(f\"Error tracking cost: {str(e)}\")\n\n# Set the callback\nlitellm.success_callback = [track_cost_callback]\n\ndef process_with_llm(prompt, model, max_tokens, temperature):\n    global shared_cost\n    try:\n        response = litellm.completion(\n            model=model,\n            messages=[\n                {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ],\n            max_tokens=max_tokens,\n            temperature=temperature\n        )\n        # Update the Streamlit session state in the main thread\n        st.session_state.total_cost += shared_cost\n        st.info(f\"Total cost so far: ${st.session_state.total_cost:.6f}\")\n        shared_cost = 0  # Reset shared cost after updating session state\n        return response.choices[0].message.content.strip()\n    except Exception as e:\n        st.error(f\"Error with LLM API for {model}: {str(e)}\")\n        return None\n\ndef process_threat_intel(description, file_content, model, data_types, detection_language, current_detections, example_logs, detection_steps, sop, max_tokens, temperature):\n    results = 
{}\n    for i, prompt in enumerate(prompts, 1):\n        context = {\n            \"description\": description,\n            \"file_content\": file_content,\n            # Prompt 1 references {scraped_content}; default to empty so str.format does not raise KeyError\n            \"scraped_content\": \"\",\n            \"data_types\": \", \".join(data_types),\n            \"detection_language\": detection_language,\n            \"current_detections\": \"\\n\".join(current_detections),\n            \"example_logs\": \"\\n\".join(example_logs),\n            \"detection_steps\": detection_steps,\n            \"sop\": sop,\n            \"previous_analysis\": results.get(1, \"\") if \"Entire Analysis\" in st.session_state.selected_detection else next((d for d in st.session_state.detections if st.session_state.selected_detection in d['name']), \"\"),\n            \"previous_detection_rule\": results.get(2, \"\"),\n            \"previous_investigation_steps\": results.get(3, \"\"),\n            \"previous_qa_findings\": results.get(4, \"\")\n        }\n        \n        formatted_prompt = prompt.format(**context)\n        \n        result = process_with_llm(formatted_prompt, model, max_tokens, temperature)\n        \n        if result is None:\n            return None\n\n        results[i] = result\n    \n    return results\n\nif __name__ == \"__main__\":\n    render_ui(prompts, process_with_llm)\n"
  },
  {
    "path": "config.py",
"content": "# Prompts for each step of the process\nprompts = [\n    # Prompt 1: Analyze threat intelligence\n    \"\"\"You are an expert cyber security threat intelligence analyst. \n    The intel will be provided to you in the form of incident reports, threat intel reports, cyber security blogs, adversary emulation tools, existing detection content, or any natural language description\n    of techniques, tactics and procedures (TTPs) used by cyber security threat actors. Avoid using atomic indicators like IP addresses or domain names. Focus only on behaviors or techniques.\n    Analyze the following threat intelligence:\n\nDescription: {description}\nBlog/Report (if provided): {file_content}\nScraped Website Content (if provided): {scraped_content}\n\nFocus only on threat intelligence that can be used to write detections for {data_types}. Extract potential detections that have clear log evidence in the provided intelligence.\nFor each potential detection:\n1. Provide a concise name\n2. Write a detailed description of the threat behavior\n3. List the specific log data or events that would be used in the detection\n4. Include any relevant context or prerequisites for the detection\n\nFormat your analysis as a numbered list:\n\n1. Detection Name: [Concise name]\n   Threat Behavior: [Detailed description]\n   Log Evidence: [Specific log data or events]\n   Context: [Any relevant prerequisites or environmental factors]\n\nIf no detections are found for the specified data sources, clearly state this.\"\"\",\n    # Prompt 2: Create detection rule\n    \"\"\"As a detection engineer specializing in {detection_language}, create a robust detection rule based on the following analysis:\n{previous_analysis}\n\nAdditional context:\n- Example detections: {current_detections}\n- Log examples: {example_logs}\n- Detection steps (if any): {detection_steps}\n\nYour task:\n1. Write a detection rule in {detection_language} that accurately captures the threat behavior\n2. 
Ensure the rule uses the specific log data identified in the analysis\n3. Include comments explaining the logic and any assumptions made\n\nIf a complete detection cannot be written, explain why and specify any missing information.\n\nPresent the final detection rule in a code block, followed by:\n- Explanation of the rule's logic\n- Any limitations or edge cases\n- Estimated false positive rate and rationale\"\"\",\n\n    # Prompt 3: Develop investigation guide\n    \"\"\"As an experienced SOC analyst, create a detailed investigation guide for the following detection rule:\n\n{previous_detection_rule}\n\nUse Palantir's alert and detection strategy framework and incorporate elements from this standard operating procedure (if provided): {sop}\n\nYour investigation guide should include:\n1. Initial triage steps to quickly assess the alert's validity\n2. Detailed investigation procedures, including specific queries or commands\n3. Criteria for escalation or closure of the alert\n4. Potential related TTPs or lateral movement to look for\n5. Recommended containment or mitigation actions\n\nFormat the guide as a numbered list with clear, concise, and actionable steps. Include any caveats, limitations, or decision points an analyst might encounter.\"\"\",\n\n    # Prompt 4: Quality assurance review\n    \"\"\"As a QA specialist in cyber threat detection with extensive experience in {detection_language}, conduct a thorough and comprehensive review of the following detection rule:\n\nDetection Rule:\n{previous_detection_rule}\n\nAnalysis from Threat Intelligence:\n{previous_analysis}\n\nAssess the following aspects in detail, providing a score out of 10 for each:\n\n1. Syntactic Correctness (10 points):\n   - Is the rule syntactically correct in {detection_language}?\n   - Are there any syntax errors or potential runtime issues?\n   - Does it follow best practices and conventions for {detection_language}?\n\n2. 
Logical Accuracy (10 points):\n   - Does the rule accurately capture all aspects of the threat behavior described in the analysis?\n   - Are there any logical errors or misinterpretations of the threat intelligence?\n   - Is the detection logic complete and comprehensive?\n\n3. Coverage (10 points):\n   - Does the rule cover all potential variations of the threat behavior?\n   - Are there any edge cases or scenarios not addressed by the current implementation?\n\n4. Performance and Efficiency (10 points):\n   - Is the detection optimized for performance in the target environment?\n   - Are there any potential bottlenecks or resource-intensive operations?\n   - Could the rule be optimized without sacrificing accuracy?\n\n5. False Positive/Negative Analysis (10 points):\n   - Provide a realistic estimate of both false positive and false negative rates\n   - Justify your estimates with specific scenarios or data points\n   - Suggest ways to minimize false positives without increasing false negatives\n\n6. Robustness and Evasion Resistance (10 points):\n   - How easily could an attacker evade this detection?\n   - Are there any obvious bypass methods?\n   - Suggest improvements to make the detection more robust against evasion techniques\n\n7. Investigation Guide Quality (10 points):\n   - Are the investigation steps clear, comprehensive, and actionable?\n   - Do they cover all necessary aspects of validation, investigation, and response?\n   - Are there any missing steps or areas that need more detail?\n\n8. Integration and Dependencies (10 points):\n   - Does the rule rely on any external data sources or lookups?\n   - Are there any potential issues with data availability or freshness?\n\n9. Maintenance and Updatability (10 points):\n   - How easily can this rule be updated or modified in the future?\n   - Are there any hard-coded elements that might require frequent updates?\n\n10. 
Overall Effectiveness (10 points):\n    - How well does the detection rule achieve its intended purpose?\n    - Does it strike a good balance between accuracy, performance, and maintainability?\n\nFor each aspect, provide:\n- A score out of 10\n- Detailed explanation of your assessment\n- Specific, actionable recommendations for improvement\n- If no changes are needed, a thorough justification for why the current version is optimal\n\nPresent your QA findings as a structured report with clear recommendations for each aspect. Include code snippets or pseudo-code where applicable to illustrate suggested improvements.\n\nConclude with an overall assessment of the detection rule's quality and readiness for production deployment, including the total score out of 100 and a brief explanation of the score.\"\"\",\n\n    # Prompt 5: Final summary\n    \"\"\"As a senior threat analyst, compile a comprehensive detection package using the following components:\n\nDetection Rule:\n{previous_detection_rule}\n\nInvestigation Steps:\n{previous_investigation_steps}\n\nQA Findings:\n{previous_qa_findings}\n\nCreate a markdown-formatted output with the following structure:\n\n# [Threat TTP Name]: [Detection Rule Name]\n\n## Threat Description\n[Concise description of the threat behavior this detection aims to identify]\n\n## Detection Rule\n```{detection_language}\n{previous_detection_rule}\n```\n\n## Log Sources\n[List of specific log sources or data types required for this detection]\n\n## Investigation Steps\n[Numbered list of investigation steps from {previous_investigation_steps}]\n\n## Performance Considerations\n[Brief notes on expected performance, including estimated false positive rate]\n\n## Quality Assessment\n[Give the overall score out of 100 and the summary from {previous_qa_findings}]\n\nEnsure the final output is well-structured, comprehensive, and ready for review and implementation by the security operations team.\"\"\"\n\n]"
  },
  {
    "path": "firecrawl_integration.py",
"content": "# firecrawl_integration.py\nimport os\nfrom firecrawl import FirecrawlApp\nfrom dotenv import load_dotenv\n\n# Load environment variables from .env file\nload_dotenv()\n\n# Get the Firecrawl API key from the .env file\nAPI_KEY = os.getenv('FIRECRAWL_API_KEY')\n\ndef scrape_url(url):\n    if not API_KEY:\n        raise EnvironmentError(\"FIRECRAWL_API_KEY is not set; add it to your .env file\")\n    app = FirecrawlApp(api_key=API_KEY)\n    response = app.scrape_url(url=url)\n    print(response)  # Debugging output to check the response structure\n    if isinstance(response, dict) and 'success' in response and response['success']:\n        return response['data']['markdown']\n    elif isinstance(response, dict) and 'content' in response:\n        return response['content']  # Fallback to the 'content' key if the structure is different\n    else:\n        raise Exception(f\"Error scraping URL: {response}\")\n"
  },
  {
    "path": "requirements.txt",
    "content": "streamlit\nrequests\npython-dotenv\nopenai\nanthropic\ncrewai\nexa-py\nlangchain_openai\nPyMuPDF\nfirecrawl\nlitellm\nboto3"
  },
  {
    "path": "threat_research.py",
    "content": "import os\nimport sys\nfrom dotenv import load_dotenv\nfrom langchain_openai import ChatOpenAI\nfrom crewai import Agent, Task, Crew, Process\nfrom crewai_tools import SerperDevTool, EXASearchTool, ScrapeWebsiteTool\n\n# Load environment variables\nload_dotenv()\n\ndef perform_threat_research(query):\n\n    openai_model = os.getenv(\"OPENAI_MODEL_NAME\", \"gpt-4\")\n    # Initialize tools\n    exa_search_tool = EXASearchTool()\n    scrape_website_tool = ScrapeWebsiteTool()\n\n    # Define agents\n    researcher = Agent(\n        role='Cyber Threat Intelligence Researcher',\n        goal=f'Research the highest quality information related to: {query}, only focusing on techniques, tactics, and procedures that are good candidates for detections. Ensure the intel contains log source evidence information.',\n        backstory=\"As a seasoned cyber threat researcher, you're at the forefront of identifying and analyzing emerging threats. Your expertise helps security teams write the best detection logic to catch threats. You focus on gathering actionable threat intel that includes clear log evidence for detection.\",\n        verbose=True,\n        allow_delegation=False,\n        llm=ChatOpenAI(model_name=openai_model),\n        max_iter=5,\n        tools=[exa_search_tool, scrape_website_tool]\n    )\n\n    analyst = Agent(\n        role='Detection Engineer',\n        goal='Analyze the information from the Cyber Threat Intelligence Researcher and select the highest quality candidates for detections. Ensure the information is sufficient to convert into detection logic, focusing on log source evidence.',\n        backstory=\"With a keen eye for detail and a deep understanding of cyber threats, you excel at interpreting raw data and translating it into actionable detections for security operations teams. 
You prioritize threat intel that includes detailed log source evidence, ensuring the detection logic is robust and effective.\",\n        verbose=True,\n        allow_delegation=True,\n        llm=ChatOpenAI(model_name=openai_model),\n        max_iter=5,\n        tools=[exa_search_tool, scrape_website_tool]\n    )\n\n    # Define tasks\n    research_task = Task(\n        description=f\"\"\"Research and select the top 10 pieces of information related to: {query}. Follow these steps:\n        1. Search for and gather detailed information from threat intelligence reports, cybersecurity blogs, and any relevant online sources describing real-world cyber incidents.\n        2. Focus on identifying techniques, tactics, and procedures (TTPs) that are suitable candidates for detection.\n        3. Ensure that each TTP includes clear log source evidence that can be used to write detection logic. Avoid atomic indicators of compromise.\n        4. Some good examples are: threat hunting in Okta logs, TTPs used in AWS attacks, detection engineering in CloudTrail logs, hunting in SaaS logs, writing detections in Kubernetes audit logs, and detection engineering in EKS logs.\n        5. 
Document each TTP with detailed descriptions, including the context of its use, how it manifests in logs, and why it is a good candidate for detection.\n\n        The expected output is a comprehensive report detailing the top 10 threat intelligence findings, including:\n        - Threat names and descriptions\n        - Techniques, tactics, and procedures with log source evidence\n        - Detailed context and usage information\n        - Reasons why each TTP is a good candidate for detection.\"\"\",\n        agent=researcher,\n        expected_output=\"A comprehensive report detailing the top 10 threat intelligence findings, including threat names, descriptions, and log source evidence information.\",\n        tools=[exa_search_tool]\n    )\n\n    analysis_task = Task(\n        description=\"\"\"Analyze the research findings provided by the Cyber Threat Intelligence Researcher. Follow these steps:\n        1. Carefully review the comprehensive report containing TTPs and behaviors.\n        2. Identify and select the highest quality TTPs that would make great candidates for detection.\n        3. Ensure each selected TTP includes sufficient log source information to be converted into detection logic.\n        4. Prioritize TTPs that have clear and actionable log evidence, making them ideal for creating robust detection rules.\n        5. 
Provide examples of good detection logic sources, such as sigma rules, Datadog OOTB rules, Panther Security content rules, Splunk security content, Elastic security content rules, and other open-source detection content.\n\n        The expected output is a detailed analysis summarizing the threat intelligence, highlighting the best candidates for detection, and providing:\n        - A summary of the selected TTPs\n        - Detailed log source evidence for each TTP\n        - Actionable insights and strategies for detection and mitigation\"\"\",\n        agent=analyst,\n        expected_output=\"A comprehensive report that lists the final list of TTPs selected for detection, along with detailed log source evidence and actionable insights.\",\n    )\n\n    # Create the crew\n    crew = Crew(\n        agents=[researcher, analyst],\n        tasks=[research_task, analysis_task],\n        process=Process.sequential,\n        verbose=2  # Increased verbosity for more detailed output\n    )\n\n    # Kick off the research process\n    result = crew.kickoff(inputs={'query': query})\n    return result\n\n# Modified main block for subprocess compatibility\nif __name__ == \"__main__\":\n    if len(sys.argv) > 1:\n        query = sys.argv[1]\n        print(f\"Starting threat research for query: {query}\")\n        result = perform_threat_research(query)\n        print(\"Research completed. Final result:\")\n        print(result)\n    else:\n        print(\"Please provide a query as a command-line argument.\")"
  },
  {
    "path": "trained_agents_data.pkl",
    "content": "\u0004}."
  },
  {
    "path": "ui.py",
    "content": "import subprocess\nimport sys\nimport streamlit as st\nfrom dotenv import load_dotenv\nimport os\nimport fitz\nfrom threat_research import perform_threat_research\nfrom firecrawl_integration import scrape_url\n\n# Load environment variables\nload_dotenv()\n\ndef render_ui(prompts, process_with_llm):\n    # Streamlit UI\n    st.set_page_config(page_title=\"D.I.A.N.A.\", page_icon=\"🛡️\", layout=\"wide\")\n\n    # Custom CSS and JavaScript for improved styling and resizable sidebar\n    st.markdown(\"\"\"\n    <style>\n        .stApp {\n            max-width: none;\n            padding: 1rem;\n        }\n        .main .block-container {\n            max-width: none;\n            padding-left: 2rem;\n            padding-right: 2rem;\n        }\n        .main-content {\n            background-color: #f0f2f6;\n            padding: 2rem;\n            border-radius: 10px;\n        }\n        .sidebar .stButton>button {\n            width: 100%;\n        }\n        .stProgress > div > div > div > div {\n            background-color: #4CAF50;\n        }\n        .tooltip {\n            position: relative;\n            display: inline-block;\n            cursor: help;\n        }\n        .tooltip .tooltiptext {\n            visibility: hidden;\n            width: 200px;\n            background-color: #555;\n            color: #fff;\n            text-align: center;\n            border-radius: 6px;\n            padding: 5px;\n            position: absolute;\n            z-index: 1;\n            bottom: 125%;\n            left: 50%;\n            margin-left: -100px;\n            opacity: 0;\n            transition: opacity 0.3s;\n        }\n        .tooltip:hover .tooltiptext {\n            visibility: visible;\n            opacity: 1;\n        }\n        [data-testid=\"stSidebar\"] {\n            min-width: 300px;\n            max-width: 800px;\n            width: 300px;\n            resize: horizontal;\n            overflow: auto;\n        }\n        
[data-testid=\"stSidebar\"] > div:first-child {\n            width: 100%;\n            height: 100%;\n        }\n        .stApp > header {\n            background-color: transparent;\n        }\n        .stApp {\n            margin: 0;\n        }\n        .resize-handle {\n            position: absolute;\n            right: -5px;\n            top: 0;\n            bottom: 0;\n            width: 10px;\n            cursor: col-resize;\n            z-index: 1000;\n        }\n    </style>\n\n    <script>\n        const resizeHandle = document.createElement('div');\n        resizeHandle.className = 'resize-handle';\n        const sidebar = window.parent.document.querySelector('[data-testid=\"stSidebar\"]');\n        sidebar.appendChild(resizeHandle);\n\n        let isResizing = false;\n        let lastDownX = 0;\n\n        resizeHandle.addEventListener('mousedown', (e) => {\n            isResizing = true;\n            lastDownX = e.clientX;\n        });\n\n        document.addEventListener('mousemove', (e) => {\n            if (!isResizing) return;\n            const offsetRight = document.body.offsetWidth - (e.clientX - document.body.offsetLeft);\n            const minWidth = 300;\n            const maxWidth = 800;\n            const newWidth = Math.min(Math.max(minWidth, document.body.offsetWidth - offsetRight), maxWidth);\n            sidebar.style.width = newWidth + 'px';\n        });\n\n        document.addEventListener('mouseup', (e) => {\n            isResizing = false;\n        });\n    </script>\n    \"\"\", unsafe_allow_html=True)\n\n    # Add a sidebar\n    sidebar = st.sidebar\n\n    with sidebar:\n        st.image(\"https://i.imgur.com/wEHCCaj.png\", width=300)\n        st.markdown(\n            \"\"\"\n            <div style=\"text-align: center;\">\n                Developed by Dylan Williams <a href=\"https://www.linkedin.com/in/dylan-williams-a2927599/\" target=\"_blank\">LinkedIn</a> | \n                <a href=\"https://github.com/dwillowtree/diana\" 
target=\"_blank\">GitHub Repository</a>\n            </div>\n            \"\"\",\n            unsafe_allow_html=True\n        )\n        st.markdown(\"---\")\n\n        # Quick Start Guide section (collapsed by default)\n        with st.expander(\"Quick Start Guide\", expanded=False):\n            st.markdown(\"\"\"\n            DIANA (Detection and Intelligence Analysis for New Alerts) automates detection creation from threat intelligence.\n\n            **Note: Providing high-quality example detections, logs, and your detection writing process is critical for optimal results.**\n\n            ### Steps:\n            1. Select LLM provider and model\n            2. Choose security data/log type(s) for detection\n            3. Select detection language\n            4. Input threat TTPs description or upload report/blog post\n            5. **Important:** Provide 3-7 diverse, high-quality example detections for the chosen log source\n            6. **Important:** Provide 3-7 example log sources\n            7. **Recommended:** Outline your typical detection writing steps (this helps DIANA follow your workflow)\n            8. Describe alert triage/investigation steps\n            9. Click 'Process Threat Intel'\n\n            Remember: The quality and diversity of your inputs directly impact DIANA's output. 
Take time to provide comprehensive examples and follow your standard workflow for best results.\n            \"\"\")\n        # About DIANA section (collapsed by default)\n        with st.expander(\"About DIANA\", expanded=False):\n            st.markdown(\"\"\"\n            DIANA (Detection and Intelligence Analysis for New Alerts) is an AI-powered tool designed to streamline the detection writing process in cybersecurity operations.\n\n            ### Purpose:\n            - Automate the creation of detections from threat intelligence\n            - Reduce manual effort in researching log sources and writing queries\n            - Generate investigation steps and quality assurance checks\n\n            DIANA leverages advanced AI capabilities to enhance efficiency and accuracy in cybersecurity threat detection, allowing security teams to respond more quickly and effectively to emerging threats.\n            \"\"\")\n        st.subheader(\"Configuration\")\n        \n        # LLM Provider selection with tooltip\n        llm_provider = st.selectbox(\n            \"LLM Provider\",\n            [\"OpenAI\", \"Anthropic\", \"Amazon Bedrock\", \"Groq\"],\n            index=1,\n            key=\"llm_provider\",\n            help=\"Choose the AI model provider for processing.\"\n        )\n\n        # Add Pro Tip before the model type selection\n        st.sidebar.markdown(\"**PRO TIP:** Use Claude 3 Haiku (it's smart, fast, and cheap)\")\n\n        # Model selection based on provider\n        if llm_provider == \"OpenAI\":\n            model = st.selectbox(\n                \"Model Type\",\n                [\"gpt-4o-mini\", \"gpt-4o\", \"gpt-4-turbo\", \"gpt-3.5-turbo\"],\n                key=\"openai_model\",\n                help=\"Select the OpenAI model to use for processing.\"\n            )\n        elif llm_provider == \"Anthropic\":\n            model = st.selectbox(\n                \"Model Type\",\n                [\"claude-3-5-sonnet-20240620\", 
\"claude-3-opus-20240229\", \"claude-3-haiku-20240307\"],\n                index=2, \n                key=\"anthropic_model\",\n                help=\"Select the Anthropic model to use for processing.\"\n            )\n        elif llm_provider == \"Amazon Bedrock\":\n            model = st.selectbox(\n                \"Model Type\",\n                [\n                    \"bedrock/anthropic.claude-3-sonnet-20240229-v1:0\",\n                    \"bedrock/anthropic.claude-3-haiku-20240307-v1:0\",\n                    \"bedrock/meta.llama3-8b-instruct-v1:0\",\n                    \"bedrock/meta.llama3-70b-instruct-v1:0\"\n                ],\n                key=\"bedrock_model\",\n                help=\"Select the Amazon Bedrock model to use for processing.\"\n            )\n        elif llm_provider == \"Groq\":\n            model = st.selectbox(\n                \"Model Type\",\n                [\n                    \"groq/llama-3.1-70b-versatile\",\n                    \"groq/llama-3.1-8b-instant\",\n                    \"groq/llama3-8b-8192\",\n                ],\n                key=\"groq_model\",\n                help=\"Select the Groq model to use for processing.\"\n            )\n        \n        # Data types multiselect with search functionality\n        data_types = st.multiselect(\n            \"Security Data/Log Type(s)\",\n            [\n                \"Okta Logs\", \"AWS CloudTrail Logs\", \"Kubernetes Audit Logs\", \"GitLab Audit Logs\", \"AWS EKS Control Plane Logs\", \"Cisco Duo Logs\"\n            ],\n            default=[\"AWS CloudTrail Logs\"],\n            key=\"security_data_type\",\n            help=\"Select the relevant log types for your detection.\"\n        )\n\n        # Detection language selection with tooltip\n        detection_language = st.selectbox(\n            \"Detection Language\",\n            [\n                \"AWS Athena\", \"StreamAlert\", \"Splunk SPL\", \"Falcon LogScale\", \"Elastic Query DSL\",\n                \"Kusto Query Language (KQL)\",\n                \"Sigma 
Rules\",\"Panther (Python)\", \"Hunters (Snowflake SQL)\"\n            ],\n            key=\"detection_language_select\",\n            help=\"Choose the query language for your detection rules.\"\n        )\n\n        # Model parameters with explanations\n        st.subheader(\"Model Parameters\")\n        temperature = st.slider(\n            \"Temperature\",\n            min_value=0.0,\n            max_value=1.0,\n            value=0.1,\n            step=0.1,\n            key=\"temperature_slider\",\n            help=\"Controls output randomness. Lower values (e.g., 0.2) for more deterministic results, higher values (e.g., 0.8) for more creative outputs.\"\n        )\n        \n        max_tokens = st.slider(\n            \"Max Tokens\",\n            min_value=100,\n            max_value=4000,\n            value=4000,\n            step=100,\n            key=\"max_tokens_slider\",\n            help=\"Maximum number of tokens in the generated response. Higher values allow for longer outputs but may increase processing time.\"\n        )\n    \n    st.title(\"🛡️ D.I.A.N.A.\")\n    st.subheader(\"Detection and Intelligence Analysis for New Alerts\")\n\n\n    # Create tabs for main workflow and threat research\n    tab1, tab2, tab3 = st.tabs([\"Detection Engineering\", \"Threat Research Crew\", \"Bulk Detection Processing [Coming Soon]\"])\n\n    # Progress bar for multi-step process\n    if 'step' not in st.session_state:\n        st.session_state.step = 0\n\n    # Function to update progress\n    def update_progress():\n        # Cap the step at 5 for progress calculation\n        capped_step = min(st.session_state.step, 5)\n        progress_value = capped_step / 5\n        progress_bar.progress(progress_value)\n        \n        # Display the actual step number, even if it's beyond 5\n        step_counter.markdown(f\"**Current Step: {st.session_state.step}/5**\")\n        \n    with tab1:\n        # Create placeholders for the progress bar and step counter\n        
progress_bar = st.empty()\n        step_counter = st.empty()\n\n        # Update progress at the beginning\n        update_progress()\n\n        # Main content layout\n        col1, col2 = st.columns(2)\n        with col1:\n            st.subheader(\"Threat Intelligence Input\")\n            url = st.text_input(\"Enter URL:\")\n            \n            # Initialize session state for scraped content if it doesn't exist\n            if 'scraped_content' not in st.session_state:\n                st.session_state.scraped_content = \"\"\n\n            # Scrape URL button\n            if st.button(\"🔍 Scrape URL\", type=\"primary\"):  \n                if url:\n                    try:\n                        with st.spinner(\"Scraping URL...\"):\n                            st.session_state.scraped_content = scrape_url(url)\n                        st.success(\"URL scraped successfully!\")\n                    except Exception as e:\n                        st.error(f\"Error scraping URL: {e}\")\n                        st.session_state.scraped_content = \"\"\n                else:\n                    st.warning(\"Please enter a URL to scrape.\")\n\n            # Display scraped content if available\n            if st.session_state.scraped_content:\n                with st.expander(\"View Scraped Content\", expanded=False):\n                    st.markdown(st.session_state.scraped_content)\n            \n            description = st.text_area(\n                \"Enter threat intelligence description:\",\n                height=100,\n                placeholder=\"Detect a user attempting to exfiltrate an Amazon EC2 AMI Snapshot. This rule lets you monitor the ModifyImageAttribute CloudTrail API calls to detect when an Amazon EC2 AMI snapshot is made public or shared with an AWS account. This rule also inspects: @requestParameters.launchPermission.add.items.group array to determine if the string all is contained. 
This is the indicator that the AMI snapshot is made public. @requestParameters.launchPermission.add.items.userId array to determine if the string * is contained. This is the indicator that the AMI snapshot was shared with a new or unknown AWS account.\",\n                help=\"Provide a detailed description of the threat intelligence you want to analyze.\"\n            )\n            uploaded_file = st.file_uploader(\n                \"Upload threat intel report or blog (optional)\",\n                type=[\"txt\", \"pdf\"],\n                help=\"You can optionally upload a threat intelligence report or blog post for analysis.\"\n            )\n\n            file_content = \"\"\n\n            if uploaded_file is not None:\n                if uploaded_file.type == \"application/pdf\":\n                    # Process PDF file\n                    pdf_document = fitz.open(stream=uploaded_file.read(), filetype=\"pdf\")\n                    for page_num in range(pdf_document.page_count):\n                        page = pdf_document.load_page(page_num)\n                        file_content += page.get_text()\n                else:\n                    # Process other text files\n                    file_content = uploaded_file.getvalue().decode(\"utf-8\")\n\n            # Collapsible sections for additional inputs\n            with st.expander(\"Detection Writing Steps\", expanded=False):\n                detection_steps = st.text_area(\n                    \"Enter detection writing steps:\",\n                    height=150,\n                    placeholder=\"1. Identify the key indicators or behaviors from the threat intel\\n2. Determine the relevant log sources and fields\\n3. Write the query using the specified detection language\\n4. Include appropriate filtering to reduce false positives\\n5. 
Add comments to explain the logic of the detection\",\n                    help=\"Outline the steps you typically follow when writing detection rules.\"\n                )\n\n            with st.expander(\"Alert Triage Steps\", expanded=False):\n                sop = st.text_area(\n                    \"Enter standard operating procedures or investigation steps for your current detections and alerts:\",\n                    height=150,\n                    placeholder=\"1. Validate the alert by reviewing the raw log data\\n2. Check for any related alerts or suspicious activities from the same source\\n3. Investigate the affected systems and user accounts\\n4. Determine the potential impact and scope of the incident\\n5. Escalate to the incident response team if a true positive is confirmed\",\n                    help=\"Describe your standard operating procedures for triaging and investigating alerts.\"\n                )    \n\n        with col2:\n            st.subheader(\"Example Detections\")\n            num_detections = st.number_input(\"Number of example detections\", min_value=1, value=2, step=1)\n            current_detections = [\n                st.text_area(\n                    f\"Example detection {i+1}\",\n                    height=100,\n                    placeholder=\"SELECT sourceIPAddress, eventName, userAgent\\nFROM cloudtrail_logs\\nWHERE eventName = 'ConsoleLogin' AND errorMessage LIKE '%Failed authentication%'\\nGROUP BY sourceIPAddress, eventName, userAgent\\nHAVING COUNT(*) > 10\",\n                    help=\"Provide an example of an existing detection query in your environment.\"\n                ) for i in range(num_detections)\n            ]\n            \n            st.subheader(\"Example Logs\")\n            num_logs = st.number_input(\"Number of example logs\", min_value=1, value=2, step=1)\n            example_logs = [\n                st.text_area(\n                    f\"Example log {i+1}\",\n                    height=100,\n    
                placeholder=\"Paste examples of your actual logs here; your field names or logging structure may differ.\",\n                    help=\"Provide examples of actual log entries from your environment.\"\n                ) for i in range(num_logs)\n            ]\n\n        def run_threat_research(query, crewai_model):\n            # Create a placeholder in the Streamlit UI\n            output_placeholder = st.empty()\n\n            # Prepare the environment variables\n            env = os.environ.copy()\n            env[\"OPENAI_MODEL_NAME\"] = crewai_model\n\n            # Run the threat_research.py script with the current interpreter and capture its output\n            process = subprocess.Popen([sys.executable, 'threat_research.py', query], \n                                    stdout=subprocess.PIPE, \n                                    stderr=subprocess.STDOUT,\n                                    universal_newlines=True,\n                                    env=env)\n\n            # Stream the output to the Streamlit UI\n            full_output = \"\"\n            for line in process.stdout:\n                full_output += line\n                output_placeholder.text_area(\"Research Progress:\", full_output, height=400)\n\n            # Return the final result\n            return full_output\n\n        # Process Threat Intel button\n        if st.button(\"🚀 Process Threat Intel\", type=\"primary\") or st.session_state.step > 0:\n            if not description and not uploaded_file and not st.session_state.scraped_content and st.session_state.step == 0:\n                st.error(\"Please provide a threat intel description, scrape a URL, or upload a file.\")\n            else:\n                if st.session_state.step == 0:\n                    # Step 1: Analyze Threat Intel\n                    st.subheader(\"Step 1: Analyze Threat Intel\")\n                    details = st.expander(\"View Details\", expanded=False)\n\n                    context = {\n                        
\"description\": description,\n                        \"file_content\": file_content,\n                        \"scraped_content\": st.session_state.scraped_content,\n                        \"data_types\": \", \".join(data_types),\n                    }\n\n                    formatted_prompt = prompts[0].format(**context)\n\n                    with details:\n                        st.text(\"Prompt:\")\n                        st.code(formatted_prompt, language=\"markdown\")\n\n                    with st.spinner(\"Analyzing threat intelligence...\"):\n                        result = process_with_llm(formatted_prompt, model, max_tokens, temperature)\n\n                    if result is None:\n                        st.error(\"An error occurred while analyzing the threat intelligence.\")\n                    else:\n                        # Store the result in session state\n                        st.session_state.result = result\n\n                        with details:\n                            st.text(\"Result:\")\n                            st.code(st.session_state.result, language=\"markdown\")\n\n                        st.success(\"Analysis complete!\")\n                        st.session_state.step = 1\n                        update_progress()\n\n                if st.session_state.step >= 1:\n                    with st.spinner(\"Parsing detections...\"):\n                        # Parse the result to extract detections\n                        detections = []\n                        current_detection = {\"name\": \"\", \"behavior\": \"\", \"log_evidence\": \"\", \"context\": \"\"}\n                        capturing_threat_behavior = False\n                        capturing_log_evidence = False\n                        capturing_context = False\n\n                        for line in st.session_state.result.split('\\n'):\n                            stripped_line = line.strip()\n                            if stripped_line.startswith((\"Detection 
Name:\", \"1.\", \"2.\", \"3.\", \"4.\", \"5.\", \"6.\", \"7.\", \"8.\", \"9.\", \"10.\")):\n                                if current_detection[\"name\"]:\n                                    detections.append(current_detection)\n                                    current_detection = {\"name\": \"\", \"behavior\": \"\", \"log_evidence\": \"\", \"context\": \"\"}\n                                name = stripped_line.split(\":\", 1)[-1].strip() if \":\" in stripped_line else stripped_line.split(\".\", 1)[-1].strip()\n                                name = name.lstrip(\"0123456789. \")\n                                current_detection[\"name\"] = name\n                            elif \"Threat Behavior:\" in stripped_line:\n                                capturing_threat_behavior = True\n                                capturing_log_evidence = False\n                                capturing_context = False\n                                current_detection[\"behavior\"] = stripped_line.split(\"Threat Behavior:\", 1)[-1].strip()\n                            elif \"Log Evidence:\" in stripped_line:\n                                capturing_threat_behavior = False\n                                capturing_log_evidence = True\n                                capturing_context = False\n                                current_detection[\"log_evidence\"] = stripped_line.split(\"Log Evidence:\", 1)[-1].strip()\n                            elif \"Context:\" in stripped_line:\n                                capturing_threat_behavior = False\n                                capturing_log_evidence = False\n                                capturing_context = True\n                                current_detection[\"context\"] = stripped_line.split(\"Context:\", 1)[-1].strip()\n                            elif capturing_threat_behavior:\n                                current_detection[\"behavior\"] += \" \" + stripped_line\n                            elif 
capturing_log_evidence:\n                                current_detection[\"log_evidence\"] += \" \" + stripped_line\n                            elif capturing_context:\n                                current_detection[\"context\"] += \" \" + stripped_line\n\n                        if current_detection[\"name\"]:\n                            detections.append(current_detection)\n\n                        if not detections:\n                            st.warning(\"No specific detections were identified. The entire analysis will be processed as a single detection.\")\n                            detections = [{\"name\": \"Entire Analysis\", \"behavior\": st.session_state.result, \"log_evidence\": \"\", \"context\": \"\"}]\n\n                        st.session_state.detections = detections\n\n                    # Display the number of detections found\n                    st.info(f\"Number of detections found: {len(detections)}\")\n\n                    # Display the names of the detections and their threat behaviors\n                    if detections:\n                        st.write(\"Detections found:\")\n                        for detection in detections:\n                            st.markdown(f\"**Detection Name:** {detection['name']}\")\n                            st.write(f\"**Threat Behavior:** {detection['behavior']}\")\n                            st.write(f\"**Log Evidence:** {detection['log_evidence']}\")\n                            st.write(f\"**Context:** {detection['context']}\")\n                            st.write(\"---\")\n\n                    # Allow user to select a detection\n                    selected_detection_name = st.selectbox(\"Select a detection to process:\", [d[\"name\"] for d in st.session_state.detections])\n\n                    if st.button(\"Process Selected Detection\", type=\"primary\"):\n                        selected_detection = next(d for d in st.session_state.detections if d[\"name\"] == 
selected_detection_name)\n                        st.session_state.selected_detection = selected_detection\n                        st.session_state.step = 2\n                        update_progress()\n\n                if st.session_state.step >= 2:\n                    # Process the remaining steps for the selected detection\n                    selected_detection = st.session_state.selected_detection\n\n                    st.write(\"Processing the selected detection:\")\n                    st.markdown(f\"**Detection Name:** {selected_detection['name']}\")\n                    st.write(f\"**Threat Behavior:** {selected_detection['behavior']}\")\n                    st.write(f\"**Log Evidence:** {selected_detection['log_evidence']}\")\n                    st.write(f\"**Context:** {selected_detection['context']}\")\n\n                    # Further processing steps...\n                    results = {}\n                    results[1] = st.session_state.result  # Store the first result\n\n                    for i in range(2, 6):\n                        if st.session_state.step > i:\n                            continue\n\n                        step_name = ['Create Detection Rule', 'Develop Investigation Guide', 'Quality Assurance Review', 'Final Summary'][i-2]\n                        \n                        st.subheader(f\"Step {i}: {step_name}\")\n                        details = st.expander(\"View Details\", expanded=False)\n\n                        context = {\n                            \"detection_language\": detection_language,\n                            \"current_detections\": \"\\n\".join(current_detections),\n                            \"example_logs\": \"\\n\".join(example_logs),\n                            \"detection_steps\": detection_steps,\n                            \"sop\": sop,\n                            \"previous_analysis\": selected_detection,\n                            \"previous_detection_rule\": results.get(2, \"\"),\n      
                      \"previous_investigation_steps\": results.get(3, \"\"),\n                            \"previous_qa_findings\": results.get(4, \"\")\n                        }\n\n                        formatted_prompt = prompts[i-1].format(**context)\n\n                        with details:\n                            st.text(\"Prompt:\")\n                            st.code(formatted_prompt, language=\"markdown\")\n\n                        with st.spinner(f\"Processing {step_name}...\"):\n                            result = process_with_llm(formatted_prompt, model, max_tokens, temperature)\n\n                        if result is None:\n                            st.error(f\"An error occurred while processing {step_name}.\")\n                            break\n\n                        results[i] = result\n\n                        with details:\n                            st.text(\"Result:\")\n                            st.code(result, language=\"markdown\")\n\n                        st.success(f\"{step_name} complete!\")\n                        st.session_state.step = i + 1\n                        update_progress()\n\n                    if len(results) == 5:\n                        st.session_state.step = 6  # Indicate completion\n                        update_progress()\n                        st.success(\"Processing complete!\")\n                        st.markdown(results[5])\n\n                        # Add a button to restart the process\n                        if st.button(\"Start Over\"):\n                            st.session_state.step = 0\n                            update_progress()\n                            st.rerun()\n                    else:\n                        st.error(\"An error occurred while processing the threat intelligence.\")\n\n    with tab2:\n        # Threat Research section\n        st.subheader(\"Threat Research Crew\")\n        \n        st.markdown(\"\"\"\n        This feature spins up a 
crew of autonomous AI agents that perform threat detection research on your topic of choice. They are maxed out at 5 iterations each, so no need to worry\n        about them going rogue and taking over the world.\n\n        This feature is currently limited to OpenAI models.\n        \n        These agents use Exa, which employs semantic search (embeddings) to search the web, providing more contextually relevant results than traditional keyword-based search engines like Google.\n        \n        **Examples of research topics:**\n        - Threat hunting in Okta logs\n        - Most common TTPs used by attackers in AWS\n        - Latest detection strategies for ransomware in Windows environments\n        \"\"\")\n\n        # Add the CrewAI model selection here\n        crewai_model = st.selectbox(\n            \"CrewAI Model\",\n            [\"gpt-4o-mini\", \"gpt-4-turbo\", \"gpt-4o\"],\n            index=0,  # Set default to gpt-4o-mini\n            key=\"crewai_model\",\n            help=\"Select the model for CrewAI to use\"\n        )\n\n        research_query = st.text_input(\n            \"Enter your cybersecurity research topic:\",\n            placeholder=\"E.g., 'Threat hunting in Okta logs' or 'TTPs from CloudTrail logs used in AWS attacks'\",\n            help=\"Specify a topic for in-depth threat research to supplement your analysis.\"\n        )\n\n        if st.button(\"🔍 Perform Threat Research\", type=\"primary\", key=\"research_button\"):\n            if research_query:\n                with st.spinner(\"Performing threat research... 
This may take a few minutes.\"):\n                    research_result = run_threat_research(research_query, crewai_model)\n                \n                st.subheader(\"Threat Research Results\")\n                st.markdown(research_result)\n                \n            else:\n                st.warning(\"Please enter a research topic before performing threat research.\")\n    with tab3:\n        st.subheader(\"Open Source Detection Content\")\n        \n        st.markdown(\"[![Elastic](https://img.shields.io/badge/Elastic-005571?style=for-the-badge&logo=elastic&logoColor=white)](https://github.com/elastic/detection-rules)\")\n        st.markdown(\"[![Chronicle](https://img.shields.io/badge/Chronicle-4285F4?style=for-the-badge&logo=google-cloud&logoColor=white)](https://github.com/chronicle/detection-rules)\")\n        st.markdown(\"[![Sigma](https://img.shields.io/badge/Sigma-008080?style=for-the-badge&logo=sigma&logoColor=white)](https://github.com/SigmaHQ/sigma)\")\n        st.markdown(\"[![Hacking the Cloud](https://img.shields.io/badge/Hacking_the_Cloud-FF9900?style=for-the-badge&logo=amazon-aws&logoColor=white)](https://hackingthe.cloud/)\")\n        st.markdown(\"[![Wiz Threats](https://img.shields.io/badge/Wiz_Threats-00ADEF?style=for-the-badge&logo=wiz&logoColor=white)](https://threats.wiz.io/)\")\n        st.markdown(\"[![Anvilogic Armory](https://img.shields.io/badge/Anvilogic_Armory-6F2DA8?style=for-the-badge&logo=github&logoColor=white)](https://github.com/anvilogic-forge/armory)\")\n        st.markdown(\"[![Panther](https://img.shields.io/badge/Panther-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/panther-labs/panther-analysis/tree/release/rules)\")\n        st.markdown(\"[![Splunk](https://img.shields.io/badge/Splunk-000000?style=for-the-badge&logo=splunk&logoColor=white)](https://github.com/splunk/security_content)\")\n        
st.markdown(\"[![Datadog](https://img.shields.io/badge/Datadog-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://docs.datadoghq.com/security/default_rules/)\")\n        st.markdown(\"[![Falco Security](https://img.shields.io/badge/Falco_Security-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/falcosecurity/rules)\")\n        st.markdown(\"[![ExaBeam Content Library](https://img.shields.io/badge/ExaBeam-008080?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ExabeamLabs/Content-Doc)\")\n        st.markdown(\"[![Sekoia Detection Rules](https://img.shields.io/badge/Sekoia_Detection-0000FF?style=for-the-badge&logo=github&logoColor=white)](https://docs.sekoia.io/xdr/features/detect/built_in_detection_rules/)\")\n        st.markdown(\"[![Sublime](https://img.shields.io/badge/Sublime-FF6347?style=for-the-badge&logo=github&logoColor=white)](https://github.com/sublime-security/sublime-rules/)\")\n        st.markdown(\"[![Cloud Security Atlas](https://img.shields.io/badge/Cloud_Security_Atlas-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://securitylabs.datadoghq.com/cloud-security-atlas/)\")\n        st.markdown(\"[![SaaS Attack Matrix](https://img.shields.io/badge/SaaS_Attack_Matrix-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/pushsecurity/saas-attacks)\")\n        st.markdown(\"[![Delivr.to Email Detections](https://img.shields.io/badge/Delivr.to-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/delivr-to/detections)\")\n        st.markdown(\"[![Public Cloud Breaches](https://img.shields.io/badge/Public_Cloud_Breaches-FF5733?style=for-the-badge&logo=google-cloud&logoColor=white)](https://www.breaches.cloud/)\")\n        st.markdown(\"[![CI/CD Threat Matrix](https://img.shields.io/badge/CI/CD_Threat_Matrix-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/rung/threat-matrix-cicd)\")\n        st.markdown(\"[![K8s 
Attack Trees](https://img.shields.io/badge/K8s_Attack_Trees-0000FF?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/cncf/financial-user-group/tree/main/projects/k8s-threat-model)\")\n        st.markdown(\"[![eBPF Detections](https://img.shields.io/badge/eBPF_Detections-005571?style=for-the-badge&logo=aqua&logoColor=white)](https://github.com/aquasecurity/tracee/tree/main/signatures/golang)\")\n        st.markdown(\"[![Insider Threat TTP KnowledgeBase](https://img.shields.io/badge/Insider_Threat_TTP_KnowledgeBase-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/center-for-threat-informed-defense/insider-threat-ttp-kb)\")\n        st.markdown(\"[![Insider Threat Matrix](https://img.shields.io/badge/Insider_Threat_Matrix-FF9900?style=for-the-badge&logo=mitre&logoColor=white)](https://insiderthreatmatrix.org/)\")\n        st.markdown(\"[![Mitre Atlas](https://img.shields.io/badge/Mitre_Atlas-FF6347?style=for-the-badge&logo=mitre&logoColor=white)](https://atlas.mitre.org/techniques)\")\n        st.markdown(\"[![Dorothy](https://img.shields.io/badge/Dorothy-005571?style=for-the-badge&logo=elastic&logoColor=white)](https://github.com/elastic/dorothy)\")\n        st.markdown(\"[![Stratus Red Team](https://img.shields.io/badge/Stratus_Red_Team-000000?style=for-the-badge&logo=redhat&logoColor=white)](https://stratus-red-team.cloud/)\")\n        st.markdown(\"[![KubeHound](https://img.shields.io/badge/KubeHound-632CA6?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/DataDog/KubeHound)\")\n        st.markdown(\"[![RedKube](https://img.shields.io/badge/RedKube-FF0000?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/lightspin-tech/red-kube)\")\n        st.markdown(\"[![Kubesploit](https://img.shields.io/badge/Kubesploit-6A737D?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/cyberark/kubesploit)\")\n        
st.markdown(\"[![Kubehunter](https://img.shields.io/badge/Kubehunter-005571?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/aquasecurity/kube-hunter)\")\n        st.markdown(\"[![Workload Security Evaluator](https://img.shields.io/badge/Workload_Security_Evaluator-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://github.com/DataDog/workload-security-evaluator)\")\n        st.markdown(\"[![DeRF](https://img.shields.io/badge/DeRF-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/vectra-ai-research/derf)\")\n        st.markdown(\"[![AWS Threat Composer](https://img.shields.io/badge/AWS_Threat_Composer-FF9900?style=for-the-badge&logo=amazon-aws&logoColor=white)](https://github.com/awslabs/threat-composer)\")\n\n\n    st.markdown(\"---\")\n"
  }
]