Full Code of dwillowtree/diana for AI

Repository: dwillowtree/diana
Branch: main
Commit: 433fd4594424
Files: 11
Total size: 61.8 KB

Directory structure:
diana/

├── .devcontainer/
│   └── devcontainer.json
├── .gitignore
├── LICENSE
├── README.md
├── app.py
├── config.py
├── firecrawl_integration.py
├── requirements.txt
├── threat_research.py
├── trained_agents_data.pkl
└── ui.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .devcontainer/devcontainer.json
================================================
{
  "name": "Python 3",
  // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
  "image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye",
  "customizations": {
    "codespaces": {
      "openFiles": [
        "README.md",
        "app.py"
      ]
    },
    "vscode": {
      "settings": {},
      "extensions": [
        "ms-python.python",
        "ms-python.vscode-pylance"
      ]
    }
  },
  "updateContentCommand": "[ -f packages.txt ] && sudo apt update && sudo apt upgrade -y && sudo xargs apt install -y <packages.txt; [ -f requirements.txt ] && pip3 install --user -r requirements.txt; pip3 install --user streamlit; echo '✅ Packages installed and Requirements met'",
  "postAttachCommand": {
    "server": "streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false"
  },
  "portsAttributes": {
    "8501": {
      "label": "Application",
      "onAutoForward": "openPreview"
    }
  },
  "forwardPorts": [
    8501
  ]
}

================================================
FILE: .gitignore
================================================
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Python cache files
__pycache__/
*.py[cod]
*$py.class

# Streamlit
.streamlit/

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# IDE files
.vscode/
.idea/

# Logs
*.log

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2024 dwillowtree

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# DIANA: Detection and Intelligence Analysis for New Alerts

DIANA automates the creation of detections from threat intelligence using Large Language Models (LLMs).

Note: Providing high-quality example detections, logs, and your detection writing process is critical for optimal results.

![DIANA Screenshot](assets/diana_main_1.gif)
*Select an LLM provider, security log source and detection language*

### If you liked the tool, head over to --> [seiber.ai](https://www.seiber.ai) to stay updated on what we're doing!

## Table of Contents

- [How To Use](#how-to-use)
- [Threat Research Agents](#threat-research-agents)
- [Features](#features)
- [Roadmap](#roadmap)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Contributing](#contributing)
- [License](#license)

## How To Use

1. **Select LLM Provider and Model:**
   - Choose the LLM provider and model you want to use.
2. **Choose Security Data/Log Type(s):**
   - Focus on specific security data or log types.
3. **Select Detection Language:**
   - Choose your preferred detection language.
4. **Input Threat Information:**
   - Enter a website URL, write a description of threat TTP(s), or upload a document.
5. **Provide Example Detections:**
   - *Important:* Provide 3-7 diverse, high-quality example detections for the chosen log source.
6. **Provide Example Log Sources:**
   - *Important:* Provide 3-7 example log sources.
7. **Outline Detection Writing Steps:**
   - *Recommended:* Outline your typical detection writing steps to help DIANA follow your workflow.
8. **Describe Alert Triage/Investigation Steps:**
   - Describe steps for alert triage and investigation.
9. **Process Threat Intel:**
   - Click 'Process Threat Intel' to generate detection logic.

*Remember: The quality and diversity of your inputs directly impact DIANA's output. Take time to provide comprehensive examples and follow your standard workflow for the best results.*
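Internally, the steps above feed a fixed chain of prompts (threat analysis, then detection rule, investigation guide, QA review, and summary), where each stage's output is interpolated into the next stage's prompt. Below is a minimal sketch of that chaining; `call_llm` is a hypothetical stand-in for the real LiteLLM request in `app.py`, and the toy prompts are illustrative only:

```python
# Simplified sketch of DIANA's prompt chaining. `call_llm` is a hypothetical
# stand-in for the real LLM API call; the actual prompts live in config.py.
def run_pipeline(prompts, context, call_llm):
    results = {}
    for i, prompt in enumerate(prompts, 1):
        results[i] = call_llm(prompt.format(**context))
        # Expose each stage's output to later prompts, mirroring the
        # previous_analysis / previous_detection_rule placeholders in config.py.
        context[f"step_{i}_output"] = results[i]
    return results

# Toy two-stage run with a fake LLM that just echoes its prompt
stages = ["Analyze: {description}", "Write a rule from: {step_1_output}"]
out = run_pipeline(stages, {"description": "suspicious Okta login"}, lambda p: f"LLM({p})")
```

Each later prompt can reference any earlier stage's output, which is how the QA and summary steps see the generated rule.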

![DIANA Screenshot](assets/diana_main_2.gif)
*DIANA will convert the threat description into a detection, investigation steps and perform a QA check*

## Threat Research Agents

![DIANA Workflow](assets/research_crew.gif)
*Spin up a crew of autonomous agents to perform threat detection research*

This feature spins up a crew of autonomous AI agents that perform threat detection research on your topic of choice. They are maxed out at 5 iterations each, so no need to worry about them going rogue and taking over the world.
These agents use Exa, which employs semantic search (embeddings) to search the web, providing more contextually relevant results than traditional keyword-based search engines like Google.
        
**Examples of research topics:**
- Threat hunting in Okta logs
- Most common TTPs used by attackers in AWS
- Latest detection strategies for ransomware in Windows environments
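The researcher-to-analyst handoff the crew performs is a simple sequential pipeline. Here is a dependency-free sketch of that pattern; the two lambdas are hypothetical stand-ins for the LLM-backed CrewAI agents defined in `threat_research.py`:

```python
# Minimal sketch of a sequential agent pipeline: each agent consumes the
# previous agent's output. The lambdas are stand-ins for the real
# CrewAI researcher/analyst agents (which are capped at 5 iterations each).
def run_sequential(agents, query):
    output = query
    for agent in agents:
        output = agent(output)
    return output

researcher = lambda topic: f"findings({topic})"       # gathers TTPs with log evidence
analyst = lambda findings: f"detections({findings})"  # selects detection candidates
result = run_sequential([researcher, analyst], "Okta threat hunting")
```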

## Features

- Automates the creation of detections from threat intelligence
- Supports models accessed via OpenAI API, Anthropic API, and other major LLM providers
- Converts threat intelligence from natural language descriptions, documents, or website URLs into high-quality detection logic, investigation steps, and response procedures
- Allows selection of LLM provider, security log source, and detection language to customize outputs
- Performs quality assurance checks on generated detection logic to ensure syntax accuracy
- Spin up a crew of AI agents for enhanced threat research (Crew AI, EXA AI, Firecrawl)
- Requires diverse, high-quality example detections and example log sources for optimal results
- Follows user-defined detection writing steps and workflows just like a new teammate would
- Generates comprehensive alert triage and investigation steps using Palantir's ADS framework
- Runs locally on the user's machine as a Streamlit app

## Roadmap

- [X] Multi-modal support (upload slides from your favorite cons or presentations, diagrams, images of incidents, TTPs)
- [X] Amazon Bedrock integration (data security and privacy)
- [ ] Docker container (host Diana yourself in your environment)
- [ ] Personalized prompts (when you're happy with your results, save your custom prompts so you don't have to keep copy/pasting example detections and logs)
- [ ] Auto prompt optimization (paste your examples and instructions and your prompt will be optimized for you to get the best possible results)
- [X] Metrics & Monitoring (view how many tokens you use and your cost in $)
- [ ] RLHF (reinforcement learning from human feedback, thumbs up and down your answers to improve the quality of your results)
- [ ] Asynchronous/batch processing (convert 10 TTPs all at once in parallel)
- [ ] Customizable alerting & notification (send results to Slack, Google Chat or Jira ticket)
- [ ] Subscribe to a threat intel resource of choice (e.g. your favorite blog or an open-source detection content repo)
- [ ] Enhanced User Documentation and Tutorials: comprehensive user guides, video tutorials, and example use cases to help users get started and make the most out of Diana.
- [ ] Front End migration (TBD)
- [ ] Search & Tuning Agent (automatically search your SIEM/XDR/security data lake with your converted detection logic and correct for benign positives)
- [ ] Add RouteLLM to route prompts to the ideal model, saving cost and improving performance


## Installation

1. Clone the repository:
   ```
   git clone https://github.com/dwillowtree/diana.git
   cd diana
   ```
2. Create a virtual environment and activate it:
   ```
   python3.10 -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```
3. Install the required dependencies:
   ```
   pip install -r requirements.txt
   pip install 'crewai[tools]' # you will need to manually install this library
   ```
4. Set up your environment variables:
   - Create a `.env` file in the project root
   - Add your OpenAI, Anthropic, EXA AI and Firecrawl API keys to the `.env` file

## Usage

To run the Streamlit app locally:
```
streamlit run app.py
```
Then, open your web browser and go to `http://localhost:8501`.  
PRO TIP: Use Claude 3 Haiku (fast, cheap and smart)

## Configuration

1. Obtain API keys:
   - For OpenAI: Visit https://platform.openai.com/account/api-keys
   - For Anthropic: Visit https://www.anthropic.com or follow their documentation
   - For EXA AI (only needed for the threat research agents): Visit https://exa.ai to obtain your API key. Exa searches the web based on the meaning of your query, as opposed to Google-style keyword search. https://exa.ai/faq
   - For Firecrawl: Visit https://www.firecrawl.dev/ (you can scrape 500 pages per month for free)

2. Add your API keys to the `.env` file:
   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ANTHROPIC_API_KEY=your_anthropic_api_key_here
   EXA_API_KEY=your_exa_api_key_here
   FIRECRAWL_API_KEY=your_firecrawl_api_key_here
   GROQ_API_KEY=your_groq_api_key_here
   AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
   AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
   AWS_REGION_NAME=your_aws_region_name_here
   ```
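A small sketch for sanity-checking that the keys you need actually loaded. The `missing_keys` helper is written for this example and is not part of the repo; adjust the key list to the features you use:

```python
import os

def missing_keys(env, required):
    """Return the names of required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]

# app.py calls load_dotenv() at startup, so os.environ holds the .env values
# once the app is running; check whichever keys your chosen features require.
missing = missing_keys(os.environ, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
```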

## Contributing

1. Fork the repository
2. Create a new branch (`git checkout -b feature/your-feature-name`)
3. Make your changes
4. Commit your changes (`git commit -am 'Add some feature'`)
5. Push to the branch (`git push origin feature/your-feature-name`)
6. Create a new Pull Request

Please ensure that your code follows the existing style and includes appropriate tests and documentation.

**If you have any feedback on the tool, or just want to talk AI or security shoot an email to dwilliams@seiber.ai.**

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

================================================
FILE: app.py
================================================
import streamlit as st
import os
from dotenv import load_dotenv
from litellm import completion
import litellm
from threat_research import perform_threat_research
from ui import render_ui
from config import prompts

# Load environment variables
load_dotenv()

# Initialize session state for cost tracking
if 'total_cost' not in st.session_state:
    st.session_state.total_cost = 0

# Shared variable for cost tracking
shared_cost = 0

# Define the callback function
def track_cost_callback(kwargs, completion_response, start_time, end_time):
    global shared_cost
    try:
        response_cost = kwargs.get("response_cost", 0)
        shared_cost += response_cost
        print(f"Streaming response cost: ${response_cost:.6f}")
    except Exception as e:
        print(f"Error tracking cost: {str(e)}")

# Set the callback
litellm.success_callback = [track_cost_callback]

def process_with_llm(prompt, model, max_tokens, temperature):
    global shared_cost
    try:
        response = litellm.completion(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=max_tokens,
            temperature=temperature
        )
        # Update the Streamlit session state in the main thread
        st.session_state.total_cost += shared_cost
        st.info(f"Total cost so far: ${st.session_state.total_cost:.6f}")
        shared_cost = 0  # Reset shared cost after updating session state
        return response.choices[0].message.content.strip()
    except Exception as e:
        st.error(f"Error with LLM API for {model}: {str(e)}")
        return None

def process_threat_intel(description, file_content, model, data_types, detection_language, current_detections, example_logs, detection_steps, sop, max_tokens, temperature, scraped_content=""):
    results = {}
    for i, prompt in enumerate(prompts, 1):
        context = {
            "description": description,
            "file_content": file_content,
            "scraped_content": scraped_content,  # required by prompt 1; empty when no URL was scraped
            "data_types": ", ".join(data_types),
            "detection_language": detection_language,
            "current_detections": "\n".join(current_detections),
            "example_logs": "\n".join(example_logs),
            "detection_steps": detection_steps,
            "sop": sop,
            "previous_analysis": results.get(1, "") if "Entire Analysis" in st.session_state.selected_detection else next((d for d in st.session_state.detections if st.session_state.selected_detection in d['name']), ""),
            "previous_detection_rule": results.get(2, ""),
            "previous_investigation_steps": results.get(3, ""),
            "previous_qa_findings": results.get(4, "")
        }
        
        formatted_prompt = prompt.format(**context)
        
        result = process_with_llm(formatted_prompt, model, max_tokens, temperature)
        
        if result is None:
            return None

        results[i] = result
    
    return results

if __name__ == "__main__":
    render_ui(prompts, process_with_llm)


================================================
FILE: config.py
================================================
# Prompts for each step of the process
prompts = [
    # Prompt 1: Analyze threat intelligence
    """You are an expert cyber security threat intelligence analyst. 
    The intel will be provided to you in the form of incident reports, threat intel reports, cyber security blogs, adversary emulation tools, existing detection content, or any description in natural language
    of techniques, tactics and procedures (TTPs) used by cyber security threat actors. Avoid using atomic indicators like IP addresses or domain names. Focus only on behaviors or techniques.
    Analyze the following threat intelligence:

Description: {description}
Blog/Report (if provided): {file_content}
Scraped Website Content (if provided): {scraped_content}

Focus only on threat intelligence that can be used to write detections for {data_types}. Extract potential detections that have clear log evidence in the provided intelligence.
For each potential detection:
1. Provide a concise name
2. Write a detailed description of the threat behavior
3. List the specific log data or events that would be used in the detection
4. Include any relevant context or prerequisites for the detection

Format your analysis as a numbered list:

1. Detection Name: [Concise name]
   Threat Behavior: [Detailed description]
   Log Evidence: [Specific log data or events]
   Context: [Any relevant prerequisites or environmental factors]

If no detections are found for the specified data sources, clearly state this.""",
    # Prompt 2: Create detection rule
    """As a detection engineer specializing in {detection_language}, create a robust detection rule based on the following analysis:
{previous_analysis}

Additional context:
- Example detections: {current_detections}
- Log examples: {example_logs}
- Detection steps (if any): {detection_steps}

Your task:
1. Write a detection rule in {detection_language} that accurately captures the threat behavior
2. Ensure the rule uses the specific log data identified in the analysis
3. Include comments explaining the logic and any assumptions made

If a complete detection cannot be written, explain why and specify any missing information.

Present the final detection rule in a code block, followed by:
- Explanation of the rule's logic
- Any limitations or edge cases
- Estimated false positive rate and rationale""",

    # Prompt 3: Develop investigation guide
    """As an experienced SOC analyst, create a detailed investigation guide for the following detection rule:

{previous_detection_rule}

Use Palantir's alert and detection strategy framework and incorporate elements from this standard operating procedure (if provided): {sop}

Your investigation guide should include:
1. Initial triage steps to quickly assess the alert's validity
2. Detailed investigation procedures, including specific queries or commands
3. Criteria for escalation or closure of the alert
4. Potential related TTPs or lateral movement to look for
5. Recommended containment or mitigation actions

Format the guide as a numbered list with clear, concise, and actionable steps. Include any caveats, limitations, or decision points an analyst might encounter.""",

    # Prompt 4: Quality assurance review
    """As a QA specialist in cyber threat detection with extensive experience in {detection_language}, conduct a thorough and comprehensive review of the following detection rule:

Detection Rule:
{previous_detection_rule}

Analysis from Threat Intelligence:
{previous_analysis}

Assess the following aspects in detail, providing a score out of 10 for each:

1. Syntactic Correctness (10 points):
   - Is the rule syntactically correct in {detection_language}?
   - Are there any syntax errors or potential runtime issues?
   - Does it follow best practices and conventions for {detection_language}?

2. Logical Accuracy (10 points):
   - Does the rule accurately capture all aspects of the threat behavior described in the analysis?
   - Are there any logical errors or misinterpretations of the threat intelligence?
   - Is the detection logic complete and comprehensive?

3. Coverage (10 points):
   - Does the rule cover all potential variations of the threat behavior?
   - Are there any edge cases or scenarios not addressed by the current implementation?

4. Performance and Efficiency (10 points):
   - Is the detection optimized for performance in the target environment?
   - Are there any potential bottlenecks or resource-intensive operations?
   - Could the rule be optimized without sacrificing accuracy?

5. False Positive/Negative Analysis (10 points):
   - Provide a realistic estimate of both false positive and false negative rates
   - Justify your estimates with specific scenarios or data points
   - Suggest ways to minimize false positives without increasing false negatives

6. Robustness and Evasion Resistance (10 points):
   - How easily could an attacker evade this detection?
   - Are there any obvious bypass methods?
   - Suggest improvements to make the detection more robust against evasion techniques

7. Investigation Guide Quality (10 points):
   - Are the investigation steps clear, comprehensive, and actionable?
   - Do they cover all necessary aspects of validation, investigation, and response?
   - Are there any missing steps or areas that need more detail?

8. Integration and Dependencies (10 points):
   - Does the rule rely on any external data sources or lookups?
   - Are there any potential issues with data availability or freshness?

9. Maintenance and Updatability (10 points):
   - How easily can this rule be updated or modified in the future?
   - Are there any hard-coded elements that might require frequent updates?

10. Overall Effectiveness (10 points):
    - How well does the detection rule achieve its intended purpose?
    - Does it strike a good balance between accuracy, performance, and maintainability?

For each aspect, provide:
- A score out of 10
- Detailed explanation of your assessment
- Specific, actionable recommendations for improvement
- If no changes are needed, a thorough justification for why the current version is optimal

Present your QA findings as a structured report with clear recommendations for each aspect. Include code snippets or pseudo-code where applicable to illustrate suggested improvements.

Conclude with an overall assessment of the detection rule's quality and readiness for production deployment, including the total score out of 100 and a brief explanation of the score.""",

    # Prompt 5: Final summary
    """As a senior threat analyst, compile a comprehensive detection package using the following components:

Detection Rule:
{previous_detection_rule}

Investigation Steps:
{previous_investigation_steps}

QA Findings:
{previous_qa_findings}

Create a markdown-formatted output with the following structure:

# [Threat TTP Name]: [Detection Rule Name]

## Threat Description
[Concise description of the threat behavior this detection aims to identify]

## Detection Rule
{detection_language}
{previous_detection_rule}

## Log Sources
[List of specific log sources or data types required for this detection]

## Investigation Steps
[Numbered list of investigation steps from {previous_investigation_steps}]

## Performance Considerations
[Brief notes on expected performance, including estimated false positive rate]

## Quality Assessment
[Give the overall score out of 100 and the summary from {previous_qa_findings}]

Ensure the final output is well-structured, comprehensive, and ready for review and implementation by the security operations team."""

]

================================================
FILE: firecrawl_integration.py
================================================
# firecrawl_integration.py
import os
from firecrawl import FirecrawlApp
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get the Firecrawl API key from the .env file
API_KEY = os.getenv('FIRECRAWL_API_KEY')

def scrape_url(url):
    if not API_KEY:
        raise ValueError("FIRECRAWL_API_KEY is not set; add it to your .env file")
    app = FirecrawlApp(api_key=API_KEY)
    response = app.scrape_url(url=url)
    print(response)  # Debugging output to check the response structure
    if isinstance(response, dict) and 'success' in response and response['success']:
        return response['data']['markdown']
    elif isinstance(response, dict) and 'content' in response:
        return response['content']  # Fallback to returning the 'content' key if the structure is different
    else:
        raise Exception(f"Error scraping URL: {response}")




================================================
FILE: requirements.txt
================================================
streamlit
requests
python-dotenv
openai
anthropic
crewai
exa-py
langchain_openai
PyMuPDF
firecrawl
litellm
boto3

================================================
FILE: threat_research.py
================================================
import os
import sys
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, EXASearchTool, ScrapeWebsiteTool

# Load environment variables
load_dotenv()

def perform_threat_research(query):

    openai_model = os.getenv("OPENAI_MODEL_NAME", "gpt-4")
    # Initialize tools
    exa_search_tool = EXASearchTool()
    scrape_website_tool = ScrapeWebsiteTool()

    # Define agents
    researcher = Agent(
        role='Cyber Threat Intelligence Researcher',
        goal=f'Research the highest quality information related to: {query}, only focusing on techniques, tactics, and procedures that are good candidates for detections. Ensure the intel contains log source evidence information.',
        backstory="As a seasoned cyber threat researcher, you're at the forefront of identifying and analyzing emerging threats. Your expertise helps security teams write the best detection logic to catch threats. You focus on gathering actionable threat intel that includes clear log evidence for detection.",
        verbose=True,
        allow_delegation=False,
        llm=ChatOpenAI(model_name=openai_model),
        max_iter=5,
        tools=[exa_search_tool, scrape_website_tool]
    )

    analyst = Agent(
        role='Detection Engineer',
        goal='Analyze the information from the Cyber Threat Intelligence Researcher and select the highest quality candidates for detections. Ensure the information is sufficient to convert into detection logic, focusing on log source evidence.',
        backstory="With a keen eye for detail and a deep understanding of cyber threats, you excel at interpreting raw data and translating it into actionable detections for security operations teams. You prioritize threat intel that includes detailed log source evidence, ensuring the detection logic is robust and effective.",
        verbose=True,
        allow_delegation=True,
        llm=ChatOpenAI(model_name=openai_model),
        max_iter=5,
        tools=[exa_search_tool, scrape_website_tool]
    )

    # Define tasks
    research_task = Task(
        description=f"""Research and select the top 10 pieces of information related to: {query}. Follow these steps:
        1. Search for and gather detailed information from threat intelligence reports, cybersecurity blogs, and any relevant online sources describing real-world cyber incidents.
        2. Focus on identifying techniques, tactics, and procedures (TTPs) that are suitable candidates for detection.
        3. Ensure that each TTP includes clear log source evidence that can be used to write detection logic. Avoid atomic indicators of compromise.
        4. Some good examples are: threat hunting in Okta logs, TTPs used in AWS attacks, detection engineering in CloudTrail logs, hunting in SaaS logs, writing detections in Kubernetes audit logs, and detection engineering in EKS logs.
        5. Document each TTP with detailed descriptions, including the context of its use, how it manifests in logs, and why it is a good candidate for detection.

        The expected output is a comprehensive report detailing the top 10 threat intelligence findings, including:
        - Threat names and descriptions
        - Techniques, tactics, and procedures with log source evidence
        - Detailed context and usage information
        - Reasons why each TTP is a good candidate for detection.""",
        agent=researcher,
        expected_output="A comprehensive report detailing the top 10 threat intelligence findings, including threat names, descriptions, and log source evidence information.",
        tools=[exa_search_tool]
    )

    analysis_task = Task(
        description="""Analyze the research findings provided by the Cyber Threat Intelligence Researcher. Follow these steps:
        1. Carefully review the comprehensive report containing TTPs and behaviors.
        2. Identify and select the highest quality TTPs that would make great candidates for detection.
        3. Ensure each selected TTP includes sufficient log source information to be converted into detection logic.
        4. Prioritize TTPs that have clear and actionable log evidence, making them ideal for creating robust detection rules.
        5. Provide examples of good detection logic sources, such as sigma rules, Datadog OOTB rules, Panther Security content rules, Splunk security content, Elastic security content rules, and other open-source detection content.

        The expected output is a detailed analysis summarizing the threat intelligence, highlighting the best candidates for detection, and providing:
        - A summary of the selected TTPs
        - Detailed log source evidence for each TTP
        - Actionable insights and strategies for detection and mitigation""",
        agent=analyst,
        expected_output="A comprehensive report that lists the final list of TTPs selected for detection, along with detailed log source evidence and actionable insights.",
    )

    # Create the crew
    crew = Crew(
        agents=[researcher, analyst],
        tasks=[research_task, analysis_task],
        process=Process.sequential,
        verbose=2  # Increased verbosity for more detailed output
    )

    # Kick off the research process
    result = crew.kickoff(inputs={'query': query})
    return result

# Modified main block for subprocess compatibility
if __name__ == "__main__":
    if len(sys.argv) > 1:
        query = sys.argv[1]
        print(f"Starting threat research for query: {query}")
        result = perform_threat_research(query)
        print("Research completed. Final result:")
        print(result)
    else:
        print("Please provide a query as a command-line argument.")

================================================
FILE: trained_agents_data.pkl
================================================
}.

================================================
FILE: ui.py
================================================
import subprocess
import sys
import streamlit as st
from dotenv import load_dotenv
import os
import fitz
from threat_research import perform_threat_research
from firecrawl_integration import scrape_url

# Load environment variables
load_dotenv()

def render_ui(prompts, process_with_llm):
    # Streamlit UI
    st.set_page_config(page_title="D.I.A.N.A.", page_icon="🛡️", layout="wide")

    # Custom CSS and JavaScript for improved styling and resizable sidebar
    st.markdown("""
    <style>
        .stApp {
            max-width: none;
            padding: 1rem;
        }
        .main .block-container {
            max-width: none;
            padding-left: 2rem;
            padding-right: 2rem;
        }
        .main-content {
            background-color: #f0f2f6;
            padding: 2rem;
            border-radius: 10px;
        }
        .sidebar .stButton>button {
            width: 100%;
        }
        .stProgress > div > div > div > div {
            background-color: #4CAF50;
        }
        .tooltip {
            position: relative;
            display: inline-block;
            cursor: help;
        }
        .tooltip .tooltiptext {
            visibility: hidden;
            width: 200px;
            background-color: #555;
            color: #fff;
            text-align: center;
            border-radius: 6px;
            padding: 5px;
            position: absolute;
            z-index: 1;
            bottom: 125%;
            left: 50%;
            margin-left: -100px;
            opacity: 0;
            transition: opacity 0.3s;
        }
        .tooltip:hover .tooltiptext {
            visibility: visible;
            opacity: 1;
        }
        [data-testid="stSidebar"] {
            min-width: 300px;
            max-width: 800px;
            width: 300px;
            resize: horizontal;
            overflow: auto;
        }
        [data-testid="stSidebar"] > div:first-child {
            width: 100%;
            height: 100%;
        }
        .stApp > header {
            background-color: transparent;
        }
        .stApp {
            margin: 0;
        }
        .resize-handle {
            position: absolute;
            right: -5px;
            top: 0;
            bottom: 0;
            width: 10px;
            cursor: col-resize;
            z-index: 1000;
        }
    </style>

    <script>
        const resizeHandle = document.createElement('div');
        resizeHandle.className = 'resize-handle';
        const sidebar = window.parent.document.querySelector('[data-testid="stSidebar"]');
        sidebar.appendChild(resizeHandle);

        let isResizing = false;
        let lastDownX = 0;

        resizeHandle.addEventListener('mousedown', (e) => {
            isResizing = true;
            lastDownX = e.clientX;
        });

        document.addEventListener('mousemove', (e) => {
            if (!isResizing) return;
            const offsetRight = document.body.offsetWidth - (e.clientX - document.body.offsetLeft);
            const minWidth = 300;
            const maxWidth = 800;
            const newWidth = Math.min(Math.max(minWidth, document.body.offsetWidth - offsetRight), maxWidth);
            sidebar.style.width = newWidth + 'px';
        });

        document.addEventListener('mouseup', (e) => {
            isResizing = false;
        });
    </script>
    """, unsafe_allow_html=True)

    # Add a sidebar
    sidebar = st.sidebar

    with sidebar:
        st.image("https://i.imgur.com/wEHCCaj.png", width=300)
        st.markdown(
            """
            <div style="text-align: center;">
                Developed by Dylan Williams <a href="https://www.linkedin.com/in/dylan-williams-a2927599/" target="_blank">LinkedIn</a> | 
                <a href="https://github.com/dwillowtree/diana" target="_blank">GitHub Repository</a>
            </div>
            """,
            unsafe_allow_html=True
        )
        st.markdown("---")

        # Quick Start Guide section (collapsed by default)
        with st.expander("Quick Start Guide", expanded=False):
            st.markdown("""
            DIANA (Detection and Intelligence Analysis for New Alerts) automates detection creation from threat intelligence.

            **Note: Providing high-quality example detections, logs, and your detection writing process is critical for optimal results.**

            ### Steps:
            1. Select LLM provider and model
            2. Choose security data/log type(s) for detection
            3. Select detection language
            4. Input threat TTPs description or upload report/blog post
            5. **Important:** Provide 3-7 diverse, high-quality example detections for the chosen log source
            6. **Important:** Provide 3-7 example log sources
            7. **Recommended:** Outline your typical detection writing steps (this helps DIANA follow your workflow)
            8. Describe alert triage/investigation steps
            9. Click 'Process Threat Intel'

            Remember: The quality and diversity of your inputs directly impact DIANA's output. Take time to provide comprehensive examples and follow your standard workflow for best results.
            """)
        # About DIANA section (collapsed by default)
        with st.expander("About DIANA", expanded=False):
            st.markdown("""
            DIANA (Detection and Intelligence Analysis for New Alerts) is an AI-powered tool designed to streamline the detection writing process in cybersecurity operations.

            ### Purpose:
            - Automate the creation of detections from threat intelligence
            - Reduce manual effort in researching log sources and writing queries
            - Generate investigation steps and quality assurance checks

            DIANA leverages advanced AI capabilities to enhance efficiency and accuracy in cybersecurity threat detection, allowing security teams to respond more quickly and effectively to emerging threats.
            """)
        st.subheader("Configuration")
        
        # LLM Provider selection with tooltip
        llm_provider = st.selectbox(
            "LLM Provider",
            ["OpenAI", "Anthropic", "Amazon Bedrock", "Groq"],
            index=1,
            key="llm_provider",
            help="Choose the AI model provider for processing."
        )

        # Add Pro Tip before the model type selection
        st.sidebar.markdown("**PRO TIP:** Use Claude 3 Haiku (it's smart, fast, and cheap)")

        # Model selection based on provider
        if llm_provider == "OpenAI":
            model = st.selectbox(
                "Model Type",
                ["gpt-4o-mini", "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"],
                key="openai_model",
                help="Select the OpenAI model to use for processing."
            )
        elif llm_provider == "Anthropic":
            model = st.selectbox(
                "Model Type",
                ["claude-3-5-sonnet-20240620", "claude-3-opus-20240229", "claude-3-haiku-20240307"],
                index=2, 
                key="anthropic_model",
                help="Select the Anthropic model to use for processing."
            )
        elif llm_provider == "Amazon Bedrock":
            model = st.selectbox(
                "Model Type",
                [
                    "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
                    "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                    "bedrock/meta.llama3-8b-instruct-v1:0",
                    "bedrock/meta.llama3-70b-instruct-v1:0"
                ],
                key="bedrock_model",
                help="Select the Amazon Bedrock model to use for processing."
            )
        elif llm_provider == "Groq":
            model = st.selectbox(
                "Model Type",
                [
                    "groq/llama-3.1-70b-versatile",
                    "groq/llama-3.1-8b-instant",
                    "groq/llama3-8b-8192",
                ],
                key="groq_model",
                help="Select the Groq model to use for processing."
            )
        
        # Data types multiselect with search functionality
        data_types = st.multiselect(
            "Security Data/Log Type(s)",
            [
                "Okta Logs", "AWS CloudTrail Logs", "Kubernetes Audit Logs", "GitLab Audit Logs", "AWS EKS Plane logs", "Cisco Duo Logs"
            ],
            default=["AWS CloudTrail Logs"],
            key="security_data_type",
            help="Select the relevant log types for your detection."
        )

        # Detection language selection with tooltip
        detection_language = st.selectbox(
            "Detection Language",
            [
                "AWS Athena", "StreamAlert", "Splunk SPL", "Falcon LogScale", "Elastic Query DSL",
                "Kusto Query Language (KQL)",
                "Sigma Rules","Panther (Python)", "Hunters (Snowflake SQL)"
            ],
            key="detection_language_select",
            help="Choose the query language for your detection rules."
        )

        # Model parameters with explanations
        st.subheader("Model Parameters")
        temperature = st.slider(
            "Temperature",
            min_value=0.0,
            max_value=1.0,
            value=0.1,
            step=0.1,
            key="temperature_slider",
            help="Controls output randomness. Lower values (e.g., 0.2) for more deterministic results, higher values (e.g., 0.8) for more creative outputs."
        )
        
        max_tokens = st.slider(
            "Max Tokens",
            min_value=100,
            max_value=4000,
            value=4000,
            step=100,
            key="max_tokens_slider",
            help="Maximum number of tokens in the generated response. Higher values allow for longer outputs but may increase processing time."
        )
    
    st.title("🛡️ D.I.A.N.A.")
    st.subheader("Detection and Intelligence Analysis for New Alerts")


    # Create tabs for main workflow and threat research
    tab1, tab2, tab3 = st.tabs(["Detection Engineering", "Threat Research Crew", "Bulk Detection Processing [Coming Soon]"])

    # Progress bar for multi-step process
    if 'step' not in st.session_state:
        st.session_state.step = 0

    # Function to update progress
    def update_progress():
        # Cap the step at 5 for progress calculation
        capped_step = min(st.session_state.step, 5)
        progress_value = capped_step / 5
        progress_bar.progress(progress_value)
        
        # Display the actual step number, even if it's beyond 5
        step_counter.markdown(f"**Current Step: {st.session_state.step}/5**")
        
    with tab1:
        # Create placeholders for the progress bar and step counter
        progress_bar = st.empty()
        step_counter = st.empty()

        # Update progress at the beginning
        update_progress()

    # Main content layout
        col1, col2 = st.columns(2)
        with col1:
            st.subheader("Threat Intelligence Input")
            url = st.text_input("Enter URL:")
            
            # Initialize session state for scraped content if it doesn't exist
            if 'scraped_content' not in st.session_state:
                st.session_state.scraped_content = ""

            # Scrape URL button
            if st.button("🔍 Scrape URL", type="primary"):  
                if url:
                    try:
                        with st.spinner("Scraping URL..."):
                            st.session_state.scraped_content = scrape_url(url)
                        st.success("URL scraped successfully!")
                    except Exception as e:
                        st.error(f"Error scraping URL: {e}")
                        st.session_state.scraped_content = ""
                else:
                    st.warning("Please enter a URL to scrape.")

            # Display scraped content if available
            if st.session_state.scraped_content:
                with st.expander("View Scraped Content", expanded=False):
                    st.markdown(st.session_state.scraped_content)
            
            description = st.text_area(
                "Enter threat intelligence description:",
                height=100,
                placeholder="Detect a user attempting to exfiltrate an Amazon EC2 AMI Snapshot. This rule lets you monitor the ModifyImageAttribute CloudTrail API calls to detect when an Amazon EC2 AMI snapshot is made public or shared with an AWS account. This rule also inspects: @requestParameters.launchPermission.add.items.group array to determine if the string all is contained. This is the indicator which means the RDS snapshot is made public. @requestParameters.launchPermission.add.items.userId array to determine if the string * is contained. This is the indicator which means the RDS snapshot was shared with a new or unknown AWS account.",
                help="Provide a detailed description of the threat intelligence you want to analyze."
            )
            uploaded_file = st.file_uploader(
                "Upload threat intel report or blog (optional)",
                type=["txt", "pdf"],
                help="You can optionally upload a threat intelligence report or blog post for analysis."
            )

            file_content = ""

            if uploaded_file is not None:
                if uploaded_file.type == "application/pdf":
                    # Process PDF file
                    pdf_document = fitz.open(stream=uploaded_file.read(), filetype="pdf")
                    for page_num in range(pdf_document.page_count):
                        page = pdf_document.load_page(page_num)
                        file_content += page.get_text()
                else:
                    # Process other text files
                    file_content = uploaded_file.getvalue().decode("utf-8")

            # Collapsible sections for additional inputs
            with st.expander("Detection Writing Steps", expanded=False):
                detection_steps = st.text_area(
                    "Enter detection writing steps:",
                    height=150,
                    placeholder="1. Identify the key indicators or behaviors from the threat intel\n2. Determine the relevant log sources and fields\n3. Write the query using the specified detection language\n4. Include appropriate filtering to reduce false positives\n5. Add comments to explain the logic of the detection",
                    help="Outline the steps you typically follow when writing detection rules."
                )

            with st.expander("Alert Triage Steps", expanded=False):
                sop = st.text_area(
                    "Enter standard operating procedures or investigation steps for your current detections and alerts:",
                    height=150,
                    placeholder="1. Validate the alert by reviewing the raw log data\n2. Check for any related alerts or suspicious activities from the same source\n3. Investigate the affected systems and user accounts\n4. Determine the potential impact and scope of the incident\n5. Escalate to the incident response team if a true positive is confirmed",
                    help="Describe your standard operating procedures for triaging and investigating alerts."
                )    

        with col2:
            st.subheader("Example Detections")
            num_detections = st.number_input("Number of example detections", min_value=1, value=2, step=1)
            current_detections = [
                st.text_area(
                    f"Example detection {i+1}",
                    height=100,
                    placeholder="SELECT sourceIPAddress, eventName, userAgent\nFROM cloudtrail_logs\nWHERE eventName = 'ConsoleLogin' AND errorMessage LIKE '%Failed authentication%'\nGROUP BY sourceIPAddress, eventName, userAgent\nHAVING COUNT(*) > 10",
                    help="Provide an example of an existing detection query in your environment."
                ) for i in range(num_detections)
            ]
            
            st.subheader("Example Logs")
            num_logs = st.number_input("Number of example logs", min_value=1, value=2, step=1)
            example_logs = [
                st.text_area(
                    f"Example log {i+1}",
                    height=100,
                    placeholder="paste examples of your actual logs here, you may have different field names or logging structure",
                    help="Provide examples of actual log entries from your environment."
                ) for i in range(num_logs)
            ]

        def run_threat_research(query, crewai_model):
            # Create a placeholder in the Streamlit UI
            output_placeholder = st.empty()

            # Prepare the environment variables
            env = os.environ.copy()
            env["OPENAI_MODEL_NAME"] = crewai_model

            # Run the threat_research.py script and capture its output
            process = subprocess.Popen(['python', 'threat_research.py', query], 
                                    stdout=subprocess.PIPE, 
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True,
                                    env=env)

            # Stream the output to the Streamlit UI
            full_output = ""
            for line in process.stdout:
                full_output += line
                output_placeholder.text_area("Research Progress:", full_output, height=400)

            # Return the final result
            return full_output

        # Process Threat Intel button
        if st.button("🚀 Process Threat Intel", type="primary") or st.session_state.step > 0:
            if not description and not uploaded_file and not st.session_state.scraped_content and st.session_state.step == 0:
                st.error("Please provide either a threat intel description or upload a file.")
            else:
                if st.session_state.step == 0:
                    # Step 1: Analyze Threat Intel
                    st.subheader("Step 1: Analyze Threat Intel")
                    details = st.expander("View Details", expanded=False)

                    context = {
                        "description": description,
                        "file_content": file_content,
                        "scraped_content": st.session_state.scraped_content,
                        "data_types": ", ".join(data_types),
                    }

                    formatted_prompt = prompts[0].format(**context)

                    with details:
                        st.text("Prompt:")
                        st.code(formatted_prompt, language="markdown")

                    with st.spinner("Analyzing threat intelligence..."):
                        result = process_with_llm(formatted_prompt, model, max_tokens, temperature)

                    if result is None:
                        st.error("An error occurred while analyzing the threat intelligence.")
                    else:
                        # Store the result in session state
                        st.session_state.result = result

                        with details:
                            st.text("Result:")
                            st.code(st.session_state.result, language="markdown")

                        st.success("Analysis complete!")
                        st.session_state.step = 1
                        update_progress()

                if st.session_state.step >= 1:
                    with st.spinner("Parsing detections..."):
                        # Parse the result to extract detections
                        detections = []
                        current_detection = {"name": "", "behavior": "", "log_evidence": "", "context": ""}
                        capturing_threat_behavior = False
                        capturing_log_evidence = False
                        capturing_context = False

                        for line in st.session_state.result.split('\n'):
                            stripped_line = line.strip()
                            if stripped_line.startswith(("Detection Name:", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9.", "10.")):
                                if current_detection["name"]:
                                    detections.append(current_detection)
                                    current_detection = {"name": "", "behavior": "", "log_evidence": "", "context": ""}
                                name = stripped_line.split(":", 1)[-1].strip() if ":" in stripped_line else stripped_line.split(".", 1)[-1].strip()
                                name = name.lstrip("0123456789. ")
                                current_detection["name"] = name
                            elif "Threat Behavior:" in stripped_line:
                                capturing_threat_behavior = True
                                capturing_log_evidence = False
                                capturing_context = False
                                current_detection["behavior"] = stripped_line.split("Threat Behavior:", 1)[-1].strip()
                            elif "Log Evidence:" in stripped_line:
                                capturing_threat_behavior = False
                                capturing_log_evidence = True
                                capturing_context = False
                                current_detection["log_evidence"] = stripped_line.split("Log Evidence:", 1)[-1].strip()
                            elif "Context:" in stripped_line:
                                capturing_threat_behavior = False
                                capturing_log_evidence = False
                                capturing_context = True
                                current_detection["context"] = stripped_line.split("Context:", 1)[-1].strip()
                            elif capturing_threat_behavior:
                                current_detection["behavior"] += " " + stripped_line
                            elif capturing_log_evidence:
                                current_detection["log_evidence"] += " " + stripped_line
                            elif capturing_context:
                                current_detection["context"] += " " + stripped_line

                        if current_detection["name"]:
                            detections.append(current_detection)

                        if not detections:
                            st.warning("No specific detections were identified. The entire analysis will be processed as a single detection.")
                            detections = [{"name": "Entire Analysis", "behavior": st.session_state.result, "log_evidence": "", "context": ""}]

                        st.session_state.detections = detections

                    # Display the number of detections found
                    st.info(f"Number of detections found: {len(detections)}")

                    # Display the names of the detections and their threat behaviors
                    if detections:
                        st.write("Detections found:")
                        for detection in detections:
                            st.markdown(f"**Detection Name:** {detection['name']}")
                            st.write(f"**Threat Behavior:** {detection['behavior']}")
                            st.write(f"**Log Evidence:** {detection['log_evidence']}")
                            st.write(f"**Context:** {detection['context']}")
                            st.write("---")

                    # Allow user to select a detection
                    selected_detection_name = st.selectbox("Select a detection to process:", [d["name"] for d in st.session_state.detections])

                    if st.button("Process Selected Detection", type="primary"):
                        selected_detection = next(d for d in st.session_state.detections if d["name"] == selected_detection_name)
                        st.session_state.selected_detection = selected_detection
                        st.session_state.step = 2
                        update_progress()

                if st.session_state.step >= 2:
                    # Process the remaining steps for the selected detection
                    selected_detection = st.session_state.selected_detection

                    st.write("Processing the selected detection:")
                    st.markdown(f"**Detection Name:** {selected_detection['name']}")
                    st.write(f"**Threat Behavior:** {selected_detection['behavior']}")
                    st.write(f"**Log Evidence:** {selected_detection['log_evidence']}")
                    st.write(f"**Context:** {selected_detection['context']}")

                    # Further processing steps...
                    results = {}
                    results[1] = st.session_state.result  # Store the first result

                    for i in range(2, 6):
                        if st.session_state.step > i:
                            continue

                        step_name = ['Create Detection Rule', 'Develop Investigation Guide', 'Quality Assurance Review', 'Final Summary'][i-2]
                        
                        st.subheader(f"Step {i}: {step_name}")
                        details = st.expander("View Details", expanded=False)

                        context = {
                            "detection_language": detection_language,
                            "current_detections": "\n".join(current_detections),
                            "example_logs": "\n".join(example_logs),
                            "detection_steps": detection_steps,
                            "sop": sop,
                            "previous_analysis": selected_detection,
                            "previous_detection_rule": results.get(2, ""),
                            "previous_investigation_steps": results.get(3, ""),
                            "previous_qa_findings": results.get(4, "")
                        }

                        formatted_prompt = prompts[i-1].format(**context)

                        with details:
                            st.text("Prompt:")
                            st.code(formatted_prompt, language="markdown")

                        with st.spinner(f"Processing {step_name}..."):
                            result = process_with_llm(formatted_prompt, model, max_tokens, temperature)

                        if result is None:
                            st.error(f"An error occurred while processing {step_name}.")
                            break

                        results[i] = result

                        with details:
                            st.text("Result:")
                            st.code(result, language="markdown")

                        st.success(f"{step_name} complete!")
                        st.session_state.step = i + 1
                        update_progress()

                    if len(results) == 5:
                        st.session_state.step = 6  # Indicate completion
                        update_progress()
                        st.success("Processing complete!")
                        st.markdown(results[5])

                        # Add a button to restart the process
                        if st.button("Start Over"):
                            st.session_state.step = 0
                            update_progress()
                            st.experimental_rerun()
                    else:
                        st.error("An error occurred while processing the threat intelligence.")

    with tab2:
        # Threat Research section
        st.subheader("Threat Research Crew")
        
        st.markdown("""
        This feature spins up a crew of autonomous AI agents that perform threat detection research on your topic of choice. Each agent is capped at 5 iterations, so no need to worry about them going rogue and taking over the world.

        This feature is currently limited to OpenAI models.
        
        These agents use Exa, which employs semantic search (embeddings) to search the web, providing more contextually relevant results than traditional keyword-based search engines like Google.
        
        **Examples of research topics:**
        - Threat hunting in Okta logs
        - Most common TTPs used by attackers in AWS
        - Latest detection strategies for ransomware in Windows environments
        """)

        # Add the CrewAI model selection here
        crewai_model = st.selectbox(
            "CrewAI Model",
            ["gpt-4o-mini", "gpt-4-turbo", "gpt-4o"],
            index=0,  # Default to gpt-4o-mini
            key="crewai_model",
            help="Select the model for CrewAI to use"
        )

        research_query = st.text_input(
            "Enter your cybersecurity research topic:",
            placeholder="E.g., 'Threat hunting in Okta logs' or 'TTPs from CloudTrail logs used in AWS attacks'",
            help="Specify a topic for in-depth threat research to supplement your analysis."
        )

        if st.button("🔍 Perform Threat Research", type="primary", key="research_button"):
            if research_query:
                with st.spinner("Performing threat research... This may take a few minutes."):
                    research_result = run_threat_research(research_query, crewai_model)
                
                st.subheader("Threat Research Results")
                st.markdown(research_result)
                
            else:
                st.warning("Please enter a research topic before performing threat research.")
    with tab3:
        st.subheader("Open Source Detection Content")
        
        st.markdown("[![Elastic](https://img.shields.io/badge/Elastic-005571?style=for-the-badge&logo=elastic&logoColor=white)](https://github.com/elastic/detection-rules)")
        st.markdown("[![Chronicle](https://img.shields.io/badge/Chronicle-4285F4?style=for-the-badge&logo=google-cloud&logoColor=white)](https://github.com/chronicle/detection-rules)")
        st.markdown("[![Sigma](https://img.shields.io/badge/Sigma-008080?style=for-the-badge&logo=sigma&logoColor=white)](https://github.com/SigmaHQ/sigma)")
        st.markdown("[![Hacking the Cloud](https://img.shields.io/badge/Hacking_the_Cloud-FF9900?style=for-the-badge&logo=amazon-aws&logoColor=white)](https://hackingthe.cloud/)")
        st.markdown("[![Wiz Threats](https://img.shields.io/badge/Wiz_Threats-00ADEF?style=for-the-badge&logo=wiz&logoColor=white)](https://threats.wiz.io/)")
        st.markdown("[![Anvilogic Armory](https://img.shields.io/badge/Anvilogic_Armory-6F2DA8?style=for-the-badge&logo=github&logoColor=white)](https://github.com/anvilogic-forge/armory)")
        st.markdown("[![Panther](https://img.shields.io/badge/Panther-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/panther-labs/panther-analysis/tree/release/rules)")
        st.markdown("[![Splunk](https://img.shields.io/badge/Splunk-000000?style=for-the-badge&logo=splunk&logoColor=white)](https://github.com/splunk/security_content)")
        st.markdown("[![Datadog](https://img.shields.io/badge/Datadog-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://docs.datadoghq.com/security/default_rules/)")
        st.markdown("[![Falco Security](https://img.shields.io/badge/Falco_Security-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/falcosecurity/rules)")
        st.markdown("[![ExaBeam Content Library](https://img.shields.io/badge/ExaBeam-008080?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ExabeamLabs/Content-Doc)")
        st.markdown("[![Sekoia Detection Rules](https://img.shields.io/badge/Sekoia_Detection-0000FF?style=for-the-badge&logo=github&logoColor=white)](https://docs.sekoia.io/xdr/features/detect/built_in_detection_rules/)")
        st.markdown("[![Sublime](https://img.shields.io/badge/Sublime-FF6347?style=for-the-badge&logo=github&logoColor=white)](https://github.com/sublime-security/sublime-rules/)")
        st.markdown("[![Cloud Security Atlas](https://img.shields.io/badge/Cloud_Security_Atlas-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://securitylabs.datadoghq.com/cloud-security-atlas/)")
        st.markdown("[![SaaS Attack Matrix](https://img.shields.io/badge/SaaS_Attack_Matrix-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/pushsecurity/saas-attacks)")
        st.markdown("[![Delivr.to Email Detections](https://img.shields.io/badge/Delivr.to-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/delivr-to/detections)")
        st.markdown("[![Public Cloud Breaches](https://img.shields.io/badge/Public_Cloud_Breaches-FF5733?style=for-the-badge&logo=google-cloud&logoColor=white)](https://www.breaches.cloud/)")
        st.markdown("[![CI/CD Threat Matrix](https://img.shields.io/badge/CI/CD_Threat_Matrix-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/rung/threat-matrix-cicd)")
        st.markdown("[![K8s Attack Trees](https://img.shields.io/badge/K8s_Attack_Trees-0000FF?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/cncf/financial-user-group/tree/main/projects/k8s-threat-model)")
        st.markdown("[![eBPF Detections](https://img.shields.io/badge/eBPF_Detections-005571?style=for-the-badge&logo=aqua&logoColor=white)](https://github.com/aquasecurity/tracee/tree/main/signatures/golang)")
        st.markdown("[![Insider Threat TTP KnowledgeBase](https://img.shields.io/badge/Insider_Threat_TTP_KnowledgeBase-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/center-for-threat-informed-defense/insider-threat-ttp-kb)")
        st.markdown("[![Insider Threat Matrix](https://img.shields.io/badge/Insider_Threat_Matrix-FF9900?style=for-the-badge&logo=mitre&logoColor=white)](https://insiderthreatmatrix.org/)")
        st.markdown("[![Mitre Atlas](https://img.shields.io/badge/Mitre_Atlas-FF6347?style=for-the-badge&logo=mitre&logoColor=white)](https://atlas.mitre.org/techniques)")
        st.markdown("[![Dorothy](https://img.shields.io/badge/Dorothy-005571?style=for-the-badge&logo=elastic&logoColor=white)](https://github.com/elastic/dorothy)")
        st.markdown("[![Stratus Red Team](https://img.shields.io/badge/Stratus_Red_Team-000000?style=for-the-badge&logo=redhat&logoColor=white)](https://stratus-red-team.cloud/)")
        st.markdown("[![KubeHound](https://img.shields.io/badge/KubeHound-632CA6?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/DataDog/KubeHound)")
        st.markdown("[![RedKube](https://img.shields.io/badge/RedKube-FF0000?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/lightspin-tech/red-kube)")
        st.markdown("[![Kubesploit](https://img.shields.io/badge/Kubesploit-6A737D?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/cyberark/kubesploit)")
        st.markdown("[![Kubehunter](https://img.shields.io/badge/Kubehunter-005571?style=for-the-badge&logo=kubernetes&logoColor=white)](https://github.com/aquasecurity/kube-hunter)")
        st.markdown("[![Workload Security Evaluator](https://img.shields.io/badge/Workload_Security_Evaluator-632CA6?style=for-the-badge&logo=datadog&logoColor=white)](https://github.com/DataDog/workload-security-evaluator)")
        st.markdown("[![DeRF](https://img.shields.io/badge/DeRF-6A737D?style=for-the-badge&logo=github&logoColor=white)](https://github.com/vectra-ai-research/derf)")
        st.markdown("[![AWS Threat Composer](https://img.shields.io/badge/AWS_Threat_Composer-FF9900?style=for-the-badge&logo=amazon-aws&logoColor=white)](https://github.com/awslabs/threat-composer)")


    st.markdown("---")
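The long run of near-identical `st.markdown` badge calls above could be generated from a single list of (label, color, logo, url) tuples. A minimal sketch of that refactor is below; the `badge_markdown` helper and the `BADGES` list are assumptions for illustration and do not exist in the repo. Note that `urllib.parse.quote` URL-encodes characters like the `/` in "CI/CD", which would otherwise be read as a path separator in the shields.io `/badge/<label>-<color>` segment.

```python
from urllib.parse import quote

def badge_markdown(label, color, logo, url):
    """Build a shields.io badge markdown link (hypothetical helper, not in repo).

    The label is URL-encoded so characters such as '/' do not break the
    /badge/<label>-<color> path segment; spaces become underscores, which
    shields.io renders back as spaces.
    """
    encoded = quote(label.replace(" ", "_"), safe="")
    img = (f"https://img.shields.io/badge/{encoded}-{color}"
           f"?style=for-the-badge&logo={logo}&logoColor=white")
    return f"[![{label}]({img})]({url})"

# Assumed data table: one row per badge instead of one hard-coded call each.
BADGES = [
    ("Stratus Red Team", "000000", "redhat", "https://stratus-red-team.cloud/"),
    ("CI/CD Threat Matrix", "6A737D", "github",
     "https://github.com/rung/threat-matrix-cicd"),
    ("KubeHound", "632CA6", "kubernetes", "https://github.com/DataDog/KubeHound"),
]

for label, color, logo, url in BADGES:
    # In the Streamlit app this line would be: st.markdown(badge_markdown(...))
    print(badge_markdown(label, color, logo, url))
```

Adding or removing a badge then means editing one tuple rather than copying a full `st.markdown` line, and the encoding fix applies uniformly.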
SYMBOL INDEX (6 symbols across 4 files)

FILE: app.py
  function track_cost_callback (line 21) | def track_cost_callback(kwargs, completion_response, start_time, end_time):
  function process_with_llm (line 33) | def process_with_llm(prompt, model, max_tokens, temperature):
  function process_threat_intel (line 54) | def process_threat_intel(description, file_content, model, data_types, d...

FILE: firecrawl_integration.py
  function scrape_url (line 12) | def scrape_url(url):

FILE: threat_research.py
  function perform_threat_research (line 11) | def perform_threat_research(query):

FILE: ui.py
  function render_ui (line 13) | def render_ui(prompts, process_with_llm):

About this extraction

This page contains the full source code of the dwillowtree/diana GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 11 files (61.8 KB), approximately 13.7k tokens, and a symbol index with 6 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
