Full Code of ronnieops/pi-search-hub for AI

main d6a15539a1c9 cached

9 files

85.9 KB

22.5k tokens

57 symbols

2 requests

Download .txt

Repository: ronnieops/pi-search-hub
Branch: main
Commit: d6a15539a1c9
Files: 9
Total size: 85.9 KB

Directory structure:
gitextract_flxax2c8/

├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── .npmignore
├── README.md
├── backends/
│   ├── parsers.test.ts
│   └── parsers.ts
├── extensions/
│   └── search-hub.ts
├── package.json
└── search.json.example

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/publish.yml
================================================
name: Publish Package

on:
  workflow_dispatch:
    inputs:
      version:
        description: "Version to publish (must match package.json)"
        required: true
        type: string

  release:
    types: [published]

permissions:
  id-token: write
  contents: read

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          registry-url: 'https://registry.npmjs.org'

      - run: npm ci

      - name: Verify version
        run: |
          PKG_VERSION=$(node -p "require('./package.json').version")
          if [ "$PKG_VERSION" != "${{ inputs.version }}" ]; then
            echo "ERROR: package.json version ($PKG_VERSION) != requested version (${{ inputs.version }})"
            exit 1
          fi
          echo "Publishing pi-search-hub@$PKG_VERSION"

      - run: npm publish

================================================
FILE: .gitignore
================================================
node_modules/
dist/
*.log
.DS_Store

# API keys — NEVER commit these
.pi/
search.json
search.json.local
*.secret.*

# Dev/research docs (local only — never push)
findings.md
researcher*.md
scout*.md
handoffs/
docs/
benchmark/

# Benchmark reports
benchmark/benchmark-current-report.md

# Subagent session artifacts
context.md
research.md

# Other extensions (not search-related)
extensions/lint-markdown.ts
extensions/ralph-loop.ts

# Ralph loop state (local development)
.ralph/

# Personal/social media drafts
github-meta.md
reddit-post.md


================================================
FILE: .npmignore
================================================
# Session artifacts
handoffs/
scout*.md
researcher*.md
context.md
research.md

# Personal/social media drafts
github-meta.md
reddit-post.md

# Git
.git/
.gitignore

# All markdown except README
*.md
!README.md

# All TypeScript except the search extension and backends
extensions/*.ts
!extensions/search-hub.ts

# All non-search files
.pi/
benchmark/

# Ralph loop state
.ralph/

# Docs artifacts
docs/

# Tests
backends/*.test.ts

# Misc
*.log
.DS_Store

================================================
FILE: README.md
================================================
# pi-search-hub

Unified web search + content extraction extension for [pi](https://pi.dev) with **12 backend providers** (all working). One `web_search` tool, one `web_read` tool, auto-fallback, RRF-ranked combine mode, and credential resolution via env/shell/literal.

## Installation

```bash
pi install npm:pi-search-hub
```

> **Note for DuckDuckGo backend:** Requires the `ddgs` Python package. Install with:
>
> - Linux/macOS: `pip3 install ddgs`
> - Windows: `pip install ddgs`

## Usage

### Web Search

After installing, just ask naturally:

```text
Search for recent AI agent frameworks.
```

```text
What's the latest news on Llama 4?
```

Or use the tools directly — the agent picks the best configured backend automatically:

- `web_search` — search the web with auto-fallback or parallel combine mode
- `web_read` — fetch any URL as clean markdown

### Combine Mode

Set `combine=true` to query **ALL enabled backends in parallel** with Reciprocal Rank Fusion (RRF) ranking:

```text
Search for "Rust vs Go performance benchmarks" with combine=true to get results from all backends
```

**Combine mode benefits:**

- Broader coverage across multiple search indexes
- Results ranked by RRF — position-based scoring across all backends
- Each result shows which backend found it
- URL deduplication with content-aware merge (prefers richest result)
- Useful for comprehensive research or when you want diverse sources

**Tradeoff:** Uses more API quota per query (all backends are called), but you get more comprehensive results.

### Read Web Pages

Fetch any URL as clean markdown — great for extracting article content, docs, or reference pages.
**Note: `web_read` uses [Jina Reader](https://r.jina.ai/) to fetch and convert URLs to markdown.**

```text
Read https://docs.example.com/api-reference
```

The `web_read` tool supports:

- **objective** — CSS selector to target specific content (e.g. "div.article-body")
- **keywords** — relevant terms to highlight on long pages
- **mode** — `rush` for speed (return innerText) or `smart` (markdown extraction)
- **fresh** — bypass cache when freshness matters

## Supported Backends

| #   | Backend               | Free Tier                     | API Key? | How to get key                                                    |
| --- | --------------------- | ----------------------------- | :------: | ----------------------------------------------------------------- |
| 1   | **DuckDuckGo**        | Unlimited (rate-limited)      |  **No**  | `pip install ddgs` (Linux/macOS: `pip3`)                          |
| 2   | **Jina AI**           | Search: key req. web_read: free (no key) |   Yes   | [jina.ai](https://jina.ai)   |
| 3   | **Marginalia Search** | Unlimited (rate-limited)      | **No**†  | [marginalia.nu](https://www.marginalia.nu/marginalia-search/api/) |
| 4   | **Tavily**            | 1,000 calls/month             |   Yes    | [tavily.com](https://tavily.com)                                  |
| 5   | **Serper** (Google)   | 2,500 free queries (one-time) |   Yes    | [serper.dev](https://serper.dev)                                  |
| 6   | **Brave**             | 2,000 queries/month           |   Yes    | [brave.com/search/api](https://brave.com/search/api)              |
| 7   | **Firecrawl**         | 500 free credits              |   Yes    | [firecrawl.dev](https://www.firecrawl.dev)                        |
| 8   | **Exa**               | 1,000 free queries/month      |   Yes    | [exa.ai](https://dashboard.exa.ai/api-keys)                       |
| 9   | **LangSearch**        | Genuinely free, no CC         |   Yes    | [langsearch.com](https://langsearch.com)                          |
| 10  | **WebSearchAPI.ai**   | 2,000 free credits            |   Yes    | [websearchapi.ai](https://www.websearchapi.ai)                    |
| 11  | **Perplexity Sonar**  | Paid (usage-based)            |   Yes    | [perplexity.ai](https://docs.perplexity.ai)                       |
| 12  | **SearXNG**           | Self-hosted, unlimited        |  **No**  | [docs.searxng.org](https://docs.searxng.org)                      |

> † Marginalia Search uses `public` as a shared API key — no registration required, but subject to a shared rate limit.
>
> **Jina AI:** Search (`s.jina.ai`) requires a free API key from [jina.ai](https://jina.ai). Content extraction via `web_read` uses Jina Reader (`r.jina.ai`) which is **free and needs no API key**.
>
> **Perplexity Sonar** supports multiple model variants. Set `model` in your Perplexity backend config to choose: `sonar` (default, fast), `sonar-pro` (higher quality), `sonar-deep-research` (multi-step reasoning), or `sonar-reasoning` (DeepSeek R1-based).
>
> **SearXNG** is a self-hosted metasearch engine. Run your own instance (or use a public one), no API key required. Configure the instance URL in `.pi/search.json`.
>
> **Firecrawl** uses `api.firecrawl.dev/v2/search` with a `data.web[]` response shape. The v1 endpoint is deprecated.
>
> **Exa** (March 2026) includes content for the first 10 results per request at no extra cost. Content extraction is enabled by default.

## Configuration

Configure backends globally (all projects) or per-project:

**Global:** `~/.pi/agent/extensions/search.json`
**Project:** `.pi/search.json` (project takes precedence)

```json
{
  "defaultBackend": "auto",
  "backends": {
    "duckduckgo": { "enabled": true },
    "jina": { "enabled": true, "apiKey": "JINA_API_KEY" },
    "marginalia": { "enabled": true },
    "serper": { "enabled": true, "apiKey": "SERPER_API_KEY" },
    "tavily": { "enabled": true, "apiKey": "TAVILY_API_KEY" },
    "brave": { "enabled": true, "apiKey": "BRAVE_API_KEY" },
    "exa": { "enabled": true, "apiKey": "EXA_API_KEY" },
    "firecrawl": { "enabled": true, "apiKey": "FIRECRAWL_API_KEY" },
    "langsearch": { "enabled": true, "apiKey": "LANGSEARCH_API_KEY" },
    "websearchapi": { "enabled": true, "apiKey": "WEBSEARCHAPI_API_KEY" },
    "perplexity": {
      "enabled": true,
      "apiKey": "PERPLEXITY_API_KEY",
      "model": "sonar"
    },
    "searxng": { "enabled": true, "instanceUrl": "http://localhost:8888" }
  }
}
```

### Credential Resolution

The `apiKey` field supports four formats (following pi-web-providers convention):

| `apiKey` value            | Resolved from                           | Example                            |
| ------------------------- | --------------------------------------- | ---------------------------------- |
| `"SERPER_API_KEY"`        | `process.env.SERPER_API_KEY`            | ALL_CAPS → env var                 |
| `"!pass show api/serper"` | stdout of shell command (cached)        | `!` prefix → exec                  |
| `"sk-abc123..."`          | Used as-is                              | Literal key (backwards compatible) |
| _(unset)_                 | `SEARCH_<BACKEND>_API_KEY` env fallback | Auto-enables backend               |

**Env var references:** Any ALL_CAPS string is treated as an environment variable name (not a literal). If the referenced env var is unset, a warning is printed (your literal key is not silently discarded).

**Shell commands:** Commands prefixed with `!` are executed via `execSync` with a 5s timeout. Results are cached and invalidated when config is reloaded (editing the config file clears the cache).

**Convenience env vars:** Backends are auto-enabled when these env vars are set (even with no config entry):

```bash
export SEARCH_SERPER_API_KEY="sk-..."
export SEARCH_TAVILY_API_KEY="sk-..."
export SEARCH_EXA_API_KEY="sk-..."
# ...
```

```json
{
  "backends": {
    "serper": { "enabled": true, "apiKey": "SERPER_API_KEY" }
  }
}
```

**To rotate a shell-command key:** Update the secret in your password manager, then trigger a config reload (edit the config file, or wait 10s for automatic refresh).

Or use the interactive setup:

```
/search-setup
```

## Commands

| Command          | Description                                                       |
| ---------------- | ----------------------------------------------------------------- |
| `/search-setup`  | Interactive prompt to configure API keys and instance URLs        |
| `/search-status` | Show which backends are active, which have keys, and their status |

> **Tip:** After running `/search-setup` or editing your config, run `/reload` to activate changes without restarting pi.

## How auto mode works

### Fallback Mode (default, `combine=false`)

1. Tries each enabled backend in order from your config
2. If a backend fails (rate limit, auth error, etc.), moves to the next one
3. Jina AI search requires a free API key from [jina.ai](https://jina.ai) (get one at jina.ai/reader). DuckDuckGo requires no API key. Both serve as safety nets
4. Returns results from the first backend that succeeds
5. If all backends fail, reports the collected errors

### Combine Mode (`combine=true`)

1. Queries **ALL** enabled backends in parallel
2. Each backend receives `numResults / numBackends` as a target
3. Results are merged using **Reciprocal Rank Fusion** (RRF) — position-based scoring that works across incompatible ranking systems
4. Each result shows its source backend (e.g., `*Source: Tavily*`)
5. URL dedup prefers the result with the richest content (content > snippet)
6. Backend statistics are displayed (which succeeded, result counts, errors)

### RRF Scoring

RRF assigns each result a score of `Σ(1 / (60 + rank_i))` across all backends that returned it. Results are ranked by score, then by number of backends that found them. This means a result ranked #1 by one backend and #5 by another beats a result ranked #4 by two backends.

## Security

- API keys are stored in local config files only (`~/.pi/agent/extensions/search.json` or `.pi/search.json`), never sent to any third party besides the chosen backend
- **Env vars and shell commands** are supported for credential resolution — the config file is trusted (you own it), but never commit plain API keys to version control
- DuckDuckGo queries use spawned Python subprocess (abortable via signal)
- All HTTP backends have a 30-second timeout; shell commands for credentials have a 5-second timeout
- Error messages are sanitized — API response bodies are truncated and key-like patterns are redacted
- The `.pi/` directory is in `.gitignore` — **never commit API keys to version control**

## Testing

```bash
# Run unit tests for backend parsers
npx vitest run backends/parsers.test.ts

# Quick test Jina AI (with your free API key)
curl -s -H "Authorization: Bearer $JINA_API_KEY" "https://s.jina.ai/?q=test&format=json" | jq .

# Quick test via curl with your configured key
curl -X POST "https://api.exa.ai/search" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $KEY" \
  -d '{"query": "test", "numResults": 3, "contents": {"text": true}}'

# Quick test Perplexity Sonar (use "sonar-pro" or "sonar-deep-research" for model)
curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KEY" \
  -d '{"model": "sonar", "messages": [{"role": "user", "content": "test"}], "search_context_size": "low"}'

# Quick test Firecrawl (v2 endpoint — code still uses v1)
curl -X POST "https://api.firecrawl.dev/v2/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KEY" \
  -d '{"query": "test", "limit": 3}'

# Quick test SearXNG (replace URL with your instance)
curl "http://localhost:8888/search?q=test&format=json&count=3"
```

## Adding a new backend

Backends are registered via the `BACKEND_DEFS` registry in `extensions/search-hub.ts`. Define a `search` function and add one entry to the registry:

```typescript
const BACKEND_DEFS: Record<string, BackendRunner> = {
  // ... existing entries
  mybackend: {
    needsKey: true,
    needsKeyFromConfig: false,
    needsInstanceUrl: false,
    label: "My Backend",
    setupLabel: "My Backend (free tier description)",
    search: async (query, numResults, { key, signal }) => {
      const result = await searchMyBackend(query, numResults, key!, signal);
      return { results: result.results };
    },
  },
};
```

The registry handles dispatching, key resolution, formatting labels, and setup menu — no other edits needed.

## License

MIT

---

<p align="true">Proudly created with <a href="https://pi.dev">pi</a></p>


================================================
FILE: backends/parsers.test.ts
================================================
import { describe, it, expect } from "vitest";
import {
	parseMarginalia,
	parseWebSearchAPI,
	parseSerper,
	parseTavily,
	parseExa,
	parseBrave,
	parseLangSearch,
	parseFirecrawl,
	parsePerplexity,
	parseSearXNG,
	parseJina,
} from "./parsers.js";

// ---------------------------------------------------------------------------
// Marginalia
// ---------------------------------------------------------------------------

describe("parseMarginalia", () => {
	it("parses standard response", () => {
		const data = {
			results: [
				{ title: "Test 1", url: "https://example.com/1", description: "Desc 1" },
				{ title: "Test 2", url: "https://example.com/2", description: "Desc 2" },
			],
		};
		const results = parseMarginalia(data, 10);
		expect(results).toHaveLength(2);
		expect(results[0]).toEqual({ title: "Test 1", url: "https://example.com/1", snippet: "Desc 1" });
	});

	it("handles missing fields gracefully", () => {
		const data = { results: [{}] };
		const results = parseMarginalia(data, 10);
		expect(results).toHaveLength(1);
		expect(results[0]).toEqual({ title: "", url: "", snippet: "" });
	});

	it("truncates long descriptions to 500 chars", () => {
		const data = { results: [{ description: "x".repeat(600) }] };
		const results = parseMarginalia(data, 10);
		expect(results[0].snippet.length).toBe(500);
	});

	it("respects numResults limit", () => {
		const data = { results: Array.from({ length: 10 }, (_, i) => ({ title: `T${i}`, url: `https://e.com/${i}` })) };
		const results = parseMarginalia(data, 3);
		expect(results).toHaveLength(3);
	});

	it("handles empty results", () => {
		const results = parseMarginalia({}, 10);
		expect(results).toHaveLength(0);
	});
});

// ---------------------------------------------------------------------------
// WebSearchAPI
// ---------------------------------------------------------------------------

describe("parseWebSearchAPI", () => {
	it("parses organic results", () => {
		const data = {
			organic: [
				{ title: "Web 1", url: "https://web.com/1", description: "Web desc" },
			],
		};
		const results = parseWebSearchAPI(data, 10);
		expect(results).toHaveLength(1);
		expect(results[0]).toEqual({ title: "Web 1", url: "https://web.com/1", snippet: "Web desc" });
	});

	it("handles missing organic field", () => {
		const results = parseWebSearchAPI({}, 10);
		expect(results).toHaveLength(0);
	});

	it("handles organic as non-array", () => {
		const results = parseWebSearchAPI({ organic: "not an array" }, 10);
		expect(results).toHaveLength(0);
	});
});

// ---------------------------------------------------------------------------
// Serper
// ---------------------------------------------------------------------------

describe("parseSerper", () => {
	it("maps link to url", () => {
		const data = { organic: [{ title: "S", link: "https://s.com", snippet: "snip" }] };
		const results = parseSerper(data, 10);
		expect(results[0]).toEqual({ title: "S", url: "https://s.com", snippet: "snip" });
	});
});

// ---------------------------------------------------------------------------
// Tavily
// ---------------------------------------------------------------------------

describe("parseTavily", () => {
	it("maps content to snippet and preserves content field", () => {
		const data = { results: [{ title: "T", url: "https://t.com", content: "full content" }] };
		const results = parseTavily(data, 10);
		expect(results[0].snippet).toBe("full content");
		expect(results[0].content).toBe("full content");
	});
});

// ---------------------------------------------------------------------------
// Exa
// ---------------------------------------------------------------------------

describe("parseExa", () => {
	it("prefers text over highlight for snippet", () => {
		const data = { results: [{ title: "E", url: "https://e.com", text: "text val", highlight: "high val" }] };
		const results = parseExa(data, 10);
		expect(results[0].snippet).toBe("text val");
	});

	it("falls back to highlight when no text", () => {
		const data = { results: [{ title: "E", url: "https://e.com", highlight: "high val" }] };
		const results = parseExa(data, 10);
		expect(results[0].snippet).toBe("high val");
	});
});

// ---------------------------------------------------------------------------
// Brave
// ---------------------------------------------------------------------------

describe("parseBrave", () => {
	it("navigates web.results path", () => {
		const data = { web: { results: [{ title: "B", url: "https://b.com", description: "desc" }] } };
		const results = parseBrave(data, 10);
		expect(results[0]).toEqual({ title: "B", url: "https://b.com", snippet: "desc" });
	});

	it("returns empty when web is missing", () => {
		expect(parseBrave({}, 10)).toHaveLength(0);
	});
});

// ---------------------------------------------------------------------------
// LangSearch
// ---------------------------------------------------------------------------

describe("parseLangSearch", () => {
	it("navigates data.webPages.value path", () => {
		const data = { data: { webPages: { value: [{ name: "LS", url: "https://ls.com", snippet: "sn" }] } } };
		const results = parseLangSearch(data, 10);
		expect(results[0].title).toBe("LS");
		expect(results[0].snippet).toBe("sn");
	});

	it("prefers name over title", () => {
		const data = { data: { webPages: { value: [{ name: "Name", title: "Title", url: "https://ls.com" }] } } };
		const results = parseLangSearch(data, 10);
		expect(results[0].title).toBe("Name");
	});
});

// ---------------------------------------------------------------------------
// Firecrawl v2
// ---------------------------------------------------------------------------

describe("parseFirecrawl", () => {
	it("parses v2 object response with web array", () => {
		const data = { data: { web: [{ title: "FC", url: "https://fc.com", description: "d" }] } };
		const results = parseFirecrawl(data, 10);
		expect(results[0]).toEqual({ title: "FC", url: "https://fc.com", snippet: "d" });
	});

	it("parses v2 flat array response", () => {
		const data = { data: [{ title: "FC", url: "https://fc.com" }] };
		const results = parseFirecrawl(data, 10);
		expect(results).toHaveLength(1);
	});

	it("falls back to v1 results field", () => {
		const data = { results: [{ title: "FC1", url: "https://fc.com/1" }] };
		const results = parseFirecrawl(data, 10);
		expect(results).toHaveLength(1);
	});

	it("falls back to images when web is empty", () => {
		const data = { data: { web: [], images: [{ title: "Img", url: "https://img.com" }] } };
		const results = parseFirecrawl(data, 10);
		expect(results[0].title).toBe("Img");
	});
});

// ---------------------------------------------------------------------------
// Perplexity
// ---------------------------------------------------------------------------

describe("parsePerplexity", () => {
	it("builds answer result from content + citations", () => {
		const data = {
			citations: ["https://src1.com", "https://src2.com"],
			choices: [{ message: { content: "The answer is 42" } }],
		};
		const results = parsePerplexity(data, "what is the answer", 10);
		expect(results[0].title).toBe("Answer: what is the answer");
		expect(results[0].snippet).toBe("The answer is 42");
		expect(results).toHaveLength(3); // answer + 2 citations
	});

	it("extracts hostname as title from citation URLs", () => {
		const data = { citations: ["https://www.example.com/path/to/page"] };
		const results = parsePerplexity(data, "test", 10);
		expect(results[0].title).toBe("example.com/path/to/page");
	});

	it("handles empty citations", () => {
		const data = { citations: [] };
		const results = parsePerplexity(data, "test", 10);
		expect(results).toHaveLength(0);
	});
});

// ---------------------------------------------------------------------------
// SearXNG
// ---------------------------------------------------------------------------

describe("parseSearXNG", () => {
	it("prefers content over snippet", () => {
		const data = { results: [{ title: "SX", url: "https://sx.com", content: "content", snippet: "snip" }] };
		const results = parseSearXNG(data, 10);
		expect(results[0].snippet).toBe("content");
	});
});

// ---------------------------------------------------------------------------
// Jina
// ---------------------------------------------------------------------------

describe("parseJina", () => {
	it("parses data array with content", () => {
		const data = { data: [{ title: "J", url: "https://j.com", content: "full article" }] };
		const results = parseJina(data, 10);
		expect(results[0].title).toBe("J");
		expect(results[0].content).toBe("full article");
		expect(results[0].snippet).toBe("full article");
	});

	it("truncates content to 2000 chars", () => {
		const data = { data: [{ title: "J", url: "https://j.com", content: "x".repeat(3000) }] };
		const results = parseJina(data, 10);
		expect(results[0].content.length).toBe(2000);
	});
});


================================================
FILE: backends/parsers.ts
================================================
/**
 * Pure response parsers for search backends.
 * Each takes raw JSON data and returns normalized results.
 * No HTTP, no side effects — easy to unit test.
 */

export interface ParsedResult {
	title: string;
	url: string;
	snippet: string;
}

// ---------------------------------------------------------------------------
// Marginalia Search
// Response: { results: [{ title, url, description }] }
// ---------------------------------------------------------------------------

export function parseMarginalia(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const results = (data.results || []) as Array<Record<string, unknown>>;
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.description as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// WebSearchAPI.ai
// Response: { organic: [{ title, url, description }] }
// ---------------------------------------------------------------------------

export function parseWebSearchAPI(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const rawResults = data.organic;
	const organic = Array.isArray(rawResults) ? rawResults : [];
	return organic.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.description as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// Serper.dev (Google)
// Response: { organic: [{ title, link, snippet }] }
// ---------------------------------------------------------------------------

export function parseSerper(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const rawResults = data.organic;
	const results = Array.isArray(rawResults) ? rawResults : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.link as string) || "",
		snippet: (r.snippet as string) || "",
	}));
}

// ---------------------------------------------------------------------------
// Tavily
// Response: { results: [{ title, url, content }] }
// ---------------------------------------------------------------------------

export interface TavilyParsedResult extends ParsedResult {
	content?: string;
}

export function parseTavily(
	data: Record<string, unknown>,
	numResults: number,
): TavilyParsedResult[] {
	const rawResults = data.results;
	const results = Array.isArray(rawResults) ? rawResults : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: (r.content as string) || "",
		content: r.content as string,
	}));
}

// ---------------------------------------------------------------------------
// Exa
// Response: { results: [{ title, url, text, highlight }] }
// ---------------------------------------------------------------------------

export function parseExa(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const rawResults = data.results;
	const results = Array.isArray(rawResults) ? rawResults : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.text as string) || (r.highlight as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// Brave Search
// Response: { web: { results: [{ title, url, description }] } }
// ---------------------------------------------------------------------------

export function parseBrave(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const web = data.web;
	if (!web || typeof web !== "object") {
		return [];
	}
	const rawResults = (web as Record<string, unknown>).results;
	const results = Array.isArray(rawResults) ? rawResults : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.description as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// LangSearch
// Response: { data: { webPages: { value: [{ name, url, snippet, description }] } } }
// ---------------------------------------------------------------------------

export function parseLangSearch(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const pages = (data.data as Record<string, unknown>)?.webPages as Record<string, unknown> | undefined;
	const results = (pages?.value || data.results || data.data || []) as Array<Record<string, unknown>>;
	return results.slice(0, numResults).map((r) => ({
		title: (r.name as string) || (r.title as string) || "",
		url: (r.url as string) || (r.link as string) || "",
		snippet: ((r.snippet as string) || (r.description as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// Firecrawl v2
// Response: { data: { web: [...] } or data: [...] or { results: [...] } (v1 fallback)
// ---------------------------------------------------------------------------

export function parseFirecrawl(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const rawData = data.data;
	let results: Array<Record<string, unknown>> = [];
	if (Array.isArray(rawData)) {
		results = rawData;
	} else if (typeof rawData === "object" && rawData !== null) {
		const obj = rawData as Record<string, unknown>;
		results = Array.isArray(obj.web) ? obj.web : [];
		if (results.length === 0) {
			if (Array.isArray(obj.images)) results = obj.images as Array<Record<string, unknown>>;
			else if (Array.isArray(obj.news)) results = obj.news as Array<Record<string, unknown>>;
		}
	} else if (Array.isArray(data.results)) {
		results = data.results;
	}
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.description as string) || (r.snippet as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// Perplexity Sonar
// Response: { citations: string[], choices: [{ message: { content } }] }
// ---------------------------------------------------------------------------

export function parsePerplexity(
	data: Record<string, unknown>,
	query: string,
	numResults: number,
): ParsedResult[] {
	const citations = (data.citations as string[]) || [];
	const message = (data.choices as Array<Record<string, unknown>>)?.[0]?.message as Record<string, unknown> | undefined;
	const answerText = (message?.content as string) || "";

	const results: ParsedResult[] = [];

	if (answerText) {
		results.push({
			title: `Answer: ${query}`,
			url: citations[0] || "",
			snippet: answerText.slice(0, 500),
		});
	}

	for (const url of citations) {
		try {
			const u = new URL(url);
			const title = u.hostname.replace(/^www\./, "") + (u.pathname !== "/" ? u.pathname.slice(0, 60) : "");
			results.push({ title: title || url, url, snippet: "" });
		} catch {
			results.push({ title: url, url, snippet: "" });
		}
	}

	return results.slice(0, numResults);
}

// ---------------------------------------------------------------------------
// SearXNG
// Response: { results: [{ title, url, content, snippet }] }
// ---------------------------------------------------------------------------

export function parseSearXNG(
	data: Record<string, unknown>,
	numResults: number,
): ParsedResult[] {
	const rawResults = data.results as Array<Record<string, unknown>> | undefined;
	const results = Array.isArray(rawResults) ? rawResults : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		snippet: ((r.content as string) || (r.snippet as string) || "").slice(0, 500),
	}));
}

// ---------------------------------------------------------------------------
// Jina AI (s.jina.ai)
// Response: { data: [{ title, url, content, description }] }
// ---------------------------------------------------------------------------

export interface JinaParsedResult extends ParsedResult {
	content: string;
}

export function parseJina(
	data: Record<string, unknown>,
	numResults: number,
): JinaParsedResult[] {
	const rawData = data.data as Array<Record<string, unknown>> | undefined;
	const results = Array.isArray(rawData) ? rawData : [];
	return results.slice(0, numResults).map((r) => ({
		title: (r.title as string) || "",
		url: (r.url as string) || "",
		content: ((r.content as string) || (r.description as string) || "").slice(0, 2000),
		snippet: ((r.content as string) || (r.description as string) || "").slice(0, 500),
	}));
}


================================================
FILE: extensions/search-hub.ts
================================================
/**
 * Extension — Unified web search (12 backends) + content extraction (web_read)
 *
 * Backends (choose any, all disabled by default):
 *   duckduckgo    — ✅ Free, no key, via Python ddgs lib. Rate-limited.
 *   jina          — ✅ Free tier (API key optional for higher rate limits), full markdown via s.jina.ai
 *   marginalia    — ✅ Anti-SEO, "public" key optional. 354ms avg
 *   serper        — ✅ Google via serper.dev, 2500 free/mo. 667ms
 *   brave         — ✅ Brave Search, 2000 free/mo. 460ms
 *   tavily        — ✅ AI search, 1000 free/mo. 356ms BEST QUALITY
 *   exa           — ✅ AI-native, 10 QPS free tier. 137ms FASTEST
 *   firecrawl     — ✅ Search+crawl, 500 free credits. 644ms
 *   langsearch    — ✅ Free tier, no CC. 1816ms
 *   websearchapi  — ✅ Google-powered, 2000 free credits. 1323ms
 *   perplexity    — ✅ Unlimited free Sonar, citation-based answers
 *   searxng       — ✅ Self-hosted, 70+ aggregators. Needs instance URL
 *
 * Tools: web_search (auto-fallback + RRF combine mode), web_read (URL content)
 * Config: ~/.pi/agent/extensions/search.json + .pi/search.json (project wins)
 * Credentials: env var refs (ALL_CAPS), shell commands (!command), or literal keys
 *
 * Example .pi/search.json:
 *   {
 *     "defaultBackend": "auto",
 *     "backends": {
 *       "duckduckgo": { "enabled": true },
 *       "marginalia": { "enabled": true },
 *       "serper": { "enabled": true, "apiKey": "..." },
 *       "tavily": { "enabled": true, "apiKey": "..." },
 *       "exa": { "enabled": true, "apiKey": "..." },
 *       "firecrawl": { "enabled": true, "apiKey": "..." },
 *       "langsearch": { "enabled": true, "apiKey": "..." },
 *       "websearchapi": { "enabled": true, "apiKey": "..." },
 *       "perplexity": { "enabled": true, "apiKey": "..." },
 *       "searxng": { "enabled": true, "instanceUrl": "http://localhost:8888" }
 *     }
 *   }
 */

import { execSync, spawn } from "node:child_process";
import { existsSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { StringEnum } from "@earendil-works/pi-ai";
import { Type } from "typebox";
import {
	parseMarginalia, parseWebSearchAPI, parseSerper, parseTavily,
	parseExa, parseBrave, parseLangSearch, parseFirecrawl,
	parsePerplexity, parseSearXNG, parseJina,
} from "../backends/parsers.js";

// ---------------------------------------------------------------------------
// Types & Config
// ---------------------------------------------------------------------------

interface BackendConfig {
	enabled?: boolean;
	apiKey?: string;
	/** SearXNG-specific: base URL of the self-hosted instance (e.g. http://localhost:8888) */
	instanceUrl?: string;
	/** Perplexity-specific: model variant (sonar, sonar-pro, sonar-deep-research, sonar-reasoning). Default: sonar */
	model?: string;
}

interface SearchConfig {
	defaultBackend?: string;
	backends?: {
		duckduckgo?: BackendConfig;
		marginalia?: BackendConfig;

		serper?: BackendConfig;
		tavily?: BackendConfig;
		exa?: BackendConfig;
		brave?: BackendConfig;
		langsearch?: BackendConfig;
		firecrawl?: BackendConfig;
		websearchapi?: BackendConfig;
		perplexity?: BackendConfig;
		searxng?: BackendConfig;
	};
}

function getAgentDir(): string {
	return join(process.env.HOME || process.env.USERPROFILE || "~", ".pi", "agent");
}

const commandValueCache = new Map<string, { value?: string; errorMessage?: string }>();
const COMMAND_TIMEOUT_MS = 5_000;

/**
 * Resolve a credential reference à la pi-web-providers:
 *   • "!command"   → execute shell command, return trimmed stdout (cached)
 *   • "ALL_CAPS"   → read process.env[ALL_CAPS]
 *   • otherwise     → return as literal string (actual key)
 */
function resolveConfigValue(reference: string | undefined): string | undefined {
	if (!reference) return undefined;

	// !command — execute shell command, cache result
	if (reference.startsWith("!")) {
		const cached = commandValueCache.get(reference);
		if (cached) {
			if (cached.errorMessage) throw new Error(cached.errorMessage);
			return cached.value;
		}
		try {
			const output = execSync(reference.slice(1), {
				encoding: "utf-8",
				stdio: ["ignore", "pipe", "pipe"],
				timeout: COMMAND_TIMEOUT_MS,
			})
				.trim();
			const value = output.length > 0 ? output : undefined;
			commandValueCache.set(reference, { value });
			return value;
		} catch (error) {
			const errorMessage = (error as Error).message;
			commandValueCache.set(reference, { errorMessage });
			throw error;
		}
	}

	// ALL_CAPS → env var lookup
	const envValue = process.env[reference];
	if (envValue !== undefined) return envValue;
	if (/^[A-Z][A-Z0-9_]*$/.test(reference)) {
		// Warn: value looks like an env var reference but the env var is unset.
		// If this was intended as a literal key, rename it or set the env var.
		console.warn(`[pi-search] Credential reference "${reference}" matches ALL_CAPS env-var pattern ` +
			`but process.env.${reference} is not set. If this is a literal key, ` +
			`use a different name to avoid confusion.`);
		return undefined;
	}

	// Otherwise → literal string (actual key in config)
	return reference;
}

/** Convenience env vars checked as fallback when config has no apiKey for a backend. */
const FALLBACK_ENV_MAP: Record<string, string> = {
	jina: "SEARCH_JINA_API_KEY",
	serper: "SEARCH_SERPER_API_KEY",
	tavily: "SEARCH_TAVILY_API_KEY",
	exa: "SEARCH_EXA_API_KEY",
	brave: "SEARCH_BRAVE_API_KEY",
	langsearch: "SEARCH_LANGSEARCH_API_KEY",
	firecrawl: "SEARCH_FIRECRAWL_API_KEY",
	websearchapi: "SEARCH_WEBSEARCHAPI_API_KEY",
	perplexity: "SEARCH_PERPLEXITY_API_KEY",
};

/** Invalidate cached shell-command credentials so key rotation takes effect. */
function clearCredentialCache(): void {
	commandValueCache.clear();
}

/** Lazy resolution: config.apiKey → resolveConfigValue() → FALLBACK_ENV_MAP fallback. */
function resolveBackendKey(backend: string): string | undefined {
	const bc = config.backends?.[backend as keyof typeof config.backends];
	if (bc?.apiKey) {
		const resolved = resolveConfigValue(bc.apiKey);
		if (resolved) return resolved;
	}
	const fallbackEnv = FALLBACK_ENV_MAP[backend];
	if (fallbackEnv) {
		const envValue = process.env[fallbackEnv];
		if (envValue && envValue.trim().length > 0) return envValue.trim();
	}
	return undefined;
}

/** Describe where a backend's key comes from (for search-status display). */
function getKeySource(backend: string): { configured: boolean; source: string } {
	const bc = config.backends?.[backend as keyof typeof config.backends];
	if (!bc?.apiKey) {
		const fallbackEnv = FALLBACK_ENV_MAP[backend];
		if (fallbackEnv && process.env[fallbackEnv]) {
			return { configured: true, source: `env:${fallbackEnv}` };
		}
		return { configured: false, source: "" };
	}
	const ref = bc.apiKey;
	if (ref.startsWith("!")) {
		return { configured: true, source: `shell:${ref.slice(0, 40)}...` };
	}
	if (/^[A-Z][A-Z0-9_]*$/.test(ref)) {
		const envValue = process.env[ref];
		if (envValue) return { configured: true, source: `env:${ref}` };
		return { configured: false, source: `env:${ref} (unset)` };
	}
	return { configured: true, source: "literal" };
}



function loadConfig(cwd: string): SearchConfig {
	const globalPath = join(getAgentDir(), "extensions", "search.json");
	const projectPath = join(cwd, ".pi", "search.json");

	let config: SearchConfig = { defaultBackend: "duckduckgo", backends: {} };

	if (existsSync(globalPath)) {
		try {
			config = { ...config, ...JSON.parse(readFileSync(globalPath, "utf-8")) };
		} catch {
			// ignore
		}
	}

	// Save global backends before project config overwrites them
	const preProjectBackends = { ...(config.backends ?? {}) };

	if (existsSync(projectPath)) {
		try {
			const project = JSON.parse(readFileSync(projectPath, "utf-8"));
			config = { ...config, ...project };
			// Guard: if project config set backends to null/undefined, restore global backends
			if (config.backends == null) {
				config.backends = preProjectBackends;
			}
			if (project.backends && typeof project.backends === "object") {
				// Deep merge: merge per-backend so global backends not re-listed in project config are preserved
				const merged = { ...preProjectBackends, ...config.backends };
				for (const [key, val] of Object.entries(project.backends)) {
					if (val && merged[key]) {
						merged[key] = { ...merged[key], ...val };
					} else {
						merged[key] = val;
					}
				}
				config.backends = merged;
			}
		} catch {
			// ignore
		}
	}

	// Auto-enable backends that have a convenience env var but no explicit config yet.
	// Only enables if the backend is not explicitly disabled (enabled !== false).
	for (const [backend, envVar] of Object.entries(FALLBACK_ENV_MAP)) {
		const envValue = process.env[envVar];
		if (envValue && envValue.trim().length > 0) {
			const configBackends = config.backends ?? {};
			const existing = configBackends[backend as keyof typeof configBackends];
			if (!existing || existing.enabled === undefined) {
				if (!config.backends) config.backends = {};
				(config.backends as Record<string, BackendConfig>)[backend] = {
					...existing,
					enabled: true,
				};
			}
		}
	}

	return config;
}

const MISSING_KEY_HELP =
	"Set the API key via env var (SEARCH_<BACKEND>_API_KEY), " +
	"config reference (\"apiKey\": \"SOME_ENV_VAR\"), " +
	"shell command (\"apiKey\": \"!pass show api/backend\"), " +
	"or a literal key in ~/.pi/agent/extensions/search.json. " +
	"DuckDuckGo & Marginalia need no key.";

const HTTP_TIMEOUT_MS = 30_000;

/** Simple per-backend cooldown to avoid hammering rate-limited APIs. */
const COOLDOWN_MS = 2_000;
const backendCooldowns = new Map<string, number>();

function waitForCooldown(backend: string): Promise<void> {
	const until = backendCooldowns.get(backend);
	if (!until) return Promise.resolve();
	const delay = until - Date.now();
	if (delay <= 0) return Promise.resolve();
	return new Promise(r => setTimeout(r, delay));
}

function markCooldown(backend: string) {
	backendCooldowns.set(backend, Date.now() + COOLDOWN_MS);
}

/** Combine an optional caller signal with a default timeout. */
function timeoutSignal(signal?: AbortSignal): AbortSignal | undefined {
	if (!signal) return AbortSignal.timeout(HTTP_TIMEOUT_MS);
	return AbortSignal.any([signal, AbortSignal.timeout(HTTP_TIMEOUT_MS)]);
}

/** Sanitize API error text — truncate and strip potential secrets. */
function sanitizeError(status: number, text: string): string {
	const safe = text
		// Redact "Bearer <token>" and "Token <value>" patterns
		.replace(/(bearer|token)\s+[\w.\/-]{8,}/gi, "$1 [redacted]")
		// Redact key=value or "key": "value" pairs for known secret keys
		.replace(/(api[-_]?key|bearer|token|authorization|secret|password)["']?\s*[:=]\s*["']?[\w.\/-]{8,}/gi, "[redacted]")
		// Redact JSON key-value pairs where the value looks like a key
		.replace(/"(?:api[-_]?key|apiKey|token|secret|password|bearer)"\s*:\s*"[^"']{8,}"/gi, '"[redacted]"')
		// Redact x-api-key / Authorization header values in raw text
		.replace(/(x-api-key|authorization)\s*:\s*[\w.\/-]{8,}/gi, "$1: [redacted]")
		.slice(0, 300);
	return `API error (${status}): ${safe}`;
}


// ---------------------------------------------------------------------------
// Backend: DuckDuckGo (free, no key needed)
// ---------------------------------------------------------------------------

interface DuckDuckGoResult {
	title: string;
	url: string;
	snippet: string;
}

async function searchDuckDuckGo(
	query: string,
	numResults: number,
	signal?: AbortSignal,
): Promise<{ results: DuckDuckGoResult[] }> {
	if (signal?.aborted) throw new Error("DuckDuckGo search aborted");

	const pyScript = `
import json, sys
try:
    from ddgs import DDGS
except ImportError:
    # ddgs may be installed as a uv tool — find it and add to sys.path
    import subprocess, pathlib
    try:
        ddgs_bin = subprocess.check_output(["which", "ddgs"], text=True, stderr=subprocess.DEVNULL).strip()
        if ddgs_bin:
            # Walk up from the binary until we find site-packages — no hardcoded depth assumption
            ddgs_path = pathlib.Path(ddgs_bin).resolve()
            found = False
            for parent in [ddgs_path, *ddgs_path.parents]:
                for py_ver_dir in sorted((parent / "lib").iterdir(), reverse=True):
                    sp = py_ver_dir / "site-packages"
                    if sp.is_dir():
                        sys.path.insert(0, str(sp))
                        found = True
                        break
                if found:
                    break
            if not found:
                sys.exit(1)
    except Exception:
        sys.exit(1)
    from ddgs import DDGS
results = []
with DDGS() as ddgs:
    for i, r in enumerate(ddgs.text(${JSON.stringify(query)}, max_results=${numResults})):
        results.append({"title": r.get("title",""), "url": r.get("href",""), "snippet": r.get("body","")})
print(json.dumps({"results": results}))
`;

	return new Promise((resolve, reject) => {
		const pythonCmd = process.platform === "win32" ? "python" : "python3";
		const proc = spawn(pythonCmd, ["-c", pyScript], {
			stdio: ["pipe", "pipe", "pipe"],
		});

		let stdout = "";
		let stderr = "";

		proc.stdout.on("data", (data: Buffer) => { stdout += data.toString(); });
		proc.stderr.on("data", (data: Buffer) => { stderr += data.toString(); });

		// Timeout timer
		const timeout = setTimeout(() => {
			proc.kill();
			reject(new Error("DuckDuckGo search timed out"));
		}, HTTP_TIMEOUT_MS);

		// Abort signal handler
		const onAbort = () => {
			clearTimeout(timeout);
			proc.kill();
			reject(new Error("DuckDuckGo search aborted"));
		};
		if (signal) {
			if (signal.aborted) { clearTimeout(timeout); reject(new Error("DuckDuckGo search aborted")); return; }
			signal.addEventListener("abort", onAbort, { once: true });
		}

		proc.on("close", (code) => {
			clearTimeout(timeout);
			if (signal) signal.removeEventListener("abort", onAbort);
			if (code === 0) {
				try {
					resolve(JSON.parse(stdout.trim()));
				} catch {
					reject(new Error(`DuckDuckGo search: invalid JSON output: ${stdout.slice(0, 200)}`));
				}
			} else {
				const msg = stderr.trim().slice(0, 300);
				reject(new Error(`DuckDuckGo search failed (exit ${code}): ${msg || "unknown error"}`));
			}
		});

		proc.on("error", (err) => {
			clearTimeout(timeout);
			if (signal) signal.removeEventListener("abort", onAbort);
			reject(new Error(`DuckDuckGo search failed: ${err.message}`));
		});
	});
}

// ---------------------------------------------------------------------------
// Backend: Marginalia Search (anti-SEO independent search, uses "public" key)
// ---------------------------------------------------------------------------

async function searchMarginalia(
	query: string,
	numResults: number,
	apiKey: string | undefined,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet: string }> }> {
	const key = apiKey || "public";
	const response = await fetch(
		`https://api.marginalia.nu/${encodeURIComponent(key)}/search/${encodeURIComponent(query)}?index=0&count=${Math.min(numResults, 50)}`,
		{
			signal: timeoutSignal(signal),
			headers: { "Accept": "application/json" },
		},
	);

	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Marginalia ${sanitizeError(response.status, text)}`);
	}

	const data = (await response.json()) as Record<string, unknown>;

	return {
		results: parseMarginalia(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Serper.dev (Google search, needs API key)
// ---------------------------------------------------------------------------

async function searchSerper(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet: string }> }> {
	const body = { q: query, num: Math.min(numResults, 100) };
	const response = await fetch("https://google.serper.dev/search", {
		method: "POST",
		headers: {
			"X-API-KEY": apiKey,
			"Content-Type": "application/json",
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Serper ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseSerper(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Tavily (AI-agent search, needs API key)
// ---------------------------------------------------------------------------

async function searchTavily(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet: string; content?: string }> }> {
	const body = {
		query,
		max_results: Math.min(numResults, 20),
		include_answer: false,
	};
	const response = await fetch("https://api.tavily.com/search", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			Authorization: `Bearer ${apiKey}`,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Tavily ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseTavily(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Exa (optional, needs API key)
// ---------------------------------------------------------------------------

async function searchExa(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const body = {
		query,
		numResults: Math.min(numResults, 25),
		contents: { text: true, highlights: true },
	};
	const response = await fetch("https://api.exa.ai/search", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			"x-api-key": apiKey,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		let detail = text;
		try {
			const json = JSON.parse(text);
			detail = json.error || json.message || text;
		} catch {
			// use raw
		}
		throw new Error(`Exa ${sanitizeError(response.status, detail)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseExa(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Brave Search (metered billing ~$5/mo credit, needs API key)
// ---------------------------------------------------------------------------

async function searchBrave(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const params = new URLSearchParams({ q: query, count: String(Math.min(numResults, 20)) });
	const response = await fetch(`https://api.search.brave.com/res/v1/web/search?${params}`, {
		method: "GET",
		headers: {
			"Accept": "application/json",
			"Accept-Encoding": "gzip",
			"X-Subscription-Token": apiKey,
		},
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Brave ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseBrave(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: LangSearch (genuinely free tier, no credit card, needs API key)
// Endpoint: POST /v1/web-search, auth: Authorization: Bearer
// ---------------------------------------------------------------------------

async function searchLangSearch(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const body = { query, max_results: Math.min(numResults, 20) };
	const response = await fetch("https://api.langsearch.com/v1/web-search", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			"Authorization": `Bearer ${apiKey}`,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`LangSearch ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseLangSearch(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Firecrawl (500 free credits, search+crawl+extract, needs API key)
// ---------------------------------------------------------------------------

async function searchFirecrawl(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const body = { query, limit: Math.min(numResults, 20) };
	const response = await fetch("https://api.firecrawl.dev/v2/search", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			"Authorization": `Bearer ${apiKey}`,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Firecrawl ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseFirecrawl(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: WebSearchAPI.ai (2000 free credits, needs API key)
// Endpoint: POST /ai-search, auth: Authorization: Bearer
// Params: maxResults, includeContent, country, language
// ---------------------------------------------------------------------------

async function searchWebSearchAPI(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const body = {
		query,
		maxResults: Math.min(numResults, 20),
		includeContent: false,
		country: "us",
		language: "en",
	};
	const response = await fetch("https://api.websearchapi.ai/ai-search", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			"Authorization": `Bearer ${apiKey}`,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});
	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`WebSearchAPI ${sanitizeError(response.status, text)}`);
	}
	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseWebSearchAPI(data, numResults),
	};
}
// ---------------------------------------------------------------------------
// Backend: Perplexity Sonar (free tier, unlimited queries, needs API key)
// Endpoint: POST /chat/completions, auth: Authorization: Bearer
// Uses sonar model (configurable), extracts citations from response as search results
// ---------------------------------------------------------------------------

async function searchPerplexity(
	query: string,
	numResults: number,
	apiKey: string,
	signal?: AbortSignal,
	model?: string,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	const body = {
		model: model || "sonar",
		messages: [
			{
				role: "user",
				content: query,
			},
		],
		search_context_size: "high",
	};

	const response = await fetch("https://api.perplexity.ai/chat/completions", {
		method: "POST",
		headers: {
			"Content-Type": "application/json",
			"Authorization": `Bearer ${apiKey}`,
		},
		body: JSON.stringify(body),
		signal: timeoutSignal(signal),
	});

	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Perplexity ${sanitizeError(response.status, text)}`);
	}

	const data = (await response.json()) as Record<string, unknown>;

	return { results: parsePerplexity(data, query, numResults) };
}

// ---------------------------------------------------------------------------
// Backend: SearXNG (self-hosted metasearch, aggregates 70+ providers)
// Endpoint: GET /search?q=<query>&format=json, optional auth via API key header
// Needs instance URL configured in search.json
// ---------------------------------------------------------------------------

async function searchSearXNG(
	query: string,
	numResults: number,
	apiKey: string | undefined,
	instanceUrl: string | undefined,
	signal?: AbortSignal,
): Promise<{ results: Array<{ title: string; url: string; snippet?: string }> }> {
	if (!instanceUrl) {
		throw new Error("SearXNG instance URL not configured. Set searxng.instanceUrl in search.json (e.g. http://localhost:8888)");
	}

	const baseUrl = instanceUrl.replace(/\/+$/, "");
	const params = new URLSearchParams({
		q: query,
		format: "json",
		count: String(Math.min(numResults, 50)),
	});

	const headers: Record<string, string> = {
		"Accept": "application/json",
	};
	if (apiKey) {
		headers["Authorization"] = `Bearer ${apiKey}`;
	}

	const response = await fetch(`${baseUrl}/search?${params}`, {
		method: "GET",
		headers,
		signal: timeoutSignal(signal),
	});

	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`SearXNG ${sanitizeError(response.status, text)}`);
	}

	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseSearXNG(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend: Jina AI (s.jina.ai) — search results via s.jina.ai; requires API key from jina.ai (free). web_read uses r.jina.ai (Reader, no key needed).
// Endpoint: GET https://s.jina.ai/?q=<query>, returns 5 results as markdown or JSON
// ---------------------------------------------------------------------------

interface JinaResult {
	title: string;
	url: string;
	content: string;
}

async function searchJina(
	query: string,
	numResults: number,
	apiKey?: string,
	signal?: AbortSignal,
): Promise<{ results: JinaResult[] }> {
	const url = `https://s.jina.ai/?q=${encodeURIComponent(query)}&format=json`;
	const headers: Record<string, string> = {
		"Accept": "application/json",
	};
	if (apiKey) {
		headers["Authorization"] = `Bearer ${apiKey}`;
	}
	const response = await fetch(url, {
		signal: timeoutSignal(signal),
		headers,
	});

	if (!response.ok) {
		const text = await response.text().catch(() => "");
		throw new Error(`Jina AI ${sanitizeError(response.status, text)}`);
	}

	const data = (await response.json()) as Record<string, unknown>;
	return {
		results: parseJina(data, numResults),
	};
}

// ---------------------------------------------------------------------------
// Backend Registry
// ---------------------------------------------------------------------------

interface BackendRunner {
	needsKey: boolean;
	needsKeyFromConfig: boolean;
	optionalKey: boolean;
	needsInstanceUrl: boolean;
	label: string;
	setupLabel: string | null;
	search: (query: string, numResults: number, deps: { key?: string; instanceUrl?: string; signal?: AbortSignal }) => Promise<{ results: Array<{ title: string; url: string; snippet?: string; content?: string }> }>;
}

const BACKEND_DEFS: Record<string, BackendRunner> = {
	duckduckgo: {
		needsKey: false,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "DuckDuckGo",
		setupLabel: null,
		search: async (query, numResults, { signal }) => {
			const ddg = await searchDuckDuckGo(query, numResults, signal);
			return { results: ddg.results };
		},
	},
	jina: {
		needsKey: false,
		needsKeyFromConfig: false,
		optionalKey: true,
		needsInstanceUrl: false,
		label: "Jina AI",
		setupLabel: "Jina AI (free tier, API key optional for higher rate limits)",
		search: async (query, numResults, { key, signal }) => {
			return await searchJina(query, numResults, key, signal);
		},
	},
	marginalia: {
		needsKey: false,
		needsKeyFromConfig: true,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Marginalia",
		setupLabel: null,
		search: async (query, numResults, { key, signal }) => {
			const marg = await searchMarginalia(query, numResults, key, signal);
			return { results: marg.results };
		},
	},
	serper: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Serper",
		setupLabel: "Serper (Google — 2500 free queries, one-time)",
		search: async (query, numResults, { key, signal }) => {
			const serp = await searchSerper(query, numResults, key!, signal);
			return { results: serp.results };
		},
	},
	tavily: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Tavily",
		setupLabel: "Tavily (AI agent search — 1000 free calls/month)",
		search: async (query, numResults, { key, signal }) => {
			const tav = await searchTavily(query, numResults, key!, signal);
			return { results: tav.results };
		},
	},
	exa: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Exa",
		setupLabel: "Exa (AI search — 1000 free queries/month)",
		search: async (query, numResults, { key, signal }) => {
			const exa = await searchExa(query, numResults, key!, signal);
			return { results: exa.results };
		},
	},
	brave: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Brave",
		setupLabel: "Brave Search (metered billing ~$5/mo credit)",
		search: async (query, numResults, { key, signal }) => {
			const br = await searchBrave(query, numResults, key!, signal);
			return { results: br.results };
		},
	},
	langsearch: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "LangSearch",
		setupLabel: "LangSearch (genuinely free, no CC)",
		search: async (query, numResults, { key, signal }) => {
			const ls = await searchLangSearch(query, numResults, key!, signal);
			return { results: ls.results };
		},
	},
	firecrawl: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Firecrawl",
		setupLabel: "Firecrawl (500 free credits)",
		search: async (query, numResults, { key, signal }) => {
			const fc = await searchFirecrawl(query, numResults, key!, signal);
			return { results: fc.results };
		},
	},
	websearchapi: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "WebSearchAPI",
		setupLabel: "WebSearchAPI.ai (2000 free credits)",
		search: async (query, numResults, { key, signal }) => {
			const ws = await searchWebSearchAPI(query, numResults, key!, signal);
			return { results: ws.results };
		},
	},
	perplexity: {
		needsKey: true,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: false,
		label: "Perplexity Sonar",
		setupLabel: "Perplexity Sonar (paid, usage-based)",
		search: async (query, numResults, { key, signal }) => {
			const bc = (config.backends as Record<string, BackendConfig> | undefined)?.perplexity;
			const model = (bc as Record<string, unknown>)?.model as string | undefined;
			const pp = await searchPerplexity(query, numResults, key!, signal, model);
			return { results: pp.results };
		},
	},
	searxng: {
		needsKey: false,
		needsKeyFromConfig: false,
		optionalKey: false,
		needsInstanceUrl: true,
		label: "SearXNG",
		setupLabel: "SearXNG (self-hosted, needs instance URL)",
		search: async (query, numResults, { key, instanceUrl, signal }) => {
			const sx = await searchSearXNG(query, numResults, key, instanceUrl, signal);
			return { results: sx.results };
		},
	},
};

// ---------------------------------------------------------------------------
// Reciprocal Rank Fusion
// ---------------------------------------------------------------------------

/**
 * RRF (Reciprocal Rank Fusion) — rank-based merge across backends.
 * Constant k=60 is standard from the original RRF paper.
 */
const RRF_K = 60;

function reciprocalRankFusion(
	backendResults: Array<{ backend: string; results: SearchResultWithBackend[] }>,
	numResults: number,
): SearchResultWithBackend[] {
	// Score each unique result by its rank positions across backends
	const urlScores = new Map<string, { score: number; result: SearchResultWithBackend; seenBackends: Set<string> }>();

	for (const { backend, results } of backendResults) {
		for (let i = 0; i < results.length; i++) {
			const r = results[i];
			const normalizedUrl = r.url.replace(/\/$/, "").toLowerCase(); // normalize trailing slash

			let entry = urlScores.get(normalizedUrl);
			if (!entry) {
				entry = { score: 0, result: r, seenBackends: new Set() };
				urlScores.set(normalizedUrl, entry);
			}

			// RRF: score += 1 / (k + rank)
			entry.score += 1 / (RRF_K + i);
			entry.seenBackends.add(backend);

			// Keep the result with the most complete data (prefer content over snippet)
			if (r.content && !entry.result.content) {
				entry.result = r;
			}
		}
	}

	// Sort by RRF score descending, then by number of backends that found it
	const sorted = Array.from(urlScores.values())
		.sort((a, b) => {
			const scoreDiff = b.score - a.score;
			if (scoreDiff !== 0) return scoreDiff;
			return b.seenBackends.size - a.seenBackends.size;
		})
		.slice(0, numResults)
		.map(e => e.result);

	return sorted;
}

// ---------------------------------------------------------------------------
// Result formatting
// ---------------------------------------------------------------------------

interface SearchResultWithBackend {
	title: string;
	url: string;
	snippet?: string;
	content?: string;
	backend?: string;
}

function formatResults(
	query: string,
	backend: string,
	results: Array<{ title: string; url: string; snippet?: string; content?: string }>,
): string {
	// Escape newlines and markdown heading chars in query to prevent injection
	const safeQuery = query.replace(/[\n\r]/g, " ").replace(/^#/gm, "\\#");
	const lines: string[] = [
		`## Search Results: "${safeQuery}"`,
		`Backend: ${backend}  ·  Results: ${results.length}`,
		"",
	];
	for (let i = 0; i < results.length; i++) {
		const r = results[i];
		lines.push(`### ${i + 1}. ${r.title || "Untitled"}`);
		lines.push(`   URL: ${r.url}`);
		const displayText = r.snippet || r.content || "";
		if (displayText) {
			const text = displayText.slice(0, 500);
			lines.push(`   ${text}${displayText.length > 500 ? "..." : ""}`);
		}
		lines.push("");
	}
	return lines.join("\n");
}

function formatCombinedResults(
	query: string,
	results: SearchResultWithBackend[],
	backendStats: Map<string, { success: boolean; count: number; error?: string }>,
): string {
	const safeQuery = query.replace(/[\n\r]/g, " ").replace(/^#/gm, "\\#");
	const lines: string[] = [
		`## Search Results: "${safeQuery}"`,
		`Mode: combined  ·  Results: ${results.length}`,
		"",
	];

	// Add backend stats (derived from registry)
	const backendLabel = Object.fromEntries(
		Object.entries(BACKEND_DEFS).map(([k, v]) => [k, v.label])
	) as Record<string, string>;

	lines.push("**Backends queried:**");
	for (const [backend, stats] of backendStats.entries()) {
		const label = backendLabel[backend] || backend;
		if (stats.success) {
			lines.push(`  - ${label}: ${stats.count} results`);
		} else {
			lines.push(`  - ${label}: failed (${stats.error || "unknown error"})`);
		}
	}
	lines.push("");

	// Add results
	for (let i = 0; i < results.length; i++) {
		const r = results[i];
		lines.push(`### ${i + 1}. ${r.title || "Untitled"}`);
		if (r.backend) {
			lines.push(`   *Source: ${backendLabel[r.backend] || r.backend}*`);
		}
		lines.push(`   URL: ${r.url}`);
		const displayText = r.snippet || r.content || "";
		if (displayText) {
			const text = displayText.slice(0, 500);
			lines.push(`   ${text}${displayText.length > 500 ? "..." : ""}`);
		}
		lines.push("");
	}
	return lines.join("\n");
}

// ---------------------------------------------------------------------------
// Extension
// ---------------------------------------------------------------------------

/** Module-level config accessible from helper functions like resolveBackendKey(). */
let config: SearchConfig = { defaultBackend: "duckduckgo", backends: {} };

export default function (pi: ExtensionAPI) {
	let activeBackends: string[] = [];
	let configCacheTime = 0;
	const CONFIG_TTL_MS = 10_000; // re-read config at most every 10s

	function refreshConfig(cwd: string, force = false) {
		const now = Date.now();
		if (!force && now - configCacheTime < CONFIG_TTL_MS) return;

		config = loadConfig(cwd);
		configCacheTime = now;

		activeBackends = Object.entries(config.backends || {})
			.filter(([_, bc]) => bc?.enabled)
			.map(([name]) => name);

		// Always add duckduckgo if no backends explicitly enabled, since it needs no key
		if (activeBackends.length === 0) {
			activeBackends.push("duckduckgo");
		}

		// Honor defaultBackend: put it first in the auto-try order
		if (config.defaultBackend && activeBackends.includes(config.defaultBackend)) {
			activeBackends = [
				config.defaultBackend,
				...activeBackends.filter(b => b !== config.defaultBackend),
			];
		} else {
			config.defaultBackend = activeBackends[0];
		}

		// Invalidate credential cache so shell-command keys refresh after config reload
		clearCredentialCache();
	}

	// -----------------------------------------------------------------------
	// Backend dispatcher
	// -----------------------------------------------------------------------

	async function runBackend(
		backend: string,
		query: string,
		numResults: number,
		signal?: AbortSignal,
	): Promise<Array<{ title: string; url: string; snippet?: string; content?: string }>> {
		await waitForCooldown(backend);
		try {
			const def = BACKEND_DEFS[backend];
			if (!def) throw new Error(`Unknown backend: ${backend}`);

			let key: string | undefined;
			if (def.needsKeyFromConfig) {
				const bc = (config.backends as Record<string, BackendConfig> | undefined)?.[backend];
				key = bc?.apiKey;
			} else if (def.needsKey) {
				key = resolveBackendKey(backend);
				if (!key) {
					const label = def.label;
					throw new Error(`${label} backend not configured. ${MISSING_KEY_HELP}`);
				}
			} else if (def.optionalKey) {
				// Optionally resolve key — don't throw if missing
				key = resolveBackendKey(backend);
			}

			let instanceUrl: string | undefined;
			if (def.needsInstanceUrl) {
				const bc = (config.backends as Record<string, BackendConfig> | undefined)?.[backend];
				instanceUrl = bc?.instanceUrl;
				if (!instanceUrl) {
					throw new Error(`SearXNG instance URL not configured. Set searxng.instanceUrl in search.json`);
				}
			}

			const result = await def.search(query, numResults, { key, instanceUrl, signal });
			return result.results;
		} finally {
			markCooldown(backend);
		}
	}

	// -----------------------------------------------------------------------
	// Tool: web_search
	// -----------------------------------------------------------------------

	pi.registerTool({
		name: "web_search",
		label: "Web Search",
		description:
			"Search the web using one of several backend search engines. " +
			"Supports DuckDuckGo (free, no key), " +
			"Marginalia Search (free, shared public key), Serper, Tavily, Exa, Brave, " +
			"LangSearch, Firecrawl, WebSearchAPI, Perplexity Sonar, and SearXNG (most need API keys). " +
			"The best available backend is used automatically. " +
			"Use combine=true to query all enabled backends in parallel for broader coverage. " +
			"Use for fact-finding, research, documentation lookups, and current events.",
		promptSnippet: "Search the web (supports multiple search backends)",
		promptGuidelines: [
			"Use web_search when you need up-to-date information, facts, or documentation from the web",
			"Auto mode tries enabled backends in order (DuckDuckGo is the free fallback)",
			"Set combine=true to query ALL backends in parallel and merge/deduplicate results",
			"Configure additional backends in .pi/search.json for better quality results",
		],
		parameters: Type.Object({
			query: Type.String({
				description: "Search query (natural language works best)",
			}),
			numResults: Type.Optional(
				Type.Number({
					description: "Number of results (1-20, default 10)",
					default: 10,
				}),
			),
			backend: Type.Optional(
				StringEnum(["duckduckgo", "jina", "marginalia", "serper", "tavily", "exa",
					"brave", "langsearch", "firecrawl", "websearchapi", "perplexity", "searxng", "auto"] as const, {
					description:
						"Backend to use. 'auto' picks the best configured backend (default)",
				}),
			),
			combine: Type.Optional(
				Type.Boolean({
					description:
						"When true, queries ALL enabled backends in parallel and merges/deduplicates results. " +
						"Default is false (fallback mode: uses first successful backend only). " +
						"Ignored when a specific backend is requested (backend != 'auto').",
					default: false,
				}),
			),
		}),
		async execute(_toolCallId, params, signal, _onUpdate, ctx) {
			refreshConfig(ctx.cwd);
			const numResults = Math.max(1, Math.min(params.numResults ?? 10, 20));
			const requestedBackend = params.backend || "auto";
			const combine = params.combine ?? false;

			if (requestedBackend !== "auto") {
				// Specific backend requested — try it directly
				const results = await runBackend(requestedBackend, params.query, numResults, signal);
				return {
					content: [{ type: "text", text: formatResults(params.query, requestedBackend, results) }],
					details: { backend: requestedBackend, resultCount: results.length },
				};
			}

			// Auto mode
			if (combine) {
				// Combine mode: query all enabled backends in parallel
				const resultsPerBackend = await Promise.all(
					activeBackends.map(async (backend) => {
						try {
							const results = await runBackend(
								backend,
								params.query,
								Math.ceil(numResults / activeBackends.length),
								signal,
							);
							return {
								backend,
								results: results.map((r) => ({ ...r, backend })) as SearchResultWithBackend[],
								success: true,
							};
						} catch (err) {
							return {
								backend,
								results: [] as SearchResultWithBackend[],
								success: false,
								error: (err as Error).message,
							};
						}
					}),
				);

				// Build backend stats map
				const backendStats = new Map<
					string,
					{ success: boolean; count: number; error?: string }
				>();

				for (const { backend, results, success, error } of resultsPerBackend) {
					backendStats.set(backend, {
						success,
						count: results.length,
						error,
					});
				}

				// Merge and re-rank using Reciprocal Rank Fusion
				const successfulBackends = resultsPerBackend
					.filter(r => r.success && r.results.length > 0)
					.map(r => ({ backend: r.backend, results: r.results }));

				const combined = successfulBackends.length > 0
					? reciprocalRankFusion(successfulBackends, numResults)
					: [];

				return {
					content: [
						{
							type: "text",
							text: formatCombinedResults(params.query, combined, backendStats),
						},
					],
					details: {
						backend: "combined",
						resultCount: combined.length,
						backendStats: Object.fromEntries(backendStats),
					},
				};
			} else {
				// Fallback mode: try each enabled backend in order
				const errors: string[] = [];
				for (const backend of activeBackends) {
					try {
						const results = await runBackend(backend, params.query, numResults, signal);
						return {
							content: [
								{
									type: "text",
									text: errors.length > 0
										? `${errors.join("; ")}\n\n${formatResults(params.query, backend, results)}`
										: formatResults(params.query, backend, results),
								},
							],
							details: {
								backend: errors.length > 0 ? `${backend} (fallback)` : backend,
								resultCount: results.length,
								errors: errors.length > 0 ? errors : undefined,
							},
						};
					} catch (err) {
						errors.push(`${backend}: ${(err as Error).message}`);
					}
				}

				throw new Error(`All backends failed: ${errors.join("; ")}`);
			}
		},
	});

	// -----------------------------------------------------------------------
	// Tool: web_read — Read/extract content from a URL
	// -----------------------------------------------------------------------

	pi.registerTool({
		name: "web_read",
		label: "Read Web Page",
		description:
			"Fetch a URL as markdown. Use objective for a concrete question, keywords for long pages, " +
			"rush for speed, smart for better narrowing.",
		promptSnippet: "Read content from a web page (supports markdown extraction)",
		promptGuidelines: [
			"Use web_read when you need to read the content of a specific URL",
			"Set objective for a concrete question when only part of the page matters",
			"Add keywords for long pages when you know the relevant terms",
			"Choose rush for speed or smart for higher-quality narrowing",
		],
		parameters: Type.Object({
			url: Type.String({
				description: "HTTP(S) URL or bare domain to fetch",
			}),
			fresh: Type.Optional(
				Type.Boolean({
					description: "Bypass cache when freshness matters",
				}),
			),
			keywords: Type.Optional(
				Type.Array(Type.String(), {
					description: "Keyword to focus extraction on relevant sections",
				}),
			),
			mode: Type.Optional(
				StringEnum(["rush", "smart"] as const, {
					description: "rush = faster mode, smart = better section selection on long/noisy pages",
				}),
			),
			objective: Type.Optional(
				Type.String({
					description:
						"CSS selector for targeted extraction. Use when only part of the page matters.",
				}),
			),
		}),
		async execute(_toolCallId, params, signal, _onUpdate, ctx) {
			refreshConfig(ctx.cwd);

			const url = params.url.startsWith("https://") || params.url.startsWith("http://")
				? params.url
				: `https://${params.url}`;

			// Build Jina Reader URL
			const readerUrl = new URL("https://r.jina.ai/" + url);

			const headers: Record<string, string> = {
				"Accept": "text/plain",
			};

			// Optional Jina API key for higher rate limits (fallback to no-auth)
			const jinaKey = resolveBackendKey("jina");
			if (jinaKey) {
				headers["Authorization"] = `Bearer ${jinaKey}`;
			}

			if (params.fresh) {
				headers["x-no-cache"] = "true";
			}
			if (params.keywords && params.keywords.length > 0) {
				headers["x-keywords"] = params.keywords.join(", ");
			}
			if (params.mode) {
				headers["x-respond-with"] = params.mode === "rush" ? "text" : "markdown";
			}
			if (params.objective) {
				headers["x-target-selector"] = params.objective;
			}

			const response = await fetch(readerUrl.toString(), {
				signal: timeoutSignal(signal),
				headers,
			});

			if (!response.ok) {
				const text = await response.text().catch(() => "");
				throw new Error(`Failed to read ${url}: ${sanitizeError(response.status, text)}`);
			}

			const content = await response.text();
			const truncated = content.length > 10000
				? content.slice(0, 10000) + `\n\n[... truncated, full length: ${content.length} chars]`
				: content;

			return {
				content: [{ type: "text", text: truncated }],
				details: {
					url,
					length: content.length,
					truncated: content.length > 10000,
				},
			};
		},
	});

	// -----------------------------------------------------------------------
	// Commands
	// -----------------------------------------------------------------------

	pi.registerCommand("search-setup", {
		description: "Configure search backends interactively",
		handler: async (_args, ctx) => {
			if (!ctx.hasUI) {
				ctx.ui.notify("/search-setup requires interactive mode", "error");
				return;
			}

			const backends = Object.values(BACKEND_DEFS)
				.filter(d => d.setupLabel !== null)
				.map(d => d.setupLabel!);

			const backendKey: Record<string, string> = Object.fromEntries(
				Object.entries(BACKEND_DEFS)
					.filter(([_, d]) => d.setupLabel !== null)
					.map(([k, d]) => [d.setupLabel!, k])
			);

			const option = await ctx.ui.select("Which backend do you want to configure?", [
				...backends,
				"✅ Done — save and exit",
			]);

			if (!option || option.startsWith("✅ Done")) {
				ctx.ui.notify("Search setup complete.", "info");
				return;
			}

			const backend = backendKey[option];
			const label = option;

			const key = await ctx.ui.input(`Enter your ${label} API key:`, {
				placeholder: "sk-...",
				validate: (v: string) =>
					v.trim().length > 0 ? undefined : "Key cannot be empty",
			});

			if (!key) {
				ctx.ui.notify("Setup cancelled.", "info");
				return;
			}

			const configDir = join(getAgentDir(), "extensions");
			const configPath = join(configDir, "search.json");

			mkdirSync(configDir, { recursive: true });

			let existing: SearchConfig = {};
			if (existsSync(configPath)) {
				try {
					existing = JSON.parse(readFileSync(configPath, "utf-8"));
				} catch {
					// ignore
				}
			}

			// SearXNG setup needs both instance URL and optional API key
			let backendConfig: BackendConfig = { enabled: true };
			if (backend === "searxng") {
				const url = await ctx.ui.input("Enter your SearXNG instance URL (e.g. http://localhost:8888):", {
					placeholder: "http://localhost:8888",
					validate: (v: string) =>
						v.trim().length > 0 ? undefined : "URL cannot be empty",
				});
				if (!url) {
					ctx.ui.notify("Setup cancelled.", "info");
					return;
				}
				backendConfig.instanceUrl = url.trim();
				// Optionally ask for API key (some instances require auth)
				const optionalKey = await ctx.ui.input("Optional API key (leave empty if none):", {
					placeholder: "sk-... (optional)",
				});
				if (optionalKey && optionalKey.trim()) {
					backendConfig.apiKey = optionalKey.trim();
				}
			} else {
				backendConfig.apiKey = key?.trim() || "";
			}

			const updated: SearchConfig = {
				...existing,
				backends: {
					...existing.backends,
					[backend]: backendConfig,
				},
			};

			writeFileSync(configPath, JSON.stringify(updated, null, 2) + "\n", { mode: 0o600 });

			ctx.ui.notify(
				`${label} API key saved to ${configPath}. Run /reload to activate.`,
				"success",
			);
		},
	});

	pi.registerCommand("search-status", {
		description: "Show which search backends are configured and active",
		handler: async (_args, ctx) => {
			refreshConfig(ctx.cwd);

			const backendLabels: Record<string, string> = Object.fromEntries(
				Object.entries(BACKEND_DEFS).map(([k, v]) => [k, `${v.label}${k === "duckduckgo" ? " (free, no key)" : ""}`])
			);

			// Collect table rows first to compute aligned column widths
			type Row = [string, string];
			const rows: Row[] = [];

			for (const [name, label] of Object.entries(backendLabels)) {
				const { configured, source } = getKeySource(name);
				const bc = config.backends?.[name as keyof typeof config.backends];
				if (name === "duckduckgo") {
					rows.push([label, "✓ enabled, key: — (free)"]);
				} else if (name === "marginalia" && bc?.enabled) {
					rows.push([label, "✓ enabled, key: optional (public)"]);
				} else if (name === "searxng" && bc?.enabled) {
					const urlInfo = bc.instanceUrl ? `url: ${bc.instanceUrl}` : "no URL set";
					rows.push([label, `✓ enabled, ${urlInfo}${configured ? `, key: ✓ (${source})` : ", key: —"}`]);
				} else if (bc?.enabled) {
					rows.push([label, `✓ enabled, key: ✓${source ? ` (${source})` : ""}`]);
				} else {
					rows.push([label, `— disabled${configured ? `, key: ✓ (${source})` : ""}`]);
				}
			}

			// Compute column widths from headers + data
			const col1Header = "Backend";
			const col2Header = "Status";
			const w1 = rows.reduce((max, [c]) => Math.max(max, c.length), col1Header.length);
			const w2 = rows.reduce((max, [, s]) => Math.max(max, s.length), col2Header.length);

			const pad = (s: string, w: number) => s + " ".repeat(w - s.length);

			const tableLines = [
				`| ${pad(col1Header, w1)} | ${pad(col2Header, w2)} |`,
				`| ${"-".repeat(w1)} | ${"-".repeat(w2)} |`,
				...rows.map(([c1, c2]) => `| ${pad(c1, w1)} | ${pad(c2, w2)} |`),
			];

			const resolvedDefault = activeBackends[0] || "none";
			const lines: string[] = [
				"## Search Backend Status",
				`Configured default: ${config.defaultBackend || "none"}`,
				`Resolved default: ${resolvedDefault}`,
				`Active: ${activeBackends.join(", ") || "none"}`,
				"",
				...tableLines,
			];

			if (activeBackends.length === 1 && activeBackends[0] === "duckduckgo") {
				lines.push("");
				lines.push("Only DuckDuckGo is active (no API key needed).");
				lines.push("Add a search backend with /search-setup to get more results.");
			}

			ctx.ui.notify(lines.join("\n"), "info");
		},
	});

	// -----------------------------------------------------------------------
	// Session start
	// -----------------------------------------------------------------------

	pi.on("session_start", async (_event, ctx) => {
		backendCooldowns.clear();
		refreshConfig(ctx.cwd);
		const status = activeBackends.join(", ");
		ctx.ui.setStatus("search", `search: ${status}`);
	});
}


================================================
FILE: package.json
================================================
{
  "name": "pi-search-hub",
  "version": "1.4.4",
  "description": "Unified web search + content extraction extension for pi with 12 backends (DuckDuckGo, Jina AI, Tavily, Brave, Exa, Serper, Firecrawl, Marginalia, LangSearch, WebSearchAPI, Perplexity Sonar, SearXNG). Auto-fallback, RRF combine mode, web_read tool, secure credential resolution.",
  "keywords": [
    "pi-package",
    "pi",
    "pi-coding-agent",
    "search",
    "web-search",
    "web-read",
    "content-extraction",
    "search-hub",
    "duckduckgo",
    "jina",
    "tavily",
    "serper",
    "exa",
    "brave",
    "firecrawl",
    "langsearch",
    "websearchapi",
    "perplexity",
    "searxng",
    "rrf",
    "ai-agent"
  ],
  "author": "",
  "license": "MIT",
  "type": "module",
  "bugs": {
    "url": "https://github.com/ronnieops/pi-search-hub/issues"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/ronnieops/pi-search-hub.git"
  },
  "homepage": "https://github.com/ronnieops/pi-search-hub#readme",
  "pi": {
    "extensions": [
      "./extensions/search-hub.ts"
    ],
    "image": "https://pi.dev/assets/packages/pi-search-hub.png"
  },
  "peerDependencies": {
    "@earendil-works/pi-ai": "*",
    "@earendil-works/pi-coding-agent": "*"
  },
  "peerDependenciesMeta": {
    "@earendil-works/pi-ai": {
      "optional": true
    },
    "@earendil-works/pi-coding-agent": {
      "optional": true
    }
  },
  "dependencies": {
    "typebox": "^1.1.24"
  },
  "devDependencies": {
    "@types/node": "^25.6.0",
    "typescript": "^6.0.3",
    "vitest": "^4.1.7"
  }
}


================================================
FILE: search.json.example
================================================
{
  "defaultBackend": "auto",
  "backends": {
    "duckduckgo": { "enabled": true },
    "jina": { "enabled": true },
    "marginalia": { "enabled": true },
    "serper": { "enabled": true, "apiKey": "SERPER_API_KEY" },
    "brave": { "enabled": true, "apiKey": "BRAVE_API_KEY" },
    "tavily": { "enabled": true, "apiKey": "TAVILY_API_KEY" },
    "exa": { "enabled": true, "apiKey": "EXA_API_KEY" },
    "firecrawl": { "enabled": true, "apiKey": "FIRECRAWL_API_KEY" },
    "langsearch": { "enabled": true, "apiKey": "LANGSEARCH_API_KEY" },
    "websearchapi": { "enabled": true, "apiKey": "WEBSEARCHAPI_API_KEY" },
    "perplexity": { "enabled": true, "apiKey": "PERPLEXITY_API_KEY", "model": "sonar" },
    "searxng": { "enabled": true, "instanceUrl": "http://localhost:8888" }
  }
}

Download .txt

gitextract_flxax2c8/

├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── .npmignore
├── README.md
├── backends/
│   ├── parsers.test.ts
│   └── parsers.ts
├── extensions/
│   └── search-hub.ts
├── package.json
└── search.json.example

Download .txt

SYMBOL INDEX (57 symbols across 2 files)

FILE: backends/parsers.ts
  type ParsedResult (line 7) | interface ParsedResult {
  function parseMarginalia (line 18) | function parseMarginalia(
  function parseWebSearchAPI (line 35) | function parseWebSearchAPI(
  function parseSerper (line 53) | function parseSerper(
  type TavilyParsedResult (line 71) | interface TavilyParsedResult extends ParsedResult {
  function parseTavily (line 75) | function parseTavily(
  function parseExa (line 94) | function parseExa(
  function parseBrave (line 112) | function parseBrave(
  function parseLangSearch (line 134) | function parseLangSearch(
  function parseFirecrawl (line 152) | function parseFirecrawl(
  function parsePerplexity (line 182) | function parsePerplexity(
  function parseSearXNG (line 219) | function parseSearXNG(
  type JinaParsedResult (line 237) | interface JinaParsedResult extends ParsedResult {
  function parseJina (line 241) | function parseJina(

FILE: extensions/search-hub.ts
  type BackendConfig (line 56) | interface BackendConfig {
  type SearchConfig (line 65) | interface SearchConfig {
  function getAgentDir (line 83) | function getAgentDir(): string {
  constant COMMAND_TIMEOUT_MS (line 88) | const COMMAND_TIMEOUT_MS = 5_000;
  function resolveConfigValue (line 96) | function resolveConfigValue(reference: string | undefined): string | und...
  constant FALLBACK_ENV_MAP (line 140) | const FALLBACK_ENV_MAP: Record<string, string> = {
  function clearCredentialCache (line 153) | function clearCredentialCache(): void {
  function resolveBackendKey (line 158) | function resolveBackendKey(backend: string): string | undefined {
  function getKeySource (line 173) | function getKeySource(backend: string): { configured: boolean; source: s...
  function loadConfig (line 196) | function loadConfig(cwd: string): SearchConfig {
  constant MISSING_KEY_HELP (line 258) | const MISSING_KEY_HELP =
  constant HTTP_TIMEOUT_MS (line 265) | const HTTP_TIMEOUT_MS = 30_000;
  constant COOLDOWN_MS (line 268) | const COOLDOWN_MS = 2_000;
  function waitForCooldown (line 271) | function waitForCooldown(backend: string): Promise<void> {
  function markCooldown (line 279) | function markCooldown(backend: string) {
  function timeoutSignal (line 284) | function timeoutSignal(signal?: AbortSignal): AbortSignal | undefined {
  function sanitizeError (line 290) | function sanitizeError(status: number, text: string): string {
  type DuckDuckGoResult (line 309) | interface DuckDuckGoResult {
  function searchDuckDuckGo (line 315) | async function searchDuckDuckGo(
  function searchMarginalia (line 412) | async function searchMarginalia(
  function searchSerper (line 443) | async function searchSerper(
  function searchTavily (line 473) | async function searchTavily(
  function searchExa (line 507) | async function searchExa(
  function searchBrave (line 548) | async function searchBrave(
  function searchLangSearch (line 579) | async function searchLangSearch(
  function searchFirecrawl (line 609) | async function searchFirecrawl(
  function searchWebSearchAPI (line 641) | async function searchWebSearchAPI(
  function searchPerplexity (line 678) | async function searchPerplexity(
  function searchSearXNG (line 722) | async function searchSearXNG(
  type JinaResult (line 769) | interface JinaResult {
  function searchJina (line 775) | async function searchJina(
  type BackendRunner (line 808) | interface BackendRunner {
  constant BACKEND_DEFS (line 818) | const BACKEND_DEFS: Record<string, BackendRunner> = {
  constant RRF_K (line 974) | const RRF_K = 60;
  function reciprocalRankFusion (line 976) | function reciprocalRankFusion(
  type SearchResultWithBackend (line 1022) | interface SearchResultWithBackend {
  function formatResults (line 1030) | function formatResults(
  function formatCombinedResults (line 1056) | function formatCombinedResults(
  function refreshConfig (line 1114) | function refreshConfig(cwd: string, force = false) {
  function runBackend (line 1148) | async function runBackend(
  method execute (line 1239) | async execute(_toolCallId, params, signal, _onUpdate, ctx) {
  method execute (line 1392) | async execute(_toolCallId, params, signal, _onUpdate, ctx) {
  type Row (line 1562) | type Row = [string, string];

Download .json

Condensed preview — 9 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (96K chars).

[
  {
    "path": ".github/workflows/publish.yml",
    "chars": 913,
    "preview": "name: Publish Package\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: \"Version to publish (mus"
  },
  {
    "path": ".gitignore",
    "chars": 542,
    "preview": "node_modules/\ndist/\n*.log\n.DS_Store\n\n# API keys — NEVER commit these\n.pi/\nsearch.json\nsearch.json.local\n*.secret.*\n\n# De"
  },
  {
    "path": ".npmignore",
    "chars": 454,
    "preview": "# Session artifacts\nhandoffs/\nscout*.md\nresearcher*.md\ncontext.md\nresearch.md\n\n# Personal/social media drafts\ngithub-met"
  },
  {
    "path": "README.md",
    "chars": 12360,
    "preview": "# pi-search-hub\n\nUnified web search + content extraction extension for [pi](https://pi.dev) with **12 backend providers*"
  },
  {
    "path": "backends/parsers.test.ts",
    "chars": 8938,
    "preview": "import { describe, it, expect } from \"vitest\";\nimport {\n\tparseMarginalia,\n\tparseWebSearchAPI,\n\tparseSerper,\n\tparseTavily"
  },
  {
    "path": "backends/parsers.ts",
    "chars": 8805,
    "preview": "/**\n * Pure response parsers for search backends.\n * Each takes raw JSON data and returns normalized results.\n * No HTTP"
  },
  {
    "path": "extensions/search-hub.ts",
    "chars": 53571,
    "preview": "/**\n * Extension — Unified web search (12 backends) + content extraction (web_read)\n *\n * Backends (choose any, all disa"
  },
  {
    "path": "package.json",
    "chars": 1590,
    "preview": "{\n  \"name\": \"pi-search-hub\",\n  \"version\": \"1.4.4\",\n  \"description\": \"Unified web search + content extraction extension f"
  },
  {
    "path": "search.json.example",
    "chars": 786,
    "preview": "{\n  \"defaultBackend\": \"auto\",\n  \"backends\": {\n    \"duckduckgo\": { \"enabled\": true },\n    \"jina\": { \"enabled\": true },\n  "
  }
]

About this extraction

This page contains the full source code of the ronnieops/pi-search-hub GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 9 files (85.9 KB), approximately 22.5k tokens, and a symbol index with 57 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo