Repository: TeamWiseFlow/wiseflow
Branch: master
Commit: fb6a5f700b86
Files: 63
Total size: 284.3 KB
Directory structure:
gitextract_7o0qgem0/
├── .claude/
│ └── 20260307_done.md
├── .github/
│ └── workflows/
│ ├── ci.yml
│ └── release.yml
├── .gitignore
├── CHANGELOG.md
├── CLAUDE.md
├── LICENSE
├── README.md
├── README_AR.md
├── README_DE.md
├── README_EN.md
├── README_FR.md
├── README_JP.md
├── README_KR.md
├── docs/
│ ├── anti-detection-research.md
│ ├── more_powerful_search_skill/
│ │ ├── 20260308_done.md
│ │ ├── direct_url_for_search_on_media_platform.md
│ │ ├── extra/
│ │ │ ├── arxiv.py
│ │ │ ├── baidu.py
│ │ │ ├── bing.py
│ │ │ ├── bing_images.py
│ │ │ ├── bing_news.py
│ │ │ ├── flickr.py
│ │ │ ├── quark.py
│ │ │ ├── wikipedia.py
│ │ │ └── youtube_noapi.py
│ │ └── rss_parsor.py
│ └── prompt_videos.md
├── openclaw.version
├── scripts/
│ └── generate-patch.sh
├── tests/
│ ├── README.md
│ └── run-managed-tests.mjs
├── version
└── wiseflow/
├── README.md
├── addon.json
├── crew/
│ └── new-media-editor/
│ ├── AGENTS.md
│ ├── ALLOWED_COMMANDS
│ ├── BOOTSTRAP.md
│ ├── BUILTIN_SKILLS
│ ├── DENIED_SKILLS
│ ├── HEARTBEAT.md
│ ├── IDENTITY.md
│ ├── MEMORY.md
│ ├── SOUL.md
│ ├── TASKS.md
│ ├── TOOLS.md
│ ├── USER.md
│ └── skills/
│ ├── siliconflow-img-gen/
│ │ ├── SKILL.md
│ │ └── scripts/
│ │ └── gen.py
│ ├── siliconflow-video-gen/
│ │ ├── SKILL.md
│ │ └── scripts/
│ │ └── gen.py
│ └── wenyan-formatter/
│ ├── SKILL.md
│ └── scripts/
│ └── format.sh
├── overrides.sh
├── patches/
│ ├── 001-browser-tab-recovery.patch
│ ├── 002-disable-web-search-env-var.patch
│ ├── 003-act-field-validation.patch
│ └── 004-web-fetch-allow-rfc2544.patch
└── skills/
├── browser-guide/
│ └── SKILL.md
├── rss-reader/
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/
│ └── fetch-rss.mjs
└── smart-search/
└── SKILL.md
================================================
FILE CONTENTS
================================================
================================================
FILE: .claude/20260307_done.md
================================================
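# 1. Update the READMEs in other languages to match README.md
# 2. Update the release workflow in .github/workflows/
## Trigger
After each PR is merged into **upstream** (the official TeamWiseflow repository), GitHub Actions automatically bumps the version number and triggers a release build.
## How it works
Pull the latest code from https://github.com/openclaw/openclaw and https://github.com/TeamWiseFlow/openclaw_for_business, then arrange it as follows: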
```
openclaw_for_business/
├── addons/
│ └── wiseflow/ # the wiseflow/ directory from this repository (note: not the whole project directory)
└── openclaw/
└──
```
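Use GitHub Actions to run full end-to-end tests on the latest ubuntu24.04 and macos-latest runners. Once `openclaw_for_business/scripts/reinstall-daemon.sh` is confirmed to run cleanly, package openclaw_for_business together with openclaw, keeping the layout above, into a single zip archive and publish it to this repository's Releases.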
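The packaging step described above can be sketched in Python. `assemble_release` is a hypothetical helper and is not part of the repository. The real pipeline uses `cp -r`, `find`, and `zip` inside GitHub Actions. The sketch mirrors the `Package release` step of `release.yml`, which deletes every `.git` directory except `openclaw/.git`, because `git apply --3way` in `apply-addons.sh` needs that repository context.

```python
import shutil
import zipfile
from pathlib import Path


def assemble_release(wiseflow_repo: Path, openclaw_src: Path, ofb_src: Path,
                     out_dir: Path, version: str) -> Path:
    """Hypothetical helper: build openclaw_for_business/ with addons/wiseflow and openclaw, then zip it."""
    root = out_dir / "openclaw_for_business"
    # Copy OFB itself without its .git directories (release.yml deletes them before zipping)
    shutil.copytree(ofb_src, root, ignore=shutil.ignore_patterns(".git"), dirs_exist_ok=True)
    # Only the wiseflow/ subdirectory of this repo is shipped, not the whole repository
    shutil.copytree(wiseflow_repo / "wiseflow", root / "addons" / "wiseflow",
                    ignore=shutil.ignore_patterns(".git"), dirs_exist_ok=True)
    # openclaw keeps its .git so that `git apply --3way` still has repository context
    shutil.copytree(openclaw_src, root / "openclaw", dirs_exist_ok=True)
    archive = out_dir / f"wiseflow-{version}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # pathlib's "*" also matches dot-directories, so openclaw/.git is included
        for path in sorted(root.rglob("*")):
            zf.write(path, str(path.relative_to(out_dir)))
    return archive
```

Keeping the layout logic in one function makes the "only `wiseflow/`, not the whole repo" rule and the `.git` exception explicit. In CI, the same effect comes from the order of the `cp` and `find` commands.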
================================================
FILE: .github/workflows/ci.yml
================================================
name: CI
on:
  pull_request:
    branches: [master]
    types: [opened, synchronize, reopened] # explicitly excludes closed
# Automatically cancel an in-progress run when a new commit is pushed to the same PR/branch
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-24.04, macos-latest]
      fail-fast: false
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install pnpm
        uses: pnpm/action-setup@v4
        with:
          version: latest
      - name: Read pinned openclaw version
        id: pin
        run: |
          source openclaw.version
          echo "commit=$OPENCLAW_COMMIT" >> "$GITHUB_OUTPUT"
          echo "version=$OPENCLAW_VERSION" >> "$GITHUB_OUTPUT"
      - name: Clone openclaw at pinned commit
        run: |
          git init openclaw
          git -C openclaw remote add origin https://github.com/openclaw/openclaw.git
          git -C openclaw fetch --depth=1 origin ${{ steps.pin.outputs.commit }}
          git -C openclaw checkout FETCH_HEAD
      - name: Clone openclaw_for_business
        run: git clone --depth=1 https://github.com/TeamWiseFlow/openclaw_for_business.git openclaw_for_business
      - name: Set up addon directory structure
        run: |
          cp -r wiseflow openclaw_for_business/addons/wiseflow
          cp -r openclaw openclaw_for_business/openclaw
      # Run setup-crew.sh + apply-addons.sh separately.
      # We intentionally skip the `pnpm openclaw daemon install` step that
      # reinstall-daemon.sh would also execute: daemon installation requires
      # a real user session (systemd on Linux, launchd on macOS) and cannot
      # be meaningfully tested in a headless CI runner.
      - name: Run setup-crew.sh
        run: bash scripts/setup-crew.sh
        working-directory: openclaw_for_business
      - name: Run apply-addons.sh
        run: bash scripts/apply-addons.sh
        working-directory: openclaw_for_business
================================================
FILE: .github/workflows/release.yml
================================================
name: Auto Release
on:
  pull_request_target:
    types: [closed]
    branches: [master]
  workflow_dispatch:
    inputs:
      bump_type:
        description: 'Version bump type'
        required: false
        default: 'patch'
        type: choice
        options:
          - patch
          - minor
          - major
permissions:
  contents: write
# Prevent duplicate releases when several PRs are merged at the same time
concurrency:
  group: release
  cancel-in-progress: false
jobs:
  release:
    # CI already validated the PR, so this job only bumps the version, packages, and publishes
    if: github.event.pull_request.merged == true || github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.RELEASE_TOKEN || secrets.GITHUB_TOKEN }}
      - name: Determine bump type from PR labels
        id: bump
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "type=${{ inputs.bump_type }}" >> "$GITHUB_OUTPUT"
          else
            LABELS='${{ toJSON(github.event.pull_request.labels.*.name) }}'
            if echo "$LABELS" | grep -q '"major"'; then
              echo "type=major" >> "$GITHUB_OUTPUT"
            elif echo "$LABELS" | grep -q '"minor"'; then
              echo "type=minor" >> "$GITHUB_OUTPUT"
            else
              echo "type=patch" >> "$GITHUB_OUTPUT"
            fi
          fi
      - name: Calculate new version
        id: version
        run: |
          CURRENT=$(cat version | tr -d '[:space:]')
          NUM=${CURRENT#v}
          IFS='.' read -r MAJOR MINOR PATCH <<< "$NUM"
          MAJOR=${MAJOR:-0}
          MINOR=${MINOR:-0}
          PATCH=${PATCH:-0}
          BUMP="${{ steps.bump.outputs.type }}"
          if [ "$BUMP" = "major" ]; then
            MAJOR=$((MAJOR + 1))
            MINOR=0
            PATCH=0
          elif [ "$BUMP" = "minor" ]; then
            MINOR=$((MINOR + 1))
            PATCH=0
          else
            PATCH=$((PATCH + 1))
            # Auto-carry: when the patch number reaches 10, promote to the next minor version
            if [ "$PATCH" -ge 10 ]; then
              MINOR=$((MINOR + 1))
              PATCH=0
            fi
          fi
          NEW_VERSION="v${MAJOR}.${MINOR}.${PATCH}"
          echo "new=$NEW_VERSION" >> "$GITHUB_OUTPUT"
          echo "New version: $NEW_VERSION"
      - name: Update version file
        run: echo "${{ steps.version.outputs.new }}" > version
      - name: Commit and tag
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add version
          git commit -m "chore: bump version to ${{ steps.version.outputs.new }} [skip ci]"
          git tag "${{ steps.version.outputs.new }}"
          git push origin master --tags
      - name: Read pinned openclaw version
        id: pin
        run: |
          source openclaw.version
          echo "commit=$OPENCLAW_COMMIT" >> "$GITHUB_OUTPUT"
          echo "version=$OPENCLAW_VERSION" >> "$GITHUB_OUTPUT"
      - name: Clone openclaw at pinned commit
        run: |
          git init openclaw
          git -C openclaw remote add origin https://github.com/openclaw/openclaw.git
          git -C openclaw fetch --depth=1 origin ${{ steps.pin.outputs.commit }}
          git -C openclaw checkout FETCH_HEAD
      - name: Clone openclaw_for_business
        run: git clone --depth=1 https://github.com/TeamWiseFlow/openclaw_for_business.git openclaw_for_business
      - name: Set up release directory structure
        run: |
          cp -r wiseflow openclaw_for_business/addons/wiseflow
          cp -r openclaw openclaw_for_business/openclaw
      - name: Package release
        run: |
          # Keep openclaw/.git: `git apply --3way` in apply-addons.sh needs the git repository context
          # Remove only the other .git directories (wiseflow itself, openclaw_for_business, etc.)
          find openclaw_for_business -type d -name ".git" \
            ! -path "*/openclaw/.git" \
            -exec rm -rf {} + 2>/dev/null || true
          zip -r "wiseflow-${{ steps.version.outputs.new }}.zip" openclaw_for_business
      - name: Create GitHub Release
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh release create "${{ steps.version.outputs.new }}" \
            "wiseflow-${{ steps.version.outputs.new }}.zip" \
            --title "${{ steps.version.outputs.new }}" \
            --generate-notes
================================================
FILE: .gitignore
================================================
# node
node_modules/
package-lock.json
# default ignore
/shelf/
/workspace.xml
.DS_Store
.idea/
__pycache__
.env
.venv/
# temporary files
*.tmp
*.log
*.pyc
*.pyo
*.pyd
__pycache__/
*.so
.Python
patchright/
patchright-v*/
tests/openclaw_for_business/
tests/openclaw/
openclaw/
openclaw_for_business/
apply-addons.sh
================================================
FILE: CHANGELOG.md
================================================
# v5.0
Upgraded from a fixed workflow to an Agent!
# v4.32
- bug fix;
- Fixed an import error that prevented wiseflow from working when only RSS sources were used.
- update patchright to 1.57.2
- Removed unused code.
# v4.3.1
- 后端新增 info_stat 统计接口,并补齐 user_notify、user_prompt、ws_ping 等前端交互相关接口。
Added info_stat statistics endpoint and completed frontend interaction endpoints such as user_notify, user_prompt, and ws_ping.
- read_info 参数与 task time_slots 枚举同步为当前实现。
Synced read_info parameters and task time_slots enum with the current implementation.
- 后端接口文档更新,移除已弃用的 mc_backup_accounts CRUD 说明。
Updated backend API docs and removed deprecated mc_backup_accounts CRUD descriptions.
# v4.30
- 升级为与 pro 版本一样的架构,同时具有一样的 api,可无缝共享 [wiseflow+](https://github.com/TeamWiseFlow/wiseflow-plus) 生态!
Upgraded to the same architecture as the pro version, with the same api, seamlessly sharing the [wiseflow+](https://github.com/TeamWiseFlow/wiseflow-plus) ecosystem!
# v4.2
- 全新的网页爬取方案,使用 patchright 直连本地用户真实浏览器,从而实现更加强大的反爬虫伪装能力,以及提供用户数据持久化留存等特性;
Brand new web crawling solution: uses patchright to directly connect to the user's real local browser, providing much stronger anti-crawling disguise capabilities and features like persistent user data storage.
- 配套提供预登录、清除、深度清除脚本
Provided supporting scripts for pre-login, cleanup, and deep cleanup.
- 大幅简化 web crawler相关的 config
Greatly simplified web crawler-related configuration.
- 新增了proxy方案(支持直连提供商服务器,动态获取,本地缓存)
Added a new proxy solution (supports direct connection to provider servers, dynamic acquisition, and local caching).
- 整合 Crawler4ai script 方案,提供网页操作能力
Integrated Crawler4ai script solution, enabling web page operation capabilities.
- 重构搜索引擎方案,适配新的爬取方案并修复一些累积问题
Refactored search engine solution to adapt to the new crawling approach and fixed some accumulated issues.
- 升级 docker 部署方案,适配全新的打包 work flow。
Upgraded Docker deployment solution to fit the brand new packaging workflow.
# v4.1
- 通用llm提取支持设定 role 和 purpose,从而实现更加精准的提取
Universal LLM extraction supports setting role and purpose, enabling more precise extraction
- 社交平台信源增加查找创作者详情的功能
Added functionality to search for creator details in social media platform sources
- 增加自定义精准搜索功能(自定义 info 提取字段)
Added custom precision search functionality (custom info extraction fields)
- 可以为关注点指定搜索源,目前支持 bing、github、arxiv、ebay 四个源,并且全部使用平台原生接口,无需额外申请并配置第三方 key
Can specify search sources for focus points, currently supporting four sources: bing, github, arxiv, ebay, all using platform native interfaces without requiring additional third-party key applications and configurations
- 优化的缓存以及缓存遗忘机制
Optimized caching and cache forgetting mechanisms
- 修复快手平台搜索结果为空时的错误处理
Fixed error handling when Kuaishou platform search results are empty
# v4.0
- 深度重构 Crawl4ai(0.6.3)和 MediaCrawler, 并整合引入 Nodriver,大幅提升获取能力,支持社交平台内容获取(4.0版本提供对微博和快手平台的支持);
Deeply refactored Crawl4ai (0.6.3) and MediaCrawler, integrated Nodriver, significantly enhanced content acquisition capabilities, supporting social media platform content retrieval (version 4.0 provides support for Weibo and Kuaishou platforms);
- 全新的架构,混合使用异步和线程池,大大提升处理效率(同时降低内存消耗);
New architecture utilizing a hybrid approach of async and thread pools, greatly improving processing efficiency (while reducing memory consumption);
- 继承了 Crawl4ai 0.6.3 版本的 dispatcher 能力,提供更精细的内存管理能力;
Inherited the dispatcher capabilities from Crawl4ai 0.6.3 version, providing more refined memory management capabilities;
- 深度整合了 3.9 版本中的 Pre-Process 和 Crawl4ai 的 Markdown Generation流程, 规避了重复处理;
Deeply integrated the Pre-Process from version 3.9 and Crawl4ai's Markdown Generation process, avoiding duplicate processing;
- 放弃了通过 pocketbase 的api 进行数据库操作,改为直接读写 sqlite 数据库,因此无需用户在 .env 中提供pocketbase的账密,也规避了登录过期导致数据库无法读写,从而产生大量日志的隐患;
Abandoned database operations through PocketBase API, switched to direct SQLite database read/write, eliminating the need for users to provide PocketBase credentials in .env, and avoiding the risk of database read/write failures due to login expiration that could generate excessive logs;
- 优化 llm 处理策略,更加符合思考模型的特性;
Optimized LLM processing strategy to better align with the characteristics of thinking models;
- 优化了对 RSS 信源的支持;
Enhanced support for RSS sources;
- 优化了代码仓文件结构,更加清晰且符合当代 python 项目规范;
Optimized repository file structure, making it clearer and more compliant with contemporary Python project standards;
- 改为使用 uv 进行依赖管理,并优化了 requirement.txt 文件;
Switched to using uv for dependency management and optimized the requirement.txt file;
- 优化了启动脚本(提供 windows 版本),真正做到"一键启动";
Optimized startup scripts (including Windows version), achieving true "one-click startup";
- 优化了日志输出,增加 recorder 总结,并提供更精细化的日志输出控制。
Enhanced log output, added recorder summaries, and provided more granular log output control.
# v3.9-patch3
- 更改版本号命名规则
Change version number naming rules
- 诸多累积修复
Numerous cumulative fixes
# v0.3.9-patch2
- 定制更改 crawl4ai 0.4.30 版本,以取得更好的性能
Modified crawl4ai version 0.4.30 for better performance
- 相应的更改 core/requirements.txt
Corresponding changes to core/requirements.txt
- 更改 prompt,但未在 qwen2.5-14b 模型上发现改进
Modified the prompt, but no improvements were found on the qwen2.5-14b model
# V0.3.9
- 适配 Crawl4ai 0.4.248 版本,优化了性能
Adapt to Crawl4ai 0.4.248 version, optimized performance
- 累积 bug 修复
Cumulative bug fixes
- 增加 docker 运行方案(感谢 @braumye 贡献)
Added docker running solution (thanks to @braumye for contributing)
# V0.3.8
- 增加对 RSS 信源的支持
Added support for RSS sources
- 支持为关注点指定信源,并且可以为每个关注点增加搜索引擎作为信源
Added support for specifying sources for each focus point, including search engines as sources
- 进一步优化信息提取策略(每次只处理一个关注点)
Further optimized information extraction strategy (processing one focus point at a time)
- 优化入口逻辑,简化并合并启动方案 (感谢 @c469591 贡献windows版本启动脚本)
Optimized entry logic, simplified and merged startup solutions (thanks to @c469591 for contributing Windows startup script)
# V0.3.7
- 新增通过wxbot方案获取微信公众号订阅消息信源(不是很优雅,但已是目前能找到的最佳方案)
Added WeChat Official Account subscription message source acquisition through wxbot solution (not very elegant, but currently the best solution available)
- 升级适配 Crawl4ai 0.4.247 版本,
Upgraded to Crawl4ai 0.4.247.
- 通过新增预处理流程以及全新设计的推荐链接提取策略,大幅提升信息抓取效果,现在7b 这样的小模型也能比较好的完成复杂关注点(explanation中包含时间、指标限制这种)的提取了。
Through the addition of a new pre-processing process and a completely redesigned recommended link extraction strategy, the information capture effect has been significantly improved, and now even small models like 7b can better complete the extraction of complex focus points (such as time and index limits in the explanation).
- 提供自定义提取器接口,方便用户根据实际需求进行定制。
Provided a custom extractor interface to allow users to customize according to actual needs.
- bug 修复以及其他改进(crawl4ai浏览器生命周期管理,异步 llm wrapper 等)(感谢 @tusik 贡献)
Bug fixes and other improvements (crawl4ai browser lifecycle management, asynchronous llm wrapper, etc.)
Thanks to @tusik for contributing
# V0.3.6
- 改用 Crawl4ai 作为底层爬虫框架,其实Crawl4ai 和 Crawlee 的获取效果差别不大,二者也都是基于 Playwright ,但 Crawl4ai 的 html2markdown 功能很实用,而这对llm 信息提取作用很大,另外 Crawl4ai 的架构也更加符合我的思路;
Switched to Crawl4ai as the underlying web crawling framework. Although Crawl4ai and Crawlee both rely on Playwright with similar fetching results, Crawl4ai's html2markdown feature is quite practical for LLM information extraction. Additionally, Crawl4ai's architecture better aligns with my design philosophy.
- 在 Crawl4ai 的 html2markdown 基础上,增加了 deep scraper,进一步把页面的独立链接与正文进行区分,便于后一步 llm 的精准提取。由于html2markdown和deep scraper已经将原始网页数据做了很好的清理,极大降低了llm所受的干扰和误导,保证了最终结果的质量,同时也减少了不必要的 token 消耗;
Built upon Crawl4ai's html2markdown, we added a deep scraper to further differentiate standalone links from the main content, facilitating more precise LLM extraction. The preprocessing done by html2markdown and deep scraper significantly cleans up raw web data, minimizing interference and misleading information for LLMs, ensuring higher quality outcomes while reducing unnecessary token consumption.
*列表页面和文章页面的区分是所有爬虫类项目都头痛的地方,尤其是现代网页往往习惯在文章页面的侧边栏和底部增加大量推荐阅读,使得二者几乎不存在文本统计上的特征差异。*
*这一块我本来想用视觉大模型进行 layout 分析,但最终实现起来发现获取不受干扰的网页截图是一件会极大增加程序复杂度并降低处理效率的事情……*
*Distinguishing between list pages and article pages is a common challenge in web scraping projects, especially when modern webpages often include extensive recommended readings in sidebars and footers of articles, making it difficult to differentiate them through text statistics.*
*Initially, I considered using large visual models for layout analysis, but found that obtaining undistorted webpage screenshots greatly increases program complexity and reduces processing efficiency...*
- 重构了提取策略、llm 的 prompt 等;
Restructured extraction strategies and LLM prompts;
*有关 prompt 我想说的是,我理解好的 prompt 是清晰的工作流指导,每一步都足够明确,明确到很难犯错。但我不太相信过于复杂的 prompt 的价值,这个很难评估,如果你有更好的方案,欢迎提供 PR*
*Regarding prompts, I believe that a good prompt serves as clear workflow guidance, with each step being explicit enough to minimize errors. However, I am skeptical about the value of overly complex prompts, which are hard to evaluate. If you have better solutions, feel free to submit a PR.*
- 引入视觉大模型,自动在提取前对高权重(目前由 Crawl4ai 评估权重)图片进行识别,并补充相关信息到页面文本中;
Introduced large visual models to automatically recognize high-weight images (currently evaluated by Crawl4ai) before extraction and append relevant information to the page text;
- 继续减少 requirement.txt 的依赖项,目前不需要 json_repair了(实践中也发现让 llm 按 json 格式生成,还是会明显增加处理时间和失败率,因此我现在采用更简单的方式,同时增加对处理结果的后处理)
Continued to reduce dependencies in requirement.txt; json_repair is no longer needed (in practice, having LLMs generate JSON format still noticeably increases processing time and failure rates, so I now adopt a simpler approach with additional post-processing of results)
- pb info 表单的结构做了小调整,增加了 web_title 和 reference 两项。
Made minor adjustments to the pb info form structure, adding web_title and reference fields.
- @ourines 贡献了 install_pocketbase.sh 脚本
@ourines contributed the install_pocketbase.sh script
- @ibaoger 贡献了 windows 下的pocketbase 安装脚本
@ibaoger contributed the pocketbase installation script for Windows
- docker运行方案被暂时移除了,感觉大家用起来也不是很方便……
Docker running solution has been temporarily removed as it wasn't very convenient for users...
# V0.3.5
- 引入 Crawlee(playwright 模块),大幅提升通用爬取能力,适配实际项目场景;
Introduced Crawlee (playwright module), significantly enhancing general crawling capabilities and adapting to real-world tasks;
- 完全重写了信息提取模块,引入"爬-查一体"策略,你关注的才是你想要的;
Completely rewrote the information extraction module, introducing an "integrated crawl-search" strategy, focusing on what you care about;
- 新策略下放弃了 gne、jieba 等模块,去除了安装包;
Under the new strategy, modules such as gne and jieba have been abandoned, reducing the installation package size;
- 重写了 pocketbase 的表单结构;
Rewrote the PocketBase form structure;
- llm wrapper引入异步架构、自定义页面提取器规范优化(含 微信公众号文章提取优化);
The LLM wrapper now uses an asynchronous architecture; optimized the custom page-extractor specification (including improved extraction of WeChat Official Account articles);
- 进一步简化部署操作步骤。
Further simplified deployment steps.
================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Wiseflow v5.x is an **OpenClaw_for_business add-on** that enhances browser automation with anti-detection capabilities for [openclaw](https://github.com/openclaw/openclaw). It replaces Playwright with Patchright (an undetected fork) and adds tab recovery, agent skills, and anti-bot strategies. In the future, it may also add extensions/plugins to openclaw.
The distributable unit is the `wiseflow/` directory, which is applied to OpenClaw by a script maintained by another team: https://github.com/bigbrother666sh/openclaw_for_business/blob/main/scripts/apply-addons.sh.
We must stay compatible with the latest versions of both the openclaw and openclaw_for_business projects.
**OpenClaw_for_business is also called "OFB" for short.**
## OpenClaw_for_business Add-on Architecture
### Three-Layer Add-on Loading
OpenClaw_for_business's `apply-addons.sh` processes our add-ons in this order:
1. **`overrides.sh`** — pnpm overrides that swap `playwright-core` → `patchright-core` at the package manager level. Controlled by `PATCHRIGHT_VERSION` env var (default: 1.57.0). Also patches documentation references.
2. **`patches/*.patch`** — Git patches applied to OpenClaw source. Currently `001-browser-tab-recovery.patch` adds snapshot-based tab recovery when the target tab disappears mid-session.
**Always generate the final patch files with `scripts/generate-patch.sh`.**
3. **`skills/*/SKILL.md`** — Agent skill definitions installed into OpenClaw's skill system. `browser-guide/SKILL.md` teaches the agent login wall handling, CAPTCHA strategies, lazy-load scrolling, paywall detection, and tab cleanup.
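`overrides.sh` is a shell script, and its contents are not shown in this excerpt. As a rough illustration of layer 1 only, the Python sketch below assumes the swap is expressed as a `pnpm.overrides` entry in openclaw's `package.json` using the `npm:` alias syntax. The exact file and key that `overrides.sh` edits are assumptions. `add_patchright_override` and `resolve_version` are hypothetical names. The default version is the one stated above for `PATCHRIGHT_VERSION`.

```python
import json
import os
from pathlib import Path

# Default stated for PATCHRIGHT_VERSION; overridable via the environment
DEFAULT_PATCHRIGHT_VERSION = "1.57.0"


def resolve_version() -> str:
    """Env var first; an unset or empty PATCHRIGHT_VERSION falls back to the default."""
    return os.environ.get("PATCHRIGHT_VERSION") or DEFAULT_PATCHRIGHT_VERSION


def add_patchright_override(package_json: Path, version: str) -> dict:
    """Hypothetical helper: alias playwright-core to patchright-core via pnpm overrides."""
    data = json.loads(package_json.read_text(encoding="utf-8"))
    overrides = data.setdefault("pnpm", {}).setdefault("overrides", {})
    # "npm:<name>@<version>" asks pnpm to install patchright-core wherever playwright-core is requested
    overrides["playwright-core"] = f"npm:patchright-core@{version}"
    package_json.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

In this sketch, you would call `add_patchright_override(Path("openclaw/package.json"), resolve_version())` and then run `pnpm install`. The patches and skills layers are applied after that. Existing override entries are preserved, so the swap composes with any overrides openclaw already declares.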
### Key Files
| Path | Purpose |
|------|---------|
| `wiseflow/addon.json` | Package manifest (name, version, openclaw dependency) |
| `wiseflow/overrides.sh` | pnpm override script, receives `$ADDON_DIR` and `$OPENCLAW_DIR` |
| `wiseflow/patches/001-browser-tab-recovery.patch` | Tab resilience patch for browser tool |
| `wiseflow/skills/browser-guide/SKILL.md` | Agent browser best practices |
| `tests/run-managed-tests.mjs` | Automated test suite (Node.js ESM) |
| `docs/anti-detection-research.md` | Technical analysis of detection mechanisms |
| `version` | Current version string (v5.0) |
### Deployment Model
```
openclaw_for_business/
  addons/
    wiseflow/          ← copy of wiseflow/ directory
      addon.json
      overrides.sh
      patches/
      skills/
```
Install: copy `wiseflow/` → `<openclaw_for_business>/addons/wiseflow`, then restart OpenClaw.
## Development Workflow
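### Remote repositories
- **origin** → `git@github.com:bigbrother666sh/wiseflow.git` (personal development repository)
- **upstream** → `git@github.com:TeamWiseFlow/wiseflow.git` (official TeamWiseflow release repository)
### Development workflow and notes
1. Develop on the `master` branch by default; create feature branches as needed.
2. This project patches openclaw and must also follow the add-on loading mechanism of openclaw_for_business (OFB). Therefore, always keep a clone of https://github.com/openclaw/openclaw in the repository root, along with a copy of https://github.com/bigbrother666sh/openclaw_for_business/blob/main/scripts/apply-addons.sh.
Pull both before each development session, develop against the latest openclaw code, and make sure the final output works with apply-addons.sh.
3. Follow a TDD (test-driven development) workflow, and run the full test suite after every change.
4. This project builds on other open-source projects such as [patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright). As the project evolves, maintain a list of our dependencies. You may clone each of them into the repository root so you can check whether we need to upgrade along with them. Remember to update .gitignore accordingly so they are never committed.
5. When development is done, push to **origin** (the personal repository).
6. Milestones are merged from origin into **upstream** (the official TeamWiseflow repository) via GitHub PRs.
7. After each PR merge, **upstream** (the official TeamWiseflow repository) automatically bumps the version number and triggers a release build.
Note: I sometimes assign development tasks by leaving a TODO.md in .claude/. When you finish such a task, rename TODO.md to {date}_done.md.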
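### Version management
The version number is stored in the `version` file in the format `vMAJOR.MINOR.PATCH`. When a PR is merged into upstream's master, a GitHub Action automatically bumps the version and creates a Release. PR labels control the bump type:
- `major` label → major version bump
- `minor` label → minor (feature) version bump
- no label or `patch` label → patch version bump (default)
**Do not edit the `version` file manually**; CI maintains it automatically.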
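The bump rules can be sketched in Python, including the auto-carry that `release.yml` applies (a patch number reaching 10 rolls over into the next minor version). This is an illustrative re-implementation, not the code CI runs; the real logic is the shell in `.github/workflows/release.yml`. The function names `bump_type_from_labels` and `next_version` are hypothetical.

```python
from typing import Sequence


def bump_type_from_labels(labels: Sequence[str]) -> str:
    """Mirror the label check in release.yml: 'major' beats 'minor'; default is 'patch'."""
    if "major" in labels:
        return "major"
    if "minor" in labels:
        return "minor"
    return "patch"


def next_version(current: str, bump: str) -> str:
    """Return the next vMAJOR.MINOR.PATCH string for the given bump type."""
    num = current.strip()
    if num.startswith("v"):
        num = num[1:]  # like ${CURRENT#v}: drop a single leading "v"
    # Missing or empty components default to 0, so "v5.0" is read as 5.0.0
    parts = (num.split(".") + ["", "", ""])[:3]
    major, minor, patch = (int(p or 0) for p in parts)
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
        # Auto-carry: a patch number that reaches 10 rolls over into the next minor
        if patch >= 10:
            minor, patch = minor + 1, 0
    return f"v{major}.{minor}.{patch}"
```

For example, `next_version("v5.0.9", bump_type_from_labels(["bug"]))` returns `"v5.1.0"`, because an unlabeled PR is a patch bump and the carry promotes it.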
## Permissions
Claude Code is authorized to run any git command in this repository (including push, branch, tag, etc.) without asking for confirmation each time.
================================================
FILE: LICENSE
================================================
# Open Source License
wiseflow is licensed under a modified version of the Apache License 2.0, with the following additional conditions:
1. Wiseflow may be utilized commercially. Should the conditions below be met, a commercial license must be obtained from the producer:
a. Multi-tenant service: Unless explicitly authorized by Wiseflow in writing, you may not use the Wiseflow source code to operate a multi-tenant environment.
- Tenant Definition: Within the context of Wiseflow, one tenant corresponds to one workspace.
The workspace provides a separated area for each tenant's data and configurations.
b. LOGO and copyright information: In the process of using Wiseflow's frontend, you may not remove or modify the LOGO or copyright information in the Wiseflow console or applications. This restriction is inapplicable to uses of Wiseflow that do not involve its frontend.
- Frontend Definition: For the purposes of this license, the "frontend" of Wiseflow includes all components located in the `web/` directory when running Wiseflow from the raw source code, or the "web" image when running Wiseflow with Docker.
c. Prohibited usage: Using Wiseflow for commercial web crawling or data harvesting operations.
d. Prohibited usage: Using Wiseflow for any unlawful or unauthorized scraping, including activities that violate applicable laws, website terms of service, or robots exclusion directives.
e. Prohibited usage: Using Wiseflow to obtain, copy, or distribute content from media platforms and trading platforms or other materials protected by third-party intellectual property rights, unless you have obtained prior explicit authorization from the rights holder.
2. As a contributor, you should agree that:
a. The producer can adjust the open-source agreement to be more strict or relaxed as deemed necessary.
b. Your contributed code may be used for commercial purposes, including but not limited to its cloud business operations.
Apart from the specific conditions mentioned above, all other rights and restrictions follow the Apache License 2.0. Detailed information about the Apache License 2.0 can be found at http://www.apache.org/licenses/LICENSE-2.0.
The interactive design of this product is protected by appearance patent.
© 2025 Team Wiseflow
================================================
FILE: README.md
================================================
# Wiseflow
**[English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](README_FR.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **Looking for 4.x?** The original code for v4.30 and earlier is in the [`4.x` branch](https://github.com/TeamWiseFlow/wiseflow/tree/4.x).
```
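"My life has a limit, but knowledge has none. To pursue the limitless with the limited is perilous!" (Zhuangzi, Inner Chapters, Ch. 3, "Nourishing the Lord of Life")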
```
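wiseflow 4.x (and earlier versions) achieved powerful data-acquisition capabilities in specific scenarios through a series of carefully designed workflows, but it still has significant limitations:
- 1. It cannot acquire interactive content (content that only appears after clicking, especially when it is loaded dynamically)
- 2. It can only filter and extract information, with almost no capability for downstream tasks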
- ……
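Although we have kept improving its features and expanding its boundaries, the real world is complex, and so is the real internet. Rules can never be exhaustive, so a fixed workflow can never fit every scenario. This is not a problem with wiseflow; it is a problem with traditional software!
However, the rapid progress of agents over the past year has shown us that it is technically possible for large models to fully simulate human behavior on the internet, and the emergence of [openclaw](https://github.com/openclaw/openclaw) strengthened this conviction.
Even better, our early experiments showed that integrating wiseflow's acquisition capabilities into openclaw as a "plugin" solves both of the limitations above.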
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
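Note that openclaw's plugin system differs from the traditional notion of a "plugin" (such as Claude Code plugins), so we had to introduce the concept of an "add-on". Strictly speaking, wiseflow 5.x ships as an openclaw add-on. Stock openclaw does not have an add-on architecture, but you can add one with just a few simple shell commands. We also provide an out-of-the-box enhanced openclaw distribution that includes a set of presets for real-world commercial scenarios: [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business). You can clone it directly and unpack the wiseflow release into its add-on folder.
## ✨ What do you get by installing wiseflow (beyond stock openclaw)?
### 1. Anti-detection browser, with no browser extension required
wiseflow replaces openclaw's built-in Playwright with [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), an anti-detection fork of Playwright, via pnpm overrides in `overrides.sh`. This significantly reduces the chance that target websites detect and block the automated browser. As a result, you do not need to install the Chrome relay extension: the managed browser alone matches, or even exceeds, the relay's ability to acquire content and operate on pages.
📥 *We evaluated the popular browser automation frameworks on the market, including nodriver, browser-use, and Vercel's agent-browser. They all work on the same basic principle: they drive the browser over CDP with a persistent, openclaw-dedicated profile. However, only Patchright fully removes CDP probe fingerprints. In other words, even the purest direct CDP connection carries telltale signs and can still be detected. The other frameworks are designed for automated testing rather than data acquisition, whereas Patchright is built for acquisition. Because it is essentially a patch of Playwright, it inherits almost all of Playwright's high-level API. This makes it naturally compatible with openclaw, with no extra extension or MCP to install.*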
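### 2. Automatic tab recovery
If the target tab is unexpectedly closed or disappears while the agent is working, a snapshot-based tab recovery runs automatically, so the task is not interrupted by the lost tab.
### 3. Smart Search skill
Replaces openclaw's built-in `web_search` with more powerful search capabilities. Compared with the built-in web search tool, Smart Search has three core advantages:
- **Completely free, no API key required**: it does not depend on any third-party search API, so it costs nothing to use
- **Real-time search, best freshness**: it drives the browser directly to target pages or major social media platforms (Weibo, Twitter/X, Facebook, etc.) to search, retrieving newly published content as soon as it appears
- **Customizable sources**: users can freely specify search sources to precisely match their information needs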
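Smart Search replaces API calls with direct search URLs that the agent opens in the managed browser; the repository describes `smart-search/SKILL.md` as "multi-platform search URL construction". Below is a minimal Python sketch of that idea. The three URL templates are assumptions based on the sites' public search pages, not the templates the skill actually ships. `build_search_urls` is a hypothetical name.

```python
from typing import Dict, List, Optional
from urllib.parse import quote_plus

# Assumed templates for illustration; the skill's real list lives in smart-search/SKILL.md
SEARCH_URL_TEMPLATES = {
    "bing": "https://www.bing.com/search?q={q}",
    "weibo": "https://s.weibo.com/weibo?q={q}",
    "x": "https://x.com/search?q={q}&f=live",  # f=live requests the "Latest" tab
}


def build_search_urls(query: str, sources: Optional[List[str]] = None) -> Dict[str, str]:
    """Return one ready-to-open search URL per requested source (all sources by default)."""
    chosen = sources or list(SEARCH_URL_TEMPLATES)
    unknown = [s for s in chosen if s not in SEARCH_URL_TEMPLATES]
    if unknown:
        raise ValueError(f"unknown search source(s): {', '.join(unknown)}")
    encoded = quote_plus(query)  # spaces become "+", non-ASCII is percent-encoded
    return {s: SEARCH_URL_TEMPLATES[s].format(q=encoded) for s in chosen}
```

Because the output is just URLs, adding a user-defined source means adding one template entry. No API key or SDK is involved.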
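### 4. New Media Editor crew (preset AI agent)
An out-of-the-box AI agent for Chinese self-media content creation, specializing in major Chinese platforms such as Weibo, Xiaohongshu, Zhihu, Bilibili, and Douyin.
**Key capabilities:**
- Topic research + trend analysis (Mode A)
- Draft expansion + supporting evidence from the web (Mode B)
- Once an article is finalized, automatically calls [Wenyan](https://github.com/caol64/wenyan) to render it as WeChat Official Account-style HTML, with automatic matching across 7 built-in themes
- Can push directly to the WeChat Official Account drafts box (Mode C; requires `WECHAT_APP_ID`/`WECHAT_APP_SECRET`)
- Supports AI image generation ([SiliconFlow](https://cloud.siliconflow.cn/i/WNLYbBpi) image/video generation; requires `SILICONFLOW_API_KEY`)
## 🌟 Quick start
> **💡 A note on model costs**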
>
> wiseflow 5.x is built on openclaw, and agent workflows consume a substantial number of tokens, so we recommend setting up an LLM API first:
>
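> - **Users in mainland China (recommended)**: [SiliconFlow](https://cloud.siliconflow.cn/i/WNLYbBpi). Register and complete real-name verification to receive free platform-wide model vouchers that cover the cost of getting started. The config template already includes best-practice settings for siliconflow.cn and can be used as is. 😄 You are welcome to register with my [referral link](https://cloud.siliconflow.cn/i/WNLYbBpi); we will both receive a ¥16 platform bonus
> - **OpenAI / overseas closed-source models**: we recommend [AiHubMix](https://aihubmix.com?aff=Gp54), which is directly accessible from mainland China. 😄 You are welcome to register with my [invitation link](https://aihubmix.com?aff=Gp54)
> - **Users outside China**: you can use the international version of SiliconFlow directly: https://www.siliconflow.com/
Download the bundle containing openclaw_for_business and the wiseflow add-on directly from this repository's [Releases](https://github.com/TeamWiseFlow/wiseflow/releases).
1. Download and unzip the archive
2. Enter the extracted folder
3. Choose how to start it:
**Debug mode** (one-off launch, suitable for testing and development):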
```bash
./scripts/dev.sh gateway
```
**Production mode** (installs as a system service, suitable for long-running use):
```bash
./scripts/reinstall-daemon.sh
```
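> **System requirements**
> - **Ubuntu 22.04** is recommended
> - **Windows WSL2** is supported
> - **macOS** is supported
> - Running directly on native Windows is **not supported**
### [Alternative] Manual installation
Note: you must first download and deploy openclaw_for_business from https://github.com/TeamWiseFlow/openclaw_for_business/releases
Copy the wiseflow folder inside the repository (not the repository itself) into the `addons/` directory of openclaw_for_business: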
```bash
# Option 1: copy from the wiseflow repository
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
Restart openclaw_for_business after installation for the add-on to take effect.
## Directory structure
```
wiseflow/ # add-on package (copy into addons/ to use)
├── addon.json # metadata
├── overrides.sh # pnpm overrides + disable built-in web_search
├── patches/
│ ├── 001-browser-tab-recovery.patch # tab recovery patch
│ ├── 002-disable-web-search-env-var.patch # disable built-in web_search (env var)
│ ├── 003-act-field-validation.patch # ACT field validation patch
│ └── 004-web-fetch-allow-rfc2544.patch # web_fetch: allow RFC 2544 addresses
├── skills/ # global skills (available to all agents)
│ ├── browser-guide/SKILL.md # browser best practices (login/CAPTCHA/lazy loading, etc.)
│ ├── smart-search/SKILL.md # multi-platform search URL construction (replaces built-in web_search)
│ └── rss-reader/ # RSS/Atom feed reader
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # preset AI agents (crew templates)
└── new-media-editor/ # New Media Editor (Chinese self-media content creation)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # crew-specific skills
├── siliconflow-img-gen/ # text-to-image (SiliconFlow API)
├── siliconflow-video-gen/ # text-to-video (SiliconFlow API)
└── wenyan-formatter/ # Markdown → WeChat Official Account HTML / push to drafts
docs/ # technical docs (repository root)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # utility scripts (repository root)
└── generate-patch.sh
tests/ # test cases and scripts (repository root)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro is now available!
Stronger scraping, broader social media support, a UI, and a one-click installer with no deployment required!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
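🔥 **Pro is now on sale to everyone**: https://shouxiqingbaoguan.com/
🌹 Starting today, if you contribute a PR to the open-source wiseflow (code, documentation, and shared success stories are all welcome) and it is accepted, you will receive a one-year license for wiseflow Pro!
## 🛡️ License
Since version 4.2, we have updated our open-source license. Please see: [LICENSE](LICENSE)
For commercial cooperation, please contact **Email: zm.zhao@foxmail.com**
## 📬 Contact
For any questions or suggestions, please leave a message via an [issue](https://github.com/TeamWiseFlow/wiseflow/issues).
🎉 wiseflow && OFB now offer a paid knowledge base, including a "Step-by-step installation tutorial from scratch", various exclusive application tips, and a **VIP WeChat group**:
Feel free to add the "shopkeeper" on WeCom (WeChat Work) to learn more: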
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 Open source isn't easy. Thank you for your support!
## 🤝 wiseflow 5.x is built on the following excellent open-source projects:
- Patchright(Undetected Python version of the Playwright testing and automation library) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser(Parse feeds in Python) https://github.com/kurtmckee/feedparser
- SearXNG(a free internet metasearch engine which aggregates results from various search services and databases) https://github.com/searxng/searxng
- 文颜(Wenyan)多平台 Markdown 排版与发布工具(新媒体小编 Crew 通过 wenyan-formatter 技能调用) https://github.com/caol64/wenyan
## Citation
如果您在相关工作中参考或引用了本项目的部分或全部,请注明如下信息:
```
Author: Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## 友情链接
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://cloud.siliconflow.cn/i/WNLYbBpi)
================================================
FILE: README_AR.md
================================================
<div dir="rtl">
# Wiseflow
**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](README_FR.md)**
🚀 **STEP INTO 5.x**
> 📌 **تبحث عن الإصدار 4.x؟** الكود الأصلي للإصدار v4.30 والإصدارات السابقة متوفر في [فرع `4.x`](https://github.com/TeamWiseFlow/wiseflow/tree/4.x).
```
"حياتي لها حدود، لكن المعرفة بلا حدود. أن تلاحق اللامحدود بالمحدود — فذلك خطر محدق!" — تشوانغ تزو، الفصول الداخلية، تغذية مبدأ الحياة
```
حقق wiseflow 4.x (بما في ذلك الإصدارات السابقة) قدرات قوية في جمع البيانات في سيناريوهات محددة من خلال سلسلة من سير العمل الدقيقة، لكنه لا يزال يعاني من قيود كبيرة:
- 1. عدم القدرة على جمع المحتوى التفاعلي (المحتوى الذي لا يظهر إلا بعد النقر، خاصة في حالات التحميل الديناميكي)
- 2. يقتصر على تصفية واستخراج المعلومات، مع غياب شبه كامل لقدرات معالجة المهام اللاحقة
- ……
على الرغم من أننا عملنا باستمرار على تحسين وظائفه وتوسيع حدوده، إلا أن العالم الحقيقي معقد، وكذلك الإنترنت. لا يمكن أن تكون القواعد شاملة أبداً، لذا فإن سير العمل الثابت لن يتمكن أبداً من التكيف مع جميع السيناريوهات. هذه ليست مشكلة wiseflow — إنها مشكلة البرمجيات التقليدية!
ومع ذلك، أظهر لنا التقدم السريع في تقنية الوكلاء (Agents) خلال العام الماضي الإمكانية التقنية لمحاكاة سلوك الإنسان على الإنترنت بالكامل بواسطة نماذج اللغة الكبيرة. وقد عزز ظهور [openclaw](https://github.com/openclaw/openclaw) هذا الاقتناع بشكل أكبر.
والأكثر إثارة للدهشة أنه من خلال تجاربنا واستكشافاتنا المبكرة، اكتشفنا أن دمج قدرات جمع البيانات في wiseflow في openclaw على شكل "إضافات" يحل المشكلتين المذكورتين أعلاه بشكل مثالي.
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
تجدر الإشارة إلى أن نظام الإضافات في openclaw يختلف كثيراً عما نفهمه تقليدياً بـ"الإضافات" (المشابهة لإضافات Claude Code). لذلك اضطررنا إلى تقديم مفهوم "add-on". وبشكل دقيق، سيظهر wiseflow 5.x على شكل add-on لـ openclaw. لا يحتوي openclaw الأصلي على بنية "add-on"، لكن عملياً تحتاج فقط إلى بضعة أوامر shell بسيطة لإتمام هذا "التحويل". كما أعددنا نسخة معززة من openclaw جاهزة للاستخدام مع إعدادات مسبقة لسيناريوهات الأعمال الحقيقية: [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business). يمكنك ببساطة استنساخها ووضع إصدار wiseflow في مجلد add-on الخاص بـ openclaw_for_business.
## ✨ ما الذي ستكسبه بتثبيت wiseflow (أفضل من openclaw الأصلي)؟
### 1. متصفح مضاد للكشف، دون الحاجة لتثبيت أي إضافات للمتصفح
يستبدل patch-001 الخاص بـ wiseflow برنامج Playwright المدمج في openclaw بـ [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) (نسخة fork غير قابلة للكشف من Playwright)، مما يقلل بشكل كبير من احتمالية اكتشاف المتصفحات الآلية وحجبها من قبل المواقع المستهدفة. وهذا يعني أنه دون الحاجة إلى تثبيت امتداد Chrome Relay، يمكن لمتصفح مُدار وحده تحقيق قدرات جمع البيانات من الويب والتفاعل معه، مماثلة لما يوفره إعداد relay أو حتى أفضل منه.
📥 *قمنا بتقييم جميع أطر عمل أتمتة المتصفح الرائجة في السوق، بما في ذلك nodriver وbrowser-use وagent-browser من Vercel. يمكننا التأكيد أنه رغم أن جميعها تعمل عبر CDP وتوفر ملفات تعريف مخصصة ومستمرة لـ openclaw، إلا أن Patchright وحده يوفر إزالة كاملة لبصمات CDP. بعبارة أخرى، حتى نهج الاتصال المباشر بـ CDP الأكثر نقاءً لا يزال يحمل توقيعات قابلة للكشف. تم تصميم الأطر الأخرى للاختبار الآلي، وليس لجمع البيانات، بينما تم تصميم Patchright خصيصاً لجمع البيانات. ونظراً لأنه في جوهره تصحيح (patch) على Playwright، فإنه يرث تقريباً جميع واجهات برمجة التطبيقات عالية المستوى الخاصة بـ Playwright، مما يجعله متوافقاً بطبيعته مع openclaw دون الحاجة إلى تثبيت أي إضافات أو MCP إضافية.*
### 2. آلية الاسترداد التلقائي لعلامات التبويب
عندما تُغلق أو تُفقد علامة تبويب مستهدفة بشكل غير متوقع أثناء عملية Agent، يقوم النظام تلقائياً بإجراء استرداد علامة التبويب بناءً على لقطات الحالة، مما يضمن عدم انقطاع المهام بسبب فقدان علامة التبويب.
### 3. مهارة البحث الذكي (Smart Search Skill)
يحل محل `web_search` المدمج في openclaw بقدرات بحث أكثر قوة. مقارنةً بأداة web search المدمجة الأصلية، يتميز البحث الذكي بثلاث مزايا جوهرية:
- **مجاني تماماً، لا يتطلب مفتاح API**: لا يعتمد على أي API بحث من طرف ثالث — تكلفة صفرية
- **بحث فوري لأقصى درجات الحداثة**: يوجّه المتصفح مباشرةً إلى الصفحات المستهدفة أو منصات التواصل الاجتماعي الكبرى (ويبو، Twitter/X، Facebook، إلخ) للحصول فوراً على أحدث المنشورات
- **مصادر بحث قابلة للتخصيص**: يمكن للمستخدمين تحديد مصادر بحثهم بحرية للحصول على معلومات دقيقة وموجّهة
### 4. New Media Editor Crew (وكيل AI مسبق الإعداد)
وكيل AI جاهز للاستخدام لإنشاء محتوى وسائل التواصل الاجتماعي الصينية، متخصص في المنصات الرئيسية الصينية مثل Weibo وXiaohongshu وZhihu وBilibili وDouyin.
**القدرات الرئيسية:**
- بحث الموضوعات + تحليل الاتجاهات (الوضع A)
- توسيع المسودة + إضافة أدلة من الإنترنت (الوضع B)
- بعد الانتهاء من المقال، استدعاء [Wenyan](https://github.com/caol64/wenyan) تلقائياً لتحويله إلى HTML بتنسيق حساب WeChat العام (7 قوالب مدمجة)
- الدفع المباشر إلى صندوق مسودات حساب WeChat العام (الوضع C، يتطلب `WECHAT_APP_ID`/`WECHAT_APP_SECRET`)
- دعم توليد الصور/الفيديو بالذكاء الاصطناعي ([SiliconFlow](https://www.siliconflow.com/) لتوليد الصور/الفيديو، يتطلب `SILICONFLOW_API_KEY`)
## 🌟 البدء السريع
> **💡 ملاحظة حول تكاليف API**
>
> يعتمد wiseflow 5.x على سير عمل Agent الخاص بـ openclaw، مما يتطلب الوصول إلى واجهة برمجة تطبيقات LLM. نوصي بإعداد بيانات اعتماد API مسبقاً:
>
> - **المستخدمون الدوليون (موصى به)**: [SiliconFlow](https://www.siliconflow.com/) — رصيد مجاني متاح بعد التسجيل يغطي تكاليف الاستخدام الأولي
> - **OpenAI / Anthropic ومزودون آخرون**: أي API متوافق يعمل
قم بتنزيل الحزمة المتكاملة (التي تشمل openclaw_for_business وإضافة wiseflow) مباشرةً من [Releases](https://github.com/TeamWiseFlow/wiseflow/releases) لهذا المستودع.
1. تنزيل الأرشيف وفك ضغطه
2. الانتقال إلى المجلد المستخرج
3. اختيار وضع التشغيل:
**وضع التصحيح** (تشغيل مفرد، للاختبار والتطوير):
<div dir="ltr">
```bash
./scripts/dev.sh gateway
```
</div>
**وضع الإنتاج** (التثبيت كخدمة نظام، للتشغيل طويل الأمد):
<div dir="ltr">
```bash
./scripts/reinstall-daemon.sh
```
</div>
> **متطلبات النظام**
> - يُنصح باستخدام نظام **Ubuntu 22.04**
> - بيئة **Windows WSL2** مدعومة
> - **macOS** مدعوم
> - التشغيل المباشر على **Windows الأصلي** **غير مدعوم**
### [بديل] التثبيت اليدوي
> ملاحظة: تحتاج أولاً إلى تنزيل ونشر openclaw_for_business من: https://github.com/TeamWiseFlow/openclaw_for_business/releases
انسخ مجلد `wiseflow` من هذا المستودع (وليس المستودع بأكمله) إلى مجلد `addons/` الخاص بـ openclaw_for_business:
<div dir="ltr">
```bash
# الطريقة 1: الاستنساخ من مستودع wiseflow
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
</div>
أعد تشغيل openclaw_for_business بعد التثبيت لتفعيل التغييرات.
## هيكل المجلدات
<div dir="ltr">
```
wiseflow/ # حزمة addon (انسخها إلى مجلد addons/)
├── addon.json # البيانات الوصفية
├── overrides.sh # pnpm overrides + تعطيل web_search المدمج
├── patches/
│ ├── 001-browser-tab-recovery.patch # رقعة استعادة علامات التبويب
│ ├── 002-disable-web-search-env-var.patch # تعطيل web_search المدمج (env var)
│ ├── 003-act-field-validation.patch # رقعة التحقق من حقول ACT
│ └── 004-web-fetch-allow-rfc2544.patch # رقعة السماح لـ web_fetch بالوصول إلى عناوين RFC 2544
├── skills/ # المهارات العامة (متاحة لجميع الوكلاء)
│ ├── browser-guide/SKILL.md # أفضل ممارسات المتصفح (تسجيل الدخول/CAPTCHA/التحميل الكسول، إلخ)
│ ├── smart-search/SKILL.md # منشئ URL البحث متعدد المنصات (يحل محل web_search المدمج)
│ └── rss-reader/ # قارئ خلاصات RSS/Atom
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # وكلاء AI مسبقو الإعداد (قوالب Crew)
└── new-media-editor/ # محرر الوسائط الجديدة (إنشاء محتوى وسائل التواصل الاجتماعي الصينية)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # مهارات خاصة بـ Crew
├── siliconflow-img-gen/ # توليد صور AI (SiliconFlow API)
├── siliconflow-video-gen/ # توليد فيديو AI (SiliconFlow API)
└── wenyan-formatter/ # Markdown → HTML WeChat / إرسال المسودة
docs/ # التوثيق التقني (جذر المستودع)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # النصوص البرمجية المساعدة (جذر المستودع)
└── generate-patch.sh
tests/ # حالات الاختبار والنصوص البرمجية (جذر المستودع)
├── README.md
└── run-managed-tests.mjs
```
</div>
## WiseFlow Pro متوفر الآن!
قدرات استخراج أقوى، دعم أشمل لوسائل التواصل الاجتماعي، مع واجهة مستخدم وحزمة تثبيت بنقرة واحدة — لا حاجة للنشر!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **النسخة الاحترافية معروضة للبيع الآن**: https://shouxiqingbaoguan.com/
🌹 بدءاً من اليوم، ساهم بطلبات السحب (PR) في النسخة مفتوحة المصدر من wiseflow (الكود والتوثيق ومشاركة قصص النجاح مرحب بها). عند القبول، سيحصل المساهمون على ترخيص لمدة عام واحد لـ wiseflow Pro!
## 🛡️ الترخيص
منذ الإصدار 4.2، قمنا بتحديث ترخيصنا مفتوح المصدر. يرجى الاطلاع على: [LICENSE](LICENSE)
للتعاون التجاري، يرجى التواصل عبر **البريد الإلكتروني: zm.zhao@foxmail.com**
## 📬 اتصل بنا
لأي أسئلة أو اقتراحات، لا تتردد في ترك رسالة عبر [المشكلات](https://github.com/TeamWiseFlow/wiseflow/issues).
🎉 يقدم wiseflow & OFB الآن **قاعدة معرفة مدفوعة**، تتضمن دروس تعليمية للتثبيت خطوة بخطوة، ونصائح تطبيقية حصرية، و**مجموعة WeChat VIP**:
أضف "Keeper" على WeChat Enterprise للاستفسار:
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 المصدر المفتوح يتطلب جهداً كبيراً — شكراً لدعمكم!
## 🤝 wiseflow 5.x مبني على المشاريع مفتوحة المصدر الممتازة التالية:
- Patchright (نسخة Python غير قابلة للكشف من مكتبة Playwright للاختبار والأتمتة) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser (تحليل الخلاصات في Python) https://github.com/kurtmckee/feedparser
- SearXNG (محرك بحث وصفي مجاني على الإنترنت يجمع النتائج من خدمات البحث وقواعد البيانات المختلفة) https://github.com/searxng/searxng
- Wenyan (أداة تنسيق ونشر Markdown متعددة المنصات، يستخدمها New Media Editor Crew عبر مهارة wenyan-formatter) https://github.com/caol64/wenyan
## الاستشهاد
إذا أشرت إلى أو استشهدت بجزء أو كل هذا المشروع في عملك، يرجى تضمين المعلومات التالية:
```
Author: Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## الشركاء
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
</div>
================================================
FILE: README_DE.md
================================================
# Wiseflow
**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Français](README_FR.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **Suchen Sie 4.x?** Der ursprüngliche Code von v4.30 und früheren Versionen ist im [`4.x`-Branch](https://github.com/TeamWiseFlow/wiseflow/tree/4.x) verfügbar.
```
„Mein Leben hat Grenzen, doch das Wissen hat keine. Mit dem Begrenzten dem Grenzenlosen zu folgen — das ist gefährlich!" — Zhuangzi, Innere Kapitel, Die Pflege des Lebensprinzips
```
Wiseflow 4.x (einschließlich früherer Versionen) erreichte durch eine Reihe präziser Workflows leistungsstarke Datenerfassungsfähigkeiten in bestimmten Szenarien, hatte jedoch weiterhin erhebliche Einschränkungen:
- 1. Interaktive Inhalte konnten nicht erfasst werden (Inhalte, die erst nach einem Klick erscheinen, insbesondere bei dynamischem Laden)
- 2. Beschränkung auf Informationsfilterung und -extraktion, praktisch keine Fähigkeit zur Verarbeitung nachgelagerter Aufgaben
- ……
Obwohl wir stets daran gearbeitet haben, die Funktionalität zu verbessern und die Grenzen zu erweitern, ist die reale Welt komplex — und das Internet ebenso. Regeln können niemals vollständig sein, daher kann ein fester Workflow niemals alle Szenarien abdecken. Dies ist kein Problem von wiseflow — es ist ein Problem traditioneller Software!
Die rasante Entwicklung von Agenten im vergangenen Jahr hat uns jedoch die technische Möglichkeit gezeigt, menschliches Internetverhalten durch große Sprachmodelle vollständig zu simulieren. Das Erscheinen von [openclaw](https://github.com/openclaw/openclaw) hat diese Überzeugung weiter gestärkt.
Noch bemerkenswerter ist, dass wir durch frühe Experimente und Erkundungen entdeckt haben, dass die Integration der Erfassungsfähigkeiten von wiseflow als „Plugins“ in openclaw die beiden oben genannten Einschränkungen perfekt löst.
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
Es ist jedoch zu beachten, dass das Plugin-System von openclaw sich erheblich von dem unterscheidet, was wir traditionell unter „Plugins“ verstehen (ähnlich den Plugins von Claude Code). Daher mussten wir das Konzept des „Add-ons“ einführen. Genau genommen wird wiseflow 5.x als openclaw-Add-on erscheinen. Das originale openclaw verfügt nicht über eine „Add-on“-Architektur, aber in der Praxis benötigen Sie nur wenige einfache Shell-Befehle, um diese „Umgestaltung“ durchzuführen. Wir haben auch eine sofort einsatzbereite, erweiterte Version von openclaw mit voreingestellten Konfigurationen für reale Geschäftsszenarien vorbereitet: [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business). Sie können es einfach klonen und das wiseflow-Release in den Add-on-Ordner von openclaw_for_business entpacken.
## ✨ Was erhalten Sie durch die Installation von wiseflow (überlegen dem originalen openclaw)?
### 1. Anti-Erkennungs-Browser, keine Browser-Erweiterungen erforderlich
Der patch-001 von wiseflow ersetzt das in openclaw integrierte Playwright durch [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) (einen unerkannten Fork von Playwright) und reduziert damit erheblich die Wahrscheinlichkeit, dass automatisierte Browser von Ziel-Websites erkannt und blockiert werden. Dadurch lassen sich ohne die Installation einer Chrome-Relay-Extension mit einem verwalteten Browser gleichwertige oder sogar überlegene Web-Erfassungs- und Bedienungsfähigkeiten gegenüber einer Relay-Konfiguration erzielen.
📥 *Wir haben alle derzeit populären Browser-Automatisierungs-Frameworks bewertet, darunter nodriver, browser-use und Vercels agent-browser. Wir können bestätigen, dass zwar alle über CDP arbeiten und beständige openclaw-spezifische Profile bereitstellen, aber nur Patchright eine vollständige Entfernung von CDP-Fingerprints bietet. Mit anderen Worten: Selbst der direkteste CDP-Verbindungsansatz hinterlässt nachweisbare Merkmale. Andere Frameworks sind für automatisierte Tests konzipiert, nicht für Datenerfassung, während Patchright speziell für die Erfassung entwickelt wurde. Da es sich im Wesentlichen um einen Patch auf Playwright handelt, erbt es fast alle High-Level-APIs von Playwright — und ist dadurch nativ mit openclaw kompatibel, ohne dass zusätzliche Erweiterungen oder MCP installiert werden müssen.*
### 2. Automatischer Tab-Wiederherstellungsmechanismus
Wenn ein Ziel-Browser-Tab während eines Agent-Vorgangs unerwartet geschlossen oder verloren geht, führt das System automatisch eine snapshot-basierte Tab-Wiederherstellung durch, damit Aufgaben nicht durch Tab-Verlust unterbrochen werden.
### 3. Smart Search Skill
Ersetzt die eingebaute `web_search` von openclaw durch leistungsfähigere Suchfunktionen. Im Vergleich zum ursprünglich integrierten web search tool bietet Smart Search drei zentrale Vorteile:
- **Völlig kostenlos, kein API-Schlüssel erforderlich**: Keine Abhängigkeit von Drittanbieter-Such-APIs — null Kosten
- **Echtzeit-Suche für maximale Aktualität**: Steuert den Browser direkt zu Zielseiten oder großen Social-Media-Plattformen (Weibo, Twitter/X, Facebook usw.), um die zuletzt veröffentlichten Inhalte sofort abzurufen
- **Benutzerdefinierbare Suchquellen**: Benutzer können ihre Suchquellen frei festlegen, um präzise und zielgerichtete Informationsabfragen zu ermöglichen
### 4. New-Media-Editor Crew (vorkonfigurierter KI-Agent)
Ein sofort einsatzbereiter KI-Agent zur Erstellung chinesischer Social-Media-Inhalte, spezialisiert auf die wichtigsten chinesischen Plattformen wie Weibo, Xiaohongshu, Zhihu, Bilibili und Douyin.
**Hauptfähigkeiten:**
- Themenrecherche + Trendanalyse (Modus A)
- Entwurfserweiterung + Online-Faktenbelege (Modus B)
- Nach der Fertigstellung automatischer Aufruf von [Wenyan](https://github.com/caol64/wenyan) zur Darstellung als WeChat-Public-Account-HTML mit 7 integrierten Themen
- Direktes Pushen in den WeChat-Public-Account-Entwurfsbereich (Modus C, erfordert `WECHAT_APP_ID`/`WECHAT_APP_SECRET`)
- KI-Bild-/Videogenerierung ([SiliconFlow](https://www.siliconflow.com/) Bild/Video-Generierung, erfordert `SILICONFLOW_API_KEY`)
## 🌟 Schnellstart
> **💡 Hinweis zu API-Kosten**
>
> wiseflow 5.x basiert auf dem Agent-Workflow von openclaw und benötigt LLM-API-Zugang. Wir empfehlen, Ihre API-Zugangsdaten vorab vorzubereiten:
>
> - **Internationale Benutzer (empfohlen)**: [SiliconFlow](https://www.siliconflow.com/) — nach der Registrierung werden kostenlose Credits gutgeschrieben, die die Anfangskosten abdecken
> - **OpenAI / Anthropic und andere Anbieter**: Jede kompatible API ist verwendbar
Laden Sie das integrierte Paket (enthält openclaw_for_business und das wiseflow Addon) direkt aus den [Releases](https://github.com/TeamWiseFlow/wiseflow/releases) dieses Repositories herunter.
1. Das Archiv herunterladen und entpacken
2. In den entpackten Ordner wechseln
3. Startmodus auswählen:
**Debug-Modus** (Einzelstart, für Tests und Entwicklung):
```bash
./scripts/dev.sh gateway
```
**Produktionsmodus** (als Systemdienst installieren, für den Dauerbetrieb):
```bash
./scripts/reinstall-daemon.sh
```
> **Systemanforderungen**
> - **Ubuntu 22.04** wird empfohlen
> - **Windows WSL2**-Umgebung wird unterstützt
> - **macOS** wird unterstützt
> - Die direkte Ausführung unter **nativem Windows** wird **nicht unterstützt**
### [Alternative] Manuelle Installation
> Hinweis: Sie müssen zuerst openclaw_for_business herunterladen und bereitstellen. Download-Adresse: https://github.com/TeamWiseFlow/openclaw_for_business/releases
Kopieren Sie den `wiseflow`-Ordner aus diesem Repository (nicht das Repository selbst) in das `addons/`-Verzeichnis von openclaw_for_business:
```bash
# Option 1: Aus dem wiseflow-Repository klonen
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
Nach der Installation openclaw_for_business neu starten, um die Änderungen zu aktivieren.
## Verzeichnisstruktur
```
wiseflow/ # addon-Paket (in addons/-Verzeichnis kopieren)
├── addon.json # Metadaten
├── overrides.sh # pnpm overrides + integrierte web_search deaktivieren
├── patches/
│ ├── 001-browser-tab-recovery.patch # Tab-Wiederherstellungs-Patch
│ ├── 002-disable-web-search-env-var.patch # Integrierte web_search deaktivieren (env var)
│ ├── 003-act-field-validation.patch # ACT-Feldvalidierungs-Patch
│ └── 004-web-fetch-allow-rfc2544.patch # web_fetch für RFC-2544-Adressbereich freigeben
├── skills/ # Globale Skills (für alle Agents verfügbar)
│ ├── browser-guide/SKILL.md # Best Practices für den Browser (Login/CAPTCHA/Lazy-Loading etc.)
│ ├── smart-search/SKILL.md # Multiplattform-Such-URL-Builder (ersetzt integrierte web_search)
│ └── rss-reader/ # RSS/Atom Feed-Reader
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # Vorkonfigurierte KI-Agents (Crew-Vorlagen)
└── new-media-editor/ # New-Media-Editor (Erstellung chinesischer Social-Media-Inhalte)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # Crew-spezifische Skills
├── siliconflow-img-gen/ # KI-Bildgenerierung (SiliconFlow API)
├── siliconflow-video-gen/ # KI-Videogenerierung (SiliconFlow API)
└── wenyan-formatter/ # Markdown → WeChat HTML / Entwurf pushen
docs/ # Technische Dokumentation (Repository-Root)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # Hilfsskripte (Repository-Root)
└── generate-patch.sh
tests/ # Testfälle und Skripte (Repository-Root)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro ist jetzt verfügbar!
Stärkere Scraping-Fähigkeiten, umfassendere Social-Media-Unterstützung, mit UI-Oberfläche und Ein-Klick-Installationspaket — keine Bereitstellung erforderlich!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **Pro-Version ist jetzt im Verkauf**: https://shouxiqingbaoguan.com/
🌹 Ab sofort: Beiträge (PRs) zur Open-Source-Version von wiseflow (Code, Dokumentation und erfolgreiche Fallstudien sind willkommen) — bei Annahme erhalten Mitwirkende eine einjährige Lizenz für wiseflow Pro!
## 🛡️ Lizenz
Seit Version 4.2 haben wir unsere Open-Source-Lizenz aktualisiert. Bitte beachten Sie: [LICENSE](LICENSE)
Für kommerzielle Zusammenarbeit kontaktieren Sie bitte **Email: zm.zhao@foxmail.com**
## 📬 Kontakt
Bei Fragen oder Vorschlägen hinterlassen Sie gerne eine Nachricht über [Issues](https://github.com/TeamWiseFlow/wiseflow/issues).
🎉 wiseflow & OFB bieten jetzt eine **kostenpflichtige Wissensdatenbank** an, einschließlich Schritt-für-Schritt-Installationstutorials, exklusiver Anwendungstipps und einer **VIP-WeChat-Gruppe**:
Fügen Sie „Keeper" auf WeChat Enterprise für Anfragen hinzu:
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 Open Source erfordert viel Aufwand — vielen Dank für Ihre Unterstützung!
## 🤝 wiseflow 5.x basiert auf folgenden hervorragenden Open-Source-Projekten:
- Patchright (Unerkannte Python-Version der Playwright Test- und Automatisierungsbibliothek) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser (Feeds in Python parsen) https://github.com/kurtmckee/feedparser
- SearXNG (eine freie Internet-Metasuchmaschine, die Ergebnisse verschiedener Suchdienste und Datenbanken aggregiert) https://github.com/searxng/searxng
- Wenyan (plattformübergreifendes Markdown-Formatierungs- und Veröffentlichungstool, von der New-Media-Editor-Crew über den wenyan-formatter-Skill verwendet) https://github.com/caol64/wenyan
## Citation
Wenn Sie Teile oder das gesamte Projekt in Ihrer Arbeit referenzieren oder zitieren, geben Sie bitte folgende Informationen an:
```
Author: Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## Partner
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
================================================
FILE: README_EN.md
================================================
# Wiseflow
**[中文](README.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](README_FR.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **Looking for 4.x?** The original v4.30 and earlier code is available on the [`4.x` branch](https://github.com/TeamWiseFlow/wiseflow/tree/4.x).
```
"My life has a limit, but knowledge has none. To pursue the limitless with the limited — that is perilous!" — Zhuangzi, Inner Chapters, Nourishing the Lord of Life
```
Wiseflow 4.x (and earlier versions) achieved powerful data acquisition capabilities in specific scenarios through a series of precisely engineered workflows, but still had significant limitations:
- 1. Unable to acquire interactive content (content that only appears after clicking, especially in dynamically loaded scenarios)
- 2. Limited to information filtering and extraction, with virtually no downstream task capabilities
- ……
Although we have been dedicated to improving its functionality and expanding its boundaries, the real world is complex, and so is the real internet. Rules can never be exhaustive, so a fixed workflow can never adapt to all scenarios. This is not a problem with wiseflow — it's a problem with traditional software!
However, the rapid progress of Agents over the past year has shown us that it is now technically possible to fully simulate human online behavior with large language models. The emergence of [openclaw](https://github.com/openclaw/openclaw) has further strengthened this belief.
What's even more remarkable is that through our early experiments and exploration, we discovered that integrating wiseflow's acquisition capabilities into openclaw as "plugins" perfectly solves the two limitations mentioned above.
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
It should be noted that openclaw's plugin system is quite different from what we traditionally understand as "plugins" (similar to Claude Code's plugins). Therefore, we had to introduce the concept of "add-on". To be precise, wiseflow 5.x will appear in the form of an openclaw add-on. The original openclaw does not have an "add-on" architecture, but in practice, you only need a few simple shell commands to complete this "transformation". We have also prepared a ready-to-use enhanced version of openclaw with a series of preset configurations for real business scenarios: [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business). You can simply clone it and extract the wiseflow release into the add-on folder of openclaw_for_business.
## ✨ What Do You Gain by Installing wiseflow (Superior to Vanilla openclaw)?
### 1. Anti-Detection Browser, No Browser Extensions Required
wiseflow's patch-001 replaces openclaw's built-in Playwright with [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) (an undetected fork of Playwright), significantly reducing the likelihood of automated browsers being identified and blocked by target websites. This means that without installing the Chrome relay extension, a managed browser alone can achieve the same — or even better — web acquisition and operation capabilities as a relay setup.
📥 *We evaluated all major browser automation frameworks currently available, including nodriver, browser-use, and Vercel's agent-browser. We can confirm that while they all operate through CDP and provide persistent openclaw-specific profiles, only Patchright delivers complete removal of CDP fingerprints. In other words, even the most direct CDP connection approach still carries detectable signatures. Other frameworks are designed for automated testing, not for data acquisition, whereas Patchright was specifically built for acquisition. Since it is essentially a patch on top of Playwright, it inherits nearly all of Playwright's high-level APIs — making it natively compatible with openclaw without requiring any additional extensions or MCP.*
### 2. Automatic Tab Recovery Mechanism
When a target browser tab is unexpectedly closed or lost during an Agent operation, the system automatically performs snapshot-based tab recovery, ensuring tasks are not interrupted by tab loss.
### 3. Smart Search Skill
Replaces openclaw's built-in `web_search` with more powerful search capabilities. Compared to the original built-in web search tool, Smart Search has three core advantages:
- **Completely free, no API key required**: Does not rely on any third-party search APIs — zero cost
- **Real-time search for maximum timeliness**: Directly drives the browser to target pages or major social media platforms (Weibo, Twitter/X, Facebook, etc.) to search for the latest published content
- **User-configurable search sources**: Users can freely specify their search sources for precise, targeted information retrieval
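As a rough illustration of "driving the browser directly to a search page", the sketch below percent-encodes a query and prints direct search URLs. The three URL templates (Bing, Weibo, X) are assumptions based on those sites' public search pages, not the list the smart-search skill actually ships, and `urlencode`/`search_urls` are hypothetical helper names. In practice the agent would open such a URL in the managed browser instead of calling a search API.

```bash
#!/bin/sh
# Hypothetical sketch: turn a query into direct search-page URLs.
# The URL templates are assumptions, not the smart-search skill's own list.

# Percent-encode a string byte by byte; RFC 3986 unreserved characters
# (A-Z a-z 0-9 - . _ ~) are kept, everything else (incl. UTF-8) becomes %XX.
urlencode() {
  enc=""
  for byte in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$byte" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # Unreserved byte: hex -> octal escape -> the literal character.
        enc="$enc$(printf "\\$(printf '%03o' "0x$byte")")" ;;
      *)
        # Reserved, space or non-ASCII byte: emit %XX in uppercase hex.
        enc="$enc%$(printf '%s' "$byte" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '%s\n' "$enc"
}

# Print one candidate URL per (assumed) search source.
search_urls() {
  q=$(urlencode "$1")
  printf 'https://www.bing.com/search?q=%s\n' "$q"
  printf 'https://s.weibo.com/weibo?q=%s\n' "$q"
  printf 'https://x.com/search?q=%s&f=live\n' "$q"
}

search_urls "openclaw addon"
# first line: https://www.bing.com/search?q=openclaw%20addon
```

Encoding the raw bytes reported by `od` keeps non-ASCII keywords (for example Chinese queries for Weibo) intact as UTF-8 without depending on the shell's locale settings.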
### 4. New Media Editor Crew (Preset AI Agent)
A ready-to-use Chinese social media content creation AI Agent, focused on major Chinese platforms including Weibo, Xiaohongshu, Zhihu, Bilibili, and Douyin.
**Key capabilities:**
- Topic research + trending analysis (Mode A)
- Draft expansion + online fact support (Mode B)
- After an article is finalized, automatically invokes [Wenyan](https://github.com/caol64/wenyan) to render it as WeChat Official Account-style HTML, with 7 built-in themes
- Direct push to the WeChat Official Account draft box (Mode C, requires `WECHAT_APP_ID`/`WECHAT_APP_SECRET`)
- AI image/video generation support ([SiliconFlow](https://www.siliconflow.com/) image/video generation, requires `SILICONFLOW_API_KEY`)
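Because Mode C and the generation skills depend on credentials, a quick pre-flight check can save a failed run. The helper below is a hypothetical sketch, not part of wiseflow; only the three environment variable names come from the list above.

```bash
#!/bin/sh
# Hypothetical pre-flight check; only the variable names come from the README.

# Print the name of every variable (among the arguments) that is unset or empty.
# eval is used for indirect lookup, so pass only trusted, literal names.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Report whether all variables required by a feature are present.
report() {
  feature=$1
  shift
  # Join the missing names onto one line, then drop the trailing space.
  missing=$(missing_vars "$@" | tr '\n' ' ')
  if [ -z "$missing" ]; then
    printf '%s: ready\n' "$feature"
  else
    printf '%s: missing %s\n' "$feature" "${missing% }"
  fi
}

report "Mode C (WeChat draft push)" WECHAT_APP_ID WECHAT_APP_SECRET
report "AI image/video generation" SILICONFLOW_API_KEY
```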
## 🌟 Quick Start
> **💡 API Cost Note**
>
> wiseflow 5.x is powered by openclaw's Agent workflow, which requires LLM API access. We recommend preparing your API credentials first:
>
> - **International users (recommended)**: [SiliconFlow](https://www.siliconflow.com/) — free credits available after registration, covering initial usage costs
> - **OpenAI / Anthropic and other providers**: Any compatible API works
Download the integrated package (which includes openclaw_for_business and the wiseflow addon) directly from this repository's [Releases](https://github.com/TeamWiseFlow/wiseflow/releases).
1. Download and extract the archive
2. Enter the extracted directory
3. Choose your startup mode:
**Debug mode** (single startup, for testing and development):
```bash
./scripts/dev.sh gateway
```
**Production mode** (install as a system service for long-term operation):
```bash
./scripts/reinstall-daemon.sh
```
> **System Requirements**
> - **Ubuntu 22.04** is recommended
> - **Windows WSL2** environment is supported
> - **macOS** is supported
> - Running directly on **native Windows** is **not supported**
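If you are unsure which row of this support matrix applies to your machine, the hypothetical helper below classifies the host from `uname -s` and `/proc/version`. The WSL detection is a heuristic assumption (WSL kernels usually mention Microsoft in `/proc/version`, and WSL2 kernels typically contain `microsoft-standard` or `WSL2`); the Windows patterns cover Git Bash, MSYS2 and Cygwin shells running on native Windows.

```bash
#!/bin/sh
# Hypothetical helper: map the current host onto the README's support matrix.
# Usage: classify_platform [kernel_name] [proc_version]
# With no arguments it reads the live values from uname and /proc/version.
classify_platform() {
  kernel=${1:-$(uname -s)}
  version=${2:-$(cat /proc/version 2>/dev/null || true)}
  case "$kernel" in
    Darwin)
      echo "macOS: supported" ;;
    Linux)
      # Heuristic: WSL kernels mention Microsoft in /proc/version.
      case "$version" in
        *WSL2*|*microsoft-standard*) echo "WSL2: supported" ;;
        *[Mm]icrosoft*) echo "WSL1: not listed as supported (use WSL2)" ;;
        *) echo "Linux: supported (Ubuntu 22.04 recommended)" ;;
      esac ;;
    MINGW*|MSYS*|CYGWIN*)
      # Git Bash / MSYS2 / Cygwin running on native Windows.
      echo "native Windows: not supported (use WSL2)" ;;
    *)
      echo "$kernel: not covered by the requirements above" ;;
  esac
}

classify_platform
```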
### [Alternative] Manual Installation
> Note: You need to first download and deploy openclaw_for_business from: https://github.com/TeamWiseFlow/openclaw_for_business/releases
Copy the `wiseflow` folder from this repository (not the repository itself) to the `addons/` directory of openclaw_for_business:
```bash
# Option 1: Clone from the wiseflow repository
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
Restart openclaw_for_business after installation for the changes to take effect.
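After copying, you can sanity-check that the addon landed where expected before restarting. The sketch below checks for the files and folders listed under Directory Structure; `check_addon` and the `./addons/wiseflow` default are assumptions, so set `ADDON_DIR` to your actual `<openclaw_for_business>/addons/wiseflow` path.

```bash
#!/bin/sh
# Hypothetical post-install check against the "Directory Structure" listing.
# ADDON_DIR defaults to an assumed relative path; override it as needed.
check_addon() {
  dir=$1
  status=0
  for f in addon.json overrides.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $f"
      status=1
    fi
  done
  for d in patches skills/browser-guide skills/smart-search skills/rss-reader; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing directory: $d"
      status=1
    fi
  done
  # Globs expand in sorted order, so numbered patches list as 001, 002, ...
  for p in "$dir"/patches/*.patch; do
    if [ -e "$p" ]; then
      echo "patch: ${p##*/}"
    fi
  done
  return "$status"
}

ADDON_DIR=${ADDON_DIR:-./addons/wiseflow}
if check_addon "$ADDON_DIR"; then
  echo "wiseflow addon layout looks complete"
fi
```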
## Directory Structure
```
wiseflow/ # addon package (copy to addons/ directory)
├── addon.json # Metadata
├── overrides.sh # pnpm overrides + disable built-in web_search
├── patches/
│ ├── 001-browser-tab-recovery.patch # Tab recovery patch
│ ├── 002-disable-web-search-env-var.patch # Disable built-in web_search (env var)
│ ├── 003-act-field-validation.patch # ACT field validation patch
│ └── 004-web-fetch-allow-rfc2544.patch # Allow web_fetch to reach RFC 2544 addresses
├── skills/ # Global skills (available to all Agents)
│ ├── browser-guide/SKILL.md # Browser best practices (login/captcha/lazy-loading, etc.)
│ ├── smart-search/SKILL.md # Multi-platform search URL builder (replaces built-in web_search)
│ └── rss-reader/ # RSS/Atom Feed reader
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # Preset AI Agents (Crew templates)
└── new-media-editor/ # New Media Editor (Chinese social media content creation)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # Crew-specific skills
├── siliconflow-img-gen/ # AI image generation (SiliconFlow API)
├── siliconflow-video-gen/ # AI video generation (SiliconFlow API)
└── wenyan-formatter/ # Markdown → WeChat HTML / push draft
docs/ # Technical documentation (repo root)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # Utility scripts (repo root)
└── generate-patch.sh
tests/ # Test cases and scripts (repo root)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro is Now Available!
Stronger scraping capabilities, more comprehensive social media support, a UI, and a one-click installation package: no deployment needed!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **Pro version is now on sale**: https://shouxiqingbaoguan.com/
🌹 Starting today, contribute PRs to the wiseflow open-source version (code, documentation, and successful case studies are all welcome). Once accepted, contributors will receive a one-year license for wiseflow Pro!
## 🛡️ License
Since version 4.2, we have updated our open-source license. Please refer to: [LICENSE](LICENSE)
For commercial cooperation, please contact **Email: zm.zhao@foxmail.com**
## 📬 Contact
For any questions or suggestions, feel free to leave a message via [issue](https://github.com/TeamWiseFlow/wiseflow/issues).
🎉 wiseflow & OFB now offer a **paid knowledge base**, including step-by-step installation tutorials, exclusive application tips, and a **VIP WeChat group**:
Feel free to add "Keeper" on WeChat Enterprise for inquiries:
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 Open source takes effort — thank you for your support!
## 🤝 wiseflow 5.x is built on the following excellent open-source projects:
- Patchright (Undetected Python version of the Playwright testing and automation library) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser (Parse feeds in Python) https://github.com/kurtmckee/feedparser
- SearXNG (a free internet metasearch engine which aggregates results from various search services and databases) https://github.com/searxng/searxng
- Wenyan (multi-platform Markdown formatting and publishing tool, used by the New Media Editor Crew via the wenyan-formatter skill) https://github.com/caol64/wenyan
## Citation
If you reference or cite part or all of this project in your work, please include the following information:
```
Author: Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## Partners
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
================================================
FILE: README_FR.md
================================================
# Wiseflow
**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **Vous cherchez la version 4.x ?** Le code original de la v4.30 et des versions antérieures est disponible sur la [branche `4.x`](https://github.com/TeamWiseFlow/wiseflow/tree/4.x).
```
« Ma vie a des limites, mais la connaissance n'en a point. Poursuivre l'illimité avec le limité — voilà qui est périlleux ! » — Zhuangzi, Chapitres intérieurs, Nourrir le principe vital
```
Wiseflow 4.x (y compris les versions précédentes) a permis d'atteindre de puissantes capacités d'acquisition de données dans des scénarios spécifiques grâce à une série de workflows précis, mais présentait encore des limitations significatives :
- 1. Incapacité à acquérir du contenu interactif (contenu qui n'apparaît qu'après un clic, en particulier dans les cas de chargement dynamique)
- 2. Limité au filtrage et à l'extraction d'informations, avec pratiquement aucune capacité de traitement en aval
- ……
Bien que nous nous soyons constamment efforcés d'améliorer ses fonctionnalités et d'étendre ses limites, le monde réel est complexe, tout comme l'internet. Les règles ne peuvent jamais être exhaustives, c'est pourquoi un workflow fixe ne peut jamais s'adapter à tous les scénarios. Ce n'est pas un problème de wiseflow — c'est un problème des logiciels traditionnels !
Cependant, les progrès fulgurants des Agents au cours de l'année écoulée nous ont montré la possibilité technique de simuler entièrement le comportement humain sur Internet grâce aux grands modèles de langage. L'apparition d'[openclaw](https://github.com/openclaw/openclaw) a renforcé davantage cette conviction.
Plus remarquable encore, grâce à nos expériences et explorations préliminaires, nous avons découvert que l'intégration des capacités d'acquisition de wiseflow dans openclaw sous forme de « plugins » résout parfaitement les deux limitations mentionnées ci-dessus.
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
Il convient de noter que le système de plugins d'openclaw diffère considérablement de ce que nous comprenons traditionnellement par « plugins » (similaires aux plugins de Claude Code). Nous avons donc dû introduire le concept d'« add-on ». Pour être précis, wiseflow 5.x apparaîtra sous la forme d'un add-on openclaw. L'openclaw original ne dispose pas d'une architecture « add-on », mais en pratique, vous n'avez besoin que de quelques commandes shell simples pour effectuer cette « transformation ». Nous avons également préparé une version améliorée d'openclaw prête à l'emploi avec des configurations prédéfinies pour des scénarios commerciaux réels : [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business). Vous pouvez simplement le cloner et extraire la release wiseflow dans le dossier add-on d'openclaw_for_business.
## ✨ Que gagnez-vous en installant wiseflow (supérieur à l'openclaw original) ?
### 1. Navigateur anti-détection, sans extensions de navigateur
Le patch-001 de wiseflow remplace le Playwright intégré d'openclaw par [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) (un fork non détectable de Playwright), réduisant considérablement le risque que les navigateurs automatisés soient identifiés et bloqués par les sites cibles. Cela permet d'atteindre des capacités d'acquisition et d'opération web équivalentes, voire supérieures à celles d'une configuration relay, en utilisant uniquement un navigateur géré sans installer d'extension Chrome relay.
📥 *Nous avons évalué tous les principaux frameworks d'automatisation de navigateur disponibles, notamment nodriver, browser-use et agent-browser de Vercel. Nous pouvons confirmer que bien qu'ils fonctionnent tous via CDP et fournissent des profils persistants dédiés à openclaw, seul Patchright assure la suppression complète des empreintes CDP. En d'autres termes, même l'approche de connexion CDP la plus directe laisse des signatures détectables. Les autres frameworks sont conçus pour les tests automatisés, non pour l'acquisition de données, tandis que Patchright a été spécifiquement conçu pour l'acquisition. Étant essentiellement un patch de Playwright, il hérite de presque toutes ses API de haut niveau — le rendant nativement compatible avec openclaw sans nécessiter d'extensions ou de MCP supplémentaires.*
### 2. Mécanisme de récupération automatique des onglets
Lorsqu'un onglet cible est fermé ou perdu de manière inattendue lors d'une opération Agent, le système effectue automatiquement une récupération d'onglet basée sur des snapshots, garantissant que les tâches ne soient pas interrompues par une perte d'onglet.
### 3. Smart Search Skill
Remplace le `web_search` intégré d'openclaw par des capacités de recherche plus puissantes. Comparé à l'outil web search intégré d'origine, Smart Search présente trois avantages clés :
- **Entièrement gratuit, sans clé API** : Ne dépend d'aucune API de recherche tierce — coût zéro
- **Recherche en temps réel pour une actualité maximale** : Pilote directement le navigateur vers les pages cibles ou les grandes plateformes de médias sociaux (Weibo, Twitter/X, Facebook, etc.) pour récupérer immédiatement les contenus publiés récemment
- **Sources de recherche personnalisables** : Les utilisateurs peuvent librement spécifier leurs sources de recherche pour une récupération d'informations précise et ciblée
### 4. New Media Editor Crew (Agent IA préconfiguré)
Un agent IA de création de contenu pour les réseaux sociaux chinois prêt à l'emploi, spécialisé dans les principales plateformes chinoises comme Weibo, Xiaohongshu, Zhihu, Bilibili et Douyin.
**Capacités principales :**
- Recherche de sujets + analyse des tendances (Mode A)
- Expansion du brouillon + justification en ligne (Mode B)
- Après finalisation de l'article, appel automatique de [Wenyan](https://github.com/caol64/wenyan) pour le rendre en HTML style compte officiel WeChat, avec 7 thèmes intégrés
- Envoi direct vers la boîte de brouillons du compte officiel WeChat (Mode C, nécessite `WECHAT_APP_ID`/`WECHAT_APP_SECRET`)
- Support de génération d'images/vidéos IA ([SiliconFlow](https://www.siliconflow.com/) génération d'images/vidéos, nécessite `SILICONFLOW_API_KEY`)
## 🌟 Démarrage rapide
> **💡 Note sur les coûts API**
>
> wiseflow 5.x repose sur le workflow Agent d'openclaw, qui nécessite un accès à l'API LLM. Nous recommandons de préparer vos identifiants API à l'avance :
>
> - **Utilisateurs internationaux (recommandé)** : [SiliconFlow](https://www.siliconflow.com/) — des crédits gratuits sont disponibles après inscription, couvrant les coûts initiaux
> - **OpenAI / Anthropic et autres fournisseurs** : Toute API compatible fonctionne
Téléchargez le package intégré (qui inclut openclaw_for_business et l'addon wiseflow) directement depuis les [Releases](https://github.com/TeamWiseFlow/wiseflow/releases) de ce dépôt.
1. Télécharger et extraire l'archive
2. Accéder au dossier extrait
3. Choisir le mode de démarrage :
**Mode débogage** (démarrage unique, pour les tests et le développement) :
```bash
./scripts/dev.sh gateway
```
**Mode production** (installation en tant que service système, pour un fonctionnement à long terme) :
```bash
./scripts/reinstall-daemon.sh
```
> **Configuration requise**
> - **Ubuntu 22.04** est recommandé
> - L'environnement **Windows WSL2** est pris en charge
> - **macOS** est pris en charge
> - L'exécution directe sur **Windows natif** n'est **pas prise en charge**
### [Alternative] Installation manuelle
> Note : Vous devez d'abord télécharger et déployer openclaw_for_business depuis : https://github.com/TeamWiseFlow/openclaw_for_business/releases
Copiez le dossier `wiseflow` de ce dépôt (pas le dépôt lui-même) dans le répertoire `addons/` d'openclaw_for_business :
```bash
# Option 1 : Cloner depuis le dépôt wiseflow
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
Redémarrez openclaw_for_business après l'installation pour que les changements prennent effet.
## Structure des répertoires
```
wiseflow/ # package addon (copier dans le répertoire addons/)
├── addon.json # Métadonnées
├── overrides.sh # pnpm overrides + désactiver web_search intégré
├── patches/
│ ├── 001-browser-tab-recovery.patch # Patch de récupération d'onglets
│ ├── 002-disable-web-search-env-var.patch # Désactiver web_search intégré (env var)
│ ├── 003-act-field-validation.patch # Patch de validation des champs ACT
│ └── 004-web-fetch-allow-rfc2544.patch # Autoriser les adresses RFC 2544 dans web_fetch
├── skills/ # Skills globaux (disponibles pour tous les Agents)
│ ├── browser-guide/SKILL.md # Bonnes pratiques du navigateur (connexion/CAPTCHA/chargement différé, etc.)
│ ├── smart-search/SKILL.md # Constructeur d'URL de recherche multi-plateforme (remplace web_search intégré)
│ └── rss-reader/ # Lecteur de flux RSS/Atom
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # Agents IA préconfigurés (modèles Crew)
└── new-media-editor/ # New Media Editor (création de contenu social media chinois)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # Skills spécifiques au Crew
├── siliconflow-img-gen/ # Génération d'images IA (API SiliconFlow)
├── siliconflow-video-gen/ # Génération de vidéos IA (API SiliconFlow)
└── wenyan-formatter/ # Markdown → HTML WeChat / envoi brouillon
docs/ # Documentation technique (racine du dépôt)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # Scripts utilitaires (racine du dépôt)
└── generate-patch.sh
tests/ # Cas de test et scripts (racine du dépôt)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro est maintenant disponible !
Des capacités de scraping plus puissantes, un support plus complet des réseaux sociaux, avec interface graphique et package d'installation en un clic — aucun déploiement nécessaire !
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **La version Pro est en vente** : https://shouxiqingbaoguan.com/
🌹 Dès aujourd'hui, contribuez à la version open source de wiseflow par des PRs (code, documentation et partage de cas d'utilisation réussis sont les bienvenus) : chaque contributeur dont la PR est acceptée recevra une licence d'un an pour wiseflow Pro !
## 🛡️ Licence
Depuis la version 4.2, nous avons mis à jour notre licence open source. Veuillez consulter : [LICENSE](LICENSE)
Pour une coopération commerciale, veuillez contacter **Email : zm.zhao@foxmail.com**
## 📬 Contact
Pour toute question ou suggestion, n'hésitez pas à laisser un message via les [issues](https://github.com/TeamWiseFlow/wiseflow/issues).
🎉 wiseflow & OFB proposent désormais une **base de connaissances payante**, incluant des tutoriels d'installation pas à pas, des astuces d'application exclusives et un **groupe WeChat VIP** :
N'hésitez pas à ajouter « Keeper » sur WeChat Enterprise pour toute demande :
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 L'open source demande beaucoup d'efforts — merci pour votre soutien !
## 🤝 wiseflow 5.x est construit sur les excellents projets open source suivants :
- Patchright (Version Python indétectable de la bibliothèque de test et d'automatisation Playwright) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser (Analyse de flux en Python) https://github.com/kurtmckee/feedparser
- SearXNG (un métamoteur de recherche internet gratuit qui agrège les résultats de divers services de recherche et bases de données) https://github.com/searxng/searxng
- Wenyan (outil de formatage et de publication Markdown multi-plateforme, utilisé par le New Media Editor Crew via le skill wenyan-formatter) https://github.com/caol64/wenyan
## Citation
Si vous référencez ou citez tout ou partie de ce projet dans votre travail, veuillez inclure les informations suivantes :
```
Author : Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## Partenaires
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
================================================
FILE: README_JP.md
================================================
# Wiseflow
**[中文](README.md) | [English](README_EN.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](README_FR.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **4.x をお探しですか?** オリジナルの v4.30 以前のコードは [`4.x` ブランチ](https://github.com/TeamWiseFlow/wiseflow/tree/4.x)にあります。
```
「我が生には涯(かぎり)有るも、知には涯無し。涯有るを以て涯無きに随(したが)うは、殆(あやう)きのみ!」—— 『荘子・内篇・養生主第三』
```
wiseflow 4.x(およびそれ以前のバージョン)は、一連の精密なワークフローによって特定のシナリオで強力なデータ取得能力を実現しましたが、依然として多くの制限がありました:
- 1. インタラクティブなコンテンツを取得できない(クリックしないと表示されないコンテンツ、特に動的ロードの場合)
- 2. 情報のフィルタリングと抽出のみで、下流タスク処理能力がほぼない
- ……
私たちは機能の改善と範囲の拡大に取り組んできましたが、現実の世界は複雑であり、インターネットも同様です。ルールを網羅することは不可能であるため、固定のワークフローではすべてのシナリオに対応できません。これは wiseflow の問題ではなく、従来のソフトウェアの問題です!
しかし、この一年で Agent 技術が飛躍的に進歩し、大規模言語モデルによって人間のインターネット行動を完全にシミュレートすることが技術的に可能であることが示されました。[openclaw](https://github.com/openclaw/openclaw) の登場は、この確信をさらに強めました。
さらに驚くべきことに、初期の実験と探索を通じて、wiseflow のデータ取得能力を「プラグイン」として openclaw に統合することで、上記の2つの制限を完全に解決できることを発見しました。
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
ただし、openclaw のプラグインシステムは、従来の「プラグイン」(Claude Code のプラグインのようなもの)とは異なるため、「add-on」という概念を新たに導入する必要がありました。正確に言えば、wiseflow 5.x は openclaw の add-on として提供されます。オリジナルの openclaw には「add-on」アーキテクチャがありませんが、実際にはいくつかの簡単なシェルコマンドでこの「改造」を完了できます。また、実際のビジネスシーンに向けたプリセット設定を含む、すぐに使える openclaw 強化版 [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business) も用意しています。クローンして、wiseflow のリリースを openclaw_for_business の add-on フォルダに配置するだけで使用できます。
## ✨ wiseflow をインストールすることで何が得られますか(オリジナル openclaw より優れている点)?
### 1. アンチ検出ブラウザ、ブラウザ拡張機能インストール不要
wiseflow の patch-001 は、openclaw に内蔵された Playwright を [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)(Playwright の検出回避フォーク)に置き換え、自動化ブラウザがターゲットサイトに検出・ブロックされる可能性を大幅に低減します。これにより、Chrome Relay Extension をインストールすることなく、マネージドブラウザだけで Relay と同等、あるいはそれ以上のウェブ取得・操作能力を実現できます。
📥 *私たちは現在市場で人気のあるブラウザ自動化フレームワーク(nodriver、browser-use、Vercel の agent-browser など)を総合的に評価しました。すべてが CDP を通じて動作し、openclaw 専用の永続化プロファイルを提供するという基本原理は同じですが、CDP プローブの完全な除去を提供しているのは Patchright だけです。つまり、最も純粋な CDP 直接接続アプローチを使用しても、特徴的なフィンガープリントは残り、検出される可能性があります。他のフレームワークはデータ取得ではなく自動テストを目的として設計されていますが、Patchright はもともとデータ取得を目的として設計されており、本質的には Playwright のパッチであり、ほぼすべての Playwright の上位 API を継承しています。これにより openclaw とのネイティブな互換性が実現し、追加のプラグインや MCP をインストールする必要がありません。*
### 2. タブ自動復元メカニズム
Agent の操作中にターゲットタブが予期せず閉じられたり失われたりした場合、スナップショットベースのタブ復元を自動的に実行し、タブの消失によるタスクの中断を防ぎます。
### 3. スマート検索 Skill
openclaw に内蔵された `web_search` をより強力な検索機能に置き換えます。オリジナルの内蔵 web search tool と比較して、スマート検索には3つの主要な優位性があります:
- **完全無料、API キー不要**:サードパーティの検索 API に依存せず、ゼロコストで利用可能
- **リアルタイム検索、最高の鮮度**:ブラウザを直接ターゲットページや主要なソーシャルメディアプラットフォーム(Weibo、Twitter/X、Facebook など)に誘導し、最新公開コンテンツを即座に取得
- **検索ソースのカスタマイズ**:ユーザーが自由に検索ソースを指定でき、必要な情報を精確に取得
### 4. 新媒体小編 Crew(プリセット AI エージェント)
すぐに使える中国語ソーシャルメディアコンテンツ制作 AI エージェントで、微博、小紅書、知乎、B ステーション、抖音などの中国の主要プラットフォームに特化しています。
**主な機能:**
- テーマリサーチ + トレンド分析(モード A)
- 下書き拡充 + オンライン根拠追加(モード B)
- 記事確定後、[文颜(Wenyan)](https://github.com/caol64/wenyan) を自動呼び出して WeChat 公式アカウント形式の HTML にレンダリング(7 種類の内蔵テーマ対応)
- WeChat 公式アカウントの下書きに直接プッシュ(モード C、`WECHAT_APP_ID`/`WECHAT_APP_SECRET` の設定が必要)
- AI 画像/動画生成サポート([SiliconFlow](https://www.siliconflow.com/) 画像/動画生成、`SILICONFLOW_API_KEY` の設定が必要)
## 🌟 クイックスタート
> **💡 API コストのご説明**
>
> wiseflow 5.x は openclaw の Agent ワークフローをベースにしており、LLM API アクセスが必要です。事前に API 資格情報を準備することをお勧めします:
>
> - **海外ユーザー(推奨)**:[SiliconFlow](https://www.siliconflow.com/) — 登録後に無料クレジットが付与され、初期使用コストをカバーできます
> - **OpenAI / Anthropic その他のプロバイダー**:互換性のある任意の API が使用可能です
本リポジトリの [Releases](https://github.com/TeamWiseFlow/wiseflow/releases) から openclaw_for_business と wiseflow addon を含む統合パッケージをダウンロードしてください。
1. アーカイブをダウンロードして解凍する
2. 解凍されたフォルダに移動する
3. 起動方式を選択する:
**デバッグモード**(単回起動、テスト・開発向け):
```bash
./scripts/dev.sh gateway
```
**本番モード**(システムサービスとしてインストール、長期運用向け):
```bash
./scripts/reinstall-daemon.sh
```
> **システム要件**
> - **Ubuntu 22.04** を推奨
> - **Windows WSL2** 環境をサポート
> - **macOS** をサポート
> - **Windows ネイティブ**環境での直接実行は**非対応**
### 【代替】手動インストール
> 注意:先に openclaw_for_business をダウンロード・デプロイする必要があります。ダウンロード先:https://github.com/TeamWiseFlow/openclaw_for_business/releases
本リポジトリ内の `wiseflow` フォルダ(リポジトリ全体ではありません)を openclaw_for_business の `addons/` ディレクトリにコピーしてください:
```bash
# 方法1:wiseflow リポジトリからクローン
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
インストール後、openclaw_for_business を再起動すると有効になります。
## ディレクトリ構造
```
wiseflow/ # addon パッケージ(addons/ ディレクトリに配置)
├── addon.json # メタデータ
├── overrides.sh # pnpm overrides + 内蔵 web_search を無効化
├── patches/
│ ├── 001-browser-tab-recovery.patch # タブ復元パッチ
│ ├── 002-disable-web-search-env-var.patch # 内蔵 web_search の無効化(env var)
│ ├── 003-act-field-validation.patch # ACT フィールド検証パッチ
│ └── 004-web-fetch-allow-rfc2544.patch # web_fetch で RFC 2544 アドレスを許可
├── skills/ # グローバルスキル(全エージェント利用可能)
│ ├── browser-guide/SKILL.md # ブラウザのベストプラクティス(ログイン/CAPTCHA/遅延ロードなど)
│ ├── smart-search/SKILL.md # マルチプラットフォーム検索URL構築(内蔵 web_search の代替)
│ └── rss-reader/ # RSS/Atom フィードリーダー
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # プリセット AI エージェント(Crew テンプレート)
└── new-media-editor/ # 新媒体小編(中国語ソーシャルメディアコンテンツ制作)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # Crew 専属スキル
├── siliconflow-img-gen/ # AI 画像生成(SiliconFlow API)
├── siliconflow-video-gen/ # AI 動画生成(SiliconFlow API)
└── wenyan-formatter/ # Markdown → WeChat HTML / 下書きプッシュ
docs/ # 技術ドキュメント(リポジトリルート)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # ユーティリティスクリプト(リポジトリルート)
└── generate-patch.sh
tests/ # テストケースとスクリプト(リポジトリルート)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro 版がリリースされました!
より強力なスクレイピング能力、より包括的なソーシャルメディアサポート、UI インターフェースとワンクリックインストールパッケージ付き — デプロイ不要!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **Pro 版が発売中**:https://shouxiqingbaoguan.com/
🌹 本日より、wiseflow オープンソース版への PR 貢献(コード、ドキュメント、成功事例の共有すべて歓迎)が採用された場合、コントリビューターには wiseflow Pro 版の1年間ライセンスが贈呈されます!
## 🛡️ ライセンス
バージョン 4.2 以降、オープンソースライセンスを更新しました。詳細はこちら:[LICENSE](LICENSE)
商用提携については **Email:zm.zhao@foxmail.com** までご連絡ください。
## 📬 お問い合わせ
ご質問やご提案がございましたら、[issue](https://github.com/TeamWiseFlow/wiseflow/issues) からお気軽にメッセージをお寄せください。
🎉 wiseflow & OFB では現在**有料ナレッジベース**を提供しています。内容には、ゼロからの手順インストールチュートリアル、各種独自の活用ノウハウ、および **VIP WeChat グループ**が含まれます:
ご相談は「掌柜的」企業 WeChat をご追加ください:
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 オープンソース維持のご支援に感謝します!
## 🤝 wiseflow 5.x は以下の優秀なオープンソースプロジェクトを基盤としています:
- Patchright(Playwright テスト・自動化ライブラリの検出回避 Python 版)https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser(Python でフィードを解析)https://github.com/kurtmckee/feedparser
- SearXNG(様々な検索サービスやデータベースから結果を集約する無料のインターネットメタ検索エンジン)https://github.com/searxng/searxng
- 文颜(Wenyan)(多プラットフォーム Markdown フォーマットと投稿ツール、新媒体小編 Crew が wenyan-formatter スキル経由で使用)https://github.com/caol64/wenyan
## Citation
本プロジェクトの一部または全部を参照・引用する場合は、以下の情報を明記してください:
```
Author:Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## パートナー
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
================================================
FILE: README_KR.md
================================================
# Wiseflow
**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [Deutsch](README_DE.md) | [Français](README_FR.md) | [العربية](README_AR.md)**
🚀 **STEP INTO 5.x**
> 📌 **4.x를 찾고 계신가요?** 원래 v4.30 이전 버전의 코드는 [`4.x` 브랜치](https://github.com/TeamWiseFlow/wiseflow/tree/4.x)에서 확인할 수 있습니다.
```
"내 삶에는 한계가 있지만, 지식에는 한계가 없다. 유한한 것으로 무한한 것을 쫓으니, 위태로울 뿐이다!" —— 『장자·내편·양생주제삼』
```
wiseflow 4.x(이전 버전 포함)는 일련의 정밀한 워크플로우를 통해 특정 시나리오에서 강력한 데이터 수집 능력을 구현했지만, 여전히 많은 한계가 존재했습니다:
- 1. 인터랙티브 콘텐츠를 수집할 수 없음 (클릭해야만 나타나는 콘텐츠, 특히 동적 로딩의 경우)
- 2. 정보 필터링과 추출만 가능하며, 다운스트림 작업 처리 능력이 거의 없음
- ……
기능 개선과 범위 확장에 꾸준히 노력해 왔지만, 현실 세계는 복잡하고 인터넷도 마찬가지입니다. 규칙을 완전히 망라하는 것은 불가능하므로, 고정된 워크플로우로는 모든 시나리오에 대응할 수 없습니다. 이것은 wiseflow의 문제가 아니라 전통적인 소프트웨어의 문제입니다!
그러나 지난 1년간 Agent 기술의 비약적인 발전은 대규모 언어 모델로 인간의 인터넷 행동을 완전히 시뮬레이션하는 것이 기술적으로 가능하다는 것을 보여주었습니다. [openclaw](https://github.com/openclaw/openclaw)의 등장은 이러한 확신을 더욱 굳건히 했습니다.
더욱 놀라운 것은, 초기 실험과 탐색을 통해 wiseflow의 데이터 수집 능력을 "플러그인" 형태로 openclaw에 통합하면 위에서 언급한 두 가지 한계를 완벽하게 해결할 수 있다는 것을 발견했습니다.
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
다만, openclaw의 플러그인 시스템은 우리가 전통적으로 이해하는 "플러그인"(Claude Code의 플러그인과 유사한 것)과는 다르기 때문에, "add-on"이라는 개념을 별도로 도입해야 했습니다. 정확히 말하면, wiseflow 5.x는 openclaw add-on 형태로 제공됩니다. 원래 openclaw에는 "add-on" 아키텍처가 없지만, 실제로는 몇 가지 간단한 셸 명령어만으로 이 "개조"를 완료할 수 있습니다. 또한 실제 비즈니스 시나리오를 위한 프리셋 설정이 포함된 즉시 사용 가능한 openclaw 강화 버전인 [openclaw_for_business](https://github.com/TeamWiseFlow/openclaw_for_business)도 준비했습니다. 클론한 후 wiseflow 릴리스를 openclaw_for_business의 add-on 폴더에 배치하면 됩니다.
## ✨ wiseflow를 설치하면 무엇을 얻을 수 있나요(원본 openclaw보다 우수한 점)?
### 1. 탐지 방지 브라우저, 브라우저 확장 프로그램 설치 불필요
wiseflow의 patch-001은 openclaw 내장 Playwright를 [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)(Playwright의 탐지 방지 포크)로 교체하여, 자동화 브라우저가 대상 웹사이트에 감지·차단될 가능성을 크게 줄입니다. 이를 통해 Chrome Relay Extension 설치 없이, 관리형 브라우저만으로도 Relay와 동등하거나 더 뛰어난 웹 수집 및 조작 능력을 달성할 수 있습니다.
📥 *저희는 nodriver, browser-use, Vercel의 agent-browser 등 현재 시장에서 인기 있는 모든 브라우저 자동화 프레임워크를 종합적으로 평가했습니다. 모두 CDP를 통해 동작하고 openclaw 전용 지속적 프로필을 제공한다는 기본 원리는 같지만, CDP 프로브를 완전히 제거하는 것은 Patchright뿐입니다. 즉, 가장 순수한 CDP 직접 연결 방식을 사용하더라도 여전히 검출 가능한 특징이 남아 있습니다. 다른 프레임워크는 데이터 수집이 아닌 자동화 테스트를 목적으로 설계되었지만, Patchright는 처음부터 데이터 수집을 목적으로 설계되었습니다. 본질적으로 Playwright의 패치이기 때문에 거의 모든 Playwright 상위 API를 그대로 계승하며, 이로 인해 openclaw와 기본적으로 호환되어 추가 플러그인이나 MCP를 설치할 필요가 없습니다.*
### 2. 자동 탭 복구 메커니즘
Agent 작업 중 대상 탭이 예기치 않게 닫히거나 사라질 경우, 스냅샷 기반 탭 복구를 자동으로 수행하여 탭 소실로 인한 작업 중단을 방지합니다.
### 3. 스마트 검색 Skill
openclaw 내장 `web_search`를 더욱 강력한 검색 기능으로 대체합니다. 원버전 내장 web search tool 대비 스마트 검색의 세 가지 핵심 강점:
- **완전 무료, API 키 불필요**: 서드파티 검색 API에 의존하지 않아 비용 제로
- **실시간 검색, 최고의 시의성**: 브라우저를 직접 대상 페이지나 주요 소셜 미디어 플랫폼(Weibo, Twitter/X, Facebook 등)으로 이동하여 최신 게시물을 즉시 검색
- **검색 출처 사용자 정의 가능**: 사용자가 검색 출처를 자유롭게 지정하여 필요한 정보를 정확하게 취득
### 4. 새 미디어 편집자 Crew(사전 설정 AI 에이전트)
즉시 사용 가능한 중국어 소셜 미디어 콘텐츠 제작 AI 에이전트로, 웨이보, 샤오홍슈, 즈후, 빌리빌리, 더우인 등 중국 주요 플랫폼에 특화되어 있습니다.
**주요 기능:**
- 주제 리서치 + 트렌드 분석(Mode A)
- 초안 확장 + 온라인 근거 추가(Mode B)
- 기사 완성 후 [文颜(Wenyan)](https://github.com/caol64/wenyan)을 자동으로 호출하여 위챗 공식 계정 스타일 HTML로 렌더링(내장 테마 7개 지원)
- 위챗 공식 계정 임시 보관함에 직접 발행(Mode C, `WECHAT_APP_ID`/`WECHAT_APP_SECRET` 설정 필요)
- AI 이미지/영상 생성 지원([SiliconFlow](https://www.siliconflow.com/) 이미지/영상 생성, `SILICONFLOW_API_KEY` 설정 필요)
## 🌟 빠른 시작
> **💡 API 비용 안내**
>
> wiseflow 5.x는 openclaw의 Agent 워크플로우를 기반으로 하며, LLM API 접근이 필요합니다. 사전에 API 자격 증명을 준비하시기 바랍니다:
>
> - **해외 사용자(권장)**:[SiliconFlow](https://www.siliconflow.com/) — 등록 후 무료 크레딧 지급, 초기 사용 비용 충당 가능
> - **OpenAI / Anthropic 및 기타 제공업체**:호환 가능한 모든 API 사용 가능
본 저장소의 [Releases](https://github.com/TeamWiseFlow/wiseflow/releases)에서 openclaw_for_business와 wiseflow addon이 포함된 통합 패키지를 다운로드하세요.
1. 압축 파일을 다운로드하고 압축을 해제합니다
2. 압축 해제된 폴더로 이동합니다
3. 시작 방식을 선택합니다:
**디버그 모드**(단회 실행, 테스트 및 개발용):
```bash
./scripts/dev.sh gateway
```
**프로덕션 모드**(시스템 서비스로 설치, 장기 운영용):
```bash
./scripts/reinstall-daemon.sh
```
> **시스템 요구사항**
> - **Ubuntu 22.04** 권장
> - **Windows WSL2** 환경 지원
> - **macOS** 지원
> - **Windows 네이티브** 환경에서의 직접 실행은 **지원하지 않음**
### 【대안】수동 설치
> 주의: 먼저 openclaw_for_business를 다운로드하여 배포해야 합니다. 다운로드 주소: https://github.com/TeamWiseFlow/openclaw_for_business/releases
저장소 내의 `wiseflow` 폴더(저장소 전체가 아님)를 openclaw_for_business의 `addons/` 디렉토리에 복사하세요:
```bash
# 방법 1: wiseflow 저장소에서 클론
git clone https://github.com/TeamWiseFlow/wiseflow.git /tmp/wiseflow
cp -r /tmp/wiseflow/wiseflow <openclaw_for_business>/addons/wiseflow
```
설치 후 openclaw_for_business를 재시작하면 적용됩니다.
## 디렉토리 구조
```
wiseflow/ # addon 패키지(addons/ 디렉토리에 복사)
├── addon.json # 메타데이터
├── overrides.sh # pnpm overrides + 내장 web_search 비활성화
├── patches/
│ ├── 001-browser-tab-recovery.patch # 탭 복구 패치
│ ├── 002-disable-web-search-env-var.patch # 내장 web_search 비활성화 (env var)
│ ├── 003-act-field-validation.patch # ACT 필드 유효성 검사 패치
│ └── 004-web-fetch-allow-rfc2544.patch # web_fetch에서 RFC 2544 주소 허용
├── skills/ # 글로벌 스킬(모든 에이전트 사용 가능)
│ ├── browser-guide/SKILL.md # 브라우저 모범 사례 (로그인/캡차/지연 로딩 등)
│ ├── smart-search/SKILL.md # 다중 플랫폼 검색 URL 빌더 (내장 web_search 대체)
│ └── rss-reader/ # RSS/Atom 피드 리더
│ ├── SKILL.md
│ ├── package.json
│ └── scripts/fetch-rss.mjs
└── crew/ # 사전 설정 AI 에이전트(Crew 템플릿)
└── new-media-editor/ # 새 미디어 편집자(중국어 소셜 미디어 콘텐츠 제작)
├── IDENTITY.md / SOUL.md / AGENTS.md / TOOLS.md / ...
└── skills/ # Crew 전용 스킬
├── siliconflow-img-gen/ # AI 이미지 생성(SiliconFlow API)
├── siliconflow-video-gen/ # AI 영상 생성(SiliconFlow API)
└── wenyan-formatter/ # Markdown → 위챗 HTML / 임시 보관함 발행
docs/ # 기술 문서(저장소 루트)
├── anti-detection-research.md
└── more_powerful_search_skill/
scripts/ # 유틸리티 스크립트(저장소 루트)
└── generate-patch.sh
tests/ # 테스트 케이스 및 스크립트(저장소 루트)
├── README.md
└── run-managed-tests.mjs
```
## WiseFlow Pro 버전 출시!
더 강력한 스크래핑 능력, 더 포괄적인 소셜 미디어 지원, UI 인터페이스 및 원클릭 설치 패키지 — 배포 불필요!
https://github.com/user-attachments/assets/57f8569c-e20a-4564-a669-1200d56c5725
🔥 **Pro 버전 판매 중**: https://shouxiqingbaoguan.com/
🌹 오늘부터 wiseflow 오픈소스 버전에 PR 기여(코드, 문서, 성공 사례 공유 모두 환영)가 채택되면, 기여자에게 wiseflow Pro 버전 1년 사용권이 증정됩니다!
## 🛡️ 라이선스
버전 4.2부터 오픈소스 라이선스를 업데이트했습니다. 자세한 내용은: [LICENSE](LICENSE)
상업적 협력 문의: **Email: zm.zhao@foxmail.com**
## 📬 연락처
질문이나 제안이 있으시면 [issue](https://github.com/TeamWiseFlow/wiseflow/issues)를 통해 메시지를 남겨주세요.
🎉 wiseflow & OFB에서 현재 **유료 지식 베이스**를 제공하고 있습니다. 내용에는 단계별 설치 튜토리얼, 각종 독점 활용 팁, **VIP 위챗 그룹**이 포함됩니다:
"掌柜的" 기업 위챗을 추가하여 문의하세요:
<img width="360" height="360" alt="wiseflow掌柜" src="https://github.com/user-attachments/assets/b013b3fd-546e-4176-b418-57bee419e761" />
🌹 오픈소스 유지에 응원해 주셔서 감사합니다!
## 🤝 wiseflow 5.x는 다음의 우수한 오픈소스 프로젝트를 기반으로 합니다:
- Patchright (Playwright 테스트 및 자동화 라이브러리의 탐지 우회 Python 버전) https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
- Feedparser (Python으로 피드 파싱) https://github.com/kurtmckee/feedparser
- SearXNG (다양한 검색 서비스와 데이터베이스에서 결과를 집계하는 무료 인터넷 메타 검색 엔진) https://github.com/searxng/searxng
- Wenyan (다중 플랫폼 Markdown 서식 및 게시 도구, 새 미디어 편집자 Crew가 wenyan-formatter 스킬을 통해 사용) https://github.com/caol64/wenyan
## Citation
본 프로젝트의 일부 또는 전체를 참조하거나 인용하는 경우, 다음 정보를 명시해 주세요:
```
Author: Wiseflow Team
https://github.com/TeamWiseFlow/wiseflow
```
## 파트너
[<img src="https://github.com/TeamWiseFlow/wiseflow/raw/4.x/docs/logos/SiliconFlow.png" alt="siliconflow" width="360">](https://siliconflow.com/)
================================================
FILE: docs/anti-detection-research.md
================================================
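# Browser Automation Anti-Detection: Research Report
> Research date: 2026-02-20
> Goal: evaluate two approaches, rebrowser-patches and Patchright, and choose a technical path for adding anti-detection capabilities to OpenClaw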
---
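## 1. Background
### 1.1 Why a local AI assistant also needs anti-detection
As a local AI assistant, OpenClaw carries out user instructions (price comparison, form filling, reading information, searching, etc.) through a browser. Target websites do not distinguish a "local assistant" from a "malicious crawler": as soon as they detect automation signatures, their defenses kick in:
- Price comparison on e-commerce sites → CAPTCHA or IP ban
- Form submission → rejected
- Banking/email → risk control demands re-verification
- Google Search → CAPTCHA
- Admin dashboards → blocked by a WAF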
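### 1.2 Main detection leaks in Playwright
| Leak | How it is detected | Severity |
|------|--------------------|----------|
| `Runtime.enable` CDP call | Enabling the Runtime domain changes the browser's internal behavior, which anti-bot scripts can detect through side channels | **Critical** |
| `Console.enable` CDP call | A side channel similar to Runtime.enable | High |
| `--enable-automation` launch flag | Sets `navigator.webdriver = true` | High |
| `//# sourceURL=pptr:evaluate` | Injected scripts carry a telltale sourceURL comment | Medium |
| `__playwright_utility_world__` | The utility world name can be detected | Medium |
| Init scripts injected via CDP | `Page.addScriptToEvaluateOnNewDocument` can be detected | Medium |
| Playwright's bundled Chromium | The custom build's fingerprint differs from stock Chrome | Medium (does not apply with connectOverCDP) |
| Many automation launch flags | The flag combination is itself a fingerprint | Low to medium |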
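Most rows above are CDP-level behaviors that can only be observed inside the browser, but a few are static strings. The sketch below shows a hypothetical `audit_session` helper that is not part of Playwright or OpenClaw. It checks a launch-argument list, injected script text, and isolated-world names against the static markers from the table. It cannot detect the `Runtime.enable`/`Console.enable` side channels.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Finding:
    leak: str
    severity: str
    detail: str


def audit_session(
    launch_args: Iterable[str],
    injected_scripts: Iterable[str] = (),
    world_names: Iterable[str] = (),
) -> List[Finding]:
    """Check the static leak markers from the table; CDP side channels are out of scope."""
    findings: List[Finding] = []
    for arg in launch_args:
        # Flags may carry a value ("--flag=value"); compare only the flag name.
        if arg.split("=", 1)[0] == "--enable-automation":
            findings.append(Finding(arg, "High", "sets navigator.webdriver = true"))
    for script in injected_scripts:
        if "//# sourceURL=pptr:evaluate" in script:
            findings.append(Finding("sourceURL comment", "Medium", "telltale sourceURL in injected script"))
    for name in world_names:
        if name == "__playwright_utility_world__":
            findings.append(Finding("utility world name", "Medium", "detectable isolated-world name"))
    return findings
```

For example, `audit_session(["--enable-automation"])` returns one High finding. A clean argument list returns an empty list.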
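### 1.3 Current state of OpenClaw
- Uses `playwright-core@1.58.2`
- Already connects to the browser via `chromium.connectOverCDP()` (no launch)
- In Extension relay mode it connects to the user's real Chrome, with no launch flags at all
- However, **CDP leaks at the Playwright driver layer are not addressed** (Runtime.enable, Console.enable, etc.)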
---
## 2. Option A: rebrowser-patches
### 2.1 Project overview
- GitHub: `rebrowser/rebrowser-patches` (1.2k stars)
- Ecosystem: rebrowser-playwright-core (drop-in replacement package), rebrowser-puppeteer, etc.
- Approach: patches the existing playwright-core source (with the Unix `patch` command)
- Supports: Puppeteer and Playwright
### 2.2 What it patches
#### Runtime.enable fix: 3 modes
**Mode 1: `addBinding` (default, recommended)**
```
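How it works:
1. Generate a binding with a random name (e.g. "x7k2m9q")
2. Register it via Runtime.addBinding (no Runtime.enable needed)
3. Dispatch a custom event in an isolated world to trigger the binding
4. Read the real executionContextId from the Runtime.bindingCalled callback
5. Use that contextId for subsequent Runtime.evaluate calls
Advantages:
- Never calls Runtime.enable
- Keeps full access to the main world (can read page variables)
- Supports web workers and iframes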
```
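This sketch makes steps 1, 2, 4 and 5 concrete by modeling the CDP messages involved as plain Python dicts rather than using a real CDP client. `AddBindingContextResolver` and `random_binding_name` are hypothetical names, not rebrowser's code. The transport and the isolated-world trigger from step 3 are deliberately left out, because the text above does not spell out their exact mechanics. The method and field names (`Runtime.addBinding`, `Runtime.bindingCalled`, `executionContextId`, and `Runtime.evaluate` with `contextId`) come from the Chrome DevTools Protocol.

```python
import secrets
import string
from typing import Any, Dict, List, Optional


def random_binding_name(length: int = 7) -> str:
    """Random, identifier-safe name such as "x7k2m9q" (first char is a letter)."""
    tail = string.ascii_lowercase + string.digits
    return secrets.choice(string.ascii_lowercase) + "".join(
        secrets.choice(tail) for _ in range(length - 1)
    )


class AddBindingContextResolver:
    """Builds CDP messages for steps 1-2, 4 and 5; transport and the step 3 trigger are omitted."""

    def __init__(self) -> None:
        self.binding = random_binding_name()
        self.context_id: Optional[int] = None
        self.sent: List[Dict[str, Any]] = []  # every command we would send over the CDP socket
        self._next_id = 0

    def _command(self, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        self._next_id += 1
        msg = {"id": self._next_id, "method": method, "params": params}
        self.sent.append(msg)
        return msg

    def register(self) -> Dict[str, Any]:
        # Steps 1-2: expose the randomly named binding without ever sending Runtime.enable.
        return self._command("Runtime.addBinding", {"name": self.binding})

    def on_event(self, event: Dict[str, Any]) -> None:
        # Step 4: Runtime.bindingCalled reports the context that invoked our binding.
        params = event.get("params", {})
        if event.get("method") == "Runtime.bindingCalled" and params.get("name") == self.binding:
            self.context_id = params["executionContextId"]

    def evaluate(self, expression: str) -> Dict[str, Any]:
        # Step 5: target the resolved context explicitly via contextId.
        if self.context_id is None:
            raise RuntimeError("execution context not resolved yet")
        return self._command(
            "Runtime.evaluate", {"expression": expression, "contextId": self.context_id}
        )
```

The key property is visible in `sent`: the context ID is learned from the binding callback, so no `Runtime.enable` command ever needs to be issued.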
**Mode 2: `alwaysIsolated`**
```
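How it works: all scripts run in an isolated context created by Page.createIsolatedWorld
Advantages: fully isolated; prevents detection via MutationObserver
Disadvantages: no access to main-world variables; no web worker support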
```
**Mode 3: `enableDisable`**
```
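How it works: quickly enable → capture the context ID → disable immediately
Advantages: full main-world access
Disadvantages: leaves a brief time window during which it can be detected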
```
#### Other fixes
| Patch | Effect | Configuration |
|-------|--------|---------------|
| sourceURL disguise | `pptr:evaluate` → `app.js` (customizable) | `REBROWSER_PATCHES_SOURCE_URL=jquery.min.js` |
| Utility world name | `__puppeteer_utility_world__` → `util` (customizable) | `REBROWSER_PATCHES_UTILITY_WORLD_NAME=util` |
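Rebrowser disguises the sourceURL name, while Patchright removes the comment outright (see Option B). The difference can be shown on plain script text. Both functions below are hypothetical illustrations that operate on strings, not either project's implementation.

```python
import re

# A trailing "//# sourceURL=..." comment names the script in DevTools and in stack traces.
SOURCE_URL = re.compile(r"//# sourceURL=[^\n]*")


def disguise_source_url(script: str, new_url: str = "app.js") -> str:
    """rebrowser-style: keep the comment but replace the telltale name."""
    # A lambda replacement avoids backslash-escape surprises in new_url.
    return SOURCE_URL.sub(lambda _m: f"//# sourceURL={new_url}", script)


def strip_source_url(script: str) -> str:
    """Patchright-style: delete the comment (and the newline before it) entirely."""
    return re.sub(r"\n?//# sourceURL=[^\n]*", "", script)
```

With disguising, a page still sees a sourceURL, just an innocuous-looking one. Stripping removes the marker entirely.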
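### 2.3 What it does not fix
- Console.enable: not handled (but not disabled either, so full Console functionality is kept)
- Init script injection: still goes through CDP
- CSP: not handled
- Closed Shadow Root: not handled
- Launch flags: unchanged (you must configure them yourself)
- Fingerprint spoofing: out of scope
### 2.4 Configuration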
```bash
# Environment variables (can be switched at runtime)
# Runtime.enable fix mode, one of: addBinding (default) | alwaysIsolated | enableDisable | 0 (disabled)
export REBROWSER_PATCHES_RUNTIME_FIX_MODE=addBinding
export REBROWSER_PATCHES_SOURCE_URL=app.js
export REBROWSER_PATCHES_UTILITY_WORLD_NAME=util
export REBROWSER_PATCHES_DEBUG=1  # debug output
```
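The sketch below summarizes how these variables combine. `load_settings` is a hypothetical reimplementation for illustration, not rebrowser's own code. It assumes the defaults implied above (`addBinding`, `app.js`, `util`). Raising an error on an unknown mode is our own choice and may not match rebrowser's behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Documented values of REBROWSER_PATCHES_RUNTIME_FIX_MODE; "0" disables the fix.
_MODES = {"addBinding", "alwaysIsolated", "enableDisable"}


@dataclass(frozen=True)
class RebrowserSettings:
    runtime_fix_mode: Optional[str]  # None means the Runtime.enable fix is off
    source_url: str
    utility_world_name: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> RebrowserSettings:
    mode = env.get("REBROWSER_PATCHES_RUNTIME_FIX_MODE", "addBinding")
    if mode == "0":
        resolved: Optional[str] = None
    elif mode in _MODES:
        resolved = mode
    else:
        # Assumption: fail loudly on typos instead of silently falling back.
        raise ValueError(f"unknown REBROWSER_PATCHES_RUNTIME_FIX_MODE: {mode!r}")
    return RebrowserSettings(
        runtime_fix_mode=resolved,
        source_url=env.get("REBROWSER_PATCHES_SOURCE_URL", "app.js"),
        utility_world_name=env.get("REBROWSER_PATCHES_UTILITY_WORLD_NAME", "util"),
        debug=env.get("REBROWSER_PATCHES_DEBUG") == "1",
    )
```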
### 2.5 Integration
```bash
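# Option 1: patch in place (must be re-run after every npm install)
npx rebrowser-patches@latest patch --packageName playwright-core
# Option 2: swap the package (recommended, one-time change)
# package.json, using an npm alias so import paths stay unchanged:
# "playwright-core": "1.58.2" → "playwright-core": "npm:rebrowser-playwright-core@1.58.2"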
```
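Option 2 can be scripted. The following is a minimal sketch that relies on npm's alias syntax (`npm:<package>@<version>`). This syntax keeps the dependency key, so every `import ... from "playwright-core"` stays unchanged. `alias_dependency` is a hypothetical helper. The version string comes from the text above and has not been verified against the registry.

```python
import json


def alias_dependency(package_json: str, name: str, replacement: str, version: str) -> str:
    """Point `name` at `replacement` via an npm alias, leaving the key (and imports) intact."""
    data = json.loads(package_json)
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        deps = data.get(section)
        if isinstance(deps, dict) and name in deps:
            deps[name] = f"npm:{replacement}@{version}"
    return json.dumps(data, indent=2) + "\n"
```

After rewriting `package.json`, re-run the package manager's install step so the lockfile picks up the alias.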
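In testing, Option A had compatibility problems with openclaw:
- openclaw uses Playwright 1.58.2, but the latest rebrowser-patches have only been fully tested against Playwright 1.52.0.
- openclaw relies heavily on Playwright private APIs such as `_snapshotForAI()`, which stopped working once rebrowser-patches was applied.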
---
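## 3. Option B: Patchright
### 3.1 Project overview
- GitHub: `Kaliiiiiiiiii-Vinyzu/patchright` + `patchright-python` (1.1k stars)
- Approach: forks the Playwright source, rewrites 22 core modules with ts-morph AST transforms, and compiles the result into a standalone package
- Supports: Playwright only
- Automation: checks for new Playwright releases every hour, then patches and publishes automatically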
### 3.2 What is patched
| Patch | Details |
|------|------|
| **Runtime.enable removed** | Calls deleted outright from crPage, crDevTools and crServiceWorker |
| **Console.enable disabled** | The Console domain is removed entirely |
| **Launch-argument cleanup** | Removes 6 flags including `--enable-automation`, and adds `--disable-blink-features=AutomationControlled` |
| **init script injection** | Switched to HTTP route interception → a `<script>` tag is injected into the HTML `<head>` |
| **CSP bypass** | Rewrites Content-Security-Policy automatically, adding nonce/unsafe-inline |
| **sourceURL removal** | Deletes every `//# sourceURL` comment |
| **Service Worker** | Silently blocks registration (removing the console.warn that gave it away) |
| **Closed Shadow Root** | Can pierce Shadow DOM with mode:'closed' |
| **evaluate() rework** | New `isolated_context` parameter (default true) |
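The init-script and CSP rows above boil down to two text transformations applied to each intercepted HTML response. The Python sketch below is hypothetical and is not Patchright's code (its real logic lives in `crNetworkManagerPatch.js`); it only illustrates the idea: add a nonce to the policy that governs scripts, then place a script carrying that nonce right after the opening `<head>` tag.

```python
import re


def add_nonce_to_csp(csp: str, nonce: str) -> str:
    """Allow one nonce-tagged inline script under an existing Content-Security-Policy."""
    directives = [d.strip() for d in csp.split(";") if d.strip()]
    names = [d.split()[0].lower() for d in directives]
    # script-src governs scripts; default-src is the fallback when script-src is absent.
    if "script-src" in names:
        target = "script-src"
    elif "default-src" in names:
        target = "default-src"
    else:
        return csp  # nothing restricts scripts, so no change is needed
    patched = [f"{d} 'nonce-{nonce}'" if name == target else d
               for d, name in zip(directives, names)]
    return "; ".join(patched)


def inject_init_script(html: str, script: str, nonce: str) -> str:
    """Insert a nonce-bearing <script> immediately after the opening <head> tag."""
    tag = f'<script nonce="{nonce}">{script}</script>'
    match = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return tag + html  # no <head> found: prepend as a fallback
    return html[: match.end()] + tag + html[match.end():]
```

One design caveat: under CSP Level 2 and later, a nonce in a directive makes browsers ignore `'unsafe-inline'` in that same directive, so a production version would need to handle pages that rely on `'unsafe-inline'`.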
### 3.3 Costs
| Capability lost | Impact |
|-----------|------|
| **Console API fully disabled** | `page.on("console")` never fires |
| **page.pause() debugging** | Unknown whether it is affected |
| **Console log collection** | Needs an alternative (JS injection) |
### 3.4 Integration
```bash
# Import paths must change
# package.json:
# "playwright-core": "1.58.2" → "patchright-core": "1.57.0"
# All source files:
# import { chromium } from "playwright-core" → import { chromium } from "patchright-core"
```
Landed in this repository (2026-02-22):
- `openclaw/package.json` switched to `patchright-core@1.57.0` (the latest version available on npm at the time)
- Every `playwright-core` import in `openclaw/src/browser/*` switched to `patchright-core`
- The rebrowser auto-patching step was removed from `scripts/dev.sh` / `scripts/apply-patches.sh`
- The upstream changes were captured as a project patch: `patches/001-switch-playwright-to-patchright-core.patch`
### 3.5 Known issues (from GitHub Issues)
- `#94` Newer versions actually get detected (dist-info exposed)
- `#100` Cloudflare 403 errors
- `#101` Triggers Google anti-bot checks
- `#170` Sannysoft detects patchright
---
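## 4. Comparison
### 4.1 Core differences
| Dimension | rebrowser-patches | Patchright |
|------|-------------------|-----------|
| **Patching approach** | Runtime patching / drop-in package | Compile-time fork and rewrite |
| **Invasiveness** | Low: can be rolled back in one step | High: every import must change |
| **Runtime.enable** | 3 selectable modes, addBinding by default | Single approach: isolated context |
| **Console API** | **Kept** | **Disabled** |
| **main world access** | ✅ fully kept in addBinding mode | ⚠️ isolated context has limitations |
| **init script injection** | Unchanged (still via CDP) | ✅ switched to HTML injection |
| **CSP bypass** | Not handled | ✅ handled automatically |
| **Closed Shadow Root** | Not handled | ✅ can be pierced |
| **Launch-argument cleanup** | Not handled (configure it yourself) | ✅ cleaned up automatically |
| **Configurability** | Switched at runtime via environment variables | Fixed at compile time |
| **`_snapshotForAI` compatibility** | Probably compatible (small change surface) | Higher risk (broad rewrite) |
| **OpenClaw console collection** | ✅ unaffected | ❌ needs rework |
| **GitHub dependents** | Ecosystem of drop-in replacement packages | 0 known dependent projects |
| **Anti-detection pass rate** | No complete test results published | Claims to pass Cloudflare/Kasada/Datadome and others (though issues report failures) |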
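### 4.2 Which option fits which scenario
| Scenario | Recommended option | Reason |
|------|---------|------|
| **OpenClaw integration (first choice)** | rebrowser-patches | Keeps the Console API; small change, low risk, reversible |
| **Most aggressive anti-detection needed** | Patchright | HTML init-script injection + CSP bypass + closed shadow root |
| **Quick feasibility check** | rebrowser-patches | One command applies the patch; no code changes |
| **Long-term maintenance** | Either | rebrowser has drop-in packages; patchright tracks upstream automatically |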
---
## 5. OpenClaw integration plan
### 5.1 Phase 1: Minimal-change validation (rebrowser-patches)
**Goal**: verify basic compatibility without any code changes
```bash
cd openclaw
# Apply the patch
npx rebrowser-patches@latest patch --packageName playwright-core
# Set environment variables
export REBROWSER_PATCHES_RUNTIME_FIX_MODE=addBinding
export REBROWSER_PATCHES_SOURCE_URL=app.js
export REBROWSER_PATCHES_UTILITY_WORLD_NAME=util
# Start OpenClaw and test
```
**Checklist**:
- [ ] OpenClaw starts normally
- [ ] `_snapshotForAI()` works
- [ ] `page.on("console")` events fire
- [ ] `page.evaluate()` executes correctly
- [ ] Page navigation and element interaction work
- [ ] Extension relay mode works
- [ ] Non-Extension mode works
### 5.2 Phase 2: Measuring anti-detection effectiveness
**Goal**: quantify the anti-detection improvement
Test against each of the following detection sites:
| Detection site | URL | What it tests |
|--------|-----|-------|
| CreepJS | `https://nicepkg.github.io/nicepkg-test/` | Overall fingerprint |
| Sannysoft | `https://bot.sannysoft.com/` | navigator.webdriver and more |
| Incolumitas | `https://bot.incolumitas.com/` | Advanced detection |
| Browserscan | `https://browserscan.net/` | Browser fingerprint |
| Pixelscan | `https://pixelscan.net/` | Fingerprint consistency |
**Test matrix** (6 combinations):
```
                          Unpatched   rebrowser-patches
Non-Extension mode:           A1             A2
Extension + real Chrome:      B1             B2
```
**Metrics to compare**:
- Score / items passed on each detection site
- Value of `navigator.webdriver`
- Whether Runtime.enable leaks
- CreepJS Trust Score
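### 5.3 Phase 3: Patchright comparison test
**Goal**: assess whether Patchright's extra gains are worth its costs
```bash
cd openclaw
# Swap the package
# package.json: "playwright-core" → "patchright-core"
# Bulk-replace imports (8 files)
# Rework the console collection logic
# Run the same Phase 2 test matrix
```
**Additional checks**:
- [ ] Whether `_snapshotForAI()` is compatible (most critical)
- [ ] Whether the console collection alternative is reliable
- [ ] Whether HTML init-script injection raises the pass rate further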
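### 5.4 Phase 4: Production rollout (choose based on Phase 2/3 results)
**If rebrowser-patches is chosen**:
```jsonc
// package.json
{
  "dependencies": {
    "playwright-core": "npm:rebrowser-playwright-core@1.58.2" // npm alias replacing playwright-core
  }
}
```
**Additional hardening** (whichever option is chosen):
- [ ] OpenClaw non-Extension mode: add `--disable-blink-features=AutomationControlled` to the launch arguments
- [ ] OpenClaw non-Extension mode: remove automation flags such as `--enable-automation`
- [ ] Review the direct `Runtime.enable` calls in `cdp.ts` and evaluate whether they can be removed
- [ ] Extension mode: consider adding automatic reconnection on `chrome.runtime.onStartup`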
---
## 6. Cleaning up OpenClaw's raw CDP layer
OpenClaw's own `cdp.ts` sends CDP commands directly over a WebSocket, bypassing the Playwright driver:
```typescript
// These calls do not go through Playwright, so neither rebrowser-patches nor patchright affects them
send("Runtime.enable")
send("Runtime.evaluate", { expression, awaitPromise })
send("Runtime.terminateExecution")
```
**To evaluate**:
1. Can `Runtime.enable` be removed? When only `Runtime.evaluate` is used, some Chrome versions do not require enabling first
2. Does `Runtime.terminateExecution` require enable first? Needs testing
3. If enabling is unavoidable, follow the enableDisable mode of rebrowser-patches (enable quickly → grab the contextId → disable immediately)
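Item 3 can be sketched as plain CDP message handling. Per the CDP documentation, enabling the Runtime domain immediately reports `Runtime.executionContextCreated` for every existing context, so the default context's id can be captured before disabling the domain again. In the Python sketch below, `send` and the `events` iterable are hypothetical stand-ins for the WebSocket transport in `cdp.ts`; a real implementation would also match command responses by message id.

```python
from typing import Any, Callable, Dict, Iterable, Optional

Message = Dict[str, Any]
Send = Callable[[str, Optional[Message]], None]


def capture_default_context_id(send: Send, events: Iterable[Message],
                               frame_id: Optional[str] = None) -> int:
    """Enable Runtime, grab the default execution context id, then disable Runtime again."""
    send("Runtime.enable", None)
    try:
        # Runtime.enable replays Runtime.executionContextCreated for existing contexts.
        for msg in events:
            if msg.get("method") != "Runtime.executionContextCreated":
                continue
            context = msg["params"]["context"]
            aux = context.get("auxData") or {}
            if not aux.get("isDefault"):
                continue  # skip isolated worlds (utility worlds, extension contexts)
            if frame_id is not None and aux.get("frameId") != frame_id:
                continue
            return context["id"]
        raise RuntimeError("no default execution context was reported")
    finally:
        # Always disable again so the domain stays enabled only briefly.
        send("Runtime.disable", None)
```

The captured id can then be passed as `contextId` to later `Runtime.evaluate` calls while the Runtime domain remains disabled. The `finally` clause guarantees that `Runtime.disable` is sent even when no default context appears.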
---
## 7. Overview of the detection surface
Detection surface of the two modes once the changes are in place:
### Non-Extension mode (rebrowser-patches + launch-argument cleanup)
```
✅ Eliminated:
- Runtime.enable leak (rebrowser-patches addBinding mode)
- sourceURL signature (rebrowser-patches)
- utility world name (rebrowser-patches)
- navigator.webdriver (--disable-blink-features=AutomationControlled)
- --enable-automation flag
⚠️ Still present:
- --remote-debugging-port flag (required)
- Console.enable (not handled by rebrowser-patches)
- Fingerprint of Playwright's bundled Chromium (if connectOverCDP is not used)
- Runtime.enable in OpenClaw's cdp.ts (needs separate cleanup)
- init scripts injected via CDP (unchanged by rebrowser-patches)
```
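### Extension + real Chrome mode (rebrowser-patches + launch-argument cleanup)
```
✅ Eliminated:
- Runtime.enable leak
- sourceURL signature
- utility world name
- navigator.webdriver
- --remote-debugging-port (not needed by the Extension)
- --enable-automation (not needed by the Extension)
- Browser fingerprint differences (real Chrome)
- Empty profile (real user profile)
- All automation launch flags (none at all)
⚠️ Still present:
- Console.enable
- Runtime.enable in OpenClaw's cdp.ts (needs separate cleanup)
- init scripts injected via CDP
- The Chrome extension may be probed (visible in chrome://extensions)
- chrome.debugger debugging banner (not detectable from page JS, but visible to the user)
```
Replacing rebrowser-patches with Patchright would also eliminate Console.enable and CDP-based init-script injection, at the cost of losing the Console API and a higher compatibility risk.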
---
## 8. Reference file index
### Project source
| Project | Key file | Purpose |
|------|---------|------|
| OpenClaw | `src/browser/pw-session.ts:335` | `chromium.connectOverCDP()` connection entry point |
| OpenClaw | `src/browser/pw-session.ts:217-283` | Page event listeners (including console) |
| OpenClaw | `src/browser/pw-tools-core.snapshot.ts:54-62` | `_snapshotForAI()` call |
| OpenClaw | `src/browser/extension-relay.ts` | Extension relay server |
| OpenClaw | `assets/chrome-extension/background.js` | Extension core logic |
| OpenClaw | `src/browser/cdp.ts` | Raw CDP layer (including Runtime.enable) |
| rebrowser-patches | `patches/playwright-core/src.patch` | Core Playwright patch |
| rebrowser-patches | `scripts/patcher.js` | Patch application script |
| Patchright | `patchright_driver_patch.js` | Main orchestration script |
| Patchright | `driver_patches/crPagePatch.js` | Largest patch (~470 lines) |
| Patchright | `driver_patches/crNetworkManagerPatch.js` | HTTP injection patch (~465 lines) |
### How the detection vectors work
| Detection vector | Explanation |
|---------|------|
| `Runtime.enable` leak | Activating the CDP domain changes the browser's internal behavior, which page JS can detect through side channels |
| `navigator.webdriver` | `--enable-automation` sets this property to true |
| `sourceURL` fingerprint | The `//# sourceURL=pptr:evaluate` comment in injected scripts exposes the automation framework |
| utility world detection | An execution context named `__playwright_utility_world__` can be enumerated |
| Chrome binary fingerprint | Playwright's bundled Chromium differs from official Chrome in UA, WebGL and internal APIs |
| launch args fingerprint | The combination of many `--disable-*` flags is an automation signature |
| empty profile | No history, no cookies and an empty localStorage are a strong automation signal |
---
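## 9. Experiment plan for tomorrow
### Priority order
1. **rebrowser-patches basic validation** (Phase 1): 30 minutes
   - Apply the patch → start OpenClaw → basic functional tests
2. **Quantify anti-detection effectiveness** (Phase 2): 1 hour
   - 6 combinations × 5 detection sites = 30 test runs
3. **Patchright validation** (Phase 3, if Phase 2 is not good enough): 1-2 hours
   - Swap the package → compatibility tests → anti-detection tests
4. **cdp.ts cleanup assessment** (Phase 4): depends on the results of the earlier steps
### Expected conclusions
- If the addBinding mode of rebrowser-patches passes the mainstream detection sites and OpenClaw works normally → choose rebrowser-patches
- If rebrowser-patches falls short and Patchright additionally passes key detections → assess whether Patchright's compatibility cost is acceptable
- If neither is enough → consider a combined approach (rebrowser-patches plus supplementary JS injection)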
================================================
FILE: docs/more_powerful_search_skill/20260308_done.md
================================================
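# Goal 1
Add a skill to the wiseflow add-on that helps the Agent drive the browser more effectively for all kinds of search tasks, replacing openclaw's built-in web_search tool.
## Implementation approach
Parse the user's instruction and construct the query URL directly according to known rules. For filter and sort requirements, guide the agent with the methods worked out for each platform.
The search-URL construction rules and detailed guidance for common social media platforms are in ./direct_url_for_search_on_media_platform.md
In addition, go through the Python scripts in the /extra folder one by one, extract their query URLs, and add them to this skill's list of supported sources.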
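As a starting point for that extraction, the request URLs built by the `arxiv.py`, `baidu.py` and `bing.py` engines in /extra can be reproduced with the standard library alone. This sketch is derived from those engines' `request()` functions but drops the SearXNG-specific parts (cookies, locale traits). It also drops Baidu's `tn=json` parameter, on the assumption that the agent opens these URLs in a browser rather than parsing JSON; whether `gpc` still filters by time on the browser-facing page is likewise an assumption to verify.

```python
import time
from typing import Dict, Optional
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # all three engines page in steps of 10
BAIDU_TIME_RANGES = {"day": 86400, "week": 604800, "month": 2592000, "year": 31536000}


def arxiv_url(query: str, pageno: int = 1) -> str:
    """arXiv API query that searches all fields, as arxiv.py does."""
    args: Dict[str, object] = {
        "search_query": f"all:{query}",
        "start": (pageno - 1) * RESULTS_PER_PAGE,
        "max_results": RESULTS_PER_PAGE,
    }
    return "https://export.arxiv.org/api/query?" + urlencode(args)


def baidu_url(query: str, pageno: int = 1, time_range: Optional[str] = None,
              now: Optional[int] = None) -> str:
    """Baidu web search; `gpc` limits results to the last day/week/month/year."""
    params: Dict[str, object] = {"wd": query, "rn": RESULTS_PER_PAGE,
                                 "pn": (pageno - 1) * RESULTS_PER_PAGE}
    if time_range in BAIDU_TIME_RANGES:
        end = int(time.time()) if now is None else now
        start = end - BAIDU_TIME_RANGES[time_range]
        params["gpc"] = f"stf={start},{end}|stftype=1"
    return "https://www.baidu.com/s?" + urlencode(params)


def bing_url(query: str, pageno: int = 1) -> str:
    """Bing web search; pages after the first need both `first` and `FORM`."""
    params: Dict[str, object] = {"q": query, "pq": query}
    if pageno > 1:
        params["first"] = (pageno - 1) * 10 + 1
        params["FORM"] = "PERE" if pageno == 2 else f"PERE{pageno - 2}"
    return "https://www.bing.com/search?" + urlencode(params)
```

The optional `now` argument exists only so that the time-range URL is reproducible; leaving it unset uses the current time, as baidu.py does.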
# Goal 2
Add a skill to the wiseflow add-on that lets the Agent fetch and parse RSS feeds.
## Implementation approach
Use rss_parsor.py as a reference.
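The contents of rss_parsor.py are not reproduced here. As a hedged, stdlib-only sketch of the parsing step, the function below assumes plain RSS 2.0 feeds; Atom feeds use different element names (`entry`, `updated`) and are not handled.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional


def _iso_date(raw: Optional[str]) -> Optional[str]:
    """Convert an RFC 822 pubDate (the RSS 2.0 date format) to ISO 8601, or None."""
    if not raw:
        return None
    try:
        return parsedate_to_datetime(raw.strip()).isoformat()
    except (TypeError, ValueError):
        return None  # malformed dates are dropped rather than failing the whole feed


def parse_rss(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Extract title, link and publish time from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    entries: List[Dict[str, Optional[str]]] = []
    for item in root.iter("item"):
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": _iso_date(item.findtext("pubDate")),
        })
    return entries
```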
================================================
FILE: docs/more_powerful_search_skill/direct_url_for_search_on_media_platform.md
================================================
# Searching on social media platforms
## bilibili (哔哩哔哩, a.k.a. "b站"):
https://search.bilibili.com/{channel}?keyword={keyword}
Join multiple keywords with +
Available channels:
- All: all
- Videos: video
- Anime (番剧): bangumi
- Film & TV (影视): pgc
- Live: live
- Articles (专栏): article
- Users: upuser
Each channel can take its own search rules (none by default), as follows:
### all
Supported rules:
- Most played: &order=click
- Most recent: &order=pubdate
- Most danmaku (bullet comments): &order=dm
- Most favorited: &order=stow
Examples:
- Default search: https://search.bilibili.com/all?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83
- Most played: https://search.bilibili.com/all?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83&order=click
### video
Supported rules:
- Most played: &order=click
- Most recent: &order=pubdate
- Most danmaku (bullet comments): &order=dm
- Most favorited: &order=stow
Examples:
- Default search: https://search.bilibili.com/video?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83
- Most recent: https://search.bilibili.com/video?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83&order=pubdate
### bangumi
This channel supports no rules; default search only
### pgc
This channel supports no rules; default search only
### live
Supported rules (searches everything by default):
- Streamers: &search_type=live_user
- Live rooms: &search_type=live_room
- Live rooms, most recently started first: &search_type=live_room&order=live_time
Examples:
- Default search (everything): https://search.bilibili.com/live?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83
- Live rooms, most recently started first: https://search.bilibili.com/live?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83&search_type=live_room&order=live_time
### article
Supported rules:
- Most recent: &order=pubdate
- Most clicked: &order=click
- Most popular: &order=attention
- Most comments: &order=scores
Examples:
- Default search: https://search.bilibili.com/article?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83
- Most comments: https://search.bilibili.com/article?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83&order=scores
### upuser
Supported rules:
- Followers, high to low: &order=fans
- Followers, low to high: &order=fans&order_sort=1
- Member level, high to low: &order=level
- Member level, low to high: &order=level&order_sort=1
Examples:
- Default search: https://search.bilibili.com/upuser?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83
- Followers, high to low: https://search.bilibili.com/upuser?keyword=%E8%8D%AF%E5%B1%8B%E5%B0%91%E5%A5%B3%E7%9A%84%E5%91%A2%E5%96%83&order=fans
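The bilibili rules above can be assembled mechanically. The helper below is a hypothetical sketch (the name `bilibili_search_url` and its signature are illustrative). It percent-encodes each keyword, joins multiple keywords with `+`, and appends any rules as extra query parameters. Rule names are not validated beyond rejecting them for the default-search-only channels.

```python
from typing import Iterable
from urllib.parse import quote, urlencode

BILIBILI_CHANNELS = {"all", "video", "bangumi", "pgc", "live", "article", "upuser"}
NO_RULE_CHANNELS = {"bangumi", "pgc"}  # default search only


def bilibili_search_url(keywords: Iterable[str], channel: str = "all", **rules: object) -> str:
    """Build a bilibili search URL; `rules` become extra query parameters, e.g. order="click"."""
    if channel not in BILIBILI_CHANNELS:
        raise ValueError(f"unknown bilibili channel: {channel!r}")
    if rules and channel in NO_RULE_CHANNELS:
        raise ValueError(f"channel {channel!r} only supports the default search")
    # Percent-encode each keyword, then join multiple keywords with '+'.
    keyword = "+".join(quote(k, safe="") for k in keywords)
    url = f"https://search.bilibili.com/{channel}?keyword={keyword}"
    if rules:
        url += "&" + urlencode(rules)  # keyword arguments keep their call order
    return url
```

For example, `bilibili_search_url(["药屋少女的呢喃"], "live", search_type="live_room", order="live_time")` reproduces the "most recently started first" example above.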
## Douyin (抖音, a.k.a. dy):
- General search: https://www.douyin.com/search/{keyword}?type=general
- Video search: https://www.douyin.com/search/{keyword}?type=video
- User search: https://www.douyin.com/search/{keyword}?type=user
- Live search: https://www.douyin.com/search/{keyword}?type=live
Join multiple keywords with %20, e.g.: https://www.douyin.com/search/wiseflow%20%E8%B4%9F%E9%9D%A2
If type is omitted, it defaults to general search
Any sorting or filtering of results must be done by interacting with the page
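Unlike bilibili, Douyin puts the keyword in the URL path. A hypothetical helper (name and signature are illustrative) only needs to join the keywords with a space and let `quote()` encode that space as `%20`:

```python
from typing import Iterable
from urllib.parse import quote

DOUYIN_TYPES = {"general", "video", "user", "live"}


def douyin_search_url(keywords: Iterable[str], search_type: str = "general") -> str:
    """Build a Douyin search URL; sorting and filtering still require page interaction."""
    if search_type not in DOUYIN_TYPES:
        raise ValueError(f"unknown douyin search type: {search_type!r}")
    # The keywords form a path segment; joining with a space makes quote() emit %20.
    path_keyword = quote(" ".join(keywords), safe="")
    return f"https://www.douyin.com/search/{path_keyword}?type={search_type}"
```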
## Weibo (微博, a.k.a. wb):
- General search: https://s.weibo.com/weibo/{keyword}
- Real-time search (latest posts): https://s.weibo.com/realtime?q={keyword}
- Users: https://s.weibo.com/user?q={keyword}
- Topics: https://s.weibo.com/topic?q={keyword}
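Weibo's general search takes the keyword in the path, while the other modes use `q=`, so a template table is the simplest fit. This is a hypothetical sketch that handles a single keyword, since the rules above do not say how Weibo joins multiple keywords:

```python
from urllib.parse import quote

WEIBO_TEMPLATES = {
    "general": "https://s.weibo.com/weibo/{kw}",
    "realtime": "https://s.weibo.com/realtime?q={kw}",
    "user": "https://s.weibo.com/user?q={kw}",
    "topic": "https://s.weibo.com/topic?q={kw}",
}


def weibo_search_url(keyword: str, mode: str = "general") -> str:
    """Fill the template for the chosen Weibo search mode with the percent-encoded keyword."""
    try:
        template = WEIBO_TEMPLATES[mode]
    except KeyError:
        raise ValueError(f"unknown weibo search mode: {mode!r}") from None
    return template.format(kw=quote(keyword, safe=""))
```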
## Xiaohongshu (小红书, a.k.a. xhs or 红薯):
https://www.xiaohongshu.com/search_result?keyword={keyword}&source=web_explore_feed
On Xiaohongshu, search channels, filters and sort order must all be set by interacting with the page
## Zhihu (知乎):
- General: https://www.zhihu.com/search?type=content&q={keyword}
- Users (find people): https://www.zhihu.com/search?q={keyword}&type=people
- Papers: https://www.zhihu.com/search?q={keyword}&type=scholar
- Columns: https://www.zhihu.com/search?q={keyword}&type=column
- E-books: https://www.zhihu.com/search?q={keyword}&type=publication
- Circles: https://www.zhihu.com/search?q={keyword}&type=ring
- Topics: https://www.zhihu.com/search?q={keyword}&type=topic
- Videos: https://www.zhihu.com/search?q={keyword}&type=zvideo
Join multiple keywords with %20, e.g.: https://www.zhihu.com/search?type=zvideo&q=wiseflow%20%E4%BB%98%E8%B4%B9
The following supports filters or sorting built into the URL:
### General
Base: https://www.zhihu.com/search?type=content&q={keyword}
- Filters:
  - Answers only, append: &type=content&vertical=answer
  - Articles only, append: &type=content&vertical=article
  - Videos only, append: &type=content&vertical=zvideo
- Sort:
  - Most upvoted, append: &sort=upvoted_count
  - Most recent, append: &sort=created_time
- Time range:
  - Past day, append: &time_interval=a_day
  - Past week, append: &time_interval=a_week
  - Past month, append: &time_interval=a_month
  - Past three months, append: &time_interval=three_months
  - Past six months, append: &time_interval=half_a_year
  - Past year, append: &time_interval=a_year
All of these can be combined freely, e.g.: https://www.zhihu.com/search?q=wiseflow%20%E4%BB%98%E8%B4%B9&sort=created_time&time_interval=a_month&type=content&vertical=article
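Because the General filters combine freely, a small validating builder keeps the agent from emitting unsupported values. This is a hypothetical sketch: it passes `quote_via=quote` to `urlencode` so that spaces between keywords become `%20` as the rules above require, and it refuses filters on any type other than `content`.

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlencode

ZHIHU_TYPES = {"content", "people", "scholar", "column", "publication", "ring", "topic", "zvideo"}
VERTICALS = {"answer", "article", "zvideo"}
SORTS = {"upvoted_count", "created_time"}
INTERVALS = {"a_day", "a_week", "a_month", "three_months", "half_a_year", "a_year"}


def zhihu_search_url(keywords: Iterable[str], type_: str = "content",
                     vertical: Optional[str] = None, sort: Optional[str] = None,
                     time_interval: Optional[str] = None) -> str:
    """Build a Zhihu search URL; filters are only valid for type=content."""
    if type_ not in ZHIHU_TYPES:
        raise ValueError(f"unknown zhihu search type: {type_!r}")
    params = {"q": " ".join(keywords), "type": type_}
    filters = {"vertical": (vertical, VERTICALS), "sort": (sort, SORTS),
               "time_interval": (time_interval, INTERVALS)}
    for name, (value, allowed) in filters.items():
        if value is None:
            continue
        if type_ != "content":
            raise ValueError(f"{name} only applies to type=content")
        if value not in allowed:
            raise ValueError(f"unsupported {name}: {value!r}")
        params[name] = value
    # quote (not the default quote_plus) encodes spaces as %20.
    return "https://www.zhihu.com/search?" + urlencode(params, quote_via=quote)
```

The parameter order differs from the combined example above, which should not matter for a query string.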
## Twitter (X, 推特)
- Top: https://x.com/search?q={keyword}
- Latest: https://x.com/search?q={keyword}&f=live
- People (find people): https://x.com/search?q={keyword}&f=user
- Media: https://x.com/search?q={keyword}&f=media
- Lists: https://x.com/search?q={keyword}&f=list
Join multiple keywords with %20, e.g.: https://x.com/search?q=wiseflow%20%E8%BD%AF%E4%BB%B6&src=typed_query&f=list
Any of these can add the Near You option by appending &lf=on, e.g.: https://x.com/search?q=wiseflow&f=live&lf=on
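The X tabs map onto a single `f` parameter, and Near You is an independent `lf=on` flag. The builder below is a hypothetical sketch: `tab=None` means the Top tab, and the optional `src=typed_query` from the example is omitted.

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlencode

X_TABS = {None, "live", "user", "media", "list"}  # None selects the Top tab


def x_search_url(keywords: Iterable[str], tab: Optional[str] = None, near_you: bool = False) -> str:
    """Build an X (Twitter) search URL with an optional tab and the Near You flag."""
    if tab not in X_TABS:
        raise ValueError(f"unknown X search tab: {tab!r}")
    params = {"q": " ".join(keywords)}
    if tab:
        params["f"] = tab
    if near_you:
        params["lf"] = "on"
    # quote (not quote_plus) encodes the space between keywords as %20.
    return "https://x.com/search?" + urlencode(params, quote_via=quote)
```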
## Facebook (FB, 脸书)
- All: https://www.facebook.com/search/top/?q={keyword}
- People (find people): https://www.facebook.com/search/people/?q={keyword}
- Pages: https://www.facebook.com/search/pages?q={keyword}
- Groups: https://www.facebook.com/search/groups?q={keyword}
- Events: https://www.facebook.com/search/events?q={keyword}
Join multiple keywords with %20, e.g.: https://www.facebook.com/search/top/?q=jinchen%20%E4%BD%8F%E5%8F%8B
Search filters and similar options must be set by interacting with the page
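The Facebook paths differ slightly: some end with a slash and some do not. This hypothetical sketch therefore keeps them verbatim in a lookup table rather than formatting one pattern:

```python
from typing import Iterable
from urllib.parse import quote

FB_PATHS = {
    "top": "search/top/",
    "people": "search/people/",
    "pages": "search/pages",
    "groups": "search/groups",
    "events": "search/events",
}


def facebook_search_url(keywords: Iterable[str], vertical: str = "top") -> str:
    """Build a Facebook search URL using the exact paths listed above."""
    if vertical not in FB_PATHS:
        raise ValueError(f"unknown facebook search vertical: {vertical!r}")
    return f"https://www.facebook.com/{FB_PATHS[vertical]}?q={quote(' '.join(keywords), safe='')}"
```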
## GitHub
- Repositories: https://github.com/search?q={keyword}&type=repositories
- Users: https://github.com/search?q={keyword}&type=users
- Issues: https://github.com/search?q={keyword}&type=issues
- Pull Requests: https://github.com/search?q={keyword}&type=pullrequests
- Code: https://github.com/search?q={keyword}&type=code
- Discussions: https://github.com/search?q={keyword}&type=discussions
- Wikis: https://github.com/search?q={keyword}&type=wikis
- Topics: https://github.com/search?q={keyword}&type=topics
Join multiple keywords with +, e.g.: https://github.com/search?q=wiseflow+addon&type=topics
### Sort options supported for Repositories:
- most stars: &s=stars&o=desc
- fewest stars: &s=stars&o=asc
- most forks: &s=forks&o=desc
- fewest forks: &s=forks&o=asc
- recently updated: &s=updated&o=desc
- least recently updated: &s=updated&o=asc
### Sort options supported for Users:
- most followers: &s=followers&o=desc
- fewest followers: &s=followers&o=asc
- most recently joined: &s=joined&o=desc
- least recently joined: &s=joined&o=asc
- most repositories: &s=repositories&o=desc
- fewest repositories: &s=repositories&o=asc
Both repository and user searches support filtering by language with &l=HTML
e.g.: https://github.com/search?q=wiseflow+language%3AHTML&type=users&s=repositories&o=desc&l=HTML
Supported language filters: HTML, CSS, JavaScript, Python, Ruby, Java, C++, PHP, Swift, Go, Kotlin, TypeScript, Rust, Scala, Haskell, Lua, Shell, Dockerfile, JSON, YAML, Markdown, SVG
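GitHub joins keywords with `+`, which is exactly what `urlencode`'s default `quote_plus` produces for spaces. This hypothetical builder accepts sort keys only for the types that list them, and the `l` language filter only for repositories and users. It does not add the `language:` qualifier to `q` as the example above also does; that remains optional.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

GH_TYPES = {"repositories", "users", "issues", "pullrequests", "code",
            "discussions", "wikis", "topics"}
GH_SORTS = {
    "repositories": {"stars", "forks", "updated"},
    "users": {"followers", "joined", "repositories"},
}


def github_search_url(keywords: Iterable[str], type_: str = "repositories",
                      sort: Optional[str] = None, order: str = "desc",
                      language: Optional[str] = None) -> str:
    """Build a GitHub search URL; urlencode's default quote_plus turns spaces into '+'."""
    if type_ not in GH_TYPES:
        raise ValueError(f"unknown github search type: {type_!r}")
    params = {"q": " ".join(keywords), "type": type_}
    if sort is not None:
        if sort not in GH_SORTS.get(type_, set()):
            raise ValueError(f"sort {sort!r} is not supported for type {type_!r}")
        if order not in ("asc", "desc"):
            raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
        params["s"] = sort
        params["o"] = order
    if language is not None:
        if type_ not in GH_SORTS:
            raise ValueError("the language filter applies to repositories and users only")
        params["l"] = language
    return "https://github.com/search?" + urlencode(params)
```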
================================================
FILE: docs/more_powerful_search_skill/extra/arxiv.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""arXiv is a free distribution service and an open-access archive for nearly
2.4 million scholarly articles in the fields of physics, mathematics, computer
science, quantitative biology, quantitative finance, statistics, electrical
engineering and systems science, and economics.

The engine uses the `arXiv API`_.

.. _arXiv API: https://info.arxiv.org/help/api/user-manual.html
"""

import typing as t
from datetime import datetime
from urllib.parse import urlencode

from lxml import etree
from lxml.etree import XPath

from searx.utils import eval_xpath, eval_xpath_list, eval_xpath_getindex
from searx.result_types import EngineResults

if t.TYPE_CHECKING:
    from searx.extended_types import SXNG_Response
    from searx.search.processors import OnlineParams

about = {
    "website": "https://arxiv.org",
    "wikidata_id": "Q118398",
    "official_api_documentation": "https://info.arxiv.org/help/api/user-manual.html",
    "use_official_api": True,
    "require_api_key": False,
    "results": "XML-RSS",
}

categories = ["science", "scientific publications"]
paging = True

arxiv_max_results = 10
arxiv_search_prefix = "all"
"""Search fields; for more details see `Details of Query Construction`_.

.. _Details of Query Construction:
   https://info.arxiv.org/help/api/user-manual.html#51-details-of-query-construction
"""

base_url = "https://export.arxiv.org/api/query"
"""`arXiv API`_ URL, for more details see Query-Interface_

.. _Query-Interface: https://info.arxiv.org/help/api/user-manual.html#_query_interface
"""

arxiv_namespaces = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}
xpath_entry = XPath("//atom:entry", namespaces=arxiv_namespaces)
xpath_title = XPath(".//atom:title", namespaces=arxiv_namespaces)
xpath_id = XPath(".//atom:id", namespaces=arxiv_namespaces)
xpath_summary = XPath(".//atom:summary", namespaces=arxiv_namespaces)
xpath_author_name = XPath(".//atom:author/atom:name", namespaces=arxiv_namespaces)
xpath_doi = XPath(".//arxiv:doi", namespaces=arxiv_namespaces)
xpath_pdf = XPath(".//atom:link[@title='pdf']", namespaces=arxiv_namespaces)
xpath_published = XPath(".//atom:published", namespaces=arxiv_namespaces)
xpath_journal = XPath(".//arxiv:journal_ref", namespaces=arxiv_namespaces)
xpath_category = XPath(".//atom:category/@term", namespaces=arxiv_namespaces)
xpath_comment = XPath("./arxiv:comment", namespaces=arxiv_namespaces)


def request(query: str, params: "OnlineParams") -> None:
    args = {
        "search_query": f"{arxiv_search_prefix}:{query}",
        "start": (params["pageno"] - 1) * arxiv_max_results,
        "max_results": arxiv_max_results,
    }
    params["url"] = f"{base_url}?{urlencode(args)}"


def response(resp: "SXNG_Response") -> EngineResults:
    res = EngineResults()

    dom = etree.fromstring(resp.content)
    for entry in eval_xpath_list(dom, xpath_entry):

        title: str = eval_xpath_getindex(entry, xpath_title, 0).text

        url: str = eval_xpath_getindex(entry, xpath_id, 0).text
        abstract: str = eval_xpath_getindex(entry, xpath_summary, 0).text

        authors: list[str] = [author.text for author in eval_xpath_list(entry, xpath_author_name)]

        # doi
        doi_element = eval_xpath_getindex(entry, xpath_doi, 0, default=None)
        doi: str = "" if doi_element is None else doi_element.text

        # pdf
        pdf_element = eval_xpath_getindex(entry, xpath_pdf, 0, default=None)
        pdf_url: str = "" if pdf_element is None else pdf_element.attrib.get("href")

        # journal
        journal_element = eval_xpath_getindex(entry, xpath_journal, 0, default=None)
        journal: str = "" if journal_element is None else journal_element.text

        # tags
        tag_elements = eval_xpath(entry, xpath_category)
        tags: list[str] = [str(tag) for tag in tag_elements]

        # comments
        comments_elements = eval_xpath_getindex(entry, xpath_comment, 0, default=None)
        comments: str = "" if comments_elements is None else comments_elements.text

        publishedDate = datetime.strptime(eval_xpath_getindex(entry, xpath_published, 0).text, "%Y-%m-%dT%H:%M:%SZ")

        res.add(
            res.types.Paper(
                url=url,
                title=title,
                publishedDate=publishedDate,
                content=abstract,
                doi=doi,
                authors=authors,
                journal=journal,
                tags=tags,
                comments=comments,
                pdf_url=pdf_url,
            )
        )

    return res
================================================
FILE: docs/more_powerful_search_skill/extra/baidu.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Baidu_

.. _Baidu: https://www.baidu.com
"""

# There exists https://github.com/ohblue/baidu-serp-api/
# but we don't use it here (maybe we can learn from it).

from urllib.parse import urlencode
from datetime import datetime
from html import unescape
import time
import json

from searx.exceptions import SearxEngineAPIException, SearxEngineCaptchaException
from searx.utils import html_to_text

about = {
    "website": "https://www.baidu.com",
    "wikidata_id": "Q14772",
    "official_api_documentation": None,
    "use_official_api": False,
    "require_api_key": False,
    "results": "JSON",
    "language": "zh",
}

paging = True
categories = []
results_per_page = 10

baidu_category = 'general'

time_range_support = True
time_range_dict = {"day": 86400, "week": 604800, "month": 2592000, "year": 31536000}


def init(_):
    if baidu_category not in ('general', 'images', 'it'):
        raise SearxEngineAPIException(f"Unsupported category: {baidu_category}")


def request(query, params):
    page_num = params["pageno"]

    category_config = {
        'general': {
            'endpoint': 'https://www.baidu.com/s',
            'params': {
                "wd": query,
                "rn": results_per_page,
                "pn": (page_num - 1) * results_per_page,
                "tn": "json",
            },
        },
        'images': {
            'endpoint': 'https://image.baidu.com/search/acjson',
            'params': {
                "word": query,
                "rn": results_per_page,
                "pn": (page_num - 1) * results_per_page,
                "tn": "resultjson_com",
            },
        },
        'it': {
            'endpoint': 'https://kaifa.baidu.com/rest/v1/search',
            'params': {
                "wd": query,
                "pageSize": results_per_page,
                "pageNum": page_num,
                "paramList": f"page_num={page_num},page_size={results_per_page}",
                "position": 0,
            },
        },
    }

    query_params = category_config[baidu_category]['params']
    query_url = category_config[baidu_category]['endpoint']

    if params.get("time_range") in time_range_dict:
        now = int(time.time())
        past = now - time_range_dict[params["time_range"]]

        if baidu_category == 'general':
            query_params["gpc"] = f"stf={past},{now}|stftype=1"

        if baidu_category == 'it':
            query_params["paramList"] += f",timestamp_range={past}-{now}"

    params["url"] = f"{query_url}?{urlencode(query_params)}"
    params["allow_redirects"] = False
    return params


def response(resp):
    # Detect Baidu Captcha, it will redirect to wappass.baidu.com
    if 'wappass.baidu.com/static/captcha' in resp.headers.get('Location', ''):
        raise SearxEngineCaptchaException()

    text = resp.text
    if baidu_category == 'images':
        # baidu's JSON encoder wrongly quotes / and ' characters by \\ and \'
        text = text.replace(r"\/", "/").replace(r"\'", "'")
    data = json.loads(text, strict=False)
    parsers = {'general': parse_general, 'images': parse_images, 'it': parse_it}

    return parsers[baidu_category](data)


def parse_general(data):
    results = []
    if not data.get("feed", {}).get("entry"):
        raise SearxEngineAPIException("Invalid response")

    for entry in data["feed"]["entry"]:
        if not entry.get("title") or not entry.get("url"):
            continue

        published_date = None
        if entry.get("time"):
            try:
                published_date = datetime.fromtimestamp(entry["time"])
            except (ValueError, TypeError):
                published_date = None

        # title and content sometimes contain characters such as & ' " etc...
        title = unescape(entry["title"])
        content = unescape(entry.get("abs", ""))

        results.append(
            {
                "title": title,
                "url": entry["url"],
                "content": content,
                "publishedDate": published_date,
            }
        )
    return results


def parse_images(data):
    results = []
    if "data" in data:
        for item in data["data"]:
            if not item:
                # the last item in the JSON list is empty, the JSON string ends with "}, {}]"
                continue
            replace_url = item.get("replaceUrl", [{}])[0]
            width = item.get("width")
            height = item.get("height")
            img_date = item.get("bdImgnewsDate")
            publishedDate = None
            if img_date:
                publishedDate = datetime.strptime(img_date, "%Y-%m-%d %H:%M")
            results.append(
                {
                    "template": "images.html",
                    "url": replace_url.get("FromURL"),
                    "thumbnail_src": item.get("thumbURL"),
                    "img_src": replace_url.get("ObjURL"),
                    "title": html_to_text(item.get("fromPageTitle")),
                    "source": item.get("fromURLHost"),
                    "resolution": f"{width} x {height}",
                    "img_format": item.get("type"),
                    "filesize": item.get("filesize"),
                    "publishedDate": publishedDate,
                }
            )
    return results


def parse_it(data):
    results = []
    if not data.get("data", {}).get("documents", {}).get("data"):
        raise SearxEngineAPIException("Invalid response")

    for entry in data["data"]["documents"]["data"]:
        results.append(
            {
                'title': entry["techDocDigest"]["title"],
                'url': entry["techDocDigest"]["url"],
                'content': entry["techDocDigest"]["summary"],
            }
        )
    return results
================================================
FILE: docs/more_powerful_search_skill/extra/bing.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""This is the implementation of the Bing-WEB engine. Some of this
implementations are shared by other engines:
- :ref:`bing images engine`
- :ref:`bing news engine`
- :ref:`bing videos engine`
On the `preference page`_ Bing offers a lot of languages an regions (see section
LANGUAGE and COUNTRY/REGION). The Language is the language of the UI, we need
in SearXNG to get the translations of data such as *"published last week"*.
There is a description of the official search-APIs_, unfortunately this is not
the API we can use or that bing itself would use. You can look up some things
in the API to get a better picture of bing, but the value specifications like
the market codes are usually outdated or at least no longer used by bing itself.
The market codes have been harmonized and are identical for web, video and
images. The news area has also been harmonized with the other categories. Only
political adjustments still seem to be made -- for example, there is no news
category for the Chinese market.
.. _preference page: https://www.bing.com/account/general
.. _search-APIs: https://learn.microsoft.com/en-us/bing/search-apis/
"""
# pylint: disable=too-many-branches, invalid-name
import base64
import re
import time
from urllib.parse import parse_qs, urlencode, urlparse
import babel
import babel.languages
from lxml import html
from searx.enginelib.traits import EngineTraits
from searx.exceptions import SearxEngineAPIException
from searx.locales import language_tag, region_tag
from searx.utils import eval_xpath, eval_xpath_getindex, eval_xpath_list, extract_text
about = {
"website": "https://www.bing.com",
"wikidata_id": "Q182496",
"official_api_documentation": "https://www.microsoft.com/en-us/bing/apis/bing-web-search-api",
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
# engine dependent config
categories = ["general", "web"]
paging = True
max_page = 200
"""200 pages maximum (``&first=1991``)"""
time_range_support = True
safesearch = True
"""Bing results are always SFW. To get NSFW links from bing some age
verification by a cookie is needed / thats not possible in SearXNG.
"""
base_url = "https://www.bing.com/search"
"""Bing (Web) search URL"""
def _page_offset(pageno):
return (int(pageno) - 1) * 10 + 1
def set_bing_cookies(params, engine_language, engine_region):
params["cookies"]["_EDGE_CD"] = f"m={engine_region}&u={engine_language}"
params["cookies"]["_EDGE_S"] = f"mkt={engine_region}&ui={engine_language}"
logger.debug("bing cookies: %s", params["cookies"])
def request(query, params):
"""Assemble a Bing-Web request."""
engine_region = traits.get_region(params["searxng_locale"], traits.all_locale) # type: ignore
engine_language = traits.get_language(params["searxng_locale"], "en") # type: ignore
set_bing_cookies(params, engine_language, engine_region)
page = params.get("pageno", 1)
query_params = {
"q": query,
# if arg 'pq' is missed, sometimes on page 4 we get results from page 1,
# don't ask why it is only sometimes / its M$ and they have never been
# deterministic ;)
"pq": query,
}
# To get correct page, arg first and this arg FORM is needed, the value PERE
# is on page 2, on page 3 its PERE1 and on page 4 its PERE2 .. and so forth.
# The 'first' arg should never send on page 1.
if page > 1:
query_params["first"] = _page_offset(page) # see also arg FORM
if page == 2:
query_params["FORM"] = "PERE"
elif page > 2:
query_params["FORM"] = "PERE%s" % (page - 2)
params["url"] = f"{base_url}?{urlencode(query_params)}"
if params.get("time_range"):
unix_day = int(time.time() / 86400)
time_ranges = {
"day": "1",
"week": "2",
"month": "3",
"year": f"5_{unix_day - 365}_{unix_day}",
}
params["url"] += f'&filters=ex1:"ez{time_ranges[params["time_range"]]}"'
# in some regions where geoblocking is employed (e.g. China),
# www.bing.com redirects to the regional version of Bing
params["allow_redirects"] = True
return params
def response(resp):
# pylint: disable=too-many-locals
results = []
result_len = 0
dom = html.fromstring(resp.text)
# parse results again if nothing is found yet
for result in eval_xpath_list(dom, '//ol[@id="b_results"]/li[contains(@class, "b_algo")]'):
link = eval_xpath_getindex(result, ".//h2/a", 0, None)
if link is None:
continue
url = link.attrib.get("href")
title = extract_text(link)
content = eval_xpath(result, ".//p")
for p in content:
# Make sure that the element is free of:
# <span class="algoSlug_icon" # data-priority="2">Web</span>
for e in p.xpath('.//span[@class="algoSlug_icon"]'):
e.getparent().remove(e)
content = extract_text(content)
# get the real URL
if url.startswith("https://www.bing.com/ck/a?"):
# get the first value of u parameter
url_query = urlparse(url).query
parsed_url_query = parse_qs(url_query)
param_u = parsed_url_query["u"][0]
# remove "a1" in front
encoded_url = param_u[2:]
# add padding
encoded_url = encoded_url + "=" * (-len(encoded_url) % 4)
# decode base64 encoded URL
url = base64.urlsafe_b64decode(encoded_url).decode()
# append result
results.append({"url": url, "title": title, "content": content})
# get number_of_results
if results:
result_len_container = "".join(eval_xpath(dom, '//span[@class="sb_count"]//text()'))
if "-" in result_len_container:
start_str, result_len_container = re.split(r"-\d+", result_len_container)
start = int(start_str)
else:
start = 1
result_len_container = re.sub("[^0-9]", "", result_len_container)
if len(result_len_container) > 0:
result_len = int(result_len_container)
expected_start = _page_offset(resp.search_params.get("pageno", 1))
if expected_start != start:
if expected_start > result_len:
# Avoid reading more results than available.
# For example, if there is 100 results from some search and we try to get results from 120 to 130,
# Bing will send back the results from 0 to 10 and no error.
# If we compare results count with the first parameter of the request we can avoid this "invalid"
# results.
return []
# Sometimes Bing will send back the first result page instead of the requested page as a rate limiting
# measure.
msg = f"Expected results to start at {expected_start}, but got results starting at {start}"
raise SearxEngineAPIException(msg)
results.append({"number_of_results": result_len})
return results
def fetch_traits(engine_traits: EngineTraits):
"""Fetch languages and regions from Bing-Web."""
# pylint: disable=import-outside-toplevel
from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.utils import gen_useragent
headers = {
"User-Agent": gen_useragent(),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "en-US;q=0.5,en;q=0.3",
"DNT": "1",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
"Sec-GPC": "1",
"Cache-Control": "max-age=0",
}
resp = get("https://www.bing.com/account/general", headers=headers, timeout=5)
if not resp.ok:
raise RuntimeError("Response from Bing is not OK.")
dom = html.fromstring(resp.text)
# languages
engine_traits.languages["zh"] = "zh-hans"
map_lang = {"prs": "fa-AF", "en": "en-us"}
bing_ui_lang_map = {
# HINT: this list probably needs to be supplemented
"en": "us", # en --> en-us
"da": "dk", # da --> da-dk
}
for href in eval_xpath(dom, '//div[@id="language-section-content"]//div[@class="languageItem"]/a/@href'):
eng_lang = parse_qs(urlparse(href).query)["setlang"][0]
babel_lang = map_lang.get(eng_lang, eng_lang)
try:
sxng_tag = language_tag(babel.Locale.parse(babel_lang.replace("-", "_")))
except babel.UnknownLocaleError:
print("ERROR: language (%s) is unknown by babel" % (babel_lang))
continue
# Language (e.g. 'en' or 'de') from https://www.bing.com/account/general
# is converted by bing to 'en-us' or 'de-de'. But only if there is not
# already a '-' delemitter in the language. For instance 'pt-PT' -->
# 'pt-pt' and 'pt-br' --> 'pt-br'
bing_ui_lang = eng_lang.lower()
if "-" not in bing_ui_lang:
bing_ui_lang = bing_ui_lang + "-" + bing_ui_lang_map.get(bing_ui_lang, bing_ui_lang)
conflict = engine_traits.languages.get(sxng_tag)
if conflict:
if conflict != bing_ui_lang:
print(f"CONFLICT: babel {sxng_tag} --> {conflict}, {bing_ui_lang}")
continue
engine_traits.languages[sxng_tag] = bing_ui_lang
# regions (aka "market codes")
engine_traits.regions["zh-CN"] = "zh-cn"
map_market_codes = {
"zh-hk": "en-hk", # not sure why, but at Microsoft this is the market code for Hong Kong
}
for href in eval_xpath(dom, '//div[@id="region-section-content"]//div[@class="regionItem"]/a/@href'):
cc_tag = parse_qs(urlparse(href).query)["cc"][0]
if cc_tag == "clear":
engine_traits.all_locale = cc_tag
continue
# add market codes from official languages of the country ..
for lang_tag in babel.languages.get_official_languages(cc_tag, de_facto=True):
if lang_tag not in engine_traits.languages.keys():
# print("ignore lang: %s <-- %s" % (cc_tag, lang_tag))
continue
lang_tag = lang_tag.split("_")[0] # zh_Hant --> zh
market_code = f"{lang_tag}-{cc_tag}" # zh-tw
market_code = map_market_codes.get(market_code, market_code)
sxng_tag = region_tag(babel.Locale.parse("%s_%s" % (lang_tag, cc_tag.upper())))
conflict = engine_traits.regions.get(sxng_tag)
if conflict:
if conflict != market_code:
print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, market_code))
continue
engine_traits.regions[sxng_tag] = market_code
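# A minimal sketch (hypothetical helpers, not used by the engine) of the two
# normalization rules in fetch_traits above. A Bing UI language without a '-'
# gets a default country suffix. A market code is built from the language part
# of an official language plus the lower-case country code, then remapped
# through a table like map_market_codes. The two tables are copies of the ones
# in fetch_traits and are assumptions of this sketch.
_EXAMPLE_UI_LANG_MAP = {"en": "us", "da": "dk"}
_EXAMPLE_MARKET_MAP = {"zh-hk": "en-hk"}


def _example_bing_ui_lang(eng_lang, ui_lang_map=None):
    """'en' --> 'en-us', 'de' --> 'de-de', 'pt-PT' --> 'pt-pt'."""
    ui_lang_map = _EXAMPLE_UI_LANG_MAP if ui_lang_map is None else ui_lang_map
    lang = eng_lang.lower()
    if "-" not in lang:
        # no region yet: append the mapped country, or repeat the language
        lang = lang + "-" + ui_lang_map.get(lang, lang)
    return lang


def _example_market_code(lang_tag, cc_tag, market_map=None):
    """('zh_Hant', 'tw') --> 'zh-tw'; ('zh_Hant', 'hk') --> 'en-hk'."""
    market_map = _EXAMPLE_MARKET_MAP if market_map is None else market_map
    lang = lang_tag.split("_")[0]  # drop the script part: zh_Hant --> zh
    code = f"{lang}-{cc_tag}"
    return market_map.get(code, code)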
================================================
FILE: docs/more_powerful_search_skill/extra/bing_images.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Bing-Images: description see :py:obj:`searx.engines.bing`."""
# pylint: disable=invalid-name
import json
from urllib.parse import urlencode
from lxml import html
from searx.engines.bing import set_bing_cookies
from searx.engines.bing import fetch_traits # pylint: disable=unused-import
# about
about = {
"website": 'https://www.bing.com/images',
"wikidata_id": 'Q182496',
"official_api_documentation": 'https://www.microsoft.com/en-us/bing/apis/bing-image-search-api',
"use_official_api": False,
"require_api_key": False,
"results": 'HTML',
}
# engine dependent config
categories = ['images', 'web']
paging = True
safesearch = True
time_range_support = True
base_url = 'https://www.bing.com/images/async'
"""Bing (Images) search URL"""
time_map = {
'day': 60 * 24,
'week': 60 * 24 * 7,
'month': 60 * 24 * 31,
'year': 60 * 24 * 365,
}
def request(query, params):
"""Assemble a Bing-Image request."""
engine_region = traits.get_region(params['searxng_locale'], traits.all_locale) # type: ignore
engine_language = traits.get_language(params['searxng_locale'], 'en') # type: ignore
set_bing_cookies(params, engine_language, engine_region)
# build URL query
# - example: https://www.bing.com/images/async?q=foo&async=content&first=1&count=35
query_params = {
'q': query,
'async': '1',
# to simplify the page count, use the default of 35 images per page
'first': (int(params.get('pageno', 1)) - 1) * 35 + 1,
'count': 35,
}
# time range
# - example: one year (525600 minutes) 'qft=+filterui:age-lt525600'
if params['time_range']:
query_params['qft'] = 'filterui:age-lt%s' % time_map[params['time_range']]
params['url'] = base_url + '?' + urlencode(query_params)
return params
def response(resp):
"""Get response from Bing-Images"""
results = []
dom = html.fromstring(resp.text)
for result in dom.xpath('//ul[contains(@class, "dgControl_list")]/li'):
metadata = result.xpath('.//a[@class="iusc"]/@m')
if not metadata:
continue
metadata = json.loads(metadata[0])
title = ' '.join(result.xpath('.//div[@class="infnmpt"]//a/text()')).strip()
img_format = ' '.join(result.xpath('.//div[@class="imgpt"]/div/span/text()')).strip().split(" · ")
source = ' '.join(result.xpath('.//div[@class="imgpt"]//div[@class="lnkw"]//a/text()')).strip()
results.append(
{
'template': 'images.html',
'url': metadata['purl'],
'thumbnail_src': metadata['turl'],
'img_src': metadata['murl'],
'content': metadata.get('desc'),
'title': title,
'source': source,
'resolution': img_format[0],
'img_format': img_format[1] if len(img_format) >= 2 else None,
}
)
return results
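# A minimal sketch (hypothetical helper, not used by the engine) of the query
# parameters that request() builds. Bing pages images in blocks of 35, so page N
# starts at result (N - 1) * 35 + 1. The time filter is an age limit in minutes,
# taken from a copy of time_map (an assumption of this sketch).
from urllib.parse import urlencode as _urlencode

_EXAMPLE_TIME_MAP = {"day": 60 * 24, "week": 60 * 24 * 7, "month": 60 * 24 * 31, "year": 60 * 24 * 365}


def _example_bing_images_query(query, pageno=1, time_range=None):
    params = {"q": query, "async": "1", "first": (int(pageno) - 1) * 35 + 1, "count": 35}
    if time_range:
        # e.g. one year = 525600 minutes --> 'filterui:age-lt525600'
        params["qft"] = "filterui:age-lt%s" % _EXAMPLE_TIME_MAP[time_range]
    return params, "https://www.bing.com/images/async?" + _urlencode(params)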
================================================
FILE: docs/more_powerful_search_skill/extra/bing_news.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Bing-News: description see :py:obj:`searx.engines.bing`.
.. hint::
Bing News is *different* in some ways!
"""
# pylint: disable=invalid-name
from urllib.parse import urlencode
from lxml import html
from searx.utils import eval_xpath, extract_text, eval_xpath_list, eval_xpath_getindex
from searx.enginelib.traits import EngineTraits
from searx.engines.bing import set_bing_cookies
# about
about = {
"website": 'https://www.bing.com/news',
"wikidata_id": 'Q2878637',
"official_api_documentation": 'https://www.microsoft.com/en-us/bing/apis/bing-news-search-api',
"use_official_api": False,
"require_api_key": False,
"results": 'RSS',
}
# engine dependent config
categories = ['news']
paging = True
"""If you go through the pages and there are no new results for another
page, Bing returns the results from the last page again."""
time_range_support = True
time_map = {
'day': 'interval="4"',
'week': 'interval="7"',
'month': 'interval="9"',
}
"""A string '4' means *last hour*. We use *last hour* for ``day`` here since the
difference between *last day* and *last week* in the result list is only marginal.
Bing does not have a ``year`` news range, so we use ``month`` instead."""
base_url = 'https://www.bing.com/news/infinitescrollajax'
"""Bing (News) search URL"""
def request(query, params):
"""Assemble a Bing-News request."""
engine_region = traits.get_region(params['searxng_locale'], traits.all_locale) # type: ignore
engine_language = traits.get_language(params['searxng_locale'], 'en') # type: ignore
set_bing_cookies(params, engine_language, engine_region)
# build URL query
#
# example: https://www.bing.com/news/infinitescrollajax?q=london&first=1
page = int(params.get('pageno', 1)) - 1
query_params = {
'q': query,
'InfiniteScroll': 1,
# to simplify the page count, use the default of 10 news items per page
'first': page * 10 + 1,
'SFX': page,
'form': 'PTFTNR',
'setlang': engine_region.split('-')[0],
'cc': engine_region.split('-')[-1],
}
if params['time_range']:
query_params['qft'] = time_map.get(params['time_range'], 'interval="9"')
params['url'] = base_url + '?' + urlencode(query_params)
return params
def response(resp):
"""Get response from Bing-News"""
results = []
if not resp.ok or not resp.text:
return results
dom = html.fromstring(resp.text)
for newsitem in eval_xpath_list(dom, '//div[contains(@class, "newsitem")]'):
link = eval_xpath_getindex(newsitem, './/a[@class="title"]', 0, None)
if link is None:
continue
url = link.attrib.get('href')
title = extract_text(link)
content = extract_text(eval_xpath(newsitem, './/div[@class="snippet"]'))
metadata = []
source = eval_xpath_getindex(newsitem, './/div[contains(@class, "source")]', 0, None)
if source is not None:
for item in (
eval_xpath_getindex(source, './/span[@aria-label]/@aria-label', 0, None),
# eval_xpath_getindex(source, './/a', 0, None),
# eval_xpath_getindex(source, './div/span', 3, None),
link.attrib.get('data-author'),
):
if item is not None:
t = extract_text(item)
if t and t.strip():
metadata.append(t.strip())
metadata = ' | '.join(metadata)
thumbnail = None
imagelink = eval_xpath_getindex(newsitem, './/a[@class="imagelink"]//img', 0, None)
if imagelink is not None:
thumbnail = imagelink.attrib.get('src')
if thumbnail and not thumbnail.startswith("https://www.bing.com"):
thumbnail = 'https://www.bing.com/' + thumbnail
results.append(
{
'url': url,
'title': title,
'content': content,
'thumbnail': thumbnail,
'metadata': metadata,
}
)
return results
def fetch_traits(engine_traits: EngineTraits):
"""Fetch languages and regions from Bing-News."""
# pylint: disable=import-outside-toplevel
from searx.engines.bing import fetch_traits as _f
_f(engine_traits)
# fix market codes not known by bing news:
# In bing the market code 'zh-cn' exists, but there is no 'news' category in
# bing for this market. Alternatively we use the the market code from Honk
# Kong. Even if this is not correct, it is better than having no hits at
# all, or sending false queries to bing that could raise the suspicion of a
# bot.
# HINT: 'en-hk' is the region code it does not indicate the language en!!
engine_traits.regions['zh-CN'] = 'en-hk'
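# A minimal sketch (hypothetical helper, not used by the engine) of the news
# request parameters. It uses 10 items per page. SFX carries the zero-based page
# index, and setlang/cc are split from the market code. Any time range missing
# from time_map (i.e. ``year``) falls back to the 'month' interval. The table is
# a copy of time_map and an assumption of this sketch.
_EXAMPLE_NEWS_TIME_MAP = {"day": 'interval="4"', "week": 'interval="7"', "month": 'interval="9"'}


def _example_bing_news_params(query, pageno, market_code, time_range=None):
    page = int(pageno) - 1
    params = {
        "q": query,
        "InfiniteScroll": 1,
        "first": page * 10 + 1,
        "SFX": page,
        "form": "PTFTNR",
        "setlang": market_code.split("-")[0],
        "cc": market_code.split("-")[-1],
    }
    if time_range:
        params["qft"] = _EXAMPLE_NEWS_TIME_MAP.get(time_range, 'interval="9"')
    return params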
================================================
FILE: docs/more_powerful_search_skill/extra/flickr.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
Flickr (Images)
More info on api-key : https://www.flickr.com/services/apps/create/
"""
from json import loads
from urllib.parse import urlencode
# about
about = {
"website": 'https://www.flickr.com',
"wikidata_id": 'Q103204',
"official_api_documentation": 'https://secure.flickr.com/services/api/flickr.photos.search.html',
"use_official_api": True,
"require_api_key": True,
"results": 'JSON',
}
categories = ['images']
nb_per_page = 15
paging = True
api_key = None
url = (
'https://api.flickr.com/services/rest/?method=flickr.photos.search'
+ '&api_key={api_key}&{text}&sort=relevance'
+ '&extras=description%2C+owner_name%2C+url_o%2C+url_n%2C+url_z'
+ '&per_page={nb_per_page}&format=json&nojsoncallback=1&page={page}'
)
photo_url = 'https://www.flickr.com/photos/{userid}/{photoid}'
def build_flickr_url(user_id, photo_id):
return photo_url.format(userid=user_id, photoid=photo_id)
def request(query, params):
params['url'] = url.format(
text=urlencode({'text': query}), api_key=api_key, nb_per_page=nb_per_page, page=params['pageno']
)
return params
def response(resp):
results = []
search_results = loads(resp.text)
# return empty array if there are no results
if 'photos' not in search_results:
return []
if 'photo' not in search_results['photos']:
return []
photos = search_results['photos']['photo']
# parse results
for photo in photos:
if 'url_o' in photo:
img_src = photo['url_o']
elif 'url_z' in photo:
img_src = photo['url_z']
else:
continue
# Thumbnail: prefer url_n, then url_z, then the full image (for a bigger thumbnail, use url_z instead of url_n)
if 'url_n' in photo:
thumbnail_src = photo['url_n']
elif 'url_z' in photo:
thumbnail_src = photo['url_z']
else:
thumbnail_src = img_src
# append result
results.append(
{
'url': build_flickr_url(photo['owner'], photo['id']),
'title': photo['title'],
'img_src': img_src,
'thumbnail_src': thumbnail_src,
'content': photo['description']['_content'],
'author': photo['ownername'],
'template': 'images.html',
}
)
# return results
return results
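# A minimal sketch (hypothetical helper, not used by the engine) of the URL
# fallback order in response(). The full image prefers url_o over url_z, and
# photos with neither are skipped. The thumbnail prefers url_n, then url_z, then
# the full image itself. This is a simplified equivalent: it uses truthiness
# where response() uses key membership.
def _example_flickr_urls(photo):
    img_src = photo.get("url_o") or photo.get("url_z")
    if img_src is None:
        return None  # response() skips such photos
    thumbnail_src = photo.get("url_n") or photo.get("url_z") or img_src
    return img_src, thumbnail_src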
================================================
FILE: docs/more_powerful_search_skill/extra/quark.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Quark (Shenma) search engine for searxng"""
from urllib.parse import urlencode
from datetime import datetime
import re
import json
from searx.utils import html_to_text
from searx.exceptions import SearxEngineAPIException, SearxEngineCaptchaException
# Metadata
about = {
"website": "https://quark.sm.cn/",
"wikidata_id": "Q48816502",
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
"language": "zh",
}
# Engine Configuration
categories = []
paging = True
results_per_page = 10
quark_category = 'general'
time_range_support = True
time_range_dict = {'day': '4', 'week': '3', 'month': '2', 'year': '1'}
CAPTCHA_PATTERN = r'\{[^{]*?"action"\s*:\s*"captcha"\s*,\s*"url"\s*:\s*"([^"]+)"[^{]*?\}'
def is_alibaba_captcha(html):
"""
Detects if the response contains an Alibaba X5SEC CAPTCHA page.
Quark may return a CAPTCHA challenge after 9 requests in a short period.
Typically, the ban duration is around 15 minutes.
"""
return bool(re.search(CAPTCHA_PATTERN, html))
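# A minimal sketch (hypothetical helper, not used by the engine) showing what
# CAPTCHA_PATTERN matches: a JSON-like object whose "action" is "captcha",
# followed by a "url" whose value is captured in group 1. The pattern expects
# "action" to come before "url". The sample strings in the test are made up.
import re as _re

_EXAMPLE_CAPTCHA_PATTERN = r'\{[^{]*?"action"\s*:\s*"captcha"\s*,\s*"url"\s*:\s*"([^"]+)"[^{]*?\}'


def _example_captcha_url(text):
    """Return the CAPTCHA URL if the page contains a challenge, else None."""
    match = _re.search(_EXAMPLE_CAPTCHA_PATTERN, text)
    return match.group(1) if match else None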
def init(_):
if quark_category not in ('general', 'images'):
raise SearxEngineAPIException(f"Unsupported category: {quark_category}")
def request(query, params):
page_num = params["pageno"]
category_config = {
'general': {
'endpoint': 'https://quark.sm.cn/s',
'params': {
"q": query,
"layout": "html",
"page": page_num,
},
},
'images': {
'endpoint': 'https://vt.sm.cn/api/pic/list',
'params': {
"query": query,
"limit": results_per_page,
"start": (page_num - 1) * results_per_page,
},
},
}
query_params = category_config[quark_category]['params']
query_url = category_config[quark_category]['endpoint']
if time_range_dict.get(params['time_range']) and quark_category == 'general':
query_params["tl_request"] = time_range_dict.get(params['time_range'])
params["url"] = f"{query_url}?{urlencode(query_params)}"
return params
def response(resp):
results = []
text = resp.text
if is_alibaba_captcha(text):
raise SearxEngineCaptchaException(
suspended_time=900, message="Alibaba CAPTCHA detected. Please try again later."
)
if quark_category == 'images':
data = json.loads(text)
for item in data.get('data', {}).get('hit', {}).get('imgInfo', {}).get('item', []):
try:
published_date = datetime.fromtimestamp(int(item.get("publish_time")))
except (ValueError, TypeError):
published_date = None
results.append(
{
"template": "images.html",
"url": item.get("imgUrl"),
"thumbnail_src": item.get("img"),
"img_src": item.get("bigPicUrl"),
"title": item.get("title"),
"source": item.get("site"),
"resolution": f"{item['width']} x {item['height']}",
"publishedDate": published_date,
}
)
if quark_category == 'general':
# Quark returns a variety of different sc values on a single page, depending on the query type.
source_category_parsers = {
'addition': parse_addition,
'ai_page': parse_ai_page,
'baike_sc': parse_baike_sc,
'finance_shuidi': parse_finance_shuidi,
'kk_yidian_all': parse_kk_yidian_all,
'life_show_general_image': parse_life_show_general_image,
'med_struct': parse_med_struct,
'music_new_song': parse_music_new_song,
'nature_result': parse_nature_result,
'news_uchq': parse_news_uchq,
'ss_note': parse_ss_note,
# ss_kv, ss_pic, ss_text, ss_video, baike, structure_web_novel use the same struct as ss_doc
'ss_doc': parse_ss_doc,
'ss_kv': parse_ss_doc,
'ss_pic': parse_ss_doc,
'ss_text': parse_ss_doc,
'ss_video': parse_ss_doc,
'baike': parse_ss_doc,
'structure_web_novel': parse_ss_doc,
'travel_dest_overview': parse_travel_dest_overview,
'travel_ranking_list': parse_travel_ranking_list,
}
pattern = r'<script\s+type="application/json"\s+id="s-data-[^"]+"\s+data-used-by="hydrate">(.*?)</script>'
matches = re.findall(pattern, text, re.DOTALL)
for match in matches:
data = json.loads(match)
initial_data = data.get('data', {}).get('initialData', {})
extra_data = data.get('extraData', {})
source_category = extra_data.get('sc')
parsers = source_category_parsers.get(source_category)
if parsers:
parsed_results = parsers(initial_data)
if isinstance(parsed_results, list):
# Extend if the result is a list
results.extend(parsed_results)
else:
# Append if it's a single result
results.append(parsed_results)
return results
def parse_addition(data):
return {
"title": html_to_text(data.get('title', {}).get('content')),
"url": data.get('source', {}).get('url'),
"content": html_to_text(data.get('summary', {}).get('content')),
}
def parse_ai_page(data):
results = []
for item in data.get('list', []):
content = (
" | ".join(map(str, item.get('content', [])))
if isinstance(item.get('content'), list)
else str(item.get('content'))
)
try:
published_date = datetime.fromtimestamp(int(item.get('source', {}).get('time')))
except (ValueError, TypeError):
published_date = None
results.append(
{
"title": html_to_text(item.get('title')),
"url": item.get('url'),
"content": html_to_text(content),
"publishedDate": published_date,
}
)
return results
def parse_baike_sc(data):
return {
"title": html_to_text(data.get('data', {}).get('title')),
"url": data.get('data', {}).get('url'),
"content": html_to_text(data.get('data', {}).get('abstract')),
"thumbnail": (data.get('data', {}).get('img') or "").replace("http://", "https://"),
}
def parse_finance_shuidi(data):
content = " | ".join(
(
info
for info in [
data.get('establish_time'),
data.get('company_status'),
data.get('controled_type'),
data.get('company_type'),
data.get('capital'),
data.get('address'),
data.get('business_scope'),
]
if info
)
)
return {
"title": html_to_text(data.get('company_name')),
"url": data.get('title_url'),
"content": html_to_text(content),
}
def parse_kk_yidian_all(data):
content_list = []
for section in data.get('list_container', []):
for item in section.get('list_container', []):
if 'dot_text' in item:
content_list.append(item['dot_text'])
return {
"title": html_to_text(data.get('title')),
"url": data.get('title_url'),
"content": html_to_text(' '.join(content_list)),
}
def parse_life_show_general_image(data):
results = []
for item in data.get('image', []):
try:
published_date = datetime.fromtimestamp(int(item.get("publish_time")))
except (ValueError, TypeError):
published_date = None
results.append(
{
"template": "images.html",
"url": item.get("imgUrl"),
"thumbnail_src": item.get("img"),
"img_src": item.get("bigPicUrl"),
"title": item.get("title"),
"source": item.get("site"),
"resolution": f"{item['width']} x {item['height']}",
"publishedDate": published_date,
}
)
return results
def parse_med_struct(data):
return {
"title": html_to_text(data.get('title')),
"url": data.get('message', {}).get('statistics', {}).get('nu'),
"content": html_to_text(data.get('message', {}).get('content_text')),
"thumbnail": (data.get('message', {}).get('video_img') or "").replace("http://", "https://"),
}
def parse_music_new_song(data):
results = []
for item in data.get('hit3', []):
results.append(
{
"title": f"{item['song_name']} | {item['song_singer']}",
"url": item.get("play_url"),
"content": html_to_text(item.get("lyrics")),
"thumbnail": (item.get("image_url") or "").replace("http://", "https://"),
}
)
return results
def parse_nature_result(data):
return {"title": html_to_text(data.get('title')), "url": data.get('url'), "content": html_to_text(data.get('desc'))}
def parse_news_uchq(data):
results = []
for item in data.get('feed', []):
try:
published_date = datetime.strptime(item.get('time'), "%Y-%m-%d")
except (ValueError, TypeError):
# Quark sometimes returns a non-standard format such as "1天前" ("1 day ago"); set published_date to None
published_date = None
results.append(
{
"title": html_to_text(item.get('title')),
"url": item.get('url'),
"content": html_to_text(item.get('summary')),
"thumbnail": (item.get('image') or "").replace("http://", "https://"),
"publishedDate": published_date,
}
)
return results
def parse_ss_doc(data):
published_date = None
try:
timestamp = int(data.get('sourceProps', {}).get('time'))
# Quark sometimes returns 0; leave published_date as None in that case
if timestamp != 0:
published_date = datetime.fromtimestamp(timestamp)
except (ValueError, TypeError):
pass
try:
thumbnail = data.get('picListProps', [])[0].get('src').replace("http://", "https://")
except (ValueError, TypeError, IndexError, AttributeError):
thumbnail = None
return {
"title": html_to_text(
data.get('titleProps', {}).get('content')
# ss_kv variant 1 & 2
or data.get('title')
),
"url": data.get('sourceProps', {}).get('dest_url')
# ss_kv variant 1
or data.get('normal_url')
# ss_kv variant 2
or data.get('url'),
"content": html_to_text(
data.get('summaryProps', {}).get('content')
# ss_doc variant 1
or data.get('message', {}).get('replyContent')
# ss_kv variant 1
or data.get('show_body')
# ss_kv variant 2
or data.get('desc')
),
"publishedDate": published_date,
"thumbnail": thumbnail,
}
def parse_ss_note(data):
try:
published_date = datetime.fromtimestamp(int(data.get('source', {}).get('time')))
except (ValueError, TypeError):
published_date = None
return {
"title": html_to_text(data.get('title', {}).get('content')),
"url": data.get('source', {}).get('dest_url'),
"content": html_to_text(data.get('summary', {}).get('content')),
"publishedDate": published_date,
}
def parse_travel_dest_overview(data):
return {
"title": html_to_text(data.get('strong', {}).get('title')),
"url": data.get('strong', {}).get('baike_url'),
"content": html_to_text(data.get('strong', {}).get('baike_text')),
}
def parse_travel_ranking_list(data):
return {
"title": html_to_text(data.get('title', {}).get('text')),
"url": data.get('title', {}).get('url'),
"content": html_to_text(data.get('title', {}).get('title_tag')),
}
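# A minimal, self-contained sketch (hypothetical helper, not used by the engine)
# of how response() extracts the hydrate JSON blobs and dispatches them by their
# "sc" (source category) value. The parser table passed in is an assumption used
# only for illustration.
import json as _json
import re as _re

_HYDRATE_PATTERN = r'<script\s+type="application/json"\s+id="s-data-[^"]+"\s+data-used-by="hydrate">(.*?)</script>'


def _example_dispatch_hydrate(text, parsers):
    """Collect the results of the parser registered for each blob's ``sc``."""
    results = []
    for blob in _re.findall(_HYDRATE_PATTERN, text, _re.DOTALL):
        data = _json.loads(blob)
        parser = parsers.get(data.get('extraData', {}).get('sc'))
        if parser is None:
            continue  # unknown source category: skip it
        parsed = parser(data.get('data', {}).get('initialData', {}))
        # parsers return either one result dict or a list of them
        results.extend(parsed if isinstance(parsed, list) else [parsed])
    return results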
================================================
FILE: docs/more_powerful_search_skill/extra/wikipedia.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""This module implements the Wikipedia engine. Some of these implementations
are shared by other engines:
- :ref:`wikidata engine`
The list of supported languages is :py:obj:`fetched <fetch_wikimedia_traits>` from
the article linked by :py:obj:`list_of_wikipedias`.
Unlike traditional search engines, Wikipedia does not have one Wikipedia for
all languages; instead, there is one Wikipedia for each supported language. Some of
these Wikipedias have a LanguageConverter_ enabled
(:py:obj:`rest_v1_summary_url`).
A LanguageConverter_ (LC) is a system based on language variants that
automatically converts the content of a page into a different variant. A variant
is mostly the same language in a different script.
- `Wikipedias in multiple writing systems`_
- `Automatic conversion between traditional and simplified Chinese characters`_
PR-2554_:
The Wikipedia link returned by the API is still the same in all cases
(`https://zh.wikipedia.org/wiki/出租車`_) but if your browser's
``Accept-Language`` is set to any of ``zh``, ``zh-CN``, ``zh-TW``, ``zh-HK``
or .. Wikipedia's LC automatically returns the desired script in their
web-page.
- You can test the API here: https://reqbin.com/gesg2kvx
.. _https://zh.wikipedia.org/wiki/出租車:
https://zh.wikipedia.org/wiki/%E5%87%BA%E7%A7%9F%E8%BB%8A
To support Wikipedia's LanguageConverter_, a SearXNG request to Wikipedia uses
:py:obj:`get_wiki_params` and :py:obj:`wiki_lc_locale_variants` in the
:py:obj:`fetch_wikimedia_traits` function.
To test in SearXNG, query for ``!wp 出租車`` with each of the available Chinese
options:
- ``!wp 出租車 :zh`` should show 出租車
- ``!wp 出租車 :zh-CN`` should show 出租车
- ``!wp 出租車 :zh-TW`` should show 計程車
- ``!wp 出租車 :zh-HK`` should show 的士
- ``!wp 出租車 :zh-SG`` should show 德士
.. _LanguageConverter:
https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter
.. _Wikipedias in multiple writing systems:
https://meta.wikimedia.org/wiki/Wikipedias_in_multiple_writing_systems
.. _Automatic conversion between traditional and simplified Chinese characters:
https://en.wikipedia.org/wiki/Chinese_Wikipedia#Automatic_conversion_between_traditional_and_simplified_Chinese_characters
.. _PR-2554: https://github.com/searx/searx/pull/2554
"""
import urllib.parse
import babel
from lxml import html
from searx import locales, utils
from searx import network as _network
from searx.enginelib.traits import EngineTraits
# about
about = {
"website": "https://www.wikipedia.org/",
"wikidata_id": "Q52",
"official_api_documentation": "https://en.wikipedia.org/api/",
"use_official_api": True,
"require_api_key": False,
"results": "JSON",
}
display_type = ["infobox"]
"""A list of display types composed of ``infobox`` and ``list``. ``list`` adds
a hit to the result list; ``infobox`` shows the hit in the info box. Either
value or both can be set."""
list_of_wikipedias = "https://meta.wikimedia.org/wiki/List_of_Wikipedias"
"""`List of all wikipedias <https://meta.wikimedia.org/wiki/List_of_Wikipedias>`_
"""
wikipedia_article_depth = "https://meta.wikimedia.org/wiki/Wikipedia_article_depth"
"""The *editing depth* of Wikipedia is one of several possible rough indicators
of the encyclopedia's collaborative quality, showing how frequently its articles
are updated. The measurement of depth was introduced after some limitations of
the classic measurement of article count were realized.
"""
rest_v1_summary_url = "https://{wiki_netloc}/api/rest_v1/page/summary/{title}"
"""
`wikipedia rest_v1 summary API`_:
The summary response includes an extract of the first paragraph of the page in
plain text and HTML as well as the type of page. This is useful for page
previews (fka. Hovercards, aka. Popups) on the web and link previews in the
apps.
HTTP ``Accept-Language`` header (``send_accept_language_header``):
The desired language variant code for wikis where LanguageConverter_ is
enabled.
.. _wikipedia rest_v1 summary API:
https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_summary__title_
"""
wiki_lc_locale_variants = {
"zh": (
"zh-CN",
"zh-HK",
"zh-MO",
"zh-MY",
"zh-SG",
"zh-TW",
),
"zh-classical": ("zh-classical",),
}
"""Mapping rule of the LanguageConverter_ to map a language and its variants to
a Locale (used in the HTTP ``Accept-Language`` header). For example see `LC
Chinese`_.
.. _LC Chinese:
https://meta.wikimedia.org/wiki/Wikipedias_in_multiple_writing_systems#Chinese
"""
wikipedia_script_variants = {
"zh": (
"zh_Hant",
"zh_Hans",
)
}
def get_wiki_params(sxng_locale, eng_traits):
"""Returns the Wikipedia language tag and the netloc that fits to the
``sxng_locale``. To support LanguageConverter_ this function rates a locale
(region) higher than a language (compare :py:obj:`wiki_lc_locale_variants`).
"""
eng_tag = eng_traits.get_region(sxng_locale, eng_traits.get_language(sxng_locale, "en"))
wiki_netloc = eng_traits.custom["wiki_netloc"].get(eng_tag, "en.wikipedia.org")
return eng_tag, wiki_netloc
def request(query, params):
"""Assemble a request (`wikipedia rest_v1 summary API`_)."""
if query.islower():
query = query.title()
_eng_tag, wiki_netloc = get_wiki_params(params["searxng_locale"], traits)
title = urllib.parse.quote(query)
params["url"] = rest_v1_summary_url.format(wiki_netloc=wiki_netloc, title=title)
params["raise_for_httperror"] = False
params["soft_max_redirects"] = 2
return params
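# A minimal sketch (hypothetical helper, not used by the engine) of the URL that
# request() builds. An all-lower-case query is title-cased, presumably to match
# Wikipedia's capitalized article titles. The title is then percent-encoded as
# UTF-8. The netloc is passed in directly instead of being looked up in the
# traits.
import urllib.parse as _urlparse


def _example_summary_url(query, wiki_netloc="en.wikipedia.org"):
    if query.islower():
        query = query.title()
    title = _urlparse.quote(query)
    return "https://{wiki_netloc}/api/rest_v1/page/summary/{title}".format(wiki_netloc=wiki_netloc, title=title)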
# get response from search-request
def response(resp):
results = []
if resp.status_code == 404:
return []
if resp.status_code == 400:
try:
api_result = resp.json()
except Exception: # pylint: disable=broad-except
pass
else:
if (
api_result["type"] == "https://mediawiki.org/wiki/HyperSwitch/errors/bad_request"
and api_result["detail"] == "title-invalid-characters"
):
return []
_network.raise_for_httperror(resp)
api_result = resp.json()
title = utils.html_to_text(api_result.get("titles", {}).get("display") or api_result.get("title"))
wikipedia_link = api_result["content_urls"]["desktop"]["page"]
if "list" in display_type or api_result.get("type") != "standard":
# show the item in the result list if 'list' is in the display options or if
# it is an item that can't be displayed in an infobox.
results.append(
{
"url": wikipedia_link,
"title": title,
"content": api_result.get("description", ""),
}
)
if "infobox" in display_type:
if api_result.get("type") == "standard":
results.append(
{
"infobox": title,
"id": wikipedia_link,
"content": api_result.get("extract", ""),
"img_src": api_result.get("thumbnail", {}).get("source"),
"urls": [{"title": "Wikipedia", "url": wikipedia_link}],
}
)
return results
# Nonstandard language codes
#
# These Wikipedias use language codes that do not conform to the ISO 639
# standard (which is how wiki subdomains are chosen nowadays).
lang_map = locales.LOCALE_BEST_MATCH.copy()
lang_map.update(
{
"be-tarask": "bel",
"ak": "aka",
"als": "gsw",
"bat-smg": "sgs",
"cbk-zam": "cbk",
"fiu-vro": "vro",
"map-bms": "map",
"no": "nb-NO",
"nrm": "nrf",
"roa-rup": "rup",
"nds-nl": "nds",
# "simple": invented code used for the Simple English Wikipedia (not the official IETF code en-simple)
"zh-min-nan": "nan",
"zh-yue": "yue",
"an": "arg",
}
)
def fetch_traits(engine_traits: EngineTraits):
fetch_wikimedia_traits(engine_traits)
print("WIKIPEDIA_LANGUAGES: %s" % len(engine_traits.custom["WIKIPEDIA_LANGUAGES"]))
def fetch_wikimedia_traits(engine_traits: EngineTraits):
"""Fetch languages from Wikipedia. Not all languages from the
:py:obj:`list_of_wikipedias` are supported by SearXNG locales, only those
known from :py:obj:`searx.locales.LOCALE_NAMES` or those with a minimal
:py:obj:`editing depth <wikipedia_article_depth>`.
The location of the Wikipedia address of a language is mapped in a
:py:obj:`custom field <searx.enginelib.traits.EngineTraits.custom>`
(``wiki_netloc``). Here is a reduced example:
.. code:: python
traits.custom['wiki_netloc'] = {
"en": "en.wikipedia.org",
..
"gsw": "als.wikipedia.org",
..
"zh": "zh.wikipedia.org",
"zh-classical": "zh-classical.wikipedia.org"
}
"""
# pylint: disable=import-outside-toplevel, too-many-branches
from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.utils import searxng_useragent
engine_traits.custom["wiki_netloc"] = {}
engine_traits.custom["WIKIPEDIA_LANGUAGES"] = []
# insert alias to map from a script or region to a wikipedia variant
for eng_tag, sxng_tag_list in wikipedia_script_variants.items():
for sxng_tag in sxng_tag_list:
engine_traits.languages[sxng_tag] = eng_tag
for eng_tag, sxng_tag_list in wiki_lc_locale_variants.items():
for sxng_tag in sxng_tag_list:
engine_traits.regions[sxng_tag] = eng_tag
headers = {"Accept": "*/*", "User-Agent": searxng_useragent()}
resp = get(list_of_wikipedias, timeout=5, headers=headers)
if not resp.ok:
raise RuntimeError("Response from Wikipedia is not OK.")
dom = html.fromstring(resp.text)
for row in dom.xpath('//table[contains(@class,"sortable")]//tbody/tr'):
cols = row.xpath("./td")
if not cols:
continue
cols = [c.text_content().strip() for c in cols]
depth = float(cols[11].replace("-", "0").replace(",", ""))
articles = int(cols[4].replace(",", "").replace(",", ""))
eng_tag = cols[3]
wiki_url = row.xpath("./td[4]/a/@href")[0]
wiki_url = urllib.parse.urlparse(wiki_url)
try:
sxng_tag = locales.language_tag(babel.Locale.parse(lang_map.get(eng_tag, eng_tag), sep="-"))
except babel.UnknownLocaleError:
# print("ERROR: %s [%s] is unknown by babel" % (cols[0], eng_tag))
continue
finally:
engine_traits.custom["WIKIPEDIA_LANGUAGES"].append(eng_tag)
if sxng_tag not in locales.LOCALE_NAMES:
if articles < 10000:
# exclude languages with too few articles
continue
if int(depth) < 20:
# Rough indicator of a Wikipedia’s quality, showing how
# frequently its articles are updated.
continue
conflict = engine_traits.languages.get(sxng_tag)
if conflict:
if conflict != eng_tag:
print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, eng_tag))
continue
engine_traits.languages[sxng_tag] = eng_tag
engine_traits.custom["wiki_netloc"][eng_tag] = wiki_url.netloc
engine_traits.custom["WIKIPEDIA_LANGUAGES"].sort()
================================================
FILE: docs/more_powerful_search_skill/extra/youtube_noapi.py
================================================
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Youtube (Videos)"""
from functools import reduce
from json import loads, dumps
from urllib.parse import quote_plus
from searx.utils import extr
# about
about = {
"website": 'https://www.youtube.com/',
"wikidata_id": 'Q866',
"official_api_documentation": 'https://developers.google.com/youtube/v3/docs/search/list?apix=true',
"use_official_api": False,
"require_api_key": False,
"results": 'HTML',
}
# engine dependent config
categories = ['videos', 'music']
paging = True
language_support = False
time_range_support = True
# search-url
base_url = 'https://www.youtube.com/results'
search_url = base_url + '?search_query={query}&page={page}'
time_range_url = '&sp=EgII{time_range}%253D%253D'
# the key seems to be constant
next_page_url = f'https://www.youtube.com/youtubei/v1/search?key={key}'
time_range_dict = {'day': 'Ag', 'week': 'Aw', 'month': 'BA', 'year': 'BQ'}
base_youtube_url = 'https://www.youtube.com/watch?v='

# do search-request
def request(query, params):
    params['cookies']['CONSENT'] = "YES+"
    if not params['engine_data'].get('next_page_token'):
        params['url'] = search_url.format(query=quote_plus(query), page=params['pageno'])
        if params['time_range'] in time_range_dict:
            params['url'] += time_range_url.format(time_range=time_range_dict[params['time_range']])
    else:
        params['url'] = next_page_url
        params['method'] = 'POST'
        params['data'] = dumps(
            {
                'context': {"client": {"clientName": "WEB", "clientVersion": "2.20210310.12.01"}},
                'continuation': params['engine_data']['next_page_token'],
            }
        )
        params['headers']['Content-Type'] = 'application/json'
    return params


# get response from search-request
def response(resp):
    if resp.search_params.get('engine_data'):
        return parse_next_page_response(resp.text)
    return parse_first_page_response(resp.text)


def parse_next_page_response(response_text):
    results = []
    result_json = loads(response_text)
    for section in (
        result_json['onResponseReceivedCommands'][0]
        .get('appendContinuationItemsAction')['continuationItems'][0]
        .get('itemSectionRenderer')['contents']
    ):
        if 'videoRenderer' not in section:
            continue
        section = section['videoRenderer']
        content = "-"
        if 'descriptionSnippet' in section:
            content = ' '.join(x['text'] for x in section['descriptionSnippet']['runs'])
        results.append(
            {
                'url': base_youtube_url + section['videoId'],
                'title': ' '.join(x['text'] for x in section['title']['runs']),
                'content': content,
                'author': section['ownerText']['runs'][0]['text'],
                'length': section['lengthText']['simpleText'],
                'template': 'videos.html',
                'iframe_src': 'https://www.youtube-nocookie.com/embed/' + section['videoId'],
                'thumbnail': section['thumbnail']['thumbnails'][-1]['url'],
            }
        )
    try:
        token = (
            result_json['onResponseReceivedCommands'][0]
            .get('appendContinuationItemsAction')['continuationItems'][1]
            .get('continuationItemRenderer')['continuationEndpoint']
            .get('continuationCommand')['token']
        )
        results.append(
            {
                "engine_data": token,
                "key": "next_page_token",
            }
        )
    except:  # pylint: disable=bare-except
        pass
    return results


def parse_first_page_response(response_text):
    results = []
    results_data = extr(response_text, 'ytInitialData = ', ';</script>')
    results_json = loads(results_data) if results_data else {}
    sections = (
        results_json.get('contents', {})
        .get('twoColumnSearchResultsRenderer', {})
        .get('primaryContents', {})
        .get('sectionListRenderer', {})
        .get('contents', [])
    )
    for section in sections:
        if "continuationItemRenderer" in section:
            next_page_token = (
                section["continuationItemRenderer"]
                .get("continuationEndpoint", {})
                .get("continuationCommand", {})
                .get("token", "")
            )
            if next_page_token:
                results.append(
                    {
                        "engine_data": next_page_token,
                        "key": "next_page_token",
                    }
                )
        for video_container in section.get('itemSectionRenderer', {}).get('contents', []):
            video = video_container.get('videoRenderer', {})
            videoid = video.get('videoId')
            if videoid is not None:
                url = base_youtube_url + videoid
                thumbnail = 'https://i.ytimg.com/vi/' + videoid + '/hqdefault.jpg'
                title = get_text_from_json(video.get('title', {}))
                content = get_text_from_json(video.get('descriptionSnippet', {}))
                author = get_text_from_json(video.get('ownerText', {}))
                length = get_text_from_json(video.get('lengthText', {}))
                # append result
                results.append(
                    {
                        'url': url,
                        'title': title,
                        'content': content,
                        'author': author,
                        'length': length,
                        'template': 'videos.html',
                        'iframe_src': 'https://www.youtube-nocookie.com/embed/' + videoid,
                        'thumbnail': thumbnail,
                    }
                )
    # return results
    return results


def get_text_from_json(element):
    if 'runs' in element:
        return reduce(lambda a, b: a + b.get('text', ''), element.get('runs'), '')
    return element.get('simpleText', '')
================================================
FILE: docs/more_powerful_search_skill/rss_parsor.py
================================================
import httpx
import feedparser
from reference.async_logger import wis_logger
from reference.wis import CrawlResult, SqliteCache
from typing import List, Optional, Set, Tuple
from reference.wis.ws_connect import notify_user
from reference.tools.general_utils import normalize_publish_date
import asyncio


async def fetch_rss(
    url: str,
    existings: Optional[Set[str]] = None,
    cache_manager: Optional[SqliteCache] = None,
) -> Tuple[List[CrawlResult], str, dict]:
    # Use None as the default so that calls never share one mutable set.
    if existings is None:
        existings = set()
    entries = None
    if cache_manager:
        entries = await cache_manager.get(url, namespace='rss')
        if entries == '**empty**':
            return [], '', {}
    if not entries:
        max_retries = 3
        base_delay = 10  # seconds
        for attempt in range(max_retries):
            try:
                async with httpx.AsyncClient(timeout=30) as client:
                    response = await client.get(url)
                    response.raise_for_status()
                    content = response.content  # bytes
                break
            except Exception as e:
                if attempt < max_retries - 1:
                    # Exponential backoff: 10 s, then 20 s.
                    delay = base_delay * (2 ** attempt)
                    wis_logger.debug(f"fetching RSS from {url} attempt {attempt + 1} failed with error: {str(e)}, retrying in {delay} seconds")
                    await asyncio.sleep(delay)
                else:
                    wis_logger.warning(f"fetching RSS from {url} failed after {max_retries} attempts with error: {str(e)}")
                    await notify_user(15, [url])
                    return [], '', {}
        parsed = feedparser.parse(content)
        if parsed.get("bozo", False):
            wis_logger.warning(f"Error parsing RSS from {url}: {parsed.get('bozo_exception', '')}")
            raise RuntimeError(f"RSS from {url}: {parsed.get('bozo_exception', '')}")
        entries = parsed.entries
        if cache_manager:
            await cache_manager.set(url, entries, 60*24, namespace='rss')

    results = []
    markdown = ''
    link_dict = {}
    for entry in entries:
        html_parts = []
        description = ''
        article_url = entry.get('link', url)
        if article_url in existings:
            continue
        # 1. If the entry has a `content` field, iterate over each content_item
        if 'content' in entry and entry['content']:
            for content_item in entry['content']:
                t = content_item.get('type', '').lower()
                if t.startswith('text/') or t == 'application/xhtml+xml':
                    # try several possible field names
                    for key in ['value', 'body', 'content']:
                        if key in content_item:
                            html_parts.append(content_item[key])
                            break
        # 2. Without a `content` field, fall back to the longer of summary and description
        if not html_parts:
            summary = entry.get('summary', '')
            description = entry.get('description', '')
            if len(summary) > len(description):
                description = summary
            if len(description) > 50:
                html_parts.append(description)
                description = ''
        if not html_parts and not description:
            wis_logger.debug(f"No content or summary or description found for {article_url} from rss: {url}")
            continue
        # 3. Join all collected parts into a single HTML document
        author = entry.get('author', '')
        title = entry.get('title', '')
        publish_date = normalize_publish_date(entry.get('published', '')) or entry.get('published', '')
        if html_parts:
            html = '\n\n'.join(html_parts)
            results.append(CrawlResult(
                url=article_url,
                html=html,
                title=title,
                author=author,
                publish_date=publish_date,
            ))
            # existings.add(article_url) is deferred until LLM extraction has finished
        elif description and article_url != url:
            key = f"[{len(link_dict)+1}]"
            link_dict[key] = article_url
            markdown += f"* {key}{description} (Author: {author} Publish Date: {publish_date}) {key}\n"
            # existings.add(article_url) is deferred until LLM extraction has finished
    return results, markdown, link_dict
================================================
FILE: docs/prompt_videos.md
================================================
https://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089
================================================
FILE: openclaw.version
================================================
# OpenClaw upstream version pin
# All addon developers and CI read this file, so everyone develops and tests against the same version
# Format follows the openclaw-for-business/openclaw.version convention
#
# Usage (shell):
# source openclaw.version
# git clone https://github.com/openclaw/openclaw openclaw
# git -C openclaw checkout $OPENCLAW_COMMIT
#
OPENCLAW_VERSION=2026.3.13
OPENCLAW_COMMIT=61d171ab0b2fe4abc9afe89c518586274b4b76c2
================================================
FILE: scripts/generate-patch.sh
================================================
#!/bin/bash
set -e
cd "$(dirname "$0")/.."
if [ -z "$1" ]; then
  echo "Usage: ./scripts/generate-patch.sh <patch-name>"
  exit 1
fi
PATCH_NAME="$1"
PATCHES_DIR="patches"
mkdir -p "$PATCHES_DIR"
cd openclaw
# Determine the next patch number
LAST_NUM=$(ls ../$PATCHES_DIR/*.patch 2>/dev/null | sed 's/.*\/\([0-9]*\)-.*/\1/' | sort -n | tail -1)
NEXT_NUM=$(printf "%03d" $((10#${LAST_NUM:-0} + 1)))
PATCH_FILE="../$PATCHES_DIR/${NEXT_NUM}-${PATCH_NAME}.patch"
echo "📝 Generating patch: $(basename "$PATCH_FILE")"
git diff -- . ':(exclude)pnpm-lock.yaml' > "$PATCH_FILE"
if [ ! -s "$PATCH_FILE" ]; then
  echo "⚠️ No changes detected"
  rm "$PATCH_FILE"
  exit 1
fi
echo "✅ Patch generated: $PATCH_FILE"
================================================
FILE: tests/README.md
================================================
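All of the tests below require that:
- a Markdown file is created under `test` to record the result (success/fail) of each run, finally compiled into a single list;
- the snapshot captured in each test is also saved to a local file for later reference.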
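A minimal sketch of these two requirements, assuming each case yields an id, a status string, and the captured snapshot text. The function `record_run` and the file names `results.md` and `snapshots/<case_id>.txt` are hypothetical, not taken from the test scripts:

```python
from pathlib import Path


def record_run(out_dir: Path, results: list[tuple[str, str, str]]) -> Path:
    """Save each snapshot to its own file and summarize the statuses as a Markdown list.

    `results` holds (case_id, status, snapshot_text) tuples, where status is
    expected to be "success" or "fail". All names here are hypothetical.
    """
    snapshots = out_dir / "snapshots"
    snapshots.mkdir(parents=True, exist_ok=True)
    lines = ["# Test results", ""]
    for case_id, status, snapshot in results:
        # Keep the raw snapshot on disk so a failed case can be inspected later.
        (snapshots / f"{case_id}.txt").write_text(snapshot, encoding="utf-8")
        lines.append(f"- {case_id}: {status}")
    report = out_dir / "results.md"
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return report
```

Writing the list only after all snapshots are saved keeps the report and the snapshot files consistent for a finished run.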
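# Corporate and Government Websites
Open each of the pages below, **6 pages concurrently per batch**, and check that each page opens normally without triggering anti-bot detection and that a valid snapshot can be captured.
## Test Cases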
```
https://www.komatsu.com/en-us
https://www.putzmeister.com/web/european-union
https://www.liebherr.com/en-hk/group/start-page-5221008
https://www.cat.com/global-selector.html
https://www.bing.com/search?q=%E5%8D%8E%E5%B0%94%E8%A1%97%E6%97%A5%E6%8A%A5
https://www.wsj.com/
https://www.bloomberg.com
https://www.justice.gov/
http://www.china-cer.com.cn/policy_base/
https://zjw.sh.gov.cn/zwgk/index.html#tab2-a
https://fgj.sh.gov.cn/gfxwj/index.html
https://rsj.sh.gov.cn/tgwgfx_17726/index.html
https://ybj.sh.gov.cn/dybz3/index.html
https://www.mohurd.gov.cn/gongkai/fdzdgknr/zgzygwywj/index.html
https://www.shanghai.gov.cn/nw39221/index.html
https://www.shanghai.gov.cn/nw39220/index.html
https://www.shanghai.gov.cn/nw11408/index.html
https://www.shanghai.gov.cn/nw11407/index.html
https://www.shanghai.gov.cn/nw2407/index.html
https://www.shanghai.gov.cn/nw42850/index.html
https://www.shanghai.gov.cn/nw42944/index.html
https://www.gov.cn/zhengce/index.htm
```
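# Search Engine Stress Test
Open search result pages under load, **6 pages concurrently per batch**, and check that they return normally and that snapshot content is captured successfully.
# Pre-Login Simulation Test
Open https://www.wsj.com/opinion/donald-trump-tariffs-ieepa-supreme-court-john-roberts-opinion-e2610d81
The test should detect the login elements on the page and prompt the user to log in; when the page is visited again, it should load normally and its content should be captured (the body text changes and is longer than on the first capture).
# Social Media Capture
## 1. Specified Profile Pages Without Login
1.1 Open the first site below and check whether anti-bot detection is triggered and whether the page content can be captured. (All of this content is accessible without logging in.)
1.2 On the opened page, pick any post and open it; check whether its details, including comments, can be captured. If a login check is triggered, prompt the user to log in, then visit the page again and confirm that it loads normally and its content is captured (the body text changes and is longer than on the first capture).
1.3 Repeat the steps above for each site in turn.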
```
https://x.com/valormental
https://www.facebook.com/andrea.sow.31
https://www.linkedin.com/in/baoqiangliu/
https://www.instagram.com/elisameliani/
https://space.bilibili.com/3546603057056627
https://mp.weixin.qq.com/s/Duij3Z2vrImLuOzanqbgbA
https://www.douyin.com/user/MS4wLjABAAAAXUpP_zAelVixv3zv_sWINae86Dt0FMPRZyuozH8MmhbBjvgoDg_xq3Lqnwlacelc
https://www.kuaishou.com/profile/3xvwve5yerjsvvg
https://m.weibo.cn/profile/2194035935
https://www.xiaohongshu.com/user/profile/5f035b1c0000000001002389?xsec_token=ABti9cMRn3S9ARpTWxqiy-5oHI9_QXq50-5qjiSm8emMk=&xsec_source=pc_feed
https://www.zhihu.com/people/lingzezhao
https://discord.com/servers/midjourney-662267976984297473
```
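## 2. Search
2.1 Open the sites below in order and check whether anti-bot detection is triggered and whether the page content can be captured. If a login is required, prompt the user to log in, then visit the page again and confirm that it loads normally and its content is captured (the body text changes and is longer than on the first capture).
2.2 On the opened page, pick any post and open it; check whether its details, including comments, can be captured. If a login check is triggered, prompt the user to log in, then visit the page again and confirm that it loads normally and its content is captured (the body text changes and is longer than on the first capture).
2.3 Repeat the steps above for each site in turn.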
```
https://x.com/search?q=OpenClaw
https://www.facebook.com/search/posts/?q=OpenClaw
https://www.linkedin.com/search/results/all/?keywords=openclaw
https://www.instagram.com/explore/tags/OpenClaw/
https://search.bilibili.com/all?keyword=openclaw
https://www.douyin.com/search/OpenClaw?type=user
https://www.douyin.com/search/OpenClaw?type=video
https://www.kuaishou.com/search/video?searchKey=openclaw
https://m.weibo.cn/search?containerid=100103type%3D1%26q%3Dopenclaw
https://www.xiaohongshu.com/search_result?keyword=openclaw
https://www.zhihu.com/search?q=openclaw&type=content
https://www.zhihu.com/search?q=openclaw&type=people
https://www.zhihu.com/search?q=openclaw&type=zvideo
```
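## Output
Each run creates a directory `browser_test/results/<runId>/` containing:
- `report.md`: summary report (success/fail/blocked)
- `cases/<caseId>.json`: per-case details (duration, errors, login-check result)
- `snapshots/<caseId>.txt`: AI snapshot before login
- `snapshots/<caseId>_after.txt`: AI snapshot after login (only when the login flow is triggered)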
---
# Managed Browser Automated Tests (Recommended)
Script path: `browser_test/run-managed-tests.mjs`
The tests use the **openclaw-managed browser** and are driven by the following four CLI commands; no Chrome extension is required:
```
openclaw browser --browser-profile openclaw status
openclaw browser --browser-profile openclaw start
openclaw browser --browser-profile openclaw open <url>
openclaw browser --browser-profile openclaw snapshot
```
## Commands
```bash
# Quick check (8 representative cases, ~5 min)
node browser_test/run-managed-tests.mjs --mode smoke
# Full run (corporate/government/search/news, including the interactive pre-login test, ~25 min)
node browser_test/run-managed-tests.mjs --mode full
# Complete run (adds social media, with several interactive login prompts)
node browser_test/run-managed-tests.mjs --mode social
```
## Options
| Option | Description | Default |
|------|------|-------|
| `--mode <smoke\|full\|social>` | Test scope | `smoke` |
| `--profile <name>` | Browser profile | `openclaw` |
| `--stabilizeMs <ms>` | Time to wait for the page to settle after opening it | `4000` |
| `--timeoutMs <ms>` | Timeout for each CLI command | `60000` |
| `--outputDir <dir>` | Root directory for results | `browser_test/results` |
## Prerequisites
Start the openclaw gateway first and keep it running:
```bash
./scripts/dev.sh gateway
```
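No Chrome extension is required; openclaw manages the browser automatically.
## Output Layout
```
browser_test/results/<RUN_ID>_managed/
  report.md                  summary report (success/blocked/partial/error)
  cases/<id>.json            per-case details (duration, snapshot size, content-analysis result)
  snapshots/<id>.txt         AI snapshot of the page (snapshot --format ai --mode efficient)
  snapshots/<id>_after.txt   AI snapshot after login (pre-login test only)
```
All verdicts (whether anti-bot protection fired, whether a login wall is present, whether the content is sufficient) are derived from the extracted AI snapshot text.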
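The script's `analyzeSnapshot` applies roughly the following rules. The Python sketch below mirrors that logic with a reduced, illustrative pattern list; the full lists are `BOT_PATTERNS` and `LOGIN_PATTERNS` in `run-managed-tests.mjs`, and the Python names are hypothetical:

```python
import re

# A few of the script's patterns, for illustration only.
BOT_PATTERNS = [re.compile(p, re.I) for p in (r"captcha", r"just a moment", r"access denied", r"人机验证")]
LOGIN_PATTERNS = [re.compile(p, re.I) for p in (r"sign in to continue", r"subscribe to (read|continue|access)", r"请先登录")]


def snapshot_to_text(snapshot: str) -> str:
    # Drop [ref=e1]-style annotations and list dashes, then collapse whitespace.
    text = re.sub(r"\[[^\]]+\]", " ", snapshot)
    text = re.sub(r"-\s+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def analyze_snapshot(snapshot: str) -> dict:
    text = snapshot_to_text(snapshot)
    bot_blocked = any(p.search(text) for p in BOT_PATTERNS)
    # A login wall is a softer signal: only report it when no anti-bot page was seen.
    login_wall = not bot_blocked and any(p.search(text) for p in LOGIN_PATTERNS)
    return {
        "text_chars": len(text),
        "rich": len(text) >= 300,  # enough text to count as meaningful content
        "bot_blocked": bot_blocked,
        "login_wall": login_wall,
    }
```

Checking anti-bot patterns first means a challenge page that also shows a "sign in" link is classified as blocked rather than as a login wall.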
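## Concurrency
- Corporate sites, government sites, search engines: 6 pages concurrently per batch
- News media, pre-login, social media: tested one at a time (so login and page state cannot interfere between cases)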
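A minimal asyncio sketch of this policy. It assumes a hypothetical per-case coroutine `run_case`. The batch size of 6 matches `PARALLEL_LIMIT` in the script, and the English section labels stand in for the script's own `PARALLEL_SECTIONS` names. The script's actual scheduling code is not reproduced here:

```python
import asyncio
from typing import Awaitable, Callable

PARALLEL_SECTIONS = {"corporate", "government", "search"}  # hypothetical labels
PARALLEL_LIMIT = 6


async def run_all(cases: list[dict], run_case: Callable[[dict], Awaitable[dict]]) -> list[dict]:
    results = []
    parallel = [c for c in cases if c["sec"] in PARALLEL_SECTIONS]
    serial = [c for c in cases if c["sec"] not in PARALLEL_SECTIONS]
    # Parallel sections: fixed batches of PARALLEL_LIMIT, each batch awaited in full.
    for i in range(0, len(parallel), PARALLEL_LIMIT):
        batch = parallel[i:i + PARALLEL_LIMIT]
        results.extend(await asyncio.gather(*(run_case(c) for c in batch)))
    # Everything else runs one case at a time, so login state cannot leak between cases.
    for c in serial:
        results.append(await run_case(c))
    return results
```

Fixed batches are simpler than a sliding semaphore window, at the cost of waiting for the slowest page in each batch.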
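## Login Test Behavior
In `full` / `social` mode, a pre-login test case (such as the WSJ article) proceeds as follows:
1. Capture the content once (before);
2. If a login wall is detected, the command line pauses and waits for a manual login;
3. After Enter is pressed, capture the content again (after);
4. The login counts as successful if the content grew by more than 20% and by more than 200 chars.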
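A minimal sketch of the step 4 check. It assumes "content" means the length of the extracted snapshot text and that both thresholds apply to the growth from *before* to *after*. The function name and parameter names are hypothetical:

```python
def login_succeeded(before_chars: int, after_chars: int,
                    min_ratio: float = 0.20, min_abs: int = 200) -> bool:
    """True when the post-login capture grew by more than 20% and by more than 200 characters."""
    growth = after_chars - before_chars
    if before_chars <= 0:
        # No baseline to compare against: only the absolute threshold can apply.
        return growth > min_abs
    return growth > min_abs and growth / before_chars > min_ratio
```

Requiring both thresholds guards against two false positives. On short pages, a small absolute change can pass the 20% check. On long pages, an ad rotation can add a few hundred characters without unlocking anything.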
================================================
FILE: tests/run-managed-tests.mjs
================================================
#!/usr/bin/env node
/**
 * OpenClaw Managed Browser: Automated Test Suite
 *
 * Opens each test URL in the openclaw-managed browser, captures an AI snapshot
 * via the native `snapshot --format ai --mode detailed` command, saves it to
 * disk, and analyses the snapshot text for anti-bot / login-wall signals.
 *
 * Commands used:
 *   openclaw browser --browser-profile openclaw status
 *   openclaw browser --browser-profile openclaw start
 *   openclaw browser --browser-profile openclaw open <url>
 *   openclaw browser --browser-profile openclaw snapshot --format ai --mode detailed
 *
 * Usage:
 *   node browser_test/run-managed-tests.mjs [options]
 *
 * Options:
 *   --mode <smoke|full|social>   scope (default: smoke)
 *   --profile <name>             browser profile (default: openclaw)
 *   --stabilizeMs <ms>           wait after opening a page before taking the snapshot (default: 4000)
 *   --timeoutMs <ms>             CLI timeout per command (default: 60000)
 *   --outputDir <dir>            results root (default: browser_test/results)
 *
 * Modes:
 *   smoke:  8 representative cases, no login prompts (~5 min)
 *   full:   all README section 1/2/3 cases + interactive pre-login test (~25 min)
 *   social: full + social media (multiple interactive login prompts)
 */
import { spawnSync } from 'node:child_process';
import { mkdirSync, writeFileSync } from 'node:fs';
import { join, dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createInterface } from 'node:readline';
const __dirname = dirname(fileURLToPath(import.meta.url));
const PROJECT_ROOT = resolve(__dirname, '..');
const OPENCLAW_DIR = join(PROJECT_ROOT, 'openclaw');
// ── Argument parsing ──────────────────────────────────────────────────────────
const argv = process.argv.slice(2);
const getArg = (k, d) => { const i = argv.indexOf(`--${k}`); return i >= 0 && argv[i+1] !== undefined ? argv[i+1] : d; };
const MODE = getArg('mode', 'smoke');
const PROFILE = getArg('profile', 'openclaw');
const STABILIZE_MS = parseInt(getArg('stabilizeMs', '4000'), 10);
const TIMEOUT_MS = parseInt(getArg('timeoutMs', '60000'), 10);
const OUTPUT_DIR = getArg('outputDir', join(PROJECT_ROOT, 'browser_test', 'results'));
// ── Run directory ─────────────────────────────────────────────────────────────
const RUN_ID = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19) + '_managed';
const RUN_DIR = join(OUTPUT_DIR, RUN_ID);
const CASES = join(RUN_DIR, 'cases');
const SNAPSHOTS = join(RUN_DIR, 'snapshots');
mkdirSync(CASES, { recursive: true });
mkdirSync(SNAPSHOTS, { recursive: true });
// ── Environment for openclaw processes (uses default ~/.openclaw) ────────────
const OC_ENV = { ...process.env };
// ── openclaw CLI wrapper ──────────────────────────────────────────────────────
function oc(args, { ms = TIMEOUT_MS, safe = false } = {}) {
  const res = spawnSync('pnpm', ['openclaw', ...args], {
    cwd: OPENCLAW_DIR,
    env: OC_ENV,
    timeout: ms,
    encoding: 'utf8',
    maxBuffer: 30 * 1024 * 1024,
  });
  const out = (res.stdout ?? '').trim();
  const err = (res.stderr ?? '').trim();
  if (!safe && res.status !== 0) {
    throw new Error(err || out || `openclaw exited ${res.status}`);
  }
  return { ok: res.status === 0, out, err };
}

/** Extract first valid JSON object from CLI output (skips logo / header lines). */
function extractJson(text) {
  if (!text) return null;
  const raw = text.trim();
  try { return JSON.parse(raw); } catch { /* continue */ }
  // Try parsing from first JSON opener to each possible closing index.
  const firstObj = raw.indexOf('{');
  const firstArr = raw.indexOf('[');
  const starts = [firstObj, firstArr].filter(i => i >= 0).sort((a, b) => a - b);
  for (const start of starts) {
    for (let end = raw.length; end > start; end--) {
      const ch = raw[end - 1];
      if (ch !== '}' && ch !== ']') continue;
      const candidate = raw.slice(start, end).trim();
      try { return JSON.parse(candidate); } catch { /* try shorter tail */ }
    }
  }
  // Last fallback: parse any single JSON line (for compact outputs).
  for (const line of raw.split('\n').reverse()) {
    const t = line.trim();
    if (!t) continue;
    if (t.startsWith('{') || t.startsWith('[') || t.startsWith('"')) {
      try { return JSON.parse(t); } catch { /* try next line */ }
    }
  }
  return null;
}

const sleep = (ms) => new Promise(r => setTimeout(r, ms));

function waitForEnter(msg) {
  return new Promise(resolve => {
    const rl = createInterface({ input: process.stdin, output: process.stdout });
    rl.question(msg, () => { rl.close(); resolve(); });
  });
}

// ── Snapshot-based content analysis ───────────────────────────────────────────
// Pick a best-effort title from an AI snapshot line like:
//   - heading "Some Title" [level=1] [ref=e1]
function extractTitle(snapshot) {
  const lines = String(snapshot ?? '').split('\n');
  for (const line of lines) {
    if (!line.includes('heading')) continue;
    const m = line.match(/"([^"]{1,200})"/);
    if (m && m[1]) return m[1].trim();
  }
  return '';
}

// Convert snapshot tree text to plain text for keyword matching.
function snapshotToText(snapshot) {
  return String(snapshot ?? '')
    .replace(/\[[^\]]+\]/g, ' ')
    .replace(/-\s+/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

const BOT_PATTERNS = [
  /captcha/i,
  /are you (a )?human/i,
  /verify you('re| are)/i,
  /robot check/i,
  /security check/i,
  /ddos.{0,20}protection/i,
  /cloudflare ray id/i,
  /just a moment/i,
  /checking your browser/i,
  /access denied/i,
  /403 forbidden/i,
  /request blocked/i,
  /you('ve| have) been blocked/i,
  /请完成安全验证/,
  /人机验证/,
  /请证明您不是机器人/,
];

const LOGIN_PATTERNS = [
  /sign in to continue/i,
  /log in to continue/i,
  /subscribe to (read|continue|access)/i,
  /create (an? )?account/i,
  /登录后(才能|可以|方可)/,
  /请先登录/,
  /立即登录/,
  /(注册|登录)享受更多/,
];

function analyzeSnapshot(snapshot) {
  const title = extractTitle(snapshot);
  const text = snapshotToText(snapshot);
  const lower = text.toLowerCase();
  const botBlocked = BOT_PATTERNS.some(re => re.test(lower));
  // Login wall: softer signal, only flagged when no anti-bot page was detected
  const loginWall = !botBlocked && LOGIN_PATTERNS.some(re => re.test(text));
  return {
    title,
    snapshotChars: String(snapshot ?? '').length,
    textChars: text.length,
    rich: text.length >= 300, // meaningful snapshot content
    botBlocked,
    loginWall,
  };
}
// ── Test case definitions ─────────────────────────────────────────────────────
// Fields:
//   id:         unique id used in filenames
//   sec:        display section
//   url:        target URL
//   login:      true if a login wall is expected/normal (not a failure)
//   loginTest:  pause for a manual login, then take the snapshot again (pre-login scenario)
//   socialOnly: only run in 'social' mode
const ALL_CASES = [
  // ── Corporate websites ─────────────────────────────────────────────────────
{ id: '1.01_komatsu', sec: '企业官网', url: 'https://www.komatsu.com/en-us' },
{ id: '1.02_putzmeister', sec: '企业官网', url: 'https://www.putzmeister.com/web/european-union' },
{ id: '1.03_liebherr', sec: '企业官网', url: 'https://www.liebherr.com/en-hk/group/start-page-5221008' },
{ id: '1.04_cat', sec: '企业官网', url: 'https://www.cat.com/global-selector.html' },
  // ── News media ─────────────────────────────────────────────────────────────
{ id: '2.01_wsj', sec: '新闻媒体', url: 'https://www.wsj.com/', login: true },
{ id: '2.02_bloomberg', sec: '新闻媒体', url: 'https://www.bloomberg.com' },
  // ── Government websites (US) ───────────────────────────────────────────────
{ id: '3.01_justice', sec: '政府网站', url: 'https://www.justice.gov/' },
  // ── Government websites (China) ────────────────────────────────────────────
{ id: '3.02_china_cer', sec: '政府网站', url: 'http://www.china-cer.com.cn/policy_base/' },
{ id: '3.03_zjw_sh', sec: '政府网站', url: 'https://zjw.sh.gov.cn/zwgk/index.html#tab2-a' },
{ id: '3.04_fgj_sh', sec: '政府网站', url: 'https://fgj.sh.gov.cn/gfxwj/index.html' },
{ id: '3.05_rsj_sh', sec: '政府网站', url: 'https://rsj.sh.gov.cn/tgwgfx_17726/index.html' },
{ id: '3.06_ybj_sh', sec: '政府网站', url: 'https://ybj.sh.gov.cn/dybz3/index.html' },
{ id: '3.07_mohurd', sec: '政府网站', url: 'https://www.mohurd.gov.cn/gongkai/fdzdgknr/zgzygwywj/index.html' },
{ id: '3.08_sh_nw39221', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw39221/index.html' },
{ id: '3.09_sh_nw39220', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw39220/index.html' },
{ id: '3.10_sh_nw11408', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw11408/index.html' },
{ id: '3.11_sh_nw11407', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw11407/index.html' },
{ id: '3.12_sh_nw2407', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw2407/index.html' },
{ id: '3.13_sh_nw42850', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw42850/index.html' },
{ id: '3.14_sh_nw42944', sec: '政府网站', url: 'https://www.shanghai.gov.cn/nw42944/index.html' },
{ id: '3.15_gov_cn', sec: '政府网站', url: 'https://www.gov.cn/zhengce/index.htm' },
  // ── Search engine stress test ──────────────────────────────────────────────
{ id: '4.01_bing_wsj', sec: '搜索引擎', url: 'https://www.bing.com/search?q=%E5%8D%8E%E5%B0%94%E8%A1%97%E6%97%A5%E6%8A%A5' },
{ id: '4.02_bing_cat', sec: '搜索引擎', url: 'https://www.bing.com/search?q=caterpillar+heavy+equipment' },
{ id: '4.03_bing_policy', sec: '搜索引擎', url: 'https://www.bing.com/search?q=%E4%B8%8A%E6%B5%B7+%E5%BB%BA%E8%AE%BE%E5%B7%A5%E7%A8%8B+%E6%94%BF%E7%AD%96' },
  // ── Pre-login test (full+ modes, interactive) ──────────────────────────────
{ id: '5.01_wsj_article', sec: '预登录测试', login: true, loginTest: true,
url: 'https://www.wsj.com/opinion/donald-trump-tariffs-ieepa-supreme-court-john-roberts-opinion-e2610d81' },
  // ── Social media: profile pages (social mode) ──────────────────────────────
{ id: '6.01_x', sec: '社交媒体', login: true, socialOnly: true, url: 'https://x.com/valormental' },
{ id: '6.02_facebook', sec: '社交媒体', login: true, socialOnly: true, url: 'https://www.facebook.com/andrea.sow.31' },
{ id: '6.03_linkedin', sec: '社交媒体', login: true, socialOnly: true, url: 'https://www.linkedin.com/in/baoqiangliu/' },
{ id: '6.04_instagram', sec: '社交媒体', login: true, socialOnly: true, url: 'https://www.instagram.com/elisameliani/' },
{ id: '6.05_bilibili', sec: '社交媒体', socialOnly: true, url: 'https://space.bilibili.com/3546603057056627' },
{ id: '6.06_weixin', sec: '社交媒体', socialOnly: true, url: 'https://mp.weixin.qq.com/s/Duij3Z2vrImLuOzanqbgbA' },
{ id: '6.07_douyin', sec: '社交媒体', login: true, socialOnly: true, url: 'https://www.douyin.com/user/MS4wLjABAAAAXUpP_zAelVixv3zv_sWINae86Dt0FMPRZyuozH8MmhbBjvgoDg_xq3Lqnwlacelc' },
{ id: '6.08_kuaishou', sec: '社交媒体', socialOnly: true, url: 'https://www.kuaishou.com/profile/3xvwve5yerjsvvg' },
{ id: '6.09_weibo', sec: '社交媒体', socialOnly: true, url: 'https://m.weibo.cn/profile/2194035935' },
{ id: '6.10_xiaohongshu', sec: '社交媒体', socialOnly: true, url: 'https://www.xiaohongshu.com/user/profile/5f035b1c0000000001002389?xsec_token=ABti9cMRn3S9ARpTWxqiy-5oHI9_QXq50-5qjiSm8emMk=&xsec_source=pc_feed' },
{ id: '6.11_zhihu', sec: '社交媒体', socialOnly: true, url: 'https://www.zhihu.com/people/lingzezhao' },
{ id: '6.12_discord', sec: '社交媒体', login: true, socialOnly: true, url: 'https://discord.com/servers/midjourn
SYMBOL INDEX (92 symbols across 13 files)
FILE: docs/more_powerful_search_skill/extra/arxiv.py
function request (line 68) | def request(query: str, params: "OnlineParams") -> None:
function response (line 78) | def response(resp: "SXNG_Response") -> EngineResults:
FILE: docs/more_powerful_search_skill/extra/baidu.py
function init (line 39) | def init(_):
function request (line 44) | def request(query, params):
function response (line 96) | def response(resp):
function parse_general (line 111) | def parse_general(data):
function parse_images (line 142) | def parse_images(data):
function parse_it (line 173) | def parse_it(data):
FILE: docs/more_powerful_search_skill/extra/bing.py
function _page_offset (line 68) | def _page_offset(pageno):
function set_bing_cookies (line 72) | def set_bing_cookies(params, engine_language, engine_region):
function request (line 78) | def request(query, params):
function response (line 124) | def response(resp):
function fetch_traits (line 198) | def fetch_traits(engine_traits: EngineTraits):
FILE: docs/more_powerful_search_skill/extra/bing_images.py
function request (line 39) | def request(query, params):
function response (line 67) | def response(resp):
FILE: docs/more_powerful_search_skill/extra/bing_news.py
function request (line 50) | def request(query, params):
function response (line 81) | def response(resp):
function fetch_traits (line 134) | def fetch_traits(engine_traits: EngineTraits):
FILE: docs/more_powerful_search_skill/extra/flickr.py
function build_flickr_url (line 39) | def build_flickr_url(user_id, photo_id):
function request (line 43) | def request(query, params):
function response (line 50) | def response(resp):
FILE: docs/more_powerful_search_skill/extra/quark.py
function is_alibaba_captcha (line 35) | def is_alibaba_captcha(html):
function init (line 46) | def init(_):
function request (line 51) | def request(query, params):
function response (line 83) | def response(resp):
function parse_addition (line 162) | def parse_addition(data):
function parse_ai_page (line 170) | def parse_ai_page(data):
function parse_baike_sc (line 195) | def parse_baike_sc(data):
function parse_finance_shuidi (line 204) | def parse_finance_shuidi(data):
function parse_kk_yidian_all (line 227) | def parse_kk_yidian_all(data):
function parse_life_show_general_image (line 241) | def parse_life_show_general_image(data):
function parse_med_struct (line 264) | def parse_med_struct(data):
function parse_music_new_song (line 273) | def parse_music_new_song(data):
function parse_nature_result (line 287) | def parse_nature_result(data):
function parse_news_uchq (line 291) | def parse_news_uchq(data):
function parse_ss_doc (line 312) | def parse_ss_doc(data):
function parse_ss_note (line 353) | def parse_ss_note(data):
function parse_travel_dest_overview (line 367) | def parse_travel_dest_overview(data):
function parse_travel_ranking_list (line 375) | def parse_travel_ranking_list(data):
FILE: docs/more_powerful_search_skill/extra/wikipedia.py
function get_wiki_params (line 136) | def get_wiki_params(sxng_locale, eng_traits):
function request (line 147) | def request(query, params):
function response (line 163) | def response(resp):
function fetch_traits (line 239) | def fetch_traits(engine_traits: EngineTraits):
function fetch_wikimedia_traits (line 244) | def fetch_wikimedia_traits(engine_traits: EngineTraits):
FILE: docs/more_powerful_search_skill/extra/youtube_noapi.py
function request (line 38) | def request(query, params):
function response (line 59) | def response(resp):
function parse_next_page_response (line 65) | def parse_next_page_response(response_text):
function parse_first_page_response (line 110) | def parse_first_page_response(response_text):
function get_text_from_json (line 167) | def get_text_from_json(element):
FILE: docs/more_powerful_search_skill/rss_parsor.py
function fetch_rss (line 11) | async def fetch_rss(url, existings: set=set(), cache_manager: SqliteCach...
FILE: tests/run-managed-tests.mjs
constant PROJECT_ROOT (line 39) | const PROJECT_ROOT = resolve(__dirname, '..');
constant OPENCLAW_DIR (line 40) | const OPENCLAW_DIR = join(PROJECT_ROOT, 'openclaw');
constant MODE (line 46) | const MODE = getArg('mode', 'smoke');
constant PROFILE (line 47) | const PROFILE = getArg('profile', 'openclaw');
constant STABILIZE_MS (line 48) | const STABILIZE_MS = parseInt(getArg('stabilizeMs', '4000'), 10);
constant TIMEOUT_MS (line 49) | const TIMEOUT_MS = parseInt(getArg('timeoutMs', '60000'), 10);
constant OUTPUT_DIR (line 50) | const OUTPUT_DIR = getArg('outputDir', join(PROJECT_ROOT, 'browser_t...
constant RUN_ID (line 53) | const RUN_ID = new Date().toISOString().replace(/[:.]/g, '-').slice(0, ...
constant RUN_DIR (line 54) | const RUN_DIR = join(OUTPUT_DIR, RUN_ID);
constant CASES (line 55) | const CASES = join(RUN_DIR, 'cases');
constant SNAPSHOTS (line 56) | const SNAPSHOTS = join(RUN_DIR, 'snapshots');
constant OC_ENV (line 62) | const OC_ENV = { ...process.env };
function oc (line 65) | function oc(args, { ms = TIMEOUT_MS, safe = false } = {}) {
function extractJson (line 82) | function extractJson(text) {
function waitForEnter (line 115) | function waitForEnter(msg) {
function extractTitle (line 125) | function extractTitle(snapshot) {
function snapshotToText (line 136) | function snapshotToText(snapshot) {
constant BOT_PATTERNS (line 144) | const BOT_PATTERNS = [
constant LOGIN_PATTERNS (line 163) | const LOGIN_PATTERNS = [
function analyzeSnapshot (line 174) | function analyzeSnapshot(snapshot) {
constant ALL_CASES (line 201) | const ALL_CASES = [
constant SMOKE_IDS (line 272) | const SMOKE_IDS = new Set([
function buildRunList (line 280) | function buildRunList() {
constant RUN_LIST (line 291) | const RUN_LIST = buildRunList();
constant PARALLEL_SECTIONS (line 292) | const PARALLEL_SECTIONS = new Set(['企业官网', '政府网站', '搜索引擎']);
constant PARALLEL_LIMIT (line 293) | const PARALLEL_LIMIT = 6;
function extractSnapshotFromOutput (line 296) | function extractSnapshotFromOutput(out) {
function fetchSnapshotAI (line 350) | async function fetchSnapshotAI({ ms = TIMEOUT_MS, targetId = null } = {}) {
function parseOpenedTargetId (line 358) | function parseOpenedTargetId(text) {
function fetchCurrentTab (line 372) | function fetchCurrentTab() {
function runCase (line 392) | async function runCase(tc) {
function writeReport (line 518) | function writeReport(results) {
function main (line 579) | async function main() {
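The index above lists `PARALLEL_SECTIONS` (enterprise sites, government sites, search engines) and `PARALLEL_LIMIT = 6`. That suggests cases in those sections run concurrently, with at most six in flight. The bodies of `runCase` and `main` are not shown here, so the following is only a sketch of that idea. It is written in Python with `asyncio.Semaphore` rather than the script's JavaScript. Three details are assumptions: the case shape (a dict with `id` and `section`), the `run_all` and `demo` helpers, and the ordering (parallel batch first, then the remaining cases one at a time).

```python
import asyncio

# Mirrors the constants in the index: 企业官网 (enterprise sites),
# 政府网站 (government sites), 搜索引擎 (search engines).
PARALLEL_SECTIONS = {"企业官网", "政府网站", "搜索引擎"}
PARALLEL_LIMIT = 6


async def run_all(cases, run_case):
    """Run cases from PARALLEL_SECTIONS concurrently (at most PARALLEL_LIMIT at once),
    then run every other case one at a time.

    Hypothetical shapes: each case is a dict with a "section" key, and
    run_case is an async callable that returns that case's result.
    """
    sem = asyncio.Semaphore(PARALLEL_LIMIT)

    async def guarded(tc):
        # The semaphore caps how many cases are in flight at the same time.
        async with sem:
            return await run_case(tc)

    parallel = [tc for tc in cases if tc["section"] in PARALLEL_SECTIONS]
    serial = [tc for tc in cases if tc["section"] not in PARALLEL_SECTIONS]

    # gather() returns results in the same order as `parallel`.
    results = list(await asyncio.gather(*(guarded(tc) for tc in parallel)))
    for tc in serial:
        results.append(await run_case(tc))
    return results


async def demo():
    """Run 10 parallel-eligible cases plus one serial case and report peak concurrency."""
    active = peak = 0

    async def fake_case(tc):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for opening a page and taking a snapshot
        active -= 1
        return tc["id"]

    cases = [{"id": i, "section": "企业官网"} for i in range(10)]
    cases.append({"id": 99, "section": "other"})
    results = await run_all(cases, fake_case)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A semaphore keeps the cap fixed no matter how many cases a section contains, while cases outside the listed sections still run strictly in sequence.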
FILE: wiseflow/crew/new-media-editor/skills/siliconflow-img-gen/scripts/gen.py
function build_payload (line 18) | def build_payload(args):
function api_request (line 42) | def api_request(payload, api_key):
function download_image (line 62) | def download_image(url, dest_path):
function main (line 68) | def main():
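The image script is described in its preview as "stdlib only (no httpx/requests)", and the index lists `build_payload`, `api_request`, and `download_image`. Below is a minimal sketch of the request step using only `urllib.request`. Several details are assumptions to verify against SiliconFlow's documentation: the Bearer-token header, the JSON body, the endpoint `https://api.siliconflow.cn/v1/images/generations`, and the payload keys (`model`, `prompt`). None of them were read from `gen.py`. The helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint (OpenAI-style path); not taken from gen.py.
API_URL = "https://api.siliconflow.cn/v1/images/generations"


def build_request(payload: dict, api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request with a Bearer token (assumed auth scheme)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60) -> dict:
    """Send the request and decode the JSON response body (performs network I/O)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending lets the payload and header logic be checked without network access.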
FILE: wiseflow/crew/new-media-editor/skills/siliconflow-video-gen/scripts/gen.py
function post_json (line 27) | def post_json(url, payload, api_key, timeout=60):
function submit_job (line 47) | def submit_job(payload, api_key):
function poll_until_done (line 56) | def poll_until_done(request_id, api_key, poll_interval, timeout):
function download_video (line 76) | def download_video(url, dest_path):
function main (line 84) | def main():
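The video script follows a submit, poll, and download flow (`submit_job`, then `poll_until_done(request_id, api_key, poll_interval, timeout)`, then `download_video`). The sketch below shows one way such a polling loop can be written. It is a variant, not the script's code. It takes a hypothetical zero-argument `fetch_status` callable in place of `request_id` and `api_key`, so it can run without network access. The status names `Succeed` and `Failed` are assumptions.

```python
import time


def poll_until_done(fetch_status, poll_interval=5.0, timeout=600.0,
                    done_states=("Succeed",), failed_states=("Failed",)):
    """Call fetch_status() until it reports a terminal state or `timeout` seconds pass.

    fetch_status is a hypothetical callable returning a dict with a "status" key;
    the default state names are assumptions, not confirmed API values.
    """
    # time.monotonic() is unaffected by wall-clock changes, so the deadline is reliable.
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status in done_states:
            return result
        if status in failed_states:
            raise RuntimeError(f"job failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout}s (last status: {status})")
        time.sleep(poll_interval)
```

The loop checks the status before the deadline, so a job that finishes on the last poll is still returned rather than reported as a timeout.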
Condensed preview: 63 files, each showing path, character count, and a content snippet.
[
{
"path": ".claude/20260307_done.md",
"chars": 619,
"preview": "# 1、按 README.md 更新其他语种 readme\n\n# 2、更新 .github/workflows/ 的 release 流程\n\n## 触发机制:\n\n**upstream**(TeamWiseflow 正式仓库)每次合并 PR "
},
{
"path": ".github/workflows/ci.yml",
"chars": 2151,
"preview": "name: CI\n\non:\n pull_request:\n branches: [master]\n types: [opened, synchronize, reopened] # 明确排除 closed\n\n# 同一 PR/"
},
{
"path": ".github/workflows/release.yml",
"chars": 4469,
"preview": "name: Auto Release\n\non:\n pull_request_target:\n types: [closed]\n branches: [master]\n workflow_dispatch:\n input"
},
{
"path": ".gitignore",
"chars": 317,
"preview": "# node\nnode_modules/\npackage-lock.json\n\n# default ignore\n/shelf/\n/workspace.xml\n.DS_Store\n.idea/\n__pycache__\n.env\n.venv/"
},
{
"path": "CHANGELOG.md",
"chars": 11231,
"preview": "# v5.0\n\nupgrage workflow to Agent!\n\n# v4.32\n- bug fix;\n\n- import error\\can not work when use rss souces only.\n\n- update "
},
{
"path": "CLAUDE.md",
"chars": 3925,
"preview": "# CLAUDE.md\n\nThis file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.\n\n## "
},
{
"path": "LICENSE",
"chars": 2285,
"preview": "# Open Source License\n\nwiseflow is licensed under a modified version of the Apache License 2.0, with the following addit"
},
{
"path": "README.md",
"chars": 6917,
"preview": "# Wiseflow\n\n**[English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français]"
},
{
"path": "README_AR.md",
"chars": 10461,
"preview": "<div dir=\"rtl\">\n\n# Wiseflow\n\n**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | ["
},
{
"path": "README_DE.md",
"chars": 12212,
"preview": "# Wiseflow\n\n**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Français](README_"
},
{
"path": "README_EN.md",
"chars": 11283,
"preview": "# Wiseflow\n\n**[中文](README.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](README_"
},
{
"path": "README_FR.md",
"chars": 12597,
"preview": "# Wiseflow\n\n**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [한국어](README_KR.md) | [Deutsch](README_D"
},
{
"path": "README_JP.md",
"chars": 7563,
"preview": "# Wiseflow\n\n**[中文](README.md) | [English](README_EN.md) | [한국어](README_KR.md) | [Deutsch](README_DE.md) | [Français](REA"
},
{
"path": "README_KR.md",
"chars": 7536,
"preview": "# Wiseflow\n\n**[中文](README.md) | [English](README_EN.md) | [日本語](README_JP.md) | [Deutsch](README_DE.md) | [Français](REA"
},
{
"path": "docs/anti-detection-research.md",
"chars": 11742,
"preview": "# 浏览器自动化反检测方案调研报告\n\n> 调研日期:2026-02-20\n> 目标:评估 rebrowser-patches 与 patchright 两个方案,为 OpenClaw 集成反检测能力提供技术路线\n\n---\n\n## 一、问题背"
},
{
"path": "docs/more_powerful_search_skill/20260308_done.md",
"chars": 417,
"preview": "# 目的一\n\n为 wiseflow add-on 增加一个 skill,旨在让 Agent 通过使用 skill 可以更好的操作浏览器完成各种搜索任务。替换 openclaw 内置的 web_search 工具。\n\n## 实现方案\n\n解析用"
},
{
"path": "docs/more_powerful_search_skill/direct_url_for_search_on_media_platform.md",
"chars": 6486,
"preview": "# 自媒体平台的搜索\n\n## bilibili(哔哩哔哩,简称:b站):\n\nhttps://search.bilibili.com/{channel}?keyword={keyword}\n\nkeyword 多个的话,之间用 + 连接\n\nch"
},
{
"path": "docs/more_powerful_search_skill/extra/arxiv.py",
"chars": 4621,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"arXiv is a free distribution service and an open-access archive for near"
},
{
"path": "docs/more_powerful_search_skill/extra/baidu.py",
"chars": 5828,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"Baidu_\n\n.. _Baidu: https://www.baidu.com\n\"\"\"\n\n# There exits a https://gi"
},
{
"path": "docs/more_powerful_search_skill/extra/bing.py",
"chars": 10846,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"This is the implementation of the Bing-WEB engine. Some of this\nimplemen"
},
{
"path": "docs/more_powerful_search_skill/extra/bing_images.py",
"chars": 3047,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"Bing-Images: description see :py:obj:`searx.engines.bing`.\"\"\"\n# pylint: "
},
{
"path": "docs/more_powerful_search_skill/extra/bing_news.py",
"chars": 4890,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"Bing-News: description see :py:obj:`searx.engines.bing`.\n\n.. hint::\n\n "
},
{
"path": "docs/more_powerful_search_skill/extra/flickr.py",
"chars": 2448,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"\nFlickr (Images)\n\nMore info on api-key : https://www.flickr.com/services"
},
{
"path": "docs/more_powerful_search_skill/extra/quark.py",
"chars": 12235,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"Quark (Shenma) search engine for searxng\"\"\"\n\nfrom urllib.parse import ur"
},
{
"path": "docs/more_powerful_search_skill/extra/wikipedia.py",
"chars": 11513,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"This module implements the Wikipedia engine. Some of this implementatio"
},
{
"path": "docs/more_powerful_search_skill/extra/youtube_noapi.py",
"chars": 6020,
"preview": "# SPDX-License-Identifier: AGPL-3.0-or-later\n\"\"\"Youtube (Videos)\"\"\"\n\nfrom functools import reduce\nfrom json import loads"
},
{
"path": "docs/more_powerful_search_skill/rss_parsor.py",
"chars": 4213,
"preview": "import httpx\nimport feedparser\nfrom reference.async_logger import wis_logger\nfrom reference.wis import CrawlResult, Sqli"
},
{
"path": "docs/prompt_videos.md",
"chars": 83,
"preview": "\n\nhttps://github.com/user-attachments/assets/8d097b3b-f9ab-42eb-98bb-88af5d28b089\n\n"
},
{
"path": "openclaw.version",
"chars": 345,
"preview": "# OpenClaw 上游版本锁定\n# 所有 addon 开发者和 CI 均从此文件读取,保证基于同一版本开发和测试\n# 格式遵循 openclaw-for-business/openclaw.version 规范\n#\n# 使用方式(she"
},
{
"path": "scripts/generate-patch.sh",
"chars": 690,
"preview": "#!/bin/bash\nset -e\n\ncd \"$(dirname \"$0\")/..\"\n\nif [ -z \"$1\" ]; then\n echo \"Usage: ./scripts/generate-patch.sh <patch-name"
},
{
"path": "tests/README.md",
"chars": 5052,
"preview": "以下测试均要求:\n\n- 在 test 下创建一个 md 文件,记录每次的测试结果(success/fail),并最终整理为一个列表;\n- 每次测试获取的快照需要额外保存为本地文件,备查。\n\n\n# 企业官网与政府网站\n\n分别打开如下页面,**"
},
{
"path": "tests/run-managed-tests.mjs",
"chars": 26940,
"preview": "#!/usr/bin/env node\n/**\n * OpenClaw Managed Browser — Automated Test Suite\n *\n * Opens each test URL with the openclaw-m"
},
{
"path": "version",
"chars": 7,
"preview": "v5.1.7\n"
},
{
"path": "wiseflow/README.md",
"chars": 4643,
"preview": "# Wiseflow Addon for OpenClaw\n\n浏览器反检测 + Tab Recovery + Smart Search + RSS Reader + 新媒体小编 Crew。\n\n本目录是 [wiseflow](https://"
},
{
"path": "wiseflow/addon.json",
"chars": 234,
"preview": "{\n \"name\": \"wiseflow\",\n \"version\": \"0.3.0\",\n \"description\": \"浏览器反检测 + Tab Recovery + 互联网搜索增强(smart-search / rss-reade"
},
{
"path": "wiseflow/crew/new-media-editor/AGENTS.md",
"chars": 2704,
"preview": "# 新媒体小编 — Workflow\n\n## Mode A:选题研究 → 图文输出\n\n```\n1. 接收用户指定的选题/方向(如「AI 工具」「春节营销」等)\n2. 确认:目标平台、风格要求(轻松/严肃/专业)、大约字数、是否有截止时间\n3"
},
{
"path": "wiseflow/crew/new-media-editor/ALLOWED_COMMANDS",
"chars": 52,
"preview": "# T1 基础上追加:技能脚本执行所需的运行时命令\n+bash\n+python3\n+node\n+npx\n"
},
{
"path": "wiseflow/crew/new-media-editor/BOOTSTRAP.md",
"chars": 517,
"preview": "# Bootstrap\n\nThis is a pre-configured crew workspace. Your role, responsibilities, and behavioral guidelines are fully d"
},
{
"path": "wiseflow/crew/new-media-editor/BUILTIN_SKILLS",
"chars": 10,
"preview": "summarize\n"
},
{
"path": "wiseflow/crew/new-media-editor/DENIED_SKILLS",
"chars": 30,
"preview": "github\ngh-issues\ncoding-agent\n"
},
{
"path": "wiseflow/crew/new-media-editor/HEARTBEAT.md",
"chars": 51,
"preview": "# 新媒体小编 — Heartbeat\n\n<!-- 初始为空,实例化后由系统定期更新健康状态 -->\n"
},
{
"path": "wiseflow/crew/new-media-editor/IDENTITY.md",
"chars": 191,
"preview": "# 新媒体小编 — Identity\n\n## Name\n新媒体小编 (New Media Editor)\n\n## Role\n社交媒体内容创作专家 — 深耕中国主流自媒体生态,发现热点、采集素材、撰写图文,交付可直接发布的内容。\n\n## Pe"
},
{
"path": "wiseflow/crew/new-media-editor/MEMORY.md",
"chars": 293,
"preview": "# 新媒体小编 — Memory\n\n## Account Profiles\n(实例化后填写:运营的平台账号、粉丝画像、账号调性、发布节奏)\n\n## Content Archive\n(记录已发布内容的标题、平台、发布日期,避免重复选题)\n"
},
{
"path": "wiseflow/crew/new-media-editor/SOUL.md",
"chars": 628,
"preview": "# 新媒体小编 — SOUL\n\n## Identity\n你是一名专业的新媒体内容创作者,专门服务于中国主流自媒体平台(微博、微信公众号、小红书、知乎、抖音、B站等)。你的核心能力是:快速捕捉热点、深度采集一手素材、精准提炼核心观点,最终产出"
},
{
"path": "wiseflow/crew/new-media-editor/TASKS.md",
"chars": 45,
"preview": "# 新媒体小编 — Tasks\n\n<!-- 初始为空,实例化后由小编在运行中维护 -->\n"
},
{
"path": "wiseflow/crew/new-media-editor/TOOLS.md",
"chars": 1000,
"preview": "# 新媒体小编 — Tools\n\n## Available Tools\n\n| Tool | Purpose |\n|------|---------|\n| `smart-search` | 在各大平台(微博、小红书、知乎、B站、抖音、Bing"
},
{
"path": "wiseflow/crew/new-media-editor/USER.md",
"chars": 321,
"preview": "# 新媒体小编 — User Context\n\n## User Role\n新媒体运营者 — 可能是品牌方的市场/运营人员、个人自媒体博主,或希望提升内容产出效率的企业主。\n\n## Preferences\n- Language: 中文(主要)"
},
{
"path": "wiseflow/crew/new-media-editor/skills/siliconflow-img-gen/SKILL.md",
"chars": 2210,
"preview": "---\nname: siliconflow-img-gen\ndescription: Generate images via SiliconFlow Images API. Default model is Qwen/Qwen-Image-"
},
{
"path": "wiseflow/crew/new-media-editor/skills/siliconflow-img-gen/scripts/gen.py",
"chars": 4284,
"preview": "#!/usr/bin/env python3\n\"\"\"SiliconFlow image generation — stdlib only (no httpx/requests).\"\"\"\n\nimport argparse\nimport jso"
},
{
"path": "wiseflow/crew/new-media-editor/skills/siliconflow-video-gen/SKILL.md",
"chars": 2768,
"preview": "---\nname: siliconflow-video-gen\ndescription: Generate videos via SiliconFlow Video API. Supports text-to-video (T2V) and"
},
{
"path": "wiseflow/crew/new-media-editor/skills/siliconflow-video-gen/scripts/gen.py",
"chars": 5481,
"preview": "#!/usr/bin/env python3\n\"\"\"SiliconFlow video generation — stdlib only (no httpx/requests).\n\nFlow:\n 1. POST /v1/video/sub"
},
{
"path": "wiseflow/crew/new-media-editor/skills/wenyan-formatter/SKILL.md",
"chars": 3257,
"preview": "---\nname: wenyan-formatter\ndescription: Format Markdown drafts into styled HTML for preview, or publish directly to WeCh"
},
{
"path": "wiseflow/crew/new-media-editor/skills/wenyan-formatter/scripts/format.sh",
"chars": 6514,
"preview": "#!/usr/bin/env bash\n# wenyan-formatter — Markdown → styled HTML (render) or WeChat GZH draft (publish)\n# Wraps @wenyan-m"
},
{
"path": "wiseflow/overrides.sh",
"chars": 1866,
"preview": "#!/bin/bash\n# wiseflow addon - overrides.sh\n# 通过 pnpm overrides 将 playwright-core 替换为 patchright-core(反检测)\n# 由 apply-add"
},
{
"path": "wiseflow/patches/001-browser-tab-recovery.patch",
"chars": 12594,
"preview": "diff --git a/src/agents/tools/browser-tool.actions.ts b/src/agents/tools/browser-tool.actions.ts\nindex a4b6cb456..8fd275"
},
{
"path": "wiseflow/patches/002-disable-web-search-env-var.patch",
"chars": 660,
"preview": "diff --git a/src/agents/tools/web-search.ts b/src/agents/tools/web-search.ts\nindex 1e4983f85..aa7dac794 100644\n--- a/src"
},
{
"path": "wiseflow/patches/003-act-field-validation.patch",
"chars": 6435,
"preview": "diff --git a/src/agents/tools/browser-tool.actions.ts b/src/agents/tools/browser-tool.actions.ts\nindex 8fd27500c..af0f67"
},
{
"path": "wiseflow/patches/004-web-fetch-allow-rfc2544.patch",
"chars": 640,
"preview": "diff --git a/src/agents/tools/web-fetch.ts b/src/agents/tools/web-fetch.ts\nindex f4cc88e2d..cd08b177c 100644\n--- a/src/a"
},
{
"path": "wiseflow/skills/browser-guide/SKILL.md",
"chars": 4079,
"preview": "---\nname: browser-guide\ndescription: Best practices for using the managed browser — handling login walls, CAPTCHAs, lazy"
},
{
"path": "wiseflow/skills/rss-reader/SKILL.md",
"chars": 2514,
"preview": "---\nname: rss-reader\ndescription: Discover the RSS/Atom feed URL for a website, then run the fetch-rss.mjs script to ret"
},
{
"path": "wiseflow/skills/rss-reader/package.json",
"chars": 188,
"preview": "{\n \"name\": \"rss-reader-skill\",\n \"version\": \"1.0.0\",\n \"description\": \"RSS/Atom feed reader skill for wiseflow\",\n \"typ"
},
{
"path": "wiseflow/skills/rss-reader/scripts/fetch-rss.mjs",
"chars": 4882,
"preview": "#!/usr/bin/env node\n/**\n * fetch-rss.mjs — Fetch and parse an RSS/Atom feed, output as markdown\n *\n * Usage:\n * node f"
},
{
"path": "wiseflow/skills/smart-search/SKILL.md",
"chars": 11009,
"preview": "---\nname: smart-search\ndescription: Construct optimized search URLs for major platforms and navigate to results with the"
}
]
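The preview of `docs/more_powerful_search_skill/direct_url_for_search_on_media_platform.md` gives the Bilibili search template `https://search.bilibili.com/{channel}?keyword={keyword}`. It also notes that multiple keywords are joined with `+`. Below is a minimal sketch that fills that template. The function name is hypothetical. The list of valid `channel` values is truncated in the preview, so the `all` used in the example is an assumption. Percent-encoding each keyword first is a design choice that keeps spaces or a literal `+` inside one keyword from being read as a separator.

```python
from urllib.parse import quote


def bilibili_search_url(channel: str, keywords: list) -> str:
    """Fill https://search.bilibili.com/{channel}?keyword={keyword}, joining keywords with '+'."""
    # Encode each keyword fully (safe="") so only the joining '+' stays literal.
    joined = "+".join(quote(k, safe="") for k in keywords)
    return f"https://search.bilibili.com/{quote(channel, safe='')}?keyword={joined}"


if __name__ == "__main__":
    print(bilibili_search_url("all", ["openclaw", "browser"]))
```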
About this extraction
This page contains the full source code of the TeamWiseFlow/wiseflow GitHub repository, extracted and formatted as plain text. The extraction includes 63 files (284.3 KB), approximately 89.3k tokens, and a symbol index with 92 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract.