[
  {
    "path": ".deepsource.toml",
    "content": "version = 1\n\n[[analyzers]]\nname = \"python\"\n\n  [analyzers.meta]\n  runtime_version = \"3.x.x\""
  },
  {
    "path": ".gitignore",
    "content": "# Development documentation (local only, not for Git)\ndevlogs/\nconclusions/\nresearches/\n\n# Python\n__pycache__/\n*.py[cod]\n*$py.class\n\n# Virtual environments\nvenv/\nenv/\nENV/\n\n# IDE\n.vscode/\n.idea/\n*.swp\n\n# OS\n.DS_Store\n\n# Node\nnode_modules/\n**/node_modules/\n\n# Project artifacts\nbackend/celerybeat-schedule*\nbackend/.crawl_cache/\nbackend/reproduce_sina.py\nbackend/checkpoints/"
  },
  {
    "path": "LICENSE",
    "content": "\n                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright 2025 Ziran Li\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "README.md",
    "content": "# FinnewsHunter: Multi-Agent Investment Decision Platform Driven by Financial News\n\n<div align=\"right\">\n  <a href=\"README_zn.md\">中文版</a> | <a href=\"README.md\">English</a>\n</div>\n\n<div align=\"center\">\n  <img src=\"assets/images/FINNEWS_HUNTER_LOGO.png\" alt=\"FinnewsHunter Logo\" width=\"450\">\n</div>\n\nAn enterprise-grade financial news analysis system built on the [AgenticX](https://github.com/DemonDamon/AgenticX) framework, integrating real-time news streams, deep quantitative analysis, and multi-agent debate mechanisms.\n\nFinnewsHunter goes beyond traditional text classification by deploying multi-agent teams (NewsAnalyst, Researcher, etc.) to monitor multiple financial news sources in real-time, including Sina Finance, National Business Daily, Financial World, Securities Times, and more. It leverages large language models for deep interpretation, sentiment analysis, and market impact assessment, combined with knowledge graphs to mine potential investment opportunities and risks, providing decision-level alpha signals for quantitative trading.\n\n---\n\n## 🎯 Project Features\n\n- ✅ **AgenticX Native**: Deeply integrated with AgenticX framework, using core abstractions like Agent, Tool, and Workflow\n- ✅ **AgenticX Component Integration**: Direct use of AgenticX's `BailianEmbeddingProvider` and `MilvusStorage`, avoiding reinventing the wheel\n- ✅ **Agent-Driven**: NewsAnalyst agent automatically analyzes news sentiment and market impact\n- ✅ **Multi-Provider LLM Support**: Supports 5 major LLM providers (Bailian, OpenAI, DeepSeek, Kimi, Zhipu), switchable with one click in the frontend\n- ✅ **Batch Operations**: Supports batch selection, batch deletion, and batch analysis of news, improving operational efficiency\n- ✅ **Stock K-Line Analysis**: Integrated with akshare real market data, supporting daily/minute K-line multi-period display\n- ✅ **Intelligent Stock Search**: Supports code and name fuzzy queries, pre-loaded with 5000+ 
A-share stocks\n- ✅ **Complete Tech Stack**: FastAPI + PostgreSQL + Milvus + Redis + React\n- ✅ **Real-time Search**: Supports multi-dimensional search by title, content, and stock code, with keyword highlighting\n- ✅ **Async Vectorization**: Vectorization runs asynchronously in the background and does not block the analysis flow\n- ✅ **Production Ready**: One-click deployment with Docker Compose, complete logging and monitoring\n\n---\n\n## 🏗️ System Architecture\n\n![FinnewsHunter Architecture](assets/images/arch-20251201.png)\n\nThe system adopts a layered architecture design:\n- **M6 Frontend Interaction Layer**: React + TypeScript + Shadcn UI\n- **M1 Platform Service Layer**: FastAPI Gateway + Task Manager\n- **M4/M5 Agent Collaboration Layer**: AgenticX Agent + Debate Workflow\n- **M2/M3 Infrastructure Layer**: Crawler Service + LLM Service + Embedding\n- **M7-M11 Storage & Learning Layer**: PostgreSQL + Milvus + Redis + ACE Framework\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- Docker & Docker Compose\n- (Optional) OpenAI API Key or local LLM\n- Node.js 18+ (for frontend development)\n\n### 1. Install AgenticX\n\n```bash\n# Clone AgenticX (any local path works) and install it in editable mode\ngit clone https://github.com/DemonDamon/AgenticX.git\ncd AgenticX\npip install -e .\n```\n\n### 2. Install Backend Dependencies\n\n```bash\ncd FinnewsHunter/backend\npip install -r requirements.txt\n```\n\n### 3. 
Configure Environment Variables\n\n```bash\ncd FinnewsHunter/backend\ncp env.example .env\n# Edit the .env file and fill in your LLM API key and other settings\n```\n\n**Multi-Provider LLM Configuration:**\n\nThe system supports 5 LLM providers; at least one must be configured:\n\n| Provider | Environment Variable | Registration URL |\n|----------|---------------------|------------------|\n| Bailian (Alibaba Cloud) | `DASHSCOPE_API_KEY` | https://dashscope.console.aliyun.com/ |\n| OpenAI | `OPENAI_API_KEY` | https://platform.openai.com/api-keys |\n| DeepSeek | `DEEPSEEK_API_KEY` | https://platform.deepseek.com/ |\n| Kimi (Moonshot) | `MOONSHOT_API_KEY` | https://platform.moonshot.cn/ |\n| Zhipu | `ZHIPU_API_KEY` | https://open.bigmodel.cn/ |\n\n**Example Configuration (Recommended: Bailian):**\n\n```bash\n# Bailian (Alibaba Cloud) - Recommended, fast access in China\nDASHSCOPE_API_KEY=sk-your-dashscope-key\nDASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nBAILIAN_MODELS=qwen-plus,qwen-max,qwen-turbo\n\n# Optional: Other providers\nOPENAI_API_KEY=sk-your-openai-key\nDEEPSEEK_API_KEY=sk-your-deepseek-key\n```\n\n### 4. Start Base Services (PostgreSQL, Redis, Milvus)\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml up -d postgres redis milvus-etcd milvus-minio milvus-standalone\n```\n\n### 5. Initialize Database\n\n```bash\ncd FinnewsHunter/backend\npython init_db.py\n```\n\n### 5.1 Initialize Stock Data (Optional, for stock search functionality)\n\n```bash\ncd FinnewsHunter/backend\npython -m app.scripts.init_stocks\n# Fetches data for all A-shares (roughly 5,000 stocks) from akshare and saves it to the database\n```\n\n### 6. Start Backend API Service\n\n```bash\ncd FinnewsHunter/backend\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n### 7. 
Start Celery Worker and Beat (Auto Crawling)\n\n```bash\n# Open a new terminal\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml up -d celery-worker celery-beat\n```\n\n### 8. Start Frontend Service\n\n```bash\n# Open a new terminal\ncd FinnewsHunter/frontend\nnpm install  # Install dependencies on first run\nnpm run dev\n```\n\n### 9. Access Application\n\n- **Frontend Interface**: http://localhost:3000\n- **Backend API**: http://localhost:8000\n- **API Documentation**: http://localhost:8000/docs\n\n---\n\n## 🔄 Service Management\n\n### View All Service Status\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml ps\n```\n\n### Restart All Services\n\n```bash\ncd FinnewsHunter\n\n# Restart Docker services (infrastructure + Celery)\ndocker compose -f deploy/docker-compose.dev.yml restart\n\n# If the backend API was started separately, restart it manually\n# Press Ctrl+C to stop the backend process, then rerun:\ncd backend\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n### Restart a Specific Service\n\n```bash\ncd FinnewsHunter\n\n# Restart only Celery (after code changes)\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# Restart only the database\ndocker compose -f deploy/docker-compose.dev.yml restart postgres\n\n# Restart only Redis\ndocker compose -f deploy/docker-compose.dev.yml restart redis\n```\n\n### Stop All Services\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml down\n```\n\n### View Logs\n\n```bash\ncd FinnewsHunter\n\n# View Celery Worker logs\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-worker\n\n# View Celery Beat logs (scheduled task dispatch)\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# View PostgreSQL logs\ndocker compose -f deploy/docker-compose.dev.yml logs -f postgres\n\n# View all service logs\ndocker compose -f deploy/docker-compose.dev.yml logs -f\n```\n\n---\n\n## 🗑️ 
Reset Database\n\n### Method 1: Use One-Click Reset Script (Recommended) ⭐\n\n```bash\ncd FinnewsHunter\n\n# Execute reset script\n./reset_all_data.sh\n\n# Enter yes to confirm\n```\n\n**The script automatically:**\n1. ✅ Clears all news and task data in PostgreSQL\n2. ✅ Clears the Redis cache\n3. ✅ Resets database auto-increment IDs (restart from 1)\n4. ✅ Clears Celery schedule files\n5. ✅ Restarts the Celery services\n\n**After execution:**\n- Wait 5-10 minutes for the system to re-crawl data automatically\n- Then open the frontend to view the new data\n\n---\n\n### Method 2: Manual Reset (Advanced)\n\n#### Step 1: Clear PostgreSQL Data\n\n```bash\n# Enter the PostgreSQL container\ndocker exec -it finnews_postgres psql -U finnews -d finnews_db\n```\n\nRun the following at the psql prompt:\n\n```sql\n-- Clear news table\nDELETE FROM news;\n\n-- Clear task table\nDELETE FROM crawl_tasks;\n\n-- Clear analysis table\nDELETE FROM analyses;\n\n-- Reset auto-increment IDs\nALTER SEQUENCE news_id_seq RESTART WITH 1;\nALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1;\nALTER SEQUENCE analyses_id_seq RESTART WITH 1;\n\n-- Verify results (should all be 0)\nSELECT 'news table', COUNT(*) FROM news;\nSELECT 'crawl_tasks table', COUNT(*) FROM crawl_tasks;\nSELECT 'analyses table', COUNT(*) FROM analyses;\n\n-- Exit\n\\q\n```\n\n#### Step 2: Clear Redis Cache\n\n```bash\ncd FinnewsHunter\ndocker exec finnews_redis redis-cli FLUSHDB\n```\n\n#### Step 3: Clear Celery Schedule Files\n\n```bash\ncd FinnewsHunter/backend\nrm -f celerybeat-schedule*\n```\n\n#### Step 4: Restart Celery Services\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n#### Step 5: Verify Data Cleared\n\n```bash\n# Check news count (should be 0)\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT COUNT(*) FROM news;\"\n\n# Check Redis (should be 0 or very small)\ndocker exec finnews_redis redis-cli DBSIZE\n\n# Check if Celery 
has started crawling\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n# You should see 10 crawl tasks triggered per minute\n```\n\n---\n\n### Method 3: Use Python Script Reset\n\n```bash\ncd FinnewsHunter/backend\npython reset_database.py\n# Enter yes to confirm\n```\n\n---\n\n### Method 4: Quick Manual Cleanup (One-Line Commands) 🔥\n\n**Use case:** when the reset script doesn't work, this is the fastest method\n\n```bash\ncd FinnewsHunter\n\n# Step 1: Clear database tables\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"DELETE FROM news; DELETE FROM crawl_tasks; DELETE FROM analyses;\"\n\n# Step 2: Reset auto-increment IDs\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"ALTER SEQUENCE news_id_seq RESTART WITH 1; ALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1; ALTER SEQUENCE analyses_id_seq RESTART WITH 1;\"\n\n# Step 3: Clear Redis cache\ndocker exec finnews_redis redis-cli FLUSHDB\n\n# Step 4: Clear Celery schedule files\nrm -f backend/celerybeat-schedule*\n\n# Step 5: Restart Celery services\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# Step 6: Verify the data is cleared (should display 0)\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT COUNT(*) FROM news;\"\n```\n\n**Hard-refresh the browser immediately after execution:**\n- Mac: `Command + Shift + R`\n- Windows: `Ctrl + Shift + R`\n\n---\n\n### 🖥️ Clear Frontend Cache (Important!)\n\n**After the data is cleared, the frontend may still display old data because of browser caching.**\n\n#### Method 1: Hard Refresh Browser (Recommended) ⭐\n\n**Mac System:**\n```\nPress Command + Shift + R\nor Command + Option + R\n```\n\n**Windows/Linux System:**\n```\nPress Ctrl + Shift + R\nor Ctrl + F5\n```\n\n#### Method 2: Clear Cache via Developer Tools\n\n1. Press `F12` to open developer tools\n2. Right-click the refresh button (next to the address bar)\n3. Select **\"Empty Cache and Hard Reload\"**\n\n#### Method 3: Clear Browser Cache\n\n1. 
**Chrome/Edge:**\n   - `Command + Shift + Delete` (Mac) or `Ctrl + Shift + Delete` (Windows)\n   - Check \"Cached images and files\"\n   - Set the time range to \"All time\"\n   - Click \"Clear data\"\n\n2. **After the page reloads, hard refresh once more**\n   - This ensures the React Query cache is cleared as well\n\n#### Method 4: Restart Frontend Dev Server (Most Thorough)\n\n```bash\n# Press Ctrl+C in the frontend terminal to stop the service\n# Then restart\ncd FinnewsHunter/frontend\nnpm run dev\n```\n\n---\n\n## 📊 Data Recovery Timeline After Reset\n\n| Time | Event | Expected Result |\n|------|-------|----------------|\n| 0 min | Execute reset script | Database cleared, Redis cleared |\n| 1 min | Celery Beat starts scheduling | 10 crawl tasks triggered |\n| 2-5 min | First batch of news saved | Database starts filling with data |\n| 5-10 min | All sources have data | Frontend shows 100+ news items |\n| 30 min | Data continues growing | 500+ news |\n| 1 hour | Stable operation | 1000-2000 news |\n\n**Notes:**\n- New data appears 5-10 minutes after a reset\n- **The frontend must be hard refreshed** (Command+Shift+R / Ctrl+Shift+R) to clear its cache\n- Avoid resetting frequently; it affects system stability\n\n**To hard refresh the frontend immediately after a reset:**\n1. Execute the reset command\n2. **Immediately** press `Command + Shift + R` (Mac) or `Ctrl + Shift + R` (Windows) in the browser\n3. 
Wait 5-10 minutes, then refresh again to view the new data\n\n---\n\n## ⚠️ Crawler Status Check\n\n### Check Which Sources Are Working\n\n```bash\ncd FinnewsHunter\n\n# View news count by source\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, COUNT(*) as count \nFROM news \nWHERE created_at > NOW() - INTERVAL '1 hour'\nGROUP BY source \nORDER BY count DESC;\n\"\n\n# View recent crawl task status\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, \n       crawled_count, \n       saved_count, \n       status,\n       error_message \nFROM crawl_tasks \nWHERE created_at > NOW() - INTERVAL '10 minutes'\nORDER BY created_at DESC \nLIMIT 20;\n\"\n```\n\n### View Crawl Errors\n\n```bash\ncd FinnewsHunter\n\n# View ERROR logs\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep ERROR\n\n# View issues for a specific source\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep \"jwview\"\n```\n\n---\n\n## 📚 User Guide\n\n### Auto Crawl Mode (Recommended) ⭐\n\n**The system is configured to crawl 10 news sources automatically:**\n\n1. 🌐 Sina Finance\n2. 🐧 Tencent Finance\n3. 💰 Financial World\n4. 📊 Economic Observer\n5. 📈 Caijing.com\n6. 📉 21st Century Business Herald\n7. 📰 National Business Daily\n8. 🎯 Yicai\n9. 📧 NetEase Finance\n10. 💎 East Money\n\n**How it works:**\n- ✅ Celery Beat triggers crawling for all sources every minute\n- ✅ Automatic deduplication (URL level)\n- ✅ Smart time filtering (keeps only news from the past 24 hours)\n- ✅ Stock keyword filtering\n- ✅ No manual operation needed\n\n**View crawl progress:**\n\n```bash\n# View Celery Beat scheduling logs\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# View Celery Worker execution logs\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-worker\n```\n\n---\n\n### Manual Refresh (Get the Latest Immediately)\n\n**Method 1: Via Frontend**\n1. 
Visit http://localhost:3000/news\n2. Click the \"🔄 Refresh Now\" button in the top right\n3. The system triggers crawling immediately; data updates in about 2 minutes\n\n**Method 2: Via API**\n```bash\n# Force refresh Sina Finance\ncurl -X POST \"http://localhost:8000/api/v1/news/refresh?source=sina\"\n\n# Force refresh all sources (each source must be called individually)\nfor source in sina tencent jwview eeo caijing jingji21 nbd yicai 163 eastmoney; do\n  curl -X POST \"http://localhost:8000/api/v1/news/refresh?source=$source\"\n  sleep 1\ndone\n```\n\n---\n\n### View News List\n\n**Method 1: Via Frontend (Recommended)**\n- Visit http://localhost:3000\n- Homepage: view source statistics and the latest news\n- News Feed: filter news by source and sentiment\n- Batch selection: use checkboxes to select multiple news items; Shift-click selects a range\n- Batch operations: select all/deselect all, batch delete, batch analyze\n\n**Method 2: Via API**\n\n```bash\n# Get latest news from all sources (200 items)\ncurl \"http://localhost:8000/api/v1/news/latest?limit=200\"\n\n# Get news from a specific source\ncurl \"http://localhost:8000/api/v1/news/latest?source=sina&limit=50\"\n\n# Filter by sentiment (legacy API)\ncurl \"http://localhost:8000/api/v1/news/?sentiment=positive&limit=20\"\n\n# Get the list of all available news sources\ncurl \"http://localhost:8000/api/v1/news/sources\"\n```\n\n---\n\n### Batch Operations on News\n\n**Frontend Operations:**\n1. **Batch Selection**:\n   - Click the checkbox on the left of a news card to select a single item\n   - Hold Shift and click for range selection\n   - Use the \"Select All\" button in the top toolbar to select all news in the current filter results\n   - The selection clears automatically when you switch news source or filter conditions\n\n2. 
**Batch Delete**:\n   - After selecting multiple news items, click the \"Batch Delete\" button in the top toolbar\n   - Confirm the delete dialog and the selected items are deleted\n   - The list refreshes automatically after deletion\n\n3. **Batch Analysis**:\n   - After selecting multiple news items, click the \"Batch Analyze\" button in the top toolbar\n   - The system analyzes the selected items sequentially, showing progress and result statistics\n   - When analysis completes, the success/failure counts are shown\n\n**API Operations:**\n```bash\n# Batch delete news\ncurl -X POST \"http://localhost:8000/api/v1/news/batch/delete\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"news_ids\": [1, 2, 3]}'\n\n# Batch analyze news\ncurl -X POST \"http://localhost:8000/api/v1/analysis/batch\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"news_ids\": [1, 2, 3], \"provider\": \"bailian\", \"model\": \"qwen-plus\"}'\n```\n\n---\n\n### Analyze News\n\n**Method 1: Via Frontend**\n- Click the \"✨ Analyze\" button on a news card\n- Wait 3-5 seconds for the analysis results\n- Click the news card to open the detail drawer and view the full analysis\n\n**Method 2: Via API**\n```bash\n# Analyze the news item with the given ID (using the default model)\ncurl -X POST http://localhost:8000/api/v1/analysis/news/1\n\n# Analyze news (specifying a model)\ncurl -X POST http://localhost:8000/api/v1/analysis/news/1 \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"provider\": \"bailian\", \"model\": \"qwen-max\"}'\n\n# View analysis results\ncurl http://localhost:8000/api/v1/analysis/1\n```\n\n---\n\n### Switch LLM Model\n\n**Frontend Operations:**\n1. Click the model selector in the top right (it shows the current model name)\n2. Select a provider and model from the dropdown menu\n3. 
The selection is saved automatically; subsequent analyses use the new model\n\n**Supported Models:**\n- 🔥 **Bailian**: qwen-plus, qwen-max, qwen-turbo, qwen-long\n- 🤖 **OpenAI**: gpt-4, gpt-4-turbo, gpt-3.5-turbo\n- 🧠 **DeepSeek**: deepseek-chat, deepseek-coder\n- 🌙 **Kimi**: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k\n- 🔮 **Zhipu**: glm-4, glm-4-plus, glm-4-air\n\n**Get the available model list via API:**\n```bash\ncurl http://localhost:8000/api/v1/llm/config\n```\n\n---\n\n### Search News\n\n**Frontend Operations:**\n1. Enter keywords in the top search box\n2. Searches across title, content, stock code, and source\n3. Matching keywords are highlighted\n4. Search is debounced by 300 ms and runs automatically once you stop typing\n\n**Search Examples:**\n- Search stock code: `600519` (Kweichow Moutai)\n- Search keywords: `新能源` (new energy), `半导体` (semiconductor)\n- Search source: `sina`, `eastmoney`\n\n---\n\n### View News Details\n\n**Frontend Operations:**\n1. Click any news card\n2. A detail drawer slides out from the right, displaying:\n   - 📰 News title and source\n   - 📊 Sentiment score (positive/negative/neutral)\n   - 📈 Associated stock codes\n   - 📝 Complete news content\n   - 🤖 AI analysis results (Markdown format)\n   - 🔗 Original article link\n3. Click \"Copy Analysis Content\" to copy the analysis report in Markdown format\n\n---\n\n### Stock K-Line Analysis\n\n**Frontend Operations:**\n1. Visit http://localhost:3000/stocks/SH600519 (Kweichow Moutai example)\n2. Use the top-right search box to enter a stock code or name (e.g., `茅台` (Moutai), `600519`)\n3. Select a time period: Daily K, 60min, 30min, 15min, 5min, 1min\n4. 
Chart supports:\n   - 📈 K-line candlestick chart (OHLC)\n   - 📊 Volume bar chart\n   - 📉 MA moving averages (5/10/30/60 day)\n\n**API Operations:**\n\n```bash\n# Get K-line data (daily, default 180 items)\ncurl \"http://localhost:8000/api/v1/stocks/SH600519/kline?period=daily&limit=180\"\n\n# Get minute K-line (60-minute line)\ncurl \"http://localhost:8000/api/v1/stocks/SH600519/kline?period=60m&limit=200\"\n\n# Search stocks\ncurl \"http://localhost:8000/api/v1/stocks/search/realtime?q=茅台&limit=10\"\n\n# View stock count in database\ncurl \"http://localhost:8000/api/v1/stocks/count\"\n```\n\n---\n\n### Filter by Source\n\n**Frontend Operations:**\n\n1. **Homepage (Dashboard)**\n   - View \"News Source Statistics\" card\n   - Click any source button to filter\n   - Display news count and list for that source\n\n2. **News Feed Page**\n   - Top has 10 source filter buttons\n   - Click to switch and view different sources\n   - Supports source + sentiment dual filtering\n\n**API Operations:**\n\n```bash\n# View Sina Finance news\ncurl \"http://localhost:8000/api/v1/news/latest?source=sina&limit=50\"\n\n# View National Business Daily news\ncurl \"http://localhost:8000/api/v1/news/latest?source=nbd&limit=50\"\n\n# View all sources\ncurl \"http://localhost:8000/api/v1/news/latest?limit=200\"\n```\n\n---\n\n## 🏗️ Project Structure\n\n```\nFinnewsHunter/\n├── backend/                    # Backend service\n│   ├── app/\n│   │   ├── agents/            # Agent definitions (NewsAnalyst, debate agents, etc.)\n│   │   ├── api/v1/            # FastAPI routes\n│   │   │   ├── analysis.py    # Analysis API (supports batch analysis)\n│   │   │   ├── llm_config.py  # LLM config API\n│   │   │   ├── news_v2.py     # News API (supports batch delete)\n│   │   │   └── ...\n│   │   ├── core/              # Core configuration (config, database, redis, neo4j)\n│   │   ├── models/            # SQLAlchemy data models\n│   │   ├── services/          # Business services\n│   │   │   ├── 
llm_service.py      # LLM service (multi-provider support)\n│   │   │   ├── analysis_service.py # Analysis service (async vectorization)\n│   │   │   ├── embedding_service.py # Vectorization service (based on AgenticX BailianEmbeddingProvider)\n│   │   │   └── stock_data_service.py # Stock data service\n│   │   ├── storage/           # Storage wrapper\n│   │   │   └── vector_storage.py # Milvus vector storage (based on AgenticX MilvusStorage)\n│   │   ├── tasks/             # Celery tasks\n│   │   └── tools/              # AgenticX tools (Crawler, Cleaner)\n│   ├── tests/                 # Test and utility scripts\n│   │   ├── check_milvus_data.py           # Check Milvus vector storage data\n│   │   ├── check_news_embedding_status.py # Check news vectorization status\n│   │   └── manual_vectorize.py           # Manually vectorize specified news\n│   ├── env.example            # Environment variable template\n│   └── requirements.txt       # Python dependencies\n├── frontend/                  # React frontend\n│   └── src/\n│       ├── components/        # Components\n│       │   ├── ModelSelector.tsx    # LLM model selector\n│       │   ├── NewsDetailDrawer.tsx # News detail drawer\n│       │   └── HighlightText.tsx    # Keyword highlighting\n│       ├── context/           # React Context\n│       ├── hooks/             # Custom Hooks\n│       │   └── useDebounce.ts # Debounce Hook\n│       ├── layout/            # Layout components\n│       └── pages/             # Page components\n│           └── NewsListPage.tsx # News list page (supports batch operations)\n├── deploy/                    # Deployment configuration\n│   ├── docker-compose.dev.yml # Docker Compose configuration\n│   ├── Dockerfile.celery     # Celery image build file\n│   └── celery-entrypoint.sh  # Celery container startup script\n├── conclusions/               # Module summary documentation\n│   ├── backend/              # Backend module summaries\n│   └── frontend/             # Frontend 
module summaries\n└── .dev-docs/                 # Development documentation\n```\n\n---\n\n## 🧪 Testing & Acceptance\n\n### MVP Acceptance Criteria\n\n- [x] News crawling successful and saved to PostgreSQL\n- [x] NewsAnalyst calls LLM to complete analysis\n- [x] Analysis results include sentiment scores\n- [x] Frontend can display news and analysis results\n- [x] Support multi-provider LLM dynamic switching\n- [x] News details display complete analysis content\n- [x] Real-time search and filtering functionality\n- [x] Batch selection, batch delete, batch analysis functionality\n- [x] Vectorization and storage services based on AgenticX\n- [x] Async vectorization, non-blocking analysis flow\n\n### Testing Process\n\n1. **Start All Services**\n   ```bash\n   ./start.sh\n   ```\n\n2. **Check Docker Container Status**\n   ```bash\n   docker ps\n   # Should see: postgres, redis, milvus-standalone, milvus-etcd, milvus-minio\n   ```\n\n3. **Test News Crawling**\n   ```bash\n   curl -X POST http://localhost:8000/api/v1/news/crawl \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"source\": \"sina\", \"start_page\": 1, \"end_page\": 1}'\n   \n   # Wait 5-10 seconds then check results\n   curl http://localhost:8000/api/v1/news/?limit=5\n   ```\n\n4. **Test Agent Analysis**\n   ```bash\n   # Get first news ID\n   NEWS_ID=$(curl -s http://localhost:8000/api/v1/news/?limit=1 | jq '.[0].id')\n   \n   # Trigger analysis\n   curl -X POST http://localhost:8000/api/v1/analysis/news/$NEWS_ID\n   \n   # View analysis results\n   curl http://localhost:8000/api/v1/analysis/1\n   ```\n\n5. 
**Test Frontend Interface**\n   - Open the frontend at http://localhost:3000\n   - Click \"Crawl News\" and wait for completion\n   - Select a news item and click \"Analyze\"\n   - Check that the sentiment score is displayed\n\n---\n\n## 🔧 Troubleshooting\n\n### Issue 1: Database Connection Failed\n\n**Symptom:** Backend fails to start with `could not connect to database`\n\n**Solution:**\n\n```bash\ncd FinnewsHunter\n\n# Check if PostgreSQL is running\ndocker ps | grep postgres\n\n# View logs\ndocker compose -f deploy/docker-compose.dev.yml logs postgres\n\n# Restart container\ndocker compose -f deploy/docker-compose.dev.yml restart postgres\n\n# Wait 30 seconds, then retry starting the backend\n```\n\n---\n\n### Issue 2: Celery Tasks Not Executing\n\n**Symptom:** The frontend shows a news count of 0 and nothing is crawled automatically\n\n**Troubleshooting Steps:**\n\n```bash\ncd FinnewsHunter\n\n# 1. Check if Celery Worker is running\ndocker ps | grep celery\n\n# 2. View Celery Beat logs (should see tasks triggered every minute)\ndocker compose -f deploy/docker-compose.dev.yml logs celery-beat --tail=100\n\n# 3. View Celery Worker logs (check task execution)\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker --tail=100\n\n# 4. Check Redis connection\ndocker exec finnews_redis redis-cli PING\n# Should return PONG\n\n# 5. Restart Celery services\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n---\n\n### Issue 3: Crawling Failed (404 Error)\n\n**Symptom:** Celery logs show `404 Client Error: Not Found`\n\n**Cause:** The news website's URL has changed\n\n**Solution:**\n\n```bash\n# 1. Fetch the URL manually to verify it is still reachable\ncurl -I https://finance.caijing.com.cn/\n\n# 2. If the URL changed, update the corresponding crawler configuration\n# Edit backend/app/tools/{source}_crawler.py\n# Update BASE_URL and STOCK_URL\n\n# 3. Clear Python cache\ncd FinnewsHunter/backend\nfind . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true\n\n# 4. 
Restart Celery\ncd ..\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n---\n\n### Issue 4: Only Sina Finance Has Data\n\n**Symptom:** The other 9 sources have no news\n\n**Possible Causes:**\n1. Incomplete Celery Beat configuration\n2. Errors in the crawler code\n3. Incorrect website URL\n\n**Solution:**\n\n```bash\ncd FinnewsHunter\n\n# 1. Check the Celery Beat configuration\ndocker compose -f deploy/docker-compose.dev.yml logs celery-beat | grep \"crawl-\"\n# Should see 10 scheduled tasks (crawl-sina, crawl-tencent, ..., crawl-eastmoney)\n\n# 2. Manually test crawling a single source\ndocker exec -it finnews_celery_worker python -c \"\nfrom app.tools import get_crawler_tool\ncrawler = get_crawler_tool('nbd')  # Test National Business Daily\nnews = crawler.crawl()\nprint(f'Crawled {len(news)} news items')\n\"\n\n# 3. View the per-source news count in the database\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, COUNT(*) as count \nFROM news \nGROUP BY source \nORDER BY count DESC;\n\"\n\n# 4. If a source keeps failing, view the detailed errors\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep \"ERROR\"\n```\n\n---\n\n### Issue 5: LLM Call Failed\n\n**Symptom:** Analysis fails with the error `LLM Provider NOT provided`\n\n**Solution:**\n\n```bash\ncd FinnewsHunter/backend\n\n# 1. Check whether an API key is configured\ngrep -E \"DASHSCOPE_API_KEY|OPENAI_API_KEY|DEEPSEEK_API_KEY\" .env\n\n# 2. Check that the base URL is correct (required for Bailian)\ngrep DASHSCOPE_BASE_URL .env\n# Should be: https://dashscope.aliyuncs.com/compatible-mode/v1\n\n# 3. Verify the LLM config API responds correctly\ncurl http://localhost:8000/api/v1/llm/config | jq '.providers[].has_api_key'\n# At least one should return true\n\n# 4. 
If using Bailian, ensure complete configuration\ncat >> .env << EOF\nDASHSCOPE_API_KEY=sk-your-key\nDASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nBAILIAN_MODELS=qwen-plus,qwen-max\nEOF\n\n# 5. Restart backend service\n```\n\n---\n\n### Issue 6: Frontend Shows Blank or CORS Error\n\n**Symptom:** Frontend cannot load data, browser Console shows CORS error\n\n**Solution:**\n\n```bash\n# 1. Check backend CORS configuration\ncd FinnewsHunter/backend\ngrep BACKEND_CORS_ORIGINS .env\n# Should include http://localhost:3000\n\n# 2. Check frontend API address configuration\ncd ../frontend\ncat .env\n# VITE_API_URL should be http://localhost:8000\n\n# 3. Hard refresh browser\n# Chrome/Edge: Ctrl+Shift+R (Windows) or Cmd+Shift+R (Mac)\n\n# 4. Restart frontend dev server\nnpm run dev\n```\n\n---\n\n### Issue 7: Milvus Connection Failed\n\n**Symptom:** Vector search functionality not working\n\n**Solution:**\n\n```bash\ncd FinnewsHunter\n\n# Milvus requires longer startup time (approximately 60 seconds)\ndocker compose -f deploy/docker-compose.dev.yml logs milvus-standalone\n\n# Check health status\ndocker inspect finnews_milvus | grep -A 10 Health\n\n# Restart Milvus related services\ndocker compose -f deploy/docker-compose.dev.yml restart milvus-etcd milvus-minio milvus-standalone\n```\n\n---\n\n### Issue 8: Data Statistics Inaccurate\n\n**Symptom:** Homepage shows news count doesn't match actual\n\n**Solution:**\n\n```bash\n# Use reset script to clear data and start fresh\ncd FinnewsHunter\n./reset_all_data.sh\n```\n\n---\n\n### Common Debugging Commands\n\n```bash\ncd FinnewsHunter\n\n# View all container status\ndocker compose -f deploy/docker-compose.dev.yml ps\n\n# View complete logs for a service\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker --tail=500\n\n# Enter container for debugging\ndocker exec -it finnews_celery_worker bash\n\n# View database connection\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c 
\"\\conninfo\"\n\n# View Redis connection\ndocker exec finnews_redis redis-cli INFO\n\n# Test network connectivity\ndocker exec finnews_celery_worker ping -c 3 postgres\n```\n\n---\n\n## ⚡ Quick Reference (Common Commands)\n\n### Project Directory\n\n```bash\ncd FinnewsHunter\n```\n\n### One-Click Operations\n\n```bash\n# Start all services\ndocker compose -f deploy/docker-compose.dev.yml up -d\n\n# Stop all services\ndocker compose -f deploy/docker-compose.dev.yml down\n\n# Restart Celery (after code updates)\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# Clear all data and start fresh\n./reset_all_data.sh\n```\n\n### View Status\n\n```bash\n# Service status\ndocker compose -f deploy/docker-compose.dev.yml ps\n\n# News count\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT source, COUNT(*) FROM news GROUP BY source;\"\n\n# Task count\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT status, COUNT(*) FROM crawl_tasks GROUP BY status;\"\n\n# Redis cache\ndocker exec finnews_redis redis-cli DBSIZE\n```\n\n### View Logs\n\n```bash\n# Celery Beat (scheduled dispatch)\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# Celery Worker (task execution)\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-worker\n\n# PostgreSQL\ndocker compose -f deploy/docker-compose.dev.yml logs -f postgres\n\n# All services\ndocker compose -f deploy/docker-compose.dev.yml logs -f\n```\n\n### Direct Access\n\n- **Frontend**: http://localhost:3000\n- **Backend API**: http://localhost:8000\n- **API Documentation**: http://localhost:8000/docs\n\n---\n\n## 📊 Database Structure\n\n### News Table\n- id, title, content, url, source\n- publish_time, stock_codes\n- sentiment_score, is_embedded\n\n### Analysis Table\n- id, news_id, agent_name\n- sentiment, sentiment_score, confidence\n- analysis_result, structured_data\n\n### Stock Table\n- id, code, name, industry, 
market\n\n---\n\n## 🛠️ Development Guide\n\n### Add New Crawler\n\n1. Inherit `BaseCrawler` class\n2. Implement `crawl()` method\n3. Register in `tools/__init__.py`\n\nExample:\n```python\n# backend/app/tools/custom_crawler.py\nfrom .crawler_base import BaseCrawler\n\nclass CustomCrawlerTool(BaseCrawler):\n    name = \"custom_crawler\"\n    \n    def crawl(self, start_page, end_page):\n        # Implement crawling logic\n        pass\n```\n\n### Use Enhanced Crawler (Optional)\n\nFor scenarios requiring JS rendering or intelligent content extraction, use enhanced crawler:\n\n```python\nfrom app.tools.crawler_enhanced import crawl_url, EnhancedCrawler\n\n# Quick crawl single URL\narticle = crawl_url(\"https://finance.sina.com.cn/xxx\", engine='auto')\nprint(article.to_markdown())\n\n# Get LLM message format (multimodal)\nllm_messages = article.to_llm_message()\n\n# Batch crawl (with cache)\ncrawler = EnhancedCrawler(use_cache=True)\narticles = crawler.crawl_batch(urls, delay=1.0)\n```\n\n**Supported Engines:**\n- `requests`: Basic HTTP requests (default)\n- `playwright`: JS rendering (requires `playwright install chromium`)\n- `jina`: Jina Reader API (requires `JINA_API_KEY` configuration)\n- `auto`: Automatically select best engine\n\n**Install Optional Dependencies:**\n\n```bash\npip install markdownify readabilipy playwright\nplaywright install chromium  # Optional, for JS rendering\n```\n\n---\n\n### Add New Agent\n\n1. Inherit `Agent` class\n2. Define role, goal, backstory\n3. 
Implement business methods\n\nExample:\n```python\n# backend/app/agents/risk_analyst.py\nfrom agenticx import Agent\n\nclass RiskAnalystAgent(Agent):\n    def __init__(self, llm_provider):\n        super().__init__(\n            name=\"RiskAnalyst\",\n            role=\"Risk Analyst\",\n            goal=\"Assess investment risks\",\n            llm_provider=llm_provider\n        )\n```\n\n---\n\n### Using AgenticX Components\n\nFinnewsHunter deeply integrates AgenticX framework core components to avoid reinventing the wheel:\n\n#### 1. Embedding Service\n\nThe system uses `agenticx.embeddings.BailianEmbeddingProvider` as the core embedding engine:\n\n```python\nfrom app.services.embedding_service import EmbeddingService\n\n# Synchronous interface (for sync contexts)\nembedding_service = EmbeddingService()\nvector = embedding_service.embed_text(\"text content\")\n\n# Asynchronous interface (recommended for async contexts)\nvector = await embedding_service.aembed_text(\"text content\")\n\n# Batch processing (Provider handles internal batching)\nvectors = embedding_service.embed_batch([\"text1\", \"text2\", \"text3\"])\n```\n\n**Features**:\n- Redis caching support to avoid duplicate calculations\n- Automatic text length limit handling (6000 characters)\n- Both sync and async interfaces to avoid event loop conflicts\n\n#### 2. 
Vector Storage (Milvus)\n\nThe system uses `agenticx.storage.vectordb_storages.milvus.MilvusStorage` as the vector database:\n\n```python\nfrom app.storage.vector_storage import VectorStorage\n\nvector_storage = VectorStorage()\n\n# Store single vector\nvector_storage.store_embedding(\n    news_id=1,\n    text=\"news content\",\n    embedding=[0.1, 0.2, ...]\n)\n\n# Batch storage\nvector_storage.store_embeddings_batch([\n    {\"news_id\": 1, \"text\": \"content1\", \"embedding\": [...]},\n    {\"news_id\": 2, \"text\": \"content2\", \"embedding\": [...]}\n])\n\n# Similarity search\nresults = vector_storage.search_similar(query_vector=[...], top_k=10)\n\n# Get statistics (with query count fallback mechanism)\nstats = vector_storage.get_stats()\n```\n\n**Features**:\n- Direct use of AgenticX MilvusStorage, no duplicate implementation\n- Compatibility interface for simplified calls\n- Query count fallback when `num_entities` is inaccurate\n- Async operation support to avoid blocking\n\n#### 3. 
Async Embedding Best Practices\n\nIn async contexts (e.g., FastAPI routes), use the async interfaces:\n\n```python\nimport asyncio\n\nfrom app.services.embedding_service import EmbeddingService\nfrom app.storage.vector_storage import VectorStorage\n\nasync def analyze_news(news_id: int, text: str):\n    embedding_service = EmbeddingService()\n    vector_storage = VectorStorage()\n    \n    # Use the async interface to avoid event loop conflicts\n    embedding = await embedding_service.aembed_text(text)\n    \n    # Store the vector asynchronously in the background (non-blocking)\n    asyncio.create_task(\n        vector_storage.store_embedding(news_id, text, embedding)\n    )\n    \n    # Continue with analysis logic...\n```\n\n**Notes**:\n- In async contexts, use `aembed_text()` instead of `embed_text()`\n- Embedding runs asynchronously in the background without blocking the caller\n- The Milvus `flush()` operation is optimized and not executed by default (it relies on auto-flush)\n\n---\n\n## Multi-Agent Debate Architecture\n\nFinnewsHunter's core feature is the **bull-bear debate mechanism**: through the collaboration and confrontation of multiple specialist agents, it surfaces the investment value and risks of individual stocks.\n\n### Core Participants\n\n| Agent | Role | Core Responsibilities |\n|-------|------|---------------------|\n| **BullResearcher** | Bull Researcher | Mine growth potential, core positives, valuation advantages |\n| **BearResearcher** | Bear Researcher | Identify downside risks, negative catalysts, refute optimistic expectations |\n| **SearchAnalyst** | Search Analyst | Dynamically acquire data (AkShare/BochaAI/browser search) |\n| **InvestmentManager** | Investment Manager | Host the debate, evaluate argument quality, make final decisions |\n\n### Debate Data Flow Architecture\n\n```mermaid\ngraph TD\n    subgraph Debate Initiation\n        Manager[Investment Manager] -->|Opening Statement| Orchestrator[Debate Orchestrator]\n    end\n    \n    subgraph Multi-Round Debate\n        Orchestrator 
-->|Round N| Bull[Bull Researcher]\n        Bull -->|Statement + Data Request| Orchestrator\n        Orchestrator -->|Trigger Search| Searcher[Search Analyst]\n        \n        Searcher -->|Financial Data| AkShare[AkShare]\n        Searcher -->|Real-time News| BochaAI[BochaAI]\n        Searcher -->|Web Search| Browser[Browser Engine]\n        \n        AkShare --> Context[Update Context]\n        BochaAI --> Context\n        Browser --> Context\n        \n        Context --> Orchestrator\n        Orchestrator -->|Round N| Bear[Bear Researcher]\n        Bear -->|Statement + Data Request| Orchestrator\n    end\n    \n    subgraph Final Decision\n        Orchestrator -->|Intelligent Data Supplement| Searcher\n        Orchestrator -->|Comprehensive Judgment| Manager\n        Manager -->|Investment Rating| Result[Final Report]\n    end\n```\n\n### Dynamic Search Mechanism\n\nDuring debate, agents can request additional data through specific format:\n\n```\n[SEARCH: \"Recent gross margin data\" source:akshare]   -- Get financial data from AkShare\n[SEARCH: \"Industry competition analysis\" source:bochaai]   -- Search news from BochaAI\n[SEARCH: \"Recent fund flows\" source:akshare]       -- Get fund flows\n[SEARCH: \"Competitor comparison analysis\"]                       -- Automatically select best data source\n```\n\n**Supported Data Sources:**\n- **AkShare**: Financial indicators, K-line market data, fund flows, institutional holdings\n- **BochaAI**: Real-time news search, analyst reports\n- **Browser Search**: Baidu News, Sogou, 360 and other multi-engine search\n- **Knowledge Base**: Historical news and analysis data\n\n---\n\n## 📈 Roadmap\n\n### Phase 1: MVP (Completed) ✅\n- [x] Project infrastructure\n- [x] Database models\n- [x] Crawler tool refactoring (10 news sources)\n- [x] LLM service integration\n- [x] NewsAnalyst agent\n- [x] FastAPI routes\n- [x] React + TypeScript frontend\n\n### Phase 1.5: Multi-Provider LLM Support (Completed) ✅\n- [x] Support 5 
major LLM providers (Bailian, OpenAI, DeepSeek, Kimi, Zhipu)\n- [x] Frontend dynamic model switching\n- [x] LLM config API (`/api/v1/llm/config`)\n- [x] News detail drawer (complete content + AI analysis)\n- [x] Real-time search functionality (multi-dimensional + keyword highlighting)\n- [x] Markdown rendering (supports tables, code blocks)\n- [x] One-click copy analysis report\n\n### Phase 1.6: Stock Analysis & Enhanced Crawler (Completed) ✅\n- [x] Stock K-line charts (integrated akshare + klinecharts)\n- [x] Multi-period support (Daily K/60min/30min/15min/5min/1min)\n- [x] Stock search (code/name fuzzy query, pre-loaded 5000+ A-shares)\n- [x] Enhanced crawler module\n  - [x] Multi-engine support (Requests/Playwright/Jina)\n  - [x] Intelligent content extraction (readabilipy + heuristic algorithms)\n  - [x] Content quality assessment and auto-retry\n  - [x] Cache mechanism and unified Article model\n\n### Phase 1.7: AgenticX Deep Integration & Batch Operations (Completed) ✅\n- [x] Migrated to AgenticX BailianEmbeddingProvider (removed redundant batch processing logic)\n- [x] Migrated to AgenticX MilvusStorage (simplified storage wrapper, removed duplicate code)\n- [x] Async vectorization interfaces (aembed_text/aembed_batch), avoid event loop conflicts\n- [x] Background async vectorization, non-blocking analysis flow\n- [x] Milvus statistics optimization (query count fallback mechanism)\n- [x] Frontend batch selection functionality (checkboxes + Shift range selection)\n- [x] Batch delete news functionality\n- [x] Batch analyze news functionality (with progress display and result statistics)\n- [x] Docker Compose optimization (Celery image build, improved startup performance)\n\n### Phase 2: Multi-Agent Debate (Completed) ✅\n- [x] BullResearcher & BearResearcher agents\n- [x] SearchAnalyst search analyst (dynamic data acquisition)\n- [x] InvestmentManager investment manager decision\n- [x] Debate orchestrator (DebateOrchestrator)\n- [x] Dynamic search mechanism 
(on-demand data acquisition during debate)\n- [x] Three debate modes: parallel analysis, real-time debate, quick analysis\n- [ ] Real-time WebSocket push (in progress)\n- [ ] Agent execution trace visualization (in progress)\n\n### Phase 3: Knowledge Enhancement (Planned)\n- [ ] Financial knowledge graph (Neo4j)\n- [ ] Agent memory system\n- [ ] GraphRetriever graph retrieval\n\n### Phase 4: Self-Evolution (Planned)\n- [ ] ACE framework integration\n- [ ] Investment strategy Playbook\n- [ ] Decision effectiveness evaluation and learning\n\n---\n\n## 📄 License\n\nThis project is released under the Apache License 2.0; see the [LICENSE](LICENSE) file.\n\n---\n\n## 🙏 Acknowledgments\n\n- [AgenticX](https://github.com/DemonDamon/AgenticX) - Multi-agent framework\n- [FastAPI](https://fastapi.tiangolo.com/) - Web framework\n- [Milvus](https://milvus.io/) - Vector database\n- [Alibaba Cloud Bailian](https://dashscope.console.aliyun.com/) - LLM service\n- [Shadcn UI](https://ui.shadcn.com/) - Frontend component library\n\n---\n\n## ⭐ Star History\n\nIf you find this project helpful, please give it a Star ⭐️!\n\n[![Star History Chart](https://api.star-history.com/svg?repos=DemonDamon/FinnewsHunter&type=Date)](https://star-history.com/#DemonDamon/FinnewsHunter&Date)\n\n---\n\n**Built with ❤️ using AgenticX**\n"
  },
  {
    "path": "README_zn.md",
    "content": "# FinnewsHunter：金融新闻驱动的多智能体投资决策平台\n\n<div align=\"right\">\n  <a href=\"README_zn.md\">中文版</a> | <a href=\"README.md\">English</a>\n</div>\n\n<div align=\"center\">\n  <img src=\"assets/images/FINNEWS_HUNTER_LOGO.png\" alt=\"FinnewsHunter Logo\" width=\"450\">\n</div>\n\n基于 [AgenticX](https://github.com/DemonDamon/AgenticX) 框架构建的企业级金融新闻分析系统，融合实时新闻流、深度量化分析和多智能体辩论机制。\n\nFinnewsHunter 不再局限于传统的文本分类，而是部署多智能体战队（NewsAnalyst, Researcher 等），实时监控新浪财经、每经网、金融界、证券时报等多源财经资讯。利用大模型进行深度解读、情感分析与市场影响评估，并结合知识图谱挖掘潜在的投资机会与风险，为量化交易提供决策级别的阿尔法信号。\n\n---\n\n## 🎯 项目特色\n\n- ✅ **AgenticX 原生**: 深度集成 AgenticX 框架，使用 Agent、Tool、Workflow 等核心抽象\n- ✅ **AgenticX 组件集成**: 直接使用 AgenticX 的 `BailianEmbeddingProvider` 和 `MilvusStorage`，避免重复造轮子\n- ✅ **智能体驱动**: NewsAnalyst 智能体自动分析新闻情感和市场影响\n- ✅ **多厂商 LLM 支持**: 支持百炼、OpenAI、DeepSeek、Kimi、智谱 5 大厂商，前端一键切换\n- ✅ **批量操作**: 支持批量选择、批量删除、批量分析新闻，提高操作效率\n- ✅ **股票 K 线分析**: 集成 akshare 真实行情数据，支持日K/分K多周期展示\n- ✅ **股票智能搜索**: 支持代码和名称模糊查询，预加载 5000+ A股数据\n- ✅ **完整技术栈**: FastAPI + PostgreSQL + Milvus + Redis + React\n- ✅ **实时搜索**: 支持标题、内容、股票代码多维度搜索，关键词高亮\n- ✅ **异步向量化**: 后台异步执行向量化，不阻塞分析流程\n- ✅ **生产就绪**: Docker Compose 一键部署，日志、监控完备\n\n---\n\n## 🏗️ 系统架构\n\n![FinnewsHunter Architecture](assets/images/arch-20251201.png)\n\n系统采用分层架构设计：\n- **M6 前端交互层**: React + TypeScript + Shadcn UI\n- **M1 平台服务层**: FastAPI Gateway + Task Manager\n- **M4/M5 智能体协同层**: AgenticX Agent + Debate Workflow\n- **M2/M3 基础设施层**: Crawler Service + LLM Service + Embedding\n- **M7-M11 存储与学习层**: PostgreSQL + Milvus + Redis + ACE Framework\n\n---\n\n## 🚀 快速开始\n\n### 前置条件\n\n- Python 3.11+\n- Docker & Docker Compose\n- (可选) OpenAI API Key 或本地 LLM\n- Node.js 18+ (前端开发)\n\n### 1. 安装 AgenticX\n\n```bash\ncd /Users/damon/myWork/AgenticX\npip install -e .\n```\n\n### 2. 安装后端依赖\n\n```bash\ncd FinnewsHunter/backend\npip install -r requirements.txt\n```\n\n### 3. 
配置环境变量\n\n```bash\ncd FinnewsHunter/backend\ncp env.example .env\n# 编辑 .env 文件，填入 LLM API Key 等配置\n```\n\n**多厂商 LLM 配置说明：**\n\n系统支持 5 个 LLM 厂商，至少配置一个即可使用：\n\n| 厂商 | 环境变量 | 获取地址 |\n|------|----------|----------|\n| 百炼（阿里云） | `DASHSCOPE_API_KEY` | https://dashscope.console.aliyun.com/ |\n| OpenAI | `OPENAI_API_KEY` | https://platform.openai.com/api-keys |\n| DeepSeek | `DEEPSEEK_API_KEY` | https://platform.deepseek.com/ |\n| Kimi（Moonshot） | `MOONSHOT_API_KEY` | https://platform.moonshot.cn/ |\n| 智谱 | `ZHIPU_API_KEY` | https://open.bigmodel.cn/ |\n\n**示例配置（推荐百炼）：**\n\n```bash\n# 百炼（阿里云）- 推荐，国内访问快\nDASHSCOPE_API_KEY=sk-your-dashscope-key\nDASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nBAILIAN_MODELS=qwen-plus,qwen-max,qwen-turbo\n\n# 可选：其他厂商\nOPENAI_API_KEY=sk-your-openai-key\nDEEPSEEK_API_KEY=sk-your-deepseek-key\n```\n\n### 4. 启动基础服务（PostgreSQL、Redis、Milvus）\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml up -d postgres redis milvus-etcd milvus-minio milvus-standalone\n```\n\n### 5. 初始化数据库\n\n```bash\ncd FinnewsHunter/backend\npython init_db.py\n```\n\n### 5.1 初始化股票数据（可选，用于股票搜索功能）\n\n```bash\ncd FinnewsHunter/backend\npython -m app.scripts.init_stocks\n# 将从 akshare 获取全部 A 股数据（约 5000+ 只）并存入数据库\n```\n\n### 6. 启动后端API服务\n\n```bash\ncd FinnewsHunter/backend\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n### 7. 启动Celery Worker和Beat（自动爬取）\n\n```bash\n# 新开一个终端\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml up -d celery-worker celery-beat\n```\n\n### 8. 启动前端服务\n\n```bash\n# 新开一个终端\ncd FinnewsHunter/frontend\nnpm install  # 首次需要安装依赖\nnpm run dev\n```\n\n### 9. 
访问应用\n\n- **前端界面**: http://localhost:3000\n- **后端 API**: http://localhost:8000\n- **API 文档**: http://localhost:8000/docs\n\n---\n\n## 🔄 服务管理\n\n### 查看所有服务状态\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml ps\n```\n\n### 重启所有服务\n\n```bash\ncd FinnewsHunter\n\n# 重启Docker服务（基础设施 + Celery）\ndocker compose -f deploy/docker-compose.dev.yml restart\n\n# 如果后端API是独立启动的，需要手动重启\n# Ctrl+C 停止后端进程，然后重新运行：\ncd backend\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n### 重启特定服务\n\n```bash\ncd FinnewsHunter\n\n# 只重启Celery（应用代码更改后）\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# 只重启数据库\ndocker compose -f deploy/docker-compose.dev.yml restart postgres\n\n# 只重启Redis\ndocker compose -f deploy/docker-compose.dev.yml restart redis\n```\n\n### 停止所有服务\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml down\n```\n\n### 查看日志\n\n```bash\ncd FinnewsHunter\n\n# 查看Celery Worker日志\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-worker\n\n# 查看Celery Beat日志（定时任务调度）\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# 查看PostgreSQL日志\ndocker compose -f deploy/docker-compose.dev.yml logs -f postgres\n\n# 查看所有服务日志\ndocker compose -f deploy/docker-compose.dev.yml logs -f\n```\n\n---\n\n## 🗑️ 重置数据库\n\n### 方式1：使用一键重置脚本（推荐）⭐\n\n```bash\ncd FinnewsHunter\n\n# 执行重置脚本\n./reset_all_data.sh\n\n# 输入 yes 确认\n```\n\n**脚本会自动完成：**\n1. ✅ 清空PostgreSQL中的所有新闻和任务数据\n2. ✅ 清空Redis缓存\n3. ✅ 重置数据库自增ID（从1重新开始）\n4. ✅ 清空Celery调度文件\n5. 
✅ 自动重启Celery服务\n\n**执行后等待：**\n- 5-10分钟系统会自动重新爬取数据\n- 访问前端查看新数据\n\n---\n\n### 方式2：手动重置（高级）\n\n#### 步骤1：清空PostgreSQL数据\n\n```bash\n# 进入PostgreSQL容器\ndocker exec -it finnews_postgres psql -U finnews -d finnews_db\n```\n\n在PostgreSQL命令行中执行：\n\n```sql\n-- 清空新闻表\nDELETE FROM news;\n\n-- 清空任务表\nDELETE FROM crawl_tasks;\n\n-- 清空分析表\nDELETE FROM analyses;\n\n-- 重置自增ID\nALTER SEQUENCE news_id_seq RESTART WITH 1;\nALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1;\nALTER SEQUENCE analyses_id_seq RESTART WITH 1;\n\n-- 验证结果（应该都是0）\nSELECT 'news表', COUNT(*) FROM news;\nSELECT 'crawl_tasks表', COUNT(*) FROM crawl_tasks;\nSELECT 'analyses表', COUNT(*) FROM analyses;\n\n-- 退出\n\\q\n```\n\n#### 步骤2：清空Redis缓存\n\n```bash\ncd FinnewsHunter\ndocker exec finnews_redis redis-cli FLUSHDB\n```\n\n#### 步骤3：清空Celery调度文件\n\n```bash\ncd FinnewsHunter/backend\nrm -f celerybeat-schedule*\n```\n\n#### 步骤4：重启Celery服务\n\n```bash\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n#### 步骤5：验证数据已清空\n\n```bash\n# 检查新闻数量（应该是0）\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT COUNT(*) FROM news;\"\n\n# 检查Redis（应该是0或很小）\ndocker exec finnews_redis redis-cli DBSIZE\n\n# 查看Celery是否开始爬取\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n# 应该看到每分钟触发10个爬取任务\n```\n\n---\n\n### 方式3：使用Python脚本重置\n\n```bash\ncd FinnewsHunter/backend\npython reset_database.py\n# 输入 yes 确认\n```\n\n---\n\n### 方式4：快速手动清理（一行命令）🔥\n\n**适用场景：** 当重置脚本不工作时，使用此方法最快速\n\n```bash\ncd FinnewsHunter\n\n# 步骤1：清空数据库表\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"DELETE FROM news; DELETE FROM crawl_tasks; DELETE FROM analyses;\"\n\n# 步骤2：重置自增ID\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"ALTER SEQUENCE news_id_seq RESTART WITH 1; ALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1; ALTER SEQUENCE analyses_id_seq RESTART WITH 1;\"\n\n# 步骤3：清空Redis缓存\ndocker exec finnews_redis redis-cli FLUSHDB\n\n# 步骤4：清空Celery调度文件\nrm 
-f backend/celerybeat-schedule*\n\n# 步骤5：重启Celery服务\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# 步骤6：验证是否清空（应该显示0）\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT COUNT(*) FROM news;\"\n```\n\n**执行后立即刷新浏览器：**\n- Mac: `Command + Shift + R`\n- Windows: `Ctrl + Shift + R`\n\n---\n\n### 🖥️ 清除前端缓存（重要！）\n\n**数据清空后，前端可能仍显示旧数据，这是因为浏览器缓存。**\n\n#### 方法1：硬刷新浏览器（推荐）⭐\n\n**Mac系统：**\n```\n按 Command + Shift + R\n或 Command + Option + R\n```\n\n**Windows/Linux系统：**\n```\n按 Ctrl + Shift + R\n或 Ctrl + F5\n```\n\n#### 方法2：开发者工具清空缓存\n\n1. 按 `F12` 打开开发者工具\n2. 右键点击刷新按钮（地址栏旁边）\n3. 选择 **\"清空缓存并硬性重新加载\"**\n\n#### 方法3：清除浏览器缓存\n\n1. **Chrome/Edge:**\n   - `Command + Shift + Delete` (Mac) 或 `Ctrl + Shift + Delete` (Windows)\n   - 勾选\"缓存的图片和文件\"\n   - 时间范围选择\"全部\"\n   - 点击\"清除数据\"\n\n2. **刷新页面后，再次硬刷新**\n   - 确保React Query缓存也被清除\n\n#### 方法4：重启前端开发服务器（最彻底）\n\n```bash\n# 在前端终端按 Ctrl+C 停止服务\n# 然后重新启动\ncd FinnewsHunter/frontend\nnpm run dev\n```\n\n---\n\n## 📊 重置后的数据恢复时间线\n\n| 时间 | 事件 | 预期结果 |\n|------|------|----------|\n| 0分钟 | 执行重置脚本 | 数据库清空，Redis清空 |\n| 1分钟 | Celery Beat开始调度 | 10个爬取任务被触发 |\n| 2-5分钟 | 第一批新闻保存 | 数据库开始有数据 |\n| 5-10分钟 | 所有源都有数据 | 前端可看到100+条新闻 |\n| 30分钟 | 数据持续增长 | 500+条新闻 |\n| 1小时 | 稳定运行 | 1000-2000条新闻 |\n\n**注意：**\n- 重置后需要等待5-10分钟才能看到新数据\n- **前端必须硬刷新**（Command+Shift+R / Ctrl+Shift+R）清除缓存\n- 不要频繁重置，会影响系统稳定性\n\n**重置后立即硬刷新前端的步骤：**\n1. 执行重置命令\n2. **立即**在浏览器按 `Command + Shift + R` (Mac) 或 `Ctrl + Shift + R` (Windows)\n3. 
等待5-10分钟后再次刷新查看新数据\n\n---\n\n## ⚠️ 爬虫状态检查\n\n### 查看哪些源正常工作\n\n```bash\ncd FinnewsHunter\n\n# 查看各源的新闻数量\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, COUNT(*) as count \nFROM news \nWHERE created_at > NOW() - INTERVAL '1 hour'\nGROUP BY source \nORDER BY count DESC;\n\"\n\n# 查看最近的爬取任务状态\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, \n       crawled_count, \n       saved_count, \n       status,\n       error_message \nFROM crawl_tasks \nWHERE created_at > NOW() - INTERVAL '10 minutes'\nORDER BY created_at DESC \nLIMIT 20;\n\"\n```\n\n### 查看爬取错误\n\n```bash\ncd FinnewsHunter\n\n# 查看ERROR日志\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep ERROR\n\n# 查看特定源的问题\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep \"jwview\"\n```\n\n---\n\n## 📚 使用指南\n\n### 自动爬取模式（推荐）⭐\n\n**系统已配置10个新闻源的自动爬取：**\n\n1. 🌐 新浪财经\n2. 🐧 腾讯财经\n3. 💰 金融界\n4. 📊 经济观察网\n5. 📈 财经网\n6. 📉 21经济网\n7. 📰 每日经济新闻\n8. 🎯 第一财经\n9. 📧 网易财经\n10. 💎 东方财富\n\n**工作方式：**\n- ✅ Celery Beat 每1分钟自动触发所有源的爬取\n- ✅ 自动去重（URL级别）\n- ✅ 智能时间筛选（保留24小时内新闻）\n- ✅ 股票关键词筛选\n- ✅ 无需手动操作\n\n**查看爬取进度：**\n\n```bash\n# 查看Celery Beat调度日志\ncd FinnewsHunter\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# 查看Celery Worker执行日志\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-worker\n```\n\n---\n\n### 手动刷新（立即获取最新）\n\n**方式 1: 通过前端**\n1. 访问 http://localhost:3000/news\n2. 点击右上角\"🔄 立即刷新\"按钮\n3. 
系统会立即触发爬取，约2分钟后数据更新\n\n**方式 2: 通过 API**\n```bash\n# 强制刷新新浪财经\ncurl -X POST \"http://localhost:8000/api/v1/news/refresh?source=sina\"\n\n# 强制刷新所有源（需要逐个调用）\nfor source in sina tencent jwview eeo caijing jingji21 nbd yicai 163 eastmoney; do\n  curl -X POST \"http://localhost:8000/api/v1/news/refresh?source=$source\"\n  sleep 1\ndone\n```\n\n---\n\n### 查看新闻列表\n\n**方式 1: 通过前端（推荐）**\n- 访问 http://localhost:3000\n- 首页：查看来源统计和最新新闻\n- 新闻流：按来源和情感筛选新闻\n- 支持批量选择：使用复选框选择多条新闻，支持 Shift 键范围选择\n- 批量操作：全选/取消全选、批量删除、批量分析\n\n**方式 2: 通过 API**\n\n```bash\n# 获取所有来源的最新新闻（200条）\ncurl \"http://localhost:8000/api/v1/news/latest?limit=200\"\n\n# 获取特定来源的新闻\ncurl \"http://localhost:8000/api/v1/news/latest?source=sina&limit=50\"\n\n# 按情感筛选（使用旧接口）\ncurl \"http://localhost:8000/api/v1/news/?sentiment=positive&limit=20\"\n\n# 获取所有可用的新闻源列表\ncurl \"http://localhost:8000/api/v1/news/sources\"\n```\n\n---\n\n### 批量操作新闻\n\n**前端操作：**\n1. **批量选择**：\n   - 点击新闻卡片左侧的复选框选择单条新闻\n   - 按住 Shift 键点击可进行范围选择\n   - 使用顶部工具栏的\"全选\"按钮选择当前筛选结果的所有新闻\n   - 切换新闻源或筛选条件时，选择状态会自动清空\n\n2. **批量删除**：\n   - 选择多条新闻后，点击顶部工具栏的\"批量删除\"按钮\n   - 确认删除对话框后，选中的新闻将被删除\n   - 删除后会自动刷新列表\n\n3. 
**批量分析**：\n   - 选择多条新闻后，点击顶部工具栏的\"批量分析\"按钮\n   - 系统会依次分析选中的新闻，显示进度和结果统计\n   - 分析完成后会显示成功/失败数量\n\n**API 操作：**\n```bash\n# 批量删除新闻\ncurl -X POST \"http://localhost:8000/api/v1/news/batch/delete\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"news_ids\": [1, 2, 3]}'\n\n# 批量分析新闻\ncurl -X POST \"http://localhost:8000/api/v1/analysis/batch\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"news_ids\": [1, 2, 3], \"provider\": \"bailian\", \"model\": \"qwen-plus\"}'\n```\n\n---\n\n### 分析新闻\n\n**方式 1: 通过前端**\n- 在新闻卡片上点击\"✨ 分析\"按钮\n- 等待3-5秒查看分析结果\n- 点击新闻卡片打开详情抽屉，查看完整分析内容\n\n**方式 2: 通过 API**\n```bash\n# 分析指定ID的新闻（使用默认模型）\ncurl -X POST http://localhost:8000/api/v1/analysis/news/1\n\n# 分析新闻（指定模型）\ncurl -X POST http://localhost:8000/api/v1/analysis/news/1 \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"provider\": \"bailian\", \"model\": \"qwen-max\"}'\n\n# 查看分析结果\ncurl http://localhost:8000/api/v1/analysis/1\n```\n\n---\n\n### 切换 LLM 模型\n\n**前端操作：**\n1. 点击右上角的模型选择器（显示当前模型名称）\n2. 在下拉菜单中选择不同的厂商和模型\n3. 选择后自动保存，后续分析将使用新模型\n\n**支持的模型：**\n- 🔥 **百炼**: qwen-plus, qwen-max, qwen-turbo, qwen-long\n- 🤖 **OpenAI**: gpt-4, gpt-4-turbo, gpt-3.5-turbo\n- 🧠 **DeepSeek**: deepseek-chat, deepseek-coder\n- 🌙 **Kimi**: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k\n- 🔮 **智谱**: glm-4, glm-4-plus, glm-4-air\n\n**API 获取可用模型列表：**\n```bash\ncurl http://localhost:8000/api/v1/llm/config\n```\n\n---\n\n### 搜索新闻\n\n**前端操作：**\n1. 在顶部搜索框输入关键词\n2. 支持搜索：标题、内容、股票代码、来源\n3. 匹配的关键词会高亮显示\n4. 搜索带有 300ms 防抖，输入停止后自动搜索\n\n**搜索示例：**\n- 搜索股票代码：`600519`（贵州茅台）\n- 搜索关键词：`新能源`、`半导体`\n- 搜索来源：`sina`、`eastmoney`\n\n---\n\n### 查看新闻详情\n\n**前端操作：**\n1. 点击任意新闻卡片\n2. 右侧滑出详情抽屉，展示：\n   - 📰 新闻标题和来源\n   - 📊 情感评分（利好/利空/中性）\n   - 📈 关联股票代码\n   - 📝 完整新闻内容\n   - 🤖 AI 分析结果（Markdown 格式）\n   - 🔗 原文链接\n3. 点击\"复制分析内容\"可复制 Markdown 格式的分析报告\n\n---\n\n### 股票 K 线分析\n\n**前端操作：**\n1. 访问 http://localhost:3000/stocks/SH600519（贵州茅台示例）\n2. 使用右上角搜索框输入股票代码或名称（如 `茅台`、`600519`）\n3. 选择时间周期：日K、60分、30分、15分、5分、1分\n4. 
图表支持：\n   - 📈 K 线蜡烛图（OHLC）\n   - 📊 成交量柱状图\n   - 📉 MA 均线（5/10/30/60日）\n\n**API 操作：**\n\n```bash\n# 获取 K 线数据（日线，默认180条）\ncurl \"http://localhost:8000/api/v1/stocks/SH600519/kline?period=daily&limit=180\"\n\n# 获取分钟 K 线（60分钟线）\ncurl \"http://localhost:8000/api/v1/stocks/SH600519/kline?period=60m&limit=200\"\n\n# 搜索股票\ncurl \"http://localhost:8000/api/v1/stocks/search/realtime?q=茅台&limit=10\"\n\n# 查看数据库中的股票数量\ncurl \"http://localhost:8000/api/v1/stocks/count\"\n```\n\n---\n\n### 按来源筛选查看\n\n**前端操作：**\n\n1. **首页（Dashboard）**\n   - 查看\"新闻来源统计\"卡片\n   - 点击任意来源按钮筛选\n   - 显示该来源的新闻数量和列表\n\n2. **新闻流页面**\n   - 顶部有10个来源筛选按钮\n   - 点击切换查看不同来源\n   - 支持来源+情感双重筛选\n\n**API操作：**\n\n```bash\n# 查看新浪财经的新闻\ncurl \"http://localhost:8000/api/v1/news/latest?source=sina&limit=50\"\n\n# 查看每日经济新闻\ncurl \"http://localhost:8000/api/v1/news/latest?source=nbd&limit=50\"\n\n# 查看所有来源\ncurl \"http://localhost:8000/api/v1/news/latest?limit=200\"\n```\n\n---\n\n## 🏗️ 项目结构\n\n```\nFinnewsHunter/\n├── backend/                    # 后端服务\n│   ├── app/\n│   │   ├── agents/            # 智能体定义（NewsAnalyst、辩论智能体等）\n│   │   ├── api/v1/            # FastAPI 路由\n│   │   │   ├── analysis.py    # 分析 API（支持批量分析）\n│   │   │   ├── llm_config.py  # LLM 配置 API\n│   │   │   ├── news_v2.py     # 新闻 API（支持批量删除）\n│   │   │   └── ...\n│   │   ├── core/              # 核心配置（config, database, redis, neo4j）\n│   │   ├── models/            # SQLAlchemy 数据模型\n│   │   ├── services/          # 业务服务\n│   │   │   ├── llm_service.py      # LLM 服务（支持多厂商）\n│   │   │   ├── analysis_service.py # 分析服务（异步向量化）\n│   │   │   ├── embedding_service.py # 向量化服务（基于 AgenticX BailianEmbeddingProvider）\n│   │   │   └── stock_data_service.py # 股票数据服务\n│   │   ├── storage/           # 存储封装\n│   │   │   └── vector_storage.py # Milvus 向量存储（基于 AgenticX MilvusStorage）\n│   │   ├── tasks/             # Celery 任务\n│   │   └── tools/              # AgenticX 工具（Crawler, Cleaner）\n│   ├── tests/                 # 测试和工具脚本\n│   │   ├── check_milvus_data.py           # 
检查 Milvus 向量存储数据\n│   │   ├── check_news_embedding_status.py # 检查新闻向量化状态\n│   │   └── manual_vectorize.py           # 手动向量化指定新闻\n│   ├── env.example            # 环境变量模板\n│   └── requirements.txt       # Python 依赖\n├── frontend/                  # React 前端\n│   └── src/\n│       ├── components/        # 组件\n│       │   ├── ModelSelector.tsx    # LLM 模型选择器\n│       │   ├── NewsDetailDrawer.tsx # 新闻详情抽屉\n│       │   └── HighlightText.tsx    # 关键词高亮\n│       ├── context/           # React Context\n│       ├── hooks/             # 自定义 Hooks\n│       │   └── useDebounce.ts # 防抖 Hook\n│       ├── layout/            # 布局组件\n│       └── pages/             # 页面组件\n│           └── NewsListPage.tsx # 新闻列表页面（支持批量操作）\n├── deploy/                    # 部署配置\n│   ├── docker-compose.dev.yml # Docker Compose 配置\n│   ├── Dockerfile.celery     # Celery 镜像构建文件\n│   └── celery-entrypoint.sh  # Celery 容器启动脚本\n├── conclusions/               # 模块摘要文档\n│   ├── backend/              # 后端模块总结\n│   └── frontend/             # 前端模块总结\n└── .dev-docs/                 # 开发文档\n```\n\n---\n\n## 🧪 测试与验收\n\n### MVP 验收标准\n\n- [x] 新闻爬取成功并存入 PostgreSQL\n- [x] NewsAnalyst 调用 LLM 完成分析\n- [x] 分析结果包含情感评分\n- [x] 前端能够展示新闻和分析结果\n- [x] 支持多厂商 LLM 动态切换\n- [x] 新闻详情展示完整分析内容\n- [x] 实时搜索和筛选功能\n- [x] 批量选择、批量删除、批量分析功能\n- [x] 基于 AgenticX 的向量化和存储服务\n- [x] 异步向量化，不阻塞分析流程\n\n### 测试流程\n\n1. **启动所有服务**\n   ```bash\n   ./start.sh\n   ```\n\n2. **检查 Docker 容器状态**\n   ```bash\n   docker ps\n   # 应看到: postgres, redis, milvus-standalone, milvus-etcd, milvus-minio\n   ```\n\n3. **测试新闻爬取**\n   ```bash\n   curl -X POST http://localhost:8000/api/v1/news/crawl \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"source\": \"sina\", \"start_page\": 1, \"end_page\": 1}'\n   \n   # 等待 5-10 秒后查看结果\n   curl http://localhost:8000/api/v1/news/?limit=5\n   ```\n\n4. 
**测试智能体分析**\n   ```bash\n   # 获取第一条新闻的ID\n   NEWS_ID=$(curl -s http://localhost:8000/api/v1/news/?limit=1 | jq '.[0].id')\n   \n   # 触发分析\n   curl -X POST http://localhost:8000/api/v1/analysis/news/$NEWS_ID\n   \n   # 查看分析结果\n   curl http://localhost:8000/api/v1/analysis/1\n   ```\n\n5. **测试前端界面**\n   - 访问 http://localhost:3000\n   - 等待自动爬取完成，或点击右上角\"🔄 立即刷新\"按钮\n   - 选择一条新闻点击\"分析\"\n   - 查看情感评分是否显示\n\n---\n\n## 🔧 故障排查\n\n### 问题 1: 数据库连接失败\n\n**症状：** 后端启动报错 `could not connect to database`\n\n**解决方法：**\n\n```bash\ncd FinnewsHunter\n\n# 检查 PostgreSQL 是否启动\ndocker ps | grep postgres\n\n# 查看日志\ndocker compose -f deploy/docker-compose.dev.yml logs postgres\n\n# 重启容器\ndocker compose -f deploy/docker-compose.dev.yml restart postgres\n\n# 等待30秒后重试后端启动\n```\n\n---\n\n### 问题 2: Celery任务不执行\n\n**症状：** 前端显示新闻数量为0，没有自动爬取\n\n**排查步骤：**\n\n```bash\ncd FinnewsHunter\n\n# 1. 检查Celery Worker是否运行\ndocker ps | grep celery\n\n# 2. 查看Celery Beat日志（应该看到每分钟触发任务）\ndocker compose -f deploy/docker-compose.dev.yml logs celery-beat --tail=100\n\n# 3. 查看Celery Worker日志（查看任务执行情况）\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker --tail=100\n\n# 4. 检查Redis连接\ndocker exec finnews_redis redis-cli PING\n# 应该返回 PONG\n\n# 5. 重启Celery服务\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n---\n\n### 问题 3: 爬取失败（404错误）\n\n**症状：** Celery日志显示 `404 Client Error: Not Found`\n\n**原因：** 新闻网站URL已变更\n\n**解决方法：**\n\n```bash\n# 1. 手动访问URL验证是否可用\ncurl -I https://finance.caijing.com.cn/\n\n# 2. 如果URL变更，更新对应爬虫的配置\n# 编辑 backend/app/tools/{source}_crawler.py\n# 更新 BASE_URL 和 STOCK_URL\n\n# 3. 清理Python缓存\ncd FinnewsHunter/backend\nfind . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true\n\n# 4. 重启Celery\ncd ..\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n```\n\n---\n\n### 问题 4: 只有新浪财经有数据\n\n**症状：** 其他9个来源没有新闻\n\n**可能原因：**\n1. Celery Beat配置不完整\n2. 爬虫代码有错误\n3. 网站URL不正确\n\n**解决方法：**\n\n```bash\ncd FinnewsHunter\n\n# 1. 
检查Celery Beat配置\ndocker compose -f deploy/docker-compose.dev.yml logs celery-beat | grep \"crawl-\"\n# 应该看到10个定时任务（crawl-sina, crawl-tencent, ..., crawl-eastmoney）\n\n# 2. 手动测试单个源的爬取\ndocker exec -it finnews_celery_worker python -c \"\nfrom app.tools import get_crawler_tool\ncrawler = get_crawler_tool('nbd')  # 测试每日经济新闻\nnews = crawler.crawl()\nprint(f'爬取到 {len(news)} 条新闻')\n\"\n\n# 3. 查看数据库中各源的数据量\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\nSELECT source, COUNT(*) as count \nFROM news \nGROUP BY source \nORDER BY count DESC;\n\"\n\n# 4. 如果某个源一直失败，查看详细错误\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker | grep \"ERROR\"\n```\n\n---\n\n### 问题 5: LLM 调用失败\n\n**症状：** 分析功能不工作，报错 `LLM Provider NOT provided`\n\n**解决方法：**\n\n```bash\ncd FinnewsHunter/backend\n\n# 1. 检查 API Key 是否配置\ngrep -E \"DASHSCOPE_API_KEY|OPENAI_API_KEY|DEEPSEEK_API_KEY\" .env\n\n# 2. 检查 Base URL 是否正确（百炼必须配置）\ngrep DASHSCOPE_BASE_URL .env\n# 应该是: https://dashscope.aliyuncs.com/compatible-mode/v1\n\n# 3. 验证 LLM 配置 API 是否正常\ncurl http://localhost:8000/api/v1/llm/config | jq '.providers[].has_api_key'\n# 至少有一个返回 true\n\n# 4. 如果使用百炼，确保配置完整\ncat >> .env << EOF\nDASHSCOPE_API_KEY=sk-your-key\nDASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nBAILIAN_MODELS=qwen-plus,qwen-max\nEOF\n\n# 5. 重启后端服务\n```\n\n---\n\n### 问题 6: 前端显示空白或CORS错误\n\n**症状：** 前端无法加载数据，浏览器Console显示CORS错误\n\n**解决方法：**\n\n```bash\n# 1. 检查后端CORS配置\ncd FinnewsHunter/backend\ngrep BACKEND_CORS_ORIGINS .env\n# 应该包含 http://localhost:3000\n\n# 2. 检查前端API地址配置\ncd ../frontend\ncat .env\n# VITE_API_URL 应该是 http://localhost:8000\n\n# 3. 硬刷新浏览器\n# Chrome/Edge: Ctrl+Shift+R (Windows) 或 Cmd+Shift+R (Mac)\n\n# 4. 
重启前端开发服务器\nnpm run dev\n```\n\n---\n\n### 问题 7: Milvus 连接失败\n\n**症状：** 向量搜索功能不工作\n\n**解决方法：**\n\n```bash\ncd FinnewsHunter\n\n# Milvus 需要较长启动时间（约 60 秒）\ndocker compose -f deploy/docker-compose.dev.yml logs milvus-standalone\n\n# 检查健康状态\ndocker inspect finnews_milvus | grep -A 10 Health\n\n# 重启Milvus相关服务\ndocker compose -f deploy/docker-compose.dev.yml restart milvus-etcd milvus-minio milvus-standalone\n```\n\n---\n\n### 问题 8: 数据统计不准确\n\n**症状：** 首页显示的新闻数和实际不符\n\n**解决方法：**\n\n```bash\n# 使用重置脚本清空数据重新开始\ncd FinnewsHunter\n./reset_all_data.sh\n```\n\n---\n\n### 常用调试命令\n\n```bash\ncd FinnewsHunter\n\n# 查看所有容器状态\ndocker compose -f deploy/docker-compose.dev.yml ps\n\n# 查看某个服务的完整日志\ndocker compose -f deploy/docker-compose.dev.yml logs celery-worker --tail=500\n\n# 进入容器调试\ndocker exec -it finnews_celery_worker bash\n\n# 查看数据库连接\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"\\conninfo\"\n\n# 查看Redis连接\ndocker exec finnews_redis redis-cli INFO\n\n# 测试网络连通性\ndocker exec finnews_celery_worker ping -c 3 postgres\n```\n\n---\n\n## ⚡ 快速参考（常用命令）\n\n### 项目目录\n\n```bash\ncd FinnewsHunter\n```\n\n### 一键操作\n\n```bash\n# 启动所有服务\ndocker compose -f deploy/docker-compose.dev.yml up -d\n\n# 停止所有服务\ndocker compose -f deploy/docker-compose.dev.yml down\n\n# 重启Celery（代码更新后）\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\n# 清空所有数据重新开始\n./reset_all_data.sh\n```\n\n### 查看状态\n\n```bash\n# 服务状态\ndocker compose -f deploy/docker-compose.dev.yml ps\n\n# 新闻数量\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT source, COUNT(*) FROM news GROUP BY source;\"\n\n# 任务数量\ndocker exec finnews_postgres psql -U finnews -d finnews_db -c \"SELECT status, COUNT(*) FROM crawl_tasks GROUP BY status;\"\n\n# Redis缓存\ndocker exec finnews_redis redis-cli DBSIZE\n```\n\n### 查看日志\n\n```bash\n# Celery Beat（定时调度）\ndocker compose -f deploy/docker-compose.dev.yml logs -f celery-beat\n\n# Celery Worker（任务执行）\ndocker compose -f 
deploy/docker-compose.dev.yml logs -f celery-worker\n\n# PostgreSQL\ndocker compose -f deploy/docker-compose.dev.yml logs -f postgres\n\n# 所有服务\ndocker compose -f deploy/docker-compose.dev.yml logs -f\n```\n\n### 直接访问\n\n- **前端**: http://localhost:3000\n- **后端API**: http://localhost:8000\n- **API文档**: http://localhost:8000/docs\n\n---\n\n## 📊 数据库结构\n\n### News（新闻表）\n- id, title, content, url, source\n- publish_time, stock_codes\n- sentiment_score, is_embedded\n\n### Analysis（分析表）\n- id, news_id, agent_name\n- sentiment, sentiment_score, confidence\n- analysis_result, structured_data\n\n### Stock（股票表）\n- id, code, name, industry, market\n\n---\n\n## 🛠️ 开发指南\n\n### 添加新的爬虫\n\n1. 继承 `BaseCrawler` 类\n2. 实现 `crawl()` 方法\n3. 注册到 `tools/__init__.py`\n\n示例：\n```python\n# backend/app/tools/custom_crawler.py\nfrom .crawler_base import BaseCrawler\n\nclass CustomCrawlerTool(BaseCrawler):\n    name = \"custom_crawler\"\n    \n    def crawl(self, start_page, end_page):\n        # 实现爬取逻辑\n        pass\n```\n\n### 使用增强版爬虫（可选）\n\n对于需要 JS 渲染或智能内容提取的场景，可使用增强版爬虫：\n\n```python\nfrom app.tools.crawler_enhanced import crawl_url, EnhancedCrawler\n\n# 快速爬取单个 URL\narticle = crawl_url(\"https://finance.sina.com.cn/xxx\", engine='auto')\nprint(article.to_markdown())\n\n# 获取 LLM 消息格式（多模态）\nllm_messages = article.to_llm_message()\n\n# 批量爬取（带缓存）\ncrawler = EnhancedCrawler(use_cache=True)\narticles = crawler.crawl_batch(urls, delay=1.0)\n```\n\n**支持的引擎：**\n- `requests`: 基础 HTTP 请求（默认）\n- `playwright`: JS 渲染（需安装 `playwright install chromium`）\n- `jina`: Jina Reader API（需配置 `JINA_API_KEY`）\n- `auto`: 自动选择最佳引擎\n\n**安装可选依赖：**\n\n```bash\npip install markdownify readabilipy playwright\nplaywright install chromium  # 可选，用于 JS 渲染\n```\n\n---\n\n### 添加新的智能体\n\n1. 继承 `Agent` 类\n2. 定义 role、goal、backstory\n3. 
实现业务方法\n\n示例：\n```python\n# backend/app/agents/risk_analyst.py\nfrom agenticx import Agent\n\nclass RiskAnalystAgent(Agent):\n    def __init__(self, llm_provider):\n        super().__init__(\n            name=\"RiskAnalyst\",\n            role=\"风险分析师\",\n            goal=\"评估投资风险\",\n            llm_provider=llm_provider\n        )\n```\n\n---\n\n### 使用 AgenticX 组件\n\nFinnewsHunter 深度集成了 AgenticX 框架的核心组件，避免重复造轮子：\n\n#### 1. 向量化服务（Embedding）\n\n系统使用 `agenticx.embeddings.BailianEmbeddingProvider` 作为核心向量化引擎：\n\n```python\nfrom app.services.embedding_service import EmbeddingService\n\n# 同步接口（适用于同步上下文）\nembedding_service = EmbeddingService()\nvector = embedding_service.embed_text(\"文本内容\")\n\n# 异步接口（推荐在异步上下文中使用）\nvector = await embedding_service.aembed_text(\"文本内容\")\n\n# 批量处理（Provider 内部已实现批量优化）\nvectors = embedding_service.embed_batch([\"文本1\", \"文本2\", \"文本3\"])\n```\n\n**特点**：\n- 支持 Redis 缓存，避免重复计算\n- 自动处理文本长度限制（6000字符）\n- 支持同步和异步两种接口，避免事件循环冲突\n\n#### 2. 向量存储（Milvus）\n\n系统使用 `agenticx.storage.vectordb_storages.milvus.MilvusStorage` 作为向量数据库：\n\n```python\nfrom app.storage.vector_storage import VectorStorage\n\nvector_storage = VectorStorage()\n\n# 存储单个向量\nvector_storage.store_embedding(\n    news_id=1,\n    text=\"新闻内容\",\n    embedding=[0.1, 0.2, ...]\n)\n\n# 批量存储\nvector_storage.store_embeddings_batch([\n    {\"news_id\": 1, \"text\": \"内容1\", \"embedding\": [...]},\n    {\"news_id\": 2, \"text\": \"内容2\", \"embedding\": [...]}\n])\n\n# 相似度搜索\nresults = vector_storage.search_similar(query_vector=[...], top_k=10)\n\n# 获取统计信息（带查询计数回退机制）\nstats = vector_storage.get_stats()\n```\n\n**特点**：\n- 直接使用 AgenticX MilvusStorage，无需重复实现\n- 提供兼容性接口，简化调用\n- 当 `num_entities` 不准确时，通过实际查询获取真实数量\n- 支持异步操作，避免阻塞\n\n#### 3. 
异步向量化最佳实践\n\n在异步上下文中（如 FastAPI 路由），推荐使用异步接口：\n\n```python\nimport asyncio\n\nfrom app.services.embedding_service import EmbeddingService\nfrom app.storage.vector_storage import VectorStorage\n\nasync def analyze_news(news_id: int, text: str):\n    embedding_service = EmbeddingService()\n    vector_storage = VectorStorage()\n    \n    # 使用异步接口，避免事件循环冲突\n    embedding = await embedding_service.aembed_text(text)\n    \n    # 后台异步存储向量（store_embedding 为同步接口，交给线程池执行，不阻塞分析流程）\n    asyncio.get_running_loop().run_in_executor(\n        None, vector_storage.store_embedding, news_id, text, embedding\n    )\n    \n    # 继续执行分析逻辑...\n```\n\n**注意事项**：\n- 在异步上下文中，使用 `aembed_text()` 而不是 `embed_text()`\n- 向量化操作在后台异步执行，不阻塞主流程\n- Milvus 的 `flush()` 操作已优化，默认不执行（依赖自动刷新）\n\n---\n\n## 多智能体辩论架构\n\nFinnewsHunter 的核心特色是 **多空辩论机制**，通过多个专业智能体的协作与对抗，深度挖掘个股的投资价值和风险。\n\n### 核心参与角色\n\n| 智能体 | 角色定位 | 核心职责 |\n|--------|----------|----------|\n| **BullResearcher** | 看多研究员 | 挖掘增长潜力、核心利好、估值优势 |\n| **BearResearcher** | 看空研究员 | 识别下行风险、负面催化剂、反驳乐观预期 |\n| **SearchAnalyst** | 搜索分析师 | 动态获取数据（AkShare/BochaAI/浏览器搜索） |\n| **InvestmentManager** | 投资经理 | 主持辩论、评估论点质量、做出最终决策 |\n\n### 辩论数据流架构\n\n```mermaid\ngraph TD\n    subgraph 辩论启动\n        Manager[投资经理] -->|开场陈述| Orchestrator[辩论编排器]\n    end\n    \n    subgraph 多轮辩论\n        Orchestrator -->|第N轮| Bull[看多研究员]\n        Bull -->|发言 + 数据请求| Orchestrator\n        Orchestrator -->|触发搜索| Searcher[搜索分析师]\n        \n        Searcher -->|财务数据| AkShare[AkShare]\n        Searcher -->|实时新闻| BochaAI[BochaAI]\n        Searcher -->|网页搜索| Browser[浏览器引擎]\n        \n        AkShare --> Context[更新上下文]\n        BochaAI --> Context\n        Browser --> Context\n        \n        Context --> Orchestrator\n        Orchestrator -->|第N轮| Bear[看空研究员]\n        Bear -->|发言 + 数据请求| Orchestrator\n    end\n    \n    subgraph 最终决策\n        Orchestrator -->|智能数据补充| Searcher\n        Orchestrator -->|综合判断| Manager\n        Manager -->|投资评级| Result[最终报告]\n    end\n```\n\n### 动态搜索机制\n\n辩论过程中，智能体可以通过特定格式请求额外数据：\n\n```\n[SEARCH: \"最近的毛利率数据\" source:akshare]   -- 从 AkShare 
获取财务数据\n[SEARCH: \"行业竞争格局分析\" source:bochaai]   -- 从 BochaAI 搜索新闻\n[SEARCH: \"近期资金流向\" source:akshare]       -- 获取资金流向\n[SEARCH: \"竞品对比分析\"]                       -- 自动选择最佳数据源\n```\n\n**支持的数据源：**\n- **AkShare**: 财务指标、K线行情、资金流向、机构持仓\n- **BochaAI**: 实时新闻搜索、分析师报告\n- **浏览器搜索**: 百度资讯、搜狗、360等多引擎搜索\n- **知识库**: 历史新闻和分析数据\n\n---\n\n## 📈 路线图\n\n### Phase 1: MVP（已完成） ✅\n- [x] 项目基础设施\n- [x] 数据库模型\n- [x] 爬虫工具重构（10个新闻源）\n- [x] LLM 服务集成\n- [x] NewsAnalyst 智能体\n- [x] FastAPI 路由\n- [x] React + TypeScript 前端\n\n### Phase 1.5: 多厂商 LLM 支持（已完成） ✅\n- [x] 支持 5 大 LLM 厂商（百炼、OpenAI、DeepSeek、Kimi、智谱）\n- [x] 前端动态模型切换\n- [x] LLM 配置 API（`/api/v1/llm/config`）\n- [x] 新闻详情抽屉（完整内容 + AI 分析）\n- [x] 实时搜索功能（多维度 + 关键词高亮）\n- [x] Markdown 渲染（支持表格、代码块）\n- [x] 一键复制分析报告\n\n### Phase 1.6: 股票分析与增强爬虫（已完成） ✅\n- [x] 股票 K 线图（集成 akshare + klinecharts）\n- [x] 多周期支持（日K/60分/30分/15分/5分/1分）\n- [x] 股票搜索（代码/名称模糊查询，预加载 5000+ A股）\n- [x] 增强版爬虫模块\n  - [x] 多引擎支持（Requests/Playwright/Jina）\n  - [x] 智能内容提取（readabilipy + 启发式算法）\n  - [x] 内容质量评估与自动重试\n  - [x] 缓存机制和统一 Article 模型\n\n### Phase 1.7: AgenticX 深度集成与批量操作（已完成） ✅\n- [x] 迁移到 AgenticX BailianEmbeddingProvider（移除冗余批量处理逻辑）\n- [x] 迁移到 AgenticX MilvusStorage（简化存储封装，移除重复代码）\n- [x] 异步向量化接口（aembed_text/aembed_batch），避免事件循环冲突\n- [x] 后台异步向量化，不阻塞分析流程\n- [x] Milvus 统计信息优化（查询计数回退机制）\n- [x] 前端批量选择功能（复选框 + Shift 范围选择）\n- [x] 批量删除新闻功能\n- [x] 批量分析新闻功能（带进度显示和结果统计）\n- [x] Docker Compose 优化（Celery 镜像构建，提升启动性能）\n\n### Phase 2: 多智能体辩论（已完成） ✅\n- [x] BullResearcher & BearResearcher 智能体\n- [x] SearchAnalyst 搜索分析师（动态数据获取）\n- [x] InvestmentManager 投资经理决策\n- [x] 辩论编排器（DebateOrchestrator）\n- [x] 动态搜索机制（辩论中按需获取数据）\n- [x] 三种辩论模式：并行分析、实时辩论、快速分析\n- [ ] 实时 WebSocket 推送（进行中）\n- [ ] 智能体执行轨迹可视化（进行中）\n\n### Phase 3: 知识增强（计划中）\n- [ ] 金融知识图谱（Neo4j）\n- [ ] 智能体记忆系统\n- [ ] GraphRetriever 图检索\n\n### Phase 4: 自我进化（计划中）\n- [ ] ACE 框架集成\n- [ ] 投资策略 Playbook\n- [ ] 决策效果评估与学习\n\n---\n\n## 📄 许可证\n\n本项目遵循 AgenticX 的许可证。\n\n---\n\n## 🙏 致谢\n\n- [AgenticX](https://github.com/yourusername/AgenticX) - 多智能体框架\n- 
[FastAPI](https://fastapi.tiangolo.com/) - Web 框架\n- [Milvus](https://milvus.io/) - 向量数据库\n- [阿里云百炼](https://dashscope.console.aliyun.com/) - LLM 服务\n- [Shadcn UI](https://ui.shadcn.com/) - 前端组件库\n\n---\n\n## ⭐ Star History\n\n如果你觉得这个项目对你有帮助，欢迎给个 Star ⭐️！\n\n[![Star History Chart](https://api.star-history.com/svg?repos=DemonDamon/FinnewsHunter&type=Date)](https://star-history.com/#DemonDamon/FinnewsHunter&Date)\n\n---\n\n**Built with ❤️ using AgenticX**\n\n"
  },
  {
    "path": "backend/.gitignore",
    "content": "# Python\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n.Python\nenv/\nvenv/\nENV/\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\n\n# Environment variables\n.env\n.env.local\n\n# IDE\n.vscode/\n.idea/\n*.swp\n*.swo\n*~\n\n# Logs\nlogs/\n*.log\n\n# Database\n*.db\n*.sqlite\n\n# OS\n.DS_Store\nThumbs.db\n\n# Testing\n.pytest_cache/\n.coverage\nhtmlcov/\n\ncelerybeat-schedule\n"
  },
  {
    "path": "backend/README.md",
    "content": "# FinnewsHunter Backend\n\nBackend service for the financial news intelligent analysis system based on the AgenticX framework.\n\n## Documentation Navigation\n\n### Quick Start\n- **[QUICKSTART.md](../QUICKSTART.md)** - Quick start guide (recommended for beginners)\n\n### Configuration Guides\n- **[CONFIG_GUIDE.md](CONFIG_GUIDE.md)** - **Unified Configuration Guide** (recommended)\n  - Single configuration file supports all LLM providers\n  - Quick switching between OpenAI / Bailian / Proxy\n  - Includes scenario examples and working principles\n  \n- **[env.example](env.example)** - Configuration template (with comments for all scenarios)\n\n### Specialized Configuration\n- **[BAILIAN_SETUP.md](BAILIAN_SETUP.md)** - Detailed Alibaba Cloud Bailian configuration (recommended for Chinese users)\n- **[API_PROXY_GUIDE.md](API_PROXY_GUIDE.md)** - API proxy configuration guide\n\n---\n\n## Quick Configuration\n\n### Method 1: Interactive Script (Recommended)\n\n```bash\nchmod +x setup_env.sh\n./setup_env.sh\n\n# Follow the prompts to select:\n# 1) OpenAI Official\n# 2) Alibaba Cloud Bailian (recommended for Chinese users)\n# 3) Other Proxy\n# 4) Manual Configuration\n```\n\n### Method 2: Manual Configuration\n\n```bash\ncp env.example .env\nnano .env  # Choose configuration scheme according to comments\n```\n\n---\n\n## Main Features\n\n- **Multi-Agent System**: Based on AgenticX framework\n  - NewsAnalyst: News analysis agent\n  - More agents under development...\n\n- **Data Collection**:\n  - Sina Finance crawler\n  - JRJ Finance crawler\n\n- **Storage System**:\n  - PostgreSQL: Relational data storage\n  - Milvus: Vector database\n  - Redis: Cache and task queue\n\n- **LLM Support**:\n  - OpenAI (GPT-3.5/GPT-4)\n  - Alibaba Cloud Bailian (Qwen)\n  - Other OpenAI-compatible services\n\n---\n\n## Project Structure\n\n```\nbackend/\n├── app/\n│   ├── agents/          # Agent definitions\n│   ├── api/             # FastAPI routes\n│   ├── core/            
# Core configuration\n│   ├── models/          # Data models\n│   ├── services/        # Business services\n│   ├── storage/         # Storage wrappers\n│   └── tools/           # Crawlers and tools\n├── logs/                # Log files\n├── tests/               # Test files\n├── .env                 # Environment configuration (copy from env.example)\n├── env.example          # Configuration template\n├── requirements.txt     # Python dependencies\n└── start.sh            # Startup script\n```\n\n---\n\n## Development Guide\n\n### Start Development Environment\n\n```bash\n# 1. Configure environment variables\n./setup_env.sh\n\n# 2. Start services (including Docker containers)\n./start.sh\n```\n\n### Utility Scripts\n\nThe project provides some utility scripts located in the `tests/` directory:\n\n```bash\n# Check Milvus vector storage data\npython tests/check_milvus_data.py\n\n# Check news embedding status\npython tests/check_news_embedding_status.py\n\n# Manually vectorize a specific news item (for fixing unvectorized news)\npython tests/manual_vectorize.py <news_id>\n```\n\n### View Logs\n\n```bash\ntail -f logs/finnews.log\n```\n\n---\n\n## Common Configuration Scenarios\n\n### OpenAI Official\n```bash\nLLM_MODEL=gpt-3.5-turbo\nOPENAI_API_KEY=sk-openai-key\nMILVUS_DIM=1536\n```\n\n### Alibaba Cloud Bailian (Recommended for Chinese Users)\n```bash\nLLM_MODEL=qwen-plus\nOPENAI_API_KEY=sk-bailian-key\nOPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nMILVUS_DIM=1024\n```\n\n### OpenAI Proxy\n```bash\nLLM_MODEL=gpt-3.5-turbo\nOPENAI_API_KEY=sk-proxy-key\nOPENAI_BASE_URL=https://your-proxy.com/v1\nMILVUS_DIM=1536\n```\n\nFor detailed information, see **[CONFIG_GUIDE.md](CONFIG_GUIDE.md)**\n\n---\n\n## API Documentation\n\n- Swagger UI: http://localhost:8000/docs\n- ReDoc: http://localhost:8000/redoc\n\n### Troubleshooting\n\nIf the documentation page appears blank or keeps loading:\n\n1. 
**Check Browser Console**: Press F12 to open developer tools, check Console and Network tabs for errors\n2. **Try ReDoc**: If Swagger UI fails to load, try accessing ReDoc (uses a different CDN)\n3. **Clear Browser Cache**: Press `Ctrl+Shift+R` (Windows/Linux) or `Cmd+Shift+R` (Mac) to force refresh\n4. **Check Network Connection**: Documentation pages need to load JavaScript resources from CDN, ensure network connection is normal\n5. **Check Backend Service**: Ensure the backend service is running, verify by accessing http://localhost:8000/health\n"
  },
  {
    "path": "backend/README_zn.md",
    "content": "# FinnewsHunter Backend\n\n基于 AgenticX 框架的金融新闻智能分析系统后端服务。\n\n## 文档导航\n\n### 快速开始\n- **[QUICKSTART.md](../QUICKSTART.md)** - 快速启动指南（推荐新手阅读）\n\n### 配置指南\n- **[CONFIG_GUIDE.md](CONFIG_GUIDE.md)** - **统一配置指南**（推荐首选）\n  - 一个配置文件支持所有 LLM 服务商\n  - 快速切换 OpenAI / 百炼 / 代理\n  - 包含场景示例和工作原理\n  \n- **[env.example](env.example)** - 配置模板（包含所有场景的注释）\n\n### 专项配置\n- **[BAILIAN_SETUP.md](BAILIAN_SETUP.md)** - 阿里云百炼详细配置（国内用户推荐）\n- **[API_PROXY_GUIDE.md](API_PROXY_GUIDE.md)** - API 代理配置详解\n\n---\n\n## 快速配置\n\n### 方法 1: 交互式脚本（推荐）\n\n```bash\nchmod +x setup_env.sh\n./setup_env.sh\n\n# 按提示选择：\n# 1) OpenAI 官方\n# 2) 阿里云百炼（推荐国内用户）\n# 3) 其他代理\n# 4) 手动配置\n```\n\n### 方法 2: 手动配置\n\n```bash\ncp env.example .env\nnano .env  # 根据注释选择配置方案\n```\n\n---\n\n## 主要功能\n\n- **多智能体系统**：基于 AgenticX 框架\n  - NewsAnalyst：新闻分析智能体\n  - 更多智能体开发中...\n\n- **数据采集**：\n  - 新浪财经爬虫\n  - 金融界爬虫\n\n- **存储系统**：\n  - PostgreSQL：关系数据存储\n  - Milvus：向量数据库\n  - Redis：缓存和任务队列\n\n- **LLM 支持**：\n  - OpenAI (GPT-3.5/GPT-4)\n  - 阿里云百炼（通义千问）\n  - 其他 OpenAI 兼容服务\n\n---\n\n## 项目结构\n\n```\nbackend/\n├── app/\n│   ├── agents/          # 智能体定义\n│   ├── api/             # FastAPI 路由\n│   ├── core/            # 核心配置\n│   ├── models/          # 数据模型\n│   ├── services/        # 业务服务\n│   ├── storage/         # 存储封装\n│   └── tools/           # 爬虫和工具\n├── logs/                # 日志文件\n├── tests/               # 测试文件\n├── .env                 # 环境配置（从 env.example 复制）\n├── env.example          # 配置模板\n├── requirements.txt     # Python 依赖\n└── start.sh            # 启动脚本\n```\n\n---\n\n## 开发指南\n\n### 启动开发环境\n\n```bash\n# 1. 配置环境变量\n./setup_env.sh\n\n# 2. 
启动服务（包括 Docker 容器）\n./start.sh\n```\n\n### 工具脚本\n\n项目提供了一些实用工具脚本，位于 `tests/` 目录下：\n\n```bash\n# 检查 Milvus 向量存储数据\npython tests/check_milvus_data.py\n\n# 检查新闻向量化状态\npython tests/check_news_embedding_status.py\n\n# 手动向量化指定新闻（用于修复未向量化的新闻）\npython tests/manual_vectorize.py <news_id>\n```\n\n### 查看日志\n\n```bash\ntail -f logs/finnews.log\n```\n\n---\n\n## 常用配置场景\n\n### OpenAI 官方\n```bash\nLLM_MODEL=gpt-3.5-turbo\nOPENAI_API_KEY=sk-openai-key\nMILVUS_DIM=1536\n```\n\n### 阿里云百炼（推荐国内）\n```bash\nLLM_MODEL=qwen-plus\nOPENAI_API_KEY=sk-bailian-key\nOPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nMILVUS_DIM=1024\n```\n\n### OpenAI 代理\n```bash\nLLM_MODEL=gpt-3.5-turbo\nOPENAI_API_KEY=sk-proxy-key\nOPENAI_BASE_URL=https://your-proxy.com/v1\nMILVUS_DIM=1536\n```\n\n详细说明见 **[CONFIG_GUIDE.md](CONFIG_GUIDE.md)**\n\n---\n\n## API 文档\n\n- Swagger UI: http://localhost:8000/docs\n- ReDoc: http://localhost:8000/redoc\n\n### 手动触发爬取\n\n如果某个新闻源显示为空，可以手动触发实时爬取：\n\n```bash\n# 触发腾讯财经爬取\ncurl -X POST \"http://localhost:8000/api/v1/tasks/realtime\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"source\": \"tencent\", \"force_refresh\": true}'\n\n# 触发经济观察网爬取\ncurl -X POST \"http://localhost:8000/api/v1/tasks/realtime\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"source\": \"eeo\", \"force_refresh\": true}'\n```\n\n支持的新闻源：\n- `sina` - 新浪财经\n- `tencent` - 腾讯财经\n- `eeo` - 经济观察网\n- `jwview` - 金融界\n- `caijing` - 财经网\n- `jingji21` - 21经济网\n- `nbd` - 每日经济新闻\n- `yicai` - 第一财经\n- `163` - 网易财经\n- `eastmoney` - 东方财富\n\n### 故障排查\n\n如果文档页面显示空白或一直加载：\n\n1. **检查浏览器控制台**：按 F12 打开开发者工具，查看 Console 和 Network 标签页是否有错误\n2. **尝试 ReDoc**：如果 Swagger UI 无法加载，尝试访问 ReDoc（使用不同的 CDN）\n3. **清除浏览器缓存**：按 `Ctrl+Shift+R` (Windows/Linux) 或 `Cmd+Shift+R` (Mac) 强制刷新\n4. **检查网络连接**：文档页面需要从 CDN 加载 JavaScript 资源，确保网络连接正常\n5. **检查后端服务**：确保后端服务正在运行，可以访问 http://localhost:8000/health 验证\n"
  },
  {
    "path": "backend/add_raw_html_column.py",
    "content": "\"\"\"\n数据库迁移：添加 raw_html 字段\n\"\"\"\nimport os\nfrom pathlib import Path\nfrom dotenv import load_dotenv\n\n# 加载环境变量\nenv_path = Path(__file__).parent / \".env\"\nload_dotenv(env_path)\n\n# 构建数据库 URL\nPOSTGRES_USER = os.getenv(\"POSTGRES_USER\", \"postgres\")\nPOSTGRES_PASSWORD = os.getenv(\"POSTGRES_PASSWORD\", \"postgres\")\nPOSTGRES_HOST = os.getenv(\"POSTGRES_HOST\", \"localhost\")\nPOSTGRES_PORT = os.getenv(\"POSTGRES_PORT\", \"5432\")\nPOSTGRES_DB = os.getenv(\"POSTGRES_DB\", \"finnews_db\")\n\nDATABASE_URL = f\"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}\"\n\nfrom sqlalchemy import create_engine, text\n\ndef add_raw_html_column():\n    \"\"\"添加 raw_html 字段到 news 表\"\"\"\n    print(\"🔧 正在添加 raw_html 字段...\")\n    \n    engine = create_engine(DATABASE_URL)\n    \n    with engine.connect() as conn:\n        # 检查字段是否已存在\n        result = conn.execute(text(\"\"\"\n            SELECT column_name FROM information_schema.columns \n            WHERE table_name = 'news' AND column_name = 'raw_html'\n        \"\"\"))\n        \n        if result.fetchone():\n            print(\"✅ raw_html 字段已存在，无需迁移\")\n            return\n        \n        # 添加字段\n        conn.execute(text(\"\"\"\n            ALTER TABLE news ADD COLUMN raw_html TEXT\n        \"\"\"))\n        conn.commit()\n        \n        print(\"✅ raw_html 字段已添加成功！\")\n\nif __name__ == \"__main__\":\n    print(\"=\" * 50)\n    print(\"📦 数据库迁移：添加 raw_html 字段\")\n    print(\"=\" * 50)\n    add_raw_html_column()\n\n"
  },
  {
    "path": "backend/app/__init__.py",
    "content": "\"\"\"\nFinnewsHunter Backend Application\n\"\"\"\n__version__ = \"0.1.0\"\n\n"
  },
  {
    "path": "backend/app/agents/__init__.py",
    "content": "\"\"\"\n智能体模块\n\"\"\"\nfrom .news_analyst import NewsAnalystAgent, create_news_analyst\nfrom .debate_agents import (\n    BullResearcherAgent,\n    BearResearcherAgent,\n    InvestmentManagerAgent,\n    DebateWorkflow,\n    create_debate_workflow,\n)\nfrom .data_collector_v2 import DataCollectorAgentV2, QuickAnalystAgent, create_data_collector\nfrom .orchestrator import DebateOrchestrator, create_orchestrator\nfrom .quantitative_agent import QuantitativeAgent, create_quantitative_agent\n\n__all__ = [\n    \"NewsAnalystAgent\",\n    \"create_news_analyst\",\n    \"BullResearcherAgent\",\n    \"BearResearcherAgent\",\n    \"InvestmentManagerAgent\",\n    \"DebateWorkflow\",\n    \"create_debate_workflow\",\n    \"DataCollectorAgentV2\",\n    \"QuickAnalystAgent\",\n    \"create_data_collector\",\n    \"DebateOrchestrator\",\n    \"create_orchestrator\",\n    \"QuantitativeAgent\",\n    \"create_quantitative_agent\",\n]\n\n"
  },
  {
    "path": "backend/app/agents/data_collector.py",
    "content": "\"\"\"\n数据专员智能体\n\n负责在辩论前搜集和整理相关数据资料，包括：\n- 新闻数据（从数据库或BochaAI搜索）\n- 财务数据（从AkShare获取）\n- 行情数据（实时行情、K线等）\n\"\"\"\nimport logging\nfrom typing import Dict, Any, List, Optional\nfrom datetime import datetime\n\nfrom agenticx.core.agent import Agent\nfrom ..services.llm_service import get_llm_provider\n\nlogger = logging.getLogger(__name__)\n\n\nclass DataCollectorAgent(Agent):\n    \"\"\"数据专员智能体\"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"DataCollector\",\n            role=\"数据专员\",\n            goal=\"搜集和整理股票相关的新闻、财务和行情数据，为辩论提供全面的信息支持\",\n            backstory=\"\"\"你是一位专业的金融数据分析师，擅长从多个数据源搜集和整理信息。\n你的职责是在辩论开始前，为Bull/Bear研究员提供全面、准确、及时的数据支持。\n你需要：\n1. 搜集最新的相关新闻\n2. 获取关键财务指标\n3. 分析资金流向\n4. 整理行情数据\n你的工作质量直接影响辩论的深度和专业性。\"\"\",\n            organization_id=organization_id\n        )\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        logger.info(f\"Initialized {self.name} agent\")\n    \n    async def collect_data(\n        self,\n        stock_code: str,\n        stock_name: str,\n        data_requirements: Optional[Dict[str, Any]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        搜集股票相关数据\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            data_requirements: 数据需求配置\n            \n        Returns:\n            包含各类数据的字典\n        \"\"\"\n        logger.info(f\"📊 DataCollector: 开始搜集 {stock_name}({stock_code}) 的数据...\")\n        \n        result = {\n            \"stock_code\": stock_code,\n            \"stock_name\": stock_name,\n            \"collected_at\": datetime.utcnow().isoformat(),\n            \"news\": [],\n            \"financial\": {},\n            \"fund_flow\": {},\n            \"realtime_quote\": {},\n            \"summary\": \"\"\n        }\n        \n        try:\n            # 1. 
搜集新闻数据\n            news_data = await self._collect_news(stock_code, stock_name)\n            result[\"news\"] = news_data\n            logger.info(f\"📰 DataCollector: 搜集到 {len(news_data)} 条新闻\")\n            \n            # 2. 搜集财务数据\n            financial_data = await self._collect_financial(stock_code)\n            result[\"financial\"] = financial_data\n            logger.info(f\"💰 DataCollector: 搜集到财务数据\")\n            \n            # 3. 搜集资金流向\n            fund_flow = await self._collect_fund_flow(stock_code)\n            result[\"fund_flow\"] = fund_flow\n            logger.info(f\"💸 DataCollector: 搜集到资金流向数据\")\n            \n            # 4. 搜集实时行情\n            realtime = await self._collect_realtime_quote(stock_code)\n            result[\"realtime_quote\"] = realtime\n            logger.info(f\"📈 DataCollector: 搜集到实时行情\")\n            \n            # 5. 生成数据摘要\n            result[\"summary\"] = await self._generate_summary(result)\n            logger.info(f\"📋 DataCollector: 数据摘要生成完成\")\n            \n        except Exception as e:\n            logger.error(f\"DataCollector 搜集数据时出错: {e}\", exc_info=True)\n            result[\"error\"] = str(e)\n        \n        return result\n    \n    async def _collect_news(self, stock_code: str, stock_name: str) -> List[Dict[str, Any]]:\n        \"\"\"搜集新闻数据\"\"\"\n        from ..services.news_service import news_service\n        \n        try:\n            # 从数据库获取已有新闻\n            news_list = await news_service.get_news_by_stock(stock_code, limit=20)\n            return [\n                {\n                    \"title\": news.title,\n                    \"content\": news.content[:500] if news.content else \"\",\n                    \"source\": news.source,\n                    \"published_at\": news.published_at.isoformat() if news.published_at else None,\n                    \"sentiment\": news.sentiment\n                }\n                for news in news_list\n            ]\n        except Exception as e:\n       
     logger.warning(f\"从数据库获取新闻失败: {e}\")\n            return []\n    \n    async def _collect_financial(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"搜集财务数据\"\"\"\n        from ..services.stock_data_service import stock_data_service\n        \n        try:\n            return await stock_data_service.get_financial_indicators(stock_code) or {}\n        except Exception as e:\n            logger.warning(f\"获取财务数据失败: {e}\")\n            return {}\n    \n    async def _collect_fund_flow(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"搜集资金流向数据\"\"\"\n        from ..services.stock_data_service import stock_data_service\n        \n        try:\n            return await stock_data_service.get_fund_flow(stock_code) or {}\n        except Exception as e:\n            logger.warning(f\"获取资金流向失败: {e}\")\n            return {}\n    \n    async def _collect_realtime_quote(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"搜集实时行情\"\"\"\n        from ..services.stock_data_service import stock_data_service\n        \n        try:\n            return await stock_data_service.get_realtime_quote(stock_code) or {}\n        except Exception as e:\n            logger.warning(f\"获取实时行情失败: {e}\")\n            return {}\n    \n    async def _generate_summary(self, data: Dict[str, Any]) -> str:\n        \"\"\"使用LLM生成数据摘要\"\"\"\n        try:\n            # 准备摘要内容\n            news_summary = \"\"\n            if data.get(\"news\"):\n                news_titles = [n[\"title\"] for n in data[\"news\"][:5]]\n                news_summary = f\"最新新闻（{len(data['news'])}条）:\\n\" + \"\\n\".join(f\"- {t}\" for t in news_titles)\n            \n            financial_summary = \"\"\n            if data.get(\"financial\"):\n                f = data[\"financial\"]\n                financial_summary = f\"\"\"财务指标:\n- PE: {f.get('pe', 'N/A')}\n- PB: {f.get('pb', 'N/A')}\n- ROE: {f.get('roe', 'N/A')}\n- 净利润增长率: {f.get('net_profit_growth', 'N/A')}\"\"\"\n            \n            
fund_flow_summary = \"\"\n            if data.get(\"fund_flow\"):\n                ff = data[\"fund_flow\"]\n                fund_flow_summary = f\"\"\"资金流向:\n- 主力净流入: {ff.get('main_net_inflow', 'N/A')}\n- 散户净流入: {ff.get('retail_net_inflow', 'N/A')}\"\"\"\n            \n            realtime_summary = \"\"\n            if data.get(\"realtime_quote\"):\n                rt = data[\"realtime_quote\"]\n                realtime_summary = f\"\"\"实时行情:\n- 当前价: {rt.get('price', 'N/A')}\n- 涨跌幅: {rt.get('change_pct', 'N/A')}%\n- 成交量: {rt.get('volume', 'N/A')}\"\"\"\n            \n            summary = f\"\"\"## {data['stock_name']}({data['stock_code']}) 数据摘要\n\n{realtime_summary}\n\n{financial_summary}\n\n{fund_flow_summary}\n\n{news_summary}\n\n数据搜集时间: {data['collected_at']}\"\"\"\n            \n            return summary\n            \n        except Exception as e:\n            logger.error(f\"生成数据摘要失败: {e}\")\n            return f\"数据搜集完成，但生成摘要时出错: {e}\"\n    \n    async def analyze_data_quality(self, data: Dict[str, Any]) -> Dict[str, Any]:\n        \"\"\"分析数据质量和完整性\"\"\"\n        quality = {\n            \"score\": 0,\n            \"max_score\": 100,\n            \"details\": [],\n            \"recommendations\": []\n        }\n        \n        # 检查新闻数据\n        news_count = len(data.get(\"news\", []))\n        if news_count >= 10:\n            quality[\"score\"] += 30\n            quality[\"details\"].append(f\"✅ 新闻数据充足（{news_count}条）\")\n        elif news_count >= 5:\n            quality[\"score\"] += 20\n            quality[\"details\"].append(f\"⚠️ 新闻数据较少（{news_count}条）\")\n            quality[\"recommendations\"].append(\"建议搜集更多新闻以支持分析\")\n        elif news_count > 0:\n            quality[\"score\"] += 10\n            quality[\"details\"].append(f\"⚠️ 新闻数据不足（{news_count}条）\")\n            quality[\"recommendations\"].append(\"新闻数据偏少，分析可能不够全面\")\n        else:\n            quality[\"details\"].append(\"❌ 无新闻数据\")\n            
quality[\"recommendations\"].append(\"缺少新闻数据，建议先进行定向爬取\")\n        \n        # 检查财务数据\n        if data.get(\"financial\"):\n            quality[\"score\"] += 25\n            quality[\"details\"].append(\"✅ 财务数据完整\")\n        else:\n            quality[\"details\"].append(\"❌ 缺少财务数据\")\n            quality[\"recommendations\"].append(\"无法获取财务指标\")\n        \n        # 检查资金流向\n        if data.get(\"fund_flow\"):\n            quality[\"score\"] += 20\n            quality[\"details\"].append(\"✅ 资金流向数据完整\")\n        else:\n            quality[\"details\"].append(\"⚠️ 缺少资金流向数据\")\n        \n        # 检查实时行情\n        if data.get(\"realtime_quote\"):\n            quality[\"score\"] += 25\n            quality[\"details\"].append(\"✅ 实时行情数据完整\")\n        else:\n            quality[\"details\"].append(\"⚠️ 缺少实时行情数据\")\n        \n        return quality\n\n\n# 快速分析师（用于快速分析模式）\nclass QuickAnalystAgent(Agent):\n    \"\"\"快速分析师智能体\"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"QuickAnalyst\",\n            role=\"快速分析师\",\n            goal=\"快速综合多角度给出投资建议\",\n            backstory=\"\"\"你是一位经验丰富的量化分析师，擅长快速分析和决策。\n你能够在短时间内综合考虑多空因素，给出简洁明了的投资建议。\n你的分析风格是：快速、准确、实用。\"\"\",\n            organization_id=organization_id\n        )\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        logger.info(f\"Initialized {self.name} agent\")\n    \n    async def quick_analyze(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str\n    ) -> Dict[str, Any]:\n        \"\"\"快速分析\"\"\"\n        # 获取当前系统时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = f\"\"\"请对 {stock_name}({stock_code}) 进行快速投资分析。\n\n【当前时间】\n{current_time}\n\n背景资料:\n{context}\n\n请在1分钟内给出：\n1. 核心观点（一句话）\n2. 看多因素（3点）\n3. 看空因素（3点）\n4. 投资建议（买入/持有/卖出）\n5. 
目标价位和止损价位\n\n请用简洁的语言，直接给出结论。\"\"\"\n\n        try:\n            response = await self._llm_provider.chat(prompt)\n            return {\n                \"success\": True,\n                \"analysis\": response,\n                \"timestamp\": datetime.utcnow().isoformat()\n            }\n        except Exception as e:\n            logger.error(f\"Quick analysis failed: {e}\")\n            return {\n                \"success\": False,\n                \"error\": str(e)\n            }\n\n"
  },
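`analyze_data_quality` above scores completeness in four weighted buckets (news up to 30 points with 10/5 thresholds, financials 25, fund flow 20, real-time quote 25, for a maximum of 100). A standalone sketch of the same scoring rule, handy for unit-testing the thresholds in isolation (the function name is illustrative, not part of the repo):

```python
def quality_score(news_count, has_financial, has_fund_flow, has_quote):
    """Mirror of analyze_data_quality's scoring: 30/25/20/25 buckets, max 100."""
    score = 0
    # News bucket: full credit at >=10 items, partial below that
    if news_count >= 10:
        score += 30
    elif news_count >= 5:
        score += 20
    elif news_count > 0:
        score += 10
    # Presence-based buckets
    if has_financial:
        score += 25
    if has_fund_flow:
        score += 20
    if has_quote:
        score += 25
    return score
```

Keeping the arithmetic in a pure function makes it cheap to verify that the buckets really sum to 100 before the agent's recommendations are wired in.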
  {
    "path": "backend/app/agents/data_collector_v2.py",
    "content": "\"\"\"\n数据专员智能体 V2 (DataCollectorAgent)\n\n统一负责所有数据获取任务，支持：\n- 辩论前的初始数据收集\n- 辩论中的动态数据补充\n- 用户追问时的按需搜索\n\n核心特性：\n1. 计划/执行分离：先生成搜索计划，用户确认后再执行\n2. 多数据源支持：AkShare、BochaAI、网页搜索、知识库\n3. 智能意图识别：根据用户问题自动选择数据源\n\"\"\"\nimport logging\nimport re\nimport asyncio\nfrom typing import Dict, Any, List, Optional, ClassVar, Pattern\nfrom datetime import datetime\nfrom enum import Enum\nfrom pydantic import BaseModel, Field\n\nfrom agenticx.core.agent import Agent\nfrom ..services.llm_service import get_llm_provider\nfrom ..services.stock_data_service import stock_data_service\nfrom ..tools.bochaai_search import bochaai_search, SearchResult\nfrom ..tools.interactive_crawler import InteractiveCrawler\n\nlogger = logging.getLogger(__name__)\n\n\nclass SearchSource(str, Enum):\n    \"\"\"搜索数据源类型\"\"\"\n    AKSHARE = \"akshare\"           # AkShare 财务/行情数据\n    BOCHAAI = \"bochaai\"           # BochaAI Web搜索\n    BROWSER = \"browser\"           # 交互式浏览器搜索\n    KNOWLEDGE_BASE = \"kb\"         # 内部知识库\n    ALL = \"all\"                   # 所有来源\n\n\nclass SearchTask(BaseModel):\n    \"\"\"单个搜索任务\"\"\"\n    id: str = Field(..., description=\"任务ID\")\n    source: SearchSource = Field(..., description=\"数据源\")\n    query: str = Field(..., description=\"搜索查询\")\n    description: str = Field(\"\", description=\"任务描述（用于展示给用户）\")\n    data_type: Optional[str] = Field(None, description=\"数据类型（如 financial, news, kline）\")\n    icon: str = Field(\"🔍\", description=\"图标（用于UI展示）\")\n    estimated_time: int = Field(3, description=\"预计耗时（秒）\")\n\n\nclass SearchPlan(BaseModel):\n    \"\"\"搜索计划\"\"\"\n    plan_id: str = Field(..., description=\"计划ID\")\n    stock_code: str = Field(..., description=\"股票代码\")\n    stock_name: str = Field(\"\", description=\"股票名称\")\n    user_query: str = Field(..., description=\"用户原始问题\")\n    tasks: List[SearchTask] = Field(default_factory=list, description=\"搜索任务列表\")\n    total_estimated_time: int = Field(0, description=\"总预计耗时（秒）\")\n    created_at: str 
= Field(default_factory=lambda: datetime.utcnow().isoformat())\n    status: str = Field(\"pending\", description=\"状态：pending, confirmed, executing, completed, cancelled\")\n\n\nclass SearchResult(BaseModel):\n    \"\"\"搜索结果\"\"\"\n    task_id: str\n    source: str\n    success: bool\n    data: Dict[str, Any] = Field(default_factory=dict)\n    summary: str = \"\"\n    error: Optional[str] = None\n    execution_time: float = 0\n\n\nclass DataCollectorAgentV2(Agent):\n    \"\"\"\n    数据专员智能体 V2\n    \n    支持\"确认优先\"模式：\n    1. 用户 @数据专员 提问\n    2. 生成搜索计划（不执行）\n    3. 用户确认后执行\n    4. 返回结果\n    \"\"\"\n    \n    # 关键词到数据源的映射\n    KEYWORD_SOURCE_MAP: ClassVar[Dict[str, tuple]] = {\n        # 财务相关 -> AkShare\n        \"财务\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"pe\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"pb\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"roe\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"利润\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"营收\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"估值\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"市盈\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"市净\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \"报表\": (SearchSource.AKSHARE, \"financial\", \"📊\"),\n        \n        # 资金/行情 -> AkShare\n        \"资金\": (SearchSource.AKSHARE, \"fund_flow\", \"💰\"),\n        \"主力\": (SearchSource.AKSHARE, \"fund_flow\", \"💰\"),\n        \"流入\": (SearchSource.AKSHARE, \"fund_flow\", \"💰\"),\n        \"流出\": (SearchSource.AKSHARE, \"fund_flow\", \"💰\"),\n        \"行情\": (SearchSource.AKSHARE, \"realtime\", \"📈\"),\n        \"价格\": (SearchSource.AKSHARE, \"realtime\", \"📈\"),\n        \"涨跌\": (SearchSource.AKSHARE, \"realtime\", \"📈\"),\n        \"k线\": (SearchSource.AKSHARE, \"kline\", \"📈\"),\n        \"走势\": (SearchSource.AKSHARE, \"kline\", \"📈\"),\n        \n        # 新闻相关 -> BochaAI\n        \"新闻\": 
(SearchSource.BOCHAAI, \"news\", \"📰\"),\n        \"资讯\": (SearchSource.BOCHAAI, \"news\", \"📰\"),\n        \"报道\": (SearchSource.BOCHAAI, \"news\", \"📰\"),\n        \"公告\": (SearchSource.BOCHAAI, \"news\", \"📰\"),\n        \"消息\": (SearchSource.BOCHAAI, \"news\", \"📰\"),\n        \n        # 上下游/产业链 -> 多源搜索\n        \"上下游\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n        \"供应链\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n        \"客户\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n        \"供应商\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n        \"合作\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n        \"产业链\": (SearchSource.BROWSER, \"industry\", \"🔗\"),\n    }\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"DataCollector\",\n            role=\"数据专员\",\n            goal=\"根据用户需求，从多个数据源搜集和整理相关信息，支持辩论前准备和辩论中追问\",\n            backstory=\"\"\"你是一位专业的金融数据专家，精通各类金融数据源的使用。\n你的职责是：\n1. 理解用户的数据需求\n2. 制定合理的搜索计划\n3. 从多个数据源获取数据\n4. 
整理并格式化数据\n\n你能够访问的数据源包括：\n- AkShare: 股票财务指标、K线行情、资金流向等\n- BochaAI: 实时新闻搜索、财经报道\n- 网页搜索: 百度资讯、搜狗等\n- 知识库: 历史新闻和分析数据\"\"\",\n            organization_id=organization_id\n        )\n        \n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        # 初始化搜索工具\n        self._interactive_crawler = InteractiveCrawler(timeout=20)\n        \n        logger.info(f\"✅ Initialized DataCollectorV2 with multi-source search capabilities\")\n    \n    async def generate_search_plan(\n        self,\n        query: str,\n        stock_code: str,\n        stock_name: str = \"\"\n    ) -> SearchPlan:\n        \"\"\"\n        生成搜索计划（不执行）\n        \n        根据用户问题分析需要哪些数据，生成待确认的搜索计划\n        \n        Args:\n            query: 用户问题\n            stock_code: 股票代码\n            stock_name: 股票名称\n            \n        Returns:\n            SearchPlan 对象\n        \"\"\"\n        logger.info(f\"📋 DataCollector: 为 '{query}' 生成搜索计划...\")\n        \n        plan_id = f\"plan_{datetime.utcnow().strftime('%Y%m%d%H%M%S')}_{stock_code}\"\n        \n        plan = SearchPlan(\n            plan_id=plan_id,\n            stock_code=stock_code,\n            stock_name=stock_name or stock_code,\n            user_query=query,\n            tasks=[],\n            status=\"pending\"\n        )\n        \n        query_lower = query.lower()\n        \n        # 1. 
Generate tasks from keyword matches\n        matched_sources = set()\n        for keyword, (source, data_type, icon) in self.KEYWORD_SOURCE_MAP.items():\n            if keyword in query_lower:\n                if (source, data_type) not in matched_sources:\n                    matched_sources.add((source, data_type))\n                    task = self._create_task(\n                        source=source,\n                        data_type=data_type,\n                        icon=icon,\n                        query=query,\n                        stock_code=stock_code,\n                        stock_name=stock_name\n                    )\n                    plan.tasks.append(task)\n        \n        # 2. If no keyword matched, fall back to LLM analysis\n        if not plan.tasks:\n            plan.tasks = await self._analyze_with_llm(query, stock_code, stock_name)\n        \n        # 3. Still no tasks: add a default combined search\n        if not plan.tasks:\n            plan.tasks = [\n                SearchTask(\n                    id=f\"task_{plan_id}_1\",\n                    source=SearchSource.BOCHAAI,\n                    query=f\"{stock_name or stock_code} {query}\",\n                    description=f\"搜索 {stock_name} 相关新闻\",\n                    icon=\"📰\",\n                    estimated_time=3\n                ),\n                SearchTask(\n                    id=f\"task_{plan_id}_2\",\n                    source=SearchSource.AKSHARE,\n                    query=query,\n                    description=\"获取最新财务和行情数据\",\n                    data_type=\"overview\",\n                    icon=\"📊\",\n                    estimated_time=2\n                )\n            ]\n        \n        # Compute the total estimated time\n        plan.total_estimated_time = sum(t.estimated_time for t in plan.tasks)\n        \n        logger.info(f\"✅ Search plan generated: {len(plan.tasks)} tasks, estimated {plan.total_estimated_time}s\")\n        \n        return plan\n    \n    def _create_task(\n        self,\n        source: SearchSource,\n        data_type: str,\n        icon: str,\n        query: str,\n        stock_code: str,\n        stock_name: str\n    ) -> SearchTask:\n        \"\"\"Create a single search task\"\"\"\n        task_id = f\"task_{datetime.utcnow().strftime('%H%M%S%f')}\"\n        \n        # Description templates per data type (user-facing, kept in Chinese)\n        descriptions = {\n            \"financial\": f\"获取 {stock_name or stock_code} 财务指标（PE/PB/ROE等）\",\n            \"fund_flow\": f\"获取 {stock_name or stock_code} 资金流向（主力/散户）\",\n            \"realtime\": f\"获取 {stock_name or stock_code} 实时行情\",\n            \"kline\": f\"获取 {stock_name or stock_code} K线走势\",\n            \"news\": f\"搜索 {stock_name or stock_code} 最新新闻\",\n            \"industry\": f\"搜索 {stock_name or stock_code} 产业链/上下游信息\",\n        }\n        \n        # Query templates per data type\n        queries = {\n            \"financial\": stock_code,\n            \"fund_flow\": stock_code,\n            \"realtime\": stock_code,\n            \"kline\": stock_code,\n            \"news\": f\"{stock_name or stock_code} {query}\",\n            \"industry\": f\"{stock_name or stock_code} {query}\",\n        }\n        \n        return SearchTask(\n            id=task_id,\n            source=source,\n            query=queries.get(data_type, query),\n            description=descriptions.get(data_type, f\"搜索: {query}\"),\n            data_type=data_type,\n            icon=icon,\n            estimated_time=3 if source != SearchSource.BROWSER else 5\n        )\n    \n    async def _analyze_with_llm(\n        self,\n        query: str,\n        stock_code: str,\n        stock_name: str\n    ) -> List[SearchTask]:\n        \"\"\"Use the LLM to decide which data to search for\"\"\"\n        try:\n            prompt = f\"\"\"分析以下用户问题，判断需要搜索哪些数据：\n\n用户问题: \"{query}\"\n股票: {stock_name}({stock_code})\n\n可用数据源:\n1. akshare - 财务数据（PE/PB/ROE等）、资金流向、实时行情、K线\n2. bochaai - 新闻搜索、财经报道\n3. browser - 网页搜索（适合搜索产业链、上下游、合作方等）\n4. 
kb - 历史新闻数据库\n\n请返回需要搜索的内容，格式如下（每行一个）:\nSOURCE:数据源|TYPE:数据类型|QUERY:搜索词|DESC:描述\n\n示例:\nSOURCE:bochaai|TYPE:news|QUERY:ST国华 上下游|DESC:搜索ST国华上下游相关新闻\nSOURCE:akshare|TYPE:financial|QUERY:002074|DESC:获取国轩高科财务数据\n\n只输出2-4个最相关的搜索任务。\"\"\"\n\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": \"你是数据搜索专家，帮助分析需要哪些数据。\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            content = response.content if hasattr(response, 'content') else str(response)\n            \n            tasks = []\n            for line in content.strip().split('\\n'):\n                if 'SOURCE:' in line:\n                    try:\n                        # Parse the pipe-delimited SOURCE/TYPE/QUERY/DESC fields\n                        parts = {}\n                        for part in line.split('|'):\n                            if ':' in part:\n                                key, value = part.split(':', 1)\n                                parts[key.strip().upper()] = value.strip()\n                        \n                        if 'SOURCE' in parts:\n                            source_str = parts['SOURCE'].lower()\n                            source = SearchSource(source_str) if source_str in [s.value for s in SearchSource] else SearchSource.BOCHAAI\n                            \n                            tasks.append(SearchTask(\n                                id=f\"task_llm_{len(tasks)+1}\",\n                                source=source,\n                                query=parts.get('QUERY', query),\n                                description=parts.get('DESC', f\"搜索: {query}\"),\n                                data_type=parts.get('TYPE', 'general'),\n                                icon=self._get_icon_for_source(source),\n                                estimated_time=3\n                            ))\n                    except Exception as e:\n                        logger.debug(f\"Failed to parse LLM response line: {e}\")\n            \n            return tasks\n            \n        except Exception as e:\n            logger.warning(f\"LLM analysis failed: {e}\")\n            return []\n    \n    def _get_icon_for_source(self, source: SearchSource) -> str:\n        \"\"\"Icon for a data source\"\"\"\n        icons = {\n            SearchSource.AKSHARE: \"📊\",\n            SearchSource.BOCHAAI: \"📰\",\n            SearchSource.BROWSER: \"🌐\",\n            SearchSource.KNOWLEDGE_BASE: \"📚\",\n            SearchSource.ALL: \"🔍\"\n        }\n        return icons.get(source, \"🔍\")\n    \n    async def execute_search_plan(\n        self,\n        plan: SearchPlan\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Execute a confirmed search plan.\n        \n        Args:\n            plan: the confirmed search plan\n            \n        Returns:\n            Aggregated search results\n        \"\"\"\n        logger.info(f\"🚀 DataCollector: executing search plan {plan.plan_id}...\")\n        \n        plan.status = \"executing\"\n        start_time = datetime.utcnow()\n        \n        results = {\n            \"plan_id\": plan.plan_id,\n            \"stock_code\": plan.stock_code,\n            \"stock_name\": plan.stock_name,\n            \"user_query\": plan.user_query,\n            \"task_results\": [],\n            \"combined_data\": {},\n            \"summary\": \"\",\n            \"success\": False,\n            \"execution_time\": 0\n        }\n        \n        # Run all tasks concurrently\n        async_tasks = []\n        for task in plan.tasks:\n            async_tasks.append(self._execute_task(task, plan.stock_code, plan.stock_name))\n        \n        task_results = await asyncio.gather(*async_tasks, return_exceptions=True)\n        \n        # Collect results\n        for i, result in enumerate(task_results):\n            if isinstance(result, Exception):\n                logger.error(f\"Task execution failed: {result}\")\n                results[\"task_results\"].append(SearchResult(\n                    task_id=plan.tasks[i].id,\n                    source=plan.tasks[i].source.value,\n                    success=False,\n                    error=str(result)\n                ).dict())\n            else:\n                results[\"task_results\"].append(result.dict() if hasattr(result, 'dict') else result)\n                if result.get(\"success\"):\n                    # Merge data by source\n                    source = result.get(\"source\", \"unknown\")\n                    if source not in results[\"combined_data\"]:\n                        results[\"combined_data\"][source] = {}\n                    results[\"combined_data\"][source].update(result.get(\"data\", {}))\n        \n        # Build the combined summary\n        results[\"summary\"] = await self._generate_combined_summary(\n            plan.user_query,\n            results[\"combined_data\"],\n            plan.stock_name\n        )\n        \n        # Compute execution time\n        end_time = datetime.utcnow()\n        results[\"execution_time\"] = (end_time - start_time).total_seconds()\n        results[\"success\"] = any(r.get(\"success\") for r in results[\"task_results\"])\n        \n        plan.status = \"completed\"\n        \n        logger.info(f\"✅ Search plan finished in {results['execution_time']:.1f}s\")\n        \n        return results\n    \n    async def _execute_task(\n        self,\n        task: SearchTask,\n        stock_code: str,\n        stock_name: str\n    ) -> Dict[str, Any]:\n        \"\"\"Execute a single search task\"\"\"\n        logger.info(f\"🔍 Running task: {task.description}\")\n        \n        start_time = datetime.utcnow()\n        result = {\n            \"task_id\": task.id,\n            \"source\": task.source.value,\n            \"success\": False,\n            \"data\": {},\n            \"summary\": \"\",\n            \"execution_time\": 0\n        }\n        \n        try:\n            if task.source == SearchSource.AKSHARE:\n                data = await self._search_akshare(task.query, stock_code, task.data_type)\n                result[\"data\"] = data or {}\n                result[\"success\"] = bool(data)\n                \n            elif task.source == SearchSource.BOCHAAI:\n                data = await self._search_bochaai(task.query, stock_name)\n                result[\"data\"] = data or {}\n                result[\"success\"] = bool(data)\n                \n            elif task.source == SearchSource.BROWSER:\n                data = await self._search_browser(task.query)\n                result[\"data\"] = data or {}\n                result[\"success\"] = bool(data)\n                \n            elif task.source == SearchSource.KNOWLEDGE_BASE:\n                data = await self._search_knowledge_base(task.query, stock_code)\n                result[\"data\"] = data or {}\n                result[\"success\"] = bool(data)\n            \n        except Exception as e:\n            logger.error(f\"Task {task.id} failed: {e}\")\n            result[\"error\"] = str(e)\n        \n        end_time = datetime.utcnow()\n        result[\"execution_time\"] = (end_time - start_time).total_seconds()\n        \n        return result\n    \n    async def _search_akshare(\n        self,\n        query: str,\n        stock_code: str,\n        data_type: Optional[str] = None\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"Fetch data from AkShare\"\"\"\n        data = {}\n        \n        try:\n            if data_type == \"financial\" or data_type == \"overview\":\n                financial = await stock_data_service.get_financial_indicators(stock_code)\n                if financial:\n                    data[\"financial_indicators\"] = financial\n            \n            if data_type == \"fund_flow\" or data_type == \"overview\":\n                fund_flow = await stock_data_service.get_fund_flow(stock_code, days=10)\n                if fund_flow:\n                    data[\"fund_flow\"] = fund_flow\n            \n            if data_type == \"realtime\" or data_type == \"overview\":\n                realtime = await stock_data_service.get_realtime_quote(stock_code)\n                if realtime:\n                    data[\"realtime_quote\"] = realtime\n            \n            if data_type == \"kline\":\n                kline = await stock_data_service.get_kline_data(stock_code, period=\"daily\", limit=30)\n                if kline:\n                    data[\"kline_summary\"] = {\n                        \"period\": \"daily\",\n                        \"count\": len(kline),\n                        \"latest\": kline[-1] if kline else None,\n                        \"recent_5\": kline[-5:] if len(kline) >= 5 else kline\n                    }\n            \n            if data:\n                logger.info(f\"✅ AkShare returned data: {list(data.keys())}\")\n                return data\n                \n        except Exception as e:\n            logger.warning(f\"AkShare search error: {e}\")\n        \n        return None\n    \n    async def _search_bochaai(\n        self,\n        query: str,\n        stock_name: Optional[str] = None\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"Search news via BochaAI\"\"\"\n        if not bochaai_search.is_available():\n            logger.debug(\"BochaAI not configured, skipping\")\n            return None\n        \n        try:\n            results = bochaai_search.search(\n                query=query,\n                freshness=\"oneWeek\",\n                count=10\n            )\n            \n            if results:\n                news_list = [\n                    {\n                        \"title\": r.title,\n                        \"snippet\": r.snippet[:200] if r.snippet else \"\",\n                        \"url\": r.url,\n                        \"source\": r.site_name or \"unknown\",\n                        \"date\": r.date_published or \"\"\n                    }\n                    for r in results\n                ]\n                logger.info(f\"✅ BochaAI returned {len(news_list)} news items\")\n                return {\"news\": news_list, \"count\": len(news_list)}\n        \n        except Exception as e:\n            logger.warning(f\"BochaAI search error: {e}\")\n        \n        return None\n    \n    async def _search_browser(self, query: str) -> Optional[Dict[str, Any]]:\n        \"\"\"Search via the interactive crawler\"\"\"\n        try:\n            loop = asyncio.get_running_loop()\n            results = await loop.run_in_executor(\n                None,\n                lambda: self._interactive_crawler.interactive_search(\n                    query=query,\n                    engines=[\"baidu_news\", \"sogou\"],\n                    num_results=10,\n                    search_type=\"news\"\n                )\n            )\n            \n            if results:\n                news_list = [\n                    {\n                        \"title\": r.get(\"title\", \"\"),\n                        \"snippet\": r.get(\"snippet\", \"\")[:200],\n                        \"url\": r.get(\"url\", \"\"),\n                        \"source\": \"browser_search\"\n                    }\n                    for r in results\n                ]\n                logger.info(f\"✅ Browser returned {len(news_list)} results\")\n                return {\"search_results\": news_list, \"count\": len(news_list)}\n        \n        except Exception as e:\n            logger.warning(f\"Browser search error: {e}\")\n        \n        return None\n    \n    async def _search_knowledge_base(\n        self,\n        query: str,\n        stock_code: str\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"Search historical data in the knowledge base\"\"\"\n        try:\n            from ..services.news_service import news_service\n            \n            if stock_code and news_service:\n                news_list = await news_service.get_news_by_stock(stock_code, limit=10)\n                if news_list:\n                    kb_news = [\n                        {\n                            \"title\": getattr(news, 'title', ''),\n                            \"content\": (getattr(news, 'content', '') or '')[:300],\n                            \"source\": getattr(news, 'source', ''),\n                            \"date\": news.publish_time.isoformat() if hasattr(news, 'publish_time') and news.publish_time else \"\"\n                        }\n                        for news in news_list\n                    ]\n                    logger.info(f\"✅ KB returned {len(kb_news)} historical news items\")\n                    return {\"historical_news\": kb_news, \"count\": len(kb_news)}\n        \n        except Exception as e:\n            logger.debug(f\"KB search error: {e}\")\n        \n        return None\n    \n    async def _generate_combined_summary(\n        self,\n        query: str,\n        data: Dict[str, Any],\n        stock_name: str\n    ) -> str:\n        \"\"\"Build a combined summary (user-facing text stays in Chinese)\"\"\"\n        summary_parts = [f\"## 搜索结果: {query}\\n\"]\n        summary_parts.append(f\"**股票**: {stock_name}\\n\")\n        \n        # AkShare data\n        if \"akshare\" in data:\n            ak_data = data[\"akshare\"]\n            summary_parts.append(\"### 📊 财务/行情数据\\n\")\n            \n            if \"financial_indicators\" in ak_data:\n                fi = ak_data[\"financial_indicators\"]\n                summary_parts.append(f\"- PE: {fi.get('pe_ratio', 'N/A')}, PB: {fi.get('pb_ratio', 'N/A')}\")\n                summary_parts.append(f\"- ROE: {fi.get('roe', 'N/A')}%\")\n            \n            if \"realtime_quote\" in ak_data:\n                rt = ak_data[\"realtime_quote\"]\n                summary_parts.append(f\"- 当前价: {rt.get('price', 'N/A')}元, 涨跌幅: {rt.get('change_percent', 'N/A')}%\")\n            \n            if \"fund_flow\" in ak_data:\n                ff = ak_data[\"fund_flow\"]\n                summary_parts.append(f\"- 资金流向: {ff.get('main_flow_trend', 'N/A')}\")\n            \n            summary_parts.append(\"\")\n        \n        # BochaAI news\n        if \"bochaai\" in data:\n            news = data[\"bochaai\"].get(\"news\", [])\n            if news:\n                summary_parts.append(\"### 📰 最新新闻\\n\")\n                for i, n in enumerate(news[:5], 1):\n                    summary_parts.append(f\"{i}. 
**{n['title'][:50]}**\")\n                    if n.get('snippet'):\n                        summary_parts.append(f\"   {n['snippet'][:100]}...\")\n                summary_parts.append(\"\")\n        \n        # Browser 结果\n        if \"browser\" in data:\n            results = data[\"browser\"].get(\"search_results\", [])\n            if results:\n                summary_parts.append(\"### 🌐 网页搜索结果\\n\")\n                for i, r in enumerate(results[:5], 1):\n                    summary_parts.append(f\"{i}. {r['title'][:50]}\")\n                summary_parts.append(\"\")\n        \n        # KB 历史数据\n        if \"kb\" in data:\n            kb_news = data[\"kb\"].get(\"historical_news\", [])\n            if kb_news:\n                summary_parts.append(\"### 📚 历史资料\\n\")\n                for i, n in enumerate(kb_news[:3], 1):\n                    summary_parts.append(f\"{i}. {n['title'][:50]}\")\n                summary_parts.append(\"\")\n        \n        return \"\\n\".join(summary_parts)\n    \n    # ============ 兼容旧 API ============\n    \n    async def collect_data(\n        self,\n        stock_code: str,\n        stock_name: str,\n        data_requirements: Optional[Dict[str, Any]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        搜集股票相关数据（兼容旧 API）\n        \"\"\"\n        # 创建并执行一个全面的搜索计划\n        plan = await self.generate_search_plan(\n            query=\"综合数据搜集\",\n            stock_code=stock_code,\n            stock_name=stock_name\n        )\n        \n        # 添加所有基础数据任务\n        plan.tasks = [\n            SearchTask(\n                id=f\"task_init_1\",\n                source=SearchSource.AKSHARE,\n                query=stock_code,\n                description=\"获取财务和行情数据\",\n                data_type=\"overview\",\n                icon=\"📊\",\n                estimated_time=3\n            ),\n            SearchTask(\n                id=f\"task_init_2\",\n                source=SearchSource.KNOWLEDGE_BASE,\n                
query=stock_code,\n                description=\"获取历史新闻\",\n                data_type=\"news\",\n                icon=\"📚\",\n                estimated_time=2\n            )\n        ]\n        \n        return await self.execute_search_plan(plan)\n\n\n# 快速分析师（保持不变）\nclass QuickAnalystAgent(Agent):\n    \"\"\"快速分析师智能体\"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"QuickAnalyst\",\n            role=\"快速分析师\",\n            goal=\"快速综合多角度给出投资建议\",\n            backstory=\"\"\"你是一位经验丰富的量化分析师，擅长快速分析和决策。\n你能够在短时间内综合考虑多空因素，给出简洁明了的投资建议。\n你的分析风格是：快速、准确、实用。\"\"\",\n            organization_id=organization_id\n        )\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        logger.info(f\"Initialized {self.name} agent\")\n    \n    async def quick_analyze(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str\n    ) -> Dict[str, Any]:\n        \"\"\"快速分析\"\"\"\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = f\"\"\"请对 {stock_name}({stock_code}) 进行快速投资分析。\n\n【当前时间】\n{current_time}\n\n背景资料:\n{context}\n\n请在1分钟内给出：\n1. 核心观点（一句话）\n2. 看多因素（3点）\n3. 看空因素（3点）\n4. 投资建议（买入/持有/卖出）\n5. 
目标价位和止损价位\n\n请用简洁的语言，直接给出结论。\"\"\"\n\n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": \"你是快速分析师，擅长快速给出投资建议。\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            content = response.content if hasattr(response, 'content') else str(response)\n            return {\n                \"success\": True,\n                \"analysis\": content,\n                \"timestamp\": datetime.utcnow().isoformat()\n            }\n        except Exception as e:\n            logger.error(f\"Quick analysis failed: {e}\")\n            return {\n                \"success\": False,\n                \"error\": str(e)\n            }\n\n\n# 工厂函数\ndef create_data_collector(llm_provider=None) -> DataCollectorAgentV2:\n    \"\"\"创建数据专员实例\"\"\"\n    return DataCollectorAgentV2(llm_provider=llm_provider)\n\n"
  },
  {
    "path": "backend/app/agents/debate_agents.py",
    "content": "\"\"\"\n辩论智能体 - Phase 2\n实现 Bull vs Bear 多智能体辩论机制\n\n支持动态搜索：智能体可以在发言中请求额外数据\n格式: [SEARCH: \"查询内容\" source:数据源]\n\"\"\"\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom agenticx import Agent\n\nfrom ..services.llm_service import get_llm_provider\n\nlogger = logging.getLogger(__name__)\n\n# 数据请求提示词片段（用于启用动态搜索的场景）\nDATA_REQUEST_HINT = \"\"\"\n【数据请求】如果需要更多数据支撑你的论点，可以在发言末尾添加搜索请求：\n- [SEARCH: \"具体数据需求\" source:akshare]  -- 财务/行情数据\n- [SEARCH: \"新闻关键词\" source:bochaai]  -- 最新新闻\n- [SEARCH: \"搜索内容\"]  -- 自动选择最佳数据源\n请只在确实需要时使用，每次最多1-2个请求。\"\"\"\n\n\nclass BullResearcherAgent(Agent):\n    \"\"\"\n    看多研究员智能体\n    职责：基于新闻和数据，生成看多观点和投资建议\n    支持在辩论中请求额外数据\n    \"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        # 先调用父类初始化（Pydantic BaseModel）\n        super().__init__(\n            name=\"BullResearcher\",\n            role=\"看多研究员\",\n            goal=\"从积极角度分析股票，发现投资机会和增长潜力\",\n            backstory=\"\"\"你是一位乐观但理性的股票研究员，擅长发现被低估的投资机会。\n你善于从新闻和数据中提取正面信息，分析公司的增长潜力、竞争优势和市场机遇。\n你的分析注重长期价值，但也关注短期催化剂。\n当你发现数据不足以支撑论点时，你会主动请求补充数据。\"\"\",\n            organization_id=organization_id\n        )\n        \n        # 在 super().__init__() 之后设置 _llm_provider（避免被 Pydantic 清除）\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        logger.info(f\"Initialized {self.name} agent\")\n    \n    def analyze(\n        self,\n        stock_code: str,\n        stock_name: str,\n        news_list: List[Dict[str, Any]],\n        context: str = \"\"\n    ) -> Dict[str, Any]:\n        \"\"\"\n        生成看多分析报告\n        \"\"\"\n        news_summary = self._summarize_news(news_list)\n        \n        # 获取当前系统时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = 
f\"\"\"你是一位看多研究员，请从积极角度分析以下股票：\n\n【当前时间】\n{current_time}\n\n【股票信息】\n代码：{stock_code}\n名称：{stock_name}\n\n【相关新闻摘要】\n{news_summary}\n\n【分析背景】\n{context if context else \"无额外背景信息\"}\n\n请从以下角度进行看多分析：\n\n## 1. 核心看多逻辑\n- 列出3-5个看多的核心理由\n- 每个理由需要有数据或新闻支撑\n\n## 2. 增长催化剂\n- 短期催化剂（1-3个月内可能发生的利好）\n- 中长期催化剂（3-12个月的增长驱动力）\n\n## 3. 估值分析\n- 当前估值是否具有吸引力\n- 与同行业对比的优势\n\n## 4. 目标预期\n- 给出合理的预期收益空间\n- 说明达成条件\n\n## 5. 风险提示\n- 虽然看多，但也需要指出可能的风险\n\n请确保分析客观、有理有据，避免盲目乐观。\n\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            analysis_text = response.content if hasattr(response, 'content') else str(response)\n            \n            return {\n                \"success\": True,\n                \"agent_name\": self.name,\n                \"agent_role\": self.role,\n                \"stance\": \"bull\",\n                \"analysis\": analysis_text,\n                \"timestamp\": datetime.utcnow().isoformat()\n            }\n        \n        except Exception as e:\n            logger.error(f\"Bull analysis failed: {e}\")\n            return {\n                \"success\": False,\n                \"agent_name\": self.name,\n                \"stance\": \"bull\",\n                \"error\": str(e)\n            }\n    \n    async def debate_round(self, prompt: str, enable_data_request: bool = True) -> str:\n        \"\"\"\n        辩论回合发言（用于实时辩论模式）\n        \n        Args:\n            prompt: 辩论提示词\n            enable_data_request: 是否启用数据请求功能\n            \n        Returns:\n            发言内容（可能包含数据请求标记）\n        \"\"\"\n        system_content = f\"\"\"你是{self.role}，{self.backstory}\n你正在参与一场多空辩论，请用专业但有说服力的语气发言。\n\n作为看多方，你的核心任务是：\n1. 挖掘公司的增长潜力和投资价值\n2. 用数据和事实支撑你的乐观观点\n3. 反驳看空方提出的风险点\n4. 
识别被市场低估的机会\"\"\"\n\n        if enable_data_request:\n            system_content += DATA_REQUEST_HINT\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": system_content},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            return response.content if hasattr(response, 'content') else str(response)\n        except Exception as e:\n            logger.error(f\"Bull debate round failed: {e}\")\n            return f\"[发言出错: {e}]\"\n    \n    def _summarize_news(self, news_list: List[Dict[str, Any]]) -> str:\n        \"\"\"汇总新闻信息\"\"\"\n        if not news_list:\n            return \"暂无相关新闻\"\n        \n        summaries = []\n        for i, news in enumerate(news_list[:5], 1):\n            title = news.get(\"title\", \"\")\n            sentiment = news.get(\"sentiment_score\")\n            sentiment_text = \"\"\n            if sentiment is not None:\n                if sentiment > 0.1:\n                    sentiment_text = \"（利好）\"\n                elif sentiment < -0.1:\n                    sentiment_text = \"（利空）\"\n                else:\n                    sentiment_text = \"（中性）\"\n            summaries.append(f\"{i}. 
{title} {sentiment_text}\")\n        \n        return \"\\n\".join(summaries)\n\n\nclass BearResearcherAgent(Agent):\n    \"\"\"\n    看空研究员智能体\n    职责：基于新闻和数据，识别风险和潜在问题\n    支持在辩论中请求额外数据\n    \"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        # 先调用父类初始化（Pydantic BaseModel）\n        super().__init__(\n            name=\"BearResearcher\",\n            role=\"看空研究员\",\n            goal=\"从风险角度分析股票，识别潜在问题和下行风险\",\n            backstory=\"\"\"你是一位谨慎的股票研究员，擅长发现被忽视的风险。\n你善于从新闻和数据中提取负面信号，分析公司的潜在问题、竞争威胁和市场风险。\n你的分析注重风险控制，帮助投资者避免损失。\n当你发现数据不足以支撑风险判断时，你会主动请求补充数据。\"\"\",\n            organization_id=organization_id\n        )\n        \n        # 在 super().__init__() 之后设置 _llm_provider（避免被 Pydantic 清除）\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        logger.info(f\"Initialized {self.name} agent\")\n    \n    def analyze(\n        self,\n        stock_code: str,\n        stock_name: str,\n        news_list: List[Dict[str, Any]],\n        context: str = \"\"\n    ) -> Dict[str, Any]:\n        \"\"\"\n        生成看空分析报告\n        \"\"\"\n        news_summary = self._summarize_news(news_list)\n        \n        # 获取当前系统时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = f\"\"\"你是一位看空研究员，请从风险角度分析以下股票：\n\n【当前时间】\n{current_time}\n\n【股票信息】\n代码：{stock_code}\n名称：{stock_name}\n\n【相关新闻摘要】\n{news_summary}\n\n【分析背景】\n{context if context else \"无额外背景信息\"}\n\n请从以下角度进行风险分析：\n\n## 1. 核心风险因素\n- 列出3-5个主要风险点\n- 每个风险需要有数据或新闻支撑\n\n## 2. 负面催化剂\n- 短期可能出现的利空事件\n- 中长期的结构性风险\n\n## 3. 估值风险\n- 当前估值是否过高\n- 与同行业对比的劣势\n\n## 4. 下行空间\n- 分析可能的下跌幅度\n- 触发下跌的条件\n\n## 5. 
反驳看多观点\n- 针对常见的看多逻辑提出质疑\n- 指出乐观预期的不确定性\n\n请确保分析客观、有理有据，避免无根据的悲观。\n\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            analysis_text = response.content if hasattr(response, 'content') else str(response)\n            \n            return {\n                \"success\": True,\n                \"agent_name\": self.name,\n                \"agent_role\": self.role,\n                \"stance\": \"bear\",\n                \"analysis\": analysis_text,\n                \"timestamp\": datetime.utcnow().isoformat()\n            }\n        \n        except Exception as e:\n            logger.error(f\"Bear analysis failed: {e}\")\n            return {\n                \"success\": False,\n                \"agent_name\": self.name,\n                \"stance\": \"bear\",\n                \"error\": str(e)\n            }\n    \n    def _summarize_news(self, news_list: List[Dict[str, Any]]) -> str:\n        \"\"\"汇总新闻信息\"\"\"\n        if not news_list:\n            return \"暂无相关新闻\"\n        \n        summaries = []\n        for i, news in enumerate(news_list[:5], 1):\n            title = news.get(\"title\", \"\")\n            sentiment = news.get(\"sentiment_score\")\n            sentiment_text = \"\"\n            if sentiment is not None:\n                if sentiment > 0.1:\n                    sentiment_text = \"（利好）\"\n                elif sentiment < -0.1:\n                    sentiment_text = \"（利空）\"\n                else:\n                    sentiment_text = \"（中性）\"\n            summaries.append(f\"{i}. 
{title} {sentiment_text}\")\n        \n        return \"\\n\".join(summaries)\n    \n    async def debate_round(self, prompt: str, enable_data_request: bool = True) -> str:\n        \"\"\"\n        辩论回合发言（用于实时辩论模式）\n        \n        Args:\n            prompt: 辩论提示词\n            enable_data_request: 是否启用数据请求功能\n            \n        Returns:\n            发言内容（可能包含数据请求标记）\n        \"\"\"\n        system_content = f\"\"\"你是{self.role}，{self.backstory}\n你正在参与一场多空辩论，请用专业但有说服力的语气发言。\n\n作为看空方，你的核心任务是：\n1. 识别公司的潜在风险和问题\n2. 用数据和事实支撑你的谨慎观点\n3. 反驳看多方过于乐观的论点\n4. 揭示被市场忽视的风险因素\"\"\"\n\n        if enable_data_request:\n            system_content += DATA_REQUEST_HINT\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": system_content},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            return response.content if hasattr(response, 'content') else str(response)\n        except Exception as e:\n            logger.error(f\"Bear debate round failed: {e}\")\n            return f\"[发言出错: {e}]\"\n\n\nclass InvestmentManagerAgent(Agent):\n    \"\"\"\n    投资经理智能体\n    职责：综合 Bull/Bear 观点，做出最终投资决策\n    支持在决策前请求额外数据\n    \"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        # 先调用父类初始化（Pydantic BaseModel）\n        super().__init__(\n            name=\"InvestmentManager\",\n            role=\"投资经理\",\n            goal=\"综合多方观点，做出理性的投资决策\",\n            backstory=\"\"\"你是一位经验丰富的投资经理，擅长在多方观点中找到平衡。\n你善于综合看多和看空的分析，结合市场环境，做出最优的投资决策。\n你的决策注重风险收益比，追求稳健的长期回报。\n当你认为辩论双方提供的数据不足以做出决策时，你会主动请求补充关键数据。\"\"\",\n            organization_id=organization_id\n        )\n        \n        # 在 super().__init__() 之后设置 _llm_provider（避免被 Pydantic 清除）\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        logger.info(f\"Initialized {self.name} agent\")\n 
   \n    def make_decision(\n        self,\n        stock_code: str,\n        stock_name: str,\n        bull_analysis: str,\n        bear_analysis: str,\n        context: str = \"\",\n        enable_data_request: bool = False\n    ) -> Dict[str, Any]:\n        \"\"\"\n        综合双方观点，做出投资决策\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            bull_analysis: 看多分析\n            bear_analysis: 看空分析\n            context: 市场背景和补充数据\n            enable_data_request: 是否允许请求额外数据\n        \"\"\"\n        # 获取当前系统时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = f\"\"\"你是一位投资经理，请综合以下看多和看空观点，做出投资决策：\n\n【当前时间】\n{current_time}\n\n【股票信息】\n代码：{stock_code}\n名称：{stock_name}\n\n【看多观点】\n{bull_analysis}\n\n【看空观点】\n{bear_analysis}\n\n【市场背景及补充数据】\n{context if context else \"当前市场处于正常波动区间\"}\n\n请按以下结构给出最终决策：\n\n## 1. 观点评估\n\n### 看多方论点质量\n- 评估看多论点的说服力（1-10分）\n- 指出最有力的看多论据\n- 指出看多方忽视的问题\n\n### 看空方论点质量\n- 评估看空论点的说服力（1-10分）\n- 指出最有力的看空论据\n- 指出看空方过于悲观的地方\n\n## 2. 数据充分性评估\n- 辩论中使用的数据是否充分？\n- 是否有关键数据缺失影响决策？\n- 已获得的补充数据如何影响判断？\n\n## 3. 综合判断\n- 当前股票的核心矛盾是什么\n- 短期（1-3个月）和中长期（6-12个月）的观点\n\n## 4. 投资决策\n\n**最终评级**：[强烈推荐 / 推荐 / 中性 / 谨慎 / 回避]\n\n**决策理由**：\n（详细说明决策依据）\n\n**建议操作**：\n- 对于持仓者：持有/加仓/减仓/清仓\n- 对于观望者：买入/观望/规避\n\n**关键监测指标**：\n- 列出需要持续关注的信号\n- 什么情况下需要调整决策\n\n## 5. 
风险收益比\n- 预期收益空间\n- 潜在下行风险\n- 风险收益比评估\n\n请确保决策客观、理性，充分考虑双方观点和已获取的数据。\n\"\"\"\n        \n        if enable_data_request:\n            prompt += f\"\"\"\n\n【数据请求】如果你认为还需要更多数据才能做出准确决策，可以添加搜索请求：\n- [SEARCH: \"具体数据需求\" source:akshare]\n- [SEARCH: \"新闻关键词\" source:bochaai]\n但请优先基于现有数据做出判断。\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            decision_text = response.content if hasattr(response, 'content') else str(response)\n            \n            # 提取评级\n            rating = self._extract_rating(decision_text)\n            \n            return {\n                \"success\": True,\n                \"agent_name\": self.name,\n                \"agent_role\": self.role,\n                \"decision\": decision_text,\n                \"rating\": rating,\n                \"timestamp\": datetime.utcnow().isoformat()\n            }\n        \n        except Exception as e:\n            logger.error(f\"Investment decision failed: {e}\")\n            return {\n                \"success\": False,\n                \"agent_name\": self.name,\n                \"error\": str(e)\n            }\n    \n    def _extract_rating(self, text: str) -> str:\n        \"\"\"从决策文本中提取评级\"\"\"\n        import re\n        \n        ratings = [\"强烈推荐\", \"推荐\", \"中性\", \"谨慎\", \"回避\"]\n        for rating in ratings:\n            if rating in text:\n                return rating\n        return \"中性\"\n\n\nclass DebateWorkflow:\n    \"\"\"\n    辩论工作流\n    协调 Bull/Bear/InvestmentManager 进行多轮辩论\n    \"\"\"\n    \n    def __init__(self, llm_provider=None):\n        self.bull_agent = BullResearcherAgent(llm_provider)\n        self.bear_agent = BearResearcherAgent(llm_provider)\n        self.manager_agent = InvestmentManagerAgent(llm_provider)\n        \n        # 执行轨迹记录\n        
self.trajectory = []\n        \n        logger.info(\"Initialized DebateWorkflow\")\n    \n    async def run_debate(\n        self,\n        stock_code: str,\n        stock_name: str,\n        news_list: List[Dict[str, Any]],\n        context: str = \"\",\n        rounds: int = 1\n    ) -> Dict[str, Any]:\n        \"\"\"\n        执行完整的辩论流程\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            news_list: 相关新闻列表\n            context: 额外上下文\n            rounds: 辩论轮数\n        \n        Returns:\n            辩论结果\n        \"\"\"\n        start_time = datetime.utcnow()\n        self.trajectory = []\n        \n        logger.info(f\"🚀 辩论工作流开始: {stock_name}({stock_code}), 新闻数量={len(news_list)}\")\n        \n        try:\n            # 第一阶段：独立分析\n            self._log_step(\"debate_start\", {\n                \"stock_code\": stock_code,\n                \"stock_name\": stock_name,\n                \"news_count\": len(news_list)\n            })\n            \n            # Bull 分析\n            logger.info(\"📈 开始看多分析 (BullResearcher)...\")\n            self._log_step(\"bull_analysis_start\", {\"agent\": \"BullResearcher\"})\n            bull_result = self.bull_agent.analyze(stock_code, stock_name, news_list, context)\n            logger.info(f\"📈 看多分析完成: success={bull_result.get('success', False)}\")\n            self._log_step(\"bull_analysis_complete\", {\n                \"agent\": \"BullResearcher\",\n                \"success\": bull_result.get(\"success\", False)\n            })\n            \n            # Bear 分析\n            logger.info(\"📉 开始看空分析 (BearResearcher)...\")\n            self._log_step(\"bear_analysis_start\", {\"agent\": \"BearResearcher\"})\n            bear_result = self.bear_agent.analyze(stock_code, stock_name, news_list, context)\n            logger.info(f\"📉 看空分析完成: success={bear_result.get('success', False)}\")\n            self._log_step(\"bear_analysis_complete\", {\n                \"agent\": 
\"BearResearcher\",\n                \"success\": bear_result.get(\"success\", False)\n            })\n            \n            # 第二阶段：投资经理决策\n            logger.info(\"⚖️ 开始投资经理决策 (InvestmentManager)...\")\n            self._log_step(\"decision_start\", {\"agent\": \"InvestmentManager\"})\n            decision_result = self.manager_agent.make_decision(\n                stock_code=stock_code,\n                stock_name=stock_name,\n                bull_analysis=bull_result.get(\"analysis\", \"\"),\n                bear_analysis=bear_result.get(\"analysis\", \"\"),\n                context=context\n            )\n            logger.info(f\"⚖️ 投资经理决策完成: rating={decision_result.get('rating', 'unknown')}\")\n            self._log_step(\"decision_complete\", {\n                \"agent\": \"InvestmentManager\",\n                \"rating\": decision_result.get(\"rating\", \"unknown\")\n            })\n            \n            end_time = datetime.utcnow()\n            execution_time = (end_time - start_time).total_seconds()\n            \n            logger.info(f\"✅ 辩论工作流完成! 
耗时={execution_time:.2f}秒, 评级={decision_result.get('rating', 'unknown')}\")\n            \n            self._log_step(\"debate_complete\", {\n                \"execution_time\": execution_time,\n                \"final_rating\": decision_result.get(\"rating\", \"unknown\")\n            })\n            \n            return {\n                \"success\": True,\n                \"stock_code\": stock_code,\n                \"stock_name\": stock_name,\n                \"bull_analysis\": bull_result,\n                \"bear_analysis\": bear_result,\n                \"final_decision\": decision_result,\n                \"trajectory\": self.trajectory,\n                \"execution_time\": execution_time,\n                \"timestamp\": start_time.isoformat()\n            }\n        \n        except Exception as e:\n            logger.error(f\"❌ 辩论工作流失败: {e}\", exc_info=True)\n            self._log_step(\"debate_failed\", {\"error\": str(e)})\n            return {\n                \"success\": False,\n                \"error\": str(e),\n                \"trajectory\": self.trajectory\n            }\n    \n    def _log_step(self, step_name: str, data: Dict[str, Any]):\n        \"\"\"记录执行步骤\"\"\"\n        step = {\n            \"step\": step_name,\n            \"timestamp\": datetime.utcnow().isoformat(),\n            \"data\": data\n        }\n        self.trajectory.append(step)\n        logger.info(f\"Debate step: {step_name} - {data}\")\n\n\n# 工厂函数\ndef create_debate_workflow(llm_provider=None) -> DebateWorkflow:\n    \"\"\"创建辩论工作流实例\"\"\"\n    return DebateWorkflow(llm_provider)\n\n"
  },
  {
    "path": "backend/app/agents/news_analyst.py",
    "content": "\"\"\"\n新闻分析师智能体\n\"\"\"\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom agenticx import Agent, Task, BaseTool\nfrom agenticx.core.agent_executor import AgentExecutor\n\nfrom ..services.llm_service import get_llm_provider\nfrom ..tools import TextCleanerTool\n\nlogger = logging.getLogger(__name__)\n\n\nclass NewsAnalystAgent(Agent):\n    \"\"\"\n    新闻分析师智能体\n    职责：分析金融新闻的情感、影响和关键信息\n    \"\"\"\n    \n    def __init__(\n        self,\n        llm_provider=None,\n        tools: Optional[List[BaseTool]] = None,\n        organization_id: str = \"finnews\",\n        **kwargs\n    ):\n        \"\"\"\n        初始化新闻分析师智能体\n        \n        Args:\n            llm_provider: LLM 提供者\n            tools: 工具列表\n            organization_id: 组织ID（用于多租户隔离），默认 \"finnews\"\n            **kwargs: 额外参数\n        \"\"\"\n        # 如果没有提供 LLM，使用默认的\n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        \n        # 如果没有提供工具，使用默认工具\n        if tools is None:\n            tools = [TextCleanerTool()]\n        \n        # 保存 LLM 和工具供后续使用（在 super().__init__ 之前保存）\n        self._llm_provider = llm_provider\n        self._tools = tools\n        \n        # 定义智能体属性（Agent 基类）\n        super().__init__(\n            name=\"NewsAnalyst\",\n            role=\"金融新闻分析师\",\n            goal=\"深度分析金融新闻，提取关键信息，评估市场影响\",\n            backstory=\"\"\"你是一位经验丰富的金融新闻分析专家，具有10年以上的证券市场分析经验。\n你擅长从新闻中提取关键信息，准确判断新闻对股票市场的影响，并能够识别潜在的投资机会和风险。\n你的分析报告准确、专业，深受投资者信赖。\"\"\",\n            organization_id=organization_id,\n            **kwargs\n        )\n        \n        # 创建 AgentExecutor（在 super().__init__ 之后）\n        self._executor = None\n        self._init_executor(llm_provider, tools)\n        \n        logger.info(f\"Initialized {self.name} agent\")\n    \n    def _init_executor(self, llm_provider=None, tools=None):\n        \"\"\"初始化 AgentExecutor（延迟初始化）\"\"\"\n        if self._executor is None:\n            if llm_provider is None:\n        
        llm_provider = getattr(self, '_llm_provider', None) or get_llm_provider()\n            if tools is None:\n                tools = getattr(self, '_tools', None) or [TextCleanerTool()]\n            \n            self._llm_provider = llm_provider\n            self._tools = tools\n            self._executor = AgentExecutor(\n                llm_provider=llm_provider,\n                tools=tools\n            )\n    \n    @property\n    def executor(self):\n        \"\"\"获取 AgentExecutor（延迟初始化）\"\"\"\n        if self._executor is None:\n            self._init_executor()\n        return self._executor\n    \n    def analyze_news(\n        self,\n        news_title: str,\n        news_content: str,\n        news_url: str = \"\",\n        stock_codes: Optional[List[str]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        分析单条新闻\n        \n        Args:\n            news_title: 新闻标题\n            news_content: 新闻内容\n            news_url: 新闻URL\n            stock_codes: 关联股票代码\n            \n        Returns:\n            分析结果字典\n        \"\"\"\n        # 构建分析提示词\n        prompt = f\"\"\"你是一位经验丰富的金融新闻分析专家，具有10年以上的证券市场分析经验。\n你擅长从新闻中提取关键信息，准确判断新闻对股票市场的影响，并能够识别潜在的投资机会和风险。\n\n请深度分析以下金融新闻，并提供结构化的分析报告：\n\n【新闻标题】\n{news_title}\n\n【新闻内容】\n{news_content[:2000]}\n\n【关联股票】\n{', '.join(stock_codes) if stock_codes else '无'}\n\n请按照以下结构进行专业分析，并严格使用 Markdown 格式输出：\n\n## 摘要\n\n**结构性分析，长期利好市场生态**\n\n### 正面影响：\n- 核心要点1\n- 核心要点2\n- 核心要点3\n\n### 潜在挑战：\n- 挑战点1\n- 挑战点2\n\n---\n\n## 1. 情感倾向：[中性偏利好] （评分：X.X）\n\n**情感判断**：[中性偏利好/利好/利空/中性]\n**综合评分**：+X.X （范围：-1 至 +1）\n\n**理由说明：**\n详细说明评分依据，包括：\n- 政策影响分析\n- 市场短期/长期影响\n- 预期收益/风险评估\n\n---\n\n## 2. 
关键信息提取\n\n**请使用标准 Markdown 表格格式，确保表格清晰易读：**\n\n| 类别 | 内容 |\n|------|------|\n| 公司名称 | XXX公司（全称，股票代码：XXXXXX） |\n| 事件时间 | 新闻发布时间：YYYY年MM月DD日；关键事件时间线涵盖YYYY年QXXX |\n| 股价变动 | 详细描述股价变化趋势和数据 |\n| 财务表现（YYYY年QX） | 关键财务指标（使用具体数字和增长率） |\n| 驱动因素 | • 因素1<br>• 因素2<br>• 因素3 |\n| 分析师观点 | • 机构1（分析师）：观点内容<br>• 机构2（分析师）：观点内容 |\n| 市场情绪指标 | 具体指标和数据 |\n\n**重要说明（表格严格规范）**：\n- **禁止跨行**：同一类别下的所有内容必须在**同一行**的单元格内\n- **强制换行**：如果同一单元格有多条内容，**必须**使用 `<br>` 分隔，**严禁**使用 Markdown 列表（- 或 1.）或直接换行\n- **错误示例**（绝对禁止）：\n  | 驱动因素 | • 因素1 |\n  |          | • 因素2 |  <-- 错误！不能另起一行\n- **正确示例**：\n  | 驱动因素 | • 因素1<br>• 因素2 |\n- 表头和内容之间用 `|------|------|` 分隔\n- 数据要准确，有具体数字时必须标注\n\n---\n\n## 3. 市场影响分析\n\n### 短期影响（1-3个月）\n- 影响点1：具体分析\n- 影响点2：具体分析\n\n### 中期影响（3-12个月）\n- 影响点1：具体分析\n- 影响点2：具体分析\n\n### 长期影响（1年以上）\n- 影响点1：具体分析\n- 影响点2：具体分析\n\n---\n\n## 4. 投资建议\n\n**投资评级**：[推荐买入/谨慎持有/观望/减持]\n\n**建议理由**：\n1. 核心逻辑1\n2. 核心逻辑2\n3. 核心逻辑3\n\n**风险提示**：\n- 风险1\n- 风险2\n\n---\n\n**格式要求（重要）**：\n1. 必须使用标准 Markdown 语法\n2. **表格内容严禁跨行**，单元格内换行只能用 `<br>`\n3. 标题层级清晰：使用 ##、### 等\n4. 列表使用 - 或数字编号（表格外）\n5. 加粗使用 **文本**\n6. 分隔线使用 ---\n7. 评分必须精确到小数点后1位\n8. 
所有数据必须真实、准确，来源于新闻内容\n\n请确保分析报告专业、准确、结构清晰，特别注意表格格式的规范性，避免表格行错位。\n\"\"\"\n        \n        try:\n            # Make sure the LLM provider is initialized\n            if not hasattr(self, '_llm_provider') or self._llm_provider is None:\n                self._llm_provider = get_llm_provider()\n            \n            logger.info(f\"Calling LLM provider: {type(self._llm_provider).__name__}, model: {getattr(self._llm_provider, 'model', 'unknown')}\")\n            \n            # Call the LLM directly (no AgentExecutor, to avoid approval pauses)\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            logger.info(\"LLM response received\")\n            \n            # Get the analysis text\n            analysis_text = response.content if hasattr(response, 'content') else str(response)\n            \n            # Repair Markdown table formatting\n            analysis_text = self._repair_markdown_table(analysis_text)\n            \n            # Try to extract structured information\n            structured_result = self._extract_structured_info(analysis_text)\n            \n            return {\n                \"success\": True,\n                \"analysis_result\": analysis_text,\n                \"structured_data\": structured_result,\n                \"agent_name\": self.name,\n                \"agent_role\": self.role,\n            }\n        \n        except Exception as e:\n            logger.error(f\"News analysis failed: {e}\", exc_info=True)\n            return {\n                \"success\": False,\n                \"error\": str(e),\n                \"agent_name\": self.name,\n            }\n    \n    def _repair_markdown_table(self, text: str) -> str:\n        \"\"\"\n        Repair Markdown table formatting issues.\n        Mainly fixes multi-line cell content that was wrongly split into extra\n        table rows with an empty first column.\n        \"\"\"\n        lines = text.split('\\n')\n        new_lines = []\n        in_table = False\n        last_table_line_idx = -1\n        \n        for line in lines:\n            stripped = line.strip()\n            \n            # Detect table rows\n            is_table_row = stripped.startswith('|') and stripped.endswith('|')\n            is_separator = '---' in stripped and '|' in stripped\n            \n            if is_table_row:\n                if not in_table:\n                    in_table = True\n                \n                # Separator rows are kept as-is\n                if is_separator:\n                    new_lines.append(line)\n                    last_table_line_idx = len(new_lines) - 1\n                    continue\n                \n                # Check for a \"bad row\" (empty first column)\n                # Pattern: | <blank> | content |\n                parts = [p.strip() for p in stripped.strip('|').split('|')]\n                \n                # If the first column is empty and the previous line is also a table row\n                if len(parts) >= 2 and not parts[0] and last_table_line_idx >= 0:\n                    # Fetch the previous table row\n                    prev_line = new_lines[last_table_line_idx]\n                    prev_parts = [p.strip() for p in prev_line.strip().strip('|').split('|')]\n                    \n                    # Only merge when the column counts match\n                    if len(parts) == len(prev_parts):\n                        # Merge the content into the matching column of the previous row;\n                        # the fragment already carries its own bullet if it has one\n                        for i in range(1, len(parts)):\n                            if parts[i]:\n                                prev_parts[i] = f\"{prev_parts[i]}<br>{parts[i]}\"\n                        \n                        # Rebuild the previous row\n                        new_prev_line = '| ' + ' | '.join(prev_parts) + ' |'\n                        new_lines[last_table_line_idx] = new_prev_line\n                        # The current row has been merged; do not append it\n                        continue\n            \n            else:\n                in_table = False\n            \n            new_lines.append(line)\n            if in_table:\n                last_table_line_idx = len(new_lines) - 1\n                \n        return '\\n'.join(new_lines)\n    \n    def _extract_structured_info(self, analysis_text: str) -> Dict[str, Any]:\n        \"\"\"\n        Extract structured information from the analysis text.\n        \n        Args:\n            analysis_text: the analysis text\n            \n        Returns:\n            structured data\n        \"\"\"\n        import re\n        \n        result = {\n            \"sentiment\": \"neutral\",\n            \"sentiment_score\": 0.0,\n            \"confidence\": 0.5,\n            \"key_points\": [],\n            \"market_impact\": \"\",\n            \"investment_advice\": \"\",\n        }\n        \n        try:\n            # Extract the sentiment label (multiple formats supported)\n            # Matches: 利好, 利空, 中性, 显著利好, 显著利空, etc.\n            sentiment_patterns = [\n                r'情感倾向[：:]\\s*\\*?\\*?(显著|明显)?(利好|利空|中性)',\n                r'(显著|明显)?(利好|利空|中性)',  # fallback pattern\n            ]\n            for pattern in sentiment_patterns:\n                sentiment_match = re.search(pattern, analysis_text)\n                if sentiment_match:\n                    # Take the last non-empty group (利好/利空/中性)\n                    groups = [g for g in sentiment_match.groups() if g]\n                    if groups:\n                        sentiment_word = groups[-1]\n                        sentiment_map = {\"利好\": \"positive\", \"利空\": \"negative\", \"中性\": \"neutral\"}\n                        result[\"sentiment\"] = sentiment_map.get(sentiment_word, \"neutral\")\n                        break\n            \n            # Extract the sentiment score (multiple formats supported)\n            # Matches: -0.92, **-0.92**, -0.92 / -1.0, etc.\n            score_patterns = [\n                r'综合评分[：:]\\s*\\*?\\*?([-+]?\\d*\\.?\\d+)',  # 综合评分：-0.92 (highest priority)\n                r'评分[：:]\\s*\\*?\\*?([-+]?\\d*\\.?\\d+)\\s*/\\s*[-+]?\\d*\\.?\\d+',  # 评分：-0.85 / 1.0\n                r'情感评分[：:]\\s*\\*?\\*?([-+]?\\d*\\.?\\d+)',  # 情感评分：-0.92\n                r'评分[：:]\\s*\\*?\\*?([-+]?\\d*\\.?\\d+)',       # 评分：-0.92\n            ]\n            for pattern in score_patterns:\n                score_match = re.search(pattern, analysis_text)\n                if score_match:\n                    result[\"sentiment_score\"] = float(score_match.group(1))\n                    logger.info(f\"Extracted sentiment score: {result['sentiment_score']}\")\n                    break\n            \n            # If no score was extracted, infer one from the sentiment label\n            if result[\"sentiment_score\"] == 0.0 and result[\"sentiment\"] != \"neutral\":\n                if result[\"sentiment\"] == \"positive\":\n                    result[\"sentiment_score\"] = 0.5  # default: moderately positive\n                elif result[\"sentiment\"] == \"negative\":\n                    result[\"sentiment_score\"] = -0.5  # default: moderately negative\n            \n            # Extract the confidence\n            confidence_match = re.search(r'置信度[：:]\\s*\\*?\\*?(\\d*\\.?\\d+)', analysis_text)\n            if confidence_match:\n                result[\"confidence\"] = float(confidence_match.group(1))\n            \n            # Extract key points (simple approach: look for list items)\n            key_points_section = re.search(r'关键信息[：:](.*?)(?=市场影响|投资建议|$)', analysis_text, re.DOTALL)\n            if key_points_section:\n                points_text = key_points_section.group(1)\n                points = re.findall(r'[•\\-\\*]\\s*(.+)', points_text)\n                result[\"key_points\"] = [p.strip() for p in points if p.strip()]\n            \n            # Extract the market impact\n            impact_match = re.search(r'市场影响[：:](.*?)(?=投资建议|置信度|$)', analysis_text, re.DOTALL)\n            if impact_match:\n                result[\"market_impact\"] = impact_match.group(1).strip()\n            \n            # Extract the investment advice\n            advice_match = re.search(r'投资建议[：:](.*?)(?=置信度|$)', analysis_text, re.DOTALL)\n            if advice_match:\n                result[\"investment_advice\"] = advice_match.group(1).strip()\n        \n        except Exception as e:\n            logger.warning(f\"Failed to extract structured info: {e}\")\n        \n        # Log the extraction result\n        logger.info(\n            f\"Extracted sentiment: {result['sentiment']}, \"\n            f\"score: {result['sentiment_score']}, \"\n            f\"confidence: {result['confidence']}\"\n        )\n        \n        return result\n    \n    def batch_analyze(\n        self,\n        news_list: List[Dict[str, Any]]\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        Analyze a batch of news items.\n        \n        Args:\n            news_list: list of news items\n            \n        Returns:\n            list of analysis results\n        \"\"\"\n        results = []\n        \n        for news in news_list:\n            try:\n                result = self.analyze_news(\n                    news_title=news.get(\"title\", \"\"),\n                    news_content=news.get(\"content\", \"\"),\n                    news_url=news.get(\"url\", \"\"),\n                    stock_codes=news.get(\"stock_codes\", [])\n                )\n                results.append(result)\n            except Exception as e:\n                logger.error(f\"Failed to analyze news: {e}\")\n                results.append({\n                    \"success\": False,\n                    \"error\": str(e),\n                    \"news_url\": news.get(\"url\", \"\")\n                })\n        \n        return results\n\n\ndef create_news_analyst(\n    llm_provider=None,\n    tools: Optional[List[BaseTool]] = None,\n    organization_id: str = \"finnews\"\n) -> NewsAnalystAgent:\n    \"\"\"\n    Create a NewsAnalystAgent instance.\n    \n    Args:\n        llm_provider: LLM provider\n        tools: list of tools\n        organization_id: organization ID (for multi-tenant isolation), defaults to \"finnews\"\n        \n    Returns:\n        a NewsAnalystAgent instance\n    \"\"\"\n    return NewsAnalystAgent(\n        llm_provider=llm_provider, \n        tools=tools,\n        organization_id=organization_id\n    )\n\n"
  },
  {
    "path": "backend/app/agents/orchestrator.py",
    "content": "\"\"\"\nCollaboration orchestrator.\n\nManages the multi-agent collaboration flow, supporting:\n- parallel analysis mode (parallel)\n- realtime debate mode (realtime_debate)\n- quick analysis mode (quick_analysis)\n- dynamic search mode (fetch data on demand during a debate)\n\"\"\"\nimport logging\nimport asyncio\nfrom typing import Dict, Any, List, Optional, Callable, AsyncGenerator\nfrom datetime import datetime\nfrom enum import Enum\n\nfrom ..config import get_mode_config, get_default_mode, DebateModeConfig\nfrom ..services.llm_service import get_llm_provider\n\nlogger = logging.getLogger(__name__)\n\n\nclass DebatePhase(Enum):\n    \"\"\"Debate phases\"\"\"\n    INITIALIZING = \"initializing\"\n    DATA_COLLECTION = \"data_collection\"\n    OPENING = \"opening\"\n    DEBATE = \"debate\"\n    CLOSING = \"closing\"\n    COMPLETED = \"completed\"\n    FAILED = \"failed\"\n\n\nclass DebateEvent:\n    \"\"\"A debate event (for realtime streaming output)\"\"\"\n    def __init__(\n        self,\n        event_type: str,\n        agent_name: str,\n        content: str,\n        phase: DebatePhase,\n        round_number: Optional[int] = None,\n        metadata: Optional[Dict[str, Any]] = None\n    ):\n        self.event_type = event_type\n        self.agent_name = agent_name\n        self.content = content\n        self.phase = phase\n        self.round_number = round_number\n        self.metadata = metadata or {}\n        self.timestamp = datetime.utcnow().isoformat()\n    \n    def to_dict(self) -> Dict[str, Any]:\n        return {\n            \"event_type\": self.event_type,\n            \"agent_name\": self.agent_name,\n            \"content\": self.content,\n            \"phase\": self.phase.value,\n            \"round_number\": self.round_number,\n            \"metadata\": self.metadata,\n            \"timestamp\": self.timestamp\n        }\n\n\nclass DebateOrchestrator:\n    \"\"\"Debate orchestrator\"\"\"\n    \n    def __init__(\n        self,\n        mode: Optional[str] = None,\n        llm_provider=None,\n        enable_dynamic_search: bool = True\n    ):\n        \"\"\"\n        Initialize the debate orchestrator.\n        \n        Args:\n            mode: debate mode (parallel, realtime_debate, quick_analysis)\n            llm_provider: LLM provider\n            enable_dynamic_search: whether to enable dynamic search (fetch data on demand during the debate)\n        \"\"\"\n        self.mode = mode or get_default_mode()\n        self.config = get_mode_config(self.mode)\n        if not self.config:\n            raise ValueError(f\"Unknown debate mode: {self.mode}\")\n        \n        self.llm_provider = llm_provider or get_llm_provider()\n        self.current_phase = DebatePhase.INITIALIZING\n        self.current_round = 0\n        self.start_time: Optional[datetime] = None\n        self.events: List[DebateEvent] = []\n        self.is_interrupted = False\n        \n        # Dynamic search configuration\n        self.enable_dynamic_search = enable_dynamic_search\n        self._search_analyst = None\n        \n        # Search statistics\n        self.search_stats = {\n            \"total_requests\": 0,\n            \"successful_searches\": 0,\n            \"data_supplements\": []\n        }\n        \n        # Event callbacks\n        self._event_callbacks: List[Callable[[DebateEvent], None]] = []\n        \n        logger.info(f\"🎭 Initialized debate orchestrator, mode: {self.mode}, dynamic search: {enable_dynamic_search}\")\n    \n    def _get_search_analyst(self):\n        \"\"\"Lazily load the search analyst\"\"\"\n        if self._search_analyst is None and self.enable_dynamic_search:\n            from .search_analyst import SearchAnalystAgent\n            self._search_analyst = SearchAnalystAgent(self.llm_provider)\n        return self._search_analyst\n    \n    def on_event(self, callback: Callable[[DebateEvent], None]):\n        \"\"\"Register an event callback\"\"\"\n        self._event_callbacks.append(callback)\n    \n    def _emit_event(self, event: DebateEvent):\n        \"\"\"Emit an event\"\"\"\n        self.events.append(event)\n        for callback in self._event_callbacks:\n            try:\n                callback(event)\n            except Exception as e:\n                logger.error(f\"Event callback failed: {e}\")\n    \n    def interrupt(self, reason: str = \"manager_decision\"):\n        \"\"\"Interrupt the debate\"\"\"\n        self.is_interrupted = True\n        self._emit_event(DebateEvent(\n            event_type=\"interrupt\",\n            agent_name=\"InvestmentManager\",\n            content=f\"辩论被打断: {reason}\",\n            phase=self.current_phase\n        ))\n        logger.info(f\"⚡ Debate interrupted: {reason}\")\n    \n    async def run(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str = \"\",\n        news_list: Optional[List[Dict[str, Any]]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"Run the debate flow\"\"\"\n        self.start_time = datetime.utcnow()\n        result = {\n            \"success\": False,\n            \"mode\": self.mode,\n            \"stock_code\": stock_code,\n            \"stock_name\": stock_name,\n            \"trajectory\": [],\n            \"events\": []\n        }\n        \n        try:\n            self._emit_event(DebateEvent(\n                event_type=\"start\",\n                agent_name=\"Orchestrator\",\n                content=f\"开始 {self.config.name}\",\n                phase=DebatePhase.INITIALIZING\n            ))\n            \n            # Dispatch to the flow matching the configured mode\n            if self.config.flow.type == \"parallel_then_summarize\":\n                result = await self._run_parallel_mode(stock_code, stock_name, context, news_list)\n            elif self.config.flow.type == \"orchestrated_debate\":\n                result = await self._run_realtime_debate_mode(stock_code, stock_name, context, news_list)\n            elif self.config.flow.type == \"single_agent\":\n                result = await self._run_quick_mode(stock_code, stock_name, context)\n            else:\n                raise ValueError(f\"Unknown flow type: {self.config.flow.type}\")\n            \n            self.current_phase = DebatePhase.COMPLETED\n            self._emit_event(DebateEvent(\n                event_type=\"complete\",\n                agent_name=\"Orchestrator\",\n                content=\"辩论完成\",\n                phase=DebatePhase.COMPLETED\n            ))\n        \n        except Exception as e:\n            logger.error(f\"Debate execution failed: {e}\", exc_info=True)\n            self.current_phase = DebatePhase.FAILED\n            result[\"error\"] = str(e)\n            self._emit_event(DebateEvent(\n                event_type=\"error\",\n                agent_name=\"Orchestrator\",\n                content=f\"辩论失败: {e}\",\n                phase=DebatePhase.FAILED\n            ))\n        \n        result[\"events\"] = [e.to_dict() for e in self.events]\n        result[\"execution_time\"] = (datetime.utcnow() - self.start_time).total_seconds()\n        \n        return result\n    \n    async def _run_parallel_mode(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str,\n        news_list: List[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"Run the parallel analysis mode\"\"\"\n        from .debate_agents import BullResearcherAgent, BearResearcherAgent, InvestmentManagerAgent\n        \n        logger.info(\"🔄 Running parallel analysis mode\")\n        \n        # Initialize the agents\n        bull_agent = BullResearcherAgent(self.llm_provider)\n        bear_agent = BearResearcherAgent(self.llm_provider)\n        manager_agent = InvestmentManagerAgent(self.llm_provider)\n        \n        # Prepare the news summary\n        news_summary = self._prepare_news_summary(news_list)\n        full_context = f\"{context}\\n\\n{news_summary}\" if context else news_summary\n        \n        self.current_phase = DebatePhase.DEBATE\n        \n        # Run the Bull and Bear analyses in parallel\n        self._emit_event(DebateEvent(\n            event_type=\"analysis_start\",\n            agent_name=\"BullResearcher\",\n            content=\"开始看多分析\",\n            phase=self.current_phase\n        ))\n        self._emit_event(DebateEvent(\n            event_type=\"analysis_start\",\n            agent_name=\"BearResearcher\",\n            content=\"开始看空分析\",\n            phase=self.current_phase\n        ))\n        \n        bull_task = asyncio.create_task(\n            bull_agent.analyze(stock_code, stock_name, full_context)\n        )\n        bear_task = asyncio.create_task(\n            bear_agent.analyze(stock_code, stock_name, full_context)\n        )\n        \n        bull_analysis, bear_analysis = await asyncio.gather(bull_task, bear_task)\n        \n        self._emit_event(DebateEvent(\n            event_type=\"analysis_complete\",\n            agent_name=\"BullResearcher\",\n            content=bull_analysis.get(\"analysis\", \"\")[:200] + \"...\",\n            phase=self.current_phase\n        ))\n        self._emit_event(DebateEvent(\n            event_type=\"analysis_complete\",\n            agent_name=\"BearResearcher\",\n            content=bear_analysis.get(\"analysis\", \"\")[:200] + \"...\",\n            phase=self.current_phase\n        ))\n        \n        # The investment manager makes the final call\n        self.current_phase = DebatePhase.CLOSING\n        self._emit_event(DebateEvent(\n            event_type=\"decision_start\",\n            agent_name=\"InvestmentManager\",\n            content=\"开始综合决策\",\n            phase=self.current_phase\n        ))\n        \n        final_decision = await manager_agent.make_decision(\n            stock_code=stock_code,\n            stock_name=stock_name,\n            bull_analysis=bull_analysis.get(\"analysis\", \"\"),\n            bear_analysis=bear_analysis.get(\"analysis\", \"\"),\n            context=full_context\n        )\n        \n        self._emit_event(DebateEvent(\n            event_type=\"decision_complete\",\n            agent_name=\"InvestmentManager\",\n            content=f\"决策完成: {final_decision.get('rating', 'N/A')}\",\n            phase=self.current_phase\n        ))\n        \n        return {\n            \"success\": True,\n            \"mode\": self.mode,\n            \"bull_analysis\": bull_analysis,\n            \"bear_analysis\": bear_analysis,\n            \"final_decision\": final_decision,\n            \"trajectory\": [\n                {\"agent\": \"BullResearcher\", \"action\": \"analyze\", \"status\": \"completed\"},\n                {\"agent\": \"BearResearcher\", \"action\": \"analyze\", \"status\": \"completed\"},\n                {\"agent\": \"InvestmentManager\", \"action\": \"decide\", \"status\": \"completed\"}\n            ]\n        }\n    \n    async def _run_realtime_debate_mode(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str,\n        news_list: List[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"Run the realtime debate mode (with dynamic search support)\"\"\"\n        from .debate_agents import BullResearcherAgent, BearResearcherAgent, InvestmentManagerAgent\n        from .data_collector import DataCollectorAgent\n        \n        logger.info(\"🎭 Running realtime debate mode\")\n        \n        # Initialize the agents\n        data_collector = DataCollectorAgent(self.llm_provider)\n        bull_agent = BullResearcherAgent(self.llm_provider)\n        bear_agent = BearResearcherAgent(self.llm_provider)\n        manager_agent = InvestmentManagerAgent(self.llm_provider)\n        \n        # Get the search analyst (if enabled)\n        search_analyst = self._get_search_analyst()\n        \n        rules = self.config.rules\n        max_rounds = rules.max_rounds or 5\n        max_time = rules.max_time or 600\n        \n        trajectory = []\n        debate_history = []\n        dynamic_data_supplements = []  # data supplements added via dynamic search\n        \n        # Phase 1: data collection\n        if rules.require_data_collection:\n            self.current_phase = DebatePhase.DATA_COLLECTION\n            self._emit_event(DebateEvent(\n                event_type=\"phase_start\",\n                agent_name=\"DataCollector\",\n                content=\"开始搜集数据\",\n                phase=self.current_phase\n            ))\n            \n            collected_data = await data_collector.collect_data(stock_code, stock_name)\n            data_summary = collected_data.get(\"summary\", \"\")\n            \n            self._emit_event(DebateEvent(\n                event_type=\"data_collected\",\n                agent_name=\"DataCollector\",\n                content=data_summary[:300] + \"...\",\n                phase=self.current_phase\n            ))\n            \n            trajectory.append({\n                \"agent\": \"DataCollector\",\n                \"action\": \"collect_data\",\n                \"status\": \"completed\"\n            })\n            \n            # Merge the collected data into the context\n            context = f\"{context}\\n\\n{data_summary}\" if context else data_summary\n        \n        # Phase 2: the investment manager's opening statement\n        self.current_phase = DebatePhase.OPENING\n        opening_prompt = f\"\"\"你是投资经理，现在要主持一场关于 {stock_name}({stock_code}) 的多空辩论。\n\n请做开场陈述，说明：\n1. 今天辩论的股票背景\n2. 辩论的规则（最多{max_rounds}轮，每人每轮1分钟）\n3. 请看多研究员先发言\n\n背景资料:\n{context[:2000]}\"\"\"\n        \n        self._emit_event(DebateEvent(\n            event_type=\"opening\",\n            agent_name=\"InvestmentManager\",\n            content=\"投资经理开场中...\",\n            phase=self.current_phase\n        ))\n        \n        opening = await self.llm_provider.chat(opening_prompt)\n        \n        self._emit_event(DebateEvent(\n            event_type=\"speech\",\n            agent_name=\"InvestmentManager\",\n            content=opening,\n            phase=self.current_phase,\n            round_number=0\n        ))\n        \n        trajectory.append({\n            \"agent\": \"InvestmentManager\",\n            \"action\": \"opening\",\n            \"status\": \"completed\",\n            \"content\": opening\n        })\n        \n        debate_history.append({\n            \"round\": 0,\n            \"agent\": \"InvestmentManager\",\n            \"type\": \"opening\",\n            \"content\": opening\n        })\n        \n        # Phase 3: debate rounds\n        self.current_phase = DebatePhase.DEBATE\n        bull_analysis_full = \"\"\n        bear_analysis_full = \"\"\n        \n        for round_num in range(1, max_rounds + 1):\n            if self.is_interrupted:\n                
logger.info(f\"Debate interrupted in round {round_num}\")\n                break\n            \n            # Check the time limit\n            elapsed = (datetime.utcnow() - self.start_time).total_seconds()\n            if elapsed > max_time:\n                logger.info(f\"Debate timed out after {elapsed:.0f} seconds\")\n                break\n            \n            self.current_round = round_num\n            \n            # Bull's turn\n            self._emit_event(DebateEvent(\n                event_type=\"round_start\",\n                agent_name=\"BullResearcher\",\n                content=f\"第{round_num}轮 - 看多研究员发言\",\n                phase=self.current_phase,\n                round_number=round_num\n            ))\n            \n            bull_prompt = self._build_debate_prompt(\n                agent_role=\"看多研究员\",\n                stock_name=stock_name,\n                stock_code=stock_code,\n                round_num=round_num,\n                max_rounds=max_rounds,\n                context=context,\n                debate_history=debate_history,\n                enable_search_requests=self.enable_dynamic_search\n            )\n            \n            bull_response = await bull_agent.debate_round(bull_prompt)\n            bull_analysis_full += f\"\\n\\n### 第{round_num}轮\\n{bull_response}\"\n            \n            self._emit_event(DebateEvent(\n                event_type=\"speech\",\n                agent_name=\"BullResearcher\",\n                content=bull_response,\n                phase=self.current_phase,\n                round_number=round_num\n            ))\n            \n            debate_history.append({\n                \"round\": round_num,\n                \"agent\": \"BullResearcher\",\n                \"type\": \"argument\",\n                \"content\": bull_response\n            })\n            \n            # Dynamic search: handle data requests found in Bull's speech\n            if search_analyst:\n                context, supplement = await self._process_speech_for_search(\n                    search_analyst=search_analyst,\n                    speech_text=bull_response,\n                    agent_name=\"BullResearcher\",\n                    stock_code=stock_code,\n                    stock_name=stock_name,\n                    context=context,\n                    round_num=round_num,\n                    trajectory=trajectory\n                )\n                if supplement:\n                    dynamic_data_supplements.append(supplement)\n            \n            # Bear's turn\n            self._emit_event(DebateEvent(\n                event_type=\"round_continue\",\n                agent_name=\"BearResearcher\",\n                content=f\"第{round_num}轮 - 看空研究员发言\",\n                phase=self.current_phase,\n                round_number=round_num\n            ))\n            \n            bear_prompt = self._build_debate_prompt(\n                agent_role=\"看空研究员\",\n                stock_name=stock_name,\n                stock_code=stock_code,\n                round_num=round_num,\n                max_rounds=max_rounds,\n                context=context,\n                debate_history=debate_history,\n                enable_search_requests=self.enable_dynamic_search\n            )\n            \n            bear_response = await bear_agent.debate_round(bear_prompt)\n            bear_analysis_full += f\"\\n\\n### 第{round_num}轮\\n{bear_response}\"\n            \n            self._emit_event(DebateEvent(\n                event_type=\"speech\",\n                agent_name=\"BearResearcher\",\n                content=bear_response,\n                phase=self.current_phase,\n                round_number=round_num\n            ))\n            \n            debate_history.append({\n                \"round\": round_num,\n                \"agent\": \"BearResearcher\",\n                \"type\": \"argument\",\n                \"content\": bear_response\n            })\n            \n            # Dynamic search: handle data requests found in Bear's speech\n            if search_analyst:\n                context, supplement = await self._process_speech_for_search(\n                    search_analyst=search_analyst,\n                    speech_text=bear_response,\n                    agent_name=\"BearResearcher\",\n                    stock_code=stock_code,\n                    stock_name=stock_name,\n                    context=context,\n                    round_num=round_num,\n                    trajectory=trajectory\n                )\n                if supplement:\n                    dynamic_data_supplements.append(supplement)\n            \n            trajectory.append({\n                \"agent\": \"Debate\",\n                \"action\": f\"round_{round_num}\",\n                \"status\": \"completed\"\n            })\n            \n            # The manager may choose to interrupt or request more data\n            if rules.manager_can_interrupt and round_num < max_rounds:\n                should_interrupt, manager_data_request = await self._check_manager_interrupt_or_search(\n                    manager_agent, debate_history, stock_name, stock_code,\n                    search_analyst, context\n                )\n                \n                # If the manager requested extra data, update the context\n                if manager_data_request:\n                    context = f\"{context}\\n\\n【投资经理补充数据】\\n{manager_data_request}\"\n                    dynamic_data_supplements.append({\n                        \"round\": round_num,\n                        \"agent\": \"InvestmentManager\",\n                        \"data\": manager_data_request\n                    })\n                \n                if should_interrupt:\n                    self.interrupt(\"投资经理认为已有足够信息做决策\")\n                    break\n        \n        # Phase 4: the investment manager's closing decision\n        self.current_phase = DebatePhase.CLOSING\n        self._emit_event(DebateEvent(\n            event_type=\"closing_start\",\n            agent_name=\"InvestmentManager\",\n            content=\"投资经理正在做最终决策...\",\n            phase=self.current_phase\n        ))\n        \n        # If dynamic search is enabled, run a smart data supplement before deciding\n        if search_analyst and len(dynamic_data_supplements) < 2:\n            self._emit_event(DebateEvent(\n                event_type=\"smart_supplement\",\n                agent_name=\"SearchAnalyst\",\n                content=\"智能分析数据缺口，补充关键信息...\",\n                phase=self.current_phase\n            ))\n            \n            smart_result = await search_analyst.smart_data_supplement(\n                stock_code=stock_code,\n                stock_name=stock_name,\n                existing_context=context,\n                debate_history=debate_history\n            )\n            \n            if smart_result.get(\"success\") and smart_result.get(\"combined_summary\"):\n                context = f\"{context}\\n\\n【智能补充数据】\\n{smart_result['combined_summary']}\"\n                dynamic_data_supplements.append({\n                    \"round\": \"pre_decision\",\n                    \"agent\": \"SearchAnalyst\",\n                    \"data\": smart_result[\"combined_summary\"]\n                })\n        \n        final_decision = await manager_agent.make_decision(\n            stock_code=stock_code,\n            stock_name=stock_name,\n            bull_analysis=bull_analysis_full,\n            bear_analysis=bear_analysis_full,\n            context=f\"{context}\\n\\n辩论历史:\\n{self._format_debate_history(debate_history)}\"\n        )\n        \n        self._emit_event(DebateEvent(\n            event_type=\"decision\",\n            agent_name=\"InvestmentManager\",\n            content=final_decision.get(\"summary\", \"\"),\n            phase=self.current_phase,\n            metadata={\"rating\": final_decision.get(\"rating\")}\n        ))\n        \n        trajectory.append({\n            \"agent\": \"InvestmentManager\",\n            \"action\": \"final_decision\",\n            \"status\": \"completed\"\n        })\n        \n        return {\n            \"success\": True,\n            \"mode\": self.mode,\n            \"bull_analysis\": {\"analysis\": bull_analysis_full, \"success\": True},\n            \"bear_analysis\": {\"analysis\": bear_analysis_full, \"success\": True},\n            \"final_decision\": final_decision,\n            \"debate_history\": debate_history,\n            \"total_rounds\": self.current_round,\n            \"was_interrupted\": self.is_interrupted,\n            \"trajectory\": trajectory,\n            \"dynamic_search_enabled\": self.enable_dynamic_search,\n            \"data_supplements\": dynamic_data_supplements,\n            \"search_stats\": self.search_stats\n        }\n    \n    async def _process_speech_for_search(\n        self,\n        search_analyst,\n        speech_text: str,\n        agent_name: str,\n        stock_code: str,\n        stock_name: str,\n        context: str,\n        round_num: int,\n        trajectory: List[Dict]\n    ) -> tuple:\n        \"\"\"\n        Handle search requests embedded in a speech.\n        \n        Returns:\n            (updated_context, supplement_data)\n        \"\"\"\n        try:\n            result = await search_analyst.process_debate_speech(\n                speech_text=speech_text,\n                stock_code=stock_code,\n                stock_name=stock_name,\n                agent_name=agent_name\n            )\n            \n            self.search_stats[\"total_requests\"] += result.get(\"requests_found\", 0)\n            \n            if result.get(\"success\") and result.get(\"combined_summary\"):\n                self.search_stats[\"successful_searches\"] += len(result.get(\"search_results\", []))\n                \n                self._emit_event(DebateEvent(\n                    event_type=\"dynamic_search\",\n                    agent_name=\"SearchAnalyst\",\n                    content=f\"为 {agent_name} 补充了 {result['requests_found']} 项数据\",\n                    phase=self.current_phase,\n                    round_number=round_num,\n                    metadata={\"requests\": result[\"requests_found\"]}\n                ))\n                \n                trajectory.append({\n                    \"agent\": \"SearchAnalyst\",\n                    \"action\": f\"search_for_{agent_name}\",\n                    \"status\": \"completed\",\n                    \"requests\": result[\"requests_found\"]\n                })\n                \n                # Update the context\n                new_context = f\"{context}\\n\\n【{agent_name} 请求的补充数据】\\n{result['combined_summary']}\"\n                \n                supplement = {\n                    \"round\": round_num,\n                    \"agent\": agent_name,\n                    \"requests\": result[\"requests_found\"],\n                    \"data\": result[\"combined_summary\"][:500]\n                }\n                \n                return new_context, supplement\n            \n        except Exception as e:\n            logger.warning(f\"Failed to process search requests: {e}\")\n        \n        return context, None\n    \n    async def _run_quick_mode(\n        self,\n        stock_code: str,\n        stock_name: str,\n        context: str\n    ) -> Dict[str, Any]:\n        \"\"\"Run the quick analysis mode\"\"\"\n        from .data_collector import QuickAnalystAgent\n        \n        logger.info(\"🚀 Running quick analysis mode\")\n        \n        quick_analyst = QuickAnalystAgent(self.llm_provider)\n        \n        self.current_phase = DebatePhase.DEBATE\n        self._emit_event(DebateEvent(\n            event_type=\"quick_analysis_start\",\n            agent_name=\"QuickAnalyst\",\n            content=\"开始快速分析\",\n            phase=self.current_phase\n        ))\n        \n        result = await quick_analyst.quick_analyze(stock_code, stock_name, context)\n        \n        self._emit_event(DebateEvent(\n            event_type=\"quick_analysis_complete\",\n            agent_name=\"QuickAnalyst\",\n            content=result.get(\"analysis\", \"\")[:200] + \"...\",\n            phase=self.current_phase\n        ))\n        \n        return {\n            \"success\": result.get(\"success\", False),\n            \"mode\": self.mode,\n            \"quick_analysis\": result,\n            \"trajectory\": [\n                {\"agent\": \"QuickAnalyst\", \"action\": \"analyze\", \"status\": \"completed\"}\n            ]\n        }\n    \n    def _prepare_news_summary(self, news_list: List[Dict[str, Any]]) -> str:\n        \"\"\"Prepare the news summary\"\"\"\n        if not news_list:\n            return \"暂无相关新闻数据\"\n        \n        summary_parts = [\"## 相关新闻摘要\\n\"]\n        for i, news in enumerate(news_list[:10], 1):\n            title = news.get(\"title\", \"无标题\")\n            content = news.get(\"content\", \"\")[:200]\n            source = news.get(\"source\", \"未知来源\")\n            date = news.get(\"published_at\", \"\")\n            \n            summary_parts.append(f\"{i}. **{title}** ({source}, {date})\\n   {content}...\\n\")\n        \n        return \"\\n\".join(summary_parts)\n    \n    def _build_debate_prompt(\n        self,\n        agent_role: str,\n        stock_name: str,\n        stock_code: str,\n        round_num: int,\n        max_rounds: int,\n        context: str,\n        debate_history: List[Dict],\n        enable_search_requests: bool = False\n    ) -> str:\n        \"\"\"Build the debate prompt\"\"\"\n        history_text = self._format_debate_history(debate_history[-4:])  # only the 4 most recent entries\n        \n        # Base prompt\n        prompt = f\"\"\"你是{agent_role}，正在参与关于 {stock_name}({stock_code}) 的多空辩论。\n\n当前是第 {round_num}/{max_rounds} 轮辩论。\n\n背景资料:\n{context[:1500]}\n\n最近的辩论历史:\n{history_text}\n\n请发表你的观点（约200字）：\n1. 如果是第一轮，阐述你的核心论点\n2. 如果不是第一轮，先反驳对方观点，再补充新论据\n3. 用数据和事实支持你的论点\n4. 语气专业但有说服力\"\"\"\n\n        # If dynamic search is enabled, describe the search-request syntax\n        if enable_search_requests:\n            prompt += \"\"\"\n\n【数据请求功能】\n如果你在分析过程中发现缺少关键数据，可以在发言中使用以下格式请求搜索：\n- [SEARCH: \"最新的毛利率数据\" source:akshare]  -- 从AkShare获取财务数据\n- [SEARCH: \"最近的行业新闻\" source:bochaai]  -- 从网络搜索新闻\n- [SEARCH: \"近期资金流向\" source:akshare]  -- 获取资金流向\n- [SEARCH: \"竞品对比分析\"]  -- 不指定来源则自动选择\n\n搜索请求会在你发言后自动执行，数据会补充到下一轮的背景资料中。\n请只在确实需要更多数据支撑论点时才使用搜索请求，每次最多1-2个。\"\"\"\n\n        return prompt\n    \n    def _format_debate_history(self, history: List[Dict]) -> str:\n        \"\"\"Format the debate history\"\"\"\n        if not history:\n            return \"（尚无辩论历史）\"\n        \n        lines = []\n        for item in history:\n            agent = item.get(\"agent\", \"Unknown\")\n            content = item.get(\"content\", \"\")[:300]\n            round_num = item.get(\"round\", 0)\n            lines.append(f\"[第{round_num}轮 - {agent}]: {content}\")\n        \n        return \"\\n\\n\".join(lines)\n    \n    async def _check_manager_interrupt(\n        self,\n        manager_agent,\n        debate_history: List[Dict],\n        stock_name: str\n    ) -> bool:\n        \"\"\"Check whether the investment manager wants to interrupt the debate\"\"\"\n        if len(debate_history) < 4:\n            return False\n        \n        check_prompt = f\"\"\"你是投资经理，正在主持关于 {stock_name} 的辩论。\n\n目前的辩论历史:\n{self._format_debate_history(debate_history[-4:])}\n\n请判断：你是否已经获得足够的信息来做出投资决策？\n如果是，回复\"是\"；如果还需要更多辩论，回复\"否\"。\n只回复一个字。\"\"\"\n        \n        try:\n            response = await self.llm_provider.chat(check_prompt)\n            return \"是\" in response[:5]\n        except Exception:\n            return False\n\n    async def _check_manager_interrupt_or_search(\n        self,\n        manager_agent,\n        debate_history: List[Dict],\n        stock_name: str,\n        stock_code: str,\n        search_analyst,\n        context: str\n    ) -> tuple:\n        \"\"\"\n        Check whether the investment manager wants to interrupt the debate or request more data.\n        \n        Returns:\n            (should_interrupt: bool, additional_data: str 
or None)\n        \"\"\"\n        if len(debate_history) < 4:\n            return False, None\n        \n        # 如果没有搜索分析师，使用简单的打断检查\n        if not search_analyst:\n            should_interrupt = await self._check_manager_interrupt(\n                manager_agent, debate_history, stock_name\n            )\n            return should_interrupt, None\n        \n        check_prompt = f\"\"\"你是投资经理，正在主持关于 {stock_name}({stock_code}) 的多空辩论。\n\n目前的辩论历史:\n{self._format_debate_history(debate_history[-4:])}\n\n请判断当前情况：\n1. 如果你已经获得足够的信息做决策，回复：决策就绪\n2. 如果你需要更多数据支持，使用以下格式请求：\n   [SEARCH: \"你需要的具体数据\" source:数据源]\n   \n可用数据源: akshare(财务/行情), bochaai(新闻), browser(网页搜索)\n\n请只回复\"决策就绪\"或搜索请求，不要添加其他内容。\"\"\"\n        \n        try:\n            response = await self.llm_provider.chat(check_prompt)\n            \n            # 检查是否决策就绪\n            if \"决策就绪\" in response:\n                return True, None\n            \n            # 检查是否有搜索请求\n            requests = search_analyst.extract_search_requests(response)\n            if requests:\n                self._emit_event(DebateEvent(\n                    event_type=\"manager_search_request\",\n                    agent_name=\"InvestmentManager\",\n                    content=f\"投资经理请求 {len(requests)} 项补充数据\",\n                    phase=self.current_phase,\n                    round_number=self.current_round\n                ))\n                \n                # 执行搜索\n                search_result = await search_analyst.process_debate_speech(\n                    speech_text=response,\n                    stock_code=stock_code,\n                    stock_name=stock_name,\n                    agent_name=\"InvestmentManager\"\n                )\n                \n                if search_result.get(\"success\") and search_result.get(\"combined_summary\"):\n                    self.search_stats[\"total_requests\"] += len(requests)\n                    self.search_stats[\"successful_searches\"] += 
len(search_result.get(\"search_results\", []))\n                    return False, search_result[\"combined_summary\"]\n            \n            return False, None\n            \n        except Exception as e:\n            logger.warning(f\"检查经理决策时出错: {e}\")\n            return False, None\n\n\ndef create_orchestrator(\n    mode: str = None,\n    llm_provider=None,\n    enable_dynamic_search: bool = True\n) -> DebateOrchestrator:\n    \"\"\"\n    创建辩论编排器\n    \n    Args:\n        mode: 辩论模式 (parallel, realtime_debate, quick_analysis)\n        llm_provider: LLM 提供者\n        enable_dynamic_search: 是否启用动态搜索\n        \n    Returns:\n        DebateOrchestrator 实例\n    \"\"\"\n    return DebateOrchestrator(\n        mode=mode,\n        llm_provider=llm_provider,\n        enable_dynamic_search=enable_dynamic_search\n    )\n\n"
  },
  {
    "path": "backend/app/agents/quantitative_agent.py",
    "content": "\"\"\"\n量化分析智能体\n\n负责量化因子挖掘、技术分析和量化策略生成。\n集成 Alpha Mining 模块，提供自动化因子发现能力。\n\n功能：\n- 因子挖掘：使用 RL 自动发现有效交易因子\n- 因子评估：评估因子的预测能力和回测表现\n- 技术分析：结合传统技术指标进行分析\n- 策略生成：基于因子生成交易策略建议\n\"\"\"\n\nimport logging\nimport asyncio\nfrom typing import Dict, Any, List, Optional\nfrom datetime import datetime\nimport json\n\nlogger = logging.getLogger(__name__)\n\n\nclass QuantitativeAgent:\n    \"\"\"\n    量化分析智能体\n    \n    集成 Alpha Mining 模块，提供因子挖掘和量化分析能力。\n    \n    Args:\n        llm_provider: LLM 提供者\n        enable_alpha_mining: 是否启用因子挖掘\n        model_path: 预训练模型路径\n        \n    Example:\n        agent = QuantitativeAgent(llm_provider)\n        result = await agent.analyze(stock_code, stock_name, market_data)\n    \"\"\"\n    \n    def __init__(\n        self,\n        llm_provider=None,\n        enable_alpha_mining: bool = True,\n        model_path: Optional[str] = None\n    ):\n        self.llm_provider = llm_provider\n        self.enable_alpha_mining = enable_alpha_mining\n        self.model_path = model_path\n        \n        # 延迟初始化 Alpha Mining 组件\n        self._alpha_mining_initialized = False\n        self._generator = None\n        self._trainer = None\n        self._vm = None\n        self._evaluator = None\n        self._market_builder = None\n        self._sentiment_builder = None\n        \n        # 存储发现的因子\n        self.discovered_factors: List[Dict[str, Any]] = []\n        \n        logger.info(f\"QuantitativeAgent initialized (alpha_mining={enable_alpha_mining})\")\n    \n    def _init_alpha_mining(self):\n        \"\"\"延迟初始化 Alpha Mining 组件\"\"\"\n        if self._alpha_mining_initialized:\n            return\n        \n        try:\n            from ..alpha_mining import (\n                AlphaMiningConfig,\n                FactorVocab,\n                FactorVM,\n                AlphaGenerator,\n                AlphaTrainer,\n                FactorEvaluator,\n                MarketFeatureBuilder,\n                SentimentFeatureBuilder\n  
          )\n            \n            config = AlphaMiningConfig()\n            vocab = FactorVocab()\n            \n            self._vm = FactorVM(vocab=vocab)\n            self._evaluator = FactorEvaluator(config=config)\n            self._market_builder = MarketFeatureBuilder(config=config)\n            self._sentiment_builder = SentimentFeatureBuilder(config=config)\n            \n            # 初始化生成器\n            self._generator = AlphaGenerator(vocab=vocab, config=config)\n            \n            # 如果有预训练模型，加载它\n            if self.model_path:\n                try:\n                    self._generator = AlphaGenerator.load(self.model_path, vocab=vocab)\n                    logger.info(f\"Loaded pretrained model from {self.model_path}\")\n                except Exception as e:\n                    logger.warning(f\"Failed to load model: {e}\")\n            \n            self._alpha_mining_initialized = True\n            logger.info(\"Alpha Mining components initialized\")\n            \n        except ImportError as e:\n            logger.warning(f\"Alpha Mining not available: {e}\")\n            self.enable_alpha_mining = False\n    \n    async def analyze(\n        self,\n        stock_code: str,\n        stock_name: str,\n        market_data: Optional[Dict[str, Any]] = None,\n        sentiment_data: Optional[Dict[str, Any]] = None,\n        context: str = \"\"\n    ) -> Dict[str, Any]:\n        \"\"\"\n        执行量化分析\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            market_data: 行情数据（可选）\n            sentiment_data: 情感数据（可选）\n            context: 额外上下文\n            \n        Returns:\n            分析结果字典\n        \"\"\"\n        result = {\n            \"success\": True,\n            \"stock_code\": stock_code,\n            \"stock_name\": stock_name,\n            \"timestamp\": datetime.utcnow().isoformat(),\n            \"analysis_type\": \"quantitative\",\n            \"factors_discovered\": [],\n        
    \"technical_analysis\": {},\n            \"strategy_suggestion\": \"\",\n            \"confidence\": 0.0\n        }\n        \n        try:\n            # 1. 因子挖掘（如果启用）\n            if self.enable_alpha_mining:\n                factor_result = await self._mine_factors(\n                    stock_code, stock_name, market_data, sentiment_data\n                )\n                result[\"factors_discovered\"] = factor_result.get(\"factors\", [])\n                result[\"factor_mining_stats\"] = factor_result.get(\"stats\", {})\n            \n            # 2. 技术分析（使用 LLM）\n            if self.llm_provider and market_data:\n                tech_analysis = await self._technical_analysis(\n                    stock_code, stock_name, market_data, context\n                )\n                result[\"technical_analysis\"] = tech_analysis\n            \n            # 3. 生成策略建议\n            if self.llm_provider:\n                strategy = await self._generate_strategy(\n                    stock_code, stock_name, result, context\n                )\n                result[\"strategy_suggestion\"] = strategy.get(\"suggestion\", \"\")\n                result[\"confidence\"] = strategy.get(\"confidence\", 0.0)\n            \n        except Exception as e:\n            logger.error(f\"Quantitative analysis failed: {e}\", exc_info=True)\n            result[\"success\"] = False\n            result[\"error\"] = str(e)\n        \n        return result\n    \n    async def _mine_factors(\n        self,\n        stock_code: str,\n        stock_name: str,\n        market_data: Optional[Dict[str, Any]],\n        sentiment_data: Optional[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"执行因子挖掘\"\"\"\n        self._init_alpha_mining()\n        \n        if not self._alpha_mining_initialized:\n            return {\"factors\": [], \"stats\": {\"error\": \"Alpha Mining not available\"}}\n        \n        try:\n            import torch\n            from ..alpha_mining.utils import 
generate_mock_data\n            \n            # 准备特征数据\n            if market_data is not None:\n                market_features = self._market_builder.build(market_data)\n                time_steps = market_features.size(-1)\n                \n                if sentiment_data is not None:\n                    sentiment_features = self._sentiment_builder.build(\n                        sentiment_data, time_steps=time_steps\n                    )\n                    features = self._sentiment_builder.combine_with_market(\n                        market_features, sentiment_features\n                    )\n                else:\n                    features = market_features\n                \n                returns = market_features[:, 0, :]  # RET\n            else:\n                # 使用模拟数据\n                features, returns = generate_mock_data(\n                    num_samples=50,\n                    num_features=6,\n                    time_steps=252,\n                    seed=42\n                )\n            \n            # 生成候选因子\n            formulas, _ = self._generator.generate(batch_size=20, max_len=8)\n            \n            # 评估每个因子\n            evaluated_factors = []\n            for formula in formulas:\n                factor = self._vm.execute(formula, features)\n                if factor is not None and factor.std() > 1e-6:\n                    try:\n                        metrics = self._evaluator.evaluate(factor, returns)\n                        evaluated_factors.append({\n                            \"formula\": formula,\n                            \"formula_str\": self._vm.decode(formula),\n                            \"sortino\": metrics[\"sortino_ratio\"],\n                            \"sharpe\": metrics[\"sharpe_ratio\"],\n                            \"ic\": metrics[\"ic\"],\n                            \"max_drawdown\": metrics[\"max_drawdown\"]\n                        })\n                    except Exception:\n                  
      continue\n            \n            # 按 Sortino 排序，取 top 5\n            evaluated_factors.sort(key=lambda x: x[\"sortino\"], reverse=True)\n            top_factors = evaluated_factors[:5]\n            \n            # 更新已发现因子\n            for f in top_factors:\n                f[\"stock_code\"] = stock_code\n                f[\"discovered_at\"] = datetime.utcnow().isoformat()\n            self.discovered_factors.extend(top_factors)\n            \n            return {\n                \"factors\": top_factors,\n                \"stats\": {\n                    \"generated\": len(formulas),\n                    \"valid\": len(evaluated_factors),\n                    \"top_sortino\": top_factors[0][\"sortino\"] if top_factors else 0\n                }\n            }\n            \n        except Exception as e:\n            logger.error(f\"Factor mining failed: {e}\")\n            return {\"factors\": [], \"stats\": {\"error\": str(e)}}\n    \n    async def _technical_analysis(\n        self,\n        stock_code: str,\n        stock_name: str,\n        market_data: Dict[str, Any],\n        context: str\n    ) -> Dict[str, Any]:\n        \"\"\"使用 LLM 进行技术分析\"\"\"\n        # 提取关键指标\n        data_summary = self._summarize_market_data(market_data)\n        \n        prompt = f\"\"\"你是一位资深量化分析师，请对 {stock_name}({stock_code}) 进行技术分析。\n\n行情数据摘要：\n{data_summary}\n\n{f'额外背景：{context}' if context else ''}\n\n请分析：\n1. 趋势判断（上涨/下跌/震荡）\n2. 关键支撑位和阻力位\n3. 技术指标信号（MA/MACD/RSI等）\n4. 成交量分析\n5. 
短期（1周）和中期（1月）预测\n\n请以 JSON 格式返回：\n{{\n    \"trend\": \"上涨/下跌/震荡\",\n    \"support_levels\": [价格1, 价格2],\n    \"resistance_levels\": [价格1, 价格2],\n    \"technical_signals\": {{\n        \"ma_signal\": \"看涨/看跌/中性\",\n        \"macd_signal\": \"看涨/看跌/中性\",\n        \"rsi_signal\": \"超买/超卖/中性\"\n    }},\n    \"volume_analysis\": \"放量/缩量/正常\",\n    \"short_term_outlook\": \"看涨/看跌/中性\",\n    \"medium_term_outlook\": \"看涨/看跌/中性\",\n    \"confidence\": 0.0-1.0\n}}\"\"\"\n        \n        try:\n            response = await self.llm_provider.chat(prompt)\n            # 尝试解析 JSON\n            start = response.find(\"{\")\n            end = response.rfind(\"}\") + 1\n            if start >= 0 and end > start:\n                return json.loads(response[start:end])\n            return {\"raw_analysis\": response}\n        except Exception as e:\n            logger.warning(f\"Technical analysis parsing failed: {e}\")\n            return {\"error\": str(e)}\n    \n    async def _generate_strategy(\n        self,\n        stock_code: str,\n        stock_name: str,\n        analysis_result: Dict[str, Any],\n        context: str\n    ) -> Dict[str, Any]:\n        \"\"\"生成交易策略建议\"\"\"\n        factors_summary = \"\"\n        if analysis_result.get(\"factors_discovered\"):\n            factors = analysis_result[\"factors_discovered\"][:3]\n            factors_summary = \"发现的有效因子：\\n\"\n            for i, f in enumerate(factors, 1):\n                factors_summary += f\"{i}. 
{f['formula_str']} (Sortino={f['sortino']:.2f}, IC={f['ic']:.3f})\\n\"\n        \n        tech_summary = \"\"\n        tech = analysis_result.get(\"technical_analysis\", {})\n        if tech and not tech.get(\"error\"):\n            tech_summary = f\"\"\"技术分析结论：\n- 趋势：{tech.get('trend', 'N/A')}\n- 短期展望：{tech.get('short_term_outlook', 'N/A')}\n- 中期展望：{tech.get('medium_term_outlook', 'N/A')}\n\"\"\"\n        \n        prompt = f\"\"\"你是一位量化投资顾问，请为 {stock_name}({stock_code}) 生成交易策略建议。\n\n{factors_summary}\n\n{tech_summary}\n\n{f'额外背景：{context}' if context else ''}\n\n请提供：\n1. 总体投资建议（买入/持有/卖出/观望）\n2. 建议的仓位比例（0-100%）\n3. 入场/出场价位建议\n4. 风险控制建议（止损/止盈）\n5. 策略置信度（0-1）\n\n请以 JSON 格式返回：\n{{\n    \"suggestion\": \"详细策略建议（100-200字）\",\n    \"action\": \"买入/持有/卖出/观望\",\n    \"position_ratio\": 0-100,\n    \"entry_price\": 价格或null,\n    \"exit_price\": 价格或null,\n    \"stop_loss\": 价格或null,\n    \"take_profit\": 价格或null,\n    \"confidence\": 0.0-1.0,\n    \"risk_level\": \"低/中/高\"\n}}\"\"\"\n        \n        try:\n            response = await self.llm_provider.chat(prompt)\n            start = response.find(\"{\")\n            end = response.rfind(\"}\") + 1\n            if start >= 0 and end > start:\n                return json.loads(response[start:end])\n            return {\"suggestion\": response, \"confidence\": 0.5}\n        except Exception as e:\n            logger.warning(f\"Strategy generation failed: {e}\")\n            return {\"suggestion\": \"策略生成失败\", \"confidence\": 0.0, \"error\": str(e)}\n    \n    def _summarize_market_data(self, market_data: Dict[str, Any]) -> str:\n        \"\"\"摘要行情数据\"\"\"\n        if isinstance(market_data, dict):\n            if \"close\" in market_data:\n                close = market_data[\"close\"]\n                if hasattr(close, \"tolist\"):\n                    close = close.tolist()\n                if isinstance(close, list) and len(close) > 0:\n                    return f\"\"\"\n- 最新价格：{close[-1]:.2f}\n- 
最高价（近期）：{max(close[-20:]):.2f}\n- 最低价（近期）：{min(close[-20:]):.2f}\n- 价格变化（5日）：{((close[-1]/close[-min(5, len(close))])-1)*100:.2f}%\n- 价格变化（20日）：{((close[-1]/close[-min(20, len(close))])-1)*100:.2f}%\n\"\"\"\n        return \"行情数据格式不支持摘要\"\n    \n    async def evaluate_factor(\n        self,\n        formula_str: str,\n        market_data: Optional[Dict[str, Any]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"评估指定因子表达式\"\"\"\n        self._init_alpha_mining()\n        \n        if not self._alpha_mining_initialized:\n            return {\"success\": False, \"error\": \"Alpha Mining not available\"}\n        \n        try:\n            import torch\n            from ..alpha_mining.utils import generate_mock_data\n            \n            # 解析公式\n            tokens = []\n            parts = formula_str.replace(\"(\", \" \").replace(\")\", \" \").replace(\",\", \" \").split()\n            for part in parts:\n                part = part.strip()\n                if not part:\n                    continue\n                try:\n                    token = self._vm.vocab.name_to_token(part)\n                    tokens.append(token)\n                except (ValueError, KeyError):\n                    continue\n            \n            if not tokens:\n                return {\"success\": False, \"error\": \"Invalid formula\"}\n            \n            # 准备数据\n            if market_data is not None:\n                features = self._market_builder.build(market_data)\n                returns = features[:, 0, :]\n            else:\n                features, returns = generate_mock_data()\n            \n            # 执行\n            factor = self._vm.execute(tokens, features)\n            if factor is None:\n                return {\"success\": False, \"error\": \"Factor execution failed\"}\n            \n            # 评估\n            metrics = self._evaluator.evaluate(factor, returns)\n            \n            return {\n                \"success\": True,\n                \"formula\": formula_str,\n        
        \"metrics\": metrics\n            }\n            \n        except Exception as e:\n            return {\"success\": False, \"error\": str(e)}\n    \n    def get_best_factors(self, top_k: int = 5) -> List[Dict[str, Any]]:\n        \"\"\"获取最优因子\"\"\"\n        sorted_factors = sorted(\n            self.discovered_factors,\n            key=lambda x: x.get(\"sortino\", 0),\n            reverse=True\n        )\n        return sorted_factors[:top_k]\n\n\ndef create_quantitative_agent(\n    llm_provider=None,\n    enable_alpha_mining: bool = True,\n    model_path: Optional[str] = None\n) -> QuantitativeAgent:\n    \"\"\"\n    创建量化分析智能体\n    \n    Args:\n        llm_provider: LLM 提供者\n        enable_alpha_mining: 是否启用因子挖掘\n        model_path: 预训练模型路径\n        \n    Returns:\n        QuantitativeAgent 实例\n    \"\"\"\n    return QuantitativeAgent(\n        llm_provider=llm_provider,\n        enable_alpha_mining=enable_alpha_mining,\n        model_path=model_path\n    )\n"
  },
  {
    "path": "backend/app/agents/search_analyst.py",
    "content": "\"\"\"\n搜索分析师智能体 (SearchAnalystAgent)\n\n负责在辩论过程中动态搜集数据，支持多种数据源：\n- AkShare: 财务指标、K线数据、资金流向、机构持仓\n- BochaAI: 实时新闻搜索、分析师报告\n- InteractiveCrawler: 多引擎网页搜索 (百度、搜狗、360等)\n- Knowledge Base: 历史新闻和上下文 (向量数据库)\n\"\"\"\nimport logging\nimport re\nimport asyncio\nfrom typing import Dict, Any, List, Optional, ClassVar, Pattern\nfrom datetime import datetime\nfrom enum import Enum\n\nfrom agenticx.core.agent import Agent\nfrom ..services.llm_service import get_llm_provider\nfrom ..services.stock_data_service import stock_data_service\nfrom ..tools.bochaai_search import bochaai_search, SearchResult\nfrom ..tools.interactive_crawler import InteractiveCrawler\n\nlogger = logging.getLogger(__name__)\n\n\nclass SearchSource(Enum):\n    \"\"\"搜索数据源类型\"\"\"\n    AKSHARE = \"akshare\"           # AkShare 财务/行情数据\n    BOCHAAI = \"bochaai\"           # BochaAI Web搜索\n    BROWSER = \"browser\"           # 交互式浏览器搜索\n    KNOWLEDGE_BASE = \"kb\"         # 内部知识库\n    ALL = \"all\"                   # 所有来源\n\n\nclass SearchAnalystAgent(Agent):\n    \"\"\"\n    搜索分析师智能体\n    \n    在辩论过程中被其他智能体调用，动态获取所需数据。\n    支持解析结构化搜索请求，并返回格式化的数据。\n    \"\"\"\n    \n    # 搜索请求的正则模式 [SEARCH: \"query\" source:xxx]\n    # 使用 ClassVar 避免 Pydantic 将其视为模型字段\n    SEARCH_PATTERN: ClassVar[Pattern] = re.compile(\n        r'\\[SEARCH:\\s*[\"\\']([^\"\\']+)[\"\\']\\s*(?:source:(\\w+))?\\]',\n        re.IGNORECASE\n    )\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"SearchAnalyst\",\n            role=\"搜索分析师\",\n            goal=\"根据辩论中的数据需求，快速从多个数据源获取相关信息\",\n            backstory=\"\"\"你是一位专业的金融数据搜索专家，精通各类金融数据源的使用。\n你的职责是：\n1. 解析辩论智能体的数据请求\n2. 选择最合适的数据源进行查询\n3. 整理并格式化数据，使其便于辩论使用\n4. 
对数据质量进行初步评估\n\n你能够访问的数据源包括：\n- AkShare: 股票财务指标、K线行情、资金流向、机构持仓等\n- BochaAI: 实时新闻搜索、财经报道\n- 多引擎搜索: 百度资讯、搜狗、360等\n- 内部知识库: 历史新闻和分析数据\"\"\",\n            organization_id=organization_id\n        )\n        \n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        # 初始化搜索工具\n        self._interactive_crawler = InteractiveCrawler(timeout=20)\n        \n        logger.info(f\"✅ Initialized {self.name} agent with multi-source search capabilities\")\n    \n    def extract_search_requests(self, text: str) -> List[Dict[str, Any]]:\n        \"\"\"\n        从文本中提取搜索请求\n        \n        支持格式:\n        - [SEARCH: \"query\"]\n        - [SEARCH: \"query\" source:akshare]\n        - [SEARCH: \"query\" source:bochaai]\n        - [SEARCH: \"query\" source:browser]\n        \n        Args:\n            text: 包含搜索请求的文本\n            \n        Returns:\n            搜索请求列表 [{\"query\": \"...\", \"source\": \"...\"}]\n        \"\"\"\n        requests = []\n        matches = self.SEARCH_PATTERN.findall(text)\n        \n        for match in matches:\n            query = match[0].strip()\n            source = match[1].lower() if match[1] else \"all\"\n            \n            # 验证 source\n            valid_sources = [s.value for s in SearchSource]\n            if source not in valid_sources:\n                source = \"all\"\n            \n            requests.append({\n                \"query\": query,\n                \"source\": source\n            })\n            logger.info(f\"🔍 提取搜索请求: query='{query}', source={source}\")\n        \n        return requests\n    \n    async def search(\n        self,\n        query: str,\n        source: str = \"all\",\n        stock_code: Optional[str] = None,\n        stock_name: Optional[str] = None,\n        context: Optional[str] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        执行搜索请求\n        \n        Args:\n            query: 
搜索查询\n            source: 数据源 (akshare, bochaai, browser, kb, all)\n            stock_code: 股票代码 (用于 akshare 查询)\n            stock_name: 股票名称 (用于新闻搜索)\n            context: 额外上下文\n            \n        Returns:\n            搜索结果字典\n        \"\"\"\n        logger.info(f\"🔍 SearchAnalyst: 执行搜索 query='{query}', source={source}\")\n        \n        result = {\n            \"query\": query,\n            \"source\": source,\n            \"timestamp\": datetime.utcnow().isoformat(),\n            \"data\": {},\n            \"summary\": \"\",\n            \"success\": False\n        }\n        \n        try:\n            if source == SearchSource.AKSHARE.value or source == SearchSource.ALL.value:\n                akshare_data = await self._search_akshare(query, stock_code)\n                if akshare_data:\n                    result[\"data\"][\"akshare\"] = akshare_data\n            \n            if source == SearchSource.BOCHAAI.value or source == SearchSource.ALL.value:\n                bochaai_data = await self._search_bochaai(query, stock_name)\n                if bochaai_data:\n                    result[\"data\"][\"bochaai\"] = bochaai_data\n            \n            if source == SearchSource.BROWSER.value or source == SearchSource.ALL.value:\n                browser_data = await self._search_browser(query)\n                if browser_data:\n                    result[\"data\"][\"browser\"] = browser_data\n            \n            if source == SearchSource.KNOWLEDGE_BASE.value or source == SearchSource.ALL.value:\n                kb_data = await self._search_knowledge_base(query, stock_code, stock_name)\n                if kb_data:\n                    result[\"data\"][\"knowledge_base\"] = kb_data\n            \n            # 生成摘要\n            if result[\"data\"]:\n                result[\"summary\"] = await self._generate_summary(query, result[\"data\"])\n                result[\"success\"] = True\n            else:\n                result[\"summary\"] = 
f\"未找到与'{query}'相关的数据\"\n            \n        except Exception as e:\n            logger.error(f\"SearchAnalyst 搜索失败: {e}\", exc_info=True)\n            result[\"error\"] = str(e)\n        \n        return result\n    \n    async def _search_akshare(\n        self,\n        query: str,\n        stock_code: Optional[str] = None\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"从 AkShare 获取数据\"\"\"\n        if not stock_code:\n            logger.debug(\"AkShare 搜索需要股票代码，跳过\")\n            return None\n        \n        data = {}\n        query_lower = query.lower()\n        \n        try:\n            # 根据查询内容决定获取哪些数据\n            if any(kw in query_lower for kw in [\"财务\", \"pe\", \"pb\", \"roe\", \"利润\", \"估值\", \"市盈\", \"市净\"]):\n                financial = await stock_data_service.get_financial_indicators(stock_code)\n                if financial:\n                    data[\"financial_indicators\"] = financial\n            \n            if any(kw in query_lower for kw in [\"资金\", \"主力\", \"流入\", \"流出\", \"散户\", \"机构\"]):\n                fund_flow = await stock_data_service.get_fund_flow(stock_code, days=10)\n                if fund_flow:\n                    data[\"fund_flow\"] = fund_flow\n            \n            if any(kw in query_lower for kw in [\"行情\", \"价格\", \"涨跌\", \"成交\", \"量\"]):\n                realtime = await stock_data_service.get_realtime_quote(stock_code)\n                if realtime:\n                    data[\"realtime_quote\"] = realtime\n            \n            if any(kw in query_lower for kw in [\"k线\", \"走势\", \"历史\", \"均线\", \"趋势\"]):\n                kline = await stock_data_service.get_kline_data(stock_code, period=\"daily\", limit=30)\n                if kline:\n                    # 只返回最近10天的简要数据\n                    data[\"kline_summary\"] = {\n                        \"period\": \"daily\",\n                        \"count\": len(kline),\n                        \"latest\": kline[-1] if kline else None,\n                        
\"recent_5\": kline[-5:] if len(kline) >= 5 else kline\n                    }\n            \n            # 如果没有匹配到特定查询，获取综合数据\n            if not data:\n                context_data = await stock_data_service.get_debate_context(stock_code)\n                if context_data:\n                    data = context_data\n            \n            if data:\n                logger.info(f\"✅ AkShare 返回数据: {list(data.keys())}\")\n                return data\n            \n        except Exception as e:\n            logger.warning(f\"AkShare 搜索出错: {e}\")\n        \n        return None\n    \n    async def _search_bochaai(\n        self,\n        query: str,\n        stock_name: Optional[str] = None\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"从 BochaAI 搜索新闻\"\"\"\n        if not bochaai_search.is_available():\n            logger.debug(\"BochaAI 未配置，跳过\")\n            return None\n        \n        try:\n            # 构建搜索查询\n            search_query = query\n            if stock_name and stock_name not in query:\n                search_query = f\"{stock_name} {query}\"\n            \n            results = bochaai_search.search(\n                query=search_query,\n                freshness=\"oneWeek\",\n                count=10\n            )\n            \n            if results:\n                news_list = [\n                    {\n                        \"title\": r.title,\n                        \"snippet\": r.snippet[:200] if r.snippet else \"\",\n                        \"url\": r.url,\n                        \"source\": r.site_name or \"unknown\",\n                        \"date\": r.date_published or \"\"\n                    }\n                    for r in results\n                ]\n                logger.info(f\"✅ BochaAI 返回 {len(news_list)} 条新闻\")\n                return {\"news\": news_list, \"count\": len(news_list)}\n        \n        except Exception as e:\n            logger.warning(f\"BochaAI 搜索出错: {e}\")\n        \n        return None\n    \n    
async def _search_browser(self, query: str) -> Optional[Dict[str, Any]]:\n        \"\"\"使用交互式爬虫搜索\"\"\"\n        try:\n            # 在协程内应使用 get_running_loop()（get_event_loop() 已弃用）\n            loop = asyncio.get_running_loop()\n            results = await loop.run_in_executor(\n                None,\n                lambda: self._interactive_crawler.interactive_search(\n                    query=query,\n                    engines=[\"baidu_news\", \"sogou\"],\n                    num_results=10,\n                    search_type=\"news\"\n                )\n            )\n            \n            if results:\n                news_list = [\n                    {\n                        \"title\": r.get(\"title\", \"\"),\n                        \"snippet\": r.get(\"snippet\", \"\")[:200],\n                        \"url\": r.get(\"url\", \"\"),\n                        \"source\": \"browser_search\"\n                    }\n                    for r in results\n                ]\n                logger.info(f\"✅ Browser 返回 {len(news_list)} 条结果\")\n                return {\"search_results\": news_list, \"count\": len(news_list)}\n        \n        except Exception as e:\n            logger.warning(f\"Browser 搜索出错: {e}\")\n        \n        return None\n    \n    async def _search_knowledge_base(\n        self,\n        query: str,\n        stock_code: Optional[str] = None,\n        stock_name: Optional[str] = None\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"从知识库搜索历史数据\"\"\"\n        try:\n            # 尝试导入 news_service（可能不存在）\n            try:\n                from ..services.news_service import news_service\n            except ImportError:\n                logger.debug(\"news_service 未配置，跳过知识库搜索\")\n                return None\n            \n            # 尝试从数据库获取相关新闻\n            if stock_code and news_service:\n                news_list = await news_service.get_news_by_stock(stock_code, limit=10)\n                if news_list:\n                    kb_news = [\n                        {\n                            
\"title\": getattr(news, 'title', ''),\n                            \"content\": (getattr(news, 'content', '') or '')[:300],\n                            \"source\": getattr(news, 'source', ''),\n                            \"date\": news.published_at.isoformat() if hasattr(news, 'published_at') and news.published_at else \"\"\n                        }\n                        for news in news_list\n                    ]\n                    logger.info(f\"✅ KB 返回 {len(kb_news)} 条历史新闻\")\n                    return {\"historical_news\": kb_news, \"count\": len(kb_news)}\n        \n        except Exception as e:\n            logger.debug(f\"KB 搜索出错: {e}\")\n        \n        return None\n    \n    async def _generate_summary(self, query: str, data: Dict[str, Any]) -> str:\n        \"\"\"生成数据摘要\"\"\"\n        summary_parts = [f\"## 搜索结果: {query}\\n\"]\n        \n        # AkShare 数据摘要\n        if \"akshare\" in data:\n            ak_data = data[\"akshare\"]\n            summary_parts.append(\"### 📊 财务/行情数据 (AkShare)\\n\")\n            \n            if \"financial_indicators\" in ak_data:\n                fi = ak_data[\"financial_indicators\"]\n                summary_parts.append(f\"- PE: {fi.get('pe_ratio', 'N/A')}, PB: {fi.get('pb_ratio', 'N/A')}\")\n                summary_parts.append(f\"- ROE: {fi.get('roe', 'N/A')}%, 净利润同比: {fi.get('profit_yoy', 'N/A')}%\")\n            \n            if \"realtime_quote\" in ak_data:\n                rt = ak_data[\"realtime_quote\"]\n                summary_parts.append(f\"- 当前价: {rt.get('price', 'N/A')}元, 涨跌幅: {rt.get('change_percent', 'N/A')}%\")\n            \n            if \"fund_flow\" in ak_data:\n                ff = ak_data[\"fund_flow\"]\n                main_net = ff.get('total_main_net', 0)\n                trend = ff.get('main_flow_trend', 'N/A')\n                summary_parts.append(f\"- 资金流向: 近{ff.get('period_days', 5)}日主力{trend}（净额 {main_net}）\")\n            \n            summary_parts.append(\"\")\n        \n        # 
BochaAI 新闻摘要\n        if \"bochaai\" in data:\n            news = data[\"bochaai\"].get(\"news\", [])\n            if news:\n                summary_parts.append(\"### 📰 最新新闻 (BochaAI)\\n\")\n                for i, n in enumerate(news[:5], 1):\n                    summary_parts.append(f\"{i}. **{n['title'][:50]}**\")\n                    if n.get('snippet'):\n                        summary_parts.append(f\"   {n['snippet'][:100]}...\")\n                summary_parts.append(\"\")\n        \n        # Browser 搜索结果摘要\n        if \"browser\" in data:\n            results = data[\"browser\"].get(\"search_results\", [])\n            if results:\n                summary_parts.append(\"### 🌐 网页搜索结果\\n\")\n                for i, r in enumerate(results[:5], 1):\n                    summary_parts.append(f\"{i}. {r['title'][:50]}\")\n                summary_parts.append(\"\")\n        \n        # KB 历史数据摘要\n        if \"knowledge_base\" in data:\n            kb_news = data[\"knowledge_base\"].get(\"historical_news\", [])\n            if kb_news:\n                summary_parts.append(\"### 📚 历史资料 (知识库)\\n\")\n                for i, n in enumerate(kb_news[:3], 1):\n                    summary_parts.append(f\"{i}. 
{n['title'][:50]}\")\n                summary_parts.append(\"\")\n        \n        return \"\\n\".join(summary_parts)\n    \n    async def process_debate_speech(\n        self,\n        speech_text: str,\n        stock_code: str,\n        stock_name: str,\n        agent_name: str = \"Unknown\"\n    ) -> Dict[str, Any]:\n        \"\"\"\n        处理辩论发言中的搜索请求\n        \n        Args:\n            speech_text: 辩论发言文本\n            stock_code: 股票代码\n            stock_name: 股票名称\n            agent_name: 发言智能体名称\n            \n        Returns:\n            处理结果，包含所有搜索结果和综合摘要\n        \"\"\"\n        logger.info(f\"🔍 SearchAnalyst: 处理 {agent_name} 的发言，检测搜索请求...\")\n        \n        result = {\n            \"agent_name\": agent_name,\n            \"requests_found\": 0,\n            \"search_results\": [],\n            \"combined_summary\": \"\",\n            \"success\": False\n        }\n        \n        # 提取搜索请求\n        requests = self.extract_search_requests(speech_text)\n        result[\"requests_found\"] = len(requests)\n        \n        if not requests:\n            logger.info(f\"📝 {agent_name} 的发言中未包含搜索请求\")\n            result[\"success\"] = True\n            return result\n        \n        logger.info(f\"📋 从 {agent_name} 的发言中提取到 {len(requests)} 个搜索请求\")\n        \n        # 并行执行所有搜索\n        search_tasks = []\n        for req in requests:\n            task = self.search(\n                query=req[\"query\"],\n                source=req[\"source\"],\n                stock_code=stock_code,\n                stock_name=stock_name\n            )\n            search_tasks.append(task)\n        \n        search_results = await asyncio.gather(*search_tasks, return_exceptions=True)\n        \n        # 收集结果\n        summaries = []\n        for i, res in enumerate(search_results):\n            if isinstance(res, Exception):\n                logger.error(f\"搜索请求 {i+1} 失败: {res}\")\n                continue\n            \n            if res.get(\"success\"):\n           
     result[\"search_results\"].append(res)\n                summaries.append(res.get(\"summary\", \"\"))\n        \n        # 生成综合摘要\n        if summaries:\n            result[\"combined_summary\"] = \"\\n---\\n\".join(summaries)\n            result[\"success\"] = True\n        \n        logger.info(f\"✅ SearchAnalyst: 为 {agent_name} 完成 {len(result['search_results'])} 个搜索请求\")\n        \n        return result\n    \n    async def smart_data_supplement(\n        self,\n        stock_code: str,\n        stock_name: str,\n        existing_context: str,\n        debate_history: List[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"\n        智能数据补充\n        \n        分析辩论历史和现有上下文，主动识别缺失的关键数据并补充\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            existing_context: 现有上下文\n            debate_history: 辩论历史\n            \n        Returns:\n            补充的数据和摘要\n        \"\"\"\n        logger.info(f\"🧠 SearchAnalyst: 智能分析数据缺口...\")\n        \n        # 使用 LLM 分析需要什么数据\n        analysis_prompt = f\"\"\"你是一位金融数据分析专家。请分析以下辩论情况，判断还需要哪些数据支撑：\n\n【股票】{stock_name} ({stock_code})\n\n【现有数据】\n{existing_context[:1500]}\n\n【辩论历史】\n{self._format_debate_history(debate_history[-4:])}\n\n请判断：\n1. 看多方缺少什么关键数据？\n2. 看空方缺少什么关键数据？\n3. 
还需要搜索什么信息？\n\n请按以下格式输出需要搜索的内容（每行一个）：\n[SEARCH: \"搜索内容\" source:数据源]\n\n可用数据源：akshare（财务/行情）, bochaai（新闻）, browser（网页搜索）\n\n只输出3-5个最关键的搜索请求。\"\"\"\n\n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": analysis_prompt}\n            ])\n            \n            llm_response = response.content if hasattr(response, 'content') else str(response)\n            \n            # 处理 LLM 建议的搜索\n            return await self.process_debate_speech(\n                speech_text=llm_response,\n                stock_code=stock_code,\n                stock_name=stock_name,\n                agent_name=\"SmartSupplement\"\n            )\n            \n        except Exception as e:\n            logger.error(f\"智能数据补充失败: {e}\")\n            return {\"success\": False, \"error\": str(e)}\n    \n    def _format_debate_history(self, history: List[Dict[str, Any]]) -> str:\n        \"\"\"格式化辩论历史\"\"\"\n        if not history:\n            return \"（暂无辩论历史）\"\n        \n        lines = []\n        for item in history:\n            agent = item.get(\"agent\", \"Unknown\")\n            content = item.get(\"content\", \"\")[:300]\n            lines.append(f\"[{agent}]: {content}\")\n        \n        return \"\\n\\n\".join(lines)\n\n\n# 工厂函数\ndef create_search_analyst(llm_provider=None) -> SearchAnalystAgent:\n    \"\"\"创建搜索分析师实例\"\"\"\n    return SearchAnalystAgent(llm_provider=llm_provider)\n\n"
  },
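The agent above routes `[SEARCH: "query" source:...]` directives that it extracts from debate speech via `extract_search_requests`; the directive format is spelled out in the `smart_data_supplement` prompt (`[SEARCH: "搜索内容" source:数据源]`), but the extraction method itself is outside this excerpt. A minimal, hypothetical sketch of how such directives could be parsed (the regex and return shape are assumptions, not the project's actual implementation):

```python
import re
from typing import Dict, List

# NOTE: hypothetical sketch; the real extract_search_requests is not shown in
# this excerpt. Directive format taken from the smart_data_supplement prompt:
#   [SEARCH: "搜索内容" source:数据源]
SEARCH_PATTERN = re.compile(r'\[SEARCH:\s*"([^"]+)"\s+source:(\w+)\]')

def extract_search_requests(speech_text: str) -> List[Dict[str, str]]:
    """Return one {"query", "source"} dict per [SEARCH: ...] directive."""
    return [
        {"query": query, "source": source}
        for query, source in SEARCH_PATTERN.findall(speech_text)
    ]
```

Each extracted dict maps directly onto the `query`/`source` arguments that `process_debate_speech` passes to `self.search`.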
  {
    "path": "backend/app/alpha_mining/README.md",
"content": "# M12: Alpha Mining (Quantitative Factor Mining Module)\n\nA quantitative factor auto-mining module based on AlphaGPT, using symbolic regression + reinforcement learning to automatically discover trading factors with predictive power.\n\n## Features\n\n- **Automatic factor discovery**: a Transformer + RL pipeline automatically generates and optimizes factor expressions\n- **DSL expression system**: rich time-series operators (MA, STD, DELAY, DELTA, etc.)\n- **Sentiment feature fusion**: optionally combines news sentiment analysis to improve factor quality\n- **Backtest evaluation**: built-in Sortino/Sharpe/IC and other evaluation metrics\n- **AgenticX integration**: BaseTool wrappers for Agent use\n\n## Module Structure\n\n```\nalpha_mining/\n├── __init__.py          # Module entry\n├── config.py            # Configuration management\n├── utils.py             # Utility functions\n├── dsl/                 # Factor expression DSL\n│   ├── ops.py          # Operator definitions\n│   └── vocab.py        # Vocabulary management\n├── vm/                  # Factor executor\n│   └── factor_vm.py    # Stack-based virtual machine\n├── model/               # Generative model\n│   ├── alpha_generator.py  # Transformer policy network\n│   └── trainer.py      # RL trainer\n├── features/            # Feature construction\n│   ├── market.py       # Market features\n│   └── sentiment.py    # Sentiment features\n├── backtest/            # Backtest evaluation\n│   └── evaluator.py    # Factor evaluator\n└── tools/               # AgenticX tools\n    └── alpha_mining_tool.py\n```\n\n## Quick Start\n\n### Basic Usage\n\n```python\nfrom app.alpha_mining import (\n    AlphaGenerator,\n    AlphaTrainer,\n    FactorVM,\n    FactorEvaluator,\n    generate_mock_data\n)\n\n# 1. Prepare data\nfeatures, returns = generate_mock_data(\n    num_samples=50,\n    num_features=6,\n    time_steps=252\n)\n\n# 2. Create the trainer\ntrainer = AlphaTrainer()\n\n# 3. Train to mine factors\nresult = trainer.train(\n    features=features,\n    returns=returns,\n    num_steps=100\n)\n\nprint(f\"Best factor: {result['best_formula_str']}\")\nprint(f\"Score: {result['best_score']:.4f}\")\n```\n\n### Using QuantitativeAgent\n\n```python\nfrom app.agents import QuantitativeAgent\n\n# Create the agent\nagent = QuantitativeAgent(\n    llm_provider=llm,\n    enable_alpha_mining=True\n)\n\n# Run the analysis\nresult = await agent.analyze(\n    stock_code=\"000001\",\n    stock_name=\"平安银行\",\n    market_data=market_data,\n    sentiment_data=sentiment_data\n)\n\n# Inspect the discovered factors\nfor factor in result[\"factors_discovered\"]:\n    print(f\"{factor['formula_str']}: Sortino={factor['sortino']:.2f}\")\n```\n\n### REST API\n\n```bash\n# Start a factor-mining job\ncurl -X POST http://localhost:8000/api/v1/alpha-mining/mine \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"num_steps\": 100, \"use_sentiment\": true}'\n\n# Evaluate a factor expression\ncurl -X POST http://localhost:8000/api/v1/alpha-mining/evaluate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"formula\": \"ADD(RET, MA5(VOL))\"}'\n\n# List discovered factors\ncurl http://localhost:8000/api/v1/alpha-mining/factors?top_k=10\n```\n\n## Supported Operators\n\n### Arithmetic Operators\n| Operator | Arity | Description |\n|--------|--------|------|\n| ADD | 2 | Addition |\n| SUB | 2 | Subtraction |\n| MUL | 2 | Multiplication |\n| DIV | 2 | Division |\n\n### Unary Operators\n| Operator | Arity | Description |\n|--------|--------|------|\n| NEG | 1 | Negation |\n| ABS | 1 | Absolute value |\n| SIGN | 1 | Sign function |\n\n### Time-Series Operators\n| Operator | Arity | Description |\n|--------|--------|------|\n| DELAY1/5 | 1 | Delay by 1/5 periods |\n| DELTA1/5 | 1 | Difference over 1/5 periods |\n| MA5/10 | 1 | 5/10-day moving average |\n| STD5/10 | 1 | 5/10-day rolling standard deviation |\n\n### Conditional Operators\n| Operator | Arity | Description |\n|--------|--------|------|\n| GATE | 3 | Conditional select |\n| MAX | 2 | Element-wise maximum |\n| MIN | 2 | Element-wise minimum |\n\n## Feature List\n\n| Feature | Description | Data Source |\n|------|------|----------|\n| RET | Return | Market data |\n| VOL | Volatility | Market data |\n| VOLUME_CHG | Volume change | Market data |\n| TURNOVER | Turnover rate | Market data |\n| SENTIMENT | Sentiment score | News analysis |\n| NEWS_COUNT | News count | News analysis |\n\n## Evaluation Metrics\n\n- **Sortino Ratio**: risk-adjusted return (downside risk only)\n- **Sharpe Ratio**: risk-adjusted return\n- **IC**: information coefficient (correlation between factor and returns)\n- **Rank IC**: rank information coefficient\n- **Max Drawdown**: maximum drawdown\n- **Turnover**: turnover rate\n\n## Configuration Options\n\n```python\nfrom app.alpha_mining import AlphaMiningConfig\n\nconfig = AlphaMiningConfig(\n    # Model parameters\n    d_model=64,           # Transformer hidden dimension\n    num_layers=2,         # Number of Transformer layers\n    nhead=4,              # Number of attention heads\n    max_seq_len=12,       # Maximum sequence length\n    \n    # Training parameters\n    batch_size=1024,      # Batch size\n    lr=1e-3,              # Learning rate\n    num_steps=1000,       # Number of training steps\n    \n    # Reward parameters\n    invalid_formula_reward=-5.0,  # Penalty for invalid formulas\n    constant_factor_reward=-2.0,  # Penalty for constant factors\n    \n    # Backtest parameters\n    cost_rate=0.0015,     # Transaction cost rate\n    signal_threshold=0.7, # Signal threshold\n    \n    # Feature configuration\n    enable_sentiment=True,  # Enable sentiment features\n)\n```\n\n## References\n\n- [AlphaGPT](https://github.com/imbue-bit/AlphaGPT) - original implementation"
  },
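The README above evaluates prefix-notation formulas such as `ADD(RET, MA5(VOL))`. The module's actual `FactorVM` executes token sequences on a stack over torch tensors; purely as a torch-free illustration of the DSL semantics, here is a hypothetical recursive evaluator over plain Python lists (the operator subset and the zero-padded moving average mirror `dsl/ops.py`):

```python
from typing import Dict, List

def ma(xs: List[float], window: int) -> List[float]:
    """Zero-padded moving average, mirroring ts_mean's padding behaviour."""
    padded = [0.0] * (window - 1) + list(xs)
    return [sum(padded[i:i + window]) / window for i in range(len(xs))]

# Small operator subset; the full table lives in dsl/ops.py.
OPS = {
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "MUL": lambda a, b: [x * y for x, y in zip(a, b)],
    "NEG": lambda a: [-x for x in a],
    "MA5": lambda a: ma(a, 5),
}

def eval_formula(node, features: Dict[str, List[float]]) -> List[float]:
    """node is a feature name like "RET" or a tuple like ("ADD", lhs, rhs)."""
    if isinstance(node, str):
        return features[node]
    op, *args = node
    return OPS[op](*(eval_formula(a, features) for a in args))
```

So `("ADD", "RET", ("MA5", "VOL"))` corresponds to the README's `ADD(RET, MA5(VOL))`; note the zero padding makes the first `window - 1` MA values partial averages.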
  {
    "path": "backend/app/alpha_mining/__init__.py",
    "content": "\"\"\"\nM12: Alpha Mining Module for FinnewsHunter\n\n基于 AlphaGPT 技术的量化因子自动挖掘模块。\n使用符号回归 + 强化学习自动发现有预测能力的交易因子。\n\n核心组件：\n- dsl: 因子表达式 DSL（操作符、词汇表）\n- vm: 因子执行器（StackVM）\n- model: 因子生成模型（AlphaGenerator）和训练器（AlphaTrainer）\n- features: 特征构建器（行情、情感）\n- backtest: 因子回测评估\n- tools: AgenticX 工具封装\n\nReferences:\n- AlphaGPT: https://github.com/imbue-bit/AlphaGPT\n- 技术方案: researches/AlphaGPT/AlphaGPT_proposal.md\n\"\"\"\n\n__version__ = \"0.1.0\"\n__author__ = \"FinnewsHunter Team\"\n\nfrom .config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom .dsl.vocab import FactorVocab, DEFAULT_VOCAB\nfrom .dsl.ops import OPS_CONFIG\nfrom .vm.factor_vm import FactorVM\nfrom .model.alpha_generator import AlphaGenerator\nfrom .model.trainer import AlphaTrainer\nfrom .features.market import MarketFeatureBuilder\nfrom .features.sentiment import SentimentFeatureBuilder\nfrom .backtest.evaluator import FactorEvaluator\nfrom .utils import generate_mock_data\n\n__all__ = [\n    # Config\n    \"AlphaMiningConfig\",\n    \"DEFAULT_CONFIG\",\n    # DSL\n    \"FactorVocab\",\n    \"DEFAULT_VOCAB\",\n    \"OPS_CONFIG\",\n    # VM\n    \"FactorVM\",\n    # Model\n    \"AlphaGenerator\",\n    \"AlphaTrainer\",\n    # Features\n    \"MarketFeatureBuilder\",\n    \"SentimentFeatureBuilder\",\n    # Backtest\n    \"FactorEvaluator\",\n    # Utils\n    \"generate_mock_data\",\n]\n"
  },
  {
    "path": "backend/app/alpha_mining/backtest/__init__.py",
    "content": "\"\"\"\n因子回测评估模块\n\n提供因子有效性评估，包括 Sortino Ratio 等指标计算。\n\"\"\"\n\nfrom .evaluator import FactorEvaluator\n\n__all__ = [\"FactorEvaluator\"]\n"
  },
  {
    "path": "backend/app/alpha_mining/backtest/evaluator.py",
"content": "\"\"\"\n因子回测评估器\n\n评估因子的预测能力和交易表现。\n\n评估指标：\n- Sortino Ratio: 风险调整收益（只考虑下行风险）\n- Sharpe Ratio: 风险调整收益\n- IC: 信息系数（因子与收益的相关性）\n- Rank IC: 排名信息系数\n- Turnover: 换手率\n- Max Drawdown: 最大回撤\n\"\"\"\n\nimport torch\nfrom typing import Dict, Optional, List, Tuple\nimport numpy as np\nimport logging\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\n\nlogger = logging.getLogger(__name__)\n\n\nclass FactorEvaluator:\n    \"\"\"\n    因子回测评估器\n    \n    评估因子表达式的有效性和收益表现。\n    \n    Args:\n        config: 配置实例\n        cost_rate: 交易成本率\n        signal_threshold: 信号阈值（用于生成持仓）\n        \n    Example:\n        evaluator = FactorEvaluator()\n        metrics = evaluator.evaluate(factor, returns)\n    \"\"\"\n    \n    def __init__(\n        self,\n        config: Optional[AlphaMiningConfig] = None,\n        cost_rate: Optional[float] = None,\n        signal_threshold: Optional[float] = None\n    ):\n        self.config = config or DEFAULT_CONFIG\n        # 显式判 None，避免传入 0 时被 or 误判而回退到默认配置\n        self.cost_rate = cost_rate if cost_rate is not None else self.config.cost_rate\n        self.signal_threshold = signal_threshold if signal_threshold is not None else self.config.signal_threshold\n        \n        # 年化系数（假设 252 个交易日）\n        self.annualize_factor = np.sqrt(252)\n        \n        logger.info(\n            f\"FactorEvaluator initialized: \"\n            f\"cost_rate={self.cost_rate}, threshold={self.signal_threshold}\"\n        )\n    \n    def evaluate(\n        self,\n        factor: torch.Tensor,\n        returns: torch.Tensor,\n        benchmark: Optional[torch.Tensor] = None\n    ) -> Dict[str, float]:\n        \"\"\"\n        综合评估因子\n        \n        Args:\n            factor: 因子值 [batch, time_steps] 或 [time_steps]\n            returns: 收益率 [batch, time_steps] 或 [time_steps]\n            benchmark: 基准收益率（可选）\n            \n        Returns:\n            评估指标字典\n        \"\"\"\n        # 确保是 2D\n        if factor.dim() == 1:\n            factor = factor.unsqueeze(0)\n        if returns.dim() == 1:\n            returns = returns.unsqueeze(0)\n       
 \n        # 对每个样本计算指标，然后平均\n        metrics_list = []\n        for i in range(factor.size(0)):\n            f = factor[i]\n            r = returns[i]\n            b = benchmark[i] if benchmark is not None else None\n            \n            m = self._evaluate_single(f, r, b)\n            metrics_list.append(m)\n        \n        # 聚合指标\n        result = {}\n        for key in metrics_list[0].keys():\n            values = [m[key] for m in metrics_list]\n            result[key] = np.mean(values)\n            result[f\"{key}_std\"] = np.std(values)\n        \n        return result\n    \n    def _evaluate_single(\n        self,\n        factor: torch.Tensor,\n        returns: torch.Tensor,\n        benchmark: Optional[torch.Tensor] = None\n    ) -> Dict[str, float]:\n        \"\"\"评估单个样本\"\"\"\n        # 转换为 numpy\n        factor_np = factor.detach().cpu().numpy()\n        returns_np = returns.detach().cpu().numpy()\n        \n        # 生成信号和持仓\n        signal = self._factor_to_signal(factor_np)\n        position = self._signal_to_position(signal)\n        \n        # 计算策略收益\n        strategy_returns = position[:-1] * returns_np[1:]\n        \n        # 计算交易成本\n        turnover = np.abs(np.diff(position)).mean()\n        net_returns = strategy_returns - turnover * self.cost_rate\n        \n        # 计算各指标\n        metrics = {\n            \"sortino_ratio\": self._calc_sortino(net_returns),\n            \"sharpe_ratio\": self._calc_sharpe(net_returns),\n            \"ic\": self._calc_ic(factor_np, returns_np),\n            \"rank_ic\": self._calc_rank_ic(factor_np, returns_np),\n            \"turnover\": turnover,\n            \"max_drawdown\": self._calc_max_drawdown(net_returns),\n            \"total_return\": np.sum(net_returns),\n            \"win_rate\": np.mean(net_returns > 0),\n            \"avg_return\": np.mean(net_returns),\n        }\n        \n        # 相对基准的超额收益\n        if benchmark is not None:\n            benchmark_np = 
benchmark.detach().cpu().numpy()\n            excess_returns = net_returns - benchmark_np[1:]\n            metrics[\"excess_return\"] = np.sum(excess_returns)\n            metrics[\"information_ratio\"] = self._calc_sharpe(excess_returns)\n        \n        return metrics\n    \n    def _factor_to_signal(self, factor: np.ndarray) -> np.ndarray:\n        \"\"\"因子值转换为信号（-1 到 1）\"\"\"\n        # 使用 Z-score 标准化\n        mean = np.mean(factor)\n        std = np.std(factor) + 1e-8\n        z_score = (factor - mean) / std\n        \n        # Sigmoid 映射到 (-1, 1)\n        signal = 2 / (1 + np.exp(-z_score)) - 1\n        \n        return signal\n    \n    def _signal_to_position(self, signal: np.ndarray) -> np.ndarray:\n        \"\"\"信号转换为持仓\"\"\"\n        position = np.zeros_like(signal)\n        \n        # 信号大于阈值时做多\n        position[signal > self.signal_threshold] = 1.0\n        # 信号小于负阈值时做空\n        position[signal < -self.signal_threshold] = -1.0\n        # 中间区域不持仓\n        \n        return position\n    \n    def _calc_sortino(self, returns: np.ndarray) -> float:\n        \"\"\"\n        计算 Sortino Ratio\n        \n        只考虑下行风险（负收益的标准差）\n        \"\"\"\n        mean_return = np.mean(returns)\n        downside = returns[returns < 0]\n        \n        if len(downside) == 0:\n            return float('inf') if mean_return > 0 else 0.0\n        \n        downside_std = np.std(downside) + 1e-8\n        sortino = mean_return / downside_std * self.annualize_factor\n        \n        return float(sortino)\n    \n    def _calc_sharpe(self, returns: np.ndarray) -> float:\n        \"\"\"计算 Sharpe Ratio\"\"\"\n        mean_return = np.mean(returns)\n        std_return = np.std(returns) + 1e-8\n        \n        sharpe = mean_return / std_return * self.annualize_factor\n        return float(sharpe)\n    \n    def _calc_ic(self, factor: np.ndarray, returns: np.ndarray) -> float:\n        \"\"\"\n        计算 IC (Information Coefficient)\n        \n        因子值与下期收益的 Pearson 
相关系数\n        \"\"\"\n        # 对齐：factor[t] 预测 returns[t+1]\n        factor_lag = factor[:-1]\n        returns_lead = returns[1:]\n        \n        # Pearson 相关\n        corr = np.corrcoef(factor_lag, returns_lead)[0, 1]\n        \n        return float(corr) if not np.isnan(corr) else 0.0\n    \n    def _calc_rank_ic(self, factor: np.ndarray, returns: np.ndarray) -> float:\n        \"\"\"\n        计算 Rank IC\n        \n        因子排名与收益排名的 Spearman 相关系数\n        \"\"\"\n        from scipy.stats import spearmanr\n        \n        factor_lag = factor[:-1]\n        returns_lead = returns[1:]\n        \n        try:\n            corr, _ = spearmanr(factor_lag, returns_lead)\n            return float(corr) if not np.isnan(corr) else 0.0\n        except Exception:\n            return 0.0\n    \n    def _calc_max_drawdown(self, returns: np.ndarray) -> float:\n        \"\"\"计算最大回撤\"\"\"\n        cumulative = np.cumsum(returns)\n        running_max = np.maximum.accumulate(cumulative)\n        drawdown = running_max - cumulative\n        \n        max_dd = np.max(drawdown)\n        return float(max_dd)\n    \n    def get_reward(\n        self,\n        factor: torch.Tensor,\n        returns: torch.Tensor\n    ) -> float:\n        \"\"\"\n        获取强化学习奖励\n        \n        使用 Sortino Ratio 作为奖励信号。\n        \n        Args:\n            factor: 因子值\n            returns: 收益率\n            \n        Returns:\n            奖励值\n        \"\"\"\n        metrics = self.evaluate(factor, returns)\n        \n        # 主要使用 Sortino Ratio\n        reward = metrics[\"sortino_ratio\"]\n        \n        # 惩罚过高的换手率\n        if metrics[\"turnover\"] > 0.5:\n            reward -= (metrics[\"turnover\"] - 0.5) * 2\n        \n        # 惩罚过大的最大回撤\n        if metrics[\"max_drawdown\"] > 0.2:\n            reward -= (metrics[\"max_drawdown\"] - 0.2) * 5\n        \n        return float(reward)\n    \n    def compare_factors(\n        self,\n        factors: List[torch.Tensor],\n        returns: 
torch.Tensor,\n        factor_names: Optional[List[str]] = None\n    ) -> Dict[str, Dict[str, float]]:\n        \"\"\"\n        比较多个因子的表现\n        \n        Args:\n            factors: 因子列表\n            returns: 收益率\n            factor_names: 因子名称列表\n            \n        Returns:\n            {factor_name: metrics_dict}\n        \"\"\"\n        if factor_names is None:\n            factor_names = [f\"factor_{i}\" for i in range(len(factors))]\n        \n        results = {}\n        for name, factor in zip(factor_names, factors):\n            results[name] = self.evaluate(factor, returns)\n        \n        return results\n    \n    def rank_factors(\n        self,\n        factors: List[torch.Tensor],\n        returns: torch.Tensor,\n        metric: str = \"sortino_ratio\"\n    ) -> List[Tuple[int, float]]:\n        \"\"\"\n        对因子按指定指标排名\n        \n        Args:\n            factors: 因子列表\n            returns: 收益率\n            metric: 排名指标\n            \n        Returns:\n            [(index, score), ...] 按 score 降序排列\n        \"\"\"\n        scores = []\n        for i, factor in enumerate(factors):\n            metrics = self.evaluate(factor, returns)\n            scores.append((i, metrics.get(metric, 0)))\n        \n        # 降序排列\n        scores.sort(key=lambda x: x[1], reverse=True)\n        \n        return scores\n"
  },
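`FactorEvaluator` turns a raw factor series into positions in two steps: z-score the factor and squash it through a sigmoid into (-1, 1) (`_factor_to_signal`), then threshold at plus/minus `signal_threshold` to get long/flat/short positions (`_signal_to_position`). A pure-Python sketch of those two steps; the real implementation works on numpy arrays:

```python
import math
from typing import List

def factor_to_signal(factor: List[float]) -> List[float]:
    """Z-score the series, then sigmoid-map each value into (-1, 1)."""
    mean = sum(factor) / len(factor)
    std = math.sqrt(sum((x - mean) ** 2 for x in factor) / len(factor)) + 1e-8
    return [2.0 / (1.0 + math.exp(-(x - mean) / std)) - 1.0 for x in factor]

def signal_to_position(signal: List[float], threshold: float = 0.7) -> List[float]:
    """Long above +threshold, short below -threshold, flat in between."""
    return [1.0 if s > threshold else (-1.0 if s < -threshold else 0.0)
            for s in signal]
```

With the default threshold of 0.7 only fairly extreme z-scores (roughly beyond 1.7 sigma) open a position, which is what keeps turnover, and hence the cost penalty in `get_reward`, in check.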
  {
    "path": "backend/app/alpha_mining/config.py",
    "content": "\"\"\"\nAlpha Mining 配置模块\n\n定义训练、模型、回测等配置参数。\n\nReferences:\n- AlphaGPT upstream/model_core/config.py\n\"\"\"\n\nimport torch\nfrom dataclasses import dataclass, field\nfrom typing import List, Optional\n\n\n@dataclass\nclass AlphaMiningConfig:\n    \"\"\"Alpha Mining 模块配置\"\"\"\n    \n    # ============ 设备配置 ============\n    device: str = field(default_factory=lambda: \"cuda\" if torch.cuda.is_available() else \"cpu\")\n    \n    # ============ 模型配置 ============\n    d_model: int = 64              # Transformer 嵌入维度\n    nhead: int = 4                 # 注意力头数\n    num_layers: int = 2            # Transformer 层数\n    max_seq_len: int = 12          # 最大因子表达式长度\n    \n    # ============ 训练配置 ============\n    batch_size: int = 1024         # 批量大小（每批生成的因子数）\n    num_steps: int = 1000          # 训练步数\n    lr: float = 1e-3               # 学习率\n    \n    # ============ 奖励配置 ============\n    invalid_formula_reward: float = -5.0   # 无效公式惩罚\n    constant_factor_reward: float = -2.0   # 常量因子惩罚\n    low_activity_reward: float = -10.0     # 低活跃度惩罚\n    constant_threshold: float = 1e-4       # 常量因子阈值（std < 此值视为常量）\n    \n    # ============ 回测配置 ============\n    cost_rate: float = 0.0015      # A股交易费率（双边约0.3%）\n    signal_threshold: float = 0.7  # 信号阈值（factor > threshold 时建仓）\n    min_holding_days: int = 1      # 最小持仓天数\n    min_activity: int = 5          # 最小活跃度（持仓天数）\n    \n    # ============ 特征配置 ============\n    market_features: List[str] = field(default_factory=lambda: [\n        \"RET\",           # 收益率\n        \"VOL\",           # 波动率  \n        \"VOLUME_CHG\",    # 成交量变化\n        \"TURNOVER\",      # 换手率\n    ])\n    \n    sentiment_features: List[str] = field(default_factory=lambda: [\n        \"SENTIMENT\",     # 情感分数\n        \"NEWS_COUNT\",    # 新闻数量\n    ])\n    \n    enable_sentiment: bool = True  # 是否启用情感特征\n    \n    # ============ 持久化配置 ============\n    checkpoint_dir: str = \"checkpoints/alpha_mining\"\n    save_every_n_steps: int = 100\n 
   \n    @property\n    def torch_device(self) -> torch.device:\n        \"\"\"获取 torch.device 对象\"\"\"\n        return torch.device(self.device)\n    \n    @property\n    def all_features(self) -> List[str]:\n        \"\"\"获取所有启用的特征列表\"\"\"\n        features = self.market_features.copy()\n        if self.enable_sentiment:\n            features.extend(self.sentiment_features)\n        return features\n    \n    @property\n    def num_features(self) -> int:\n        \"\"\"特征数量\"\"\"\n        return len(self.all_features)\n\n\n# 默认配置实例\nDEFAULT_CONFIG = AlphaMiningConfig()\n"
  },
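In the config above, `all_features` concatenates `market_features` with `sentiment_features` when `enable_sentiment` is set, and `num_features` is derived from the result. A torch-free miniature of just that composition logic (field names copied from `AlphaMiningConfig`, everything else stripped):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MiniConfig:
    """Torch-free miniature of AlphaMiningConfig's feature handling only."""
    market_features: List[str] = field(
        default_factory=lambda: ["RET", "VOL", "VOLUME_CHG", "TURNOVER"])
    sentiment_features: List[str] = field(
        default_factory=lambda: ["SENTIMENT", "NEWS_COUNT"])
    enable_sentiment: bool = True

    @property
    def all_features(self) -> List[str]:
        # Copy so callers cannot mutate market_features through the result
        feats = self.market_features.copy()
        if self.enable_sentiment:
            feats.extend(self.sentiment_features)
        return feats

    @property
    def num_features(self) -> int:
        return len(self.all_features)
```

Because `num_features` is computed rather than stored, toggling `enable_sentiment` automatically resizes the feature space the generator and VM see.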
  {
    "path": "backend/app/alpha_mining/dsl/__init__.py",
    "content": "\"\"\"\n因子表达式 DSL（Domain Specific Language）\n\n包含操作符定义和词汇表管理。\n\"\"\"\n\nfrom .ops import OPS_CONFIG, ts_delay, ts_delta, ts_mean, ts_std\nfrom .vocab import FactorVocab, FEATURES\n\n__all__ = [\n    \"OPS_CONFIG\",\n    \"ts_delay\",\n    \"ts_delta\", \n    \"ts_mean\",\n    \"ts_std\",\n    \"FactorVocab\",\n    \"FEATURES\",\n]\n"
  },
  {
    "path": "backend/app/alpha_mining/dsl/ops.py",
    "content": "\"\"\"\n因子操作符定义\n\n定义因子表达式中可用的操作符，包括：\n- 算术运算：ADD, SUB, MUL, DIV\n- 一元运算：NEG, ABS, SIGN\n- 时序运算：DELAY, DELTA, MA, STD\n- 条件运算：GATE, MAX, MIN\n\nReferences:\n- AlphaGPT upstream/model_core/ops.py\n\"\"\"\n\nimport torch\nfrom typing import Callable, Tuple, List\n\n\n# ============================================================================\n# 时序操作函数（优化版本，支持 JIT 编译）\n# ============================================================================\n\ndef ts_delay(x: torch.Tensor, d: int = 1) -> torch.Tensor:\n    \"\"\"\n    时序延迟：将序列向右移动 d 步\n    \n    Args:\n        x: [batch, time_steps] 输入张量\n        d: 延迟步数\n        \n    Returns:\n        延迟后的张量，前 d 个位置填充 0\n    \"\"\"\n    if d == 0:\n        return x\n    if d < 0:\n        raise ValueError(f\"Delay must be non-negative, got {d}\")\n    \n    batch_size = x.shape[0]\n    pad = torch.zeros((batch_size, d), device=x.device, dtype=x.dtype)\n    return torch.cat([pad, x[:, :-d]], dim=1)\n\n\ndef ts_delta(x: torch.Tensor, d: int = 1) -> torch.Tensor:\n    \"\"\"\n    时序差分：计算 x[t] - x[t-d]\n    \n    Args:\n        x: [batch, time_steps] 输入张量\n        d: 差分步数\n        \n    Returns:\n        差分后的张量\n    \"\"\"\n    return x - ts_delay(x, d)\n\n\ndef ts_mean(x: torch.Tensor, window: int = 5) -> torch.Tensor:\n    \"\"\"\n    滑动平均\n    \n    Args:\n        x: [batch, time_steps] 输入张量\n        window: 窗口大小\n        \n    Returns:\n        滑动平均后的张量\n    \"\"\"\n    if window <= 0:\n        raise ValueError(f\"Window must be positive, got {window}\")\n    \n    # 使用 unfold 实现滑动窗口\n    batch_size, time_steps = x.shape\n    \n    # Padding\n    pad = torch.zeros((batch_size, window - 1), device=x.device, dtype=x.dtype)\n    x_padded = torch.cat([pad, x], dim=1)\n    \n    # 滑动窗口平均\n    result = x_padded.unfold(1, window, 1).mean(dim=-1)\n    return result\n\n\ndef ts_std(x: torch.Tensor, window: int = 5) -> torch.Tensor:\n    \"\"\"\n    滑动标准差\n    \n    Args:\n        x: [batch, time_steps] 输入张量\n        
window: 窗口大小\n        \n    Returns:\n        滑动标准差后的张量\n    \"\"\"\n    if window <= 0:\n        raise ValueError(f\"Window must be positive, got {window}\")\n    \n    batch_size, time_steps = x.shape\n    \n    # Padding\n    pad = torch.zeros((batch_size, window - 1), device=x.device, dtype=x.dtype)\n    x_padded = torch.cat([pad, x], dim=1)\n    \n    # 滑动窗口标准差\n    result = x_padded.unfold(1, window, 1).std(dim=-1)\n    return result\n\n\ndef _op_gate(condition: torch.Tensor, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    条件选择：condition > 0 时返回 x，否则返回 y\n    \n    类似于 torch.where(condition > 0, x, y)\n    \"\"\"\n    mask = (condition > 0).float()\n    return mask * x + (1.0 - mask) * y\n\n\ndef _op_jump(x: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    跳跃检测：返回超过 3 sigma 的异常值\n    \n    用于检测价格跳跃/异常波动\n    \"\"\"\n    mean = x.mean(dim=1, keepdim=True)\n    std = x.std(dim=1, keepdim=True) + 1e-6\n    z = (x - mean) / std\n    return torch.relu(z - 3.0)\n\n\ndef _op_decay(x: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    衰减加权：x + 0.8*x[-1] + 0.6*x[-2]\n    \n    给近期数据更高权重\n    \"\"\"\n    return x + 0.8 * ts_delay(x, 1) + 0.6 * ts_delay(x, 2)\n\n\ndef _op_max3(x: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    3 期最大值\n    \"\"\"\n    return torch.max(x, torch.max(ts_delay(x, 1), ts_delay(x, 2)))\n\n\n# ============================================================================\n# 操作符配置\n# ============================================================================\n\n# 操作符配置格式：(name, function, arity)\n# - name: 操作符名称\n# - function: 操作符函数\n# - arity: 参数数量（1=一元，2=二元，3=三元）\n\nOPS_CONFIG: List[Tuple[str, Callable, int]] = [\n    # 二元算术运算\n    ('ADD', lambda x, y: x + y, 2),\n    ('SUB', lambda x, y: x - y, 2),\n    ('MUL', lambda x, y: x * y, 2),\n    ('DIV', lambda x, y: x / (y + 1e-6), 2),  # 安全除法\n    \n    # 一元运算\n    ('NEG', lambda x: -x, 1),\n    ('ABS', torch.abs, 1),\n    ('SIGN', torch.sign, 1),\n    \n    # 条件运算\n    ('GATE', 
_op_gate, 3),  # 条件选择\n    ('MAX', lambda x, y: torch.max(x, y), 2),\n    ('MIN', lambda x, y: torch.min(x, y), 2),\n    \n    # 时序运算\n    ('DELAY1', lambda x: ts_delay(x, 1), 1),\n    ('DELAY5', lambda x: ts_delay(x, 5), 1),\n    ('DELTA1', lambda x: ts_delta(x, 1), 1),\n    ('DELTA5', lambda x: ts_delta(x, 5), 1),\n    ('MA5', lambda x: ts_mean(x, 5), 1),\n    ('MA10', lambda x: ts_mean(x, 10), 1),\n    ('STD5', lambda x: ts_std(x, 5), 1),\n    ('STD10', lambda x: ts_std(x, 10), 1),\n    \n    # 特殊运算\n    ('JUMP', _op_jump, 1),\n    ('DECAY', _op_decay, 1),\n    ('MAX3', _op_max3, 1),\n]\n\n\ndef get_op_names() -> List[str]:\n    \"\"\"获取所有操作符名称\"\"\"\n    return [op[0] for op in OPS_CONFIG]\n\n\ndef get_op_by_name(name: str) -> Tuple[Callable, int]:\n    \"\"\"\n    根据名称获取操作符函数和参数数量\n    \n    Args:\n        name: 操作符名称\n        \n    Returns:\n        (function, arity) 元组\n        \n    Raises:\n        KeyError: 如果操作符不存在\n    \"\"\"\n    for op_name, func, arity in OPS_CONFIG:\n        if op_name == name:\n            return func, arity\n    raise KeyError(f\"Unknown operator: {name}\")\n\n\ndef get_num_ops() -> int:\n    \"\"\"获取操作符数量\"\"\"\n    return len(OPS_CONFIG)\n"
  },
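In `ops.py` above, `ts_delay` shifts a series right by `d` steps and zero-fills the front, and `ts_delta` is `x[t] - x[t-d]`. A torch-free, single-series sketch of those semantics (the real ops operate on `[batch, time_steps]` tensors):

```python
from typing import List

def ts_delay(xs: List[float], d: int = 1) -> List[float]:
    """Shift right by d steps, zero-filling the first d positions."""
    if d == 0:
        return list(xs)
    if d < 0:
        raise ValueError(f"Delay must be non-negative, got {d}")
    return [0.0] * d + list(xs[:-d])

def ts_delta(xs: List[float], d: int = 1) -> List[float]:
    """x[t] - x[t-d], using the zero-filled head from ts_delay."""
    return [x - y for x, y in zip(xs, ts_delay(xs, d))]
```

The zero padding means the first `d` delta values compare against 0 rather than real history, which is the same warm-up artifact the tensor versions have.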
  {
    "path": "backend/app/alpha_mining/dsl/vocab.py",
    "content": "\"\"\"\n因子词汇表管理\n\n管理因子表达式中的 token 词汇表，包括：\n- 特征 token（RET, VOL, VOLUME_CHG 等）\n- 操作符 token（ADD, SUB, MUL 等）\n\n提供 token <-> name 双向映射。\n\nReferences:\n- AlphaGPT upstream/model_core/alphagpt.py:10-14\n\"\"\"\n\nfrom typing import List, Dict, Optional\nfrom dataclasses import dataclass, field\n\nfrom .ops import OPS_CONFIG, get_op_names\n\n\n# 默认特征列表\nFEATURES: List[str] = [\n    \"RET\",           # 收益率\n    \"VOL\",           # 波动率\n    \"VOLUME_CHG\",    # 成交量变化\n    \"TURNOVER\",      # 换手率\n    \"SENTIMENT\",     # 情感分数\n    \"NEWS_COUNT\",    # 新闻数量\n]\n\n\n@dataclass\nclass FactorVocab:\n    \"\"\"\n    因子词汇表\n    \n    词汇表结构：[FEATURES..., OPERATORS...]\n    - 前 num_features 个 token 是特征\n    - 后 num_ops 个 token 是操作符\n    \n    Example:\n        vocab = FactorVocab(features=[\"RET\", \"VOL\"])\n        vocab.token_to_name(0)  # -> \"RET\"\n        vocab.name_to_token(\"ADD\")  # -> 2 (假设有 2 个特征)\n    \"\"\"\n    \n    features: List[str] = field(default_factory=lambda: FEATURES.copy())\n    \n    def __post_init__(self):\n        \"\"\"初始化词汇表映射\"\"\"\n        self._operators = get_op_names()\n        self._vocab = self.features + self._operators\n        \n        # 构建映射\n        self._token_to_name: Dict[int, str] = {\n            i: name for i, name in enumerate(self._vocab)\n        }\n        self._name_to_token: Dict[str, int] = {\n            name: i for i, name in enumerate(self._vocab)\n        }\n    \n    @property\n    def vocab_size(self) -> int:\n        \"\"\"词汇表大小\"\"\"\n        return len(self._vocab)\n    \n    @property\n    def num_features(self) -> int:\n        \"\"\"特征数量\"\"\"\n        return len(self.features)\n    \n    @property\n    def num_ops(self) -> int:\n        \"\"\"操作符数量\"\"\"\n        return len(self._operators)\n    \n    @property\n    def feature_offset(self) -> int:\n        \"\"\"特征 token 的结束位置（也是操作符的起始位置）\"\"\"\n        return self.num_features\n    \n    def token_to_name(self, token: int) -> str:\n    
    \"\"\"\n        将 token ID 转换为名称\n        \n        Args:\n            token: token ID\n            \n        Returns:\n            token 对应的名称\n            \n        Raises:\n            KeyError: 如果 token 不存在\n        \"\"\"\n        if token not in self._token_to_name:\n            raise KeyError(f\"Unknown token: {token}\")\n        return self._token_to_name[token]\n    \n    def name_to_token(self, name: str) -> int:\n        \"\"\"\n        将名称转换为 token ID\n        \n        Args:\n            name: 特征或操作符名称\n            \n        Returns:\n            对应的 token ID\n            \n        Raises:\n            KeyError: 如果名称不存在\n        \"\"\"\n        if name not in self._name_to_token:\n            raise KeyError(f\"Unknown name: {name}\")\n        return self._name_to_token[name]\n    \n    def is_feature(self, token: int) -> bool:\n        \"\"\"判断 token 是否为特征\"\"\"\n        return 0 <= token < self.feature_offset\n    \n    def is_operator(self, token: int) -> bool:\n        \"\"\"判断 token 是否为操作符\"\"\"\n        return self.feature_offset <= token < self.vocab_size\n    \n    def get_operator_arity(self, token: int) -> int:\n        \"\"\"\n        获取操作符的参数数量\n        \n        Args:\n            token: 操作符 token ID\n            \n        Returns:\n            参数数量（1, 2 或 3）\n            \n        Raises:\n            ValueError: 如果不是操作符\n        \"\"\"\n        if not self.is_operator(token):\n            raise ValueError(f\"Token {token} is not an operator\")\n        \n        op_index = token - self.feature_offset\n        return OPS_CONFIG[op_index][2]\n    \n    def get_operator_func(self, token: int):\n        \"\"\"\n        获取操作符的函数\n        \n        Args:\n            token: 操作符 token ID\n            \n        Returns:\n            操作符函数\n            \n        Raises:\n            ValueError: 如果不是操作符\n        \"\"\"\n        if not self.is_operator(token):\n            raise ValueError(f\"Token {token} is not an operator\")\n        \n       
 op_index = token - self.feature_offset\n        return OPS_CONFIG[op_index][1]\n    \n    def get_all_tokens(self) -> List[int]:\n        \"\"\"获取所有 token ID\"\"\"\n        return list(range(self.vocab_size))\n    \n    def get_feature_tokens(self) -> List[int]:\n        \"\"\"获取所有特征 token ID\"\"\"\n        return list(range(self.num_features))\n    \n    def get_operator_tokens(self) -> List[int]:\n        \"\"\"获取所有操作符 token ID\"\"\"\n        return list(range(self.feature_offset, self.vocab_size))\n    \n    def __repr__(self) -> str:\n        return f\"FactorVocab(features={self.features}, vocab_size={self.vocab_size})\"\n\n\n# 默认词汇表实例\nDEFAULT_VOCAB = FactorVocab()\n"
  },
  {
    "path": "backend/app/alpha_mining/features/__init__.py",
    "content": "\"\"\"\n特征构建器模块\n\n- MarketFeatureBuilder: 从行情数据构建特征\n- SentimentFeatureBuilder: 从新闻情感分析结果构建特征\n\"\"\"\n\nfrom .market import MarketFeatureBuilder\nfrom .sentiment import SentimentFeatureBuilder\n\n__all__ = [\"MarketFeatureBuilder\", \"SentimentFeatureBuilder\"]\n"
  },
  {
    "path": "backend/app/alpha_mining/features/market.py",
    "content": "\"\"\"\n行情特征构建器\n\n从原始行情数据（OHLCV）构建因子挖掘所需的标准化特征。\n\n特征列表：\n- RET: 收益率\n- VOL: 波动率（滚动标准差）\n- VOLUME_CHG: 成交量变化率\n- TURNOVER: 换手率\n\"\"\"\n\nimport torch\nfrom typing import Dict, List, Optional, Union\nimport pandas as pd\nimport numpy as np\nimport logging\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\n\nlogger = logging.getLogger(__name__)\n\n\nclass MarketFeatureBuilder:\n    \"\"\"\n    行情特征构建器\n    \n    从 OHLCV 数据构建标准化的因子特征。\n    \n    Args:\n        config: 配置实例\n        vol_window: 波动率计算窗口\n        normalize: 是否标准化特征\n        \n    Example:\n        builder = MarketFeatureBuilder()\n        features = builder.build(ohlcv_df)\n    \"\"\"\n    \n    # 支持的特征名称\n    FEATURE_NAMES = [\"RET\", \"VOL\", \"VOLUME_CHG\", \"TURNOVER\"]\n    \n    def __init__(\n        self,\n        config: Optional[AlphaMiningConfig] = None,\n        vol_window: int = 20,\n        normalize: bool = True\n    ):\n        self.config = config or DEFAULT_CONFIG\n        self.vol_window = vol_window\n        self.normalize = normalize\n        \n        logger.info(\n            f\"MarketFeatureBuilder initialized: \"\n            f\"vol_window={vol_window}, normalize={normalize}\"\n        )\n    \n    def build(\n        self,\n        data: Union[pd.DataFrame, Dict[str, torch.Tensor]],\n        device: Optional[torch.device] = None\n    ) -> torch.Tensor:\n        \"\"\"\n        从行情数据构建特征张量\n        \n        Args:\n            data: 行情数据，DataFrame 或张量字典\n                DataFrame 需包含: close, volume, (可选: turnover, shares)\n                Dict 需包含: close, volume\n            device: 目标设备\n            \n        Returns:\n            特征张量 [batch, num_features, time_steps]\n        \"\"\"\n        device = device or self.config.torch_device\n        \n        if isinstance(data, pd.DataFrame):\n            return self._build_from_dataframe(data, device)\n        elif isinstance(data, dict):\n            return self._build_from_tensors(data, device)\n        
else:\n            raise ValueError(f\"Unsupported data type: {type(data)}\")\n    \n    def _build_from_dataframe(\n        self,\n        df: pd.DataFrame,\n        device: torch.device\n    ) -> torch.Tensor:\n        \"\"\"\n        从 DataFrame 构建特征\n        \n        支持两种格式：\n        1. 单股票：index=date, columns=[close, volume, ...]\n        2. 多股票：MultiIndex 或 pivot 后的 DataFrame\n        \"\"\"\n        # 确保列名小写\n        df = df.copy()\n        df.columns = [c.lower() for c in df.columns]\n        \n        # 检查必需列\n        if \"close\" not in df.columns:\n            raise ValueError(\"DataFrame must have 'close' column\")\n        \n        # 计算各特征\n        close = torch.tensor(df[\"close\"].values, dtype=torch.float32)\n        \n        # RET: 收益率\n        ret = self._calc_returns(close)\n        \n        # VOL: 波动率\n        vol = self._calc_volatility(ret, self.vol_window)\n        \n        # VOLUME_CHG: 成交量变化\n        if \"volume\" in df.columns:\n            volume = torch.tensor(df[\"volume\"].values, dtype=torch.float32)\n            volume_chg = self._calc_pct_change(volume)\n        else:\n            volume_chg = torch.zeros_like(ret)\n        \n        # TURNOVER: 换手率\n        if \"turnover\" in df.columns:\n            turnover = torch.tensor(df[\"turnover\"].values, dtype=torch.float32)\n        elif \"volume\" in df.columns and \"shares\" in df.columns:\n            volume = df[\"volume\"].values\n            shares = df[\"shares\"].values\n            turnover = torch.tensor(volume / (shares + 1e-8), dtype=torch.float32)\n        else:\n            turnover = torch.zeros_like(ret)\n        \n        # Stack features: [num_features, time_steps]\n        features = torch.stack([ret, vol, volume_chg, turnover], dim=0)\n        \n        # 标准化\n        if self.normalize:\n            features = self._robust_normalize(features)\n        \n        # 添加 batch 维度: [1, num_features, time_steps]\n        features = features.unsqueeze(0).to(device)\n    
    \n        return features\n    \n    def _build_from_tensors(\n        self,\n        data: Dict[str, torch.Tensor],\n        device: torch.device\n    ) -> torch.Tensor:\n        \"\"\"\n        从张量字典构建特征\n        \n        Args:\n            data: 包含 close, volume 等张量的字典\n                  每个张量形状为 [batch, time_steps] 或 [time_steps]\n        \"\"\"\n        close = data[\"close\"]\n        \n        # 确保是 2D: [batch, time_steps]\n        if close.dim() == 1:\n            close = close.unsqueeze(0)\n        \n        batch_size, time_steps = close.shape\n        \n        # RET\n        ret = self._calc_returns(close)\n        \n        # VOL\n        vol = self._calc_volatility(ret, self.vol_window)\n        \n        # VOLUME_CHG\n        if \"volume\" in data:\n            volume = data[\"volume\"]\n            if volume.dim() == 1:\n                volume = volume.unsqueeze(0)\n            volume_chg = self._calc_pct_change(volume)\n        else:\n            volume_chg = torch.zeros_like(ret)\n        \n        # TURNOVER\n        if \"turnover\" in data:\n            turnover = data[\"turnover\"]\n            if turnover.dim() == 1:\n                turnover = turnover.unsqueeze(0)\n        else:\n            turnover = torch.zeros_like(ret)\n        \n        # Stack: [batch, num_features, time_steps]\n        # 注意：batch_size == 1 时，_calc_* 辅助函数会压掉 batch 维返回 1D 张量，\n        # 这里补回 batch 维，否则 stack(dim=1) 会得到 [time_steps, 4] 的错误形状\n        feats = [f.unsqueeze(0) if f.dim() == 1 else f for f in (ret, vol, volume_chg, turnover)]\n        features = torch.stack(feats, dim=1)\n        \n        # 标准化\n        if self.normalize:\n            features = self._robust_normalize(features)\n        \n        return features.to(device)\n    \n    def _calc_returns(self, close: torch.Tensor) -> torch.Tensor:\n        \"\"\"计算收益率\"\"\"\n        # close: [batch, time] or [time]\n        if close.dim() == 1:\n            close = close.unsqueeze(0)\n        \n        prev_close = torch.roll(close, 1, dims=-1)\n        prev_close[..., 0] = close[..., 0]\n        \n        returns = (close - prev_close) / (prev_close + 1e-8)\n        returns[..., 0] = 0  # 第一个收益率设为 0\n   
     \n        return returns.squeeze(0) if close.size(0) == 1 else returns\n    \n    def _calc_volatility(self, returns: torch.Tensor, window: int) -> torch.Tensor:\n        \"\"\"计算滚动波动率\"\"\"\n        if returns.dim() == 1:\n            returns = returns.unsqueeze(0)\n        \n        batch_size, time_steps = returns.shape\n        \n        # Padding\n        pad = torch.zeros((batch_size, window - 1), device=returns.device)\n        padded = torch.cat([pad, returns], dim=-1)\n        \n        # 滚动标准差\n        vol = padded.unfold(-1, window, 1).std(dim=-1)\n        \n        return vol.squeeze(0) if batch_size == 1 else vol\n    \n    def _calc_pct_change(self, x: torch.Tensor) -> torch.Tensor:\n        \"\"\"计算百分比变化\"\"\"\n        if x.dim() == 1:\n            x = x.unsqueeze(0)\n        \n        prev = torch.roll(x, 1, dims=-1)\n        prev[..., 0] = x[..., 0]\n        \n        pct = (x - prev) / (prev + 1e-8)\n        pct[..., 0] = 0\n        \n        return pct.squeeze(0) if x.size(0) == 1 else pct\n    \n    def _robust_normalize(self, features: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        稳健标准化（使用中位数和 MAD）\n        \n        Args:\n            features: [batch, num_features, time_steps] 或 [num_features, time_steps]\n        \"\"\"\n        if features.dim() == 2:\n            features = features.unsqueeze(0)\n        \n        # 计算每个特征的中位数\n        median = features.median(dim=-1, keepdim=True).values\n        \n        # 计算 MAD\n        mad = (features - median).abs().median(dim=-1, keepdim=True).values + 1e-6\n        \n        # 标准化\n        normalized = (features - median) / mad\n        \n        # 裁剪极端值\n        normalized = torch.clamp(normalized, -5.0, 5.0)\n        \n        return normalized\n    \n    def get_feature_names(self) -> List[str]:\n        \"\"\"获取特征名称列表\"\"\"\n        return self.FEATURE_NAMES.copy()\n    \n    def build_batch(\n        self,\n        data_list: List[Union[pd.DataFrame, Dict[str, 
torch.Tensor]]],\n        device: Optional[torch.device] = None\n    ) -> torch.Tensor:\n        \"\"\"\n        批量构建特征\n        \n        Args:\n            data_list: 行情数据列表\n            device: 目标设备\n            \n        Returns:\n            特征张量 [batch, num_features, time_steps]\n        \"\"\"\n        features_list = []\n        for data in data_list:\n            features = self.build(data, device)\n            features_list.append(features)\n        \n        return torch.cat(features_list, dim=0)\n"
  },
  {
    "path": "backend/app/alpha_mining/features/sentiment.py",
    "content": "\"\"\"\n情感特征构建器\n\n从 FinnewsHunter 的新闻分析结果构建情感特征。\n\n特征列表：\n- SENTIMENT: 情感分数（-1 到 1）\n- NEWS_COUNT: 新闻数量（标准化）\n\n与 FinnewsHunter 现有组件集成：\n- 使用 SentimentAgent 的分析结果\n- 从 PostgreSQL/Milvus 获取历史情感数据\n\"\"\"\n\nimport torch\nfrom typing import Dict, List, Optional, Union, Any\nimport pandas as pd\nimport numpy as np\nimport logging\nfrom datetime import datetime, timedelta\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\n\nlogger = logging.getLogger(__name__)\n\n\nclass SentimentFeatureBuilder:\n    \"\"\"\n    情感特征构建器\n    \n    从新闻情感分析结果构建因子特征。\n    \n    Args:\n        config: 配置实例\n        sentiment_decay: 情感衰减因子（用于时序平滑）\n        normalize: 是否标准化特征\n        \n    Example:\n        builder = SentimentFeatureBuilder()\n        features = builder.build(sentiment_data)\n    \"\"\"\n    \n    # 支持的特征名称\n    FEATURE_NAMES = [\"SENTIMENT\", \"NEWS_COUNT\"]\n    \n    def __init__(\n        self,\n        config: Optional[AlphaMiningConfig] = None,\n        sentiment_decay: float = 0.9,\n        normalize: bool = True\n    ):\n        self.config = config or DEFAULT_CONFIG\n        self.sentiment_decay = sentiment_decay\n        self.normalize = normalize\n        \n        logger.info(\n            f\"SentimentFeatureBuilder initialized: \"\n            f\"decay={sentiment_decay}, normalize={normalize}\"\n        )\n    \n    def build(\n        self,\n        data: Union[pd.DataFrame, Dict[str, Any], List[Dict]],\n        time_steps: Optional[int] = None,\n        device: Optional[torch.device] = None\n    ) -> torch.Tensor:\n        \"\"\"\n        从情感数据构建特征张量\n        \n        Args:\n            data: 情感数据，支持多种格式：\n                - DataFrame: columns=[date, sentiment, news_count]\n                - Dict: {\"sentiment\": [...], \"news_count\": [...]}\n                - List[Dict]: [{\"date\": ..., \"sentiment\": ..., \"count\": ...}, ...]\n            time_steps: 目标时间步数（用于对齐行情数据）\n            device: 目标设备\n            \n        Returns:\n    
        特征张量 [1, 2, time_steps] (SENTIMENT, NEWS_COUNT)\n        \"\"\"\n        device = device or self.config.torch_device\n        \n        if isinstance(data, pd.DataFrame):\n            sentiment, news_count = self._parse_dataframe(data)\n        elif isinstance(data, dict):\n            sentiment, news_count = self._parse_dict(data)\n        elif isinstance(data, list):\n            sentiment, news_count = self._parse_list(data)\n        else:\n            raise ValueError(f\"Unsupported data type: {type(data)}\")\n        \n        # 转换为张量\n        sentiment = torch.tensor(sentiment, dtype=torch.float32)\n        news_count = torch.tensor(news_count, dtype=torch.float32)\n        \n        # 对齐时间步\n        if time_steps is not None:\n            sentiment = self._align_time_steps(sentiment, time_steps)\n            news_count = self._align_time_steps(news_count, time_steps)\n        \n        # 应用情感衰减（指数平滑）\n        sentiment = self._apply_decay(sentiment)\n        \n        # Stack: [2, time_steps]\n        features = torch.stack([sentiment, news_count], dim=0)\n        \n        # 标准化\n        if self.normalize:\n            features = self._normalize(features)\n        \n        # 添加 batch 维度: [1, 2, time_steps]\n        features = features.unsqueeze(0).to(device)\n        \n        return features\n    \n    def _parse_dataframe(self, df: pd.DataFrame):\n        \"\"\"从 DataFrame 解析情感数据\"\"\"\n        df = df.copy()\n        df.columns = [c.lower() for c in df.columns]\n        \n        # 情感分数\n        if \"sentiment\" in df.columns:\n            sentiment = df[\"sentiment\"].fillna(0).values\n        elif \"sentiment_score\" in df.columns:\n            sentiment = df[\"sentiment_score\"].fillna(0).values\n        else:\n            sentiment = np.zeros(len(df))\n            logger.warning(\"No sentiment column found, using zeros\")\n        \n        # 新闻数量\n        if \"news_count\" in df.columns:\n            news_count = 
df[\"news_count\"].fillna(0).values\n        elif \"count\" in df.columns:\n            news_count = df[\"count\"].fillna(0).values\n        else:\n            news_count = np.ones(len(df))  # 默认每天 1 条\n        \n        return sentiment, news_count\n    \n    def _parse_dict(self, data: Dict[str, Any]):\n        \"\"\"从字典解析情感数据\"\"\"\n        sentiment = data.get(\"sentiment\", data.get(\"sentiment_score\", []))\n        news_count = data.get(\"news_count\", data.get(\"count\", []))\n        \n        sentiment = np.array(sentiment) if sentiment else np.array([0])\n        news_count = np.array(news_count) if news_count else np.array([1])\n        \n        return sentiment, news_count\n    \n    def _parse_list(self, data: List[Dict]):\n        \"\"\"从列表解析情感数据\"\"\"\n        sentiment = []\n        news_count = []\n        \n        for item in data:\n            s = item.get(\"sentiment\", item.get(\"sentiment_score\", 0))\n            c = item.get(\"news_count\", item.get(\"count\", 1))\n            sentiment.append(s)\n            news_count.append(c)\n        \n        return np.array(sentiment), np.array(news_count)\n    \n    def _align_time_steps(self, x: torch.Tensor, target_len: int) -> torch.Tensor:\n        \"\"\"对齐时间步长度\"\"\"\n        current_len = len(x)\n        \n        if current_len == target_len:\n            return x\n        elif current_len > target_len:\n            # 截取最近的数据\n            return x[-target_len:]\n        else:\n            # 前面填充 0\n            pad = torch.zeros(target_len - current_len)\n            return torch.cat([pad, x])\n    \n    def _apply_decay(self, sentiment: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        应用指数衰减平滑\n        \n        情感影响会随时间衰减，使用指数移动平均来平滑\n        \"\"\"\n        if self.sentiment_decay >= 1.0:\n            return sentiment\n        \n        result = torch.zeros_like(sentiment)\n        result[0] = sentiment[0]\n        \n        for i in range(1, len(sentiment)):\n            result[i] 
= self.sentiment_decay * result[i-1] + (1 - self.sentiment_decay) * sentiment[i]\n        \n        return result\n    \n    def _normalize(self, features: torch.Tensor) -> torch.Tensor:\n        \"\"\"标准化特征\"\"\"\n        # features: [2, time_steps]\n        \n        # SENTIMENT: 已经在 [-1, 1] 范围内，保持不变\n        # NEWS_COUNT: 标准化到 0 均值、1 标准差\n        news_count = features[1]\n        if news_count.std() > 0:\n            features[1] = (news_count - news_count.mean()) / (news_count.std() + 1e-6)\n        \n        # 裁剪极端值\n        features = torch.clamp(features, -5.0, 5.0)\n        \n        return features\n    \n    def get_feature_names(self) -> List[str]:\n        \"\"\"获取特征名称列表\"\"\"\n        return self.FEATURE_NAMES.copy()\n    \n    def build_from_finnews(\n        self,\n        stock_code: str,\n        start_date: datetime,\n        end_date: datetime,\n        db_session: Any = None,\n        device: Optional[torch.device] = None\n    ) -> torch.Tensor:\n        \"\"\"\n        从 FinnewsHunter 数据库构建情感特征\n        \n        Args:\n            stock_code: 股票代码\n            start_date: 开始日期\n            end_date: 结束日期\n            db_session: 数据库会话（可选，用于真实数据）\n            device: 目标设备\n            \n        Returns:\n            特征张量 [1, 2, time_steps]\n        \"\"\"\n        device = device or self.config.torch_device\n        \n        # 计算时间步数（按自然日计，仅为交易日数的近似）\n        time_steps = (end_date - start_date).days\n        \n        if db_session is None:\n            # 无数据库连接时返回模拟数据\n            logger.warning(\"No db_session provided, returning mock sentiment data\")\n            return self._generate_mock_sentiment(time_steps, device)\n        \n        # TODO: 实现真实数据查询\n        # 查询逻辑示例：\n        # query = \"\"\"\n        #     SELECT date, AVG(sentiment_score) as sentiment, COUNT(*) as news_count\n        #     FROM news_analysis\n        #     WHERE stock_code = :code AND date BETWEEN :start AND :end\n        #     GROUP BY date\n        #     ORDER BY date\n        # 
\"\"\"\n        # results = db_session.execute(query, {...})\n        \n        logger.info(f\"Building sentiment features for {stock_code}\")\n        return self._generate_mock_sentiment(time_steps, device)\n    \n    def _generate_mock_sentiment(\n        self,\n        time_steps: int,\n        device: torch.device\n    ) -> torch.Tensor:\n        \"\"\"生成模拟情感数据\"\"\"\n        # 模拟情感分数（正态分布，均值 0）\n        sentiment = torch.randn(time_steps) * 0.3\n        sentiment = torch.clamp(sentiment, -1, 1)\n        \n        # 模拟新闻数量（泊松分布）\n        news_count = torch.abs(torch.randn(time_steps)) * 3 + 1\n        \n        # Stack 并添加 batch 维度\n        features = torch.stack([sentiment, news_count], dim=0)\n        \n        if self.normalize:\n            features = self._normalize(features)\n        \n        return features.unsqueeze(0).to(device)\n    \n    def combine_with_market(\n        self,\n        market_features: torch.Tensor,\n        sentiment_features: torch.Tensor\n    ) -> torch.Tensor:\n        \"\"\"\n        合并行情特征和情感特征\n        \n        Args:\n            market_features: [batch, 4, time_steps] (RET, VOL, VOLUME_CHG, TURNOVER)\n            sentiment_features: [batch, 2, time_steps] (SENTIMENT, NEWS_COUNT)\n            \n        Returns:\n            合并后的特征 [batch, 6, time_steps]\n        \"\"\"\n        return torch.cat([market_features, sentiment_features], dim=1)\n"
  },
  {
    "path": "backend/app/alpha_mining/model/__init__.py",
    "content": "\"\"\"\n因子生成模型和训练器\n\n- AlphaGenerator: Transformer 策略网络，生成因子表达式\n- AlphaTrainer: RL 训练器，使用 REINFORCE 算法优化\n\"\"\"\n\nfrom .alpha_generator import AlphaGenerator\nfrom .trainer import AlphaTrainer\n\n__all__ = [\"AlphaGenerator\", \"AlphaTrainer\"]\n"
  },
  {
    "path": "backend/app/alpha_mining/model/alpha_generator.py",
    "content": "\"\"\"\n因子生成模型\n\n基于 Transformer 的策略网络，用于生成因子表达式 token 序列。\n\n架构：\n- Token Embedding + Position Embedding\n- Transformer Encoder（使用 causal mask）\n- Policy Head（输出 token 概率）\n- Value Head（估计状态价值，用于 Actor-Critic）\n\nReferences:\n- AlphaGPT upstream/model_core/alphagpt.py\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nfrom torch.distributions import Categorical\nfrom typing import Tuple, List, Optional\nimport logging\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom ..dsl.vocab import FactorVocab, DEFAULT_VOCAB\n\nlogger = logging.getLogger(__name__)\n\n\nclass AlphaGenerator(nn.Module):\n    \"\"\"\n    因子生成器（Transformer 策略网络）\n    \n    使用 Transformer 架构生成因子表达式的 token 序列。\n    \n    Args:\n        vocab: 词汇表实例\n        config: 配置实例\n        \n    Example:\n        generator = AlphaGenerator()\n        tokens = torch.zeros((batch_size, 1), dtype=torch.long)\n        logits, value = generator(tokens)\n    \"\"\"\n    \n    def __init__(\n        self, \n        vocab: Optional[FactorVocab] = None,\n        config: Optional[AlphaMiningConfig] = None\n    ):\n        super().__init__()\n        \n        self.vocab = vocab or DEFAULT_VOCAB\n        self.config = config or DEFAULT_CONFIG\n        \n        # 模型参数\n        self.vocab_size = self.vocab.vocab_size\n        self.d_model = self.config.d_model\n        self.max_seq_len = self.config.max_seq_len\n        \n        # Token Embedding\n        self.token_emb = nn.Embedding(self.vocab_size, self.d_model)\n        \n        # Position Embedding（可学习的位置编码）\n        self.pos_emb = nn.Parameter(\n            torch.zeros(1, self.max_seq_len + 1, self.d_model)\n        )\n        \n        # Transformer Encoder\n        encoder_layer = nn.TransformerEncoderLayer(\n            d_model=self.d_model,\n            nhead=self.config.nhead,\n            dim_feedforward=self.d_model * 2,\n            batch_first=True,\n            dropout=0.1\n        )\n        self.transformer = 
nn.TransformerEncoder(\n            encoder_layer,\n            num_layers=self.config.num_layers\n        )\n        \n        # Output heads\n        self.ln_f = nn.LayerNorm(self.d_model)\n        self.policy_head = nn.Linear(self.d_model, self.vocab_size)  # Actor\n        self.value_head = nn.Linear(self.d_model, 1)  # Critic\n        \n        # 初始化权重\n        self._init_weights()\n        \n        logger.info(\n            f\"AlphaGenerator initialized: vocab_size={self.vocab_size}, \"\n            f\"d_model={self.d_model}, max_seq_len={self.max_seq_len}\"\n        )\n    \n    def _init_weights(self):\n        \"\"\"初始化模型权重\"\"\"\n        # 使用 Xavier 初始化\n        for module in self.modules():\n            if isinstance(module, nn.Linear):\n                nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.zeros_(module.bias)\n            elif isinstance(module, nn.Embedding):\n                nn.init.normal_(module.weight, mean=0.0, std=0.02)\n    \n    def forward(\n        self, \n        tokens: torch.Tensor\n    ) -> Tuple[torch.Tensor, torch.Tensor]:\n        \"\"\"\n        前向传播\n        \n        Args:\n            tokens: 输入 token 序列 [batch, seq_len]\n            \n        Returns:\n            logits: 下一个 token 的 logits [batch, vocab_size]\n            value: 状态价值估计 [batch, 1]\n        \"\"\"\n        batch_size, seq_len = tokens.size()\n        device = tokens.device\n        \n        # Token + Position Embedding\n        x = self.token_emb(tokens) + self.pos_emb[:, :seq_len, :]\n        \n        # Causal Mask（确保只能看到之前的 token）\n        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(device)\n        \n        # Transformer 编码\n        x = self.transformer(x, mask=mask, is_causal=True)\n        \n        # Layer Norm\n        x = self.ln_f(x)\n        \n        # 取最后一个位置的表示\n        last_hidden = x[:, -1, :]  # [batch, d_model]\n        \n        # 输出 heads\n        
logits = self.policy_head(last_hidden)  # [batch, vocab_size]\n        value = self.value_head(last_hidden)    # [batch, 1]\n        \n        return logits, value\n    \n    @torch.no_grad()\n    def generate(\n        self,\n        batch_size: int = 1,\n        max_len: Optional[int] = None,\n        temperature: float = 1.0,\n        device: Optional[torch.device] = None\n    ) -> Tuple[List[List[int]], List[torch.Tensor]]:\n        \"\"\"\n        批量生成因子表达式\n        \n        使用自回归采样生成 token 序列。\n        \n        Args:\n            batch_size: 生成数量\n            max_len: 最大长度，默认使用 config.max_seq_len\n            temperature: 采样温度，越高越随机\n            device: 设备，默认使用 config.device\n            \n        Returns:\n            formulas: 生成的 token 序列列表\n            log_probs_list: 每个序列的 log_prob 列表（用于策略梯度）\n        \"\"\"\n        self.eval()\n        \n        max_len = max_len or self.config.max_seq_len\n        device = device or self.config.torch_device\n        \n        # 初始化：以空 token 开始（使用 0）\n        tokens = torch.zeros((batch_size, 1), dtype=torch.long, device=device)\n        \n        all_log_probs: List[List[torch.Tensor]] = [[] for _ in range(batch_size)]\n        \n        for step in range(max_len):\n            # 前向传播\n            logits, _ = self.forward(tokens)\n            \n            # 应用温度\n            if temperature != 1.0:\n                logits = logits / temperature\n            \n            # 采样\n            dist = Categorical(logits=logits)\n            action = dist.sample()  # [batch]\n            \n            # 记录 log_prob\n            log_prob = dist.log_prob(action)  # [batch]\n            for i in range(batch_size):\n                all_log_probs[i].append(log_prob[i])\n            \n            # 拼接到序列\n            tokens = torch.cat([tokens, action.unsqueeze(1)], dim=1)\n        \n        # 转换为列表格式\n        formulas = tokens[:, 1:].tolist()  # 去掉初始的 0\n        \n        # 将 log_probs 转换为 tensor 列表\n        log_probs_tensors = 
[torch.stack(lps) for lps in all_log_probs]\n        \n        return formulas, log_probs_tensors\n    \n    def generate_with_training(\n        self,\n        batch_size: int = 1,\n        max_len: Optional[int] = None,\n        device: Optional[torch.device] = None\n    ) -> Tuple[torch.Tensor, List[torch.Tensor], List[torch.Tensor]]:\n        \"\"\"\n        生成因子表达式（训练模式，保留梯度）\n        \n        Args:\n            batch_size: 生成数量\n            max_len: 最大长度\n            device: 设备\n            \n        Returns:\n            sequences: 生成的序列 [batch, seq_len]\n            log_probs: 每步的 log_prob 列表\n            values: 每步的 value 估计列表\n        \"\"\"\n        self.train()\n        \n        max_len = max_len or self.config.max_seq_len\n        device = device or self.config.torch_device\n        \n        # 初始化\n        tokens = torch.zeros((batch_size, 1), dtype=torch.long, device=device)\n        \n        log_probs_list = []\n        values_list = []\n        tokens_list = []\n        \n        for step in range(max_len):\n            # 前向传播\n            logits, value = self.forward(tokens)\n            \n            # 采样\n            dist = Categorical(logits=logits)\n            action = dist.sample()\n            \n            # 记录\n            log_probs_list.append(dist.log_prob(action))\n            values_list.append(value.squeeze(-1))\n            tokens_list.append(action)\n            \n            # 拼接\n            tokens = torch.cat([tokens, action.unsqueeze(1)], dim=1)\n        \n        # 组装结果\n        sequences = torch.stack(tokens_list, dim=1)  # [batch, max_len]\n        \n        return sequences, log_probs_list, values_list\n    \n    def save(self, path: str):\n        \"\"\"保存模型\"\"\"\n        torch.save({\n            'model_state_dict': self.state_dict(),\n            'vocab_size': self.vocab_size,\n            'd_model': self.d_model,\n            'max_seq_len': self.max_seq_len,\n        }, path)\n        logger.info(f\"Model saved to 
{path}\")\n    \n    @classmethod\n    def load(cls, path: str, vocab: Optional[FactorVocab] = None) -> 'AlphaGenerator':\n        \"\"\"加载模型\"\"\"\n        checkpoint = torch.load(path, map_location='cpu')\n        \n        # 创建模型\n        config = AlphaMiningConfig(\n            d_model=checkpoint['d_model'],\n            max_seq_len=checkpoint['max_seq_len']\n        )\n        model = cls(vocab=vocab, config=config)\n        \n        # 加载权重\n        model.load_state_dict(checkpoint['model_state_dict'])\n        logger.info(f\"Model loaded from {path}\")\n        \n        return model\n"
  },
  {
    "path": "backend/app/alpha_mining/model/trainer.py",
    "content": "\"\"\"\n因子挖掘 RL 训练器\n\n使用 REINFORCE 算法训练 AlphaGenerator，以回测收益为奖励信号。\n\n训练流程：\n1. 生成因子表达式\n2. 执行表达式得到因子值\n3. 回测评估因子有效性（计算奖励）\n4. 策略梯度更新\n\nReferences:\n- AlphaGPT upstream/model_core/engine.py\n\"\"\"\n\nimport torch\nfrom typing import Optional, List, Dict, Any, Callable\nfrom tqdm import tqdm\nimport logging\nimport json\nfrom pathlib import Path\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom ..dsl.vocab import FactorVocab, DEFAULT_VOCAB\nfrom ..vm.factor_vm import FactorVM\nfrom .alpha_generator import AlphaGenerator\n\nlogger = logging.getLogger(__name__)\n\n\nclass AlphaTrainer:\n    \"\"\"\n    因子挖掘 RL 训练器\n    \n    使用 REINFORCE 算法训练 AlphaGenerator。\n    \n    Args:\n        generator: 因子生成模型\n        vocab: 词汇表\n        config: 配置\n        evaluator: 因子评估函数，接收 (factor, returns) 返回 score\n    \"\"\"\n    \n    def __init__(\n        self,\n        generator: Optional[AlphaGenerator] = None,\n        vocab: Optional[FactorVocab] = None,\n        config: Optional[AlphaMiningConfig] = None,\n        evaluator: Optional[Callable[[torch.Tensor, torch.Tensor], float]] = None\n    ):\n        self.config = config or DEFAULT_CONFIG\n        self.vocab = vocab or DEFAULT_VOCAB\n        self.generator = generator or AlphaGenerator(vocab=self.vocab, config=self.config)\n        self.vm = FactorVM(vocab=self.vocab)\n        \n        # 默认评估器（简单 Sharpe-like）\n        self.evaluator = evaluator or self._default_evaluator\n        \n        # 优化器\n        self.optimizer = torch.optim.AdamW(\n            self.generator.parameters(),\n            lr=self.config.lr\n        )\n        \n        # 训练状态\n        self.best_score = -float('inf')\n        self.best_formula: Optional[List[int]] = None\n        self.best_formula_str: Optional[str] = None\n        self.training_history: List[Dict[str, Any]] = []\n        self.step_count = 0\n        \n        # 移动到指定设备\n        self.device = self.config.torch_device\n        
self.generator.to(self.device)\n        \n        logger.info(f\"AlphaTrainer initialized on device: {self.device}\")\n    \n    def _default_evaluator(self, factor: torch.Tensor, returns: torch.Tensor) -> float:\n        \"\"\"\n        默认因子评估器（简化版 Sharpe-like）\n        \n        Args:\n            factor: 因子值 [batch, time_steps]\n            returns: 收益率 [batch, time_steps]\n            \n        Returns:\n            评分（越高越好）\n        \"\"\"\n        # 因子值作为信号（sigmoid 归一化）\n        signal = torch.sigmoid(factor)\n        \n        # 简单策略：signal > threshold 时持仓\n        threshold = self.config.signal_threshold\n        position = (signal > threshold).float()\n        \n        # 计算收益\n        pnl = position * returns\n        \n        # Sharpe-like ratio（简化）\n        mean_pnl = pnl.mean()\n        std_pnl = pnl.std() + 1e-6\n        \n        score = (mean_pnl / std_pnl).item()\n        return score\n    \n    def train_step(\n        self,\n        features: torch.Tensor,\n        returns: torch.Tensor\n    ) -> Dict[str, Any]:\n        \"\"\"\n        单步训练\n        \n        Args:\n            features: 特征张量 [batch, num_features, time_steps]\n            returns: 收益率张量 [batch, time_steps]\n            \n        Returns:\n            训练指标字典\n        \"\"\"\n        self.generator.train()\n        batch_size = self.config.batch_size\n        \n        # 1. 生成因子表达式\n        sequences, log_probs_list, _ = self.generator.generate_with_training(\n            batch_size=batch_size,\n            device=self.device\n        )\n        \n        # 2. 
执行并评估每个公式\n        rewards = torch.zeros(batch_size, device=self.device)\n        valid_count = 0\n        \n        for i in range(batch_size):\n            formula = sequences[i].tolist()\n            \n            # 执行因子表达式\n            factor = self.vm.execute(formula, features)\n            \n            if factor is None:\n                # 无效公式\n                rewards[i] = self.config.invalid_formula_reward\n                continue\n            \n            # 检查是否为常量因子\n            if factor.std() < self.config.constant_threshold:\n                rewards[i] = self.config.constant_factor_reward\n                continue\n            \n            # 评估因子\n            try:\n                score = self.evaluator(factor, returns)\n                rewards[i] = score\n                valid_count += 1\n                \n                # 更新最优\n                if score > self.best_score:\n                    self.best_score = score\n                    self.best_formula = formula\n                    self.best_formula_str = self.vm.decode(formula)\n                    logger.info(\n                        f\"[Step {self.step_count}] New best: \"\n                        f\"score={score:.4f}, formula={self.best_formula_str}\"\n                    )\n            except Exception as e:\n                logger.warning(f\"Evaluation error: {e}\")\n                rewards[i] = self.config.invalid_formula_reward\n        \n        # 3. 计算 advantage（归一化）\n        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-5)\n        \n        # 4. 策略梯度 loss\n        loss = torch.zeros(1, device=self.device)\n        for t, log_prob in enumerate(log_probs_list):\n            loss = loss - (log_prob * adv).mean()\n        \n        # 5. 反向传播\n        self.optimizer.zero_grad()\n        loss.backward()\n        \n        # 梯度裁剪\n        torch.nn.utils.clip_grad_norm_(self.generator.parameters(), max_norm=1.0)\n        \n        self.optimizer.step()\n        \n        # 6. 
记录指标\n        self.step_count += 1\n        metrics = {\n            \"step\": self.step_count,\n            \"loss\": loss.item(),\n            \"avg_reward\": rewards.mean().item(),\n            \"max_reward\": rewards.max().item(),\n            \"min_reward\": rewards.min().item(),\n            \"valid_ratio\": valid_count / batch_size,\n            \"best_score\": self.best_score,\n            \"best_formula\": self.best_formula_str,\n        }\n        self.training_history.append(metrics)\n        \n        return metrics\n    \n    def train(\n        self,\n        features: torch.Tensor,\n        returns: torch.Tensor,\n        num_steps: Optional[int] = None,\n        progress_bar: bool = True,\n        step_callback: Optional[Callable[[Dict[str, Any]], None]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        完整训练循环\n        \n        Args:\n            features: 特征张量 [num_samples, num_features, time_steps]\n            returns: 收益率张量 [num_samples, time_steps]\n            num_steps: 训练步数，默认使用 config.num_steps\n            progress_bar: 是否显示进度条\n            step_callback: 每步回调函数，接收 metrics 字典，用于 SSE 流式推送\n            \n        Returns:\n            训练结果\n        \"\"\"\n        num_steps = num_steps or self.config.num_steps\n        \n        logger.info(f\"Starting training for {num_steps} steps...\")\n        \n        # 确保数据在正确设备上\n        features = features.to(self.device)\n        returns = returns.to(self.device)\n        \n        iterator = range(num_steps)\n        if progress_bar:\n            iterator = tqdm(iterator, desc=\"Training\")\n        \n        for step in iterator:\n            metrics = self.train_step(features, returns)\n            \n            # 添加进度百分比\n            metrics[\"progress\"] = (step + 1) / num_steps * 100\n            metrics[\"total_steps\"] = num_steps\n            \n            if progress_bar:\n                iterator.set_postfix({\n                    \"loss\": f\"{metrics['loss']:.4f}\",\n           
         \"avg_rew\": f\"{metrics['avg_reward']:.4f}\",\n                    \"best\": f\"{metrics['best_score']:.4f}\"\n                })\n            \n            # 调用回调函数（用于 SSE 流式推送）\n            if step_callback is not None:\n                try:\n                    step_callback(metrics)\n                except Exception as e:\n                    logger.warning(f\"Step callback error: {e}\")\n            \n            # 定期保存检查点\n            if self.step_count % self.config.save_every_n_steps == 0:\n                self._save_checkpoint()\n        \n        # 最终结果\n        result = {\n            \"total_steps\": self.step_count,\n            \"best_score\": self.best_score,\n            \"best_formula\": self.best_formula,\n            \"best_formula_str\": self.best_formula_str,\n            \"final_metrics\": self.training_history[-1] if self.training_history else None,\n        }\n        \n        logger.info(f\"Training complete. Best score: {self.best_score:.4f}\")\n        logger.info(f\"Best formula: {self.best_formula_str}\")\n        \n        return result\n    \n    def _save_checkpoint(self):\n        \"\"\"保存训练检查点\"\"\"\n        checkpoint_dir = Path(self.config.checkpoint_dir)\n        checkpoint_dir.mkdir(parents=True, exist_ok=True)\n        \n        # 保存模型\n        model_path = checkpoint_dir / f\"model_step_{self.step_count}.pt\"\n        self.generator.save(str(model_path))\n        \n        # 保存最优公式\n        if self.best_formula:\n            formula_path = checkpoint_dir / \"best_formula.json\"\n            with open(formula_path, 'w') as f:\n                json.dump({\n                    \"formula\": self.best_formula,\n                    \"formula_str\": self.best_formula_str,\n                    \"score\": self.best_score,\n                    \"step\": self.step_count\n                }, f, indent=2)\n    \n    def get_best_formula(self) -> Optional[str]:\n        \"\"\"获取最优因子表达式字符串\"\"\"\n        return 
self.best_formula_str\n    \n    def get_training_history(self) -> List[Dict[str, Any]]:\n        \"\"\"获取训练历史\"\"\"\n        return self.training_history\n"
  },
  {
    "path": "backend/app/alpha_mining/tools/__init__.py",
    "content": "\"\"\"\nAgenticX 工具封装\n\n将因子挖掘能力封装为 AgenticX Tool，供 QuantitativeAgent 调用。\n\"\"\"\n\nfrom .alpha_mining_tool import AlphaMiningTool\n\n__all__ = [\"AlphaMiningTool\"]\n"
  },
  {
    "path": "backend/app/alpha_mining/tools/alpha_mining_tool.py",
    "content": "\"\"\"\nAlpha Mining AgenticX 工具封装\n\n将因子挖掘功能封装为 AgenticX BaseTool，供 Agent 调用。\n\n支持的操作：\n- mine: 挖掘新因子\n- evaluate: 评估现有因子\n- list: 列出已发现的因子\n\"\"\"\n\nimport torch\nfrom typing import Dict, Any, Optional, List\nfrom datetime import datetime\nimport logging\nimport json\nimport uuid\n\nfrom agenticx.core.tool_v2 import (\n    BaseTool,\n    ToolMetadata,\n    ToolParameter,\n    ToolResult,\n    ToolContext,\n    ToolCategory,\n    ToolStatus,\n    ParameterType\n)\n\nfrom ..config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom ..dsl.vocab import FactorVocab, DEFAULT_VOCAB\nfrom ..vm.factor_vm import FactorVM\nfrom ..model.alpha_generator import AlphaGenerator\nfrom ..model.trainer import AlphaTrainer\nfrom ..features.market import MarketFeatureBuilder\nfrom ..features.sentiment import SentimentFeatureBuilder\nfrom ..backtest.evaluator import FactorEvaluator\nfrom ..utils import generate_mock_data\n\nlogger = logging.getLogger(__name__)\n\n\nclass AlphaMiningTool(BaseTool[Dict[str, Any]]):\n    \"\"\"\n    Alpha Mining 工具\n    \n    封装因子挖掘功能，供 QuantitativeAgent 调用。\n    \n    支持操作：\n    - mine: 使用 RL 挖掘新因子\n    - evaluate: 评估指定因子表达式\n    - generate: 生成候选因子\n    - list: 列出最优因子\n    \n    Example:\n        tool = AlphaMiningTool()\n        result = tool.execute({\n            \"action\": \"mine\",\n            \"num_steps\": 100,\n            \"use_sentiment\": True\n        }, context)\n    \"\"\"\n    \n    def __init__(\n        self,\n        config: Optional[AlphaMiningConfig] = None,\n        model_path: Optional[str] = None\n    ):\n        \"\"\"\n        初始化 Alpha Mining 工具\n        \n        Args:\n            config: 配置实例\n            model_path: 预训练模型路径\n        \"\"\"\n        self.config = config or DEFAULT_CONFIG\n        \n        metadata = ToolMetadata(\n            name=\"alpha_mining\",\n            version=\"1.0.0\",\n            description=\"量化因子自动挖掘工具，使用符号回归 + 强化学习发现有效交易因子\",\n            category=ToolCategory.ANALYSIS,\n      
      author=\"FinnewsHunter Team\",\n            tags=[\"quant\", \"factor\", \"alpha\", \"ml\", \"reinforcement-learning\"],\n            timeout=600,  # 10分钟超时\n            max_retries=1,\n        )\n        \n        super().__init__(metadata)\n        \n        # 初始化组件\n        self.vocab = DEFAULT_VOCAB\n        self.vm = FactorVM(vocab=self.vocab)\n        self.evaluator = FactorEvaluator(config=self.config)\n        self.market_builder = MarketFeatureBuilder(config=self.config)\n        self.sentiment_builder = SentimentFeatureBuilder(config=self.config)\n        \n        # 初始化模型\n        self.generator = AlphaGenerator(vocab=self.vocab, config=self.config)\n        self.trainer: Optional[AlphaTrainer] = None\n        \n        # 加载预训练模型\n        if model_path:\n            try:\n                self.generator = AlphaGenerator.load(model_path, vocab=self.vocab)\n                logger.info(f\"Loaded pretrained model from {model_path}\")\n            except Exception as e:\n                logger.warning(f\"Failed to load model: {e}\")\n        \n        # 存储发现的因子\n        self.discovered_factors: List[Dict[str, Any]] = []\n        \n        logger.info(\"AlphaMiningTool initialized\")\n    \n    def _setup_parameters(self) -> None:\n        \"\"\"设置工具参数\"\"\"\n        self._parameters = {\n            \"action\": ToolParameter(\n                name=\"action\",\n                type=ParameterType.STRING,\n                description=\"操作类型: mine(挖掘), evaluate(评估), generate(生成), list(列表)\",\n                required=True,\n                enum=[\"mine\", \"evaluate\", \"generate\", \"list\"]\n            ),\n            \"num_steps\": ToolParameter(\n                name=\"num_steps\",\n                type=ParameterType.INTEGER,\n                description=\"训练步数（仅 mine 操作）\",\n                required=False,\n                default=100,\n                minimum=1,\n                maximum=10000\n            ),\n            \"formula\": ToolParameter(\n  
              name=\"formula\",\n                type=ParameterType.STRING,\n                description=\"因子表达式（仅 evaluate 操作）\",\n                required=False\n            ),\n            \"use_sentiment\": ToolParameter(\n                name=\"use_sentiment\",\n                type=ParameterType.BOOLEAN,\n                description=\"是否使用情感特征\",\n                required=False,\n                default=True\n            ),\n            \"batch_size\": ToolParameter(\n                name=\"batch_size\",\n                type=ParameterType.INTEGER,\n                description=\"生成因子数量（仅 generate 操作）\",\n                required=False,\n                default=10,\n                minimum=1,\n                maximum=100\n            ),\n            \"top_k\": ToolParameter(\n                name=\"top_k\",\n                type=ParameterType.INTEGER,\n                description=\"返回最优因子数量（仅 list 操作）\",\n                required=False,\n                default=5,\n                minimum=1,\n                maximum=50\n            ),\n            \"market_data\": ToolParameter(\n                name=\"market_data\",\n                type=ParameterType.OBJECT,\n                description=\"行情数据（可选，不提供则使用模拟数据）\",\n                required=False\n            ),\n            \"sentiment_data\": ToolParameter(\n                name=\"sentiment_data\",\n                type=ParameterType.OBJECT,\n                description=\"情感数据（可选）\",\n                required=False\n            )\n        }\n    \n    def execute(self, parameters: Dict[str, Any], context: ToolContext) -> ToolResult:\n        \"\"\"同步执行工具\"\"\"\n        start_time = datetime.now()\n        \n        try:\n            validated = self.validate_parameters(parameters)\n            action = validated[\"action\"]\n            \n            if action == \"mine\":\n                result_data = self._action_mine(validated, context)\n            elif action == \"evaluate\":\n                result_data 
= self._action_evaluate(validated, context)\n            elif action == \"generate\":\n                result_data = self._action_generate(validated, context)\n            elif action == \"list\":\n                result_data = self._action_list(validated, context)\n            else:\n                raise ValueError(f\"Unknown action: {action}\")\n            \n            end_time = datetime.now()\n            \n            return ToolResult(\n                status=ToolStatus.SUCCESS,\n                data=result_data,\n                execution_time=(end_time - start_time).total_seconds(),\n                start_time=start_time,\n                end_time=end_time,\n                metadata={\"action\": action}\n            )\n            \n        except Exception as e:\n            logger.error(f\"AlphaMiningTool error: {e}\")\n            end_time = datetime.now()\n            \n            return ToolResult(\n                status=ToolStatus.ERROR,\n                error=str(e),\n                execution_time=(end_time - start_time).total_seconds(),\n                start_time=start_time,\n                end_time=end_time\n            )\n    \n    async def aexecute(self, parameters: Dict[str, Any], context: ToolContext) -> ToolResult:\n        \"\"\"异步执行工具\"\"\"\n        # 目前使用同步实现\n        return self.execute(parameters, context)\n    \n    def _action_mine(self, params: Dict[str, Any], context: ToolContext) -> Dict[str, Any]:\n        \"\"\"执行因子挖掘\"\"\"\n        num_steps = params.get(\"num_steps\", 100)\n        use_sentiment = params.get(\"use_sentiment\", True)\n        \n        # 准备特征数据\n        features, returns = self._prepare_features(params, use_sentiment)\n        \n        # 创建或复用训练器\n        if self.trainer is None:\n            self.trainer = AlphaTrainer(\n                generator=self.generator,\n                vocab=self.vocab,\n                config=self.config,\n                evaluator=self.evaluator.get_reward\n            )\n   
     \n        # 执行训练\n        logger.info(f\"Starting factor mining for {num_steps} steps...\")\n        result = self.trainer.train(\n            features=features,\n            returns=returns,\n            num_steps=num_steps,\n            progress_bar=False\n        )\n        \n        # 保存最优因子\n        if result[\"best_formula\"]:\n            factor_info = {\n                \"id\": str(uuid.uuid4()),\n                \"formula\": result[\"best_formula\"],\n                \"formula_str\": result[\"best_formula_str\"],\n                \"score\": result[\"best_score\"],\n                \"discovered_at\": datetime.now().isoformat(),\n                \"training_steps\": num_steps,\n                \"use_sentiment\": use_sentiment\n            }\n            self.discovered_factors.append(factor_info)\n            \n            # 保持只存储最优的 100 个\n            self.discovered_factors.sort(key=lambda x: x[\"score\"], reverse=True)\n            self.discovered_factors = self.discovered_factors[:100]\n        \n        return {\n            \"success\": True,\n            \"best_factor\": result[\"best_formula_str\"],\n            \"best_score\": result[\"best_score\"],\n            \"total_steps\": result[\"total_steps\"],\n            \"message\": f\"因子挖掘完成，最优因子: {result['best_formula_str']} (score={result['best_score']:.4f})\"\n        }\n    \n    def _action_evaluate(self, params: Dict[str, Any], context: ToolContext) -> Dict[str, Any]:\n        \"\"\"评估因子表达式\"\"\"\n        formula_str = params.get(\"formula\")\n        if not formula_str:\n            raise ValueError(\"Parameter 'formula' is required for evaluate action\")\n        \n        use_sentiment = params.get(\"use_sentiment\", True)\n        \n        # 解析公式\n        formula = self._parse_formula(formula_str)\n        if formula is None:\n            return {\n                \"success\": False,\n                \"error\": f\"Invalid formula: {formula_str}\",\n                \"message\": 
\"无法解析因子表达式\"\n            }\n        \n        # 准备数据\n        features, returns = self._prepare_features(params, use_sentiment)\n        \n        # 执行因子\n        factor = self.vm.execute(formula, features)\n        if factor is None:\n            return {\n                \"success\": False,\n                \"error\": \"Formula execution failed\",\n                \"message\": \"因子表达式执行失败\"\n            }\n        \n        # 评估\n        metrics = self.evaluator.evaluate(factor, returns)\n        \n        return {\n            \"success\": True,\n            \"formula\": formula_str,\n            \"metrics\": metrics,\n            \"message\": f\"因子评估完成: Sortino={metrics['sortino_ratio']:.4f}, IC={metrics['ic']:.4f}\"\n        }\n    \n    def _action_generate(self, params: Dict[str, Any], context: ToolContext) -> Dict[str, Any]:\n        \"\"\"生成候选因子\"\"\"\n        batch_size = params.get(\"batch_size\", 10)\n        use_sentiment = params.get(\"use_sentiment\", True)\n        \n        # 生成因子\n        formulas, _ = self.generator.generate(batch_size=batch_size)\n        \n        # 准备数据用于评估\n        features, returns = self._prepare_features(params, use_sentiment)\n        \n        # 评估每个因子\n        results = []\n        for formula in formulas:\n            factor = self.vm.execute(formula, features)\n            if factor is not None and factor.std() > 1e-6:\n                try:\n                    metrics = self.evaluator.evaluate(factor, returns)\n                    results.append({\n                        \"formula\": formula,\n                        \"formula_str\": self.vm.decode(formula),\n                        \"sortino\": metrics[\"sortino_ratio\"],\n                        \"ic\": metrics[\"ic\"]\n                    })\n                except Exception:\n                    continue\n        \n        # 按 Sortino 排序\n        results.sort(key=lambda x: x[\"sortino\"], reverse=True)\n        \n        return {\n            \"success\": 
True,\n            \"generated\": len(formulas),\n            \"valid\": len(results),\n            \"factors\": results[:10],  # 返回 top 10\n            \"message\": f\"生成 {len(formulas)} 个因子，其中 {len(results)} 个有效\"\n        }\n    \n    def _action_list(self, params: Dict[str, Any], context: ToolContext) -> Dict[str, Any]:\n        \"\"\"列出已发现的因子\"\"\"\n        top_k = params.get(\"top_k\", 5)\n        \n        factors = self.discovered_factors[:top_k]\n        \n        return {\n            \"success\": True,\n            \"total_discovered\": len(self.discovered_factors),\n            \"factors\": factors,\n            \"message\": f\"共发现 {len(self.discovered_factors)} 个因子，返回 top {len(factors)}\"\n        }\n    \n    def _prepare_features(\n        self,\n        params: Dict[str, Any],\n        use_sentiment: bool\n    ) -> tuple:\n        \"\"\"准备特征数据\"\"\"\n        market_data = params.get(\"market_data\")\n        sentiment_data = params.get(\"sentiment_data\")\n        \n        if market_data is not None:\n            # 使用提供的行情数据\n            market_features = self.market_builder.build(market_data)\n            time_steps = market_features.size(-1)\n            \n            if use_sentiment and sentiment_data is not None:\n                sentiment_features = self.sentiment_builder.build(\n                    sentiment_data, time_steps=time_steps\n                )\n                features = self.sentiment_builder.combine_with_market(\n                    market_features, sentiment_features\n                )\n            else:\n                features = market_features\n            \n            # 假设收益率在行情数据中\n            returns = market_features[:, 0, :]  # RET 特征\n        else:\n            # 使用模拟数据\n            num_features = 6 if use_sentiment else 4\n            features, returns = generate_mock_data(\n                num_samples=50,\n                num_features=num_features,\n                time_steps=252,\n                seed=42\n         
   )\n        \n        return features, returns\n    \n    def _parse_formula(self, formula_str: str) -> Optional[List[int]]:\n        \"\"\"解析因子表达式字符串\"\"\"\n        # 简单解析：尝试匹配已知的 token\n        tokens = []\n        \n        # 移除括号和空格，按操作符分割\n        clean = formula_str.replace(\"(\", \" \").replace(\")\", \" \").replace(\",\", \" \")\n        parts = clean.split()\n        \n        for part in parts:\n            part = part.strip()\n            if not part:\n                continue\n            \n            # 尝试作为特征名\n            try:\n                token = self.vocab.name_to_token(part)\n                tokens.append(token)\n            except (ValueError, KeyError):\n                # 尝试作为数字（常量）\n                try:\n                    float(part)\n                    # 忽略常量\n                    continue\n                except ValueError:\n                    logger.warning(f\"Unknown token: {part}\")\n                    return None\n        \n        return tokens if tokens else None\n"
  },
  {
    "path": "backend/app/alpha_mining/utils.py",
    "content": "\"\"\"\nAlpha Mining 工具函数\n\n提供模拟数据生成、数据预处理等工具函数。\n\"\"\"\n\nimport torch\nimport numpy as np\nfrom typing import Tuple, Optional\nimport logging\n\nfrom .config import AlphaMiningConfig, DEFAULT_CONFIG\n\nlogger = logging.getLogger(__name__)\n\n\ndef generate_mock_data(\n    num_samples: int = 100,\n    num_features: int = 6,\n    time_steps: int = 252,\n    seed: Optional[int] = 42,\n    device: Optional[torch.device] = None\n) -> Tuple[torch.Tensor, torch.Tensor]:\n    \"\"\"\n    生成模拟行情数据用于测试\n    \n    Args:\n        num_samples: 样本数（股票数）\n        num_features: 特征数\n        time_steps: 时间步数（交易日数）\n        seed: 随机种子\n        device: 设备\n        \n    Returns:\n        features: [num_samples, num_features, time_steps]\n        returns: [num_samples, time_steps]\n    \"\"\"\n    if seed is not None:\n        torch.manual_seed(seed)\n        np.random.seed(seed)\n    \n    device = device or DEFAULT_CONFIG.torch_device\n    \n    # 生成模拟收益率（正态分布）\n    returns = torch.randn(num_samples, time_steps, device=device) * 0.02\n    \n    # 生成模拟价格（累积收益）\n    prices = torch.exp(returns.cumsum(dim=1))\n    \n    # 生成模拟特征\n    features_list = []\n    \n    # Feature 0: RET - 收益率\n    ret = returns.clone()\n    features_list.append(ret)\n    \n    # Feature 1: VOL - 波动率（滚动 20 日标准差）\n    vol = _rolling_std(returns, window=20)\n    features_list.append(vol)\n    \n    # Feature 2: VOLUME_CHG - 成交量变化（模拟）\n    volume = torch.abs(torch.randn(num_samples, time_steps, device=device))\n    volume_chg = _pct_change(volume)\n    features_list.append(volume_chg)\n    \n    # Feature 3: TURNOVER - 换手率（模拟）\n    turnover = torch.abs(torch.randn(num_samples, time_steps, device=device)) * 0.05\n    features_list.append(turnover)\n    \n    # Feature 4: SENTIMENT - 情感分数（模拟）\n    sentiment = torch.randn(num_samples, time_steps, device=device) * 0.5\n    features_list.append(sentiment)\n    \n    # Feature 5: NEWS_COUNT - 新闻数量（模拟）\n    news_count = 
torch.abs(torch.randn(num_samples, time_steps, device=device)) * 5\n    features_list.append(news_count)\n    \n    # 如果需要更多特征，填充随机噪声\n    while len(features_list) < num_features:\n        noise = torch.randn(num_samples, time_steps, device=device)\n        features_list.append(noise)\n    \n    # 截取到指定特征数\n    features_list = features_list[:num_features]\n    \n    # Stack features: [num_samples, num_features, time_steps]\n    features = torch.stack(features_list, dim=1)\n    \n    # 标准化特征\n    features = _robust_normalize(features)\n    \n    logger.debug(\n        f\"Generated mock data: features {features.shape}, returns {returns.shape}\"\n    )\n    \n    return features, returns\n\n\ndef _rolling_std(x: torch.Tensor, window: int = 20) -> torch.Tensor:\n    \"\"\"\n    计算滚动标准差\n    \n    Args:\n        x: [batch, time_steps]\n        window: 窗口大小\n        \n    Returns:\n        滚动标准差 [batch, time_steps]\n    \"\"\"\n    batch_size, time_steps = x.shape\n    device = x.device\n    \n    # Padding\n    pad = torch.zeros((batch_size, window - 1), device=device)\n    x_padded = torch.cat([pad, x], dim=1)\n    \n    # 使用 unfold 计算滚动窗口\n    result = x_padded.unfold(1, window, 1).std(dim=-1)\n    \n    return result\n\n\ndef _pct_change(x: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    计算百分比变化\n    \n    Args:\n        x: [batch, time_steps]\n        \n    Returns:\n        百分比变化 [batch, time_steps]\n    \"\"\"\n    prev = torch.roll(x, 1, dims=1)\n    prev[:, 0] = x[:, 0]  # 第一个值不变\n    \n    pct = (x - prev) / (prev + 1e-8)\n    return pct\n\n\ndef _robust_normalize(x: torch.Tensor) -> torch.Tensor:\n    \"\"\"\n    稳健标准化（使用中位数和 MAD）\n    \n    Args:\n        x: [batch, num_features, time_steps]\n        \n    Returns:\n        标准化后的张量\n    \"\"\"\n    # 计算每个特征的中位数\n    median = x.median(dim=2, keepdim=True).values\n    \n    # 计算 MAD (Median Absolute Deviation)\n    mad = (x - median).abs().median(dim=2, keepdim=True).values + 1e-6\n    \n    # 标准化\n    
normalized = (x - median) / mad\n    \n    # 裁剪极端值\n    normalized = torch.clamp(normalized, -5.0, 5.0)\n    \n    return normalized\n\n\ndef set_random_seed(seed: int):\n    \"\"\"设置随机种子以确保可复现性\"\"\"\n    torch.manual_seed(seed)\n    np.random.seed(seed)\n    if torch.cuda.is_available():\n        torch.cuda.manual_seed_all(seed)\n\n\ndef get_device() -> torch.device:\n    \"\"\"获取最佳可用设备\"\"\"\n    if torch.cuda.is_available():\n        return torch.device(\"cuda\")\n    elif hasattr(torch.backends, \"mps\") and torch.backends.mps.is_available():\n        return torch.device(\"mps\")\n    else:\n        return torch.device(\"cpu\")\n"
  },
  {
    "path": "backend/app/alpha_mining/vm/__init__.py",
    "content": "\"\"\"\n因子执行器模块\n\n提供 FactorVM 栈式虚拟机，用于执行因子表达式。\n\"\"\"\n\nfrom .factor_vm import FactorVM\n\n__all__ = [\"FactorVM\"]\n"
  },
  {
    "path": "backend/app/alpha_mining/vm/factor_vm.py",
    "content": "\"\"\"\n因子表达式执行器（栈式虚拟机）\n\n使用栈式执行方式解析和执行因子表达式 token 序列。\n\n执行流程：\n1. 遍历 token 序列\n2. 如果是特征 token：将对应特征数据入栈\n3. 如果是操作符 token：弹出所需参数，执行操作，结果入栈\n4. 最终栈中应只剩一个结果\n\nReferences:\n- AlphaGPT upstream/model_core/vm.py\n\"\"\"\n\nimport torch\nfrom typing import List, Optional, Union\nimport logging\n\nfrom ..dsl.vocab import FactorVocab, DEFAULT_VOCAB\n\nlogger = logging.getLogger(__name__)\n\n\nclass FactorVM:\n    \"\"\"\n    因子表达式栈式虚拟机\n    \n    执行因子表达式 token 序列，返回计算结果。\n    \n    Example:\n        vm = FactorVM()\n        # features: [batch, num_features, time_steps]\n        # formula: [0, 1, 6] 表示 ADD(RET, VOL)\n        result = vm.execute([0, 1, 6], features)\n    \"\"\"\n    \n    def __init__(self, vocab: Optional[FactorVocab] = None):\n        \"\"\"\n        初始化虚拟机\n        \n        Args:\n            vocab: 词汇表实例，默认使用 DEFAULT_VOCAB\n        \"\"\"\n        self.vocab = vocab or DEFAULT_VOCAB\n    \n    def execute(\n        self, \n        formula: List[int], \n        features: torch.Tensor\n    ) -> Optional[torch.Tensor]:\n        \"\"\"\n        执行因子表达式\n        \n        Args:\n            formula: token 序列，如 [0, 1, 6] 表示 ADD(RET, VOL)\n            features: 特征张量，形状 [batch, num_features, time_steps]\n            \n        Returns:\n            因子值张量 [batch, time_steps]，如果表达式无效则返回 None\n            \n        Note:\n            - 如果堆栈溢出/不足，返回 None\n            - 如果结果包含 NaN/Inf，会自动替换为 0\n            - 如果最终堆栈不是恰好一个元素，返回 None\n        \"\"\"\n        stack: List[torch.Tensor] = []\n        \n        try:\n            for token in formula:\n                token = int(token)\n                \n                if self.vocab.is_feature(token):\n                    # 特征 token：从特征张量中取出对应特征\n                    if token >= features.shape[1]:\n                        logger.debug(f\"Feature index {token} out of range\")\n                        return None\n                    stack.append(features[:, token, :])\n                    \n                
elif self.vocab.is_operator(token):\n                    # 操作符 token：执行操作\n                    arity = self.vocab.get_operator_arity(token)\n                    \n                    # 检查堆栈是否有足够参数\n                    if len(stack) < arity:\n                        logger.debug(f\"Stack underflow: need {arity}, have {len(stack)}\")\n                        return None\n                    \n                    # 弹出参数（注意顺序：先弹出的是后入的）\n                    args = []\n                    for _ in range(arity):\n                        args.append(stack.pop())\n                    args.reverse()  # 恢复正确顺序\n                    \n                    # 执行操作\n                    func = self.vocab.get_operator_func(token)\n                    result = func(*args)\n                    \n                    # 处理 NaN 和 Inf\n                    if torch.isnan(result).any() or torch.isinf(result).any():\n                        result = torch.nan_to_num(\n                            result, \n                            nan=0.0, \n                            posinf=1.0, \n                            neginf=-1.0\n                        )\n                    \n                    stack.append(result)\n                    \n                else:\n                    # 未知 token\n                    logger.debug(f\"Unknown token: {token}\")\n                    return None\n            \n            # 检查最终堆栈状态\n            if len(stack) == 1:\n                return stack[0]\n            else:\n                logger.debug(f\"Invalid stack state: {len(stack)} elements remaining\")\n                return None\n                \n        except Exception as e:\n            logger.debug(f\"Execution error: {e}\")\n            return None\n    \n    def decode(self, formula: List[int]) -> str:\n        \"\"\"\n        将 token 序列解码为人类可读的表达式字符串\n        \n        使用逆波兰表达式解析，转换为前缀表示法（函数调用形式）\n        \n        Args:\n            formula: token 序列\n            \n        Returns:\n            
人类可读的表达式，如 \"ADD(RET, VOL)\"\n            \n        Example:\n            vm.decode([0, 1, 6])  # -> \"ADD(RET, VOL)\"\n            vm.decode([0, 4])    # -> \"NEG(RET)\"\n        \"\"\"\n        stack: List[str] = []\n        \n        try:\n            for token in formula:\n                token = int(token)\n                \n                if self.vocab.is_feature(token):\n                    # 特征：直接入栈名称\n                    name = self.vocab.token_to_name(token)\n                    stack.append(name)\n                    \n                elif self.vocab.is_operator(token):\n                    # 操作符：弹出参数，构建表达式\n                    name = self.vocab.token_to_name(token)\n                    arity = self.vocab.get_operator_arity(token)\n                    \n                    if len(stack) < arity:\n                        return f\"<INVALID: stack underflow at {name}>\"\n                    \n                    args = []\n                    for _ in range(arity):\n                        args.append(stack.pop())\n                    args.reverse()\n                    \n                    # 构建函数调用形式\n                    expr = f\"{name}({', '.join(args)})\"\n                    stack.append(expr)\n                    \n                else:\n                    return f\"<INVALID: unknown token {token}>\"\n            \n            if len(stack) == 1:\n                return stack[0]\n            elif len(stack) == 0:\n                return \"<EMPTY>\"\n            else:\n                # 多个元素：用逗号连接\n                return f\"<INCOMPLETE: {', '.join(stack)}>\"\n                \n        except Exception as e:\n            return f\"<ERROR: {e}>\"\n    \n    def validate(self, formula: List[int]) -> bool:\n        \"\"\"\n        验证因子表达式是否语法正确\n        \n        使用模拟执行（不实际计算）来验证。\n        \n        Args:\n            formula: token 序列\n            \n        Returns:\n            True 如果表达式语法正确\n        \"\"\"\n        stack_depth = 0\n        \n       
 try:\n            for token in formula:\n                token = int(token)\n                \n                if self.vocab.is_feature(token):\n                    stack_depth += 1\n                elif self.vocab.is_operator(token):\n                    arity = self.vocab.get_operator_arity(token)\n                    if stack_depth < arity:\n                        return False\n                    stack_depth -= arity\n                    stack_depth += 1  # 操作结果\n                else:\n                    return False\n            \n            return stack_depth == 1\n            \n        except Exception:\n            return False\n    \n    def get_required_features(self, formula: List[int]) -> List[int]:\n        \"\"\"\n        获取表达式中使用的特征列表\n        \n        Args:\n            formula: token 序列\n            \n        Returns:\n            使用的特征 token 列表（去重）\n        \"\"\"\n        features = []\n        for token in formula:\n            token = int(token)\n            if self.vocab.is_feature(token) and token not in features:\n                features.append(token)\n        return features\n"
  },
  {
    "path": "backend/app/api/__init__.py",
    "content": "\"\"\"\nAPI模块\n\"\"\"\n\n"
  },
  {
    "path": "backend/app/api/v1/__init__.py",
    "content": "\"\"\"\nAPI v1 模块\n\"\"\"\nfrom fastapi import APIRouter\nfrom . import analysis, tasks, llm_config, stocks, agents, debug, knowledge_graph\nfrom . import news  # 原有的新闻 API（数据库操作）\nfrom . import news_v2  # 新版 API（Provider-Fetcher 实时获取）\nfrom . import alpha_mining  # 因子挖掘 API\n\n# 创建主路由器\napi_router = APIRouter()\n\n# 注册子路由\napi_router.include_router(news.router, prefix=\"/news\", tags=[\"news\"])  # 原有端点\napi_router.include_router(news_v2.router, prefix=\"/news/v2\", tags=[\"news-v2\"])  # 新版端点\napi_router.include_router(analysis.router, prefix=\"/analysis\", tags=[\"analysis\"])\napi_router.include_router(tasks.router, prefix=\"/tasks\", tags=[\"tasks\"])\napi_router.include_router(llm_config.router, prefix=\"/llm\", tags=[\"llm\"])\napi_router.include_router(stocks.router, prefix=\"/stocks\", tags=[\"stocks\"])  # Phase 2: 个股分析\napi_router.include_router(agents.router, prefix=\"/agents\", tags=[\"agents\"])  # Phase 2: 智能体监控\napi_router.include_router(debug.router, prefix=\"/debug\", tags=[\"debug\"])  # 调试工具\napi_router.include_router(knowledge_graph.router, prefix=\"/knowledge-graph\", tags=[\"knowledge-graph\"])  # 知识图谱\napi_router.include_router(alpha_mining.router)  # 因子挖掘\n\n__all__ = [\"api_router\"]\n\n"
  },
  {
    "path": "backend/app/api/v1/agents.py",
    "content": "\"\"\"\n智能体 API 路由 - Phase 2\n提供辩论功能、执行日志、性能监控等接口\n\"\"\"\nimport logging\nimport json\nimport asyncio\nfrom datetime import datetime, timedelta\nfrom typing import List, Optional, Dict, Any, AsyncGenerator\nfrom fastapi import APIRouter, Depends, HTTPException, Query, Body\nfrom fastapi.responses import StreamingResponse\nfrom pydantic import BaseModel, Field\nfrom sqlalchemy.ext.asyncio import AsyncSession\nfrom sqlalchemy import select, func, desc, or_\n\nfrom ...core.database import get_db\nfrom ...models.news import News\nfrom ...models.analysis import Analysis\nfrom ...agents import (\n    create_debate_workflow,\n    create_orchestrator,\n    create_data_collector\n)\nfrom ...services.llm_service import get_llm_provider\nfrom ...services.stock_data_service import stock_data_service\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# ============ 多语言提示词辅助函数 ============\n\ndef get_prompts(language: str = \"zh\") -> Dict[str, str]:\n    \"\"\"获取多语言提示词\"\"\"\n    if language == \"en\":\n        return {\n            \"quick_analyst_system\": \"You are a professional stock analyst, skilled in quick analysis and decision-making.\",\n            \"quick_analysis_prompt\": \"\"\"Please provide a quick investment analysis for {stock_name}({stock_code}).\n\nBackground:\n{context}\n\nRelated News:\n{news}\n\nPlease quickly provide:\n1. Core Viewpoint (one sentence)\n2. Bullish Factors (3 points)\n3. Bearish Factors (3 points)\n4. Investment Recommendation (Buy/Hold/Sell)\n5. Risk Warning\"\"\",\n            \"data_collector_content\": \"📊 Collected relevant data for {stock_name}: {count} news items, financial data ready.\\n\\nDebate will begin in {rounds} rounds.\",\n            \"bull_system\": \"You are a bullish researcher, skilled at analyzing stocks from a positive perspective. 
When answering user questions, maintain an optimistic but rational attitude.\",\n            \"bear_system\": \"You are a bearish researcher, skilled at identifying risks. When answering user questions, remain cautious and focus on potential risks.\",\n            \"manager_system\": \"You are an experienced investment manager, skilled at comprehensive analysis and providing investment advice. Answer user questions objectively and professionally.\",\n            \"phase_start\": \"Starting {mode} mode analysis\",\n            \"phase_analyzing\": \"Quick analyst is analyzing...\",\n            \"phase_data_collection\": \"Data Collector is gathering materials...\",\n            \"role_quick_analyst\": \"Quick Analyst\",\n            \"role_data_collector\": \"Data Collector\",\n            \"round_debate\": \"Round {round}/{max_rounds} debate\",\n            \"role_bull\": \"Bull Researcher\",\n            \"role_bear\": \"Bear Researcher\",\n            \"bull_first_round\": \"\"\"You are a bullish researcher participating in a bull vs bear debate about {stock_name}({stock_code}).\n\nBackground: {context}\nNews: {news}\n\nThis is Round 1. Please make an opening statement (about 150 words):\n1. State your core bullish view\n2. Provide 2-3 key arguments\"\"\",\n            \"bull_subsequent_rounds\": \"\"\"You are a bullish researcher debating with a bearish researcher about {stock_name}.\n\nThis is Round {round}.\n\nThe bearish researcher just said:\n\"{bear_last_statement}\"\n\nPlease refute the opponent's arguments and add new points (about 120 words):\n1. Point out flaws in the opponent's arguments\n2. Add new bullish reasons\"\"\",\n            \"bear_first_round\": \"\"\"You are a bearish researcher participating in a bull vs bear debate about {stock_name}({stock_code}).\n\nBackground: {context}\nNews: {news}\n\nThis is Round 1. Please make an opening statement (about 150 words):\n1. State your core bearish view\n2. 
Provide 2-3 key risk points\"\"\",\n            \"bear_subsequent_rounds\": \"\"\"You are a bearish researcher debating with a bullish researcher about {stock_name}.\n\nThis is Round {round}.\n\nThe bullish researcher just said:\n\"{bull_last_statement}\"\n\nPlease refute the opponent's arguments and add new points (about 120 words):\n1. Point out flaws in the opponent's arguments\n2. Add new risk points\"\"\",\n            \"manager_decision\": \"\"\"You are an investment manager synthesizing the debate between bullish and bearish researchers to make a final investment decision.\n\nStock: {stock_name}({stock_code})\n\nBullish Researcher's View:\n{bull_analysis}\n\nBearish Researcher's View:\n{bear_analysis}\n\nPlease provide the final decision (about 200 words):\n1. Comprehensive evaluation of both views\n2. Investment recommendation (Strongly Recommend/Recommend/Neutral/Avoid/Caution)\n3. Reasoning and risk warnings\"\"\",\n        }\n    else:  # zh (default)\n        return {\n            \"quick_analyst_system\": \"你是一位专业的股票分析师，擅长快速分析和决策。\",\n            \"quick_analysis_prompt\": \"\"\"请对 {stock_name}({stock_code}) 进行快速投资分析。\n\n背景资料:\n{context}\n\n相关新闻:\n{news}\n\n请快速给出：\n1. 核心观点（一句话）\n2. 看多因素（3点）\n3. 看空因素（3点）\n4. 投资建议（买入/持有/卖出）\n5. 
风险提示\"\"\",\n            \"data_collector_content\": \"📊 已搜集 {stock_name} 的相关数据：{count} 条新闻，财务数据已就绪。\\n\\n辩论即将开始，共 {rounds} 轮。\",\n            \"bull_system\": \"你是一位看多研究员，擅长从积极角度分析股票。回答用户问题时保持乐观但理性的态度。\",\n            \"bear_system\": \"你是一位看空研究员，擅长发现风险。回答用户问题时保持谨慎，重点指出潜在风险。\",\n            \"manager_system\": \"你是一位经验丰富的投资经理，擅长综合分析和给出投资建议。回答用户问题时客观、专业。\",\n            \"phase_start\": \"开始{mode}模式分析\",\n            \"phase_analyzing\": \"快速分析师正在分析...\",\n            \"phase_data_collection\": \"数据专员正在搜集资料...\",\n            \"role_quick_analyst\": \"快速分析师\",\n            \"role_data_collector\": \"数据专员\",\n            \"round_debate\": \"第 {round}/{max_rounds} 轮辩论\",\n            \"role_bull\": \"看多研究员\",\n            \"role_bear\": \"看空研究员\",\n            \"bull_first_round\": \"\"\"你是看多研究员，正在参与关于 {stock_name}({stock_code}) 的多空辩论。\n\n背景资料: {context}\n新闻: {news}\n\n这是第1轮辩论，请做开场陈述（约150字）：\n1. 表明你的核心看多观点\n2. 给出2-3个关键论据\"\"\",\n            \"bull_subsequent_rounds\": \"\"\"你是看多研究员，正在与看空研究员辩论 {stock_name}。\n\n这是第{round}轮辩论。\n\n对方（看空研究员）刚才说：\n\"{bear_last_statement}\"\n\n请反驳对方观点并补充新论据（约120字）：\n1. 指出对方论据的漏洞\n2. 补充新的看多理由\"\"\",\n            \"bear_first_round\": \"\"\"你是看空研究员，正在参与关于 {stock_name}({stock_code}) 的多空辩论。\n\n背景资料: {context}\n新闻: {news}\n\n这是第1轮辩论，请做开场陈述（约150字）：\n1. 表明你的核心看空观点\n2. 给出2-3个关键风险点\"\"\",\n            \"bear_subsequent_rounds\": \"\"\"你是看空研究员，正在与看多研究员辩论 {stock_name}。\n\n这是第{round}轮辩论。\n\n对方（看多研究员）刚才说：\n\"{bull_last_statement}\"\n\n请反驳对方观点并补充新论据（约120字）：\n1. 指出对方论据的漏洞\n2. 补充新的风险点\"\"\",\n            \"manager_decision\": \"\"\"你是投资经理，正在综合看多和看空研究员的辩论，做出最终投资决策。\n\n股票: {stock_name}({stock_code})\n\n看多研究员观点:\n{bull_analysis}\n\n看空研究员观点:\n{bear_analysis}\n\n请给出最终决策（约200字）：\n1. 综合评估双方观点\n2. 给出投资建议（强烈推荐/推荐/中性/回避/谨慎）\n3. 
说明理由和风险提示\"\"\",\n        }\n\n\n# ============ 模拟数据存储（生产环境应使用数据库） ============\n\n# 存储执行日志\nexecution_logs: List[Dict[str, Any]] = []\n\n# 存储辩论结果\ndebate_results: Dict[str, Dict[str, Any]] = {}\n\n\n# ============ Pydantic 模型 ============\n\nclass DebateRequest(BaseModel):\n    \"\"\"辩论请求\"\"\"\n    stock_code: str = Field(..., description=\"股票代码\")\n    stock_name: Optional[str] = Field(None, description=\"股票名称\")\n    context: Optional[str] = Field(None, description=\"额外背景信息\")\n    provider: Optional[str] = Field(None, description=\"LLM提供商\")\n    model: Optional[str] = Field(None, description=\"模型名称\")\n    mode: Optional[str] = Field(\"parallel\", description=\"辩论模式: parallel, realtime_debate, quick_analysis\")\n    language: Optional[str] = Field(\"zh\", description=\"语言设置: zh=中文, en=英文\")\n\n\nclass DebateResponse(BaseModel):\n    \"\"\"辩论响应\"\"\"\n    success: bool\n    debate_id: Optional[str] = None\n    stock_code: str\n    stock_name: Optional[str] = None\n    mode: Optional[str] = None  # 辩论模式\n    bull_analysis: Optional[Dict[str, Any]] = None\n    bear_analysis: Optional[Dict[str, Any]] = None\n    final_decision: Optional[Dict[str, Any]] = None\n    quick_analysis: Optional[Dict[str, Any]] = None  # 快速分析结果\n    debate_history: Optional[List[Dict[str, Any]]] = None  # 实时辩论历史\n    trajectory: Optional[List[Dict[str, Any]]] = None\n    execution_time: Optional[float] = None\n    error: Optional[str] = None\n\n\nclass AgentLogEntry(BaseModel):\n    \"\"\"智能体日志条目\"\"\"\n    id: str\n    timestamp: str\n    agent_name: str\n    agent_role: Optional[str] = None\n    action: str\n    status: str  # \"started\", \"completed\", \"failed\"\n    details: Optional[Dict[str, Any]] = None\n    execution_time: Optional[float] = None\n\n\nclass AgentMetrics(BaseModel):\n    \"\"\"智能体性能指标\"\"\"\n    total_executions: int\n    successful_executions: int\n    failed_executions: int\n    avg_execution_time: float\n    agent_stats: Dict[str, Dict[str, Any]]\n    
 recent_activity: List[Dict[str, Any]]\n\n\nclass TrajectoryStep(BaseModel):\n    \"\"\"执行轨迹步骤\"\"\"\n    step_id: str\n    step_name: str\n    timestamp: str\n    agent_name: Optional[str] = None\n    input_data: Optional[Dict[str, Any]] = None\n    output_data: Optional[Dict[str, Any]] = None\n    duration: Optional[float] = None\n    status: str\n\n\nclass SearchPlanRequest(BaseModel):\n    \"\"\"生成搜索计划请求\"\"\"\n    query: str\n    stock_code: str\n    stock_name: Optional[str] = None\n\n\nclass SearchExecuteRequest(BaseModel):\n    \"\"\"执行搜索计划请求\"\"\"\n    plan: Dict[str, Any]  # 完整的 SearchPlan 对象\n\n\n# ============ API 端点 ============\n\n@router.post(\"/debate\", response_model=DebateResponse)\nasync def run_stock_debate(\n    request: DebateRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    触发股票辩论分析（Bull vs Bear）\n    \n    - **stock_code**: 股票代码\n    - **stock_name**: 股票名称（可选）\n    - **context**: 额外背景信息（可选）\n    - **provider**: LLM提供商（可选）\n    - **model**: 模型名称（可选）\n    - **mode**: 辩论模式（可选：parallel / realtime_debate / quick_analysis）\n    - **language**: 语言设置（可选：zh=中文, en=英文）\n    \"\"\"\n    logger.info(f\"🎯 收到辩论请求: stock_code={request.stock_code}, stock_name={request.stock_name}\")\n    \n    start_time = datetime.utcnow()\n    debate_id = f\"debate_{start_time.strftime('%Y%m%d%H%M%S')}_{request.stock_code}\"\n    \n    try:\n        # 记录开始\n        log_entry = {\n            \"id\": debate_id,\n            \"timestamp\": start_time.isoformat(),\n            \"agent_name\": \"DebateWorkflow\",\n            \"action\": \"debate_start\",\n            \"status\": \"started\",\n            \"details\": {\n                \"stock_code\": request.stock_code,\n                \"stock_name\": request.stock_name\n            }\n        }\n        execution_logs.append(log_entry)\n        \n        # 标准化股票代码\n        code = request.stock_code.upper()\n        if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n            short_code = code[2:]\n        else:\n            short_code = code\n            code = f\"SH{code}\" if code.startswith(\"6\") else 
f\"SZ{code}\"\n        \n        logger.info(f\"🔍 查询股票 {code} 的关联新闻...\")\n        \n        # 获取关联新闻 - 使用 PostgreSQL 原生 ARRAY 查询语法\n        from sqlalchemy import text\n        stock_codes_filter = text(\n            \"stock_codes @> ARRAY[:code1]::varchar[] OR stock_codes @> ARRAY[:code2]::varchar[]\"\n        ).bindparams(code1=short_code, code2=code)\n        \n        news_query = select(News).where(stock_codes_filter).order_by(desc(News.publish_time)).limit(10)\n        \n        result = await db.execute(news_query)\n        news_list = result.scalars().all()\n        \n        logger.info(f\"📰 找到 {len(news_list)} 条关联新闻\")\n        \n        news_data = [\n            {\n                \"id\": n.id,\n                \"title\": n.title,\n                \"content\": n.content[:500],\n                \"sentiment_score\": n.sentiment_score,\n                \"publish_time\": n.publish_time.isoformat() if n.publish_time else None\n            }\n            for n in news_list\n        ]\n        \n        # 如果没有关联新闻，给出警告\n        if not news_data:\n            logger.warning(f\"⚠️ 股票 {code} 没有关联新闻，辩论将基于空数据进行\")\n        \n        # 获取财务数据和资金流向（用于增强辩论上下文）\n        logger.info(f\"📊 获取 {code} 的财务数据和资金流向...\")\n        try:\n            debate_context = await stock_data_service.get_debate_context(code)\n            akshare_context = debate_context.get(\"summary\", \"\")\n            logger.info(f\"📊 获取到额外数据: {akshare_context[:100]}...\")\n        except Exception as e:\n            logger.warning(f\"⚠️ 获取财务数据失败: {e}\")\n            akshare_context = \"\"\n        \n        # 合并用户提供的上下文和 akshare 数据\n        full_context = \"\"\n        if request.context:\n            full_context += f\"【用户补充信息】\\n{request.context}\\n\\n\"\n        if akshare_context:\n            full_context += f\"【实时数据】\\n{akshare_context}\"\n        \n        # 创建 LLM provider（如果指定了自定义配置）\n        llm_provider = None\n        if request.provider or request.model:\n            logger.info(f\"🤖 
使用自定义模型: provider={request.provider}, model={request.model}\")\n            llm_provider = get_llm_provider(\n                provider=request.provider,\n                model=request.model\n            )\n        else:\n            logger.info(\"🤖 使用默认 LLM 配置\")\n        \n        # 选择辩论模式\n        mode = request.mode or \"parallel\"\n        logger.info(f\"⚔️ 开始辩论工作流，模式: {mode}\")\n        \n        if mode == \"parallel\":\n            # 使用原有的并行工作流\n            workflow = create_debate_workflow(llm_provider)\n            debate_result = await workflow.run_debate(\n                stock_code=code,\n                stock_name=request.stock_name or code,\n                news_list=news_data,\n                context=full_context\n            )\n        else:\n            # 使用新的编排器（支持 realtime_debate 和 quick_analysis）\n            orchestrator = create_orchestrator(mode=mode, llm_provider=llm_provider)\n            debate_result = await orchestrator.run(\n                stock_code=code,\n                stock_name=request.stock_name or code,\n                context=full_context,\n                news_list=news_data\n            )\n        \n        end_time = datetime.utcnow()\n        execution_time = (end_time - start_time).total_seconds()\n        \n        # 存储结果\n        debate_results[debate_id] = debate_result\n        \n        # 记录完成\n        log_entry = {\n            \"id\": f\"{debate_id}_complete\",\n            \"timestamp\": end_time.isoformat(),\n            \"agent_name\": \"DebateWorkflow\",\n            \"action\": \"debate_complete\",\n            \"status\": \"completed\" if debate_result.get(\"success\") else \"failed\",\n            \"details\": {\n                \"stock_code\": request.stock_code,\n                \"rating\": debate_result.get(\"final_decision\", {}).get(\"rating\", \"unknown\")\n            },\n            \"execution_time\": execution_time\n        }\n        execution_logs.append(log_entry)\n        \n        if 
debate_result.get(\"success\"):\n            return DebateResponse(\n                success=True,\n                debate_id=debate_id,\n                stock_code=code,\n                stock_name=request.stock_name,\n                mode=mode,\n                bull_analysis=debate_result.get(\"bull_analysis\"),\n                bear_analysis=debate_result.get(\"bear_analysis\"),\n                final_decision=debate_result.get(\"final_decision\"),\n                quick_analysis=debate_result.get(\"quick_analysis\"),\n                debate_history=debate_result.get(\"debate_history\"),\n                trajectory=debate_result.get(\"trajectory\"),\n                execution_time=execution_time\n            )\n        else:\n            return DebateResponse(\n                success=False,\n                debate_id=debate_id,\n                stock_code=code,\n                mode=mode,\n                error=debate_result.get(\"error\", \"Unknown error\")\n            )\n    \n    except Exception as e:\n        logger.error(f\"Debate failed: {e}\", exc_info=True)\n        \n        # 记录失败\n        log_entry = {\n            \"id\": f\"{debate_id}_error\",\n            \"timestamp\": datetime.utcnow().isoformat(),\n            \"agent_name\": \"DebateWorkflow\",\n            \"action\": \"debate_error\",\n            \"status\": \"failed\",\n            \"details\": {\"error\": str(e)}\n        }\n        execution_logs.append(log_entry)\n        \n        return DebateResponse(\n            success=False,\n            debate_id=debate_id,\n            stock_code=request.stock_code,\n            error=str(e)\n        )\n\n\n# ============ SSE 流式辩论 ============\n\nasync def generate_debate_stream(\n    stock_code: str,\n    stock_name: str,\n    mode: str,\n    context: str,\n    news_data: List[Dict],\n    llm_provider,\n    language: str = \"zh\"\n) -> AsyncGenerator[str, None]:\n    \"\"\"\n    生成辩论的 SSE 流\n    \n    事件类型:\n    - phase: 阶段变化\n    - agent: 
智能体发言\n    - progress: 进度更新\n    - result: 最终结果\n    - error: 错误信息\n    \"\"\"\n    debate_id = f\"debate_{datetime.utcnow().strftime('%Y%m%d%H%M%S')}\"\n    prompts = get_prompts(language)\n    \n    def sse_event(event_type: str, data: Dict) -> str:\n        \"\"\"格式化 SSE 事件\"\"\"\n        return f\"event: {event_type}\\ndata: {json.dumps(data, ensure_ascii=False)}\\n\\n\"\n    \n    try:\n        # 发送开始事件\n        yield sse_event(\"phase\", {\n            \"phase\": \"start\",\n            \"message\": prompts[\"phase_start\"].format(mode=mode),\n            \"debate_id\": debate_id\n        })\n        \n        if mode == \"quick_analysis\":\n            # 快速分析模式 - 使用流式输出\n            yield sse_event(\"phase\", {\"phase\": \"analyzing\", \"message\": prompts[\"phase_analyzing\"]})\n            \n            news_titles = json.dumps([n.get('title', '') for n in news_data[:5]], ensure_ascii=False)\n            prompt = prompts[\"quick_analysis_prompt\"].format(\n                stock_name=stock_name,\n                stock_code=stock_code,\n                context=context[:2000],\n                news=news_titles\n            )\n            \n            messages = [\n                {\"role\": \"system\", \"content\": prompts[\"quick_analyst_system\"]},\n                {\"role\": \"user\", \"content\": prompt}\n            ]\n            \n            full_response = \"\"\n            for chunk in llm_provider.stream(messages):\n                full_response += chunk\n                yield sse_event(\"agent\", {\n                    \"agent\": \"QuickAnalyst\",\n                    \"role\": prompts[\"role_quick_analyst\"],\n                    \"content\": chunk,\n                    \"is_chunk\": True\n                })\n                await asyncio.sleep(0)  # 让出控制权\n            \n            # 发送完成事件\n            yield sse_event(\"result\", {\n                \"success\": True,\n                \"mode\": mode,\n                \"quick_analysis\": {\n     
               \"analysis\": full_response,\n                    \"success\": True\n                },\n                \"execution_time\": 0\n            })\n            \n        elif mode == \"realtime_debate\":\n            # 实时辩论模式 - 多轮交锋\n            max_rounds = 3  # 最大辩论轮数\n            \n            yield sse_event(\"phase\", {\"phase\": \"data_collection\", \"message\": prompts[\"phase_data_collection\"]})\n            await asyncio.sleep(0.3)\n            \n            # 数据搜集\n            yield sse_event(\"agent\", {\n                \"agent\": \"DataCollector\",\n                \"role\": prompts[\"role_data_collector\"],\n                \"content\": prompts[\"data_collector_content\"].format(\n                    stock_name=stock_name,\n                    count=len(news_data),\n                    rounds=max_rounds\n                ),\n                \"is_chunk\": False\n            })\n            \n            # 辩论历史（用于上下文）\n            debate_history = []\n            bull_full = \"\"\n            bear_full = \"\"\n            \n            # 多轮辩论\n            for round_num in range(1, max_rounds + 1):\n                yield sse_event(\"phase\", {\n                    \"phase\": \"debate\",\n                    \"message\": prompts[\"round_debate\"].format(round=round_num, max_rounds=max_rounds),\n                    \"round\": round_num,\n                    \"max_rounds\": max_rounds\n                })\n                \n                # === Bull 发言 ===\n                yield sse_event(\"agent\", {\n                    \"agent\": \"BullResearcher\",\n                    \"role\": prompts[\"role_bull\"],\n                    \"content\": \"\",\n                    \"is_start\": True,\n                    \"round\": round_num\n                })\n                \n                if round_num == 1:\n                    # 第一轮：开场陈述\n                    news_titles = json.dumps([n.get('title', '') for n in news_data[:3]], ensure_ascii=False)\n      
              bull_prompt = prompts[\"bull_first_round\"].format(\n                        stock_name=stock_name,\n                        stock_code=stock_code,\n                        context=context[:800],\n                        news=news_titles\n                    )\n                else:\n                    # 后续轮次：反驳对方\n                    last_bear = debate_history[-1][\"content\"] if debate_history else \"\"\n                    bull_prompt = prompts[\"bull_subsequent_rounds\"].format(\n                        stock_name=stock_name,\n                        round=round_num,\n                        bear_last_statement=last_bear[:300]\n                    )\n                \n                bull_system_msg = prompts[\"bull_system\"] if language == \"en\" else \"你是一位辩论中的看多研究员。言简意赅，有理有据，语气自信但不傲慢。\"\n                bull_messages = [\n                    {\"role\": \"system\", \"content\": bull_system_msg},\n                    {\"role\": \"user\", \"content\": bull_prompt}\n                ]\n                \n                bull_response = \"\"\n                for chunk in llm_provider.stream(bull_messages):\n                    bull_response += chunk\n                    yield sse_event(\"agent\", {\n                        \"agent\": \"BullResearcher\",\n                        \"role\": prompts[\"role_bull\"],\n                        \"content\": chunk,\n                        \"is_chunk\": True,\n                        \"round\": round_num\n                    })\n                    await asyncio.sleep(0)\n                \n                round_marker = f\"\\n\\n**【Round {round_num}】**\\n\" if language == \"en\" else f\"\\n\\n**【第{round_num}轮】**\\n\"\n                bull_full += round_marker + bull_response\n                debate_history.append({\"agent\": \"Bull\", \"round\": round_num, \"content\": bull_response})\n                \n                yield sse_event(\"agent\", {\n                    \"agent\": \"BullResearcher\",\n                    
\"role\": prompts[\"role_bull\"],\n                    \"content\": \"\",\n                    \"is_end\": True,\n                    \"round\": round_num\n                })\n                \n                # === Bear 发言（反驳） ===\n                yield sse_event(\"agent\", {\n                    \"agent\": \"BearResearcher\",\n                    \"role\": prompts[\"role_bear\"],\n                    \"content\": \"\",\n                    \"is_start\": True,\n                    \"round\": round_num\n                })\n                \n                if round_num == 1:\n                    news_titles = json.dumps([n.get('title', '') for n in news_data[:3]], ensure_ascii=False)\n                    bear_prompt = prompts[\"bear_first_round\"].format(\n                        stock_name=stock_name,\n                        stock_code=stock_code,\n                        context=context[:800],\n                        news=news_titles\n                    )\n                else:\n                    bear_prompt = prompts[\"bear_subsequent_rounds\"].format(\n                        stock_name=stock_name,\n                        round=round_num,\n                        bull_last_statement=bull_response[:300]\n                    )\n                \n                bear_system_msg = prompts[\"bear_system\"] if language == \"en\" else \"你是一位辩论中的看空研究员。言简意赅，善于发现风险，语气谨慎但有说服力。\"\n                bear_messages = [\n                    {\"role\": \"system\", \"content\": bear_system_msg},\n                    {\"role\": \"user\", \"content\": bear_prompt}\n                ]\n                \n                bear_response = \"\"\n                for chunk in llm_provider.stream(bear_messages):\n                    bear_response += chunk\n                    yield sse_event(\"agent\", {\n                        \"agent\": \"BearResearcher\",\n                        \"role\": prompts[\"role_bear\"],\n                        \"content\": chunk,\n                        
\"is_chunk\": True,\n                        \"round\": round_num\n                    })\n                    await asyncio.sleep(0)\n                \n                bear_full += round_marker + bear_response\n                debate_history.append({\"agent\": \"Bear\", \"round\": round_num, \"content\": bear_response})\n                \n                yield sse_event(\"agent\", {\n                    \"agent\": \"BearResearcher\",\n                    \"role\": prompts[\"role_bear\"],\n                    \"content\": \"\",\n                    \"is_end\": True,\n                    \"round\": round_num\n                })\n            \n            # === 投资经理总结决策 ===\n            decision_msg = \"Debate ended, Investment Manager is making final decision...\" if language == \"en\" else \"辩论结束，投资经理正在做最终决策...\"\n            yield sse_event(\"phase\", {\"phase\": \"decision\", \"message\": decision_msg})\n            \n            manager_role = \"Investment Manager\" if language == \"en\" else \"投资经理\"\n            yield sse_event(\"agent\", {\n                \"agent\": \"InvestmentManager\",\n                \"role\": manager_role,\n                \"content\": \"\",\n                \"is_start\": True\n            })\n            \n            # 整理辩论历史\n            debate_summary = \"\\n\".join([\n                f\"【第{h['round']}轮-{'看多' if h['agent']=='Bull' else '看空'}】{h['content'][:150]}...\"\n                for h in debate_history\n            ])\n            \n            decision_prompt = prompts[\"manager_decision\"].format(\n                stock_name=stock_name,\n                stock_code=stock_code,\n                bull_analysis=bull_full[:1000],\n                bear_analysis=bear_full[:1000]\n            )\n            \n            manager_system_msg = prompts[\"manager_system\"] if language == \"en\" else \"你是一位经验丰富的投资经理，善于在多空观点中做出理性决策。\"\n            decision_messages = [\n                {\"role\": \"system\", \"content\": 
manager_system_msg},\n                {\"role\": \"user\", \"content\": decision_prompt}\n            ]\n            \n            decision = \"\"\n            for chunk in llm_provider.stream(decision_messages):\n                decision += chunk\n                yield sse_event(\"agent\", {\n                    \"agent\": \"InvestmentManager\",\n                    \"role\": manager_role,\n                    \"content\": chunk,\n                    \"is_chunk\": True\n                })\n                await asyncio.sleep(0)\n            \n            yield sse_event(\"agent\", {\n                \"agent\": \"InvestmentManager\",\n                \"role\": manager_role,\n                \"content\": \"\",\n                \"is_end\": True\n            })\n            \n            # 提取评级\n            if language == \"en\":\n                rating = \"Neutral\"\n                for r in [\"Strongly Recommend\", \"Recommend\", \"Neutral\", \"Caution\", \"Avoid\"]:\n                    if r in decision:\n                        rating = r\n                        break\n            else:\n                rating = \"中性\"\n                for r in [\"强烈推荐\", \"推荐\", \"中性\", \"谨慎\", \"回避\"]:\n                    if r in decision:\n                        rating = r\n                        break\n            \n            # 发送完成事件\n            yield sse_event(\"result\", {\n                \"success\": True,\n                \"mode\": mode,\n                \"debate_id\": debate_id,\n                \"total_rounds\": max_rounds,\n                \"bull_analysis\": {\"analysis\": bull_full.strip(), \"success\": True, \"agent_name\": \"BullResearcher\", \"agent_role\": prompts[\"role_bull\"]},\n                \"bear_analysis\": {\"analysis\": bear_full.strip(), \"success\": True, \"agent_name\": \"BearResearcher\", \"agent_role\": prompts[\"role_bear\"]},\n                \"final_decision\": {\"decision\": decision, \"rating\": rating, \"success\": True, 
\"agent_name\": \"InvestmentManager\", \"agent_role\": manager_role},\n                \"debate_history\": debate_history\n            })\n            \n        else:\n            # parallel 模式 - 也使用流式，但并行展示\n            yield sse_event(\"phase\", {\"phase\": \"parallel_analysis\", \"message\": \"Bull/Bear 并行分析中...\"})\n            \n            # 由于是并行，我们交替输出\n            bull_prompt = f\"\"\"你是看多研究员，请从积极角度分析 {stock_name}({stock_code})：\n背景资料: {context[:1500]}\n新闻: {json.dumps([n.get('title', '') for n in news_data[:5]], ensure_ascii=False)}\n请给出完整的看多分析报告。\"\"\"\n\n            bear_prompt = f\"\"\"你是看空研究员，请从风险角度分析 {stock_name}({stock_code})：\n背景资料: {context[:1500]}\n新闻: {json.dumps([n.get('title', '') for n in news_data[:5]], ensure_ascii=False)}\n请给出完整的看空分析报告。\"\"\"\n\n            # Bull 流式输出\n            yield sse_event(\"agent\", {\"agent\": \"BullResearcher\", \"role\": \"看多研究员\", \"content\": \"\", \"is_start\": True})\n            bull_analysis = \"\"\n            for chunk in llm_provider.stream([\n                {\"role\": \"system\", \"content\": \"你是一位乐观但理性的股票研究员。\"},\n                {\"role\": \"user\", \"content\": bull_prompt}\n            ]):\n                bull_analysis += chunk\n                yield sse_event(\"agent\", {\"agent\": \"BullResearcher\", \"role\": \"看多研究员\", \"content\": chunk, \"is_chunk\": True})\n                await asyncio.sleep(0)\n            yield sse_event(\"agent\", {\"agent\": \"BullResearcher\", \"role\": \"看多研究员\", \"content\": \"\", \"is_end\": True})\n            \n            # Bear 流式输出\n            yield sse_event(\"agent\", {\"agent\": \"BearResearcher\", \"role\": \"看空研究员\", \"content\": \"\", \"is_start\": True})\n            bear_analysis = \"\"\n            for chunk in llm_provider.stream([\n                {\"role\": \"system\", \"content\": \"你是一位谨慎的股票研究员。\"},\n                {\"role\": \"user\", \"content\": bear_prompt}\n            ]):\n                bear_analysis += chunk\n                yield 
sse_event(\"agent\", {\"agent\": \"BearResearcher\", \"role\": \"看空研究员\", \"content\": chunk, \"is_chunk\": True})\n                await asyncio.sleep(0)\n            yield sse_event(\"agent\", {\"agent\": \"BearResearcher\", \"role\": \"看空研究员\", \"content\": \"\", \"is_end\": True})\n            \n            # 投资经理决策\n            yield sse_event(\"phase\", {\"phase\": \"decision\", \"message\": \"投资经理决策中...\"})\n            yield sse_event(\"agent\", {\"agent\": \"InvestmentManager\", \"role\": \"投资经理\", \"content\": \"\", \"is_start\": True})\n            \n            decision_prompt = f\"\"\"综合以下多空观点，对 {stock_name} 做出投资决策：\n【看多】{bull_analysis[:800]}\n【看空】{bear_analysis[:800]}\n请给出评级[强烈推荐/推荐/中性/谨慎/回避]和决策理由。\"\"\"\n            \n            decision = \"\"\n            for chunk in llm_provider.stream([\n                {\"role\": \"system\", \"content\": \"你是投资经理。\"},\n                {\"role\": \"user\", \"content\": decision_prompt}\n            ]):\n                decision += chunk\n                yield sse_event(\"agent\", {\"agent\": \"InvestmentManager\", \"role\": \"投资经理\", \"content\": chunk, \"is_chunk\": True})\n                await asyncio.sleep(0)\n            yield sse_event(\"agent\", {\"agent\": \"InvestmentManager\", \"role\": \"投资经理\", \"content\": \"\", \"is_end\": True})\n            \n            rating = \"中性\"\n            for r in [\"强烈推荐\", \"推荐\", \"中性\", \"谨慎\", \"回避\"]:\n                if r in decision:\n                    rating = r\n                    break\n            \n            yield sse_event(\"result\", {\n                \"success\": True,\n                \"mode\": mode,\n                \"bull_analysis\": {\"analysis\": bull_analysis, \"success\": True, \"agent_name\": \"BullResearcher\", \"agent_role\": \"看多研究员\"},\n                \"bear_analysis\": {\"analysis\": bear_analysis, \"success\": True, \"agent_name\": \"BearResearcher\", \"agent_role\": \"看空研究员\"},\n                \"final_decision\": {\"decision\": 
decision, \"rating\": rating, \"success\": True, \"agent_name\": \"InvestmentManager\", \"agent_role\": \"投资经理\"}\n            })\n        \n        yield sse_event(\"phase\", {\"phase\": \"complete\", \"message\": \"分析完成\"})\n        \n    except Exception as e:\n        logger.error(f\"SSE Debate error: {e}\", exc_info=True)\n        yield sse_event(\"error\", {\"message\": str(e)})\n\n\n@router.post(\"/debate/stream\")\nasync def run_stock_debate_stream(\n    request: DebateRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    流式辩论分析（SSE）\n    \n    使用 Server-Sent Events 实时推送辩论过程\n    \"\"\"\n    logger.info(f\"🎯 收到流式辩论请求: stock_code={request.stock_code}, mode={request.mode}\")\n    \n    # 标准化股票代码\n    code = request.stock_code.upper()\n    if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n        short_code = code[2:]\n    else:\n        short_code = code\n        code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n    \n    # 获取关联新闻\n    from sqlalchemy import text\n    stock_codes_filter = text(\n        \"stock_codes @> ARRAY[:code1]::varchar[] OR stock_codes @> ARRAY[:code2]::varchar[]\"\n    ).bindparams(code1=short_code, code2=code)\n    \n    news_query = select(News).where(stock_codes_filter).order_by(desc(News.publish_time)).limit(10)\n    result = await db.execute(news_query)\n    news_list = result.scalars().all()\n    \n    news_data = [\n        {\n            \"id\": n.id,\n            \"title\": n.title,\n            \"content\": n.content[:500] if n.content else \"\",\n            \"sentiment_score\": n.sentiment_score,\n            \"publish_time\": n.publish_time.isoformat() if n.publish_time else None\n        }\n        for n in news_list\n    ]\n    \n    # 获取额外上下文\n    try:\n        debate_context = await stock_data_service.get_debate_context(code)\n        akshare_context = debate_context.get(\"summary\", \"\")\n    except Exception as e:\n        logger.warning(f\"获取财务数据失败: {e}\")\n        
akshare_context = \"\"\n    \n    full_context = \"\"\n    if request.context:\n        full_context += f\"【用户补充】{request.context}\\n\\n\"\n    if akshare_context:\n        full_context += f\"【实时数据】{akshare_context}\"\n    \n    # 创建 LLM provider\n    llm_provider = get_llm_provider(\n        provider=request.provider,\n        model=request.model\n    ) if request.provider or request.model else get_llm_provider()\n    \n    mode = request.mode or \"parallel\"\n    stock_name = request.stock_name or code\n    \n    language = request.language or \"zh\"\n    \n    return StreamingResponse(\n        generate_debate_stream(code, stock_name, mode, full_context, news_data, llm_provider, language),\n        media_type=\"text/event-stream\",\n        headers={\n            \"Cache-Control\": \"no-cache\",\n            \"Connection\": \"keep-alive\",\n            \"X-Accel-Buffering\": \"no\"  # 禁用 nginx 缓冲\n        }\n    )\n\n\n# ============ 追问功能 ============\n\nclass FollowUpRequest(BaseModel):\n    \"\"\"追问请求\"\"\"\n    stock_code: str = Field(..., description=\"股票代码\")\n    stock_name: Optional[str] = Field(None, description=\"股票名称\")\n    question: str = Field(..., description=\"用户问题\")\n    target_agent: Optional[str] = Field(None, description=\"目标角色: bull, bear, manager\")\n    context: Optional[str] = Field(None, description=\"之前的辩论摘要\")\n\n\nasync def generate_followup_stream(\n    stock_code: str,\n    stock_name: str,\n    question: str,\n    target_agent: str,\n    context: str,\n    llm_provider\n) -> AsyncGenerator[str, None]:\n    \"\"\"\n    生成追问回复的 SSE 流\n    \"\"\"\n    def sse_event(event_type: str, data: Dict) -> str:\n        return f\"event: {event_type}\\ndata: {json.dumps(data, ensure_ascii=False)}\\n\\n\"\n    \n    # 确定回复角色\n    agent_config = {\n        'bull': {\n            'agent': 'BullResearcher',\n            'role': '多方辩手',\n            'system': '你是一位看多研究员，擅长从积极角度分析股票。回答用户问题时保持乐观但理性的态度。'\n        },\n        'bear': {\n            
'agent': 'BearResearcher', \n            'role': '空方辩手',\n            'system': '你是一位看空研究员，擅长发现风险。回答用户问题时保持谨慎，重点指出潜在风险。'\n        },\n        'manager': {\n            'agent': 'InvestmentManager',\n            'role': '投资经理',\n            'system': '你是一位经验丰富的投资经理，擅长综合分析和给出投资建议。回答用户问题时客观、专业。'\n        }\n    }\n    \n    config = agent_config.get(target_agent, agent_config['manager'])\n    \n    try:\n        yield sse_event(\"agent\", {\n            \"agent\": config['agent'],\n            \"role\": config['role'],\n            \"content\": \"\",\n            \"is_start\": True\n        })\n        \n        prompt = f\"\"\"你正在参与关于 {stock_name}({stock_code}) 的投资讨论。\n\n之前的讨论背景：\n{context[:1500] if context else '暂无'}\n\n用户现在问你：\n\"{question}\"\n\n请以{config['role']}的身份回答（约150-200字）：\"\"\"\n\n        messages = [\n            {\"role\": \"system\", \"content\": config['system']},\n            {\"role\": \"user\", \"content\": prompt}\n        ]\n        \n        full_response = \"\"\n        for chunk in llm_provider.stream(messages):\n            full_response += chunk\n            yield sse_event(\"agent\", {\n                \"agent\": config['agent'],\n                \"role\": config['role'],\n                \"content\": chunk,\n                \"is_chunk\": True\n            })\n            await asyncio.sleep(0)\n        \n        yield sse_event(\"agent\", {\n            \"agent\": config['agent'],\n            \"role\": config['role'],\n            \"content\": \"\",\n            \"is_end\": True\n        })\n        \n        yield sse_event(\"complete\", {\"success\": True})\n        \n    except Exception as e:\n        logger.error(f\"Followup error: {e}\", exc_info=True)\n        yield sse_event(\"error\", {\"message\": str(e)})\n\n\n@router.post(\"/debate/followup\")\nasync def debate_followup(request: FollowUpRequest):\n    \"\"\"\n    辩论追问（SSE）\n    \n    用户可以在辩论结束后继续提问\n    - 默认由投资经理回答\n    - 如果问题中包含 @多方 或 @bull，由多方辩手回答\n    - 如果问题中包含 @空方 或 
@bear，由空方辩手回答\n    - 如果问题中包含 @数据专员，则生成搜索计划（不直接回答）\n    \"\"\"\n    logger.info(f\"🎯 收到追问请求: {request.question[:50]}...\")\n    \n    # 解析目标角色\n    question = request.question\n    target = request.target_agent or 'manager'\n    \n    # 1. 检查是否提及数据专员（确认优先模式）\n    if '@数据专员' in question or target == 'data_collector':\n        logger.info(\"🔍 检测到数据专员提及，生成搜索计划...\")\n        \n        # 移除提及词\n        clean_question = question.replace('@数据专员', '').strip()\n        \n        # 创建数据专员\n        data_collector = create_data_collector()\n        \n        # 生成计划\n        plan = await data_collector.generate_search_plan(\n            query=clean_question,\n            stock_code=request.stock_code,\n            stock_name=request.stock_name or request.stock_code\n        )\n        \n        # 使用 SSE 返回计划事件\n        async def generate_plan_stream():\n            # Pydantic V2: 使用 model_dump_json() 或 json.dumps(model_dump())\n            plan_json = json.dumps(plan.model_dump(), ensure_ascii=False)\n            yield f\"event: task_plan\\ndata: {plan_json}\\n\\n\"\n            yield \"event: complete\\ndata: {\\\"success\\\": true}\\n\\n\"\n            \n        return StreamingResponse(\n            generate_plan_stream(),\n            media_type=\"text/event-stream\",\n            headers={\n                \"Cache-Control\": \"no-cache\",\n                \"Connection\": \"keep-alive\",\n                \"X-Accel-Buffering\": \"no\"\n            }\n        )\n\n    # 2. 
普通追问逻辑\n    # 从问题中解析 @ 提及\n    if '@多方' in question or '@bull' in question.lower() or '@看多' in question:\n        target = 'bull'\n        question = question.replace('@多方', '').replace('@bull', '').replace('@Bull', '').replace('@看多', '').strip()\n    elif '@空方' in question or '@bear' in question.lower() or '@看空' in question:\n        target = 'bear'\n        question = question.replace('@空方', '').replace('@bear', '').replace('@Bear', '').replace('@看空', '').strip()\n    elif '@经理' in question or '@manager' in question.lower() or '@投资经理' in question:\n        target = 'manager'\n        question = question.replace('@经理', '').replace('@manager', '').replace('@Manager', '').replace('@投资经理', '').strip()\n    \n    # 创建 LLM provider\n    llm_provider = get_llm_provider()\n    \n    stock_name = request.stock_name or request.stock_code\n    \n    return StreamingResponse(\n        generate_followup_stream(\n            request.stock_code,\n            stock_name,\n            question,\n            target,\n            request.context or \"\",\n            llm_provider\n        ),\n        media_type=\"text/event-stream\",\n        headers={\n            \"Cache-Control\": \"no-cache\",\n            \"Connection\": \"keep-alive\",\n            \"X-Accel-Buffering\": \"no\"\n        }\n    )\n\n\n@router.post(\"/search/execute\")\nasync def execute_search(request: SearchExecuteRequest):\n    \"\"\"\n    执行确认后的搜索计划（SSE）\n    \"\"\"\n    from ...agents.data_collector_v2 import SearchPlan\n    \n    logger.info(f\"🚀 收到搜索执行请求: {request.plan.get('plan_id')}\")\n    \n    try:\n        # 反序列化计划\n        plan = SearchPlan(**request.plan)\n        \n        async def generate_search_results():\n            yield f\"event: phase\\ndata: {json.dumps({'phase': 'executing', 'message': '正在执行搜索任务...'}, ensure_ascii=False)}\\n\\n\"\n            \n            data_collector = create_data_collector()\n            \n            # 执行计划\n            results = await 
data_collector.execute_search_plan(plan)\n            \n            # 发送结果事件\n            yield f\"event: agent\\ndata: {json.dumps({'agent': 'DataCollector', 'role': '数据专员', 'content': results.get('summary', ''), 'is_chunk': False}, ensure_ascii=False)}\\n\\n\"\n            \n            yield f\"event: result\\ndata: {json.dumps(results, ensure_ascii=False)}\\n\\n\"\n            yield \"event: complete\\ndata: {\\\"success\\\": true}\\n\\n\"\n            \n        return StreamingResponse(\n            generate_search_results(),\n            media_type=\"text/event-stream\",\n            headers={\n                \"Cache-Control\": \"no-cache\",\n                \"Connection\": \"keep-alive\",\n                \"X-Accel-Buffering\": \"no\"\n            }\n        )\n        \n    except Exception as e:\n        logger.error(f\"执行搜索计划失败: {e}\", exc_info=True)\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/debate/{debate_id}\", response_model=DebateResponse)\nasync def get_debate_result(debate_id: str):\n    \"\"\"\n    获取辩论结果\n    \n    - **debate_id**: 辩论ID\n    \"\"\"\n    if debate_id not in debate_results:\n        raise HTTPException(status_code=404, detail=\"Debate not found\")\n    \n    result = debate_results[debate_id]\n    \n    return DebateResponse(\n        success=result.get(\"success\", False),\n        debate_id=debate_id,\n        stock_code=result.get(\"stock_code\", \"\"),\n        stock_name=result.get(\"stock_name\"),\n        bull_analysis=result.get(\"bull_analysis\"),\n        bear_analysis=result.get(\"bear_analysis\"),\n        final_decision=result.get(\"final_decision\"),\n        trajectory=result.get(\"trajectory\"),\n        execution_time=result.get(\"execution_time\")\n    )\n\n\n@router.get(\"/logs\", response_model=List[AgentLogEntry])\nasync def get_agent_logs(\n    limit: int = Query(50, le=200),\n    agent_name: Optional[str] = Query(None, description=\"按智能体名称筛选\"),\n    status: 
Optional[str] = Query(None, description=\"按状态筛选: started, completed, failed\")\n):\n    \"\"\"\n    获取智能体执行日志\n    \n    - **limit**: 返回数量限制\n    - **agent_name**: 按智能体名称筛选\n    - **status**: 按状态筛选\n    \"\"\"\n    logs = execution_logs.copy()\n    \n    # 筛选\n    if agent_name:\n        logs = [log for log in logs if log.get(\"agent_name\") == agent_name]\n    if status:\n        logs = [log for log in logs if log.get(\"status\") == status]\n    \n    # 按时间倒序\n    logs.sort(key=lambda x: x.get(\"timestamp\", \"\"), reverse=True)\n    \n    # 限制数量\n    logs = logs[:limit]\n    \n    return [AgentLogEntry(**log) for log in logs]\n\n\n@router.get(\"/metrics\", response_model=AgentMetrics)\nasync def get_agent_metrics():\n    \"\"\"\n    获取智能体性能指标\n    \"\"\"\n    total = len(execution_logs)\n    successful = len([log for log in execution_logs if log.get(\"status\") == \"completed\"])\n    failed = len([log for log in execution_logs if log.get(\"status\") == \"failed\"])\n    \n    # 计算平均执行时间\n    execution_times = [\n        log.get(\"execution_time\", 0) \n        for log in execution_logs \n        if log.get(\"execution_time\") is not None\n    ]\n    avg_time = sum(execution_times) / len(execution_times) if execution_times else 0\n    \n    # 按智能体统计\n    agent_stats = {}\n    for log in execution_logs:\n        agent_name = log.get(\"agent_name\", \"Unknown\")\n        if agent_name not in agent_stats:\n            agent_stats[agent_name] = {\n                \"total\": 0,\n                \"successful\": 0,\n                \"failed\": 0,\n                \"avg_time\": 0,\n                \"times\": []\n            }\n        agent_stats[agent_name][\"total\"] += 1\n        if log.get(\"status\") == \"completed\":\n            agent_stats[agent_name][\"successful\"] += 1\n        elif log.get(\"status\") == \"failed\":\n            agent_stats[agent_name][\"failed\"] += 1\n        if log.get(\"execution_time\"):\n            
agent_stats[agent_name][\"times\"].append(log[\"execution_time\"])\n    \n    # 计算每个智能体的平均时间\n    for agent_name, stats in agent_stats.items():\n        if stats[\"times\"]:\n            stats[\"avg_time\"] = sum(stats[\"times\"]) / len(stats[\"times\"])\n        del stats[\"times\"]  # 不返回原始时间列表\n    \n    # 最近活动\n    recent_logs = sorted(\n        execution_logs, \n        key=lambda x: x.get(\"timestamp\", \"\"), \n        reverse=True\n    )[:10]\n    \n    recent_activity = [\n        {\n            \"timestamp\": log.get(\"timestamp\"),\n            \"agent_name\": log.get(\"agent_name\"),\n            \"action\": log.get(\"action\"),\n            \"status\": log.get(\"status\")\n        }\n        for log in recent_logs\n    ]\n    \n    return AgentMetrics(\n        total_executions=total,\n        successful_executions=successful,\n        failed_executions=failed,\n        avg_execution_time=round(avg_time, 2),\n        agent_stats=agent_stats,\n        recent_activity=recent_activity\n    )\n\n\n@router.get(\"/trajectory/{debate_id}\", response_model=List[TrajectoryStep])\nasync def get_debate_trajectory(debate_id: str):\n    \"\"\"\n    获取辩论执行轨迹\n    \n    - **debate_id**: 辩论ID\n    \"\"\"\n    if debate_id not in debate_results:\n        raise HTTPException(status_code=404, detail=\"Debate not found\")\n    \n    result = debate_results[debate_id]\n    trajectory = result.get(\"trajectory\", [])\n    \n    steps = []\n    for i, step in enumerate(trajectory):\n        steps.append(TrajectoryStep(\n            step_id=f\"{debate_id}_step_{i}\",\n            step_name=step.get(\"step\", \"unknown\"),\n            timestamp=step.get(\"timestamp\", \"\"),\n            agent_name=step.get(\"data\", {}).get(\"agent\"),\n            input_data=None,  # 可以扩展\n            output_data=step.get(\"data\"),\n            duration=None,\n            status=\"completed\"\n        ))\n    \n    return steps\n\n\n@router.delete(\"/logs\")\nasync def clear_logs():\n    
\"\"\"\n    清空执行日志（仅用于开发测试）\n    \"\"\"\n    global execution_logs\n    count = len(execution_logs)\n    execution_logs = []\n    return {\"message\": f\"Cleared {count} logs\"}\n\n\n@router.get(\"/available\")\nasync def get_available_agents():\n    \"\"\"\n    获取可用的智能体列表\n    \"\"\"\n    return {\n        \"agents\": [\n            {\n                \"name\": \"NewsAnalyst\",\n                \"role\": \"金融新闻分析师\",\n                \"description\": \"分析金融新闻的情感、影响和关键信息\",\n                \"status\": \"active\"\n            },\n            {\n                \"name\": \"BullResearcher\",\n                \"role\": \"看多研究员\",\n                \"description\": \"从积极角度分析股票，发现投资机会\",\n                \"status\": \"active\"\n            },\n            {\n                \"name\": \"BearResearcher\",\n                \"role\": \"看空研究员\",\n                \"description\": \"从风险角度分析股票，识别潜在问题\",\n                \"status\": \"active\"\n            },\n            {\n                \"name\": \"InvestmentManager\",\n                \"role\": \"投资经理\",\n                \"description\": \"综合多方观点，做出投资决策\",\n                \"status\": \"active\"\n            },\n            {\n                \"name\": \"SearchAnalyst\",\n                \"role\": \"搜索分析师\",\n                \"description\": \"动态获取数据，支持 AkShare、BochaAI、网页搜索等\",\n                \"status\": \"active\"\n            }\n        ],\n        \"workflows\": [\n            {\n                \"name\": \"NewsAnalysisWorkflow\",\n                \"description\": \"新闻分析工作流：爬取 -> 清洗 -> 情感分析\",\n                \"agents\": [\"NewsAnalyst\"],\n                \"status\": \"active\"\n            },\n            {\n                \"name\": \"InvestmentDebateWorkflow\",\n                \"description\": \"投资辩论工作流：Bull vs Bear 多智能体辩论\",\n                \"agents\": [\"BullResearcher\", \"BearResearcher\", \"InvestmentManager\"],\n                \"status\": \"active\"\n            }\n        ]\n    }\n\n\n# ============ 辩论历史 
API ============\n\nclass DebateHistoryRequest(BaseModel):\n    \"\"\"保存辩论历史请求\"\"\"\n    stock_code: str = Field(..., description=\"股票代码\")\n    sessions: List[Dict[str, Any]] = Field(..., description=\"会话列表\")\n\n\nclass DebateHistoryResponse(BaseModel):\n    \"\"\"辩论历史响应\"\"\"\n    success: bool\n    stock_code: str\n    sessions: List[Dict[str, Any]] = []\n    message: Optional[str] = None\n\n\n@router.get(\"/debate/history/{stock_code}\", response_model=DebateHistoryResponse)\nasync def get_debate_history(\n    stock_code: str,\n    limit: int = Query(10, le=50, description=\"返回会话数量限制\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取股票的辩论历史\n    \n    - **stock_code**: 股票代码\n    - **limit**: 返回数量限制（默认10，最大50）\n    \"\"\"\n    from ...models.debate_history import DebateHistory\n    \n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        # 查询历史记录\n        query = select(DebateHistory).where(\n            DebateHistory.stock_code == code\n        ).order_by(desc(DebateHistory.updated_at)).limit(limit)\n        \n        result = await db.execute(query)\n        histories = result.scalars().all()\n        \n        sessions = []\n        for h in histories:\n            sessions.append({\n                \"id\": h.session_id,\n                \"stockCode\": h.stock_code,\n                \"stockName\": h.stock_name,\n                \"mode\": h.mode,\n                \"messages\": h.messages,\n                \"createdAt\": h.created_at.isoformat() if h.created_at else None,\n                \"updatedAt\": h.updated_at.isoformat() if h.updated_at else None\n            })\n        \n        return DebateHistoryResponse(\n            success=True,\n            stock_code=code,\n            sessions=sessions\n        )\n        \n    except Exception as e:\n        
logger.error(f\"获取辩论历史失败: {e}\", exc_info=True)\n        return DebateHistoryResponse(\n            success=False,\n            stock_code=stock_code,\n            message=str(e)\n        )\n\n\n@router.post(\"/debate/history\", response_model=DebateHistoryResponse)\nasync def save_debate_history(\n    request: DebateHistoryRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    保存辩论历史\n    \n    - **stock_code**: 股票代码\n    - **sessions**: 会话列表\n    \"\"\"\n    from ...models.debate_history import DebateHistory\n    \n    try:\n        # 标准化股票代码\n        code = request.stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        saved_count = 0\n        \n        for session_data in request.sessions:\n            session_id = session_data.get(\"id\")\n            if not session_id:\n                continue\n            \n            messages = session_data.get(\"messages\", [])\n            logger.info(f\"📥 Processing session {session_id}: {len(messages)} messages\")\n            logger.info(f\"📥 Message roles: {[m.get('role') for m in messages]}\")\n            \n            # 检查是否已存在\n            existing_query = select(DebateHistory).where(\n                DebateHistory.session_id == session_id\n            )\n            existing_result = await db.execute(existing_query)\n            existing = existing_result.scalar_one_or_none()\n            \n            if existing:\n                # 更新现有记录\n                logger.info(f\"📥 Updating existing session, old messages: {len(existing.messages)}, new: {len(messages)}\")\n                existing.messages = messages\n                existing.mode = session_data.get(\"mode\")\n                existing.updated_at = datetime.utcnow()\n            else:\n                # 解析 created_at，确保是 naive datetime（去掉时区信息）\n                created_at_str = session_data.get(\"createdAt\")\n 
               if created_at_str:\n                    # 处理 ISO 格式字符串，移除末尾的 'Z' 并转换\n                    if created_at_str.endswith('Z'):\n                        created_at_str = created_at_str[:-1] + '+00:00'\n                    parsed_dt = datetime.fromisoformat(created_at_str)\n                    # 转换为 naive datetime (去掉时区信息)\n                    if parsed_dt.tzinfo is not None:\n                        created_at = parsed_dt.replace(tzinfo=None)\n                    else:\n                        created_at = parsed_dt\n                else:\n                    created_at = datetime.utcnow()\n                \n                # 创建新记录\n                new_history = DebateHistory(\n                    session_id=session_id,\n                    stock_code=code,\n                    stock_name=session_data.get(\"stockName\"),\n                    mode=session_data.get(\"mode\"),\n                    messages=session_data.get(\"messages\", []),\n                    created_at=created_at,\n                    updated_at=datetime.utcnow()\n                )\n                db.add(new_history)\n            \n            saved_count += 1\n        \n        await db.commit()\n        \n        logger.info(f\"保存了 {saved_count} 个辩论会话到数据库\")\n        \n        return DebateHistoryResponse(\n            success=True,\n            stock_code=code,\n            message=f\"成功保存 {saved_count} 个会话\"\n        )\n        \n    except Exception as e:\n        logger.error(f\"保存辩论历史失败: {e}\", exc_info=True)\n        await db.rollback()\n        return DebateHistoryResponse(\n            success=False,\n            stock_code=request.stock_code,\n            message=str(e)\n        )\n\n\n@router.delete(\"/debate/history/{stock_code}\")\nasync def delete_debate_history(\n    stock_code: str,\n    session_id: Optional[str] = Query(None, description=\"删除指定会话，不传则删除所有\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    删除辩论历史\n    \n    - **stock_code**: 股票代码\n    - 
**session_id**: 会话ID（可选，不传则删除该股票的所有历史）\n    \"\"\"\n    from ...models.debate_history import DebateHistory\n    from sqlalchemy import delete\n    \n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        if session_id:\n            # 删除指定会话\n            stmt = delete(DebateHistory).where(\n                DebateHistory.session_id == session_id\n            )\n        else:\n            # 删除该股票的所有会话\n            stmt = delete(DebateHistory).where(\n                DebateHistory.stock_code == code\n            )\n        \n        result = await db.execute(stmt)\n        await db.commit()\n        \n        deleted_count = result.rowcount\n        \n        return {\n            \"success\": True,\n            \"stock_code\": code,\n            \"deleted_count\": deleted_count,\n            \"message\": f\"删除了 {deleted_count} 条记录\"\n        }\n        \n    except Exception as e:\n        logger.error(f\"删除辩论历史失败: {e}\", exc_info=True)\n        await db.rollback()\n        return {\n            \"success\": False,\n            \"stock_code\": stock_code,\n            \"message\": str(e)\n        }\n\n"
  },
  {
    "path": "backend/app/api/v1/alpha_mining.py",
    "content": "\"\"\"\nAlpha Mining REST API\n\n提供因子挖掘相关的 HTTP 接口。\n\nEndpoints:\n- POST /alpha-mining/mine - 启动因子挖掘任务\n- POST /alpha-mining/mine/stream - SSE 流式训练进度\n- POST /alpha-mining/evaluate - 评估因子表达式\n- POST /alpha-mining/generate - 生成候选因子\n- POST /alpha-mining/compare-sentiment - 情感融合效果对比\n- POST /alpha-mining/agent-demo - AgenticX Agent 调用演示\n- GET /alpha-mining/factors - 获取已发现的因子列表\n- GET /alpha-mining/status - 获取挖掘状态\n- GET /alpha-mining/operators - 获取操作符列表\n\"\"\"\n\nfrom fastapi import APIRouter, HTTPException, BackgroundTasks\nfrom fastapi.responses import StreamingResponse\nfrom pydantic import BaseModel, Field\nfrom typing import List, Optional, Dict, Any, AsyncGenerator\nfrom datetime import datetime\nimport logging\nimport uuid\nimport asyncio\nimport json\nimport queue\nimport threading\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter(prefix=\"/alpha-mining\", tags=[\"Alpha Mining\"])\n\n# 存储挖掘任务状态\n_mining_tasks: Dict[str, Dict[str, Any]] = {}\n_discovered_factors: List[Dict[str, Any]] = []\n\n\n# ============================================================================\n# Request/Response Models\n# ============================================================================\n\nclass MineRequest(BaseModel):\n    \"\"\"因子挖掘请求\"\"\"\n    stock_code: Optional[str] = Field(None, description=\"股票代码\")\n    num_steps: int = Field(100, ge=1, le=10000, description=\"训练步数\")\n    use_sentiment: bool = Field(True, description=\"是否使用情感特征\")\n    batch_size: int = Field(16, ge=1, le=128, description=\"批量大小\")\n\n\nclass EvaluateRequest(BaseModel):\n    \"\"\"因子评估请求\"\"\"\n    formula: str = Field(..., description=\"因子表达式\")\n    stock_code: Optional[str] = Field(None, description=\"股票代码\")\n\n\nclass GenerateRequest(BaseModel):\n    \"\"\"因子生成请求\"\"\"\n    batch_size: int = Field(10, ge=1, le=100, description=\"生成数量\")\n    max_len: int = Field(8, ge=4, le=16, description=\"最大表达式长度\")\n\n\nclass FactorResponse(BaseModel):\n    
\"\"\"因子响应\"\"\"\n    formula: List[int] = Field(..., description=\"Token 序列\")\n    formula_str: str = Field(..., description=\"表达式字符串\")\n    sortino: float = Field(..., description=\"Sortino Ratio\")\n    sharpe: Optional[float] = Field(None, description=\"Sharpe Ratio\")\n    ic: Optional[float] = Field(None, description=\"IC\")\n    discovered_at: Optional[str] = Field(None, description=\"发现时间\")\n\n\nclass MineResponse(BaseModel):\n    \"\"\"挖掘响应\"\"\"\n    success: bool\n    task_id: str\n    message: str\n    best_factor: Optional[FactorResponse] = None\n\n\nclass EvaluateResponse(BaseModel):\n    \"\"\"评估响应\"\"\"\n    success: bool\n    formula: str\n    metrics: Optional[Dict[str, float]] = None\n    error: Optional[str] = None\n\n\nclass GenerateResponse(BaseModel):\n    \"\"\"生成响应\"\"\"\n    success: bool\n    generated: int\n    valid: int\n    factors: List[Dict[str, Any]]\n\n\nclass TaskStatusResponse(BaseModel):\n    \"\"\"任务状态响应\"\"\"\n    task_id: str\n    status: str  # pending, running, completed, failed\n    progress: float  # 0-100\n    result: Optional[Dict[str, Any]] = None\n    error: Optional[str] = None\n    started_at: Optional[str] = None\n    completed_at: Optional[str] = None\n\n\nclass SentimentCompareRequest(BaseModel):\n    \"\"\"情感融合对比请求\"\"\"\n    num_steps: int = Field(50, ge=10, le=500, description=\"训练步数\")\n    batch_size: int = Field(16, ge=1, le=64, description=\"批量大小\")\n\n\nclass SentimentCompareResponse(BaseModel):\n    \"\"\"情感融合对比响应\"\"\"\n    success: bool\n    with_sentiment: Dict[str, Any] = Field(..., description=\"含情感特征的结果\")\n    without_sentiment: Dict[str, Any] = Field(..., description=\"不含情感特征的结果\")\n    improvement: Dict[str, float] = Field(..., description=\"改进幅度\")\n\n\nclass AgentDemoRequest(BaseModel):\n    \"\"\"Agent 调用演示请求\"\"\"\n    stock_code: Optional[str] = Field(None, description=\"股票代码\")\n    num_steps: int = Field(30, ge=10, le=200, description=\"训练步数\")\n    use_sentiment: bool = Field(True, 
description=\"使用情感特征\")\n\n\nclass AgentDemoResponse(BaseModel):\n    \"\"\"Agent 调用演示响应\"\"\"\n    success: bool\n    agent_name: str\n    tool_name: str\n    input_params: Dict[str, Any]\n    output: Optional[Dict[str, Any]] = None\n    execution_time: float\n    logs: List[str] = []\n\n\n# ============================================================================\n# Helper Functions\n# ============================================================================\n\ndef _get_alpha_mining_components():\n    \"\"\"获取 Alpha Mining 组件\"\"\"\n    try:\n        from ...alpha_mining import (\n            AlphaMiningConfig,\n            FactorVocab,\n            FactorVM,\n            AlphaGenerator,\n            AlphaTrainer,\n            FactorEvaluator,\n            generate_mock_data\n        )\n        \n        config = AlphaMiningConfig()\n        vocab = FactorVocab()\n        vm = FactorVM(vocab=vocab)\n        generator = AlphaGenerator(vocab=vocab, config=config)\n        evaluator = FactorEvaluator(config=config)\n        \n        return {\n            \"config\": config,\n            \"vocab\": vocab,\n            \"vm\": vm,\n            \"generator\": generator,\n            \"evaluator\": evaluator,\n            \"generate_mock_data\": generate_mock_data\n        }\n    except ImportError as e:\n        logger.error(f\"Failed to import Alpha Mining: {e}\")\n        raise HTTPException(\n            status_code=503,\n            detail=\"Alpha Mining module not available\"\n        )\n\n\nasync def _run_mining_task(task_id: str, request: MineRequest):\n    \"\"\"后台运行挖掘任务\"\"\"\n    global _discovered_factors\n    \n    try:\n        _mining_tasks[task_id][\"status\"] = \"running\"\n        _mining_tasks[task_id][\"started_at\"] = datetime.utcnow().isoformat()\n        \n        components = _get_alpha_mining_components()\n        \n        from ...alpha_mining import AlphaTrainer\n        \n        # 准备数据\n        features, returns = 
components[\"generate_mock_data\"](\n            num_samples=50,\n            num_features=6,\n            time_steps=252,\n            seed=42\n        )\n        \n        # Create the trainer\n        config = components[\"config\"]\n        config.batch_size = request.batch_size\n        \n        trainer = AlphaTrainer(\n            generator=components[\"generator\"],\n            vocab=components[\"vocab\"],\n            config=config\n        )\n        \n        # Train. trainer.train blocks, and this coroutine runs on the event loop,\n        # so hand the call off to a worker thread.\n        import asyncio\n        result = await asyncio.to_thread(\n            trainer.train,\n            features=features,\n            returns=returns,\n            num_steps=request.num_steps,\n            progress_bar=False\n        )\n        \n        # Save the result\n        if result[\"best_formula\"]:\n            factor_info = {\n                \"formula\": result[\"best_formula\"],\n                \"formula_str\": result[\"best_formula_str\"],\n                \"sortino\": result[\"best_score\"],\n                \"discovered_at\": datetime.utcnow().isoformat(),\n                \"task_id\": task_id,\n                \"stock_code\": request.stock_code\n            }\n            _discovered_factors.append(factor_info)\n            \n            # Keep only the best 100 factors\n            _discovered_factors.sort(key=lambda x: x.get(\"sortino\", 0), reverse=True)\n            _discovered_factors = _discovered_factors[:100]\n        \n        _mining_tasks[task_id][\"status\"] = \"completed\"\n        _mining_tasks[task_id][\"progress\"] = 100\n        _mining_tasks[task_id][\"completed_at\"] = datetime.utcnow().isoformat()\n        _mining_tasks[task_id][\"result\"] = {\n            \"best_factor\": result[\"best_formula_str\"],\n            \"best_score\": result[\"best_score\"],\n            \"total_steps\": result[\"total_steps\"]\n        }\n        \n    except Exception as e:\n        logger.error(f\"Mining task {task_id} failed: {e}\")\n        _mining_tasks[task_id][\"status\"] = \"failed\"\n        _mining_tasks[task_id][\"error\"] = str(e)\n        _mining_tasks[task_id][\"completed_at\"] = datetime.utcnow().isoformat()\n\n\n# ============================================================================\n# API Endpoints\n# ============================================================================\n\n@router.post(\"/mine\", response_model=MineResponse)\nasync def mine_factors(\n    request: MineRequest,\n    background_tasks: BackgroundTasks\n):\n    \"\"\"\n    Start a factor-mining task\n    \n    Uses reinforcement learning to discover effective trading factors.\n    The task runs in the background; poll /status/{task_id} for progress.\n    \"\"\"\n    task_id = str(uuid.uuid4())\n    \n    # Initialize the task record\n    _mining_tasks[task_id] = {\n        \"status\": \"pending\",\n        \"progress\": 0,\n        \"request\": request.model_dump(),\n        \"created_at\": datetime.utcnow().isoformat()\n    }\n    \n    # Schedule the background task\n    background_tasks.add_task(_run_mining_task, task_id, request)\n    \n    return MineResponse(\n        success=True,\n        task_id=task_id,\n        message=f\"因子挖掘任务已启动，预计 {request.num_steps} 步训练\"\n    )\n\n\n@router.post(\"/mine/stream\")\nasync def mine_factors_stream(request: MineRequest):\n    \"\"\"\n    Stream training progress over SSE\n    \n    Pushes per-step training metrics (loss, reward, best_score, ...) in real time.\n    The frontend can subscribe with EventSource.\n    \"\"\"\n    async def event_generator() -> AsyncGenerator[str, None]:\n        try:\n            components = _get_alpha_mining_components()\n            \n            from ...alpha_mining import AlphaTrainer\n            \n            # Prepare data\n            features, returns = components[\"generate_mock_data\"](\n                num_samples=50,\n                num_features=6,\n                time_steps=252,\n                seed=42\n            )\n            \n            # Create the trainer\n            config = components[\"config\"]\n            config.batch_size = request.batch_size\n            \n            trainer = AlphaTrainer(\n                generator=components[\"generator\"],\n                vocab=components[\"vocab\"],\n                config=config\n            )\n            \n            # Use a queue to pass metrics from the training thread to the event loop\n            import asyncio\n            metrics_queue: queue.Queue = queue.Queue()\n            training_complete = threading.Event()\n            training_error: List[str] = []\n            \n            def step_callback(metrics: Dict[str, Any]):\n                \"\"\"Per-step training callback: push metrics onto the queue\"\"\"\n                metrics_queue.put(metrics)\n            \n            def run_training():\n                \"\"\"Run the training loop in a background thread\"\"\"\n                try:\n                    trainer.train(\n                        features=features,\n                        returns=returns,\n                        num_steps=request.num_steps,\n                        progress_bar=False,\n                        step_callback=step_callback\n                    )\n                except Exception as e:\n                    training_error.append(str(e))\n                finally:\n                    training_complete.set()\n            \n            # Start the training thread (daemon, so it cannot block shutdown)\n            training_thread = threading.Thread(target=run_training, daemon=True)\n            training_thread.start()\n            \n            # Send the start event\n            yield f\"event: start\\ndata: {json.dumps({'status': 'started', 'total_steps': request.num_steps})}\\n\\n\"\n            \n            # Stream training progress. The blocking queue.get runs in a worker\n            # thread via asyncio.to_thread so it does not stall the event loop.\n            while not training_complete.is_set() or not metrics_queue.empty():\n                try:\n                    metrics = await asyncio.to_thread(metrics_queue.get, timeout=0.1)\n                    event_data = {\n                        \"step\": metrics.get(\"step\", 0),\n                        \"progress\": metrics.get(\"progress\", 0),\n                        \"loss\": round(metrics.get(\"loss\", 0), 6),\n                        \"avg_reward\": round(metrics.get(\"avg_reward\", 0), 6),\n                        \"max_reward\": round(metrics.get(\"max_reward\", 0), 6),\n                        \"valid_ratio\": round(metrics.get(\"valid_ratio\", 0), 4),\n                        \"best_score\": round(metrics.get(\"best_score\", -999), 6),\n                        \"best_formula\": metrics.get(\"best_formula\", \"\"),\n                    }\n                    yield f\"event: progress\\ndata: {json.dumps(event_data)}\\n\\n\"\n                except queue.Empty:\n                    continue\n            \n            # Wait for the training thread, also off the event loop\n            await asyncio.to_thread(training_thread.join, timeout=5)\n            \n            # Send the final event\n            if training_error:\n                yield f\"event: error\\ndata: {json.dumps({'error': training_error[0]})}\\n\\n\"\n            else:\n                final_result = {\n                    \"status\": \"completed\",\n                    \"best_score\": round(trainer.best_score, 6),\n                    \"best_formula\": trainer.best_formula_str,\n                    \"total_steps\": trainer.step_count,\n                }\n                yield f\"event: complete\\ndata: {json.dumps(final_result)}\\n\\n\"\n                \n                # Save the discovered factor\n                if trainer.best_formula:\n                    _discovered_factors.append({\n                        \"formula\": trainer.best_formula,\n                        \"formula_str\": trainer.best_formula_str,\n                        \"sortino\": trainer.best_score,\n                        \"discovered_at\": datetime.utcnow().isoformat(),\n                        \"stock_code\": request.stock_code\n                    })\n                    # Keep the cache sorted and capped, matching _run_mining_task\n                    _discovered_factors.sort(key=lambda x: x.get(\"sortino\", 0), reverse=True)\n                    del _discovered_factors[100:]\n                    \n        except Exception as e:\n            logger.error(f\"SSE streaming error: {e}\")\n            yield f\"event: error\\ndata: {json.dumps({'error': str(e)})}\\n\\n\"\n    \n    return StreamingResponse(\n        event_generator(),\n        media_type=\"text/event-stream\",\n        headers={\n            \"Cache-Control\": \"no-cache\",\n            \"Connection\": \"keep-alive\",\n            \"X-Accel-Buffering\": \"no\",\n        }\n    )\n\n\n@router.post(\"/compare-sentiment\", response_model=SentimentCompareResponse)\nasync def compare_sentiment_effect(request: SentimentCompareRequest):\n    \"\"\"\n 
   Compare factor-mining results with vs. without sentiment features\n    \n    Runs factor mining once on technical features only and once on\n    technical + sentiment features, then compares the outcomes.\n    \"\"\"\n    try:\n        components = _get_alpha_mining_components()\n        from ...alpha_mining import AlphaTrainer, AlphaGenerator, AlphaMiningConfig\n        \n        results = {}\n        \n        for use_sentiment in [False, True]:\n            # Prepare data\n            num_features = 6 if use_sentiment else 4  # 4 technical + 2 sentiment features\n            features, returns = components[\"generate_mock_data\"](\n                num_samples=50,\n                num_features=num_features,\n                time_steps=252,\n                seed=42\n            )\n            \n            # Train. Build a fresh generator for each arm so weights learned in\n            # the first run do not leak into the second and bias the comparison.\n            config = AlphaMiningConfig()\n            config.batch_size = request.batch_size\n            \n            trainer = AlphaTrainer(\n                generator=AlphaGenerator(vocab=components[\"vocab\"], config=config),\n                vocab=components[\"vocab\"],\n                config=config\n            )\n            \n            result = trainer.train(\n                features=features,\n                returns=returns,\n                num_steps=request.num_steps,\n                progress_bar=False\n            )\n            \n            key = \"with_sentiment\" if use_sentiment else \"without_sentiment\"\n            results[key] = {\n                \"best_score\": round(result[\"best_score\"], 6),\n                \"best_formula\": result[\"best_formula_str\"],\n                \"total_steps\": result[\"total_steps\"],\n                \"num_features\": num_features,\n            }\n        \n        # Compute the improvement\n        with_score = results[\"with_sentiment\"][\"best_score\"]\n        without_score = results[\"without_sentiment\"][\"best_score\"]\n        \n        if without_score != 0:\n            improvement_pct = (with_score - without_score) / abs(without_score) * 100\n        else:\n            improvement_pct = 0 if with_score == 0 else 100\n        \n        improvement = {\n            \"score_diff\": round(with_score - 
without_score, 6),\n            \"improvement_pct\": round(improvement_pct, 2),\n        }\n        \n        return SentimentCompareResponse(\n            success=True,\n            with_sentiment=results[\"with_sentiment\"],\n            without_sentiment=results[\"without_sentiment\"],\n            improvement=improvement\n        )\n        \n    except Exception as e:\n        logger.error(f\"Sentiment comparison failed: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/agent-demo\", response_model=AgentDemoResponse)\nasync def agent_alpha_mining_demo(request: AgentDemoRequest):\n    \"\"\"\n    演示 AgenticX Agent 调用 AlphaMiningTool\n    \n    展示如何通过 Agent 接口调用因子挖掘功能。\n    \"\"\"\n    import time\n    start_time = time.time()\n    logs = []\n    \n    try:\n        logs.append(f\"[{datetime.utcnow().isoformat()}] Agent 初始化...\")\n        logs.append(f\"[{datetime.utcnow().isoformat()}] 调用 AlphaMiningTool...\")\n        \n        # 模拟 Agent 调用\n        components = _get_alpha_mining_components()\n        from ...alpha_mining import AlphaTrainer\n        \n        input_params = {\n            \"stock_code\": request.stock_code,\n            \"num_steps\": request.num_steps,\n            \"use_sentiment\": request.use_sentiment,\n        }\n        logs.append(f\"[{datetime.utcnow().isoformat()}] Tool 参数: {json.dumps(input_params)}\")\n        \n        # 准备数据\n        features, returns = components[\"generate_mock_data\"](\n            num_samples=50,\n            num_features=6 if request.use_sentiment else 4,\n            time_steps=252,\n            seed=42\n        )\n        logs.append(f\"[{datetime.utcnow().isoformat()}] 数据准备完成\")\n        \n        # 训练\n        trainer = AlphaTrainer(\n            generator=components[\"generator\"],\n            vocab=components[\"vocab\"],\n            config=components[\"config\"]\n        )\n        \n        logs.append(f\"[{datetime.utcnow().isoformat()}] 开始训练...\")\n        
result = trainer.train(\n            features=features,\n            returns=returns,\n            num_steps=request.num_steps,\n            progress_bar=False\n        )\n        logs.append(f\"[{datetime.utcnow().isoformat()}] 训练完成\")\n        \n        execution_time = time.time() - start_time\n        \n        output = {\n            \"best_formula\": result[\"best_formula_str\"],\n            \"best_score\": round(result[\"best_score\"], 6),\n            \"total_steps\": result[\"total_steps\"],\n        }\n        logs.append(f\"[{datetime.utcnow().isoformat()}] 返回结果: {json.dumps(output)}\")\n        \n        return AgentDemoResponse(\n            success=True,\n            agent_name=\"QuantitativeAgent\",\n            tool_name=\"AlphaMiningTool\",\n            input_params=input_params,\n            output=output,\n            execution_time=round(execution_time, 2),\n            logs=logs\n        )\n        \n    except Exception as e:\n        execution_time = time.time() - start_time\n        logs.append(f\"[{datetime.utcnow().isoformat()}] 错误: {str(e)}\")\n        \n        return AgentDemoResponse(\n            success=False,\n            agent_name=\"QuantitativeAgent\",\n            tool_name=\"AlphaMiningTool\",\n            input_params=request.dict(),\n            output=None,\n            execution_time=round(execution_time, 2),\n            logs=logs\n        )\n\n\n@router.post(\"/evaluate\", response_model=EvaluateResponse)\nasync def evaluate_factor(request: EvaluateRequest):\n    \"\"\"\n    评估因子表达式\n    \n    对指定的因子表达式进行回测评估，返回各项指标。\n    \"\"\"\n    try:\n        components = _get_alpha_mining_components()\n        vm = components[\"vm\"]\n        evaluator = components[\"evaluator\"]\n        \n        # 解析公式\n        tokens = []\n        parts = request.formula.replace(\"(\", \" \").replace(\")\", \" \").replace(\",\", \" \").split()\n        for part in parts:\n            part = part.strip()\n            if not part:\n               
 continue\n            try:\n                token = vm.vocab.name_to_token(part)\n                tokens.append(token)\n            except (ValueError, KeyError):\n                continue\n        \n        if not tokens:\n            return EvaluateResponse(\n                success=False,\n                formula=request.formula,\n                error=\"无法解析因子表达式\"\n            )\n        \n        # 准备数据\n        features, returns = components[\"generate_mock_data\"](\n            num_samples=50,\n            num_features=6,\n            time_steps=252,\n            seed=42\n        )\n        \n        # 执行因子\n        factor = vm.execute(tokens, features)\n        if factor is None:\n            return EvaluateResponse(\n                success=False,\n                formula=request.formula,\n                error=\"因子执行失败\"\n            )\n        \n        # 评估\n        metrics = evaluator.evaluate(factor, returns)\n        \n        return EvaluateResponse(\n            success=True,\n            formula=request.formula,\n            metrics={\n                \"sortino_ratio\": metrics[\"sortino_ratio\"],\n                \"sharpe_ratio\": metrics[\"sharpe_ratio\"],\n                \"ic\": metrics[\"ic\"],\n                \"rank_ic\": metrics[\"rank_ic\"],\n                \"max_drawdown\": metrics[\"max_drawdown\"],\n                \"turnover\": metrics[\"turnover\"],\n                \"total_return\": metrics[\"total_return\"],\n                \"win_rate\": metrics[\"win_rate\"]\n            }\n        )\n        \n    except Exception as e:\n        logger.error(f\"Factor evaluation failed: {e}\")\n        return EvaluateResponse(\n            success=False,\n            formula=request.formula,\n            error=str(e)\n        )\n\n\n@router.post(\"/generate\", response_model=GenerateResponse)\nasync def generate_factors(request: GenerateRequest):\n    \"\"\"\n    生成候选因子\n    \n    使用训练好的模型生成一批候选因子表达式。\n    \"\"\"\n    try:\n        components 
= _get_alpha_mining_components()\n        generator = components[\"generator\"]\n        vm = components[\"vm\"]\n        evaluator = components[\"evaluator\"]\n        \n        # 生成因子\n        formulas, _ = generator.generate(\n            batch_size=request.batch_size,\n            max_len=request.max_len\n        )\n        \n        # 准备数据用于评估\n        features, returns = components[\"generate_mock_data\"](\n            num_samples=50,\n            num_features=6,\n            time_steps=252,\n            seed=42\n        )\n        \n        # 评估每个因子\n        results = []\n        for formula in formulas:\n            factor = vm.execute(formula, features)\n            if factor is not None and factor.std() > 1e-6:\n                try:\n                    metrics = evaluator.evaluate(factor, returns)\n                    results.append({\n                        \"formula\": formula,\n                        \"formula_str\": vm.decode(formula),\n                        \"sortino\": round(metrics[\"sortino_ratio\"], 4),\n                        \"ic\": round(metrics[\"ic\"], 4)\n                    })\n                except Exception:\n                    continue\n        \n        # 按 Sortino 排序\n        results.sort(key=lambda x: x[\"sortino\"], reverse=True)\n        \n        return GenerateResponse(\n            success=True,\n            generated=len(formulas),\n            valid=len(results),\n            factors=results[:10]\n        )\n        \n    except Exception as e:\n        logger.error(f\"Factor generation failed: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/factors\")\nasync def get_factors(\n    top_k: int = 10,\n    stock_code: Optional[str] = None\n):\n    \"\"\"\n    获取已发现的因子列表\n    \n    返回按 Sortino Ratio 排序的最优因子。\n    \"\"\"\n    factors = _discovered_factors.copy()\n    \n    # 按股票代码过滤\n    if stock_code:\n        factors = [f for f in factors if f.get(\"stock_code\") == stock_code]\n    
\n    # Sort by Sortino so top_k really returns the best factors (entries added\n    # by the streaming endpoint may arrive out of order)\n    factors.sort(key=lambda x: x.get(\"sortino\", 0), reverse=True)\n    \n    # Take top_k\n    factors = factors[:top_k]\n    \n    return {\n        \"success\": True,\n        \"total\": len(_discovered_factors),\n        \"returned\": len(factors),\n        \"factors\": factors\n    }\n\n\n@router.get(\"/status/{task_id}\", response_model=TaskStatusResponse)\nasync def get_task_status(task_id: str):\n    \"\"\"\n    Get the status of a mining task\n    \"\"\"\n    if task_id not in _mining_tasks:\n        raise HTTPException(status_code=404, detail=\"Task not found\")\n    \n    task = _mining_tasks[task_id]\n    \n    return TaskStatusResponse(\n        task_id=task_id,\n        status=task[\"status\"],\n        progress=task.get(\"progress\", 0),\n        result=task.get(\"result\"),\n        error=task.get(\"error\"),\n        started_at=task.get(\"started_at\"),\n        completed_at=task.get(\"completed_at\")\n    )\n\n\n@router.get(\"/operators\")\nasync def get_operators():\n    \"\"\"\n    List the supported operators\n    \"\"\"\n    try:\n        from ...alpha_mining.dsl.ops import OPS_CONFIG\n        from ...alpha_mining.dsl.vocab import FEATURES\n        \n        operators = []\n        for name, func, arity in OPS_CONFIG:\n            operators.append({\n                \"name\": name,\n                \"arity\": arity,\n                \"description\": func.__doc__ or \"\"\n            })\n        \n        return {\n            \"success\": True,\n            \"features\": FEATURES,\n            \"operators\": operators\n        }\n        \n    except Exception as e:\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.delete(\"/tasks/{task_id}\")\nasync def delete_task(task_id: str):\n    \"\"\"\n    Delete a task record\n    \"\"\"\n    if task_id not in _mining_tasks:\n        raise HTTPException(status_code=404, detail=\"Task not found\")\n    \n    del _mining_tasks[task_id]\n    \n    return {\"success\": True, \"message\": f\"Task {task_id} deleted\"}\n"
  },
  {
    "path": "backend/app/api/v1/analysis.py",
    "content": "\"\"\"\n分析任务 API 路由\n\"\"\"\nimport logging\nimport asyncio\nimport json\nfrom typing import List, Optional\nfrom fastapi import APIRouter, Depends, HTTPException, BackgroundTasks, Body, Request\nfrom pydantic import BaseModel, Field\nfrom sqlalchemy.ext.asyncio import AsyncSession\n\nfrom ...core.database import get_db\nfrom ...models.database import AsyncSessionLocal\nfrom ...services.analysis_service import get_analysis_service\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# Pydantic 模型\nclass AnalysisRequest(BaseModel):\n    \"\"\"分析请求模型\"\"\"\n    provider: Optional[str] = Field(default=None, description=\"LLM提供商 (bailian/openai/deepseek/kimi/zhipu)\")\n    model: Optional[str] = Field(default=None, description=\"模型名称\")\n\n\nclass AnalysisResponse(BaseModel):\n    \"\"\"分析响应模型\"\"\"\n    success: bool\n    analysis_id: Optional[int] = None\n    news_id: int\n    sentiment: Optional[str] = None\n    sentiment_score: Optional[float] = None\n    confidence: Optional[float] = None\n    summary: Optional[str] = None\n    execution_time: Optional[float] = None\n    error: Optional[str] = None\n\n\nclass AnalysisDetailResponse(BaseModel):\n    \"\"\"分析详情响应模型\"\"\"\n    model_config = {\"from_attributes\": True}\n    \n    id: int\n    news_id: int\n    agent_name: str\n    agent_role: Optional[str] = None\n    analysis_result: str\n    summary: Optional[str] = None\n    sentiment: Optional[str] = None\n    sentiment_score: Optional[float] = None\n    confidence: Optional[float] = None\n    execution_time: Optional[float] = None\n    created_at: str\n\n\nclass BatchAnalyzeRequest(BaseModel):\n    \"\"\"批量分析请求模型\"\"\"\n    news_ids: List[int] = Field(..., description=\"要分析的新闻ID列表\")\n    provider: Optional[str] = Field(default=None, description=\"LLM提供商\")\n    model: Optional[str] = Field(default=None, description=\"模型名称\")\n\n\nclass BatchAnalyzeResponse(BaseModel):\n    \"\"\"批量分析响应模型\"\"\"\n    success: bool\n    message: 
str\n    total_count: int\n    success_count: int\n    failed_count: int\n    results: List[AnalysisResponse]\n\n\n# 后台任务：执行分析\nasync def run_analysis_task(news_id: int, db: AsyncSession):\n    \"\"\"\n    后台任务：执行新闻分析\n    \"\"\"\n    try:\n        analysis_service = get_analysis_service()\n        result = await analysis_service.analyze_news(news_id, db)\n        logger.info(f\"Analysis task completed for news {news_id}: {result}\")\n    except Exception as e:\n        logger.error(f\"Analysis task failed for news {news_id}: {e}\")\n\n\n# API 端点\n# 注意：具体路径（如 /news/batch）必须在参数路径（如 /news/{news_id}）之前定义\n# 否则 FastAPI 会把 \"batch\" 当作 news_id 参数\n\n@router.post(\"/news/batch\", response_model=BatchAnalyzeResponse)\nasync def batch_analyze_news(\n    request_body: BatchAnalyzeRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    批量分析新闻（并发）\n    \n    - **news_ids**: 要分析的新闻ID列表\n    - **provider**: LLM提供商（可选）\n    - **model**: 模型名称（可选）\n    \"\"\"\n    try:\n        logger.info(f\"Received batch analyze request: news_ids={request_body.news_ids}, provider={request_body.provider}, model={request_body.model}\")\n        \n        if not request_body.news_ids:\n            raise HTTPException(status_code=400, detail=\"news_ids cannot be empty\")\n        \n        analysis_service = get_analysis_service()\n        \n        # 准备LLM provider参数\n        llm_provider = request_body.provider\n        llm_model = request_body.model\n        \n        # 定义单个新闻的分析任务\n        # 注意：每个任务需要独立的数据库会话，因为SQLAlchemy异步会话不支持并发操作\n        async def analyze_single_news(news_id: int) -> AnalysisResponse:\n            # 为每个任务创建独立的数据库会话\n            async with AsyncSessionLocal() as task_db:\n                try:\n                    result = await analysis_service.analyze_news(\n                        news_id,\n                        task_db,\n                        llm_provider=llm_provider,\n                        llm_model=llm_model\n                    )\n                  
  \n                    # 提交事务\n                    await task_db.commit()\n                    \n                    if result.get(\"success\"):\n                        return AnalysisResponse(\n                            success=True,\n                            analysis_id=result.get(\"analysis_id\"),\n                            news_id=news_id,\n                            sentiment=result.get(\"sentiment\"),\n                            sentiment_score=result.get(\"sentiment_score\"),\n                            confidence=result.get(\"confidence\"),\n                            summary=result.get(\"summary\"),\n                            execution_time=result.get(\"execution_time\"),\n                        )\n                    else:\n                        return AnalysisResponse(\n                            success=False,\n                            news_id=news_id,\n                            error=result.get(\"error\")\n                        )\n                except Exception as e:\n                    # 发生错误时回滚事务\n                    await task_db.rollback()\n                    logger.error(f\"Failed to analyze news {news_id}: {e}\", exc_info=True)\n                    return AnalysisResponse(\n                        success=False,\n                        news_id=news_id,\n                        error=str(e)\n                    )\n        \n        # 并发执行所有分析任务\n        logger.info(f\"Starting batch analysis for {len(request_body.news_ids)} news items\")\n        results = await asyncio.gather(*[analyze_single_news(news_id) for news_id in request_body.news_ids])\n        \n        # 统计结果\n        success_count = sum(1 for r in results if r.success)\n        failed_count = len(results) - success_count\n        \n        logger.info(f\"Batch analysis completed: {success_count} succeeded, {failed_count} failed\")\n        \n        return BatchAnalyzeResponse(\n            success=True,\n            message=f\"批量分析完成：成功 {success_count} 
条，失败 {failed_count} 条\",\n            total_count=len(request_body.news_ids),\n            success_count=success_count,\n            failed_count=failed_count,\n            results=results\n        )\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to batch analyze news: {e}\", exc_info=True)\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/news/{news_id}\", response_model=AnalysisResponse)\nasync def analyze_news(\n    news_id: int,\n    request: Optional[AnalysisRequest] = Body(None),\n    background_tasks: BackgroundTasks = None,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    触发新闻分析任务\n    \n    - **news_id**: 新闻ID\n    - **provider**: LLM提供商（可选）\n    - **model**: 模型名称（可选）\n    \n    Returns:\n        分析任务状态\n    \"\"\"\n    try:\n        analysis_service = get_analysis_service()\n        \n        # 准备LLM provider参数\n        llm_provider = None\n        llm_model = None\n        if request:\n            llm_provider = request.provider\n            llm_model = request.model\n            if llm_provider or llm_model:\n                logger.info(f\"Using custom LLM config: provider={llm_provider}, model={llm_model}\")\n        \n        # 执行分析（同步，便于快速验证MVP）\n        # 在生产环境中，应该使用后台任务\n        result = await analysis_service.analyze_news(\n            news_id, \n            db, \n            llm_provider=llm_provider,\n            llm_model=llm_model\n        )\n        \n        if result.get(\"success\"):\n            return AnalysisResponse(\n                success=True,\n                analysis_id=result.get(\"analysis_id\"),\n                news_id=news_id,\n                sentiment=result.get(\"sentiment\"),\n                sentiment_score=result.get(\"sentiment_score\"),\n                confidence=result.get(\"confidence\"),\n                summary=result.get(\"summary\"),\n                execution_time=result.get(\"execution_time\"),\n          
  )\n        else:\n            return AnalysisResponse(\n                success=False,\n                news_id=news_id,\n                error=result.get(\"error\")\n            )\n    \n    except Exception as e:\n        logger.error(f\"Failed to analyze news {news_id}: {e}\", exc_info=True)\n        return AnalysisResponse(\n            success=False,\n            news_id=news_id,\n            error=str(e)\n        )\n\n\n@router.get(\"/news/{news_id}/all\", response_model=List[AnalysisDetailResponse])\nasync def get_news_analyses(\n    news_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取指定新闻的所有分析结果\n    \n    - **news_id**: 新闻ID\n    \"\"\"\n    try:\n        analysis_service = get_analysis_service()\n        results = await analysis_service.get_analyses_by_news_id(news_id, db)\n        \n        return [AnalysisDetailResponse(**result) for result in results]\n    \n    except Exception as e:\n        logger.error(f\"Failed to get analyses for news {news_id}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{analysis_id}\", response_model=AnalysisDetailResponse)\nasync def get_analysis_detail(\n    analysis_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取分析结果详情\n    \n    - **analysis_id**: 分析ID\n    \"\"\"\n    try:\n        analysis_service = get_analysis_service()\n        result = await analysis_service.get_analysis_by_id(analysis_id, db)\n        \n        if not result:\n            raise HTTPException(status_code=404, detail=\"Analysis not found\")\n        \n        return AnalysisDetailResponse(**result)\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to get analysis {analysis_id}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n"
  },
  {
    "path": "backend/app/api/v1/debug.py",
    "content": "\"\"\"\n调试 API - 用于测试爬虫和内容提取\n\"\"\"\nimport re\nimport logging\nfrom typing import Optional\nfrom fastapi import APIRouter, HTTPException\nfrom pydantic import BaseModel\nimport requests\nfrom bs4 import BeautifulSoup\n\nlogger = logging.getLogger(__name__)\nrouter = APIRouter()\n\n\nclass CrawlRequest(BaseModel):\n    url: str\n    return_html: bool = True  # 是否返回原始 HTML\n\n\nclass CrawlResponse(BaseModel):\n    url: str\n    title: Optional[str] = None\n    content: Optional[str] = None\n    content_length: int = 0\n    html_length: int = 0\n    raw_html: Optional[str] = None  # 原始 HTML（可选）\n    debug_info: dict = {}\n\n\ndef extract_chinese_ratio(text: str) -> float:\n    \"\"\"计算中文字符比例\"\"\"\n    pattern = re.compile(r'[\\u4e00-\\u9fa5]+')\n    chinese_chars = pattern.findall(text)\n    chinese_count = sum(len(chars) for chars in chinese_chars)\n    total_count = len(text)\n    return chinese_count / total_count if total_count > 0 else 0\n\n\ndef clean_text(text: str) -> str:\n    \"\"\"清理文本\"\"\"\n    text = re.sub(r'<[^>]+>', '', text)\n    text = text.replace('\\u3000', ' ')\n    text = ' '.join(text.split())\n    return text.strip()\n\n\ndef is_noise_text(text: str) -> bool:\n    \"\"\"判断是否为噪音文本\"\"\"\n    noise_patterns = [\n        r'^责任编辑',\n        r'^编辑[:：]',\n        r'^来源[:：]',\n        r'^声明[:：]',\n        r'^免责声明',\n        r'^版权',\n        r'^copyright',\n        r'^点击进入',\n        r'^相关阅读',\n        r'^延伸阅读',\n        r'登录新浪财经APP',\n        r'搜索【信披】',\n        r'缩小字体',\n        r'放大字体',\n        r'收藏',\n        r'微博',\n        r'微信',\n        r'分享',\n        r'腾讯QQ',\n    ]\n    text_lower = text.lower().strip()\n    for pattern in noise_patterns:\n        if re.search(pattern, text_lower, re.I):\n            return True\n    return False\n\n\ndef extract_content_from_html(html: str, url: str) -> tuple[str, str, dict]:\n    \"\"\"\n    从 HTML 中提取内容\n    返回: (title, content, debug_info)\n    \"\"\"\n    soup = BeautifulSoup(html, 
'lxml')\n    debug_info = {\n        \"selectors_tried\": [],\n        \"selector_matched\": None,\n        \"total_lines_raw\": 0,\n        \"lines_kept\": 0,\n        \"lines_filtered\": 0,\n    }\n    \n    # 提取标题\n    title = \"\"\n    title_tag = soup.find('h1', class_='main-title') or soup.find('h1') or soup.find('title')\n    if title_tag:\n        title = title_tag.get_text().strip()\n        title = re.sub(r'[-_].*?(新浪|财经|网)', '', title).strip()\n    \n    # 内容选择器（按优先级）\n    content_selectors = [\n        {'id': 'artibody'},\n        {'class': 'article-content'},\n        {'class': 'article'},\n        {'id': 'article'},\n        {'class': 'content'},\n        {'class': 'news-content'},\n    ]\n    \n    for selector in content_selectors:\n        debug_info[\"selectors_tried\"].append(str(selector))\n        content_div = soup.find(['div', 'article'], selector)\n        \n        if content_div:\n            debug_info[\"selector_matched\"] = str(selector)\n            \n            # 移除噪音元素\n            for tag in content_div.find_all(['script', 'style', 'iframe', 'ins', 'select', 'input', 'button', 'form']):\n                tag.decompose()\n            for ad in content_div.find_all(class_=re.compile(r'ad|banner|share|otherContent|recommend|app-guide', re.I)):\n                ad.decompose()\n            \n            # 获取全文\n            full_text = content_div.get_text(separator='\\n', strip=True)\n            lines = full_text.split('\\n')\n            debug_info[\"total_lines_raw\"] = len(lines)\n            \n            article_parts = []\n            for line in lines:\n                line = line.strip()\n                if not line or len(line) < 2:\n                    continue\n                \n                chinese_ratio = extract_chinese_ratio(line)\n                if chinese_ratio > 0.05 or len(line) > 20:\n                    clean_line = clean_text(line)\n                    if clean_line and not is_noise_text(clean_line):\n          
              article_parts.append(clean_line)\n                        debug_info[\"lines_kept\"] += 1\n                    else:\n                        debug_info[\"lines_filtered\"] += 1\n                else:\n                    debug_info[\"lines_filtered\"] += 1\n            \n            content = '\\n'.join(article_parts)\n            return title, content, debug_info\n    \n    debug_info[\"selector_matched\"] = \"fallback (body)\"\n    # 后备：直接取 body\n    body = soup.find('body')\n    if body:\n        content = body.get_text(separator='\\n', strip=True)\n        return title, content[:5000], debug_info  # 限制长度\n    \n    return title, \"\", debug_info\n\n\n@router.post(\"/crawl\", response_model=CrawlResponse)\nasync def debug_crawl(request: CrawlRequest):\n    \"\"\"\n    实时爬取指定 URL 并返回内容（用于调试）\n    \n    - **url**: 要爬取的新闻 URL\n    - **return_html**: 是否返回原始 HTML（默认 True）\n    \"\"\"\n    try:\n        headers = {\n            \"User-Agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\"\n        }\n        \n        response = requests.get(request.url, headers=headers, timeout=30)\n        response.encoding = 'utf-8'\n        html = response.text\n        \n        title, content, debug_info = extract_content_from_html(html, request.url)\n        \n        return CrawlResponse(\n            url=request.url,\n            title=title,\n            content=content,\n            content_length=len(content),\n            html_length=len(html),\n            raw_html=html if request.return_html else None,\n            debug_info=debug_info,\n        )\n        \n    except requests.RequestException as e:\n        raise HTTPException(status_code=500, detail=f\"爬取失败: {str(e)}\")\n    except Exception as e:\n        logger.error(f\"Debug crawl error: {e}\")\n        raise HTTPException(status_code=500, detail=f\"解析失败: {str(e)}\")\n\n\n@router.get(\"/test-sina\")\nasync def 
test_sina_crawl():\n    \"\"\"\n    测试新浪财经爬取（使用固定 URL）\n    \"\"\"\n    test_url = \"https://finance.sina.com.cn/jjxw/2024-12-28/doc-ineayfsz5142013.shtml\"\n    request = CrawlRequest(url=test_url, return_html=False)\n    return await debug_crawl(request)\n\n"
  },
  {
    "path": "backend/app/api/v1/knowledge_graph.py",
    "content": "\"\"\"\n知识图谱管理 API\n提供图谱的查询、构建、更新、删除接口\n\"\"\"\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom fastapi import APIRouter, HTTPException, BackgroundTasks\nfrom pydantic import BaseModel, Field\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# ============ Pydantic 模型 ============\n\nclass CompanyGraphResponse(BaseModel):\n    \"\"\"公司图谱响应\"\"\"\n    stock_code: str\n    stock_name: str\n    graph_exists: bool\n    stats: Optional[Dict[str, int]] = None\n    name_variants: List[str] = Field(default_factory=list)\n    businesses: List[Dict[str, Any]] = Field(default_factory=list)\n    industries: List[str] = Field(default_factory=list)\n    products: List[str] = Field(default_factory=list)\n    concepts: List[str] = Field(default_factory=list)\n    search_queries: List[str] = Field(default_factory=list, description=\"生成的检索查询\")\n\n\nclass BuildGraphRequest(BaseModel):\n    \"\"\"构建图谱请求\"\"\"\n    force_rebuild: bool = Field(default=False, description=\"是否强制重建\")\n\n\nclass BuildGraphResponse(BaseModel):\n    \"\"\"构建图谱响应\"\"\"\n    success: bool\n    message: str\n    graph_stats: Optional[Dict[str, int]] = None\n\n\nclass UpdateGraphRequest(BaseModel):\n    \"\"\"更新图谱请求\"\"\"\n    update_from_news: bool = Field(default=True, description=\"是否从新闻更新\")\n    news_limit: int = Field(default=20, description=\"分析的新闻数量\")\n\n\nclass GraphStatsResponse(BaseModel):\n    \"\"\"图谱统计响应\"\"\"\n    total_companies: int\n    total_nodes: int\n    total_relationships: int\n    companies: List[Dict[str, str]] = Field(default_factory=list)\n\n\n# ============ API 路由 ============\n\n@router.get(\"/{stock_code}\", response_model=CompanyGraphResponse)\nasync def get_company_graph(stock_code: str):\n    \"\"\"\n    获取公司知识图谱\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    try:\n        from ...knowledge.graph_service import get_graph_service\n        \n        # 标准化股票代码\n        code = stock_code.upper()\n        if not 
(code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        graph_service = get_graph_service()\n        \n        # 获取图谱\n        graph = graph_service.get_company_graph(code)\n        \n        if not graph:\n            return CompanyGraphResponse(\n                stock_code=code,\n                stock_name=stock_code,\n                graph_exists=False\n            )\n        \n        # 获取统计信息\n        stats = graph_service.get_graph_stats(code)\n        \n        # 获取检索关键词\n        keyword_set = graph_service.get_search_keywords(code)\n        search_queries = keyword_set.combined_queries if keyword_set else []\n        \n        return CompanyGraphResponse(\n            stock_code=code,\n            stock_name=graph.company.stock_name,\n            graph_exists=True,\n            stats=stats,\n            name_variants=[v.variant for v in graph.name_variants],\n            businesses=[\n                {\n                    \"name\": b.business_name,\n                    \"type\": b.business_type,\n                    \"status\": b.status,\n                    \"description\": b.description\n                }\n                for b in graph.businesses\n            ],\n            industries=[i.industry_name for i in graph.industries],\n            products=[p.product_name for p in graph.products],\n            concepts=[c.concept_name for c in graph.concepts],\n            search_queries=search_queries\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to get company graph for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/{stock_code}/build\", response_model=BuildGraphResponse)\nasync def build_company_graph(\n    stock_code: str,\n    request: BuildGraphRequest,\n    background_tasks: BackgroundTasks\n):\n    \"\"\"\n    构建或重建公司知识图谱\n    \n    - **stock_code**: 股票代码\n    - 
**force_rebuild**: 是否强制重建（删除现有图谱）\n    \"\"\"\n    try:\n        from ...knowledge.graph_service import get_graph_service\n        from ...knowledge.knowledge_extractor import (\n            create_knowledge_extractor,\n            AkshareKnowledgeExtractor\n        )\n        \n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        graph_service = get_graph_service()\n        \n        # 检查是否已存在\n        existing = graph_service.get_company_graph(code)\n        \n        if existing and not request.force_rebuild:\n            return BuildGraphResponse(\n                success=False,\n                message=\"图谱已存在，如需重建请设置 force_rebuild=true\",\n                graph_stats=graph_service.get_graph_stats(code)\n            )\n        \n        # 强制重建：先删除\n        if existing and request.force_rebuild:\n            graph_service.delete_company_graph(code)\n            logger.info(f\"已删除现有图谱: {code}\")\n        \n        # 从 akshare 获取信息\n        akshare_info = AkshareKnowledgeExtractor.extract_company_info(code)\n        \n        if not akshare_info:\n            return BuildGraphResponse(\n                success=False,\n                message=f\"无法从 akshare 获取公司信息: {code}\"\n            )\n        \n        # 获取股票名称\n        stock_name = akshare_info.get('raw_data', {}).get('股票简称', code)\n        \n        # 使用 LLM 提取详细信息\n        extractor = create_knowledge_extractor()\n        \n        # 直接 await 执行（LLM 提取耗时较长，当前实现会阻塞本次请求直至完成）\n        graph = await extractor.extract_from_akshare(code, stock_name, akshare_info)\n        \n        # 构建图谱\n        success = graph_service.build_company_graph(graph)\n        \n        if success:\n            stats = graph_service.get_graph_stats(code)\n            return BuildGraphResponse(\n                success=True,\n                message=f\"图谱构建成功: 
{stock_name}\",\n                graph_stats=stats\n            )\n        else:\n            return BuildGraphResponse(\n                success=False,\n                message=\"图谱构建失败\"\n            )\n    \n    except Exception as e:\n        logger.error(f\"Failed to build graph for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/{stock_code}/update\", response_model=BuildGraphResponse)\nasync def update_company_graph(\n    stock_code: str,\n    request: UpdateGraphRequest\n):\n    \"\"\"\n    更新公司知识图谱\n    \n    - **stock_code**: 股票代码\n    - **update_from_news**: 是否从新闻更新\n    - **news_limit**: 分析的新闻数量\n    \"\"\"\n    try:\n        from ...knowledge.graph_service import get_graph_service\n        from ...knowledge.knowledge_extractor import create_knowledge_extractor\n        from ...core.database import get_db\n        from sqlalchemy.ext.asyncio import AsyncSession\n        from ...models.news import News\n        from sqlalchemy import select, text\n        \n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        pure_code = code[2:] if code.startswith((\"SH\", \"SZ\")) else code\n        \n        graph_service = get_graph_service()\n        \n        # 检查图谱是否存在\n        if not graph_service.get_company_graph(code):\n            return BuildGraphResponse(\n                success=False,\n                message=\"图谱不存在，请先构建图谱\"\n            )\n        \n        if request.update_from_news:\n            # 从数据库获取最新新闻\n            from ...core.database import get_sync_db_session\n            db = get_sync_db_session()\n            \n            recent_news = db.execute(\n                text(\"\"\"\n                    SELECT title, content FROM news \n                    WHERE stock_codes @> ARRAY[:code]::varchar[] \n                    
ORDER BY publish_time DESC LIMIT :limit\n                \"\"\").bindparams(code=pure_code, limit=request.news_limit)\n            ).fetchall()\n            \n            if not recent_news:\n                return BuildGraphResponse(\n                    success=False,\n                    message=\"没有可用的新闻数据\"\n                )\n            \n            news_data = [\n                {\"title\": n[0], \"content\": n[1]}\n                for n in recent_news\n            ]\n            \n            # 提取信息\n            extractor = create_knowledge_extractor()\n            extracted_info = await extractor.extract_from_news(code, \"\", news_data)\n            \n            # 更新图谱\n            if any(extracted_info.values()):\n                success = graph_service.update_from_news(code, \"\", extracted_info)\n                \n                if success:\n                    stats = graph_service.get_graph_stats(code)\n                    return BuildGraphResponse(\n                        success=True,\n                        message=f\"图谱已更新: 新增业务{len(extracted_info.get('new_businesses', []))}个, 概念{len(extracted_info.get('new_concepts', []))}个\",\n                        graph_stats=stats\n                    )\n                else:\n                    return BuildGraphResponse(\n                        success=False,\n                        message=\"图谱更新失败\"\n                    )\n            else:\n                return BuildGraphResponse(\n                    success=True,\n                    message=\"未提取到新信息\",\n                    graph_stats=graph_service.get_graph_stats(code)\n                )\n        \n        return BuildGraphResponse(\n            success=False,\n            message=\"未指定更新方式\"\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to update graph for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.delete(\"/{stock_code}\")\nasync def 
delete_company_graph(stock_code: str):\n    \"\"\"\n    删除公司知识图谱\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    try:\n        from ...knowledge.graph_service import get_graph_service\n        \n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        graph_service = get_graph_service()\n        success = graph_service.delete_company_graph(code)\n        \n        if success:\n            return {\"success\": True, \"message\": f\"图谱已删除: {code}\"}\n        else:\n            return {\"success\": False, \"message\": \"删除失败\"}\n    \n    except Exception as e:\n        logger.error(f\"Failed to delete graph for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/\", response_model=GraphStatsResponse)\nasync def get_graph_stats():\n    \"\"\"\n    获取所有图谱统计信息\n    \"\"\"\n    try:\n        from ...knowledge.graph_service import get_graph_service\n        \n        graph_service = get_graph_service()\n        companies = graph_service.list_all_companies()\n        \n        # 获取总体统计\n        total_companies = len(companies)\n        \n        # 查询总节点数和关系数（简化版）\n        return GraphStatsResponse(\n            total_companies=total_companies,\n            total_nodes=total_companies * 10,  # 估算\n            total_relationships=total_companies * 15,  # 估算\n            companies=companies\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to get graph stats: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n"
  },
  {
    "path": "backend/app/api/v1/llm_config.py",
    "content": "\"\"\"\nLLM 配置 API 路由\n返回可用的 LLM 厂商和模型列表\n\"\"\"\nimport logging\nfrom typing import List, Dict, Optional\nfrom fastapi import APIRouter\nfrom pydantic import BaseModel, Field\n\nfrom ...core.config import settings\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\nclass ModelInfo(BaseModel):\n    \"\"\"模型信息\"\"\"\n    value: str = Field(..., description=\"模型标识\")\n    label: str = Field(..., description=\"模型显示名称\")\n    description: str = Field(default=\"\", description=\"模型描述\")\n\n\nclass ProviderInfo(BaseModel):\n    \"\"\"厂商信息\"\"\"\n    value: str = Field(..., description=\"厂商标识\")\n    label: str = Field(..., description=\"厂商显示名称\")\n    icon: str = Field(..., description=\"厂商图标\")\n    models: List[ModelInfo] = Field(..., description=\"可用模型列表\")\n    has_api_key: bool = Field(..., description=\"是否已配置API Key\")\n\n\nclass LLMConfigResponse(BaseModel):\n    \"\"\"LLM 配置响应\"\"\"\n    default_provider: str = Field(..., description=\"默认厂商\")\n    default_model: str = Field(..., description=\"默认模型\")\n    providers: List[ProviderInfo] = Field(..., description=\"可用厂商列表\")\n\n\ndef parse_models(models_str: str, provider_label: str) -> List[ModelInfo]:\n    \"\"\"\n    解析逗号分隔的模型字符串\n    \n    Args:\n        models_str: 逗号分隔的模型字符串\n        provider_label: 厂商显示名称\n        \n    Returns:\n        模型信息列表\n    \"\"\"\n    if not models_str:\n        return []\n    \n    models = []\n    for model in models_str.split(','):\n        model = model.strip()\n        if model:\n            models.append(ModelInfo(\n                value=model,\n                label=model,\n                description=f\"{provider_label} 模型\"\n            ))\n    return models\n\n\n@router.get(\"/config\", response_model=LLMConfigResponse)\nasync def get_llm_config():\n    \"\"\"\n    获取 LLM 配置信息\n    \n    返回所有可用的厂商和模型列表，以及是否已配置 API Key\n    \"\"\"\n    try:\n        providers = []\n        \n        # 1. 
百炼\n        if settings.BAILIAN_MODELS:\n            providers.append(ProviderInfo(\n                value=\"bailian\",\n                label=\"百炼（阿里云）\",\n                icon=\"📦\",\n                models=parse_models(settings.BAILIAN_MODELS, \"百炼\"),\n                has_api_key=bool(settings.DASHSCOPE_API_KEY or settings.BAILIAN_API_KEY)\n            ))\n        \n        # 2. OpenAI\n        if settings.OPENAI_MODELS:\n            providers.append(ProviderInfo(\n                value=\"openai\",\n                label=\"OpenAI\",\n                icon=\"🤖\",\n                models=parse_models(settings.OPENAI_MODELS, \"OpenAI\"),\n                has_api_key=bool(settings.OPENAI_API_KEY)\n            ))\n        \n        # 3. DeepSeek\n        if settings.DEEPSEEK_MODELS:\n            providers.append(ProviderInfo(\n                value=\"deepseek\",\n                label=\"DeepSeek\",\n                icon=\"🧠\",\n                models=parse_models(settings.DEEPSEEK_MODELS, \"DeepSeek\"),\n                has_api_key=bool(settings.DEEPSEEK_API_KEY)\n            ))\n        \n        # 4. Kimi\n        if settings.MOONSHOT_MODELS:\n            providers.append(ProviderInfo(\n                value=\"kimi\",\n                label=\"Kimi (Moonshot)\",\n                icon=\"🌙\",\n                models=parse_models(settings.MOONSHOT_MODELS, \"Kimi\"),\n                has_api_key=bool(settings.MOONSHOT_API_KEY)\n            ))\n        \n        # 5. 
智谱\n        if settings.ZHIPU_MODELS:\n            providers.append(ProviderInfo(\n                value=\"zhipu\",\n                label=\"智谱\",\n                icon=\"🔮\",\n                models=parse_models(settings.ZHIPU_MODELS, \"智谱\"),\n                has_api_key=bool(settings.ZHIPU_API_KEY)\n            ))\n        \n        return LLMConfigResponse(\n            default_provider=settings.LLM_PROVIDER,\n            default_model=settings.LLM_MODEL,\n            providers=providers\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to get LLM config: {e}\", exc_info=True)\n        # 返回默认配置\n        return LLMConfigResponse(\n            default_provider=\"bailian\",\n            default_model=\"qwen-plus\",\n            providers=[]\n        )\n\n"
  },
  {
    "path": "backend/app/api/v1/news.py",
    "content": "\"\"\"\n新闻管理 API 路由\n\"\"\"\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime, timedelta\nfrom fastapi import APIRouter, Depends, HTTPException, BackgroundTasks, Query\nfrom pydantic import BaseModel, Field\nfrom sqlalchemy.ext.asyncio import AsyncSession\nfrom sqlalchemy import select, desc\n\nfrom ...core.database import get_db\nfrom ...models.news import News\nfrom ...tools import SinaCrawlerTool\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# Pydantic 模型\nclass NewsResponse(BaseModel):\n    \"\"\"新闻响应模型\"\"\"\n    model_config = {\"from_attributes\": True}\n    \n    id: int\n    title: str\n    content: str\n    url: str\n    source: str\n    publish_time: Optional[str] = None\n    stock_codes: Optional[List[str]] = None\n    sentiment_score: Optional[float] = None\n    created_at: str\n\n\nclass CrawlRequest(BaseModel):\n    \"\"\"爬取请求模型\"\"\"\n    source: str = Field(default=\"sina\", description=\"新闻源（sina, jrj, cnstock）\")\n    start_page: int = Field(default=1, ge=1, description=\"起始页码\")\n    end_page: int = Field(default=1, ge=1, le=10, description=\"结束页码\")\n\n\nclass CrawlResponse(BaseModel):\n    \"\"\"爬取响应模型\"\"\"\n    success: bool\n    message: str\n    crawled_count: int\n    saved_count: int\n    source: str\n\n\nclass BatchDeleteRequest(BaseModel):\n    \"\"\"批量删除请求模型\"\"\"\n    news_ids: List[int] = Field(..., description=\"要删除的新闻ID列表\")\n\n\nclass BatchDeleteResponse(BaseModel):\n    \"\"\"批量删除响应模型\"\"\"\n    success: bool\n    message: str\n    deleted_count: int\n\n\n# 后台任务：爬取并保存新闻（使用同步方式）\ndef crawl_and_save_news_sync(\n    source: str,\n    start_page: int,\n    end_page: int\n):\n    \"\"\"\n    后台任务：爬取新闻并保存到数据库（同步版本）\n    \"\"\"\n    from sqlalchemy import create_engine\n    from sqlalchemy.orm import Session\n    from ...core.config import settings\n    \n    try:\n        logger.info(f\"Starting crawl task: {source}, pages {start_page}-{end_page}\")\n        \n     
   # 创建爬虫\n        if source == \"sina\":\n            crawler = SinaCrawlerTool()\n        else:\n            logger.error(f\"Unsupported source: {source}\")\n            return\n        \n        # 执行爬取\n        news_list = crawler.crawl(start_page, end_page)\n        logger.info(f\"Crawled {len(news_list)} news items\")\n        \n        # 创建新的数据库连接（同步）\n        engine = create_engine(settings.SYNC_DATABASE_URL)\n        db = Session(engine)\n        \n        try:\n            # 时间过滤：只保存最近7天内的新闻（避免保存太旧的新闻）\n            cutoff_time = datetime.utcnow() - timedelta(days=7)\n            \n            # 保存到数据库\n            saved_count = 0\n            skipped_old_count = 0\n            skipped_existing_count = 0\n            \n            for news_item in news_list:\n                # 时间过滤：跳过太旧的新闻\n                if news_item.publish_time and news_item.publish_time < cutoff_time:\n                    skipped_old_count += 1\n                    logger.debug(f\"Skipping old news: {news_item.title[:50]} (published: {news_item.publish_time})\")\n                    continue\n                \n                # 检查URL是否已存在\n                existing = db.execute(\n                    select(News).where(News.url == news_item.url)\n                ).scalar_one_or_none()\n                \n                if existing:\n                    skipped_existing_count += 1\n                    logger.debug(f\"News already exists: {news_item.url}\")\n                    continue\n                \n                # 创建新记录\n                news = News(\n                    title=news_item.title,\n                    content=news_item.content,\n                    url=news_item.url,\n                    source=news_item.source,\n                    publish_time=news_item.publish_time,\n                    author=news_item.author,\n                    keywords=news_item.keywords,\n                    stock_codes=news_item.stock_codes,\n                    # summary 字段已移除，content 
包含完整内容\n                )\n                \n                db.add(news)\n                saved_count += 1\n                logger.info(f\"Saved new news: {news_item.title[:50]} (published: {news_item.publish_time})\")\n            \n            db.commit()\n            logger.info(\n                f\"Crawl summary: crawled={len(news_list)}, \"\n                f\"saved={saved_count}, \"\n                f\"skipped_old={skipped_old_count}, \"\n                f\"skipped_existing={skipped_existing_count}\"\n            )\n        \n        finally:\n            db.close()\n    \n    except Exception as e:\n        logger.error(f\"Crawl task failed: {e}\", exc_info=True)\n\n\n# API 端点\n@router.post(\"/crawl\", response_model=CrawlResponse)\nasync def crawl_news(\n    request: CrawlRequest,\n    background_tasks: BackgroundTasks\n):\n    \"\"\"\n    触发新闻爬取任务（异步后台任务）\n    \n    - **source**: 新闻源（sina, jrj, cnstock）\n    - **start_page**: 起始页码\n    - **end_page**: 结束页码\n    \n    注意：这是简单的后台任务版本。如需更强大的任务管理，\n    请使用 POST /api/v1/tasks/cold-start 触发 Celery 任务。\n    \"\"\"\n    # 添加到后台任务（同步版本）\n    background_tasks.add_task(\n        crawl_and_save_news_sync,\n        request.source,\n        request.start_page,\n        request.end_page\n    )\n    \n    logger.info(f\"Background crawl task added: {request.source}, pages {request.start_page}-{request.end_page}\")\n    \n    return CrawlResponse(\n        success=True,\n        message=f\"Crawl task started for {request.source}, pages {request.start_page}-{request.end_page}\",\n        crawled_count=0,  # 后台任务还未完成\n        saved_count=0,\n        source=request.source\n    )\n\n\n@router.post(\"/refresh\", response_model=CrawlResponse)\nasync def refresh_news(\n    background_tasks: BackgroundTasks,\n    source: str = Query(\"sina\", description=\"新闻源\"),\n    pages: int = Query(1, ge=1, le=5, description=\"爬取页数\")\n):\n    \"\"\"\n    刷新新闻（前端刷新按钮调用）\n    \n    - **source**: 新闻源（sina, tencent, nbd, eastmoney, 
yicai, 163）\n    - **pages**: 爬取页数（1-5）\n    \"\"\"\n    background_tasks.add_task(\n        crawl_and_save_news_sync,\n        source,\n        1,  # start_page\n        pages  # end_page\n    )\n    \n    logger.info(f\"Refresh task started: {source}, {pages} pages\")\n    \n    return CrawlResponse(\n        success=True,\n        message=f\"刷新任务已启动：{source}，{pages} 页\",\n        crawled_count=0,\n        saved_count=0,\n        source=source\n    )\n\n\n@router.get(\"/\", response_model=List[NewsResponse])\nasync def get_news_list(\n    skip: int = Query(0, ge=0, description=\"跳过的记录数\"),\n    limit: int = Query(20, ge=1, le=100, description=\"返回的记录数\"),\n    source: Optional[str] = Query(None, description=\"按来源筛选\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取新闻列表\n    \n    - **skip**: 跳过的记录数（分页）\n    - **limit**: 返回的记录数\n    - **source**: 按来源筛选（可选）\n    \"\"\"\n    try:\n        query = select(News).order_by(desc(News.created_at))\n        \n        if source:\n            query = query.where(News.source == source)\n        \n        query = query.offset(skip).limit(limit)\n        \n        result = await db.execute(query)\n        news_list = result.scalars().all()\n        \n        return [NewsResponse(**news.to_dict()) for news in news_list]\n    \n    except Exception as e:\n        logger.error(f\"Failed to get news list: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/latest\", response_model=List[NewsResponse])\nasync def get_latest_news(\n    limit: int = Query(20, ge=1, le=500, description=\"返回的记录数\"),\n    source: Optional[str] = Query(None, description=\"按来源筛选\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取最新新闻（按发布时间排序）\n    \n    - **limit**: 返回的记录数（最多500条）\n    - **source**: 按来源筛选（可选）\n    \"\"\"\n    try:\n        query = select(News).order_by(desc(News.publish_time))\n        \n        if source:\n            query = query.where(News.source == source)\n        \n  
      query = query.limit(limit)\n        \n        result = await db.execute(query)\n        news_list = result.scalars().all()\n        \n        return [NewsResponse(**news.to_dict()) for news in news_list]\n    \n    except Exception as e:\n        logger.error(f\"Failed to get latest news: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{news_id}\", response_model=NewsResponse)\nasync def get_news_detail(\n    news_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取新闻详情\n    \n    - **news_id**: 新闻ID\n    \"\"\"\n    try:\n        result = await db.execute(\n            select(News).where(News.id == news_id)\n        )\n        news = result.scalar_one_or_none()\n        \n        if not news:\n            raise HTTPException(status_code=404, detail=\"News not found\")\n        \n        return NewsResponse(**news.to_dict())\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to get news {news_id}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/batch/delete\", response_model=BatchDeleteResponse)\nasync def batch_delete_news(\n    request: BatchDeleteRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    批量删除新闻\n    \n    - **news_ids**: 要删除的新闻ID列表\n    \"\"\"\n    try:\n        if not request.news_ids:\n            raise HTTPException(status_code=400, detail=\"news_ids cannot be empty\")\n        \n        # 查询要删除的新闻\n        result = await db.execute(\n            select(News).where(News.id.in_(request.news_ids))\n        )\n        news_list = result.scalars().all()\n        \n        deleted_count = len(news_list)\n        \n        if deleted_count == 0:\n            return BatchDeleteResponse(\n                success=True,\n                message=\"No news found to delete\",\n                deleted_count=0\n            )\n        \n        # 批量删除\n        for news in news_list:\n   
         await db.delete(news)\n        \n        await db.commit()\n        \n        logger.info(f\"Batch deleted {deleted_count} news items: {request.news_ids}\")\n        \n        return BatchDeleteResponse(\n            success=True,\n            message=f\"Successfully deleted {deleted_count} news items\",\n            deleted_count=deleted_count\n        )\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to batch delete news: {e}\")\n        await db.rollback()\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.delete(\"/{news_id}\")\nasync def delete_news(\n    news_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    删除新闻\n    \n    - **news_id**: 新闻ID\n    \"\"\"\n    try:\n        result = await db.execute(\n            select(News).where(News.id == news_id)\n        )\n        news = result.scalar_one_or_none()\n        \n        if not news:\n            raise HTTPException(status_code=404, detail=\"News not found\")\n        \n        await db.delete(news)\n        await db.commit()\n        \n        return {\"success\": True, \"message\": f\"News {news_id} deleted\"}\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to delete news {news_id}: {e}\")\n        await db.rollback()\n        raise HTTPException(status_code=500, detail=str(e))\n\n"
  },
  {
    "path": "backend/app/api/v1/news_v2.py",
    "content": "\"\"\"\n新闻 API v2 - 使用新的 Financial Data Layer\n\n新功能:\n1. 多数据源支持：可指定 provider (sina, tencent, nbd...)\n2. 自动降级：一个源失败自动切换另一个\n3. 标准化数据：统一的 NewsData 格式\n4. 实时获取：直接从数据源获取，不经过数据库\n\n前端可通过对比 /api/v1/news (旧) vs /api/v1/news/v2 (新) 看到差异\n\"\"\"\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom fastapi import APIRouter, HTTPException, Query\nfrom pydantic import BaseModel, Field\n\nfrom ...financial import get_registry, NewsQueryParams\nfrom ...financial.tools import FinancialNewsTool, setup_default_providers\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n# 确保 Provider 已注册\nsetup_default_providers()\n\n\nclass NewsDataResponse(BaseModel):\n    \"\"\"标准化新闻响应（使用 NewsData 模型）\"\"\"\n    id: str\n    title: str\n    content: str\n    summary: Optional[str] = None\n    source: str\n    source_url: str\n    publish_time: datetime\n    stock_codes: List[str] = []\n    sentiment: Optional[str] = None\n    sentiment_score: Optional[float] = None\n\n\nclass FetchNewsResponse(BaseModel):\n    \"\"\"获取新闻响应\"\"\"\n    success: bool\n    count: int\n    provider: Optional[str] = None\n    available_providers: Optional[List[str]] = None\n    data: List[NewsDataResponse] = []\n    error: Optional[str] = None\n\n\nclass ProviderInfoResponse(BaseModel):\n    \"\"\"Provider 信息响应\"\"\"\n    name: str\n    display_name: str\n    description: str\n    supported_types: List[str]\n    priority: int\n\n\n@router.get(\"/fetch\", response_model=FetchNewsResponse)\nasync def fetch_news_realtime(\n    stock_codes: Optional[str] = Query(\n        None, \n        description=\"股票代码，多个用逗号分隔，如 '600519,000001'\"\n    ),\n    keywords: Optional[str] = Query(\n        None, \n        description=\"关键词，多个用逗号分隔\"\n    ),\n    limit: int = Query(\n        20, \n        ge=1, \n        le=100, \n        description=\"返回条数\"\n    ),\n    provider: Optional[str] = Query(\n        None, \n        description=\"指定数据源（sina, tencent, 
nbd），不指定则自动选择\"\n    )\n):\n    \"\"\"\n    实时获取新闻（使用新的 Provider-Fetcher 架构）\n    \n    特点:\n    - 直接从数据源获取，不经过数据库\n    - 支持指定数据源或自动选择\n    - 返回标准化的 NewsData 格式\n    \n    示例:\n    - GET /api/v1/news/v2/fetch?stock_codes=600519&limit=10\n    - GET /api/v1/news/v2/fetch?keywords=茅台,白酒&provider=sina\n    \"\"\"\n    tool = FinancialNewsTool()\n    \n    # 解析参数\n    stock_code_list = stock_codes.split(\",\") if stock_codes else None\n    keyword_list = keywords.split(\",\") if keywords else None\n    \n    try:\n        result = await tool.aexecute(\n            stock_codes=stock_code_list,\n            keywords=keyword_list,\n            limit=limit,\n            provider=provider\n        )\n        \n        if result[\"success\"]:\n            # 转换为响应格式\n            news_list = [\n                NewsDataResponse(\n                    id=item[\"id\"],\n                    title=item[\"title\"],\n                    content=item[\"content\"],\n                    summary=item.get(\"summary\"),\n                    source=item[\"source\"],\n                    source_url=item[\"source_url\"],\n                    publish_time=item[\"publish_time\"],\n                    stock_codes=item.get(\"stock_codes\", []),\n                    sentiment=item.get(\"sentiment\"),\n                    sentiment_score=item.get(\"sentiment_score\")\n                )\n                for item in result[\"data\"]\n            ]\n            \n            return FetchNewsResponse(\n                success=True,\n                count=result[\"count\"],\n                provider=result.get(\"provider\"),\n                data=news_list\n            )\n        else:\n            return FetchNewsResponse(\n                success=False,\n                count=0,\n                error=result.get(\"error\"),\n                available_providers=result.get(\"available_providers\", [])\n            )\n            \n    except Exception as e:\n        logger.exception(f\"Failed to fetch 
news: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/providers\", response_model=List[ProviderInfoResponse])\nasync def list_providers():\n    \"\"\"\n    列出所有可用的数据源 Provider\n    \n    返回:\n    - 每个 Provider 的名称、描述、支持的数据类型、优先级\n    \"\"\"\n    registry = get_registry()\n    providers = []\n    \n    for name in registry.list_providers():\n        provider = registry.get_provider(name)\n        if provider:\n            providers.append(ProviderInfoResponse(\n                name=provider.info.name,\n                display_name=provider.info.display_name,\n                description=provider.info.description,\n                supported_types=list(provider.fetchers.keys()),\n                priority=provider.info.priority\n            ))\n    \n    return providers\n\n\n@router.get(\"/providers/{provider_name}/test\")\nasync def test_provider(\n    provider_name: str,\n    limit: int = Query(5, ge=1, le=20)\n):\n    \"\"\"\n    测试指定的 Provider 是否工作正常\n    \n    返回:\n    - 测试结果和获取到的样本数据\n    \"\"\"\n    tool = FinancialNewsTool()\n    \n    try:\n        result = await tool.aexecute(\n            limit=limit,\n            provider=provider_name\n        )\n        \n        return {\n            \"provider\": provider_name,\n            \"success\": result[\"success\"],\n            \"count\": result.get(\"count\", 0),\n            \"error\": result.get(\"error\"),\n            \"sample_titles\": [\n                item[\"title\"][:50] for item in result.get(\"data\", [])[:3]\n            ]\n        }\n        \n    except Exception as e:\n        return {\n            \"provider\": provider_name,\n            \"success\": False,\n            \"error\": str(e)\n        }\n"
  },
  {
    "path": "backend/app/api/v1/stocks.py",
"content": "\"\"\"\n股票分析 API 路由 - Phase 2\n提供个股分析、关联新闻、情感趋势等接口\n支持 akshare 真实股票数据\n\"\"\"\nimport logging\nfrom datetime import datetime, timedelta\nfrom typing import List, Optional\nfrom fastapi import APIRouter, Depends, HTTPException, Query\nfrom pydantic import BaseModel, Field\nfrom sqlalchemy.ext.asyncio import AsyncSession\nfrom sqlalchemy import select, func, and_, desc, text\n\nfrom ...core.database import get_db\nfrom ...models.news import News\nfrom ...models.stock import Stock\nfrom ...models.analysis import Analysis\nfrom ...models.crawl_task import CrawlTask, CrawlMode, TaskStatus\nfrom ...services.stock_data_service import stock_data_service\nfrom ...tasks.crawl_tasks import targeted_stock_crawl_task\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# ============ Pydantic 模型 ============\n\nclass StockInfo(BaseModel):\n    \"\"\"股票信息\"\"\"\n    model_config = {\"from_attributes\": True}\n    \n    code: str\n    name: str\n    full_code: Optional[str] = None\n    industry: Optional[str] = None\n    market: Optional[str] = None\n    pe_ratio: Optional[float] = None\n    market_cap: Optional[float] = None\n\n\nclass StockNewsItem(BaseModel):\n    \"\"\"股票关联新闻\"\"\"\n    id: int\n    title: str\n    content: str\n    url: str\n    source: str\n    publish_time: Optional[str] = None\n    sentiment_score: Optional[float] = None\n    has_analysis: bool = False\n\n\nclass SentimentTrendPoint(BaseModel):\n    \"\"\"情感趋势数据点\"\"\"\n    date: str\n    avg_sentiment: float\n    news_count: int\n    positive_count: int\n    negative_count: int\n    neutral_count: int\n\n\nclass StockOverview(BaseModel):\n    \"\"\"股票概览数据\"\"\"\n    code: str\n    name: Optional[str] = None\n    total_news: int\n    analyzed_news: int\n    avg_sentiment: Optional[float] = None\n    recent_sentiment: Optional[float] = None  # 最近7天\n    sentiment_trend: str  # \"up\", \"down\", \"stable\"\n    
last_news_time: Optional[str] = None\n\n\nclass KLineDataPoint(BaseModel):\n    \"\"\"K线数据点（akshare 真实数据）\"\"\"\n    timestamp: int  # 时间戳（毫秒）\n    date: str\n    open: float\n    high: float\n    low: float\n    close: float\n    volume: int\n    turnover: Optional[float] = None  # 成交额\n    change_percent: Optional[float] = None  # 涨跌幅\n    change_amount: Optional[float] = None  # 涨跌额\n    amplitude: Optional[float] = None  # 振幅\n    turnover_rate: Optional[float] = None  # 换手率\n\n\n# ============ API 端点 ============\n\n# ⚠️ 注意：具体路径的路由必须放在动态路由 /{stock_code} 之前！\n\nclass StockSearchResult(BaseModel):\n    \"\"\"股票搜索结果\"\"\"\n    code: str\n    name: str\n    full_code: str\n    market: Optional[str] = None\n    industry: Optional[str] = None\n\n\n@router.get(\"/search/realtime\", response_model=List[StockSearchResult])\nasync def search_stocks_realtime(\n    q: str = Query(..., min_length=1, description=\"搜索关键词（代码或名称）\"),\n    limit: int = Query(20, le=50),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    搜索股票（从数据库，支持代码和名称模糊匹配）\n    \n    - **q**: 搜索关键词（如 \"600519\" 或 \"茅台\"）\n    - **limit**: 返回数量限制\n    \"\"\"\n    try:\n        # 从数据库搜索\n        query = select(Stock).where(\n            (Stock.code.ilike(f\"%{q}%\")) | \n            (Stock.name.ilike(f\"%{q}%\")) |\n            (Stock.full_code.ilike(f\"%{q}%\"))\n        ).limit(limit)\n        \n        result = await db.execute(query)\n        stocks = result.scalars().all()\n        \n        if stocks:\n            return [\n                StockSearchResult(\n                    code=stock.code,\n                    name=stock.name,\n                    full_code=stock.full_code or f\"{'SH' if stock.code.startswith('6') else 'SZ'}{stock.code}\",\n                    market=stock.market,\n                    industry=stock.industry,\n                )\n                for stock in stocks\n            ]\n        \n        return []\n    \n    except Exception as e:\n        logger.error(f\"Failed 
to search stocks: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\nclass StockInitResponse(BaseModel):\n    \"\"\"股票数据初始化响应\"\"\"\n    success: bool\n    message: str\n    count: int = 0\n\n\n@router.post(\"/init\", response_model=StockInitResponse)\nasync def init_stock_data(\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    初始化股票数据（从 akshare 获取全部 A 股并存入数据库）\n    \"\"\"\n    try:\n        import akshare as ak\n        from datetime import datetime\n        from sqlalchemy import delete\n        \n        logger.info(\"Starting stock data initialization...\")\n        \n        df = ak.stock_zh_a_spot_em()\n        \n        if df is None or df.empty:\n            return StockInitResponse(success=False, message=\"Failed to fetch stocks from akshare\", count=0)\n        \n        await db.execute(delete(Stock))\n        \n        count = 0\n        for _, row in df.iterrows():\n            code = str(row['代码'])\n            name = str(row['名称'])\n            \n            if not code or not name or name in ['N/A', 'nan', '']:\n                continue\n            \n            if code.startswith('6'):\n                market = \"SH\"\n                full_code = f\"SH{code}\"\n            elif code.startswith('0') or code.startswith('3'):\n                market = \"SZ\"\n                full_code = f\"SZ{code}\"\n            else:\n                market = \"OTHER\"\n                full_code = code\n            \n            stock = Stock(\n                code=code,\n                name=name,\n                full_code=full_code,\n                market=market,\n                status=\"active\",\n                created_at=datetime.utcnow(),\n                updated_at=datetime.utcnow(),\n            )\n            db.add(stock)\n            count += 1\n        \n        await db.commit()\n        \n        return StockInitResponse(success=True, message=f\"Successfully initialized {count} stocks\", count=count)\n        \n    
except ImportError:\n        return StockInitResponse(success=False, message=\"akshare not installed\", count=0)\n    except Exception as e:\n        logger.error(f\"Failed to init stocks: {e}\")\n        await db.rollback()\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/count\")\nasync def get_stock_count(db: AsyncSession = Depends(get_db)):\n    \"\"\"获取数据库中的股票数量\"\"\"\n    from sqlalchemy import func as sql_func\n    \n    result = await db.execute(select(sql_func.count(Stock.id)))\n    count = result.scalar() or 0\n    \n    return {\"count\": count, \"message\": f\"Database has {count} stocks\"}\n\n\n# ============ 动态路由（必须放在最后） ============\n\n@router.get(\"/{stock_code}\", response_model=StockOverview)\nasync def get_stock_overview(\n    stock_code: str,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取股票概览信息\n    \n    - **stock_code**: 股票代码（如 SH600519, 600519）\n    \"\"\"\n    # 标准化股票代码（支持带前缀和不带前缀）\n    code = stock_code.upper()\n    if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n        short_code = code[2:]\n    else:\n        short_code = code\n        code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n    \n    try:\n        # 查询股票基本信息\n        stock_query = select(Stock).where(\n            (Stock.code == short_code) | (Stock.full_code == code)\n        )\n        result = await db.execute(stock_query)\n        stock = result.scalar_one_or_none()\n        \n        stock_name = stock.name if stock else None\n        \n        # 统计关联新闻\n        # 使用 PostgreSQL 原生 ARRAY 查询语法\n        # 注意：OR 条件必须整体加括号，否则与 and_() 组合时\n        # 会因 SQL 中 AND 优先级高于 OR 而得到错误结果\n        stock_codes_filter = text(\n            \"(stock_codes @> ARRAY[:code1]::varchar[] OR stock_codes @> ARRAY[:code2]::varchar[])\"\n        ).bindparams(code1=short_code, code2=code)\n        \n        news_query = select(func.count(News.id)).where(stock_codes_filter)\n        result = await db.execute(news_query)\n        total_news = result.scalar() or 0\n        \n        # 已分析的新闻数量\n        
analyzed_query = select(func.count(News.id)).where(\n            and_(\n                stock_codes_filter,\n                News.sentiment_score.isnot(None)\n            )\n        )\n        result = await db.execute(analyzed_query)\n        analyzed_news = result.scalar() or 0\n        \n        # 计算平均情感\n        avg_sentiment_query = select(func.avg(News.sentiment_score)).where(\n            and_(\n                stock_codes_filter,\n                News.sentiment_score.isnot(None)\n            )\n        )\n        result = await db.execute(avg_sentiment_query)\n        avg_sentiment = result.scalar()\n        \n        # 最近7天的平均情感\n        seven_days_ago = datetime.utcnow() - timedelta(days=7)\n        recent_query = select(func.avg(News.sentiment_score)).where(\n            and_(\n                stock_codes_filter,\n                News.sentiment_score.isnot(None),\n                News.publish_time >= seven_days_ago\n            )\n        )\n        result = await db.execute(recent_query)\n        recent_sentiment = result.scalar()\n        \n        # 判断趋势\n        if avg_sentiment is not None and recent_sentiment is not None:\n            diff = recent_sentiment - avg_sentiment\n            if diff > 0.1:\n                sentiment_trend = \"up\"\n            elif diff < -0.1:\n                sentiment_trend = \"down\"\n            else:\n                sentiment_trend = \"stable\"\n        else:\n            sentiment_trend = \"stable\"\n        \n        # 最新新闻时间\n        last_news_query = select(News.publish_time).where(\n            stock_codes_filter\n        ).order_by(desc(News.publish_time)).limit(1)\n        result = await db.execute(last_news_query)\n        last_news_time = result.scalar()\n        \n        # 注意用 is not None 判断，避免 0.0 被当作假值而返回 None\n        return StockOverview(\n            code=code,\n            name=stock_name,\n            total_news=total_news,\n            analyzed_news=analyzed_news,\n            avg_sentiment=round(avg_sentiment, 3) if avg_sentiment is not None else None,\n            recent_sentiment=round(recent_sentiment, 3) if recent_sentiment is not None else None,\n            sentiment_trend=sentiment_trend,\n            last_news_time=last_news_time.isoformat() if last_news_time else None\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to get stock overview for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{stock_code}/news\", response_model=List[StockNewsItem])\nasync def get_stock_news(\n    stock_code: str,\n    limit: int = Query(50, le=200),\n    offset: int = Query(0, ge=0),\n    sentiment: Optional[str] = Query(None, description=\"筛选情感: positive, negative, neutral\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取股票关联新闻列表\n    \n    - **stock_code**: 股票代码\n    - **limit**: 返回数量限制\n    - **offset**: 偏移量\n    - **sentiment**: 情感筛选\n    \"\"\"\n    # 标准化股票代码\n    code = stock_code.upper()\n    if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n        short_code = code[2:]\n    else:\n        short_code = code\n        code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n    \n    try:\n        # 构建查询 - 使用 PostgreSQL 原生 ARRAY 查询语法\n        # OR 条件整体加括号，保证追加 WHERE 条件（情感筛选）时优先级正确\n        stock_codes_filter = text(\n            \"(stock_codes @> ARRAY[:code1]::varchar[] OR stock_codes @> ARRAY[:code2]::varchar[])\"\n        ).bindparams(code1=short_code, code2=code)\n        \n        query = select(News).where(stock_codes_filter)\n        \n        # 情感筛选\n        if sentiment:\n            if sentiment == \"positive\":\n                query = query.where(News.sentiment_score > 0.1)\n            elif sentiment == \"negative\":\n                query = query.where(News.sentiment_score < -0.1)\n            elif sentiment == \"neutral\":\n                query = query.where(\n                    and_(\n                        News.sentiment_score >= -0.1,\n                        News.sentiment_score <= 0.1\n                    )\n                )\n        \n        # 
排序和分页\n        query = query.order_by(desc(News.publish_time)).offset(offset).limit(limit)\n        \n        result = await db.execute(query)\n        news_list = result.scalars().all()\n        \n        # 检查每条新闻是否有分析\n        response = []\n        for news in news_list:\n            # 检查是否有分析记录\n            analysis_query = select(func.count(Analysis.id)).where(Analysis.news_id == news.id)\n            analysis_result = await db.execute(analysis_query)\n            has_analysis = (analysis_result.scalar() or 0) > 0\n            \n            response.append(StockNewsItem(\n                id=news.id,\n                title=news.title,\n                content=news.content[:500] + \"...\" if len(news.content) > 500 else news.content,\n                url=news.url,\n                source=news.source,\n                publish_time=news.publish_time.isoformat() if news.publish_time else None,\n                sentiment_score=news.sentiment_score,\n                has_analysis=has_analysis\n            ))\n        \n        return response\n    \n    except Exception as e:\n        logger.error(f\"Failed to get news for stock {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.delete(\"/{stock_code}/news\")\nasync def delete_stock_news(\n    stock_code: str,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    清除股票的所有关联新闻\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    # 标准化股票代码\n    code = stock_code.upper()\n    if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n        short_code = code[2:]\n    else:\n        short_code = code\n        code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n    \n    try:\n        # 构建查询 - 使用 PostgreSQL 原生 ARRAY 查询语法\n        stock_codes_filter = text(\n            \"stock_codes @> ARRAY[:code1]::varchar[] OR stock_codes @> ARRAY[:code2]::varchar[]\"\n        ).bindparams(code1=short_code, code2=code)\n        \n        # 先查询要删除的新闻ID列表（用于同时删除关联的分析记录）\n        
news_query = select(News.id).where(stock_codes_filter)\n        news_result = await db.execute(news_query)\n        news_ids = [row[0] for row in news_result.all()]\n        \n        deleted_count = len(news_ids)\n        \n        if deleted_count > 0:\n            # 删除关联的分析记录\n            analysis_delete = await db.execute(\n                text(\"DELETE FROM analyses WHERE news_id = ANY(:news_ids)\").bindparams(news_ids=news_ids)\n            )\n            logger.info(f\"Deleted {analysis_delete.rowcount} analysis records for stock {stock_code}\")\n            \n            # 删除新闻记录\n            news_delete = await db.execute(\n                text(\"DELETE FROM news WHERE id = ANY(:news_ids)\").bindparams(news_ids=news_ids)\n            )\n            await db.commit()\n            \n            logger.info(f\"Deleted {deleted_count} news for stock {stock_code}\")\n        \n        return {\n            \"success\": True,\n            \"message\": f\"已清除 {deleted_count} 条新闻\",\n            \"deleted_count\": deleted_count\n        }\n    \n    except Exception as e:\n        await db.rollback()\n        logger.error(f\"Failed to delete news for stock {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{stock_code}/sentiment-trend\", response_model=List[SentimentTrendPoint])\nasync def get_sentiment_trend(\n    stock_code: str,\n    days: int = Query(30, le=90, ge=7, description=\"天数范围\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取股票情感趋势（按天聚合）\n    \n    - **stock_code**: 股票代码\n    - **days**: 查询天数范围（7-90天）\n    \"\"\"\n    # 标准化股票代码\n    code = stock_code.upper()\n    if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n        short_code = code[2:]\n    else:\n        short_code = code\n        code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n    \n    try:\n        start_date = datetime.utcnow() - timedelta(days=days)\n        \n        # 按天聚合情感数据\n        # 使用原生 
SQL 进行日期聚合\n        from sqlalchemy import text\n        \n        query = text(\"\"\"\n            SELECT \n                DATE(publish_time) as date,\n                AVG(sentiment_score) as avg_sentiment,\n                COUNT(*) as news_count,\n                SUM(CASE WHEN sentiment_score > 0.1 THEN 1 ELSE 0 END) as positive_count,\n                SUM(CASE WHEN sentiment_score < -0.1 THEN 1 ELSE 0 END) as negative_count,\n                SUM(CASE WHEN sentiment_score >= -0.1 AND sentiment_score <= 0.1 THEN 1 ELSE 0 END) as neutral_count\n            FROM news\n            WHERE (\n                :short_code = ANY(stock_codes) \n                OR :full_code = ANY(stock_codes)\n            )\n            AND publish_time >= :start_date\n            AND sentiment_score IS NOT NULL\n            GROUP BY DATE(publish_time)\n            ORDER BY date ASC\n        \"\"\")\n        \n        result = await db.execute(query, {\n            \"short_code\": short_code,\n            \"full_code\": code,\n            \"start_date\": start_date\n        })\n        rows = result.fetchall()\n        \n        trend_data = []\n        for row in rows:\n            trend_data.append(SentimentTrendPoint(\n                date=row.date.isoformat() if row.date else \"\",\n                avg_sentiment=round(row.avg_sentiment, 3) if row.avg_sentiment else 0,\n                news_count=row.news_count or 0,\n                positive_count=row.positive_count or 0,\n                negative_count=row.negative_count or 0,\n                neutral_count=row.neutral_count or 0\n            ))\n        \n        return trend_data\n    \n    except Exception as e:\n        logger.error(f\"Failed to get sentiment trend for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{stock_code}/kline\", response_model=List[KLineDataPoint])\nasync def get_kline_data(\n    stock_code: str,\n    period: str = Query(\"daily\", description=\"周期: 
daily, 1m, 5m, 15m, 30m, 60m\"),\n    limit: int = Query(90, le=500, ge=10, description=\"数据条数\"),\n    adjust: str = Query(\"qfq\", description=\"复权类型: qfq=前复权, hfq=后复权, 空=不复权（仅日线有效）\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取K线数据（真实数据，使用 akshare）\n    \n    - **stock_code**: 股票代码（支持 600519, SH600519, sh600519 等格式）\n    - **period**: 周期类型\n      - daily: 日线（默认）\n      - 1m: 1分钟\n      - 5m: 5分钟\n      - 15m: 15分钟\n      - 30m: 30分钟\n      - 60m: 60分钟/1小时\n    - **limit**: 返回数据条数（10-500，默认90）\n    - **adjust**: 复权类型 (qfq=前复权, hfq=后复权, \"\"=不复权)，仅对日线有效\n    \"\"\"\n    try:\n        kline_data = await stock_data_service.get_kline_data(\n            stock_code=stock_code,\n            period=period,\n            limit=limit,\n            adjust=adjust\n        )\n        \n        if not kline_data:\n            logger.warning(f\"No kline data for {stock_code} period={period}\")\n            return []\n        \n        return [KLineDataPoint(**item) for item in kline_data]\n    \n    except Exception as e:\n        logger.error(f\"Failed to get kline data for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\nclass RealtimeQuote(BaseModel):\n    \"\"\"实时行情\"\"\"\n    code: str\n    name: str\n    price: float\n    change_percent: float\n    change_amount: float\n    volume: int\n    turnover: float\n    high: float\n    low: float\n    open: float\n    prev_close: float\n\n\n@router.get(\"/{stock_code}/realtime\", response_model=Optional[RealtimeQuote])\nasync def get_realtime_quote(\n    stock_code: str,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取实时行情（使用 akshare）\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    try:\n        quote = await stock_data_service.get_realtime_quote(stock_code)\n        if quote:\n            return RealtimeQuote(**quote)\n        return None\n    except Exception as e:\n        logger.error(f\"Failed to get realtime quote for {stock_code}: {e}\")\n        
raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/search/code\", response_model=List[StockInfo])\nasync def search_stocks_db(\n    q: str = Query(..., min_length=1, description=\"搜索关键词\"),\n    limit: int = Query(10, le=50),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    从数据库搜索股票\n    \n    - **q**: 搜索关键词（代码或名称）\n    \"\"\"\n    try:\n        query = select(Stock).where(\n            (Stock.code.ilike(f\"%{q}%\")) | \n            (Stock.name.ilike(f\"%{q}%\")) |\n            (Stock.full_code.ilike(f\"%{q}%\"))\n        ).limit(limit)\n        \n        result = await db.execute(query)\n        stocks = result.scalars().all()\n        \n        return [StockInfo.model_validate(stock) for stock in stocks]\n    \n    except Exception as e:\n        logger.error(f\"Failed to search stocks: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n# ============ 定向爬取 API ============\n\nclass TargetedCrawlRequest(BaseModel):\n    \"\"\"定向爬取请求\"\"\"\n    stock_name: str = Field(..., description=\"股票名称\")\n    days: int = Field(default=30, ge=1, le=90, description=\"搜索时间范围（天）\")\n\n\nclass TargetedCrawlResponse(BaseModel):\n    \"\"\"定向爬取响应\"\"\"\n    success: bool\n    message: str\n    task_id: Optional[int] = None\n    celery_task_id: Optional[str] = None\n\n\nclass TargetedCrawlStatus(BaseModel):\n    \"\"\"定向爬取状态\"\"\"\n    task_id: Optional[int] = None\n    status: str  # idle, pending, running, completed, failed\n    celery_task_id: Optional[str] = None\n    progress: Optional[dict] = None\n    crawled_count: Optional[int] = None\n    saved_count: Optional[int] = None\n    error_message: Optional[str] = None\n    execution_time: Optional[float] = None\n    started_at: Optional[str] = None\n    completed_at: Optional[str] = None\n\n\n@router.post(\"/{stock_code}/targeted-crawl\", response_model=TargetedCrawlResponse)\nasync def start_targeted_crawl(\n    stock_code: str,\n    request: TargetedCrawlRequest,\n    db: 
AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    触发定向爬取任务\n    \n    - **stock_code**: 股票代码（如 SH600519）\n    - **stock_name**: 股票名称（如 贵州茅台）\n    - **days**: 搜索时间范围（默认30天）\n    \"\"\"\n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        # 检查是否有正在运行的任务\n        running_task = await db.execute(\n            select(CrawlTask).where(\n                and_(\n                    CrawlTask.mode == CrawlMode.TARGETED,\n                    CrawlTask.status.in_([TaskStatus.PENDING, TaskStatus.RUNNING]),\n                    text(\"config->>'stock_code' = :stock_code\").bindparams(stock_code=code)\n                )\n            ).order_by(desc(CrawlTask.created_at)).limit(1)\n        )\n        existing_task = running_task.scalar_one_or_none()\n        \n        if existing_task:\n            return TargetedCrawlResponse(\n                success=False,\n                message=f\"该股票已有正在进行的爬取任务 (ID: {existing_task.id})\",\n                task_id=existing_task.id,\n                celery_task_id=existing_task.celery_task_id\n            )\n        \n        logger.info(f\"触发定向爬取任务: {request.stock_name}({code}), 时间范围: {request.days}天\")\n        \n        # 先在数据库中创建任务记录（PENDING状态），这样前端轮询时能立即看到\n        task_record = CrawlTask(\n            mode=CrawlMode.TARGETED,\n            status=TaskStatus.PENDING,\n            source=\"targeted\",\n            config={\n                \"stock_code\": code,\n                \"stock_name\": request.stock_name,\n                \"days\": request.days,\n            },\n        )\n        db.add(task_record)\n        await db.commit()\n        await db.refresh(task_record)\n        \n        # 触发 Celery 任务，传入任务记录ID\n        celery_task = targeted_stock_crawl_task.apply_async(\n            args=(code, request.stock_name, request.days, task_record.id)\n        )\n 
       \n        # 更新 celery_task_id\n        task_record.celery_task_id = celery_task.id\n        await db.commit()\n        \n        return TargetedCrawlResponse(\n            success=True,\n            message=f\"定向爬取任务已启动: {request.stock_name}({code})\",\n            task_id=task_record.id,\n            celery_task_id=celery_task.id\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to start targeted crawl for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{stock_code}/targeted-crawl/status\", response_model=TargetedCrawlStatus)\nasync def get_targeted_crawl_status(\n    stock_code: str,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    查询定向爬取任务状态\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        # 查询最近的定向爬取任务\n        task_query = select(CrawlTask).where(\n            and_(\n                CrawlTask.mode == CrawlMode.TARGETED,\n                text(\"config->>'stock_code' = :stock_code\").bindparams(stock_code=code)\n            )\n        ).order_by(desc(CrawlTask.created_at)).limit(1)\n        \n        result = await db.execute(task_query)\n        task = result.scalar_one_or_none()\n        \n        if not task:\n            return TargetedCrawlStatus(\n                status=\"idle\",\n                progress=None\n            )\n        \n        # 检测超时：如果任务在 PENDING 状态超过 5 分钟，自动标记为失败\n        if task.status == TaskStatus.PENDING and task.created_at:\n            pending_duration = datetime.utcnow() - task.created_at\n            if pending_duration > timedelta(minutes=5):\n                logger.warning(f\"Task {task.id} has been PENDING for {pending_duration}, marking as FAILED (timeout)\")\n                task.status = 
TaskStatus.FAILED\n                task.completed_at = datetime.utcnow()\n                task.error_message = \"任务超时：Celery worker 可能未启动或已停止\"\n                await db.commit()\n        \n        # 检测运行超时：如果任务在 RUNNING 状态超过 30 分钟，也标记为失败\n        if task.status == TaskStatus.RUNNING and task.started_at:\n            running_duration = datetime.utcnow() - task.started_at\n            if running_duration > timedelta(minutes=30):\n                logger.warning(f\"Task {task.id} has been RUNNING for {running_duration}, marking as FAILED (timeout)\")\n                task.status = TaskStatus.FAILED\n                task.completed_at = datetime.utcnow()\n                task.error_message = \"任务执行超时\"\n                await db.commit()\n        \n        return TargetedCrawlStatus(\n            task_id=task.id,\n            status=task.status,\n            celery_task_id=task.celery_task_id,\n            progress=task.progress,\n            crawled_count=task.crawled_count,\n            saved_count=task.saved_count,\n            error_message=task.error_message,\n            execution_time=task.execution_time,\n            started_at=task.started_at.isoformat() if task.started_at else None,\n            completed_at=task.completed_at.isoformat() if task.completed_at else None\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to get targeted crawl status for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/{stock_code}/targeted-crawl/cancel\")\nasync def cancel_targeted_crawl(\n    stock_code: str,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    取消定向爬取任务\n    \n    - **stock_code**: 股票代码\n    \"\"\"\n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if not (code.startswith(\"SH\") or code.startswith(\"SZ\")):\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        # 查找正在进行的任务\n        task_query = 
select(CrawlTask).where(\n            and_(\n                CrawlTask.mode == CrawlMode.TARGETED,\n                CrawlTask.status.in_([TaskStatus.PENDING, TaskStatus.RUNNING]),\n                text(\"config->>'stock_code' = :stock_code\").bindparams(stock_code=code)\n            )\n        ).order_by(desc(CrawlTask.created_at)).limit(1)\n        \n        result = await db.execute(task_query)\n        task = result.scalar_one_or_none()\n        \n        if not task:\n            return {\n                \"success\": True,\n                \"message\": \"没有正在进行的任务\"\n            }\n        \n        # 更新任务状态为已取消\n        task.status = TaskStatus.CANCELLED\n        task.completed_at = datetime.utcnow()\n        task.error_message = \"用户手动取消\"\n        await db.commit()\n        \n        # 如果有 celery_task_id，尝试撤销 Celery 任务\n        if task.celery_task_id:\n            try:\n                from ...tasks.crawl_tasks import celery_app\n                celery_app.control.revoke(task.celery_task_id, terminate=True)\n                logger.info(f\"Revoked Celery task: {task.celery_task_id}\")\n            except Exception as e:\n                logger.warning(f\"Failed to revoke Celery task: {e}\")\n        \n        logger.info(f\"Cancelled targeted crawl task {task.id} for {code}\")\n        \n        return {\n            \"success\": True,\n            \"message\": f\"已取消任务 (ID: {task.id})\",\n            \"task_id\": task.id\n        }\n    \n    except Exception as e:\n        logger.error(f\"Failed to cancel targeted crawl for {stock_code}: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/cache/clear\")\nasync def clear_stock_data_cache(\n    pattern: Optional[str] = Query(None, description=\"缓存键模式，如 'kline' 或 '002837'\")\n):\n    \"\"\"\n    清除股票数据缓存\n    \n    - **pattern**: 可选的缓存键模式，如果不提供则清除所有缓存\n    \n    Examples:\n    - `POST /api/v1/stocks/cache/clear` - 清除所有缓存\n    - `POST 
/api/v1/stocks/cache/clear?pattern=kline` - 只清除K线缓存\n    - `POST /api/v1/stocks/cache/clear?pattern=002837` - 只清除特定股票的缓存\n    \"\"\"\n    try:\n        stock_data_service.clear_cache(pattern)\n        return {\n            \"success\": True,\n            \"message\": \"Cache cleared successfully\" + (f\" (pattern: {pattern})\" if pattern else \" (all)\")\n        }\n    except Exception as e:\n        logger.error(f\"Failed to clear cache: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n"
  },
  {
    "path": "backend/app/api/v1/tasks.py",
    "content": "\"\"\"\n任务管理 API 路由\n\"\"\"\nimport logging\nfrom typing import List, Optional\nfrom fastapi import APIRouter, Depends, HTTPException, Query\nfrom pydantic import BaseModel, Field\nfrom sqlalchemy.ext.asyncio import AsyncSession\nfrom sqlalchemy import select, desc\nfrom datetime import datetime\n\nfrom ...core.database import get_db\nfrom ...models.crawl_task import CrawlTask, CrawlMode, TaskStatus\nfrom ...tasks.crawl_tasks import cold_start_crawl_task, realtime_crawl_task\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter()\n\n\n# Pydantic 模型\nclass TaskResponse(BaseModel):\n    \"\"\"任务响应模型\"\"\"\n    model_config = {\"from_attributes\": True}\n    \n    id: int\n    celery_task_id: Optional[str] = None\n    mode: str\n    status: str\n    source: str\n    config: Optional[dict] = None\n    progress: Optional[dict] = None\n    current_page: Optional[int] = None\n    total_pages: Optional[int] = None\n    result: Optional[dict] = None\n    crawled_count: int\n    saved_count: int\n    error_message: Optional[str] = None\n    execution_time: Optional[float] = None\n    created_at: str\n    started_at: Optional[str] = None\n    completed_at: Optional[str] = None\n\n\nclass ColdStartRequest(BaseModel):\n    \"\"\"冷启动请求模型\"\"\"\n    source: str = Field(default=\"sina\", description=\"新闻源\")\n    start_page: int = Field(default=1, ge=1, description=\"起始页码\")\n    end_page: int = Field(default=50, ge=1, le=100, description=\"结束页码\")\n\n\nclass ColdStartResponse(BaseModel):\n    \"\"\"冷启动响应模型\"\"\"\n    success: bool\n    message: str\n    task_id: Optional[int] = None\n    celery_task_id: Optional[str] = None\n\n\nclass RealtimeCrawlRequest(BaseModel):\n    \"\"\"实时爬取请求模型\"\"\"\n    source: str = Field(description=\"新闻源（sina, tencent, eeo等）\")\n    force_refresh: bool = Field(default=False, description=\"是否强制刷新（跳过缓存）\")\n\n\nclass RealtimeCrawlResponse(BaseModel):\n    \"\"\"实时爬取响应模型\"\"\"\n    success: bool\n    message: str\n    
celery_task_id: Optional[str] = None\n\n\n# API 端点\n@router.get(\"/\", response_model=List[TaskResponse])\nasync def get_tasks_list(\n    skip: int = Query(0, ge=0, description=\"跳过的记录数\"),\n    limit: int = Query(20, ge=1, le=100, description=\"返回的记录数\"),\n    mode: Optional[str] = Query(None, description=\"按模式筛选\"),\n    status: Optional[str] = Query(None, description=\"按状态筛选\"),\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取任务列表\n    \n    - **skip**: 跳过的记录数（分页）\n    - **limit**: 返回的记录数\n    - **mode**: 按模式筛选（cold_start, realtime, targeted）\n    - **status**: 按状态筛选（pending, running, completed, failed）\n    \"\"\"\n    try:\n        query = select(CrawlTask).order_by(desc(CrawlTask.created_at))\n        \n        if mode:\n            query = query.where(CrawlTask.mode == mode)\n        if status:\n            query = query.where(CrawlTask.status == status)\n        \n        query = query.offset(skip).limit(limit)\n        \n        result = await db.execute(query)\n        tasks = result.scalars().all()\n        \n        return [TaskResponse(**task.to_dict()) for task in tasks]\n    \n    except Exception as e:\n        logger.error(f\"Failed to get tasks list: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/{task_id}\", response_model=TaskResponse)\nasync def get_task_detail(\n    task_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取任务详情\n    \n    - **task_id**: 任务ID\n    \"\"\"\n    try:\n        result = await db.execute(\n            select(CrawlTask).where(CrawlTask.id == task_id)\n        )\n        task = result.scalar_one_or_none()\n        \n        if not task:\n            raise HTTPException(status_code=404, detail=\"Task not found\")\n        \n        return TaskResponse(**task.to_dict())\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to get task {task_id}: {e}\")\n        raise 
HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/cold-start\", response_model=ColdStartResponse)\nasync def trigger_cold_start(\n    request: ColdStartRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    触发冷启动批量爬取任务\n    \n    - **source**: 新闻源（sina, jrj等）\n    - **start_page**: 起始页码\n    - **end_page**: 结束页码\n    \"\"\"\n    try:\n        logger.info(\n            f\"触发冷启动任务: {request.source}, \"\n            f\"页码 {request.start_page}-{request.end_page}\"\n        )\n        \n        # 触发 Celery 任务\n        celery_task = cold_start_crawl_task.apply_async(\n            args=(request.source, request.start_page, request.end_page)\n        )\n        \n        # 提交当前会话，确保后续查询能读到最新的任务记录\n        await db.commit()\n        \n        return ColdStartResponse(\n            success=True,\n            message=f\"冷启动任务已启动: {request.source}, 页码 {request.start_page}-{request.end_page}\",\n            celery_task_id=celery_task.id\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to trigger cold start: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.post(\"/realtime\", response_model=RealtimeCrawlResponse)\nasync def trigger_realtime_crawl(\n    request: RealtimeCrawlRequest,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    手动触发实时爬取任务\n    \n    - **source**: 新闻源（sina, tencent, eeo, jwview等）\n    - **force_refresh**: 是否强制刷新（跳过缓存）\n    \n    示例:\n    - POST /api/v1/tasks/realtime {\"source\": \"tencent\", \"force_refresh\": true}\n    - POST /api/v1/tasks/realtime {\"source\": \"eeo\"}\n    \"\"\"\n    try:\n        logger.info(\n            f\"手动触发实时爬取任务: {request.source}, \"\n            f\"force_refresh={request.force_refresh}\"\n        )\n        \n        # 触发 Celery 任务\n        celery_task = realtime_crawl_task.apply_async(\n            args=(request.source, request.force_refresh)\n        )\n        \n        return RealtimeCrawlResponse(\n            
success=True,\n            message=f\"实时爬取任务已启动: {request.source}\",\n            celery_task_id=celery_task.id\n        )\n    \n    except Exception as e:\n        logger.error(f\"Failed to trigger realtime crawl: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.get(\"/stats/summary\")\nasync def get_task_stats(\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    获取任务统计信息\n    \"\"\"\n    try:\n        # 统计各状态的任务数\n        result = await db.execute(select(CrawlTask))\n        all_tasks = result.scalars().all()\n        \n        stats = {\n            \"total\": len(all_tasks),\n            \"by_status\": {},\n            \"by_mode\": {},\n            \"recent_completed\": 0,\n            \"total_news_crawled\": 0,\n            \"total_news_saved\": 0,\n        }\n        \n        for task in all_tasks:\n            # 按状态统计\n            stats[\"by_status\"][task.status] = stats[\"by_status\"].get(task.status, 0) + 1\n            \n            # 按模式统计\n            stats[\"by_mode\"][task.mode] = stats[\"by_mode\"].get(task.mode, 0) + 1\n            \n            # 统计新闻数\n            stats[\"total_news_crawled\"] += task.crawled_count or 0\n            stats[\"total_news_saved\"] += task.saved_count or 0\n            \n            # 最近24小时完成的任务\n            if task.status == TaskStatus.COMPLETED and task.completed_at:\n                from datetime import timedelta\n                if datetime.utcnow() - task.completed_at < timedelta(days=1):\n                    stats[\"recent_completed\"] += 1\n        \n        return stats\n    \n    except Exception as e:\n        logger.error(f\"Failed to get task stats: {e}\")\n        raise HTTPException(status_code=500, detail=str(e))\n\n\n@router.delete(\"/{task_id}\")\nasync def delete_task(\n    task_id: int,\n    db: AsyncSession = Depends(get_db)\n):\n    \"\"\"\n    删除任务记录\n    \n    - **task_id**: 任务ID\n    \"\"\"\n    try:\n        result = await db.execute(\n            
select(CrawlTask).where(CrawlTask.id == task_id)\n        )\n        task = result.scalar_one_or_none()\n        \n        if not task:\n            raise HTTPException(status_code=404, detail=\"Task not found\")\n        \n        await db.delete(task)\n        await db.commit()\n        \n        return {\"success\": True, \"message\": f\"Task {task_id} deleted\"}\n    \n    except HTTPException:\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to delete task {task_id}: {e}\")\n        await db.rollback()\n        raise HTTPException(status_code=500, detail=str(e))\n\n"
  },
  {
    "path": "backend/app/config/__init__.py",
    "content": "\"\"\"\n配置模块\n\"\"\"\nimport os\nfrom pathlib import Path\nfrom typing import Dict, Any, Optional, List\nimport yaml\nfrom pydantic import BaseModel, Field, ConfigDict\n\n\n# 配置目录\nCONFIG_DIR = Path(__file__).parent\n\n\nclass AgentConfig(BaseModel):\n    \"\"\"智能体配置\"\"\"\n    name: str\n    role: str\n    description: str\n\n\nclass FlowStep(BaseModel):\n    \"\"\"流程步骤配置\"\"\"\n    name: str\n    description: str\n    parallel: bool = False\n    agents: List[str] = Field(default_factory=list)\n    type: Optional[str] = None\n    max_rounds: Optional[int] = None\n\n\nclass FlowConfig(BaseModel):\n    \"\"\"流程配置\"\"\"\n    type: str\n    steps: List[FlowStep]\n\n\nclass ModeRules(BaseModel):\n    \"\"\"模式规则配置\"\"\"\n    max_time: int = 300\n    max_rounds: Optional[int] = None\n    round_time_limit: Optional[int] = None\n    manager_can_interrupt: bool = False\n    require_news: bool = True\n    require_financial: bool = True\n    require_data_collection: bool = False\n    early_decision: bool = False\n    min_news_count: int = 0\n\n\nclass DebateRules(BaseModel):\n    \"\"\"辩论规则配置\"\"\"\n    opening_statement: bool = True\n    rebuttal_required: bool = True\n    evidence_required: bool = True\n    interrupt_cooldown: int = 30\n\n\nclass DebateModeConfig(BaseModel):\n    \"\"\"辩论模式配置\"\"\"\n    name: str\n    description: str\n    icon: str = \"📊\"\n    agents: List[AgentConfig]\n    flow: FlowConfig\n    rules: ModeRules\n    debate_rules: Optional[DebateRules] = None\n\n\nclass LLMConfig(BaseModel):\n    \"\"\"LLM配置\"\"\"\n    default_provider: str = \"bailian\"\n    default_model: str = \"qwen-plus\"\n    temperature: float = 0.7\n    max_tokens: int = 4096\n\n\nclass DataSourceConfig(BaseModel):\n    \"\"\"数据源配置\"\"\"\n    type: str\n    priority: int = 1\n\n\nclass DataSourcesConfig(BaseModel):\n    \"\"\"数据源集合配置\"\"\"\n    news: List[DataSourceConfig] = Field(default_factory=list)\n    financial: List[DataSourceConfig] = 
Field(default_factory=list)\n\n\nclass OutputConfig(BaseModel):\n    \"\"\"输出配置\"\"\"\n    format: str = \"markdown\"\n    include_trajectory: bool = True\n    include_timestamps: bool = True\n\n\nclass GlobalConfig(BaseModel):\n    \"\"\"全局配置\"\"\"\n    llm: LLMConfig = Field(default_factory=LLMConfig)\n    data_sources: DataSourcesConfig = Field(default_factory=DataSourcesConfig)\n    output: OutputConfig = Field(default_factory=OutputConfig)\n\n\nclass DebateModesConfig(BaseModel):\n    \"\"\"辩论模式总配置\"\"\"\n    model_config = ConfigDict(populate_by_name=True)\n    \n    default_mode: str = \"parallel\"\n    modes: Dict[str, DebateModeConfig]\n    global_config: GlobalConfig = Field(default_factory=GlobalConfig, alias=\"global\")\n\n\ndef load_debate_modes_config() -> DebateModesConfig:\n    \"\"\"加载辩论模式配置\"\"\"\n    config_file = CONFIG_DIR / \"debate_modes.yaml\"\n    \n    if not config_file.exists():\n        raise FileNotFoundError(f\"配置文件不存在: {config_file}\")\n    \n    with open(config_file, \"r\", encoding=\"utf-8\") as f:\n        raw_config = yaml.safe_load(f)\n    \n    # 处理 global 关键字冲突\n    if \"global\" in raw_config:\n        raw_config[\"global_config\"] = raw_config.pop(\"global\")\n    \n    return DebateModesConfig(**raw_config)\n\n\ndef get_mode_config(mode_name: str) -> Optional[DebateModeConfig]:\n    \"\"\"获取指定模式的配置\"\"\"\n    config = load_debate_modes_config()\n    return config.modes.get(mode_name)\n\n\ndef get_available_modes() -> List[Dict[str, Any]]:\n    \"\"\"获取所有可用的模式列表\"\"\"\n    config = load_debate_modes_config()\n    modes = []\n    for mode_id, mode_config in config.modes.items():\n        modes.append({\n            \"id\": mode_id,\n            \"name\": mode_config.name,\n            \"description\": mode_config.description,\n            \"icon\": mode_config.icon,\n            \"is_default\": mode_id == config.default_mode\n        })\n    return modes\n\n\ndef get_default_mode() -> str:\n    \"\"\"获取默认模式\"\"\"\n    config 
= load_debate_modes_config()\n    return config.default_mode\n\n\n# 单例缓存\n_cached_config: Optional[DebateModesConfig] = None\n\n\ndef get_cached_config() -> DebateModesConfig:\n    \"\"\"获取缓存的配置（避免重复读取文件）\"\"\"\n    global _cached_config\n    if _cached_config is None:\n        _cached_config = load_debate_modes_config()\n    return _cached_config\n\n\ndef reload_config() -> DebateModesConfig:\n    \"\"\"重新加载配置\"\"\"\n    global _cached_config\n    _cached_config = load_debate_modes_config()\n    return _cached_config\n\n"
  },
  {
    "path": "backend/app/config/debate_modes.yaml",
    "content": "# 多智能体协作模式配置\n# 支持多种辩论/分析模式，可通过前端或API选择\n\n# 默认模式\ndefault_mode: parallel\n\nmodes:\n  # ============ 并行分析模式（当前默认） ============\n  parallel:\n    name: \"并行分析模式\"\n    description: \"Bull/Bear并行分析，投资经理汇总决策\"\n    icon: \"⚡\"\n    \n    # 参与的智能体\n    agents:\n      - name: BullResearcher\n        role: \"看多研究员\"\n        description: \"从积极角度分析股票，发现投资机会\"\n      - name: BearResearcher\n        role: \"看空研究员\"\n        description: \"从风险角度分析股票，识别潜在问题\"\n      - name: InvestmentManager\n        role: \"投资经理\"\n        description: \"综合双方观点，做出最终投资决策\"\n    \n    # 执行流程\n    flow:\n      type: parallel_then_summarize\n      steps:\n        - name: data_preparation\n          description: \"准备新闻和财务数据\"\n          parallel: false\n        - name: researcher_analysis\n          description: \"Bull/Bear并行分析\"\n          parallel: true\n          agents: [BullResearcher, BearResearcher]\n        - name: manager_decision\n          description: \"投资经理综合决策\"\n          parallel: false\n          agents: [InvestmentManager]\n    \n    # 规则配置\n    rules:\n      max_time: 300          # 最长执行时间（秒）\n      require_news: true     # 是否需要新闻数据\n      require_financial: true # 是否需要财务数据\n      min_news_count: 1      # 最少新闻数量\n\n  # ============ 实时辩论模式 ============\n  realtime_debate:\n    name: \"实时辩论模式\"\n    description: \"四人实时对话，投资经理主持，多空双方交替发言\"\n    icon: \"🎭\"\n    \n    # 参与的智能体\n    agents:\n      - name: DataCollector\n        role: \"数据专员\"\n        description: \"搜集和整理相关数据资料\"\n      - name: BullResearcher\n        role: \"多方辩手\"\n        description: \"支持买入，提出看多论点\"\n      - name: BearResearcher\n        role: \"空方辩手\"\n        description: \"建议卖出，提出看空论点\"\n      - name: InvestmentManager\n        role: \"投资经理（主持人）\"\n        description: \"主持辩论，随时提问，最终裁决\"\n    \n    # 执行流程\n    flow:\n      type: orchestrated_debate\n      steps:\n        - name: opening\n          description: \"投资经理开场，下发分析任务\"\n          agents: [InvestmentManager]\n        - name: 
data_collection\n          description: \"数据专员搜集资料\"\n          agents: [DataCollector]\n        - name: debate_rounds\n          description: \"多空双方辩论\"\n          type: alternating\n          agents: [BullResearcher, BearResearcher]\n          max_rounds: 5\n        - name: closing\n          description: \"投资经理总结决策\"\n          agents: [InvestmentManager]\n    \n    # 规则配置\n    rules:\n      max_rounds: 5            # 最大辩论回合数\n      max_time: 600            # 最长执行时间（秒）\n      round_time_limit: 60     # 每回合时间限制（秒）\n      manager_can_interrupt: true  # 投资经理是否可以打断\n      require_data_collection: true  # 是否需要先搜集数据\n      early_decision: true     # 是否允许提前做决策\n      \n    # 辩论规则\n    debate_rules:\n      opening_statement: true   # 是否需要开场陈述\n      rebuttal_required: true   # 是否必须反驳对方\n      evidence_required: true   # 是否需要提供证据\n      interrupt_cooldown: 30    # 打断冷却时间（秒）\n\n  # ============ 快速分析模式 ============\n  quick_analysis:\n    name: \"快速分析模式\"\n    description: \"单一分析师快速给出建议，适合时间紧迫场景\"\n    icon: \"🚀\"\n    \n    agents:\n      - name: QuickAnalyst\n        role: \"快速分析师\"\n        description: \"综合多角度快速给出投资建议\"\n    \n    flow:\n      type: single_agent\n      steps:\n        - name: quick_analysis\n          description: \"快速综合分析\"\n          agents: [QuickAnalyst]\n    \n    rules:\n      max_time: 60\n      require_news: false\n      require_financial: true\n\n# ============ 全局配置 ============\nglobal:\n  # LLM配置\n  llm:\n    default_provider: bailian\n    default_model: qwen-plus\n    temperature: 0.7\n    max_tokens: 4096\n  \n  # 数据源配置\n  data_sources:\n    news:\n      - type: database\n        priority: 1\n      - type: bochaai\n        priority: 2\n    financial:\n      - type: akshare\n        priority: 1\n  \n  # 输出配置\n  output:\n    format: markdown\n    include_trajectory: true\n    include_timestamps: true\n\n"
  },
  {
    "path": "backend/app/core/__init__.py",
    "content": "\"\"\"\n核心模块\n\"\"\"\nfrom .config import settings, get_settings\nfrom .database import get_db, init_database\n\n__all__ = [\"settings\", \"get_settings\", \"get_db\", \"init_database\"]\n\n"
  },
  {
    "path": "backend/app/core/celery_app.py",
    "content": "\"\"\"\nCelery 应用配置\n\"\"\"\nfrom celery import Celery\nfrom celery.schedules import crontab\nfrom .config import settings\n\n# 创建 Celery 应用\ncelery_app = Celery(\n    \"finnews\",\n    broker=settings.REDIS_URL,\n    backend=settings.REDIS_URL,\n    include=[\"app.tasks.crawl_tasks\"]  # 导入任务模块\n)\n\n# Celery 配置\ncelery_app.conf.update(\n    # 时区设置\n    timezone=\"Asia/Shanghai\",\n    enable_utc=True,\n    \n    # 任务结果配置\n    result_expires=3600,  # 结果保存1小时\n    result_backend_transport_options={\n        'master_name': 'mymaster'\n    },\n    \n    # 任务执行配置\n    task_serializer=\"json\",\n    result_serializer=\"json\",\n    accept_content=[\"json\"],\n    task_track_started=True,\n    task_time_limit=30 * 60,  # 30分钟超时\n    task_soft_time_limit=25 * 60,  # 25分钟软超时\n    \n    # Worker 配置\n    worker_prefetch_multiplier=1,  # 每次只拿一个任务\n    worker_max_tasks_per_child=1000,  # 每个 worker 处理1000个任务后重启\n    \n    # Beat 调度配置\n    beat_schedule={\n        # 每1分钟爬取新浪财经\n        \"crawl-sina-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"sina\",),\n        },\n        # 每1分钟爬取腾讯财经\n        \"crawl-tencent-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"tencent\",),\n        },\n        # 每1分钟爬取中新经纬\n        \"crawl-jwview-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"jwview\",),\n        },\n        # 每1分钟爬取经济观察网\n        \"crawl-eeo-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"eeo\",),\n        },\n        # 每1分钟爬取财经网\n        \"crawl-caijing-every-1min\": {\n            \"task\": 
\"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"caijing\",),\n        },\n        # 每1分钟爬取21经济网\n        \"crawl-jingji21-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"jingji21\",),\n        },\n        # 每1分钟爬取每日经济新闻\n        \"crawl-nbd-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"nbd\",),\n        },\n        # 每1分钟爬取第一财经\n        \"crawl-yicai-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"yicai\",),\n        },\n        # 每1分钟爬取网易财经\n        \"crawl-163-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"163\",),\n        },\n        # 每1分钟爬取东方财富\n        \"crawl-eastmoney-every-1min\": {\n            \"task\": \"app.tasks.crawl_tasks.realtime_crawl_task\",\n            \"schedule\": crontab(minute=\"*/1\"),\n            \"args\": (\"eastmoney\",),\n        },\n    },\n)\n\n# 任务路由（可选，用于任务分发）\n# 注释掉自定义路由，使用默认的 celery 队列\n# celery_app.conf.task_routes = {\n#     \"app.tasks.crawl_tasks.*\": {\"queue\": \"crawl\"},\n#     \"app.tasks.analysis_tasks.*\": {\"queue\": \"analysis\"},\n# }\n\n\nif __name__ == \"__main__\":\n    celery_app.start()\n\n"
  },
  {
    "path": "backend/app/core/config.py",
    "content": "\"\"\"\nFinnewsHunter 核心配置模块\n使用 Pydantic Settings 管理环境变量和配置\n\"\"\"\nfrom typing import Optional, List\nfrom pydantic import Field\nfrom pydantic_settings import BaseSettings, SettingsConfigDict\n\n\nclass Settings(BaseSettings):\n    \"\"\"应用配置类\"\"\"\n    \n    # 应用基础配置\n    APP_NAME: str = \"FinnewsHunter\"\n    APP_VERSION: str = \"0.1.0\"\n    API_V1_PREFIX: str = \"/api/v1\"\n    DEBUG: bool = Field(default=True)\n    \n    # 服务器配置\n    HOST: str = Field(default=\"0.0.0.0\")\n    PORT: int = Field(default=8000)\n    \n    # CORS 配置\n    BACKEND_CORS_ORIGINS: List[str] = Field(\n        default=[\"http://localhost:3000\", \"http://localhost:8000\"]\n    )\n    \n    # PostgreSQL 数据库配置\n    POSTGRES_USER: str = Field(default=\"finnews\")\n    POSTGRES_PASSWORD: str = Field(default=\"finnews_dev_password\")\n    POSTGRES_HOST: str = Field(default=\"localhost\")\n    POSTGRES_PORT: int = Field(default=5432)\n    POSTGRES_DB: str = Field(default=\"finnews_db\")\n    \n    @property\n    def DATABASE_URL(self) -> str:\n        \"\"\"异步数据库连接 URL\"\"\"\n        return (\n            f\"postgresql+asyncpg://{self.POSTGRES_USER}:{self.POSTGRES_PASSWORD}\"\n            f\"@{self.POSTGRES_HOST}:{self.POSTGRES_PORT}/{self.POSTGRES_DB}\"\n        )\n    \n    @property\n    def SYNC_DATABASE_URL(self) -> str:\n        \"\"\"同步数据库连接 URL（用于初始化）\"\"\"\n        return (\n            f\"postgresql://{self.POSTGRES_USER}:{self.POSTGRES_PASSWORD}\"\n            f\"@{self.POSTGRES_HOST}:{self.POSTGRES_PORT}/{self.POSTGRES_DB}\"\n        )\n    \n    # Redis 配置\n    REDIS_HOST: str = Field(default=\"localhost\")\n    REDIS_PORT: int = Field(default=6379)\n    REDIS_DB: int = Field(default=0)\n    REDIS_PASSWORD: Optional[str] = Field(default=None)\n    \n    @property\n    def REDIS_URL(self) -> str:\n        \"\"\"Redis 连接 URL\"\"\"\n        if self.REDIS_PASSWORD:\n            return 
f\"redis://:{self.REDIS_PASSWORD}@{self.REDIS_HOST}:{self.REDIS_PORT}/{self.REDIS_DB}\"\n        return f\"redis://{self.REDIS_HOST}:{self.REDIS_PORT}/{self.REDIS_DB}\"\n    \n    # Milvus 配置\n    MILVUS_HOST: str = Field(default=\"localhost\")\n    MILVUS_PORT: int = Field(default=19530)\n    MILVUS_COLLECTION_NAME: str = Field(default=\"finnews_embeddings\")\n    MILVUS_DIM: int = Field(default=1536)  # OpenAI embedding dimension\n    \n    # Neo4j 知识图谱配置\n    NEO4J_URI: str = Field(default=\"bolt://localhost:7687\", description=\"Neo4j 连接URI\")\n    NEO4J_USER: str = Field(default=\"neo4j\", description=\"Neo4j 用户名\")\n    NEO4J_PASSWORD: str = Field(default=\"finnews_neo4j_password\", description=\"Neo4j 密码\")\n    \n    # LLM 配置\n    LLM_PROVIDER: str = Field(default=\"bailian\")  # 默认提供商\n    LLM_MODEL: str = Field(default=\"qwen-plus\")\n    LLM_TEMPERATURE: float = Field(default=0.7)\n    LLM_MAX_TOKENS: int = Field(default=2000)\n    LLM_TIMEOUT: int = Field(default=180)  # LLM 调用超时时间（秒），百炼建议180秒\n    \n    # 各厂商 API Key 配置\n    DASHSCOPE_API_KEY: Optional[str] = Field(default=None, description=\"阿里云百炼 API Key\")\n    DASHSCOPE_BASE_URL: str = Field(\n        default=\"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n        description=\"阿里云百炼 Base URL\"\n    )\n    BAILIAN_API_KEY: Optional[str] = Field(default=None, description=\"百炼 API Key（与DASHSCOPE相同）\")\n    OPENAI_API_KEY: Optional[str] = Field(default=None, description=\"OpenAI API Key\")\n    DEEPSEEK_API_KEY: Optional[str] = Field(default=None, description=\"DeepSeek API Key\")\n    MOONSHOT_API_KEY: Optional[str] = Field(default=None, description=\"Moonshot (Kimi) API Key\")\n    ZHIPU_API_KEY: Optional[str] = Field(default=None, description=\"智谱 API Key\")\n    ANTHROPIC_API_KEY: Optional[str] = Field(default=None, description=\"Anthropic API Key\")\n    \n    # 各厂商可用模型列表（逗号分隔）\n    BAILIAN_MODELS: str = Field(\n        default=\"qwen-plus,qwen-max,qwen-turbo,qwen-long\",\n        
description=\"百炼可用模型（逗号分隔）\"\n    )\n    OPENAI_MODELS: str = Field(\n        default=\"gpt-4,gpt-4-turbo,gpt-3.5-turbo\",\n        description=\"OpenAI可用模型（逗号分隔）\"\n    )\n    DEEPSEEK_MODELS: str = Field(\n        default=\"deepseek-chat\",\n        description=\"DeepSeek可用模型（逗号分隔）\"\n    )\n    MOONSHOT_MODELS: str = Field(\n        default=\"moonshot-v1-8k,moonshot-v1-32k,moonshot-v1-128k\",\n        description=\"Moonshot可用模型（逗号分隔）\"\n    )\n    ZHIPU_MODELS: str = Field(\n        default=\"glm-4,glm-4-plus,glm-4-air,glm-3-turbo\",\n        description=\"智谱可用模型（逗号分隔）\"\n    )\n    \n    # Base URL 配置（用于第三方 API 转发）\n    OPENAI_BASE_URL: Optional[str] = Field(default=None, description=\"OpenAI Base URL\")\n    DEEPSEEK_BASE_URL: Optional[str] = Field(default=\"https://api.deepseek.com/v1\", description=\"DeepSeek Base URL\")\n    MOONSHOT_BASE_URL: Optional[str] = Field(default=\"https://api.moonshot.cn/v1\", description=\"Moonshot Base URL\")\n    ZHIPU_BASE_URL: Optional[str] = Field(default=\"https://open.bigmodel.cn/api/paas/v4\", description=\"智谱 Base URL\")\n    ANTHROPIC_BASE_URL: Optional[str] = Field(default=None, description=\"Anthropic Base URL\")\n    QWEN_BASE_URL: Optional[str] = Field(default=None, description=\"Qwen Base URL (deprecated)\")\n    BAILIAN_ACCESS_KEY_ID: Optional[str] = Field(default=None, description=\"百炼 Access Key ID\")\n    BAILIAN_ACCESS_KEY_SECRET: Optional[str] = Field(default=None, description=\"百炼 Access Key Secret\")\n    BAILIAN_AGENT_CODE: Optional[str] = Field(default=None, description=\"百炼 Agent Code\")\n    BAILIAN_REGION_ID: str = Field(default=\"cn-beijing\", description=\"百炼 Region ID\")\n    \n    # BochaAI 搜索 API 配置\n    BOCHAAI_API_KEY: Optional[str] = Field(default=None, description=\"BochaAI Web Search API Key\")\n    BOCHAAI_ENDPOINT: str = Field(default=\"https://api.bochaai.com/v1/web-search\", description=\"BochaAI API Endpoint\")\n    \n    # Embedding 配置\n    EMBEDDING_PROVIDER: str = 
Field(default=\"openai\")  # openai, huggingface\n    EMBEDDING_MODEL: str = Field(default=\"text-embedding-ada-002\")\n    EMBEDDING_BATCH_SIZE: int = Field(default=100)\n    EMBEDDING_BASE_URL: Optional[str] = Field(default=None)  # 自定义 Embedding API 端点\n    EMBEDDING_TIMEOUT: int = Field(default=30, description=\"Embedding API 超时时间（秒），建议设置为20-30秒\")\n    EMBEDDING_MAX_RETRIES: int = Field(default=2, description=\"Embedding API 最大重试次数，建议设置为1-2次以避免等待太久\")\n    \n    # 爬虫配置\n    CRAWLER_USER_AGENT: str = Field(\n        default=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\"\n    )\n    CRAWLER_TIMEOUT: int = Field(default=30)\n    CRAWLER_MAX_RETRIES: int = Field(default=3)\n    CRAWLER_DELAY: float = Field(default=1.0)  # 请求间隔（秒）\n    \n    # Phase 2: 实时爬取与缓存配置（多源支持）\n    CACHE_TTL: int = Field(default=1800, description=\"缓存过期时间（秒），默认30分钟\")\n    CRAWL_INTERVAL_SINA: int = Field(default=60, description=\"新浪财经爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_TENCENT: int = Field(default=60, description=\"腾讯财经爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_JWVIEW: int = Field(default=60, description=\"中新经纬爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_EEO: int = Field(default=60, description=\"经济观察网爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_CAIJING: int = Field(default=60, description=\"财经网爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_JINGJI21: int = Field(default=60, description=\"21经济网爬取间隔（秒），默认60秒\")\n    CRAWL_INTERVAL_JRJ: int = Field(default=600, description=\"金融界爬取间隔（秒），默认10分钟\")\n    NEWS_RETENTION_HOURS: int = Field(default=72000, description=\"新闻保留时间（小时），临时设置为72000小时（约8年）以包含所有爬取的新闻\")\n    FRONTEND_REFETCH_INTERVAL: int = Field(default=180, description=\"前端自动刷新间隔（秒），默认3分钟\")\n    \n    # 日志配置\n    LOG_LEVEL: str = Field(default=\"INFO\")\n    LOG_FILE: Optional[str] = Field(default=\"logs/finnews.log\")\n    \n    # 安全配置\n    SECRET_KEY: str = Field(default=\"your-secret-key-here-change-in-production\")\n    ACCESS_TOKEN_EXPIRE_MINUTES: int = Field(default=60 * 24 * 7)  # 7 days\n    \n 
   # 业务配置\n    MAX_NEWS_PER_REQUEST: int = Field(default=50)\n    NEWS_CACHE_TTL: int = Field(default=3600)  # 1 hour\n    \n    model_config = SettingsConfigDict(\n        env_file=\".env\",\n        env_file_encoding=\"utf-8\",\n        case_sensitive=True,\n        extra=\"ignore\",\n        env_ignore_empty=True,\n    )\n\n\n# 全局配置实例\nsettings = Settings()\n\n\n# 便捷访问函数\ndef get_settings() -> Settings:\n    \"\"\"获取配置实例（用于依赖注入）\"\"\"\n    return settings\n\n"
  },
  {
    "path": "backend/app/core/database.py",
    "content": "\"\"\"\n数据库连接和依赖注入\n\"\"\"\nfrom typing import AsyncGenerator\nfrom sqlalchemy.ext.asyncio import AsyncSession\n\nfrom ..models.database import (\n    AsyncSessionLocal,\n    init_db as create_tables,\n    Base,\n)\n\n\nasync def get_db() -> AsyncGenerator[AsyncSession, None]:\n    \"\"\"\n    FastAPI 依赖注入：获取数据库会话\n    \n    Usage:\n        @app.get(\"/items\")\n        async def get_items(db: AsyncSession = Depends(get_db)):\n            ...\n    \n    Yields:\n        AsyncSession: 数据库会话\n    \"\"\"\n    async with AsyncSessionLocal() as session:\n        try:\n            yield session\n            await session.commit()\n        except Exception:\n            await session.rollback()\n            raise\n        finally:\n            await session.close()\n\n\ndef init_database():\n    \"\"\"\n    初始化数据库\n    创建所有表结构\n    \"\"\"\n    print(\"=\" * 50)\n    print(\"Initializing FinnewsHunter Database...\")\n    print(\"=\" * 50)\n    \n    try:\n        create_tables()\n        print(\"\\n✓ Database initialization completed successfully!\")\n    except Exception as e:\n        print(f\"\\n✗ Database initialization failed: {e}\")\n        raise\n\n\nif __name__ == \"__main__\":\n    # 直接运行此文件以初始化数据库\n    init_database()\n\n"
  },
  {
    "path": "backend/app/core/neo4j_client.py",
    "content": "\"\"\"\nNeo4j 图数据库客户端\n用于存储和查询公司知识图谱\n\"\"\"\nimport logging\nfrom typing import Optional, Dict, List, Any\nfrom neo4j import GraphDatabase, Driver\nfrom contextlib import contextmanager\n\nfrom .config import settings\n\nlogger = logging.getLogger(__name__)\n\n\nclass Neo4jClient:\n    \"\"\"Neo4j 客户端封装\"\"\"\n    \n    def __init__(\n        self,\n        uri: str = None,\n        user: str = None,\n        password: str = None\n    ):\n        \"\"\"\n        初始化 Neo4j 客户端\n        \n        Args:\n            uri: Neo4j URI（如 bolt://localhost:7687）\n            user: 用户名\n            password: 密码\n        \"\"\"\n        self.uri = uri or settings.NEO4J_URI or \"bolt://localhost:7687\"\n        self.user = user or settings.NEO4J_USER or \"neo4j\"\n        self.password = password or settings.NEO4J_PASSWORD or \"finnews_neo4j_password\"\n        \n        self._driver: Optional[Driver] = None\n        self._connected = False\n    \n    def connect(self):\n        \"\"\"建立连接\"\"\"\n        if self._connected:\n            return\n        \n        try:\n            self._driver = GraphDatabase.driver(\n                self.uri,\n                auth=(self.user, self.password)\n            )\n            # 测试连接\n            self._driver.verify_connectivity()\n            self._connected = True\n            logger.info(f\"✅ Neo4j 连接成功: {self.uri}\")\n        except Exception as e:\n            logger.error(f\"❌ Neo4j 连接失败: {e}\")\n            raise\n    \n    def close(self):\n        \"\"\"关闭连接\"\"\"\n        if self._driver:\n            self._driver.close()\n            self._connected = False\n            logger.info(\"Neo4j 连接已关闭\")\n    \n    @contextmanager\n    def session(self):\n        \"\"\"获取会话（上下文管理器）\"\"\"\n        if not self._connected:\n            self.connect()\n        \n        session = self._driver.session()\n        try:\n            yield session\n        finally:\n            session.close()\n    \n    def 
execute_query(\n        self,\n        query: str,\n        parameters: Dict[str, Any] = None\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        执行 Cypher 查询\n        \n        Args:\n            query: Cypher 查询语句\n            parameters: 查询参数\n            \n        Returns:\n            查询结果列表\n        \"\"\"\n        with self.session() as session:\n            result = session.run(query, parameters or {})\n            return [dict(record) for record in result]\n    \n    def execute_write(\n        self,\n        query: str,\n        parameters: Dict[str, Any] = None\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        执行写入操作（在显式写事务中运行，集群部署时会被路由到可写节点）\n        \n        Args:\n            query: Cypher 写入语句\n            parameters: 参数\n            \n        Returns:\n            写入结果\n        \"\"\"\n        def _work(tx):\n            result = tx.run(query, parameters or {})\n            return [dict(record) for record in result]\n        \n        with self.session() as session:\n            return session.execute_write(_work)\n    \n    def is_connected(self) -> bool:\n        \"\"\"检查连接状态\"\"\"\n        return self._connected\n    \n    def health_check(self) -> bool:\n        \"\"\"健康检查\"\"\"\n        try:\n            if not self._connected:\n                self.connect()\n            \n            with self.session() as session:\n                result = session.run(\"RETURN 1 as health\")\n                return result.single()[\"health\"] == 1\n        except Exception as e:\n            logger.error(f\"Neo4j 健康检查失败: {e}\")\n            return False\n\n\n# 全局单例\n_neo4j_client: Optional[Neo4jClient] = None\n\n\ndef get_neo4j_client() -> Neo4jClient:\n    \"\"\"\n    获取 Neo4j 客户端单例\n\n    Example:\n        >>> client = get_neo4j_client()\n        >>> client.execute_query(\"MATCH (n) RETURN count(n) AS c\")\n    \"\"\"\n    global _neo4j_client\n    if _neo4j_client is None:\n        _neo4j_client = Neo4jClient()\n        _neo4j_client.connect()\n    return _neo4j_client\n\n\ndef close_neo4j_client():\n    \"\"\"关闭 Neo4j 客户端\"\"\"\n    global _neo4j_client\n    if _neo4j_client:\n        _neo4j_client.close()\n        _neo4j_client = None\n\n"
  },
  {
    "path": "backend/app/core/redis_client.py",
    "content": "\"\"\"\nRedis Client for Caching and Task Queue\n\"\"\"\nimport json\nimport logging\nfrom typing import Optional, Any\nfrom datetime import datetime, timedelta\n\nimport redis\nfrom app.core.config import settings\n\nlogger = logging.getLogger(__name__)\n\n\nclass RedisClient:\n    \"\"\"Redis client wrapper with JSON serialization support\"\"\"\n    \n    def __init__(self):\n        try:\n            self.client = redis.Redis(\n                host=settings.REDIS_HOST,\n                port=settings.REDIS_PORT,\n                db=settings.REDIS_DB,\n                password=settings.REDIS_PASSWORD if settings.REDIS_PASSWORD else None,\n                decode_responses=True,  # 自动解码为字符串\n                socket_connect_timeout=5,\n                socket_timeout=5,\n            )\n            # 测试连接\n            self.client.ping()\n            logger.info(f\"✅ Redis connected: {settings.REDIS_HOST}:{settings.REDIS_PORT}\")\n        except Exception as e:\n            logger.error(f\"❌ Redis connection failed: {e}\")\n            self.client = None\n    \n    def is_available(self) -> bool:\n        \"\"\"检查 Redis 是否可用\"\"\"\n        try:\n            if self.client:\n                self.client.ping()\n                return True\n        except:\n            pass\n        return False\n    \n    def get_json(self, key: str) -> Optional[Any]:\n        \"\"\"获取 JSON 数据\"\"\"\n        if not self.is_available():\n            return None\n        \n        try:\n            value = self.client.get(key)\n            if value:\n                return json.loads(value)\n        except Exception as e:\n            logger.error(f\"Redis get_json error: {e}\")\n        return None\n    \n    def set_json(self, key: str, value: Any, ttl: int = None) -> bool:\n        \"\"\"存储 JSON 数据\"\"\"\n        if not self.is_available():\n            return False\n        \n        try:\n            json_str = json.dumps(value, ensure_ascii=False, default=str)\n         
   if ttl:\n                self.client.setex(key, ttl, json_str)\n            else:\n                self.client.set(key, json_str)\n            return True\n        except Exception as e:\n            logger.error(f\"Redis set_json error: {e}\")\n            return False\n    \n    def get(self, key: str) -> Optional[str]:\n        \"\"\"获取字符串数据\"\"\"\n        if not self.is_available():\n            return None\n        \n        try:\n            return self.client.get(key)\n        except Exception as e:\n            logger.error(f\"Redis get error: {e}\")\n            return None\n    \n    def set(self, key: str, value: str, ttl: int = None) -> bool:\n        \"\"\"存储字符串数据\"\"\"\n        if not self.is_available():\n            return False\n        \n        try:\n            if ttl:\n                self.client.setex(key, ttl, value)\n            else:\n                self.client.set(key, value)\n            return True\n        except Exception as e:\n            logger.error(f\"Redis set error: {e}\")\n            return False\n    \n    def delete(self, key: str) -> bool:\n        \"\"\"删除键\"\"\"\n        if not self.is_available():\n            return False\n        \n        try:\n            self.client.delete(key)\n            return True\n        except Exception as e:\n            logger.error(f\"Redis delete error: {e}\")\n            return False\n    \n    def exists(self, key: str) -> bool:\n        \"\"\"检查键是否存在\"\"\"\n        if not self.is_available():\n            return False\n        \n        try:\n            return self.client.exists(key) > 0\n        except Exception as e:\n            logger.error(f\"Redis exists error: {e}\")\n            return False\n    \n    def get_cache_metadata(self, key: str) -> Optional[dict]:\n        \"\"\"获取缓存元数据（时间戳）\"\"\"\n        time_key = f\"{key}:timestamp\"\n        timestamp_str = self.get(time_key)\n        \n        if timestamp_str:\n            try:\n                return {\n               
     \"timestamp\": datetime.fromisoformat(timestamp_str),\n                    \"age_seconds\": (datetime.now() - datetime.fromisoformat(timestamp_str)).total_seconds()\n                }\n            except:\n                pass\n        return None\n    \n    def set_with_metadata(self, key: str, value: Any, ttl: int = None) -> bool:\n        \"\"\"存储数据并记录时间戳\"\"\"\n        success = self.set_json(key, value, ttl)\n        if success:\n            time_key = f\"{key}:timestamp\"\n            self.set(time_key, datetime.now().isoformat(), ttl)\n        return success\n    \n    def clear_pattern(self, pattern: str) -> int:\n        \"\"\"清除匹配模式的所有键\"\"\"\n        if not self.is_available():\n            return 0\n        \n        try:\n            keys = self.client.keys(pattern)\n            if keys:\n                return self.client.delete(*keys)\n        except Exception as e:\n            logger.error(f\"Redis clear_pattern error: {e}\")\n        return 0\n\n\n# 全局单例\nredis_client = RedisClient()\n\n"
  },
  {
    "path": "backend/app/financial/__init__.py",
    "content": "\"\"\"\nFinnewsHunter 金融数据层\n\n借鉴 OpenBB 的 Provider-Fetcher 架构，提供：\n1. Standard Models: 统一的数据模型 (NewsData, StockPriceData 等)\n2. Provider Registry: 多数据源管理与自动降级\n3. AgenticX Tools: 封装为 Agent 可调用的工具\n\n设计原则：\n- 不修改 AgenticX 核心，所有金融特定逻辑内化在本模块\n- TET Pipeline: Transform Query → Extract Data → Transform Data\n- 多源降级: Provider 失败时自动切换到备用源\n\"\"\"\nfrom .registry import get_registry, ProviderRegistry\nfrom .models.news import NewsQueryParams, NewsData, NewsSentiment\nfrom .models.stock import (\n    StockQueryParams,\n    StockPriceData,\n    KlineInterval,\n    AdjustType\n)\n\n__all__ = [\n    # Registry\n    \"get_registry\",\n    \"ProviderRegistry\",\n    # News Models\n    \"NewsQueryParams\",\n    \"NewsData\",\n    \"NewsSentiment\",\n    # Stock Models\n    \"StockQueryParams\",\n    \"StockPriceData\",\n    \"KlineInterval\",\n    \"AdjustType\",\n]\n"
  },
  {
    "path": "backend/app/financial/models/__init__.py",
    "content": "\"\"\"\n金融数据标准模型\n\n借鉴 OpenBB Standard Models 设计:\n- QueryParams: 定义标准输入参数\n- Data: 定义标准输出字段\n\n所有 Provider 的 Fetcher 都使用这些标准模型，确保数据格式一致。\n\"\"\"\nfrom .news import NewsQueryParams, NewsData, NewsSentiment\nfrom .stock import StockQueryParams, StockPriceData, KlineInterval, AdjustType\n\n__all__ = [\n    \"NewsQueryParams\",\n    \"NewsData\",\n    \"NewsSentiment\",\n    \"StockQueryParams\",\n    \"StockPriceData\",\n    \"KlineInterval\",\n    \"AdjustType\",\n]\n"
  },
  {
    "path": "backend/app/financial/models/news.py",
    "content": "\"\"\"\n金融新闻标准模型\n\n借鉴 OpenBB Standard Models 设计:\n- NewsQueryParams: 新闻查询参数标准模型\n- NewsData: 新闻数据标准模型\n\n所有 NewsProvider 的 Fetcher 都接收 NewsQueryParams 作为输入，\n返回 List[NewsData] 作为输出，确保不同数据源返回的数据格式一致。\n\n来源参考:\n- OpenBB: openbb_core.provider.standard_models\n- 设计文档: research/codedeepresearch/OpenBB/FinnewsHunter_improvement_plan.md\n\"\"\"\nfrom pydantic import BaseModel, Field\nfrom datetime import datetime\nfrom typing import Optional, List\nfrom enum import Enum\nimport hashlib\n\n\nclass NewsSentiment(str, Enum):\n    \"\"\"新闻情感标签\"\"\"\n    POSITIVE = \"positive\"\n    NEGATIVE = \"negative\"\n    NEUTRAL = \"neutral\"\n\n\nclass NewsQueryParams(BaseModel):\n    \"\"\"\n    新闻查询参数标准模型\n\n    所有 NewsProvider 的 Fetcher 都接收此模型作为输入，\n    内部再转换为各自 API 的参数格式 (transform_query)。\n\n    Example:\n        >>> params = NewsQueryParams(stock_codes=[\"600519\"], limit=10)\n        >>> fetcher.fetch(params)  # 返回 List[NewsData]\n    \"\"\"\n    keywords: Optional[List[str]] = Field(\n        default=None,\n        description=\"搜索关键词列表\"\n    )\n    stock_codes: Optional[List[str]] = Field(\n        default=None,\n        description=\"关联股票代码列表，如 ['600519', '000001']\"\n    )\n    start_date: Optional[datetime] = Field(\n        default=None,\n        description=\"新闻发布时间起始\"\n    )\n    end_date: Optional[datetime] = Field(\n        default=None,\n        description=\"新闻发布时间截止\"\n    )\n    limit: int = Field(\n        default=50,\n        ge=1,\n        le=500,\n        description=\"返回条数上限\"\n    )\n    source_filter: Optional[List[str]] = Field(\n        default=None,\n        description=\"数据源过滤，如 ['sina', 'tencent']\"\n    )\n\n    class Config:\n        json_schema_extra = {\n            \"example\": {\n                \"stock_codes\": [\"600519\", \"000001\"],\n                \"limit\": 20,\n                \"keywords\": [\"茅台\", \"白酒\"]\n            }\n        }\n\n\nclass NewsData(BaseModel):\n    \"\"\"\n    新闻数据标准模型\n\n    所有 Provider 
返回的数据都必须转换为此模型，\n    确保上层 Agent 处理逻辑一致。\n\n    设计原则:\n    - 必填字段: id, title, content, source, source_url, publish_time\n    - 可选字段: summary, sentiment 等 (可由 LLM 后续填充)\n    - extra 字段: 存储 Provider 特有的额外数据\n    \"\"\"\n    id: str = Field(..., description=\"新闻唯一标识 (建议用 URL 的 MD5)\")\n    title: str = Field(..., description=\"新闻标题\")\n    content: str = Field(..., description=\"新闻正文\")\n    summary: Optional[str] = Field(default=None, description=\"摘要（可由 LLM 生成）\")\n    source: str = Field(..., description=\"来源网站名称，如 'sina', 'tencent'\")\n    source_url: str = Field(..., description=\"原文链接\")\n    publish_time: datetime = Field(..., description=\"发布时间\")\n    crawl_time: Optional[datetime] = Field(\n        default_factory=datetime.now,\n        description=\"抓取时间\"\n    )\n\n    # 关联信息\n    stock_codes: List[str] = Field(\n        default_factory=list,\n        description=\"关联股票代码，如 ['SH600519', 'SZ000001']\"\n    )\n    stock_names: List[str] = Field(\n        default_factory=list,\n        description=\"关联股票名称，如 ['贵州茅台', '平安银行']\"\n    )\n\n    # 情感分析（可选，由 Agent 或 LLM 填充）\n    sentiment: Optional[NewsSentiment] = Field(\n        default=None,\n        description=\"情感标签\"\n    )\n    sentiment_score: Optional[float] = Field(\n        default=None,\n        ge=-1,\n        le=1,\n        description=\"情感分数：-1(极度负面) ~ 1(极度正面)\"\n    )\n\n    # 原始数据（可选）\n    keywords: List[str] = Field(\n        default_factory=list,\n        description=\"关键词列表\"\n    )\n    author: Optional[str] = Field(default=None, description=\"作者\")\n\n    # 元数据\n    extra: dict = Field(\n        default_factory=dict,\n        description=\"Provider 特有的额外字段\"\n    )\n\n    class Config:\n        json_encoders = {\n            datetime: lambda v: v.isoformat()\n        }\n        json_schema_extra = {\n            \"example\": {\n                \"id\": \"a1b2c3d4e5f6\",\n                \"title\": \"贵州茅台2024年三季度业绩超预期\",\n                \"content\": \"贵州茅台发布2024年三季度报告...\",\n                
\"source\": \"sina\",\n                \"source_url\": \"https://finance.sina.com.cn/stock/...\",\n                \"publish_time\": \"2024-10-30T10:30:00\",\n                \"stock_codes\": [\"SH600519\"],\n                \"sentiment\": \"positive\",\n                \"sentiment_score\": 0.8\n            }\n        }\n\n    @staticmethod\n    def generate_id(url: str) -> str:\n        \"\"\"根据 URL 生成唯一 ID\"\"\"\n        return hashlib.md5(url.encode()).hexdigest()[:16]\n\n    def to_legacy_dict(self) -> dict:\n        \"\"\"\n        转换为旧版 NewsItem 格式 (兼容现有代码)\n\n        Returns:\n            与旧版 NewsItem.to_dict() 格式一致的字典\n        \"\"\"\n        return {\n            \"title\": self.title,\n            \"content\": self.content,\n            \"url\": self.source_url,\n            \"source\": self.source,\n            \"publish_time\": self.publish_time.isoformat() if self.publish_time else None,\n            \"author\": self.author,\n            \"keywords\": self.keywords,\n            \"stock_codes\": self.stock_codes,\n            \"summary\": self.summary,\n            \"raw_html\": self.extra.get(\"raw_html\"),\n        }\n"
  },
  {
    "path": "backend/app/financial/models/stock.py",
    "content": "\"\"\"\n股票数据标准模型\n\n借鉴 OpenBB Standard Models 设计:\n- StockQueryParams: 股票数据查询参数\n- StockPriceData: 股票价格数据 (K线)\n\n来源参考:\n- OpenBB: openbb_core.provider.standard_models\n- 设计文档: research/codedeepresearch/OpenBB/FinnewsHunter_improvement_plan.md\n\"\"\"\nfrom pydantic import BaseModel, Field\nfrom datetime import date, datetime\nfrom typing import Optional, List\nfrom enum import Enum\n\n\nclass KlineInterval(str, Enum):\n    \"\"\"K线周期\"\"\"\n    MIN_1 = \"1m\"\n    MIN_5 = \"5m\"\n    MIN_15 = \"15m\"\n    MIN_30 = \"30m\"\n    MIN_60 = \"60m\"\n    DAILY = \"1d\"\n    WEEKLY = \"1w\"\n    MONTHLY = \"1M\"\n\n\nclass AdjustType(str, Enum):\n    \"\"\"复权类型\"\"\"\n    NONE = \"none\"\n    QFQ = \"qfq\"    # 前复权\n    HFQ = \"hfq\"    # 后复权\n\n\nclass StockQueryParams(BaseModel):\n    \"\"\"\n    股票数据查询参数\n\n    Example:\n        >>> params = StockQueryParams(symbol=\"600519\", interval=KlineInterval.DAILY)\n        >>> fetcher.fetch(params)  # 返回 List[StockPriceData]\n    \"\"\"\n    symbol: str = Field(..., description=\"股票代码，如 '600519' 或 'SH600519'\")\n    start_date: Optional[date] = Field(default=None, description=\"开始日期\")\n    end_date: Optional[date] = Field(default=None, description=\"结束日期\")\n    interval: KlineInterval = Field(\n        default=KlineInterval.DAILY,\n        description=\"K线周期\"\n    )\n    adjust: AdjustType = Field(\n        default=AdjustType.QFQ,\n        description=\"复权类型\"\n    )\n    limit: int = Field(\n        default=90,\n        ge=1,\n        le=1000,\n        description=\"返回条数\"\n    )\n\n    class Config:\n        json_schema_extra = {\n            \"example\": {\n                \"symbol\": \"600519\",\n                \"interval\": \"1d\",\n                \"limit\": 90,\n                \"adjust\": \"qfq\"\n            }\n        }\n\n\nclass StockPriceData(BaseModel):\n    \"\"\"\n    股票价格数据（K线）\n\n    与现有 StockDataService 返回格式对齐，\n    确保迁移时的兼容性。\n    \"\"\"\n    symbol: str = Field(..., 
description=\"股票代码\")\n    date: datetime = Field(..., description=\"交易时间\")\n    open: float = Field(..., description=\"开盘价\")\n    high: float = Field(..., description=\"最高价\")\n    low: float = Field(..., description=\"最低价\")\n    close: float = Field(..., description=\"收盘价\")\n    volume: int = Field(..., description=\"成交量\")\n    turnover: Optional[float] = Field(default=None, description=\"成交额\")\n    change_percent: Optional[float] = Field(default=None, description=\"涨跌幅 %\")\n    change_amount: Optional[float] = Field(default=None, description=\"涨跌额\")\n    amplitude: Optional[float] = Field(default=None, description=\"振幅 %\")\n    turnover_rate: Optional[float] = Field(default=None, description=\"换手率 %\")\n\n    class Config:\n        json_encoders = {\n            datetime: lambda v: v.isoformat()\n        }\n\n    def to_legacy_dict(self) -> dict:\n        \"\"\"\n        转换为旧版 StockDataService 格式 (兼容现有代码)\n\n        Returns:\n            与旧版 get_kline_data 返回格式一致的字典\n        \"\"\"\n        return {\n            \"timestamp\": int(self.date.timestamp() * 1000),\n            \"date\": self.date.strftime(\"%Y-%m-%d\") if self.date else None,\n            \"open\": self.open,\n            \"high\": self.high,\n            \"low\": self.low,\n            \"close\": self.close,\n            \"volume\": self.volume,\n            \"turnover\": self.turnover or 0,\n            \"change_percent\": self.change_percent or 0,\n            \"change_amount\": self.change_amount or 0,\n            \"amplitude\": self.amplitude or 0,\n            \"turnover_rate\": self.turnover_rate or 0,\n        }\n\n\nclass StockRealtimeData(BaseModel):\n    \"\"\"股票实时行情\"\"\"\n    symbol: str\n    name: str\n    price: float\n    change_percent: float\n    change_amount: float\n    volume: int\n    turnover: float\n    high: float\n    low: float\n    open: float\n    prev_close: float\n    timestamp: datetime = Field(default_factory=datetime.now)\n\n\nclass 
StockFinancialData(BaseModel):\n    \"\"\"股票财务指标\"\"\"\n    symbol: str\n    pe_ratio: Optional[float] = None          # 市盈率\n    pb_ratio: Optional[float] = None          # 市净率\n    roe: Optional[float] = None               # 净资产收益率\n    total_market_value: Optional[float] = None\n    circulating_market_value: Optional[float] = None\n    gross_profit_margin: Optional[float] = None\n    net_profit_margin: Optional[float] = None\n    debt_ratio: Optional[float] = None\n    revenue_yoy: Optional[float] = None       # 营收同比\n    profit_yoy: Optional[float] = None        # 净利润同比\n"
  },
  {
    "path": "backend/app/financial/providers/__init__.py",
    "content": "\"\"\"\n数据源 Provider 模块\n\n每个 Provider 代表一个数据源（如 Sina, Tencent, AkShare），\n每个 Provider 下可以有多个 Fetcher，每个 Fetcher 对应一种数据类型。\n\n架构:\n    Provider (数据源)\n    └── Fetcher (数据获取器，实现 TET Pipeline)\n        ├── transform_query: 将标准参数转换为 Provider 特定参数\n        ├── extract_data: 执行实际的数据获取\n        └── transform_data: 将原始数据转换为标准模型\n\"\"\"\nfrom .base import BaseProvider, BaseFetcher, ProviderInfo\n\n__all__ = [\n    \"BaseProvider\",\n    \"BaseFetcher\",\n    \"ProviderInfo\",\n]\n"
  },
  {
    "path": "backend/app/financial/providers/base.py",
    "content": "\"\"\"\nProvider & Fetcher 基础抽象\n\n借鉴 OpenBB 的 TET (Transform-Extract-Transform) Pipeline:\n1. Transform Query: 将标准参数转换为 Provider 特定参数\n2. Extract Data: 执行实际的数据获取 (HTTP/爬虫/SDK)\n3. Transform Data: 将原始数据转换为标准模型\n\n来源参考:\n- OpenBB: openbb_core.provider.abstract.fetcher.Fetcher\n- 设计文档: research/codedeepresearch/OpenBB/FinnewsHunter_improvement_plan.md\n\"\"\"\nfrom abc import ABC, abstractmethod\nfrom typing import TypeVar, Generic, Dict, Any, List, Type, Optional\nfrom pydantic import BaseModel\nfrom dataclasses import dataclass, field\nimport logging\n\n# 泛型类型变量\nQueryT = TypeVar(\"QueryT\", bound=BaseModel)\nDataT = TypeVar(\"DataT\", bound=BaseModel)\n\n\n@dataclass\nclass ProviderInfo:\n    \"\"\"\n    Provider 元信息\n\n    Attributes:\n        name: 唯一标识，如 'sina', 'akshare'\n        display_name: 显示名称，如 '新浪财经'\n        description: 描述\n        website: 官网 URL\n        requires_credentials: 是否需要 API Key\n        credential_keys: 需要的凭证 key 列表\n        priority: 降级优先级，数字越小越优先\n    \"\"\"\n    name: str\n    display_name: str\n    description: str\n    website: Optional[str] = None\n    requires_credentials: bool = False\n    credential_keys: List[str] = field(default_factory=list)\n    priority: int = 0  # 数字越小，优先级越高\n\n\nclass BaseFetcher(ABC, Generic[QueryT, DataT]):\n    \"\"\"\n    数据获取器基类 - 实现 TET (Transform-Extract-Transform) Pipeline\n\n    子类必须:\n    1. 声明 query_model 和 data_model 类属性\n    2. 实现 transform_query, extract_data, transform_data 三个抽象方法\n\n    Example:\n        >>> class SinaNewsFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n        ...     query_model = NewsQueryParams\n        ...     data_model = NewsData\n        ...\n        ...     def transform_query(self, params):\n        ...         return {\"url\": \"...\", \"limit\": params.limit}\n        ...\n        ...     async def extract_data(self, query):\n        ...         return await self._fetch_html(query[\"url\"])\n        ...\n        ...     
def transform_data(self, raw_data, query):\n        ...         return [NewsData(...) for item in raw_data]\n    \"\"\"\n\n    # 子类必须声明这两个类属性\n    query_model: Type[QueryT]\n    data_model: Type[DataT]\n\n    def __init__(self):\n        self.logger = logging.getLogger(\n            f\"{self.__class__.__module__}.{self.__class__.__name__}\"\n        )\n\n    @abstractmethod\n    def transform_query(self, params: QueryT) -> Dict[str, Any]:\n        \"\"\"\n        [T]ransform Query: 将标准参数转换为 Provider 特定参数\n\n        Args:\n            params: 标准查询参数 (NewsQueryParams, StockQueryParams 等)\n\n        Returns:\n            Provider 特定的参数字典\n\n        Example:\n            NewsQueryParams(stock_codes=['600519'], limit=10)\n            → {'url': 'https://...', 'symbol': 'sh600519', 'count': 10}\n        \"\"\"\n        pass\n\n    @abstractmethod\n    async def extract_data(self, query: Dict[str, Any]) -> Any:\n        \"\"\"\n        [E]xtract Data: 执行实际的数据获取\n\n        可以是:\n        - HTTP 请求\n        - 网页爬虫\n        - SDK 调用\n        - 数据库查询\n\n        Args:\n            query: transform_query 返回的参数字典\n\n        Returns:\n            原始数据 (任意格式，由 transform_data 处理)\n        \"\"\"\n        pass\n\n    @abstractmethod\n    def transform_data(self, raw_data: Any, query: QueryT) -> List[DataT]:\n        \"\"\"\n        [T]ransform Data: 将原始数据转换为标准模型\n\n        Args:\n            raw_data: extract_data 返回的原始数据\n            query: 原始查询参数 (可用于补充信息)\n\n        Returns:\n            标准模型列表 (List[NewsData], List[StockPriceData] 等)\n        \"\"\"\n        pass\n\n    async def fetch(self, params: QueryT) -> List[DataT]:\n        \"\"\"\n        完整的 TET 执行流程\n\n        Args:\n            params: 标准查询参数\n\n        Returns:\n            标准模型列表\n\n        Raises:\n            Exception: 任何阶段失败时抛出异常\n        \"\"\"\n        self.logger.info(f\"Fetching with params: {params.model_dump()}\")\n\n        # T: Transform Query\n        query = self.transform_query(params)\n        
self.logger.debug(f\"Transformed query: {query}\")\n\n        # E: Extract Data\n        raw = await self.extract_data(query)\n        raw_count = len(raw) if isinstance(raw, (list, tuple)) else 1\n        self.logger.debug(f\"Extracted {raw_count} raw records\")\n\n        # T: Transform Data\n        results = self.transform_data(raw, params)\n        self.logger.info(f\"Transformed to {len(results)} standard records\")\n\n        return results\n\n    def fetch_sync(self, params: QueryT) -> List[DataT]:\n        \"\"\"\n        同步版本的 fetch (用于非异步环境)\n\n        Args:\n            params: 标准查询参数\n\n        Returns:\n            标准模型列表\n        \"\"\"\n        import asyncio\n        return asyncio.run(self.fetch(params))\n\n\nclass BaseProvider(ABC):\n    \"\"\"\n    Provider 基类 - 定义数据源能力\n\n    每个 Provider 可以有多个 Fetcher，每个 Fetcher 对应一种数据类型。\n\n    Example:\n        >>> class SinaProvider(BaseProvider):\n        ...     @property\n        ...     def info(self):\n        ...         return ProviderInfo(name=\"sina\", ...)\n        ...\n        ...     @property\n        ...     def fetchers(self):\n        ...         
return {\"news\": SinaNewsFetcher}\n    \"\"\"\n\n    @property\n    @abstractmethod\n    def info(self) -> ProviderInfo:\n        \"\"\"返回 Provider 元信息\"\"\"\n        pass\n\n    @property\n    @abstractmethod\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        \"\"\"\n        返回支持的 Fetcher 映射\n\n        Returns:\n            格式: {data_type: FetcherClass}\n            例如: {'news': SinaNewsFetcher, 'stock_price': SinaStockFetcher}\n        \"\"\"\n        pass\n\n    def get_fetcher(self, data_type: str) -> Optional[BaseFetcher]:\n        \"\"\"\n        获取指定类型的 Fetcher 实例\n\n        Args:\n            data_type: 数据类型，如 'news', 'stock_price'\n\n        Returns:\n            Fetcher 实例，如果不支持该类型则返回 None\n        \"\"\"\n        fetcher_cls = self.fetchers.get(data_type)\n        if fetcher_cls:\n            return fetcher_cls()\n        return None\n\n    def supports(self, data_type: str) -> bool:\n        \"\"\"\n        检查是否支持某种数据类型\n\n        Args:\n            data_type: 数据类型\n\n        Returns:\n            是否支持\n        \"\"\"\n        return data_type in self.fetchers\n\n    def __repr__(self) -> str:\n        return f\"<{self.__class__.__name__} name='{self.info.name}' types={list(self.fetchers.keys())}>\"\n"
  },
  {
    "path": "backend/app/financial/providers/eastmoney/__init__.py",
    "content": "\"\"\"\n东方财富 Provider\n\"\"\"\nfrom .provider import EastmoneyProvider\nfrom .fetchers.news import EastmoneyNewsFetcher\n\n__all__ = [\"EastmoneyProvider\", \"EastmoneyNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/eastmoney/fetchers/__init__.py",
    "content": "\"\"\"\n东方财富 Fetchers\n\"\"\"\nfrom .news import EastmoneyNewsFetcher\n\n__all__ = [\"EastmoneyNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/eastmoney/fetchers/news.py",
    "content": "\"\"\"\n东方财富新闻 Fetcher\n\n基于 TET Pipeline 实现\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\nimport requests\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData, NewsSentiment\n\nlogger = logging.getLogger(__name__)\n\n\nclass EastmoneyNewsFetcher(BaseFetcher):\n    \"\"\"\n    东方财富新闻 Fetcher\n    \n    数据源: https://stock.eastmoney.com/\n    \"\"\"\n    \n    BASE_URL = \"https://stock.eastmoney.com/\"\n    STOCK_URL = \"https://stock.eastmoney.com/news/\"\n    SOURCE_NAME = \"eastmoney\"\n    \n    HEADERS = {\n        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n        \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n        \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n    }\n    \n    def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"转换标准查询参数\"\"\"\n        return {\n            \"url\": self.STOCK_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes,\n            \"keywords\": params.keywords,\n        }\n    \n    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"从东方财富抓取原始数据\"\"\"\n        raw_news = []\n        \n        try:\n            # 尝试股票新闻页面，失败则尝试主页\n            try:\n                response = requests.get(query[\"url\"], headers=self.HEADERS, timeout=30)\n                response.raise_for_status()\n            except:\n                response = requests.get(self.BASE_URL, headers=self.HEADERS, timeout=30)\n                response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, \"html.parser\")\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"[Eastmoney] Found {len(news_links)} 
news links\")\n            \n            max_fetch = min(query[\"limit\"], 20)\n            \n            for link_info in news_links[:max_fetch]:\n                try:\n                    news_item = self._fetch_news_detail(link_info)\n                    if news_item:\n                        raw_news.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"[Eastmoney] Failed to fetch {link_info['url']}: {e}\")\n                    continue\n            \n            logger.info(f\"[Eastmoney] Extracted {len(raw_news)} news items\")\n            \n        except Exception as e:\n            logger.error(f\"[Eastmoney] Extract failed: {e}\")\n        \n        return raw_news\n    \n    def transform_data(\n        self,\n        raw_data: List[Dict[str, Any]],\n        params: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"转换原始数据为标准 NewsData 格式\"\"\"\n        news_list = []\n        \n        for item in raw_data:\n            try:\n                stock_codes = self._extract_stock_codes(\n                    item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                )\n                \n                if params.stock_codes:\n                    if not any(code in stock_codes for code in params.stock_codes):\n                        continue\n                \n                if params.keywords:\n                    text = item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                    if not any(kw in text for kw in params.keywords):\n                        continue\n                \n                news = NewsData(\n                    id=NewsData.generate_id(item.get(\"url\", \"\")),  # id 为必填字段，按 URL 生成\n                    title=item.get(\"title\", \"\"),\n                    content=item.get(\"content\", \"\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item.get(\"url\", \"\"),\n                    publish_time=item.get(\"publish_time\", datetime.now()),\n                    author=item.get(\"author\"),\n              
      stock_codes=stock_codes,\n                    sentiment=NewsSentiment.NEUTRAL,\n                )\n                news_list.append(news)\n                \n            except Exception as e:\n                logger.warning(f\"[Eastmoney] Transform failed: {e}\")\n                continue\n        \n        if params.limit:\n            news_list = news_list[:params.limit]\n        \n        return news_list\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[Dict[str, str]]:\n        \"\"\"从页面提取新闻链接\"\"\"\n        news_links = []\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 东方财富新闻URL模式\n            if ('eastmoney.com' in href and ('/news/' in href or '/stock/' in href or '.html' in href)) and title:\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://stock.eastmoney.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://stock.eastmoney.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _fetch_news_detail(self, link_info: Dict[str, str]) -> Optional[Dict[str, Any]]:\n        \"\"\"获取新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = requests.get(url, headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            soup = BeautifulSoup(response.text, \"html.parser\")\n            \n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            publish_time = 
self._extract_publish_time(soup)\n            author = self._extract_author(soup)\n            \n            return {\n                \"title\": title,\n                \"content\": content,\n                \"url\": url,\n                \"publish_time\": publish_time,\n                \"author\": author,\n            }\n            \n        except Exception as e:\n            logger.debug(f\"[Eastmoney] Detail fetch failed: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'Body'},\n            {'id': 'ContentBody'},\n            {'class': 'article-content'},\n            {'class': 'newsContent'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([\n                        p.get_text(strip=True) for p in paragraphs \n                        if p.get_text(strip=True)\n                    ])\n                    if content:\n                        return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> datetime:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('div', {'class': re.compile(r'time|date')})\n            if not time_elem:\n                time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception:\n            pass\n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%Y年%m月%d日 
%H:%M']\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        return datetime.now()\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            elem = soup.find('div', {'class': re.compile(r'author|source')})\n            if not elem:\n                elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if elem:\n                return elem.get_text(strip=True)\n        except Exception:\n            pass\n        return None\n    \n    def _extract_stock_codes(self, text: str) -> List[str]:\n        \"\"\"从文本提取股票代码\"\"\"\n        patterns = [\n            r'(\\d{6})\\.(SH|SZ|sh|sz)',\n            r'(SH|SZ|sh|sz)(\\d{6})',\n            r'[（(](\\d{6})[)）]',\n        ]\n        \n        codes = set()\n        for pattern in patterns:\n            matches = re.findall(pattern, text)\n            for match in matches:\n                if isinstance(match, tuple):\n                    code = ''.join(match)\n                else:\n                    code = match\n                code = re.sub(r'[^0-9]', '', code)\n                if len(code) == 6:\n                    codes.add(code)\n        \n        return list(codes)\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"清理文本\"\"\"\n        text = re.sub(r'\\s+', ' ', text)\n        return text.strip()\n"
  },
  {
    "path": "backend/app/financial/providers/eastmoney/provider.py",
    "content": "\"\"\"\n东方财富 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import EastmoneyNewsFetcher\n\n\nclass EastmoneyProvider(BaseProvider):\n    \"\"\"\n    东方财富数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"eastmoney\",\n            display_name=\"东方财富\",\n            description=\"东方财富股票新闻 (eastmoney.com)\",\n            website=\"https://stock.eastmoney.com/\",\n            requires_credentials=False,\n            priority=4  # 第四优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": EastmoneyNewsFetcher,\n        }\n"
  },
  {
    "path": "backend/app/financial/providers/nbd/__init__.py",
    "content": "\"\"\"\n每日经济新闻 Provider\n\"\"\"\nfrom .provider import NbdProvider\nfrom .fetchers.news import NbdNewsFetcher\n\n__all__ = [\"NbdProvider\", \"NbdNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/nbd/fetchers/__init__.py",
    "content": "\"\"\"\n每日经济新闻 Fetchers\n\"\"\"\nfrom .news import NbdNewsFetcher\n\n__all__ = [\"NbdNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/nbd/fetchers/news.py",
    "content": "\"\"\"\n每日经济新闻 Fetcher\n\n基于 TET Pipeline 实现\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\nimport requests\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData, NewsSentiment\n\nlogger = logging.getLogger(__name__)\n\n\nclass NbdNewsFetcher(BaseFetcher):\n    \"\"\"\n    每日经济新闻 Fetcher\n    \n    数据源: https://www.nbd.com.cn/\n    \"\"\"\n    \n    BASE_URL = \"https://www.nbd.com.cn/\"\n    STOCK_URL = \"https://www.nbd.com.cn/columns/3/\"  # 股市栏目\n    SOURCE_NAME = \"nbd\"\n    \n    HEADERS = {\n        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n        \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n        \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n    }\n    \n    def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"转换标准查询参数\"\"\"\n        return {\n            \"url\": self.STOCK_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes,\n            \"keywords\": params.keywords,\n        }\n    \n    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"从每日经济新闻抓取原始数据\"\"\"\n        raw_news = []\n        \n        try:\n            response = requests.get(query[\"url\"], headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, \"html.parser\")\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"[NBD] Found {len(news_links)} news links\")\n            \n            max_fetch = min(query[\"limit\"], 20)\n            \n            for link_info in news_links[:max_fetch]:\n                try:\n                    news_item = 
self._fetch_news_detail(link_info)\n                    if news_item:\n                        raw_news.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"[NBD] Failed to fetch {link_info['url']}: {e}\")\n                    continue\n            \n            logger.info(f\"[NBD] Extracted {len(raw_news)} news items\")\n            \n        except Exception as e:\n            logger.error(f\"[NBD] Extract failed: {e}\")\n        \n        return raw_news\n    \n    def transform_data(\n        self,\n        raw_data: List[Dict[str, Any]],\n        params: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"转换原始数据为标准 NewsData 格式\"\"\"\n        news_list = []\n        \n        for item in raw_data:\n            try:\n                stock_codes = self._extract_stock_codes(\n                    item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                )\n                \n                if params.stock_codes:\n                    if not any(code in stock_codes for code in params.stock_codes):\n                        continue\n                \n                if params.keywords:\n                    text = item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                    if not any(kw in text for kw in params.keywords):\n                        continue\n                \n                news = NewsData(\n                    title=item.get(\"title\", \"\"),\n                    content=item.get(\"content\", \"\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item.get(\"url\", \"\"),\n                    publish_time=item.get(\"publish_time\", datetime.now()),\n                    author=item.get(\"author\"),\n                    stock_codes=stock_codes,\n                    sentiment=NewsSentiment.NEUTRAL,\n                )\n                news_list.append(news)\n                \n            except Exception as e:\n                
logger.warning(f\"[NBD] Transform failed: {e}\")\n                continue\n        \n        if params.limit:\n            news_list = news_list[:params.limit]\n        \n        return news_list\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[Dict[str, str]]:\n        \"\"\"从页面提取新闻链接\"\"\"\n        news_links = []\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            if ('/articles/' in href or '/article/' in href or '.html' in href) and title:\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.nbd.com.cn' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.nbd.com.cn/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _fetch_news_detail(self, link_info: Dict[str, str]) -> Optional[Dict[str, Any]]:\n        \"\"\"获取新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = requests.get(url, headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            soup = BeautifulSoup(response.text, \"html.parser\")\n            \n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            publish_time = self._extract_publish_time(soup)\n            author = self._extract_author(soup)\n            \n            return {\n                \"title\": title,\n                \"content\": content,\n                \"url\": url,\n                \"publish_time\": publish_time,\n                \"author\": 
author,\n            }\n            \n        except Exception as e:\n            logger.debug(f\"[NBD] Detail fetch failed: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'article-body'},\n            {'class': 'article__body'},\n            {'class': 'article-text'},\n            {'class': 'content-article'},\n            {'class': 'main-content'},\n            {'class': 'g-article-content'},\n            {'class': 'article-content'},\n            {'id': 'contentText'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find(['div', 'article', 'section'], selector)\n            if content_div:\n                for tag in content_div.find_all(['script', 'style', 'iframe', 'ins']):\n                    tag.decompose()\n                for ad in content_div.find_all(class_=re.compile(r'ad|advertisement|banner')):\n                    ad.decompose()\n                \n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([\n                        p.get_text(strip=True) for p in paragraphs \n                        if p.get_text(strip=True)\n                    ])\n                    if content and len(content) > 50:\n                        return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> datetime:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date|pub')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception:\n            pass\n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n 
       formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%Y年%m月%d日 %H:%M']\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        return datetime.now()\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            elem = soup.find('span', {'class': re.compile(r'author|source|editor')})\n            if elem:\n                return elem.get_text(strip=True)\n        except Exception:\n            pass\n        return None\n    \n    def _extract_stock_codes(self, text: str) -> List[str]:\n        \"\"\"从文本提取股票代码\"\"\"\n        patterns = [\n            r'(\\d{6})\\.(SH|SZ|sh|sz)',\n            r'(SH|SZ|sh|sz)(\\d{6})',\n            r'[（(](\\d{6})[)）]',\n        ]\n        \n        codes = set()\n        for pattern in patterns:\n            matches = re.findall(pattern, text)\n            for match in matches:\n                if isinstance(match, tuple):\n                    code = ''.join(match)\n                else:\n                    code = match\n                code = re.sub(r'[^0-9]', '', code)\n                if len(code) == 6:\n                    codes.add(code)\n        \n        return list(codes)\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"清理文本\"\"\"\n        text = re.sub(r'\\s+', ' ', text)\n        return text.strip()\n"
  },
  {
    "path": "backend/app/financial/providers/nbd/provider.py",
    "content": "\"\"\"\n每日经济新闻 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import NbdNewsFetcher\n\n\nclass NbdProvider(BaseProvider):\n    \"\"\"\n    每日经济新闻数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"nbd\",\n            display_name=\"每日经济新闻\",\n            description=\"每日经济新闻 (nbd.com.cn)\",\n            website=\"https://www.nbd.com.cn/\",\n            requires_credentials=False,\n            priority=3  # 第三优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": NbdNewsFetcher,\n        }\n"
  },
  {
    "path": "backend/app/financial/providers/netease/__init__.py",
    "content": "\"\"\"\n网易财经 Provider\n\"\"\"\nfrom .provider import NeteaseProvider\nfrom .fetchers.news import NeteaseNewsFetcher\n\n__all__ = [\"NeteaseProvider\", \"NeteaseNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/netease/fetchers/__init__.py",
    "content": "\"\"\"\n网易财经 Fetchers\n\"\"\"\nfrom .news import NeteaseNewsFetcher\n\n__all__ = [\"NeteaseNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/netease/fetchers/news.py",
    "content": "\"\"\"\n网易财经新闻 Fetcher\n\n基于 TET Pipeline 实现\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\nimport requests\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData, NewsSentiment\n\nlogger = logging.getLogger(__name__)\n\n\nclass NeteaseNewsFetcher(BaseFetcher):\n    \"\"\"\n    网易财经新闻 Fetcher\n    \n    数据源: https://money.163.com/\n    \"\"\"\n    \n    BASE_URL = \"https://money.163.com/\"\n    STOCK_URL = \"https://money.163.com/stock/\"\n    SOURCE_NAME = \"163\"\n    \n    HEADERS = {\n        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n        \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n        \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n    }\n    \n    def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"转换标准查询参数\"\"\"\n        return {\n            \"url\": self.STOCK_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes,\n            \"keywords\": params.keywords,\n        }\n    \n    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"从网易财经抓取原始数据\"\"\"\n        raw_news = []\n        \n        try:\n            # 尝试股票页面，失败则尝试主页\n            try:\n                response = requests.get(query[\"url\"], headers=self.HEADERS, timeout=30)\n                response.raise_for_status()\n            except:\n                response = requests.get(self.BASE_URL, headers=self.HEADERS, timeout=30)\n                response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, \"html.parser\")\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"[Netease] Found {len(news_links)} news links\")\n            
\n            max_fetch = min(query[\"limit\"], 20)\n            \n            for link_info in news_links[:max_fetch]:\n                try:\n                    news_item = self._fetch_news_detail(link_info)\n                    if news_item:\n                        raw_news.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"[Netease] Failed to fetch {link_info['url']}: {e}\")\n                    continue\n            \n            logger.info(f\"[Netease] Extracted {len(raw_news)} news items\")\n            \n        except Exception as e:\n            logger.error(f\"[Netease] Extract failed: {e}\")\n        \n        return raw_news\n    \n    def transform_data(\n        self,\n        raw_data: List[Dict[str, Any]],\n        params: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"转换原始数据为标准 NewsData 格式\"\"\"\n        news_list = []\n        \n        for item in raw_data:\n            try:\n                stock_codes = self._extract_stock_codes(\n                    item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                )\n                \n                if params.stock_codes:\n                    if not any(code in stock_codes for code in params.stock_codes):\n                        continue\n                \n                if params.keywords:\n                    text = item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                    if not any(kw in text for kw in params.keywords):\n                        continue\n                \n                news = NewsData(\n                    title=item.get(\"title\", \"\"),\n                    content=item.get(\"content\", \"\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item.get(\"url\", \"\"),\n                    publish_time=item.get(\"publish_time\", datetime.now()),\n                    author=item.get(\"author\"),\n                    stock_codes=stock_codes,\n 
                   sentiment=NewsSentiment.NEUTRAL,\n                )\n                news_list.append(news)\n                \n            except Exception as e:\n                logger.warning(f\"[Netease] Transform failed: {e}\")\n                continue\n        \n        if params.limit:\n            news_list = news_list[:params.limit]\n        \n        return news_list\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[Dict[str, str]]:\n        \"\"\"从页面提取新闻链接\"\"\"\n        news_links = []\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 网易新闻URL模式\n            if ('money.163.com' in href or 'stock' in href) and title:\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://money.163.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://money.163.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _fetch_news_detail(self, link_info: Dict[str, str]) -> Optional[Dict[str, Any]]:\n        \"\"\"获取新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = requests.get(url, headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            soup = BeautifulSoup(response.text, \"html.parser\")\n            \n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            publish_time = self._extract_publish_time(soup)\n            author = self._extract_author(soup)\n            \n            return 
{\n                \"title\": title,\n                \"content\": content,\n                \"url\": url,\n                \"publish_time\": publish_time,\n                \"author\": author,\n            }\n            \n        except Exception as e:\n            logger.debug(f\"[Netease] Detail fetch failed: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'post_text'},\n            {'id': 'endText'},\n            {'class': 'article-content'},\n            {'class': 'content'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([\n                        p.get_text(strip=True) for p in paragraphs \n                        if p.get_text(strip=True)\n                    ])\n                    if content:\n                        return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> datetime:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('div', {'class': re.compile(r'post_time|time')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception:\n            pass\n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%Y年%m月%d日 %H:%M']\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        return datetime.now()\n    \n    def 
_extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if not elem:\n                elem = soup.find('div', {'id': 'ne_article_source'})\n            if elem:\n                return elem.get_text(strip=True)\n        except Exception:\n            pass\n        return None\n    \n    def _extract_stock_codes(self, text: str) -> List[str]:\n        \"\"\"从文本提取股票代码\"\"\"\n        patterns = [\n            r'(\\d{6})\\.(SH|SZ|sh|sz)',\n            r'(SH|SZ|sh|sz)(\\d{6})',\n            r'[（(](\\d{6})[)）]',\n        ]\n        \n        codes = set()\n        for pattern in patterns:\n            matches = re.findall(pattern, text)\n            for match in matches:\n                if isinstance(match, tuple):\n                    code = ''.join(match)\n                else:\n                    code = match\n                code = re.sub(r'[^0-9]', '', code)\n                if len(code) == 6:\n                    codes.add(code)\n        \n        return list(codes)\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"清理文本\"\"\"\n        text = re.sub(r'\\s+', ' ', text)\n        return text.strip()\n"
  },
  {
    "path": "backend/app/financial/providers/netease/provider.py",
    "content": "\"\"\"\n网易财经 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import NeteaseNewsFetcher\n\n\nclass NeteaseProvider(BaseProvider):\n    \"\"\"\n    网易财经数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"163\",\n            display_name=\"网易财经\",\n            description=\"网易财经股票新闻 (money.163.com)\",\n            website=\"https://money.163.com/\",\n            requires_credentials=False,\n            priority=6  # 第六优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": NeteaseNewsFetcher,\n        }\n"
  },
  {
    "path": "backend/app/financial/providers/sina/__init__.py",
    "content": "\"\"\"\n新浪财经 Provider\n\n提供:\n- 新闻数据 (news): SinaNewsFetcher\n\n从 tools/sina_crawler.py 迁移而来，保留核心逻辑，\n适配 TET Pipeline 架构。\n\"\"\"\nfrom .provider import SinaProvider\n\n__all__ = [\"SinaProvider\"]\n"
  },
  {
    "path": "backend/app/financial/providers/sina/fetchers/__init__.py",
    "content": "\"\"\"\n新浪财经 Fetchers\n\"\"\"\nfrom .news import SinaNewsFetcher\n\n__all__ = [\"SinaNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/sina/fetchers/news.py",
    "content": "\"\"\"\n新浪财经新闻 Fetcher\n\n从 tools/sina_crawler.py 迁移而来，适配 TET Pipeline 架构。\n\n主要变更:\n- transform_query: 将 NewsQueryParams 转换为爬虫参数\n- extract_data: 执行网页爬取\n- transform_data: 将原始数据转换为 NewsData 标准模型\n\n保留原有的:\n- 网页解析逻辑\n- 标题/内容/日期提取\n- 股票代码提取\n- 噪音过滤\n\n来源: tools/sina_crawler.py (SinaCrawlerTool)\n\"\"\"\nimport re\nimport time\nimport hashlib\nimport logging\nfrom typing import Dict, Any, List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData\n\nlogger = logging.getLogger(__name__)\n\n\nclass SinaNewsFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n    \"\"\"\n    新浪财经新闻获取器\n\n    实现 TET Pipeline:\n    - Transform Query: 将 NewsQueryParams 转换为爬虫参数\n    - Extract Data: 爬取网页\n    - Transform Data: 解析为 NewsData\n    \"\"\"\n\n    query_model = NewsQueryParams\n    data_model = NewsData\n\n    # 新浪财经最新滚动新闻页面\n    BASE_URL = \"https://finance.sina.com.cn/roll/c/56592.shtml\"\n    SOURCE_NAME = \"sina\"\n\n    # 请求配置\n    DEFAULT_TIMEOUT = 30\n    DEFAULT_DELAY = 0.5\n    DEFAULT_USER_AGENT = (\n        \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) \"\n        \"AppleWebKit/537.36 (KHTML, like Gecko) \"\n        \"Chrome/120.0.0.0 Safari/537.36\"\n    )\n\n    # 噪音文本模式\n    NOISE_PATTERNS = [\n        r'^责任编辑', r'^编辑[:：]', r'^来源[:：]', r'^声明[:：]',\n        r'^免责声明', r'^版权', r'^copyright', r'^点击进入',\n        r'^相关阅读', r'^延伸阅读', r'登录新浪财经APP',\n        r'搜索【信披】', r'缩小字体', r'放大字体', r'收藏',\n        r'微博', r'微信', r'分享', r'腾讯QQ',\n    ]\n\n    def __init__(self):\n        super().__init__()\n        self._session = None\n\n    def _get_session(self):\n        \"\"\"获取 requests Session (延迟初始化)\"\"\"\n        if self._session is None:\n            import requests\n            self._session = requests.Session()\n            self._session.headers.update({\n                'User-Agent': self.DEFAULT_USER_AGENT\n            })\n        return self._session\n\n    
def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"\n        将标准参数转换为爬虫参数\n\n        Args:\n            params: 标准查询参数\n\n        Returns:\n            爬虫参数字典\n        \"\"\"\n        query = {\n            \"base_url\": self.BASE_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes or [],\n            \"keywords\": params.keywords or [],\n        }\n\n        # 如果有股票代码，构建股票新闻 URL\n        if params.stock_codes:\n            query[\"stock_urls\"] = []\n            for code in params.stock_codes:\n                symbol = self._normalize_symbol(code)\n                stock_url = (\n                    f\"https://vip.stock.finance.sina.com.cn\"\n                    f\"/corp/go.php/vCB_AllNewsStock/symbol/{symbol}.phtml\"\n                )\n                query[\"stock_urls\"].append(stock_url)\n\n        return query\n\n    async def extract_data(self, query: Dict[str, Any]) -> List[Dict]:\n        \"\"\"\n        执行网页爬取\n\n        Args:\n            query: transform_query 返回的参数\n\n        Returns:\n            原始新闻数据列表\n        \"\"\"\n        all_news = []\n        limit = query[\"limit\"]\n\n        # 确定要爬取的 URL 列表\n        urls_to_crawl = query.get(\"stock_urls\", [query[\"base_url\"]])\n        if not urls_to_crawl:\n            urls_to_crawl = [query[\"base_url\"]]\n\n        for url in urls_to_crawl:\n            try:\n                news_items = await self._crawl_page(url, limit - len(all_news))\n                all_news.extend(news_items)\n\n                if len(all_news) >= limit:\n                    break\n\n            except Exception as e:\n                self.logger.error(f\"Failed to crawl {url}: {e}\")\n                continue\n\n        return all_news[:limit]\n\n    async def _crawl_page(self, url: str, max_items: int) -> List[Dict]:\n        \"\"\"爬取单个页面\"\"\"\n        import asyncio\n\n        self.logger.info(f\"Fetching page: {url}\")\n\n        # 使用 
run_in_executor 执行同步请求\n        loop = asyncio.get_event_loop()\n        response = await loop.run_in_executor(\n            None,\n            lambda: self._fetch_page_sync(url)\n        )\n\n        if not response:\n            return []\n\n        # 设置编码\n        response.encoding = 'utf-8'\n        soup = BeautifulSoup(response.text, 'lxml')\n\n        # 查找新闻链接\n        news_links = self._extract_news_links(soup)\n        self.logger.info(f\"Found {len(news_links)} news links\")\n\n        # 爬取每条新闻详情\n        news_list = []\n        for idx, news_url in enumerate(news_links[:max_items], 1):\n            try:\n                self.logger.debug(f\"Crawling news {idx}/{min(len(news_links), max_items)}\")\n                news_item = await self._crawl_news_detail(news_url)\n                if news_item:\n                    news_list.append(news_item)\n            except Exception as e:\n                self.logger.warning(f\"Failed to crawl {news_url}: {e}\")\n                continue\n\n            # 请求间隔\n            await asyncio.sleep(self.DEFAULT_DELAY)\n\n        return news_list\n\n    def _fetch_page_sync(self, url: str):\n        \"\"\"同步获取页面\"\"\"\n        try:\n            session = self._get_session()\n            response = session.get(url, timeout=self.DEFAULT_TIMEOUT)\n            response.raise_for_status()\n            return response\n        except Exception as e:\n            self.logger.error(f\"Failed to fetch {url}: {e}\")\n            return None\n\n    def _extract_news_links(self, soup: BeautifulSoup) -> List[str]:\n        \"\"\"提取新闻链接\"\"\"\n        news_links = []\n        for link in soup.find_all('a', href=True):\n            href = link.get('href', '')\n            # 匹配新浪财经新闻 URL\n            if 'finance.sina.com.cn' in href and ('/stock/' in href or '/roll/' in href):\n                if href.startswith('http'):\n                    news_links.append(href)\n                elif href.startswith('//'):\n                    
news_links.append('http:' + href)\n\n        # 去重\n        return list(set(news_links))\n\n    async def _crawl_news_detail(self, url: str) -> Optional[Dict]:\n        \"\"\"爬取新闻详情\"\"\"\n        import asyncio\n\n        loop = asyncio.get_event_loop()\n        response = await loop.run_in_executor(\n            None,\n            lambda: self._fetch_page_sync(url)\n        )\n\n        if not response:\n            return None\n\n        try:\n            soup = BeautifulSoup(response.content, \"lxml\")\n            raw_html = response.text\n\n            # 提取各字段\n            title = self._extract_title(soup)\n            if not title:\n                return None\n\n            summary, keywords = self._extract_meta(soup)\n            publish_time = self._extract_date(soup)\n            stock_codes = self._extract_stock_codes(soup)\n            content = self._extract_content(soup)\n\n            if not content or len(content) < 50:\n                return None\n\n            return {\n                \"url\": url,\n                \"title\": title,\n                \"content\": content,\n                \"summary\": summary,\n                \"keywords\": keywords,\n                \"publish_time\": publish_time,\n                \"stock_codes\": stock_codes,\n                \"raw_html\": raw_html,\n            }\n\n        except Exception as e:\n            self.logger.error(f\"Error parsing {url}: {e}\")\n            return None\n\n    def transform_data(\n        self,\n        raw_data: List[Dict],\n        query: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"\n        将原始数据转换为 NewsData 标准模型\n\n        Args:\n            raw_data: extract_data 返回的原始数据\n            query: 原始查询参数\n\n        Returns:\n            NewsData 列表\n        \"\"\"\n        results = []\n        for item in raw_data:\n            try:\n                news = NewsData(\n                    id=NewsData.generate_id(item[\"url\"]),\n                    
title=item[\"title\"],\n                    content=item[\"content\"],\n                    summary=item.get(\"summary\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item[\"url\"],\n                    publish_time=item.get(\"publish_time\") or datetime.now(),\n                    stock_codes=item.get(\"stock_codes\", []),\n                    keywords=item.get(\"keywords\", []),\n                    extra={\"raw_html\": item.get(\"raw_html\")},\n                )\n                results.append(news)\n            except Exception as e:\n                self.logger.warning(f\"Failed to transform item: {e}\")\n                continue\n\n        return results\n\n    # ========== 辅助方法（从原 sina_crawler.py 迁移）==========\n\n    def _normalize_symbol(self, code: str) -> str:\n        \"\"\"标准化股票代码为新浪格式\"\"\"\n        code = code.upper().replace(\"SH\", \"sh\").replace(\"SZ\", \"sz\")\n        if code.isdigit():\n            if code.startswith(\"6\"):\n                return f\"sh{code}\"\n            else:\n                return f\"sz{code}\"\n        return code.lower()\n\n    def _extract_title(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取标题\"\"\"\n        title_tag = soup.find('h1', class_='main-title')\n        if not title_tag:\n            title_tag = soup.find('h1')\n        if not title_tag:\n            title_tag = soup.find('title')\n\n        if title_tag:\n            title = title_tag.get_text().strip()\n            title = re.sub(r'[-_].*?(新浪|财经|网)', '', title)\n            return title.strip()\n        return None\n\n    def _extract_meta(self, soup: BeautifulSoup) -> tuple:\n        \"\"\"提取元数据（摘要和关键词）\"\"\"\n        summary = \"\"\n        keywords = []\n\n        for meta in soup.find_all('meta'):\n            name = meta.get('name', '').lower()\n            content = meta.get('content', '')\n\n            if name == 'description':\n                summary = content\n            elif name == 
'keywords':\n                keywords = [kw.strip() for kw in content.split(',') if kw.strip()]\n\n        return summary, keywords\n\n    def _extract_date(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        for span in soup.find_all('span'):\n            class_attr = span.get('class', [])\n            if 'date' in class_attr or 'time-source' in class_attr:\n                date_text = span.get_text()\n                return self._parse_date(date_text)\n\n            if span.get('id') == 'pub_date':\n                date_text = span.get_text()\n                return self._parse_date(date_text)\n\n        return None\n\n    def _parse_date(self, date_text: str) -> Optional[datetime]:\n        \"\"\"解析日期字符串\"\"\"\n        try:\n            date_text = date_text.strip()\n            date_text = date_text.replace('年', '-').replace('月', '-').replace('日', '')\n\n            for fmt in ['%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S', '%Y-%m-%d']:\n                try:\n                    return datetime.strptime(date_text.strip(), fmt)\n                except ValueError:\n                    continue\n        except Exception:\n            pass\n        return None\n\n    def _extract_stock_codes(self, soup: BeautifulSoup) -> List[str]:\n        \"\"\"提取关联股票代码\"\"\"\n        stock_codes = []\n        for span in soup.find_all('span'):\n            span_id = span.get('id', '')\n            if span_id.startswith('stock_'):\n                code = span_id[6:].upper()\n                if code:\n                    stock_codes.append(code)\n        return list(set(stock_codes))\n\n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取正文内容\"\"\"\n        content_selectors = [\n            {'id': 'artibody'},\n            {'class': 'article-content'},\n            {'class': 'article'},\n            {'id': 'article'},\n        ]\n\n        for selector in content_selectors:\n            content_div = soup.find(['div', 
'article'], selector)\n            if content_div:\n                # 移除噪音元素\n                for tag in content_div.find_all([\n                    'script', 'style', 'iframe', 'ins',\n                    'select', 'input', 'button', 'form'\n                ]):\n                    tag.decompose()\n\n                for ad in content_div.find_all(class_=re.compile(\n                    r'ad|banner|share|otherContent|recommend|app-guide', re.I\n                )):\n                    ad.decompose()\n\n                # 提取文本\n                full_text = content_div.get_text(separator='\\n', strip=True)\n                lines = full_text.split('\\n')\n                article_parts = []\n\n                for line in lines:\n                    line = line.strip()\n                    if not line or len(line) < 2:\n                        continue\n\n                    if not self._is_noise_text(line):\n                        article_parts.append(line)\n\n                if article_parts:\n                    return '\\n'.join(article_parts)\n\n        return \"\"\n\n    def _is_noise_text(self, text: str) -> bool:\n        \"\"\"判断是否为噪音文本\"\"\"\n        text_lower = text.lower().strip()\n        for pattern in self.NOISE_PATTERNS:\n            if re.match(pattern, text_lower, re.I) or re.search(pattern, text_lower, re.I):\n                return True\n        return False\n\n    def _extract_chinese_ratio(self, text: str) -> float:\n        \"\"\"计算中文字符比例\"\"\"\n        pattern = re.compile(r'[\\u4e00-\\u9fa5]+')\n        chinese_chars = pattern.findall(text)\n        chinese_count = sum(len(chars) for chars in chinese_chars)\n        total_count = len(text)\n        return chinese_count / total_count if total_count > 0 else 0\n"
  },
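The `_normalize_symbol` helper in the Sina fetcher above maps bare six-digit A-share codes to Sina's lowercase-prefixed format (Shanghai codes starting with `6` get `sh`, everything else `sz`). A standalone sketch of that mapping, using a hypothetical free function `normalize_symbol` with the same rules as the method:

```python
def normalize_symbol(code: str) -> str:
    """Map a stock code to Sina's lowercase-prefixed format.

    Bare six-digit codes are prefixed by exchange: codes starting
    with '6' are Shanghai ('sh'), everything else Shenzhen ('sz').
    Already-prefixed codes are lowercased and passed through.
    """
    code = code.upper().replace("SH", "sh").replace("SZ", "sz")
    if code.isdigit():
        return f"sh{code}" if code.startswith("6") else f"sz{code}"
    return code.lower()

print(normalize_symbol("600519"))    # sh600519
print(normalize_symbol("000001"))    # sz000001
print(normalize_symbol("SH600519"))  # sh600519
```

Note that upper-casing first and then replacing `SH`/`SZ` lets the helper accept mixed-case prefixes (`Sh600519`, `sh600519`, `SH600519`) in one pass.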
  {
    "path": "backend/app/financial/providers/sina/provider.py",
    "content": "\"\"\"\n新浪财经 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import SinaNewsFetcher\n\n\nclass SinaProvider(BaseProvider):\n    \"\"\"\n    新浪财经数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"sina\",\n            display_name=\"新浪财经\",\n            description=\"新浪财经新闻和股票数据\",\n            website=\"https://finance.sina.com.cn\",\n            requires_credentials=False,\n            priority=1  # 第一优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": SinaNewsFetcher,\n            # 可扩展: \"stock_price\": SinaStockFetcher\n        }\n"
  },
  {
    "path": "backend/app/financial/providers/tencent/__init__.py",
    "content": "\"\"\"\n腾讯财经 Provider\n\"\"\"\nfrom .provider import TencentProvider\nfrom .fetchers.news import TencentNewsFetcher\n\n__all__ = [\"TencentProvider\", \"TencentNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/tencent/fetchers/__init__.py",
    "content": "\"\"\"\n腾讯财经 Fetchers\n\"\"\"\nfrom .news import TencentNewsFetcher\n\n__all__ = [\"TencentNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/tencent/fetchers/news.py",
    "content": "\"\"\"\n腾讯财经新闻 Fetcher\n\n基于 TET Pipeline 实现:\n- Transform Query: 转换标准参数为腾讯财经特定参数\n- Extract Data: 从腾讯财经抓取原始数据\n- Transform Data: 转换为标准 NewsData 格式\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime, timedelta\nfrom bs4 import BeautifulSoup\nimport requests\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData, NewsSentiment\n\nlogger = logging.getLogger(__name__)\n\n\nclass TencentNewsFetcher(BaseFetcher):\n    \"\"\"\n    腾讯财经新闻 Fetcher\n    \n    数据源: https://news.qq.com/ch/finance/\n    \"\"\"\n    \n    BASE_URL = \"https://news.qq.com/ch/finance/\"\n    SOURCE_NAME = \"tencent\"\n    \n    # 请求配置\n    HEADERS = {\n        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n        \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n        \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n    }\n    \n    def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"\n        转换标准查询参数为腾讯财经特定参数\n        \"\"\"\n        return {\n            \"url\": self.BASE_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes,\n            \"keywords\": params.keywords,\n        }\n    \n    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"\n        从腾讯财经抓取原始新闻数据\n        \"\"\"\n        raw_news = []\n        \n        try:\n            response = requests.get(\n                query[\"url\"],\n                headers=self.HEADERS,\n                timeout=30\n            )\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, \"html.parser\")\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"[Tencent] Found {len(news_links)} news links\")\n            
\n            # 限制获取数量\n            max_fetch = min(query[\"limit\"], 20)\n            \n            for link_info in news_links[:max_fetch]:\n                try:\n                    news_item = self._fetch_news_detail(link_info)\n                    if news_item:\n                        raw_news.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"[Tencent] Failed to fetch {link_info['url']}: {e}\")\n                    continue\n            \n            logger.info(f\"[Tencent] Extracted {len(raw_news)} news items\")\n            \n        except Exception as e:\n            logger.error(f\"[Tencent] Extract failed: {e}\")\n        \n        return raw_news\n    \n    def transform_data(\n        self,\n        raw_data: List[Dict[str, Any]],\n        params: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"\n        转换原始数据为标准 NewsData 格式\n        \"\"\"\n        news_list = []\n        \n        for item in raw_data:\n            try:\n                # 提取股票代码\n                stock_codes = self._extract_stock_codes(\n                    item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                )\n                \n                # 过滤：如果指定了股票代码，只保留相关新闻\n                if params.stock_codes:\n                    if not any(code in stock_codes for code in params.stock_codes):\n                        continue\n                \n                # 过滤：关键词过滤\n                if params.keywords:\n                    text = item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                    if not any(kw in text for kw in params.keywords):\n                        continue\n                \n                news = NewsData(\n                    title=item.get(\"title\", \"\"),\n                    content=item.get(\"content\", \"\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item.get(\"url\", \"\"),\n                    
publish_time=item.get(\"publish_time\", datetime.now()),\n                    author=item.get(\"author\"),\n                    stock_codes=stock_codes,\n                    sentiment=NewsSentiment.NEUTRAL,  # 默认中性\n                )\n                news_list.append(news)\n                \n            except Exception as e:\n                logger.warning(f\"[Tencent] Transform failed for item: {e}\")\n                continue\n        \n        # 应用 limit\n        if params.limit:\n            news_list = news_list[:params.limit]\n        \n        return news_list\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[Dict[str, str]]:\n        \"\"\"从页面提取新闻链接\"\"\"\n        news_links = []\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            \n            # 腾讯新闻URL模式\n            if '/rain/a/' in href or '/omn/' in href:\n                if not href.startswith('http'):\n                    href = 'https:' + href if href.startswith('//') else 'https://news.qq.com' + href\n                \n                title = link.get_text(strip=True)\n                if title and href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _fetch_news_detail(self, link_info: Dict[str, str]) -> Optional[Dict[str, Any]]:\n        \"\"\"获取新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = requests.get(url, headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            soup = BeautifulSoup(response.text, \"html.parser\")\n            \n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            publish_time = self._extract_publish_time(soup)\n            author = self._extract_author(soup)\n     
       \n            return {\n                \"title\": title,\n                \"content\": content,\n                \"url\": url,\n                \"publish_time\": publish_time,\n                \"author\": author,\n            }\n            \n        except Exception as e:\n            logger.debug(f\"[Tencent] Detail fetch failed: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'content-article'},\n            {'class': 'LEFT'},\n            {'id': 'Cnt-Main-Article-QQ'},\n            {'class': 'article'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([\n                        p.get_text(strip=True) for p in paragraphs \n                        if p.get_text(strip=True)\n                    ])\n                    if content:\n                        return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> datetime:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_selectors = [\n                {'class': 'a-time'},\n                {'class': 'article-time'},\n                {'class': 'time'},\n            ]\n            \n            for selector in time_selectors:\n                time_elem = soup.find('span', selector)\n                if time_elem:\n                    time_str = time_elem.get_text(strip=True)\n                    return self._parse_time_string(time_str)\n            \n            meta_time = soup.find('meta', {'property': 'article:published_time'})\n            if meta_time and meta_time.get('content'):\n                return datetime.fromisoformat(meta_time['content'].replace('Z', 
'+00:00'))\n            \n        except Exception:\n            pass\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        if '分钟前' in time_str:\n            minutes = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(minutes=minutes)\n        elif '小时前' in time_str:\n            hours = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(hours=hours)\n        elif '昨天' in time_str:\n            return now - timedelta(days=1)\n        \n        formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d']\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            for selector in [{'class': 'author'}, {'class': 'source'}]:\n                elem = soup.find('span', selector) or soup.find('a', selector)\n                if elem:\n                    return elem.get_text(strip=True)\n        except Exception:\n            pass\n        return None\n    \n    def _extract_stock_codes(self, text: str) -> List[str]:\n        \"\"\"从文本提取股票代码\"\"\"\n        patterns = [\n            r'(\\d{6})\\.(SH|SZ|sh|sz)',\n            r'(SH|SZ|sh|sz)(\\d{6})',\n            r'[（(](\\d{6})[)）]',\n        ]\n        \n        codes = set()\n        for pattern in patterns:\n            matches = re.findall(pattern, text)\n            for match in matches:\n                if isinstance(match, tuple):\n                    code = ''.join(match)\n                else:\n                    code = match\n                code = re.sub(r'[^0-9]', '', code)\n                if len(code) == 6:\n                    codes.add(code)\n        \n        
return list(codes)\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"清理文本\"\"\"\n        text = re.sub(r'\\s+', ' ', text)\n        text = text.strip()\n        return text\n"
  },
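`TencentNewsFetcher._extract_stock_codes` above recognizes three textual forms of A-share codes: suffixed (`600519.SH`), prefixed (`SZ000001`), and parenthesized, including full-width Chinese parentheses (`（600519）`). A self-contained sketch of that extraction, with a hypothetical `extract_stock_codes` function using the same three patterns:

```python
import re
from typing import List

# The same three patterns used by the fetcher, in the same order.
PATTERNS = [
    r'(\d{6})\.(SH|SZ|sh|sz)',   # suffixed:      600519.SH
    r'(SH|SZ|sh|sz)(\d{6})',     # prefixed:      SZ000001
    r'[（(](\d{6})[)）]',         # parenthesized: （600519） or (600519)
]

def extract_stock_codes(text: str) -> List[str]:
    """Collect unique six-digit A-share codes from free text."""
    codes = set()
    for pattern in PATTERNS:
        for match in re.findall(pattern, text):
            # Grouped patterns yield tuples; join, then keep digits only.
            raw = ''.join(match) if isinstance(match, tuple) else match
            digits = re.sub(r'[^0-9]', '', raw)
            if len(digits) == 6:
                codes.add(digits)
    return sorted(codes)

print(extract_stock_codes("贵州茅台(600519)大涨，SZ000001 平安银行跟随"))
# ['000001', '600519']
```

Returning a sorted list (the fetcher returns `list(codes)` in arbitrary set order) makes the sketch deterministic for testing; the de-duplication via a `set` matches the original.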
  {
    "path": "backend/app/financial/providers/tencent/provider.py",
    "content": "\"\"\"\n腾讯财经 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import TencentNewsFetcher\n\n\nclass TencentProvider(BaseProvider):\n    \"\"\"\n    腾讯财经数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"tencent\",\n            display_name=\"腾讯财经\",\n            description=\"腾讯财经新闻 (news.qq.com)\",\n            website=\"https://news.qq.com/ch/finance/\",\n            requires_credentials=False,\n            priority=2  # 第二优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": TencentNewsFetcher,\n        }\n"
  },
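Every fetcher in this package follows the same three-stage TET pipeline named in the docstrings above: `transform_query` (standard params → source-specific query), `extract_data` (fetch raw rows), `transform_data` (raw rows → standard model). A compact sketch of how a caller might chain the stages; the `TETFetcher` stub and `fetch` driver are hypothetical illustrations, not part of the codebase, and the extract stage stubs out the network call:

```python
from typing import Any, Dict, List

class TETFetcher:
    """Minimal illustration of the Transform-Extract-Transform flow."""

    def transform_query(self, params: Dict[str, Any]) -> Dict[str, Any]:
        # Stage 1: normalize caller-facing params into source-specific ones.
        return {"url": "https://example.invalid/news", "limit": params.get("limit", 20)}

    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Stage 2: a real fetcher performs HTTP requests here; stub raw rows.
        return [{"title": f"item {i}"} for i in range(query["limit"])]

    def transform_data(self, raw: List[Dict[str, Any]], params: Dict[str, Any]) -> List[str]:
        # Stage 3: map raw rows into the caller's standard model (titles here).
        return [row["title"] for row in raw]

def fetch(fetcher: TETFetcher, params: Dict[str, Any]) -> List[str]:
    """Chain the three TET stages for one query."""
    query = fetcher.transform_query(params)
    raw = fetcher.extract_data(query)
    return fetcher.transform_data(raw, params)

print(fetch(TETFetcher(), {"limit": 3}))  # ['item 0', 'item 1', 'item 2']
```

Keeping the three stages as separate methods lets each be tested in isolation: `transform_query` and `transform_data` are pure functions, so only `extract_data` needs network mocking.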
  {
    "path": "backend/app/financial/providers/yicai/__init__.py",
    "content": "\"\"\"\n第一财经 Provider\n\"\"\"\nfrom .provider import YicaiProvider\nfrom .fetchers.news import YicaiNewsFetcher\n\n__all__ = [\"YicaiProvider\", \"YicaiNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/yicai/fetchers/__init__.py",
    "content": "\"\"\"\n第一财经 Fetchers\n\"\"\"\nfrom .news import YicaiNewsFetcher\n\n__all__ = [\"YicaiNewsFetcher\"]\n"
  },
  {
    "path": "backend/app/financial/providers/yicai/fetchers/news.py",
    "content": "\"\"\"\n第一财经新闻 Fetcher\n\n基于 TET Pipeline 实现\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\nimport requests\n\nfrom ...base import BaseFetcher\nfrom ....models.news import NewsQueryParams, NewsData, NewsSentiment\n\nlogger = logging.getLogger(__name__)\n\n\nclass YicaiNewsFetcher(BaseFetcher):\n    \"\"\"\n    第一财经新闻 Fetcher\n    \n    数据源: https://www.yicai.com/\n    \"\"\"\n    \n    BASE_URL = \"https://www.yicai.com/\"\n    STOCK_URL = \"https://www.yicai.com/news/gushi/\"\n    SOURCE_NAME = \"yicai\"\n    \n    HEADERS = {\n        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n        \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n        \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n    }\n    \n    def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n        \"\"\"转换标准查询参数\"\"\"\n        return {\n            \"url\": self.STOCK_URL,\n            \"limit\": params.limit or 20,\n            \"stock_codes\": params.stock_codes,\n            \"keywords\": params.keywords,\n        }\n    \n    def extract_data(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"从第一财经抓取原始数据\"\"\"\n        raw_news = []\n        \n        try:\n            response = requests.get(query[\"url\"], headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, \"html.parser\")\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"[Yicai] Found {len(news_links)} news links\")\n            \n            max_fetch = min(query[\"limit\"], 20)\n            \n            for link_info in news_links[:max_fetch]:\n                try:\n                    news_item = 
self._fetch_news_detail(link_info)\n                    if news_item:\n                        raw_news.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"[Yicai] Failed to fetch {link_info['url']}: {e}\")\n                    continue\n            \n            logger.info(f\"[Yicai] Extracted {len(raw_news)} news items\")\n            \n        except Exception as e:\n            logger.error(f\"[Yicai] Extract failed: {e}\")\n        \n        return raw_news\n    \n    def transform_data(\n        self,\n        raw_data: List[Dict[str, Any]],\n        params: NewsQueryParams\n    ) -> List[NewsData]:\n        \"\"\"转换原始数据为标准 NewsData 格式\"\"\"\n        news_list = []\n        \n        for item in raw_data:\n            try:\n                stock_codes = self._extract_stock_codes(\n                    item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                )\n                \n                if params.stock_codes:\n                    if not any(code in stock_codes for code in params.stock_codes):\n                        continue\n                \n                if params.keywords:\n                    text = item.get(\"title\", \"\") + \" \" + item.get(\"content\", \"\")\n                    if not any(kw in text for kw in params.keywords):\n                        continue\n                \n                news = NewsData(\n                    title=item.get(\"title\", \"\"),\n                    content=item.get(\"content\", \"\"),\n                    source=self.SOURCE_NAME,\n                    source_url=item.get(\"url\", \"\"),\n                    publish_time=item.get(\"publish_time\", datetime.now()),\n                    author=item.get(\"author\"),\n                    stock_codes=stock_codes,\n                    sentiment=NewsSentiment.NEUTRAL,\n                )\n                news_list.append(news)\n                \n            except Exception as e:\n               
 logger.warning(f\"[Yicai] Transform failed: {e}\")\n                continue\n        \n        if params.limit:\n            news_list = news_list[:params.limit]\n        \n        return news_list\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[Dict[str, str]]:\n        \"\"\"从页面提取新闻链接\"\"\"\n        news_links = []\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            if ('/news/' in href or '/article/' in href) and title:\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.yicai.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.yicai.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _fetch_news_detail(self, link_info: Dict[str, str]) -> Optional[Dict[str, Any]]:\n        \"\"\"获取新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = requests.get(url, headers=self.HEADERS, timeout=30)\n            response.raise_for_status()\n            soup = BeautifulSoup(response.text, \"html.parser\")\n            \n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            publish_time = self._extract_publish_time(soup)\n            author = self._extract_author(soup)\n            \n            return {\n                \"title\": title,\n                \"content\": content,\n                \"url\": url,\n                \"publish_time\": publish_time,\n                \"author\": author,\n            }\n  
          \n        except Exception as e:\n            logger.debug(f\"[Yicai] Detail fetch failed: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'm-txt'},\n            {'class': 'article-content'},\n            {'class': 'content'},\n            {'class': 'newsContent'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([\n                        p.get_text(strip=True) for p in paragraphs \n                        if p.get_text(strip=True)\n                    ])\n                    if content:\n                        return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> datetime:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if not time_elem:\n                time_elem = soup.find('time')\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception:\n            pass\n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%Y年%m月%d日 %H:%M']\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        return datetime.now()\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            elem = soup.find('span', {'class': 
re.compile(r'author|source')})\n            if elem:\n                return elem.get_text(strip=True)\n        except Exception:\n            pass\n        return None\n    \n    def _extract_stock_codes(self, text: str) -> List[str]:\n        \"\"\"从文本提取股票代码\"\"\"\n        patterns = [\n            r'(\\d{6})\\.(SH|SZ|sh|sz)',\n            r'(SH|SZ|sh|sz)(\\d{6})',\n            r'[（(](\\d{6})[)）]',\n        ]\n        \n        codes = set()\n        for pattern in patterns:\n            matches = re.findall(pattern, text)\n            for match in matches:\n                if isinstance(match, tuple):\n                    code = ''.join(match)\n                else:\n                    code = match\n                code = re.sub(r'[^0-9]', '', code)\n                if len(code) == 6:\n                    codes.add(code)\n        \n        return list(codes)\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"清理文本\"\"\"\n        text = re.sub(r'\\s+', ' ', text)\n        return text.strip()\n"
  },
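`YicaiNewsFetcher._parse_time_string` above tries a list of `strptime` formats in order, including a Chinese-style date (`%Y年%m月%d日 %H:%M`). A standalone sketch of that fallback chain; unlike the method, this hypothetical `parse_time_string` returns `None` instead of `datetime.now()` on failure, to keep the sketch deterministic:

```python
from datetime import datetime
from typing import Optional

# The same candidate formats as the fetcher, tried in order.
FORMATS = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%Y年%m月%d日 %H:%M']

def parse_time_string(time_str: str) -> Optional[datetime]:
    """Try each known format; return None when nothing matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(time_str, fmt)
        except ValueError:
            continue
    return None

print(parse_time_string("2024-05-01 09:30"))  # 2024-05-01 09:30:00
print(parse_time_string("2024年5月1日 09:30"))
```

The format order matters: more specific formats (with seconds) come first, since `strptime` fails fast on a mismatch and the loop falls through to the next candidate. `strptime` handles non-ASCII literal characters such as `年`/`月`/`日` without special treatment.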
  {
    "path": "backend/app/financial/providers/yicai/provider.py",
    "content": "\"\"\"\n第一财经 Provider\n\"\"\"\nfrom typing import Dict, Type\n\nfrom ..base import BaseProvider, BaseFetcher, ProviderInfo\nfrom .fetchers.news import YicaiNewsFetcher\n\n\nclass YicaiProvider(BaseProvider):\n    \"\"\"\n    第一财经数据源\n\n    支持的数据类型:\n    - news: 财经新闻\n    \"\"\"\n\n    @property\n    def info(self) -> ProviderInfo:\n        return ProviderInfo(\n            name=\"yicai\",\n            display_name=\"第一财经\",\n            description=\"第一财经股市新闻 (yicai.com)\",\n            website=\"https://www.yicai.com/\",\n            requires_credentials=False,\n            priority=5  # 第五优先级\n        )\n\n    @property\n    def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n        return {\n            \"news\": YicaiNewsFetcher,\n        }\n"
  },
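The three providers above declare priorities 1 (sina), 2 (tencent), and 5 (yicai); the `ProviderRegistry` that follows keeps a priority-sorted order (smaller value wins) and falls back through it when picking a fetcher. A minimal standalone sketch of that insertion and selection logic, using hypothetical `Provider` and `MiniRegistry` stand-ins rather than the project's `BaseProvider`:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Provider:
    name: str
    priority: int          # smaller value = higher priority
    data_types: List[str]  # e.g. ["news"]

class MiniRegistry:
    """Keeps providers sorted by ascending priority value."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}
        self._order: List[str] = []

    def register(self, p: Provider) -> None:
        if p.name in self._providers:
            self._order.remove(p.name)  # re-registering replaces
        self._providers[p.name] = p
        # Insert before the first provider with a larger priority value.
        for i, name in enumerate(self._order):
            if p.priority < self._providers[name].priority:
                self._order.insert(i, p.name)
                break
        else:
            self._order.append(p.name)

    def pick(self, data_type: str) -> str:
        """Return the highest-priority provider supporting data_type."""
        for name in self._order:
            if data_type in self._providers[name].data_types:
                return name
        raise LookupError(f"no provider for {data_type!r}")

reg = MiniRegistry()
reg.register(Provider("yicai", 5, ["news"]))
reg.register(Provider("sina", 1, ["news"]))
reg.register(Provider("tencent", 2, ["news"]))
print(reg.pick("news"))  # sina
```

Registration order does not matter: the insertion loop places each provider by its priority value, so `pick("news")` always resolves to `sina` first, with `tencent` and `yicai` as fallbacks when a higher-priority provider lacks the requested data type.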
  {
    "path": "backend/app/financial/registry.py",
    "content": "\"\"\"\nProvider 注册中心\n\n支持:\n1. 动态注册/注销 Provider\n2. 根据数据类型获取 Fetcher\n3. 多 Provider 自动降级\n\n来源参考:\n- OpenBB: Provider Registry 机制\n- 设计文档: research/codedeepresearch/OpenBB/FinnewsHunter_improvement_plan.md\n\"\"\"\nfrom typing import Dict, Optional, List\nimport logging\n\nfrom .providers.base import BaseProvider, BaseFetcher\n\nlogger = logging.getLogger(__name__)\n\n\nclass ProviderNotFoundError(Exception):\n    \"\"\"Provider 未找到异常\"\"\"\n    pass\n\n\nclass FetcherNotFoundError(Exception):\n    \"\"\"Fetcher 未找到异常\"\"\"\n    pass\n\n\nclass ProviderRegistry:\n    \"\"\"\n    Provider 注册中心\n\n    功能:\n    1. 注册/注销 Provider\n    2. 根据数据类型获取 Fetcher\n    3. 支持多 Provider 自动降级\n\n    Example:\n        >>> registry = ProviderRegistry()\n        >>> registry.register(SinaProvider())\n        >>> registry.register(TencentProvider())\n        >>>\n        >>> # 获取 Fetcher (按优先级自动选择)\n        >>> fetcher = registry.get_fetcher(\"news\")\n        >>>\n        >>> # 指定 Provider\n        >>> fetcher = registry.get_fetcher(\"news\", provider=\"tencent\")\n    \"\"\"\n\n    _instance: Optional[\"ProviderRegistry\"] = None\n\n    def __new__(cls):\n        \"\"\"单例模式\"\"\"\n        if cls._instance is None:\n            cls._instance = super().__new__(cls)\n            cls._instance._providers: Dict[str, BaseProvider] = {}\n            cls._instance._priority_order: List[str] = []\n            cls._instance._initialized = False\n        return cls._instance\n\n    def register(self, provider: BaseProvider) -> None:\n        \"\"\"\n        注册 Provider\n\n        Args:\n            provider: Provider 实例\n\n        Note:\n            - 如果 Provider 已存在，会被替换\n            - 按 priority 自动排序\n        \"\"\"\n        name = provider.info.name\n        priority = provider.info.priority\n\n        # 如果已存在，先移除\n        if name in self._providers:\n            self._priority_order.remove(name)\n\n        self._providers[name] = provider\n\n        # 按优先级插入（priority 
越小越靠前）\n        inserted = False\n        for i, existing_name in enumerate(self._priority_order):\n            existing_priority = self._providers[existing_name].info.priority\n            if priority < existing_priority:\n                self._priority_order.insert(i, name)\n                inserted = True\n                break\n\n        if not inserted:\n            self._priority_order.append(name)\n\n        logger.info(\n            f\"Registered provider: {name} \"\n            f\"(priority={priority}, types={list(provider.fetchers.keys())})\"\n        )\n\n    def unregister(self, name: str) -> bool:\n        \"\"\"\n        注销 Provider\n\n        Args:\n            name: Provider 名称\n\n        Returns:\n            是否成功注销\n        \"\"\"\n        if name in self._providers:\n            del self._providers[name]\n            self._priority_order.remove(name)\n            logger.info(f\"Unregistered provider: {name}\")\n            return True\n        return False\n\n    def get_provider(self, name: str) -> Optional[BaseProvider]:\n        \"\"\"\n        获取指定 Provider\n\n        Args:\n            name: Provider 名称\n\n        Returns:\n            Provider 实例，如果不存在返回 None\n        \"\"\"\n        return self._providers.get(name)\n\n    def get_fetcher(\n        self,\n        data_type: str,\n        provider: Optional[str] = None\n    ) -> BaseFetcher:\n        \"\"\"\n        获取 Fetcher，支持自动降级\n\n        Args:\n            data_type: 数据类型，如 'news', 'stock_price'\n            provider: 可选的 Provider 名称，如果不指定则按优先级选择\n\n        Returns:\n            BaseFetcher 实例\n\n        Raises:\n            FetcherNotFoundError: 如果没有找到支持该数据类型的 Provider\n            ProviderNotFoundError: 如果指定的 Provider 不存在\n\n        Example:\n            >>> # 自动选择最高优先级的 Provider\n            >>> fetcher = registry.get_fetcher(\"news\")\n            >>>\n            >>> # 指定 Provider\n            >>> fetcher = registry.get_fetcher(\"news\", provider=\"tencent\")\n        \"\"\"\n    
    # 如果指定了 Provider\n        if provider:\n            p = self._providers.get(provider)\n            if not p:\n                raise ProviderNotFoundError(f\"Provider '{provider}' not found\")\n\n            fetcher = p.get_fetcher(data_type)\n            if not fetcher:\n                raise FetcherNotFoundError(\n                    f\"Provider '{provider}' does not support data_type='{data_type}'\"\n                )\n            return fetcher\n\n        # 否则按优先级选择\n        for p_name in self._priority_order:\n            p = self._providers[p_name]\n            if p.supports(data_type):\n                fetcher = p.get_fetcher(data_type)\n                if fetcher:\n                    logger.debug(f\"Using {p_name} for {data_type}\")\n                    return fetcher\n\n        # 没有找到支持的 Provider\n        available = self.get_providers_for_type(data_type)\n        raise FetcherNotFoundError(\n            f\"No provider found for data_type='{data_type}'. \"\n            f\"Available providers for this type: {available}\"\n        )\n\n    def list_providers(self) -> List[str]:\n        \"\"\"\n        列出所有已注册的 Provider (按优先级排序)\n\n        Returns:\n            Provider 名称列表\n        \"\"\"\n        return list(self._priority_order)\n\n    def get_providers_for_type(self, data_type: str) -> List[str]:\n        \"\"\"\n        获取支持指定数据类型的所有 Provider\n\n        Args:\n            data_type: 数据类型\n\n        Returns:\n            支持该类型的 Provider 名称列表 (按优先级排序)\n        \"\"\"\n        return [\n            name for name in self._priority_order\n            if self._providers[name].supports(data_type)\n        ]\n\n    def get_all_data_types(self) -> List[str]:\n        \"\"\"\n        获取所有支持的数据类型\n\n        Returns:\n            数据类型列表\n        \"\"\"\n        types = set()\n        for provider in self._providers.values():\n            types.update(provider.fetchers.keys())\n        return sorted(types)\n\n    def clear(self) -> None:\n        \"\"\"清空所有注册的 
Provider\"\"\"\n        self._providers.clear()\n        self._priority_order.clear()\n        logger.info(\"Cleared all providers from registry\")\n\n    def __repr__(self) -> str:\n        return f\"<ProviderRegistry providers={self._priority_order}>\"\n\n\n# 全局单例\n_registry: Optional[ProviderRegistry] = None\n\n\ndef get_registry() -> ProviderRegistry:\n    \"\"\"\n    获取全局 Registry 实例\n\n    Returns:\n        ProviderRegistry 单例\n    \"\"\"\n    global _registry\n    if _registry is None:\n        _registry = ProviderRegistry()\n    return _registry\n\n\ndef reset_registry() -> ProviderRegistry:\n    \"\"\"\n    重置全局 Registry (主要用于测试)\n\n    Returns:\n        新的 ProviderRegistry 实例\n    \"\"\"\n    global _registry\n    if _registry:\n        _registry.clear()  # 清空旧实例，确保外部仍持有旧引用的代码看到的是空注册表\n    _registry = ProviderRegistry()\n    return _registry\n"
  },
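The `register` method above keeps `_priority_order` sorted by walking the list and inserting the new name before the first entry with a larger priority value (smaller value = higher priority). A minimal self-contained sketch of that insertion logic, using stand-in classes (`MiniProvider`/`MiniRegistry` are illustrative names, not the project's real types):

```python
from typing import Dict, List


class MiniProvider:
    """Stand-in for BaseProvider: just a name and a priority (smaller = preferred)."""
    def __init__(self, name: str, priority: int):
        self.name = name
        self.priority = priority


class MiniRegistry:
    """Sketch of the insertion-sort registration used by ProviderRegistry."""
    def __init__(self):
        self._providers: Dict[str, MiniProvider] = {}
        self._priority_order: List[str] = []

    def register(self, provider: MiniProvider) -> None:
        self._providers[provider.name] = provider
        # Insert before the first existing entry with a strictly larger priority;
        # equal priorities therefore keep registration order (stable).
        for i, existing in enumerate(self._priority_order):
            if provider.priority < self._providers[existing].priority:
                self._priority_order.insert(i, provider.name)
                return
        self._priority_order.append(provider.name)

    def list_providers(self) -> List[str]:
        return list(self._priority_order)


registry = MiniRegistry()
registry.register(MiniProvider("tencent", priority=2))
registry.register(MiniProvider("sina", priority=1))
registry.register(MiniProvider("163", priority=6))
order = registry.list_providers()  # smallest priority value comes first
```

Fallback selection (`get_fetcher` without an explicit provider) then just walks `_priority_order` front to back and returns the first provider that supports the requested data type.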
  {
    "path": "backend/app/financial/tools.py",
    "content": "\"\"\"\n金融数据工具 - 封装为 AgenticX BaseTool\n\n这些工具可以直接被 Agent 调用，内部使用 Provider Registry 获取数据。\n\n设计原则:\n- 继承 AgenticX BaseTool，保持与框架兼容\n- 内部使用 ProviderRegistry 实现多源降级\n- 返回标准化的数据格式\n\n来源参考:\n- 设计文档: research/codedeepresearch/OpenBB/FinnewsHunter_improvement_plan.md\n\"\"\"\nfrom typing import List, Optional, Dict, Any\nimport asyncio\nimport logging\n\nfrom agenticx import BaseTool\nfrom agenticx.core import ToolMetadata, ToolCategory\n\nfrom .registry import get_registry, FetcherNotFoundError, ProviderNotFoundError\nfrom .models.news import NewsQueryParams, NewsData\nfrom .models.stock import StockQueryParams, StockPriceData, KlineInterval, AdjustType\n\nlogger = logging.getLogger(__name__)\n\n\nclass FinancialNewsTool(BaseTool):\n    \"\"\"\n    金融新闻获取工具\n\n    支持多数据源自动切换，返回标准化的新闻数据。\n\n    Example:\n        >>> tool = FinancialNewsTool()\n        >>> result = await tool.aexecute(stock_codes=[\"600519\"], limit=10)\n        >>> print(result[\"data\"])  # List[NewsData.model_dump()]\n    \"\"\"\n\n    def __init__(self):\n        metadata = ToolMetadata(\n            name=\"financial_news\",\n            description=\"获取金融新闻，支持多数据源自动切换\",\n            category=ToolCategory.DATA_ACCESS,\n            version=\"1.0.0\"\n        )\n        super().__init__(metadata=metadata)\n\n    def _setup_parameters(self):\n        \"\"\"设置工具参数（AgenticX BaseTool 要求的抽象方法）\"\"\"\n        pass\n\n    async def aexecute(\n        self,\n        keywords: Optional[List[str]] = None,\n        stock_codes: Optional[List[str]] = None,\n        limit: int = 50,\n        provider: Optional[str] = None,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"\n        异步执行新闻获取\n\n        Args:\n            keywords: 搜索关键词列表\n            stock_codes: 关联股票代码列表\n            limit: 返回条数\n            provider: 指定数据源\n\n        Returns:\n            {\n                \"success\": bool,\n                \"count\": int,\n                \"provider\": str,\n                \"data\": 
List[dict]  # NewsData.model_dump()\n            }\n        \"\"\"\n        # 构建标准查询参数\n        params = NewsQueryParams(\n            keywords=keywords,\n            stock_codes=stock_codes,\n            limit=limit\n        )\n\n        try:\n            # 获取 Fetcher（自动降级）\n            registry = get_registry()\n            fetcher = registry.get_fetcher(\"news\", provider)\n\n            # 执行 TET Pipeline\n            results: List[NewsData] = await fetcher.fetch(params)\n\n            # 获取实际使用的 provider 名称\n            provider_name = fetcher.__class__.__module__.split(\".\")[-3]\n\n            return {\n                \"success\": True,\n                \"count\": len(results),\n                \"provider\": provider_name,\n                \"data\": [r.model_dump() for r in results]\n            }\n\n        except (FetcherNotFoundError, ProviderNotFoundError) as e:\n            logger.error(f\"Provider error: {e}\")\n            registry = get_registry()\n            return {\n                \"success\": False,\n                \"error\": str(e),\n                \"available_providers\": registry.get_providers_for_type(\"news\")\n            }\n\n        except Exception as e:\n            logger.exception(f\"Unexpected error in FinancialNewsTool: {e}\")\n            return {\n                \"success\": False,\n                \"error\": f\"Unexpected error: {e}\"\n            }\n\n    def execute(\n        self,\n        keywords: Optional[List[str]] = None,\n        stock_codes: Optional[List[str]] = None,\n        limit: int = 50,\n        provider: Optional[str] = None,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"\n        同步执行（包装异步方法）\n        \"\"\"\n        return asyncio.run(self.aexecute(\n            keywords=keywords,\n            stock_codes=stock_codes,\n            limit=limit,\n            provider=provider,\n            **kwargs\n        ))\n\n\nclass StockPriceTool(BaseTool):\n    \"\"\"\n    股票价格获取工具（K线数据）\n\n    
Example:\n        >>> tool = StockPriceTool()\n        >>> result = await tool.aexecute(symbol=\"600519\", interval=\"1d\", limit=30)\n        >>> print(result[\"data\"])  # List[StockPriceData.model_dump()]\n    \"\"\"\n\n    def __init__(self):\n        metadata = ToolMetadata(\n            name=\"stock_price\",\n            description=\"获取股票K线数据，支持多数据源自动切换\",\n            category=ToolCategory.DATA_ACCESS,\n            version=\"1.0.0\"\n        )\n        super().__init__(metadata=metadata)\n\n    def _setup_parameters(self):\n        \"\"\"设置工具参数（AgenticX BaseTool 要求的抽象方法）\"\"\"\n        pass\n\n    async def aexecute(\n        self,\n        symbol: str,\n        interval: str = \"1d\",\n        limit: int = 90,\n        adjust: str = \"qfq\",\n        provider: Optional[str] = None,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"\n        异步执行价格获取\n\n        Args:\n            symbol: 股票代码\n            interval: K线周期\n            limit: 返回条数\n            adjust: 复权类型\n            provider: 指定数据源\n\n        Returns:\n            {\n                \"success\": bool,\n                \"symbol\": str,\n                \"count\": int,\n                \"provider\": str,\n                \"data\": List[dict]  # StockPriceData.model_dump()\n            }\n        \"\"\"\n        try:\n            params = StockQueryParams(\n                symbol=symbol,\n                interval=KlineInterval(interval),\n                limit=limit,\n                adjust=AdjustType(adjust)\n            )\n        except ValueError as e:\n            return {\n                \"success\": False,\n                \"error\": f\"Invalid parameter: {e}\"\n            }\n\n        try:\n            registry = get_registry()\n            fetcher = registry.get_fetcher(\"stock_price\", provider)\n            results: List[StockPriceData] = await fetcher.fetch(params)\n\n            provider_name = fetcher.__class__.__module__.split(\".\")[-3]\n\n            return {\n      
          \"success\": True,\n                \"symbol\": symbol,\n                \"count\": len(results),\n                \"provider\": provider_name,\n                \"data\": [r.model_dump() for r in results]\n            }\n\n        except (FetcherNotFoundError, ProviderNotFoundError) as e:\n            logger.error(f\"Provider error: {e}\")\n            registry = get_registry()\n            return {\n                \"success\": False,\n                \"error\": str(e),\n                \"available_providers\": registry.get_providers_for_type(\"stock_price\")\n            }\n\n        except Exception as e:\n            logger.exception(f\"Unexpected error in StockPriceTool: {e}\")\n            return {\n                \"success\": False,\n                \"error\": f\"Unexpected error: {e}\"\n            }\n\n    def execute(\n        self,\n        symbol: str,\n        interval: str = \"1d\",\n        limit: int = 90,\n        adjust: str = \"qfq\",\n        provider: Optional[str] = None,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"同步执行\"\"\"\n        return asyncio.run(self.aexecute(\n            symbol=symbol,\n            interval=interval,\n            limit=limit,\n            adjust=adjust,\n            provider=provider,\n            **kwargs\n        ))\n\n\n# 便捷函数：自动注册默认 Provider\ndef setup_default_providers():\n    \"\"\"\n    注册默认的 Provider\n\n    在应用启动时调用，确保 Registry 中有可用的 Provider。\n    \n    当前支持的数据源（按优先级排序）:\n    1. sina - 新浪财经\n    2. tencent - 腾讯财经\n    3. nbd - 每日经济新闻\n    4. eastmoney - 东方财富\n    5. yicai - 第一财经\n    6. 
163 - 网易财经\n    \"\"\"\n    from .providers.sina import SinaProvider\n    from .providers.tencent import TencentProvider\n    from .providers.nbd import NbdProvider\n    from .providers.eastmoney import EastmoneyProvider\n    from .providers.yicai import YicaiProvider\n    from .providers.netease import NeteaseProvider\n\n    registry = get_registry()\n    \n    # 定义所有 Provider（按优先级顺序）\n    providers = [\n        (\"sina\", SinaProvider),\n        (\"tencent\", TencentProvider),\n        (\"nbd\", NbdProvider),\n        (\"eastmoney\", EastmoneyProvider),\n        (\"yicai\", YicaiProvider),\n        (\"163\", NeteaseProvider),\n    ]\n\n    # 注册所有 Provider\n    for name, provider_class in providers:\n        if name not in registry.list_providers():\n            try:\n                registry.register(provider_class())\n                logger.debug(f\"Registered provider: {name}\")\n            except Exception as e:\n                logger.warning(f\"Failed to register provider {name}: {e}\")\n\n    logger.info(f\"Registered {len(registry.list_providers())} providers: {registry.list_providers()}\")\n"
  },
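Both tools in `tools.py` follow the same shape: an async `aexecute` that returns a standardized `{"success", "count", "provider", "data"}` envelope (or a `{"success": False, "error": ...}` failure), and a sync `execute` that wraps it with `asyncio.run`. A reduced sketch of that pattern, assuming a stub fetcher in place of the registry lookup:

```python
import asyncio
from typing import Any, Dict, List


async def fake_fetch(limit: int) -> List[dict]:
    """Stub standing in for fetcher.fetch(params) in the real tools."""
    await asyncio.sleep(0)  # simulate async I/O
    return [{"title": f"news-{i}"} for i in range(limit)]


async def aexecute(limit: int = 3) -> Dict[str, Any]:
    try:
        data = await fake_fetch(limit)
        return {"success": True, "count": len(data), "provider": "stub", "data": data}
    except Exception as e:
        # Unexpected errors become a structured failure instead of propagating
        return {"success": False, "error": str(e)}


def execute(limit: int = 3) -> Dict[str, Any]:
    # Sync wrapper, mirroring the tools' execute(). Note that asyncio.run()
    # raises RuntimeError if called from inside an already-running event loop,
    # so execute() must only be used from synchronous code paths.
    return asyncio.run(aexecute(limit))


result = execute(limit=3)
```

Returning a dict envelope rather than raising keeps the tool Agent-friendly: the caller can branch on `result["success"]` without try/except around every tool call.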
  {
    "path": "backend/app/knowledge/README.md",
    "content": "# 知识图谱模块\n\n## 📊 概述\n\n知识图谱模块为每只股票构建动态的知识图谱，用于智能化的新闻检索和分析。\n\n## 🎯 核心功能\n\n### 1. 多维度知识建模\n\n为每家公司建立包含以下信息的知识图谱：\n\n- **名称变体**：公司简称、别名、全称\n- **业务线**：主营业务、新增业务、已停止业务\n- **行业归属**：一级行业、二级行业、细分领域\n- **产品服务**：主要产品和服务\n- **关联概念**：涉及的热点概念（AI大模型、云计算等）\n- **检索关键词**：优化检索效果的关键词\n\n### 2. 智能并发检索\n\n基于知识图谱生成多样化的检索查询，并发调用搜索API：\n\n```\n示例：彩讯股份 (300634)\n\n生成的查询组合：\n1. \"彩讯股份 300634\"\n2. \"彩讯 股票\"\n3. \"彩讯股份 运营商增值服务\"\n4. \"彩讯 AI大模型应用\"\n5. \"彩讯科技 云计算\"\n6. ...（最多10条并发查询）\n```\n\n### 3. 动态图谱更新\n\n- **构建时机**：首次定向爬取时自动构建\n- **数据来源**：\n  - akshare：基础信息（行业、市值、主营业务）\n  - LLM推理：名称变体、业务细分\n  - 新闻分析：业务变化、新概念\n  - 文档解析：深度信息（年报、公告）\n- **更新机制**：每次定向爬取后自动更新\n\n## 🏗️ 架构设计\n\n### 图谱结构\n\n```\n(Company) 公司节点\n   ├─ HAS_VARIANT ─> (NameVariant) 名称变体\n   ├─ OPERATES_IN ─> (Business) 业务线\n   ├─ BELONGS_TO  ─> (Industry) 行业\n   ├─ PROVIDES    ─> (Product) 产品\n   ├─ RELATES_TO  ─> (Keyword) 关键词\n   └─ INVOLVES    ─> (Concept) 概念\n```\n\n### 核心组件\n\n1. **graph_models.py** - 数据模型定义\n2. **graph_service.py** - 图谱CRUD服务\n3. **knowledge_extractor.py** - 知识提取Agent\n4. **parallel_search.py** - 并发检索策略\n\n## 🚀 使用方法\n\n### 1. 启动 Neo4j\n\n```bash\ncd deploy\ndocker-compose -f docker-compose.dev.yml up -d neo4j\n```\n\n### 2. 初始化图谱\n\n```bash\ncd backend\npython init_knowledge_graph.py\n```\n\n### 3. API 调用\n\n#### 查询图谱\n```bash\nGET /api/v1/knowledge-graph/{stock_code}\n```\n\n#### 构建图谱\n```bash\nPOST /api/v1/knowledge-graph/{stock_code}/build\n{\n  \"force_rebuild\": false\n}\n```\n\n#### 更新图谱\n```bash\nPOST /api/v1/knowledge-graph/{stock_code}/update\n{\n  \"update_from_news\": true,\n  \"news_limit\": 20\n}\n```\n\n#### 删除图谱\n```bash\nDELETE /api/v1/knowledge-graph/{stock_code}\n```\n\n### 4. 自动集成\n\n定向爬取时自动使用知识图谱：\n\n1. **检查图谱**：如果不存在，自动从 akshare + LLM 构建\n2. **并发检索**：基于图谱生成的多个关键词并发搜索\n3. 
**更新图谱**：爬取完成后，从新闻中提取新信息更新图谱\n\n## 📈 效果对比\n\n### 传统单关键词检索\n\n```python\nquery = \"彩讯股份 股票 300634\"\nresults = search(query)  # ~20-30条\n```\n\n### 基于知识图谱的并发检索\n\n```python\nqueries = [\n    \"彩讯股份 300634\",\n    \"彩讯 运营商增值服务\",\n    \"彩讯股份 AI大模型应用\",\n    \"彩讯科技 云计算\",\n    ...\n]\nresults = parallel_search(queries)  # ~100-200条，去重后70-130条\n```\n\n**召回率提升：3-5倍**\n\n## 🔧 配置\n\n环境变量：\n```bash\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USER=neo4j\nNEO4J_PASSWORD=finnews_neo4j_password\n```\n\n## 📊 监控\n\n访问 Neo4j 浏览器：\n- URL: http://localhost:7474\n- 用户名: neo4j\n- 密码: finnews_neo4j_password\n\n示例查询：\n```cypher\n// 查看所有公司\nMATCH (c:Company) RETURN c\n\n// 查看公司的完整图谱\nMATCH (c:Company {stock_code: 'SZ300634'})-[r]->(n)\nRETURN c, r, n\n\n// 查看业务线\nMATCH (c:Company)-[:OPERATES_IN]->(b:Business)\nWHERE b.status = 'active'\nRETURN c.stock_name, b.business_name, b.status\n```\n\n## ⚠️ 注意事项\n\n1. **LLM成本**：图谱构建和更新会调用LLM，注意API成本\n2. **并发限制**：并发检索默认5个worker，可根据API限制调整\n3. **图谱更新**：建议每次定向爬取后自动更新，保持图谱时效性\n4. **数据质量**：LLM提取的信息需要人工review，建议提供review接口\n\n"
  },
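The README's "智能并发检索" (concurrent retrieval with post-hoc de-duplication) can be sketched with `asyncio.gather` plus a URL-keyed seen-set. The `search` function below is a stub returning deliberately overlapping results, and the semaphore stands in for the "默认5个worker" concurrency cap mentioned in the notes:

```python
import asyncio
from typing import Dict, List


async def search(query: str) -> List[Dict[str, str]]:
    """Stub search API: results keyed on the query's first token, so related
    queries produce overlapping URLs (as real news searches do)."""
    await asyncio.sleep(0)
    token = query.split()[0]
    return [{"url": f"https://example.com/{token}/{i}", "query": query}
            for i in range(3)]


async def parallel_search(queries: List[str], max_workers: int = 5) -> List[Dict[str, str]]:
    sem = asyncio.Semaphore(max_workers)  # cap concurrent API calls

    async def bounded(q: str) -> List[Dict[str, str]]:
        async with sem:
            return await search(q)

    batches = await asyncio.gather(*(bounded(q) for q in queries))
    seen, merged = set(), []
    for batch in batches:
        for item in batch:
            if item["url"] not in seen:  # dedupe by URL, keep first occurrence
                seen.add(item["url"])
                merged.append(item)
    return merged


queries = ["彩讯股份 300634", "彩讯股份 云计算", "彩讯 运营商增值服务"]
results = asyncio.run(parallel_search(queries))
```

Here the first two queries share a first token and thus collide entirely, so 9 raw hits dedupe down to 6 — the same raw-vs-deduped gap the README describes (~100-200 条 raw, 70-130 after dedup).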
  {
    "path": "backend/app/knowledge/__init__.py",
    "content": "\"\"\"\n知识图谱模块\n\"\"\"\nfrom .graph_models import (\n    CompanyNode,\n    NameVariantNode,\n    BusinessNode,\n    IndustryNode,\n    ProductNode,\n    KeywordNode,\n    ConceptNode,\n    CompanyKnowledgeGraph,\n    SearchKeywordSet,\n    NodeType,\n    RelationType\n)\nfrom .graph_service import KnowledgeGraphService\n\n__all__ = [\n    \"CompanyNode\",\n    \"NameVariantNode\", \n    \"BusinessNode\",\n    \"IndustryNode\",\n    \"ProductNode\",\n    \"KeywordNode\",\n    \"ConceptNode\",\n    \"CompanyKnowledgeGraph\",\n    \"SearchKeywordSet\",\n    \"NodeType\",\n    \"RelationType\",\n    \"KnowledgeGraphService\"\n]\n\n"
  },
  {
    "path": "backend/app/knowledge/graph_models.py",
    "content": "\"\"\"\n知识图谱数据模型\n定义公司知识图谱的节点和关系结构\n\"\"\"\nfrom typing import List, Dict, Any, Optional\nfrom pydantic import BaseModel, Field\nfrom datetime import datetime\nfrom enum import Enum\n\n\nclass NodeType(str, Enum):\n    \"\"\"节点类型枚举\"\"\"\n    COMPANY = \"Company\"                # 公司\n    NAME_VARIANT = \"NameVariant\"       # 名称变体\n    BUSINESS = \"Business\"              # 业务线\n    INDUSTRY = \"Industry\"              # 行业\n    PRODUCT = \"Product\"                # 产品/服务\n    KEYWORD = \"Keyword\"                # 检索关键词\n    CONCEPT = \"Concept\"                # 概念/主题\n    PARTNER = \"Partner\"                # 合作伙伴\n\n\nclass RelationType(str, Enum):\n    \"\"\"关系类型枚举\"\"\"\n    HAS_VARIANT = \"HAS_VARIANT\"        # 有变体\n    OPERATES_IN = \"OPERATES_IN\"        # 运营于（业务线）\n    BELONGS_TO = \"BELONGS_TO\"          # 属于（行业）\n    PROVIDES = \"PROVIDES\"              # 提供（产品）\n    RELATES_TO = \"RELATES_TO\"          # 关联（关键词）\n    INVOLVES = \"INVOLVES\"              # 涉及（概念）\n    COOPERATES_WITH = \"COOPERATES_WITH\"  # 合作（伙伴）\n    UPSTREAM = \"UPSTREAM\"              # 上游\n    DOWNSTREAM = \"DOWNSTREAM\"          # 下游\n\n\nclass CompanyNode(BaseModel):\n    \"\"\"公司节点\"\"\"\n    stock_code: str = Field(description=\"股票代码（如 SZ300634）\")\n    stock_name: str = Field(description=\"股票全称（如 彩讯股份）\")\n    short_code: str = Field(description=\"纯数字代码（如 300634）\")\n    industry: Optional[str] = Field(default=None, description=\"所属行业\")\n    sector: Optional[str] = Field(default=None, description=\"所属板块\")\n    market_cap: Optional[float] = Field(default=None, description=\"市值\")\n    listed_date: Optional[str] = Field(default=None, description=\"上市日期\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n    updated_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass NameVariantNode(BaseModel):\n    \"\"\"名称变体节点\"\"\"\n    variant: str = Field(description=\"变体名称（如 彩讯、彩讯科技）\")\n    variant_type: str = Field(description=\"变体类型: 
abbreviation, alias, full_name\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass BusinessNode(BaseModel):\n    \"\"\"业务线节点\"\"\"\n    business_name: str = Field(description=\"业务名称\")\n    business_type: str = Field(description=\"业务类型: main, new, stopped\")\n    description: Optional[str] = Field(default=None, description=\"业务描述\")\n    start_date: Optional[str] = Field(default=None, description=\"开始日期\")\n    end_date: Optional[str] = Field(default=None, description=\"结束日期（如果停止）\")\n    status: str = Field(default=\"active\", description=\"状态: active, stopped, planned\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n    updated_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass IndustryNode(BaseModel):\n    \"\"\"行业节点\"\"\"\n    industry_name: str = Field(description=\"行业名称\")\n    industry_code: Optional[str] = Field(default=None, description=\"行业代码\")\n    level: int = Field(default=1, description=\"层级: 1=一级行业, 2=二级行业\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass ProductNode(BaseModel):\n    \"\"\"产品/服务节点\"\"\"\n    product_name: str = Field(description=\"产品名称\")\n    product_type: str = Field(description=\"产品类型: software, hardware, service\")\n    description: Optional[str] = Field(default=None, description=\"产品描述\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n    updated_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass KeywordNode(BaseModel):\n    \"\"\"检索关键词节点\"\"\"\n    keyword: str = Field(description=\"关键词\")\n    keyword_type: str = Field(description=\"类型: business, product, industry, general\")\n    weight: float = Field(default=1.0, description=\"权重（检索时的重要性）\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass ConceptNode(BaseModel):\n    \"\"\"概念/主题节点\"\"\"\n    concept_name: str = Field(description=\"概念名称（如 AI大模型、元宇宙）\")\n    description: Optional[str] = Field(default=None, 
description=\"概念描述\")\n    hot_level: int = Field(default=0, description=\"热度等级 0-10\")\n    created_at: datetime = Field(default_factory=datetime.utcnow)\n\n\nclass CompanyKnowledgeGraph(BaseModel):\n    \"\"\"公司知识图谱完整结构（用于导入导出）\"\"\"\n    company: CompanyNode\n    name_variants: List[NameVariantNode] = Field(default_factory=list)\n    businesses: List[BusinessNode] = Field(default_factory=list)\n    industries: List[IndustryNode] = Field(default_factory=list)\n    products: List[ProductNode] = Field(default_factory=list)\n    keywords: List[KeywordNode] = Field(default_factory=list)\n    concepts: List[ConceptNode] = Field(default_factory=list)\n\n\nclass SearchKeywordSet(BaseModel):\n    \"\"\"检索关键词集合（用于定向爬取）\"\"\"\n    stock_code: str\n    stock_name: str\n    \n    # 名称相关\n    name_keywords: List[str] = Field(default_factory=list, description=\"名称变体\")\n    \n    # 业务相关\n    business_keywords: List[str] = Field(default_factory=list, description=\"业务线关键词\")\n    \n    # 行业相关\n    industry_keywords: List[str] = Field(default_factory=list, description=\"行业关键词\")\n    \n    # 产品相关\n    product_keywords: List[str] = Field(default_factory=list, description=\"产品关键词\")\n    \n    # 概念相关\n    concept_keywords: List[str] = Field(default_factory=list, description=\"概念关键词\")\n    \n    # 组合查询\n    combined_queries: List[str] = Field(default_factory=list, description=\"预组合的查询串\")\n    \n    def get_all_keywords(self) -> List[str]:\n        \"\"\"获取所有关键词（去重）\"\"\"\n        all_kw = (\n            self.name_keywords +\n            self.business_keywords +\n            self.industry_keywords +\n            self.product_keywords +\n            self.concept_keywords\n        )\n        return list(set(all_kw))\n    \n    def generate_search_queries(self, max_queries: int = 10) -> List[str]:\n        \"\"\"\n        生成多样化的搜索查询组合\n        \n        Args:\n            max_queries: 最大查询数量\n            \n        Returns:\n            查询字符串列表\n        \"\"\"\n        queries = []\n   
     \n        # 1. 核心查询：股票名称 + 股票代码（始终生成，不依赖名称变体）\n        queries.append(f\"{self.stock_name} {self.stock_code}\")\n        if self.name_keywords:\n            queries.append(f\"{self.name_keywords[0]} 股票\")\n        \n        # 2. 业务线查询\n        for business in self.business_keywords[:3]:  # 最多3个业务线\n            queries.append(f\"{self.stock_name} {business}\")\n            if len(self.name_keywords) > 1:\n                queries.append(f\"{self.name_keywords[0]} {business}\")\n        \n        # 3. 概念查询\n        for concept in self.concept_keywords[:2]:  # 最多2个概念\n            queries.append(f\"{self.stock_name} {concept}\")\n        \n        # 4. 产品查询\n        for product in self.product_keywords[:2]:  # 最多2个产品\n            queries.append(f\"{self.stock_name} {product}\")\n        \n        # 5. 使用预组合查询\n        queries.extend(self.combined_queries)\n        \n        # 去重并限制数量\n        unique_queries = list(dict.fromkeys(queries))  # 保持顺序的去重\n        return unique_queries[:max_queries]\n\n"
  },
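The combination logic in `SearchKeywordSet.generate_search_queries` boils down to: build queries in priority order (core, business, concept, product), then de-duplicate while preserving order via `dict.fromkeys`, then truncate. A trimmed-down sketch without the pydantic model (function and parameter names here mirror the model's fields but are otherwise illustrative):

```python
from typing import List


def generate_search_queries(
    stock_name: str,
    stock_code: str,
    business_keywords: List[str],
    concept_keywords: List[str],
    max_queries: int = 10,
) -> List[str]:
    queries = [f"{stock_name} {stock_code}"]      # 1. core query
    for business in business_keywords[:3]:        # 2. at most 3 business lines
        queries.append(f"{stock_name} {business}")
    for concept in concept_keywords[:2]:          # 3. at most 2 concepts
        queries.append(f"{stock_name} {concept}")
    # dict.fromkeys de-duplicates while keeping first-seen order,
    # unlike set() which would scramble the priority ordering
    return list(dict.fromkeys(queries))[:max_queries]


qs = generate_search_queries(
    "彩讯股份", "300634",
    business_keywords=["运营商增值服务", "云计算", "运营商增值服务"],
    concept_keywords=["AI大模型"],
)
```

The duplicate business keyword collapses, so the core query stays in position 0 and lower-priority queries are the ones dropped when `max_queries` bites.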
  {
    "path": "backend/app/knowledge/graph_service.py",
    "content": "\"\"\"\n知识图谱服务\n提供公司知识图谱的创建、查询、更新操作\n\"\"\"\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\n\nfrom ..core.neo4j_client import get_neo4j_client\nfrom .graph_models import (\n    CompanyNode,\n    NameVariantNode,\n    BusinessNode,\n    IndustryNode,\n    ProductNode,\n    KeywordNode,\n    ConceptNode,\n    CompanyKnowledgeGraph,\n    SearchKeywordSet,\n    NodeType,\n    RelationType\n)\n\nlogger = logging.getLogger(__name__)\n\n\nclass KnowledgeGraphService:\n    \"\"\"知识图谱服务\"\"\"\n    \n    def __init__(self):\n        self.neo4j = get_neo4j_client()\n        self._ensure_constraints()\n    \n    def _ensure_constraints(self):\n        \"\"\"确保数据库约束和索引存在\"\"\"\n        constraints = [\n            # 公司节点唯一约束\n            \"CREATE CONSTRAINT company_code IF NOT EXISTS FOR (c:Company) REQUIRE c.stock_code IS UNIQUE\",\n            # 索引加速查询\n            \"CREATE INDEX company_name IF NOT EXISTS FOR (c:Company) ON (c.stock_name)\",\n            \"CREATE INDEX business_name IF NOT EXISTS FOR (b:Business) ON (b.business_name)\",\n            \"CREATE INDEX keyword_text IF NOT EXISTS FOR (k:Keyword) ON (k.keyword)\",\n        ]\n        \n        for constraint in constraints:\n            try:\n                self.neo4j.execute_write(constraint)\n            except Exception as e:\n                # 约束可能已存在，忽略错误\n                logger.debug(f\"Constraint creation skipped: {e}\")\n    \n    # ============ 公司节点操作 ============\n    \n    def create_or_update_company(self, company: CompanyNode) -> bool:\n        \"\"\"\n        创建或更新公司节点\n        \n        Args:\n            company: 公司节点数据\n            \n        Returns:\n            是否成功\n        \"\"\"\n        query = \"\"\"\n        MERGE (c:Company {stock_code: $stock_code})\n        SET c.stock_name = $stock_name,\n            c.short_code = $short_code,\n            c.industry = $industry,\n            c.sector = $sector,\n            c.market_cap = 
$market_cap,\n            c.listed_date = $listed_date,\n            c.updated_at = datetime(),\n            c.created_at = coalesce(c.created_at, datetime())\n        RETURN c\n        \"\"\"\n        \n        params = company.model_dump()\n        params['created_at'] = company.created_at.isoformat()\n        params['updated_at'] = datetime.utcnow().isoformat()\n        \n        try:\n            self.neo4j.execute_write(query, params)\n            logger.info(f\"✅ 公司节点已创建/更新: {company.stock_name}({company.stock_code})\")\n            return True\n        except Exception as e:\n            logger.error(f\"❌ 公司节点创建失败: {e}\")\n            return False\n    \n    def get_company(self, stock_code: str) -> Optional[Dict[str, Any]]:\n        \"\"\"获取公司节点\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        RETURN c\n        \"\"\"\n        \n        results = self.neo4j.execute_query(query, {\"stock_code\": stock_code})\n        return results[0]['c'] if results else None\n    \n    # ============ 名称变体操作 ============\n    \n    def add_name_variants(\n        self,\n        stock_code: str,\n        variants: List[NameVariantNode]\n    ) -> bool:\n        \"\"\"\n        添加名称变体\n        \n        Args:\n            stock_code: 股票代码\n            variants: 名称变体列表\n            \n        Returns:\n            是否成功\n        \"\"\"\n        for variant in variants:\n            query = \"\"\"\n            MATCH (c:Company {stock_code: $stock_code})\n            MERGE (v:NameVariant {variant: $variant})\n            SET v.variant_type = $variant_type,\n                v.created_at = $created_at\n            MERGE (c)-[r:HAS_VARIANT]->(v)\n            RETURN v\n            \"\"\"\n            \n            params = {\n                \"stock_code\": stock_code,\n                \"variant\": variant.variant,\n                \"variant_type\": variant.variant_type,\n                \"created_at\": variant.created_at.isoformat()\n         
   }\n            \n            try:\n                self.neo4j.execute_write(query, params)\n            except Exception as e:\n                logger.error(f\"添加名称变体失败 {variant.variant}: {e}\")\n                return False\n        \n        logger.info(f\"✅ 已添加 {len(variants)} 个名称变体\")\n        return True\n    \n    # ============ 业务线操作 ============\n    \n    def add_business(\n        self,\n        stock_code: str,\n        business: BusinessNode\n    ) -> bool:\n        \"\"\"添加业务线\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        MERGE (b:Business {business_name: $business_name})\n        SET b.business_type = $business_type,\n            b.description = $description,\n            b.start_date = $start_date,\n            b.end_date = $end_date,\n            b.status = $status,\n            b.updated_at = datetime(),\n            b.created_at = coalesce(b.created_at, datetime())\n        MERGE (c)-[r:OPERATES_IN]->(b)\n        RETURN b\n        \"\"\"\n        \n        params = business.model_dump()\n        params['stock_code'] = stock_code\n        \n        try:\n            self.neo4j.execute_write(query, params)\n            logger.info(f\"✅ 业务线已添加: {business.business_name}\")\n            return True\n        except Exception as e:\n            logger.error(f\"❌ 业务线添加失败: {e}\")\n            return False\n    \n    def stop_business(\n        self,\n        stock_code: str,\n        business_name: str,\n        end_date: Optional[str] = None\n    ) -> bool:\n        \"\"\"停止业务线\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})-[:OPERATES_IN]->(b:Business {business_name: $business_name})\n        SET b.status = 'stopped',\n            b.end_date = $end_date,\n            b.updated_at = datetime()\n        RETURN b\n        \"\"\"\n        \n        params = {\n            \"stock_code\": stock_code,\n            \"business_name\": business_name,\n            \"end_date\": end_date or 
datetime.utcnow().strftime(\"%Y-%m-%d\")\n        }\n        \n        try:\n            self.neo4j.execute_write(query, params)\n            logger.info(f\"✅ 业务线已停止: {business_name}\")\n            return True\n        except Exception as e:\n            logger.error(f\"❌ 业务线停止失败: {e}\")\n            return False\n    \n    # ============ 关键词操作 ============\n    \n    def add_keywords(\n        self,\n        stock_code: str,\n        keywords: List[KeywordNode],\n        relation_type: str = \"RELATES_TO\"\n    ) -> bool:\n        \"\"\"添加检索关键词\"\"\"\n        for keyword in keywords:\n            query = \"\"\"\n            MATCH (c:Company {stock_code: $stock_code})\n            MERGE (k:Keyword {keyword: $keyword})\n            SET k.keyword_type = $keyword_type,\n                k.weight = $weight,\n                k.created_at = $created_at\n            MERGE (c)-[r:RELATES_TO]->(k)\n            RETURN k\n            \"\"\"\n            \n            params = {\n                \"stock_code\": stock_code,\n                \"keyword\": keyword.keyword,\n                \"keyword_type\": keyword.keyword_type,\n                \"weight\": keyword.weight,\n                \"created_at\": keyword.created_at.isoformat()\n            }\n            \n            try:\n                self.neo4j.execute_write(query, params)\n            except Exception as e:\n                logger.error(f\"添加关键词失败 {keyword.keyword}: {e}\")\n                return False\n        \n        logger.info(f\"✅ 已添加 {len(keywords)} 个关键词\")\n        return True\n    \n    # ============ 概念操作 ============\n    \n    def add_concepts(\n        self,\n        stock_code: str,\n        concepts: List[ConceptNode]\n    ) -> bool:\n        \"\"\"添加概念/主题\"\"\"\n        for concept in concepts:\n            query = \"\"\"\n            MATCH (c:Company {stock_code: $stock_code})\n            MERGE (con:Concept {concept_name: $concept_name})\n            SET con.description = $description,\n          
      con.hot_level = $hot_level,\n                con.created_at = $created_at\n            MERGE (c)-[r:INVOLVES]->(con)\n            RETURN con\n            \"\"\"\n            \n            params = {\n                \"stock_code\": stock_code,\n                \"concept_name\": concept.concept_name,\n                \"description\": concept.description,\n                \"hot_level\": concept.hot_level,\n                \"created_at\": concept.created_at.isoformat()\n            }\n            \n            try:\n                self.neo4j.execute_write(query, params)\n            except Exception as e:\n                logger.error(f\"添加概念失败 {concept.concept_name}: {e}\")\n                return False\n        \n        logger.info(f\"✅ 已添加 {len(concepts)} 个概念\")\n        return True\n    \n    # ============ 完整图谱操作 ============\n    \n    def build_company_graph(self, graph: CompanyKnowledgeGraph) -> bool:\n        \"\"\"\n        构建完整的公司知识图谱\n        \n        Args:\n            graph: 公司知识图谱数据\n            \n        Returns:\n            是否成功\n        \"\"\"\n        try:\n            # 1. 创建公司节点\n            self.create_or_update_company(graph.company)\n            \n            # 2. 添加名称变体\n            if graph.name_variants:\n                self.add_name_variants(graph.company.stock_code, graph.name_variants)\n            \n            # 3. 添加业务线\n            for business in graph.businesses:\n                self.add_business(graph.company.stock_code, business)\n            \n            # 4. 添加行业\n            for industry in graph.industries:\n                self._add_industry(graph.company.stock_code, industry)\n            \n            # 5. 添加产品\n            for product in graph.products:\n                self._add_product(graph.company.stock_code, product)\n            \n            # 6. 添加关键词\n            if graph.keywords:\n                self.add_keywords(graph.company.stock_code, graph.keywords)\n            \n            # 7. 
添加概念\n            if graph.concepts:\n                self.add_concepts(graph.company.stock_code, graph.concepts)\n            \n            logger.info(f\"✅ 知识图谱构建完成: {graph.company.stock_name}\")\n            return True\n            \n        except Exception as e:\n            logger.error(f\"❌ 知识图谱构建失败: {e}\")\n            return False\n    \n    def _add_industry(self, stock_code: str, industry: IndustryNode) -> bool:\n        \"\"\"添加行业节点（内部方法）\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        MERGE (i:Industry {industry_name: $industry_name})\n        SET i.industry_code = $industry_code,\n            i.level = $level,\n            i.created_at = $created_at\n        MERGE (c)-[r:BELONGS_TO]->(i)\n        RETURN i\n        \"\"\"\n        \n        params = industry.model_dump()\n        params['stock_code'] = stock_code\n        \n        try:\n            self.neo4j.execute_write(query, params)\n            return True\n        except Exception as e:\n            logger.error(f\"行业添加失败: {e}\")\n            return False\n    \n    def _add_product(self, stock_code: str, product: ProductNode) -> bool:\n        \"\"\"添加产品节点（内部方法）\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        MERGE (p:Product {product_name: $product_name})\n        SET p.product_type = $product_type,\n            p.description = $description,\n            p.updated_at = datetime(),\n            p.created_at = coalesce(p.created_at, datetime())\n        MERGE (c)-[r:PROVIDES]->(p)\n        RETURN p\n        \"\"\"\n        \n        params = product.model_dump()\n        params['stock_code'] = stock_code\n        \n        try:\n            self.neo4j.execute_write(query, params)\n            return True\n        except Exception as e:\n            logger.error(f\"产品添加失败: {e}\")\n            return False\n    \n    # ============ 查询操作 ============\n    \n    def get_company_graph(self, stock_code: str) -> 
Optional[CompanyKnowledgeGraph]:\n        \"\"\"\n        获取完整的公司知识图谱\n        \n        Args:\n            stock_code: 股票代码\n            \n        Returns:\n            公司知识图谱或None\n        \"\"\"\n        # 查询公司及其所有关联节点\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        OPTIONAL MATCH (c)-[:HAS_VARIANT]->(v:NameVariant)\n        OPTIONAL MATCH (c)-[:OPERATES_IN]->(b:Business)\n        OPTIONAL MATCH (c)-[:BELONGS_TO]->(i:Industry)\n        OPTIONAL MATCH (c)-[:PROVIDES]->(p:Product)\n        OPTIONAL MATCH (c)-[:RELATES_TO]->(k:Keyword)\n        OPTIONAL MATCH (c)-[:INVOLVES]->(con:Concept)\n        RETURN c,\n               collect(DISTINCT v) as variants,\n               collect(DISTINCT b) as businesses,\n               collect(DISTINCT i) as industries,\n               collect(DISTINCT p) as products,\n               collect(DISTINCT k) as keywords,\n               collect(DISTINCT con) as concepts\n        \"\"\"\n        \n        try:\n            results = self.neo4j.execute_query(query, {\"stock_code\": stock_code})\n            \n            if not results or not results[0]['c']:\n                return None\n            \n            data = results[0]\n            company_data = dict(data['c'])\n            \n            # 构建完整图谱\n            graph = CompanyKnowledgeGraph(\n                company=CompanyNode(**company_data),\n                name_variants=[NameVariantNode(**dict(v)) for v in data['variants'] if v],\n                businesses=[BusinessNode(**dict(b)) for b in data['businesses'] if b],\n                industries=[IndustryNode(**dict(i)) for i in data['industries'] if i],\n                products=[ProductNode(**dict(p)) for p in data['products'] if p],\n                keywords=[KeywordNode(**dict(k)) for k in data['keywords'] if k],\n                concepts=[ConceptNode(**dict(c)) for c in data['concepts'] if c]\n            )\n            \n            return graph\n            \n        except 
Exception as e:\n            logger.error(f\"查询公司图谱失败: {e}\")\n            return None\n    \n    def get_search_keywords(self, stock_code: str) -> Optional[SearchKeywordSet]:\n        \"\"\"\n        获取用于检索的关键词集合\n        \n        Args:\n            stock_code: 股票代码\n            \n        Returns:\n            检索关键词集合\n        \"\"\"\n        graph = self.get_company_graph(stock_code)\n        if not graph:\n            return None\n        \n        # 构建检索关键词集合\n        keyword_set = SearchKeywordSet(\n            stock_code=stock_code,\n            stock_name=graph.company.stock_name,\n            name_keywords=[v.variant for v in graph.name_variants],\n            business_keywords=[b.business_name for b in graph.businesses if b.status == \"active\"],\n            industry_keywords=[i.industry_name for i in graph.industries],\n            product_keywords=[p.product_name for p in graph.products],\n            concept_keywords=[c.concept_name for c in graph.concepts]\n        )\n        \n        # 生成组合查询\n        keyword_set.combined_queries = keyword_set.generate_search_queries(max_queries=10)\n        \n        return keyword_set\n    \n    # ============ 图谱更新 ============\n    \n    def update_from_news(\n        self,\n        stock_code: str,\n        news_content: str,\n        extracted_info: Dict[str, Any]\n    ) -> bool:\n        \"\"\"\n        根据新闻更新图谱\n        \n        Args:\n            stock_code: 股票代码\n            news_content: 新闻内容\n            extracted_info: 提取的信息（由 LLM 提取）\n                {\n                    \"new_businesses\": [...],\n                    \"stopped_businesses\": [...],\n                    \"new_products\": [...],\n                    \"new_concepts\": [...]\n                }\n        \n        Returns:\n            是否成功\n        \"\"\"\n        try:\n            # 添加新业务线\n            for biz_name in extracted_info.get(\"new_businesses\", []):\n                business = BusinessNode(\n                    
business_name=biz_name,\n                    business_type=\"new\",\n                    status=\"active\",\n                    start_date=datetime.utcnow().strftime(\"%Y-%m-%d\")\n                )\n                self.add_business(stock_code, business)\n            \n            # 停止业务线\n            for biz_name in extracted_info.get(\"stopped_businesses\", []):\n                self.stop_business(stock_code, biz_name)\n            \n            # 添加新产品\n            for prod_name in extracted_info.get(\"new_products\", []):\n                product = ProductNode(\n                    product_name=prod_name,\n                    product_type=\"service\"\n                )\n                self._add_product(stock_code, product)\n            \n            # 添加新概念\n            for concept_name in extracted_info.get(\"new_concepts\", []):\n                concept = ConceptNode(\n                    concept_name=concept_name,\n                    hot_level=5\n                )\n                self.add_concepts(stock_code, [concept])\n            \n            logger.info(f\"✅ 图谱已更新（基于新闻）\")\n            return True\n            \n        except Exception as e:\n            logger.error(f\"❌ 图谱更新失败: {e}\")\n            return False\n    \n    # ============ 统计和管理 ============\n    \n    def get_graph_stats(self, stock_code: str) -> Dict[str, int]:\n        \"\"\"获取图谱统计信息\"\"\"\n        query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        OPTIONAL MATCH (c)-[:HAS_VARIANT]->(v:NameVariant)\n        OPTIONAL MATCH (c)-[:OPERATES_IN]->(b:Business)\n        OPTIONAL MATCH (c)-[:BELONGS_TO]->(i:Industry)\n        OPTIONAL MATCH (c)-[:PROVIDES]->(p:Product)\n        OPTIONAL MATCH (c)-[:RELATES_TO]->(k:Keyword)\n        OPTIONAL MATCH (c)-[:INVOLVES]->(con:Concept)\n        RETURN \n            count(DISTINCT v) as variants_count,\n            count(DISTINCT b) as businesses_count,\n            count(DISTINCT i) as industries_count,\n            
count(DISTINCT p) as products_count,\n            count(DISTINCT k) as keywords_count,\n            count(DISTINCT con) as concepts_count\n        \"\"\"\n        \n        try:\n            results = self.neo4j.execute_query(query, {\"stock_code\": stock_code})\n            if results:\n                return dict(results[0])\n            return {}\n        except Exception as e:\n            logger.error(f\"查询图谱统计失败: {e}\")\n            return {}\n    \n    def delete_company_graph(self, stock_code: str) -> bool:\n        \"\"\"删除公司及其关联节点（保留仍被其他公司引用的共享节点）\"\"\"\n        # Industry/Keyword/Concept 等节点按名称 MERGE，可能被多家公司共享，\n        # 不能随公司一起 DETACH DELETE，否则会破坏其他公司的图谱\n        delete_company_query = \"\"\"\n        MATCH (c:Company {stock_code: $stock_code})\n        DETACH DELETE c\n        \"\"\"\n        # 公司删除后，清理因此变成孤立的关联节点\n        delete_orphans_query = \"\"\"\n        MATCH (n)\n        WHERE NOT (n)--()\n          AND (n:NameVariant OR n:Business OR n:Industry OR n:Product OR n:Keyword OR n:Concept)\n        DELETE n\n        \"\"\"\n        \n        try:\n            self.neo4j.execute_write(delete_company_query, {\"stock_code\": stock_code})\n            self.neo4j.execute_write(delete_orphans_query, {})\n            logger.info(f\"✅ 公司图谱已删除: {stock_code}\")\n            return True\n        except Exception as e:\n            logger.error(f\"❌ 图谱删除失败: {e}\")\n            return False\n    \n    def list_all_companies(self) -> List[Dict[str, str]]:\n        \"\"\"列出所有公司\"\"\"\n        query = \"\"\"\n        MATCH (c:Company)\n        RETURN c.stock_code as stock_code, \n               c.stock_name as stock_name, \n               c.industry as industry\n        ORDER BY c.stock_code\n        \"\"\"\n        \n        try:\n            return self.neo4j.execute_query(query)\n        except Exception as e:\n            logger.error(f\"查询公司列表失败: {e}\")\n            return []\n\n\n# 便捷函数\ndef get_graph_service() -> KnowledgeGraphService:\n    \"\"\"获取知识图谱服务实例\"\"\"\n    return KnowledgeGraphService()\n\n"
  },
  {
    "path": "backend/app/knowledge/knowledge_extractor.py",
    "content": "\"\"\"\n知识提取器\n从多种数据源提取公司知识并构建图谱\n\"\"\"\nimport logging\nimport json\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\n\nfrom agenticx import Agent\nfrom ..services.llm_service import get_llm_provider\nfrom .graph_models import (\n    CompanyNode,\n    NameVariantNode,\n    BusinessNode,\n    IndustryNode,\n    ProductNode,\n    KeywordNode,\n    ConceptNode,\n    CompanyKnowledgeGraph\n)\n\nlogger = logging.getLogger(__name__)\n\n\nclass KnowledgeExtractorAgent(Agent):\n    \"\"\"\n    知识提取智能体\n    从多种数据源提取公司信息并构建知识图谱\n    \"\"\"\n    \n    def __init__(self, llm_provider=None, organization_id: str = \"finnews\"):\n        super().__init__(\n            name=\"KnowledgeExtractor\",\n            role=\"知识提取专家\",\n            goal=\"从多种数据源提取公司信息，构建全面的知识图谱\",\n            backstory=\"\"\"你是一位专业的企业分析师和知识工程师。\n你擅长从各类数据源（财务数据、新闻、公告、研报）中提取关键信息，\n识别公司的业务线、产品、行业归属、关联概念等，\n并将这些信息结构化为知识图谱，用于后续的智能检索和分析。\"\"\",\n            organization_id=organization_id\n        )\n        \n        if llm_provider is None:\n            llm_provider = get_llm_provider()\n        object.__setattr__(self, '_llm_provider', llm_provider)\n        \n        logger.info(f\"Initialized {self.name} agent\")\n    \n    async def extract_from_akshare(\n        self,\n        stock_code: str,\n        stock_name: str,\n        stock_info: Dict[str, Any]\n    ) -> CompanyKnowledgeGraph:\n        \"\"\"\n        从 akshare 数据提取基础信息\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            stock_info: akshare 返回的股票信息\n            \n        Returns:\n            公司知识图谱\n        \"\"\"\n        # 获取当前时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        # 提取纯数字代码\n        short_code = stock_code\n        if stock_code.startswith(\"SH\") or stock_code.startswith(\"SZ\"):\n            short_code = stock_code[2:]\n        \n        # 创建公司节点\n        company = CompanyNode(\n            
stock_code=stock_code,\n            stock_name=stock_name,\n            short_code=short_code,\n            industry=stock_info.get(\"industry\"),\n            sector=stock_info.get(\"sector\"),\n            market_cap=stock_info.get(\"market_cap\"),\n            listed_date=stock_info.get(\"listed_date\")\n        )\n        \n        # 生成名称变体（通过 LLM 推理）\n        name_variants_prompt = f\"\"\"请为以下公司生成可能的名称变体（简称、别名等）：\n\n【当前时间】\n{current_time}\n\n【公司信息】\n股票代码: {stock_code}\n公司全称: {stock_name}\n所属行业: {stock_info.get('industry', '未知')}\n\n请以JSON格式返回名称变体列表，每个变体包含：\n- variant: 变体名称\n- variant_type: 类型（abbreviation=简称, alias=别名, full_name=全称）\n\n示例：\n```json\n[\n    {{\"variant\": \"彩讯\", \"variant_type\": \"abbreviation\"}},\n    {{\"variant\": \"彩讯科技\", \"variant_type\": \"alias\"}},\n    {{\"variant\": \"{stock_name}\", \"variant_type\": \"full_name\"}}\n]\n```\n\n只返回JSON，不要其他解释。\"\"\"\n        \n        # 在 try 外导入 re：本函数后面的业务线解析同样依赖 re；\n        # 若留在 try 内，LLM 调用失败时函数局部名 re 将保持未绑定而引发 NameError\n        import re\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": name_variants_prompt}\n            ])\n            \n            content = response.content if hasattr(response, 'content') else str(response)\n            \n            # 提取JSON\n            json_match = re.search(r'\\[.*\\]', content, re.DOTALL)\n            if json_match:\n                variants_data = json.loads(json_match.group())\n                name_variants = [NameVariantNode(**v) for v in variants_data]\n            else:\n                # 默认变体\n                name_variants = [\n                    NameVariantNode(variant=stock_name, variant_type=\"full_name\"),\n                    NameVariantNode(variant=stock_name[:2], variant_type=\"abbreviation\")\n                ]\n                logger.warning(\"LLM 未返回有效JSON，使用默认变体\")\n        except Exception as e:\n            logger.error(f\"名称变体提取失败: {e}\")\n            name_variants = 
[\n                NameVariantNode(variant=stock_name, variant_type=\"full_name\")\n            ]\n        \n        # 生成业务线（通过 LLM 推理 + akshare 数据）\n        business_prompt = f\"\"\"请分析以下公司的主营业务线：\n\n【当前时间】\n{current_time}\n\n【公司信息】\n股票代码: {stock_code}\n公司名称: {stock_name}\n所属行业: {stock_info.get('industry', '未知')}\n主营业务: {stock_info.get('main_business', '未知')}\n\n请以JSON格式返回业务线列表，每个业务包含：\n- business_name: 业务名称（简洁）\n- business_type: 类型（main=主营, new=新增, stopped=已停止）\n- description: 业务描述\n- status: 状态（active=活跃, stopped=已停止）\n\n示例：\n```json\n[\n    {{\"business_name\": \"运营商增值服务\", \"business_type\": \"main\", \"description\": \"为运营商提供增值业务\", \"status\": \"active\"}},\n    {{\"business_name\": \"AI大模型应用\", \"business_type\": \"new\", \"description\": \"AI应用开发与落地\", \"status\": \"active\"}}\n]\n```\n\n只返回JSON数组，不要其他解释。\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": business_prompt}\n            ])\n            \n            content = response.content if hasattr(response, 'content') else str(response)\n            \n            # 提取JSON\n            json_match = re.search(r'\\[.*\\]', content, re.DOTALL)\n            if json_match:\n                businesses_data = json.loads(json_match.group())\n                businesses = [BusinessNode(**b) for b in businesses_data]\n            else:\n                businesses = []\n                logger.warning(\"LLM 未返回有效业务线JSON\")\n        except Exception as e:\n            logger.error(f\"业务线提取失败: {e}\")\n            businesses = []\n        \n        # 行业节点\n        industries = []\n        if stock_info.get('industry'):\n            industries.append(IndustryNode(\n                industry_name=stock_info['industry'],\n                level=1\n            ))\n        \n        # 返回基础图谱\n        return CompanyKnowledgeGraph(\n            
company=company,\n            name_variants=name_variants,\n            businesses=businesses,\n            industries=industries,\n            products=[],\n            keywords=[],\n            concepts=[]\n        )\n    \n    async def extract_from_news(\n        self,\n        stock_code: str,\n        stock_name: str,\n        news_list: List[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"\n        从新闻中提取业务变化和概念\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            news_list: 新闻列表\n            \n        Returns:\n            提取的信息\n        \"\"\"\n        if not news_list:\n            return {\n                \"new_businesses\": [],\n                \"stopped_businesses\": [],\n                \"new_products\": [],\n                \"new_concepts\": []\n            }\n        \n        # 获取当前时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        # 汇总新闻\n        news_summary = \"\\n\\n\".join([\n            f\"【{i+1}】{news.get('title', '')}\\n{news.get('content', '')[:300]}...\"\n            for i, news in enumerate(news_list[:10])\n        ])\n        \n        prompt = f\"\"\"请分析以下新闻，提取{stock_name}公司的业务变化和相关概念：\n\n【当前时间】\n{current_time}\n\n【公司】{stock_name}({stock_code})\n\n【近期新闻】\n{news_summary}\n\n请从新闻中提取：\n1. **新增业务线**：公司新开拓的业务方向\n2. **停止业务线**：公司明确表示停止或退出的业务\n3. **新产品/服务**：公司推出的新产品或服务\n4. 
**关联概念**：新闻中提到的热门概念（如 AI大模型、云计算、元宇宙等）\n\n以JSON格式返回：\n```json\n{{\n    \"new_businesses\": [\"业务1\", \"业务2\"],\n    \"stopped_businesses\": [\"业务3\"],\n    \"new_products\": [\"产品1\", \"产品2\"],\n    \"new_concepts\": [\"概念1\", \"概念2\"]\n}}\n```\n\n注意：\n- 只提取明确的信息，不要臆测\n- 如果没有相关信息，返回空数组\n- 只返回JSON，不要其他文字\n\nJSON:\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            content = response.content if hasattr(response, 'content') else str(response)\n            \n            # 提取JSON\n            import re\n            json_match = re.search(r'\\{.*\\}', content, re.DOTALL)\n            if json_match:\n                extracted = json.loads(json_match.group())\n                logger.info(f\"✅ 从新闻提取信息: {extracted}\")\n                return extracted\n            else:\n                logger.warning(\"LLM 未返回有效JSON\")\n                return {\n                    \"new_businesses\": [],\n                    \"stopped_businesses\": [],\n                    \"new_products\": [],\n                    \"new_concepts\": []\n                }\n        except Exception as e:\n            logger.error(f\"新闻信息提取失败: {e}\")\n            return {\n                \"new_businesses\": [],\n                \"stopped_businesses\": [],\n                \"new_products\": [],\n                \"new_concepts\": []\n            }\n    \n    async def extract_from_document(\n        self,\n        stock_code: str,\n        stock_name: str,\n        document_content: str,\n        document_type: str = \"annual_report\"\n    ) -> Dict[str, Any]:\n        \"\"\"\n        从PDF/Word文档提取深度信息\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 股票名称\n            document_content: 文档内容（已通过MinerU解析）\n            document_type: 文档类型（annual_report=年报, 
announcement=公告）\n            \n        Returns:\n            提取的信息\n        \"\"\"\n        # 获取当前时间\n        current_time = datetime.now().strftime(\"%Y年%m月%d日 %H:%M\")\n        \n        prompt = f\"\"\"请从以下{stock_name}的{document_type}中提取详细的业务信息：\n\n【当前时间】\n{current_time}\n\n【公司】{stock_name}({stock_code})\n\n【文档内容】（前3000字）\n{document_content[:3000]}\n\n请提取：\n1. **主营业务**：公司当前的核心业务（详细）\n2. **新增业务**：文档中提到的新业务拓展\n3. **主要产品**：公司的主要产品或服务\n4. **行业定位**：所属行业和细分领域\n5. **战略方向**：未来战略和关注的热点领域\n\n以JSON格式返回：\n```json\n{{\n    \"main_businesses\": [\n        {{\"name\": \"业务1\", \"description\": \"详细描述\"}}\n    ],\n    \"new_businesses\": [\n        {{\"name\": \"业务2\", \"description\": \"详细描述\"}}\n    ],\n    \"products\": [\n        {{\"name\": \"产品1\", \"type\": \"software/hardware/service\", \"description\": \"描述\"}}\n    ],\n    \"industries\": [\"一级行业\", \"二级行业\"],\n    \"concepts\": [\"概念1\", \"概念2\"],\n    \"keywords\": [\"关键词1\", \"关键词2\"]\n}}\n```\n\n只返回JSON，不要其他解释。\"\"\"\n        \n        try:\n            response = self._llm_provider.invoke([\n                {\"role\": \"system\", \"content\": f\"你是{self.role}，{self.backstory}\"},\n                {\"role\": \"user\", \"content\": prompt}\n            ])\n            \n            content = response.content if hasattr(response, 'content') else str(response)\n            \n            # 提取JSON\n            import re\n            json_match = re.search(r'\\{.*\\}', content, re.DOTALL)\n            if json_match:\n                extracted = json.loads(json_match.group())\n                logger.info(f\"✅ 从文档提取信息: {len(extracted.get('products', []))}个产品, {len(extracted.get('concepts', []))}个概念\")\n                return extracted\n            else:\n                logger.warning(\"LLM 未返回有效JSON\")\n                return {}\n        except Exception as e:\n            logger.error(f\"文档信息提取失败: {e}\")\n            return {}\n\n\nclass AkshareKnowledgeExtractor:\n    \"\"\"\n    从 akshare 提取基础信息，构建简单图谱并生成搜索关键词\n    
\"\"\"\n    \n    @staticmethod\n    def extract_company_info(stock_code: str) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        从 akshare 获取公司基础信息\n        \n        Args:\n            stock_code: 股票代码\n            \n        Returns:\n            公司信息字典\n        \"\"\"\n        try:\n            import akshare as ak\n            \n            # 提取纯数字代码\n            pure_code = stock_code\n            if stock_code.startswith(\"SH\") or stock_code.startswith(\"SZ\"):\n                pure_code = stock_code[2:]\n            \n            logger.info(f\"🔍 从 akshare 获取公司信息: {pure_code}\")\n            \n            # 获取个股信息\n            try:\n                # 尝试获取实时行情（包含基本信息）\n                stock_df = ak.stock_individual_info_em(symbol=pure_code)\n                \n                if stock_df is not None and not stock_df.empty:\n                    # 打印 DataFrame 结构用于调试\n                    logger.info(f\"📋 akshare 返回 DataFrame: columns={list(stock_df.columns)}, rows={len(stock_df)}\")\n                    \n                    # 转换为字典 - 兼容不同的列名格式\n                    info_dict = {}\n                    \n                    # 确定列名\n                    columns = list(stock_df.columns)\n                    key_col = None\n                    value_col = None\n                    \n                    # 尝试找到 key 列\n                    for col in ['item', '属性', 'name', '项目']:\n                        if col in columns:\n                            key_col = col\n                            break\n                    \n                    # 尝试找到 value 列\n                    for col in ['value', '值', 'data', '数值']:\n                        if col in columns:\n                            value_col = col\n                            break\n                    \n                    # 如果只有两列，直接使用\n                    if len(columns) == 2 and (key_col is None or value_col is None):\n                        key_col, value_col = columns[0], columns[1]\n                    
\n                    if key_col and value_col:\n                        for _, row in stock_df.iterrows():\n                            try:\n                                key = str(row[key_col]) if row[key_col] is not None else ''\n                                value = str(row[value_col]) if row[value_col] is not None else ''\n                                if key and value and key != 'nan' and value != 'nan':\n                                    info_dict[key] = value\n                            except Exception as row_err:\n                                logger.debug(f\"跳过行: {row_err}\")\n                                continue\n                    else:\n                        logger.warning(f\"⚠️ 无法识别 DataFrame 列结构: {columns}\")\n                    \n                    logger.info(f\"📊 解析到 {len(info_dict)} 个字段: {list(info_dict.keys())[:10]}...\")\n                    \n                    # 提取关键字段\n                    result = {\n                        \"industry\": info_dict.get(\"行业\") or info_dict.get(\"所属行业\"),\n                        \"sector\": info_dict.get(\"板块\") or info_dict.get(\"所属板块\"),\n                        \"main_business\": info_dict.get(\"主营业务\") or info_dict.get(\"经营范围\"),\n                        \"total_market_cap\": info_dict.get(\"总市值\"),\n                        \"listed_date\": info_dict.get(\"上市时间\"),\n                        \"raw_data\": info_dict\n                    }\n                    \n                    main_business_preview = (result.get('main_business') or '')[:30]\n                    logger.info(f\"✅ 获取到公司信息: 行业={result.get('industry')}, 主营={main_business_preview}...\")\n                    return result\n                else:\n                    logger.warning(f\"⚠️ akshare 未返回数据: {pure_code}\")\n                    return None\n                    \n            except Exception as e:\n                logger.error(f\"❌ akshare 查询失败: {e}\", exc_info=True)\n                return None\n                \n 
       except ImportError:\n            logger.error(\"akshare 未安装\")\n            return None\n        except Exception as e:\n            logger.error(f\"提取公司信息失败: {e}\")\n            return None\n    \n    @staticmethod\n    def generate_search_keywords(\n        stock_code: str,\n        stock_name: str,\n        akshare_info: Optional[Dict[str, Any]] = None\n    ) -> Dict[str, List[str]]:\n        \"\"\"\n        基于股票信息生成分层关键词\n        \n        返回两类关键词：\n        - core_keywords: 核心关键词（公司名、代码等，必须包含）\n        - extension_keywords: 扩展关键词（行业、业务、人名等，用于组合）\n        \n        Args:\n            stock_code: 股票代码（如 SZ000004）\n            stock_name: 股票名称（如 *ST国华）\n            akshare_info: akshare 返回的公司信息（可选）\n            \n        Returns:\n            {\"core_keywords\": [...], \"extension_keywords\": [...]}\n        \"\"\"\n        core_keywords = []\n        extension_keywords = []\n        \n        # 提取纯数字代码\n        pure_code = stock_code\n        if stock_code.startswith(\"SH\") or stock_code.startswith(\"SZ\"):\n            pure_code = stock_code[2:]\n        \n        # === 1. 核心关键词（必须包含，用于确保相关性）===\n        # 原始名称（如 *ST国华）\n        core_keywords.append(stock_name)\n        \n        # 去除 ST 标记的名称（如 国华）\n        clean_name = stock_name\n        for prefix in [\"*ST\", \"ST\", \"S*ST\", \"S\"]:\n            if clean_name.startswith(prefix):\n                clean_name = clean_name[len(prefix):]\n                break\n        if clean_name != stock_name and len(clean_name) >= 2:\n            core_keywords.append(clean_name)\n        \n        # 股票代码\n        core_keywords.append(pure_code)  # 000004\n        core_keywords.append(stock_code)  # SZ000004\n        \n        # 小写变体（如 st国华）\n        core_keywords.append(stock_name.lower())\n        if clean_name != stock_name:\n            core_keywords.append(clean_name.lower())\n        \n        # === 2. 
扩展关键词（用于组合搜索，扩大范围）===\n        if akshare_info:\n            raw_data = akshare_info.get(\"raw_data\", {})\n            \n            # 公司全称（从 raw_data 中提取）\n            company_full_name = raw_data.get(\"公司名称\", raw_data.get(\"公司全称\"))\n            if company_full_name and len(company_full_name) > 4:\n                extension_keywords.append(company_full_name)\n            \n            # 行业（但不单独搜索）\n            industry = akshare_info.get(\"industry\")\n            if industry:\n                extension_keywords.append(industry)\n            \n            # 主营业务（提取关键词）\n            main_business = akshare_info.get(\"main_business\", \"\")\n            if main_business:\n                import re\n                business_parts = re.split(r'[，,、；;。\\s]+', main_business)\n                for part in business_parts[:3]:  # 只取前3个\n                    if 3 <= len(part) <= 10:  # 长度适中的词\n                        extension_keywords.append(part)\n            \n            # 董事长、总经理等关键人物\n            ceo = raw_data.get(\"董事长\", raw_data.get(\"总经理\"))\n            if ceo and 2 <= len(str(ceo)) <= 4:\n                extension_keywords.append(str(ceo))\n        \n        # 去重\n        core_keywords = list(dict.fromkeys(core_keywords))\n        extension_keywords = list(dict.fromkeys(extension_keywords))\n        \n        logger.info(\n            f\"📋 生成分层关键词: 核心={len(core_keywords)}个{core_keywords[:5]}, \"\n            f\"扩展={len(extension_keywords)}个{extension_keywords[:5]}\"\n        )\n        \n        return {\n            \"core_keywords\": core_keywords,\n            \"extension_keywords\": extension_keywords\n        }\n    \n    @staticmethod\n    def build_simple_graph_from_info(\n        stock_code: str,\n        stock_name: str,\n        akshare_info: Optional[Dict[str, Any]] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        基于 akshare 信息构建简单的知识图谱结构\n        \n        即使 akshare 调用失败，也能基于股票名称构建基础图谱\n        \n        Args:\n            stock_code: 
股票代码\n            stock_name: 股票名称\n            akshare_info: akshare 返回的公司信息（可选）\n            \n        Returns:\n            简单图谱结构\n        \"\"\"\n        # 提取纯数字代码\n        pure_code = stock_code\n        if stock_code.startswith(\"SH\") or stock_code.startswith(\"SZ\"):\n            pure_code = stock_code[2:]\n        \n        # 构建基础图谱\n        graph = {\n            \"company\": {\n                \"stock_code\": stock_code,\n                \"stock_name\": stock_name,\n                \"pure_code\": pure_code\n            },\n            \"name_variants\": [],\n            \"industries\": [],\n            \"businesses\": [],\n            \"keywords\": []\n        }\n        \n        # === 1. 名称变体 ===\n        graph[\"name_variants\"].append(stock_name)\n        \n        # 去除 ST 标记\n        clean_name = stock_name\n        for prefix in [\"*ST\", \"ST\", \"S*ST\", \"S\"]:\n            if clean_name.startswith(prefix):\n                clean_name = clean_name[len(prefix):]\n                break\n        if clean_name != stock_name:\n            graph[\"name_variants\"].append(clean_name)\n        \n        # 简称（取前两个字）\n        if len(clean_name) >= 2:\n            graph[\"name_variants\"].append(clean_name[:2])\n        \n        # === 2. 
基于 akshare 信息填充 ===\n        if akshare_info:\n            # 行业\n            industry = akshare_info.get(\"industry\")\n            if industry:\n                graph[\"industries\"].append(industry)\n            \n            # 板块\n            sector = akshare_info.get(\"sector\")\n            if sector:\n                graph[\"industries\"].append(sector)\n            \n            # 主营业务\n            main_business = akshare_info.get(\"main_business\", \"\")\n            if main_business:\n                graph[\"businesses\"].append(main_business[:100])  # 截取前100字\n                \n                # 提取业务关键词\n                import re\n                business_parts = re.split(r'[，,、；;。\\s]+', main_business)\n                for part in business_parts[:5]:\n                    if 2 <= len(part) <= 10:\n                        graph[\"keywords\"].append(part)\n        \n        # === 3. 生成搜索关键词（分层：核心 + 扩展） ===\n        keyword_groups = AkshareKnowledgeExtractor.generate_search_keywords(\n            stock_code, stock_name, akshare_info\n        )\n        graph[\"core_keywords\"] = keyword_groups[\"core_keywords\"]\n        graph[\"extension_keywords\"] = keyword_groups[\"extension_keywords\"]\n        \n        logger.info(f\"📊 构建简单图谱: 公司={stock_name}, 名称变体={len(graph['name_variants'])}个, \"\n                   f\"行业={len(graph['industries'])}个, \"\n                   f\"核心词={len(graph['core_keywords'])}个, 扩展词={len(graph['extension_keywords'])}个\")\n        \n        return graph\n\n\nclass NewsKnowledgeExtractor:\n    \"\"\"\n    从新闻中提取业务变化\n    \"\"\"\n    \n    def __init__(self, extractor_agent: KnowledgeExtractorAgent):\n        self.agent = extractor_agent\n    \n    async def extract_business_changes(\n        self,\n        stock_code: str,\n        stock_name: str,\n        news_list: List[Dict[str, Any]]\n    ) -> Dict[str, Any]:\n        \"\"\"\n        从新闻列表中提取业务变化\n        \n        Args:\n            stock_code: 股票代码\n            stock_name: 
股票名称\n            news_list: 新闻列表\n            \n        Returns:\n            业务变化信息\n        \"\"\"\n        return await self.agent.extract_from_news(stock_code, stock_name, news_list)\n\n\n# 工厂函数\ndef create_knowledge_extractor(llm_provider=None) -> KnowledgeExtractorAgent:\n    \"\"\"创建知识提取智能体\"\"\"\n    return KnowledgeExtractorAgent(llm_provider)\n\n"
  },
  {
    "path": "backend/app/knowledge/parallel_search.py",
    "content": "\"\"\"\n并发多关键词检索策略\n基于知识图谱的关键词，并发调用多个搜索API\n\"\"\"\nimport logging\nimport asyncio\nfrom typing import List, Dict, Any, Set\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\nfrom datetime import datetime\n\nfrom ..tools.bochaai_search import bochaai_search\nfrom .graph_models import SearchKeywordSet\n\nlogger = logging.getLogger(__name__)\n\n\nclass ParallelSearchStrategy:\n    \"\"\"\n    并发检索策略\n    基于知识图谱生成的关键词，并发搜索获取更全面的新闻\n    \"\"\"\n    \n    def __init__(self, max_workers: int = 5):\n        \"\"\"\n        初始化并发检索策略\n        \n        Args:\n            max_workers: 最大并发工作线程数\n        \"\"\"\n        self.max_workers = max_workers\n    \n    def search_with_multiple_keywords(\n        self,\n        keyword_set: SearchKeywordSet,\n        days: int = 30,\n        max_results_per_query: int = 50\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        使用多个关键词并发搜索\n        \n        Args:\n            keyword_set: 关键词集合\n            days: 搜索天数\n            max_results_per_query: 每个查询的最大结果数\n            \n        Returns:\n            去重后的新闻列表\n        \"\"\"\n        # 生成多样化的搜索查询\n        queries = keyword_set.generate_search_queries(max_queries=10)\n        \n        logger.info(f\"🔍 开始并发检索: {keyword_set.stock_name}, 查询数={len(queries)}\")\n        logger.info(f\"📋 查询列表: {queries}\")\n        \n        all_results = []\n        seen_urls: Set[str] = set()  # 用于去重\n        \n        # 并发执行搜索\n        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:\n            # 提交所有搜索任务\n            future_to_query = {}\n            for query in queries:\n                future = executor.submit(\n                    self._search_single_query,\n                    query,\n                    days,\n                    max_results_per_query\n                )\n                future_to_query[future] = query\n            \n            # 收集结果\n            for future in as_completed(future_to_query):\n                query = 
future_to_query[future]\n                try:\n                    results = future.result()\n                    \n                    # 去重并添加\n                    added_count = 0\n                    for result in results:\n                        if result.url not in seen_urls:\n                            seen_urls.add(result.url)\n                            all_results.append(result)\n                            added_count += 1\n                    \n                    logger.info(f\"✅ 查询「{query}」完成: 返回{len(results)}条, 去重后新增{added_count}条\")\n                    \n                except Exception as e:\n                    logger.error(f\"❌ 查询「{query}」失败: {e}\")\n        \n        logger.info(f\"🎉 并发检索完成: 共获取 {len(all_results)} 条去重后的新闻\")\n        return all_results\n    \n    def _search_single_query(\n        self,\n        query: str,\n        days: int,\n        count: int\n    ) -> List[Any]:\n        \"\"\"\n        执行单个查询（在线程中运行）\n        \n        Args:\n            query: 搜索查询\n            days: 天数（当前未生效，检索范围固定为近一年）\n            count: 结果数\n            \n        Returns:\n            搜索结果列表\n        \"\"\"\n        try:\n            if not bochaai_search.is_available():\n                return []\n            \n            # 调用 BochaAI 搜索（freshness 固定为 \"year\"，暂未按 days 收窄时间范围）\n            results = bochaai_search.search(\n                query=query,\n                freshness=\"year\",\n                count=count,\n                offset=0\n            )\n            \n            return results\n            \n        except Exception as e:\n            logger.error(f\"搜索失败 {query}: {e}\")\n            return []\n    \n    async def search_async(\n        self,\n        keyword_set: SearchKeywordSet,\n        days: int = 30,\n        max_results_per_query: int = 50\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        异步版本的并发搜索\n        \n        Args:\n            keyword_set: 关键词集合\n            days: 搜索天数\n            max_results_per_query: 每个查询的最大结果数\n            \n        Returns:\n            
去重后的新闻列表\n        \"\"\"\n        # 在默认线程池中运行同步搜索（get_running_loop 只能在协程内调用）\n        loop = asyncio.get_running_loop()\n        return await loop.run_in_executor(\n            None,\n            self.search_with_multiple_keywords,\n            keyword_set,\n            days,\n            max_results_per_query\n        )\n\n\n# 便捷函数\ndef create_parallel_search(max_workers: int = 5) -> ParallelSearchStrategy:\n    \"\"\"创建并发检索策略\"\"\"\n    return ParallelSearchStrategy(max_workers=max_workers)\n\n"
  },
  {
    "path": "backend/app/main.py",
    "content": "\"\"\"\nFinnewsHunter 主应用入口\n\"\"\"\nimport logging\nfrom contextlib import asynccontextmanager\nfrom fastapi import FastAPI, Request\nfrom fastapi.middleware.cors import CORSMiddleware\nfrom fastapi.responses import JSONResponse, Response\nfrom fastapi.exceptions import RequestValidationError\nfrom fastapi.openapi.docs import get_swagger_ui_html, get_redoc_html, get_swagger_ui_oauth2_redirect_html\nfrom starlette.middleware.base import BaseHTTPMiddleware\n\nfrom .core.config import settings\nfrom .core.database import init_database\nfrom .api.v1 import api_router\n\n# 配置日志\nlogging.basicConfig(\n    level=getattr(logging, settings.LOG_LEVEL),\n    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'\n)\nlogger = logging.getLogger(__name__)\n\n\nclass DocsCSPMiddleware(BaseHTTPMiddleware):\n    \"\"\"为文档页面设置 CSP 头，允许 unsafe-eval（Swagger UI 需要）\"\"\"\n    async def dispatch(self, request: Request, call_next):\n        response = await call_next(request)\n        # 只为文档页面设置 CSP\n        if request.url.path in [\"/docs\", \"/redoc\", \"/openapi.json\"]:\n            # 开发环境：完全禁用 CSP 限制（仅用于文档页面）\n            # 生产环境应该使用更严格的策略\n            if settings.DEBUG:\n                # 开发环境：允许所有内容（Swagger UI 需要）\n                response.headers[\"Content-Security-Policy\"] = (\n                    \"default-src * 'unsafe-inline' 'unsafe-eval' data: blob:; \"\n                    \"script-src * 'unsafe-inline' 'unsafe-eval'; \"\n                    \"style-src * 'unsafe-inline'; \"\n                    \"img-src * data: blob:; \"\n                    \"font-src * data:; \"\n                    \"connect-src *; \"\n                    \"frame-src *; \"\n                    \"object-src *; \"\n                    \"media-src *; \"\n                    \"worker-src * blob:; \"\n                    \"manifest-src *; \"\n                    \"form-action *; \"\n                    \"base-uri *; \"\n                    \"frame-ancestors *;\"\n                )\n 
           else:\n                # 生产环境：使用较宽松但仍有限制的策略\n                response.headers[\"Content-Security-Policy\"] = (\n                    \"default-src 'self' 'unsafe-inline' 'unsafe-eval' data: blob: https:; \"\n                    \"script-src 'self' 'unsafe-eval' 'unsafe-inline' https://cdn.jsdelivr.net https://unpkg.com; \"\n                    \"style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net https://fonts.googleapis.com https://unpkg.com; \"\n                    \"font-src 'self' data: https://fonts.gstatic.com https://cdn.jsdelivr.net; \"\n                    \"img-src 'self' data: blob: https:; \"\n                    \"connect-src 'self' https:; \"\n                    \"frame-src 'self' https:; \"\n                    \"object-src 'none'; \"\n                    \"base-uri 'self'; \"\n                    \"form-action 'self'\"\n                )\n        return response\n\n\n@asynccontextmanager\nasync def lifespan(app: FastAPI):\n    \"\"\"应用生命周期管理\"\"\"\n    # 启动时执行\n    logger.info(\"=== FinnewsHunter Starting ===\")\n    logger.info(f\"Environment: {'Development' if settings.DEBUG else 'Production'}\")\n    logger.info(f\"LLM Provider: {settings.LLM_PROVIDER}/{settings.LLM_MODEL}\")\n    \n    # 初始化 Neo4j 知识图谱（仅创建约束和索引，不构建具体图谱）\n    try:\n        from .core.neo4j_client import get_neo4j_client\n        from .knowledge.graph_service import get_graph_service\n        \n        logger.info(\"🔍 初始化 Neo4j 知识图谱...\")\n        neo4j_client = get_neo4j_client()\n        \n        if neo4j_client.health_check():\n            logger.info(\"✅ Neo4j 连接正常\")\n            # 初始化约束和索引（由 graph_service 自动完成）\n            graph_service = get_graph_service()\n            logger.info(\"✅ Neo4j 约束和索引已就绪\")\n            logger.info(\"💡 提示: 首次定向爬取时会自动为股票构建知识图谱\")\n        else:\n            logger.warning(\"⚠️ Neo4j 连接失败，知识图谱功能将不可用（不影响其他功能）\")\n    except Exception as e:\n        logger.warning(f\"⚠️ Neo4j 初始化失败: {e}，知识图谱功能将不可用（不影响其他功能）\")\n    \n    yield\n   
 \n    # 关闭时执行\n    logger.info(\"=== FinnewsHunter Shutting Down ===\")\n    \n    # 关闭 Neo4j 连接\n    try:\n        from .core.neo4j_client import close_neo4j_client\n        close_neo4j_client()\n        logger.info(\"✅ Neo4j 连接已关闭\")\n    except Exception:  # 关闭失败不阻塞退出流程\n        pass\n\n\n# 创建 FastAPI 应用\n# 禁用默认文档（我们将使用自定义 CDN）\napp = FastAPI(\n    title=settings.APP_NAME,\n    description=\"Financial News Analysis Platform powered by AgenticX\",\n    version=settings.APP_VERSION,\n    debug=settings.DEBUG,\n    lifespan=lifespan,\n    docs_url=None,  # 禁用默认文档，使用自定义路由\n    redoc_url=None,  # 禁用默认 ReDoc，使用自定义路由\n)\n\n# 添加文档页面的 CSP 中间件（必须在 CORS 之前）\napp.add_middleware(DocsCSPMiddleware)\n\n# 配置 CORS\n# 开发环境允许所有来源（包括 file:// 协议）\nif settings.DEBUG:\n    app.add_middleware(\n        CORSMiddleware,\n        allow_origins=[\"*\"],  # 开发环境允许所有来源\n        allow_credentials=False,  # 允许所有来源时必须为 False\n        allow_methods=[\"*\"],\n        allow_headers=[\"*\"],\n    )\nelse:\n    # 生产环境只允许配置的来源\n    app.add_middleware(\n        CORSMiddleware,\n        allow_origins=settings.BACKEND_CORS_ORIGINS,\n        allow_credentials=True,\n        allow_methods=[\"*\"],\n        allow_headers=[\"*\"],\n    )\n\n\n# 请求验证错误处理（422错误）\n@app.exception_handler(RequestValidationError)\nasync def validation_exception_handler(request: Request, exc: RequestValidationError):\n    \"\"\"处理请求验证错误（422）\"\"\"\n    # 尝试读取请求体\n    body_str = \"\"\n    try:\n        body_bytes = await request.body()\n        body_str = body_bytes.decode('utf-8')\n    except Exception as e:\n        logger.warning(f\"Failed to read request body: {e}\")\n    \n    logger.error(f\"Validation error for {request.method} {request.url.path}\")\n    logger.error(f\"Validation errors: {exc.errors()}\")\n    logger.error(f\"Request body: {body_str}\")\n    \n    return JSONResponse(\n        status_code=422,\n        content={\n            \"detail\": exc.errors(),\n            \"body\": body_str if settings.DEBUG else None\n        }\n    
)\n\n\n# 全局异常处理\n@app.exception_handler(Exception)\nasync def global_exception_handler(request, exc):\n    logger.error(f\"Global exception: {exc}\", exc_info=True)\n    return JSONResponse(\n        status_code=500,\n        content={\n            \"success\": False,\n            \"error\": \"Internal server error\",\n            \"detail\": str(exc) if settings.DEBUG else None\n        }\n    )\n\n\n# 根路由\n@app.get(\"/\")\nasync def root():\n    \"\"\"根路由 - 系统信息\"\"\"\n    return {\n        \"name\": settings.APP_NAME,\n        \"version\": settings.APP_VERSION,\n        \"status\": \"active\",\n        \"message\": \"Welcome to FinnewsHunter API\",\n        \"docs_url\": \"/docs\",\n        \"api_prefix\": settings.API_V1_PREFIX,\n    }\n\n\n# 健康检查\n@app.get(\"/health\")\nasync def health_check():\n    \"\"\"健康检查端点\"\"\"\n    return {\n        \"status\": \"healthy\",\n        \"app\": settings.APP_NAME,\n        \"version\": settings.APP_VERSION,\n    }\n\n\n# 自定义 Swagger UI（使用 unpkg.com CDN，因为 jsdelivr.net 无法访问）\n@app.get(\"/docs\", include_in_schema=False)\n@app.head(\"/docs\", include_in_schema=False)\nasync def custom_swagger_ui_html():\n    \"\"\"自定义 Swagger UI，使用 unpkg.com CDN\"\"\"\n    return get_swagger_ui_html(\n        openapi_url=app.openapi_url,\n        title=app.title + \" - Swagger UI\",\n        oauth2_redirect_url=\"/docs/oauth2-redirect\",\n        swagger_js_url=\"https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js\",\n        swagger_css_url=\"https://unpkg.com/swagger-ui-dist@5/swagger-ui.css\",\n        swagger_favicon_url=\"https://fastapi.tiangolo.com/img/favicon.png\",\n    )\n\n\n# Swagger UI OAuth2 重定向\n@app.get(\"/docs/oauth2-redirect\", include_in_schema=False)\nasync def swagger_ui_redirect():\n    \"\"\"Swagger UI OAuth2 重定向\"\"\"\n    return get_swagger_ui_oauth2_redirect_html()\n\n\n# 自定义 ReDoc（使用 unpkg.com CDN）\n@app.get(\"/redoc\", include_in_schema=False)\n@app.head(\"/redoc\", include_in_schema=False)\nasync def 
redoc_html():\n    \"\"\"自定义 ReDoc，使用 unpkg.com CDN\"\"\"\n    return get_redoc_html(\n        openapi_url=app.openapi_url,\n        title=app.title + \" - ReDoc\",\n        redoc_js_url=\"https://unpkg.com/redoc@2/bundles/redoc.standalone.js\",\n        redoc_favicon_url=\"https://fastapi.tiangolo.com/img/favicon.png\",\n    )\n\n\n# Chrome DevTools 配置文件（避免 404 日志）\n@app.get(\"/.well-known/appspecific/com.chrome.devtools.json\")\nasync def chrome_devtools_config():\n    \"\"\"Chrome DevTools 配置文件\"\"\"\n    return {}\n\n\n# 注册 API 路由\napp.include_router(api_router, prefix=settings.API_V1_PREFIX)\n\n\nif __name__ == \"__main__\":\n    import uvicorn\n    uvicorn.run(\n        \"app.main:app\",\n        host=settings.HOST,\n        port=settings.PORT,\n        reload=settings.DEBUG,\n    )\n"
  },
  {
    "path": "backend/app/models/__init__.py",
    "content": "\"\"\"\n数据模型模块\n\"\"\"\nfrom .database import Base, get_async_session, get_sync_session, init_db\nfrom .news import News\nfrom .stock import Stock\nfrom .analysis import Analysis\nfrom .crawl_task import CrawlTask, CrawlMode, TaskStatus\nfrom .debate_history import DebateHistory\n\n__all__ = [\n    \"Base\",\n    \"get_async_session\",\n    \"get_sync_session\",\n    \"init_db\",\n    \"News\",\n    \"Stock\",\n    \"Analysis\",\n    \"CrawlTask\",\n    \"CrawlMode\",\n    \"TaskStatus\",\n    \"DebateHistory\",\n]\n\n"
  },
  {
    "path": "backend/app/models/analysis.py",
    "content": "\"\"\"\n分析结果数据模型\n\"\"\"\nfrom datetime import datetime\nfrom sqlalchemy import Column, Integer, String, Text, DateTime, Float, ForeignKey, JSON\nfrom sqlalchemy.orm import relationship\n\nfrom .database import Base\n\n\nclass Analysis(Base):\n    \"\"\"智能体分析结果表\"\"\"\n    \n    __tablename__ = \"analyses\"\n    \n    # 主键\n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    \n    # 关联新闻\n    news_id = Column(Integer, ForeignKey(\"news.id\", ondelete=\"CASCADE\"), nullable=False, index=True)\n    \n    # 智能体信息\n    agent_name = Column(String(100), nullable=False, comment=\"执行分析的智能体名称\")\n    agent_role = Column(String(100), nullable=True, comment=\"智能体角色\")\n    \n    # 分析结果\n    analysis_result = Column(Text, nullable=False, comment=\"分析结果（完整文本）\")\n    summary = Column(Text, nullable=True, comment=\"分析摘要\")\n    \n    # 情感分析\n    sentiment = Column(String(20), nullable=True, comment=\"情感倾向（positive, negative, neutral）\")\n    sentiment_score = Column(Float, nullable=True, comment=\"情感评分（-1到1）\")\n    confidence = Column(Float, nullable=True, comment=\"置信度（0到1）\")\n    \n    # 结构化数据\n    structured_data = Column(JSON, nullable=True, comment=\"结构化分析数据（JSON格式）\")\n    \n    # 元数据\n    execution_time = Column(Float, nullable=True, comment=\"执行时间（秒）\")\n    llm_model = Column(String(100), nullable=True, comment=\"使用的LLM模型\")\n    tokens_used = Column(Integer, nullable=True, comment=\"消耗的Token数\")\n    \n    # 时间戳\n    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)\n    \n    # 关系\n    news = relationship(\"News\", back_populates=\"analyses\")\n    \n    def __repr__(self):\n        return f\"<Analysis(id={self.id}, news_id={self.news_id}, agent='{self.agent_name}')>\"\n    \n    def to_dict(self):\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"id\": self.id,\n            \"news_id\": self.news_id,\n            \"agent_name\": self.agent_name,\n            \"agent_role\": 
self.agent_role,\n            \"analysis_result\": self.analysis_result,\n            \"summary\": self.summary,\n            \"sentiment\": self.sentiment,\n            \"sentiment_score\": self.sentiment_score,\n            \"confidence\": self.confidence,\n            \"structured_data\": self.structured_data,\n            \"execution_time\": self.execution_time,\n            \"llm_model\": self.llm_model,\n            \"tokens_used\": self.tokens_used,\n            \"created_at\": self.created_at.isoformat() if self.created_at else None,\n        }\n\n"
  },
  {
    "path": "backend/app/models/crawl_task.py",
    "content": "\"\"\"\n爬取任务数据模型\n\"\"\"\nfrom datetime import datetime\nfrom typing import Optional\nfrom sqlalchemy import Column, Integer, String, DateTime, JSON, Float\nfrom enum import Enum\n\nfrom .database import Base\n\n\nclass CrawlMode(str, Enum):\n    \"\"\"爬取模式枚举\"\"\"\n    COLD_START = \"cold_start\"      # 冷启动（批量历史）\n    REALTIME = \"realtime\"           # 实时监控\n    TARGETED = \"targeted\"           # 定向分析\n\n\nclass TaskStatus(str, Enum):\n    \"\"\"任务状态枚举\"\"\"\n    PENDING = \"pending\"             # 待执行\n    RUNNING = \"running\"             # 执行中\n    COMPLETED = \"completed\"         # 已完成\n    FAILED = \"failed\"               # 失败\n    CANCELLED = \"cancelled\"         # 已取消\n\n\nclass CrawlTask(Base):\n    \"\"\"爬取任务表\"\"\"\n    \n    __tablename__ = \"crawl_tasks\"\n    \n    # 主键\n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    \n    # 任务信息\n    celery_task_id = Column(String(255), unique=True, nullable=True, index=True, comment=\"Celery任务ID\")\n    mode = Column(String(20), nullable=False, index=True, comment=\"爬取模式\")\n    status = Column(String(20), nullable=False, default=TaskStatus.PENDING, index=True, comment=\"任务状态\")\n    \n    # 任务配置\n    source = Column(String(100), nullable=False, comment=\"新闻源\")\n    config = Column(JSON, nullable=True, comment=\"任务配置（JSON）\")\n    \n    # 执行进度\n    progress = Column(JSON, nullable=True, comment=\"进度信息\")\n    current_page = Column(Integer, nullable=True, comment=\"当前页码\")\n    total_pages = Column(Integer, nullable=True, comment=\"总页数\")\n    \n    # 执行结果\n    result = Column(JSON, nullable=True, comment=\"结果统计（JSON）\")\n    crawled_count = Column(Integer, default=0, comment=\"爬取到的新闻数\")\n    saved_count = Column(Integer, default=0, comment=\"保存到数据库的新闻数\")\n    error_message = Column(String(1000), nullable=True, comment=\"错误信息\")\n    \n    # 性能指标\n    execution_time = Column(Float, nullable=True, comment=\"执行时间（秒）\")\n    \n    # 时间戳\n    created_at = 
Column(DateTime, default=datetime.utcnow, nullable=False, comment=\"创建时间\")\n    started_at = Column(DateTime, nullable=True, comment=\"开始时间\")\n    completed_at = Column(DateTime, nullable=True, comment=\"完成时间\")\n    \n    def __repr__(self):\n        return f\"<CrawlTask(id={self.id}, mode='{self.mode}', source='{self.source}', status='{self.status}')>\"\n    \n    def to_dict(self):\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"id\": self.id,\n            \"celery_task_id\": self.celery_task_id,\n            \"mode\": self.mode,\n            \"status\": self.status,\n            \"source\": self.source,\n            \"config\": self.config,\n            \"progress\": self.progress,\n            \"current_page\": self.current_page,\n            \"total_pages\": self.total_pages,\n            \"result\": self.result,\n            \"crawled_count\": self.crawled_count,\n            \"saved_count\": self.saved_count,\n            \"error_message\": self.error_message,\n            \"execution_time\": self.execution_time,\n            \"created_at\": self.created_at.isoformat() if self.created_at else None,\n            \"started_at\": self.started_at.isoformat() if self.started_at else None,\n            \"completed_at\": self.completed_at.isoformat() if self.completed_at else None,\n        }\n\n"
  },
  {
    "path": "backend/app/models/database.py",
    "content": "\"\"\"\n数据库连接和会话管理\n\"\"\"\nfrom typing import AsyncGenerator, Generator\nfrom sqlalchemy import create_engine\nfrom sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker\nfrom sqlalchemy.orm import sessionmaker, declarative_base, Session\n\nfrom ..core.config import settings\n\n# 声明基类\nBase = declarative_base()\n\n# 异步引擎（用于应用运行时）\nasync_engine = create_async_engine(\n    settings.DATABASE_URL,\n    echo=settings.DEBUG,\n    pool_pre_ping=True,\n    pool_size=10,\n    max_overflow=20,\n)\n\n# 异步会话工厂\nAsyncSessionLocal = async_sessionmaker(\n    bind=async_engine,\n    class_=AsyncSession,\n    expire_on_commit=False,\n    autocommit=False,\n    autoflush=False,\n)\n\n# 同步引擎（用于数据库初始化）\nsync_engine = create_engine(\n    settings.SYNC_DATABASE_URL,\n    echo=settings.DEBUG,\n    pool_pre_ping=True,\n)\n\n# 同步会话工厂\nSyncSessionLocal = sessionmaker(\n    bind=sync_engine,\n    autocommit=False,\n    autoflush=False,\n)\n\n\nasync def get_async_session() -> AsyncGenerator[AsyncSession, None]:\n    \"\"\"\n    异步数据库会话依赖注入\n    \n    Yields:\n        AsyncSession: 数据库会话\n    \"\"\"\n    async with AsyncSessionLocal() as session:\n        try:\n            yield session\n            await session.commit()\n        except Exception:\n            await session.rollback()\n            raise\n        finally:\n            await session.close()\n\n\ndef get_sync_session() -> Generator[Session, None, None]:\n    \"\"\"\n    同步数据库会话（用于初始化脚本，生成器形式）\n    \n    Yields:\n        Session: 数据库会话\n    \"\"\"\n    session = SyncSessionLocal()\n    try:\n        yield session\n        session.commit()\n    except Exception:\n        session.rollback()\n        raise\n    finally:\n        session.close()\n\n\ndef init_db():\n    \"\"\"\n    初始化数据库表\n    在首次运行或重置数据库时调用\n    \"\"\"\n    # 导入所有模型，确保它们注册到 Base.metadata\n    from .news import News\n    from .stock import Stock\n    from .analysis import Analysis\n    from .crawl_task import CrawlTask\n    from .debate_history import DebateHistory\n    \n    print(\"Creating database tables...\")\n    Base.metadata.create_all(bind=sync_engine)\n
    print(\"Database tables created successfully!\")\n\n\nif __name__ == \"__main__\":\n    # 直接运行此文件以初始化数据库\n    init_db()\n\n"
  },
  {
    "path": "backend/app/models/debate_history.py",
    "content": "\"\"\"\n辩论历史数据模型\n\"\"\"\nfrom datetime import datetime\nfrom typing import List, Optional\nfrom sqlalchemy import Column, Integer, String, Text, DateTime, JSON, Index\n\nfrom .database import Base\n\n\nclass DebateHistory(Base):\n    \"\"\"辩论历史表模型\"\"\"\n    \n    __tablename__ = \"debate_histories\"\n    \n    # 主键\n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    \n    # 会话标识\n    session_id = Column(String(100), unique=True, nullable=False, index=True, comment=\"会话ID\")\n    \n    # 股票信息\n    stock_code = Column(String(20), nullable=False, index=True, comment=\"股票代码\")\n    stock_name = Column(String(100), nullable=True, comment=\"股票名称\")\n    \n    # 辩论模式\n    mode = Column(String(50), nullable=True, comment=\"辩论模式(parallel/realtime_debate/quick_analysis)\")\n    \n    # 聊天消息（JSON数组）\n    messages = Column(JSON, nullable=False, default=list, comment=\"聊天消息数组\")\n    \n    # 时间信息\n    created_at = Column(DateTime, default=datetime.utcnow, nullable=False, comment=\"创建时间\")\n    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, comment=\"更新时间\")\n    \n    # 索引\n    __table_args__ = (\n        # 按股票+时间查询\n        Index('idx_debate_stock_updated', 'stock_code', 'updated_at'),\n    )\n    \n    def __repr__(self):\n        return f\"<DebateHistory(id={self.id}, stock_code='{self.stock_code}', session_id='{self.session_id}')>\"\n    \n    def to_dict(self):\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"id\": self.id,\n            \"session_id\": self.session_id,\n            \"stock_code\": self.stock_code,\n            \"stock_name\": self.stock_name,\n            \"mode\": self.mode,\n            \"messages\": self.messages,\n            \"created_at\": self.created_at.isoformat() if self.created_at else None,\n            \"updated_at\": self.updated_at.isoformat() if self.updated_at else None,\n        }\n\n"
  },
  {
    "path": "backend/app/models/news.py",
    "content": "\"\"\"\n新闻数据模型 - Phase 2 索引优化\n\"\"\"\nfrom datetime import datetime\nfrom typing import List, Optional\nfrom sqlalchemy import Column, Integer, String, Text, DateTime, Float, ARRAY, Index\nfrom sqlalchemy.orm import relationship\n\nfrom .database import Base\n\n\nclass News(Base):\n    \"\"\"新闻表模型 - Phase 2 优化版\"\"\"\n    \n    __tablename__ = \"news\"\n    \n    # 主键\n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    \n    # 基本信息\n    title = Column(String(500), nullable=False, index=True, comment=\"新闻标题\")\n    content = Column(Text, nullable=False, comment=\"新闻正文（解析后）\")\n    raw_html = Column(Text, nullable=True, comment=\"原始HTML内容\")\n    url = Column(String(1000), unique=True, nullable=False, index=True, comment=\"新闻URL\")\n    source = Column(String(100), nullable=False, index=True, comment=\"新闻来源（sina, jrj, cnstock等）\")\n    \n    # 时间信息\n    publish_time = Column(DateTime, nullable=True, index=True, comment=\"发布时间\")\n    created_at = Column(DateTime, default=datetime.utcnow, nullable=False, comment=\"爬取时间\")\n    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, comment=\"更新时间\")\n    \n    # 关联股票\n    stock_codes = Column(ARRAY(String), nullable=True, comment=\"关联的股票代码列表\")\n    \n    # 情感分析\n    sentiment_score = Column(Float, nullable=True, comment=\"情感评分（-1到1，负面到正面）\")\n    \n    # 其他元数据\n    author = Column(String(200), nullable=True, comment=\"作者\")\n    keywords = Column(ARRAY(String), nullable=True, comment=\"关键词\")\n    \n    # 向量化标识\n    is_embedded = Column(Integer, default=0, comment=\"是否已向量化（0:否, 1:是）\")\n    \n    # 关系\n    analyses = relationship(\"Analysis\", back_populates=\"news\", cascade=\"all, delete-orphan\")\n    \n    # Phase 2: 复合索引优化（提升常见查询性能）\n    __table_args__ = (\n        # 按来源+时间查询（最常用）\n        Index('idx_source_publish_time', 'source', 'publish_time'),\n        # 按情感+时间筛选\n        Index('idx_sentiment_publish_time', 'sentiment_score', 
'publish_time'),\n    )\n    \n    def __repr__(self):\n        return f\"<News(id={self.id}, title='{self.title[:30]}...', source='{self.source}')>\"\n    \n    def to_dict(self, include_html: bool = False):\n        \"\"\"转换为字典\"\"\"\n        result = {\n            \"id\": self.id,\n            \"title\": self.title,\n            \"content\": self.content,\n            \"url\": self.url,\n            \"source\": self.source,\n            \"publish_time\": self.publish_time.isoformat() if self.publish_time else None,\n            \"created_at\": self.created_at.isoformat() if self.created_at else None,\n            \"stock_codes\": self.stock_codes,\n            \"sentiment_score\": self.sentiment_score,\n            \"author\": self.author,\n            \"keywords\": self.keywords,\n            \"has_raw_html\": self.raw_html is not None and len(self.raw_html or '') > 0,\n        }\n        if include_html and self.raw_html:\n            result[\"raw_html\"] = self.raw_html\n        return result\n\n"
  },
  {
    "path": "backend/app/models/stock.py",
    "content": "\"\"\"\n股票数据模型\n\"\"\"\nfrom datetime import datetime\nfrom sqlalchemy import Column, Integer, String, DateTime, Float\n\nfrom .database import Base\n\n\nclass Stock(Base):\n    \"\"\"股票基本信息表\"\"\"\n    \n    __tablename__ = \"stocks\"\n    \n    # 主键\n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    \n    # 股票基本信息\n    code = Column(String(20), unique=True, nullable=False, index=True, comment=\"股票代码（如：600519）\")\n    name = Column(String(100), nullable=False, comment=\"股票名称（如：贵州茅台）\")\n    full_code = Column(String(20), nullable=True, comment=\"完整代码（如：SH600519）\")\n    \n    # 分类信息\n    industry = Column(String(100), nullable=True, comment=\"所属行业\")\n    market = Column(String(20), nullable=True, comment=\"所属市场（SH:上海, SZ:深圳）\")\n    area = Column(String(50), nullable=True, comment=\"所属地区\")\n    \n    # 财务指标（可选，后续扩展）\n    pe_ratio = Column(Float, nullable=True, comment=\"市盈率\")\n    market_cap = Column(Float, nullable=True, comment=\"总市值\")\n    \n    # 状态\n    status = Column(String(20), default=\"active\", comment=\"状态（active, suspended, delisted）\")\n    \n    # 时间戳\n    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)\n    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)\n    \n    def __repr__(self):\n        return f\"<Stock(code='{self.code}', name='{self.name}')>\"\n    \n    def to_dict(self):\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"id\": self.id,\n            \"code\": self.code,\n            \"name\": self.name,\n            \"full_code\": self.full_code,\n            \"industry\": self.industry,\n            \"market\": self.market,\n            \"area\": self.area,\n            \"pe_ratio\": self.pe_ratio,\n            \"market_cap\": self.market_cap,\n            \"status\": self.status,\n            \"created_at\": self.created_at.isoformat() if self.created_at else None,\n            \"updated_at\": self.updated_at.isoformat() if 
self.updated_at else None,\n        }\n\n"
  },
  {
    "path": "backend/app/scripts/init_stocks.py",
    "content": "\"\"\"\n初始化股票数据脚本\n从 akshare 获取全部 A 股信息并存入 PostgreSQL\n\n使用方法:\n    cd backend\n    python -m app.scripts.init_stocks\n\"\"\"\nimport asyncio\nimport logging\nimport os\nfrom datetime import datetime\nfrom pathlib import Path\n\n# ⚠️ 禁用代理（akshare 需要直连国内网站）\nfor proxy_var in ['http_proxy', 'https_proxy', 'HTTP_PROXY', 'HTTPS_PROXY', 'all_proxy', 'ALL_PROXY']:\n    os.environ.pop(proxy_var, None)\n\n# 设置日志\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s - %(levelname)s - %(message)s'\n)\nlogger = logging.getLogger(__name__)\n\n# 加载 .env\nfrom dotenv import load_dotenv\nenv_path = Path(__file__).parent.parent.parent / \".env\"\nload_dotenv(env_path)\nlogger.info(f\"Loaded .env from: {env_path}\")\n\n# 构建数据库 URL\nDATABASE_URL = os.getenv(\"DATABASE_URL\", \"\")\n\nif not DATABASE_URL:\n    # 从分开的变量构建 DATABASE_URL\n    pg_user = os.getenv(\"POSTGRES_USER\", \"finnews\")\n    pg_password = os.getenv(\"POSTGRES_PASSWORD\", \"finnews_dev_password\")\n    pg_host = os.getenv(\"POSTGRES_HOST\", \"localhost\")\n    pg_port = os.getenv(\"POSTGRES_PORT\", \"5432\")\n    pg_db = os.getenv(\"POSTGRES_DB\", \"finnews_db\")\n    \n    DATABASE_URL = f\"postgresql+asyncpg://{pg_user}:{pg_password}@{pg_host}:{pg_port}/{pg_db}\"\n    logger.info(f\"Built DATABASE_URL from individual variables\")\n\nelif DATABASE_URL.startswith(\"postgresql://\"):\n    DATABASE_URL = DATABASE_URL.replace(\"postgresql://\", \"postgresql+asyncpg://\", 1)\n\nlogger.info(f\"Database: {DATABASE_URL.split('@')[-1] if '@' in DATABASE_URL else DATABASE_URL[:30]}...\")\n\n# 导入依赖\ntry:\n    import akshare as ak\n    import pandas as pd\n    AKSHARE_AVAILABLE = True\n    logger.info(\"akshare loaded successfully\")\nexcept ImportError:\n    AKSHARE_AVAILABLE = False\n    logger.error(\"akshare not installed! 
Run: pip install akshare\")\n    exit(1)\n\nfrom sqlalchemy import Column, Integer, String, DateTime, Float, text\nfrom sqlalchemy.ext.asyncio import create_async_engine, AsyncSession\nfrom sqlalchemy.orm import sessionmaker, declarative_base\n\nBase = declarative_base()\n\n\nclass Stock(Base):\n    \"\"\"股票基本信息表\"\"\"\n    __tablename__ = \"stocks\"\n    \n    id = Column(Integer, primary_key=True, index=True, autoincrement=True)\n    code = Column(String(20), unique=True, nullable=False, index=True)\n    name = Column(String(100), nullable=False)\n    full_code = Column(String(20), nullable=True)\n    industry = Column(String(100), nullable=True)\n    market = Column(String(20), nullable=True)\n    area = Column(String(50), nullable=True)\n    pe_ratio = Column(Float, nullable=True)\n    market_cap = Column(Float, nullable=True)\n    status = Column(String(20), default=\"active\")\n    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)\n    updated_at = Column(DateTime, default=datetime.utcnow)\n\n\ndef get_fallback_stocks() -> list:\n    \"\"\"备用股票列表（如果 akshare 失败时使用）\"\"\"\n    return [\n        {\"code\": \"600519\", \"name\": \"贵州茅台\", \"full_code\": \"SH600519\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000001\", \"name\": \"平安银行\", \"full_code\": \"SZ000001\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"601318\", \"name\": \"中国平安\", \"full_code\": \"SH601318\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000858\", \"name\": \"五粮液\", \"full_code\": \"SZ000858\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"002594\", \"name\": \"比亚迪\", \"full_code\": \"SZ002594\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600036\", \"name\": \"招商银行\", \"full_code\": \"SH600036\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"601166\", \"name\": \"兴业银行\", \"full_code\": \"SH601166\", \"market\": \"SH\", \"status\": 
\"active\"},\n        {\"code\": \"000333\", \"name\": \"美的集团\", \"full_code\": \"SZ000333\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"002415\", \"name\": \"海康威视\", \"full_code\": \"SZ002415\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600276\", \"name\": \"恒瑞医药\", \"full_code\": \"SH600276\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000002\", \"name\": \"万科A\", \"full_code\": \"SZ000002\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600887\", \"name\": \"伊利股份\", \"full_code\": \"SH600887\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000725\", \"name\": \"京东方A\", \"full_code\": \"SZ000725\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600000\", \"name\": \"浦发银行\", \"full_code\": \"SH600000\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000063\", \"name\": \"中兴通讯\", \"full_code\": \"SZ000063\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600104\", \"name\": \"上汽集团\", \"full_code\": \"SH600104\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"002304\", \"name\": \"洋河股份\", \"full_code\": \"SZ002304\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600585\", \"name\": \"海螺水泥\", \"full_code\": \"SH600585\", \"market\": \"SH\", \"status\": \"active\"},\n        {\"code\": \"000876\", \"name\": \"新希望\", \"full_code\": \"SZ000876\", \"market\": \"SZ\", \"status\": \"active\"},\n        {\"code\": \"600309\", \"name\": \"万华化学\", \"full_code\": \"SH600309\", \"market\": \"SH\", \"status\": \"active\"},\n    ]\n\n\nasync def fetch_all_stocks() -> list:\n    \"\"\"从 akshare 获取全部 A 股信息\"\"\"\n    logger.info(\"Fetching all A-share stocks from akshare...\")\n    \n    # 设置 requests 不使用代理\n    import requests\n    session = requests.Session()\n    session.proxies = {\n        'http': None,\n        'https': None,\n    }\n    \n    # 设置 User-Agent\n    
session.headers.update({\n        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'\n    })\n    \n    max_retries = 3\n    for attempt in range(max_retries):\n        try:\n            logger.info(f\"Attempt {attempt + 1}/{max_retries}...\")\n            \n            # 方法1: 尝试使用 stock_zh_a_spot_em\n            try:\n                df = ak.stock_zh_a_spot_em()\n            except Exception as e1:\n                logger.warning(f\"Method 1 failed: {e1}\")\n                # 方法2: 尝试使用 stock_info_a_code_name\n                try:\n                    logger.info(\"Trying alternative method: stock_info_a_code_name...\")\n                    df = ak.stock_info_a_code_name()\n                    if df is not None and not df.empty:\n                        # 重命名列\n                        df.columns = ['代码', '名称']\n                except Exception as e2:\n                    logger.warning(f\"Method 2 failed: {e2}\")\n                    raise e1  # 抛出第一个错误\n            \n            if df is None or df.empty:\n                logger.error(\"No data returned from akshare\")\n                if attempt < max_retries - 1:\n                    await asyncio.sleep(2)  # 等待2秒后重试\n                    continue\n                return []\n            \n            logger.info(f\"✅ Fetched {len(df)} stocks from akshare\")\n            \n            stocks = []\n            for _, row in df.iterrows():\n                code = str(row['代码'])\n                name = str(row['名称'])\n                \n                # 跳过异常数据\n                if not code or not name or name in ['N/A', 'nan', '']:\n                    continue\n                \n                # 确定市场前缀\n                if code.startswith('6'):\n                    market = \"SH\"\n                    full_code = f\"SH{code}\"\n                elif code.startswith('0') or code.startswith('3'):\n                    market = \"SZ\"\n                    full_code = f\"SZ{code}\"\n               
 else:\n                    market = \"OTHER\"\n                    full_code = code\n                \n                stocks.append({\n                    \"code\": code,\n                    \"name\": name,\n                    \"full_code\": full_code,\n                    \"market\": market,\n                    \"status\": \"active\",\n                })\n            \n            return stocks\n            \n        except Exception as e:\n            logger.error(f\"Attempt {attempt + 1} failed: {e}\")\n            if attempt < max_retries - 1:\n                wait_time = (attempt + 1) * 2\n                logger.info(f\"Waiting {wait_time} seconds before retry...\")\n                await asyncio.sleep(wait_time)\n            else:\n                logger.error(\"All attempts failed!\")\n                import traceback\n                traceback.print_exc()\n                return []\n    \n    return []\n\n\nasync def init_stocks_to_db():\n    \"\"\"初始化股票数据到数据库\"\"\"\n    # 创建数据库引擎\n    engine = create_async_engine(DATABASE_URL, echo=False)\n    async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)\n    \n    # 确保表存在\n    async with engine.begin() as conn:\n        await conn.run_sync(Base.metadata.create_all)\n    \n    # 获取股票数据\n    stocks_data = await fetch_all_stocks()\n    \n    if not stocks_data:\n        logger.warning(\"⚠️  Failed to fetch from akshare, using fallback stock list...\")\n        # 备用方案：导入常用股票\n        stocks_data = get_fallback_stocks()\n        if not stocks_data:\n            logger.error(\"No stocks to insert\")\n            await engine.dispose()\n            return\n        logger.info(f\"Using {len(stocks_data)} fallback stocks\")\n    \n    async with async_session() as session:\n        try:\n            # 清空现有数据\n            logger.info(\"Clearing existing stock data...\")\n            await session.execute(text(\"DELETE FROM stocks\"))\n            await session.commit()\n            \n       
     # 批量插入\n            logger.info(f\"Inserting {len(stocks_data)} stocks...\")\n            \n            batch_size = 500\n            for i in range(0, len(stocks_data), batch_size):\n                batch = stocks_data[i:i + batch_size]\n                for stock_data in batch:\n                    stock = Stock(\n                        code=stock_data[\"code\"],\n                        name=stock_data[\"name\"],\n                        full_code=stock_data[\"full_code\"],\n                        market=stock_data[\"market\"],\n                        status=stock_data[\"status\"],\n                        created_at=datetime.utcnow(),\n                        updated_at=datetime.utcnow(),\n                    )\n                    session.add(stock)\n                \n                await session.commit()\n                logger.info(f\"Inserted batch {i // batch_size + 1}, total: {min(i + batch_size, len(stocks_data))}/{len(stocks_data)}\")\n            \n            logger.info(f\"✅ Successfully initialized {len(stocks_data)} stocks!\")\n            \n        except Exception as e:\n            logger.error(f\"Failed to insert stocks: {e}\")\n            import traceback\n            traceback.print_exc()\n            await session.rollback()\n        finally:\n            await engine.dispose()\n\n\nasync def get_stock_count():\n    \"\"\"获取数据库中股票数量\"\"\"\n    engine = create_async_engine(DATABASE_URL, echo=False)\n    async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)\n    \n    async with async_session() as session:\n        result = await session.execute(text(\"SELECT COUNT(*) FROM stocks\"))\n        count = result.scalar() or 0\n        logger.info(f\"Current stock count in database: {count}\")\n        await engine.dispose()\n        return count\n\n\nasync def main():\n    print(\"=\" * 60)\n    print(\"🚀 Stock Data Initialization Script\")\n    print(\"=\" * 60)\n    \n    # 检查当前数量\n    try:\n        await 
get_stock_count()\n    except Exception as e:\n        logger.warning(f\"Could not get current count (table may not exist): {e}\")\n    \n    # 执行初始化\n    print(\"\\n📥 Starting initialization...\")\n    await init_stocks_to_db()\n    \n    # 再次检查\n    print(\"\\n📊 After initialization:\")\n    await get_stock_count()\n    \n    print(\"\\n✅ Done!\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n"
  },
  {
    "path": "backend/app/services/__init__.py",
    "content": "\"\"\"\n服务模块\n\"\"\"\nfrom .llm_service import get_llm_provider, get_llm_service, LLMService\nfrom .embedding_service import get_embedding_service, EmbeddingService\nfrom .analysis_service import get_analysis_service, AnalysisService\n\n__all__ = [\n    \"get_llm_provider\",\n    \"get_llm_service\",\n    \"LLMService\",\n    \"get_embedding_service\",\n    \"EmbeddingService\",\n    \"get_analysis_service\",\n    \"AnalysisService\",\n]\n\n"
  },
  {
    "path": "backend/app/services/analysis_service.py",
    "content": "\"\"\"\n新闻分析服务\n协调智能体执行分析任务\n\"\"\"\nimport logging\nimport time\nfrom typing import Dict, Any, Optional\nfrom sqlalchemy.ext.asyncio import AsyncSession\nfrom sqlalchemy import select\nfrom starlette.concurrency import run_in_threadpool\nfrom ..models.database import AsyncSessionLocal\n\nfrom ..agents import create_news_analyst\nfrom ..models.news import News\nfrom ..models.analysis import Analysis\nfrom ..services.embedding_service import get_embedding_service\nfrom ..storage.vector_storage import get_vector_storage\n\nlogger = logging.getLogger(__name__)\n\n\nclass AnalysisService:\n    \"\"\"\n    新闻分析服务\n    负责协调智能体执行新闻分析任务\n    \"\"\"\n    \n    def __init__(self):\n        \"\"\"初始化分析服务\"\"\"\n        self.news_analyst = create_news_analyst()\n        self.embedding_service = get_embedding_service()\n        self.vector_storage = get_vector_storage()\n        logger.info(\"Initialized AnalysisService\")\n    \n    async def analyze_news(\n        self,\n        news_id: int,\n        db: AsyncSession,\n        llm_provider: Optional[str] = None,\n        llm_model: Optional[str] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        分析指定新闻\n        \n        Args:\n            news_id: 新闻ID\n            db: 数据库会话\n            llm_provider: 模型厂商（可选：bailian, openai, deepseek, kimi）\n            llm_model: 模型名称（可选）\n            \n        Returns:\n            分析结果\n        \"\"\"\n        start_time = time.time()\n        \n        # 如果指定了自定义模型，创建临时的智能体\n        if llm_provider and llm_model:\n            from ..services.llm_service import create_custom_llm_provider\n            from ..agents.news_analyst import NewsAnalystAgent\n            \n            logger.info(f\"Using custom model: {llm_provider}/{llm_model}\")\n            custom_llm = create_custom_llm_provider(llm_provider, llm_model)\n            analyst = NewsAnalystAgent(llm_provider=custom_llm)\n        else:\n            analyst = self.news_analyst\n        \n        try:\n    
        # 1. 查询新闻\n            result = await db.execute(\n                select(News).where(News.id == news_id)\n            )\n            news = result.scalar_one_or_none()\n            \n            if not news:\n                return {\n                    \"success\": False,\n                    \"error\": f\"News not found: {news_id}\"\n                }\n            \n            logger.info(f\"Analyzing news: {news_id} - {news.title}\")\n            \n            # 2. 执行智能体分析\n            # 注意：由于 agent.analyze_news 是同步方法，需要在线程池中运行以避免阻塞异步事件循环\n            analysis_result = await run_in_threadpool(\n                analyst.analyze_news,  # 使用 analyst（可能是自定义的或默认的）\n                news_title=news.title,\n                news_content=news.content,\n                news_url=news.url,\n                stock_codes=news.stock_codes or []\n            )\n            \n            if not analysis_result.get(\"success\"):\n                return analysis_result\n            \n            # 3. 
保存分析结果到数据库\n            structured_data = analysis_result.get(\"structured_data\", {})\n            \n            analysis = Analysis(\n                news_id=news_id,\n                agent_name=analysis_result.get(\"agent_name\"),\n                agent_role=analysis_result.get(\"agent_role\"),\n                analysis_result=analysis_result.get(\"analysis_result\", \"\"),\n                summary=structured_data.get(\"market_impact\", \"\")[:500],\n                sentiment=structured_data.get(\"sentiment\"),\n                sentiment_score=structured_data.get(\"sentiment_score\"),\n                confidence=structured_data.get(\"confidence\"),\n                structured_data=structured_data,\n                execution_time=time.time() - start_time,\n                llm_model=f\"{llm_provider}/{llm_model}\" if llm_provider and llm_model else (analyst._llm_provider.model if hasattr(analyst, '_llm_provider') and hasattr(analyst._llm_provider, 'model') else None),\n            )\n            \n            db.add(analysis)\n            \n            # 4. 更新新闻的情感评分\n            news.sentiment_score = structured_data.get(\"sentiment_score\")\n            \n            # 5. 
向量化新闻内容（如果尚未向量化）\n            # 注意：embedding是可选功能，失败不应影响分析结果\n            # 在后台异步执行，不阻塞分析流程\n            if not news.is_embedded:\n                # 使用 asyncio.create_task 在后台执行，不等待结果\n                # 这样即使embedding超时或失败，也不会影响分析结果的返回\n                import asyncio\n                \n                async def vectorize_in_background():\n                    try:\n                        # 组合标题和内容进行向量化\n                        text_to_embed = f\"{news.title}\\n{news.content[:1000]}\"\n                        \n                        # 使用异步方法，避免事件循环问题\n                        embedding = await asyncio.wait_for(\n                            self.embedding_service.aembed_text(text_to_embed),\n                            timeout=20.0  # 20秒超时，避免等待太久\n                        )\n                        \n                        # 存储到 Milvus（也在线程池中执行）\n                        await run_in_threadpool(\n                            self.vector_storage.store_embedding,\n                            news_id=news_id,\n                            embedding=embedding,\n                            text=text_to_embed\n                        )\n                        \n                        # 更新数据库中的is_embedded标志（需要新的数据库会话）\n                        async with AsyncSessionLocal() as update_db:\n                            try:\n                                result = await update_db.execute(\n                                    select(News).where(News.id == news_id)\n                                )\n                                update_news = result.scalar_one_or_none()\n                                if update_news:\n                                    update_news.is_embedded = 1\n                                    await update_db.commit()\n                                    logger.info(f\"Vectorized news: {news_id}\")\n                            except Exception as e:\n                                logger.warning(f\"Failed to update is_embedded flag for news 
{news_id}: {e}\")\n                                await update_db.rollback()\n                    except asyncio.TimeoutError:\n                        logger.warning(f\"Embedding timeout for news {news_id} (20s), skipping vectorization\")\n                    except Exception as e:\n                        logger.warning(f\"Failed to vectorize news {news_id}: {e}\")\n                \n                # 在后台执行，不等待完成\n                asyncio.create_task(vectorize_in_background())\n            \n            await db.commit()\n            await db.refresh(analysis)\n            \n            logger.info(f\"Analysis completed for news {news_id}, execution time: {analysis.execution_time:.2f}s\")\n            \n            return {\n                \"success\": True,\n                \"analysis_id\": analysis.id,\n                \"news_id\": news_id,\n                \"sentiment\": analysis.sentiment,\n                \"sentiment_score\": analysis.sentiment_score,\n                \"confidence\": analysis.confidence,\n                \"summary\": analysis.summary,\n                \"execution_time\": analysis.execution_time,\n            }\n        \n        except Exception as e:\n            logger.error(f\"Analysis failed for news {news_id}: {e}\")\n            await db.rollback()\n            return {\n                \"success\": False,\n                \"error\": str(e)\n            }\n    \n    async def get_analysis_by_id(\n        self,\n        analysis_id: int,\n        db: AsyncSession\n    ) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        获取分析结果\n        \n        Args:\n            analysis_id: 分析ID\n            db: 数据库会话\n            \n        Returns:\n            分析结果或None\n        \"\"\"\n        try:\n            result = await db.execute(\n                select(Analysis).where(Analysis.id == analysis_id)\n            )\n            analysis = result.scalar_one_or_none()\n            \n            if analysis:\n                return 
analysis.to_dict()\n            return None\n        \n        except Exception as e:\n            logger.error(f\"Failed to get analysis {analysis_id}: {e}\")\n            return None\n    \n    async def get_analyses_by_news_id(\n        self,\n        news_id: int,\n        db: AsyncSession\n    ) -> list:\n        \"\"\"\n        获取指定新闻的所有分析结果（按时间倒序，最新的在前）\n        \n        Args:\n            news_id: 新闻ID\n            db: 数据库会话\n            \n        Returns:\n            分析结果列表（最新的在前）\n        \"\"\"\n        try:\n            from sqlalchemy import desc\n            \n            result = await db.execute(\n                select(Analysis)\n                .where(Analysis.news_id == news_id)\n                .order_by(desc(Analysis.created_at))  # 按创建时间倒序，最新的在前\n            )\n            analyses = result.scalars().all()\n            \n            return [analysis.to_dict() for analysis in analyses]\n        \n        except Exception as e:\n            logger.error(f\"Failed to get analyses for news {news_id}: {e}\")\n            return []\n\n\n# 全局实例\n_analysis_service: Optional[AnalysisService] = None\n\n\ndef get_analysis_service() -> AnalysisService:\n    \"\"\"\n    获取分析服务实例（单例模式）\n    \n    Returns:\n        AnalysisService 实例\n    \"\"\"\n    global _analysis_service\n    if _analysis_service is None:\n        _analysis_service = AnalysisService()\n    return _analysis_service\n\n"
  },
  {
    "path": "backend/app/services/embedding_service.py",
    "content": "\"\"\"\nEmbedding 服务封装\n使用 agenticx.embeddings.BailianEmbeddingProvider\n\"\"\"\nimport logging\nimport asyncio\nfrom typing import List, Optional\nimport redis\nimport hashlib\nimport json\n\nfrom ..core.config import settings\nfrom agenticx.embeddings import BailianEmbeddingProvider\n\nlogger = logging.getLogger(__name__)\n\n\nclass EmbeddingService:\n    \"\"\"\n    Embedding 服务封装类\n    基于 agenticx.embeddings.BailianEmbeddingProvider\n    提供文本向量化功能，支持缓存\n    \"\"\"\n    \n    def __init__(\n        self,\n        provider: str = None,\n        model: str = None,\n        batch_size: int = None,\n        enable_cache: bool = True,\n        base_url: str = None,\n    ):\n        \"\"\"\n        初始化 Embedding 服务\n        \n        Args:\n            provider: 提供商（保留参数以兼容，实际使用 bailian）\n            model: 模型名称\n            batch_size: 批处理大小\n            enable_cache: 是否启用Redis缓存\n            base_url: 自定义 API 端点（用于百炼等第三方服务）\n        \"\"\"\n        self.provider = provider or settings.EMBEDDING_PROVIDER\n        self.model = model or settings.EMBEDDING_MODEL\n        self.batch_size = batch_size or settings.EMBEDDING_BATCH_SIZE\n        self.enable_cache = enable_cache\n        self.base_url = base_url or settings.EMBEDDING_BASE_URL\n        \n        # 获取 API Key\n        api_key = settings.DASHSCOPE_API_KEY\n        if not api_key:\n            # 如果没有 DASHSCOPE_API_KEY，尝试使用 OPENAI_API_KEY（向后兼容）\n            api_key = settings.OPENAI_API_KEY\n            if not api_key:\n                raise ValueError(\"DASHSCOPE_API_KEY or OPENAI_API_KEY is required for embedding\")\n        \n        # 设置 API URL\n        api_url = self.base_url or settings.DASHSCOPE_BASE_URL or \"https://dashscope.aliyuncs.com/compatible-mode/v1\"\n        \n        # 初始化 agenticx BailianEmbeddingProvider\n        self.provider_instance = BailianEmbeddingProvider(\n            api_key=api_key,\n            model=self.model,\n            api_url=api_url,\n            
batch_size=self.batch_size,\n            timeout=settings.EMBEDDING_TIMEOUT,\n            retry_count=settings.EMBEDDING_MAX_RETRIES,\n            dimensions=settings.MILVUS_DIM,  # 确保维度匹配\n            use_dashscope_sdk=False  # 使用 HTTP API，避免 SDK 依赖问题\n        )\n        \n        logger.info(f\"Initialized BailianEmbeddingProvider: {self.model}, dimension={self.provider_instance.get_embedding_dim()}\")\n        \n        # 初始化Redis缓存\n        if self.enable_cache:\n            try:\n                self.redis_client = redis.from_url(settings.REDIS_URL)\n                self.cache_ttl = 86400 * 7  # 7天\n                logger.info(\"Redis cache enabled for embeddings\")\n            except Exception as e:\n                logger.warning(f\"Failed to connect to Redis, cache disabled: {e}\")\n                self.enable_cache = False\n    \n    def _get_cache_key(self, text: str) -> str:\n        \"\"\"生成缓存键\"\"\"\n        # 使用文本的MD5哈希和模型名称作为键\n        text_hash = hashlib.md5(text.encode()).hexdigest()\n        return f\"embedding:{self.model}:{text_hash}\"\n    \n    def _get_from_cache(self, text: str) -> Optional[List[float]]:\n        \"\"\"从缓存获取向量\"\"\"\n        if not self.enable_cache:\n            return None\n        \n        try:\n            cache_key = self._get_cache_key(text)\n            cached = self.redis_client.get(cache_key)\n            if cached:\n                return json.loads(cached)\n        except Exception as e:\n            logger.warning(f\"Failed to get from cache: {e}\")\n        \n        return None\n    \n    def _save_to_cache(self, text: str, embedding: List[float]):\n        \"\"\"保存向量到缓存\"\"\"\n        if not self.enable_cache:\n            return\n        \n        try:\n            cache_key = self._get_cache_key(text)\n            self.redis_client.setex(\n                cache_key,\n                self.cache_ttl,\n                json.dumps(embedding)\n            )\n        except Exception as e:\n            
logger.warning(f\"Failed to save to cache: {e}\")\n    \n    def embed_text(self, text: str) -> List[float]:\n        \"\"\"\n        将文本转换为向量\n        \n        Args:\n            text: 文本\n            \n        Returns:\n            向量（List[float]）\n        \"\"\"\n        # 检查缓存\n        cached = self._get_from_cache(text)\n        if cached is not None:\n            return cached\n        \n        # 限制文本长度（避免超过模型限制）\n        max_length = 6000\n        if len(text) > max_length:\n            logger.warning(f\"Text too long ({len(text)} chars), truncating to {max_length} chars\")\n            text = text[:max_length]\n        \n        # 生成向量（使用 agenticx provider）\n        # 注意：embed() 方法内部使用 asyncio.run()，在同步上下文中可以直接调用\n        # 如果在异步上下文中调用此同步方法，应该在 ThreadPoolExecutor 中运行\n        try:\n            # 直接调用 embed()，它内部会使用 asyncio.run() 创建新的事件循环\n            # 这在同步上下文中可以正常工作\n            # 如果在异步上下文中，调用者应该在 ThreadPoolExecutor 中运行此方法\n            embeddings = self.provider_instance.embed([text])\n            embedding = embeddings[0] if embeddings else []\n            \n            # 保存到缓存\n            self._save_to_cache(text, embedding)\n            \n            return embedding\n        \n        except Exception as e:\n            logger.error(f\"Embedding failed for text: {text[:100]}..., error: {e}\")\n            raise\n    \n    def embed_batch(self, texts: List[str]) -> List[List[float]]:\n        \"\"\"\n        批量将文本转换为向量\n        \n        Args:\n            texts: 文本列表\n            \n        Returns:\n            向量列表\n        \"\"\"\n        if not texts:\n            return []\n        \n        # 检查缓存并分离需要处理的文本\n        embeddings_map = {}  # {index: embedding}\n        texts_to_embed = []  # [(index, text), ...]\n        \n        max_length = 6000\n        for idx, text in enumerate(texts):\n            # 检查缓存\n            cached = self._get_from_cache(text)\n            if cached is not None:\n                embeddings_map[idx] = cached\n       
     else:\n                # 限制文本长度\n                if len(text) > max_length:\n                    logger.warning(f\"Text too long ({len(text)} chars), truncating to {max_length} chars\")\n                    text = text[:max_length]\n                texts_to_embed.append((idx, text))\n        \n        # 对未缓存的文本批量生成向量\n        # 注意：BailianEmbeddingProvider.embed() 内部已经会分批处理，不需要我们再次分批\n        if texts_to_embed:\n            try:\n                texts_list = [t[1] for t in texts_to_embed]\n                # 直接调用 embed()，它内部会使用 asyncio.run() 创建新的事件循环\n                # BailianEmbeddingProvider 内部会根据 batch_size 自动分批处理\n                new_embeddings = self.provider_instance.embed(texts_list)\n                \n                # 保存到缓存并添加到结果\n                for (idx, text), embedding in zip(texts_to_embed, new_embeddings):\n                    self._save_to_cache(text, embedding)\n                    embeddings_map[idx] = embedding\n            \n            except Exception as e:\n                logger.error(f\"Batch embedding failed: {e}\")\n                raise\n        \n        # 按原始顺序返回结果\n        return [embeddings_map.get(i, []) for i in range(len(texts))]\n    \n    async def aembed_text(self, text: str) -> List[float]:\n        \"\"\"\n        异步将文本转换为向量（推荐在异步上下文中使用）\n        \n        Args:\n            text: 文本\n            \n        Returns:\n            向量（List[float]）\n        \"\"\"\n        # 检查缓存\n        cached = self._get_from_cache(text)\n        if cached is not None:\n            return cached\n        \n        # 限制文本长度（避免超过模型限制）\n        max_length = 6000\n        if len(text) > max_length:\n            logger.warning(f\"Text too long ({len(text)} chars), truncating to {max_length} chars\")\n            text = text[:max_length]\n        \n        # 使用异步接口，避免 asyncio.run() 的问题\n        try:\n            embeddings = await self.provider_instance.aembed([text])\n            embedding = embeddings[0] if embeddings else []\n            \n     
       # 保存到缓存\n            self._save_to_cache(text, embedding)\n            \n            return embedding\n        \n        except Exception as e:\n            logger.error(f\"Embedding failed for text: {text[:100]}..., error: {e}\")\n            raise\n    \n    async def aembed_batch(self, texts: List[str]) -> List[List[float]]:\n        \"\"\"\n        异步批量将文本转换为向量（推荐在异步上下文中使用）\n        \n        Args:\n            texts: 文本列表\n            \n        Returns:\n            向量列表\n        \"\"\"\n        if not texts:\n            return []\n        \n        # 检查缓存并分离需要处理的文本\n        embeddings_map = {}  # {index: embedding}\n        texts_to_embed = []  # [(index, text), ...]\n        \n        max_length = 6000\n        for idx, text in enumerate(texts):\n            # 检查缓存\n            cached = self._get_from_cache(text)\n            if cached is not None:\n                embeddings_map[idx] = cached\n            else:\n                # 限制文本长度\n                if len(text) > max_length:\n                    logger.warning(f\"Text too long ({len(text)} chars), truncating to {max_length} chars\")\n                    text = text[:max_length]\n                texts_to_embed.append((idx, text))\n        \n        # 对未缓存的文本批量生成向量\n        # BailianEmbeddingProvider.aembed() 内部已经会分批处理\n        if texts_to_embed:\n            try:\n                texts_list = [t[1] for t in texts_to_embed]\n                # 使用异步接口，避免 asyncio.run() 的问题\n                new_embeddings = await self.provider_instance.aembed(texts_list)\n                \n                # 保存到缓存并添加到结果\n                for (idx, text), embedding in zip(texts_to_embed, new_embeddings):\n                    self._save_to_cache(text, embedding)\n                    embeddings_map[idx] = embedding\n            \n            except Exception as e:\n                logger.error(f\"Batch embedding failed: {e}\")\n                raise\n        \n        # 按原始顺序返回结果\n        return [embeddings_map.get(i, []) 
for i in range(len(texts))]\n\n\n# 全局实例\n_embedding_service: Optional[EmbeddingService] = None\n\n\ndef get_embedding_service() -> EmbeddingService:\n    \"\"\"\n    获取 Embedding 服务实例（单例模式）\n    \n    Returns:\n        EmbeddingService 实例\n    \"\"\"\n    global _embedding_service\n    if _embedding_service is None:\n        _embedding_service = EmbeddingService()\n    return _embedding_service\n"
  },
  {
    "path": "backend/app/services/llm_service.py",
    "content": "\"\"\"\nLLM 服务封装\n\"\"\"\nimport logging\nfrom typing import Optional, Dict, Any, Union\nfrom agenticx import LiteLLMProvider, LLMResponse\nfrom agenticx.llms.bailian_provider import BailianProvider\n\nfrom ..core.config import settings\n\nlogger = logging.getLogger(__name__)\n\n\nclass LLMService:\n    \"\"\"\n    LLM 服务封装类\n    提供统一的 LLM 调用接口\n    \"\"\"\n    \n    def __init__(\n        self,\n        provider: str = None,\n        model: str = None,\n        temperature: float = None,\n        max_tokens: int = None,\n        api_key: str = None,\n        base_url: str = None,\n    ):\n        \"\"\"\n        初始化 LLM 服务\n        \n        Args:\n            provider: 提供商（openai, anthropic, ollama）\n            model: 模型名称\n            temperature: 温度参数\n            max_tokens: 最大token数\n            api_key: API密钥\n            base_url: 自定义 API 端点（用于第三方转发）\n        \"\"\"\n        self.provider_name = provider or settings.LLM_PROVIDER\n        self.model = model or settings.LLM_MODEL\n        self.temperature = temperature or settings.LLM_TEMPERATURE\n        self.max_tokens = max_tokens or settings.LLM_MAX_TOKENS\n        \n        # 设置API密钥\n        if api_key:\n            self.api_key = api_key\n        elif self.provider_name == \"bailian\":\n            self.api_key = settings.DASHSCOPE_API_KEY or settings.BAILIAN_API_KEY\n        elif self.provider_name == \"openai\":\n            self.api_key = settings.OPENAI_API_KEY\n        elif self.provider_name == \"deepseek\":\n            self.api_key = settings.DEEPSEEK_API_KEY\n        elif self.provider_name == \"kimi\":\n            self.api_key = settings.MOONSHOT_API_KEY\n        elif self.provider_name == \"zhipu\":\n            self.api_key = settings.ZHIPU_API_KEY\n        elif self.provider_name == \"anthropic\":\n            self.api_key = settings.ANTHROPIC_API_KEY\n        else:\n            self.api_key = None\n        \n        # 设置 Base URL（用于第三方 API 转发）\n        if base_url:\n     
       self.base_url = base_url\n        elif self.provider_name == \"bailian\":\n            self.base_url = settings.DASHSCOPE_BASE_URL\n        elif self.provider_name == \"openai\":\n            self.base_url = settings.OPENAI_BASE_URL\n        elif self.provider_name == \"deepseek\":\n            self.base_url = settings.DEEPSEEK_BASE_URL or \"https://api.deepseek.com/v1\"\n        elif self.provider_name == \"kimi\":\n            self.base_url = settings.MOONSHOT_BASE_URL or \"https://api.moonshot.cn/v1\"\n        elif self.provider_name == \"zhipu\":\n            self.base_url = settings.ZHIPU_BASE_URL or \"https://open.bigmodel.cn/api/paas/v4\"\n        elif self.provider_name == \"anthropic\":\n            self.base_url = settings.ANTHROPIC_BASE_URL\n        else:\n            self.base_url = None\n        \n        # 创建 LLM 提供者\n        self.llm_provider = self._create_provider()\n    \n    def _create_provider(self) -> Union[LiteLLMProvider, BailianProvider]:\n        \"\"\"创建 LLM 提供者\"\"\"\n        try:\n            # 检测是否使用 Dashscope/Bailian API\n            is_dashscope = (\n                self.base_url and \"dashscope\" in self.base_url.lower()\n            ) or (\n                self.model and self.model.startswith(\"qwen\") and self.base_url\n            )\n            \n            if is_dashscope:\n                # 使用 BailianProvider（专门为百炼 API 设计）\n                if not self.api_key:\n                    raise ValueError(\"API key is required for Bailian provider\")\n                \n                provider = BailianProvider(\n                    model=self.model,\n                    api_key=self.api_key,\n                    base_url=self.base_url or \"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n                    temperature=self.temperature,\n                    timeout=float(settings.LLM_TIMEOUT),  # 从配置读取超时时间\n                    max_retries=2   # 减少重试次数，避免总耗时过长\n                )\n                logger.info(f\"Initialized 
BailianProvider: {self.model}\")\n                return provider\n            else:\n                # 使用 LiteLLMProvider（通用 provider）\n                provider_kwargs = {\n                    \"model\": self.model,\n                    \"temperature\": self.temperature,\n                    \"max_tokens\": self.max_tokens,\n                    \"api_key\": self.api_key,\n                }\n                \n                # 如果设置了自定义 base_url，添加到配置中\n                if self.base_url:\n                    provider_kwargs[\"base_url\"] = self.base_url\n                    logger.info(f\"Using custom base URL: {self.base_url}\")\n                \n                provider = LiteLLMProvider(**provider_kwargs)\n                logger.info(f\"Initialized LiteLLMProvider: {self.provider_name}/{self.model}\")\n                return provider\n        except Exception as e:\n            logger.error(f\"Failed to initialize LLM provider: {e}\")\n            raise\n    \n    def generate(\n        self,\n        prompt: str,\n        system_message: Optional[str] = None,\n        **kwargs\n    ) -> str:\n        \"\"\"\n        生成文本\n        \n        Args:\n            prompt: 用户提示\n            system_message: 系统消息\n            **kwargs: 额外参数\n            \n        Returns:\n            生成的文本\n        \"\"\"\n        try:\n            messages = []\n            \n            if system_message:\n                messages.append({\"role\": \"system\", \"content\": system_message})\n            \n            messages.append({\"role\": \"user\", \"content\": prompt})\n            \n            # 确保传递 max_tokens（如果 kwargs 中没有）\n            if \"max_tokens\" not in kwargs:\n                kwargs[\"max_tokens\"] = self.max_tokens\n            \n            response: LLMResponse = self.llm_provider.generate(\n                messages=messages,\n                **kwargs\n            )\n            \n            return response.content\n        \n        except Exception as e:\n     
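The routing decision in `_create_provider` reduces to a small predicate. A stand-alone sketch of that check (the function name is my own; the two conditions mirror the `is_dashscope` expression above):

```python
def is_dashscope_endpoint(base_url, model):
    """Mirror of the is_dashscope check in _create_provider: route to the
    Bailian-specific provider when the base URL points at Dashscope, or
    when a qwen-* model is paired with any custom base URL."""
    if base_url and "dashscope" in base_url.lower():
        return True
    return bool(model and model.startswith("qwen") and base_url)
```

Note that a `qwen-*` model with no base URL configured deliberately falls through to the generic `LiteLLMProvider`.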
       logger.error(f\"LLM generation failed: {e}\")\n            raise\n    \n    def analyze_sentiment(self, text: str) -> Dict[str, Any]:\n        \"\"\"\n        分析文本情感\n        \n        Args:\n            text: 待分析文本\n            \n        Returns:\n            情感分析结果\n        \"\"\"\n        system_message = \"\"\"你是一个专业的金融新闻情感分析专家。\n请分析给定新闻的情感倾向，判断其对相关股票的影响是利好、利空还是中性。\n\n输出格式（JSON）：\n{\n    \"sentiment\": \"positive/negative/neutral\",\n    \"score\": 0.0-1.0（情感强度）,\n    \"confidence\": 0.0-1.0（置信度）,\n    \"reasoning\": \"分析理由\"\n}\n\"\"\"\n        \n        prompt = f\"\"\"请分析以下新闻的情感倾向：\n\n{text[:1000]}\n\n请严格按照JSON格式输出结果。\"\"\"\n        \n        try:\n            response_text = self.generate(prompt, system_message)\n            \n            # 尝试解析JSON\n            import json\n            import re\n            \n            # 提取JSON部分\n            json_match = re.search(r'\\{.*\\}', response_text, re.DOTALL)\n            if json_match:\n                result = json.loads(json_match.group())\n                return result\n            else:\n                # 如果无法解析，返回默认值\n                return {\n                    \"sentiment\": \"neutral\",\n                    \"score\": 0.5,\n                    \"confidence\": 0.5,\n                    \"reasoning\": response_text\n                }\n        \n        except Exception as e:\n            logger.error(f\"Sentiment analysis failed: {e}\")\n            return {\n                \"sentiment\": \"neutral\",\n                \"score\": 0.5,\n                \"confidence\": 0.0,\n                \"reasoning\": f\"分析失败: {str(e)}\"\n            }\n    \n    def summarize(self, text: str, max_length: int = 200) -> str:\n        \"\"\"\n        文本摘要\n        \n        Args:\n            text: 原始文本\n            max_length: 摘要最大长度\n            \n        Returns:\n            摘要文本\n        \"\"\"\n        system_message = f\"\"\"你是一个专业的金融新闻摘要专家。\n请将给定的新闻内容总结为不超过{max_length}字的简洁摘要，保留关键信息。\"\"\"\n        \n    
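The JSON-recovery step in `analyze_sentiment` (greedy `{.*}` extraction with a fallback when parsing fails) can be isolated into a reusable helper. A minimal sketch under the same assumptions — one JSON object per response, prose tolerated before and after it:

```python
import json
import re


def extract_json(text, fallback=None):
    """Pull the {...} span out of an LLM response and parse it.
    The greedy pattern runs from the first '{' to the last '}', which
    tolerates surrounding prose but assumes a single top-level object;
    anything unparseable returns the caller-supplied fallback."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return fallback
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return fallback
```

Because the pattern is greedy, a response containing two separate objects will fail to parse and land on the fallback — the same behavior the service's default-value branch provides.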
    prompt = f\"\"\"请总结以下新闻：\n\n{text}\n\n摘要：\"\"\"\n        \n        try:\n            summary = self.generate(prompt, system_message, max_tokens=max_length)\n            return summary.strip()\n        except Exception as e:\n            logger.error(f\"Summarization failed: {e}\")\n            return text[:max_length] + \"...\"\n\n\n# 全局实例\n_llm_service: Optional[LLMService] = None\n\n\ndef get_llm_provider(\n    provider: Optional[str] = None,\n    model: Optional[str] = None\n) -> Union[LiteLLMProvider, BailianProvider]:\n    \"\"\"\n    获取 LLM 提供者实例（用于 AgenticX Agent）\n    \n    Args:\n        provider: 可选的提供商名称（如 openai, bailian, ollama）\n        model: 可选的模型名称\n    \n    Returns:\n        LiteLLMProvider 或 BailianProvider 实例\n    \"\"\"\n    global _llm_service\n    \n    # 如果指定了 provider 或 model，创建新的实例\n    if provider or model:\n        custom_service = LLMService(provider=provider, model=model)\n        return custom_service.llm_provider\n    \n    # 否则使用全局实例\n    if _llm_service is None:\n        _llm_service = LLMService()\n    return _llm_service.llm_provider\n\n\ndef get_llm_service() -> LLMService:\n    \"\"\"\n    获取 LLM 服务实例\n    \n    Returns:\n        LLMService 实例\n    \"\"\"\n    global _llm_service\n    if _llm_service is None:\n        _llm_service = LLMService()\n    return _llm_service\n\n\ndef create_custom_llm_provider(\n    provider: Optional[str] = None,\n    model: Optional[str] = None,\n    temperature: Optional[float] = None,\n    max_tokens: Optional[int] = None,\n    api_key: Optional[str] = None,\n    base_url: Optional[str] = None,\n) -> Union[LiteLLMProvider, BailianProvider]:\n    \"\"\"\n    动态创建自定义 LLM provider（用于模型切换）\n    \n    Args:\n        provider: 厂商名称（bailian, openai, deepseek, kimi, zhipu）\n        model: 模型名称\n        temperature: 温度参数\n        max_tokens: 最大token数\n        api_key: API Key（可选，优先从settings读取）\n        base_url: Base URL（可选，优先从settings读取）\n    \n    Returns:\n        LLM provider 实例\n    \n    
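Both `get_llm_provider` and `get_llm_service` lean on the same lazily initialized module-level singleton. Stripped of LLM specifics, the pattern looks like this (`Service` stands in for `LLMService`):

```python
from typing import Optional


class Service:
    """Stand-in for LLMService; the real class reads settings in __init__."""
    def __init__(self):
        self.ready = True


_instance: Optional[Service] = None


def get_service() -> Service:
    """Create the shared instance on first use, then reuse it.
    Like the original, this is not thread-safe; guard with a lock
    if first access can race across threads."""
    global _instance
    if _instance is None:
        _instance = Service()
    return _instance
```

The `provider or model` branch in `get_llm_provider` bypasses the singleton on purpose: a caller asking for a specific model gets a fresh, non-cached instance.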
Examples:\n        >>> llm = create_custom_llm_provider('bailian', 'qwen-max')\n        >>> llm = create_custom_llm_provider('openai', 'gpt-4')\n        >>> llm = create_custom_llm_provider('zhipu', 'glm-4')\n    \"\"\"\n    _provider = provider or settings.LLM_PROVIDER\n    _model = model or settings.LLM_MODEL\n    _temperature = temperature if temperature is not None else settings.LLM_TEMPERATURE\n    _max_tokens = max_tokens if max_tokens is not None else settings.LLM_MAX_TOKENS\n    \n    logger.info(f\"Creating custom LLM provider: {_provider}/{_model}\")\n    \n    try:\n        if _provider == 'bailian':\n            # 使用阿里云百炼（通过 OpenAI 兼容接口）\n            _api_key = api_key or settings.DASHSCOPE_API_KEY or settings.BAILIAN_API_KEY\n            if not _api_key:\n                raise ValueError(\"DASHSCOPE_API_KEY or BAILIAN_API_KEY is required for bailian provider\")\n            \n            _base_url = base_url or settings.DASHSCOPE_BASE_URL\n            return BailianProvider(\n                model=_model,\n                api_key=_api_key,\n                base_url=_base_url,\n                access_key_id=settings.BAILIAN_ACCESS_KEY_ID,\n                access_key_secret=settings.BAILIAN_ACCESS_KEY_SECRET,\n                agent_code=settings.BAILIAN_AGENT_CODE,\n                region_id=settings.BAILIAN_REGION_ID,\n                temperature=_temperature,\n                max_tokens=_max_tokens,\n                timeout=float(settings.LLM_TIMEOUT),  # 从配置读取超时时间\n                max_retries=2  # 减少重试次数，避免总耗时过长\n            )\n        \n        elif _provider == 'openai':\n            # 使用 OpenAI\n            _api_key = api_key or settings.OPENAI_API_KEY\n            if not _api_key:\n                raise ValueError(\"OPENAI_API_KEY is required for openai provider\")\n            \n            _base_url = base_url or settings.OPENAI_BASE_URL\n            return LiteLLMProvider(\n                provider=\"openai\",\n                model=_model,\n   
             api_key=_api_key,\n                base_url=_base_url,\n                temperature=_temperature,\n                max_tokens=_max_tokens\n            )\n        \n        elif _provider == 'deepseek':\n            # 使用 DeepSeek（通过 OpenAI 兼容接口）\n            _api_key = api_key or settings.DEEPSEEK_API_KEY\n            if not _api_key:\n                raise ValueError(\"DEEPSEEK_API_KEY is required for deepseek provider\")\n            \n            _base_url = base_url or settings.DEEPSEEK_BASE_URL or 'https://api.deepseek.com/v1'\n            return LiteLLMProvider(\n                provider=\"openai\",\n                model=_model,\n                api_key=_api_key,\n                base_url=_base_url,\n                temperature=_temperature,\n                max_tokens=_max_tokens\n            )\n        \n        elif _provider == 'kimi':\n            # 使用 Kimi (Moonshot)\n            _api_key = api_key or settings.MOONSHOT_API_KEY\n            if not _api_key:\n                raise ValueError(\"MOONSHOT_API_KEY is required for kimi provider\")\n            \n            _base_url = base_url or settings.MOONSHOT_BASE_URL or 'https://api.moonshot.cn/v1'\n            return LiteLLMProvider(\n                provider=\"openai\",\n                model=_model,\n                api_key=_api_key,\n                base_url=_base_url,\n                temperature=_temperature,\n                max_tokens=_max_tokens\n            )\n        \n        elif _provider == 'zhipu':\n            # 使用智谱 AI\n            _api_key = api_key or settings.ZHIPU_API_KEY\n            if not _api_key:\n                raise ValueError(\"ZHIPU_API_KEY is required for zhipu provider\")\n            \n            _base_url = base_url or settings.ZHIPU_BASE_URL or 'https://open.bigmodel.cn/api/paas/v4'\n            return LiteLLMProvider(\n                provider=\"openai\",\n                model=_model,\n                api_key=_api_key,\n                
base_url=_base_url,\n                temperature=_temperature,\n                max_tokens=_max_tokens\n            )\n        \n        else:\n            logger.warning(f\"Unsupported provider: {_provider}, falling back to default\")\n            return get_llm_provider()\n    \n    except ValueError as e:\n        logger.error(f\"Configuration error: {e}\")\n        raise\n    except Exception as e:\n        logger.error(f\"Failed to create custom LLM provider: {e}\", exc_info=True)\n        # 降级到默认 provider\n        return get_llm_provider()\n\n"
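The per-provider branches in `create_custom_llm_provider` differ only in which settings attribute holds the key and which default base URL applies — for the OpenAI-compatible vendors the `elif` chain could be condensed into a table. A sketch of that alternative (`PROVIDERS` and `resolve` are my own names, not part of this codebase):

```python
# Hypothetical registry: provider name -> (settings attribute, default base URL).
PROVIDERS = {
    "openai":   ("OPENAI_API_KEY", None),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com/v1"),
    "kimi":     ("MOONSHOT_API_KEY", "https://api.moonshot.cn/v1"),
    "zhipu":    ("ZHIPU_API_KEY", "https://open.bigmodel.cn/api/paas/v4"),
}


def resolve(provider, settings, api_key=None, base_url=None):
    """Return (api_key, base_url) for an OpenAI-compatible provider,
    with the same precedence as create_custom_llm_provider: explicit
    argument first, then settings, then the hard-coded default URL."""
    key_attr, default_url = PROVIDERS[provider]
    key = api_key or getattr(settings, key_attr, None)
    if not key:
        raise ValueError(f"{key_attr} is required for {provider} provider")
    return key, base_url or default_url
```

`bailian` stays out of the table because it constructs a `BailianProvider` with extra credentials rather than a `LiteLLMProvider`.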
  },
  {
    "path": "backend/app/services/stock_data_service.py",
    "content": "\"\"\"\n股票数据服务 - 使用 akshare 获取真实股票数据\n\"\"\"\nimport logging\nfrom datetime import datetime, timedelta\nfrom typing import List, Optional, Dict, Any\nfrom functools import lru_cache\nimport asyncio\n\nlogger = logging.getLogger(__name__)\n\n# 尝试导入 akshare\ntry:\n    import akshare as ak\n    import pandas as pd\n    AKSHARE_AVAILABLE = True\nexcept ImportError:\n    AKSHARE_AVAILABLE = False\n    logger.warning(\"akshare not installed, using mock data\")\n\n\nclass StockDataService:\n    \"\"\"股票数据服务 - 封装 akshare 接口\"\"\"\n    \n    # 缓存过期时间（秒）\n    CACHE_TTL = 300  # 5分钟\n    CACHE_TTL_MINUTE = 60  # 分钟级数据缓存1分钟\n    \n    # 股票代码前缀映射\n    MARKET_PREFIX = {\n        \"sh\": \"6\",     # 上海 60xxxx\n        \"sz\": \"0\",     # 深圳 00xxxx, 30xxxx\n        \"sz3\": \"3\",    # 创业板 30xxxx\n    }\n    \n    # 周期映射\n    PERIOD_MAP = {\n        \"1m\": \"1\",      # 1分钟\n        \"5m\": \"5\",      # 5分钟\n        \"15m\": \"15\",    # 15分钟\n        \"30m\": \"30\",    # 30分钟\n        \"60m\": \"60\",    # 60分钟/1小时\n        \"1h\": \"60\",     # 1小时（别名）\n        \"daily\": \"daily\",  # 日线\n        \"1d\": \"daily\",     # 日线（别名）\n    }\n    \n    def __init__(self):\n        self._cache: Dict[str, tuple] = {}  # {key: (data, timestamp)}\n    \n    def _normalize_code(self, stock_code: str) -> str:\n        \"\"\"\n        标准化股票代码，返回纯数字代码\n        支持格式: SH600519, sh600519, 600519\n        \"\"\"\n        code = stock_code.upper().strip()\n        if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n            return code[2:]\n        return code\n    \n    def _get_symbol(self, stock_code: str) -> str:\n        \"\"\"\n        获取 akshare 使用的股票代码格式\n        akshare stock_zh_a_hist 需要纯数字代码\n        \"\"\"\n        return self._normalize_code(stock_code)\n    \n    def _is_cache_valid(self, key: str, ttl: int = None) -> bool:\n        \"\"\"检查缓存是否有效\"\"\"\n        if key not in self._cache:\n            return False\n        _, timestamp = self._cache[key]\n 
       cache_ttl = ttl if ttl is not None else self.CACHE_TTL\n        # 修复bug: 使用 total_seconds() 而不是 seconds\n        # seconds 只返回秒数部分(0-86399)，不包括天数\n        return (datetime.now() - timestamp).total_seconds() < cache_ttl\n    \n    def _get_cached(self, key: str, ttl: int = None) -> Optional[Any]:\n        \"\"\"获取缓存数据\"\"\"\n        if self._is_cache_valid(key, ttl):\n            return self._cache[key][0]\n        # 清理过期缓存\n        if key in self._cache:\n            del self._cache[key]\n        return None\n    \n    def _set_cache(self, key: str, data: Any):\n        \"\"\"设置缓存\"\"\"\n        self._cache[key] = (data, datetime.now())\n    \n    def clear_cache(self, pattern: str = None):\n        \"\"\"\n        清除缓存\n        Args:\n            pattern: 可选的缓存键模式，如果提供则只清除匹配的缓存\n        \"\"\"\n        if pattern:\n            keys_to_delete = [k for k in self._cache.keys() if pattern in k]\n            for key in keys_to_delete:\n                del self._cache[key]\n            logger.info(f\"🧹 Cleared {len(keys_to_delete)} cache entries matching pattern: {pattern}\")\n        else:\n            count = len(self._cache)\n            self._cache.clear()\n            logger.info(f\"🧹 Cleared all {count} cache entries\")\n    \n    async def get_kline_data(\n        self,\n        stock_code: str,\n        period: str = \"daily\",  # daily, 1m, 5m, 15m, 30m, 60m\n        limit: int = 90,  # 数据条数\n        adjust: str = \"qfq\"  # qfq=前复权, hfq=后复权, \"\"=不复权\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        获取K线数据（支持日线和分钟级数据）\n        \n        Args:\n            stock_code: 股票代码\n            period: 周期 (daily, 1m, 5m, 15m, 30m, 60m)\n            limit: 返回数据条数\n            adjust: 复权类型（仅日线有效）\n            \n        Returns:\n            K线数据列表，每条包含: timestamp, open, high, low, close, volume, turnover\n        \"\"\"\n        # 标准化周期\n        period_key = self.PERIOD_MAP.get(period, period)\n        cache_key = 
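The caching pattern above — `(data, timestamp)` tuples with a per-call TTL — fits in a tiny class, and it also demonstrates why the `total_seconds()` fix noted in the comment matters: `timedelta.seconds` wraps at one day, so a 24-hour-old entry would report only the leftover seconds and look fresh again. A minimal sketch:

```python
from datetime import datetime, timedelta


class TTLCache:
    """In-process TTL cache matching the service's (data, timestamp) tuples."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, ttl=None):
        entry = self._store.get(key)
        if entry is None:
            return None
        data, ts = entry
        # total_seconds() measures the whole elapsed interval; .seconds
        # only returns the sub-day remainder (0-86399), the bug fixed above.
        if (datetime.now() - ts).total_seconds() < (ttl or self.ttl):
            return data
        del self._store[key]  # evict expired entries on read, as the service does
        return None

    def set(self, key, data):
        self._store[key] = (data, datetime.now())
```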
f\"kline:{stock_code}:{period}:{limit}:{adjust}\"\n        \n        # 根据周期使用不同的缓存TTL：日线5分钟，分钟级1分钟\n        cache_ttl = self.CACHE_TTL if period_key == \"daily\" else self.CACHE_TTL_MINUTE\n        cached = self._get_cached(cache_key, ttl=cache_ttl)\n        if cached:\n            latest_date = cached[-1].get('date', 'unknown') if cached else 'empty'\n            logger.info(f\"🔵 Cache hit for {cache_key}, latest date: {latest_date}, count: {len(cached)}\")\n            return cached\n        \n        logger.info(f\"🔴 Cache miss for {cache_key}, fetching fresh data...\")\n        \n        if not AKSHARE_AVAILABLE:\n            logger.warning(\"akshare not available, returning mock data\")\n            return self._generate_mock_kline(stock_code, limit)\n        \n        try:\n            symbol = self._get_symbol(stock_code)\n            loop = asyncio.get_event_loop()\n            \n            if period_key == \"daily\":\n                # 日线数据\n                kline_data = await self._fetch_daily_kline(symbol, limit, adjust, loop)\n            else:\n                # 分钟级数据\n                kline_data = await self._fetch_minute_kline(symbol, period_key, limit, loop)\n            \n            if not kline_data:\n                logger.warning(f\"⚠️ No valid data after parsing for {stock_code} period={period}, using mock data\")\n                return self._generate_mock_kline(stock_code, limit)\n            \n            # 记录最新数据的日期和价格，便于调试\n            latest = kline_data[-1]\n            logger.info(f\"✅ Successfully fetched {len(kline_data)} kline records for {stock_code} period={period}, latest: {latest['date']}, close: {latest['close']}\")\n            \n            self._set_cache(cache_key, kline_data)\n            return kline_data\n            \n        except Exception as e:\n            logger.error(f\"❌ Failed to fetch kline data for {stock_code}: {type(e).__name__}: {e}\", exc_info=True)\n            # 只在某些特定错误时返回mock数据，其他错误应该抛出\n            if 
\"NaTType\" in str(e) or \"timestamp\" in str(e).lower():\n                logger.warning(f\"Data parsing error, this should not happen after fix. Returning empty list.\")\n                return []\n            # 网络错误或API错误才返回mock数据\n            return self._generate_mock_kline(stock_code, limit)\n    \n    async def _fetch_daily_kline(\n        self, \n        symbol: str, \n        limit: int, \n        adjust: str,\n        loop\n    ) -> List[Dict[str, Any]]:\n        \"\"\"获取日线数据\"\"\"\n        end_date = datetime.now()\n        # 多获取一些天数，确保有足够数据（考虑周末和节假日，约1个交易日=1.5个自然日）\n        # limit * 1.6 能确保获取到足够的交易日数据\n        start_date = end_date - timedelta(days=int(limit * 1.6))\n        \n        logger.info(f\"📊 Calling akshare API: symbol={symbol}, start={start_date.strftime('%Y%m%d')}, end={end_date.strftime('%Y%m%d')}, adjust={adjust}\")\n        \n        df = await loop.run_in_executor(\n            None,\n            lambda: ak.stock_zh_a_hist(\n                symbol=symbol,\n                start_date=start_date.strftime(\"%Y%m%d\"),\n                end_date=end_date.strftime(\"%Y%m%d\"),\n                adjust=adjust\n            )\n        )\n        \n        logger.info(f\"✅ Akshare returned {len(df) if df is not None and not df.empty else 0} rows\")\n        \n        if df is None or df.empty:\n            return []\n        \n        # 清理数据：移除日期为NaT的行\n        df = df.dropna(subset=['日期'])\n        \n        # 只取最近 limit 条数据\n        df = df.tail(limit)\n        \n        # 转换为标准格式\n        kline_data = []\n        for _, row in df.iterrows():\n            try:\n                # 处理日期\n                date_val = row['日期']\n                if pd.isna(date_val):\n                    logger.warning(f\"Skipping row with NaT date\")\n                    continue\n                    \n                if isinstance(date_val, str):\n                    dt = datetime.strptime(date_val, \"%Y-%m-%d\")\n                    date_str = date_val\n              
  else:\n                    dt = pd.to_datetime(date_val)\n                    if pd.isna(dt):\n                        logger.warning(f\"Skipping row with invalid date\")\n                        continue\n                    date_str = dt.strftime(\"%Y-%m-%d\")\n                \n                timestamp = int(dt.timestamp() * 1000)\n                \n                kline_data.append({\n                    \"timestamp\": timestamp,\n                    \"date\": date_str,\n                    \"open\": float(row['开盘']),\n                    \"high\": float(row['最高']),\n                    \"low\": float(row['最低']),\n                    \"close\": float(row['收盘']),\n                    \"volume\": int(row['成交量']),\n                    \"turnover\": float(row.get('成交额', 0)),\n                    \"change_percent\": float(row.get('涨跌幅', 0)),\n                    \"change_amount\": float(row.get('涨跌额', 0)),\n                    \"amplitude\": float(row.get('振幅', 0)),\n                    \"turnover_rate\": float(row.get('换手率', 0)),\n                })\n            except Exception as e:\n                logger.warning(f\"Failed to parse row, skipping: {e}\")\n                continue\n        \n        # 记录数据范围\n        if kline_data:\n            logger.info(f\"✅ Parsed {len(kline_data)} valid records, date range: {kline_data[0]['date']} to {kline_data[-1]['date']}\")\n        \n        return kline_data\n    \n    async def _fetch_minute_kline(\n        self, \n        symbol: str, \n        period: str,  # \"1\", \"5\", \"15\", \"30\", \"60\"\n        limit: int,\n        loop\n    ) -> List[Dict[str, Any]]:\n        \"\"\"获取分钟级数据\"\"\"\n        df = await loop.run_in_executor(\n            None,\n            lambda: ak.stock_zh_a_hist_min_em(\n                symbol=symbol,\n                period=period,\n                adjust=\"\"\n            )\n        )\n        \n        if df is None or df.empty:\n            return []\n        \n        # 
清理数据：移除时间为NaT的行\n        df = df.dropna(subset=['时间'])\n        \n        # 只取最近 limit 条数据\n        df = df.tail(limit)\n        \n        # 转换为标准格式\n        kline_data = []\n        for _, row in df.iterrows():\n            try:\n                # 处理时间\n                time_val = row['时间']\n                if pd.isna(time_val):\n                    logger.warning(f\"Skipping row with NaT time\")\n                    continue\n                \n                time_str = str(time_val)\n                try:\n                    dt = datetime.strptime(time_str, \"%Y-%m-%d %H:%M:%S\")\n                except ValueError:\n                    dt = pd.to_datetime(time_val)\n                    if pd.isna(dt):\n                        logger.warning(f\"Skipping row with invalid time\")\n                        continue\n                    time_str = dt.strftime(\"%Y-%m-%d %H:%M:%S\")\n                \n                timestamp = int(dt.timestamp() * 1000)\n                \n                kline_data.append({\n                    \"timestamp\": timestamp,\n                    \"date\": time_str,\n                    \"open\": float(row['开盘']),\n                    \"high\": float(row['最高']),\n                    \"low\": float(row['最低']),\n                    \"close\": float(row['收盘']),\n                    \"volume\": int(row['成交量']),\n                    \"turnover\": float(row.get('成交额', 0)),\n                    \"change_percent\": 0,  # 分钟数据可能没有涨跌幅\n                    \"change_amount\": 0,\n                    \"amplitude\": 0,\n                    \"turnover_rate\": 0,\n                })\n            except Exception as e:\n                logger.warning(f\"Failed to parse minute row, skipping: {e}\")\n                continue\n        \n        # 记录数据范围\n        if kline_data:\n            logger.info(f\"✅ Parsed {len(kline_data)} valid minute records, time range: {kline_data[0]['date']} to {kline_data[-1]['date']}\")\n        \n        return kline_data\n    \n    async 
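Both kline parsers convert a date or datetime string to the epoch-millisecond `timestamp` the chart frontend expects. A small helper covering the two formats used here (daily `%Y-%m-%d`, minute `%Y-%m-%d %H:%M:%S`), assuming naive local-time strings as akshare returns them:

```python
from datetime import datetime


def to_millis(date_str):
    """Convert a kline date/time string to epoch milliseconds.
    Tries the minute format first, then the daily format; anything
    else raises, mirroring the per-row skip in the parsers above."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return int(datetime.strptime(date_str, fmt).timestamp() * 1000)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {date_str!r}")
```

Because `strptime` yields a naive datetime, the absolute value depends on the server's local timezone — consistent with the `int(dt.timestamp() * 1000)` calls in the service.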
def get_realtime_quote(self, stock_code: str) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        获取实时行情\n        \n        Returns:\n            实时行情数据\n        \"\"\"\n        cache_key = f\"realtime:{stock_code}\"\n        cached = self._get_cached(cache_key)\n        if cached:\n            return cached\n        \n        if not AKSHARE_AVAILABLE:\n            return None\n        \n        try:\n            symbol = self._get_symbol(stock_code)\n            \n            loop = asyncio.get_event_loop()\n            df = await loop.run_in_executor(\n                None,\n                lambda: ak.stock_zh_a_spot_em()\n            )\n            \n            if df is None or df.empty:\n                return None\n            \n            # 根据股票代码筛选\n            row = df[df['代码'] == symbol]\n            if row.empty:\n                return None\n            \n            row = row.iloc[0]\n            quote = {\n                \"code\": symbol,\n                \"name\": row.get('名称', ''),\n                \"price\": float(row.get('最新价', 0)),\n                \"change_percent\": float(row.get('涨跌幅', 0)),\n                \"change_amount\": float(row.get('涨跌额', 0)),\n                \"volume\": int(row.get('成交量', 0)),\n                \"turnover\": float(row.get('成交额', 0)),\n                \"high\": float(row.get('最高', 0)),\n                \"low\": float(row.get('最低', 0)),\n                \"open\": float(row.get('今开', 0)),\n                \"prev_close\": float(row.get('昨收', 0)),\n            }\n            \n            self._set_cache(cache_key, quote)\n            return quote\n            \n        except Exception as e:\n            logger.error(f\"Failed to fetch realtime quote for {stock_code}: {e}\")\n            return None\n    \n    async def search_stocks(\n        self,\n        keyword: str,\n        limit: int = 20\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        搜索股票（通过代码或名称模糊匹配）\n        \n        Args:\n            keyword: 
搜索关键词\n            limit: 返回数量限制\n            \n        Returns:\n            股票列表\n        \"\"\"\n        cache_key = f\"search:{keyword}:{limit}\"\n        cached = self._get_cached(cache_key)\n        if cached:\n            return cached\n        \n        if not AKSHARE_AVAILABLE:\n            return self._get_mock_stock_list(keyword, limit)\n        \n        try:\n            loop = asyncio.get_event_loop()\n            \n            # 获取全部 A 股实时行情（包含代码和名称）\n            df = await loop.run_in_executor(\n                None,\n                lambda: ak.stock_zh_a_spot_em()\n            )\n            \n            if df is None or df.empty:\n                return self._get_mock_stock_list(keyword, limit)\n            \n            # 模糊匹配代码或名称（regex=False: 按字面匹配，避免关键词中的正则特殊字符报错）\n            keyword_upper = keyword.upper()\n            mask = (\n                df['代码'].str.contains(keyword_upper, na=False, regex=False) |\n                df['名称'].str.contains(keyword, na=False, regex=False)\n            )\n            matched = df[mask].head(limit)\n            \n            results = []\n            for _, row in matched.iterrows():\n                code = str(row['代码'])\n                # 确定市场前缀\n                if code.startswith('6'):\n                    full_code = f\"SH{code}\"\n                elif code.startswith('0') or code.startswith('3'):\n                    full_code = f\"SZ{code}\"\n                else:\n                    full_code = code\n                \n                results.append({\n                    \"code\": code,\n                    \"name\": str(row['名称']),\n                    \"full_code\": full_code,\n                    \"price\": float(row.get('最新价', 0)) if pd.notna(row.get('最新价')) else 0,\n                    \"change_percent\": float(row.get('涨跌幅', 0)) if pd.notna(row.get('涨跌幅')) else 0,\n                })\n            \n            self._set_cache(cache_key, results)\n            return results\n            \n        except Exception as e:\n            logger.error(f\"Failed 
to search stocks: {e}\")\n            return self._get_mock_stock_list(keyword, limit)\n    \n    def _get_mock_stock_list(self, keyword: str, limit: int) -> List[Dict[str, Any]]:\n        \"\"\"返回模拟股票列表\"\"\"\n        mock_stocks = [\n            {\"code\": \"600519\", \"name\": \"贵州茅台\", \"full_code\": \"SH600519\", \"price\": 1420.0, \"change_percent\": 0.5},\n            {\"code\": \"000001\", \"name\": \"平安银行\", \"full_code\": \"SZ000001\", \"price\": 12.0, \"change_percent\": -0.3},\n            {\"code\": \"601318\", \"name\": \"中国平安\", \"full_code\": \"SH601318\", \"price\": 45.0, \"change_percent\": 0.2},\n            {\"code\": \"000858\", \"name\": \"五粮液\", \"full_code\": \"SZ000858\", \"price\": 150.0, \"change_percent\": 1.1},\n            {\"code\": \"002594\", \"name\": \"比亚迪\", \"full_code\": \"SZ002594\", \"price\": 250.0, \"change_percent\": -0.8},\n            {\"code\": \"600036\", \"name\": \"招商银行\", \"full_code\": \"SH600036\", \"price\": 35.0, \"change_percent\": 0.1},\n            {\"code\": \"601166\", \"name\": \"兴业银行\", \"full_code\": \"SH601166\", \"price\": 18.0, \"change_percent\": 0.3},\n            {\"code\": \"000333\", \"name\": \"美的集团\", \"full_code\": \"SZ000333\", \"price\": 65.0, \"change_percent\": 0.6},\n            {\"code\": \"002415\", \"name\": \"海康威视\", \"full_code\": \"SZ002415\", \"price\": 32.0, \"change_percent\": -0.5},\n            {\"code\": \"600276\", \"name\": \"恒瑞医药\", \"full_code\": \"SH600276\", \"price\": 42.0, \"change_percent\": 0.4},\n        ]\n        \n        keyword_lower = keyword.lower()\n        filtered = [\n            s for s in mock_stocks\n            if keyword_lower in s[\"code\"].lower() or keyword_lower in s[\"name\"].lower()\n        ]\n        return filtered[:limit]\n    \n    async def get_stock_info(self, stock_code: str) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        获取股票基本信息\n        \"\"\"\n        if not AKSHARE_AVAILABLE:\n            return None\n        \n        
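The exchange-prefix rule used in `search_stocks` (and implied by `MARKET_PREFIX`) — `6xxxxx` trades in Shanghai, `0xxxxx` and `3xxxxx` in Shenzhen — can be factored into one helper:

```python
def to_full_code(code: str) -> str:
    """Prefix a bare A-share code with its exchange, mirroring the
    branching in search_stocks; unknown prefixes pass through unchanged."""
    if code.startswith("6"):
        return f"SH{code}"
    if code.startswith(("0", "3")):
        return f"SZ{code}"
    return code
```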
try:\n            symbol = self._get_symbol(stock_code)\n            \n            loop = asyncio.get_event_loop()\n            df = await loop.run_in_executor(\n                None,\n                lambda: ak.stock_individual_info_em(symbol=symbol)\n            )\n            \n            if df is None or df.empty:\n                return None\n            \n            # 转换为字典\n            info = {}\n            for _, row in df.iterrows():\n                info[row['item']] = row['value']\n            \n            return info\n            \n        except Exception as e:\n            logger.error(f\"Failed to fetch stock info for {stock_code}: {e}\")\n            return None\n    \n    def _generate_mock_kline(self, stock_code: str, days: int) -> List[Dict[str, Any]]:\n        \"\"\"\n        生成模拟K线数据（当 akshare 不可用时使用）\n        \"\"\"\n        import random\n        \n        # 根据股票代码设定基准价格\n        base_prices = {\n            \"600519\": 1500.0,  # 贵州茅台\n            \"000001\": 12.0,    # 平安银行\n            \"601318\": 45.0,    # 中国平安\n            \"000858\": 150.0,   # 五粮液\n            \"002594\": 250.0,   # 比亚迪\n        }\n        \n        code = self._normalize_code(stock_code)\n        base_price = base_prices.get(code, 50.0)\n        current_price = base_price\n        \n        kline_data = []\n        for i in range(days):\n            dt = datetime.now() - timedelta(days=days - i - 1)\n            # 跳过周末\n            if dt.weekday() >= 5:\n                continue\n                \n            timestamp = int(dt.timestamp() * 1000)\n            date_str = dt.strftime(\"%Y-%m-%d\")\n            \n            # 随机波动\n            change_percent = random.uniform(-3, 3)\n            open_price = current_price\n            close_price = current_price * (1 + change_percent / 100)\n            high_price = max(open_price, close_price) * (1 + random.uniform(0, 1.5) / 100)\n            low_price = min(open_price, close_price) * (1 - random.uniform(0, 1.5) / 
100)\n            volume = random.randint(50000, 500000)\n            turnover = volume * close_price\n            \n            kline_data.append({\n                \"timestamp\": timestamp,\n                \"date\": date_str,\n                \"open\": round(open_price, 2),\n                \"high\": round(high_price, 2),\n                \"low\": round(low_price, 2),\n                \"close\": round(close_price, 2),\n                \"volume\": volume,\n                \"turnover\": round(turnover, 2),\n                \"change_percent\": round(change_percent, 2),\n                \"change_amount\": round(close_price - open_price, 2),\n                \"amplitude\": round((high_price - low_price) / open_price * 100, 2),\n                \"turnover_rate\": round(random.uniform(0.5, 5), 2),\n            })\n            \n            current_price = close_price\n        \n        return kline_data[-days:] if len(kline_data) > days else kline_data\n    \n    async def get_financial_indicators(self, stock_code: str) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        获取股票财务指标（用于辩论分析）\n        \n        包括：PE、PB、ROE、净利润增长率等\n        \n        Args:\n            stock_code: 股票代码\n            \n        Returns:\n            财务指标字典\n        \"\"\"\n        cache_key = f\"financial:{stock_code}\"\n        cached = self._get_cached(cache_key, ttl=3600)  # 财务数据缓存1小时\n        if cached:\n            return cached\n        \n        if not AKSHARE_AVAILABLE:\n            logger.warning(\"akshare not available, returning mock financial data\")\n            return self._get_mock_financial_indicators(stock_code)\n        \n        try:\n            symbol = self._get_symbol(stock_code)\n            loop = asyncio.get_event_loop()\n            \n            # 方法1：从实时行情获取基础估值数据\n            spot_df = await loop.run_in_executor(\n                None,\n                lambda: ak.stock_zh_a_spot_em()\n            )\n            \n            financial_data = {}\n            \n   
         if spot_df is not None and not spot_df.empty:\n                row = spot_df[spot_df['代码'] == symbol]\n                if not row.empty:\n                    row = row.iloc[0]\n                    financial_data.update({\n                        \"pe_ratio\": self._safe_float(row.get('市盈率-动态')),\n                        \"pb_ratio\": self._safe_float(row.get('市净率')),\n                        \"total_market_value\": self._safe_float(row.get('总市值')),\n                        \"circulating_market_value\": self._safe_float(row.get('流通市值')),\n                        \"turnover_rate\": self._safe_float(row.get('换手率')),\n                        \"volume_ratio\": self._safe_float(row.get('量比')),\n                        \"amplitude\": self._safe_float(row.get('振幅')),\n                        \"price_52w_high\": self._safe_float(row.get('52周最高')),\n                        \"price_52w_low\": self._safe_float(row.get('52周最低')),\n                    })\n            \n            # 方法2：尝试获取更详细的财务摘要\n            try:\n                financial_abstract = await loop.run_in_executor(\n                    None,\n                    lambda: ak.stock_financial_abstract_ths(symbol=symbol)\n                )\n                \n                if financial_abstract is not None and not financial_abstract.empty:\n                    # 取最新一期数据\n                    latest = financial_abstract.iloc[0] if len(financial_abstract) > 0 else None\n                    if latest is not None:\n                        financial_data.update({\n                            \"roe\": self._safe_float(latest.get('净资产收益率')),\n                            \"gross_profit_margin\": self._safe_float(latest.get('毛利率')),\n                            \"net_profit_margin\": self._safe_float(latest.get('净利率')),\n                            \"debt_ratio\": self._safe_float(latest.get('资产负债率')),\n                            \"revenue_yoy\": self._safe_float(latest.get('营业总收入同比增长率')),\n                          
  \"profit_yoy\": self._safe_float(latest.get('净利润同比增长率')),\n                        })\n            except Exception as e:\n                logger.debug(f\"Failed to fetch financial abstract for {stock_code}: {e}\")\n            \n            if financial_data:\n                self._set_cache(cache_key, financial_data)\n                return financial_data\n            \n            return self._get_mock_financial_indicators(stock_code)\n            \n        except Exception as e:\n            logger.error(f\"Failed to fetch financial indicators for {stock_code}: {e}\")\n            return self._get_mock_financial_indicators(stock_code)\n    \n    def _safe_float(self, value, default=None) -> Optional[float]:\n        \"\"\"安全转换为浮点数\"\"\"\n        if value is None or (isinstance(value, float) and pd.isna(value)):\n            return default\n        try:\n            return float(value)\n        except (ValueError, TypeError):\n            return default\n    \n    def _get_mock_financial_indicators(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"返回模拟财务指标\"\"\"\n        return {\n            \"pe_ratio\": 25.5,\n            \"pb_ratio\": 3.2,\n            \"roe\": 15.8,\n            \"total_market_value\": 100000000000,  # 1000亿\n            \"circulating_market_value\": 80000000000,\n            \"turnover_rate\": 2.5,\n            \"gross_profit_margin\": 45.2,\n            \"net_profit_margin\": 22.1,\n            \"debt_ratio\": 35.5,\n            \"revenue_yoy\": 12.5,\n            \"profit_yoy\": 18.3,\n        }\n    \n    async def get_fund_flow(self, stock_code: str, days: int = 5) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        获取个股资金流向（用于辩论分析）\n        \n        包括：主力资金净流入、散户资金流向等\n        \n        Args:\n            stock_code: 股票代码\n            days: 获取最近几天的数据\n            \n        Returns:\n            资金流向数据\n        \"\"\"\n        cache_key = f\"fund_flow:{stock_code}:{days}\"\n        cached = self._get_cached(cache_key, 
ttl=300)  # 资金流向缓存5分钟\n        if cached:\n            return cached\n        \n        if not AKSHARE_AVAILABLE:\n            logger.warning(\"akshare not available, returning mock fund flow data\")\n            return self._get_mock_fund_flow(stock_code)\n        \n        try:\n            symbol = self._get_symbol(stock_code)\n            # 协程内应使用 get_running_loop()（get_event_loop 在协程中已弃用）\n            loop = asyncio.get_running_loop()\n            \n            # 获取个股资金流向\n            df = await loop.run_in_executor(\n                None,\n                lambda: ak.stock_individual_fund_flow(stock=symbol, market=\"sh\" if symbol.startswith(\"6\") else \"sz\")\n            )\n            \n            if df is None or df.empty:\n                return self._get_mock_fund_flow(stock_code)\n            \n            # 取最近几天的数据\n            df = df.head(days)\n            \n            # 汇总数据\n            total_main_net = 0\n            total_super_large_net = 0\n            total_large_net = 0\n            total_medium_net = 0\n            total_small_net = 0\n            daily_flows = []\n            \n            for _, row in df.iterrows():\n                main_net = self._safe_float(row.get('主力净流入-净额'), 0)\n                super_large_net = self._safe_float(row.get('超大单净流入-净额'), 0)\n                large_net = self._safe_float(row.get('大单净流入-净额'), 0)\n                medium_net = self._safe_float(row.get('中单净流入-净额'), 0)\n                small_net = self._safe_float(row.get('小单净流入-净额'), 0)\n                \n                total_main_net += main_net\n                total_super_large_net += super_large_net\n                total_large_net += large_net\n                total_medium_net += medium_net\n                total_small_net += small_net\n                \n                daily_flows.append({\n                    \"date\": str(row.get('日期', '')),\n                    \"main_net\": main_net,\n                    \"super_large_net\": super_large_net,\n                    \"large_net\": large_net,\n                    
\"medium_net\": medium_net,\n                    \"small_net\": small_net,\n                })\n            \n            fund_flow_data = {\n                \"period_days\": days,\n                \"total_main_net\": total_main_net,\n                \"total_super_large_net\": total_super_large_net,\n                \"total_large_net\": total_large_net,\n                \"total_medium_net\": total_medium_net,\n                \"total_small_net\": total_small_net,\n                \"main_flow_trend\": \"流入\" if total_main_net > 0 else \"流出\",\n                \"daily_flows\": daily_flows,\n            }\n            \n            self._set_cache(cache_key, fund_flow_data)\n            return fund_flow_data\n            \n        except Exception as e:\n            logger.error(f\"Failed to fetch fund flow for {stock_code}: {e}\")\n            return self._get_mock_fund_flow(stock_code)\n    \n    def _get_mock_fund_flow(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"返回模拟资金流向数据\"\"\"\n        return {\n            \"period_days\": 5,\n            \"total_main_net\": 50000000,  # 5000万\n            \"total_super_large_net\": 30000000,\n            \"total_large_net\": 20000000,\n            \"total_medium_net\": -5000000,\n            \"total_small_net\": -10000000,\n            \"main_flow_trend\": \"流入\",\n            \"daily_flows\": [],\n        }\n    \n    async def get_debate_context(self, stock_code: str) -> Dict[str, Any]:\n        \"\"\"\n        获取用于辩论的综合上下文数据\n        \n        整合财务指标、资金流向、实时行情等信息\n        \n        Args:\n            stock_code: 股票代码\n            \n        Returns:\n            综合上下文数据\n        \"\"\"\n        # 并行获取多个数据源\n        realtime_task = self.get_realtime_quote(stock_code)\n        financial_task = self.get_financial_indicators(stock_code)\n        fund_flow_task = self.get_fund_flow(stock_code, days=5)\n        \n        realtime, financial, fund_flow = await asyncio.gather(\n            realtime_task, financial_task, 
fund_flow_task,\n            return_exceptions=True\n        )\n        \n        # 处理异常\n        if isinstance(realtime, Exception):\n            logger.error(f\"Failed to get realtime quote: {realtime}\")\n            realtime = None\n        if isinstance(financial, Exception):\n            logger.error(f\"Failed to get financial indicators: {financial}\")\n            financial = None\n        if isinstance(fund_flow, Exception):\n            logger.error(f\"Failed to get fund flow: {fund_flow}\")\n            fund_flow = None\n        \n        # 生成文本摘要\n        context_parts = []\n        \n        if realtime:\n            context_parts.append(\n                f\"【实时行情】当前价: {realtime.get('price', 'N/A')}元, \"\n                f\"涨跌幅: {realtime.get('change_percent', 'N/A')}%, \"\n                f\"成交量: {realtime.get('volume', 'N/A')}\"\n            )\n        \n        if financial:\n            pe = financial.get('pe_ratio')\n            pb = financial.get('pb_ratio')\n            roe = financial.get('roe')\n            profit_yoy = financial.get('profit_yoy')\n            # 用 is not None 判断，避免 0 值被误显示为 N/A\n            context_parts.append(\n                f\"【估值指标】PE: {pe if pe is not None else 'N/A'}, PB: {pb if pb is not None else 'N/A'}, \"\n                f\"ROE: {roe if roe is not None else 'N/A'}%, 净利润同比: {profit_yoy if profit_yoy is not None else 'N/A'}%\"\n            )\n        \n        if fund_flow:\n            main_net = fund_flow.get('total_main_net', 0)\n            main_net_str = f\"{main_net/10000:.2f}万\" if abs(main_net) < 100000000 else f\"{main_net/100000000:.2f}亿\"\n            context_parts.append(\n                f\"【资金流向】近{fund_flow.get('period_days', 5)}日主力净{fund_flow.get('main_flow_trend', 'N/A')}: {main_net_str}\"\n            )\n        \n        return {\n            \"realtime\": realtime,\n            \"financial\": financial,\n            \"fund_flow\": fund_flow,\n            \"summary\": \"\\n\".join(context_parts) if context_parts else \"暂无额外数据\",\n        }\n\n\n# 单例实例\nstock_data_service = 
StockDataService()\n\n"
  },
  {
    "path": "backend/app/storage/__init__.py",
    "content": "\"\"\"\n存储模块\n\"\"\"\nfrom .vector_storage import VectorStorage\n\n__all__ = [\"VectorStorage\"]\n\n"
  },
  {
    "path": "backend/app/storage/vector_storage.py",
    "content": "\"\"\"\n向量存储封装 - 直接使用 agenticx.storage.vectordb_storages.milvus.MilvusStorage\n提供简单的兼容性接口，充分利用 base 类的便利方法\n\"\"\"\nimport logging\nimport asyncio\nimport threading\nfrom typing import List, Dict, Any, Optional\n\nfrom ..core.config import settings\nfrom agenticx.storage.vectordb_storages.milvus import MilvusStorage\nfrom agenticx.storage.vectordb_storages.base import VectorRecord, VectorDBQuery\n\nlogger = logging.getLogger(__name__)\n\n\nclass VectorStorage:\n    \"\"\"\n    Milvus 向量存储封装类\n    直接使用 agenticx.storage.vectordb_storages.milvus.MilvusStorage\n    提供简单的兼容性接口，只做必要的接口转换\n    \"\"\"\n    \n    def __init__(\n        self,\n        host: str = None,\n        port: int = None,\n        collection_name: str = None,\n        dim: int = None,\n    ):\n        \"\"\"初始化向量存储\"\"\"\n        self.host = host or settings.MILVUS_HOST\n        self.port = port or settings.MILVUS_PORT\n        self.collection_name = collection_name or settings.MILVUS_COLLECTION_NAME\n        self.dim = dim or settings.MILVUS_DIM\n        \n        # 直接使用 agenticx MilvusStorage\n        self.milvus_storage = MilvusStorage(\n            dimension=self.dim,\n            host=self.host,\n            port=self.port,\n            collection_name=self.collection_name\n        )\n        \n        logger.info(f\"Initialized VectorStorage using MilvusStorage: {self.collection_name}, dim={self.dim}\")\n    \n    def _call_add_async(self, records: List[VectorRecord], timeout: int = 15) -> None:\n        \"\"\"辅助方法：在同步上下文中调用异步 add() 方法\"\"\"\n        try:\n            asyncio.get_running_loop()\n            # 当前线程已有运行中的事件循环：不能在本线程同步等待结果\n            # （run_coroutine_threadsafe + future.result 会阻塞事件循环自身，导致死锁），\n            # 改为在独立线程中用 asyncio.run 执行\n            def _runner() -> None:\n                try:\n                    asyncio.run(asyncio.wait_for(self.milvus_storage.add(records), timeout=timeout))\n                except Exception as e:\n                    logger.warning(f\"Vector insert failed or timed out ({timeout}s): {e}\")\n            worker = threading.Thread(target=_runner, daemon=True)\n            worker.start()\n            worker.join(timeout=timeout + 1)\n        except RuntimeError:\n            try:\n                
asyncio.run(asyncio.wait_for(self.milvus_storage.add(records), timeout=timeout))\n            except asyncio.TimeoutError:\n                logger.warning(f\"Vector insert timeout ({timeout}s), but data may have been inserted\")\n    \n    def connect(self):\n        \"\"\"连接到 Milvus（兼容性方法）\"\"\"\n        # MilvusStorage 在初始化时已经连接\n        pass\n    \n    def create_collection(self, drop_existing: bool = False):\n        \"\"\"创建集合（兼容性方法）\"\"\"\n        # MilvusStorage 在初始化时已经创建集合\n        if drop_existing:\n            self.milvus_storage.clear()\n            self.milvus_storage = MilvusStorage(\n                dimension=self.dim,\n                host=self.host,\n                port=self.port,\n                collection_name=self.collection_name\n            )\n    \n    def load_collection(self):\n        \"\"\"加载集合到内存（兼容性方法）\"\"\"\n        self.milvus_storage.load()\n    \n    def store_embedding(\n        self,\n        news_id: int,\n        embedding: List[float],\n        text: str\n    ) -> int:\n        \"\"\"存储单个向量（兼容性接口）\"\"\"\n        record = VectorRecord(\n            id=str(news_id),\n            vector=embedding,\n            payload={\"news_id\": news_id, \"text\": text[:65535]}\n        )\n        self._call_add_async([record], timeout=15)\n        return news_id\n    \n    def store_embeddings_batch(\n        self,\n        news_ids: List[int],\n        embeddings: List[List[float]],\n        texts: List[str]\n    ) -> List[int]:\n        \"\"\"批量存储向量（兼容性接口）\"\"\"\n        records = [\n            VectorRecord(\n                id=str(news_id),\n                vector=embedding,\n                payload={\"news_id\": news_id, \"text\": text[:65535]}\n            )\n            for news_id, embedding, text in zip(news_ids, embeddings, texts)\n        ]\n        self._call_add_async(records, timeout=30)\n        return news_ids\n    \n    def search_similar(\n        self,\n        query_embedding: List[float],\n        top_k: int = 10,\n       
 filter_expr: Optional[str] = None\n    ) -> List[Dict[str, Any]]:\n        \"\"\"搜索相似向量（兼容性接口）\"\"\"\n        query = VectorDBQuery(query_vector=query_embedding, top_k=top_k)\n        results = self.milvus_storage.query(query)\n        \n        # 格式化结果\n        formatted_results = []\n        for result in results:\n            payload = result.record.payload or {}\n            news_id = payload.get(\"news_id\")\n            if news_id is None:\n                try:\n                    news_id = int(result.record.id)\n                except (ValueError, TypeError):\n                    continue\n            \n            # 简单的过滤支持\n            if filter_expr and \"news_id\" in filter_expr:\n                import re\n                match = re.search(r'news_id\\s*==\\s*(\\d+)', filter_expr)\n                if match and news_id != int(match.group(1)):\n                    continue\n            \n            formatted_results.append({\n                \"id\": result.record.id,\n                \"news_id\": news_id,\n                \"text\": payload.get(\"text\", \"\"),\n                \"distance\": result.similarity,\n                \"score\": 1 / (1 + result.similarity) if result.similarity > 0 else 1.0,\n            })\n        \n        return formatted_results\n    \n    def delete_by_news_id(self, news_id: int):\n        \"\"\"删除指定新闻的向量（兼容性接口）\"\"\"\n        self.milvus_storage.delete([str(news_id)])\n    \n    def verify_insert(self, news_id: int, wait_for_flush: bool = True) -> bool:\n        \"\"\"验证数据是否成功插入（兼容性接口）\"\"\"\n        if wait_for_flush:\n            import time\n            time.sleep(3)\n        \n        # 使用 base 类的 get_payloads_by_vector 方法\n        zero_vector = [0.0] * self.dim\n        payloads = self.milvus_storage.get_payloads_by_vector(zero_vector, top_k=1000)\n        \n        for payload in payloads:\n            if payload and payload.get(\"news_id\") == news_id:\n                return True\n        return False\n    \n    
def get_stats(self) -> Dict[str, Any]:\n        \"\"\"获取集合统计信息（兼容性接口）\n        \n        注意：如果 num_entities 为 0，会通过实际查询来获取真实数量\n        （因为 flush 失败时 num_entities 可能不准确）\n        \"\"\"\n        status = self.milvus_storage.status()\n        num_entities = status.vector_count\n        \n        # 如果 num_entities 为 0，尝试通过查询获取真实数量\n        # 这可以解决 flush 失败导致统计不准确的问题\n        if num_entities == 0:\n            try:\n                from agenticx.storage.vectordb_storages.base import VectorDBQuery\n                # 使用零向量查询，设置一个较大的 top_k 来获取实际数量\n                zero_vector = [0.0] * status.vector_dim\n                query = VectorDBQuery(query_vector=zero_vector, top_k=10000)  # 最多查询10000条\n                results = self.milvus_storage.query(query)\n                if results:\n                    num_entities = len(results)\n                    # 如果返回了10000条，说明可能还有更多，标记为近似值\n                    if len(results) >= 10000:\n                        num_entities = f\"{len(results)}+ (近似值，实际可能更多)\"\n            except Exception as e:\n                logger.debug(f\"无法通过查询获取真实数量: {e}\")\n                # 如果查询失败，仍然使用 num_entities=0\n        \n        return {\n            \"num_entities\": num_entities,\n            \"collection_name\": self.collection_name,\n            \"dim\": status.vector_dim,\n        }\n    \n    def disconnect(self):\n        \"\"\"断开连接（兼容性方法）\"\"\"\n        self.milvus_storage.close()\n    \n    @property\n    def collection(self):\n        \"\"\"兼容性属性：返回底层的 Milvus collection 对象\"\"\"\n        return self.milvus_storage.collection\n\n\n# 全局实例\n_vector_storage: Optional[VectorStorage] = None\n\n\ndef get_vector_storage() -> VectorStorage:\n    \"\"\"获取向量存储实例（单例模式）\"\"\"\n    global _vector_storage\n    if _vector_storage is None:\n        _vector_storage = VectorStorage()\n    return _vector_storage\n"
  },
  {
    "path": "backend/app/tasks/__init__.py",
    "content": "\"\"\"\nCelery 任务模块\n\"\"\"\nfrom .crawl_tasks import realtime_crawl_task, cold_start_crawl_task\n\n__all__ = [\n    \"realtime_crawl_task\",\n    \"cold_start_crawl_task\",\n]\n\n"
  },
  {
    "path": "backend/app/tasks/crawl_tasks.py",
    "content": "\"\"\"\nCelery 爬取任务 - Phase 2: 实时监控升级版 + 多源支持\n\"\"\"\nimport logging\nimport json\nfrom datetime import datetime, timedelta\nfrom typing import List, Dict, Any, Optional\nfrom sqlalchemy import select, create_engine, text\nfrom sqlalchemy.orm import Session\nimport asyncio\n\nfrom ..core.celery_app import celery_app\nfrom ..core.config import settings\nfrom ..core.redis_client import redis_client\nfrom ..models.crawl_task import CrawlTask, CrawlMode, TaskStatus\nfrom ..models.news import News\nfrom ..tools import (\n    SinaCrawlerTool,\n    TencentCrawlerTool,\n    JwviewCrawlerTool,\n    EeoCrawlerTool,\n    CaijingCrawlerTool,\n    Jingji21CrawlerTool,\n    NbdCrawlerTool,\n    YicaiCrawlerTool,\n    Netease163CrawlerTool,\n    EastmoneyCrawlerTool,\n    bochaai_search,\n    NewsItem,\n)\nfrom ..tools.crawler_enhanced import EnhancedCrawler, crawl_url\n\nlogger = logging.getLogger(__name__)\n\n\ndef clean_text_for_db(text: Optional[str]) -> Optional[str]:\n    \"\"\"\n    清理文本中不适合存入数据库的字符\n    \n    PostgreSQL 不允许在文本字段中存储 NUL 字符 (\\x00)\n    \n    Args:\n        text: 原始文本\n        \n    Returns:\n        清理后的文本\n    \"\"\"\n    if text is None:\n        return None\n    if not isinstance(text, str):\n        return text\n    # 移除 NUL 字符（'\\x00' 与 '\\0' 是同一字符，替换一次即可）\n    return text.replace('\\x00', '')\n\n\ndef get_crawler_tool(source: str):\n    \"\"\"\n    爬虫工厂函数\n    \n    Args:\n        source: 新闻源名称\n        \n    Returns:\n        对应的爬虫实例\n    \"\"\"\n    crawlers = {\n        \"sina\": SinaCrawlerTool,\n        \"tencent\": TencentCrawlerTool,\n        \"jwview\": JwviewCrawlerTool,\n        \"eeo\": EeoCrawlerTool,\n        \"caijing\": CaijingCrawlerTool,\n        \"jingji21\": Jingji21CrawlerTool,\n        \"nbd\": NbdCrawlerTool,\n        \"yicai\": YicaiCrawlerTool,\n        \"163\": Netease163CrawlerTool,\n        \"eastmoney\": EastmoneyCrawlerTool,\n    }\n    \n    crawler_class = crawlers.get(source)\n    if not crawler_class:\n        raise ValueError(f\"Unknown 
news source: {source}\")\n    \n    return crawler_class()\n\n\ndef get_sync_db_session():\n    \"\"\"获取同步数据库会话（Celery任务中使用）\"\"\"\n    engine = create_engine(settings.SYNC_DATABASE_URL)\n    return Session(engine)\n\n\n@celery_app.task(bind=True, name=\"app.tasks.crawl_tasks.realtime_crawl_task\")\ndef realtime_crawl_task(self, source: str = \"sina\", force_refresh: bool = False):\n    \"\"\"\n    实时爬取任务 (Phase 2 升级版)\n    \n    核心改进：\n    1. Redis 缓存检查（避免频繁爬取）\n    2. 智能时间过滤（基于配置的 NEWS_RETENTION_HOURS）\n    3. 只爬取最新一页\n    \n    Args:\n        source: 新闻源（sina, jrj等）\n        force_refresh: 是否强制刷新（跳过缓存）\n    \"\"\"\n    db = get_sync_db_session()\n    task_record = None\n    cache_key = f\"news:{source}:latest\"\n    cache_time_key = f\"{cache_key}:timestamp\"\n    \n    try:\n        # ===== Phase 2.1: 检查 Redis 缓存 =====\n        if not force_refresh and redis_client.is_available():\n            cache_metadata = redis_client.get_cache_metadata(cache_key)\n            \n            if cache_metadata:\n                age_seconds = cache_metadata['age_seconds']\n                # 根据不同源获取对应的爬取间隔\n                interval_map = {\n                    \"sina\": settings.CRAWL_INTERVAL_SINA,\n                    \"tencent\": settings.CRAWL_INTERVAL_TENCENT,\n                    \"jwview\": settings.CRAWL_INTERVAL_JWVIEW,\n                    \"eeo\": settings.CRAWL_INTERVAL_EEO,\n                    \"caijing\": settings.CRAWL_INTERVAL_CAIJING,\n                    \"jingji21\": settings.CRAWL_INTERVAL_JINGJI21,\n                    \"nbd\": 60,  # 每日经济新闻\n                    \"yicai\": 60,  # 第一财经\n                    \"163\": 60,  # 网易财经\n                    \"eastmoney\": 60,  # 东方财富\n                }\n                interval = interval_map.get(source, 60)  # 默认60秒\n                \n                # 如果缓存时间 < 爬取间隔，使用缓存\n                if age_seconds < interval:\n                    logger.info(\n                        f\"[{source}] 使用缓存数据 (age: 
{age_seconds:.0f}s < {interval}s)\"\n                    )\n                    return {\n                        \"status\": \"cached\",\n                        \"source\": source,\n                        \"cache_age\": age_seconds,\n                        \"message\": f\"缓存数据仍然有效，距上次爬取 {age_seconds:.0f} 秒\"\n                    }\n        \n        # ===== 1. 创建任务记录 =====\n        task_record = CrawlTask(\n            celery_task_id=self.request.id,\n            mode=CrawlMode.REALTIME,\n            status=TaskStatus.RUNNING,\n            source=source,\n            config={\n                \"page_limit\": 1, \n                \"retention_hours\": settings.NEWS_RETENTION_HOURS,\n                \"force_refresh\": force_refresh\n            },\n            started_at=datetime.utcnow(),\n        )\n        db.add(task_record)\n        db.commit()\n        db.refresh(task_record)\n        \n        logger.info(f\"[Task {task_record.id}] 🚀 开始实时爬取: {source}\")\n        \n        # ===== 2. 创建爬虫（使用工厂函数） =====\n        try:\n            crawler = get_crawler_tool(source)\n        except ValueError as e:\n            logger.error(f\"[Task {task_record.id}] ❌ {e}\")\n            raise\n        \n        # ===== 3. 执行爬取（只爬第一页） =====\n        start_time = datetime.utcnow()\n        news_list = crawler.crawl(start_page=1, end_page=1)\n        \n        logger.info(f\"[Task {task_record.id}] 📰 爬取到 {len(news_list)} 条新闻\")\n        \n        # ===== Phase 2.2: 智能时间过滤 =====\n        cutoff_time = datetime.utcnow() - timedelta(hours=settings.NEWS_RETENTION_HOURS)\n        recent_news = [\n            news for news in news_list\n            if news.publish_time and news.publish_time > cutoff_time\n        ] if news_list else []\n        \n        logger.info(\n            f\"[Task {task_record.id}] ⏱️  过滤后剩余 {len(recent_news)} 条新闻 \"\n            f\"(保留 {settings.NEWS_RETENTION_HOURS} 小时内)\"\n        )\n        \n        # ===== 4. 
去重并保存 =====\n        saved_count = 0\n        duplicate_count = 0\n        \n        for news_item in recent_news:\n            # 检查URL是否已存在\n            existing = db.execute(\n                select(News).where(News.url == news_item.url)\n            ).scalar_one_or_none()\n            \n            if existing:\n                duplicate_count += 1\n                logger.debug(f\"[Task {task_record.id}] ⏭️  跳过重复新闻: {news_item.title[:30]}...\")\n                continue\n            \n            # 创建新记录（清理 NUL 字符，PostgreSQL 不允许存储）\n            news = News(\n                title=clean_text_for_db(news_item.title),\n                content=clean_text_for_db(news_item.content),\n                raw_html=clean_text_for_db(news_item.raw_html),  # 保存原始 HTML\n                url=clean_text_for_db(news_item.url),\n                source=clean_text_for_db(news_item.source),\n                publish_time=news_item.publish_time,\n                author=clean_text_for_db(news_item.author),\n                keywords=news_item.keywords,\n                stock_codes=news_item.stock_codes,\n            )\n            \n            db.add(news)\n            saved_count += 1\n        \n        db.commit()\n        \n        logger.info(\n            f\"[Task {task_record.id}] 💾 保存 {saved_count} 条新新闻 \"\n            f\"(重复: {duplicate_count})\"\n        )\n        \n        # ===== Phase 2.3: 更新 Redis 缓存 =====\n        if redis_client.is_available() and recent_news:\n            # 将新闻列表序列化后存入缓存\n            cache_data = [\n                {\n                    \"title\": n.title,\n                    \"url\": n.url,\n                    \"publish_time\": n.publish_time.isoformat() if n.publish_time else None,\n                    \"source\": n.source,\n                }\n                for n in recent_news\n            ]\n            success = redis_client.set_with_metadata(\n                cache_key, \n                cache_data, \n                ttl=settings.CACHE_TTL\n    
        )\n            if success:\n                logger.info(f\"[Task {task_record.id}] 💾 Redis 缓存已更新 (TTL: {settings.CACHE_TTL}s)\")\n        \n        # ===== 5. 更新任务状态 =====\n        end_time = datetime.utcnow()\n        execution_time = (end_time - start_time).total_seconds()\n        \n        task_record.status = TaskStatus.COMPLETED\n        task_record.completed_at = end_time\n        task_record.execution_time = execution_time\n        task_record.crawled_count = len(recent_news)\n        task_record.saved_count = saved_count\n        task_record.result = {\n            \"total_crawled\": len(news_list),\n            \"filtered\": len(recent_news),\n            \"saved\": saved_count,\n            \"duplicates\": duplicate_count,\n            \"retention_hours\": settings.NEWS_RETENTION_HOURS,\n        }\n        db.commit()\n        \n        logger.info(\n            f\"[Task {task_record.id}] ✅ 完成! \"\n            f\"爬取: {len(news_list)} → 过滤: {len(recent_news)} → 保存: {saved_count}, \"\n            f\"耗时: {execution_time:.2f}s\"\n        )\n        \n        return {\n            \"task_id\": task_record.id,\n            \"status\": \"completed\",\n            \"source\": source,\n            \"crawled\": len(news_list),\n            \"filtered\": len(recent_news),\n            \"saved\": saved_count,\n            \"duplicates\": duplicate_count,\n            \"execution_time\": execution_time,\n            \"timestamp\": datetime.utcnow().isoformat(),\n        }\n        \n    except Exception as e:\n        logger.error(f\"[Task {task_record.id if task_record else 'unknown'}] 爬取失败: {e}\", exc_info=True)\n        \n        if task_record:\n            task_record.status = TaskStatus.FAILED\n            task_record.completed_at = datetime.utcnow()\n            task_record.error_message = str(e)[:1000]\n            db.commit()\n        \n        # 重新抛出异常，让 Celery 记录\n        raise\n    \n    finally:\n        
db.close()\n\n\n@celery_app.task(bind=True, name=\"app.tasks.crawl_tasks.cold_start_crawl_task\")\ndef cold_start_crawl_task(\n    self,\n    source: str = \"sina\",\n    start_page: int = 1,\n    end_page: int = 50,\n):\n    \"\"\"\n    冷启动批量爬取任务\n    \n    Args:\n        source: 新闻源\n        start_page: 起始页\n        end_page: 结束页\n    \"\"\"\n    db = get_sync_db_session()\n    task_record = None\n    \n    try:\n        # 1. 创建任务记录\n        task_record = CrawlTask(\n            celery_task_id=self.request.id,\n            mode=CrawlMode.COLD_START,\n            status=TaskStatus.RUNNING,\n            source=source,\n            config={\n                \"start_page\": start_page,\n                \"end_page\": end_page,\n            },\n            total_pages=end_page - start_page + 1,\n            started_at=datetime.utcnow(),\n        )\n        db.add(task_record)\n        db.commit()\n        db.refresh(task_record)\n        \n        logger.info(f\"[Task {task_record.id}] 开始冷启动爬取: {source}, 页码 {start_page}-{end_page}\")\n        \n        # 2. 创建爬虫\n        if source == \"sina\":\n            crawler = SinaCrawlerTool()\n        else:\n            raise ValueError(f\"不支持的新闻源: {source}\")\n        \n        # 3. 
分页爬取\n        start_time = datetime.utcnow()\n        total_crawled = 0\n        total_saved = 0\n        \n        for page in range(start_page, end_page + 1):\n            try:\n                # 更新进度\n                task_record.current_page = page\n                task_record.progress = {\n                    \"current_page\": page,\n                    \"total_pages\": task_record.total_pages,\n                    \"percentage\": round((page - start_page + 1) / task_record.total_pages * 100, 2),\n                }\n                db.commit()\n                \n                # 爬取单页\n                news_list = crawler.crawl(start_page=page, end_page=page)\n                total_crawled += len(news_list)\n                \n                # 保存新闻\n                page_saved = 0\n                for news_item in news_list:\n                    existing = db.execute(\n                        select(News).where(News.url == news_item.url)\n                    ).scalar_one_or_none()\n                    \n                    if not existing:\n                        # 清理 NUL 字符，PostgreSQL 不允许存储\n                        news = News(\n                            title=clean_text_for_db(news_item.title),\n                            content=clean_text_for_db(news_item.content),\n                            raw_html=clean_text_for_db(news_item.raw_html),  # 保存原始 HTML\n                            url=clean_text_for_db(news_item.url),\n                            source=clean_text_for_db(news_item.source),\n                            publish_time=news_item.publish_time,\n                            author=clean_text_for_db(news_item.author),\n                            keywords=news_item.keywords,\n                            stock_codes=news_item.stock_codes,\n                        )\n                        db.add(news)\n                        page_saved += 1\n                \n                db.commit()\n                total_saved += page_saved\n                
\n                logger.info(\n                    f\"[Task {task_record.id}] 页 {page}/{end_page}: \"\n                    f\"爬取 {len(news_list)} 条, 保存 {page_saved} 条\"\n                )\n                \n            except Exception as e:\n                logger.error(f\"[Task {task_record.id}] 页 {page} 爬取失败: {e}\")\n                continue\n        \n        # 4. 更新任务状态\n        end_time = datetime.utcnow()\n        execution_time = (end_time - start_time).total_seconds()\n        \n        task_record.status = TaskStatus.COMPLETED\n        task_record.completed_at = end_time\n        task_record.execution_time = execution_time\n        task_record.crawled_count = total_crawled\n        task_record.saved_count = total_saved\n        task_record.result = {\n            \"pages_crawled\": end_page - start_page + 1,\n            \"total_crawled\": total_crawled,\n            \"total_saved\": total_saved,\n            \"duplicates\": total_crawled - total_saved,\n        }\n        db.commit()\n        \n        logger.info(\n            f\"[Task {task_record.id}] 冷启动完成! 
\"\n            f\"页数: {end_page - start_page + 1}, 爬取: {total_crawled}, 保存: {total_saved}, \"\n            f\"耗时: {execution_time:.2f}s\"\n        )\n        \n        return {\n            \"task_id\": task_record.id,\n            \"status\": \"completed\",\n            \"crawled\": total_crawled,\n            \"saved\": total_saved,\n            \"execution_time\": execution_time,\n        }\n        \n    except Exception as e:\n        logger.error(f\"[Task {task_record.id if task_record else 'unknown'}] 冷启动失败: {e}\", exc_info=True)\n        \n        if task_record:\n            task_record.status = TaskStatus.FAILED\n            task_record.completed_at = datetime.utcnow()\n            task_record.error_message = str(e)[:1000]\n            db.commit()\n        \n        raise\n    \n    finally:\n        db.close()\n\n\n@celery_app.task(bind=True, name=\"app.tasks.crawl_tasks.targeted_stock_crawl_task\")\ndef targeted_stock_crawl_task(\n    self,\n    stock_code: str,\n    stock_name: str,\n    days: int = 30,\n    task_record_id: int = None\n):\n    \"\"\"\n    定向爬取某只股票的相关新闻（精简版 - 只使用 BochaAI）\n    \n    数据来源：BochaAI 搜索引擎 API\n    \n    图谱构建逻辑：\n    - 有历史新闻数据 → 先构建/使用图谱 → 基于图谱扩展关键词搜索\n    - 无历史新闻数据 → 先用 BochaAI 爬取 → 爬取完成后异步构建图谱\n    \n    Args:\n        stock_code: 股票代码（如 SH600519）\n        stock_name: 股票名称（如 贵州茅台）\n        days: 搜索时间范围（天），默认30天\n        task_record_id: 数据库中的任务记录ID（如果已创建）\n    \"\"\"\n    db = get_sync_db_session()\n    task_record = None\n    \n    try:\n        # 标准化股票代码\n        code = stock_code.upper()\n        if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n            pure_code = code[2:]\n        else:\n            pure_code = code\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        # 1. 
Fetch or create the task record\n        if task_record_id:\n            task_record = db.query(CrawlTask).filter(CrawlTask.id == task_record_id).first()\n            if task_record:\n                task_record.status = TaskStatus.RUNNING\n                task_record.started_at = datetime.utcnow()\n                db.commit()\n                db.refresh(task_record)\n            else:\n                logger.warning(f\"Task record {task_record_id} not found, creating new one\")\n                task_record_id = None\n        \n        if not task_record:\n            task_record = CrawlTask(\n                celery_task_id=self.request.id,\n                mode=CrawlMode.TARGETED,\n                status=TaskStatus.RUNNING,\n                source=\"targeted\",\n                config={\n                    \"stock_code\": code,\n                    \"stock_name\": stock_name,\n                    \"days\": days,\n                },\n                started_at=datetime.utcnow(),\n            )\n            db.add(task_record)\n            db.commit()\n            db.refresh(task_record)\n        \n        logger.info(f\"[Task {task_record.id}] 🎯 Starting targeted crawl: {stock_name}({code}), window: {days} days\")\n        \n        start_time = datetime.utcnow()\n        all_news = []\n        search_results = []\n        \n        # ========================================\n        # [Core logic] Fetch basic stock info from akshare first and build a simple graph\n        # ========================================\n        task_record.progress = {\"current\": 5, \"total\": 100, \"message\": \"Fetching basic stock info...\"}\n        db.commit()\n        \n        from ..knowledge.knowledge_extractor import AkshareKnowledgeExtractor\n        \n        # 1. 
Fetch basic company info from akshare\n        logger.info(f\"[Task {task_record.id}] 🔍 Fetching basic info for {stock_name}({pure_code}) from akshare...\")\n        akshare_info = None\n        try:\n            akshare_info = AkshareKnowledgeExtractor.extract_company_info(pure_code)\n            if akshare_info:\n                logger.info(f\"[Task {task_record.id}] ✅ akshare returned: industry={akshare_info.get('industry')}, main business={akshare_info.get('main_business', '')[:50]}...\")\n            else:\n                logger.warning(f\"[Task {task_record.id}] ⚠️ akshare returned no data; keywords will be generated from the stock name\")\n        except Exception as e:\n            logger.warning(f\"[Task {task_record.id}] ⚠️ akshare lookup failed: {e}; keywords will be generated from the stock name\")\n        \n        # 2. Build the simple graph and generate search keywords\n        task_record.progress = {\"current\": 10, \"total\": 100, \"message\": \"Building the knowledge graph...\"}\n        db.commit()\n        \n        simple_graph = AkshareKnowledgeExtractor.build_simple_graph_from_info(\n            stock_code=code,\n            stock_name=stock_name,\n            akshare_info=akshare_info\n        )\n        \n        # Fetch the tiered keywords\n        core_keywords = simple_graph.get(\"core_keywords\", [stock_name])\n        extension_keywords = simple_graph.get(\"extension_keywords\", [])\n        \n        logger.info(\n            f\"[Task {task_record.id}] 📋 Keyword tiers: \"\n            f\"core={len(core_keywords)} {core_keywords[:4]}, \"\n            f\"extension={len(extension_keywords)} {extension_keywords[:4]}\"\n        )\n        logger.info(f\"[Task {task_record.id}] 🔑 Full core keyword list: {core_keywords}\")\n        logger.info(f\"[Task {task_record.id}] 🔑 Full extension keyword list: {extension_keywords}\")\n        \n        # ========================================\n        # [Search stage] Query BochaAI with combined keywords\n        # ========================================\n        task_record.progress = {\"current\": 20, \"total\": 100, \"message\": \"Running combined BochaAI search...\"}\n        db.commit()\n        \n        if not bochaai_search.is_available():\n            logger.error(f\"[Task {task_record.id}] ❌ 
BochaAI API key not configured; cannot run search\")\n            raise ValueError(\"BochaAI API key is not configured\")\n        \n        # ========================================\n        # [Combined search strategy]\n        # 1. Always search: core keywords (company name, code)\n        # 2. Optional combos: core keyword + extension keyword (industry, business, people)\n        # ========================================\n        all_search_results = []\n        search_queries = []\n        \n        # Strategy 1: search each core keyword on its own (top 3 most important)\n        for core_kw in core_keywords[:3]:\n            # Skip bare numeric codes (searching them alone is too broad)\n            if not (core_kw.isdigit() or core_kw.startswith(\"SH\") or core_kw.startswith(\"SZ\")):\n                search_queries.append(core_kw)\n        \n        # Strategy 2: combined core + extension searches (at most 3 combos)\n        if extension_keywords:\n            # Take the primary core keyword (usually the stock's short name)\n            main_core = core_keywords[0] if core_keywords else stock_name\n            \n            for ext_kw in extension_keywords[:3]:\n                # Combined query, e.g. \"*ST国华 软件开发\"\n                combined_query = f\"{main_core} {ext_kw}\"\n                search_queries.append(combined_query)\n        \n        # Cap the total number of queries (avoid excessive requests)\n        search_queries = search_queries[:5]\n        \n        logger.info(f\"[Task {task_record.id}] 🚀 Generated {len(search_queries)} search queries:\")\n        for i, q in enumerate(search_queries):\n            logger.info(f\"  [{i+1}] {q}\")\n        \n        # Run the searches\n        for query in search_queries:\n            try:\n                logger.info(f\"[Task {task_record.id}] 🔍 Searching: '{query}'\")\n                kw_results = bochaai_search.search_stock_news(\n                    stock_name=query,  # Pass the combined query\n                    stock_code=pure_code,\n                    days=days,\n                    count=50,  # At most 50 results per query\n                    max_age_days=365\n                )\n                logger.info(f\"[Task {task_record.id}] 📰 Query '{query}' returned {len(kw_results)} results\")\n                all_search_results.extend(kw_results)\n            except Exception as e:\n                logger.warning(f\"[Task {task_record.id}] ⚠️ 
Query '{query}' search failed: {e}\")\n        \n        # Deduplicate by URL\n        seen_urls = set()\n        search_results = []\n        for r in all_search_results:\n            if r.url not in seen_urls:\n                seen_urls.add(r.url)\n                search_results.append(r)\n        \n        logger.info(f\"[Task {task_record.id}] 📊 Merged {len(all_search_results)} results, {len(search_results)} after dedup\")\n        \n        # ========================================\n        # [Processing stage] Convert search results into NewsItem objects\n        # ========================================\n        task_record.progress = {\"current\": 50, \"total\": 100, \"message\": \"Processing search results...\"}\n        db.commit()\n        \n        bochaai_matched = 0\n        bochaai_filtered = 0\n        \n        # Decide whether to enable relaxed filtering:\n        # use it when there are few core keywords (<= 2) or few search results (< 10)\n        use_relaxed_filter = len(core_keywords) <= 2 or len(search_results) < 10\n        if use_relaxed_filter:\n            logger.info(f\"[Task {task_record.id}] 🔓 Relaxed filtering enabled (core keywords={len(core_keywords)}, results={len(search_results)})\")\n        \n        # Log the first 10 BochaAI results for debugging\n        logger.info(f\"[Task {task_record.id}] 📋 BochaAI result preview (first 10):\")\n        for i, r in enumerate(search_results[:10]):\n            logger.info(f\"  [{i+1}] Title: {r.title[:60]}...\")\n            logger.info(f\"      Source: {r.site_name}, date: {r.date_published}\")\n            logger.info(f\"      URL: {r.url[:80]}...\")\n        \n        for idx, result in enumerate(search_results):\n            # Parse the publish time\n            publish_time = None\n            if result.date_published:\n                try:\n                    publish_time = datetime.fromisoformat(\n                        result.date_published.replace('Z', '+00:00')\n                    )\n                except (ValueError, AttributeError):\n                    pass\n            \n            # [Note] No second fetch for the full content; use the snippet directly (faster)\n            full_content = result.snippet\n            \n            # Relevance filter: must contain at least one core keyword\n            
text_to_check = result.title + \" \" + result.snippet\n            text_to_check_lower = text_to_check.lower()\n            \n            # Check whether any core keyword matches\n            is_match = False\n            matched_keyword = None\n            for kw in core_keywords:\n                if not kw or len(kw) < 2:\n                    continue\n                \n                kw_lower = kw.lower()\n                \n                # Lenient matching strategy:\n                # 1. Exact match (case-insensitive)\n                if kw in text_to_check or kw_lower in text_to_check_lower:\n                    is_match = True\n                    matched_keyword = kw\n                    break\n                \n                # 2. Match after stripping special characters (handles *ST prefixes etc.)\n                import re\n                kw_clean = re.sub(r'[*\\s]', '', kw)\n                if len(kw_clean) >= 2 and kw_clean.lower() in text_to_check_lower:\n                    is_match = True\n                    matched_keyword = f\"{kw} (cleaned: {kw_clean})\"\n                    break\n            \n            if not is_match:\n                # In relaxed mode, also treat results containing the numeric stock code as relevant\n                if use_relaxed_filter and pure_code in text_to_check:\n                    is_match = True\n                    matched_keyword = f\"{pure_code} (relaxed mode)\"\n                    logger.debug(f\"[Task {task_record.id}] 🔓 Relaxed-mode match: {result.title[:40]}... 
(包含代码)\")\n                else:\n                    bochaai_filtered += 1\n                    # 打印前 5 条被过滤的原因\n                    if bochaai_filtered <= 5:\n                        logger.info(f\"[Task {task_record.id}] ❌ 过滤[{idx+1}]: 不包含核心关键词\")\n                        logger.info(f\"      标题: {result.title[:80]}\")\n                        logger.info(f\"      摘要: {result.snippet[:100]}...\")\n                        logger.info(f\"      核心词: {core_keywords}\")\n                    continue\n            \n            # 如果宽松模式跳过了上面的 continue，需要确保 is_match 为 True\n            if not is_match:\n                continue\n            \n            logger.debug(f\"[Task {task_record.id}] ✅ 匹配核心词 '{matched_keyword}': {result.title[:40]}...\")\n            \n            bochaai_matched += 1\n            \n            # 尝试爬取页面获取完整 HTML（只对前 15 条匹配结果爬取，避免任务太慢）\n            raw_html = None\n            crawled_content = None\n            if bochaai_matched <= 15:\n                try:\n                    from ..tools.interactive_crawler import InteractiveCrawler\n                    page_crawler = InteractiveCrawler(timeout=10)\n                    page_data = page_crawler.crawl_page(result.url)\n                    if page_data:\n                        raw_html = page_data.get('html')\n                        crawled_content = page_data.get('content') or page_data.get('text')\n                        logger.debug(f\"[Task {task_record.id}] 📄 爬取成功: {result.url[:50]}... 
| HTML {len(raw_html) if raw_html else 0} chars\")\n                except Exception as e:\n                    logger.debug(f\"[Task {task_record.id}] ⚠️ Page fetch failed {result.url[:50]}...: {e}\")\n            \n            # Prefer the crawled full content\n            final_content = crawled_content if crawled_content and len(crawled_content) > len(full_content) else full_content\n            \n            news_item = NewsItem(\n                title=result.title,\n                content=final_content,\n                url=result.url,\n                source=result.site_name or \"web_search\",\n                publish_time=publish_time,\n                stock_codes=[pure_code, code],\n                raw_html=raw_html,\n            )\n            all_news.append(news_item)\n            \n            # Update progress every 20 items\n            if (idx + 1) % 20 == 0:\n                progress_pct = 50 + int((idx + 1) / len(search_results) * 30)\n                task_record.progress = {\"current\": progress_pct, \"total\": 100, \"message\": f\"Processing {idx+1}/{len(search_results)}...\"}\n                db.commit()\n        \n        logger.info(f\"[Task {task_record.id}] 🔍 {len(search_results)} search results, {bochaai_matched} matched, {bochaai_filtered} filtered\")\n        \n        # ========================================\n        # [Interactive-crawler fallback] Top up with the interactive crawler when too few results match\n        # ========================================\n        if bochaai_matched < 5:  # Too few matches: fall back to the interactive crawler\n            logger.info(f\"[Task {task_record.id}] 🌐 Few relevant results ({bochaai_matched}); topping up with the interactive crawler...\")\n            \n            try:\n                from ..tools.interactive_crawler import create_interactive_crawler\n                \n                # Search with the core keywords;\n                # take the primary one (usually the stock's short name)\n                interactive_query = core_keywords[0] if core_keywords else stock_name\n                \n                logger.info(f\"[Task {task_record.id}] 🔍 Interactive crawler search: '{interactive_query}'\")\n                \n                crawler = 
create_interactive_crawler(headless=True)\n                # Use Baidu News search (news-specific; more stable than Bing)\n                interactive_results = crawler.interactive_search(\n                    interactive_query,\n                    engines=[\"baidu_news\", \"sogou\"],  # Baidu News + Sogou\n                    num_results=15,\n                    search_type=\"news\"  # News search\n                )\n                \n                logger.info(f\"[Task {task_record.id}] ✅ Interactive crawler returned {len(interactive_results)} results\")\n                \n                # With the news.baidu.com entry point the results are real third-party links,\n                # so these pages can be fetched safely for full content (except JS-rendered sites)\n                \n                # Sites that require JS rendering (cannot be fetched with requests)\n                JS_RENDERED_SITES = [\n                    'baijiahao.baidu.com',  # Baijiahao requires JS rendering\n                    'mbd.baidu.com',        # Baidu mobile\n                    'xueqiu.com',           # Xueqiu\n                    'mp.weixin.qq.com',     # WeChat official accounts\n                ]\n                \n                for result in interactive_results[:10]:  # Take at most 10\n                    url = result.get('url', '')\n                    title = result.get('title', '')\n                    snippet = result.get('snippet', '')\n                    \n                    # Skip invalid results\n                    if not url or not title:\n                        continue\n                    # Skip URLs we already have\n                    if url in {item.url for item in all_news}:\n                        continue\n                    # Skip Baidu redirect links\n                    if 'baidu.com/link?' 
in url:\n                        logger.debug(f\"Skipping Baidu redirect link: {url}\")\n                        continue\n                    \n                    # Check whether the site requires JS rendering\n                    needs_js_render = any(site in url for site in JS_RENDERED_SITES)\n                    \n                    page_content = \"\"\n                    raw_html = None\n                    \n                    if needs_js_render:\n                        # JS-rendered site: use the search snippet directly\n                        logger.debug(f\"  ⚠️ JS-rendered site; using the search snippet: {url[:50]}...\")\n                        page_content = snippet if snippet else title\n                    else:\n                        # Regular site: try fetching the page for the full content\n                        try:\n                            page_data = crawler.crawl_page(url)\n                            if page_data:\n                                page_content = page_data.get('text', '') or page_data.get('content', '')\n                                raw_html = page_data.get('html', '')\n                                # Use the crawled title when it is more complete\n                                if page_data.get('title') and len(page_data.get('title', '')) > len(title):\n                                    title = page_data.get('title', title)\n                                logger.debug(f\"  ✅ Page fetched: {title[:30]}...\")\n                        except Exception as e:\n                            logger.debug(f\"  ⚠️ Page fetch failed {url}: {e}\")\n                    \n                    # Fall back to the search snippet when fetching fails\n                    if not page_content:\n                        page_content = snippet if snippet else title\n                    \n                    news_item = NewsItem(\n                        title=title,\n                        content=page_content,\n                        url=url,\n                        source=result.get('news_source') or result.get('source', 'baidu_news'),\n                        publish_time=None,  # The interactive crawler has no publish time\n                        stock_codes=[pure_code, 
code],\n                        raw_html=raw_html,  # Don't store garbled HTML for JS-rendered sites\n                    )\n                    all_news.append(news_item)\n                    bochaai_matched += 1\n                \n                logger.info(f\"[Task {task_record.id}] 📊 After the interactive-crawler top-up: {bochaai_matched} matched results in total\")\n                \n            except ImportError:\n                logger.warning(f\"[Task {task_record.id}] ⚠️ Interactive crawler module unavailable; skipping the fallback search\")\n            except Exception as e:\n                logger.error(f\"[Task {task_record.id}] ❌ Interactive-crawler fallback failed: {e}\", exc_info=True)\n        \n        # ========================================\n        # [Save stage] Deduplicate and save the news\n        # ========================================\n        task_record.progress = {\"current\": 80, \"total\": 100, \"message\": \"Saving news...\"}\n        db.commit()\n        saved_count = 0\n        duplicate_count = 0\n        \n        logger.info(f\"[Task {task_record.id}] 💾 Saving {len(all_news)} news items...\")\n        \n        for news_item in all_news:\n            # Check whether the URL already exists\n            existing = db.execute(\n                select(News).where(News.url == news_item.url)\n            ).scalar_one_or_none()\n            \n            if existing:\n                duplicate_count += 1\n                # If it exists but is not yet linked to this stock, update the link\n                if existing.stock_codes is None:\n                    existing.stock_codes = []\n                if pure_code not in existing.stock_codes:\n                    existing.stock_codes = existing.stock_codes + [pure_code]\n                    db.commit()\n                continue\n            \n            # Create a new record (strip NUL characters; PostgreSQL cannot store them)\n            news = News(\n                title=clean_text_for_db(news_item.title),\n                content=clean_text_for_db(news_item.content),\n                raw_html=clean_text_for_db(news_item.raw_html),  # Keep the raw HTML\n                url=clean_text_for_db(news_item.url),\n                source=clean_text_for_db(news_item.source),\n                
publish_time=news_item.publish_time,\n                author=clean_text_for_db(news_item.author),\n                keywords=news_item.keywords,\n                stock_codes=news_item.stock_codes or [pure_code, code],\n            )\n            \n            db.add(news)\n            saved_count += 1\n        \n        db.commit()\n        \n        logger.info(\n            f\"[Task {task_record.id}] 💾 Saved {saved_count} news items \"\n            f\"(duplicates: {duplicate_count})\"\n        )\n        \n        # ========================================\n        # [Graph update stage] Build the full graph asynchronously (Neo4j-based)\n        # ========================================\n        task_record.progress = {\"current\": 90, \"total\": 100, \"message\": \"Triggering async graph build...\"}\n        db.commit()\n        \n        if saved_count > 0:\n            # Once news has been saved, trigger the async graph-building task\n            logger.info(f\"[Task {task_record.id}] 🧠 Triggering the async graph-building task...\")\n            try:\n                build_knowledge_graph_task.delay(code, stock_name)\n                logger.info(f\"[Task {task_record.id}] ✅ Async graph-building task triggered\")\n            except Exception as e:\n                logger.error(f\"[Task {task_record.id}] ❌ Failed to trigger the async graph build: {e}\")\n        \n        # ========================================\n        # [Finish stage] Update the task status\n        # ========================================\n        end_time = datetime.utcnow()\n        execution_time = (end_time - start_time).total_seconds()\n        \n        task_record.status = TaskStatus.COMPLETED\n        task_record.completed_at = end_time\n        task_record.execution_time = execution_time\n        task_record.crawled_count = len(all_news)\n        task_record.saved_count = saved_count\n        task_record.result = {\n            \"stock_code\": code,\n            \"stock_name\": stock_name,\n            \"total_found\": len(all_news),\n            \"saved\": saved_count,\n            \"duplicates\": duplicate_count,\n            \"akshare_info\": bool(akshare_info),  # Whether akshare data was obtained\n            
\"core_keywords\": core_keywords[:5],  # 核心关键词\n            \"search_queries\": search_queries,  # 实际搜索的查询\n            \"sources\": {\n                \"bochaai\": len(search_results),\n            }\n        }\n        task_record.progress = {\n            \"current\": 100,\n            \"total\": 100,\n            \"message\": f\"完成！新增 {saved_count} 条新闻\"\n        }\n        db.commit()\n        \n        logger.info(\n            f\"[Task {task_record.id}] ✅ 定向爬取完成! \"\n            f\"股票: {stock_name}({code}), 找到: {len(all_news)}, 保存: {saved_count}, \"\n            f\"耗时: {execution_time:.2f}s\"\n        )\n        \n        return {\n            \"task_id\": task_record.id,\n            \"status\": \"completed\",\n            \"stock_code\": code,\n            \"stock_name\": stock_name,\n            \"crawled\": len(all_news),\n            \"saved\": saved_count,\n            \"duplicates\": duplicate_count,\n            \"execution_time\": execution_time,\n            \"timestamp\": datetime.utcnow().isoformat(),\n        }\n        \n    except Exception as e:\n        logger.error(f\"[Task {task_record.id if task_record else 'unknown'}] 定向爬取失败: {e}\", exc_info=True)\n        \n        if task_record:\n            task_record.status = TaskStatus.FAILED\n            task_record.completed_at = datetime.utcnow()\n            task_record.error_message = str(e)[:1000]\n            task_record.progress = {\n                \"current\": 0,\n                \"total\": 100,\n                \"message\": f\"失败: {str(e)[:100]}\"\n            }\n            db.commit()\n        \n        raise\n    \n    finally:\n        db.close()\n\n\n@celery_app.task(bind=True, name=\"app.tasks.crawl_tasks.build_knowledge_graph_task\")\ndef build_knowledge_graph_task(self, stock_code: str, stock_name: str):\n    \"\"\"\n    异步构建知识图谱任务\n    \n    在无历史新闻数据的股票首次爬取完成后触发。\n    从数据库中的新闻数据 + akshare 基础信息构建知识图谱。\n    \n    Args:\n        stock_code: 股票代码（如 SH600519）\n        stock_name: 
Stock name (e.g. 贵州茅台)\n    \"\"\"\n    db = get_sync_db_session()\n    \n    try:\n        code = stock_code.upper()\n        if code.startswith(\"SH\") or code.startswith(\"SZ\"):\n            pure_code = code[2:]\n        else:\n            pure_code = code\n            code = f\"SH{code}\" if code.startswith(\"6\") else f\"SZ{code}\"\n        \n        logger.info(f\"[GraphTask] 🏗️ Starting async knowledge-graph build: {stock_name}({code})\")\n        \n        from ..knowledge.graph_service import get_graph_service\n        from ..knowledge.knowledge_extractor import (\n            create_knowledge_extractor,\n            AkshareKnowledgeExtractor\n        )\n        \n        graph_service = get_graph_service()\n        \n        # 1. Check whether the graph already exists (avoid rebuilding)\n        existing_graph = graph_service.get_company_graph(code)\n        if existing_graph:\n            logger.info(f\"[GraphTask] ✅ Graph already exists; skipping build\")\n            return {\"status\": \"skipped\", \"reason\": \"graph_exists\"}\n        \n        # 2. Fetch basic company info from akshare (use the bare numeric code, as akshare expects)\n        akshare_info = AkshareKnowledgeExtractor.extract_company_info(pure_code)\n        \n        if akshare_info:\n            extractor = create_knowledge_extractor()\n            base_graph = asyncio.run(\n                extractor.extract_from_akshare(code, stock_name, akshare_info)\n            )\n            graph_service.build_company_graph(base_graph)\n            logger.info(f\"[GraphTask] ✅ Base graph built\")\n        else:\n            logger.warning(f\"[GraphTask] ⚠️ akshare returned no data\")\n        \n        # 3. 
Extract info from news in the database and update the graph\n        recent_news = db.execute(\n            text(\"\"\"\n                SELECT title, content FROM news \n                WHERE stock_codes @> ARRAY[:code]::varchar[] \n                ORDER BY publish_time DESC LIMIT 50\n            \"\"\").bindparams(code=pure_code)\n        ).fetchall()\n        \n        if recent_news:\n            news_data = [{\"title\": n[0], \"content\": n[1]} for n in recent_news]\n            extractor = create_knowledge_extractor()\n            \n            extracted_info = asyncio.run(\n                extractor.extract_from_news(code, stock_name, news_data)\n            )\n            \n            if any(extracted_info.values()):\n                graph_service.update_from_news(code, \"\", extracted_info)\n                logger.info(f\"[GraphTask] ✅ Graph updated from news\")\n        \n        logger.info(f\"[GraphTask] ✅ Knowledge graph build finished: {stock_name}({code})\")\n        \n        return {\n            \"status\": \"completed\",\n            \"stock_code\": code,\n            \"stock_name\": stock_name,\n            \"news_count\": len(recent_news) if recent_news else 0,\n        }\n        \n    except Exception as e:\n        logger.error(f\"[GraphTask] ❌ Knowledge-graph build failed: {e}\", exc_info=True)\n        return {\"status\": \"failed\", \"error\": str(e)}\n    \n    finally:\n        db.close()\n\n"
  },
  {
    "path": "backend/app/tools/__init__.py",
    "content": "\"\"\"\nTools module\n\"\"\"\nfrom .crawler_base import BaseCrawler, NewsItem\nfrom .sina_crawler import SinaCrawlerTool, create_sina_crawler\nfrom .tencent_crawler import TencentCrawlerTool\nfrom .jwview_crawler import JwviewCrawlerTool\nfrom .eeo_crawler import EeoCrawlerTool\nfrom .caijing_crawler import CaijingCrawlerTool\nfrom .jingji21_crawler import Jingji21CrawlerTool\nfrom .nbd_crawler import NbdCrawlerTool\nfrom .yicai_crawler import YicaiCrawlerTool\nfrom .netease163_crawler import Netease163CrawlerTool\nfrom .eastmoney_crawler import EastmoneyCrawlerTool\nfrom .text_cleaner import TextCleanerTool, create_text_cleaner\nfrom .bochaai_search import BochaAISearchTool, bochaai_search, SearchResult\n\n__all__ = [\n    \"BaseCrawler\",\n    \"NewsItem\",\n    \"SinaCrawlerTool\",\n    \"create_sina_crawler\",\n    \"TencentCrawlerTool\",\n    \"JwviewCrawlerTool\",\n    \"EeoCrawlerTool\",\n    \"CaijingCrawlerTool\",\n    \"Jingji21CrawlerTool\",\n    \"NbdCrawlerTool\",\n    \"YicaiCrawlerTool\",\n    \"Netease163CrawlerTool\",\n    \"EastmoneyCrawlerTool\",\n    \"TextCleanerTool\",\n    \"create_text_cleaner\",\n    \"BochaAISearchTool\",\n    \"bochaai_search\",\n    \"SearchResult\",\n]\n\n"
  },
  {
    "path": "backend/app/tools/bochaai_search.py",
    "content": "\"\"\"\nBochaAI Web Search tool\nUsed for targeted searches of stock-related news\n\"\"\"\nimport json\nimport logging\nimport urllib.request\nimport urllib.error\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime\nfrom dataclasses import dataclass\n\nfrom ..core.config import settings\n\nlogger = logging.getLogger(__name__)\n\n\n@dataclass\nclass SearchResult:\n    \"\"\"Search result data class\"\"\"\n    title: str\n    url: str\n    snippet: str\n    site_name: Optional[str] = None\n    date_published: Optional[str] = None\n    \n\nclass BochaAISearchTool:\n    \"\"\"\n    BochaAI Web Search tool\n    Used to search for stock-related news\n    \"\"\"\n    \n    def __init__(self, api_key: Optional[str] = None, endpoint: Optional[str] = None):\n        \"\"\"\n        Initialize the BochaAI search tool\n        \n        Args:\n            api_key: BochaAI API key (falls back to the config value if not provided)\n            endpoint: API endpoint (defaults to the configured endpoint)\n        \"\"\"\n        self.api_key = api_key or settings.BOCHAAI_API_KEY\n        self.endpoint = endpoint or settings.BOCHAAI_ENDPOINT\n        \n        if not self.api_key:\n            logger.warning(\n                \"BochaAI API key not configured; search will be unavailable.\\n\"\n                \"Set BOCHAAI_API_KEY=your_api_key in the .env file\"\n            )\n    \n    def is_available(self) -> bool:\n        \"\"\"Check whether search is available\"\"\"\n        return bool(self.api_key)\n    \n    def search(\n        self,\n        query: str,\n        freshness: str = \"noLimit\",\n        count: int = 10,\n        offset: int = 0,\n        include_sites: Optional[str] = None,\n        exclude_sites: Optional[str] = None,\n    ) -> List[SearchResult]:\n        \"\"\"\n        Run a web search\n        \n        Args:\n            query: Search query string\n            freshness: Time range (noLimit, oneDay, oneWeek, oneMonth)\n            count: Number of results to return (1-50; at most 50 per request)\n            offset: Result offset (for pagination)\n            include_sites: Sites to restrict the search to (comma-separated)\n            exclude_sites: Sites to exclude (comma-separated)\n            \n        Returns:\n            List of search results\n        \"\"\"\n        if not 
self.is_available():\n            logger.warning(\"BochaAI API key not configured; skipping search\")\n            return []\n        \n        try:\n            # Build the request payload\n            request_data = {\n                \"query\": query,\n                \"freshness\": freshness,\n                \"summary\": False,\n                \"count\": min(max(count, 1), 50)\n            }\n            \n            # Add the offset parameter for pagination\n            if offset > 0:\n                request_data[\"offset\"] = offset\n            \n            if include_sites:\n                request_data[\"include\"] = include_sites\n            if exclude_sites:\n                request_data[\"exclude\"] = exclude_sites\n            \n            # Create the request\n            req = urllib.request.Request(\n                self.endpoint,\n                data=json.dumps(request_data).encode('utf-8'),\n                headers={\n                    'Authorization': f'Bearer {self.api_key}',\n                    'Content-Type': 'application/json',\n                    'User-Agent': 'FinnewsHunter-BochaAI-Search/1.0'\n                }\n            )\n            \n            # Send the request\n            with urllib.request.urlopen(req, timeout=30) as response:\n                data = response.read().decode('utf-8')\n                result = json.loads(data)\n            \n            # Parse the results\n            results = []\n            \n            if 'data' in result:\n                data = result['data']\n                if 'webPages' in data and data['webPages'] and 'value' in data['webPages']:\n                    for item in data['webPages']['value']:\n                        search_result = SearchResult(\n                            title=item.get('name', 'Untitled'),\n                            url=item.get('url', ''),\n                            snippet=item.get('snippet', ''),\n                            site_name=item.get('siteName', ''),\n                            date_published=item.get('datePublished', '')\n                      
  )\n                        results.append(search_result)\n            \n            logger.info(f\"BochaAI 搜索完成: query='{query}', offset={offset}, 结果数={len(results)}\")\n            return results\n            \n        except urllib.error.HTTPError as e:\n            error_msg = f\"BochaAI API HTTP 错误: {e.code} - {e.reason}\"\n            if e.code == 401:\n                error_msg += \" (请检查 BOCHAAI_API_KEY 是否正确)\"\n            elif e.code == 429:\n                error_msg += \" (请求过于频繁)\"\n            logger.error(error_msg)\n            return []\n            \n        except urllib.error.URLError as e:\n            logger.error(f\"BochaAI 网络错误: {e.reason}\")\n            return []\n            \n        except json.JSONDecodeError as e:\n            logger.error(f\"BochaAI 响应解析失败: {e}\")\n            return []\n            \n        except Exception as e:\n            logger.error(f\"BochaAI 搜索失败: {e}\")\n            return []\n    \n    def search_stock_news(\n        self,\n        stock_name: str,\n        stock_code: Optional[str] = None,\n        days: int = 30,\n        count: int = 100,\n        max_age_days: int = 365,\n    ) -> List[SearchResult]:\n        \"\"\"\n        搜索股票相关新闻\n        \n        Args:\n            stock_name: 股票名称（如\"贵州茅台\"）\n            stock_code: 股票代码（可选，如\"600519\"）\n            days: 搜索时间范围（天），用于API freshness参数\n            count: 返回结果数量（支持超过50条，会自动分页请求）\n            max_age_days: 最大新闻年龄（天），默认365天（1年），超过此时间的新闻将被过滤\n            \n        Returns:\n            搜索结果列表（按时间从新到旧排序，只返回最近max_age_days天内的新闻）\n        \"\"\"\n        # 构建搜索查询 - 简洁明确，添加\"最新\"关键词优先获取新内容\n        query = f\"{stock_name} 最新\"\n        \n        # BochaAI API 支持的 freshness 参数值：\n        # - noLimit: 不限制\n        # - oneDay: 一天内\n        # - oneWeek: 一周内  \n        # - oneMonth: 一月内\n        # 注意：不支持 \"year\"、\"day\"、\"week\" 等其他值！\n        \n        # 根据请求天数确定 freshness 参数\n        if days <= 1:\n            freshness = \"oneDay\"\n        elif days <= 
7:\n            freshness = \"oneWeek\"\n        elif days <= 30:\n            freshness = \"oneMonth\"\n        else:\n            freshness = \"noLimit\"  # 超过30天用 noLimit，本地再过滤\n        \n        # 财经网站列表（用于优先搜索）\n        finance_sites = (\n            \"finance.sina.com.cn,\"\n            \"stock.eastmoney.com,\"\n            \"finance.qq.com,\"\n            \"money.163.com,\"\n            \"caijing.com.cn,\"\n            \"yicai.com,\"\n            \"nbd.com.cn,\"\n            \"21jingji.com,\"\n            \"eeo.com.cn,\"\n            \"chinanews.com.cn\"\n        )\n        \n        # 计算截止时间（max_age_days 天前）\n        from datetime import timedelta\n        cutoff_date = datetime.now() - timedelta(days=max_age_days)\n        \n        all_results = []\n        offset = 0\n        batch_size = 50  # API单次最大返回数\n        max_requests = 5  # 最多请求5次，防止无限循环\n        request_count = 0\n        \n        logger.info(f\"BochaAI 开始搜索股票新闻: {stock_name}, 目标数量={count}, 截止日期={cutoff_date.strftime('%Y-%m-%d')}\")\n        \n        while len(all_results) < count and request_count < max_requests:\n            batch_results = self.search(\n                query=query,\n                freshness=freshness,\n                count=batch_size,\n                offset=offset,\n                include_sites=finance_sites\n            )\n            \n            if not batch_results:\n                logger.info(f\"BochaAI 第{request_count+1}次请求未返回结果，停止分页\")\n                break\n            \n            # 时间过滤：保留有日期且在范围内的新闻，以及无日期但可能相关的新闻\n            for result in batch_results:\n                # 如果有发布日期，检查是否在时间范围内\n                if result.date_published:\n                    try:\n                        # 尝试解析发布时间\n                        pub_date = datetime.fromisoformat(\n                            result.date_published.replace('Z', '+00:00')\n                        )\n                        # 转换为无时区的时间进行比较\n                        if pub_date.tzinfo:\n                            pub_date = pub_date.replace(tzinfo=None)\n                        \n                        # 检查是否在指定时间范围内\n                        if pub_date < cutoff_date:\n                            logger.debug(f\"过滤超过{max_age_days}天的新闻: {result.title[:30]}... ({result.date_published})\")\n                            continue\n                            \n                    except (ValueError, AttributeError):\n                        # 日期解析失败，但仍然保留（可能是新闻）\n                        logger.debug(f\"无法解析日期，但仍保留: {result.title[:30]}...\")\n                else:\n                    # 无日期的新闻也保留（可能是相关新闻）\n                    logger.debug(f\"无日期新闻，保留: {result.title[:30]}...\")\n                \n                # 添加到结果中\n                all_results.append(result)\n                \n                if len(all_results) >= count:\n                    break\n            \n            offset += batch_size\n            request_count += 1\n            logger.info(f\"BochaAI 第{request_count}次请求完成，当前累计 {len(all_results)} 条有效结果\")\n        \n        # 按发布时间排序（从新到旧）\n        def parse_date(r):\n            if r.date_published:\n                try:\n                    dt = datetime.fromisoformat(r.date_published.replace('Z', '+00:00'))\n                    if dt.tzinfo:\n                        dt = dt.replace(tzinfo=None)\n                    return dt\n                except (ValueError, AttributeError):\n                    pass\n            return datetime.min  # 无法解析的日期排在最后\n        \n        all_results.sort(key=parse_date, reverse=True)\n        \n        logger.info(f\"BochaAI 搜索股票新闻完成: {stock_name}, 返回 {len(all_results)} 条结果 (共请求{request_count}次, 仅保留最近{max_age_days}天即{max_age_days//30}个月内)\")\n        \n        return all_results[:count]  # 确保不超过请求数量\n\n\n# 全局实例\nbochaai_search = BochaAISearchTool()\n\n"
  },
  {
    "path": "backend/app/tools/caijing_crawler.py",
    "content": "\"\"\"\n财经网爬虫工具\n目标URL: https://www.caijing.com.cn/ (股市栏目)\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime, timedelta\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass CaijingCrawlerTool(BaseCrawler):\n    \"\"\"\n    财经网爬虫\n    主要爬取股市相关新闻\n    \"\"\"\n    \n    BASE_URL = \"https://finance.caijing.com.cn/\"\n    # 股市栏目URL\n    STOCK_URL = \"https://finance.caijing.com.cn/\"\n    SOURCE_NAME = \"caijing\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"caijing_crawler\",\n            description=\"Crawl financial news from Caijing (caijing.com.cn)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取财经网新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Caijing, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Caijing: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            # 尝试爬取股市栏目或主页\n            try:\n                response = self._fetch_page(self.STOCK_URL)\n            except:\n                response = self._fetch_page(self.BASE_URL)\n            \n            # 财经网编码处理\n            if response.encoding == 'ISO-8859-1' or not response.encoding:\n                response.encoding = 'utf-8'\n            soup = self._parse_html(response.text)\n            \n       
     # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        # 财经网新闻URL模式（扩展更多模式）\n        caijing_patterns = [\n            r'/\\d{4}/',           # 日期路径 /2024/\n            '/article/',         # 文章\n            '.shtml',            # 静态HTML\n            '/finance/',         # 财经频道\n            '/stock/',           # 股票频道\n        ]\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 检查是否匹配财经网URL模式\n            is_caijing_url = False\n            \n            # 方式1: 检查URL模式\n            for pattern in caijing_patterns:\n                if re.search(pattern, href):\n                    is_caijing_url = True\n                    break\n            \n            # 方式2: 检查是否包含caijing.com.cn域名\n            if 'caijing.com.cn' in href or 'finance.caijing.com.cn' in href:\n                is_caijing_url = True\n            \n            # 方式3: 检查链接的class或data属性\n            if not is_caijing_url:\n                link_class = link.get('class', [])\n                if 
isinstance(link_class, list):\n                    link_class_str = ' '.join(link_class)\n                else:\n                    link_class_str = str(link_class)\n                if any(kw in link_class_str.lower() for kw in ['news', 'article', 'item', 'title', 'list']):\n                    if href.startswith('/') or 'caijing.com.cn' in href:\n                        is_caijing_url = True\n            \n            if is_caijing_url and title and len(title.strip()) > 5:\n                # 规范化 URL，优先 https，避免重复前缀\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.caijing.com.cn' + href\n                elif href.startswith('http://'):\n                    href = href.replace('http://', 'https://', 1)\n                elif not href.startswith('http'):\n                    href = 'https://www.caijing.com.cn/' + href.lstrip('/')\n                \n                # 过滤掉明显不是新闻的链接\n                if any(skip in href.lower() for skip in ['javascript:', 'mailto:', '#', 'void(0)', '/tag/', '/author/', '/user/']):\n                    continue\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title.strip()})\n        \n        logger.debug(f\"Caijing: Found {len(news_links)} potential news links\")\n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            
publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'article-content'},\n            {'class': 'main_txt'},\n            {'class': 'content'},\n            {'id': 'the_content'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/crawler_base.py",
    "content": "\"\"\"\n爬虫基类\n符合 AgenticX BaseTool 协议\n\"\"\"\nimport time\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom dataclasses import dataclass\nfrom datetime import datetime\nimport requests\nfrom bs4 import BeautifulSoup\nimport requests.exceptions\n\nfrom agenticx import BaseTool\nfrom agenticx.core import ToolMetadata, ToolCategory\nfrom ..core.config import settings\n\nlogger = logging.getLogger(__name__)\n\n\n@dataclass\nclass NewsItem:\n    \"\"\"新闻数据项\"\"\"\n    title: str\n    content: str\n    url: str\n    source: str\n    publish_time: Optional[datetime] = None\n    author: Optional[str] = None\n    keywords: Optional[List[str]] = None\n    stock_codes: Optional[List[str]] = None\n    summary: Optional[str] = None\n    raw_html: Optional[str] = None  # 原始 HTML 内容\n    \n    def to_dict(self) -> Dict[str, Any]:\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"title\": self.title,\n            \"content\": self.content,\n            \"url\": self.url,\n            \"source\": self.source,\n            \"publish_time\": self.publish_time.isoformat() if self.publish_time else None,\n            \"author\": self.author,\n            \"keywords\": self.keywords,\n            \"stock_codes\": self.stock_codes,\n            \"summary\": self.summary,\n            \"raw_html\": self.raw_html,\n        }\n\n\nclass BaseCrawler(BaseTool):\n    \"\"\"\n    爬虫基类\n    继承自 AgenticX BaseTool\n    \"\"\"\n    \n    # 股票相关URL关键词\n    STOCK_URL_KEYWORDS = [\n        '/stock/', '/gupiao/', '/securities/', '/zhengquan/', \n        '/a-shares/', '/ashares/', '/equity/', '/shares/',\n        '/market/', '/listed/', '/ipo/'\n    ]\n    \n    # 股票相关标题关键词\n    STOCK_TITLE_KEYWORDS = [\n        '股票', 'A股', 'a股', '上市', '个股', '涨停', '跌停', \n        'IPO', 'ipo', '新股', '配股', '增发', '重组', '并购',\n        '股东', '董事', '证券', '港股', '科创板', '创业板',\n        '主板', '中小板', '北交所', '沪市', '深市', '股价',\n        '股份', '停牌', '复牌', '退市', '借壳'\n    ]\n    \n    def 
__init__(self, name: str = \"base_crawler\", description: str = \"Base crawler for financial news\"):\n        # 创建 ToolMetadata\n        metadata = ToolMetadata(\n            name=name,\n            description=description,\n            category=ToolCategory.DATA_ACCESS,\n            version=\"1.0.0\"\n        )\n        super().__init__(metadata=metadata)\n        \n        # 爬虫特定配置\n        self.user_agent = settings.CRAWLER_USER_AGENT\n        self.timeout = settings.CRAWLER_TIMEOUT\n        self.max_retries = settings.CRAWLER_MAX_RETRIES\n        self.delay = settings.CRAWLER_DELAY\n        self.session = requests.Session()\n        self.session.headers.update({'User-Agent': self.user_agent})\n    \n    def _fetch_page(self, url: str) -> requests.Response:\n        \"\"\"\n        获取网页内容（带重试机制，但503错误不重试）\n        \n        Args:\n            url: 目标URL\n            \n        Returns:\n            响应对象\n        \"\"\"\n        max_retries = self.max_retries  # 使用配置中的重试次数\n        for attempt in range(max_retries):\n            try:\n                response = self.session.get(url, timeout=self.timeout)\n                \n                # 对于503错误，不重试，直接抛出（让调用者处理）\n                if response.status_code == 503:\n                    logger.debug(f\"503 error for {url}, skipping retry (server overloaded)\")\n                    response.raise_for_status()\n                \n                response.raise_for_status()\n                \n                # 修复编码问题：优先使用 apparent_encoding，如果检测失败则尝试常见编码\n                if response.encoding is None or response.encoding == 'ISO-8859-1':\n                    # 尝试检测真实编码\n                    if response.apparent_encoding:\n                        response.encoding = response.apparent_encoding\n                    else:\n                        # 对于中文网站，尝试常见编码\n                        encodings = ['utf-8', 'gb2312', 'gbk', 'gb18030']\n                        for enc in encodings:\n                            try:\n                                # 尝试解码验证\n                                response.content.decode(enc)\n                                response.encoding = enc\n                                break\n                            except (UnicodeDecodeError, LookupError):\n                                continue\n                        else:\n                            # 如果都失败，默认使用 utf-8\n                            response.encoding = 'utf-8'\n                \n                time.sleep(self.delay)  # 请求间隔\n                return response\n                \n            except requests.exceptions.HTTPError as e:\n                # 503错误不重试，直接抛出\n                # 注意：requests.Response 在 4xx/5xx 时布尔值为 False，必须与 None 显式比较\n                if e.response is not None and e.response.status_code == 503:\n                    logger.debug(f\"503 error for {url}, not retrying\")\n                    raise\n                # 其他HTTP错误，重试\n                if attempt < max_retries - 1:\n                    wait_time = min(2 ** attempt, 10)\n                    logger.warning(f\"HTTP error fetching {url} (attempt {attempt + 1}/{max_retries}): {e}, retrying in {wait_time}s...\")\n                    time.sleep(wait_time)\n                else:\n                    logger.error(f\"HTTP error fetching {url} after {max_retries} attempts: {e}\")\n                    raise\n            except Exception as e:\n                # 其他错误，重试\n                if attempt < max_retries - 1:\n                    wait_time = min(2 ** attempt, 10)\n                    logger.warning(f\"Error fetching {url} (attempt {attempt + 1}/{max_retries}): {e}, retrying in {wait_time}s...\")\n                    time.sleep(wait_time)\n                else:\n                    logger.error(f\"Failed to fetch {url} after {max_retries} attempts: {e}\")\n                    raise\n        \n        # 理论上不会到达这里\n        raise Exception(f\"Failed to fetch {url} after {max_retries} attempts\")\n    \n    def _parse_html(self, html: str) -> BeautifulSoup:\n        \"\"\"\n        解析HTML\n        \n        Args:\n            html: 
HTML字符串\n            \n        Returns:\n            BeautifulSoup对象\n        \"\"\"\n        return BeautifulSoup(html, 'lxml')\n    \n    def _extract_chinese_ratio(self, text: str) -> float:\n        \"\"\"\n        计算中文字符比例\n        \n        Args:\n            text: 文本\n            \n        Returns:\n            中文字符比例（0-1）\n        \"\"\"\n        import re\n        pattern = re.compile(r'[\\u4e00-\\u9fa5]+')\n        chinese_chars = pattern.findall(text)\n        chinese_count = sum(len(chars) for chars in chinese_chars)\n        total_count = len(text)\n        return chinese_count / total_count if total_count > 0 else 0\n    \n    def _clean_text(self, text: str) -> str:\n        \"\"\"\n        清理文本\n        \n        Args:\n            text: 原始文本\n            \n        Returns:\n            清理后的文本\n        \"\"\"\n        import re\n        # 移除HTML标签\n        text = re.sub(r'<[^>]+>', '', text)\n        # 移除特殊空格\n        text = text.replace('\\u3000', ' ')\n        # 移除多余空格和换行\n        text = ' '.join(text.split())\n        return text.strip()\n    \n    def _extract_article_content(self, soup: BeautifulSoup, selectors: List[dict] = None) -> str:\n        \"\"\"\n        通用智能内容提取方法\n        \n        Args:\n            soup: BeautifulSoup对象\n            selectors: 可选的自定义选择器列表\n            \n        Returns:\n            提取的正文内容\n        \"\"\"\n        import re\n        \n        # 默认选择器（按优先级排序）\n        default_selectors = [\n            # 文章主体选择器\n            {'class': re.compile(r'article[-_]?(body|content|text|main)', re.I)},\n            {'class': re.compile(r'content[-_]?(article|body|text|main)', re.I)},\n            {'class': re.compile(r'main[-_]?(content|body|text|article)', re.I)},\n            {'class': re.compile(r'^(article|content|body|text|post)$', re.I)},\n            {'itemprop': 'articleBody'},\n            {'id': re.compile(r'(article|content|body|text)[-_]?(content|body|text)?', re.I)},\n            # 通用选择器\n            {'class': 
'g-article-content'},\n            {'class': 'article-content'},\n            {'class': 'news-content'},\n            {'id': 'contentText'},\n        ]\n        \n        all_selectors = (selectors or []) + default_selectors\n        \n        for selector in all_selectors:\n            content_div = soup.find(['div', 'article', 'section', 'main'], selector)\n            if content_div:\n                # 移除无关元素\n                for tag in content_div.find_all(['script', 'style', 'iframe', 'ins', 'noscript', 'nav', 'footer', 'header']):\n                    tag.decompose()\n                for ad in content_div.find_all(class_=re.compile(r'(ad|advertisement|banner|recommend|related|share|comment)', re.I)):\n                    ad.decompose()\n                \n                # 提取所有段落（不限制数量）\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content and len(content) > 50:\n                        return self._clean_text(content)\n                \n                # 如果没有 p 标签，直接取文本\n                text = content_div.get_text(separator='\\n', strip=True)\n                if text and len(text) > 50:\n                    return self._clean_text(text)\n        \n        # 后备方案：取所有符合条件的段落（不限制数量）\n        paragraphs = soup.find_all('p')\n        if paragraphs:\n            valid_paragraphs = [\n                p.get_text(strip=True) for p in paragraphs \n                if p.get_text(strip=True) and len(p.get_text(strip=True)) > 15\n                and not any(kw in p.get_text(strip=True).lower() for kw in ['copyright', '版权', '广告', 'advertisement'])\n            ]\n            content = '\\n'.join(valid_paragraphs)\n            if content:\n                return self._clean_text(content)\n        \n        return \"\"\n    \n    def _is_stock_related_by_url(self, url: str) -> bool:\n        
\"\"\"\n        根据URL路径判断是否为股票相关新闻\n        \n        Args:\n            url: 新闻URL\n            \n        Returns:\n            是否为股票相关\n        \"\"\"\n        url_lower = url.lower()\n        return any(keyword in url_lower for keyword in self.STOCK_URL_KEYWORDS)\n    \n    def _is_stock_related_by_title(self, title: str) -> bool:\n        \"\"\"\n        根据标题关键词判断是否为股票相关新闻\n        \n        Args:\n            title: 新闻标题\n            \n        Returns:\n            是否为股票相关\n        \"\"\"\n        return any(keyword in title for keyword in self.STOCK_TITLE_KEYWORDS)\n    \n    def _filter_stock_news(self, news_list: List[NewsItem]) -> List[NewsItem]:\n        \"\"\"\n        筛选股票相关新闻\n        组合URL路径和标题关键词两种策略\n        \n        策略调整：\n        - 如果过滤后没有新闻，返回所有新闻（避免过度过滤）\n        - 对于财经类网站，放宽筛选条件\n        \n        Args:\n            news_list: 原始新闻列表\n            \n        Returns:\n            股票相关新闻列表\n        \"\"\"\n        filtered_news = []\n        url_matched = 0\n        title_matched = 0\n        filtered_out = 0\n        \n        for news in news_list:\n            # URL匹配 或 标题匹配\n            url_match = self._is_stock_related_by_url(news.url)\n            title_match = self._is_stock_related_by_title(news.title)\n            \n            if url_match or title_match:\n                filtered_news.append(news)\n                if url_match:\n                    url_matched += 1\n                if title_match:\n                    title_matched += 1\n                logger.debug(f\"✓ Stock news matched: {news.title[:50]}... 
(URL:{url_match}, Title:{title_match})\")\n            else:\n                filtered_out += 1\n                # 只记录前5条被过滤的，避免日志过多\n                if filtered_out <= 5:\n                    logger.debug(f\"✗ Filtered out: {news.title[:50]}...\")\n        \n        logger.info(f\"Stock filter [{self.SOURCE_NAME}]: {len(news_list)} -> {len(filtered_news)} items \"\n                   f\"(URL matched: {url_matched}, Title matched: {title_matched}, Filtered: {filtered_out})\")\n        \n        # 如果过滤后没有新闻，返回所有新闻（避免过度过滤）\n        # 这对于财经类网站特别重要，因为它们的新闻通常都与金融相关\n        if len(news_list) > 0 and len(filtered_news) == 0:\n            logger.warning(f\"⚠️  All {len(news_list)} news items were filtered out for source {self.SOURCE_NAME}. \"\n                          f\"Returning all news to avoid over-filtering.\")\n            return news_list\n        \n        return filtered_news\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取新闻\n        \n        Args:\n            start_page: 起始页\n            end_page: 结束页\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        raise NotImplementedError(\"Subclass must implement crawl method\")\n    \n    def _setup_parameters(self):\n        \"\"\"设置工具参数（AgenticX 要求）\"\"\"\n        pass  # 爬虫不需要特殊参数设置\n    \n    def execute(self, **kwargs) -> Dict[str, Any]:\n        \"\"\"\n        同步执行方法（AgenticX Tool 协议要求）\n        \n        Args:\n            **kwargs: 参数字典\n                - start_page: 起始页\n                - end_page: 结束页\n                \n        Returns:\n            执行结果\n        \"\"\"\n        start_page = kwargs.get('start_page', 1)\n        end_page = kwargs.get('end_page', 1)\n        \n        logger.info(f\"Crawling from page {start_page} to {end_page}\")\n        news_list = self.crawl(start_page, end_page)\n        \n        return {\n            \"success\": True,\n            \"count\": len(news_list),\n            
\"news_list\": [news.to_dict() for news in news_list],\n        }\n    \n    async def aexecute(self, **kwargs) -> Dict[str, Any]:\n        \"\"\"\n        异步执行方法（AgenticX Tool 协议要求）\n        当前实现为同步执行的包装\n        \n        Args:\n            **kwargs: 参数字典\n                \n        Returns:\n            执行结果\n        \"\"\"\n        return self.execute(**kwargs)\n\n"
  },
  {
    "path": "backend/app/tools/crawler_enhanced.py",
    "content": "\"\"\"\n增强版爬虫模块\n整合 deer-flow、BasicWebCrawler 和现有爬虫的优点\n\n特性：\n1. 多引擎支持：本地爬取 + Jina Reader API + Playwright JS 渲染\n2. 智能内容提取：readabilipy + 启发式算法\n3. 网站特定配置\n4. 内容质量评估与自动重试\n5. 缓存和去重\n6. 统一 Article 模型，支持 LLM 消息格式\n\"\"\"\nimport re\nimport os\nimport json\nimport time\nimport hashlib\nimport logging\nfrom typing import List, Dict, Any, Optional, Literal\nfrom dataclasses import dataclass, field\nfrom datetime import datetime\nfrom pathlib import Path\nfrom urllib.parse import urlparse, urljoin\n\nimport requests\nfrom bs4 import BeautifulSoup\nfrom tenacity import retry, stop_after_attempt, wait_exponential\n\n# 可选依赖\ntry:\n    from markdownify import markdownify as md\nexcept ImportError:\n    md = None\n\ntry:\n    from readabilipy import simple_json_from_html_string\nexcept ImportError:\n    simple_json_from_html_string = None\n\ntry:\n    from playwright.sync_api import sync_playwright\nexcept ImportError:\n    sync_playwright = None\n\nlogger = logging.getLogger(__name__)\n\n\n# ============ 配置 ============\n\n# 财经新闻网站特定配置\nFINANCE_SITE_CONFIGS = {\n    # 新浪财经\n    'finance.sina.com.cn': {\n        'main_content_selectors': [\n            '.article-content', '.article', '#artibody', \n            '.main-content', '.post-body'\n        ],\n        'title_selectors': ['h1.main-title', 'h1', '.article-title'],\n        'time_selectors': ['.date', '.pub_date', '.time-source'],\n        'needs_js': False,\n        'headers': {\n            'Referer': 'https://finance.sina.com.cn/',\n        }\n    },\n    # 东方财富\n    'eastmoney.com': {\n        'main_content_selectors': [\n            '.article-content', '#ContentBody', '.newsContent',\n            '.article', '.content-article'\n        ],\n        'needs_js': True,\n        'wait_selectors': ['.article-content', '#ContentBody'],\n    },\n    # 每经网\n    'nbd.com.cn': {\n        'main_content_selectors': [\n            '.article-content', '.g-article-content', \n            '.article-detail', 
'.post-content'\n        ],\n        'needs_js': False,\n    },\n    # 财新\n    'caixin.com': {\n        'main_content_selectors': [\n            '#Main_Content_Val', '.article-content', \n            '.articleBody', '.main-content'\n        ],\n        'needs_cookies': True,  # 付费内容\n        'needs_js': False,\n    },\n    # 腾讯财经\n    'finance.qq.com': {\n        'main_content_selectors': [\n            '.content-article', '.Cnt-Main-Article-QQ',\n            '#Cnt-Main-Article-QQ', '.article-content'\n        ],\n        'needs_js': False,\n    },\n    # 21世纪经济报道\n    '21jingji.com': {\n        'main_content_selectors': [\n            '.article-content', '.detailContent', \n            '.article-body', '.post-content'\n        ],\n        'needs_js': False,\n    },\n    # 默认配置\n    'default': {\n        'main_content_selectors': [\n            'article', 'main', '.article', '.content', \n            '.post-content', '.entry-content', '#content'\n        ],\n        'needs_js': False,\n        'headers': {}\n    }\n}\n\n\n# ============ Article 模型 ============\n\n@dataclass\nclass Article:\n    \"\"\"\n    统一的文章模型（参考 deer-flow）\n    支持转换为 Markdown 和 LLM 消息格式\n    \"\"\"\n    title: str\n    content: str  # 纯文本内容\n    html_content: Optional[str] = None  # 原始 HTML\n    url: str = \"\"\n    source: str = \"\"\n    publish_time: Optional[datetime] = None\n    author: Optional[str] = None\n    keywords: List[str] = field(default_factory=list)\n    stock_codes: List[str] = field(default_factory=list)\n    images: List[str] = field(default_factory=list)\n    \n    # 元数据\n    crawl_time: datetime = field(default_factory=datetime.utcnow)\n    engine_used: str = \"\"  # 使用的爬取引擎\n    quality_score: float = 0.0  # 内容质量评分\n    \n    def to_markdown(self, include_title: bool = True, include_meta: bool = False) -> str:\n        \"\"\"转换为 Markdown 格式\"\"\"\n        parts = []\n        \n        if include_title and self.title:\n            parts.append(f\"# {self.title}\\n\")\n    
    \n        if include_meta:\n            meta = []\n            if self.source:\n                meta.append(f\"来源: {self.source}\")\n            if self.publish_time:\n                meta.append(f\"时间: {self.publish_time.strftime('%Y-%m-%d %H:%M')}\")\n            if self.author:\n                meta.append(f\"作者: {self.author}\")\n            if self.url:\n                meta.append(f\"原文: {self.url}\")\n            if meta:\n                parts.append(f\"*{' | '.join(meta)}*\\n\")\n        \n        # 如果有 HTML 内容且安装了 markdownify，转换它\n        if self.html_content and md:\n            parts.append(md(self.html_content))\n        else:\n            parts.append(self.content)\n        \n        return \"\\n\".join(parts)\n    \n    def to_llm_message(self) -> List[Dict[str, Any]]:\n        \"\"\"\n        转换为 LLM 消息格式（参考 deer-flow）\n        将图片和文本分离，便于多模态 LLM 处理\n        \"\"\"\n        content: List[Dict[str, str]] = []\n        markdown = self.to_markdown()\n        \n        if not markdown.strip():\n            return [{\"type\": \"text\", \"text\": \"No content available\"}]\n        \n        # 提取图片 URL\n        image_pattern = r\"!\\[.*?\\]\\((.*?)\\)\"\n        parts = re.split(image_pattern, markdown)\n        \n        for i, part in enumerate(parts):\n            if i % 2 == 1:  # 图片 URL\n                image_url = urljoin(self.url, part.strip())\n                content.append({\n                    \"type\": \"image_url\", \n                    \"image_url\": {\"url\": image_url}\n                })\n            else:  # 文本内容\n                text_part = part.strip()\n                if text_part:\n                    content.append({\"type\": \"text\", \"text\": text_part})\n        \n        return content if content else [{\"type\": \"text\", \"text\": \"No content available\"}]\n    \n    def to_dict(self) -> Dict[str, Any]:\n        \"\"\"转换为字典\"\"\"\n        return {\n            \"title\": self.title,\n            \"content\": 
self.content,\n            \"html_content\": self.html_content,\n            \"url\": self.url,\n            \"source\": self.source,\n            \"publish_time\": self.publish_time.isoformat() if self.publish_time else None,\n            \"author\": self.author,\n            \"keywords\": self.keywords,\n            \"stock_codes\": self.stock_codes,\n            \"images\": self.images,\n            \"crawl_time\": self.crawl_time.isoformat(),\n            \"engine_used\": self.engine_used,\n            \"quality_score\": self.quality_score,\n        }\n\n\n# ============ 内容提取器 ============\n\nclass ContentExtractor:\n    \"\"\"\n    智能内容提取器\n    结合 readabilipy 和启发式算法\n    \"\"\"\n    \n    @staticmethod\n    def extract_with_readability(html: str) -> Optional[Article]:\n        \"\"\"使用 readabilipy 提取（参考 deer-flow）\"\"\"\n        if simple_json_from_html_string is None:\n            return None\n        \n        try:\n            result = simple_json_from_html_string(html, use_readability=True)\n            content = result.get(\"content\", \"\")\n            title = result.get(\"title\", \"Untitled\")\n            \n            if not content or len(content.strip()) < 100:\n                return None\n            \n            return Article(\n                title=title,\n                content=BeautifulSoup(content, 'html.parser').get_text(separator='\\n', strip=True),\n                html_content=content,\n            )\n        except Exception as e:\n            logger.warning(f\"Readability extraction failed: {e}\")\n            return None\n    \n    @staticmethod\n    def extract_with_selectors(soup: BeautifulSoup, config: dict) -> Optional[Article]:\n        \"\"\"使用 CSS 选择器提取\"\"\"\n        # 提取标题\n        title = None\n        for sel in config.get('title_selectors', ['h1', 'title']):\n            el = soup.select_one(sel)\n            if el:\n                title = el.get_text(strip=True)\n                break\n        \n        if not 
title:\n            title_el = soup.find('title')\n            title = title_el.get_text(strip=True) if title_el else \"Untitled\"\n        \n        # 提取主要内容\n        content_el = None\n        for sel in config.get('main_content_selectors', []):\n            content_el = soup.select_one(sel)\n            if content_el and len(content_el.get_text(strip=True)) > 100:\n                break\n        \n        if not content_el:\n            return None\n        \n        # 清理内容\n        for tag in content_el.find_all(['script', 'style', 'nav', 'footer', 'aside']):\n            tag.decompose()\n        \n        content = content_el.get_text(separator='\\n', strip=True)\n        html_content = str(content_el)\n        \n        if len(content) < 100:\n            return None\n        \n        return Article(\n            title=title,\n            content=content,\n            html_content=html_content,\n        )\n    \n    @staticmethod\n    def extract_heuristic(soup: BeautifulSoup) -> Optional[Article]:\n        \"\"\"\n        启发式内容提取（参考 BasicWebCrawler）\n        找到包含最多段落文本的元素\n        \"\"\"\n        # 提取标题\n        title_el = soup.find('title')\n        title = title_el.get_text(strip=True) if title_el else \"Untitled\"\n        \n        # 排除导航等元素（find_all 只按标签名匹配，类选择器需用 select）\n        for tag in soup.find_all(['script', 'style', 'nav', 'footer', 'aside', 'header']):\n            tag.decompose()\n        for tag in soup.select('.sidebar, .advertisement'):\n            tag.decompose()\n        \n        # 找到最佳内容容器\n        candidates = []\n        for tag in ['article', 'main', 'section', 'div']:\n            for elem in soup.find_all(tag):\n                # 排除导航、侧边栏等\n                elem_class = ' '.join(elem.get('class', [])).lower()\n                elem_id = (elem.get('id') or '').lower()\n                \n                exclude_keywords = ['nav', 'sidebar', 'footer', 'header', \n                                    'menu', 'ad', 'banner', 'comment']\n        
        if any(kw in elem_class or kw in elem_id for kw in exclude_keywords):\n                    continue\n                \n                text = elem.get_text(strip=True)\n                text_len = len(text)\n                \n                if text_len > 200:\n                    score = text_len\n                    # 有标题加分\n                    if elem.find(['h1', 'h2', 'h3']):\n                        score += 1000\n                    # 有段落加分\n                    p_count = len(elem.find_all('p'))\n                    score += p_count * 50\n                    \n                    candidates.append((elem, score, text_len))\n        \n        if not candidates:\n            return None\n        \n        # 选择得分最高的\n        best_elem = max(candidates, key=lambda x: x[1])[0]\n        content = best_elem.get_text(separator='\\n', strip=True)\n        \n        return Article(\n            title=title,\n            content=content,\n            html_content=str(best_elem),\n        )\n    \n    @classmethod\n    def extract(cls, html: str, url: str = \"\", config: dict = None) -> Article:\n        \"\"\"\n        智能提取：依次尝试多种方法\n        1. readabilipy（最智能）\n        2. CSS 选择器（网站特定）\n        3. 
启发式算法（兜底）\n        \"\"\"\n        soup = BeautifulSoup(html, 'html.parser')\n        config = config or FINANCE_SITE_CONFIGS.get('default', {})\n        \n        # 方法 1: readabilipy（quality_score 此时尚未评估，提取成功即采用；\n        # 内容长度下限已在 extract_with_readability 中保证）\n        article = cls.extract_with_readability(html)\n        if article:\n            article.engine_used = \"readability\"\n            return article\n        \n        # 方法 2: CSS 选择器\n        article = cls.extract_with_selectors(soup, config)\n        if article:\n            article.engine_used = \"selectors\"\n            return article\n        \n        # 方法 3: 启发式\n        article = cls.extract_heuristic(soup)\n        if article:\n            article.engine_used = \"heuristic\"\n            return article\n        \n        # 兜底：返回整个 body\n        body = soup.find('body')\n        return Article(\n            title=soup.title.get_text(strip=True) if soup.title else \"Untitled\",\n            content=body.get_text(separator='\\n', strip=True) if body else \"\",\n            html_content=str(body) if body else \"\",\n            engine_used=\"fallback\",\n        )\n\n\n# ============ 爬取引擎 ============\n\nclass JinaReaderEngine:\n    \"\"\"\n    Jina Reader API 引擎（参考 deer-flow）\n    https://jina.ai/reader\n    \"\"\"\n    \n    API_URL = \"https://r.jina.ai/\"\n    \n    def __init__(self, api_key: Optional[str] = None):\n        self.api_key = api_key or os.getenv(\"JINA_API_KEY\")\n    \n    def crawl(self, url: str, return_format: str = \"html\") -> Optional[str]:\n        \"\"\"爬取 URL\"\"\"\n        headers = {\n            \"Content-Type\": \"application/json\",\n            \"X-Return-Format\": return_format,\n        }\n        if self.api_key:\n            headers[\"Authorization\"] = f\"Bearer {self.api_key}\"\n        \n        try:\n            response = requests.post(\n                self.API_URL,\n                headers=headers,\n                json={\"url\": url},\n                timeout=30\n            )\n            \n            if 
response.status_code != 200:\n                logger.error(f\"Jina API error: {response.status_code}\")\n                return None\n            \n            return response.text\n        except Exception as e:\n            logger.error(f\"Jina crawl failed: {e}\")\n            return None\n\n\nclass PlaywrightEngine:\n    \"\"\"\n    Playwright 浏览器引擎（参考 BasicWebCrawler）\n    支持 JS 渲染\n    \"\"\"\n    \n    def __init__(self, headless: bool = True):\n        self.headless = headless\n    \n    def crawl(self, url: str, wait_selectors: List[str] = None, \n              timeout_ms: int = 15000) -> Optional[str]:\n        \"\"\"使用 Playwright 爬取\"\"\"\n        if sync_playwright is None:\n            logger.warning(\"Playwright not installed\")\n            return None\n        \n        try:\n            with sync_playwright() as p:\n                browser = p.chromium.launch(\n                    headless=self.headless,\n                    args=['--disable-blink-features=AutomationControlled']\n                )\n                \n                context = browser.new_context(\n                    user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '\n                               'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',\n                    viewport={'width': 1920, 'height': 1080},\n                )\n                \n                # 反检测\n                context.add_init_script(\"\"\"\n                    Object.defineProperty(navigator, 'webdriver', {\n                        get: () => undefined\n                    });\n                \"\"\")\n                \n                page = context.new_page()\n                page.goto(url, wait_until='networkidle', timeout=timeout_ms)\n                \n                # 等待选择器\n                if wait_selectors:\n                    for sel in wait_selectors:\n                        try:\n                            page.wait_for_selector(sel, timeout=5000)\n                            
break\n                        except Exception:\n                            continue\n                \n                # 等待内容稳定\n                page.wait_for_timeout(1000)\n                \n                content = page.content()\n                context.close()\n                browser.close()\n                \n                return content\n                \n        except Exception as e:\n            logger.error(f\"Playwright crawl failed: {e}\")\n            return None\n\n\nclass RequestsEngine:\n    \"\"\"\n    基础 Requests 引擎\n    \"\"\"\n    \n    DEFAULT_HEADERS = {\n        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '\n                      'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',\n        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',\n        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',\n    }\n    \n    def __init__(self, timeout: int = 20):\n        self.timeout = timeout\n        self.session = requests.Session()\n        self.session.headers.update(self.DEFAULT_HEADERS)\n    \n    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=2, max=10))\n    def crawl(self, url: str, headers: dict = None, cookies: dict = None) -> Optional[str]:\n        \"\"\"爬取 URL\"\"\"\n        try:\n            response = self.session.get(\n                url,\n                headers=headers,\n                cookies=cookies,\n                timeout=self.timeout\n            )\n            response.raise_for_status()\n            response.encoding = response.apparent_encoding\n            return response.text\n        except Exception as e:\n            logger.error(f\"Requests crawl failed: {e}\")\n            raise\n\n\n# ============ 缓存 ============\n\nclass CrawlCache:\n    \"\"\"\n    爬取缓存（参考 BasicWebCrawler）\n    \"\"\"\n    \n    def __init__(self, cache_dir: str = \".crawl_cache\", ttl_hours: int = 24):\n        self.cache_dir = Path(cache_dir)\n        
self.cache_dir.mkdir(parents=True, exist_ok=True)\n        self.ttl_seconds = ttl_hours * 3600\n    \n    def _key(self, url: str) -> str:\n        return hashlib.md5(url.encode()).hexdigest()\n    \n    def get(self, url: str) -> Optional[str]:\n        \"\"\"获取缓存\"\"\"\n        key = self._key(url)\n        cache_file = self.cache_dir / f\"{key}.json\"\n        \n        if not cache_file.exists():\n            return None\n        \n        try:\n            data = json.loads(cache_file.read_text(encoding='utf-8'))\n            cached_time = datetime.fromisoformat(data['time'])\n            \n            if (datetime.utcnow() - cached_time).total_seconds() > self.ttl_seconds:\n                cache_file.unlink()  # 过期删除\n                return None\n            \n            return data['html']\n        except Exception:\n            return None\n    \n    def set(self, url: str, html: str):\n        \"\"\"设置缓存\"\"\"\n        key = self._key(url)\n        cache_file = self.cache_dir / f\"{key}.json\"\n        \n        try:\n            data = {\n                'url': url,\n                'time': datetime.utcnow().isoformat(),\n                'html': html,\n            }\n            cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding='utf-8')\n        except Exception as e:\n            logger.warning(f\"Cache write failed: {e}\")\n\n\n# ============ 主爬虫类 ============\n\nclass EnhancedCrawler:\n    \"\"\"\n    增强版爬虫\n    自动选择最佳引擎，智能提取内容\n    \"\"\"\n    \n    def __init__(\n        self,\n        use_cache: bool = True,\n        cache_ttl_hours: int = 24,\n        jina_api_key: Optional[str] = None,\n        default_engine: Literal['requests', 'playwright', 'jina'] = 'requests'\n    ):\n        self.use_cache = use_cache\n        self.cache = CrawlCache(ttl_hours=cache_ttl_hours) if use_cache else None\n        \n        # 初始化引擎\n        self.requests_engine = RequestsEngine()\n        self.playwright_engine = PlaywrightEngine()\n        
self.jina_engine = JinaReaderEngine(api_key=jina_api_key)\n        \n        self.default_engine = default_engine\n    \n    def _get_site_config(self, url: str) -> dict:\n        \"\"\"获取网站配置\"\"\"\n        domain = urlparse(url).netloc\n        \n        for site_domain, config in FINANCE_SITE_CONFIGS.items():\n            if site_domain in domain:\n                return config\n        \n        return FINANCE_SITE_CONFIGS['default']\n    \n    def _evaluate_quality(self, article: Article) -> float:\n        \"\"\"\n        评估内容质量\n        返回 0-1 的分数\n        \"\"\"\n        score = 0.0\n        \n        # 内容长度\n        content_len = len(article.content)\n        if content_len > 500:\n            score += 0.3\n        elif content_len > 200:\n            score += 0.2\n        elif content_len > 100:\n            score += 0.1\n        \n        # 有标题\n        if article.title and article.title != \"Untitled\":\n            score += 0.2\n        \n        # 中文内容比例（财经新闻应该主要是中文）\n        chinese_pattern = re.compile(r'[\\u4e00-\\u9fa5]')\n        chinese_count = len(chinese_pattern.findall(article.content))\n        if content_len > 0:\n            chinese_ratio = chinese_count / content_len\n            if chinese_ratio > 0.5:\n                score += 0.3\n            elif chinese_ratio > 0.3:\n                score += 0.2\n        \n        # 段落结构\n        paragraph_count = article.content.count('\\n')\n        if paragraph_count > 5:\n            score += 0.2\n        elif paragraph_count > 2:\n            score += 0.1\n        \n        return min(score, 1.0)\n    \n    def crawl(\n        self,\n        url: str,\n        engine: Optional[Literal['requests', 'playwright', 'jina', 'auto']] = None,\n        force_refresh: bool = False\n    ) -> Article:\n        \"\"\"\n        爬取单个 URL\n        \n        Args:\n            url: 目标 URL\n            engine: 爬取引擎 ('requests', 'playwright', 'jina', 'auto')\n            force_refresh: 是否强制刷新缓存\n            \n     
   Returns:\n            Article 对象\n        \"\"\"\n        # 检查缓存\n        if self.use_cache and not force_refresh:\n            cached_html = self.cache.get(url)\n            if cached_html:\n                logger.info(f\"Using cached content for {url}\")\n                article = ContentExtractor.extract(cached_html, url)\n                article.url = url\n                article.source = urlparse(url).netloc\n                article.quality_score = self._evaluate_quality(article)\n                return article\n        \n        # 获取网站配置\n        config = self._get_site_config(url)\n        engine = engine or self.default_engine\n        \n        html = None\n        used_engine = engine\n        \n        # 自动选择引擎\n        if engine == 'auto':\n            if config.get('needs_js'):\n                engine = 'playwright'\n            else:\n                engine = 'requests'\n        \n        # 爬取（RequestsEngine 重试耗尽后会抛异常，捕获后转入备用引擎）\n        if engine == 'requests':\n            try:\n                html = self.requests_engine.crawl(\n                    url,\n                    headers=config.get('headers'),\n                    cookies=config.get('cookies')\n                )\n            except Exception:\n                html = None\n            used_engine = 'requests'\n            \n        elif engine == 'playwright':\n            html = self.playwright_engine.crawl(\n                url,\n                wait_selectors=config.get('wait_selectors')\n            )\n            used_engine = 'playwright'\n            \n        elif engine == 'jina':\n            html = self.jina_engine.crawl(url)\n            used_engine = 'jina'\n        \n        # 如果主引擎失败，尝试备用引擎（保留已有的短内容，备用成功才覆盖）\n        if not html or len(html) < 500:\n            logger.warning(\"Primary engine failed, trying fallback...\")\n            \n            if used_engine != 'jina' and self.jina_engine.api_key:\n                fallback_html = self.jina_engine.crawl(url)\n                if fallback_html:\n                    html = fallback_html\n                    used_engine = 'jina'\n            \n            if (not html or len(html) < 500) and engine != 'playwright' and sync_playwright:\n                fallback_html = self.playwright_engine.crawl(url)\n                if fallback_html:\n                    html = fallback_html\n                    used_engine = 
'playwright'\n        \n        if not html:\n            logger.error(f\"All engines failed for {url}\")\n            return Article(\n                title=\"Crawl Failed\",\n                content=f\"Failed to crawl {url}\",\n                url=url,\n                engine_used=\"none\",\n                quality_score=0.0\n            )\n        \n        # 缓存\n        if self.use_cache:\n            self.cache.set(url, html)\n        \n        # 提取内容\n        article = ContentExtractor.extract(html, url, config)\n        article.url = url\n        article.source = urlparse(url).netloc\n        article.engine_used = used_engine\n        article.quality_score = self._evaluate_quality(article)\n        \n        # 质量检查：如果质量太低且没用过 Jina，尝试用 Jina\n        if article.quality_score < 0.3 and used_engine != 'jina' and self.jina_engine.api_key:\n            logger.info(f\"Low quality ({article.quality_score:.2f}), retrying with Jina...\")\n            jina_html = self.jina_engine.crawl(url)\n            if jina_html:\n                jina_article = ContentExtractor.extract(jina_html, url, config)\n                jina_article.url = url\n                jina_article.source = urlparse(url).netloc\n                jina_article.quality_score = self._evaluate_quality(jina_article)\n                \n                if jina_article.quality_score > article.quality_score:\n                    article = jina_article\n                    article.engine_used = 'jina'\n        \n        return article\n    \n    def crawl_batch(\n        self,\n        urls: List[str],\n        engine: Optional[str] = None,\n        delay: float = 1.0\n    ) -> List[Article]:\n        \"\"\"\n        批量爬取\n        \n        Args:\n            urls: URL 列表\n            engine: 爬取引擎\n            delay: 请求间隔（秒）\n            \n        Returns:\n            Article 列表\n        \"\"\"\n        articles = []\n        \n        for i, url in enumerate(urls):\n            logger.info(f\"Crawling {i+1}/{len(urls)}: {url}\")\n            \n            try:\n                article = self.crawl(url, 
engine=engine)\n                articles.append(article)\n            except Exception as e:\n                logger.error(f\"Failed to crawl {url}: {e}\")\n                articles.append(Article(\n                    title=\"Crawl Failed\",\n                    content=str(e),\n                    url=url,\n                    quality_score=0.0\n                ))\n            \n            if delay > 0 and i < len(urls) - 1:\n                time.sleep(delay)\n        \n        return articles\n\n\n# ============ 便捷函数 ============\n\n# 全局爬虫实例\n_crawler: Optional[EnhancedCrawler] = None\n\n\ndef get_crawler() -> EnhancedCrawler:\n    \"\"\"获取全局爬虫实例\"\"\"\n    global _crawler\n    if _crawler is None:\n        _crawler = EnhancedCrawler()\n    return _crawler\n\n\ndef crawl_url(url: str, engine: str = 'auto') -> Article:\n    \"\"\"便捷函数：爬取单个 URL\"\"\"\n    return get_crawler().crawl(url, engine=engine)\n\n\ndef crawl_urls(urls: List[str], engine: str = 'auto') -> List[Article]:\n    \"\"\"便捷函数：批量爬取\"\"\"\n    return get_crawler().crawl_batch(urls, engine=engine)\n\n\n# ============ 测试 ============\n\nif __name__ == \"__main__\":\n    logging.basicConfig(level=logging.INFO)\n    \n    # 测试爬取\n    test_urls = [\n        \"https://finance.sina.com.cn/roll/c/56592.shtml\",\n    ]\n    \n    crawler = EnhancedCrawler(use_cache=True)\n    \n    for url in test_urls:\n        print(f\"\\n{'='*60}\")\n        print(f\"Crawling: {url}\")\n        \n        article = crawler.crawl(url, engine='auto')\n        \n        print(f\"Title: {article.title}\")\n        print(f\"Engine: {article.engine_used}\")\n        print(f\"Quality: {article.quality_score:.2f}\")\n        print(f\"Content length: {len(article.content)}\")\n        print(f\"Preview: {article.content[:200]}...\")\n\n"
  },
  {
    "path": "backend/app/tools/dynamic_crawler_example.py",
    "content": "\"\"\"\n动态网站爬虫示例 - 使用 Selenium\n适用于需要点击\"加载更多\"的网站\n\n依赖安装：\npip install selenium webdriver-manager\n\"\"\"\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions as EC\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.chrome.service import Service\nfrom webdriver_manager.chrome import ChromeDriverManager\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass DynamicCrawlerExample(BaseCrawler):\n    \"\"\"\n    动态网站爬虫示例\n    支持点击\"加载更多\"按钮\n    \"\"\"\n    \n    BASE_URL = \"https://www.eeo.com.cn/\"\n    STOCK_URL = \"https://www.eeo.com.cn/jg/jinrong/zhengquan/\"\n    SOURCE_NAME = \"eeo_dynamic\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"eeo_dynamic_crawler\",\n            description=\"Crawl EEO with dynamic loading support\"\n        )\n        self.driver = None\n    \n    def _init_driver(self):\n        \"\"\"初始化 Selenium WebDriver\"\"\"\n        if self.driver:\n            return\n        \n        chrome_options = Options()\n        chrome_options.add_argument('--headless')  # 无头模式\n        chrome_options.add_argument('--no-sandbox')\n        chrome_options.add_argument('--disable-dev-shm-usage')\n        chrome_options.add_argument(f'user-agent={self.user_agent}')\n        \n        service = Service(ChromeDriverManager().install())\n        self.driver = webdriver.Chrome(service=service, options=chrome_options)\n        logger.info(\"Selenium WebDriver initialized\")\n    \n    def _close_driver(self):\n        \"\"\"关闭 WebDriver\"\"\"\n        if self.driver:\n            self.driver.quit()\n            self.driver = None\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n 
       \"\"\"\n        爬取新闻（支持动态加载）\n        \n        Args:\n            start_page: 起始页（对于点击加载更多的网站，这个参数表示点击次数）\n            end_page: 结束页\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            self._init_driver()\n            page_news = self._crawl_with_selenium()\n            news_list.extend(page_news)\n            logger.info(f\"Crawled EEO (dynamic), got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling EEO (dynamic): {e}\")\n        finally:\n            self._close_driver()\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_with_selenium(self) -> List[NewsItem]:\n        \"\"\"使用 Selenium 爬取动态加载的内容\"\"\"\n        news_items = []\n        \n        try:\n            # 1. 访问页面\n            self.driver.get(self.STOCK_URL)\n            logger.info(f\"Loaded page: {self.STOCK_URL}\")\n            \n            # 2. 等待页面加载\n            WebDriverWait(self.driver, 10).until(\n                EC.presence_of_element_located((By.TAG_NAME, \"body\"))\n            )\n            \n            # 3. 
尝试点击\"加载更多\"按钮（如果存在）\n            click_count = 0\n            max_clicks = 3  # 最多点击3次\"加载更多\"\n            \n            while click_count < max_clicks:\n                try:\n                    # 查找\"加载更多\"按钮（根据实际页面调整选择器）\n                    load_more_button = self.driver.find_element(\n                        By.XPATH, \n                        \"//button[contains(text(), '加载更多')] | //div[contains(text(), '点击加载更多')]\"\n                    )\n                    \n                    # 滚动到按钮位置\n                    self.driver.execute_script(\"arguments[0].scrollIntoView();\", load_more_button)\n                    \n                    # 点击按钮\n                    load_more_button.click()\n                    click_count += 1\n                    logger.info(f\"Clicked 'Load More' button {click_count} times\")\n                    \n                    # 等待新内容加载\n                    import time\n                    time.sleep(2)\n                    \n                except Exception as e:\n                    logger.debug(f\"No more 'Load More' button or click failed: {e}\")\n                    break\n            \n            # 4. 提取所有新闻链接\n            news_links = self._extract_news_links_from_selenium()\n            logger.info(f\"Found {len(news_links)} news links\")\n            \n            # 5. 
爬取每条新闻的详情\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error in Selenium crawling: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links_from_selenium(self) -> List[dict]:\n        \"\"\"从 Selenium 页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        try:\n            # 查找所有新闻链接（根据实际页面结构调整选择器）\n            link_elements = self.driver.find_elements(By.CSS_SELECTOR, \"a[href*='/article/']\")\n            \n            for element in link_elements:\n                try:\n                    href = element.get_attribute('href')\n                    title = element.text.strip()\n                    \n                    if href and title and href not in [n['url'] for n in news_links]:\n                        news_links.append({'url': href, 'title': title})\n                except Exception as e:\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error extracting links: {e}\")\n        \n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情（使用传统 requests 方式）\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            soup = self._parse_html(response.text)\n            \n            # 提取正文（简化示例）\n            content_div = soup.find('div', class_='article-content')\n            if content_div:\n                content = content_div.get_text(strip=True)\n            else:\n                
content = \"\"\n            \n            if not content:\n                return None\n            \n            return NewsItem(\n                title=title,\n                content=self._clean_text(content),\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=datetime.now(),\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n\n\n# 使用示例\nif __name__ == \"__main__\":\n    crawler = DynamicCrawlerExample()\n    news = crawler.crawl()\n    print(f\"Crawled {len(news)} news items\")\n    for item in news[:5]:\n        print(f\"- {item.title}\")\n\n"
  },
  {
    "path": "backend/app/tools/eastmoney_crawler.py",
    "content": "\"\"\"\n东方财富爬虫工具\n目标URL: https://stock.eastmoney.com/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass EastmoneyCrawlerTool(BaseCrawler):\n    \"\"\"\n    东方财富爬虫\n    主要爬取股市新闻\n    \"\"\"\n    \n    BASE_URL = \"https://stock.eastmoney.com/\"\n    STOCK_URL = \"https://stock.eastmoney.com/news/\"\n    SOURCE_NAME = \"eastmoney\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"eastmoney_crawler\",\n            description=\"Crawl financial news from East Money (eastmoney.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取东方财富新闻\n        \n        Args:\n            start_page: 起始页码（当前实现仅爬取单页，页码参数暂未使用）\n            end_page: 结束页码（同上）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Eastmoney, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Eastmoney: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            # 尝试爬取新闻栏目或主页\n            try:\n                response = self._fetch_page(self.STOCK_URL)\n            except Exception:\n                response = self._fetch_page(self.BASE_URL)\n            \n            # 东方财富编码处理\n            if response.encoding == 'ISO-8859-1' or not response.encoding:\n                response.encoding = 'utf-8'\n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n    
        news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        # 东方财富新闻URL模式（股吧 /guba/ 链接不是新闻、后续会被过滤，这里不收录）\n        eastmoney_patterns = [\n            '/news/',             # 新闻频道\n            '/stock/',            # 股票频道\n            '/a/',                # 文章\n            '/article/',          # 文章\n            '.html',              # HTML页面\n        ]\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 检查是否匹配东方财富URL模式\n            is_eastmoney_url = False\n            \n            # 方式1: 检查是否包含eastmoney.com域名\n            if 'eastmoney.com' in href or 'eastmoney.cn' in href:\n                for pattern in eastmoney_patterns:\n                    if pattern in href:\n                        is_eastmoney_url = True\n                        break\n            \n            # 方式2: 相对路径且匹配模式\n            if not is_eastmoney_url and href.startswith('/'):\n                for pattern in eastmoney_patterns:\n                    if pattern in href:\n                        
is_eastmoney_url = True\n                        break\n            \n            # 方式3: 检查data属性或class中包含新闻标识\n            if not is_eastmoney_url:\n                link_class = link.get('class', [])\n                if isinstance(link_class, list):\n                    link_class_str = ' '.join(link_class)\n                else:\n                    link_class_str = str(link_class)\n                if any(kw in link_class_str.lower() for kw in ['news', 'article', 'item', 'title']):\n                    if any(pattern in href for pattern in ['/a/', '/news/', '.html']):\n                        is_eastmoney_url = True\n            \n            if is_eastmoney_url and title and len(title.strip()) > 5:\n                # 确保是完整URL\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    # 判断是stock还是www域名\n                    if '/stock/' in href or '/guba/' in href:\n                        href = 'https://stock.eastmoney.com' + href\n                    else:\n                        href = 'https://www.eastmoney.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://stock.eastmoney.com/' + href.lstrip('/')\n                \n                # 过滤掉明显不是新闻的链接\n                if any(skip in href.lower() for skip in ['javascript:', 'mailto:', '#', 'void(0)', '/guba/']):\n                    continue\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title.strip()})\n        \n        logger.debug(f\"Eastmoney: Found {len(news_links)} potential news links\")\n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            
raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'Body'},\n            {'id': 'ContentBody'},\n            {'class': 'article-content'},\n            {'class': 'newsContent'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('div', {'class': re.compile(r'time|date')})\n            if not time_elem:\n                time_elem = soup.find('span', 
{'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('div', {'class': re.compile(r'author|source')})\n            if not author_elem:\n                author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/eeo_crawler.py",
    "content": "\"\"\"\n经济观察网爬虫工具\n目标URL: https://www.eeo.com.cn/jg/jinrong/zhengquan/\n\"\"\"\nimport re\nimport json\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass EeoCrawlerTool(BaseCrawler):\n    \"\"\"\n    经济观察网爬虫\n    主要爬取证券栏目\n    使用官方API接口\n    \"\"\"\n    \n    BASE_URL = \"https://www.eeo.com.cn/\"\n    # 证券栏目URL（用于获取uuid）\n    STOCK_URL = \"https://www.eeo.com.cn/jg/jinrong/zhengquan/\"\n    # API接口URL\n    API_URL = \"https://app.eeo.com.cn/\"\n    SOURCE_NAME = \"eeo\"\n    # 证券频道的UUID（通过访问页面获取）\n    CHANNEL_UUID = \"9905934f8ec548ddae87652dbb9eebc6\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"eeo_crawler\",\n            description=\"Crawl financial news from Economic Observer (eeo.com.cn)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取经济观察网新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled EEO, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling EEO: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _fetch_api_news(self, page: int = 0, prev_uuid: str = \"\", prev_publish_date: str = \"\") -> List[dict]:\n        \"\"\"\n        通过API获取新闻列表\n        \n        Args:\n            page: 页码（从0开始）\n            prev_uuid: 上一条新闻的UUID（用于翻页）\n            prev_publish_date: 上一条新闻的发布时间（用于翻页）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        try:\n  
          # 构建API参数\n            params = {\n                \"app\": \"article\",\n                \"controller\": \"index\",\n                \"action\": \"getMoreArticle\",\n                \"uuid\": self.CHANNEL_UUID,\n                \"page\": page,\n                \"pageSize\": 20,  # 每页20条\n                \"prevUuid\": prev_uuid,\n                \"prevPublishDate\": prev_publish_date,\n            }\n            \n            # 添加必要的请求头\n            headers = {\n                \"User-Agent\": self.user_agent,\n                \"Referer\": self.STOCK_URL,\n                \"Accept\": \"*/*\",\n            }\n            \n            response = self.session.get(\n                self.API_URL,\n                params=params,\n                headers=headers,\n                timeout=self.timeout\n            )\n            response.raise_for_status()\n            \n            # 处理JSONP响应\n            # 响应格式可能是: jQuery11130...callback({\"code\":200,\"data\":[...]})\n            # 或者直接是JSON: {\"code\":200,\"data\":[...]}\n            content = response.text.strip()\n            logger.debug(f\"[EEO] API response preview (first 300 chars): {content[:300]}\")\n            \n            # 尝试1: 如果是JSONP格式，提取JSON部分\n            json_match = re.search(r'\\((.*)\\)$', content)\n            if json_match:\n                try:\n                    json_str = json_match.group(1)\n                    data = json.loads(json_str)\n                    # 支持两种格式：status==1 或 code==200\n                    if (data.get('status') == 1 or data.get('code') == 200) and 'data' in data:\n                        logger.info(f\"[EEO] Successfully parsed JSONP, found {len(data['data'])} items\")\n                        return data['data']\n                except json.JSONDecodeError as e:\n                    logger.debug(f\"[EEO] JSONP parse failed: {e}\")\n            \n            # 尝试2: 直接解析JSON\n            try:\n                data = 
json.loads(content)\n                if isinstance(data, dict):\n                    # 支持两种格式：status==1 或 code==200\n                    if (data.get('status') == 1 or data.get('code') == 200) and 'data' in data:\n                        logger.info(f\"[EEO] Successfully parsed JSON, found {len(data['data'])} items\")\n                        return data['data']\n                elif isinstance(data, list):\n                    logger.info(f\"[EEO] API returned list with {len(data)} items\")\n                    return data\n            except json.JSONDecodeError as e:\n                logger.debug(f\"[EEO] JSON parse failed: {e}\")\n            \n            # 尝试3: 查找JSON对象（更宽松的匹配）\n            json_obj_match = re.search(r'\\{[^{}]*\"(status|code)\"[^{}]*\"data\"[^{}]*\\}', content, re.DOTALL)\n            if json_obj_match:\n                try:\n                    data = json.loads(json_obj_match.group(0))\n                    # 支持两种格式：status==1 或 code==200\n                    if (data.get('status') == 1 or data.get('code') == 200) and 'data' in data:\n                        logger.info(f\"[EEO] Successfully parsed with regex, found {len(data['data'])} items\")\n                        return data['data']\n                except json.JSONDecodeError as e:\n                    logger.debug(f\"[EEO] Regex parse failed: {e}\")\n            \n            logger.warning(f\"Failed to parse API response, content preview: {content[:200]}\")\n            return []\n            \n        except Exception as e:\n            logger.error(f\"API fetch failed: {e}\")\n            return []\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"\n        爬取单页新闻（使用API）\n        \n        Args:\n            page: 页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_items = []\n        \n        try:\n            # 使用API获取新闻列表\n            api_news_list = 
self._fetch_api_news(page=0)  # 第一页\n            \n            if not api_news_list:\n                logger.warning(\"No news from API, fallback to HTML parsing\")\n                return self._crawl_page_html()\n            \n            logger.info(f\"Fetched {len(api_news_list)} news from API\")\n            \n            # 解析每条新闻\n            for news_data in api_news_list[:20]:  # 限制20条\n                try:\n                    news_item = self._parse_api_news_item(news_data)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to parse news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _parse_api_news_item(self, news_data: dict) -> Optional[NewsItem]:\n        \"\"\"\n        解析API返回的新闻数据\n        \n        Args:\n            news_data: API返回的单条新闻数据\n            \n        Returns:\n            NewsItem对象\n        \"\"\"\n        try:\n            # 提取基本信息\n            title = news_data.get('title', '').strip()\n            url = news_data.get('url', '')\n            \n            # 确保URL是完整的\n            if url and not url.startswith('http'):\n                url = 'https://www.eeo.com.cn' + url\n            \n            if not title or not url:\n                return None\n            \n            # 提取发布时间（API返回的字段可能是 published 或 publishDate）\n            publish_time_str = news_data.get('published', '') or news_data.get('publishDate', '')\n            publish_time = self._parse_time_string(publish_time_str) if publish_time_str else datetime.now()\n            \n            # 提取作者\n            author = news_data.get('author', '')\n            \n            # 获取新闻详情（内容和原始HTML）\n            content, raw_html = self._fetch_news_content(url)\n            \n            if not content:\n    
            return None\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author if author else None,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to parse API news item: {e}\")\n            return None\n    \n    def _fetch_news_content(self, url: str) -> tuple:\n        \"\"\"\n        获取新闻详情页内容\n        \n        Args:\n            url: 新闻详情页URL\n            \n        Returns:\n            (新闻正文, 原始HTML)\n        \"\"\"\n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            return content, raw_html\n            \n        except Exception as e:\n            logger.warning(f\"Failed to fetch content from {url}: {e}\")\n            return \"\", \"\"\n    \n    def _crawl_page_html(self) -> List[NewsItem]:\n        \"\"\"\n        备用方案：直接解析HTML页面（只能获取首屏内容）\n        \"\"\"\n        news_items = []\n        \n        try:\n            response = self._fetch_page(self.STOCK_URL)\n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links from HTML\")\n            \n            # 限制爬取数量\n            max_news = 10\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed 
to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling HTML page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        # 经济观察网新闻URL模式（扩展更多模式）\n        eeo_patterns = [\n            r'/\\d{4}/',           # 日期路径 /2024/\n            '.shtml',              # 静态HTML\n            '/jg/',                # 经济观察\n            '/jinrong/',           # 金融\n            '/zhengquan/',         # 证券\n            '/article/',           # 文章\n        ]\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 检查是否匹配经济观察网URL模式\n            is_eeo_url = False\n            \n            # 方式1: 检查URL模式\n            for pattern in eeo_patterns:\n                if re.search(pattern, href):\n                    is_eeo_url = True\n                    break\n            \n            # 方式2: 检查是否包含eeo.com.cn域名\n            if 'eeo.com.cn' in href:\n                is_eeo_url = True\n            \n            # 方式3: 检查链接的class或data属性\n            if not is_eeo_url:\n                link_class = link.get('class', [])\n                if isinstance(link_class, list):\n                    link_class_str = ' '.join(link_class)\n                else:\n                    link_class_str = str(link_class)\n                if any(kw in link_class_str.lower() for kw in ['news', 'article', 'item', 'title', 'list']):\n                    if href.startswith('/') or 'eeo.com.cn' in href:\n                        is_eeo_url = True\n            \n            if is_eeo_url and title and len(title.strip()) > 5:\n                # 确保是完整URL\n                if href.startswith('//'):\n                 
   href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.eeo.com.cn' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.eeo.com.cn/' + href.lstrip('/')\n                \n                # 过滤掉明显不是新闻的链接\n                if any(skip in href.lower() for skip in ['javascript:', 'mailto:', '#', 'void(0)', '/tag/', '/author/']):\n                    continue\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title.strip()})\n        \n        logger.debug(f\"EEO: Found {len(news_links)} potential news links from HTML\")\n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情（HTML方式）\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        
content_selectors = [\n            {'class': 'article-content'},\n            {'class': 'content'},\n            {'id': 'articleContent'},\n            {'class': 'news-content'},\n            {'class': 'text_content'},  # 常见的正文类名\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find(['div', 'article'], selector)\n            if content_div:\n                # 1. 移除明确的噪音元素\n                for tag in content_div.find_all(['script', 'style', 'iframe', 'ins', 'select', 'input', 'button', 'form']):\n                    tag.decompose()\n                \n                # 2. 移除特定的广告和推荐块\n                for ad in content_div.find_all(class_=re.compile(r'ad|banner|share|otherContent|recommend|app-guide|qrcode', re.I)):\n                    ad.decompose()\n\n                # 3. 获取所有文本，使用换行符分隔\n                # 关键修改：使用 get_text 而不是 find_all('p')\n                full_text = content_div.get_text(separator='\\n', strip=True)\n                \n                # 4. 按行分割并清洗\n                lines = full_text.split('\\n')\n                article_parts = []\n                \n                for line in lines:\n                    line = line.strip()\n                    if not line:\n                        continue\n                        \n                    # 5. 
简单的长度过滤，防止页码等噪音\n                    if len(line) < 2:\n                        continue\n                        \n                    article_parts.append(line)\n                \n                if article_parts:\n                    content = '\\n'.join(article_parts)\n                    return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n"
  },
  {
    "path": "backend/app/tools/interactive_crawler.py",
    "content": "\"\"\"\n交互式网页爬虫\n使用 requests + BeautifulSoup 进行网页爬取\n特别用于搜索结果补充，当 BochaAI 结果不足时使用\n\n注意：主要搜索引擎（Bing、百度）都有反爬机制，本模块已做相应优化：\n1. 模拟真实浏览器请求头\n2. 检测验证页面并自动降级\n3. 多引擎轮换备选\n\"\"\"\nimport logging\nimport re\nimport time\nimport random\nfrom typing import List, Dict, Any, Optional\nfrom urllib.parse import quote_plus, urljoin, urlparse\n\nimport requests\nfrom bs4 import BeautifulSoup\n\nlogger = logging.getLogger(__name__)\n\n# 更完善的 User-Agent，模拟最新的 Chrome 浏览器\nUSER_AGENTS = [\n    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',\n    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',\n    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',\n    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',\n]\n\n# 验证页面关键词（用于检测被拦截）\nCAPTCHA_KEYWORDS = [\n    '确认您是真人', '人机验证', 'captcha', 'verify you are human',\n    '验证码', '请完成验证', '安全验证', '异常访问', '请输入验证码',\n    '最后一步', '请解决以下难题'\n]\n\n\nclass InteractiveCrawler:\n    \"\"\"交互式网页爬虫（纯 requests 实现）\"\"\"\n    \n    def __init__(self, timeout: int = 15):\n        \"\"\"\n        初始化爬虫\n        \n        Args:\n            timeout: 请求超时时间（秒）\n        \"\"\"\n        self.timeout = timeout\n        self.session = requests.Session()\n        self._user_agent = random.choice(USER_AGENTS)\n        # 更完善的请求头，模拟真实浏览器\n        self.session.headers.update({\n            'User-Agent': self._user_agent,\n            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',\n            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',\n            'Accept-Encoding': 'gzip, deflate, br',\n            'Connection': 'keep-alive',\n            'Upgrade-Insecure-Requests': '1',\n   
         'Sec-Fetch-Dest': 'document',\n            'Sec-Fetch-Mode': 'navigate',\n            'Sec-Fetch-Site': 'none',\n            'Sec-Fetch-User': '?1',\n            'Cache-Control': 'max-age=0',\n            'sec-ch-ua': '\"Google Chrome\";v=\"131\", \"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\"',\n            'sec-ch-ua-mobile': '?0',\n            'sec-ch-ua-platform': '\"macOS\"',\n        })\n    \n    def _is_captcha_page(self, html_content: str, soup: Optional[BeautifulSoup] = None) -> bool:\n        \"\"\"\n        检测页面是否为验证码/人机验证页面\n        \n        Args:\n            html_content: HTML 原始内容\n            soup: 已解析的 BeautifulSoup 对象（可选，提供时优先使用其文本）\n            \n        Returns:\n            True 如果是验证页面\n        \"\"\"\n        text_to_check = html_content.lower()\n        if soup:\n            text_to_check = soup.get_text().lower()\n        \n        for keyword in CAPTCHA_KEYWORDS:\n            if keyword.lower() in text_to_check:\n                return True\n        return False\n    \n    def search_on_bing(\n        self,\n        query: str,\n        num_results: int = 10\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        在 Bing 上搜索并获取结果\n        \n        Args:\n            query: 搜索关键词\n            num_results: 获取的结果数量\n            \n        Returns:\n            搜索结果列表 [{\"url\": \"...\", \"title\": \"...\", \"snippet\": \"...\"}]\n        \"\"\"\n        results = []\n        \n        try:\n            # 使用国际版 Bing，中国版有更严格的反爬\n            search_url = f\"https://www.bing.com/search?q={quote_plus(query)}&count={num_results}\"\n            \n            logger.info(f\"🔍 Bing 搜索: {query}\")\n            logger.debug(f\"搜索URL: {search_url}\")\n            \n            response = self.session.get(search_url, timeout=self.timeout)\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, 'html.parser')\n            \n            # ========== 检测验证码页面 ==========\n            if self._is_captcha_page(response.text, 
soup):\n                logger.warning(\"⚠️ Bing 触发人机验证，跳过此引擎\")\n                return []  # 返回空，让调用者使用其他引擎\n            \n            # ========== 调试：打印找到的元素 ==========\n            # 尝试多种选择器\n            b_algo_items = soup.select('.b_algo')\n            logger.info(f\"📊 Bing HTML解析: .b_algo={len(b_algo_items)}个\")\n            \n            # 如果 .b_algo 没找到，尝试其他选择器\n            if not b_algo_items:\n                # 尝试查找所有包含链接的 li 元素\n                li_items = soup.select('#b_results > li')\n                logger.info(f\"📊 尝试 #b_results > li: {len(li_items)}个\")\n                \n                # 打印页面中所有链接供调试\n                all_links = soup.select('a[href^=\"http\"]')\n                logger.info(f\"📊 页面总链接数: {len(all_links)}个\")\n                \n                # 打印前15个链接（过滤掉 Bing/Microsoft 内部链接）\n                for i, link in enumerate(all_links[:15]):\n                    href = link.get('href', '')\n                    text = link.get_text(strip=True)[:50]\n                    # 过滤掉 Bing 内部链接\n                    if 'bing.com' not in href and 'microsoft.com' not in href:\n                        logger.info(f\"  链接{i+1}: {text} -> {href[:80]}\")\n            \n            # ========== 提取搜索结果 ==========\n            # 方法1: 标准 .b_algo 选择器\n            for result in b_algo_items[:num_results]:\n                try:\n                    # 提取标题和链接\n                    title_elem = result.select_one('h2 a')\n                    if not title_elem:\n                        title_elem = result.select_one('a')  # 备选\n                    if not title_elem:\n                        continue\n                    \n                    title = title_elem.get_text(strip=True)\n                    url = title_elem.get('href', '')\n                    \n                    # 提取摘要\n                    snippet_elem = result.select_one('.b_caption p, p')\n                    snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''\n                    \n                    if url 
and title and 'bing.com' not in url:\n                        results.append({\n                            \"url\": url,\n                            \"title\": title,\n                            \"snippet\": snippet[:300],\n                            \"source\": \"bing\"\n                        })\n                        logger.debug(f\"  ✅ 提取: {title[:40]} -> {url[:60]}\")\n                        \n                except Exception as e:\n                    logger.debug(f\"解析 Bing 结果失败: {e}\")\n                    continue\n            \n            # 方法2: 如果 .b_algo 没有结果，可能是验证页面的残留链接，不再使用备选提取\n            if not results and b_algo_items:\n                logger.info(\"⚠️ Bing 无有效结果\")\n            \n            logger.info(f\"✅ Bing 搜索完成，获得 {len(results)} 条结果\")\n            \n        except requests.exceptions.Timeout:\n            logger.warning(f\"⚠️ Bing 搜索超时: {query}\")\n        except requests.exceptions.RequestException as e:\n            logger.warning(f\"⚠️ Bing 搜索请求失败: {e}\")\n        except Exception as e:\n            logger.error(f\"❌ Bing 搜索失败: {e}\")\n        \n        return results\n    \n    def search_on_baidu(\n        self,\n        query: str,\n        num_results: int = 10\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        在百度上搜索并获取结果（百度对简单爬虫相对友好）\n        \n        Args:\n            query: 搜索关键词\n            num_results: 获取的结果数量\n            \n        Returns:\n            搜索结果列表\n        \"\"\"\n        results = []\n        \n        try:\n            # 百度搜索 URL\n            search_url = f\"https://www.baidu.com/s?wd={quote_plus(query)}&rn={num_results}\"\n            \n            logger.info(f\"🔍 百度搜索: {query}\")\n            logger.debug(f\"搜索URL: {search_url}\")\n            \n            # 百度需要特定的请求头\n            headers = {\n                'User-Agent': self._user_agent,\n                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',\n                'Accept-Language': 
'zh-CN,zh;q=0.9',\n                'Accept-Encoding': 'gzip, deflate',\n                'Referer': 'https://www.baidu.com/',\n                'Connection': 'keep-alive',\n            }\n            \n            response = self.session.get(search_url, headers=headers, timeout=self.timeout)\n            response.encoding = 'utf-8'\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, 'html.parser')\n            \n            # 检测验证码\n            if self._is_captcha_page(response.text, soup):\n                logger.warning(\"⚠️ 百度触发验证，跳过此引擎\")\n                return []\n            \n            # 百度搜索结果选择器（多种尝试）\n            result_items = soup.select('.result.c-container, .c-container, div[class*=\"result\"]')\n            logger.info(f\"📊 百度HTML解析: 结果容器={len(result_items)}个\")\n            \n            for result in result_items[:num_results]:\n                try:\n                    # 提取标题和链接\n                    title_elem = result.select_one('h3 a, .t a, a[href]')\n                    if not title_elem:\n                        continue\n                    \n                    title = title_elem.get_text(strip=True)\n                    url = title_elem.get('href', '')\n                    \n                    # 百度使用跳转链接，需要提取真实URL\n                    # 但通常跳转链接也能用\n                    \n                    # 提取摘要\n                    snippet_elem = result.select_one('.c-abstract, .c-span-last, .content-right_8Zs40')\n                    snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''\n                    \n                    if url and title and 'baidu.com' not in url:\n                        results.append({\n                            \"url\": url,\n                            \"title\": title,\n                            \"snippet\": snippet[:300],\n                            \"source\": \"baidu\"\n                        })\n                        logger.debug(f\"  ✅ 提取: 
{title[:40]}\")\n                        \n                except Exception as e:\n                    logger.debug(f\"解析百度结果失败: {e}\")\n                    continue\n            \n            # 备选方法：从所有标题链接提取\n            if not results:\n                logger.info(\"⚠️ 百度标准选择器无结果，尝试提取 h3 链接...\")\n                h3_links = soup.select('h3 a')\n                for link in h3_links[:num_results]:\n                    href = link.get('href', '')\n                    text = link.get_text(strip=True)\n                    \n                    if not href or not text or len(text) < 3:\n                        continue\n                    if href in [r['url'] for r in results]:\n                        continue\n                    \n                    results.append({\n                        \"url\": href,\n                        \"title\": text[:100],\n                        \"snippet\": \"\",\n                        \"source\": \"baidu\"\n                    })\n                    \n                    if len(results) >= num_results:\n                        break\n            \n            logger.info(f\"✅ 百度搜索完成，获得 {len(results)} 条结果\")\n            \n        except Exception as e:\n            logger.warning(f\"⚠️ 百度搜索失败: {e}\")\n        \n        return results\n    \n    def search_on_baidu_news(\n        self,\n        query: str,\n        num_results: int = 10\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        在百度新闻搜索（news.baidu.com）获取新闻结果\n        \n        使用 news.baidu.com 入口，返回的 URL 是真实的第三方新闻链接，\n        不是百度跳转链接，避免乱码问题。\n        \n        Args:\n            query: 搜索关键词\n            num_results: 获取的结果数量\n            \n        Returns:\n            搜索结果列表\n        \"\"\"\n        results = []\n        \n        try:\n            # 使用百度新闻入口（news.baidu.com），返回真实的第三方 URL\n            search_url = f\"https://news.baidu.com/ns?word={quote_plus(query)}&tn=news&from=news&cl=2&rn={num_results}&ct=1\"\n            \n            logger.info(f\"🔍 
百度新闻搜索: {query}\")\n            logger.debug(f\"搜索URL: {search_url}\")\n            \n            # 百度需要特定的请求头\n            headers = {\n                'User-Agent': self._user_agent,\n                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',\n                'Accept-Language': 'zh-CN,zh;q=0.9',\n                'Accept-Encoding': 'gzip, deflate',\n                'Referer': 'https://news.baidu.com/',\n                'Connection': 'keep-alive',\n            }\n            \n            response = self.session.get(search_url, headers=headers, timeout=self.timeout, allow_redirects=True)\n            response.encoding = 'utf-8'\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, 'html.parser')\n            \n            # 检测验证码\n            if self._is_captcha_page(response.text, soup):\n                logger.warning(\"⚠️ 百度新闻触发验证，跳过\")\n                return []\n            \n            # 百度新闻搜索结果选择器\n            # 新闻标题在 h3 > a 中，链接是真实的第三方 URL\n            news_h3_links = soup.select('h3 a[href^=\"http\"]')\n            logger.info(f\"📊 百度新闻HTML解析: h3链接={len(news_h3_links)}个\")\n            \n            for link in news_h3_links[:num_results * 2]:  # 多取一些，后面过滤\n                try:\n                    url = link.get('href', '')\n                    title = link.get_text(strip=True)\n                    \n                    # 清理标题（去掉\"标题：\"前缀）\n                    if title.startswith('标题：'):\n                        title = title[3:]\n                    \n                    # 过滤无效结果\n                    if not url or not title or len(title) < 5:\n                        continue\n                    # 过滤百度内部链接（但保留百家号 baijiahao.baidu.com）\n                    if 'baidu.com' in url and 'baijiahao.baidu.com' not in url:\n                        continue\n                    if url in [r['url'] for r in results]:\n                        continue  # 去重\n                    \n  
                  # 尝试找到父容器获取摘要\n                    parent = link.find_parent(['div', 'li'])\n                    snippet = ''\n                    news_source = ''\n                    publish_time = ''\n                    \n                    if parent:\n                        # 提取摘要（通常在 generic 或 p 元素中）\n                        snippet_elem = parent.select_one('[class*=\"summary\"], [class*=\"abstract\"], p')\n                        if snippet_elem:\n                            snippet = snippet_elem.get_text(strip=True)[:300]\n                        \n                        # 提取来源（通常在包含\"来源\"的链接中）\n                        source_links = parent.select('a')\n                        for src_link in source_links:\n                            src_text = src_link.get_text(strip=True)\n                            if src_text and src_text != title[:20] and len(src_text) < 20:\n                                # 可能是来源（如\"同花顺财经\"、\"新浪财经\"）\n                                if '新闻来源' in (src_link.get('aria-label', '') or ''):\n                                    news_source = src_text\n                                    break\n                                elif not news_source and not src_text.startswith('标题'):\n                                    news_source = src_text\n                    \n                    results.append({\n                        \"url\": url,\n                        \"title\": title,\n                        \"snippet\": snippet,\n                        \"source\": \"baidu_news\",\n                        \"news_source\": news_source  # 新闻来源（如\"同花顺财经\"）\n                    })\n                    logger.debug(f\"  ✅ 新闻: {title[:40]} | {news_source}\")\n                    \n                    if len(results) >= num_results:\n                        break\n                        \n                except Exception as e:\n                    logger.debug(f\"解析百度新闻结果失败: {e}\")\n                    continue\n            \n            
logger.info(f\"✅ 百度新闻搜索完成，获得 {len(results)} 条新闻\")\n            \n        except Exception as e:\n            logger.warning(f\"⚠️ 百度新闻搜索失败: {e}\")\n        \n        return results\n    \n    def search_on_sogou(\n        self,\n        query: str,\n        num_results: int = 10\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        在搜狗上搜索并获取结果（备用搜索引擎）\n        \n        Args:\n            query: 搜索关键词\n            num_results: 获取的结果数量\n            \n        Returns:\n            搜索结果列表\n        \"\"\"\n        results = []\n        \n        try:\n            # 构建搜狗搜索 URL\n            search_url = f\"https://www.sogou.com/web?query={quote_plus(query)}\"\n            \n            logger.info(f\"🔍 搜狗搜索: {query}\")\n            logger.debug(f\"搜索URL: {search_url}\")\n            \n            response = self.session.get(search_url, timeout=self.timeout)\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, 'html.parser')\n            \n            # 检测验证码\n            if self._is_captcha_page(response.text, soup):\n                logger.warning(\"⚠️ 搜狗触发验证，跳过此引擎\")\n                return []\n            \n            # ========== 调试：打印找到的元素 ==========\n            vrwrap_items = soup.select('.vrwrap, .rb, .results .vrwrap')\n            logger.info(f\"📊 搜狗HTML解析: .vrwrap/.rb={len(vrwrap_items)}个\")\n            \n            # 搜狗搜索结果选择器\n            for result in vrwrap_items[:num_results]:\n                try:\n                    title_elem = result.select_one('h3 a, .vr-title a, a[href]')\n                    if not title_elem:\n                        continue\n                    \n                    title = title_elem.get_text(strip=True)\n                    url = title_elem.get('href', '')\n                    \n                    snippet_elem = result.select_one('.str_info, .str-text, p, .txt-info')\n                    snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''\n             
       \n                    if url and title and 'sogou.com' not in url:\n                        results.append({\n                            \"url\": url,\n                            \"title\": title,\n                            \"snippet\": snippet[:300],\n                            \"source\": \"sogou\"\n                        })\n                        logger.debug(f\"  ✅ 提取: {title[:40]} -> {url[:60]}\")\n                        \n                except Exception as e:\n                    logger.debug(f\"解析搜狗结果失败: {e}\")\n                    continue\n            \n            # 备选方法：从页面链接提取\n            if not results:\n                logger.info(\"⚠️ 搜狗标准选择器无结果，尝试从页面链接提取...\")\n                all_links = soup.select('a[href^=\"http\"]')\n                for link in all_links[:num_results * 3]:\n                    href = link.get('href', '')\n                    text = link.get_text(strip=True)\n                    \n                    if not href or not text or len(text) < 5:\n                        continue\n                    if 'sogou.com' in href:\n                        continue\n                    if href in [r['url'] for r in results]:\n                        continue\n                    \n                    results.append({\n                        \"url\": href,\n                        \"title\": text[:100],\n                        \"snippet\": \"\",\n                        \"source\": \"sogou\"\n                    })\n                    \n                    if len(results) >= num_results:\n                        break\n            \n            logger.info(f\"✅ 搜狗搜索完成，获得 {len(results)} 条结果\")\n            \n        except Exception as e:\n            logger.warning(f\"⚠️ 搜狗搜索失败: {e}\")\n        \n        return results\n    \n    def search_on_360(\n        self,\n        query: str,\n        num_results: int = 10\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        在 360 搜索上搜索并获取结果\n        \n        Args:\n          
  query: 搜索关键词\n            num_results: 获取的结果数量\n            \n        Returns:\n            搜索结果列表\n        \"\"\"\n        results = []\n        \n        try:\n            # 构建 360 搜索 URL\n            search_url = f\"https://www.so.com/s?q={quote_plus(query)}\"\n            \n            logger.info(f\"🔍 360搜索: {query}\")\n            logger.debug(f\"搜索URL: {search_url}\")\n            \n            response = self.session.get(search_url, timeout=self.timeout)\n            response.raise_for_status()\n            \n            soup = BeautifulSoup(response.text, 'html.parser')\n            \n            # 检测验证码\n            if self._is_captcha_page(response.text, soup):\n                logger.warning(\"⚠️ 360触发验证，跳过此引擎\")\n                return []\n            \n            # ========== 调试：打印找到的元素 ==========\n            res_items = soup.select('.res-list, .result, li.res-list')\n            logger.info(f\"📊 360 HTML解析: .res-list/.result={len(res_items)}个\")\n            \n            # 360 搜索结果选择器\n            for result in res_items[:num_results]:\n                try:\n                    title_elem = result.select_one('h3 a, .res-title a, a[href]')\n                    if not title_elem:\n                        continue\n                    \n                    title = title_elem.get_text(strip=True)\n                    url = title_elem.get('href', '')\n                    \n                    snippet_elem = result.select_one('.res-desc, p.res-summary, p, .res-comm-con')\n                    snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''\n                    \n                    if url and title and 'so.com' not in url and '360.cn' not in url:\n                        results.append({\n                            \"url\": url,\n                            \"title\": title,\n                            \"snippet\": snippet[:300],\n                            \"source\": \"360\"\n                        })\n                        
logger.debug(f\"  ✅ 提取: {title[:40]} -> {url[:60]}\")\n                        \n                except Exception as e:\n                    logger.debug(f\"解析 360 结果失败: {e}\")\n                    continue\n            \n            # 备选方法：从页面链接提取\n            if not results:\n                logger.info(\"⚠️ 360 标准选择器无结果，尝试从页面链接提取...\")\n                all_links = soup.select('a[href^=\"http\"]')\n                for link in all_links[:num_results * 3]:\n                    href = link.get('href', '')\n                    text = link.get_text(strip=True)\n                    \n                    if not href or not text or len(text) < 5:\n                        continue\n                    if 'so.com' in href or '360.cn' in href:\n                        continue\n                    if href in [r['url'] for r in results]:\n                        continue\n                    \n                    results.append({\n                        \"url\": href,\n                        \"title\": text[:100],\n                        \"snippet\": \"\",\n                        \"source\": \"360\"\n                    })\n                    \n                    if len(results) >= num_results:\n                        break\n            \n            logger.info(f\"✅ 360搜索完成，获得 {len(results)} 条结果\")\n            \n        except Exception as e:\n            logger.warning(f\"⚠️ 360搜索失败: {e}\")\n        \n        return results\n    \n    def interactive_search(\n        self,\n        query: str,\n        engines: List[str] = None,\n        num_results: int = 10,\n        search_type: str = \"news\",  # 新增参数：news（新闻）或 web（网页）\n        **kwargs  # 兼容旧接口\n    ) -> List[Dict[str, str]]:\n        \"\"\"\n        使用多个搜索引擎进行搜索\n        \n        Args:\n            query: 搜索关键词\n            engines: 搜索引擎列表 ['baidu_news', 'baidu', 'sogou', '360', 'bing']\n            num_results: 每个引擎的结果数量\n            search_type: 搜索类型 'news'（新闻优先）或 'web'（网页）\n            \n        
Returns:\n            合并的搜索结果\n        \"\"\"\n        if engines is None:\n            if search_type == \"news\":\n                # 新闻搜索：优先使用百度资讯\n                engines = [\"baidu_news\", \"sogou\"]\n            else:\n                # 普通网页搜索\n                engines = [\"baidu\", \"sogou\"]\n        \n        all_results = []\n        engines_tried = []\n        \n        for engine in engines:\n            try:\n                engine_lower = engine.lower()\n                if engine_lower == \"baidu_news\":\n                    results = self.search_on_baidu_news(query, num_results)\n                elif engine_lower == \"baidu\":\n                    results = self.search_on_baidu(query, num_results)\n                elif engine_lower == \"bing\":\n                    results = self.search_on_bing(query, num_results)\n                elif engine_lower == \"sogou\":\n                    results = self.search_on_sogou(query, num_results)\n                elif engine_lower == \"360\":\n                    results = self.search_on_360(query, num_results)\n                else:\n                    logger.warning(f\"⚠️ 不支持的搜索引擎: {engine}\")\n                    continue\n                \n                if results:\n                    all_results.extend(results)\n                    engines_tried.append(engine_lower)\n                    logger.info(f\"✅ {engine} 返回 {len(results)} 条结果\")\n                else:\n                    logger.info(f\"⚠️ {engine} 无结果或被拦截\")\n                \n                # 搜索间隔，避免被封\n                if len(engines) > 1:\n                    time.sleep(random.uniform(0.8, 1.5))\n                    \n            except Exception as e:\n                logger.error(f\"❌ 使用 {engine} 搜索失败: {e}\")\n                continue\n        \n        # 如果所有引擎都失败了，尝试备用引擎\n        if not all_results:\n            backup_engines = [\"baidu_news\", \"360\", \"baidu\", \"sogou\"]\n            for backup in backup_engines:\n                if 
backup not in [e.lower() for e in engines]:\n                    logger.info(f\"🔄 尝试备用引擎: {backup}\")\n                    try:\n                        if backup == \"baidu_news\":\n                            results = self.search_on_baidu_news(query, num_results)\n                        elif backup == \"360\":\n                            results = self.search_on_360(query, num_results)\n                        elif backup == \"baidu\":\n                            results = self.search_on_baidu(query, num_results)\n                        elif backup == \"sogou\":\n                            results = self.search_on_sogou(query, num_results)\n                        \n                        if results:\n                            all_results.extend(results)\n                            engines_tried.append(backup)\n                            logger.info(f\"✅ 备用引擎 {backup} 返回 {len(results)} 条结果\")\n                            break\n                    except Exception as e:\n                        logger.warning(f\"备用引擎 {backup} 也失败: {e}\")\n                        continue\n        \n        # 去重\n        seen_urls = set()\n        unique_results = []\n        for r in all_results:\n            if r[\"url\"] not in seen_urls:\n                seen_urls.add(r[\"url\"])\n                unique_results.append(r)\n        \n        logger.info(f\"交互式搜索完成: {len(all_results)} -> {len(unique_results)} (去重后), 使用引擎: {engines_tried}\")\n        return unique_results\n    \n    def crawl_page(self, url: str) -> Optional[Dict[str, Any]]:\n        \"\"\"\n        爬取单个页面内容\n        \n        Args:\n            url: 页面 URL\n            \n        Returns:\n            {\"url\": \"...\", \"title\": \"...\", \"content\": \"...\", \"text\": \"...\", \"html\": \"...\"} 或 None\n        \"\"\"\n        try:\n            response = self.session.get(url, timeout=self.timeout)\n            response.encoding = response.apparent_encoding or 'utf-8'\n            \n            # 
保存原始 HTML（清理 NUL 字符，'\\x00' 即 NUL）\n            raw_html = response.text.replace('\\x00', '')\n            \n            soup = BeautifulSoup(raw_html, 'html.parser')\n            \n            # 获取标题（在移除元素之前）\n            title = ''\n            title_elem = soup.find('title')\n            if title_elem:\n                title = title_elem.get_text(strip=True)\n            \n            # 尝试获取 h1 作为更好的标题\n            h1_elem = soup.find('h1')\n            if h1_elem:\n                h1_text = h1_elem.get_text(strip=True)\n                if h1_text and len(h1_text) > 5:\n                    title = h1_text\n            \n            # 移除无关元素（用于提取正文）\n            for elem in soup.select('script, style, iframe, nav, footer, header, aside, .ad, .advertisement, .comment, .sidebar'):\n                elem.decompose()\n            \n            # 获取主要内容\n            # 优先选择 article, main, .content 等\n            main_content = None\n            content_selectors = [\n                'article', 'main', '.content', '.post-content', '.article-content', \n                '#content', '.main-content', '.news-content', '.article-body',\n                '.entry-content', '.post-body', '[itemprop=\"articleBody\"]'\n            ]\n            for selector in content_selectors:\n                main_content = soup.select_one(selector)\n                if main_content:\n                    break\n            \n            if not main_content:\n                main_content = soup.find('body') or soup\n            \n            # 提取文本\n            text_content = main_content.get_text(separator='\\n', strip=True)\n            \n            # 清理文本\n            text_content = re.sub(r'\\n{3,}', '\\n\\n', text_content)\n            # 不再截断内容，保留完整正文（数据库字段应该支持长文本）\n            # text_content = text_content[:5000]  # 移除截断\n            \n            logger.debug(f\"📄 爬取完成: {title[:40]}... 
| 正文{len(text_content)}字符 | HTML{len(raw_html) if raw_html else 0}字符\")\n            \n            return {\n                \"url\": url,\n                \"title\": title,\n                \"content\": text_content,  # 完整正文\n                \"text\": text_content,  # 兼容字段\n                \"html\": raw_html if raw_html else None  # 完整原始 HTML\n            }\n            \n        except requests.exceptions.Timeout:\n            logger.warning(f\"⚠️ 爬取页面超时: {url[:60]}...\")\n        except Exception as e:\n            logger.warning(f\"⚠️ 爬取页面失败 {url[:60]}...: {e}\")\n        \n        return None\n    \n    def crawl_search_results(\n        self,\n        search_results: List[Dict[str, str]],\n        max_results: int = 5\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        爬取搜索结果中的页面内容\n        \n        Args:\n            search_results: 搜索结果列表\n            max_results: 最多爬取多少个页面\n            \n        Returns:\n            爬取结果列表 [{\"url\": \"...\", \"title\": \"...\", \"content\": \"...\"}]\n        \"\"\"\n        crawled = []\n        \n        for i, result in enumerate(search_results[:max_results]):\n            url = result.get(\"url\")\n            if not url:\n                continue\n            \n            logger.info(f\"📄 爬取页面 {i+1}/{min(max_results, len(search_results))}: {url[:60]}...\")\n            \n            page_data = self.crawl_page(url)\n            \n            if page_data and page_data.get(\"content\"):\n                page_data[\"snippet\"] = result.get(\"snippet\", \"\")\n                page_data[\"source\"] = result.get(\"source\", \"web\")\n                crawled.append(page_data)\n                logger.debug(f\"✅ 爬取成功: {page_data['title'][:50]}...\")\n            else:\n                # 爬取失败时，使用搜索结果的摘要\n                crawled.append({\n                    \"url\": url,\n                    \"title\": result.get(\"title\", \"\"),\n                    \"content\": result.get(\"snippet\", \"\"),\n                    
\"snippet\": result.get(\"snippet\", \"\"),\n                    \"source\": result.get(\"source\", \"web\")\n                })\n                logger.debug(f\"⚠️ 使用摘要代替: {result.get('title', 'N/A')[:50]}...\")\n            \n            # 爬取间隔\n            if i < max_results - 1:\n                time.sleep(random.uniform(0.3, 0.8))\n        \n        logger.info(f\"📄 页面爬取完成: {len(crawled)} 个成功\")\n        return crawled\n\n\n# 便捷函数\ndef create_interactive_crawler(headless: bool = True, **kwargs) -> InteractiveCrawler:\n    \"\"\"创建交互式爬虫（兼容旧接口）\"\"\"\n    return InteractiveCrawler()\n\n\ndef search_and_crawl(\n    query: str,\n    engines: List[str] = None,\n    max_search_results: int = 10,\n    max_crawl_results: int = 5,\n    **kwargs  # 兼容旧接口\n) -> Dict[str, Any]:\n    \"\"\"\n    一体化搜索和爬取函数\n    \n    Args:\n        query: 搜索关键词\n        engines: 搜索引擎列表\n        max_search_results: 最多获取多少个搜索结果\n        max_crawl_results: 最多爬取多少个页面\n        \n    Returns:\n        {\n            \"search_results\": [...],\n            \"crawled_results\": [...],\n            \"total_results\": int\n        }\n    \"\"\"\n    crawler = InteractiveCrawler()\n    \n    logger.info(f\"🔍 开始搜索: {query}\")\n    search_results = crawler.interactive_search(\n        query,\n        engines=engines,\n        num_results=max_search_results\n    )\n    \n    if not search_results:\n        logger.warning(f\"搜索未返回结果: {query}\")\n        return {\n            \"search_results\": [],\n            \"crawled_results\": [],\n            \"total_results\": 0\n        }\n    \n    logger.info(f\"📄 开始爬取前 {max_crawl_results} 个结果\")\n    crawled_results = crawler.crawl_search_results(\n        search_results,\n        max_results=max_crawl_results\n    )\n    \n    return {\n        \"search_results\": search_results,\n        \"crawled_results\": crawled_results,\n        \"total_results\": len(crawled_results)\n    }\n"
  },
  {
    "path": "backend/app/tools/jingji21_crawler.py",
    "content": "\"\"\"\n21经济网爬虫工具\n目标URL: https://www.21jingji.com/ (证券栏目)\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime, timedelta\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass Jingji21CrawlerTool(BaseCrawler):\n    \"\"\"\n    21经济网爬虫\n    主要爬取证券栏目\n    \"\"\"\n    \n    BASE_URL = \"https://www.21jingji.com/\"\n    # 证券栏目URL\n    STOCK_URL = \"https://www.21jingji.com/channel/capital/\"\n    SOURCE_NAME = \"jingji21\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"jingji21_crawler\",\n            description=\"Crawl financial news from 21 Jingji (21jingji.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取21经济网新闻\n        \n        Args:\n            start_page: 起始页码（当前未使用，仅保留以兼容接口）\n            end_page: 结束页码（当前未使用，仅保留以兼容接口）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            # 当前仅爬取第一页\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Jingji21, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Jingji21: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            # 尝试爬取证券栏目，失败时回退到主页\n            try:\n                response = self._fetch_page(self.STOCK_URL)\n            except Exception:\n                response = self._fetch_page(self.BASE_URL)\n            \n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential 
news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 21经济网新闻URL模式\n            if ('/article/' in href or '/html/' in href or '.shtml' in href) and title:\n                # 确保是完整URL\n                if not href.startswith('http'):\n                    href = 'https://www.21jingji.com' + href\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            # 确保编码正确：21经济网可能使用 gbk 编码\n            if '21jingji.com' in url:\n                # 尝试多种编码\n                encodings = ['utf-8', 'gbk', 'gb2312', 'gb18030']\n                raw_html = None\n                for enc in encodings:\n                    try:\n                        raw_html = response.content.decode(enc)\n                     
   # 验证是否包含中文字符（避免乱码）\n                        if any('\\u4e00' <= c <= '\\u9fff' for c in raw_html[:500]):\n                            break\n                    except (UnicodeDecodeError, LookupError):\n                        continue\n                if raw_html is None:\n                    raw_html = response.text\n            else:\n                raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'article-content'},\n            {'class': 'content'},\n            {'class': 'text'},\n            {'id': 'content'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n       
 # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/jwview_crawler.py",
    "content": "\"\"\"\n中新经纬爬虫工具\n目标URL: https://www.jwview.com/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime, timedelta\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass JwviewCrawlerTool(BaseCrawler):\n    \"\"\"\n    中新经纬新闻爬虫\n    爬取中新经纬财经新闻\n    \"\"\"\n    \n    BASE_URL = \"https://www.jwview.com/\"\n    # 股票/证券专栏URL（如果有）\n    STOCK_URL = \"https://www.jwview.com/jingwei/html/index.shtml\"\n    SOURCE_NAME = \"jwview\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"jwview_crawler\",\n            description=\"Crawl financial news from Zhongxin Jingwei (jwview.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取中新经纬新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Jwview, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Jwview: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            # 尝试爬取主页或股票专栏\n            response = self._fetch_page(self.BASE_URL)\n            # 中新经纬站点可能使用 gbk 编码\n            if response.encoding == 'ISO-8859-1' or not response.encoding:\n                try:\n                    response.content.decode('gbk')\n                    response.encoding = 'gbk'\n                except (UnicodeDecodeError, LookupError):\n                    response.encoding = 'utf-8'\n            soup = 
self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接（中新经纬的URL模式）\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 中新经纬新闻URL模式\n            if ('/jingwei/' in href or '/html/' in href) and title:\n                # 规范化 URL，避免出现 //www... 
重复前缀\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.jwview.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.jwview.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'content'},\n            {'class': 'article-content'},\n            {'class': 'text'},\n            {'id': 'content'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = 
 soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 处理相对时间\n        if '分钟前' in time_str:\n            minutes = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(minutes=minutes)\n        elif '小时前' in time_str:\n            hours = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(hours=hours)\n        elif '昨天' in time_str:\n            return now - timedelta(days=1)\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n  
      try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/nbd_crawler.py",
    "content": "\"\"\"\n每日经济新闻爬虫工具\n目标URL: https://finance.nbd.com.cn/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass NbdCrawlerTool(BaseCrawler):\n    \"\"\"\n    每日经济新闻爬虫\n    主要爬取财经股市新闻\n    \"\"\"\n    \n    BASE_URL = \"https://www.nbd.com.cn/\"\n    STOCK_URL = \"https://www.nbd.com.cn/columns/3/\"\n    SOURCE_NAME = \"nbd\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"nbd_crawler\",\n            description=\"Crawl financial news from NBD (nbd.com.cn)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取每日经济新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled NBD, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling NBD: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            response = self._fetch_page(self.STOCK_URL)\n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = 
self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    # 如果是503错误，记录但继续处理其他URL\n                    error_str = str(e)\n                    if '503' in error_str or 'Service Temporarily Unavailable' in error_str:\n                        logger.warning(f\"Skipping {link_info.get('url', 'unknown')} due to 503 error (server overloaded)\")\n                    else:\n                        logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        # NBD新闻URL模式（扩展更多模式）\n        nbd_patterns = [\n            '/articles/',        # 文章列表\n            '/article/',         # 文章\n            '.html',             # HTML页面\n            '/columns/',         # 栏目\n            '/finance/',         # 财经\n        ]\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 检查是否匹配NBD URL模式\n            is_nbd_url = False\n            \n            # 方式1: 检查URL模式\n            for pattern in nbd_patterns:\n                if pattern in href:\n                    is_nbd_url = True\n                    break\n            \n            # 方式2: 检查是否包含nbd.com.cn域名\n            if 'nbd.com.cn' in href:\n                is_nbd_url = True\n            \n            # 方式3: 检查链接的class或data属性\n            if not is_nbd_url:\n                link_class = link.get('class', [])\n                if isinstance(link_class, list):\n                    link_class_str = ' 
'.join(link_class)\n                else:\n                    link_class_str = str(link_class)\n                if any(kw in link_class_str.lower() for kw in ['news', 'article', 'item', 'title', 'list']):\n                    if href.startswith('/') or 'nbd.com.cn' in href:\n                        is_nbd_url = True\n            \n            if is_nbd_url and title and len(title.strip()) > 5:\n                # 确保是完整URL\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.nbd.com.cn' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.nbd.com.cn/' + href.lstrip('/')\n                \n                # 过滤掉明显不是新闻的链接\n                if any(skip in href.lower() for skip in ['javascript:', 'mailto:', '#', 'void(0)', '/tag/', '/author/', '/user/', '/login']):\n                    continue\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title.strip()})\n        \n        logger.debug(f\"NBD: Found {len(news_links)} potential news links\")\n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                
content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            # 检查是否是503错误（服务器过载）\n            error_str = str(e)\n            if '503' in error_str or 'Service Temporarily Unavailable' in error_str:\n                logger.debug(f\"Skipping {url} due to 503 error (server overloaded, will retry later)\")\n                # 对于503错误，直接返回None，不记录为警告，因为这是临时性问题\n                return None\n            else:\n                logger.warning(f\"Failed to extract news from {url}: {e}\")\n                return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        # 每经网站可能的正文容器选择器（按优先级排序）\n        content_selectors = [\n            # 新版页面结构\n            {'class': 'article-body'},\n            {'class': 'article__body'},\n            {'class': 'article-text'},\n            {'class': 'content-article'},\n            {'class': 'main-content'},\n            # 旧版页面结构\n            {'class': 'g-article-content'},\n            {'class': 'article-content'},\n            {'class': 'content'},\n            {'id': 'contentText'},\n            {'id': 'article-content'},\n            # 通用选择器\n            {'itemprop': 'articleBody'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find(['div', 'article', 'section'], selector)\n            if content_div:\n                # 移除脚本、样式、广告等无关元素\n                for tag in content_div.find_all(['script', 'style', 'iframe', 'ins', 'noscript']):\n                    tag.decompose()\n                for ad in content_div.find_all(class_=re.compile(r'ad|advertisement|banner|recommend')):\n                    ad.decompose()\n                \n                # 提取所有段落，不限制数量\n                paragraphs = content_div.find_all('p')\n  
              if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content and len(content) > 50:\n                        return self._clean_text(content)\n                \n                # 如果没有 p 标签，直接取文本\n                text = content_div.get_text(separator='\\n', strip=True)\n                if text and len(text) > 50:\n                    return self._clean_text(text)\n        \n        # 后备方案：取所有段落（不限制数量）\n        paragraphs = soup.find_all('p')\n        if paragraphs:\n            # 过滤掉可能的导航、页脚等短段落\n            valid_paragraphs = [\n                p.get_text(strip=True) for p in paragraphs \n                if p.get_text(strip=True) and len(p.get_text(strip=True)) > 10\n            ]\n            content = '\\n'.join(valid_paragraphs)\n            if content:\n                return self._clean_text(content)\n        \n        return \"\"\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date|pub')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    
def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source|editor')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/netease163_crawler.py",
    "content": "\"\"\"\n网易财经爬虫工具\n目标URL: https://money.163.com/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass Netease163CrawlerTool(BaseCrawler):\n    \"\"\"\n    网易财经爬虫\n    主要爬取财经股市新闻\n    \"\"\"\n    \n    BASE_URL = \"https://money.163.com/\"\n    STOCK_URL = \"https://money.163.com/stock/\"\n    SOURCE_NAME = \"163\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"netease163_crawler\",\n            description=\"Crawl financial news from Netease Money (money.163.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取网易财经新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled 163, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling 163: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            # 尝试爬取股票栏目或主页\n            try:\n                response = self._fetch_page(self.STOCK_URL)\n            except Exception:\n                response = self._fetch_page(self.BASE_URL)\n            \n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n        
    max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 网易新闻URL模式\n            if ('money.163.com' in href or 'stock' in href) and title:\n                # 确保是完整URL\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://money.163.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://money.163.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                
 return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'post_text'},\n            {'id': 'endText'},\n            {'class': 'article-content'},\n            {'class': 'content'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('div', {'class': re.compile(r'post_time|time')})\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def 
_parse_time_string(self, time_str: str) -> datetime:\n        \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if not author_elem:\n                author_elem = soup.find('div', {'id': 'ne_article_source'})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/search_engine_crawler.py",
    "content": "\"\"\"\n搜索引擎爬虫工具\n直接爬取搜索引擎结果页面（Bing/Baidu）\n\"\"\"\nimport logging\nimport re\nimport requests\nfrom typing import List, Dict, Any, Optional\nfrom datetime import datetime, timedelta\nfrom urllib.parse import quote_plus\nfrom bs4 import BeautifulSoup\nimport time\n\nlogger = logging.getLogger(__name__)\n\n\nclass SearchEngineCrawler:\n    \"\"\"\n    搜索引擎爬虫\n    直接爬取 Bing/Baidu 搜索结果\n    \"\"\"\n    \n    def __init__(self):\n        \"\"\"初始化搜索引擎爬虫\"\"\"\n        self.headers = {\n            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',\n            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',\n            'Accept-Encoding': 'gzip, deflate',\n            'DNT': '1',\n            'Connection': 'keep-alive',\n            'Upgrade-Insecure-Requests': '1'\n        }\n        \n        # 搜索引擎结果页 URL 模板\n        self.search_engines = {\n            \"bing\": \"https://www.bing.com/search?q={query}\",\n            \"baidu\": \"https://www.baidu.com/s?wd={query}\",\n        }\n        \n        self.session = requests.Session()\n        self.session.headers.update(self.headers)\n        \n        logger.info(\"🔧 搜索引擎爬虫已初始化\")\n    \n    def _fetch_url(self, url: str, timeout: int = 10) -> Optional[str]:\n        \"\"\"\n        爬取URL内容\n        \n        Args:\n            url: 目标URL\n            timeout: 超时时间\n            \n        Returns:\n            HTML内容\n        \"\"\"\n        try:\n            response = self.session.get(url, timeout=timeout)\n            response.raise_for_status()\n            \n            # 尝试检测编码\n            if response.encoding == 'ISO-8859-1':\n                # 对于中文网站，尝试使用 gb2312 或 utf-8\n                encodings = ['utf-8', 'gb2312', 'gbk']\n                for enc in encodings:\n                    try:\n                        response.encoding = enc\n                        _ = response.text\n                        break\n                    except Exception:\n                        continue\n            \n            return response.text\n        
     \n        except Exception as e:\n            logger.error(f\"❌ 爬取失败 {url}: {e}\")\n            return None\n    \n    def search_with_engine(\n        self,\n        query: str,\n        engine: str = \"bing\",\n        days: int = 30,\n        max_results: int = 50\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        使用搜索引擎搜索新闻\n        \n        Args:\n            query: 搜索关键词\n            engine: 搜索引擎 (bing/baidu)\n            days: 时间范围（天）\n            max_results: 最大结果数\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        if engine not in self.search_engines:\n            logger.error(f\"❌ 不支持的搜索引擎: {engine}\")\n            return []\n        \n        # 构建搜索URL\n        search_query = self._build_search_query(query, days)\n        search_url = self.search_engines[engine].format(query=quote_plus(search_query))\n        \n        logger.info(f\"🔍 搜索引擎爬取: {engine} - {search_query}\")\n        logger.info(f\"    URL: {search_url}\")\n        \n        # 爬取搜索结果页面\n        html = self._fetch_url(search_url)\n        if not html:\n            logger.warning(f\"⚠️ 搜索引擎爬取失败: {search_url}\")\n            return []\n        \n        # 解析搜索结果\n        news_items = self._parse_search_results(\n            content=html,\n            engine=engine,\n            max_results=max_results\n        )\n        \n        logger.info(f\"✅ 从 {engine} 提取到 {len(news_items)} 条结果\")\n        return news_items\n    \n    def _build_search_query(self, query: str, days: int) -> str:\n        \"\"\"\n        构建搜索查询字符串（添加时间限制）\n        \n        Args:\n            query: 原始查询\n            days: 时间范围\n            \n        Returns:\n            增强的搜索查询\n        \"\"\"\n        # 添加时间范围（对于 Bing 和 Baidu）\n        # Bing: 支持 \"query site:xxx.com\"\n        # 可以添加新闻源限制\n    
     \n        # 可选：限制到新闻网站\n        news_sites = [\n            \"sina.com.cn\",\n            \"163.com\",\n            \"eastmoney.com\",\n            \"cnstock.com\",\n            \"stcn.com\",\n            \"caijing.com.cn\",\n            \"yicai.com\",\n        ]\n        \n        # 构建基础查询\n        enhanced_query = f\"{query} 新闻\"\n        \n        # 添加时间提示词\n        if days <= 7:\n            enhanced_query += \" 最近一周\"\n        elif days <= 30:\n            enhanced_query += \" 最近一个月\"\n        \n        return enhanced_query\n    \n    def _parse_search_results(\n        self,\n        content: str,\n        engine: str,\n        max_results: int\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        解析搜索引擎结果页，提取新闻链接和标题\n        \n        Args:\n            content: 爬取的结果页 HTML\n            engine: 搜索引擎类型\n            max_results: 最大结果数\n            \n        Returns:\n            新闻条目列表\n        \"\"\"\n        news_items = []\n        \n        # 从结果页 HTML 中提取链接\n        soup = BeautifulSoup(content, 'html.parser')\n        for link in soup.find_all('a', href=True):\n            if len(news_items) >= max_results:\n                break\n            \n            title = link.get_text(strip=True)\n            url = link['href'].strip()\n            if not title or not url.startswith('http'):\n                continue\n            \n            # 过滤掉搜索引擎自身的链接\n            if engine in url.lower():\n                continue\n            \n            # 过滤掉非新闻链接\n            if not self._is_news_url(url):\n                continue\n            \n            news_items.append({\n                \"title\": title,\n                \"url\": url,\n                \"snippet\": \"\",  # 暂时为空，后续可以从 content 中提取\n                \"source\": self._extract_source_from_url(url),\n                \"engine\": engine\n            })\n        \n        return news_items\n    \n    def _is_news_url(self, url: str) -> bool:\n        \"\"\"判断是否为新闻URL\"\"\"\n        news_domains = [\n            \"sina.com\", \"163.com\", \"eastmoney.com\", \"cnstock.com\",\n            \"stcn.com\", \"caijing.com\", \"yicai.com\", 
\"nbd.com\",\n            \"jwview.com\", \"eeo.com.cn\", \"finance.qq.com\"\n        ]\n        return any(domain in url.lower() for domain in news_domains)\n    \n    def _extract_source_from_url(self, url: str) -> str:\n        \"\"\"从URL提取来源\"\"\"\n        domain_mapping = {\n            \"sina.com\": \"新浪财经\",\n            \"163.com\": \"网易财经\",\n            \"eastmoney.com\": \"东方财富\",\n            \"cnstock.com\": \"中国证券网\",\n            \"stcn.com\": \"证券时报\",\n            \"caijing.com\": \"财经网\",\n            \"yicai.com\": \"第一财经\",\n            \"nbd.com\": \"每日经济新闻\",\n            \"jwview.com\": \"中新经纬\",\n            \"eeo.com.cn\": \"经济观察网\",\n            \"qq.com\": \"腾讯财经\",\n        }\n        \n        for domain, source in domain_mapping.items():\n            if domain in url.lower():\n                return source\n        \n        return \"未知来源\"\n    \n    def search_stock_news(\n        self,\n        stock_name: str,\n        stock_code: str,\n        days: int = 30,\n        engines: Optional[List[str]] = None,\n        max_per_engine: int = 30\n    ) -> List[Dict[str, Any]]:\n        \"\"\"\n        搜索股票新闻（多搜索引擎）\n        \n        Args:\n            stock_name: 股票名称\n            stock_code: 股票代码\n            days: 时间范围\n            engines: 搜索引擎列表，默认 [\"bing\"]\n            max_per_engine: 每个搜索引擎最大结果数\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        if engines is None:\n            engines = [\"bing\"]  # 默认只用 Bing（Baidu 可能需要处理反爬）\n        \n        all_news = []\n        \n        # 构建搜索关键词\n        queries = [\n            stock_name,\n            f\"{stock_name} {stock_code}\",\n            f\"{stock_name} 公告\",\n        ]\n        \n        for engine in engines:\n            for query in queries:\n                try:\n                    news = self.search_with_engine(\n                        query=query,\n                        engine=engine,\n                        days=days,\n                        
max_results=max_per_engine\n                    )\n                    all_news.extend(news)\n                except Exception as e:\n                    logger.error(f\"❌ 搜索失败 [{engine}] {query}: {e}\")\n        \n        # 去重（按URL）\n        seen_urls = set()\n        unique_news = []\n        for news in all_news:\n            url = news.get(\"url\")\n            if url and url not in seen_urls:\n                seen_urls.add(url)\n                unique_news.append(news)\n        \n        logger.info(f\"✅ 多引擎搜索完成: 总计 {len(unique_news)} 条（去重后）\")\n        return unique_news\n\n\n# 便捷函数\ndef create_search_engine_crawler(mcp_server_path: Optional[str] = None) -> SearchEngineCrawler:\n    \"\"\"创建搜索引擎爬虫实例\"\"\"\n    return SearchEngineCrawler(mcp_server_path)\n\n\n# 测试代码\nif __name__ == \"__main__\":\n    logging.basicConfig(level=logging.INFO)\n    \n    crawler = create_search_engine_crawler()\n    \n    # 测试搜索\n    results = crawler.search_stock_news(\n        stock_name=\"深振业A\",\n        stock_code=\"000006\",\n        days=7,\n        engines=[\"bing\"],\n        max_per_engine=10\n    )\n    \n    print(f\"\\n✅ 搜索到 {len(results)} 条新闻:\")\n    for i, news in enumerate(results[:5], 1):\n        print(f\"{i}. {news['title']}\")\n        print(f\"   来源: {news['source']}\")\n        print(f\"   URL: {news['url']}\")\n\n"
  },
  {
    "path": "backend/app/tools/sina_crawler.py",
    "content": "\"\"\"\n新浪财经爬虫工具\n重构自 legacy_v1/Crawler/crawler_sina.py\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass SinaCrawlerTool(BaseCrawler):\n    \"\"\"\n    新浪财经新闻爬虫\n    爬取最新滚动新闻页面\n    \"\"\"\n    \n    # 新浪财经最新滚动新闻页面（2024年后的新URL）\n    BASE_URL = \"https://finance.sina.com.cn/roll/c/56592.shtml\"  # 暂不支持翻页，只爬首页\n    SOURCE_NAME = \"sina\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"sina_finance_crawler\",\n            description=\"Crawl financial news from Sina Finance (sina.com.cn)\"\n        )\n        self.min_chinese_ratio = 0.5  # 最小中文比例阈值\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取新浪财经新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        for page in range(start_page, end_page + 1):\n            try:\n                page_news = self._crawl_page(page)\n                news_list.extend(page_news)\n                logger.info(f\"Crawled page {page}, got {len(page_news)} news items\")\n            except Exception as e:\n                logger.error(f\"Failed to crawl page {page}: {e}\")\n                continue\n        \n        return news_list\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"\n        爬取单页新闻列表\n        \n        Args:\n            page: 页码（目前只支持首页，忽略此参数）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        url = self.BASE_URL  # 新URL不支持翻页，只爬首页\n        logger.info(f\"Fetching page: {url}\")\n        response = self._fetch_page(url)\n        \n        # 设置正确的编码\n        response.encoding = 'utf-8'\n        soup = self._parse_html(response.text)\n        \n  
      # 查找新闻链接（改进选择器，更精确地找到新闻链接）\n        news_links = []\n        for link in soup.find_all('a', href=True):\n            href = link.get('href', '')\n            # 匹配新浪财经股票相关新闻URL\n            if 'finance.sina.com.cn' in href and ('/stock/' in href or '/roll/' in href):\n                # 确保是完整的URL\n                if href.startswith('http'):\n                    news_links.append(href)\n                elif href.startswith('//'):\n                    news_links.append('http:' + href)\n        \n        # 去重\n        news_links = list(set(news_links))\n        logger.info(f\"Found {len(news_links)} news links on page {page}\")\n        \n        # 爬取每条新闻详情（限制每页最多50条，避免超时）\n        news_list = []\n        max_news_per_page = 50 if page == 1 else 30  # 第一页爬取更多，其他页少一些\n        for idx, news_url in enumerate(news_links[:max_news_per_page], 1):\n            try:\n                logger.debug(f\"Crawling news {idx}/{min(len(news_links), max_news_per_page)}: {news_url}\")\n                news_item = self._crawl_news_detail(news_url)\n                if news_item:\n                    news_list.append(news_item)\n                    logger.debug(f\"Successfully crawled: {news_item.title[:50]}\")\n            except Exception as e:\n                logger.warning(f\"Failed to crawl news detail {news_url}: {e}\")\n                continue\n        \n        logger.info(f\"Successfully crawled {len(news_list)} news items from page {page}\")\n        return news_list\n    \n    def _crawl_news_detail(self, url: str) -> Optional[NewsItem]:\n        \"\"\"\n        爬取新闻详情页\n        \n        Args:\n            url: 新闻URL\n            \n        Returns:\n            新闻项或None\n        \"\"\"\n        try:\n            response = self._fetch_page(url)\n            response.encoding = BeautifulSoup(response.content, \"lxml\").original_encoding\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取标题\n      
      title = self._extract_title(soup)\n            if not title:\n                return None\n            \n            # 提取摘要和关键词\n            summary, keywords = self._extract_meta(soup)\n            \n            # 提取发布时间\n            publish_time = self._extract_date(soup)\n            \n            # 提取关联股票代码\n            stock_codes = self._extract_stock_codes(soup)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content or len(content) < 50:\n                return None\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                summary=summary,\n                keywords=keywords,\n                stock_codes=stock_codes,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.error(f\"Error crawling {url}: {e}\")\n            return None\n    \n    def _extract_title(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取标题\"\"\"\n        # 尝试多个可能的标题位置\n        title_tag = soup.find('h1', class_='main-title')\n        if not title_tag:\n            title_tag = soup.find('h1')\n        if not title_tag:\n            title_tag = soup.find('title')\n        \n        if title_tag:\n            title = title_tag.get_text().strip()\n            # 移除来源信息\n            title = re.sub(r'[-_].*?(新浪|财经|网)', '', title)\n            return title.strip()\n        return None\n    \n    def _extract_meta(self, soup: BeautifulSoup) -> tuple:\n        \"\"\"提取元数据（摘要和关键词）\"\"\"\n        summary = \"\"\n        keywords = []\n        \n        for meta in soup.find_all('meta'):\n            name = meta.get('name', '').lower()\n            content = meta.get('content', '')\n            \n            if name == 'description':\n                summary = content\n     
       elif name == 'keywords':\n                keywords = [kw.strip() for kw in content.split(',') if kw.strip()]\n        \n        return summary, keywords\n    \n    def _extract_date(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        # 查找时间标签\n        for span in soup.find_all('span'):\n            # 检查 class 属性\n            class_attr = span.get('class', [])\n            if 'date' in class_attr or 'time-source' in class_attr:\n                date_text = span.get_text()\n                return self._parse_date(date_text)\n            \n            # 检查 id 属性\n            if span.get('id') == 'pub_date':\n                date_text = span.get_text()\n                return self._parse_date(date_text)\n        \n        return None\n    \n    def _parse_date(self, date_text: str) -> Optional[datetime]:\n        \"\"\"解析日期字符串\"\"\"\n        try:\n            # 格式：2024年12月01日 10:30\n            date_text = date_text.strip()\n            date_text = date_text.replace('年', '-').replace('月', '-').replace('日', '')\n            \n            # 尝试多种格式\n            for fmt in [\n                '%Y-%m-%d %H:%M',\n                '%Y-%m-%d %H:%M:%S',\n                '%Y-%m-%d',\n            ]:\n                try:\n                    return datetime.strptime(date_text.strip(), fmt)\n                except ValueError:\n                    continue\n        except Exception:\n            pass\n        \n        return None\n    \n    def _extract_stock_codes(self, soup: BeautifulSoup) -> List[str]:\n        \"\"\"提取关联股票代码\"\"\"\n        stock_codes = []\n        \n        for span in soup.find_all('span'):\n            span_id = span.get('id', '')\n            if span_id.startswith('stock_'):\n                # 格式：stock_sh600519\n                code = span_id[6:]  # 移除 'stock_' 前缀\n                if code:\n                    stock_codes.append(code.upper())\n        \n        return list(set(stock_codes))\n    \n    def 
_extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取正文内容\"\"\"\n        # 尝试使用更精确的选择器\n        content_selectors = [\n            {'id': 'artibody'},\n            {'class': 'article-content'},\n            {'class': 'article'},\n            {'id': 'article'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find(['div', 'article'], selector)\n            if content_div:\n                # 1. 移除明确的噪音元素\n                for tag in content_div.find_all(['script', 'style', 'iframe', 'ins', 'select', 'input', 'button', 'form']):\n                    tag.decompose()\n                \n                # 2. 移除特定的广告和推荐块\n                for ad in content_div.find_all(class_=re.compile(r'ad|banner|share|otherContent|recommend|app-guide', re.I)):\n                    ad.decompose()\n\n                # 3. 获取所有文本，使用换行符分隔\n                # 关键修改：使用 get_text 而不是 find_all('p')，确保不漏掉裸露的文本节点\n                full_text = content_div.get_text(separator='\\n', strip=True)\n                \n                # 4. 按行分割并清洗\n                lines = full_text.split('\\n')\n                article_parts = []\n                \n                for line in lines:\n                    line = line.strip()\n                    if not line:\n                        continue\n                        \n                    # 5. 过滤和清洗行\n                    # 检查中文比例\n                    chinese_ratio = self._extract_chinese_ratio(line)\n                    \n                    # 宽松的保留策略：\n                    # - 忽略极短的非中文行（可能是页码、特殊符号）\n                    if len(line) < 2:\n                        continue\n                        \n                    # 保留条件：\n                    # 1. 包含一定比例中文（>5%）\n                    # 2. 
或者长文本（>20字符），可能是纯数据或英文段落\n                    if chinese_ratio > 0.05 or len(line) > 20:\n                        clean_line = self._clean_text(line)\n                        if clean_line and not self._is_noise_text(clean_line):\n                            article_parts.append(clean_line)\n                \n                if article_parts:\n                    return '\\n'.join(article_parts)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _is_noise_text(self, text: str) -> bool:\n        \"\"\"判断是否为噪音文本（广告、版权等）\"\"\"\n        noise_patterns = [\n            r'^责任编辑',\n            r'^编辑[:：]',\n            r'^来源[:：]',\n            r'^声明[:：]',\n            r'^免责声明',\n            r'^版权',\n            r'^copyright',\n            r'^点击进入',\n            r'^相关阅读',\n            r'^延伸阅读',\n            r'^\\s*$',\n            r'登录新浪财经APP',\n            r'搜索【信披】',\n            r'缩小字体',\n            r'放大字体',\n            r'收藏',\n            r'微博',\n            r'微信',\n            r'分享',\n            r'腾讯QQ',\n        ]\n        text_lower = text.lower().strip()\n        for pattern in noise_patterns:\n            # re.search 已涵盖 re.match 的匹配范围，无需重复检查\n            if re.search(pattern, text_lower, re.I):\n                return True\n        return False\n\n\n# 便捷创建函数\ndef create_sina_crawler() -> SinaCrawlerTool:\n    \"\"\"创建新浪财经爬虫实例\"\"\"\n    return SinaCrawlerTool()\n\n"
  },
  {
    "path": "backend/app/tools/tencent_crawler.py",
    "content": "\"\"\"\n腾讯财经爬虫工具\n目标URL: https://news.qq.com/ch/finance/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime, timedelta\nfrom bs4 import BeautifulSoup\nimport json\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass TencentCrawlerTool(BaseCrawler):\n    \"\"\"\n    腾讯财经新闻爬虫\n    爬取腾讯财经频道最新新闻\n    \"\"\"\n    \n    BASE_URL = \"https://news.qq.com/ch/finance_stock/\"\n    # 腾讯新闻API（如果页面动态加载，可能需要调用API）\n    API_URL = \"https://pacaio.match.qq.com/irs/rcd\"\n    SOURCE_NAME = \"tencent\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"tencent_finance_crawler\",\n            description=\"Crawl financial news from Tencent Finance (news.qq.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取腾讯财经新闻\n        \n        Args:\n            start_page: 起始页码\n            end_page: 结束页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            # 腾讯财经页面只爬取首页\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Tencent Finance, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Tencent Finance: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"\n        爬取单页新闻\n        \n        优先使用API获取新闻，如果API失败则回退到HTML解析\n        \n        Args:\n            page: 页码\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_items = []\n        \n        # 先尝试使用API获取新闻\n        try:\n            logger.info(f\"[Tencent] Attempting API fetch for page {page}\")\n            api_news = 
self._fetch_api_news(page)\n            logger.info(f\"[Tencent] API returned {len(api_news) if api_news else 0} news items\")\n            if api_news:\n                logger.info(f\"Fetched {len(api_news)} news from API\")\n                for news_data in api_news[:20]:  # 限制20条\n                    try:\n                        news_item = self._parse_api_news_item(news_data)\n                        if news_item:\n                            news_items.append(news_item)\n                    except Exception as e:\n                        logger.warning(f\"Failed to parse API news item: {e}\")\n                        continue\n                if news_items:\n                    logger.info(f\"[Tencent] Successfully parsed {len(news_items)} news items from API\")\n                    return news_items\n            else:\n                logger.info(f\"[Tencent] API returned empty list, falling back to HTML\")\n        except Exception as e:\n            logger.warning(f\"API fetch failed, fallback to HTML: {e}\")\n        \n        # API失败，回退到HTML解析\n        try:\n            response = self._fetch_page(self.BASE_URL)\n            # 腾讯新闻可能使用动态加载，确保编码正确\n            if response.encoding == 'ISO-8859-1' or not response.encoding:\n                response.encoding = 'utf-8'\n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            # 腾讯的新闻可能在各种容器中，尝试提取所有新闻链接\n            news_links = self._extract_news_links(soup)\n            \n            logger.info(f\"Found {len(news_links)} potential news links from HTML\")\n            \n            # 限制爬取数量，避免过多请求\n            max_news = 20\n            for i, link_info in enumerate(news_links[:max_news]):\n                try:\n                    news_item = self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news 
item {i+1}: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page {page}: {e}\")\n        \n        return news_items\n    \n    def _fetch_api_news(self, page: int = 0) -> List[dict]:\n        \"\"\"\n        通过API获取新闻列表\n        \n        Args:\n            page: 页码（从0开始）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        try:\n            # 腾讯新闻API参数（根据实际API文档调整）\n            params = {\n                \"cid\": \"finance_stock\",  # 股票频道\n                \"page\": page,\n                \"num\": 20,  # 每页20条\n                \"ext\": \"finance_stock\",  # 扩展参数\n            }\n            \n            headers = {\n                \"User-Agent\": self.user_agent,\n                \"Referer\": self.BASE_URL,\n                \"Accept\": \"application/json, text/javascript, */*; q=0.01\",\n            }\n            \n            logger.info(f\"[Tencent] Calling API: {self.API_URL} with params: {params}\")\n            response = self.session.get(\n                self.API_URL,\n                params=params,\n                headers=headers,\n                timeout=self.timeout\n            )\n            logger.info(f\"[Tencent] API response status: {response.status_code}\")\n            response.raise_for_status()\n            \n            # 解析JSON响应（可能是JSONP格式）\n            content = response.text.strip()\n            logger.info(f\"[Tencent] API response preview (first 500 chars): {content[:500]}\")\n            \n            # 尝试解析JSONP格式\n            if content.startswith('callback(') or content.startswith('jQuery'):\n                # 提取JSON部分\n                import re\n                json_match = re.search(r'\\((.*)\\)$', content)\n                if json_match:\n                    content = json_match.group(1)\n            \n            data = json.loads(content)\n            logger.info(f\"[Tencent] Parsed API response type: {type(data)}, keys: 
{list(data.keys()) if isinstance(data, dict) else 'N/A'}\")\n            \n            if isinstance(data, dict):\n                if 'data' in data:\n                    logger.info(f\"[Tencent] Found 'data' key with {len(data['data']) if isinstance(data['data'], list) else 'non-list'} items\")\n                    return data['data']\n                elif 'list' in data:\n                    logger.info(f\"[Tencent] Found 'list' key with {len(data['list']) if isinstance(data['list'], list) else 'non-list'} items\")\n                    return data['list']\n                elif 'result' in data:\n                    logger.info(f\"[Tencent] Found 'result' key with {len(data['result']) if isinstance(data['result'], list) else 'non-list'} items\")\n                    return data['result']\n                else:\n                    logger.warning(f\"[Tencent] Unexpected API response format, keys: {list(data.keys())}\")\n            elif isinstance(data, list):\n                logger.info(f\"[Tencent] API returned list with {len(data)} items\")\n                return data\n            \n            logger.warning(f\"Unexpected API response format: {type(data)}\")\n            return []\n            \n        except json.JSONDecodeError as e:\n            logger.warning(f\"API JSON decode failed: {e}, response preview: {response.text[:200] if 'response' in locals() else 'N/A'}\")\n            return []\n        except Exception as e:\n            logger.warning(f\"API fetch failed: {e}\")\n            return []\n    \n    def _parse_api_news_item(self, news_data: dict) -> Optional[NewsItem]:\n        \"\"\"\n        解析API返回的新闻数据\n        \n        Args:\n            news_data: API返回的单条新闻数据\n            \n        Returns:\n            NewsItem对象\n        \"\"\"\n        try:\n            # 提取基本信息\n            title = news_data.get('title', '').strip()\n            url = news_data.get('url', '') or news_data.get('surl', '')\n            \n            # 确保URL是完整的\n    
        if url and not url.startswith('http'):\n                if url.startswith('//'):\n                    url = 'https:' + url\n                elif url.startswith('/'):\n                    url = 'https://news.qq.com' + url\n                else:\n                    url = 'https://news.qq.com/' + url.lstrip('/')\n            \n            if not title or not url:\n                return None\n            \n            # 提取发布时间\n            publish_time_str = news_data.get('time', '') or news_data.get('publish_time', '')\n            publish_time = self._parse_time_string(publish_time_str) if publish_time_str else datetime.now()\n            \n            # 提取摘要作为内容（API通常不返回完整内容）\n            content = news_data.get('abstract', '') or news_data.get('intro', '') or title\n            \n            # 提取作者\n            author = news_data.get('author', '') or news_data.get('source', '')\n            \n            # 尝试获取完整内容\n            try:\n                response = self._fetch_page(url)\n                if response.encoding == 'ISO-8859-1' or not response.encoding:\n                    response.encoding = 'utf-8'\n                raw_html = response.text\n                soup = self._parse_html(raw_html)\n                full_content = self._extract_content(soup)\n                if full_content and len(full_content) > len(content):\n                    content = full_content\n            except Exception as e:\n                logger.debug(f\"Failed to fetch full content from {url}: {e}\")\n                raw_html = None\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author if author else None,\n                raw_html=raw_html,\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to parse API news item: {e}\")\n       
     return None\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"\n        从页面中提取新闻链接\n        \n        Args:\n            soup: BeautifulSoup对象\n            \n        Returns:\n            新闻链接信息列表\n        \"\"\"\n        news_links = []\n        \n        # 查找所有链接\n        all_links = soup.find_all('a', href=True)\n        \n        # 腾讯新闻URL模式（扩展更多模式）\n        tencent_patterns = [\n            '/rain/a/',           # 旧模式\n            '/omn/',              # 旧模式\n            '/a/',                # 新模式\n            '/finance/',          # 财经频道\n            'finance.qq.com',     # 财经域名\n            '/stock/',            # 股票相关\n        ]\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 检查是否匹配腾讯新闻URL模式\n            is_tencent_url = False\n            for pattern in tencent_patterns:\n                if pattern in href:\n                    is_tencent_url = True\n                    break\n            \n            # 或者检查是否是qq.com域名且包含新闻相关关键词\n            if not is_tencent_url:\n                if 'qq.com' in href and any(kw in href for kw in ['/a/', '/article/', '/news/', '/finance/']):\n                    is_tencent_url = True\n            \n            if is_tencent_url and title and len(title.strip()) > 5:\n                # 确保是完整URL\n                if not href.startswith('http'):\n                    if href.startswith('//'):\n                        href = 'https:' + href\n                    elif href.startswith('/'):\n                        href = 'https://news.qq.com' + href\n                    else:\n                        href = 'https://news.qq.com/' + href.lstrip('/')\n                \n                # 过滤掉明显不是新闻的链接\n                if any(skip in href.lower() for skip in ['javascript:', 'mailto:', '#', 'void(0)']):\n                    continue\n                \n                if href not in 
[n['url'] for n in news_links]:\n                    news_links.append({\n                        'url': href,\n                        'title': title.strip()\n                    })\n        \n        logger.debug(f\"Tencent: Found {len(news_links)} potential news links\")\n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"\n        提取单条新闻详情\n        \n        Args:\n            link_info: 新闻链接信息\n            \n        Returns:\n            NewsItem或None\n        \"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            # 获取新闻详情页\n            response = self._fetch_page(url)\n            # 确保编码正确\n            if response.encoding == 'ISO-8859-1' or not response.encoding:\n                response.encoding = 'utf-8'\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文内容\n            content = self._extract_content(soup)\n            if not content:\n                logger.debug(f\"No content found for: {title}\")\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n            # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"\n        提取新闻正文\n        \n        Args:\n            soup: BeautifulSoup对象\n            \n        Returns:\n        
    新闻正文\n        \"\"\"\n        # 尝试多种选择器\n        content_selectors = [\n            {'class': 'content-article'},\n            {'class': 'LEFT'},\n            {'id': 'Cnt-Main-Article-QQ'},\n            {'class': 'article'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                # 获取所有段落\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"\n        提取发布时间\n        \n        Args:\n            soup: BeautifulSoup对象\n            \n        Returns:\n            发布时间\n        \"\"\"\n        try:\n            # 尝试多种时间选择器\n            time_selectors = [\n                {'class': 'a-time'},\n                {'class': 'article-time'},\n                {'class': 'time'},\n            ]\n            \n            for selector in time_selectors:\n                time_elem = soup.find('span', selector)\n                if time_elem:\n                    time_str = time_elem.get_text(strip=True)\n                    return self._parse_time_string(time_str)\n            \n            # 尝试从meta标签获取\n            meta_time = soup.find('meta', {'property': 'article:published_time'})\n            if meta_time and meta_time.get('content'):\n                return datetime.fromisoformat(meta_time['content'].replace('Z', '+00:00'))\n            \n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        # 默认返回当前时间\n        return datetime.now()\n    \n    def _parse_time_string(self, 
time_str: str) -> datetime:\n        \"\"\"\n        解析时间字符串（如\"1小时前\"、\"昨天\"、\"2024-12-06 10:00\"）\n        \n        Args:\n            time_str: 时间字符串\n            \n        Returns:\n            datetime对象\n        \"\"\"\n        now = datetime.now()\n        \n        # 处理相对时间\n        if '分钟前' in time_str:\n            minutes = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(minutes=minutes)\n        elif '小时前' in time_str:\n            hours = int(re.search(r'(\\d+)', time_str).group(1))\n            return now - timedelta(hours=hours)\n        elif '昨天' in time_str:\n            return now - timedelta(days=1)\n        elif '前天' in time_str:\n            return now - timedelta(days=2)\n        \n        # 尝试解析绝对时间\n        try:\n            # 尝试多种格式\n            formats = [\n                '%Y-%m-%d %H:%M:%S',\n                '%Y-%m-%d %H:%M',\n                '%Y-%m-%d',\n                '%Y年%m月%d日 %H:%M',\n                '%Y年%m月%d日',\n            ]\n            for fmt in formats:\n                try:\n                    return datetime.strptime(time_str, fmt)\n                except ValueError:\n                    continue\n        except Exception:\n            pass\n        \n        # 默认返回当前时间\n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"\n        提取作者\n        \n        Args:\n            soup: BeautifulSoup对象\n            \n        Returns:\n            作者名称\n        \"\"\"\n        try:\n            # 尝试多种作者选择器\n            author_selectors = [\n                {'class': 'author'},\n                {'class': 'article-author'},\n                {'class': 'source'},\n            ]\n            \n            for selector in author_selectors:\n                author_elem = soup.find('span', selector) or soup.find('a', selector)\n                if author_elem:\n                    author = author_elem.get_text(strip=True)\n                    if 
author:\n                        return author\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/app/tools/text_cleaner.py",
    "content": "\"\"\"\n文本清洗工具\n重构自 legacy_v1/src/Killua/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Set\nimport jieba\n\nfrom agenticx import BaseTool\nfrom agenticx.core import ToolMetadata, ToolCategory\n\nlogger = logging.getLogger(__name__)\n\n\nclass TextCleanerTool(BaseTool):\n    \"\"\"\n    文本清洗工具\n    提供去停用词、分词、文本标准化等功能\n    \"\"\"\n    \n    # 中文停用词列表（简化版）\n    STOP_WORDS = {\n        '的', '了', '在', '是', '我', '有', '和', '就', '不', '人', '都', '一',\n        '一个', '上', '也', '很', '到', '说', '要', '去', '你', '会', '着', '没有',\n        '看', '好', '自己', '这', '那', '里', '就是', '什么', '可以', '为', '以',\n        '及', '等', '将', '并', '个', '与', '对', '如', '所', '于', '被', '由',\n        '从', '而', '把', '让', '向', '却', '但', '或', '及', '但是', '然而',\n        '因为', '所以', '如果', '虽然', '尽管', '无论', '不管', '只要', '除非',\n        '、', '，', '。', '；', '：', '？', '！', '“', '”', '‘', '’', '（', '）',\n        '【', '】', '《', '》', '—', '…', '·', '~', '#', '@', '&',\n    }\n    \n    def __init__(self):\n        metadata = ToolMetadata(\n            name=\"text_cleaner\",\n            description=\"Clean and preprocess Chinese financial text\",\n            category=ToolCategory.UTILITY,\n            version=\"1.0.0\"\n        )\n        super().__init__(metadata=metadata)\n        # 初始化jieba\n        jieba.setLogLevel(logging.WARNING)\n        \n        # 加载金融领域自定义词典（可选）\n        self._load_custom_dict()\n    \n    def _load_custom_dict(self):\n        \"\"\"加载自定义词典\"\"\"\n        # 金融领域常用词\n        financial_words = [\n            '股票', '证券', '基金', '债券', '期货', '期权', '外汇',\n            '上证指数', '深证成指', '创业板', '科创板',\n            '涨停', '跌停', '停牌', '复牌', '退市', '上市',\n            '市盈率', '市净率', '市值', '流通股', '限售股',\n            '分红', '配股', '增发', '回购', '重组', '并购',\n            '利好', '利空', '看多', '看空', '做多', '做空',\n            '成交量', '换手率', '振幅', '量比',\n        ]\n        \n        for word in financial_words:\n            jieba.add_word(word)\n    \n    def clean_text(self, text: str) -> str:\n     
   \"\"\"\n        基础文本清洗\n        \n        Args:\n            text: 原始文本\n            \n        Returns:\n            清洗后的文本\n        \"\"\"\n        if not text:\n            return \"\"\n        \n        # 移除URL\n        text = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\\\(\\\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', '', text)\n        \n        # 移除邮箱\n        text = re.sub(r'[\\w\\.-]+@[\\w\\.-]+\\.\\w+', '', text)\n        \n        # 移除特殊字符（保留中文、英文、数字）\n        text = re.sub(r'[^\\u4e00-\\u9fa5a-zA-Z0-9\\s\\.\\,\\!\\?\\:\\;\\-\\%\\(\\)]', '', text)\n        \n        # 统一空格\n        text = re.sub(r'\\s+', ' ', text)\n        \n        return text.strip()\n    \n    def tokenize(self, text: str, remove_stopwords: bool = True) -> List[str]:\n        \"\"\"\n        中文分词\n        \n        Args:\n            text: 文本\n            remove_stopwords: 是否去除停用词\n            \n        Returns:\n            词语列表\n        \"\"\"\n        # 分词\n        words = jieba.cut(text)\n        \n        # 过滤\n        result = []\n        for word in words:\n            word = word.strip()\n            if not word:\n                continue\n            \n            # 去除停用词\n            if remove_stopwords and word in self.STOP_WORDS:\n                continue\n            \n            # 去除单个非中文字符（保留\"涨\"、\"跌\"等单个汉字）\n            if len(word) == 1 and not re.match(r'[\\u4e00-\\u9fa5]', word):\n                continue\n            \n            result.append(word)\n        \n        return result\n    \n    def extract_keywords(self, text: str, top_k: int = 10) -> List[str]:\n        \"\"\"\n        提取关键词\n        \n        Args:\n            text: 文本\n            top_k: 返回的关键词数量\n            \n        Returns:\n            关键词列表\n        \"\"\"\n        import jieba.analyse\n        \n        keywords = jieba.analyse.extract_tags(\n            text,\n            topK=top_k,\n            withWeight=False\n        )\n        return keywords\n    \n    def 
normalize_stock_code(self, code: str) -> str:\n        \"\"\"\n        标准化股票代码\n        \n        Args:\n            code: 原始代码（如 sh600519, 600519, SH600519）\n            \n        Returns:\n            标准化代码（如 600519）\n        \"\"\"\n        code = code.upper().strip()\n        # 移除市场前缀\n        code = re.sub(r'^(SH|SZ|HK)', '', code)\n        return code\n    \n    def _setup_parameters(self):\n        \"\"\"设置工具参数（AgenticX 要求）\"\"\"\n        # TextCleanerTool 的参数通过 execute 方法的 kwargs 传递\n        pass\n    \n    def execute(self, **kwargs) -> dict:\n        \"\"\"\n        同步执行方法（AgenticX Tool 协议要求）\n        \n        Args:\n            **kwargs: 参数字典\n                - text: 输入文本（必需）\n                - operation: 操作类型（clean, tokenize, keywords），默认 \"clean\"\n                - remove_stopwords: 是否去除停用词（仅用于 tokenize），默认 True\n                - top_k: 关键词数量（仅用于 keywords），默认 10\n                \n        Returns:\n            执行结果\n        \"\"\"\n        text = kwargs.get(\"text\", \"\")\n        if not text:\n            return {\"success\": False, \"error\": \"Missing required parameter: text\"}\n        \n        operation = kwargs.get(\"operation\", \"clean\")\n        \n        if operation == \"clean\":\n            result = self.clean_text(text)\n            return {\"success\": True, \"result\": result}\n        \n        elif operation == \"tokenize\":\n            remove_stopwords = kwargs.get(\"remove_stopwords\", True)\n            result = self.tokenize(text, remove_stopwords)\n            return {\"success\": True, \"result\": result, \"count\": len(result)}\n        \n        elif operation == \"keywords\":\n            top_k = kwargs.get(\"top_k\", 10)\n            result = self.extract_keywords(text, top_k)\n            return {\"success\": True, \"result\": result}\n        \n        else:\n            return {\"success\": False, \"error\": f\"Unknown operation: {operation}\"}\n    \n    async def aexecute(self, **kwargs) -> dict:\n        
\"\"\"\n        异步执行方法（AgenticX Tool 协议要求）\n        当前实现为同步执行的包装\n        \n        Args:\n            **kwargs: 参数字典\n                \n        Returns:\n            执行结果\n        \"\"\"\n        return self.execute(**kwargs)\n\n\n# 便捷创建函数\ndef create_text_cleaner() -> TextCleanerTool:\n    \"\"\"创建文本清洗工具实例\"\"\"\n    return TextCleanerTool()\n\n"
  },
  {
    "path": "backend/app/tools/yicai_crawler.py",
    "content": "\"\"\"\n第一财经爬虫工具\n目标URL: https://www.yicai.com/news/gushi/\n\"\"\"\nimport re\nimport logging\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom bs4 import BeautifulSoup\n\nfrom .crawler_base import BaseCrawler, NewsItem\n\nlogger = logging.getLogger(__name__)\n\n\nclass YicaiCrawlerTool(BaseCrawler):\n    \"\"\"\n    第一财经爬虫\n    主要爬取股市新闻\n    \"\"\"\n    \n    BASE_URL = \"https://www.yicai.com/\"\n    STOCK_URL = \"https://www.yicai.com/news/gushi/\"\n    SOURCE_NAME = \"yicai\"\n    \n    def __init__(self):\n        super().__init__(\n            name=\"yicai_crawler\",\n            description=\"Crawl financial news from Yicai (yicai.com)\"\n        )\n    \n    def crawl(self, start_page: int = 1, end_page: int = 1) -> List[NewsItem]:\n        \"\"\"\n        爬取第一财经新闻\n        \n        Args:\n            start_page: 起始页码（当前实现仅爬取首页，此参数暂未生效）\n            end_page: 结束页码（当前实现仅爬取首页，此参数暂未生效）\n            \n        Returns:\n            新闻列表\n        \"\"\"\n        news_list = []\n        \n        try:\n            page_news = self._crawl_page(1)\n            news_list.extend(page_news)\n            logger.info(f\"Crawled Yicai, got {len(page_news)} news items\")\n        except Exception as e:\n            logger.error(f\"Error crawling Yicai: {e}\")\n        \n        # 应用股票筛选\n        filtered_news = self._filter_stock_news(news_list)\n        return filtered_news\n    \n    def _crawl_page(self, page: int) -> List[NewsItem]:\n        \"\"\"爬取单页新闻\"\"\"\n        news_items = []\n        \n        try:\n            response = self._fetch_page(self.STOCK_URL)\n            soup = self._parse_html(response.text)\n            \n            # 提取新闻列表\n            news_links = self._extract_news_links(soup)\n            logger.info(f\"Found {len(news_links)} potential news links\")\n            \n            # 限制爬取数量\n            max_news = 20\n            for link_info in news_links[:max_news]:\n                try:\n                    news_item = 
self._extract_news_item(link_info)\n                    if news_item:\n                        news_items.append(news_item)\n                except Exception as e:\n                    logger.warning(f\"Failed to extract news item: {e}\")\n                    continue\n            \n        except Exception as e:\n            logger.error(f\"Error crawling page: {e}\")\n        \n        return news_items\n    \n    def _extract_news_links(self, soup: BeautifulSoup) -> List[dict]:\n        \"\"\"从页面中提取新闻链接\"\"\"\n        news_links = []\n        \n        # 查找新闻链接\n        all_links = soup.find_all('a', href=True)\n        \n        for link in all_links:\n            href = link.get('href', '')\n            title = link.get_text(strip=True)\n            \n            # 第一财经新闻URL模式\n            if ('/news/' in href or '/article/' in href) and title:\n                # 确保是完整URL\n                if href.startswith('//'):\n                    href = 'https:' + href\n                elif href.startswith('/'):\n                    href = 'https://www.yicai.com' + href\n                elif not href.startswith('http'):\n                    href = 'https://www.yicai.com/' + href.lstrip('/')\n                \n                if href not in [n['url'] for n in news_links]:\n                    news_links.append({'url': href, 'title': title})\n        \n        return news_links\n    \n    def _extract_news_item(self, link_info: dict) -> Optional[NewsItem]:\n        \"\"\"提取单条新闻详情\"\"\"\n        url = link_info['url']\n        title = link_info['title']\n        \n        try:\n            response = self._fetch_page(url)\n            raw_html = response.text  # 保存原始 HTML\n            soup = self._parse_html(raw_html)\n            \n            # 提取正文\n            content = self._extract_content(soup)\n            if not content:\n                return None\n            \n            # 提取发布时间\n            publish_time = self._extract_publish_time(soup)\n            \n       
     # 提取作者\n            author = self._extract_author(soup)\n            \n            return NewsItem(\n                title=title,\n                content=content,\n                url=url,\n                source=self.SOURCE_NAME,\n                publish_time=publish_time,\n                author=author,\n                raw_html=raw_html,  # 保存原始 HTML\n            )\n            \n        except Exception as e:\n            logger.warning(f\"Failed to extract news from {url}: {e}\")\n            return None\n    \n    def _extract_content(self, soup: BeautifulSoup) -> str:\n        \"\"\"提取新闻正文\"\"\"\n        content_selectors = [\n            {'class': 'm-txt'},\n            {'class': 'article-content'},\n            {'class': 'content'},\n            {'class': 'newsContent'},\n        ]\n        \n        for selector in content_selectors:\n            content_div = soup.find('div', selector)\n            if content_div:\n                paragraphs = content_div.find_all('p')\n                if paragraphs:\n                    content = '\\n'.join([p.get_text(strip=True) for p in paragraphs if p.get_text(strip=True)])\n                    if content:\n                        return self._clean_text(content)\n        \n        # 后备方案：使用基类的智能提取方法\n        return self._extract_article_content(soup)\n    \n    def _extract_publish_time(self, soup: BeautifulSoup) -> Optional[datetime]:\n        \"\"\"提取发布时间\"\"\"\n        try:\n            time_elem = soup.find('span', {'class': re.compile(r'time|date')})\n            if not time_elem:\n                time_elem = soup.find('time')\n            if time_elem:\n                time_str = time_elem.get_text(strip=True)\n                return self._parse_time_string(time_str)\n        except Exception as e:\n            logger.debug(f\"Failed to parse publish time: {e}\")\n        \n        return datetime.now()\n    \n    def _parse_time_string(self, time_str: str) -> datetime:\n  
      \"\"\"解析时间字符串\"\"\"\n        now = datetime.now()\n        \n        # 尝试解析绝对时间\n        formats = [\n            '%Y-%m-%d %H:%M:%S',\n            '%Y-%m-%d %H:%M',\n            '%Y-%m-%d',\n            '%Y年%m月%d日 %H:%M',\n            '%Y年%m月%d日',\n        ]\n        for fmt in formats:\n            try:\n                return datetime.strptime(time_str, fmt)\n            except ValueError:\n                continue\n        \n        return now\n    \n    def _extract_author(self, soup: BeautifulSoup) -> Optional[str]:\n        \"\"\"提取作者\"\"\"\n        try:\n            author_elem = soup.find('span', {'class': re.compile(r'author|source')})\n            if author_elem:\n                return author_elem.get_text(strip=True)\n        except Exception as e:\n            logger.debug(f\"Failed to extract author: {e}\")\n        \n        return None\n\n"
  },
  {
    "path": "backend/clear_news_data.py",
    "content": "\"\"\"\n清除所有新闻相关数据\n\"\"\"\nimport os\nimport sys\nfrom pathlib import Path\n\n# 加载环境变量\nfrom dotenv import load_dotenv\nenv_path = Path(__file__).parent / \".env\"\nload_dotenv(env_path)\n\n# 构建数据库 URL\nPOSTGRES_USER = os.getenv(\"POSTGRES_USER\", \"postgres\")\nPOSTGRES_PASSWORD = os.getenv(\"POSTGRES_PASSWORD\", \"postgres\")\nPOSTGRES_HOST = os.getenv(\"POSTGRES_HOST\", \"localhost\")\nPOSTGRES_PORT = os.getenv(\"POSTGRES_PORT\", \"5432\")\nPOSTGRES_DB = os.getenv(\"POSTGRES_DB\", \"finnews_db\")\n\nDATABASE_URL = f\"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}\"\n\nfrom sqlalchemy import create_engine, text\n\ndef clear_all_news_data():\n    \"\"\"清除所有新闻相关数据\"\"\"\n    print(\"🗑️  正在清除所有新闻数据...\")\n    \n    engine = create_engine(DATABASE_URL)\n    \n    with engine.connect() as conn:\n        # 查询存在的表\n        result = conn.execute(text(\"\"\"\n            SELECT table_name FROM information_schema.tables \n            WHERE table_schema = 'public' AND table_type = 'BASE TABLE'\n        \"\"\"))\n        existing_tables = [row[0] for row in result.fetchall()]\n        print(f\"   数据库中的表: {existing_tables}\")\n        \n        # 清除 news 表\n        if 'news' in existing_tables:\n            result = conn.execute(text(\"SELECT COUNT(*) FROM news\"))\n            news_count = result.scalar()\n            print(f\"   当前新闻数量: {news_count}\")\n            conn.execute(text(\"TRUNCATE TABLE news RESTART IDENTITY CASCADE\"))\n            print(\"   ✅ news 表已清除\")\n        else:\n            print(\"   ⚠️ news 表不存在\")\n        \n        # 清除 news_analysis 表（如果存在）\n        if 'news_analysis' in existing_tables:\n            result = conn.execute(text(\"SELECT COUNT(*) FROM news_analysis\"))\n            analysis_count = result.scalar()\n            print(f\"   当前分析数量: {analysis_count}\")\n            conn.execute(text(\"TRUNCATE TABLE news_analysis RESTART IDENTITY CASCADE\"))\n            print(\"   ✅ 
news_analysis 表已清除\")\n        \n        # 清除 analysis 表（如果存在）\n        if 'analysis' in existing_tables:\n            result = conn.execute(text(\"SELECT COUNT(*) FROM analysis\"))\n            analysis_count = result.scalar()\n            print(f\"   当前 analysis 数量: {analysis_count}\")\n            conn.execute(text(\"TRUNCATE TABLE analysis RESTART IDENTITY CASCADE\"))\n            print(\"   ✅ analysis 表已清除\")\n        \n        conn.commit()\n        print(\"\\n✅ 所有新闻数据已清除！\")\n\nif __name__ == \"__main__\":\n    print(\"=\" * 50)\n    print(\"📰 FinnewsHunter - 清除新闻数据\")\n    print(\"=\" * 50)\n    \n    # 确认操作\n    if len(sys.argv) > 1 and sys.argv[1] == \"--yes\":\n        confirm = \"y\"\n    else:\n        confirm = input(\"\\n⚠️  确定要清除所有新闻数据吗？(y/N): \").strip().lower()\n    \n    if confirm == \"y\":\n        clear_all_news_data()\n        print(\"\\n🎉 完成！\")\n    else:\n        print(\"❌ 已取消操作\")\n\n"
  },
  {
    "path": "backend/env.example",
    "content": "# FinnewsHunter 环境变量配置模板\n# 复制此文件为 .env 并填入实际值\n\n# ===== 应用配置 =====\nAPP_NAME=FinnewsHunter\nAPP_VERSION=0.1.0\nDEBUG=True\n\n# ===== 服务器配置 =====\nHOST=0.0.0.0\nPORT=8000\n\n# ===== 数据库配置 =====\nPOSTGRES_USER=finnews\nPOSTGRES_PASSWORD=finnews_dev_password\nPOSTGRES_HOST=localhost\nPOSTGRES_PORT=5432\nPOSTGRES_DB=finnews_db\n\n# ===== Redis 配置 =====\nREDIS_HOST=localhost\nREDIS_PORT=6379\nREDIS_DB=0\n# REDIS_PASSWORD=  # 可选，生产环境建议设置\n\n# ===== Milvus 配置 =====\nMILVUS_HOST=localhost\nMILVUS_PORT=19530\nMILVUS_COLLECTION_NAME=finnews_embeddings\n\n# ⚠️ 重要：向量维度必须与 Embedding 模型匹配\n# - OpenAI text-embedding-ada-002: 1536 维\n# - 百炼 text-embedding-v4: 1024 维\nMILVUS_DIM=1536\n\n# ===== Neo4j 知识图谱配置 =====\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USER=neo4j\nNEO4J_PASSWORD=finnews_neo4j_password\n\n# ==========================================\n# LLM 和 Embedding 配置\n# ==========================================\n# 支持5个厂商：bailian、openai、deepseek、kimi、zhipu\n# 前端可以动态切换，后端需要配置对应的 API Key\n\n# ===== 默认LLM配置（可选，用于后端默认行为） =====\nLLM_PROVIDER=bailian  # 默认提供商\nLLM_MODEL=qwen-plus   # 默认模型\nLLM_TEMPERATURE=0.7\nLLM_MAX_TOKENS=2000\nLLM_TIMEOUT=180  # LLM 调用超时时间（秒）\n\n# ==========================================\n# 各厂商 API Key 配置\n# ==========================================\n# ⚠️ 注意：前端可以切换任意厂商，请配置所有需要使用的厂商的 API Key\n\n# ----- 1. 百炼（Bailian / 阿里云）-----\n# 获取地址：https://dashscope.console.aliyun.com/\nDASHSCOPE_API_KEY=your-dashscope-api-key-here\nDASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\n# 可用模型列表（逗号分隔，可自定义添加新模型）\nBAILIAN_MODELS=qwen-plus,qwen-max,qwen-turbo,qwen-long\n# 百炼可选配置（如需使用 Agent 功能）\n# BAILIAN_ACCESS_KEY_ID=your-access-key-id\n# BAILIAN_ACCESS_KEY_SECRET=your-access-key-secret\n# BAILIAN_AGENT_CODE=your-agent-code\n# BAILIAN_REGION_ID=cn-beijing\n\n# ----- 2. 
OpenAI -----\n# 获取地址：https://platform.openai.com/api-keys\nOPENAI_API_KEY=your-openai-api-key-here\nOPENAI_BASE_URL=  # 留空使用官方 API，或填写代理地址\n# 可用模型列表（逗号分隔，可自定义添加新模型）\nOPENAI_MODELS=gpt-4,gpt-4-turbo,gpt-3.5-turbo\n\n# ----- 3. DeepSeek -----\n# 获取地址：https://platform.deepseek.com/api_keys\nDEEPSEEK_API_KEY=your-deepseek-api-key-here\nDEEPSEEK_BASE_URL=https://api.deepseek.com/v1  # 默认值，可不填\n# 可用模型列表（逗号分隔，可自定义添加新模型）\nDEEPSEEK_MODELS=deepseek-chat,deepseek-coder\n\n# ----- 4. Kimi (Moonshot) -----\n# 获取地址：https://platform.moonshot.cn/console/api-keys\nMOONSHOT_API_KEY=your-moonshot-api-key-here\nMOONSHOT_BASE_URL=https://api.moonshot.cn/v1  # 默认值，可不填\n# 可用模型列表（逗号分隔，可自定义添加新模型）\nMOONSHOT_MODELS=moonshot-v1-8k,moonshot-v1-32k,moonshot-v1-128k\n\n# ----- 5. 智谱 (Zhipu AI) -----\n# 获取地址：https://open.bigmodel.cn/usercenter/apikeys\nZHIPU_API_KEY=your-zhipu-api-key-here\nZHIPU_BASE_URL=https://open.bigmodel.cn/api/paas/v4  # 默认值，可不填\n# 可用模型列表（逗号分隔，可自定义添加新模型）\nZHIPU_MODELS=glm-4,glm-4-plus,glm-4-air,glm-3-turbo\n\n# ----- 6. 
BochaAI (Web Search API) -----\n# 获取地址：https://bochaai.com/\n# 用于定向爬取股票新闻时的搜索引擎\nBOCHAAI_API_KEY=your-bochaai-api-key-here\nBOCHAAI_ENDPOINT=https://api.bochaai.com/v1/web-search  # 默认值，可不填\n\n# ==========================================\n# Embedding 配置\n# ==========================================\n# EMBEDDING_PROVIDER=openai\n# EMBEDDING_MODEL=text-embedding-ada-002\n# EMBEDDING_BATCH_SIZE=100\n# EMBEDDING_BASE_URL=  # 留空使用官方 API\n\n# 使用百炼 Embedding 时的配置示例：\nEMBEDDING_PROVIDER=openai\nEMBEDDING_MODEL=text-embedding-v4\nEMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nMILVUS_DIM=1024  # 百炼 embedding 是 1024 维\n\n# ===== 爬取间隔配置（多源支持）=====\nCRAWL_INTERVAL_SINA=60  # 新浪财经爬取间隔（秒）\nCRAWL_INTERVAL_TENCENT=60  # 腾讯财经爬取间隔（秒）\nCRAWL_INTERVAL_JWVIEW=60  # 中新经纬爬取间隔（秒）\nCRAWL_INTERVAL_EEO=60  # 经济观察网爬取间隔（秒）\nCRAWL_INTERVAL_CAIJING=60  # 财经网爬取间隔（秒）\nCRAWL_INTERVAL_JINGJI21=60  # 21经济网爬取间隔（秒）\n\n# ===== 实时爬取与缓存配置 =====\nCACHE_TTL=1800  # 缓存过期时间（秒），默认30分钟\nNEWS_RETENTION_HOURS=24  # 新闻保留时间（小时），默认24小时\nFRONTEND_REFETCH_INTERVAL=180  # 前端自动刷新间隔（秒），默认3分钟\n\n# ===== 爬虫配置 =====\nCRAWLER_TIMEOUT=30\nCRAWLER_MAX_RETRIES=3\nCRAWLER_DELAY=1.0\n\n# ===== 安全配置 =====\nSECRET_KEY=your-secret-key-here-please-change-in-production\nACCESS_TOKEN_EXPIRE_MINUTES=10080\n\n# ===== 日志配置 =====\nLOG_LEVEL=INFO\nLOG_FILE=logs/finnews.log\n\n# ===== 业务配置 =====\nMAX_NEWS_PER_REQUEST=50\nNEWS_CACHE_TTL=3600"
  },
  {
    "path": "backend/init_db.py",
    "content": "#!/usr/bin/env python\n\"\"\"\n数据库初始化脚本\n独立运行以创建数据库表\n\"\"\"\nimport sys\nimport os\n\n# 添加当前目录到 Python 路径\nsys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))\n\nif __name__ == \"__main__\":\n    print(\"=\" * 60)\n    print(\"Initializing FinnewsHunter Database...\")\n    print(\"=\" * 60)\n    \n    try:\n        from sqlalchemy import create_engine\n        from app.core.config import settings\n        \n        # 导入所有模型\n        from app.models.database import Base\n        from app.models.news import News\n        from app.models.stock import Stock\n        from app.models.analysis import Analysis\n        from app.models.crawl_task import CrawlTask\n        from app.models.debate_history import DebateHistory\n        \n        print(f\"\\nConnecting to database: {settings.POSTGRES_HOST}:{settings.POSTGRES_PORT}/{settings.POSTGRES_DB}\")\n        \n        # 创建同步引擎\n        sync_engine = create_engine(\n            settings.SYNC_DATABASE_URL,\n            echo=False,\n            pool_pre_ping=True,\n        )\n        \n        print(\"Creating tables...\")\n        Base.metadata.create_all(bind=sync_engine)\n        \n        print(\"\\nDatabase initialized successfully!\")\n        print(\"   - News table created\")\n        print(\"   - Stock table created\")\n        print(\"   - Analysis table created\")\n        print(\"   - CrawlTask table created\")\n        print(\"   - DebateHistory table created\")\n        print(\"=\" * 60)\n        sys.exit(0)\n        \n    except Exception as e:\n        print(f\"\\nDatabase initialization failed: {e}\")\n        import traceback\n        traceback.print_exc()\n        print(\"=\" * 60)\n        print(\"\\nNote: create_all() skips tables that already exist, so this error\")\n        print(\"usually points to a connection or configuration problem.\")\n        sys.exit(0)  # 仍返回0，避免 start.sh 因此中断启动流程\n\n"
  },
  {
    "path": "backend/init_knowledge_graph.py",
    "content": "#!/usr/bin/env python\n\"\"\"\n初始化知识图谱\n创建 Neo4j 约束、索引，并为示例股票构建图谱\n\"\"\"\nimport asyncio\nimport logging\nimport sys\n\n# 配置日志\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'\n)\nlogger = logging.getLogger(__name__)\n\n\nasync def init_knowledge_graph():\n    \"\"\"初始化知识图谱\"\"\"\n    try:\n        from app.core.neo4j_client import get_neo4j_client\n        from app.knowledge.graph_service import get_graph_service\n        from app.knowledge.knowledge_extractor import (\n            create_knowledge_extractor,\n            AkshareKnowledgeExtractor\n        )\n        \n        logger.info(\"=\" * 80)\n        logger.info(\"开始初始化知识图谱\")\n        logger.info(\"=\" * 80)\n        \n        # 1. 测试 Neo4j 连接\n        logger.info(\"\\n[1/4] 测试 Neo4j 连接...\")\n        neo4j_client = get_neo4j_client()\n        if neo4j_client.health_check():\n            logger.info(\"Neo4j 连接正常\")\n        else:\n            logger.error(\"Neo4j 连接失败，请检查配置\")\n            sys.exit(1)\n        \n        # 2. 初始化约束和索引\n        logger.info(\"\\n[2/4] 初始化数据库约束和索引...\")\n        graph_service = get_graph_service()\n        logger.info(\"约束和索引已创建\")\n        \n        # 3. 
为示例股票创建图谱\n        logger.info(\"\\n[3/4] 为示例股票创建知识图谱...\")\n        \n        example_stocks = [\n            (\"SH600519\", \"贵州茅台\"),  # 示例1：大盘蓝筹\n            (\"SZ300634\", \"彩讯股份\"),  # 示例2：中小板\n        ]\n        \n        extractor = create_knowledge_extractor()\n        \n        for stock_code, stock_name in example_stocks:\n            logger.info(f\"\\n处理: {stock_name}({stock_code})\")\n            \n            # 检查是否已存在\n            existing = graph_service.get_company_graph(stock_code)\n            if existing:\n                logger.info(f\"  图谱已存在，跳过\")\n                continue\n            \n            # 从 akshare 获取信息\n            logger.info(f\"  从 akshare 获取信息...\")\n            akshare_info = AkshareKnowledgeExtractor.extract_company_info(stock_code)\n            \n            if not akshare_info:\n                logger.warning(f\"  akshare 未返回数据，跳过\")\n                continue\n            \n            # 使用 LLM 提取详细信息\n            logger.info(f\"  使用 LLM 提取详细信息...\")\n            base_graph = await extractor.extract_from_akshare(\n                stock_code, stock_name, akshare_info\n            )\n            \n            # 构建图谱\n            logger.info(f\"  构建图谱...\")\n            success = graph_service.build_company_graph(base_graph)\n            \n            if success:\n                stats = graph_service.get_graph_stats(stock_code)\n                logger.info(f\"  图谱构建成功: {stats}\")\n            else:\n                logger.error(f\"  图谱构建失败\")\n        \n        # 4. 
显示统计信息\n        logger.info(\"\\n[4/4] 图谱统计...\")\n        companies = graph_service.list_all_companies()\n        logger.info(f\"当前共有 {len(companies)} 家公司的知识图谱\")\n        \n        for company in companies:\n            stats = graph_service.get_graph_stats(company['stock_code'])\n            logger.info(f\"  - {company['stock_name']}({company['stock_code']}): {stats}\")\n        \n        logger.info(\"\\n\" + \"=\" * 80)\n        logger.info(\"知识图谱初始化完成！\")\n        logger.info(\"=\" * 80)\n        logger.info(\"\\n下一步：\")\n        logger.info(\"  1. 访问 http://localhost:7474 查看 Neo4j 浏览器\")\n        logger.info(\"  2. 用户名: neo4j, 密码: finnews_neo4j_password\")\n        logger.info(\"  3. 执行定向爬取时，系统会自动使用知识图谱进行多关键词并发检索\")\n        logger.info(\"\\n\")\n        \n    except Exception as e:\n        logger.error(f\"初始化失败: {e}\", exc_info=True)\n        sys.exit(1)\n\n\nif __name__ == \"__main__\":\n    asyncio.run(init_knowledge_graph())\n\n"
  },
  {
    "path": "backend/requirements.txt",
    "content": "# ===== Web 框架 =====\nfastapi>=0.100.0\nuvicorn[standard]>=0.22.0\npydantic>=2.0.0\npydantic-settings>=2.0.0\npython-dotenv>=1.0.0\n\n# ===== 数据库 =====\nsqlalchemy>=2.0.0\nasyncpg>=0.29.0  # PostgreSQL 异步驱动\npsycopg2-binary>=2.9.0  # PostgreSQL 同步驱动（用于初始化）\nalembic>=1.12.0  # 数据库迁移工具\n\n# ===== 缓存与任务队列 =====\nredis>=4.5.0\ncelery>=5.3.0\n\n# ===== 向量数据库 =====\npymilvus>=2.3.0\n\n# ===== 图数据库 =====\nneo4j>=5.14.0  # Neo4j Python驱动\n\n# ===== 网络请求与爬虫 =====\nrequests>=2.31.0\nbeautifulsoup4>=4.12.0\nlxml>=4.9.0\naiohttp>=3.9.0\nmarkdownify>=0.11.0  # HTML 转 Markdown\nreadabilipy>=0.2.0  # 智能内容提取（Mozilla Readability）\nplaywright>=1.40.0  # JS 渲染（可选，需运行 playwright install）\n\n# ===== AI/ML =====\nopenai>=1.0.0\nanthropic>=0.7.0\nlitellm>=1.0.0\ntiktoken>=0.5.0  # Token 计数\n\n# ===== 文本处理 =====\njieba>=0.42.1  # 中文分词\npython-dateutil>=2.8.2\n\n# ===== 工具库 =====\nhttpx>=0.25.0\ntenacity>=8.2.0  # 重试机制\n\n# ===== AgenticX 框架 =====\nagenticx==0.1.9  # Docker 容器中使用 PyPI 版本\n# 本地开发可以用：pip install -e ../../../../agenticx\n\nakshare"
  },
  {
    "path": "backend/reset_database.py",
    "content": "\"\"\"\n清空数据库并重新开始\n用于重置系统数据\n\"\"\"\nimport asyncio\nimport sys\nfrom sqlalchemy import text\nfrom app.core.database import get_async_engine\nfrom app.core.redis_client import redis_client\nimport logging\n\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\n\nasync def reset_database():\n    \"\"\"清空所有数据\"\"\"\n    engine = get_async_engine()\n    \n    try:\n        async with engine.begin() as conn:\n            logger.info(\"=\" * 60)\n            logger.info(\"开始清空数据库...\")\n            logger.info(\"=\" * 60)\n            \n            # 1. 清空新闻表\n            logger.info(\"清空新闻表 (news)...\")\n            result = await conn.execute(text(\"DELETE FROM news\"))\n            logger.info(f\"✅ 已删除 {result.rowcount} 条新闻记录\")\n            \n            # 2. 清空爬取任务表\n            logger.info(\"清空爬取任务表 (crawl_tasks)...\")\n            result = await conn.execute(text(\"DELETE FROM crawl_tasks\"))\n            logger.info(f\"✅ 已删除 {result.rowcount} 条任务记录\")\n            \n            # 3. 清空分析表（如果存在）\n            try:\n                logger.info(\"清空分析表 (analyses)...\")\n                result = await conn.execute(text(\"DELETE FROM analyses\"))\n                logger.info(f\"✅ 已删除 {result.rowcount} 条分析记录\")\n            except Exception as e:\n                logger.warning(f\"清空分析表失败（表可能不存在）: {e}\")\n            \n            # 4. 
重置自增ID\n            logger.info(\"重置表自增ID...\")\n            try:\n                await conn.execute(text(\"ALTER SEQUENCE news_id_seq RESTART WITH 1\"))\n                await conn.execute(text(\"ALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1\"))\n                await conn.execute(text(\"ALTER SEQUENCE analyses_id_seq RESTART WITH 1\"))\n                logger.info(\"✅ 自增ID已重置\")\n            except Exception as e:\n                logger.warning(f\"重置自增ID失败: {e}\")\n            \n            logger.info(\"=\" * 60)\n            logger.info(\"数据库清空完成！\")\n            logger.info(\"=\" * 60)\n        \n        # 5. 清空Redis缓存\n        if redis_client.is_available():\n            logger.info(\"清空Redis缓存...\")\n            try:\n                # 注意：flushdb 会清空当前 Redis 数据库中的所有键，而不只是 news 相关缓存\n                redis_client.client.flushdb()\n                logger.info(\"✅ Redis缓存已清空\")\n            except Exception as e:\n                logger.error(f\"清空Redis失败: {e}\")\n        else:\n            logger.warning(\"⚠️  Redis不可用，跳过缓存清理\")\n        \n        logger.info(\"=\" * 60)\n        logger.info(\"✨ 数据重置完成！\")\n        logger.info(\"=\" * 60)\n        logger.info(\"下一步：\")\n        logger.info(\"1. 重启 Celery Worker 和 Beat\")\n        logger.info(\"2. 系统将自动开始爬取最新新闻\")\n        logger.info(\"3. 约5-10分钟后可在前端查看新数据\")\n        logger.info(\"=\" * 60)\n        \n    except Exception as e:\n        logger.error(f\"❌ 清空数据失败: {e}\")\n        import traceback\n        traceback.print_exc()\n        sys.exit(1)\n    finally:\n        await engine.dispose()\n\n\nif __name__ == \"__main__\":\n    # 确认操作\n    print(\"⚠️  警告：此操作将删除所有新闻和任务数据！\")\n    print(\"⚠️  此操作不可恢复！\")\n    confirm = input(\"确认要清空所有数据吗？(yes/no): \")\n    \n    if confirm.lower() in ['yes', 'y']:\n        asyncio.run(reset_database())\n    else:\n        print(\"❌ 操作已取消\")\n        sys.exit(0)\n\n"
  },
  {
    "path": "backend/setup_env.sh",
    "content": "#!/bin/bash\n# 环境变量快速配置脚本\n\necho \"============================================\"\necho \"  FinnewsHunter 环境配置向导\"\necho \"============================================\"\necho \"\"\necho \"请选择 LLM 服务商：\"\necho \"  1) OpenAI 官方（默认）\"\necho \"  2) 阿里云百炼（推荐国内用户）\"\necho \"  3) 其他 OpenAI 代理\"\necho \"  4) 手动配置（复制模板）\"\necho \"\"\nread -p \"请输入选项 (1-4) [默认:1]: \" choice\nchoice=${choice:-1}\n\ncase $choice in\n  1)\n    # OpenAI 官方\n    cat > .env << 'EOF'\n# FinnewsHunter 环境配置\nAPP_NAME=FinnewsHunter\nDEBUG=True\n\nPOSTGRES_USER=finnews\nPOSTGRES_PASSWORD=finnews_dev_password\nPOSTGRES_HOST=localhost\nPOSTGRES_PORT=5432\nPOSTGRES_DB=finnews_db\n\nREDIS_HOST=localhost\nREDIS_PORT=6379\n\nMILVUS_HOST=localhost\nMILVUS_PORT=19530\nMILVUS_DIM=1536\n\n# OpenAI 官方配置\nLLM_PROVIDER=openai\nLLM_MODEL=gpt-3.5-turbo\nLLM_TEMPERATURE=0.7\nLLM_MAX_TOKENS=2000\nOPENAI_API_KEY=sk-your-openai-api-key-here\n\nEMBEDDING_PROVIDER=openai\nEMBEDDING_MODEL=text-embedding-ada-002\n\nLOG_LEVEL=INFO\nEOF\n    echo \"\"\n    echo \"OpenAI 配置已创建\"\n    echo \"请编辑 .env 并填入你的 OPENAI_API_KEY\"\n    ;;\n    \n  2)\n    # 阿里云百炼\n    cat > .env << 'EOF'\n# FinnewsHunter 环境配置\nAPP_NAME=FinnewsHunter\nDEBUG=True\n\nPOSTGRES_USER=finnews\nPOSTGRES_PASSWORD=finnews_dev_password\nPOSTGRES_HOST=localhost\nPOSTGRES_PORT=5432\nPOSTGRES_DB=finnews_db\n\nREDIS_HOST=localhost\nREDIS_PORT=6379\n\nMILVUS_HOST=localhost\nMILVUS_PORT=19530\nMILVUS_DIM=1024\n\n# 阿里云百炼配置（OpenAI 兼容模式）\nLLM_PROVIDER=openai\nLLM_MODEL=qwen-plus\nLLM_TEMPERATURE=0.7\nLLM_MAX_TOKENS=2000\nOPENAI_API_KEY=sk-your-bailian-api-key-here\nOPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\n\nEMBEDDING_PROVIDER=openai\nEMBEDDING_MODEL=text-embedding-v4\n\nLOG_LEVEL=INFO\nEOF\n    echo \"\"\n    echo \"百炼配置已创建\"\n    echo \"请编辑 .env 并填入你的百炼 API Key\"\n    echo \"获取 Key: https://dashscope.console.aliyun.com/\"\n    ;;\n    \n  3)\n    # 其他代理\n    cat > .env << 'EOF'\n# FinnewsHunter 
环境配置\nAPP_NAME=FinnewsHunter\nDEBUG=True\n\nPOSTGRES_USER=finnews\nPOSTGRES_PASSWORD=finnews_dev_password\nPOSTGRES_HOST=localhost\nPOSTGRES_PORT=5432\nPOSTGRES_DB=finnews_db\n\nREDIS_HOST=localhost\nREDIS_PORT=6379\n\nMILVUS_HOST=localhost\nMILVUS_PORT=19530\nMILVUS_DIM=1536\n\n# OpenAI 代理配置\nLLM_PROVIDER=openai\nLLM_MODEL=gpt-3.5-turbo\nLLM_TEMPERATURE=0.7\nLLM_MAX_TOKENS=2000\nOPENAI_API_KEY=sk-your-proxy-api-key\nOPENAI_BASE_URL=https://your-proxy.com/v1\n\nEMBEDDING_PROVIDER=openai\nEMBEDDING_MODEL=text-embedding-ada-002\n\nLOG_LEVEL=INFO\nEOF\n    echo \"\"\n    echo \"代理配置已创建\"\n    echo \"请编辑 .env 并填入你的代理信息\"\n    ;;\n    \n  4)\n    # 手动配置\n    cp env.example .env\n    echo \"\"\n    echo \"配置模板已复制\"\n    echo \"请编辑 .env 并选择合适的配置方案\"\n    ;;\n    \n  *)\n    echo \"无效选项\"\n    exit 1\n    ;;\nesac\n\necho \"\"\nread -p \"是否现在编辑配置文件？(Y/n): \" -n 1 -r\necho\nif [[ ! $REPLY =~ ^[Nn]$ ]]; then\n    ${EDITOR:-nano} .env\nfi\n\necho \"\"\necho \"配置完成！运行 ./start.sh 启动服务\"\n\n"
  },
  {
    "path": "backend/start.sh",
    "content": "#!/bin/bash\n# FinnewsHunter 启动脚本\n\nset -e\n\necho \"===================================\"\necho \"  FinnewsHunter Backend Startup\"\necho \"===================================\"\n\n# 获取脚本所在目录（backend目录）\nSCRIPT_DIR=\"$(cd \"$(dirname \"$0\")\" && pwd)\"\nDEPLOY_DIR=\"$(cd \"$SCRIPT_DIR/../deploy\" && pwd)\"\n\n# 1. 启动 Docker Compose 服务\necho \"\"\necho \"[1/4] Starting Docker Compose services...\"\ncd \"$DEPLOY_DIR\"\ndocker-compose -f docker-compose.dev.yml up -d\n\n# 等待数据库启动\necho \"\"\necho \"[2/4] Waiting for databases to be ready...\"\nsleep 10\n\n# 2. 初始化数据库（首次运行）\necho \"\"\necho \"[3/4] Initializing database...\"\ncd \"$SCRIPT_DIR\"\npython init_db.py || echo \"Database initialization skipped (may already exist)\"\n\n# 3. 启动 FastAPI 应用\necho \"\"\necho \"[4/4] Starting FastAPI application...\"\necho \"\"\necho \"Server will start at: http://localhost:8000\"\necho \"API Documentation: http://localhost:8000/docs\"\necho \"\"\n\n# 确保在 backend 目录下启动\ncd \"$SCRIPT_DIR\"\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n\n"
  },
  {
    "path": "backend/start_celery.sh",
    "content": "#!/bin/bash\n# Celery 容器化重启脚本\n# 用法: ./start_celery.sh [--restart|-r] [--force-recreate|-f] [--rebuild|-b] [--logs|-l]\n\nset -e\n\n# 解析命令行参数\nAUTO_RESTART=false\nFORCE_RECREATE=false\nREBUILD_IMAGE=false\nSHOW_LOGS=false\n\nwhile [[ $# -gt 0 ]]; do\n    case $1 in\n        --restart|-r)\n            AUTO_RESTART=true\n            shift\n            ;;\n        --force-recreate|-f)\n            FORCE_RECREATE=true\n            AUTO_RESTART=true\n            shift\n            ;;\n        --rebuild|-b)\n            REBUILD_IMAGE=true\n            FORCE_RECREATE=true\n            AUTO_RESTART=true\n            shift\n            ;;\n        --logs|-l)\n            SHOW_LOGS=true\n            shift\n            ;;\n        --help|-h)\n            echo \"用法: $0 [选项]\"\n            echo \"\"\n            echo \"选项:\"\n            echo \"  --restart, -r        自动重启容器（容器使用 python:3.11 基础镜像 + volumes 挂载）\"\n            echo \"  --force-recreate, -f 强制重建容器（会重新安装依赖，因为使用基础镜像）\"\n            echo \"  --rebuild, -b        重新构建镜像（构建的镜像不会被使用，仅用于清理未使用的镜像）\"\n            echo \"  --logs, -l           重启后自动显示日志\"\n            echo \"  --help, -h           显示帮助信息\"\n            echo \"\"\n            echo \"注意:\"\n            echo \"  - 当前容器使用 python:3.11 基础镜像 + volumes 挂载代码\"\n            echo \"  - 每次启动容器都会执行 pip install 安装依赖\"\n            echo \"  - --rebuild 选项会构建镜像，但构建的镜像不会被容器使用\"\n            echo \"\"\n            echo \"示例:\"\n            echo \"  $0                   交互式重启容器\"\n            echo \"  $0 --restart         自动重启容器\"\n            echo \"  $0 -r -l             自动重启并显示日志\"\n            echo \"  $0 -f                强制重建容器（会重新安装依赖）\"\n            echo \"  $0 --rebuild         重新构建镜像（仅用于清理未使用的镜像）\"\n            exit 0\n            ;;\n        *)\n            echo \"未知参数: $1\"\n            echo \"使用 --help 查看帮助信息\"\n            exit 1\n            ;;\n    esac\ndone\n\necho \"============================================\"\necho \"  FinnewsHunter Celery 
容器重启脚本\"\necho \"============================================\"\necho \"\"\n\n# 获取脚本所在目录\nSCRIPT_DIR=\"$(cd \"$(dirname \"$0\")\" && pwd)\"\ncd \"$SCRIPT_DIR\"\n\n# 检查 Docker 是否运行\nif ! docker info > /dev/null 2>&1; then\n    echo \"Docker 未运行，请先启动 Docker\"\n    exit 1\nfi\n\n# 检查 docker-compose 文件是否存在\nCOMPOSE_FILE=\"../deploy/docker-compose.dev.yml\"\nif [ ! -f \"$COMPOSE_FILE\" ]; then\n    echo \"找不到 docker-compose 文件: $COMPOSE_FILE\"\n    exit 1\nfi\n\n# 检查容器状态\necho \"\"\necho \"[1/4] 检查 Celery 容器状态...\"\nWORKER_RUNNING=$(docker ps -q -f name=finnews_celery_worker)\nBEAT_RUNNING=$(docker ps -q -f name=finnews_celery_beat)\n\nif [ -n \"$WORKER_RUNNING\" ] || [ -n \"$BEAT_RUNNING\" ]; then\n    echo \"检测到 Celery 容器正在运行\"\n    echo \"   - Worker: $([ -n \"$WORKER_RUNNING\" ] && echo \"运行中 ($WORKER_RUNNING)\" || echo \"未运行\")\"\n    echo \"   - Beat: $([ -n \"$BEAT_RUNNING\" ] && echo \"运行中 ($BEAT_RUNNING)\" || echo \"未运行\")\"\n    \n    if [ \"$AUTO_RESTART\" = false ]; then\n        read -p \"是否重启容器？(y/N): \" -n 1 -r\n        echo\n        if [[ ! 
$REPLY =~ ^[Yy]$ ]]; then\n            echo \"已取消重启\"\n            exit 0\n        fi\n    else\n        echo \"自动重启模式，无需确认\"\n    fi\nfi\n\n# 检查 Redis 是否运行\necho \"\"\necho \"[2/4] 检查 Redis 连接...\"\nif docker exec finnews_redis redis-cli ping > /dev/null 2>&1; then\n    echo \"Redis 正常运行\"\nelse\n    echo \"Redis 未运行，请先启动 Docker Compose:\"\n    echo \"   cd ../deploy && docker-compose -f docker-compose.dev.yml up -d redis\"\n    exit 1\nfi\n\n# 重启 Celery Worker 容器\necho \"\"\ncd ../deploy\n\nif [ \"$REBUILD_IMAGE\" = true ]; then\n    echo \"[3/5] 重新构建镜像（注意：构建的镜像不会被容器使用，仅用于清理未使用的镜像）...\"\n    docker-compose -f docker-compose.dev.yml build celery-worker celery-beat\n    echo \"[4/5] 强制重建 Celery Worker 容器（使用 python:3.11 基础镜像 + volumes 挂载）...\"\n    docker-compose -f docker-compose.dev.yml up -d --force-recreate celery-worker\nelif [ \"$FORCE_RECREATE\" = true ]; then\n    echo \"[3/4] 强制重建 Celery Worker 容器（使用 python:3.11 基础镜像，会重新安装依赖）...\"\n    docker-compose -f docker-compose.dev.yml up -d --force-recreate celery-worker\nelse\n    echo \"[3/4] 重启 Celery Worker 容器（使用 python:3.11 基础镜像 + volumes 挂载）...\"\n    docker-compose -f docker-compose.dev.yml restart celery-worker\nfi\nWORKER_CONTAINER_ID=$(docker ps -q -f name=finnews_celery_worker)\necho \"Worker 容器已重启 (Container ID: $WORKER_CONTAINER_ID)\"\n\n# 等待 Worker 启动\nsleep 3\n\n# 重启 Celery Beat 容器\necho \"\"\nif [ \"$REBUILD_IMAGE\" = true ]; then\n    echo \"[5/5] 强制重建 Celery Beat 容器（使用 python:3.11 基础镜像 + volumes 挂载）...\"\n    docker-compose -f docker-compose.dev.yml up -d --force-recreate celery-beat\nelif [ \"$FORCE_RECREATE\" = true ]; then\n    echo \"[4/4] 强制重建 Celery Beat 容器（使用 python:3.11 基础镜像，会重新安装依赖）...\"\n    docker-compose -f docker-compose.dev.yml up -d --force-recreate celery-beat\nelse\n    echo \"[4/4] 重启 Celery Beat 容器（使用 python:3.11 基础镜像 + volumes 挂载）...\"\n    docker-compose -f docker-compose.dev.yml restart celery-beat\nfi\nBEAT_CONTAINER_ID=$(docker ps -q -f name=finnews_celery_beat)\necho \"Beat 
容器已重启 (Container ID: $BEAT_CONTAINER_ID)\"\n\ncd \"$SCRIPT_DIR\"\n\necho \"\"\necho \"============================================\"\necho \"  Celery 容器重启成功！\"\necho \"============================================\"\necho \"\"\necho \"容器信息:\"\necho \"   - Worker Container ID: $WORKER_CONTAINER_ID\"\necho \"   - Beat Container ID: $BEAT_CONTAINER_ID\"\necho \"\"\necho \"查看日志命令:\"\necho \"   - Worker 日志: docker logs -f finnews_celery_worker\"\necho \"   - Beat 日志: docker logs -f finnews_celery_beat\"\necho \"   - 最近100行: docker logs --tail 100 finnews_celery_worker\"\necho \"\"\necho \"监控命令:\"\necho \"   - 查看任务列表: curl http://localhost:8000/api/v1/tasks/\"\necho \"   - 查看容器状态: docker ps | grep celery\"\necho \"\"\necho \"实时监控已启动，每1分钟自动爬取新闻\"\necho \"\"\necho \"说明:\"\necho \"   - 容器使用 python:3.11 基础镜像 + volumes 挂载代码\"\necho \"   - 每次启动容器都会执行 pip install 安装依赖\"\necho \"   - 构建的镜像（deploy-celery-worker/beat）不会被使用，可以删除释放空间\"\necho \"\"\necho \"停止服务:\"\necho \"   cd ../deploy && docker-compose -f docker-compose.dev.yml stop celery-worker celery-beat\"\necho \"\"\necho \"完全重启（重建容器，会重新安装依赖）:\"\necho \"   cd ../deploy && docker-compose -f docker-compose.dev.yml up -d --force-recreate celery-worker celery-beat\"\necho \"\"\necho \"============================================\"\n\nif [ \"$SHOW_LOGS\" = true ]; then\n    echo \"\"\n    echo \"正在监控日志（按 Ctrl+C 退出）...\"\n    echo \"\"\n    sleep 2\n    docker logs -f --tail 50 finnews_celery_worker\nfi\n\n"
  },
  {
    "path": "backend/tests/__init__.py",
    "content": "\"\"\"FinnewsHunter Tests\"\"\"\n"
  },
  {
    "path": "backend/tests/check_milvus_data.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\n检查 Milvus 向量存储中的数据\n\"\"\"\nimport sys\nimport os\n\n# 添加项目路径（backend 目录，使 app 包可导入）\nsys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\n\nfrom app.storage.vector_storage import get_vector_storage\nfrom app.core.config import settings\n\ndef main():\n    try:\n        print(\"=\" * 60)\n        print(\"Milvus 向量存储信息\")\n        print(\"=\" * 60)\n        \n        storage = get_vector_storage()\n        stats = storage.get_stats()\n        \n        print(f\"\\n📊 集合统计信息:\")\n        print(f\"  集合名称: {stats['collection_name']}\")\n        print(f\"  向量维度: {stats['dim']}\")\n        num_entities = stats['num_entities']\n        print(f\"  存储的向量数量: {num_entities}\")\n        if num_entities == 0:\n            print(f\"  ⚠️  注意：如果显示为 0，可能是 flush 失败导致统计不准确\")\n        print(f\"  Milvus地址: {storage.host}:{storage.port}\")\n        \n        # 查询一些示例数据\n        print(f\"\\n📝 查询示例数据:\")\n        try:\n            # 使用 agenticx 的 query 方法获取数据\n            from agenticx.storage.vectordb_storages.base import VectorDBQuery\n            \n            # 创建一个零向量查询来获取所有数据（top_k 限制结果数）\n            zero_vector = [0.0] * stats['dim']\n            query = VectorDBQuery(query_vector=zero_vector, top_k=10)\n            \n            # query 是同步方法，可以直接调用\n            results = storage.milvus_storage.query(query)\n            \n            if results:\n                print(f\"   ✅ 找到 {len(results)} 条记录\")\n                if isinstance(stats['num_entities'], str) or stats['num_entities'] != len(results):\n                    print(f\"   ℹ️  统计数量: {stats['num_entities']}\")\n                print()\n                for i, result in enumerate(results[:5], 1):  # 只显示前5条\n                    payload = result.record.payload or {}\n                    news_id = payload.get('news_id', result.record.id)\n                    text = payload.get('text', '')\n                    text_preview = text[:100] + \"...\" if len(text) > 100 else text\n                    print(f\"  {i}. 新闻ID: {news_id}\")\n                    print(f\"     文本预览: {text_preview}\")\n                if len(results) > 5:\n                    print(f\"\\n  ... 还有 {len(results) - 5} 条记录未显示\")\n            else:\n                if stats['num_entities'] == 0:\n                    print(\"   ⚠️  未找到数据，集合可能确实为空\")\n                    print(\"   提示: 向量数据会在新闻分析时自动生成并存储\")\n                else:\n                    print(f\"   ⚠️  未找到数据，但统计显示有 {stats['num_entities']} 条记录\")\n                    print(\"   可能的原因：数据在缓冲区中，需要等待 Milvus 自动刷新\")\n        except Exception as e:\n            print(f\"  ❌ 无法查询数据: {e}\")\n            import traceback\n            traceback.print_exc()\n            if stats['num_entities'] == 0:\n                print(\"\\n   提示: 如果这是首次运行，集合可能确实为空\")\n        \n        print(\"\\n\" + \"=\" * 60)\n        print(\"💡 提示:\")\n        print(\"  - 向量数据存储在 Milvus 数据库中\")\n        print(\"  - 可以通过 Milvus 客户端工具查看完整数据\")\n        print(\"  - 向量维度必须与 embedding 模型匹配\")\n        print(\"=\" * 60)\n        \n    except Exception as e:\n        print(f\"\\n❌ 错误: {e}\")\n        print(\"\\n可能的原因:\")\n        print(\"  1. Milvus 服务未启动\")\n        print(\"  2. Milvus 连接配置错误\")\n        print(\"  3. 集合尚未创建\")\n        print(\"\\n检查方法:\")\n        print(f\"  - 确认 Milvus 运行在 {settings.MILVUS_HOST}:{settings.MILVUS_PORT}\")\n        print(f\"  - 检查 .env 文件中的 MILVUS_* 配置\")\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "backend/tests/check_news_embedding_status.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\n检查新闻的向量化状态\n\"\"\"\nimport sys\nimport os\nimport asyncio\n\n# 添加项目路径（backend 目录，使 app 包可导入）\nsys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\n\nfrom sqlalchemy import select, func\nfrom app.core.database import get_db\nfrom app.models.news import News\nfrom app.models.analysis import Analysis\n\nasync def main():\n    try:\n        async for db in get_db():\n            # 统计总体情况\n            total_result = await db.execute(select(func.count(News.id)))\n            total_news = total_result.scalar() or 0\n            \n            embedded_result = await db.execute(\n                select(func.count(News.id)).where(News.is_embedded == 1)\n            )\n            embedded_count = embedded_result.scalar() or 0\n            \n            analyzed_result = await db.execute(\n                select(func.count(News.id)).where(News.sentiment_score.isnot(None))\n            )\n            analyzed_count = analyzed_result.scalar() or 0\n            \n            # 查找已分析但未向量化的新闻\n            not_embedded_result = await db.execute(\n                select(News.id, News.title, News.sentiment_score)\n                .where(\n                    News.sentiment_score.isnot(None),\n                    News.is_embedded == 0\n                )\n                .order_by(News.id.desc())\n                .limit(10)\n            )\n            not_embedded_news = not_embedded_result.all()\n            \n            print(\"=\" * 60)\n            print(\"新闻向量化状态统计\")\n            print(\"=\" * 60)\n            print(f\"\\n📊 总体统计:\")\n            print(f\"  总新闻数: {total_news}\")\n            print(f\"  已分析新闻: {analyzed_count}\")\n            print(f\"  已向量化新闻: {embedded_count}\")\n            print(f\"  已分析但未向量化: {analyzed_count - embedded_count}\")\n            \n            if not_embedded_news:\n                print(f\"\\n⚠️  最近10条已分析但未向量化的新闻:\")\n                for news_id, title, sentiment_score in not_embedded_news:\n                    title_preview = title[:50] + \"...\" if len(title) > 50 else title\n                    print(f\"  - ID: {news_id}, 情感分数: {sentiment_score:.2f}\")\n                    print(f\"    标题: {title_preview}\")\n            else:\n                print(\"\\n✅ 所有已分析的新闻都已向量化\")\n            \n            print(\"\\n\" + \"=\" * 60)\n            print(\"💡 可能的原因:\")\n            print(\"  1. Embedding API 超时（20秒超时）\")\n            print(\"  2. Milvus 连接失败\")\n            print(\"  3. Embedding 服务配置错误\")\n            print(\"\\n🔧 解决方案:\")\n            print(\"  1. 检查后端日志中的 embedding 错误\")\n            print(\"  2. 确认 Milvus 服务正在运行\")\n            print(\"  3. 检查 embedding API 配置（百炼/OpenAI）\")\n            print(\"  4. 可以手动重新向量化这些新闻\")\n            print(\"=\" * 60)\n            \n    except Exception as e:\n        print(f\"\\n❌ 错误: {e}\")\n        import traceback\n        traceback.print_exc()\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n"
  },
  {
    "path": "backend/tests/financial/__init__.py",
    "content": "\"\"\"Financial module tests\"\"\"\n"
  },
  {
    "path": "backend/tests/financial/test_smoke_openbb_models.py",
    "content": "\"\"\"\n冒烟测试: Standard Models (P0-1, P0-2)\n\n验证:\n- NewsQueryParams, NewsData 模型可正常实例化\n- StockQueryParams, StockPriceData 模型可正常实例化\n- 字段验证逻辑正确\n- to_legacy_dict 兼容方法正常工作\n\n运行:\n    pytest -q -k \"smoke_openbb_models\"\n\"\"\"\nimport pytest\nfrom datetime import datetime\n\n\nclass TestNewsModels:\n    \"\"\"测试新闻相关模型\"\"\"\n\n    def test_news_query_params_basic(self):\n        \"\"\"测试 NewsQueryParams 基本实例化\"\"\"\n        from app.financial.models.news import NewsQueryParams\n\n        # 默认参数\n        params = NewsQueryParams()\n        assert params.limit == 50\n        assert params.keywords is None\n        assert params.stock_codes is None\n\n        # 自定义参数\n        params = NewsQueryParams(\n            keywords=[\"茅台\", \"白酒\"],\n            stock_codes=[\"600519\"],\n            limit=20\n        )\n        assert params.keywords == [\"茅台\", \"白酒\"]\n        assert params.stock_codes == [\"600519\"]\n        assert params.limit == 20\n\n    def test_news_query_params_validation(self):\n        \"\"\"测试 NewsQueryParams 字段验证\"\"\"\n        from app.financial.models.news import NewsQueryParams\n        from pydantic import ValidationError\n\n        # limit 边界测试\n        params = NewsQueryParams(limit=1)\n        assert params.limit == 1\n\n        params = NewsQueryParams(limit=500)\n        assert params.limit == 500\n\n        # limit 超出范围应报错\n        with pytest.raises(ValidationError):\n            NewsQueryParams(limit=0)\n\n        with pytest.raises(ValidationError):\n            NewsQueryParams(limit=501)\n\n    def test_news_data_basic(self):\n        \"\"\"测试 NewsData 基本实例化\"\"\"\n        from app.financial.models.news import NewsData, NewsSentiment\n\n        news = NewsData(\n            id=\"test123\",\n            title=\"测试新闻标题\",\n            content=\"这是测试新闻的正文内容...\",\n            source=\"sina\",\n            source_url=\"https://finance.sina.com.cn/test\",\n            publish_time=datetime(2024, 1, 1, 10, 30)\n        
)\n\n        assert news.id == \"test123\"\n        assert news.title == \"测试新闻标题\"\n        assert news.source == \"sina\"\n        assert news.sentiment is None  # 可选字段默认 None\n        assert news.stock_codes == []  # 默认空列表\n\n    def test_news_data_with_sentiment(self):\n        \"\"\"测试 NewsData 带情感标签\"\"\"\n        from app.financial.models.news import NewsData, NewsSentiment\n\n        news = NewsData(\n            id=\"test456\",\n            title=\"利好消息\",\n            content=\"公司业绩超预期...\",\n            source=\"sina\",\n            source_url=\"https://example.com\",\n            publish_time=datetime.now(),\n            sentiment=NewsSentiment.POSITIVE,\n            sentiment_score=0.85\n        )\n\n        assert news.sentiment == NewsSentiment.POSITIVE\n        assert news.sentiment_score == 0.85\n\n    def test_news_data_generate_id(self):\n        \"\"\"测试 NewsData.generate_id 方法\"\"\"\n        from app.financial.models.news import NewsData\n\n        url1 = \"https://finance.sina.com.cn/news/123\"\n        url2 = \"https://finance.sina.com.cn/news/456\"\n\n        id1 = NewsData.generate_id(url1)\n        id2 = NewsData.generate_id(url2)\n\n        # 相同 URL 生成相同 ID\n        assert id1 == NewsData.generate_id(url1)\n        # 不同 URL 生成不同 ID\n        assert id1 != id2\n        # ID 长度为 16\n        assert len(id1) == 16\n\n    def test_news_data_to_legacy_dict(self):\n        \"\"\"测试 NewsData.to_legacy_dict 兼容方法\"\"\"\n        from app.financial.models.news import NewsData\n\n        news = NewsData(\n            id=\"test789\",\n            title=\"测试标题\",\n            content=\"测试内容\",\n            source=\"sina\",\n            source_url=\"https://example.com/news\",\n            publish_time=datetime(2024, 6, 15, 14, 30),\n            author=\"记者\",\n            stock_codes=[\"SH600519\"]\n        )\n\n        legacy = news.to_legacy_dict()\n\n        # 验证字段映射\n        assert legacy[\"title\"] == \"测试标题\"\n        assert legacy[\"url\"] == 
\"https://example.com/news\"  # source_url → url\n        assert legacy[\"source\"] == \"sina\"\n        assert legacy[\"author\"] == \"记者\"\n        assert \"SH600519\" in legacy[\"stock_codes\"]\n\n\nclass TestStockModels:\n    \"\"\"测试股票相关模型\"\"\"\n\n    def test_stock_query_params_basic(self):\n        \"\"\"测试 StockQueryParams 基本实例化\"\"\"\n        from app.financial.models.stock import (\n            StockQueryParams, KlineInterval, AdjustType\n        )\n\n        # 最小参数\n        params = StockQueryParams(symbol=\"600519\")\n        assert params.symbol == \"600519\"\n        assert params.interval == KlineInterval.DAILY\n        assert params.adjust == AdjustType.QFQ\n        assert params.limit == 90\n\n        # 自定义参数\n        params = StockQueryParams(\n            symbol=\"SH600519\",\n            interval=KlineInterval.MIN_5,\n            adjust=AdjustType.HFQ,\n            limit=30\n        )\n        assert params.interval == KlineInterval.MIN_5\n        assert params.adjust == AdjustType.HFQ\n\n    def test_stock_price_data_basic(self):\n        \"\"\"测试 StockPriceData 基本实例化\"\"\"\n        from app.financial.models.stock import StockPriceData\n\n        price = StockPriceData(\n            symbol=\"600519\",\n            date=datetime(2024, 6, 15),\n            open=1500.0,\n            high=1520.0,\n            low=1490.0,\n            close=1510.0,\n            volume=1000000\n        )\n\n        assert price.symbol == \"600519\"\n        assert price.close == 1510.0\n        assert price.turnover is None  # 可选字段\n\n    def test_stock_price_data_to_legacy_dict(self):\n        \"\"\"测试 StockPriceData.to_legacy_dict 兼容方法\"\"\"\n        from app.financial.models.stock import StockPriceData\n\n        price = StockPriceData(\n            symbol=\"600519\",\n            date=datetime(2024, 6, 15, 10, 0, 0),\n            open=1500.0,\n            high=1520.0,\n            low=1490.0,\n            close=1510.0,\n            volume=1000000,\n            
change_percent=0.67\n        )\n\n        legacy = price.to_legacy_dict()\n\n        # 验证字段\n        assert legacy[\"date\"] == \"2024-06-15\"\n        assert legacy[\"close\"] == 1510.0\n        assert legacy[\"change_percent\"] == 0.67\n        assert \"timestamp\" in legacy  # 应包含毫秒时间戳\n\n    def test_kline_interval_enum(self):\n        \"\"\"测试 KlineInterval 枚举\"\"\"\n        from app.financial.models.stock import KlineInterval\n\n        assert KlineInterval.MIN_1.value == \"1m\"\n        assert KlineInterval.DAILY.value == \"1d\"\n        assert KlineInterval(\"1d\") == KlineInterval.DAILY\n\n    def test_adjust_type_enum(self):\n        \"\"\"测试 AdjustType 枚举\"\"\"\n        from app.financial.models.stock import AdjustType\n\n        assert AdjustType.QFQ.value == \"qfq\"\n        assert AdjustType.HFQ.value == \"hfq\"\n        assert AdjustType(\"none\") == AdjustType.NONE\n"
  },
  {
    "path": "backend/tests/financial/test_smoke_openbb_provider.py",
    "content": "\"\"\"\n冒烟测试: Provider & Registry (P0-3, P0-4)\n\n验证:\n- BaseFetcher 抽象类可被正确继承\n- BaseProvider 抽象类可被正确继承\n- ProviderRegistry 注册/获取/降级逻辑\n- SinaProvider 正确注册\n\n运行:\n    pytest -q -k \"smoke_openbb_provider\"\n\"\"\"\nimport pytest\nfrom typing import Dict, Any, List, Type\nfrom datetime import datetime\n\n\nclass TestBaseFetcherAbstraction:\n    \"\"\"测试 BaseFetcher 抽象\"\"\"\n\n    def test_fetcher_subclass_implementation(self):\n        \"\"\"测试 Fetcher 子类实现\"\"\"\n        from app.financial.providers.base import BaseFetcher\n        from app.financial.models.news import NewsQueryParams, NewsData\n\n        class MockNewsFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params: NewsQueryParams) -> Dict[str, Any]:\n                return {\"limit\": params.limit, \"keywords\": params.keywords}\n\n            async def extract_data(self, query: Dict[str, Any]) -> List[Dict]:\n                return [\n                    {\"title\": \"Test News\", \"content\": \"Content\", \"url\": \"http://test.com\"}\n                ]\n\n            def transform_data(self, raw_data: List[Dict], query: NewsQueryParams) -> List[NewsData]:\n                return [\n                    NewsData(\n                        id=f\"mock_{i}\",\n                        title=item[\"title\"],\n                        content=item[\"content\"],\n                        source=\"mock\",\n                        source_url=item[\"url\"],\n                        publish_time=datetime.now()\n                    )\n                    for i, item in enumerate(raw_data)\n                ]\n\n        fetcher = MockNewsFetcher()\n\n        # 测试 transform_query\n        params = NewsQueryParams(limit=10, keywords=[\"test\"])\n        query = fetcher.transform_query(params)\n        assert query[\"limit\"] == 10\n        assert query[\"keywords\"] == [\"test\"]\n\n    
@pytest.mark.asyncio\n    async def test_fetcher_fetch_pipeline(self):\n        \"\"\"测试 Fetcher 完整 TET Pipeline\"\"\"\n        from app.financial.providers.base import BaseFetcher\n        from app.financial.models.news import NewsQueryParams, NewsData\n\n        class MockFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params):\n                return {\"count\": params.limit}\n\n            async def extract_data(self, query):\n                return [{\"title\": f\"News {i}\"} for i in range(query[\"count\"])]\n\n            def transform_data(self, raw_data, query):\n                return [\n                    NewsData(\n                        id=f\"id_{i}\",\n                        title=item[\"title\"],\n                        content=\"content\",\n                        source=\"mock\",\n                        source_url=\"http://mock.com\",\n                        publish_time=datetime.now()\n                    )\n                    for i, item in enumerate(raw_data)\n                ]\n\n        fetcher = MockFetcher()\n        params = NewsQueryParams(limit=5)\n        results = await fetcher.fetch(params)\n\n        assert len(results) == 5\n        assert all(isinstance(r, NewsData) for r in results)\n\n\nclass TestBaseProviderAbstraction:\n    \"\"\"测试 BaseProvider 抽象\"\"\"\n\n    def test_provider_subclass_implementation(self):\n        \"\"\"测试 Provider 子类实现\"\"\"\n        from app.financial.providers.base import BaseProvider, BaseFetcher, ProviderInfo\n        from app.financial.models.news import NewsQueryParams, NewsData\n\n        class MockFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params):\n                return {}\n\n            async def extract_data(self, query):\n                return 
[]\n\n            def transform_data(self, raw_data, query):\n                return []\n\n        class MockProvider(BaseProvider):\n            @property\n            def info(self) -> ProviderInfo:\n                return ProviderInfo(\n                    name=\"mock\",\n                    display_name=\"Mock Provider\",\n                    description=\"For testing\",\n                    priority=99\n                )\n\n            @property\n            def fetchers(self) -> Dict[str, Type[BaseFetcher]]:\n                return {\"news\": MockFetcher}\n\n        provider = MockProvider()\n\n        assert provider.info.name == \"mock\"\n        assert provider.supports(\"news\") is True\n        assert provider.supports(\"stock_price\") is False\n\n        fetcher = provider.get_fetcher(\"news\")\n        assert fetcher is not None\n        assert isinstance(fetcher, MockFetcher)\n\n\nclass TestProviderRegistry:\n    \"\"\"测试 ProviderRegistry\"\"\"\n\n    def test_registry_singleton(self):\n        \"\"\"测试 Registry 单例模式\"\"\"\n        from app.financial.registry import ProviderRegistry\n\n        r1 = ProviderRegistry()\n        r2 = ProviderRegistry()\n        assert r1 is r2\n\n    def test_registry_register_and_list(self):\n        \"\"\"测试注册和列出 Provider\"\"\"\n        from app.financial.registry import reset_registry\n        from app.financial.providers.base import BaseProvider, ProviderInfo, BaseFetcher\n        from typing import Dict, Type\n\n        registry = reset_registry()\n\n        class MockProvider1(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"p1\", display_name=\"P1\", description=\"\", priority=2)\n\n            @property\n            def fetchers(self):\n                return {}\n\n        class MockProvider2(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"p2\", display_name=\"P2\", description=\"\", 
priority=1)\n\n            @property\n            def fetchers(self):\n                return {}\n\n        registry.register(MockProvider1())\n        registry.register(MockProvider2())\n\n        providers = registry.list_providers()\n        assert \"p1\" in providers\n        assert \"p2\" in providers\n        # p2 优先级更高，应该在前面\n        assert providers.index(\"p2\") < providers.index(\"p1\")\n\n    def test_registry_get_fetcher_auto_fallback(self):\n        \"\"\"测试获取 Fetcher 自动降级\"\"\"\n        from app.financial.registry import reset_registry, FetcherNotFoundError\n        from app.financial.providers.base import BaseProvider, ProviderInfo, BaseFetcher\n        from app.financial.models.news import NewsQueryParams, NewsData\n        from typing import Dict, Type\n        from datetime import datetime\n\n        registry = reset_registry()\n\n        class MockFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params):\n                return {}\n\n            async def extract_data(self, query):\n                return []\n\n            def transform_data(self, raw_data, query):\n                return []\n\n        class ProviderA(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"a\", display_name=\"A\", description=\"\", priority=1)\n\n            @property\n            def fetchers(self):\n                return {\"news\": MockFetcher}\n\n        class ProviderB(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"b\", display_name=\"B\", description=\"\", priority=2)\n\n            @property\n            def fetchers(self):\n                return {\"news\": MockFetcher, \"stock\": MockFetcher}\n\n        registry.register(ProviderA())\n        registry.register(ProviderB())\n\n        # 获取 news：应该返回 ProviderA 的 (优先级更高)\n  
      fetcher = registry.get_fetcher(\"news\")\n        assert fetcher is not None\n\n        # 获取 stock：只有 ProviderB 支持\n        fetcher = registry.get_fetcher(\"stock\")\n        assert fetcher is not None\n\n        # 获取不存在的类型\n        with pytest.raises(FetcherNotFoundError):\n            registry.get_fetcher(\"nonexistent\")\n\n    def test_registry_get_fetcher_by_name(self):\n        \"\"\"测试指定 Provider 名称获取 Fetcher\"\"\"\n        from app.financial.registry import reset_registry, ProviderNotFoundError\n        from app.financial.providers.base import BaseProvider, ProviderInfo, BaseFetcher\n        from app.financial.models.news import NewsQueryParams, NewsData\n\n        registry = reset_registry()\n\n        class MockFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params):\n                return {}\n\n            async def extract_data(self, query):\n                return []\n\n            def transform_data(self, raw_data, query):\n                return []\n\n        class MyProvider(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"my\", display_name=\"My\", description=\"\")\n\n            @property\n            def fetchers(self):\n                return {\"news\": MockFetcher}\n\n        registry.register(MyProvider())\n\n        # 指定存在的 Provider\n        fetcher = registry.get_fetcher(\"news\", provider=\"my\")\n        assert fetcher is not None\n\n        # 指定不存在的 Provider\n        with pytest.raises(ProviderNotFoundError):\n            registry.get_fetcher(\"news\", provider=\"nonexistent\")\n\n\nclass TestSinaProvider:\n    \"\"\"测试 SinaProvider\"\"\"\n\n    def test_sina_provider_info(self):\n        \"\"\"测试 SinaProvider 元信息\"\"\"\n        from app.financial.providers.sina import SinaProvider\n\n        provider = SinaProvider()\n\n        assert 
provider.info.name == \"sina\"\n        assert provider.info.display_name == \"新浪财经\"\n        assert provider.supports(\"news\") is True\n\n    def test_sina_provider_get_news_fetcher(self):\n        \"\"\"测试获取 SinaNewsFetcher\"\"\"\n        from app.financial.providers.sina import SinaProvider\n        from app.financial.providers.sina.fetchers.news import SinaNewsFetcher\n\n        provider = SinaProvider()\n        fetcher = provider.get_fetcher(\"news\")\n\n        assert fetcher is not None\n        assert isinstance(fetcher, SinaNewsFetcher)\n\n    def test_sina_news_fetcher_transform_query(self):\n        \"\"\"测试 SinaNewsFetcher.transform_query\"\"\"\n        from app.financial.providers.sina.fetchers.news import SinaNewsFetcher\n        from app.financial.models.news import NewsQueryParams\n\n        fetcher = SinaNewsFetcher()\n\n        # 无股票代码\n        params = NewsQueryParams(limit=10)\n        query = fetcher.transform_query(params)\n        assert query[\"limit\"] == 10\n        assert \"base_url\" in query\n\n        # 有股票代码\n        params = NewsQueryParams(stock_codes=[\"600519\"], limit=20)\n        query = fetcher.transform_query(params)\n        assert \"stock_urls\" in query\n        assert len(query[\"stock_urls\"]) == 1\n        assert \"sh600519\" in query[\"stock_urls\"][0].lower()\n"
  },
  {
    "path": "backend/tests/financial/test_smoke_openbb_tools.py",
    "content": "\"\"\"\n冒烟测试: Financial Tools (P1-2)\n\n验证:\n- FinancialNewsTool 可正常实例化\n- Tool 在无 Provider 时返回错误而非崩溃\n- Tool 正确调用 Registry\n\n运行:\n    pytest -q -k \"smoke_openbb_tools\"\n\"\"\"\nimport pytest\nfrom unittest.mock import patch, AsyncMock, MagicMock\nfrom datetime import datetime\n\n\nclass TestFinancialNewsTool:\n    \"\"\"测试 FinancialNewsTool\"\"\"\n\n    def test_tool_instantiation(self):\n        \"\"\"测试工具实例化\"\"\"\n        from app.financial.tools import FinancialNewsTool\n\n        tool = FinancialNewsTool()\n\n        assert tool.name == \"financial_news\"\n        assert \"金融新闻\" in tool.description or \"news\" in tool.description.lower()\n\n    def test_tool_has_required_methods(self):\n        \"\"\"测试工具具有必要方法\"\"\"\n        from app.financial.tools import FinancialNewsTool\n\n        tool = FinancialNewsTool()\n\n        assert hasattr(tool, \"execute\")\n        assert hasattr(tool, \"aexecute\")\n        assert callable(tool.execute)\n        assert callable(tool.aexecute)\n\n    @pytest.mark.asyncio\n    async def test_tool_returns_error_when_no_provider(self):\n        \"\"\"测试无 Provider 时返回错误\"\"\"\n        from app.financial.tools import FinancialNewsTool\n        from app.financial.registry import reset_registry\n\n        # 清空 Registry\n        reset_registry()\n\n        tool = FinancialNewsTool()\n        result = await tool.aexecute(limit=10)\n\n        # 应返回错误而非崩溃\n        assert result[\"success\"] is False\n        assert \"error\" in result\n\n    @pytest.mark.asyncio\n    async def test_tool_with_mocked_fetcher(self):\n        \"\"\"测试工具与 Mock Fetcher 集成\"\"\"\n        from app.financial.tools import FinancialNewsTool\n        from app.financial.registry import reset_registry, get_registry\n        from app.financial.providers.base import BaseProvider, ProviderInfo, BaseFetcher\n        from app.financial.models.news import NewsQueryParams, NewsData\n\n        registry = reset_registry()\n\n        # 创建 Mock Fetcher\n     
   class MockFetcher(BaseFetcher[NewsQueryParams, NewsData]):\n            query_model = NewsQueryParams\n            data_model = NewsData\n\n            def transform_query(self, params):\n                return {\"limit\": params.limit}\n\n            async def extract_data(self, query):\n                return [\n                    {\"title\": \"Mock News 1\", \"content\": \"Content 1\", \"url\": \"http://mock1.com\"},\n                    {\"title\": \"Mock News 2\", \"content\": \"Content 2\", \"url\": \"http://mock2.com\"},\n                ]\n\n            def transform_data(self, raw_data, query):\n                return [\n                    NewsData(\n                        id=f\"mock_{i}\",\n                        title=item[\"title\"],\n                        content=item[\"content\"],\n                        source=\"mock\",\n                        source_url=item[\"url\"],\n                        publish_time=datetime.now()\n                    )\n                    for i, item in enumerate(raw_data)\n                ]\n\n        class MockProvider(BaseProvider):\n            @property\n            def info(self):\n                return ProviderInfo(name=\"mock\", display_name=\"Mock\", description=\"\")\n\n            @property\n            def fetchers(self):\n                return {\"news\": MockFetcher}\n\n        registry.register(MockProvider())\n\n        tool = FinancialNewsTool()\n        result = await tool.aexecute(limit=10)\n\n        assert result[\"success\"] is True\n        assert result[\"count\"] == 2\n        assert len(result[\"data\"]) == 2\n        assert result[\"data\"][0][\"title\"] == \"Mock News 1\"\n\n\nclass TestStockPriceTool:\n    \"\"\"测试 StockPriceTool\"\"\"\n\n    def test_tool_instantiation(self):\n        \"\"\"测试工具实例化\"\"\"\n        from app.financial.tools import StockPriceTool\n\n        tool = StockPriceTool()\n\n        assert tool.name == \"stock_price\"\n        assert \"K线\" in tool.description 
or \"price\" in tool.description.lower()\n\n    @pytest.mark.asyncio\n    async def test_tool_returns_error_for_invalid_interval(self):\n        \"\"\"Returns an error for an invalid interval instead of crashing\"\"\"\n        from app.financial.tools import StockPriceTool\n\n        tool = StockPriceTool()\n        result = await tool.aexecute(symbol=\"600519\", interval=\"invalid_interval\")\n\n        assert result[\"success\"] is False\n        assert \"error\" in result\n\n    @pytest.mark.asyncio\n    async def test_tool_returns_error_when_no_provider(self):\n        \"\"\"Returns an error when no Provider is registered\"\"\"\n        from app.financial.tools import StockPriceTool\n        from app.financial.registry import reset_registry\n\n        reset_registry()\n\n        tool = StockPriceTool()\n        result = await tool.aexecute(symbol=\"600519\")\n\n        assert result[\"success\"] is False\n        assert \"error\" in result\n\n\nclass TestSetupDefaultProviders:\n    \"\"\"Default provider setup\"\"\"\n\n    def test_setup_registers_sina(self):\n        \"\"\"setup_default_providers registers SinaProvider\"\"\"\n        from app.financial.registry import reset_registry\n        from app.financial.tools import setup_default_providers\n\n        registry = reset_registry()\n        assert \"sina\" not in registry.list_providers()\n\n        setup_default_providers()\n\n        assert \"sina\" in registry.list_providers()\n\n    def test_setup_idempotent(self):\n        \"\"\"setup_default_providers is idempotent\"\"\"\n        from app.financial.registry import reset_registry, get_registry\n        from app.financial.tools import setup_default_providers\n\n        reset_registry()\n\n        # Calling it repeatedly must not raise or double-register\n        setup_default_providers()\n        setup_default_providers()\n        setup_default_providers()\n\n        registry = get_registry()\n        assert registry.list_providers().count(\"sina\") == 1\n"
  },
  {
    "path": "backend/tests/manual_vectorize.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\n手动向量化新闻（用于修复未向量化的新闻）\n\"\"\"\nimport sys\nimport os\nimport asyncio\nimport logging\n\n# 添加项目路径\nsys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))\n\n# 先加载环境变量（避免循环导入）\nfrom dotenv import load_dotenv\nfrom pathlib import Path\nenv_path = Path(__file__).parent / \".env\"\nload_dotenv(env_path)\n\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\nasync def vectorize_news_manually(news_id: int):\n    \"\"\"手动向量化单个新闻\"\"\"\n    # 直接使用 SQLAlchemy 创建连接，避免循环导入\n    from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker\n    from sqlalchemy import text\n    from starlette.concurrency import run_in_threadpool\n    \n    # 从环境变量构建数据库 URL\n    POSTGRES_USER = os.getenv(\"POSTGRES_USER\", \"postgres\")\n    POSTGRES_PASSWORD = os.getenv(\"POSTGRES_PASSWORD\", \"postgres\")\n    POSTGRES_HOST = os.getenv(\"POSTGRES_HOST\", \"localhost\")\n    POSTGRES_PORT = os.getenv(\"POSTGRES_PORT\", \"5432\")\n    POSTGRES_DB = os.getenv(\"POSTGRES_DB\", \"finnews_db\")\n    DATABASE_URL = f\"postgresql+asyncpg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}\"\n    \n    # 创建引擎和会话\n    engine = create_async_engine(DATABASE_URL, echo=False)\n    AsyncSessionLocal = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)\n    \n    try:\n        # 使用原始 SQL 查询，避免导入模型\n        async with AsyncSessionLocal() as db:\n            # 查询新闻数据\n            result = await db.execute(\n                text(\"SELECT id, title, content, is_embedded FROM news WHERE id = :news_id\"),\n                {\"news_id\": news_id}\n            )\n            row = result.first()\n            \n            if not row:\n                print(f\"❌ 新闻 {news_id} 不存在\")\n                return False\n            \n            news_id_db, title, content, is_embedded = row\n            \n            if is_embedded == 1:\n                print(f\"ℹ️  
新闻 {news_id} 已经向量化过了\")\n                return True\n            \n            print(f\"🔄 开始向量化新闻 {news_id}: {title[:50]}...\")\n            \n            # 获取服务（这些服务不依赖数据库连接）\n            from app.services.embedding_service import get_embedding_service\n            from app.storage.vector_storage import get_vector_storage\n            \n            embedding_service = get_embedding_service()\n            vector_storage = get_vector_storage()\n            \n            # 组合文本\n            text_to_embed = f\"{title}\\n{content[:1000]}\"\n            \n            # 生成向量（增加超时时间到60秒）\n            print(\"  📡 调用 embedding API...\")\n            embedding = await asyncio.wait_for(\n                embedding_service.aembed_text(text_to_embed),\n                timeout=60.0  # 增加到60秒\n            )\n            print(f\"  ✅ 向量生成成功，维度: {len(embedding)}\")\n            \n            # 存储到 Milvus（设置超时，避免卡住）\n            print(\"  💾 存储到 Milvus...\")\n            try:\n                await asyncio.wait_for(\n                    run_in_threadpool(\n                        vector_storage.store_embedding,\n                        news_id=news_id,\n                        embedding=embedding,\n                        text=text_to_embed\n                    ),\n                    timeout=30.0  # 30秒超时\n                )\n                print(\"  ✅ 存储成功\")\n            except asyncio.TimeoutError:\n                print(\"  ⚠️  存储超时（30秒），但数据可能已插入\")\n                # 即使超时，数据可能已经插入，只是flush还没完成\n            \n            # 更新数据库标志\n            await db.execute(\n                text(\"UPDATE news SET is_embedded = 1 WHERE id = :news_id\"),\n                {\"news_id\": news_id}\n            )\n            await db.commit()\n            print(f\"  ✅ 更新数据库标志成功\")\n            \n            print(f\"✅ 新闻 {news_id} 向量化完成！\")\n            return True\n            \n    except asyncio.TimeoutError:\n        print(f\"❌ 新闻 {news_id} 向量化超时（60秒）\")\n        return False\n    except 
Exception as e:\n        print(f\"❌ 新闻 {news_id} 向量化失败: {e}\")\n        import traceback\n        traceback.print_exc()\n        return False\n    finally:\n        await engine.dispose()\n\nasync def vectorize_all_pending():\n    \"\"\"向量化所有未向量化但已分析的新闻\"\"\"\n    # 直接使用 SQLAlchemy 创建连接，避免循环导入\n    from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker\n    from sqlalchemy import text\n    \n    # 从环境变量构建数据库 URL\n    POSTGRES_USER = os.getenv(\"POSTGRES_USER\", \"postgres\")\n    POSTGRES_PASSWORD = os.getenv(\"POSTGRES_PASSWORD\", \"postgres\")\n    POSTGRES_HOST = os.getenv(\"POSTGRES_HOST\", \"localhost\")\n    POSTGRES_PORT = os.getenv(\"POSTGRES_PORT\", \"5432\")\n    POSTGRES_DB = os.getenv(\"POSTGRES_DB\", \"finnews_db\")\n    DATABASE_URL = f\"postgresql+asyncpg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}\"\n    \n    # 创建引擎和会话\n    engine = create_async_engine(DATABASE_URL, echo=False)\n    AsyncSessionLocal = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)\n    \n    try:\n        print(\"🔍 正在查找需要向量化的新闻...\")\n        async with AsyncSessionLocal() as db:\n            # 使用原始 SQL 查询，避免导入模型\n            result = await db.execute(\n                text(\"\"\"\n                    SELECT id, title \n                    FROM news \n                    WHERE sentiment_score IS NOT NULL \n                    AND is_embedded = 0 \n                    ORDER BY id DESC\n                \"\"\")\n            )\n            pending_news = result.all()\n            \n            print(f\"📊 查询完成，找到 {len(pending_news) if pending_news else 0} 条记录\")\n            \n            if not pending_news:\n                print(\"✅ 没有需要向量化的新闻\")\n                return\n            \n            print(f\"📊 找到 {len(pending_news)} 条需要向量化的新闻\")\n            print(\"=\" * 60)\n            \n            success_count = 0\n            failed_count = 0\n            \n            # 
Process items one by one, with timeout protection on each step\n            for news_id, title in pending_news:\n                print(f\"\\nProcessing news {news_id}...\")\n                if await vectorize_news_manually(news_id):\n                    success_count += 1\n                else:\n                    failed_count += 1\n            \n            print(\"\\n\" + \"=\" * 60)\n            print(\"📊 Vectorization summary:\")\n            print(f\"  Succeeded: {success_count}\")\n            print(f\"  Failed: {failed_count}\")\n            print(\"=\" * 60)\n    finally:\n        await engine.dispose()\n\nasync def main_async():\n    print(\"🚀 Script started...\")\n    \n    if len(sys.argv) > 1:\n        try:\n            # Vectorize the given news ID\n            news_id = int(sys.argv[1])\n            print(f\"📌 Vectorizing news: {news_id}\")\n            await vectorize_news_manually(news_id)\n        except ValueError:\n            # Not a number; it may be the --no-wait flag\n            if sys.argv[1] == \"--no-wait\":\n                print(\"📌 Vectorizing all pending news (skipping the wait)\")\n                await vectorize_all_pending()\n            else:\n                print(f\"❌ Invalid argument: {sys.argv[1]}\")\n                print(\"Usage: python manual_vectorize.py [news_id|--no-wait]\")\n    else:\n        # Vectorize all pending news\n        print(\"⚠️  This will vectorize all analyzed but not yet vectorized news\")\n        print(\"   Press Ctrl+C to cancel, or wait 5 seconds to continue...\")\n        print(\"   (pass --no-wait to skip the wait)\")\n        try:\n            await asyncio.sleep(5)\n        except KeyboardInterrupt:\n            print(\"\\nCancelled\")\n            sys.exit(0)\n        \n        await vectorize_all_pending()\n    \n    print(\"✅ Script finished\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main_async())\n"
  },
  {
    "path": "backend/tests/test_alpha_mining/__init__.py",
    "content": "\"\"\"Alpha Mining test module\"\"\"\n"
  },
  {
    "path": "backend/tests/test_alpha_mining/test_integration_p2.py",
    "content": "\"\"\"\nP2 集成测试 - Alpha Mining 完整集成\n\n测试覆盖：\n- F18: QuantitativeAgent 集成\n- F19: REST API 端点\n- 完整工作流测试\n\"\"\"\n\nimport pytest\nimport sys\nfrom pathlib import Path\nfrom unittest.mock import AsyncMock, MagicMock, patch\nimport asyncio\n\n# 添加项目路径\nproject_root = Path(__file__).parent.parent.parent\nsys.path.insert(0, str(project_root))\n\n\n# ============================================================================\n# F18: QuantitativeAgent 集成测试\n# ============================================================================\n\nclass TestQuantitativeAgent:\n    \"\"\"量化分析智能体测试\"\"\"\n    \n    def test_agent_import(self):\n        \"\"\"测试 Agent 可导入\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent, create_quantitative_agent\n        \n        assert QuantitativeAgent is not None\n        assert create_quantitative_agent is not None\n    \n    def test_agent_init_without_llm(self):\n        \"\"\"测试不使用 LLM 初始化\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(\n            llm_provider=None,\n            enable_alpha_mining=True\n        )\n        \n        assert agent.enable_alpha_mining is True\n        assert agent._alpha_mining_initialized is False\n    \n    def test_agent_lazy_init(self):\n        \"\"\"测试延迟初始化\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(enable_alpha_mining=True)\n        \n        # 初始时未初始化\n        assert agent._generator is None\n        assert agent._vm is None\n        \n        # 调用 _init_alpha_mining\n        agent._init_alpha_mining()\n        \n        # 现在应该已初始化\n        assert agent._alpha_mining_initialized is True\n        assert agent._generator is not None\n        assert agent._vm is not None\n    \n    @pytest.mark.asyncio\n    async def test_agent_mine_factors(self):\n        \"\"\"测试因子挖掘功能\"\"\"\n        from 
app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(enable_alpha_mining=True)\n        \n        result = await agent._mine_factors(\n            stock_code=\"000001\",\n            stock_name=\"测试股票\",\n            market_data=None,\n            sentiment_data=None\n        )\n        \n        assert \"factors\" in result\n        assert \"stats\" in result\n        assert isinstance(result[\"factors\"], list)\n    \n    @pytest.mark.asyncio\n    async def test_agent_full_analysis(self):\n        \"\"\"测试完整分析流程（无 LLM）\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(\n            llm_provider=None,\n            enable_alpha_mining=True\n        )\n        \n        result = await agent.analyze(\n            stock_code=\"000001\",\n            stock_name=\"平安银行\",\n            market_data=None,\n            sentiment_data=None,\n            context=\"\"\n        )\n        \n        assert result[\"success\"] is True\n        assert result[\"stock_code\"] == \"000001\"\n        assert \"factors_discovered\" in result\n    \n    @pytest.mark.asyncio\n    async def test_agent_with_mock_llm(self):\n        \"\"\"测试使用 Mock LLM\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        # 创建 Mock LLM\n        mock_llm = AsyncMock()\n        mock_llm.chat = AsyncMock(return_value='{\"trend\": \"上涨\", \"confidence\": 0.7}')\n        \n        agent = QuantitativeAgent(\n            llm_provider=mock_llm,\n            enable_alpha_mining=True\n        )\n        \n        # 准备模拟数据\n        import torch\n        market_data = {\n            \"close\": torch.randn(100).abs() * 100 + 50,\n            \"volume\": torch.randn(100).abs() * 1e6\n        }\n        \n        result = await agent.analyze(\n            stock_code=\"000001\",\n            stock_name=\"平安银行\",\n            market_data=market_data,\n            
context=\"test context\"\n        )\n        \n        assert result[\"success\"] is True\n        assert isinstance(result[\"factors_discovered\"], list)\n    \n    def test_agent_evaluate_factor(self):\n        \"\"\"Factor evaluation\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(enable_alpha_mining=True)\n        \n        # Run the coroutine from this sync test (get_event_loop/run_until_complete is deprecated for this)\n        result = asyncio.run(\n            agent.evaluate_factor(\"ADD RET VOL\")\n        )\n        \n        # May succeed or fail, depending on formula parsing\n        assert \"success\" in result\n    \n    def test_agent_get_best_factors(self):\n        \"\"\"Getting the best factors\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        \n        agent = QuantitativeAgent(enable_alpha_mining=True)\n        \n        # Manually seed some factors\n        agent.discovered_factors = [\n            {\"formula_str\": \"ADD(RET, VOL)\", \"sortino\": 1.5},\n            {\"formula_str\": \"MUL(RET, MA5(VOL))\", \"sortino\": 0.8},\n            {\"formula_str\": \"SUB(RET, DELTA1(VOL))\", \"sortino\": 2.0},\n        ]\n        \n        best = agent.get_best_factors(top_k=2)\n        \n        assert len(best) == 2\n        assert best[0][\"sortino\"] == 2.0  # highest score first\n\n\n# ============================================================================\n# F19: REST API tests\n# ============================================================================\n\nclass TestAlphaMiningAPI:\n    \"\"\"Alpha Mining REST API tests\"\"\"\n    \n    def test_api_module_import(self):\n        \"\"\"API module is importable\"\"\"\n        from app.api.v1.alpha_mining import router\n        \n        assert router is not None\n        assert router.prefix == \"/alpha-mining\"\n    \n    def test_api_routes_exist(self):\n        \"\"\"API routes exist\"\"\"\n        from app.api.v1.alpha_mining import router\n        \n        routes = [r.path for r in router.routes]\n        \n        assert \"/mine\" in routes\n  
      assert \"/evaluate\" in routes\n        assert \"/generate\" in routes\n        assert \"/factors\" in routes\n        assert \"/status/{task_id}\" in routes\n        assert \"/operators\" in routes\n    \n    @pytest.fixture\n    def test_client(self):\n        \"\"\"创建测试客户端\"\"\"\n        try:\n            from fastapi.testclient import TestClient\n            from app.main import app\n            return TestClient(app)\n        except ImportError:\n            pytest.skip(\"FastAPI test client not available\")\n    \n    def test_get_operators(self, test_client):\n        \"\"\"测试获取操作符列表\"\"\"\n        if test_client is None:\n            pytest.skip(\"Test client not available\")\n        \n        response = test_client.get(\"/api/v1/alpha-mining/operators\")\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data[\"success\"] is True\n        assert \"features\" in data\n        assert \"operators\" in data\n    \n    def test_get_factors_empty(self, test_client):\n        \"\"\"测试获取因子列表（空）\"\"\"\n        if test_client is None:\n            pytest.skip(\"Test client not available\")\n        \n        response = test_client.get(\"/api/v1/alpha-mining/factors\")\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data[\"success\"] is True\n        assert \"factors\" in data\n    \n    def test_evaluate_factor(self, test_client):\n        \"\"\"测试因子评估端点\"\"\"\n        if test_client is None:\n            pytest.skip(\"Test client not available\")\n        \n        response = test_client.post(\n            \"/api/v1/alpha-mining/evaluate\",\n            json={\"formula\": \"RET\"}\n        )\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert \"success\" in data\n    \n    def test_generate_factors(self, test_client):\n        \"\"\"测试因子生成端点\"\"\"\n        if test_client is None:\n            
pytest.skip(\"Test client not available\")\n        \n        response = test_client.post(\n            \"/api/v1/alpha-mining/generate\",\n            json={\"batch_size\": 5, \"max_len\": 6}\n        )\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data[\"success\"] is True\n        assert \"factors\" in data\n\n\n# ============================================================================\n# 完整工作流测试\n# ============================================================================\n\nclass TestFullWorkflow:\n    \"\"\"完整工作流测试\"\"\"\n    \n    @pytest.mark.asyncio\n    async def test_end_to_end_factor_discovery(self):\n        \"\"\"端到端因子发现流程\"\"\"\n        import torch\n        \n        # 1. 准备数据\n        from app.alpha_mining import (\n            AlphaMiningConfig,\n            FactorVocab,\n            FactorVM,\n            AlphaGenerator,\n            AlphaTrainer,\n            FactorEvaluator,\n            MarketFeatureBuilder,\n            SentimentFeatureBuilder,\n            generate_mock_data\n        )\n        \n        # 2. 初始化组件\n        config = AlphaMiningConfig(\n            d_model=32,\n            num_layers=1,\n            batch_size=8,\n            max_seq_len=6\n        )\n        vocab = FactorVocab()\n        vm = FactorVM(vocab=vocab)\n        generator = AlphaGenerator(vocab=vocab, config=config)\n        evaluator = FactorEvaluator(config=config)\n        \n        # 3. 生成模拟数据\n        features, returns = generate_mock_data(\n            num_samples=30,\n            num_features=6,\n            time_steps=100,\n            seed=42\n        )\n        \n        # 4. 
Create the trainer and train\n        trainer = AlphaTrainer(\n            generator=generator,\n            vocab=vocab,\n            config=config\n        )\n        \n        result = trainer.train(\n            features=features,\n            returns=returns,\n            num_steps=5,  # few steps, just for the test\n            progress_bar=False\n        )\n        \n        assert result[\"total_steps\"] == 5\n        assert \"best_score\" in result\n        \n        # 5. Validate the best factor (it may still be invalid; vm.execute returns None then)\n        if result[\"best_formula\"]:\n            factor = vm.execute(result[\"best_formula\"], features)\n            if factor is not None:\n                metrics = evaluator.evaluate(factor, returns)\n                assert \"sortino_ratio\" in metrics\n        \n        print(\"\\n✅ End-to-end factor discovery test passed!\")\n    \n    @pytest.mark.asyncio\n    async def test_quantitative_agent_workflow(self):\n        \"\"\"QuantitativeAgent workflow test\"\"\"\n        from app.agents.quantitative_agent import QuantitativeAgent\n        import torch\n        \n        # Create the agent\n        agent = QuantitativeAgent(enable_alpha_mining=True)\n        \n        # Prepare data\n        market_data = {\n            \"close\": torch.randn(252).abs() * 100 + 50,\n            \"volume\": torch.randn(252).abs() * 1e6\n        }\n        \n        sentiment_data = {\n            \"sentiment\": torch.randn(252).tolist(),\n            \"news_count\": torch.abs(torch.randn(252)).tolist()\n        }\n        \n        # Run the analysis\n        result = await agent.analyze(\n            stock_code=\"600000\",\n            stock_name=\"浦发银行\",\n            market_data=market_data,\n            sentiment_data=sentiment_data,\n            context=\"Bank stock analysis\"\n        )\n        \n        assert result[\"success\"] is True\n        assert result[\"stock_code\"] == \"600000\"\n        assert \"factors_discovered\" in result\n        \n        print(\"\\n✅ QuantitativeAgent workflow test 
passed!\")\n        print(f\"   - Factors discovered: {len(result['factors_discovered'])}\")\n    \n    def test_api_and_agent_integration(self):\n        \"\"\"API 和 Agent 集成测试\"\"\"\n        from app.agents.quantitative_agent import create_quantitative_agent\n        \n        # 创建智能体\n        agent = create_quantitative_agent(enable_alpha_mining=True)\n        \n        # 验证组件\n        agent._init_alpha_mining()\n        \n        assert agent._generator is not None\n        assert agent._vm is not None\n        assert agent._evaluator is not None\n        \n        # 验证因子生成\n        formulas, _ = agent._generator.generate(batch_size=3, max_len=5)\n        \n        assert len(formulas) == 3\n        \n        # 验证因子执行\n        from app.alpha_mining import generate_mock_data\n        features, returns = generate_mock_data(num_samples=10, time_steps=50)\n        \n        valid_count = 0\n        for formula in formulas:\n            factor = agent._vm.execute(formula, features)\n            if factor is not None:\n                valid_count += 1\n        \n        print(f\"\\n✅ API-Agent integration test passed!\")\n        print(f\"   - Generated: {len(formulas)}, Valid: {valid_count}\")\n\n\n# ============================================================================\n# 性能测试\n# ============================================================================\n\nclass TestPerformance:\n    \"\"\"性能测试\"\"\"\n    \n    def test_generator_speed(self):\n        \"\"\"测试生成器速度\"\"\"\n        import time\n        from app.alpha_mining import AlphaGenerator, AlphaMiningConfig\n        \n        config = AlphaMiningConfig(d_model=64, num_layers=2)\n        generator = AlphaGenerator(config=config)\n        \n        # 预热\n        generator.generate(batch_size=10, max_len=8)\n        \n        # 计时\n        start = time.time()\n        for _ in range(10):\n            generator.generate(batch_size=100, max_len=8)\n        elapsed = time.time() - start\n        \n        
avg_time = elapsed / 10\n        print(f\"\\n📊 Generator speed: {avg_time*1000:.2f}ms per batch (100 factors)\")\n        \n        assert avg_time < 5.0  # should finish within 5 seconds\n    \n    def test_vm_execution_speed(self):\n        \"\"\"VM execution speed\"\"\"\n        import time\n        from app.alpha_mining import FactorVM, FactorVocab, generate_mock_data\n        \n        vm = FactorVM()\n        vocab = FactorVocab()\n        features, _ = generate_mock_data(num_samples=100, time_steps=252)\n        \n        # Build test formulas\n        formulas = [\n            [0],  # RET\n            [0, 1, vocab.name_to_token(\"ADD\")],  # ADD(RET, VOL)\n            [0, vocab.name_to_token(\"MA5\")],  # MA5(RET)\n        ]\n        \n        # Time 100 passes over the formulas\n        start = time.time()\n        for _ in range(100):\n            for formula in formulas:\n                vm.execute(formula, features)\n        elapsed = time.time() - start\n        \n        avg_time = elapsed / (100 * len(formulas))\n        print(f\"\\n📊 VM execution speed: {avg_time*1000:.3f}ms per formula\")\n        \n        assert avg_time < 0.1  # should average under 100 ms per formula\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "backend/tests/test_alpha_mining/test_smoke_p0.py",
    "content": "\"\"\"\nP0 冒烟测试 - Alpha Mining 核心机制\n\n测试覆盖：\n- F02: 配置模块\n- F03-F04: 操作符和时序函数\n- F05: 词汇表\n- F06-F07: FactorVM 执行和解码\n- F08-F09: AlphaGenerator 模型和生成\n- F10: AlphaTrainer 训练\n- F11: 模拟数据生成\n\"\"\"\n\nimport pytest\nimport torch\nimport sys\nfrom pathlib import Path\n\n# 添加项目路径\nproject_root = Path(__file__).parent.parent.parent\nsys.path.insert(0, str(project_root))\n\nfrom app.alpha_mining.config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom app.alpha_mining.dsl.ops import (\n    OPS_CONFIG, ts_delay, ts_delta, ts_mean, ts_std, get_op_names\n)\nfrom app.alpha_mining.dsl.vocab import FactorVocab, FEATURES, DEFAULT_VOCAB\nfrom app.alpha_mining.vm.factor_vm import FactorVM\nfrom app.alpha_mining.model.alpha_generator import AlphaGenerator\nfrom app.alpha_mining.model.trainer import AlphaTrainer\nfrom app.alpha_mining.utils import generate_mock_data\n\n\n# ============================================================================\n# F02: 配置模块测试\n# ============================================================================\n\nclass TestConfig:\n    \"\"\"配置模块测试\"\"\"\n    \n    def test_default_config_exists(self):\n        \"\"\"测试默认配置存在\"\"\"\n        assert DEFAULT_CONFIG is not None\n        assert isinstance(DEFAULT_CONFIG, AlphaMiningConfig)\n    \n    def test_config_device(self):\n        \"\"\"测试设备配置\"\"\"\n        config = AlphaMiningConfig()\n        assert config.device in [\"cpu\", \"cuda\", \"mps\"]\n        assert isinstance(config.torch_device, torch.device)\n    \n    def test_config_features(self):\n        \"\"\"测试特征配置\"\"\"\n        config = AlphaMiningConfig()\n        assert len(config.market_features) >= 4\n        assert len(config.all_features) >= 4\n        assert config.num_features > 0\n\n\n# ============================================================================\n# F03-F04: 操作符测试\n# ============================================================================\n\nclass TestOps:\n    \"\"\"操作符测试\"\"\"\n    \n    
@pytest.fixture\n    def sample_tensor(self):\n        \"\"\"创建测试张量\"\"\"\n        return torch.randn(10, 100)  # [batch=10, time=100]\n    \n    def test_ts_delay(self, sample_tensor):\n        \"\"\"测试时序延迟\"\"\"\n        result = ts_delay(sample_tensor, d=1)\n        assert result.shape == sample_tensor.shape\n        # 第一列应该是 0\n        assert (result[:, 0] == 0).all()\n        # 后续应该是原始值的延迟\n        assert torch.allclose(result[:, 1:], sample_tensor[:, :-1])\n    \n    def test_ts_delta(self, sample_tensor):\n        \"\"\"测试时序差分\"\"\"\n        result = ts_delta(sample_tensor, d=1)\n        assert result.shape == sample_tensor.shape\n        # 差分 = x[t] - x[t-1]\n        expected = sample_tensor - ts_delay(sample_tensor, 1)\n        assert torch.allclose(result, expected)\n    \n    def test_ts_mean(self, sample_tensor):\n        \"\"\"测试滑动平均\"\"\"\n        result = ts_mean(sample_tensor, window=5)\n        assert result.shape == sample_tensor.shape\n        # 值应该在合理范围内\n        assert not torch.isnan(result).any()\n    \n    def test_ts_std(self, sample_tensor):\n        \"\"\"测试滑动标准差\"\"\"\n        result = ts_std(sample_tensor, window=5)\n        assert result.shape == sample_tensor.shape\n        # 标准差应该非负\n        assert (result >= 0).all()\n    \n    def test_ops_config_complete(self):\n        \"\"\"测试操作符配置完整性\"\"\"\n        assert len(OPS_CONFIG) >= 10\n        for name, func, arity in OPS_CONFIG:\n            assert isinstance(name, str)\n            assert callable(func)\n            assert arity in [1, 2, 3]\n    \n    def test_all_ops_executable(self, sample_tensor):\n        \"\"\"测试所有操作符可执行\"\"\"\n        y = torch.randn_like(sample_tensor)\n        z = torch.randn_like(sample_tensor)\n        \n        for name, func, arity in OPS_CONFIG:\n            try:\n                if arity == 1:\n                    result = func(sample_tensor)\n                elif arity == 2:\n                    result = func(sample_tensor, y)\n                elif 
arity == 3:\n                    result = func(sample_tensor, y, z)\n                \n                assert result.shape == sample_tensor.shape, f\"{name} shape mismatch\"\n                assert not torch.isnan(result).all(), f\"{name} all NaN\"\n            except Exception as e:\n                pytest.fail(f\"Operator {name} failed: {e}\")\n\n\n# ============================================================================\n# F05: 词汇表测试\n# ============================================================================\n\nclass TestVocab:\n    \"\"\"词汇表测试\"\"\"\n    \n    def test_default_vocab_exists(self):\n        \"\"\"测试默认词汇表存在\"\"\"\n        assert DEFAULT_VOCAB is not None\n        assert DEFAULT_VOCAB.vocab_size > 0\n    \n    def test_vocab_token_mapping(self):\n        \"\"\"测试 token 映射\"\"\"\n        vocab = FactorVocab()\n        \n        # 测试特征映射\n        assert vocab.token_to_name(0) == FEATURES[0]\n        assert vocab.name_to_token(FEATURES[0]) == 0\n        \n        # 测试操作符映射\n        op_names = get_op_names()\n        first_op_token = vocab.num_features\n        assert vocab.token_to_name(first_op_token) == op_names[0]\n    \n    def test_vocab_is_feature_operator(self):\n        \"\"\"测试特征/操作符判断\"\"\"\n        vocab = FactorVocab()\n        \n        # 特征 token\n        assert vocab.is_feature(0)\n        assert not vocab.is_operator(0)\n        \n        # 操作符 token\n        op_token = vocab.num_features\n        assert vocab.is_operator(op_token)\n        assert not vocab.is_feature(op_token)\n    \n    def test_vocab_get_operator_arity(self):\n        \"\"\"测试获取操作符参数数量\"\"\"\n        vocab = FactorVocab()\n        \n        for i, (name, func, arity) in enumerate(OPS_CONFIG):\n            token = vocab.num_features + i\n            assert vocab.get_operator_arity(token) == arity\n\n\n# ============================================================================\n# F06-F07: FactorVM 测试\n# 
============================================================================\n\nclass TestFactorVM:\n    \"\"\"Factor executor tests\"\"\"\n    \n    @pytest.fixture\n    def vm(self):\n        \"\"\"Create a VM instance\"\"\"\n        return FactorVM()\n    \n    @pytest.fixture\n    def features(self):\n        \"\"\"Create test features\"\"\"\n        # [batch=10, num_features=6, time=100]\n        return torch.randn(10, 6, 100)\n    \n    def test_vm_execute_simple(self, vm, features):\n        \"\"\"Test simple expression execution\"\"\"\n        # Take only the first feature\n        formula = [0]  # RET\n        result = vm.execute(formula, features)\n        \n        assert result is not None\n        assert result.shape == (10, 100)\n        assert torch.allclose(result, features[:, 0, :])\n    \n    def test_vm_execute_binary_op(self, vm, features):\n        \"\"\"Test a binary operation\"\"\"\n        vocab = vm.vocab\n        add_token = vocab.name_to_token(\"ADD\")\n        \n        # ADD(RET, VOL) = features[0] + features[1]\n        formula = [0, 1, add_token]\n        result = vm.execute(formula, features)\n        \n        assert result is not None\n        expected = features[:, 0, :] + features[:, 1, :]\n        assert torch.allclose(result, expected)\n    \n    def test_vm_execute_unary_op(self, vm, features):\n        \"\"\"Test a unary operation\"\"\"\n        vocab = vm.vocab\n        neg_token = vocab.name_to_token(\"NEG\")\n        \n        # NEG(RET) = -features[0]\n        formula = [0, neg_token]\n        result = vm.execute(formula, features)\n        \n        assert result is not None\n        expected = -features[:, 0, :]\n        assert torch.allclose(result, expected)\n    \n    def test_vm_execute_invalid_formula(self, vm, features):\n        \"\"\"Test an invalid formula\"\"\"\n        vocab = vm.vocab\n        add_token = vocab.name_to_token(\"ADD\")\n        \n        # ADD with only one operand (invalid)\n        formula = [0, add_token]\n        result = vm.execute(formula, features)\n        \n        assert result is None  # Should return None\n    \n    def 
test_vm_decode_simple(self, vm):\n        \"\"\"Test expression decoding\"\"\"\n        # RET\n        assert \"RET\" in vm.decode([0])\n        \n        # ADD(RET, VOL)\n        vocab = vm.vocab\n        add_token = vocab.name_to_token(\"ADD\")\n        decoded = vm.decode([0, 1, add_token])\n        assert \"ADD\" in decoded\n        assert \"RET\" in decoded\n    \n    def test_vm_validate(self, vm):\n        \"\"\"Test expression validation\"\"\"\n        vocab = vm.vocab\n        add_token = vocab.name_to_token(\"ADD\")\n        neg_token = vocab.name_to_token(\"NEG\")\n        \n        # Valid formulas\n        assert vm.validate([0])  # RET\n        assert vm.validate([0, neg_token])  # NEG(RET)\n        assert vm.validate([0, 1, add_token])  # ADD(RET, VOL)\n        \n        # Invalid formulas\n        assert not vm.validate([add_token])  # ADD without args\n        assert not vm.validate([0, 1])  # Two features, no op\n\n\n# ============================================================================\n# F08-F09: AlphaGenerator tests\n# ============================================================================\n\nclass TestAlphaGenerator:\n    \"\"\"Factor generator tests\"\"\"\n    \n    @pytest.fixture\n    def generator(self):\n        \"\"\"Create a generator instance\"\"\"\n        config = AlphaMiningConfig(d_model=32, num_layers=1)  # small model for testing\n        return AlphaGenerator(config=config)\n    \n    def test_generator_init(self, generator):\n        \"\"\"Test generator initialization\"\"\"\n        assert generator.vocab_size > 0\n        assert generator.d_model > 0\n    \n    def test_generator_forward(self, generator):\n        \"\"\"Test the forward pass\"\"\"\n        batch_size = 4\n        seq_len = 5\n        tokens = torch.zeros((batch_size, seq_len), dtype=torch.long)\n        \n        logits, value = generator(tokens)\n        \n        assert logits.shape == (batch_size, generator.vocab_size)\n        assert value.shape == (batch_size, 1)\n    \n    def test_generator_generate(self, generator):\n        \"\"\"Test generation\"\"\"\n        batch_size = 8\n        max_len 
= 6\n        \n        formulas, log_probs = generator.generate(\n            batch_size=batch_size,\n            max_len=max_len\n        )\n        \n        assert len(formulas) == batch_size\n        assert all(len(f) == max_len for f in formulas)\n        assert len(log_probs) == batch_size\n    \n    def test_generator_generate_with_training(self, generator):\n        \"\"\"Test training-mode generation\"\"\"\n        batch_size = 4\n        max_len = 6\n        \n        sequences, log_probs, values = generator.generate_with_training(\n            batch_size=batch_size,\n            max_len=max_len\n        )\n        \n        assert sequences.shape == (batch_size, max_len)\n        assert len(log_probs) == max_len\n        assert len(values) == max_len\n\n\n# ============================================================================\n# F10: AlphaTrainer tests\n# ============================================================================\n\nclass TestAlphaTrainer:\n    \"\"\"Trainer tests\"\"\"\n    \n    @pytest.fixture\n    def trainer(self):\n        \"\"\"Create a trainer instance\"\"\"\n        config = AlphaMiningConfig(\n            d_model=32,\n            num_layers=1,\n            batch_size=16,\n            max_seq_len=6\n        )\n        return AlphaTrainer(config=config)\n    \n    @pytest.fixture\n    def mock_data(self):\n        \"\"\"Create mock data\"\"\"\n        return generate_mock_data(\n            num_samples=20,\n            num_features=6,\n            time_steps=50,\n            seed=42\n        )\n    \n    def test_trainer_init(self, trainer):\n        \"\"\"Test trainer initialization\"\"\"\n        assert trainer.generator is not None\n        assert trainer.vm is not None\n        assert trainer.best_score == -float('inf')\n    \n    def test_trainer_train_step(self, trainer, mock_data):\n        \"\"\"Test a single training step\"\"\"\n        features, returns = mock_data\n        \n        metrics = trainer.train_step(features, returns)\n        \n        assert \"loss\" in metrics\n        assert \"avg_reward\" in 
metrics\n        assert \"valid_ratio\" in metrics\n        assert trainer.step_count == 1\n    \n    def test_trainer_short_training(self, trainer, mock_data):\n        \"\"\"Test a short training run (3 steps)\"\"\"\n        features, returns = mock_data\n        \n        result = trainer.train(\n            features, returns,\n            num_steps=3,\n            progress_bar=False\n        )\n        \n        assert result[\"total_steps\"] == 3\n        assert \"best_score\" in result\n        assert len(trainer.training_history) == 3\n\n\n# ============================================================================\n# F11: Mock data tests\n# ============================================================================\n\nclass TestMockData:\n    \"\"\"Mock data generation tests\"\"\"\n    \n    def test_generate_mock_data_shape(self):\n        \"\"\"Test mock data shapes\"\"\"\n        features, returns = generate_mock_data(\n            num_samples=50,\n            num_features=6,\n            time_steps=100\n        )\n        \n        assert features.shape == (50, 6, 100)\n        assert returns.shape == (50, 100)\n    \n    def test_generate_mock_data_no_nan(self):\n        \"\"\"Test that mock data contains no NaN\"\"\"\n        features, returns = generate_mock_data()\n        \n        assert not torch.isnan(features).any()\n        assert not torch.isnan(returns).any()\n    \n    def test_generate_mock_data_reproducible(self):\n        \"\"\"Test that mock data is reproducible\"\"\"\n        f1, r1 = generate_mock_data(seed=42)\n        f2, r2 = generate_mock_data(seed=42)\n        \n        assert torch.allclose(f1, f2)\n        assert torch.allclose(r1, r2)\n\n\n# ============================================================================\n# End-to-end smoke test\n# ============================================================================\n\nclass TestEndToEnd:\n    \"\"\"End-to-end tests\"\"\"\n    \n    def test_full_pipeline_smoke(self):\n        \"\"\"Full-pipeline smoke test\"\"\"\n        # 1. 
Create the config\n        config = AlphaMiningConfig(\n            d_model=32,\n            num_layers=1,\n            batch_size=8,\n            max_seq_len=6\n        )\n        \n        # 2. Create the components\n        vocab = FactorVocab()\n        vm = FactorVM(vocab=vocab)\n        generator = AlphaGenerator(vocab=vocab, config=config)\n        trainer = AlphaTrainer(generator=generator, vocab=vocab, config=config)\n        \n        # 3. Generate mock data\n        features, returns = generate_mock_data(\n            num_samples=10,\n            num_features=6,\n            time_steps=30,\n            seed=42\n        )\n        \n        # 4. Generate factor expressions\n        formulas, _ = generator.generate(batch_size=4, max_len=5)\n        \n        # 5. Execute the expressions\n        valid_count = 0\n        for formula in formulas:\n            result = vm.execute(formula, features)\n            if result is not None:\n                valid_count += 1\n                decoded = vm.decode(formula)\n                assert isinstance(decoded, str)\n        \n        # 6. Train (1 step)\n        metrics = trainer.train_step(features, returns)\n        assert metrics[\"step\"] == 1\n        \n        print(f\"\\n✅ End-to-end smoke test passed!\")\n        print(f\"   - Valid formulas: {valid_count}/{len(formulas)}\")\n        print(f\"   - Avg reward: {metrics['avg_reward']:.4f}\")\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "backend/tests/test_alpha_mining/test_smoke_p1.py",
"content": "\"\"\"\nP1 smoke tests - Alpha Mining data integration\n\nTest coverage:\n- F13: MarketFeatureBuilder\n- F14: SentimentFeatureBuilder\n- F15: FactorEvaluator\n- F16: AlphaMiningTool\n\"\"\"\n\nimport pytest\nimport torch\nimport pandas as pd\nimport numpy as np\nimport sys\nfrom pathlib import Path\nfrom datetime import datetime, timedelta\n\n# Add the project root to the import path\nproject_root = Path(__file__).parent.parent.parent\nsys.path.insert(0, str(project_root))\n\nfrom app.alpha_mining.config import AlphaMiningConfig, DEFAULT_CONFIG\nfrom app.alpha_mining.features.market import MarketFeatureBuilder\nfrom app.alpha_mining.features.sentiment import SentimentFeatureBuilder\nfrom app.alpha_mining.backtest.evaluator import FactorEvaluator\nfrom app.alpha_mining.utils import generate_mock_data\n\n\n# ============================================================================\n# F13: MarketFeatureBuilder tests\n# ============================================================================\n\nclass TestMarketFeatureBuilder:\n    \"\"\"Market feature builder tests\"\"\"\n    \n    @pytest.fixture\n    def builder(self):\n        return MarketFeatureBuilder()\n    \n    @pytest.fixture\n    def sample_df(self):\n        \"\"\"Create a sample DataFrame\"\"\"\n        dates = pd.date_range(\"2024-01-01\", periods=100, freq=\"D\")\n        np.random.seed(42)\n        \n        return pd.DataFrame({\n            \"date\": dates,\n            \"close\": 100 * np.exp(np.cumsum(np.random.randn(100) * 0.02)),\n            \"volume\": np.abs(np.random.randn(100)) * 1e6 + 1e6,\n            \"turnover\": np.abs(np.random.randn(100)) * 0.05,\n        }).set_index(\"date\")\n    \n    def test_build_from_dataframe(self, builder, sample_df):\n        \"\"\"Test building features from a DataFrame\"\"\"\n        features = builder.build(sample_df)\n        \n        assert features.dim() == 3  # [batch, features, time]\n        assert features.size(0) == 1  # batch=1\n        assert features.size(1) == 4  # 4 features\n        assert features.size(2) == 100  # time_steps\n    \n  
  def test_build_from_tensors(self, builder):\n        \"\"\"Test building features from a dict of tensors\"\"\"\n        data = {\n            \"close\": torch.randn(10, 100).abs() * 100 + 50,\n            \"volume\": torch.randn(10, 100).abs() * 1e6,\n        }\n        \n        features = builder.build(data)\n        \n        assert features.shape == (10, 4, 100)\n    \n    def test_features_normalized(self, builder, sample_df):\n        \"\"\"Test that features are properly normalized\"\"\"\n        features = builder.build(sample_df)\n        \n        # Values should fall within a reasonable range\n        assert features.max() <= 5.0\n        assert features.min() >= -5.0\n    \n    def test_no_nan_in_features(self, builder, sample_df):\n        \"\"\"Test that features contain no NaN\"\"\"\n        features = builder.build(sample_df)\n        \n        assert not torch.isnan(features).any()\n        assert not torch.isinf(features).any()\n    \n    def test_feature_names(self, builder):\n        \"\"\"Test feature names\"\"\"\n        names = builder.get_feature_names()\n        \n        assert \"RET\" in names\n        assert \"VOL\" in names\n        assert \"VOLUME_CHG\" in names\n        assert \"TURNOVER\" in names\n\n\n# ============================================================================\n# F14: SentimentFeatureBuilder tests\n# ============================================================================\n\nclass TestSentimentFeatureBuilder:\n    \"\"\"Sentiment feature builder tests\"\"\"\n    \n    @pytest.fixture\n    def builder(self):\n        return SentimentFeatureBuilder()\n    \n    @pytest.fixture\n    def sample_df(self):\n        \"\"\"Create a sample DataFrame\"\"\"\n        dates = pd.date_range(\"2024-01-01\", periods=50, freq=\"D\")\n        np.random.seed(42)\n        \n        return pd.DataFrame({\n            \"date\": dates,\n            \"sentiment\": np.random.randn(50) * 0.3,\n            \"news_count\": np.abs(np.random.randn(50)) * 5 + 1,\n        }).set_index(\"date\")\n    \n    def test_build_from_dataframe(self, builder, sample_df):\n        \"\"\"Test building features from a DataFrame\"\"\"\n        
features = builder.build(sample_df)\n        \n        assert features.dim() == 3\n        assert features.size(0) == 1\n        assert features.size(1) == 2  # SENTIMENT, NEWS_COUNT\n        assert features.size(2) == 50\n    \n    def test_build_from_dict(self, builder):\n        \"\"\"Test building features from a dict\"\"\"\n        data = {\n            \"sentiment\": [0.1, -0.2, 0.3, 0.0, -0.1],\n            \"news_count\": [5, 3, 8, 2, 4]\n        }\n        \n        features = builder.build(data)\n        \n        assert features.shape == (1, 2, 5)\n    \n    def test_build_from_list(self, builder):\n        \"\"\"Test building features from a list\"\"\"\n        data = [\n            {\"sentiment\": 0.1, \"news_count\": 5},\n            {\"sentiment\": -0.2, \"news_count\": 3},\n            {\"sentiment\": 0.3, \"news_count\": 8},\n        ]\n        \n        features = builder.build(data)\n        \n        assert features.shape == (1, 2, 3)\n    \n    def test_time_alignment(self, builder):\n        \"\"\"Test time-step alignment\"\"\"\n        data = {\"sentiment\": [0.1, 0.2, 0.3], \"news_count\": [1, 2, 3]}\n        \n        features = builder.build(data, time_steps=10)\n        \n        assert features.size(2) == 10\n    \n    def test_sentiment_decay(self, builder):\n        \"\"\"Test sentiment decay\"\"\"\n        # Create a sentiment series with a single clear spike\n        data = {\"sentiment\": [0, 0, 0, 1.0, 0, 0, 0], \"news_count\": [1] * 7}\n        \n        features = builder.build(data)\n        \n        # The decayed values should gradually decrease\n        sentiment = features[0, 0, :]\n        assert sentiment[4] < sentiment[3]  # decay starts after the spike\n    \n    def test_combine_with_market(self, builder):\n        \"\"\"Test merging with market features\"\"\"\n        market = torch.randn(2, 4, 100)  # [batch, 4 features, time]\n        sentiment = torch.randn(2, 2, 100)  # [batch, 2 features, time]\n        \n        combined = builder.combine_with_market(market, sentiment)\n        \n        assert combined.shape == (2, 6, 100)\n\n\n# 
============================================================================\n# F15: FactorEvaluator tests\n# ============================================================================\n\nclass TestFactorEvaluator:\n    \"\"\"Factor evaluator tests\"\"\"\n    \n    @pytest.fixture\n    def evaluator(self):\n        return FactorEvaluator()\n    \n    @pytest.fixture\n    def sample_data(self):\n        \"\"\"Create sample data\"\"\"\n        np.random.seed(42)\n        time_steps = 252\n        \n        # Simulated returns\n        returns = torch.randn(time_steps) * 0.02\n        \n        # Simulated factor (partially correlated with the returns)\n        noise = torch.randn(time_steps) * 0.5\n        factor = returns + noise\n        \n        return factor, returns\n    \n    def test_evaluate_basic(self, evaluator, sample_data):\n        \"\"\"Test basic evaluation\"\"\"\n        factor, returns = sample_data\n        \n        metrics = evaluator.evaluate(factor, returns)\n        \n        assert \"sortino_ratio\" in metrics\n        assert \"sharpe_ratio\" in metrics\n        assert \"ic\" in metrics\n        assert \"rank_ic\" in metrics\n        assert \"max_drawdown\" in metrics\n        assert \"turnover\" in metrics\n    \n    def test_evaluate_batch(self, evaluator):\n        \"\"\"Test batch evaluation\"\"\"\n        factor = torch.randn(10, 100)\n        returns = torch.randn(10, 100) * 0.02\n        \n        metrics = evaluator.evaluate(factor, returns)\n        \n        # Should return both the mean and the standard deviation\n        assert \"sortino_ratio\" in metrics\n        assert \"sortino_ratio_std\" in metrics\n    \n    def test_get_reward(self, evaluator, sample_data):\n        \"\"\"Test getting the RL reward\"\"\"\n        factor, returns = sample_data\n        \n        reward = evaluator.get_reward(factor, returns)\n        \n        assert isinstance(reward, float)\n        assert not np.isnan(reward)\n    \n    def test_good_factor_high_ic(self, evaluator):\n        \"\"\"Test that a good factor has a high IC\"\"\"\n        # Create a factor highly correlated with the returns\n        returns = torch.randn(252) * 0.02\n        factor = returns * 0.8 + 
torch.randn(252) * 0.01  # ~80% correlated\n        \n        metrics = evaluator.evaluate(factor, returns)\n        \n        # IC should be significantly positive\n        assert metrics[\"ic\"] > 0.3\n    \n    def test_random_factor_low_ic(self, evaluator):\n        \"\"\"Test that a random factor's IC is near 0\"\"\"\n        returns = torch.randn(252) * 0.02\n        factor = torch.randn(252)  # purely random\n        \n        metrics = evaluator.evaluate(factor, returns)\n        \n        # IC should be near 0\n        assert abs(metrics[\"ic\"]) < 0.3\n    \n    def test_compare_factors(self, evaluator):\n        \"\"\"Test factor comparison\"\"\"\n        returns = torch.randn(252) * 0.02\n        \n        # Create factors of different quality\n        good_factor = returns * 0.8 + torch.randn(252) * 0.01\n        bad_factor = torch.randn(252)\n        \n        results = evaluator.compare_factors(\n            [good_factor, bad_factor],\n            returns,\n            [\"good\", \"bad\"]\n        )\n        \n        assert \"good\" in results\n        assert \"bad\" in results\n        assert results[\"good\"][\"ic\"] > results[\"bad\"][\"ic\"]\n    \n    def test_rank_factors(self, evaluator):\n        \"\"\"Test factor ranking\"\"\"\n        returns = torch.randn(100) * 0.02\n        \n        factors = [torch.randn(100) for _ in range(5)]\n        \n        ranking = evaluator.rank_factors(factors, returns)\n        \n        assert len(ranking) == 5\n        # Check the ranking is in descending order\n        scores = [score for _, score in ranking]\n        assert scores == sorted(scores, reverse=True)\n\n\n# ============================================================================\n# F16: AlphaMiningTool tests (requires the AgenticX dependency)\n# ============================================================================\n\nclass TestAlphaMiningToolImport:\n    \"\"\"AlphaMiningTool import tests\"\"\"\n    \n    def test_import_tool(self):\n        \"\"\"Test that the tool can be imported\"\"\"\n        try:\n            from app.alpha_mining.tools.alpha_mining_tool import AlphaMiningTool\n            assert AlphaMiningTool is not None\n        except 
ImportError as e:\n            # Skip if AgenticX is unavailable\n            pytest.skip(f\"AgenticX not available: {e}\")\n    \n    def test_tool_metadata(self):\n        \"\"\"Test tool metadata\"\"\"\n        try:\n            from app.alpha_mining.tools.alpha_mining_tool import AlphaMiningTool\n            \n            tool = AlphaMiningTool()\n            \n            assert tool.name == \"alpha_mining\"\n            assert \"量化因子\" in tool.description\n            assert len(tool.parameters) > 0\n        except ImportError:\n            pytest.skip(\"AgenticX not available\")\n\n\n# ============================================================================\n# End-to-end P1 tests\n# ============================================================================\n\nclass TestP1EndToEnd:\n    \"\"\"P1 end-to-end tests\"\"\"\n    \n    def test_full_pipeline_with_real_features(self):\n        \"\"\"Full pipeline using real features\"\"\"\n        # 1. Prepare market data\n        dates = pd.date_range(\"2024-01-01\", periods=252, freq=\"D\")\n        np.random.seed(42)\n        \n        market_df = pd.DataFrame({\n            \"close\": 100 * np.exp(np.cumsum(np.random.randn(252) * 0.02)),\n            \"volume\": np.abs(np.random.randn(252)) * 1e6 + 1e6,\n            \"turnover\": np.abs(np.random.randn(252)) * 0.05,\n        }, index=dates)\n        \n        # 2. Build market features\n        market_builder = MarketFeatureBuilder()\n        market_features = market_builder.build(market_df)\n        \n        assert market_features.shape == (1, 4, 252)\n        \n        # 3. Prepare sentiment data\n        sentiment_data = {\n            \"sentiment\": np.random.randn(252) * 0.3,\n            \"news_count\": np.abs(np.random.randn(252)) * 5 + 1\n        }\n        \n        # 4. Build sentiment features\n        sentiment_builder = SentimentFeatureBuilder()\n        sentiment_features = sentiment_builder.build(sentiment_data, time_steps=252)\n        \n        assert sentiment_features.shape == (1, 2, 252)\n        \n        # 5. 
Merge the features\n        combined = sentiment_builder.combine_with_market(\n            market_features, sentiment_features\n        )\n        \n        assert combined.shape == (1, 6, 252)\n        \n        # 6. Import the generator and VM\n        from app.alpha_mining.model.alpha_generator import AlphaGenerator\n        from app.alpha_mining.vm.factor_vm import FactorVM\n        \n        config = AlphaMiningConfig(d_model=32, num_layers=1, max_seq_len=6)\n        generator = AlphaGenerator(config=config)\n        vm = FactorVM()\n        \n        # 7. Generate and execute factors\n        formulas, _ = generator.generate(batch_size=5, max_len=5)\n        \n        valid_factors = []\n        for formula in formulas:\n            factor = vm.execute(formula, combined)\n            if factor is not None and factor.std() > 1e-6:\n                valid_factors.append(factor)\n        \n        # 8. Evaluate the factors\n        if valid_factors:\n            evaluator = FactorEvaluator()\n            returns = market_features[:, 0, :]  # use RET as the returns\n            \n            for factor in valid_factors:\n                metrics = evaluator.evaluate(factor, returns)\n                assert \"sortino_ratio\" in metrics\n        \n        print(f\"\\n✅ P1 End-to-end test passed!\")\n        print(f\"   - Market features: {market_features.shape}\")\n        print(f\"   - Sentiment features: {sentiment_features.shape}\")\n        print(f\"   - Combined features: {combined.shape}\")\n        print(f\"   - Valid factors generated: {len(valid_factors)}/{len(formulas)}\")\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "backend/tests/test_smoke_alpha_mining.py",
"content": "\"\"\"\nAlpha Mining module smoke tests\n\nTest coverage:\n1. DSL operator execution\n2. Factor virtual machine (FactorVM)\n3. Factor generation model (AlphaGenerator)\n4. RL trainer (AlphaTrainer)\n5. Factor evaluator (FactorEvaluator)\n6. REST API endpoints\n\"\"\"\n\nimport pytest\nimport torch\nimport numpy as np\nfrom typing import List\n\n# Make the app package importable: insert the backend directory (not backend/app),\n# since the imports below are qualified as app.alpha_mining.*\nimport sys\nfrom pathlib import Path\nsys.path.insert(0, str(Path(__file__).parent.parent))\n\n\nclass TestDSLOperators:\n    \"\"\"Test DSL operators\"\"\"\n    \n    def test_ops_config_exists(self):\n        \"\"\"The operator config exists\"\"\"\n        from app.alpha_mining.dsl.ops import OPS_CONFIG, get_op_names\n        \n        assert len(OPS_CONFIG) == 21, f\"Expected 21 operators, got {len(OPS_CONFIG)}\"\n        \n        names = get_op_names()\n        assert 'ADD' in names\n        assert 'SUB' in names\n        assert 'MUL' in names\n        assert 'DIV' in names\n        assert 'MA5' in names\n        assert 'DELAY1' in names\n    \n    def test_arithmetic_ops(self):\n        \"\"\"Arithmetic operator tests\"\"\"\n        from app.alpha_mining.dsl.ops import get_op_by_name\n        \n        x = torch.tensor([1.0, 2.0, 3.0])\n        y = torch.tensor([2.0, 3.0, 4.0])\n        \n        # ADD\n        add_fn, add_arity = get_op_by_name('ADD')\n        assert add_arity == 2\n        result = add_fn(x, y)\n        assert torch.allclose(result, torch.tensor([3.0, 5.0, 7.0]))\n        \n        # MUL\n        mul_fn, mul_arity = get_op_by_name('MUL')\n        result = mul_fn(x, y)\n        assert torch.allclose(result, torch.tensor([2.0, 6.0, 12.0]))\n        \n        # DIV (safe division)\n        div_fn, _ = get_op_by_name('DIV')\n        result = div_fn(x, y)\n        assert result.shape == x.shape\n        assert not torch.any(torch.isinf(result))\n    \n    def test_timeseries_ops(self):\n        \"\"\"Time-series operator tests\"\"\"\n        from app.alpha_mining.dsl.ops import ts_delay, ts_mean, ts_std\n        \n        x = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])\n        \n        # Delay\n        delayed = 
ts_delay(x, 1)\n        assert delayed[0, 0] == 0  # padded with 0\n        assert delayed[0, 1] == 1  # the original first value\n        \n        # MA\n        ma = ts_mean(x, 3)\n        assert ma.shape == x.shape\n        \n        # STD\n        std = ts_std(x, 3)\n        assert std.shape == x.shape\n\n\nclass TestFactorVM:\n    \"\"\"Test the factor virtual machine\"\"\"\n    \n    @pytest.fixture\n    def vm(self):\n        from app.alpha_mining.vm.factor_vm import FactorVM\n        from app.alpha_mining.dsl.vocab import DEFAULT_VOCAB\n        return FactorVM(vocab=DEFAULT_VOCAB)\n    \n    @pytest.fixture\n    def sample_features(self):\n        \"\"\"[batch=2, features=4, time=10]\"\"\"\n        return torch.randn(2, 4, 10)\n    \n    def test_execute_simple_formula(self, vm, sample_features):\n        \"\"\"Execute a simple factor expression\"\"\"\n        # RET + VOL (assuming RET=0, VOL=1, ADD=some token)\n        formula = [0, 1, vm.vocab.name_to_token('ADD')]\n        \n        result = vm.execute(formula, sample_features)\n        assert result is not None\n        assert result.shape == (2, 10)  # [batch, time]\n    \n    def test_execute_invalid_formula(self, vm, sample_features):\n        \"\"\"Invalid expressions return None; a bare feature is still valid\"\"\"\n        # A single feature with no operator\n        formula = [0]\n        result = vm.execute(formula, sample_features)\n        # A lone operand should be returned as-is (valid)\n        assert result is not None\n        \n        # Operator with too few operands\n        formula = [vm.vocab.name_to_token('ADD')]  # binary operator with no operands\n        result = vm.execute(formula, sample_features)\n        assert result is None\n    \n    def test_decode_formula(self, vm):\n        \"\"\"Decode a factor expression to a string\"\"\"\n        formula = [0, 1, vm.vocab.name_to_token('ADD')]\n        decoded = vm.decode(formula)\n        assert decoded is not None\n        assert 'ADD' in decoded or '+' in decoded\n\n\nclass TestAlphaGenerator:\n    \"\"\"Test the factor generation model\"\"\"\n    \n    @pytest.fixture\n    def generator(self):\n        from app.alpha_mining.model.alpha_generator import AlphaGenerator\n        from 
app.alpha_mining.dsl.vocab import DEFAULT_VOCAB\n        from app.alpha_mining.config import AlphaMiningConfig\n        \n        config = AlphaMiningConfig()\n        return AlphaGenerator(vocab=DEFAULT_VOCAB, config=config)\n    \n    def test_generate_batch(self, generator):\n        \"\"\"Generate a batch of factor expressions\"\"\"\n        formulas, log_probs = generator.generate(batch_size=5, max_len=8)\n        \n        assert len(formulas) == 5\n        for formula in formulas:\n            assert len(formula) <= 8\n            assert all(isinstance(t, int) for t in formula)\n    \n    def test_generate_with_training(self, generator):\n        \"\"\"Training-mode generation\"\"\"\n        sequences, log_probs_list, values = generator.generate_with_training(\n            batch_size=3, device='cpu'\n        )\n        \n        assert sequences.shape[0] == 3\n        assert len(log_probs_list) > 0\n\n\nclass TestAlphaTrainer:\n    \"\"\"Test the RL trainer\"\"\"\n    \n    @pytest.fixture\n    def trainer(self):\n        from app.alpha_mining.model.trainer import AlphaTrainer\n        from app.alpha_mining.config import AlphaMiningConfig\n        \n        config = AlphaMiningConfig()\n        config.batch_size = 8\n        return AlphaTrainer(config=config)\n    \n    @pytest.fixture\n    def sample_data(self):\n        \"\"\"Generate sample data\"\"\"\n        features = torch.randn(10, 4, 50)  # [samples, features, time]\n        returns = torch.randn(10, 50)      # [samples, time]\n        return features, returns\n    \n    def test_train_step(self, trainer, sample_data):\n        \"\"\"Single training-step test\"\"\"\n        features, returns = sample_data\n        \n        metrics = trainer.train_step(features, returns)\n        \n        assert 'step' in metrics\n        assert 'loss' in metrics\n        assert 'avg_reward' in metrics\n        assert 'valid_ratio' in metrics\n        assert 'best_score' in metrics\n    \n    def test_train_with_callback(self, trainer, sample_data):\n        \"\"\"Training test with a step callback\"\"\"\n        features, 
returns = sample_data\n        \n        callback_results = []\n        def callback(metrics):\n            callback_results.append(metrics)\n        \n        result = trainer.train(\n            features=features,\n            returns=returns,\n            num_steps=3,\n            progress_bar=False,\n            step_callback=callback\n        )\n        \n        assert len(callback_results) == 3\n        assert 'best_score' in result\n        assert 'best_formula_str' in result\n\n\nclass TestFactorEvaluator:\n    \"\"\"Test the factor evaluator\"\"\"\n    \n    @pytest.fixture\n    def evaluator(self):\n        from app.alpha_mining.backtest.evaluator import FactorEvaluator\n        return FactorEvaluator()\n    \n    def test_evaluate_factor(self, evaluator):\n        \"\"\"Evaluate a factor\"\"\"\n        factor = torch.randn(50)   # factor values\n        returns = torch.randn(50)  # returns\n        \n        metrics = evaluator.evaluate(factor, returns)\n        \n        assert 'sortino_ratio' in metrics\n        assert 'sharpe_ratio' in metrics\n        assert 'ic' in metrics\n        assert 'rank_ic' in metrics\n        assert 'max_drawdown' in metrics\n        assert 'turnover' in metrics\n        assert 'win_rate' in metrics\n    \n    def test_get_reward(self, evaluator):\n        \"\"\"Get the RL reward\"\"\"\n        factor = torch.randn(50)\n        returns = torch.randn(50)\n        \n        reward = evaluator.get_reward(factor, returns)\n        \n        assert isinstance(reward, float)\n\n\nclass TestVocab:\n    \"\"\"Test the vocabulary\"\"\"\n    \n    def test_vocab_initialization(self):\n        \"\"\"Vocabulary initialization\"\"\"\n        from app.alpha_mining.dsl.vocab import FactorVocab, FEATURES\n        \n        vocab = FactorVocab()\n        \n        assert vocab.vocab_size > 0\n        assert vocab.num_features == len(FEATURES)\n        assert vocab.num_ops > 0\n    \n    def test_token_conversion(self):\n        \"\"\"Token conversion\"\"\"\n        from app.alpha_mining.dsl.vocab import FactorVocab\n        \n       
 vocab = FactorVocab()\n        \n        # Feature conversion\n        token = vocab.name_to_token('RET')\n        name = vocab.token_to_name(token)\n        assert name == 'RET'\n        \n        # Operator conversion\n        token = vocab.name_to_token('ADD')\n        name = vocab.token_to_name(token)\n        assert name == 'ADD'\n\n\nclass TestAPIEndpoints:\n    \"\"\"Test REST API endpoints (requires the FastAPI TestClient)\"\"\"\n    \n    @pytest.fixture\n    def client(self):\n        \"\"\"Create a test client\"\"\"\n        try:\n            from fastapi.testclient import TestClient\n            from app.main import app\n            return TestClient(app)\n        except ImportError:\n            pytest.skip(\"FastAPI TestClient not available\")\n    \n    def test_get_operators(self, client):\n        \"\"\"Get the operator list\"\"\"\n        response = client.get(\"/api/v1/alpha-mining/operators\")\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data.get('success') is True\n        assert 'operators' in data\n        assert 'features' in data\n        assert len(data['operators']) == 21\n    \n    def test_get_factors_empty(self, client):\n        \"\"\"Get the factor list (empty)\"\"\"\n        response = client.get(\"/api/v1/alpha-mining/factors?top_k=5\")\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data.get('success') is True\n        assert 'factors' in data\n    \n    def test_evaluate_factor(self, client):\n        \"\"\"Evaluate a factor expression\"\"\"\n        response = client.post(\n            \"/api/v1/alpha-mining/evaluate\",\n            json={\"formula\": \"ADD(RET, VOL)\"}\n        )\n        \n        assert response.status_code == 200\n        data = response.json()\n        # May succeed or fail, depending on formula parsing\n        assert 'success' in data\n    \n    def test_mine_task_start(self, client):\n        \"\"\"Start a mining task\"\"\"\n        response = client.post(\n            \"/api/v1/alpha-mining/mine\",\n            json={\"num_steps\": 5, 
\"use_sentiment\": False, \"batch_size\": 4}\n        )\n        \n        assert response.status_code == 200\n        data = response.json()\n        assert data.get('success') is True\n        assert 'task_id' in data\n\n\nclass TestEdgeCases:\n    \"\"\"边界条件测试\"\"\"\n    \n    def test_empty_formula(self):\n        \"\"\"空表达式\"\"\"\n        from app.alpha_mining.vm.factor_vm import FactorVM\n        from app.alpha_mining.dsl.vocab import DEFAULT_VOCAB\n        \n        vm = FactorVM(vocab=DEFAULT_VOCAB)\n        features = torch.randn(2, 4, 10)\n        \n        result = vm.execute([], features)\n        assert result is None\n    \n    def test_constant_factor_penalty(self):\n        \"\"\"常量因子惩罚\"\"\"\n        from app.alpha_mining.model.trainer import AlphaTrainer\n        from app.alpha_mining.config import AlphaMiningConfig\n        \n        config = AlphaMiningConfig()\n        trainer = AlphaTrainer(config=config)\n        \n        # 常量因子的标准差接近 0\n        constant_factor = torch.ones(50)\n        assert constant_factor.std() < config.constant_threshold\n    \n    def test_nan_handling(self):\n        \"\"\"NaN 处理\"\"\"\n        from app.alpha_mining.vm.factor_vm import FactorVM\n        from app.alpha_mining.dsl.vocab import DEFAULT_VOCAB\n        \n        vm = FactorVM(vocab=DEFAULT_VOCAB)\n        \n        # 创建包含 NaN 的特征\n        features = torch.randn(2, 4, 10)\n        features[0, 0, 5] = float('nan')\n        \n        # 执行应该处理 NaN\n        formula = [0]  # 只取第一个特征\n        result = vm.execute(formula, features)\n        \n        if result is not None:\n            # NaN 应该被替换为 0\n            assert not torch.any(torch.isnan(result))\n\n\n# 运行测试\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\", \"--tb=short\"])\n"
  },
  {
    "path": "deploy/Dockerfile.celery",
    "content": "FROM python:3.11\n\nWORKDIR /app\n\n# Copy the requirements file and the entrypoint script\nCOPY backend/requirements.txt /app/requirements.txt\nCOPY deploy/celery-entrypoint.sh /usr/local/bin/celery-entrypoint.sh\n\n# Install dependencies (at build time, for production use)\n# Note: in development, volume mounts overwrite /app, so dependencies are reinstalled in the entrypoint\nRUN pip install --no-cache-dir -r requirements.txt && \\\n    chmod +x /usr/local/bin/celery-entrypoint.sh\n\n# Set the entrypoint (for development: check for and install dependencies)\nENTRYPOINT [\"/usr/local/bin/celery-entrypoint.sh\"]\n\n# Default command (can be overridden by docker-compose)\nCMD [\"celery\", \"-A\", \"app.core.celery_app\", \"worker\", \"--loglevel=info\"]\n"
  },
  {
    "path": "deploy/celery-entrypoint.sh",
    "content": "#!/bin/bash\nset -e\n\n# Development: check whether dependencies are installed (by probing key packages)\n# Note: since volume mounts overwrite /app, dependencies installed at build time may not be visible\n# This script ensures dependencies are always available in development\nCHECK_PACKAGES=(\"celery\" \"fastapi\" \"sqlalchemy\")\nNEED_INSTALL=false\n\nfor pkg in \"${CHECK_PACKAGES[@]}\"; do\n    if ! python -c \"import ${pkg}\" 2>/dev/null; then\n        NEED_INSTALL=true\n        break\n    fi\ndone\n\nif [ \"$NEED_INSTALL\" = true ]; then\n    echo \"📦 [dev] Dependencies missing, installing...\"\n    echo \"   Hint: the volume mount overwrote the dependencies baked into the image\"\n    pip install --no-cache-dir -r requirements.txt\n    echo \"✅ Dependencies installed\"\nelse\n    echo \"✅ Dependencies already present, skipping install\"\nfi\n\n# Run the command passed in\nexec \"$@\"\n"
  },
  {
    "path": "deploy/docker-compose.dev.yml",
    "content": "version: '3.8'\n\nservices:\n  postgres:\n    image: postgres:15-alpine\n    container_name: finnews_postgres\n    environment:\n      POSTGRES_USER: finnews\n      POSTGRES_PASSWORD: finnews_dev_password\n      POSTGRES_DB: finnews_db\n    ports:\n      - \"5432:5432\"\n    volumes:\n      - postgres_data:/var/lib/postgresql/data\n    healthcheck:\n      test: [\"CMD-SHELL\", \"pg_isready -U finnews -d finnews_db\"]\n      interval: 10s\n      timeout: 5s\n      retries: 5\n    networks:\n      - finnews_network\n\n  redis:\n    image: redis:7-alpine\n    container_name: finnews_redis\n    ports:\n      - \"6379:6379\"\n    command: redis-server --appendonly yes\n    volumes:\n      - redis_data:/data\n    healthcheck:\n      test: [\"CMD\", \"redis-cli\", \"ping\"]\n      interval: 10s\n      timeout: 5s\n      retries: 5\n    networks:\n      - finnews_network\n\n  milvus-etcd:\n    image: quay.io/coreos/etcd:v3.5.5\n    container_name: finnews_milvus_etcd\n    environment:\n      - ETCD_AUTO_COMPACTION_MODE=revision\n      - ETCD_AUTO_COMPACTION_RETENTION=1000\n      - ETCD_QUOTA_BACKEND_BYTES=4294967296\n      - ETCD_SNAPSHOT_COUNT=50000\n    volumes:\n      - milvus_etcd_data:/etcd\n    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd\n    healthcheck:\n      test: [\"CMD\", \"etcdctl\", \"endpoint\", \"health\"]\n      interval: 30s\n      timeout: 20s\n      retries: 3\n    networks:\n      - finnews_network\n\n  milvus-minio:\n    image: minio/minio:RELEASE.2023-03-20T20-16-18Z\n    container_name: finnews_milvus_minio\n    environment:\n      MINIO_ACCESS_KEY: minioadmin\n      MINIO_SECRET_KEY: minioadmin\n    ports:\n      - \"9001:9001\"\n      - \"9000:9000\"\n    volumes:\n      - milvus_minio_data:/minio_data\n    command: minio server /minio_data --console-address \":9001\"\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", 
\"http://localhost:9000/minio/health/live\"]\n      interval: 30s\n      timeout: 20s\n      retries: 3\n    networks:\n      - finnews_network\n\n  milvus-standalone:\n    image: milvusdb/milvus:v2.3.3\n    container_name: finnews_milvus\n    command: [\"milvus\", \"run\", \"standalone\"]\n    security_opt:\n      - seccomp:unconfined\n    environment:\n      ETCD_ENDPOINTS: milvus-etcd:2379\n      MINIO_ADDRESS: milvus-minio:9000\n    volumes:\n      - milvus_data:/var/lib/milvus\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:9091/healthz\"]\n      interval: 30s\n      start_period: 90s\n      timeout: 20s\n      retries: 3\n    ports:\n      - \"19530:19530\"\n      - \"9091:9091\"\n    depends_on:\n      - milvus-etcd\n      - milvus-minio\n    networks:\n      - finnews_network\n\n  celery-worker:\n    build:\n      context: ..\n      dockerfile: deploy/Dockerfile.celery\n    container_name: finnews_celery_worker\n    working_dir: /app\n    command: celery -A app.core.celery_app worker --loglevel=info\n    volumes:\n      - ../backend:/app\n    env_file:\n      - ../backend/.env\n    environment:\n      - POSTGRES_USER=finnews\n      - POSTGRES_PASSWORD=finnews_dev_password\n      - POSTGRES_HOST=postgres\n      - POSTGRES_PORT=5432\n      - POSTGRES_DB=finnews_db\n      - REDIS_HOST=redis\n      - REDIS_PORT=6379\n      - REDIS_DB=0\n      - NEO4J_URI=bolt://neo4j:7687\n      - NEO4J_USER=neo4j\n      - NEO4J_PASSWORD=finnews_neo4j_password\n    depends_on:\n      postgres:\n        condition: service_healthy\n      redis:\n        condition: service_healthy\n      neo4j:\n        condition: service_healthy\n    networks:\n      - finnews_network\n    dns:\n      - 8.8.8.8\n      - 8.8.4.4\n    restart: unless-stopped\n\n  celery-beat:\n    build:\n      context: ..\n      dockerfile: deploy/Dockerfile.celery\n    container_name: finnews_celery_beat\n    working_dir: /app\n    command: celery -A app.core.celery_app beat 
--loglevel=info\n    volumes:\n      - ../backend:/app\n    env_file:\n      - ../backend/.env\n    environment:\n      - POSTGRES_USER=finnews\n      - POSTGRES_PASSWORD=finnews_dev_password\n      - POSTGRES_HOST=postgres\n      - POSTGRES_PORT=5432\n      - POSTGRES_DB=finnews_db\n      - REDIS_HOST=redis\n      - REDIS_PORT=6379\n      - REDIS_DB=0\n      - NEO4J_URI=bolt://neo4j:7687\n      - NEO4J_USER=neo4j\n      - NEO4J_PASSWORD=finnews_neo4j_password\n    depends_on:\n      postgres:\n        condition: service_healthy\n      redis:\n        condition: service_healthy\n      neo4j:\n        condition: service_healthy\n    networks:\n      - finnews_network\n    dns:\n      - 8.8.8.8\n      - 8.8.4.4\n    restart: unless-stopped\n\n  # Neo4j - knowledge-graph database\n  neo4j:\n    image: neo4j:5.26.0\n    container_name: finnews_neo4j\n    environment:\n      NEO4J_AUTH: neo4j/finnews_neo4j_password\n      NEO4J_PLUGINS: '[\"apoc\", \"graph-data-science\"]'\n      NEO4J_dbms_memory_pagecache_size: 1G\n      NEO4J_dbms_memory_heap_initial__size: 1G\n      NEO4J_dbms_memory_heap_max__size: 2G\n      NEO4J_apoc_export_file_enabled: 'true'\n      NEO4J_apoc_import_file_enabled: 'true'\n      NEO4J_apoc_import_file_use__neo4j__config: 'true'\n    ports:\n      - \"7474:7474\"  # HTTP\n      - \"7687:7687\"  # Bolt\n    volumes:\n      - neo4j_data:/data\n      - neo4j_logs:/logs\n      - neo4j_import:/var/lib/neo4j/import\n      - neo4j_plugins:/plugins\n    healthcheck:\n      test: [\"CMD\", \"cypher-shell\", \"-u\", \"neo4j\", \"-p\", \"finnews_neo4j_password\", \"RETURN 1\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n      start_period: 40s\n    networks:\n      - finnews_network\n    restart: unless-stopped\n\nvolumes:\n  postgres_data:\n    driver: local\n  redis_data:\n    driver: local\n  milvus_etcd_data:\n    driver: local\n  milvus_minio_data:\n    driver: local\n  milvus_data:\n    driver: local\n  neo4j_data:\n    driver: local\n  neo4j_logs:\n   
 driver: local\n  neo4j_import:\n    driver: local\n  neo4j_plugins:\n    driver: local\n\nnetworks:\n  finnews_network:\n    driver: bridge\n\n"
  },
  {
    "path": "docs/BochaAI_Web_Search_API_20251222_121535.md",
    "content": "# BochaAI_Web_Search_API\n\n> Source: https://bocha-ai.feishu.cn/wiki/RXEOw02rFiwzGSkd9mUcqoeAnNK\n> Crawled: 2025-12-22 12:15:35\n> Method: browser extraction\n\n---\n\nBochaAI User Help Docs\nWeb Search API\n\n1. API Overview\nSearch the whole web for page content and links; results are accurate, summaries are complete, and the output is well suited for AI consumption.\nThe search time range and summary display are configurable, and more results can be fetched via pagination.\n\n2. Search Results\nResults include web pages, images, and videos; the response format is compatible with the Bing Search API.\n- Web pages include name, url, snippet, summary, siteName, siteIcon, datePublished, etc.\n- Images include contentUrl, hostPageUrl, width, height, etc.\n\n3. API Endpoint\nMethod: POST\nURL: https://api.bochaai.com/v1/web-search\n\n4. Request Parameters\n| Parameter | Type | Required | Description |\n| --- | --- | --- | --- |\n| query | string | yes | Search keywords |\n| freshness | string | no | Search time range (noLimit, oneDay, oneWeek, oneMonth) |\n| count | integer | no | Number of results to return (default 10, max 50) |\n| offset | integer | no | Offset |\n\n5. Response Schema\nThe response contains webPages, images, videos, and other modules.\nEach web page contains title, url, snippet, datePublished, siteName, etc.\n\n6. Python SDK Example\n```python\nimport requests\nimport json\n\nurl = \"https://api.bochaai.com/v1/web-search\"\npayload = json.dumps({\n  \"query\": \"彩讯股份\",\n  \"freshness\": \"oneMonth\",\n  \"count\": 10\n})\nheaders = {\n  'Authorization': 'Bearer YOUR_API_KEY',\n  'Content-Type': 'application/json'\n}\nresponse = requests.request(\"POST\", url, headers=headers, data=payload)\nprint(response.text)\n```\n\n"
  },
  {
    "path": "docs/天眼查MCP服务_20260104_171528.md",
    "content": "# Tianyancha MCP Service\n\n> Source: https://bigmodel.cn/marketplace/detail/1846da9039e4\n> Crawled: 2026-01-04 17:15:28\n> Method: browser extraction\n\n---\n\nTianyancha\nA full view of company information: monitor corporate risk in real time, dig into equity relationships, and look up litigation, intellectual-property, and other records to help identify risk.\n\nWhat is the Tianyancha MCP service?\n\nThe Tianyancha MCP (Model Context Protocol) service is a bridge between Tianyancha's rich data resources and downstream applications. Through a standardized interface it offers a one-stop, convenient, and efficient data-access and analysis solution for company information lookup, corporate risk assessment, and patent insight. Backed by Tianyancha's large data assets and the MCP protocol, it removes the traditional bottlenecks of data acquisition and processing, letting different kinds of users easily obtain in-depth company information to support business decisions.\n\nSupported types: this MCP supports both the SSE and Streamable protocols.\n\nCore Functions\n(1) Company information lookup\n- Full business-registration data: retrieve registration details, equity structure, branches, and change records by company identifier, sourced from authoritative platforms.\n- Multi-dimensional filtering: locate target companies precisely by industry, registered capital, operating status, and other criteria.\n- Change-history tracing: records each company's registration-change history to help analyze strategic adjustments.\n(2) Corporate risk assessment\n- Full-spectrum risk monitoring: syncs litigation, dishonesty records, administrative penalties, and other risk data in real time, directly from courts, registries, and other authoritative sources.\n- Risk-relationship analysis: uncovers risk-propagation paths, such as risk spreading across affiliated companies or the impact of litigation.\n- Real-time alerts: define the risk types you care about and get notified the moment a target company triggers a condition.\n(3) Patent insight\n- Full patent fields: quickly pull patent name, type, legal status, inventors, and other core fields to gauge technical strength.\n- Patent-value quantification: quantifies patent assets from citation counts, legal status, and other indicators to support investment and partnership decisions.\n- Infringement warning: compares patent similarity to flag infringement risk early and keep R&D out of trouble.\n\nHow do I use the Tianyancha plugin on MCP Server?\n\nThe Tianyancha plugin is already deployed to MCP Server in the cloud, so setup is simple. The service can currently be added and used in the Experience Center.\n\nIt can also be configured in any client that speaks the MCP protocol, such as Cherry Studio or VS Code: copy your API key from the API Key page in your personal center and set up the server command as described in the docs.\n\nKey Features\n- Massive data coverage: aggregates company data nationwide and worldwide into a comprehensive information base, from new startups to large, mature groups.\n- Real-time updates: connected to authoritative sources, so registration changes, risk events, and patent-status updates are synced immediately.\n- Smart search and analysis: supports natural-language queries, with a built-in analysis engine for relationship analysis and trend prediction over the results.\n- Security and reliability: end-to-end encrypted transport, multiple layers of protection against attacks, and servers built for high concurrency.\n- Multi-client integration: plugs into browsers, office software, and management systems, so the service can be called from everyday workflows.\n\nPricing\n| Tool | Description | Price |\n| --- | --- | --- |\n| companyBaseInfo | Company name or ID, type, founding date, operating status, registered capital, legal representative, registration number, organization code, taxpayer ID, etc. | 0.15/call |\n| risk | A company's own / related-party / early-warning risk information | 0.2/call |\n| enterprisePatent | Detailed patent fields, including patent name, application number, publication number, etc. | 0.1/call |\n\nTools\ncompanyBaseInfo\nGet a company's basic information by company name or ID, including company name or ID, type, founding date, operating status, registered capital, legal representative, registration number, unified social credit code, organization code, taxpayer ID, and other fields.\nrisk\nGet a company's Tianyancha risk list by keyword (company name, company ID, registration number, or unified social credit code), covering the company's own, related-party, and early-warning risks.\nenterprisePatent\nGet patent information by company name or ID, including patent name, application number, publication number, and other detailed fields.\n\nUsage Guide\nExample scenarios for the Tianyancha MCP service:\n- Investing: when screening deals, investors use company information to understand a target's fundamentals, the risk assessment to surface potential issues, and patent insight to judge innovation capability and technical moats, producing an overall view of investment value and risk.\n- Partnerships: before partnering, a company can look up the counterparty to confirm its strength and reputation, assess its risks to avoid litigation or operating anomalies, and analyze its patent portfolio for technical complementarity.\n- R&D: researchers search industry patents to track the state of the art and avoid duplicate work, and analyze competitors' patents to find openings for innovation.\n- Government investment promotion: agencies screen for companies with core patents, real innovation capability, and good prospects, and evaluate their value to local industry to target high-quality candidates.\n\nTutorial\n\nMCP can be called directly from the GLM text-model API, or from clients that run the MCP protocol, such as Cherry Studio, VS Code, or Cursor.\n\nGet an API key from the Zhipu BigModel open platform.\n\nUsing it in the BigModel Experience Center\n\nThe MCP service can currently be added in the Experience Center:\n\n1. Open the model settings, turn on the MCP switch, and click \"Add MCP\".\n2. Select the MCP, confirm, and send a prompt to start the conversation.\n\nCalling it directly from the GLM text-model API\n\ncURL example:\n\ncurl --request POST \\\\\n  --url https://open.bigmodel.cn/api/paas/v4/chat/completions \\\\\n  --header 'Authorization: Bearer Your_Zhipu_API_Key' \\\\\n  --header 'Content-Type: application/json' \\\\\n  --data '{\n  \"model\": \"glm-4.5\",\n  \"do_sample\": true,\n  \"stream\": false,\n  \"thinking\": {\n    \"type\": \"enabled\"\n  },\n  \"temperature\": 0.6,\n  \"top_p\": 0.95,\n  \"response_format\": {\n    \"type\": \"text\"\n  },\n  \"messages\": [\n    {\n      \"role\": \"user\",\n      \"content\": \"帮我查询下北京天眼查科技有限公司的基本信息\"\n    }\n  ],\n  \"tools\": [\n    {\n      \"mcp\": {\n        \"transport_type\": \"sse\",\n        \"server_label\": \"tianyancha\",\n        \"server_url\": \"https://open.bigmodel.cn/api/mcp-broker/proxy/tianyancha/sse\",\n        \"headers\": {\n          \"Authorization\": \"Bearer Your_Zhipu_API_Key\"\n        }\n      },\n      \"type\": \"mcp\"\n    }\n  ]\n}'\n\nUsing it in Cherry Studio\n1. In the chat view, click the MCP button.\n\n2. On the MCP servers page, click \"Add server\".\n\n3. Complete one of the following configurations:\n\n3.1 Streamable HTTP (streamableHttp)\n\n- URL: https://open.bigmodel.cn/api/mcp-broker/proxy/tianyancha/mcp\n\n- Header: Authorization = Your Zhipu API Key\n\n3.2 Server-sent events (sse)\n\n- URL: https://open.bigmodel.cn/api/mcp-broker/proxy/tianyancha/sse\n\n- Header: Authorization = Your Zhipu API Key\n\n4. Return to the chat view and select the MCP.\n\n5. Chat with the model to use it.\n\nUsing it in Cursor\n\nCursor 0.45.6 added MCP support; Cursor acts as an MCP client, and the service can be connected with a simple configuration.\n\nPath: Cursor Settings --> [Tools&Integrations] --> [tianyancha].\n\nMCP server configuration:\n{\n  \"mcpServers\":\n     {\n        \"tianyancha\": \n           {\n               \"url\": \"https://open.bigmodel.cn/api/mcp-broker/proxy/tianyancha/mcp?Authorization=Your Zhipu API Key\"\n           }\n     } \n}\n\nUsing MCP in Cursor\n\nCursor MCP must be used in Composer's agent mode.\n\nFAQ\n\n1. Which model types on BigModel support MCP?\n\nMCP is built on the Function Calling interface, so the model used with MCP must support Function Calling. All language models on BigModel (including GLM-4-Plus and GLM-4-Flash) support Function Calling; the Z1-series reasoning models do not, and therefore cannot call MCP.\n\n2. How do I get an API key?\n\nGo to the API Key page on the Zhipu BigModel open platform, click \"Add new API Key\", hover over the new key, and click the copy icon.\n\n3. Which MCPs can currently be used in Cursor?\n\nAll Streamable MCPs can currently be called from Cursor; the rest are only supported in the Experience Center, Cherry Studio, and VS Code. A Streamable variant will be added for every MCP going forward.\n\n4. I built an MCP; how do I apply to list it in the BigModel marketplace?\n\nFill in the onboarding application form and we will prioritize your request.\n"
  },
  {
    "path": "frontend/.gitignore",
    "content": "# Logs\nlogs\n*.log\nnpm-debug.log*\nyarn-debug.log*\nyarn-error.log*\npnpm-debug.log*\nlerna-debug.log*\n\nnode_modules\ndist\ndist-ssr\n*.local\n\n# Editor directories and files\n.vscode/*\n!.vscode/extensions.json\n.idea\n.DS_Store\n*.suo\n*.ntvs*\n*.njsproj\n*.sln\n*.sw?\n\n"
  },
  {
    "path": "frontend/QUICKSTART.md",
    "content": "# FinnewsHunter Frontend Quick Start\n\n## 🚀 Up and running in 5 minutes\n\n### 1. Install dependencies\n\n```bash\nnpm install\n```\n\n### 2. Configure environment variables\n\n```bash\ncp .env.example .env\n# The defaults already point at localhost:8000; no changes needed\n```\n\n### 3. Start the dev server\n\n```bash\nnpm run dev\n```\n\nVisit http://localhost:3000\n\n### 4. Make sure the backend is running\n\n```bash\n# In another terminal\ncd ../backend\nuvicorn app.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n---\n\n## 📁 Project Structure\n\n- `src/pages/` - page components\n- `src/components/ui/` - UI component library\n- `src/lib/` - utilities and the API client\n- `src/store/` - Zustand global state\n- `src/types/` - TypeScript type definitions\n\n---\n\n## ✨ Feature Tour\n\n### 1. Dashboard\n- Stat cards (total news, task count, success rate)\n- Latest-news preview\n\n### 2. News feed\n- Crawl news (configurable page range)\n- News cards\n- One-click analysis (calls the NewsAnalyst agent)\n- Sentiment-score display\n\n### 3. Task management\n- Live task list\n- Task status and progress\n- Auto-refresh (every 5 seconds)\n\n---\n\n## 🛠️ Development Commands\n\n```bash\n# Develop\nnpm run dev\n\n# Build\nnpm run build\n\n# Preview the build\nnpm run preview\n\n# Lint\nnpm run lint\n\n# Format\nnpm run format\n```\n\n---\n\n**Enjoy the modern developer experience! 🎉**\n\n"
  },
  {
    "path": "frontend/README.md",
    "content": "# FinnewsHunter Frontend (React + TypeScript)\n\nA modern frontend for the financial-news analysis platform, built on **React 18 + TypeScript + Vite + Tailwind CSS + Shadcn UI**.\n\n## Tech Stack\n\n- **Core**: React 18, TypeScript, Vite\n- **UI**: Tailwind CSS, Shadcn UI (Radix Primitives)\n- **State**: Zustand, TanStack Query (React Query)\n- **Routing**: React Router v6\n- **Icons**: Lucide React\n- **Notifications**: Sonner\n\n## Getting Started\n\n### Install dependencies\n\n```bash\nnpm install\n# or use pnpm/yarn\n```\n\n### Development mode\n\n```bash\nnpm run dev\n# visit http://localhost:3000\n```\n\n### Production build\n\n```bash\nnpm run build\nnpm run preview\n```\n\n## Project Structure\n\n```\nsrc/\n├── components/\n│   └── ui/              # Shadcn UI components\n│       ├── button.tsx\n│       ├── card.tsx\n│       └── badge.tsx\n├── layout/\n│   └── MainLayout.tsx   # main layout (sidebar + top bar)\n├── pages/\n│   ├── Dashboard.tsx            # dashboard\n│   ├── NewsListPage.tsx         # news feed\n│   ├── StockAnalysisPage.tsx    # stock analysis (planned)\n│   ├── AgentMonitorPage.tsx     # agent monitor (planned)\n│   └── TaskManagerPage.tsx      # task management\n├── lib/\n│   ├── api-client.ts    # API client\n│   └── utils.ts         # utilities\n├── store/\n│   ├── useNewsStore.ts  # news state\n│   └── useTaskStore.ts  # task state\n├── types/\n│   └── api.ts           # TypeScript type definitions\n├── App.tsx\n├── main.tsx\n└── index.css\n```\n\n## Features\n\n### ✅ Implemented\n- Dashboard (stat cards)\n- News list\n- News crawling\n- One-click analysis button\n- Task-management list\n- Responsive layout\n- Live data refresh (React Query)\n\n### 🚧 In progress\n- In-depth stock analysis\n- Candlestick (K-line) charts\n- Agent monitor console\n- WebSocket live push\n- Debate visualization\n\n## Development Guide\n\n### Adding a component\n\n```bash\n# add components from Shadcn UI\nnpx shadcn-ui@latest add dialog\nnpx shadcn-ui@latest add tabs\n```\n\n### API calls\n\n```typescript\nimport { newsApi } from '@/lib/api-client'\nimport { useQuery } from '@tanstack/react-query'\n\nconst { data, isLoading } = useQuery({\n  queryKey: ['news', 'list'],\n  queryFn: () => newsApi.getNewsList({ limit: 20 }),\n})\n```\n\n### State management\n\n```typescript\nimport { useNewsStore } from '@/store/useNewsStore'\n\nconst { newsList, setNewsList } = useNewsStore()\n```\n\n## Environment Variables\n\nCreate a `.env.local` file:\n\n```\nVITE_API_BASE_URL=http://localhost:8000/api/v1\n```\n\n## Backend Integration\n\nMake sure the backend is running at `http://localhost:8000`; the frontend proxies API requests to it automatically.\n\n## Next Steps\n\n- [ ] WebSocket connection (live news push)\n- [ ] Stock-analysis page (candlestick charts)\n- [ ] Agent monitor console (Chain of Thought)\n- [ ] Debate visualization (Bull vs Bear)\n\n---\n\n**Built with ❤️ using React + AgenticX**\n\n"
  },
  {
    "path": "frontend/index.html",
    "content": "<!doctype html>\n<html lang=\"zh-CN\">\n  <head>\n    <meta charset=\"UTF-8\" />\n    <link rel=\"icon\" type=\"image/svg+xml\" href=\"/vite.svg\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n    <title>FinnewsHunter - 金融新闻智能分析平台</title>\n  </head>\n  <body>\n    <div id=\"root\"></div>\n    <script type=\"module\" src=\"/src/main.tsx\"></script>\n  </body>\n</html>\n\n"
  },
  {
    "path": "frontend/package.json",
    "content": "{\n  \"name\": \"finnews-hunter-frontend\",\n  \"private\": true,\n  \"version\": \"0.1.0\",\n  \"type\": \"module\",\n  \"scripts\": {\n    \"dev\": \"vite\",\n    \"build\": \"tsc && vite build\",\n    \"preview\": \"vite preview\",\n    \"lint\": \"eslint . --ext ts,tsx --report-unused-disable-directives --max-warnings 0\",\n    \"format\": \"prettier --write \\\"src/**/*.{ts,tsx,css}\\\"\"\n  },\n  \"dependencies\": {\n    \"@radix-ui/react-avatar\": \"^1.1.0\",\n    \"@radix-ui/react-dialog\": \"^1.1.1\",\n    \"@radix-ui/react-dropdown-menu\": \"^2.1.1\",\n    \"@radix-ui/react-label\": \"^2.1.0\",\n    \"@radix-ui/react-popover\": \"^1.1.1\",\n    \"@radix-ui/react-scroll-area\": \"^1.1.0\",\n    \"@radix-ui/react-select\": \"^2.1.1\",\n    \"@radix-ui/react-separator\": \"^1.1.0\",\n    \"@radix-ui/react-slot\": \"^1.1.0\",\n    \"@radix-ui/react-tabs\": \"^1.1.0\",\n    \"@radix-ui/react-tooltip\": \"^1.1.2\",\n    \"@tanstack/react-query\": \"^5.28.0\",\n    \"axios\": \"^1.6.7\",\n    \"class-variance-authority\": \"^0.7.0\",\n    \"clsx\": \"^2.1.0\",\n    \"date-fns\": \"^3.3.1\",\n    \"framer-motion\": \"^11.0.8\",\n    \"lucide-react\": \"^0.343.0\",\n    \"react\": \"^18.2.0\",\n    \"react-dom\": \"^18.2.0\",\n    \"react-markdown\": \"^9.0.1\",\n    \"react-router-dom\": \"^6.22.2\",\n    \"recharts\": \"^2.12.0\",\n    \"klinecharts\": \"^9.8.10\",\n    \"remark-gfm\": \"^4.0.1\",\n    \"socket.io-client\": \"^4.7.4\",\n    \"sonner\": \"^1.4.3\",\n    \"tailwind-merge\": \"^2.2.1\",\n    \"tailwindcss-animate\": \"^1.0.7\",\n    \"zustand\": \"^4.5.1\"\n  },\n  \"devDependencies\": {\n    \"@types/node\": \"^20.11.24\",\n    \"@types/react\": \"^18.2.61\",\n    \"@types/react-dom\": \"^18.2.19\",\n    \"@typescript-eslint/eslint-plugin\": \"^7.1.0\",\n    \"@typescript-eslint/parser\": \"^7.1.0\",\n    \"@vitejs/plugin-react-swc\": \"^3.5.0\",\n    \"autoprefixer\": \"^10.4.18\",\n    \"eslint\": \"^8.57.0\",\n    
\"eslint-plugin-react-hooks\": \"^4.6.0\",\n    \"eslint-plugin-react-refresh\": \"^0.4.5\",\n    \"postcss\": \"^8.4.35\",\n    \"prettier\": \"^3.2.5\",\n    \"tailwindcss\": \"^3.4.1\",\n    \"typescript\": \"^5.3.3\",\n    \"vite\": \"^7.2.7\"\n  }\n}\n"
  },
  {
    "path": "frontend/postcss.config.js",
    "content": "export default {\n  plugins: {\n    tailwindcss: {},\n    autoprefixer: {},\n  },\n}\n\n"
  },
  {
    "path": "frontend/src/App.tsx",
    "content": "import { Routes, Route } from 'react-router-dom'\nimport { Toaster } from 'sonner'\nimport MainLayout from './layout/MainLayout'\nimport Dashboard from './pages/Dashboard'\nimport NewsListPage from './pages/NewsListPage'\nimport StockSearchPage from './pages/StockSearchPage'\nimport StockAnalysisPage from './pages/StockAnalysisPage'\nimport AgentMonitorPage from './pages/AgentMonitorPage'\nimport TaskManagerPage from './pages/TaskManagerPage'\nimport AlphaMiningPage from './pages/AlphaMiningPage'\n\nfunction App() {\n  return (\n    <>\n      <Routes>\n        <Route path=\"/\" element={<MainLayout />}>\n          <Route index element={<Dashboard />} />\n          <Route path=\"news\" element={<NewsListPage />} />\n          <Route path=\"stock\" element={<StockSearchPage />} />\n          <Route path=\"stock/:code\" element={<StockAnalysisPage />} />\n          <Route path=\"agents\" element={<AgentMonitorPage />} />\n          <Route path=\"tasks\" element={<TaskManagerPage />} />\n          <Route path=\"alpha-mining\" element={<AlphaMiningPage />} />\n        </Route>\n      </Routes>\n      <Toaster richColors position=\"top-right\" />\n    </>\n  )\n}\n\nexport default App\n\n"
  },
  {
    "path": "frontend/src/components/DebateChatRoom.tsx",
    "content": "import React, { useState, useRef, useEffect, useCallback } from 'react'\nimport { \n  Send, User, TrendingUp, TrendingDown, Briefcase, \n  Loader2, Bot, History, Trash2, Search, ChevronDown,\n  CheckCircle2, Clock, ListChecks, PlayCircle, XCircle\n} from 'lucide-react'\nimport { Button } from '@/components/ui/button'\nimport ReactMarkdown from 'react-markdown'\nimport remarkGfm from 'remark-gfm'\nimport { cn } from '@/lib/utils'\nimport MentionInput, { MentionTarget } from './MentionInput'\nimport type { DebateSession } from '@/store/useDebateStore'\nimport { agentApi, SSEDebateEvent } from '@/lib/api-client'\nimport { toast } from 'sonner'\nimport { useGlobalI18n, useLanguageStore } from '@/store/useLanguageStore'\n\n// message role type\nexport type ChatRole = 'user' | 'bull' | 'bear' | 'manager' | 'system' | 'data_collector' | 'search'\n\n// search-plan types\nexport interface SearchTask {\n  id: string\n  source: string\n  query: string\n  description: string\n  icon: string\n  estimated_time: number\n}\n\nexport interface SearchPlan {\n  plan_id: string\n  stock_code: string\n  stock_name: string\n  user_query: string\n  tasks: SearchTask[]\n  total_estimated_time: number\n}\n\n// chat message type\nexport interface ChatMessage {\n  id: string\n  role: ChatRole\n  content: string\n  timestamp: Date\n  round?: number\n  isStreaming?: boolean\n  searchPlan?: SearchPlan // associated search plan\n  searchStatus?: 'pending' | 'executing' | 'completed' | 'cancelled'\n}\n\n// get the role config (i18n-aware)\nconst getRoleConfig = (t: any): Record<ChatRole, {\n  name: string\n  icon: React.ReactNode\n  bgColor: string\n  textColor: string\n  borderColor: string\n  align: 'left' | 'right'\n}> => ({\n  user: {\n    name: t.debateHistory.roleNames.user,\n    icon: <User className=\"w-4 h-4\" />,\n    bgColor: 'bg-blue-500',\n    textColor: 'text-white',\n    borderColor: 'border-blue-500',\n    align: 'right'\n  },\n  bull: {\n    name: t.debateHistory.roleNames.bull,\n    icon: <TrendingUp className=\"w-4 h-4\" />,\n  
  bgColor: 'bg-emerald-500',\n    textColor: 'text-white',\n    borderColor: 'border-emerald-300',\n    align: 'left'\n  },\n  bear: {\n    name: t.debateHistory.roleNames.bear,\n    icon: <TrendingDown className=\"w-4 h-4\" />,\n    bgColor: 'bg-rose-500',\n    textColor: 'text-white',\n    borderColor: 'border-rose-300',\n    align: 'left'\n  },\n  manager: {\n    name: t.debateHistory.roleNames.manager,\n    icon: <Briefcase className=\"w-4 h-4\" />,\n    bgColor: 'bg-indigo-500',\n    textColor: 'text-white',\n    borderColor: 'border-indigo-300',\n    align: 'left'\n  },\n  data_collector: {\n    name: t.debateHistory.roleNames.data_collector,\n    icon: <Bot className=\"w-4 h-4\" />,\n    bgColor: 'bg-purple-500',\n    textColor: 'text-white',\n    borderColor: 'border-purple-300',\n    align: 'left'\n  },\n  system: {\n    name: 'System',\n    icon: <Bot className=\"w-4 h-4\" />,\n    bgColor: 'bg-gray-400',\n    textColor: 'text-white',\n    borderColor: 'border-gray-200',\n    align: 'left'\n  },\n  search: {\n    name: 'Search Results',\n    icon: <Bot className=\"w-4 h-4\" />,\n    bgColor: 'bg-cyan-500',\n    textColor: 'text-white',\n    borderColor: 'border-cyan-300',\n    align: 'left'\n  }\n})\n\ninterface DebateChatRoomProps {\n  messages: ChatMessage[]\n  onSendMessage: (content: string, mentions?: MentionTarget[]) => void\n  isDebating: boolean\n  currentRound?: { round: number; maxRounds: number } | null\n  activeAgent?: string | null\n  stockName?: string\n  disabled?: boolean\n  // history-related\n  historySessions?: DebateSession[]\n  onLoadSession?: (sessionId: string) => void\n  onClearHistory?: () => void\n  showHistory?: boolean\n  // search-plan related\n  onConfirmSearch?: (plan: SearchPlan, msgId: string) => void\n  onCancelSearch?: (msgId: string) => void\n}\n\n// search-plan display component\nconst SearchPlanCard: React.FC<{ \n  plan: SearchPlan, \n  status: string,\n  onConfirm: (plan: SearchPlan) => void,\n  onCancel: () => void\n}> = ({ plan, status, onConfirm, onCancel }) 
=> {\n  const t = useGlobalI18n()\n  const isPending = status === 'pending'\n  const isExecuting = status === 'executing'\n  \n  return (\n    <div className=\"mt-3 p-4 bg-slate-50 rounded-xl border border-slate-200 shadow-sm animate-in fade-in zoom-in duration-300\">\n      <div className=\"flex items-center gap-2 mb-3 pb-2 border-b border-slate-200\">\n        <ListChecks className=\"w-5 h-5 text-indigo-500\" />\n        <h4 className=\"font-semibold text-slate-800 text-sm\">📋 {t.debateRoom.searchPlanConfirm}</h4>\n      </div>\n      \n      <div className=\"space-y-2 mb-4\">\n        {plan.tasks.map((task, index) => (\n          <div key={task.id} className=\"flex items-start gap-3 text-xs text-slate-600\">\n            <span className=\"mt-0.5\">{task.icon || '🔍'}</span>\n            <div className=\"flex-1\">\n              <p className=\"font-medium text-slate-700\">{index + 1}. {task.description}</p>\n              <p className=\"text-[10px] text-slate-400\">{t.debateRoom.roundPrefix === '第' ? 
'关键词' : 'Keyword'}: \"{task.query}\"</p>\n            </div>\n          </div>\n        ))}\n      </div>\n      \n      <div className=\"flex items-center justify-between pt-2\">\n        <div className=\"flex items-center gap-1.5 text-[10px] text-slate-400\">\n          <Clock className=\"w-3 h-3\" />\n          {t.debateRoom.estimatedTime}: {plan.total_estimated_time}{t.debateRoom.seconds}\n        </div>\n        \n        {isPending && (\n          <div className=\"flex gap-2\">\n            <Button \n              size=\"sm\" \n              variant=\"outline\" \n              className=\"h-7 text-[10px] px-3 py-0\"\n              onClick={onCancel}\n            >\n              {t.debateRoom.searchPlanCancel}\n            </Button>\n            <Button \n              size=\"sm\" \n              className=\"h-7 text-[10px] px-3 py-0 bg-indigo-500 hover:bg-indigo-600\"\n              onClick={() => onConfirm(plan)}\n            >\n              {t.debateRoom.searchPlanConfirmBtn}\n            </Button>\n          </div>\n        )}\n        \n        {isExecuting && (\n          <div className=\"flex items-center gap-2 text-[10px] text-indigo-600 animate-pulse\">\n            <Loader2 className=\"w-3 h-3 animate-spin\" />\n            {t.debateRoom.searchPlanExecuting}\n          </div>\n        )}\n        \n        {status === 'completed' && (\n          <div className=\"flex items-center gap-1 text-[10px] text-emerald-600 font-medium\">\n            <CheckCircle2 className=\"w-3 h-3\" />\n            {t.debateRoom.searchPlanCompleted}\n          </div>\n        )}\n      </div>\n    </div>\n  )\n}\n\n// single message bubble component\nconst ChatBubble: React.FC<{ \n  message: ChatMessage,\n  onConfirmSearch?: (plan: SearchPlan, msgId: string) => void,\n  onCancelSearch?: (msgId: string) => void\n}> = ({ message, onConfirmSearch, onCancelSearch }) => {\n  const t = useGlobalI18n()\n  const ROLE_CONFIG = getRoleConfig(t)\n  const config = ROLE_CONFIG[message.role]\n  const isRight 
= config.align === 'right'\n  \n  return (\n    <div className={cn(\n      \"flex gap-2 mb-4 animate-in fade-in slide-in-from-bottom-2 duration-300\",\n      isRight ? \"flex-row-reverse\" : \"flex-row\"\n    )}>\n      {/* avatar */}\n      <div className={cn(\n        \"w-9 h-9 rounded-full flex items-center justify-center flex-shrink-0 shadow-sm\",\n        config.bgColor,\n        config.textColor\n      )}>\n        {config.icon}\n      </div>\n      \n      {/* message body */}\n      <div className={cn(\"flex flex-col max-w-[75%]\", isRight ? \"items-end\" : \"items-start\")}>\n        {/* role name and round */}\n        <div className={cn(\n          \"flex items-center gap-2 mb-1 text-xs\",\n          isRight ? \"flex-row-reverse\" : \"flex-row\"\n        )}>\n          <span className=\"font-medium text-gray-600\">{config.name}</span>\n          {message.round && (\n            <span className=\"px-1.5 py-0.5 rounded bg-gray-100 text-gray-500 text-[10px]\">\n              {t.debateRoom.roundPrefix}{message.round}{t.debateRoom.roundSuffix}\n            </span>\n          )}\n          <span className=\"text-gray-400\">\n            {message.timestamp.toLocaleTimeString(t.debateRoom.roundPrefix === '第' ? 'zh-CN' : 'en-US', { hour: '2-digit', minute: '2-digit' })}\n          </span>\n        </div>\n        \n        {/* message bubble */}\n        <div className={cn(\n          \"rounded-2xl px-4 py-2.5 shadow-sm border\",\n          isRight \n            ? \"bg-blue-500 text-white rounded-tr-sm border-blue-400\" \n            : `bg-white ${config.borderColor} rounded-tl-sm`\n        )}>\n          {message.content ? (\n            <div className={cn(\n              \"prose prose-sm max-w-none\",\n              isRight ? 
\"prose-invert\" : \"prose-gray\"\n            )}>\n              <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                {message.content}\n              </ReactMarkdown>\n              {message.isStreaming && (\n                <span className=\"inline-block w-2 h-4 bg-current opacity-70 animate-pulse ml-1 align-middle rounded-sm\" />\n              )}\n            </div>\n          ) : message.searchPlan ? (\n            <div className=\"text-sm text-gray-500 italic\">{t.stockDetail.generatingSearchPlan}</div>\n          ) : (\n            <div className=\"flex items-center gap-2 text-gray-400\">\n              <Loader2 className=\"w-4 h-4 animate-spin\" />\n              <span className=\"text-sm\">{t.debateRoom.thinking}</span>\n            </div>\n          )}\n          \n          {/* 搜索计划卡片 */}\n          {message.searchPlan && (\n            <SearchPlanCard \n              plan={message.searchPlan} \n              status={message.searchStatus || 'pending'}\n              onConfirm={(plan) => onConfirmSearch?.(plan, message.id)}\n              onCancel={() => onCancelSearch?.(message.id)}\n            />\n          )}\n        </div>\n      </div>\n    </div>\n  )\n}\n\n// 系统消息组件\nconst SystemMessage: React.FC<{ message: ChatMessage }> = ({ message }) => (\n  <div className=\"flex justify-center my-3\">\n    <div className=\"px-3 py-1 rounded-full bg-gray-100 text-gray-500 text-xs\">\n      {message.content}\n    </div>\n  </div>\n)\n\n// 主组件\nconst DebateChatRoom: React.FC<DebateChatRoomProps> = ({\n  messages,\n  onSendMessage,\n  isDebating,\n  currentRound,\n  activeAgent,\n  stockName,\n  disabled = false,\n  historySessions = [],\n  onLoadSession,\n  onClearHistory,\n  showHistory = true,\n  onConfirmSearch,\n  onCancelSearch\n}) => {\n  const t = useGlobalI18n()\n  const ROLE_CONFIG = getRoleConfig(t)\n  const [inputValue, setInputValue] = useState('')\n  const [showHistoryDropdown, setShowHistoryDropdown] = useState(false)\n  const 
[pendingMentions, setPendingMentions] = useState<MentionTarget[]>([])\n  const scrollRef = useRef<HTMLDivElement>(null)\n  const historyDropdownRef = useRef<HTMLDivElement>(null)\n  \n  // Auto-scroll to the bottom\n  useEffect(() => {\n    if (scrollRef.current) {\n      scrollRef.current.scrollTop = scrollRef.current.scrollHeight\n    }\n  }, [messages])\n  \n  // Close the history dropdown when clicking outside\n  useEffect(() => {\n    const handleClickOutside = (e: MouseEvent) => {\n      if (historyDropdownRef.current && !historyDropdownRef.current.contains(e.target as Node)) {\n        setShowHistoryDropdown(false)\n      }\n    }\n    document.addEventListener('mousedown', handleClickOutside)\n    return () => document.removeEventListener('mousedown', handleClickOutside)\n  }, [])\n  \n  const handleSendWithMentions = useCallback((content: string, mentions: MentionTarget[]) => {\n    if (content.trim() && !disabled && !isDebating) {\n      onSendMessage(content.trim(), mentions)\n      setInputValue('')\n      setPendingMentions([])\n    }\n  }, [disabled, isDebating, onSendMessage])\n  \n  // Get the typing indicator for the currently active role\n  const getActiveIndicator = () => {\n    if (!activeAgent) return null\n    \n    const agentMap: Record<string, ChatRole> = {\n      'BullResearcher': 'bull',\n      'BearResearcher': 'bear',\n      'InvestmentManager': 'manager',\n      'DataCollector': 'data_collector'\n    }\n    \n    const role = agentMap[activeAgent]\n    if (!role) return null\n    \n    const config = ROLE_CONFIG[role]\n    return (\n      <div className=\"flex items-center gap-2 text-sm text-gray-500\">\n        <div className={cn(\"w-2 h-2 rounded-full animate-pulse\", config.bgColor)} />\n        <span>{config.name} {t.debateRoom.typing}</span>\n      </div>\n    )\n  }\n  \n  return (\n    <div className=\"flex flex-col h-[600px] bg-gradient-to-b from-gray-50 to-white rounded-xl border shadow-lg overflow-hidden\">\n      {/* Header */}\n      <div className=\"flex items-center justify-between px-4 py-3 bg-white border-b\">\n        
<div className=\"flex items-center gap-3\">\n          <div className=\"flex -space-x-2\">\n            <div className=\"w-8 h-8 rounded-full bg-emerald-500 flex items-center justify-center text-white ring-2 ring-white\">\n              <TrendingUp className=\"w-4 h-4\" />\n            </div>\n            <div className=\"w-8 h-8 rounded-full bg-rose-500 flex items-center justify-center text-white ring-2 ring-white\">\n              <TrendingDown className=\"w-4 h-4\" />\n            </div>\n            <div className=\"w-8 h-8 rounded-full bg-indigo-500 flex items-center justify-center text-white ring-2 ring-white\">\n              <Briefcase className=\"w-4 h-4\" />\n            </div>\n          </div>\n          <div>\n            <h3 className=\"font-semibold text-gray-900\">\n              {stockName ? `${stockName} ${t.debateRoom.title}` : t.debateRoom.titlePlaceholder}\n            </h3>\n            <p className=\"text-xs text-gray-500\">{t.debateRoom.subtitle}</p>\n          </div>\n        </div>\n        \n        {/* 轮次指示器 */}\n        {currentRound && (\n          <div className=\"flex items-center gap-2 px-3 py-1.5 bg-purple-50 rounded-full\">\n            <div className=\"flex gap-0.5\">\n              {Array.from({ length: currentRound.maxRounds }, (_, i) => (\n                <div\n                  key={i}\n                  className={cn(\n                    \"w-2 h-2 rounded-full transition-colors\",\n                    i < currentRound.round\n                      ? 
'bg-purple-500'\n                      : 'bg-gray-200'\n                  )}\n                />\n              ))}\n            </div>\n            <span className=\"text-xs font-medium text-purple-600\">\n              {t.debateRoom.roundPrefix}{currentRound.round}{t.debateRoom.roundSuffix}\n            </span>\n          </div>\n        )}\n      </div>\n      \n      {/* 消息区域 */}\n      <div \n        ref={scrollRef}\n        className=\"flex-1 px-4 overflow-y-auto scrollbar-thin scrollbar-thumb-gray-300 scrollbar-track-transparent\"\n      >\n        <div className=\"py-4\">\n          {messages.length === 0 ? (\n            <div className=\"flex flex-col items-center justify-center h-full text-gray-400 py-20\">\n              <div className=\"w-16 h-16 rounded-full bg-gray-100 flex items-center justify-center mb-4\">\n                <Briefcase className=\"w-8 h-8 text-gray-300\" />\n              </div>\n              <p className=\"text-sm mb-2\">{t.debateRoom.clickStartDebate}</p>\n              <p className=\"text-xs\">{t.debateRoom.canSpeakDuringDebate}</p>\n            </div>\n          ) : (\n            messages.map((msg) => (\n              msg.role === 'system' ? 
(\n                <SystemMessage key={msg.id} message={msg} />\n              ) : (\n                <ChatBubble \n                  key={msg.id} \n                  message={msg} \n                  onConfirmSearch={onConfirmSearch}\n                  onCancelSearch={onCancelSearch}\n                />\n              )\n            ))\n          )}\n          \n          {/* 输入指示器 */}\n          {isDebating && activeAgent && (\n            <div className=\"flex items-center gap-2 ml-11 mb-4\">\n              {getActiveIndicator()}\n            </div>\n          )}\n        </div>\n      </div>\n      \n      {/* 输入区域 */}\n      <div className=\"px-4 py-3 bg-white border-t\">\n        <div className=\"flex items-center gap-2\">\n          <div className=\"w-8 h-8 rounded-full bg-blue-500 flex items-center justify-center text-white flex-shrink-0\">\n            <User className=\"w-4 h-4\" />\n          </div>\n          <MentionInput\n            value={inputValue}\n            onChange={setInputValue}\n            onSubmit={handleSendWithMentions}\n            placeholder={isDebating ? t.debateRoom.debateInProgress : t.mentionInput.placeholder}\n            disabled={disabled}\n          />\n          <Button\n            onClick={() => handleSendWithMentions(inputValue, pendingMentions)}\n            disabled={!inputValue.trim() || disabled || isDebating}\n            size=\"icon\"\n            className=\"rounded-full bg-blue-500 hover:bg-blue-600\"\n          >\n            <Send className=\"w-4 h-4\" />\n          </Button>\n        </div>\n        \n        {/* 提示和历史按钮 */}\n        <div className=\"flex items-center justify-between mt-2 ml-10\">\n          {isDebating ? (\n            <p className=\"text-xs text-gray-400\">\n              💡 {t.debateRoom.mentionTip}\n            </p>\n          ) : (\n            <p className=\"text-xs text-gray-400\">\n              💡 {t.stockDetail.history === '历史' ? 
'输入 @ 可以选择智能体或数据源' : 'Enter @ to select agents or data sources'}\n          </p>\n        )}\n          \n          {/* 历史记录按钮 */}\n          {showHistory && historySessions.length > 0 && (\n            <div className=\"relative\" ref={historyDropdownRef}>\n              <Button\n                variant=\"ghost\"\n                size=\"sm\"\n                onClick={() => setShowHistoryDropdown(!showHistoryDropdown)}\n                className=\"h-7 px-2 text-gray-500 hover:text-gray-700\"\n              >\n                <History className=\"w-3.5 h-3.5 mr-1\" />\n                {t.debateHistory.history} ({historySessions.length})\n                <ChevronDown className={cn(\"w-3 h-3 ml-1 transition-transform\", showHistoryDropdown && \"rotate-180\")} />\n              </Button>\n              \n              {/* 历史下拉菜单 */}\n              {showHistoryDropdown && (\n                <div className=\"absolute bottom-full right-0 mb-1 w-64 bg-white rounded-lg shadow-xl border border-gray-200 py-2 z-50 animate-in fade-in slide-in-from-bottom-2 duration-200\">\n                  <div className=\"px-3 py-1 border-b border-gray-100 flex items-center justify-between\">\n                    <span className=\"text-xs font-medium text-gray-500\">{t.debateHistory.history} {t.stockDetail.session}</span>\n                    {onClearHistory && (\n                      <button \n                        onClick={() => {\n                          if (confirm(t.agents.confirmClearLogs)) {\n                            onClearHistory()\n                            setShowHistoryDropdown(false)\n                          }\n                        }}\n                        className=\"text-xs text-rose-500 hover:text-rose-600 flex items-center gap-1\"\n                      >\n                        <Trash2 className=\"w-3 h-3\" />\n                        {t.common.cancel === '取消' ? 
'清除' : 'Clear'}\n                      </button>\n                    )}\n                  </div>\n                  <div className=\"max-h-48 overflow-y-auto\">\n                    {historySessions.map((session, index) => (\n                      <button\n                        key={session.id}\n                        onClick={() => {\n                          onLoadSession?.(session.id)\n                          setShowHistoryDropdown(false)\n                        }}\n                        className=\"w-full px-3 py-2 text-left hover:bg-gray-50 flex items-center gap-2\"\n                      >\n                        <div className=\"w-6 h-6 rounded-full bg-gray-100 flex items-center justify-center text-xs text-gray-500 flex-shrink-0\">\n                          {index + 1}\n                        </div>\n                        <div className=\"flex-1 min-w-0\">\n                          <div className=\"text-sm font-medium text-gray-700 truncate\">\n                            {session.stockName || session.stockCode}\n                          </div>\n                          <div className=\"text-xs text-gray-400\">\n                            {session.messages.length} {t.debateHistory.messages} · {new Date(session.updatedAt).toLocaleDateString(t.debateRoom.roundPrefix === '第' ? 'zh-CN' : 'en-US')}\n                          </div>\n                        </div>\n                      </button>\n                    ))}\n                  </div>\n                </div>\n              )}\n            </div>\n          )}\n        </div>\n      </div>\n    </div>\n  )\n}\n\nexport default DebateChatRoom\n\n"
  },
  {
    "path": "frontend/src/components/DebateConfig.tsx",
    "content": "/**\n * Debate mode configuration component\n * Supports selecting among different multi-agent collaboration modes\n */\nimport React, { useState, useEffect } from 'react'\nimport {\n  Settings,\n  Zap,\n  Theater,\n  Rocket,\n  Clock,\n  Users,\n  MessageSquare,\n  ChevronDown,\n  ChevronUp,\n  Info\n} from 'lucide-react'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\n// Debate mode type\nexport interface DebateMode {\n  id: string\n  name: string\n  description: string\n  icon: string\n  isDefault?: boolean\n}\n\n// Mode rule configuration\nexport interface ModeRules {\n  maxTime: number\n  maxRounds?: number\n  managerCanInterrupt?: boolean\n  requireDataCollection?: boolean\n}\n\n// Available debate modes (fetched via a function to support i18n)\nconst getDebateModes = (t: any): DebateMode[] => [\n  {\n    id: 'parallel',\n    name: t.stockDetail.parallelAnalysis,\n    description: t.stockDetail.parallelAnalysisDesc || 'Bull/Bear parallel analysis, Investment Manager summarizes decision',\n    icon: '⚡',\n    isDefault: true\n  },\n  {\n    id: 'realtime_debate',\n    name: t.stockDetail.realtimeDebate,\n    description: t.stockDetail.realtimeDebateDesc || 'Four agents real-time dialogue, Investment Manager moderates, Bull/Bear alternate',\n    icon: '🎭'\n  },\n  {\n    id: 'quick_analysis',\n    name: t.stockDetail.quickAnalysis,\n    description: t.stockDetail.quickAnalysisDesc || 'Single analyst quick recommendation, suitable for time-sensitive scenarios',\n    icon: '🚀'\n  }\n]\n\n// Default rule configuration\nconst DEFAULT_RULES: Record<string, ModeRules> = {\n  parallel: {\n    maxTime: 300,\n    maxRounds: 1,\n    managerCanInterrupt: false,\n    requireDataCollection: false\n  },\n  realtime_debate: {\n    maxTime: 600,\n    maxRounds: 5,\n    managerCanInterrupt: true,\n    requireDataCollection: true\n  },\n  quick_analysis: {\n    maxTime: 60,\n    maxRounds: 1,\n    managerCanInterrupt: false,\n    requireDataCollection: false\n  }\n}\n\ninterface DebateConfigProps {\n  selectedMode: string\n  onModeChange: (mode: string) => void\n  rules?: ModeRules\n  onRulesChange?: (rules: 
ModeRules) => void\n  disabled?: boolean\n  compact?: boolean\n}\n\nexport const DebateConfig: React.FC<DebateConfigProps> = ({\n  selectedMode,\n  onModeChange,\n  rules,\n  onRulesChange,\n  disabled = false,\n  compact = false\n}) => {\n  const t = useGlobalI18n()\n  const DEBATE_MODES = getDebateModes(t)\n  const [showAdvanced, setShowAdvanced] = useState(false)\n  const [localRules, setLocalRules] = useState<ModeRules>(\n    rules || DEFAULT_RULES[selectedMode] || DEFAULT_RULES.parallel\n  )\n\n  useEffect(() => {\n    // 模式切换时重置规则为默认值\n    setLocalRules(DEFAULT_RULES[selectedMode] || DEFAULT_RULES.parallel)\n  }, [selectedMode])\n\n  const handleRuleChange = (key: keyof ModeRules, value: number | boolean) => {\n    const newRules = { ...localRules, [key]: value }\n    setLocalRules(newRules)\n    onRulesChange?.(newRules)\n  }\n\n  const getModeIcon = (mode: DebateMode) => {\n    switch (mode.id) {\n      case 'parallel':\n        return <Zap className=\"w-5 h-5 text-yellow-500\" />\n      case 'realtime_debate':\n        return <Theater className=\"w-5 h-5 text-purple-500\" />\n      case 'quick_analysis':\n        return <Rocket className=\"w-5 h-5 text-blue-500\" />\n      default:\n        return <Settings className=\"w-5 h-5 text-gray-500\" />\n    }\n  }\n\n  const selectedModeData = DEBATE_MODES.find(m => m.id === selectedMode)\n\n  if (compact) {\n    return (\n      <div className=\"flex items-center gap-2\">\n        <label className=\"text-sm text-gray-500\">{t.stockDetail.analysisMode}:</label>\n        <select\n          value={selectedMode}\n          onChange={(e) => onModeChange(e.target.value)}\n          disabled={disabled}\n          className=\"text-sm border border-gray-200 rounded-md px-2 py-1 bg-white focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-gray-100 disabled:cursor-not-allowed\"\n        >\n          {DEBATE_MODES.map((mode) => (\n            <option key={mode.id} value={mode.id}>\n              {mode.icon} 
{mode.name}\n            </option>\n          ))}\n        </select>\n      </div>\n    )\n  }\n\n  return (\n    <div className=\"bg-white rounded-xl border border-gray-200 overflow-hidden\">\n      {/* 模式选择 */}\n      <div className=\"p-4 border-b border-gray-100\">\n        <div className=\"flex items-center gap-2 mb-3\">\n          <Settings className=\"w-5 h-5 text-gray-600\" />\n          <h3 className=\"font-semibold text-gray-800\">{t.stockDetail.analysisModeConfig || t.stockDetail.analysisMode}</h3>\n        </div>\n        \n        <div className=\"grid grid-cols-1 md:grid-cols-3 gap-3\">\n          {DEBATE_MODES.map((mode) => (\n            <button\n              key={mode.id}\n              onClick={() => onModeChange(mode.id)}\n              disabled={disabled}\n              className={`\n                relative p-4 rounded-lg border-2 transition-all text-left\n                ${selectedMode === mode.id\n                  ? 'border-blue-500 bg-blue-50'\n                  : 'border-gray-200 hover:border-gray-300 hover:bg-gray-50'\n                }\n                ${disabled ? 
'opacity-50 cursor-not-allowed' : 'cursor-pointer'}\n              `}\n            >\n              {mode.isDefault && (\n                <span className=\"absolute top-2 right-2 text-xs bg-blue-100 text-blue-600 px-2 py-0.5 rounded-full\">\n                  {t.stockDetail.default || 'Default'}\n                </span>\n              )}\n              <div className=\"flex items-center gap-2 mb-2\">\n                {getModeIcon(mode)}\n                <span className=\"font-medium text-gray-800\">{mode.name}</span>\n              </div>\n              <p className=\"text-xs text-gray-500 line-clamp-2\">\n                {mode.description}\n              </p>\n            </button>\n          ))}\n        </div>\n      </div>\n\n      {/* 模式说明 */}\n      {selectedModeData && (\n        <div className=\"p-4 bg-gray-50 border-b border-gray-100\">\n          <div className=\"flex items-start gap-3\">\n            <div className=\"p-2 bg-white rounded-lg shadow-sm\">\n              {getModeIcon(selectedModeData)}\n            </div>\n            <div className=\"flex-1\">\n              <h4 className=\"font-medium text-gray-800 mb-1\">\n                {selectedModeData.name}\n              </h4>\n              <p className=\"text-sm text-gray-600\">\n                {selectedModeData.description}\n              </p>\n              \n              {/* 模式特性标签 */}\n              <div className=\"flex flex-wrap gap-2 mt-3\">\n                {selectedMode === 'parallel' && (\n                  <>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-yellow-100 text-yellow-700 px-2 py-1 rounded-full\">\n                      <Zap className=\"w-3 h-3\" /> {t.stockDetail.parallelExecution || 'Parallel Execution'}\n                    </span>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-green-100 text-green-700 px-2 py-1 rounded-full\">\n                      <Clock className=\"w-3 h-3\" /> 
{t.stockDetail.about2to3min || '~2-3 min'}\n                    </span>\n                  </>\n                )}\n                {selectedMode === 'realtime_debate' && (\n                  <>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-purple-100 text-purple-700 px-2 py-1 rounded-full\">\n                      <MessageSquare className=\"w-3 h-3\" /> {t.stockDetail.realtimeDialogue || 'Real-time Dialogue'}\n                    </span>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-orange-100 text-orange-700 px-2 py-1 rounded-full\">\n                      <Users className=\"w-3 h-3\" /> {t.stockDetail.fourAgents || '4 Agents'}\n                    </span>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-green-100 text-green-700 px-2 py-1 rounded-full\">\n                      <Clock className=\"w-3 h-3\" /> {t.stockDetail.about5to10min || '~5-10 min'}\n                    </span>\n                  </>\n                )}\n                {selectedMode === 'quick_analysis' && (\n                  <>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-blue-100 text-blue-700 px-2 py-1 rounded-full\">\n                      <Rocket className=\"w-3 h-3\" /> {t.stockDetail.singleAgent || 'Single Agent'}\n                    </span>\n                    <span className=\"inline-flex items-center gap-1 text-xs bg-green-100 text-green-700 px-2 py-1 rounded-full\">\n                      <Clock className=\"w-3 h-3\" /> {t.stockDetail.about1min || '~1 min'}\n                    </span>\n                  </>\n                )}\n              </div>\n            </div>\n          </div>\n        </div>\n      )}\n\n      {/* 高级配置 */}\n      <div className=\"border-t border-gray-100\">\n        <button\n          onClick={() => setShowAdvanced(!showAdvanced)}\n          disabled={disabled}\n          className=\"w-full p-3 flex items-center 
justify-between text-sm text-gray-600 hover:bg-gray-50 transition-colors disabled:cursor-not-allowed\"\n        >\n          <span className=\"flex items-center gap-2\">\n            <Info className=\"w-4 h-4\" />\n            {t.stockDetail.advancedConfig || 'Advanced Config'}\n          </span>\n          {showAdvanced ? (\n            <ChevronUp className=\"w-4 h-4\" />\n          ) : (\n            <ChevronDown className=\"w-4 h-4\" />\n          )}\n        </button>\n\n        {showAdvanced && (\n          <div className=\"p-4 border-t border-gray-100 bg-gray-50 space-y-4\">\n            {/* 最大时间 */}\n            <div className=\"flex items-center justify-between\">\n              <label className=\"text-sm text-gray-600\">{t.stockDetail.maxExecutionTime || 'Max Execution Time'}</label>\n              <div className=\"flex items-center gap-2\">\n                <input\n                  type=\"number\"\n                  value={localRules.maxTime}\n                  onChange={(e) => handleRuleChange('maxTime', parseInt(e.target.value) || 300)}\n                  disabled={disabled}\n                  min={60}\n                  max={1800}\n                  step={60}\n                  className=\"w-20 text-sm border border-gray-200 rounded px-2 py-1 text-right disabled:bg-gray-100\"\n                />\n                <span className=\"text-sm text-gray-500\">{t.stockDetail.seconds || 's'}</span>\n              </div>\n            </div>\n\n            {/* 实时辩论模式专属配置 */}\n            {selectedMode === 'realtime_debate' && (\n              <>\n                <div className=\"flex items-center justify-between\">\n                  <label className=\"text-sm text-gray-600\">{t.stockDetail.maxDebateRounds || 'Max Debate Rounds'}</label>\n                  <div className=\"flex items-center gap-2\">\n                    <input\n                      type=\"number\"\n                      value={localRules.maxRounds || 5}\n                      onChange={(e) => 
handleRuleChange('maxRounds', parseInt(e.target.value) || 5)}\n                      disabled={disabled}\n                      min={1}\n                      max={10}\n                      className=\"w-20 text-sm border border-gray-200 rounded px-2 py-1 text-right disabled:bg-gray-100\"\n                    />\n                    <span className=\"text-sm text-gray-500\">{t.stockDetail.rounds || 'rounds'}</span>\n                  </div>\n                </div>\n\n                <div className=\"flex items-center justify-between\">\n                  <label className=\"text-sm text-gray-600\">{t.stockDetail.managerCanInterrupt || 'Manager Can Interrupt'}</label>\n                  <input\n                    type=\"checkbox\"\n                    checked={localRules.managerCanInterrupt || false}\n                    onChange={(e) => handleRuleChange('managerCanInterrupt', e.target.checked)}\n                    disabled={disabled}\n                    className=\"w-4 h-4 text-blue-600 border-gray-300 rounded focus:ring-blue-500 disabled:cursor-not-allowed\"\n                  />\n                </div>\n\n                <div className=\"flex items-center justify-between\">\n                  <label className=\"text-sm text-gray-600\">{t.stockDetail.collectDataBeforeDebate || 'Collect Data Before Debate'}</label>\n                  <input\n                    type=\"checkbox\"\n                    checked={localRules.requireDataCollection || false}\n                    onChange={(e) => handleRuleChange('requireDataCollection', e.target.checked)}\n                    disabled={disabled}\n                    className=\"w-4 h-4 text-blue-600 border-gray-300 rounded focus:ring-blue-500 disabled:cursor-not-allowed\"\n                  />\n                </div>\n              </>\n            )}\n          </div>\n        )}\n      </div>\n    </div>\n  )\n}\n\n// 辩论模式选择器（简化版本，用于其他页面）\nexport const DebateModeSelector: React.FC<{\n  value: string\n  onChange: 
(value: string) => void\n  disabled?: boolean\n}> = ({ value, onChange, disabled }) => {\n  const t = useGlobalI18n()\n  const DEBATE_MODES = getDebateModes(t)\n  return (\n    <div className=\"flex gap-2\">\n      {DEBATE_MODES.map((mode) => (\n        <button\n          key={mode.id}\n          onClick={() => onChange(mode.id)}\n          disabled={disabled}\n          className={`\n            px-3 py-1.5 rounded-lg text-sm font-medium transition-all\n            ${value === mode.id\n              ? 'bg-blue-100 text-blue-700 border border-blue-300'\n              : 'bg-gray-100 text-gray-600 border border-transparent hover:bg-gray-200'\n            }\n            ${disabled ? 'opacity-50 cursor-not-allowed' : 'cursor-pointer'}\n          `}\n          title={mode.description}\n        >\n          <span className=\"mr-1\">{mode.icon}</span>\n          {mode.name}\n        </button>\n      ))}\n    </div>\n  )\n}\n\nexport default DebateConfig\n\n"
  },
  {
    "path": "frontend/src/components/DebateHistorySidebar.tsx",
    "content": "import React, { useState, useMemo } from 'react'\nimport { \n  History, \n  Trash2, \n  MessageSquare, \n  Clock, \n  PlayCircle,\n  Swords,\n  Zap,\n  Activity,\n  X,\n  Search,\n  Calendar\n} from 'lucide-react'\nimport { Button } from '@/components/ui/button'\nimport { cn } from '@/lib/utils'\nimport type { DebateSession } from '@/store/useDebateStore'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\ninterface DebateHistorySidebarProps {\n  sessions: DebateSession[]\n  currentSessionId?: string | null\n  onLoadSession: (session: DebateSession) => void\n  onDeleteSession?: (sessionId: string) => void\n  onClearHistory?: () => void\n  isOpen: boolean\n  onToggle: () => void\n}\n\n// Get mode icon and styling (i18n-aware)\nconst getModeInfo = (mode: string, t: any) => {\n  switch (mode) {\n    case 'parallel':\n      return {\n        icon: <Zap className=\"w-3.5 h-3.5\" />,\n        label: t.stockDetail.parallelAnalysis,\n        color: 'text-amber-600',\n        bgColor: 'bg-amber-50'\n      }\n    case 'realtime_debate':\n      return {\n        icon: <Swords className=\"w-3.5 h-3.5\" />,\n        label: t.stockDetail.realtimeDebate,\n        color: 'text-purple-600',\n        bgColor: 'bg-purple-50'\n      }\n    case 'quick_analysis':\n      return {\n        icon: <Activity className=\"w-3.5 h-3.5\" />,\n        label: t.stockDetail.quickAnalysis || 'Quick Analysis',\n        color: 'text-blue-600',\n        bgColor: 'bg-blue-50'\n      }\n    default:\n      return {\n        icon: <MessageSquare className=\"w-3.5 h-3.5\" />,\n        label: t.stockDetail.bullBear || 'Debate',\n        color: 'text-gray-600',\n        bgColor: 'bg-gray-50'\n      }\n  }\n}\n\n// Format relative time (i18n-aware)\nconst formatTime = (date: Date, t: any) => {\n  const now = new Date()\n  const diff = now.getTime() - date.getTime()\n  const minutes = Math.floor(diff / 60000)\n  const hours = Math.floor(diff / 3600000)\n  const days = Math.floor(diff / 86400000)\n\n  if (minutes < 1) return 
t.debateHistory.justNow\n  if (minutes < 60) return `${minutes}${t.debateHistory.minutesAgo}`\n  if (hours < 24) return `${hours}${t.debateHistory.hoursAgo}`\n  if (days < 7) return `${days}${t.debateHistory.daysAgo}`\n  \n  return date.toLocaleDateString(t.debateHistory.justNow === '刚刚' ? 'zh-CN' : 'en-US', {\n    month: 'short',\n    day: 'numeric'\n  })\n}\n\n// Session preview content (i18n-aware)\nconst getSessionPreview = (session: DebateSession, t: any) => {\n  if (session.messages.length === 0) {\n    return t.debateHistory.noMessages\n  }\n  \n  // Get the last non-system message\n  const lastMessage = [...session.messages]\n    .reverse()\n    .find(m => m.role !== 'system')\n  \n  if (lastMessage) {\n    const roleName = t.debateHistory.roleNames[lastMessage.role] || lastMessage.role\n    const content = lastMessage.content.slice(0, 40)\n    return `${roleName}: ${content}${lastMessage.content.length > 40 ? '...' : ''}`\n  }\n  \n  return `${session.messages.length} ${t.debateHistory.messages}`\n}\n\nconst DebateHistorySidebar: React.FC<DebateHistorySidebarProps> = ({\n  sessions,\n  currentSessionId,\n  onLoadSession,\n  onDeleteSession,\n  onClearHistory,\n  isOpen,\n  onToggle\n}) => {\n  const t = useGlobalI18n()\n  const [searchTerm, setSearchTerm] = useState('')\n  const [hoveredId, setHoveredId] = useState<string | null>(null)\n\n  // Filter sessions by search term\n  const filteredSessions = useMemo(() => {\n    if (!searchTerm) return sessions\n    const term = searchTerm.toLowerCase()\n    return sessions.filter(s => \n      s.stockName?.toLowerCase().includes(term) ||\n      s.messages.some(m => m.content.toLowerCase().includes(term))\n    )\n  }, [sessions, searchTerm])\n\n  // Group sessions by date\n  const groupedSessions = useMemo(() => {\n    const groups: { label: string; sessions: DebateSession[] }[] = []\n    const today = new Date()\n    today.setHours(0, 0, 0, 0)\n    const yesterday = new Date(today)\n    yesterday.setDate(yesterday.getDate() - 1)\n    const weekAgo = new Date(today)\n    
weekAgo.setDate(weekAgo.getDate() - 7)\n\n    const todaySessions: DebateSession[] = []\n    const yesterdaySessions: DebateSession[] = []\n    const thisWeekSessions: DebateSession[] = []\n    const olderSessions: DebateSession[] = []\n\n    filteredSessions.forEach(session => {\n      const sessionDate = new Date(session.updatedAt)\n      sessionDate.setHours(0, 0, 0, 0)\n\n      if (sessionDate.getTime() === today.getTime()) {\n        todaySessions.push(session)\n      } else if (sessionDate.getTime() === yesterday.getTime()) {\n        yesterdaySessions.push(session)\n      } else if (sessionDate > weekAgo) {\n        thisWeekSessions.push(session)\n      } else {\n        olderSessions.push(session)\n      }\n    })\n\n    if (todaySessions.length > 0) groups.push({ label: t.debateHistory.today, sessions: todaySessions })\n    if (yesterdaySessions.length > 0) groups.push({ label: t.debateHistory.yesterday, sessions: yesterdaySessions })\n    if (thisWeekSessions.length > 0) groups.push({ label: t.debateHistory.thisWeek, sessions: thisWeekSessions })\n    if (olderSessions.length > 0) groups.push({ label: t.debateHistory.older, sessions: olderSessions })\n\n    return groups\n  }, [filteredSessions, t])\n\n  return (\n    <>\n      {/* 折叠状态的标签按钮 */}\n      {!isOpen && sessions.length > 0 && (\n        <button\n          onClick={onToggle}\n          className=\"fixed right-0 top-1/2 -translate-y-1/2 z-40 bg-white shadow-lg rounded-l-lg px-2 py-4 border border-r-0 border-gray-200 hover:bg-gray-50 transition-colors group\"\n          title={t.debateHistory.expandHistory}\n        >\n          <div className=\"flex flex-col items-center gap-2\">\n            <History className=\"w-5 h-5 text-gray-600 group-hover:text-indigo-600\" />\n            <span className=\"text-xs font-medium text-gray-600 writing-vertical group-hover:text-indigo-600\">\n              {t.debateHistory.history}\n            </span>\n            <span className=\"text-xs bg-indigo-100 
text-indigo-600 rounded-full w-5 h-5 flex items-center justify-center\">\n              {sessions.length}\n            </span>\n          </div>\n        </button>\n      )}\n\n      {/* 侧边栏面板 */}\n      <div\n        className={cn(\n          \"fixed right-0 top-0 h-full bg-white shadow-2xl border-l border-gray-200 z-50 transition-transform duration-300 ease-in-out flex flex-col\",\n          isOpen ? \"translate-x-0\" : \"translate-x-full\",\n          \"w-80\"\n        )}\n      >\n        {/* 头部 */}\n        <div className=\"flex items-center justify-between px-4 py-3 border-b border-gray-100 bg-gradient-to-r from-indigo-50 to-purple-50\">\n          <div className=\"flex items-center gap-2\">\n            <div className=\"w-8 h-8 rounded-full bg-indigo-100 flex items-center justify-center\">\n              <History className=\"w-4 h-4 text-indigo-600\" />\n            </div>\n            <div>\n              <h3 className=\"font-semibold text-gray-900 text-sm\">{t.debateHistory.history}</h3>\n              <p className=\"text-xs text-gray-500\">{sessions.length} {t.stockDetail.session}</p>\n            </div>\n          </div>\n          <Button\n            variant=\"ghost\"\n            size=\"icon\"\n            onClick={onToggle}\n            className=\"h-8 w-8 text-gray-500 hover:text-gray-700\"\n          >\n            <X className=\"w-4 h-4\" />\n          </Button>\n        </div>\n\n        {/* 搜索框 */}\n        <div className=\"px-3 py-2 border-b border-gray-100\">\n          <div className=\"relative\">\n            <Search className=\"absolute left-3 top-1/2 -translate-y-1/2 w-4 h-4 text-gray-400\" />\n            <input\n              type=\"text\"\n              value={searchTerm}\n              onChange={(e) => setSearchTerm(e.target.value)}\n              placeholder={t.debateHistory.searchPlaceholder}\n              className=\"w-full pl-9 pr-3 py-2 text-sm border border-gray-200 rounded-lg focus:outline-none focus:ring-2 
focus:ring-indigo-200 focus:border-indigo-300\"\n            />\n          </div>\n        </div>\n\n        {/* 会话列表 */}\n        <div className=\"flex-1 overflow-y-auto\">\n          {groupedSessions.length === 0 ? (\n            <div className=\"flex flex-col items-center justify-center h-full text-gray-400 px-4\">\n              <History className=\"w-12 h-12 mb-3 opacity-50\" />\n              <p className=\"text-sm text-center\">\n                {searchTerm ? t.debateHistory.noMatchingRecords : t.debateHistory.noHistoryYet}\n              </p>\n              <p className=\"text-xs mt-1 text-center\">\n                {searchTerm ? t.debateHistory.tryOtherKeywords : t.debateHistory.historyAutoSave}\n              </p>\n            </div>\n          ) : (\n            <div className=\"py-2\">\n              {groupedSessions.map(group => (\n                <div key={group.label} className=\"mb-4\">\n                  <div className=\"px-4 py-1.5 text-xs font-medium text-gray-500 uppercase tracking-wider flex items-center gap-2\">\n                    <Calendar className=\"w-3 h-3\" />\n                    {group.label}\n                  </div>\n                  {group.sessions.map(session => {\n                    const modeInfo = getModeInfo(session.mode, t)\n                    const isActive = session.id === currentSessionId\n                    const isHovered = session.id === hoveredId\n                    \n                    return (\n                      <div\n                        key={session.id}\n                        className={cn(\n                          \"relative px-3 py-2 mx-2 rounded-lg cursor-pointer transition-all duration-200\",\n                          isActive \n                            ? 
\"bg-indigo-50 border border-indigo-200\" \n                            : \"hover:bg-gray-50 border border-transparent\"\n                        )}\n                        onMouseEnter={() => setHoveredId(session.id)}\n                        onMouseLeave={() => setHoveredId(null)}\n                        onClick={() => onLoadSession(session)}\n                      >\n                        <div className=\"flex items-start gap-3\">\n                          {/* 模式图标 */}\n                          <div className={cn(\n                            \"w-8 h-8 rounded-lg flex items-center justify-center flex-shrink-0 mt-0.5\",\n                            modeInfo.bgColor,\n                            modeInfo.color\n                          )}>\n                            {modeInfo.icon}\n                          </div>\n                          \n                          {/* 会话信息 */}\n                          <div className=\"flex-1 min-w-0\">\n                            <div className=\"flex items-center gap-2\">\n                              <span className={cn(\n                                \"text-sm font-medium truncate\",\n                                isActive ? 
\"text-indigo-700\" : \"text-gray-700\"\n                              )}>\n                                {session.stockName || session.stockCode}\n                              </span>\n                              <span className={cn(\n                                \"text-[10px] px-1.5 py-0.5 rounded\",\n                                modeInfo.bgColor,\n                                modeInfo.color\n                              )}>\n                                {modeInfo.label}\n                              </span>\n                            </div>\n                            \n                            <p className=\"text-xs text-gray-500 mt-0.5 truncate\">\n                              {getSessionPreview(session, t)}\n                            </p>\n                            \n                            <div className=\"flex items-center gap-2 mt-1.5 text-[10px] text-gray-400\">\n                              <span className=\"flex items-center gap-1\">\n                                <MessageSquare className=\"w-3 h-3\" />\n                                {session.messages.length}\n                              </span>\n                              <span>·</span>\n                              <span className=\"flex items-center gap-1\">\n                                <Clock className=\"w-3 h-3\" />\n                                {formatTime(new Date(session.updatedAt), t)}\n                              </span>\n                            </div>\n                          </div>\n                          \n                          {/* 操作按钮 */}\n                          <div className={cn(\n                            \"flex items-center gap-1 transition-opacity\",\n                            isHovered || isActive ? 
\"opacity-100\" : \"opacity-0\"\n                          )}>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"icon\"\n                              className=\"h-6 w-6 text-indigo-500 hover:text-indigo-600 hover:bg-indigo-100\"\n                              onClick={(e) => {\n                                e.stopPropagation()\n                                onLoadSession(session)\n                              }}\n                              title={t.debateHistory.continueDebate}\n                            >\n                              <PlayCircle className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            {onDeleteSession && (\n                              <Button\n                                variant=\"ghost\"\n                                size=\"icon\"\n                                className=\"h-6 w-6 text-rose-400 hover:text-rose-500 hover:bg-rose-50\"\n                                onClick={(e) => {\n                                  e.stopPropagation()\n                                  if (confirm(t.stockDetail.deleteSessionConfirm)) {\n                                    onDeleteSession(session.id)\n                                  }\n                                }}\n                                title={t.debateHistory.delete}\n                              >\n                                <Trash2 className=\"w-3.5 h-3.5\" />\n                              </Button>\n                            )}\n                          </div>\n                        </div>\n                        \n                        {/* 活跃指示器 */}\n                        {isActive && (\n                          <div className=\"absolute left-0 top-1/2 -translate-y-1/2 w-1 h-8 bg-indigo-500 rounded-r\" />\n                        )}\n                      </div>\n                    )\n                  })}\n                </div>\n     
         ))}\n            </div>\n          )}\n        </div>\n\n        {/* 底部操作 */}\n        {sessions.length > 0 && onClearHistory && (\n          <div className=\"px-4 py-3 border-t border-gray-100 bg-gray-50\">\n            <Button\n              variant=\"outline\"\n              size=\"sm\"\n              onClick={() => {\n                if (confirm(t.stockDetail.clearAllHistoryConfirm)) {\n                  onClearHistory()\n                }\n              }}\n              className=\"w-full text-rose-500 border-rose-200 hover:bg-rose-50 hover:text-rose-600\"\n            >\n              <Trash2 className=\"w-3.5 h-3.5 mr-2\" />\n              {t.stockDetail.clearAllRecords}\n            </Button>\n          </div>\n        )}\n      </div>\n\n      {/* 遮罩层 */}\n      {isOpen && (\n        <div\n          className=\"fixed inset-0 bg-black/20 z-40 transition-opacity\"\n          onClick={onToggle}\n        />\n      )}\n    </>\n  )\n}\n\nexport default DebateHistorySidebar\n\n"
  },
  {
    "path": "frontend/src/components/HighlightText.tsx",
    "content": "import React from 'react'\n\ninterface HighlightTextProps {\n  text: string\n  highlight: string\n  className?: string\n}\n\n/**\n * HighlightText 组件\n * \n * 用于在文本中高亮显示指定的关键词\n * \n * @param text - 原始文本\n * @param highlight - 需要高亮的关键词\n * @param className - 应用到容器的 CSS 类名\n * \n * @example\n * <HighlightText \n *   text=\"贵州茅台股价上涨\" \n *   highlight=\"茅台\" \n *   className=\"text-sm\"\n * />\n */\nexport default function HighlightText({ text, highlight, className = '' }: HighlightTextProps) {\n  // 如果没有高亮词，直接返回原文本\n  if (!highlight || !highlight.trim()) {\n    return <span className={className}>{text}</span>\n  }\n\n  // 转义特殊正则字符，避免正则表达式错误\n  const escapeRegExp = (str: string) => {\n    return str.replace(/[.*+?^${}()|[\\]\\\\]/g, '\\\\$&')\n  }\n\n  try {\n    // 使用正则表达式分割文本，保留匹配部分\n    const escapedHighlight = escapeRegExp(highlight.trim())\n    const parts = text.split(new RegExp(`(${escapedHighlight})`, 'gi'))\n\n    return (\n      <span className={className}>\n        {parts.map((part, index) => {\n          // 判断是否为匹配的关键词（不区分大小写）\n          const isMatch = part.toLowerCase() === highlight.toLowerCase()\n          \n          return isMatch ? (\n            <mark \n              key={index} \n              className=\"bg-yellow-200 text-gray-900 font-semibold px-0.5 rounded\"\n            >\n              {part}\n            </mark>\n          ) : (\n            <React.Fragment key={index}>{part}</React.Fragment>\n          )\n        })}\n      </span>\n    )\n  } catch (error) {\n    // 如果正则表达式出错，返回原文本\n    console.error('HighlightText error:', error)\n    return <span className={className}>{text}</span>\n  }\n}\n\n"
  },
  {
    "path": "frontend/src/components/KLineChart.tsx",
    "content": "/**\n * KLineChart 组件\n * 使用 klinecharts 库展示专业的 K 线图\n * 支持：蜡烛图、成交量、MA均线、MACD等\n */\nimport { useEffect, useRef, useCallback, useState } from 'react'\nimport { init, dispose, registerLocale } from 'klinecharts'\nimport type { Chart } from 'klinecharts'\nimport type { KLineDataPoint } from '@/types/api'\nimport { cn } from '@/lib/utils'\nimport { useLanguageStore } from '@/store/useLanguageStore'\n\n// 注册语言包（使用动态语言）\nconst registerKLineLocales = () => {\n  const { lang } = useLanguageStore.getState();\n  const t = globalI18n[lang];\n  \n  registerLocale('zh-CN', {\n    time: `${t.stockDetail.timeLabel}：`,\n    open: `${t.stockDetail.openLabel}：`,\n    high: `${t.stockDetail.highLabel}：`,\n    low: `${t.stockDetail.lowLabel}：`,\n    close: `${t.stockDetail.closeLabel}：`,\n    volume: `${t.stockDetail.volumeLabel}：`,\n    turnover: '额：',\n    change: '涨跌：',\n  })\n\n  registerLocale('en-US', {\n    time: `${t.stockDetail.timeLabel}: `,\n    open: `${t.stockDetail.openLabel}: `,\n    high: `${t.stockDetail.highLabel}: `,\n    low: `${t.stockDetail.lowLabel}: `,\n    close: `${t.stockDetail.closeLabel}: `,\n    volume: `${t.stockDetail.volumeLabel}: `,\n    turnover: 'Turnover: ',\n    change: 'Change: ',\n  })\n}\n\n// 初始化注册\nregisterLocale('zh-CN', {\n  time: '时间：',\n  open: '开：',\n  high: '高：',\n  low: '低：',\n  close: '收：',\n  volume: '量：',\n  turnover: '额：',\n  change: '涨跌：',\n})\n\nregisterLocale('en-US', {\n  time: 'Time: ',\n  open: 'Open: ',\n  high: 'High: ',\n  low: 'Low: ',\n  close: 'Close: ',\n  volume: 'Volume: ',\n  turnover: 'Turnover: ',\n  change: 'Change: ',\n})\n\ninterface KLineChartProps {\n  data: KLineDataPoint[]\n  height?: number\n  className?: string\n  showVolume?: boolean\n  showMA?: boolean\n  showMACD?: boolean\n  theme?: 'light' | 'dark'\n  period?: 'daily' | '1m' | '5m' | '15m' | '30m' | '60m'  // 添加周期参数\n}\n\nexport default function KLineChart({\n  data,\n  height = 500,\n  className,\n  showVolume = true,\n  showMA = 
true,\n  showMACD = false,\n  theme = 'light',\n  period = 'daily',\n}: KLineChartProps) {\n  const { lang } = useLanguageStore()\n  const containerRef = useRef<HTMLDivElement>(null)\n  const chartRef = useRef<Chart | null>(null)\n  const [isInitialized, setIsInitialized] = useState(false)\n\n  // 转换数据格式 - klinecharts 需要的格式\n  const formatData = useCallback((rawData: KLineDataPoint[]) => {\n    return rawData.map((item) => ({\n      timestamp: item.timestamp,\n      open: item.open,\n      high: item.high,\n      low: item.low,\n      close: item.close,\n      volume: item.volume,\n      turnover: item.turnover,\n    }))\n  }, [])\n\n  // 初始化图表\n  useEffect(() => {\n    if (!containerRef.current) return\n\n    // 重置初始化状态\n    setIsInitialized(false)\n\n    // 销毁旧图表\n    if (chartRef.current) {\n      dispose(chartRef.current)\n      chartRef.current = null\n    }\n\n    // 中国 A 股风格样式：红涨绿跌\n    const styles = {\n      grid: {\n        show: true,\n        horizontal: {\n          show: true,\n          size: 1,\n          color: theme === 'dark' ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.06)',\n          style: 'dashed' as const,\n        },\n        vertical: {\n          show: true,\n          size: 1,\n          color: theme === 'dark' ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.06)',\n          style: 'dashed' as const,\n        },\n      },\n      candle: {\n        type: 'candle_solid' as const,\n        bar: {\n          upColor: '#EF5350',      // 红色涨\n          downColor: '#26A69A',    // 绿色跌\n          noChangeColor: '#888888',\n          upBorderColor: '#EF5350',\n          downBorderColor: '#26A69A',\n          noChangeBorderColor: '#888888',\n          upWickColor: '#EF5350',\n          downWickColor: '#26A69A',\n          noChangeWickColor: '#888888',\n        },\n        priceMark: {\n          show: true,\n          high: {\n            show: true,\n            color: theme === 'dark' ? 
'#D9D9D9' : '#333333',\n            textOffset: 5,\n            textSize: 10,\n            textFamily: 'Helvetica Neue',\n            textWeight: 'normal',\n          },\n          low: {\n            show: true,\n            color: theme === 'dark' ? '#D9D9D9' : '#333333',\n            textOffset: 5,\n            textSize: 10,\n            textFamily: 'Helvetica Neue',\n            textWeight: 'normal',\n          },\n          last: {\n            show: true,\n            upColor: '#EF5350',\n            downColor: '#26A69A',\n            noChangeColor: '#888888',\n            line: {\n              show: true,\n              style: 'dashed' as const,\n              dashedValue: [4, 4],\n              size: 1,\n            },\n            text: {\n              show: true,\n              style: 'fill' as const,\n              size: 12,\n              paddingLeft: 4,\n              paddingTop: 4,\n              paddingRight: 4,\n              paddingBottom: 4,\n              borderColor: 'transparent',\n              borderSize: 0,\n              borderRadius: 2,\n              color: '#FFFFFF',\n              family: 'Helvetica Neue',\n              weight: 'normal',\n            },\n          },\n        },\n        tooltip: {\n          showRule: 'always' as const,\n          showType: 'standard' as const,\n        },\n      },\n      indicator: {\n        ohlc: {\n          upColor: '#EF5350',\n          downColor: '#26A69A',\n          noChangeColor: '#888888',\n        },\n        bars: [\n          {\n            style: 'fill' as const,\n            borderStyle: 'solid' as const,\n            borderSize: 1,\n            borderDashedValue: [2, 2],\n            upColor: 'rgba(239, 83, 80, 0.7)',\n            downColor: 'rgba(38, 166, 154, 0.7)',\n            noChangeColor: '#888888',\n          },\n        ],\n        lines: [\n          { style: 'solid' as const, smooth: false, size: 1, dashedValue: [2, 2], color: '#FF9600' },\n          { style: 'solid' as 
const, smooth: false, size: 1, dashedValue: [2, 2], color: '#9D65C9' },\n          { style: 'solid' as const, smooth: false, size: 1, dashedValue: [2, 2], color: '#2196F3' },\n          { style: 'solid' as const, smooth: false, size: 1, dashedValue: [2, 2], color: '#E91E63' },\n          { style: 'solid' as const, smooth: false, size: 1, dashedValue: [2, 2], color: '#00BCD4' },\n        ],\n      },\n      xAxis: {\n        show: true,\n        size: 'auto' as const,\n        axisLine: {\n          show: true,\n          color: theme === 'dark' ? 'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n          size: 1,\n        },\n        tickText: {\n          show: true,\n          color: theme === 'dark' ? '#D9D9D9' : '#666666',\n          family: 'Helvetica Neue',\n          weight: 'normal',\n          size: 11,\n        },\n        tickLine: {\n          show: true,\n          size: 1,\n          length: 3,\n          color: theme === 'dark' ? 'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n        },\n      },\n      yAxis: {\n        show: true,\n        size: 'auto' as const,\n        position: 'right' as const,\n        type: 'normal' as const,\n        inside: false,\n        reverse: false,\n        axisLine: {\n          show: true,\n          color: theme === 'dark' ? 'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n          size: 1,\n        },\n        tickText: {\n          show: true,\n          color: theme === 'dark' ? '#D9D9D9' : '#666666',\n          family: 'Helvetica Neue',\n          weight: 'normal',\n          size: 11,\n        },\n        tickLine: {\n          show: true,\n          size: 1,\n          length: 3,\n          color: theme === 'dark' ? 
'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n        },\n      },\n      crosshair: {\n        show: true,\n        horizontal: {\n          show: true,\n          line: {\n            show: true,\n            style: 'dashed' as const,\n            dashedValue: [4, 2],\n            size: 1,\n            color: theme === 'dark' ? 'rgba(255,255,255,0.3)' : 'rgba(0,0,0,0.2)',\n          },\n          text: {\n            show: true,\n            style: 'fill' as const,\n            color: '#FFFFFF',\n            size: 12,\n            family: 'Helvetica Neue',\n            weight: 'normal',\n            borderStyle: 'solid' as const,\n            borderDashedValue: [2, 2],\n            borderSize: 1,\n            borderColor: theme === 'dark' ? 'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n            borderRadius: 2,\n            paddingLeft: 4,\n            paddingRight: 4,\n            paddingTop: 2,\n            paddingBottom: 2,\n            backgroundColor: theme === 'dark' ? 'rgba(35,35,35,0.95)' : 'rgba(50,50,50,0.9)',\n          },\n        },\n        vertical: {\n          show: true,\n          line: {\n            show: true,\n            style: 'dashed' as const,\n            dashedValue: [4, 2],\n            size: 1,\n            color: theme === 'dark' ? 'rgba(255,255,255,0.3)' : 'rgba(0,0,0,0.2)',\n          },\n          text: {\n            show: true,\n            style: 'fill' as const,\n            color: '#FFFFFF',\n            size: 12,\n            family: 'Helvetica Neue',\n            weight: 'normal',\n            borderStyle: 'solid' as const,\n            borderDashedValue: [2, 2],\n            borderSize: 1,\n            borderColor: theme === 'dark' ? 'rgba(255,255,255,0.15)' : 'rgba(0,0,0,0.1)',\n            borderRadius: 2,\n            paddingLeft: 4,\n            paddingRight: 4,\n            paddingTop: 2,\n            paddingBottom: 2,\n            backgroundColor: theme === 'dark' ? 
'rgba(35,35,35,0.95)' : 'rgba(50,50,50,0.9)',\n          },\n        },\n      },\n    }\n\n    // 创建图表\n    const chart = init(containerRef.current, {\n      locale: lang === 'zh' ? 'zh-CN' : 'en-US',\n      styles,\n    })\n\n    if (chart) {\n      chartRef.current = chart\n      \n      // 设置自定义时间格式化\n      chart.setCustomApi({\n        formatDate: (dateTimeFormat: any, timestamp: number, format: string, type: number) => {\n          const date = new Date(timestamp)\n          \n          // 日线：只显示日期\n          if (period === 'daily') {\n            const year = date.getFullYear()\n            const month = String(date.getMonth() + 1).padStart(2, '0')\n            const day = String(date.getDate()).padStart(2, '0')\n            return `${month}-${day}`  // 简化为月-日\n          }\n          \n          // 分钟线：显示月-日 时:分\n          const month = String(date.getMonth() + 1).padStart(2, '0')\n          const day = String(date.getDate()).padStart(2, '0')\n          const hours = String(date.getHours()).padStart(2, '0')\n          const minutes = String(date.getMinutes()).padStart(2, '0')\n          return `${month}-${day} ${hours}:${minutes}`\n        },\n      })\n      \n      // 设置右侧留白为最小，让 K 线尽量占满\n      chart.setOffsetRightDistance(20)\n      \n      // 先添加 MA 均线到主图（蜡烛图上叠加）\n      if (showMA) {\n        chart.createIndicator('MA', false, { id: 'candle_pane' })\n      }\n\n      // 添加成交量指标 - 在独立的副图面板\n      if (showVolume) {\n        chart.createIndicator('VOL')\n      }\n\n      // 添加 MACD 指标 - 在独立的副图面板\n      if (showMACD) {\n        chart.createIndicator('MACD')\n      }\n\n      // 如果有数据，立即应用\n      if (data && data.length > 0) {\n        try {\n          const formattedData = formatData(data)\n          chart.applyNewData(formattedData)\n        } catch (error) {\n          console.error('Failed to apply initial chart data:', error)\n        }\n      }\n\n      setIsInitialized(true)\n    }\n\n    return () => {\n      setIsInitialized(false)\n      if 
(chartRef.current) {\n        dispose(chartRef.current)\n        chartRef.current = null\n      }\n    }\n  }, [theme, showVolume, showMA, showMACD, period, lang, data, formatData])\n\n  // 更新数据 - 当图表初始化完成且有数据时应用\n  useEffect(() => {\n    if (!chartRef.current || !isInitialized || !data || data.length === 0) return\n\n    try {\n      const formattedData = formatData(data)\n      chartRef.current.applyNewData(formattedData)\n    } catch (error) {\n      console.error('Failed to apply chart data:', error)\n    }\n  }, [data, isInitialized, formatData])\n\n  return (\n    <div\n      ref={containerRef}\n      className={cn('w-full rounded-lg overflow-hidden bg-white', className)}\n      style={{ height }}\n    />\n  )\n}\n\n// 简化版迷你 K 线图组件\nexport function MiniKLineChart({\n  data,\n  height = 150,\n  className,\n}: {\n  data: KLineDataPoint[]\n  height?: number\n  className?: string\n}) {\n  return (\n    <KLineChart\n      data={data}\n      height={height}\n      className={className}\n      showVolume={false}\n      showMA={false}\n      showMACD={false}\n      theme=\"light\"\n    />\n  )\n}\n"
  },
  {
    "path": "frontend/src/components/MentionInput.tsx",
    "content": "import React, { useState, useRef, useEffect, useCallback, useMemo } from 'react'\nimport { \n  TrendingUp, \n  TrendingDown, \n  Briefcase, \n  Search, \n  Database, \n  Globe, \n  Chrome,\n  Bot,\n  Hash,\n  X\n} from 'lucide-react'\nimport { cn } from '@/lib/utils'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\n// 可提及的目标类型\nexport type MentionType = 'agent' | 'source' | 'stock'\n\nexport interface MentionTarget {\n  type: MentionType\n  id: string\n  label: string\n  description?: string\n  icon: React.ReactNode\n  color: string\n}\n\n// 预定义的智能体列表\nconst AGENTS: MentionTarget[] = [\n  { \n    type: 'agent', \n    id: 'bull', \n    label: '多方辩手', \n    description: '分析看多因素',\n    icon: <TrendingUp className=\"w-4 h-4\" />,\n    color: 'text-emerald-600 bg-emerald-50'\n  },\n  { \n    type: 'agent', \n    id: 'bear', \n    label: '空方辩手', \n    description: '分析看空因素',\n    icon: <TrendingDown className=\"w-4 h-4\" />,\n    color: 'text-rose-600 bg-rose-50'\n  },\n  { \n    type: 'agent', \n    id: 'manager', \n    label: '投资经理', \n    description: '综合决策',\n    icon: <Briefcase className=\"w-4 h-4\" />,\n    color: 'text-indigo-600 bg-indigo-50'\n  },\n  { \n    type: 'agent', \n    id: 'data_collector', \n    label: '数据专员', \n    description: '收集市场数据/动态搜索',\n    icon: <Bot className=\"w-4 h-4\" />,\n    color: 'text-cyan-600 bg-cyan-50'\n  },\n]\n\n// 预定义的数据源列表\nconst SOURCES: MentionTarget[] = [\n  { \n    type: 'source', \n    id: 'akshare', \n    label: 'AkShare', \n    description: '金融数据接口',\n    icon: <Database className=\"w-4 h-4\" />,\n    color: 'text-blue-600 bg-blue-50'\n  },\n  { \n    type: 'source', \n    id: 'bochaai', \n    label: 'BochaAI', \n    description: '实时新闻搜索',\n    icon: <Globe className=\"w-4 h-4\" />,\n    color: 'text-orange-600 bg-orange-50'\n  },\n  { \n    type: 'source', \n    id: 'browser', \n    label: '网页搜索', \n    description: '多引擎网页搜索',\n    icon: <Chrome className=\"w-4 h-4\" />,\n    color: 
'text-green-600 bg-green-50'\n  },\n  { \n    type: 'source', \n    id: 'kb', \n    label: '知识库', \n    description: '历史新闻数据',\n    icon: <Hash className=\"w-4 h-4\" />,\n    color: 'text-amber-600 bg-amber-50'\n  },\n]\n\n// 所有可提及目标\nconst ALL_TARGETS = [...AGENTS, ...SOURCES]\n\ninterface MentionInputProps {\n  value: string\n  onChange: (value: string) => void\n  onSubmit: (value: string, mentions: MentionTarget[]) => void\n  placeholder?: string\n  disabled?: boolean\n  className?: string\n  // 可选：动态股票列表\n  stockOptions?: Array<{ code: string; name: string }>\n}\n\nconst MentionInput: React.FC<MentionInputProps> = ({\n  value,\n  onChange,\n  onSubmit,\n  placeholder,\n  disabled = false,\n  className,\n  stockOptions = []\n}) => {\n  const t = useGlobalI18n()\n  const defaultPlaceholder = placeholder || t.mentionInput.placeholder\n  const [showPopup, setShowPopup] = useState(false)\n  const [popupPosition, setPopupPosition] = useState({ top: 0, left: 0 })\n  const [selectedIndex, setSelectedIndex] = useState(0)\n  const [mentionQuery, setMentionQuery] = useState('')\n  const [mentionStartPos, setMentionStartPos] = useState(-1)\n  const [activeMentions, setActiveMentions] = useState<MentionTarget[]>([])\n  \n  const inputRef = useRef<HTMLInputElement>(null)\n  const popupRef = useRef<HTMLDivElement>(null)\n  \n  // 合并股票选项到目标列表\n  const allTargets = useMemo(() => {\n    const stockTargets: MentionTarget[] = stockOptions.map(s => ({\n      type: 'stock' as MentionType,\n      id: s.code,\n      label: s.name,\n      description: s.code,\n      icon: <Hash className=\"w-4 h-4\" />,\n      color: 'text-gray-600 bg-gray-50'\n    }))\n    return [...ALL_TARGETS, ...stockTargets]\n  }, [stockOptions])\n  \n  // 过滤后的目标列表\n  const filteredTargets = useMemo(() => {\n    if (!mentionQuery) return allTargets\n    const query = mentionQuery.toLowerCase()\n    return allTargets.filter(t => \n      t.label.toLowerCase().includes(query) ||\n      
t.id.toLowerCase().includes(query) ||\n      t.description?.toLowerCase().includes(query)\n    )\n  }, [allTargets, mentionQuery])\n  \n  // 分组显示\n  const groupedTargets = useMemo(() => {\n    const agents = filteredTargets.filter(t => t.type === 'agent')\n    const sources = filteredTargets.filter(t => t.type === 'source')\n    const stocks = filteredTargets.filter(t => t.type === 'stock')\n    \n    const groups: { label: string; items: MentionTarget[] }[] = []\n    if (agents.length > 0) groups.push({ label: t.mentionInput.agents, items: agents })\n    if (sources.length > 0) groups.push({ label: t.mentionInput.sources, items: sources })\n    if (stocks.length > 0) groups.push({ label: t.mentionInput.stocks, items: stocks.slice(0, 5) })\n    \n    return groups\n  }, [filteredTargets, t])\n  \n  // 扁平化用于键盘导航\n  const flatTargets = useMemo(() => {\n    return groupedTargets.flatMap(g => g.items)\n  }, [groupedTargets])\n  \n  // 处理输入变化\n  const handleChange = useCallback((e: React.ChangeEvent<HTMLInputElement>) => {\n    const newValue = e.target.value\n    const cursorPos = e.target.selectionStart || 0\n    \n    onChange(newValue)\n    \n    // 检测 @ 符号\n    const textBeforeCursor = newValue.slice(0, cursorPos)\n    const lastAtIndex = textBeforeCursor.lastIndexOf('@')\n    \n    if (lastAtIndex !== -1) {\n      // 检查 @ 后面是否有空格（如果有，说明不是正在输入的提及）\n      const textAfterAt = textBeforeCursor.slice(lastAtIndex + 1)\n      if (!textAfterAt.includes(' ')) {\n        setMentionQuery(textAfterAt)\n        setMentionStartPos(lastAtIndex)\n        setShowPopup(true)\n        setSelectedIndex(0)\n        \n        // 计算弹窗位置\n        if (inputRef.current) {\n          const rect = inputRef.current.getBoundingClientRect()\n          setPopupPosition({\n            top: rect.top - 8, // 在输入框上方显示\n            left: rect.left\n          })\n        }\n        return\n      }\n    }\n    \n    setShowPopup(false)\n    setMentionQuery('')\n    setMentionStartPos(-1)\n  }, 
[onChange])\n  \n  // 选择提及目标\n  const selectTarget = useCallback((target: MentionTarget) => {\n    if (mentionStartPos === -1) return\n    \n    const beforeMention = value.slice(0, mentionStartPos)\n    const afterMention = value.slice(mentionStartPos + mentionQuery.length + 1) // +1 for @\n    const newValue = `${beforeMention}@${target.label} ${afterMention}`\n    \n    onChange(newValue)\n    setActiveMentions(prev => [...prev, target])\n    setShowPopup(false)\n    setMentionQuery('')\n    setMentionStartPos(-1)\n    \n    // 聚焦回输入框\n    inputRef.current?.focus()\n  }, [value, mentionStartPos, mentionQuery, onChange])\n  \n  // 键盘事件处理\n  const handleKeyDown = useCallback((e: React.KeyboardEvent<HTMLInputElement>) => {\n    if (showPopup) {\n      switch (e.key) {\n        case 'ArrowDown':\n          e.preventDefault()\n          setSelectedIndex(prev => \n            prev < flatTargets.length - 1 ? prev + 1 : 0\n          )\n          break\n        case 'ArrowUp':\n          e.preventDefault()\n          setSelectedIndex(prev => \n            prev > 0 ? 
prev - 1 : flatTargets.length - 1\n          )\n          break\n        case 'Enter':\n          e.preventDefault()\n          if (flatTargets[selectedIndex]) {\n            selectTarget(flatTargets[selectedIndex])\n          }\n          break\n        case 'Escape':\n          e.preventDefault()\n          setShowPopup(false)\n          break\n        case 'Tab':\n          e.preventDefault()\n          if (flatTargets[selectedIndex]) {\n            selectTarget(flatTargets[selectedIndex])\n          }\n          break\n      }\n    } else if (e.key === 'Enter' && !e.shiftKey) {\n      e.preventDefault()\n      if (value.trim()) {\n        onSubmit(value.trim(), activeMentions)\n        setActiveMentions([])\n      }\n    }\n  }, [showPopup, flatTargets, selectedIndex, selectTarget, value, onSubmit, activeMentions])\n  \n  // 点击外部关闭弹窗\n  useEffect(() => {\n    const handleClickOutside = (e: MouseEvent) => {\n      if (\n        popupRef.current && \n        !popupRef.current.contains(e.target as Node) &&\n        inputRef.current &&\n        !inputRef.current.contains(e.target as Node)\n      ) {\n        setShowPopup(false)\n      }\n    }\n    \n    document.addEventListener('mousedown', handleClickOutside)\n    return () => document.removeEventListener('mousedown', handleClickOutside)\n  }, [])\n  \n  // 滚动选中项到可见区域\n  useEffect(() => {\n    if (showPopup && popupRef.current) {\n      const selectedElement = popupRef.current.querySelector(`[data-index=\"${selectedIndex}\"]`)\n      selectedElement?.scrollIntoView({ block: 'nearest' })\n    }\n  }, [selectedIndex, showPopup])\n  \n  // 移除已添加的提及标签\n  const removeMention = useCallback((targetId: string) => {\n    const target = activeMentions.find(m => m.id === targetId)\n    if (target) {\n      const newValue = value.replace(`@${target.label}`, '').replace(/\\s+/g, ' ').trim()\n      onChange(newValue)\n      setActiveMentions(prev => prev.filter(m => m.id !== targetId))\n    }\n  }, [activeMentions, value, 
onChange])\n  \n  return (\n    <div className={cn(\"relative flex-1\", className)}>\n      {/* 已选择的提及标签 */}\n      {activeMentions.length > 0 && (\n        <div className=\"flex flex-wrap gap-1 mb-2\">\n          {activeMentions.map(mention => (\n            <span\n              key={mention.id}\n              className={cn(\n                \"inline-flex items-center gap-1 px-2 py-0.5 rounded-full text-xs font-medium\",\n                mention.color\n              )}\n            >\n              {mention.icon}\n              {mention.label}\n              <button\n                onClick={() => removeMention(mention.id)}\n                className=\"ml-0.5 hover:opacity-70\"\n              >\n                <X className=\"w-3 h-3\" />\n              </button>\n            </span>\n          ))}\n        </div>\n      )}\n      \n      {/* 输入框 */}\n      <input\n        ref={inputRef}\n        type=\"text\"\n        value={value}\n        onChange={handleChange}\n        onKeyDown={handleKeyDown}\n        placeholder={defaultPlaceholder}\n        disabled={disabled}\n        className={cn(\n          \"w-full px-4 py-2 rounded-full bg-gray-50 border border-gray-200\",\n          \"focus:border-blue-300 focus:outline-none focus:ring-2 focus:ring-blue-100\",\n          \"text-sm disabled:opacity-50 disabled:cursor-not-allowed\",\n          \"transition-all duration-200\"\n        )}\n      />\n      \n      {/* @ 提及弹窗 */}\n      {showPopup && filteredTargets.length > 0 && (\n        <div\n          ref={popupRef}\n          className={cn(\n            \"absolute z-50 w-72 max-h-80 overflow-y-auto\",\n            \"bg-white rounded-xl shadow-xl border border-gray-200\",\n            \"animate-in fade-in slide-in-from-bottom-2 duration-200\"\n          )}\n          style={{\n            bottom: '100%',\n            left: 0,\n            marginBottom: '8px'\n          }}\n        >\n          <div className=\"p-2\">\n            <div className=\"text-xs 
text-gray-400 px-2 py-1 mb-1\">\n              使用 ↑↓ 选择，Enter 确认，Esc 取消\n            </div>\n            \n            {groupedTargets.map((group, groupIndex) => (\n              <div key={group.label} className={groupIndex > 0 ? 'mt-2' : ''}>\n                <div className=\"text-xs font-medium text-gray-500 px-2 py-1 sticky top-0 bg-white\">\n                  {group.label}\n                </div>\n                {group.items.map((target, itemIndex) => {\n                  const flatIndex = groupedTargets\n                    .slice(0, groupIndex)\n                    .reduce((acc, g) => acc + g.items.length, 0) + itemIndex\n                  \n                  return (\n                    <button\n                      key={target.id}\n                      data-index={flatIndex}\n                      onClick={() => selectTarget(target)}\n                      className={cn(\n                        \"w-full flex items-center gap-3 px-3 py-2 rounded-lg text-left\",\n                        \"transition-colors duration-100\",\n                        flatIndex === selectedIndex\n                          ? 
\"bg-blue-50 text-blue-700\"\n                          : \"hover:bg-gray-50\"\n                      )}\n                    >\n                      <div className={cn(\n                        \"w-8 h-8 rounded-lg flex items-center justify-center\",\n                        target.color\n                      )}>\n                        {target.icon}\n                      </div>\n                      <div className=\"flex-1 min-w-0\">\n                        <div className=\"font-medium text-sm truncate\">\n                          {target.label}\n                        </div>\n                        {target.description && (\n                          <div className=\"text-xs text-gray-500 truncate\">\n                            {target.description}\n                          </div>\n                        )}\n                      </div>\n                    </button>\n                  )\n                })}\n              </div>\n            ))}\n          </div>\n        </div>\n      )}\n      \n      {/* 空结果提示 */}\n      {showPopup && filteredTargets.length === 0 && (\n        <div\n          ref={popupRef}\n          className={cn(\n            \"absolute z-50 w-72\",\n            \"bg-white rounded-xl shadow-xl border border-gray-200 p-4\",\n            \"animate-in fade-in slide-in-from-bottom-2 duration-200\"\n          )}\n          style={{\n            bottom: '100%',\n            left: 0,\n            marginBottom: '8px'\n          }}\n        >\n          <div className=\"text-sm text-gray-500 text-center\">\n            未找到匹配的选项\n          </div>\n        </div>\n      )}\n    </div>\n  )\n}\n\nexport default MentionInput\nexport { AGENTS, SOURCES, ALL_TARGETS }\n\n"
  },
  {
    "path": "frontend/src/components/ModelSelector.tsx",
    "content": "import { useState, useEffect, useMemo } from 'react'\nimport { useQuery } from '@tanstack/react-query'\nimport { Button } from '@/components/ui/button'\nimport {\n  DropdownMenu,\n  DropdownMenuContent,\n  DropdownMenuItem,\n  DropdownMenuLabel,\n  DropdownMenuSeparator,\n  DropdownMenuTrigger,\n} from '@/components/ui/dropdown-menu'\nimport { ChevronDown, Check, AlertCircle } from 'lucide-react'\nimport { cn } from '@/lib/utils'\nimport { llmApi } from '@/lib/api-client'\nimport { useGlobalI18n, useLanguageStore } from '@/store/useLanguageStore'\n\n// 模型配置\nexport interface ModelConfig {\n  provider: string\n  model: string\n}\n\n// Provider 和 Model 的国际化映射\nconst PROVIDER_I18N: Record<string, { labelZh: string; labelEn: string }> = {\n  bailian: {\n    labelZh: '百炼（阿里云）',\n    labelEn: 'Bailian (Alibaba Cloud)',\n  },\n  openai: {\n    labelZh: 'OpenAI',\n    labelEn: 'OpenAI',\n  },\n  deepseek: {\n    labelZh: 'DeepSeek',\n    labelEn: 'DeepSeek',\n  },\n  kimi: {\n    labelZh: 'Kimi (Moonshot)',\n    labelEn: 'Kimi (Moonshot)',\n  },\n  zhipu: {\n    labelZh: '智谱',\n    labelEn: 'Zhipu',\n  },\n}\n\nconst MODEL_DESCRIPTION_I18N: Record<string, { descZh: string; descEn: string }> = {\n  bailian: {\n    descZh: '百炼 模型',\n    descEn: 'Bailian Model',\n  },\n  openai: {\n    descZh: 'OpenAI 模型',\n    descEn: 'OpenAI Model',\n  },\n  deepseek: {\n    descZh: 'DeepSeek 模型',\n    descEn: 'DeepSeek Model',\n  },\n  kimi: {\n    descZh: 'Kimi 模型',\n    descEn: 'Kimi Model',\n  },\n  zhipu: {\n    descZh: '智谱 模型',\n    descEn: 'Zhipu Model',\n  },\n}\n\nconst DEFAULT_CONFIG: ModelConfig = {\n  provider: 'bailian',\n  model: 'qwen-plus',\n}\n\nexport default function ModelSelector() {\n  const t = useGlobalI18n()\n  const { lang } = useLanguageStore()\n  const [config, setConfig] = useState<ModelConfig>(DEFAULT_CONFIG)\n  \n  // 从后端 API 动态加载可用厂商和模型\n  const { data: llmConfig, isLoading } = useQuery({\n    queryKey: ['llm-config'],\n    queryFn: 
llmApi.getConfig,\n    staleTime: 5 * 60 * 1000, // 缓存 5 分钟\n    retry: 1,\n  })\n  \n  // 国际化处理：将后端返回的 provider 和 model 数据转换为国际化文本\n  const providers = useMemo(() => {\n    if (!llmConfig?.providers) return []\n    return llmConfig.providers.map(provider => {\n      const providerI18n = PROVIDER_I18N[provider.value] || { \n        labelZh: provider.label, \n        labelEn: provider.label \n      }\n      const modelDescI18n = MODEL_DESCRIPTION_I18N[provider.value] || { \n        descZh: `${provider.label} 模型`, \n        descEn: `${provider.label} Model` \n      }\n      \n      return {\n        ...provider,\n        label: lang === 'zh' ? providerI18n.labelZh : providerI18n.labelEn,\n        models: provider.models.map(model => ({\n          ...model,\n          description: lang === 'zh' ? modelDescI18n.descZh : modelDescI18n.descEn,\n        })),\n      }\n    })\n  }, [llmConfig?.providers, lang])\n\n  // 从 localStorage 加载配置\n  useEffect(() => {\n    const saved = localStorage.getItem('modelConfig')\n    if (saved) {\n      try {\n        setConfig(JSON.parse(saved))\n      } catch (e) {\n        console.error('Failed to load model config:', e)\n      }\n    }\n  }, [])\n\n  // 保存配置到 localStorage\n  const saveConfig = (newConfig: ModelConfig) => {\n    setConfig(newConfig)\n    localStorage.setItem('modelConfig', JSON.stringify(newConfig))\n    // 触发全局事件，通知其他组件\n    window.dispatchEvent(\n      new CustomEvent('model-config-changed', { detail: newConfig })\n    )\n  }\n\n  const currentProvider = providers.find((p) => p.value === config.provider)\n  const currentModel = currentProvider?.models.find(\n    (m) => m.value === config.model\n  )\n\n  // 加载状态\n  if (isLoading) {\n    return (\n      <div className=\"flex items-center\">\n        <Button variant=\"outline\" size=\"sm\" disabled className=\"gap-2 h-10 rounded-lg px-3\">\n          <span className=\"text-sm\">{t.model.loading}</span>\n        </Button>\n      </div>\n    )\n  }\n\n  // 无可用厂商\n  if 
(providers.length === 0) {\n    return (\n      <div className=\"flex items-center\">\n        <Button variant=\"outline\" size=\"sm\" disabled className=\"gap-2 h-10 rounded-lg px-3 border-orange-300\">\n          <AlertCircle className=\"h-4 w-4 text-orange-500\" />\n          <span className=\"text-sm text-orange-600\">{t.model.notConfigured}</span>\n        </Button>\n      </div>\n    )\n  }\n\n  return (\n    <div className=\"flex items-center\">\n      <DropdownMenu>\n        <DropdownMenuTrigger asChild>\n          <Button\n            variant=\"outline\"\n            size=\"sm\"\n            className=\"gap-2 h-10 rounded-lg px-3 border-slate-200 bg-white shadow-sm hover:shadow-md transition-all\"\n          >\n            <span className=\"text-base\">{currentProvider?.icon || '📦'}</span>\n            <div className=\"flex flex-col items-start leading-tight\">\n              <span className=\"text-[11px] text-slate-500\">\n                {currentProvider?.label || t.model.selectModel}\n              </span>\n              <span className=\"text-sm font-semibold text-slate-900\">\n                {currentModel?.label || config.model}\n              </span>\n            </div>\n            <ChevronDown className=\"h-4 w-4 opacity-60\" />\n          </Button>\n        </DropdownMenuTrigger>\n        <DropdownMenuContent\n          align=\"end\"\n          className=\"w-96 max-h-[480px] overflow-y-auto border-slate-200 shadow-xl\"\n        >\n          <DropdownMenuLabel className=\"text-xs text-slate-500\">\n            {t.model.selectTip}\n          </DropdownMenuLabel>\n          <DropdownMenuSeparator />\n\n          {providers.map((provider) => (\n            <div key={provider.value} className=\"px-1 py-1\">\n              <DropdownMenuLabel className=\"text-xs text-slate-500 flex items-center gap-2\">\n                <span className=\"text-base\">{provider.icon}</span>\n                <span className=\"font-medium 
text-slate-700\">{provider.label}</span>\n                {!provider.has_api_key && (\n                  <span className=\"text-xs text-orange-500 ml-auto\">⚠️ {t.model.noApiKey}</span>\n                )}\n              </DropdownMenuLabel>\n              <div className=\"grid gap-1\">\n                {provider.models.map((model) => {\n                  const isActive =\n                    config.provider === provider.value &&\n                    config.model === model.value\n                  return (\n                    <DropdownMenuItem\n                      key={`${provider.value}-${model.value}`}\n                      onClick={() =>\n                        saveConfig({\n                          provider: provider.value,\n                          model: model.value,\n                        })\n                      }\n                      disabled={!provider.has_api_key}\n                      className={cn(\n                        \"flex items-start gap-3 rounded-lg border border-transparent px-3 py-3 transition-colors\",\n                        !provider.has_api_key && \"opacity-50 cursor-not-allowed\",\n                        isActive\n                          ? 
\"border-primary/30 bg-primary/5\"\n                          : \"hover:bg-slate-50\"\n                      )}\n                    >\n                      <div className=\"flex flex-1 flex-col\">\n                        <div className=\"flex items-center gap-2\">\n                          <span className=\"font-medium text-sm text-slate-900\">\n                            {model.label}\n                          </span>\n                          {isActive && <Check className=\"h-4 w-4 text-primary\" />}\n                        </div>\n                        <span className=\"text-xs text-slate-500\">\n                          {model.description}\n                        </span>\n                      </div>\n                    </DropdownMenuItem>\n                  )\n                })}\n              </div>\n              <DropdownMenuSeparator className=\"my-2\" />\n            </div>\n          ))}\n\n          <div className=\"px-3 py-2 text-xs text-slate-500 bg-slate-50 rounded-md mx-1\">\n            {t.model.current}：{currentProvider?.label} · {currentModel?.label}\n          </div>\n        </DropdownMenuContent>\n      </DropdownMenu>\n    </div>\n  )\n}\n\n// 导出 hook 供其他组件使用\nexport function useModelConfig() {\n  const [config, setConfig] = useState<ModelConfig>(DEFAULT_CONFIG)\n\n  useEffect(() => {\n    // 加载配置\n    const saved = localStorage.getItem('modelConfig')\n    if (saved) {\n      try {\n        setConfig(JSON.parse(saved))\n      } catch (e) {\n        console.error('Failed to load model config:', e)\n      }\n    }\n\n    // 监听配置变化\n    const handleConfigChange = (e: CustomEvent<ModelConfig>) => {\n      setConfig(e.detail)\n    }\n\n    window.addEventListener(\n      'model-config-changed',\n      handleConfigChange as EventListener\n    )\n\n    return () => {\n      window.removeEventListener(\n        'model-config-changed',\n        handleConfigChange as EventListener\n      )\n    }\n  }, [])\n\n  return config\n}\n\n"
  },
  {
    "path": "frontend/src/components/NewsDetailDrawer.tsx",
"content": "import { useQuery } from '@tanstack/react-query'\nimport { useState, useEffect } from 'react'\nimport { toast } from 'sonner'\nimport ReactMarkdown from 'react-markdown'\nimport remarkGfm from 'remark-gfm'\nimport {\n  Sheet,\n  SheetContent,\n  SheetHeader,\n  SheetTitle,\n  SheetDescription,\n} from '@/components/ui/sheet'\nimport { Button } from '@/components/ui/button'\nimport { Badge } from '@/components/ui/badge'\nimport { Card, CardContent } from '@/components/ui/card'\nimport { newsApi, analysisApi } from '@/lib/api-client'\nimport { formatRelativeTime } from '@/lib/utils'\nimport {\n  ExternalLink,\n  Share2,\n  Calendar,\n  TrendingUp,\n  CheckCircle2,\n  XCircle,\n  MinusCircle,\n  Sparkles,\n  Copy,\n  Check,\n  FileText,\n  Code,\n} from 'lucide-react'\n\n// 新闻源配置\nconst NEWS_SOURCES = [\n  { key: 'all', name: '全部来源', icon: '📰' },\n  { key: 'sina', name: '新浪财经', icon: '🌐' },\n  { key: 'tencent', name: '腾讯财经', icon: '🐧' },\n  { key: 'jwview', name: '金融界', icon: '💰' },\n  { key: 'eeo', name: '经济观察网', icon: '📊' },\n  { key: 'caijing', name: '财经网', icon: '📈' },\n  { key: 'jingji21', name: '21经济网', icon: '📉' },\n  { key: 'nbd', name: '每日经济新闻', icon: '📰' },\n  { key: 'yicai', name: '第一财经', icon: '🎯' },\n  { key: '163', name: '网易财经', icon: '📧' },\n  { key: 'eastmoney', name: '东方财富', icon: '💎' },\n]\n\ninterface NewsDetailDrawerProps {\n  newsId: number | null\n  open: boolean\n  onOpenChange: (open: boolean) => void\n}\n\nexport default function NewsDetailDrawer({\n  newsId,\n  open,\n  onOpenChange,\n}: NewsDetailDrawerProps) {\n  const [analyzing, setAnalyzing] = useState(false)\n  const [copiedId, setCopiedId] = useState<number | null>(null)\n  const [showRawHtml, setShowRawHtml] = useState(false)  // 是否显示原始 HTML\n\n  // 清理HTML标签并转换为Markdown\n  const cleanMarkdown = (text: string): string => {\n    return text\n      // 替换HTML换行标签为Markdown换行（/<br\\s*\\/?>/ 已覆盖 <br>、<br/>、<br />）\n      .replace(/<br\\s*\\/?>/gi, '\\n')\n      // 移除其他HTML标签\n     
 .replace(/<[^>]+>/g, '')\n      // 清理多余空行\n      .replace(/\\n{3,}/g, '\\n\\n')\n      .trim()\n  }\n\n  // 复制功能\n  const handleCopy = async (text: string, analysisId: number) => {\n    try {\n      await navigator.clipboard.writeText(text)\n      setCopiedId(analysisId)\n      toast.success('已复制到剪贴板')\n      setTimeout(() => setCopiedId(null), 2000)\n    } catch (err) {\n      toast.error('复制失败，请手动复制')\n    }\n  }\n\n  // 获取新闻详情\n  const { data: news, isLoading } = useQuery({\n    queryKey: ['news', 'detail', newsId],\n    queryFn: () => newsApi.getNewsDetail(newsId!),\n    enabled: !!newsId && open,\n  })\n\n  // 获取分析结果（如果已分析）\n  const { data: analyses, refetch: refetchAnalyses } = useQuery({\n    queryKey: ['analysis', 'news', newsId],\n    queryFn: () => analysisApi.getNewsAnalyses(newsId!),\n    enabled: !!newsId && open,\n    staleTime: 0,  // 立即过期，确保每次打开都获取最新数据\n  })\n\n  // 获取相关新闻（同来源的其他新闻）\n  const { data: relatedNews } = useQuery({\n    queryKey: ['news', 'related', newsId],\n    queryFn: async () => {\n      if (!news) return []\n      const allNews = await newsApi.getLatestNews({ \n        source: news.source, \n        limit: 10 \n      })\n      // 排除当前新闻，返回前5条\n      return allNews.filter(n => n.id !== newsId).slice(0, 5)\n    },\n    enabled: !!newsId && open && !!news,\n  })\n\n  // 获取原始 HTML（仅在点击\"查看原始内容\"时加载）\n  const { data: htmlData, isLoading: htmlLoading } = useQuery({\n    queryKey: ['news', 'html', newsId],\n    queryFn: () => newsApi.getNewsHtml(newsId!),\n    enabled: !!newsId && open && showRawHtml,\n  })\n\n  // 当切换到新新闻时，重置分析状态\n  useEffect(() => {\n    setAnalyzing(false)\n  }, [newsId])\n\n  // 处理分享\n  const handleShare = async () => {\n    if (!news) return\n    const url = `${window.location.origin}/news/${news.id}`\n    try {\n      await navigator.clipboard.writeText(url)\n      toast.success('链接已复制到剪贴板')\n    } catch (err) {\n      toast.error('复制失败，请手动复制')\n    }\n  }\n\n  // 处理分析\n  const handleAnalyze = async () => {\n    if 
(!newsId) return\n    setAnalyzing(true)\n    try {\n      const result = await analysisApi.analyzeNews(newsId)\n      if (result.success) {\n        toast.success('分析完成！')\n        // 刷新分析数据（不重载整个页面）\n        await refetchAnalyses()\n      } else {\n        toast.error(result.error || '分析失败')\n      }\n    } catch (error) {\n      toast.error('分析失败，请稍后重试')\n    } finally {\n      setAnalyzing(false)\n    }\n  }\n\n  // 获取情感标签\n  const getSentimentBadge = (score: number | null) => {\n    if (score === null) {\n      return (\n        <Badge variant=\"outline\" className=\"bg-gray-50 text-gray-700\">\n          <span className=\"mr-1\">😐</span>\n          待分析\n        </Badge>\n      )\n    }\n    if (score > 0.1) {\n      return (\n        <Badge className=\"bg-emerald-100 text-emerald-700 border-emerald-300\">\n          <CheckCircle2 className=\"w-3 h-3 mr-1\" />\n          利好 {score.toFixed(2)}\n        </Badge>\n      )\n    }\n    if (score < -0.1) {\n      return (\n        <Badge className=\"bg-rose-100 text-rose-700 border-rose-300\">\n          <XCircle className=\"w-3 h-3 mr-1\" />\n          利空 {score.toFixed(2)}\n        </Badge>\n      )\n    }\n    return (\n      <Badge className=\"bg-slate-100 text-slate-700 border-slate-300\">\n        <MinusCircle className=\"w-3 h-3 mr-1\" />\n        中性 {score.toFixed(2)}\n      </Badge>\n    )\n  }\n\n  const sourceInfo = NEWS_SOURCES.find(s => s.key === news?.source)\n\n  return (\n    <Sheet open={open} onOpenChange={onOpenChange}>\n      <SheetContent side=\"right\" className=\"overflow-y-auto\">\n        {isLoading ? (\n          <div className=\"flex items-center justify-center h-full\">\n            <div className=\"text-center\">\n              <div className=\"inline-block animate-spin rounded-full h-8 w-8 border-b-2 border-primary mb-4\"></div>\n              <p className=\"text-gray-500\">加载中...</p>\n            </div>\n          </div>\n        ) : !news ? 
(\n          <div className=\"flex items-center justify-center h-full\">\n            <p className=\"text-gray-500\">新闻不存在</p>\n          </div>\n        ) : (\n          <div className=\"space-y-6\">\n            {/* 头部区域 */}\n            <SheetHeader>\n              <SheetTitle className=\"text-2xl font-bold leading-tight pr-8\">\n                {news.title}\n              </SheetTitle>\n              <SheetDescription>\n                <div className=\"flex items-center gap-4 text-sm text-gray-500 mt-2\">\n                  <div className=\"flex items-center gap-1\">\n                    <span>{sourceInfo?.icon || '📰'}</span>\n                    <span>{sourceInfo?.name || news.source}</span>\n                  </div>\n                  <span>•</span>\n                  <div className=\"flex items-center gap-1\">\n                    <Calendar className=\"w-3 h-3\" />\n                    <span>{formatRelativeTime(news.publish_time || news.created_at)}</span>\n                  </div>\n                  {news.author && (\n                    <>\n                      <span>•</span>\n                      <span>作者：{news.author}</span>\n                    </>\n                  )}\n                </div>\n              </SheetDescription>\n            </SheetHeader>\n\n            {/* 操作按钮栏 */}\n            <div className=\"flex flex-wrap gap-2 pb-4 border-b\">\n              <Button\n                variant=\"outline\"\n                size=\"sm\"\n                onClick={() => window.open(news.url, '_blank')}\n                className=\"flex items-center gap-2\"\n              >\n                <ExternalLink className=\"w-4 h-4\" />\n                原文链接\n              </Button>\n              <Button\n                variant=\"outline\"\n                size=\"sm\"\n                onClick={handleShare}\n                className=\"flex items-center gap-2\"\n              >\n                <Share2 className=\"w-4 h-4\" />\n                分享\n             
 </Button>\n              <Button\n                variant=\"outline\"\n                size=\"sm\"\n                onClick={handleAnalyze}\n                disabled={analyzing}\n                className=\"flex items-center gap-2\"\n              >\n                <Sparkles className={`w-4 h-4 ${analyzing ? 'animate-spin' : ''}`} />\n                {analyzing ? '分析中...' : '分析'}\n              </Button>\n              <Button\n                variant={showRawHtml ? \"default\" : \"outline\"}\n                size=\"sm\"\n                onClick={() => setShowRawHtml(!showRawHtml)}\n                className=\"flex items-center gap-2\"\n              >\n                <Code className=\"w-4 h-4\" />\n                {showRawHtml ? '显示解析内容' : '查看原始内容'}\n              </Button>\n            </div>\n\n            {/* 情感分析卡片 - 优先显示最新分析结果 */}\n            {(() => {\n              // 优先使用最新分析记录中的评分，否则使用 news 表中的评分\n              const latestScore = analyses && analyses.length > 0 && analyses[0].sentiment_score !== null\n                ? 
analyses[0].sentiment_score\n                : news.sentiment_score;\n              \n              if (latestScore === null) return null;\n              \n              return (\n                <Card className=\"bg-gradient-to-r from-blue-50 to-indigo-50 border-blue-200\">\n                  <CardContent className=\"pt-6\">\n                    <div className=\"flex items-center justify-between\">\n                      <div>\n                        <h3 className=\"font-semibold text-gray-900 mb-2\">情感分析</h3>\n                        <div className=\"flex items-center gap-2\">\n                          {getSentimentBadge(latestScore)}\n                          <span className=\"text-sm text-gray-600\">\n                            评分：{latestScore.toFixed(3)}\n                          </span>\n                        </div>\n                      </div>\n                      {analyses && analyses.length > 0 && (\n                        <div className=\"text-xs text-gray-500\">\n                          分析时间：{formatRelativeTime(analyses[0].created_at)}\n                        </div>\n                      )}\n                    </div>\n                  </CardContent>\n                </Card>\n              );\n            })()}\n\n            {/* 股票代码区域 */}\n            {news.stock_codes && news.stock_codes.length > 0 && (\n              <div>\n                <h3 className=\"font-semibold text-gray-900 mb-3 flex items-center gap-2\">\n                  <TrendingUp className=\"w-4 h-4\" />\n                  关联股票\n                </h3>\n                <div className=\"flex flex-wrap gap-2\">\n                  {news.stock_codes.map((code) => (\n                    <Badge\n                      key={code}\n                      variant=\"outline\"\n                      className=\"text-sm bg-blue-50 text-blue-700 border-blue-200 hover:bg-blue-100 cursor-pointer px-3 py-1\"\n                    >\n                      <TrendingUp className=\"w-3 h-3 
mr-1\" />\n                      {code}\n                    </Badge>\n                  ))}\n                </div>\n              </div>\n            )}\n\n            {/* 完整正文区域 */}\n            <div>\n              <h3 className=\"font-semibold text-gray-900 mb-3 flex items-center gap-2\">\n                {showRawHtml ? <Code className=\"w-4 h-4\" /> : <FileText className=\"w-4 h-4\" />}\n                {showRawHtml ? '原始内容' : '正文内容'}\n              </h3>\n              \n              {showRawHtml ? (\n                // 原始 HTML 展示区域\n                <div className=\"border rounded-lg overflow-hidden bg-white\">\n                  {htmlLoading ? (\n                    <div className=\"p-8 text-center text-gray-500\">\n                      <div className=\"animate-spin w-6 h-6 border-2 border-blue-500 border-t-transparent rounded-full mx-auto mb-2\"></div>\n                      加载原始内容中...\n                    </div>\n                  ) : htmlData?.raw_html ? (\n                    <iframe\n                      srcDoc={htmlData.raw_html}\n                      className=\"w-full border-0\"\n                      style={{ height: '600px' }}\n                      sandbox=\"allow-same-origin\"\n                      title=\"原始新闻内容\"\n                    />\n                  ) : (\n                    <div className=\"p-8 text-center text-gray-500\">\n                      <Code className=\"w-8 h-8 mx-auto mb-2 opacity-50\" />\n                      <p>该新闻暂无原始 HTML 内容</p>\n                      <p className=\"text-sm mt-1\">请重新爬取该新闻以获取完整内容</p>\n                    </div>\n                  )}\n                </div>\n              ) : (\n                // 解析后的文本展示\n                <div className=\"prose prose-sm max-w-none\">\n                  <div className=\"text-gray-700 leading-relaxed whitespace-pre-wrap\">\n                    {news.content.split('\\n').map((paragraph, idx) => (\n                      <p key={idx} className=\"mb-4\">\n                
        {paragraph}\n                      </p>\n                    ))}\n                  </div>\n                </div>\n              )}\n            </div>\n\n            {/* 分析详情 */}\n            {analyses && analyses.length > 0 && (\n              <div>\n                <h3 className=\"font-semibold text-gray-900 mb-3 flex items-center gap-2\">\n                  <Sparkles className=\"w-4 h-4\" />\n                  智能体分析详情\n                </h3>\n                {analyses.map((analysis) => {\n                  // 清理和合并所有内容用于复制\n                  const fullContent = [\n                    analysis.summary ? `## 摘要\\n\\n${cleanMarkdown(analysis.summary)}` : '',\n                    analysis.analysis_result ? `## 详细分析\\n\\n${cleanMarkdown(analysis.analysis_result)}` : ''\n                  ].filter(Boolean).join('\\n\\n')\n\n                  return (\n                    <Card key={analysis.id} className=\"mb-4 relative\">\n                      <CardContent className=\"pt-6\">\n                        <div className=\"space-y-3\">\n                          <div className=\"flex items-center justify-between\">\n                            <Badge variant=\"outline\">{analysis.agent_name}</Badge>\n                            <div className=\"flex items-center gap-2\">\n                              {analysis.confidence && (\n                                <span className=\"text-xs text-gray-500\">\n                                  置信度：{(analysis.confidence * 100).toFixed(1)}%\n                                </span>\n                              )}\n                            </div>\n                          </div>\n                          {analysis.summary && (\n                            <div>\n                              <h4 className=\"font-medium text-sm text-gray-700 mb-2\">摘要</h4>\n                              <div className=\"prose prose-sm max-w-none\">\n                                <ReactMarkdown\n                                  
remarkPlugins={[remarkGfm]}\n                                  className=\"text-sm text-gray-600 leading-relaxed\"\n                                  components={{\n                                    h1: ({node, ...props}) => <h1 className=\"text-base font-bold mb-2 mt-3\" {...props} />,\n                                    h2: ({node, ...props}) => <h2 className=\"text-sm font-bold mb-2 mt-2\" {...props} />,\n                                    h3: ({node, ...props}) => <h3 className=\"text-sm font-semibold mb-1 mt-2\" {...props} />,\n                                    h4: ({node, ...props}) => <h4 className=\"text-sm font-medium mb-1 mt-2\" {...props} />,\n                                    p: ({node, ...props}) => <p className=\"mb-2\" {...props} />,\n                                    ul: ({node, ...props}) => <ul className=\"list-disc list-inside mb-2 space-y-1\" {...props} />,\n                                    ol: ({node, ...props}) => <ol className=\"list-decimal list-inside mb-2 space-y-1\" {...props} />,\n                                    li: ({node, ...props}) => <li className=\"ml-2\" {...props} />,\n                                    strong: ({node, ...props}) => <strong className=\"font-semibold text-gray-800\" {...props} />,\n                                    em: ({node, ...props}) => <em className=\"italic\" {...props} />,\n                                    code: ({node, ...props}) => (\n                                      <code className=\"bg-gray-100 px-1 py-0.5 rounded text-xs font-mono text-gray-800\" {...props} />\n                                    ),\n                                    pre: ({node, ...props}) => <pre className=\"bg-gray-100 p-2 rounded overflow-x-auto mb-2\" {...props} />,\n                                    blockquote: ({node, ...props}) => <blockquote className=\"border-l-4 border-gray-300 pl-3 italic text-gray-600 my-2\" {...props} />,\n                                    hr: ({node, ...props}) => <hr 
className=\"my-3 border-gray-200\" {...props} />,\n                                    table: ({node, ...props}) => (\n                                      <div className=\"overflow-x-auto my-3\">\n                                        <table className=\"min-w-full border-collapse border border-gray-300 text-xs\" {...props} />\n                                      </div>\n                                    ),\n                                    thead: ({node, ...props}) => <thead className=\"bg-gray-50\" {...props} />,\n                                    tbody: ({node, ...props}) => <tbody {...props} />,\n                                    tr: ({node, ...props}) => <tr className=\"border-b border-gray-200\" {...props} />,\n                                    th: ({node, ...props}) => <th className=\"border border-gray-300 px-3 py-2 text-left font-semibold bg-gray-100\" {...props} />,\n                                    td: ({node, ...props}) => <td className=\"border border-gray-300 px-3 py-2\" {...props} />,\n                                  }}\n                                >\n                                  {cleanMarkdown(analysis.summary)}\n                                </ReactMarkdown>\n                              </div>\n                            </div>\n                          )}\n                          {analysis.analysis_result && (\n                            <div>\n                              <h4 className=\"font-medium text-sm text-gray-700 mb-2\">详细分析</h4>\n                              <div className=\"prose prose-sm max-w-none\">\n                                <ReactMarkdown\n                                  remarkPlugins={[remarkGfm]}\n                                  className=\"text-sm text-gray-600 leading-relaxed\"\n                                  components={{\n                                    h1: ({node, ...props}) => <h1 className=\"text-base font-bold mb-2 mt-3\" {...props} />,\n                            
        h2: ({node, ...props}) => <h2 className=\"text-sm font-bold mb-2 mt-2\" {...props} />,\n                                    h3: ({node, ...props}) => <h3 className=\"text-sm font-semibold mb-1 mt-2\" {...props} />,\n                                    h4: ({node, ...props}) => <h4 className=\"text-sm font-medium mb-1 mt-2\" {...props} />,\n                                    p: ({node, ...props}) => <p className=\"mb-2\" {...props} />,\n                                    ul: ({node, ...props}) => <ul className=\"list-disc list-inside mb-2 space-y-1\" {...props} />,\n                                    ol: ({node, ...props}) => <ol className=\"list-decimal list-inside mb-2 space-y-1\" {...props} />,\n                                    li: ({node, ...props}) => <li className=\"ml-2\" {...props} />,\n                                    strong: ({node, ...props}) => <strong className=\"font-semibold text-gray-800\" {...props} />,\n                                    em: ({node, ...props}) => <em className=\"italic\" {...props} />,\n                                    code: ({node, ...props}) => (\n                                      <code className=\"bg-gray-100 px-1 py-0.5 rounded text-xs font-mono text-gray-800\" {...props} />\n                                    ),\n                                    pre: ({node, ...props}) => <pre className=\"bg-gray-100 p-2 rounded overflow-x-auto mb-2\" {...props} />,\n                                    blockquote: ({node, ...props}) => <blockquote className=\"border-l-4 border-gray-300 pl-3 italic text-gray-600 my-2\" {...props} />,\n                                    hr: ({node, ...props}) => <hr className=\"my-3 border-gray-200\" {...props} />,\n                                    table: ({node, ...props}) => (\n                                      <div className=\"overflow-x-auto my-3\">\n                                        <table className=\"min-w-full border-collapse border border-gray-300 text-xs\" 
{...props} />\n                                      </div>\n                                    ),\n                                    thead: ({node, ...props}) => <thead className=\"bg-gray-50\" {...props} />,\n                                    tbody: ({node, ...props}) => <tbody {...props} />,\n                                    tr: ({node, ...props}) => <tr className=\"border-b border-gray-200\" {...props} />,\n                                    th: ({node, ...props}) => <th className=\"border border-gray-300 px-3 py-2 text-left font-semibold bg-gray-100\" {...props} />,\n                                    td: ({node, ...props}) => <td className=\"border border-gray-300 px-3 py-2\" {...props} />,\n                                  }}\n                                >\n                                  {cleanMarkdown(analysis.analysis_result)}\n                                </ReactMarkdown>\n                              </div>\n                            </div>\n                          )}\n                          <div className=\"flex items-center justify-between pt-2 border-t\">\n                            <span className=\"text-xs text-gray-400\">\n                              分析时间：{formatRelativeTime(analysis.created_at)}\n                            </span>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleCopy(fullContent, analysis.id)}\n                              className=\"h-7 px-2 text-xs\"\n                            >\n                              {copiedId === analysis.id ? 
(\n                                <>\n                                  <Check className=\"w-3 h-3 mr-1\" />\n                                  已复制\n                                </>\n                              ) : (\n                                <>\n                                  <Copy className=\"w-3 h-3 mr-1\" />\n                                  复制\n                                </>\n                              )}\n                            </Button>\n                          </div>\n                        </div>\n                      </CardContent>\n                    </Card>\n                  )\n                })}\n              </div>\n            )}\n\n            {/* 相关新闻推荐 */}\n            {relatedNews && relatedNews.length > 0 && (\n              <div>\n                <h3 className=\"font-semibold text-gray-900 mb-3\">相关新闻</h3>\n                <div className=\"space-y-2\">\n                  {relatedNews.map((related) => (\n                    <Card\n                      key={related.id}\n                      className=\"hover:shadow-md transition-shadow cursor-pointer\"\n                      onClick={() => {\n                        onOpenChange(false)\n                        setTimeout(() => {\n                          // 触发父组件更新newsId\n                          window.dispatchEvent(new CustomEvent('news-select', { detail: related.id }))\n                        }, 300)\n                      }}\n                    >\n                      <CardContent className=\"pt-4\">\n                        <h4 className=\"font-medium text-sm line-clamp-2 mb-2\">\n                          {related.title}\n                        </h4>\n                        <div className=\"flex items-center gap-2 text-xs text-gray-500\">\n                          <span>{formatRelativeTime(related.publish_time || related.created_at)}</span>\n                          {related.stock_codes && related.stock_codes.length > 0 && (\n                 
           <>\n                              <span>•</span>\n                              <span>{related.stock_codes.length} 只股票</span>\n                            </>\n                          )}\n                        </div>\n                      </CardContent>\n                    </Card>\n                  ))}\n                </div>\n              </div>\n            )}\n          </div>\n        )}\n      </SheetContent>\n    </Sheet>\n  )\n}\n\n"
  },
  {
    "path": "frontend/src/components/StockSearch.tsx",
    "content": "/**\n * 股票搜索组件\n * 支持代码和名称模糊搜索\n */\nimport { useState, useCallback, useRef, useEffect } from 'react'\nimport { useQuery, useMutation, useQueryClient } from '@tanstack/react-query'\nimport { useNavigate } from 'react-router-dom'\nimport { stockApi } from '@/lib/api-client'\nimport { cn } from '@/lib/utils'\nimport { Search, Loader2, Database, RefreshCw } from 'lucide-react'\nimport { toast } from 'sonner'\n\ninterface StockSearchProps {\n  className?: string\n  placeholder?: string\n  onSelect?: (stock: { code: string; name: string; full_code: string }) => void\n}\n\nexport default function StockSearch({\n  className,\n  placeholder = '搜索股票代码或名称...',\n  onSelect,\n}: StockSearchProps) {\n  const [keyword, setKeyword] = useState('')\n  const [isOpen, setIsOpen] = useState(false)\n  const [selectedIndex, setSelectedIndex] = useState(-1)\n  const inputRef = useRef<HTMLInputElement>(null)\n  const listRef = useRef<HTMLDivElement>(null)\n  const navigate = useNavigate()\n  const queryClient = useQueryClient()\n\n  // 获取股票数量\n  const { data: stockCount } = useQuery({\n    queryKey: ['stock-count'],\n    queryFn: () => stockApi.getStockCount(),\n    staleTime: 60 * 1000,\n  })\n\n  // 初始化股票数据\n  const initMutation = useMutation({\n    mutationFn: () => stockApi.initStockData(),\n    onSuccess: (data) => {\n      if (data.success) {\n        toast.success(`成功导入 ${data.count} 只股票！`)\n        queryClient.invalidateQueries({ queryKey: ['stock-count'] })\n        queryClient.invalidateQueries({ queryKey: ['stock-search'] })\n      } else {\n        toast.error(data.message)\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`初始化失败: ${error.message}`)\n    },\n  })\n\n  // 搜索查询\n  const { data: searchResults, isLoading } = useQuery({\n    queryKey: ['stock-search', keyword],\n    queryFn: () => stockApi.searchRealtime(keyword, 15),\n    enabled: keyword.length >= 1,\n    staleTime: 30 * 1000,\n  })\n\n  // 处理选择股票\n  const handleSelect = 
useCallback((stock: { code: string; name: string; full_code: string }) => {\n    setKeyword('')\n    setIsOpen(false)\n    setSelectedIndex(-1)\n    \n    if (onSelect) {\n      onSelect(stock)\n    } else {\n      // 默认跳转到股票分析页面\n      navigate(`/stock/${stock.full_code}`)\n    }\n  }, [navigate, onSelect])\n\n  // 键盘导航\n  const handleKeyDown = useCallback((e: React.KeyboardEvent) => {\n    if (!searchResults || searchResults.length === 0) return\n\n    switch (e.key) {\n      case 'ArrowDown':\n        e.preventDefault()\n        setSelectedIndex(prev => \n          prev < searchResults.length - 1 ? prev + 1 : 0\n        )\n        break\n      case 'ArrowUp':\n        e.preventDefault()\n        setSelectedIndex(prev => \n          prev > 0 ? prev - 1 : searchResults.length - 1\n        )\n        break\n      case 'Enter':\n        e.preventDefault()\n        if (selectedIndex >= 0 && searchResults[selectedIndex]) {\n          handleSelect(searchResults[selectedIndex])\n        }\n        break\n      case 'Escape':\n        setIsOpen(false)\n        setSelectedIndex(-1)\n        break\n    }\n  }, [searchResults, selectedIndex, handleSelect])\n\n  // 点击外部关闭\n  useEffect(() => {\n    const handleClickOutside = (e: MouseEvent) => {\n      if (\n        inputRef.current &&\n        !inputRef.current.contains(e.target as Node) &&\n        listRef.current &&\n        !listRef.current.contains(e.target as Node)\n      ) {\n        setIsOpen(false)\n      }\n    }\n\n    document.addEventListener('mousedown', handleClickOutside)\n    return () => document.removeEventListener('mousedown', handleClickOutside)\n  }, [])\n\n  // 滚动到选中项\n  useEffect(() => {\n    if (selectedIndex >= 0 && listRef.current) {\n      const selectedItem = listRef.current.children[selectedIndex] as HTMLElement\n      if (selectedItem) {\n        selectedItem.scrollIntoView({ block: 'nearest' })\n      }\n    }\n  }, [selectedIndex])\n\n  return (\n    <div className={cn('relative', 
className)}>\n      {/* 搜索输入框 */}\n      <div className=\"relative\">\n        <Search className=\"absolute left-3 top-1/2 -translate-y-1/2 w-4 h-4 text-gray-400\" />\n        <input\n          ref={inputRef}\n          type=\"text\"\n          value={keyword}\n          onChange={(e) => {\n            setKeyword(e.target.value)\n            setIsOpen(true)\n            setSelectedIndex(-1)\n          }}\n          onFocus={() => setIsOpen(true)}\n          onKeyDown={handleKeyDown}\n          placeholder={placeholder}\n          className={cn(\n            'w-full pl-10 pr-4 py-2.5 text-sm',\n            'border border-gray-200 rounded-lg',\n            'focus:outline-none focus:ring-2 focus:ring-blue-500/20 focus:border-blue-400',\n            'placeholder:text-gray-400',\n            'transition-all duration-200'\n          )}\n        />\n        {isLoading && (\n          <Loader2 className=\"absolute right-3 top-1/2 -translate-y-1/2 w-4 h-4 text-gray-400 animate-spin\" />\n        )}\n      </div>\n\n      {/* 搜索结果下拉列表 */}\n      {isOpen && keyword.length >= 1 && (\n        <div\n          ref={listRef}\n          className={cn(\n            'absolute z-50 w-full mt-1',\n            'bg-white rounded-lg shadow-lg border border-gray-100',\n            'max-h-[400px] overflow-y-auto',\n            'animate-in fade-in-0 zoom-in-95 duration-150'\n          )}\n        >\n          {isLoading ? (\n            <div className=\"flex items-center justify-center py-8 text-gray-500\">\n              <Loader2 className=\"w-5 h-5 animate-spin mr-2\" />\n              搜索中...\n            </div>\n          ) : searchResults && searchResults.length > 0 ? 
(\n            <div className=\"py-1\">\n              {searchResults.map((stock, index) => (\n                <div\n                  key={stock.code}\n                  onClick={() => handleSelect(stock)}\n                  className={cn(\n                    'flex items-center justify-between px-4 py-3 cursor-pointer',\n                    'transition-colors duration-100',\n                    selectedIndex === index\n                      ? 'bg-blue-50'\n                      : 'hover:bg-gray-50'\n                  )}\n                >\n                  <div className=\"flex items-center gap-3\">\n                    <div className=\"flex flex-col\">\n                      <span className=\"font-medium text-gray-900\">\n                        {stock.name}\n                      </span>\n                      <span className=\"text-xs text-gray-500\">\n                        {stock.full_code}\n                      </span>\n                    </div>\n                  </div>\n                  <div className=\"flex items-center gap-2\">\n                    {stock.market && (\n                      <span className=\"text-xs px-1.5 py-0.5 bg-gray-100 text-gray-600 rounded\">\n                        {stock.market}\n                      </span>\n                    )}\n                    {stock.industry && (\n                      <span className=\"text-xs text-gray-500\">\n                        {stock.industry}\n                      </span>\n                    )}\n                  </div>\n                </div>\n              ))}\n            </div>\n          ) : (\n            <div className=\"py-6 text-center\">\n              {stockCount && stockCount.count === 0 ? 
(\n                // 数据库为空时显示初始化按钮\n                <div className=\"space-y-3\">\n                  <Database className=\"w-10 h-10 mx-auto text-gray-300\" />\n                  <p className=\"text-gray-500\">股票数据库为空</p>\n                  <p className=\"text-sm text-gray-400\">点击下方按钮初始化股票数据</p>\n                  <button\n                    onClick={(e) => {\n                      e.stopPropagation()\n                      initMutation.mutate()\n                    }}\n                    disabled={initMutation.isPending}\n                    className={cn(\n                      'inline-flex items-center gap-2 px-4 py-2 text-sm font-medium rounded-lg',\n                      'bg-blue-600 text-white hover:bg-blue-700',\n                      'disabled:opacity-50 disabled:cursor-not-allowed',\n                      'transition-colors'\n                    )}\n                  >\n                    {initMutation.isPending ? (\n                      <>\n                        <Loader2 className=\"w-4 h-4 animate-spin\" />\n                        正在导入股票数据...\n                      </>\n                    ) : (\n                      <>\n                        <RefreshCw className=\"w-4 h-4\" />\n                        初始化股票数据\n                      </>\n                    )}\n                  </button>\n                </div>\n              ) : (\n                // 有数据但没有匹配结果\n                <div>\n                  <p className=\"text-gray-500\">未找到匹配的股票</p>\n                  <p className=\"text-sm text-gray-400 mt-1\">尝试输入股票代码或名称</p>\n                </div>\n              )}\n            </div>\n          )}\n          \n          {/* 快捷提示 */}\n          <div className=\"px-4 py-2 border-t border-gray-100 bg-gray-50/50\">\n            <div className=\"flex items-center gap-4 text-xs text-gray-400\">\n              <span>\n                <kbd className=\"px-1.5 py-0.5 bg-gray-100 rounded text-gray-500\">↑↓</kbd> 导航\n              </span>\n             
 <span>\n                <kbd className=\"px-1.5 py-0.5 bg-gray-100 rounded text-gray-500\">Enter</kbd> 选择\n              </span>\n              <span>\n                <kbd className=\"px-1.5 py-0.5 bg-gray-100 rounded text-gray-500\">Esc</kbd> 关闭\n              </span>\n            </div>\n          </div>\n        </div>\n      )}\n    </div>\n  )\n}\n\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/AgentDemo.tsx",
    "content": "/**\n * AgenticX Agent 调用演示组件\n * \n * 展示如何通过 Agent 接口调用 AlphaMiningTool：\n * - Agent 调用流程可视化\n * - Tool 参数输入面板\n * - 执行日志流式显示\n */\n\nimport React, { useState, useCallback } from 'react';\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '../ui/card';\nimport { Button } from '../ui/button';\nimport { Badge } from '../ui/badge';\nimport { motion, AnimatePresence } from 'framer-motion';\nimport {\n  Bot, Wrench, Play, CheckCircle2, XCircle,\n  Clock, ArrowRight, Terminal, Loader2, Code\n} from 'lucide-react';\nimport { useGlobalI18n } from '@/store/useLanguageStore';\n\ninterface AgentDemoResult {\n  success: boolean;\n  agent_name: string;\n  tool_name: string;\n  input_params: Record<string, any>;\n  output: Record<string, any> | null;\n  execution_time: number;\n  logs: string[];\n}\n\ninterface AgentDemoProps {\n  apiBaseUrl?: string;\n}\n\nconst AgentDemo: React.FC<AgentDemoProps> = ({\n  apiBaseUrl = '/api/v1',\n}) => {\n  const t = useGlobalI18n();\n  const [loading, setLoading] = useState(false);\n  const [result, setResult] = useState<AgentDemoResult | null>(null);\n  const [error, setError] = useState<string | null>(null);\n  \n  // 参数\n  const [stockCode, setStockCode] = useState('SH600519');\n  const [numSteps, setNumSteps] = useState(30);\n  const [useSentiment, setUseSentiment] = useState(true);\n  \n  // 执行演示\n  const runDemo = useCallback(async () => {\n    setLoading(true);\n    setError(null);\n    setResult(null);\n\n    try {\n      const response = await fetch(`${apiBaseUrl}/alpha-mining/agent-demo`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({\n          stock_code: stockCode || null,\n          num_steps: numSteps,\n          use_sentiment: useSentiment,\n        }),\n      });\n\n      if (!response.ok) {\n        throw new Error(`HTTP error! 
status: ${response.status}`);\n      }\n\n      const data = await response.json();\n      setResult(data);\n    } catch (err: any) {\n      console.error('Agent demo error:', err);\n      setError(err.message || t.alphaMining.agent.executeFailed);\n    } finally {\n      setLoading(false);\n    }\n  }, [apiBaseUrl, stockCode, numSteps, useSentiment, t]);\n\n  return (\n    <Card className=\"w-full\">\n      <CardHeader>\n        <div className=\"flex items-center justify-between\">\n          <div>\n            <CardTitle className=\"flex items-center gap-2\">\n              <Bot className=\"w-5 h-5 text-indigo-500\" />\n              {t.alphaMining.agent.title}\n            </CardTitle>\n            <CardDescription>\n              {t.alphaMining.agent.desc}\n            </CardDescription>\n          </div>\n          {result && (\n            <Badge className={result.success ? 'bg-green-100 text-green-700' : 'bg-red-100 text-red-700'}>\n              {result.success ? <CheckCircle2 className=\"w-3 h-3 mr-1\" /> : <XCircle className=\"w-3 h-3 mr-1\" />}\n              {result.success ? 
t.alphaMining.agent.success : t.alphaMining.agent.failed}\n            </Badge>\n          )}\n        </div>\n      </CardHeader>\n\n      <CardContent className=\"space-y-4\">\n        {/* 调用流程图 */}\n        <div className=\"flex items-center justify-center gap-2 p-4 bg-gradient-to-r from-indigo-50 to-purple-50 rounded-lg\">\n          <FlowNode icon={<Bot className=\"w-5 h-5\" />} label=\"QuantitativeAgent\" active={loading} />\n          <ArrowRight className=\"w-4 h-4 text-gray-400\" />\n          <FlowNode icon={<Wrench className=\"w-5 h-5\" />} label=\"AlphaMiningTool\" active={loading} />\n          <ArrowRight className=\"w-4 h-4 text-gray-400\" />\n          <FlowNode icon={<Code className=\"w-5 h-5\" />} label=\"AlphaTrainer\" active={loading} />\n        </div>\n\n        {/* 参数输入面板 */}\n        <Card className=\"bg-gray-50\">\n          <CardHeader className=\"pb-2\">\n            <CardTitle className=\"text-sm flex items-center gap-2\">\n              <Wrench className=\"w-4 h-4 text-gray-500\" />\n              {t.alphaMining.agent.toolParams}\n            </CardTitle>\n          </CardHeader>\n          <CardContent>\n            <div className=\"grid grid-cols-1 md:grid-cols-3 gap-4\">\n              <div>\n                <label className=\"text-xs font-medium text-gray-600 block mb-1\">\n                  {t.alphaMining.agent.stockCode}\n                </label>\n                <input\n                  type=\"text\"\n                  value={stockCode}\n                  onChange={(e) => setStockCode(e.target.value)}\n                  placeholder={t.alphaMining.agent.stockPlaceholder}\n                  disabled={loading}\n                  className=\"w-full px-3 py-2 border rounded-md text-sm\"\n                />\n              </div>\n              <div>\n                <label className=\"text-xs font-medium text-gray-600 block mb-1\">\n                  {t.alphaMining.agent.steps}\n                </label>\n                <input\n       
           type=\"number\"\n                  value={numSteps}\n                  onChange={(e) => setNumSteps(Number(e.target.value))}\n                  min={10}\n                  max={100}\n                  disabled={loading}\n                  className=\"w-full px-3 py-2 border rounded-md text-sm\"\n                />\n              </div>\n              <div className=\"flex items-end\">\n                <label className=\"flex items-center gap-2 cursor-pointer\">\n                  <input\n                    type=\"checkbox\"\n                    checked={useSentiment}\n                    onChange={(e) => setUseSentiment(e.target.checked)}\n                    disabled={loading}\n                    className=\"rounded\"\n                  />\n                  <span className=\"text-sm\">{t.alphaMining.agent.useSentiment}</span>\n                </label>\n              </div>\n            </div>\n            \n            <div className=\"mt-4 flex justify-end\">\n              <Button onClick={runDemo} disabled={loading}>\n                {loading ? 
(\n                  <>\n                    <Loader2 className=\"w-4 h-4 mr-1 animate-spin\" />\n                    {t.alphaMining.agent.executing}\n                  </>\n                ) : (\n                  <>\n                    <Play className=\"w-4 h-4 mr-1\" />\n                    {t.alphaMining.agent.execute}\n                  </>\n                )}\n              </Button>\n            </div>\n          </CardContent>\n        </Card>\n\n        {/* 执行结果 */}\n        <AnimatePresence>\n          {result && (\n            <motion.div\n              initial={{ opacity: 0, y: 20 }}\n              animate={{ opacity: 1, y: 0 }}\n              exit={{ opacity: 0, y: -20 }}\n              className=\"space-y-4\"\n            >\n              {/* Agent & Tool 信息 */}\n              <div className=\"grid grid-cols-2 gap-4\">\n                <Card>\n                  <CardContent className=\"pt-4\">\n                    <div className=\"flex items-center gap-2 text-sm text-gray-500 mb-1\">\n                      <Bot className=\"w-4 h-4\" />\n                      Agent\n                    </div>\n                    <div className=\"font-medium\">{result.agent_name}</div>\n                  </CardContent>\n                </Card>\n                <Card>\n                  <CardContent className=\"pt-4\">\n                    <div className=\"flex items-center gap-2 text-sm text-gray-500 mb-1\">\n                      <Wrench className=\"w-4 h-4\" />\n                      Tool\n                    </div>\n                    <div className=\"font-medium\">{result.tool_name}</div>\n                  </CardContent>\n                </Card>\n              </div>\n\n              {/* 输入参数 */}\n              <Card>\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm\">{t.alphaMining.agent.inputParams}</CardTitle>\n                </CardHeader>\n                <CardContent>\n                  <pre 
className=\"text-xs bg-gray-900 text-green-400 p-3 rounded-md overflow-x-auto\">\n                    {JSON.stringify(result.input_params, null, 2)}\n                  </pre>\n                </CardContent>\n              </Card>\n\n              {/* 输出结果 */}\n              {result.output && (\n                <Card className=\"border-green-200 bg-green-50/50\">\n                  <CardHeader className=\"pb-2\">\n                    <CardTitle className=\"text-sm flex items-center gap-2 text-green-700\">\n                      <CheckCircle2 className=\"w-4 h-4\" />\n                      {t.alphaMining.agent.output}\n                    </CardTitle>\n                  </CardHeader>\n                  <CardContent>\n                    <div className=\"grid grid-cols-3 gap-4 mb-3\">\n                      <div className=\"text-center p-3 bg-white rounded-lg\">\n                        <div className=\"text-xs text-gray-500\">Best Score</div>\n                        <div className=\"text-lg font-bold text-green-600\">\n                          {result.output.best_score?.toFixed(4) || '--'}\n                        </div>\n                      </div>\n                      <div className=\"text-center p-3 bg-white rounded-lg\">\n                        <div className=\"text-xs text-gray-500\">Total Steps</div>\n                        <div className=\"text-lg font-bold\">\n                          {result.output.total_steps || '--'}\n                        </div>\n                      </div>\n                      <div className=\"text-center p-3 bg-white rounded-lg\">\n                        <div className=\"text-xs text-gray-500 flex items-center justify-center gap-1\">\n                          <Clock className=\"w-3 h-3\" />\n                          {t.alphaMining.agent.executionTime}\n                        </div>\n                        <div className=\"text-lg font-bold\">\n                          {result.execution_time}s\n                        
</div>\n                      </div>\n                    </div>\n                    {result.output.best_formula && (\n                      <div className=\"p-3 bg-white rounded-lg\">\n                        <div className=\"text-xs text-gray-500 mb-1\">{t.alphaMining.agent.bestFactor}</div>\n                        <code className=\"text-sm font-mono text-emerald-700\">\n                          {result.output.best_formula}\n                        </code>\n                      </div>\n                    )}\n                  </CardContent>\n                </Card>\n              )}\n\n              {/* 执行日志 */}\n              <Card>\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm flex items-center gap-2\">\n                    <Terminal className=\"w-4 h-4 text-gray-500\" />\n                    {t.alphaMining.agent.logs}\n                  </CardTitle>\n                </CardHeader>\n                <CardContent>\n                  <div className=\"bg-gray-900 rounded-md p-3 max-h-48 overflow-y-auto\">\n                    {result.logs.map((log, idx) => (\n                      <motion.div\n                        key={idx}\n                        initial={{ opacity: 0, x: -10 }}\n                        animate={{ opacity: 1, x: 0 }}\n                        transition={{ delay: idx * 0.1 }}\n                        className=\"text-xs font-mono text-gray-300 mb-1\"\n                      >\n                        <span className=\"text-gray-500\">{idx + 1}.</span> {log}\n                      </motion.div>\n                    ))}\n                  </div>\n                </CardContent>\n              </Card>\n\n              {/* 代码示例 */}\n              <Card>\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm flex items-center gap-2\">\n                    <Code className=\"w-4 h-4 text-gray-500\" />\n                    
{t.alphaMining.agent.codeExample}\n                  </CardTitle>\n                </CardHeader>\n                <CardContent>\n                  <pre className=\"text-xs bg-gray-900 text-gray-300 p-3 rounded-md overflow-x-auto\">\n{`from agenticx.agents import QuantitativeAgent\nfrom finnews.alpha_mining.tools import AlphaMiningTool\n\n# ${t.alphaMining.agent.createAgent || 'Create Agent'}\nagent = QuantitativeAgent(name=\"Quant\")\n\n# ${t.alphaMining.agent.registerTool || 'Register Tool'}\nagent.register_tool(AlphaMiningTool())\n\n# ${t.alphaMining.agent.executeMining || 'Execute factor mining'}\nresult = await agent.run(\n    task=\"${t.alphaMining.agent.miningTask.replace('{code}', stockCode || 'SH600519')}\",\n    tools=[\"alpha_mining\"],\n    params={\n        \"num_steps\": ${numSteps},\n        \"use_sentiment\": ${useSentiment}\n    }\n)\n\nprint(f\"Best Factor: {result.best_formula}\")\nprint(f\"Score: {result.best_score}\")`}\n                  </pre>\n                </CardContent>\n              </Card>\n            </motion.div>\n          )}\n        </AnimatePresence>\n\n        {/* 错误提示 */}\n        {error && (\n          <div className=\"p-4 bg-red-50 rounded-lg border border-red-200\">\n            <p className=\"text-sm text-red-600\">{error}</p>\n          </div>\n        )}\n\n        {/* 初始状态 */}\n        {!loading && !result && !error && (\n          <div className=\"py-8 text-center text-gray-500\">\n            <Bot className=\"w-12 h-12 mx-auto opacity-50 mb-3\" />\n            <p>{t.alphaMining.agent.startHint}</p>\n            <p className=\"text-sm mt-1\">\n              {t.alphaMining.agent.startDesc}\n            </p>\n          </div>\n        )}\n      </CardContent>\n    </Card>\n  );\n};\n\n// 流程节点组件\ninterface FlowNodeProps {\n  icon: React.ReactNode;\n  label: string;\n  active?: boolean;\n}\n\nconst FlowNode: React.FC<FlowNodeProps> = ({ icon, label, active }) => {\n  return (\n    <div className={`\n      flex flex-col 
items-center p-3 rounded-lg transition-all\n      ${active ? 'bg-indigo-100 ring-2 ring-indigo-400' : 'bg-white'}\n    `}>\n      <div className={`\n        p-2 rounded-full mb-1\n        ${active ? 'bg-indigo-500 text-white' : 'bg-gray-100 text-gray-600'}\n      `}>\n        {active ? <Loader2 className=\"w-5 h-5 animate-spin\" /> : icon}\n      </div>\n      <span className=\"text-xs font-medium text-gray-700\">{label}</span>\n    </div>\n  );\n};\n\nexport default AgentDemo;\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/MetricsDashboard.tsx",
    "content": "/**\n * 完整评估指标仪表盘\n * \n * 展示因子评估的所有指标：\n * - 雷达图：多维度指标可视化\n * - 收益曲线：策略收益 vs 基准\n * - 风险指标卡片\n */\n\nimport React from 'react';\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '../ui/card';\nimport { Badge } from '../ui/badge';\nimport {\n  RadarChart, PolarGrid, PolarAngleAxis, PolarRadiusAxis, Radar,\n  LineChart, Line, XAxis, YAxis, CartesianGrid, Tooltip,\n  ResponsiveContainer, Legend, Area, AreaChart, BarChart, Bar\n} from 'recharts';\nimport {\n  TrendingUp, TrendingDown, Activity, AlertTriangle,\n  Target, BarChart2, PieChart, Percent\n} from 'lucide-react';\nimport { useGlobalI18n } from '@/store/useLanguageStore';\n\nexport interface FactorMetrics {\n  sortino_ratio: number;\n  sharpe_ratio: number;\n  ic: number;\n  rank_ic: number;\n  max_drawdown: number;\n  turnover: number;\n  total_return: number;\n  win_rate: number;\n  avg_return?: number;\n}\n\ninterface MetricsDashboardProps {\n  metrics: FactorMetrics | null;\n  formula?: string;\n  loading?: boolean;\n  returnsCurve?: { date: string; strategy: number; benchmark: number }[];\n}\n\nconst MetricsDashboard: React.FC<MetricsDashboardProps> = ({\n  metrics,\n  formula,\n  loading = false,\n  returnsCurve,\n}) => {\n  const t = useGlobalI18n();\n  \n  if (loading) {\n    return (\n      <Card className=\"w-full animate-pulse\">\n        <CardHeader>\n          <div className=\"h-6 bg-gray-200 rounded w-1/3\" />\n        </CardHeader>\n        <CardContent>\n          <div className=\"h-64 bg-gray-100 rounded\" />\n        </CardContent>\n      </Card>\n    );\n  }\n\n  if (!metrics) {\n    return (\n      <Card className=\"w-full\">\n        <CardContent className=\"py-12 text-center text-gray-500\">\n          <BarChart2 className=\"w-12 h-12 mx-auto opacity-50 mb-3\" />\n          <p>{t.alphaMining.metrics.noData}</p>\n          <p className=\"text-sm mt-1\">{t.alphaMining.metrics.hint}</p>\n        </CardContent>\n      </Card>\n    );\n  }\n\n  // 
Radar chart data (normalized to 0-100)\n  const radarData = [\n    { \n      metric: 'Sortino', \n      value: normalizeMetric(metrics.sortino_ratio, -2, 5), \n      fullMark: 100 \n    },\n    { \n      metric: 'Sharpe', \n      value: normalizeMetric(metrics.sharpe_ratio, -2, 3), \n      fullMark: 100 \n    },\n    { \n      metric: 'IC', \n      value: normalizeMetric(metrics.ic, -0.3, 0.3), \n      fullMark: 100 \n    },\n    { \n      metric: 'Rank IC', \n      value: normalizeMetric(metrics.rank_ic, -0.3, 0.3), \n      fullMark: 100 \n    },\n    { \n      metric: 'Win Rate', \n      value: metrics.win_rate * 100, \n      fullMark: 100 \n    },\n    { \n      metric: t.alphaMining.metrics.lowTurnover, \n      value: 100 - metrics.turnover * 100, \n      fullMark: 100 \n    },\n  ];\n\n  // Rating logic\n  const rating = getFactorRating(metrics, t);\n\n  return (\n    <div className=\"space-y-4\">\n      {/* Factor formula & rating */}\n      {formula && (\n        <Card className=\"bg-gradient-to-r from-blue-50 to-indigo-50\">\n          <CardContent className=\"py-4\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <div className=\"text-xs text-gray-500 mb-1\">{t.alphaMining.metrics.currentFactor}</div>\n                <code className=\"text-sm font-mono font-medium text-gray-800\">\n                  {formula}\n                </code>\n              </div>\n              <Badge className={rating.className}>\n                {rating.icon}\n                <span className=\"ml-1\">{rating.label}</span>\n              </Badge>\n            </div>\n          </CardContent>\n        </Card>\n      )}\n\n      {/* Primary metric cards */}\n      <div className=\"grid grid-cols-2 md:grid-cols-4 gap-3\">\n        <MetricCard\n          label=\"Sortino Ratio\"\n          value={metrics.sortino_ratio.toFixed(4)}\n          description={t.alphaMining.metrics.sortinoDesc}\n          icon={<Target className=\"w-4 h-4\" />}\n          trend={metrics.sortino_ratio > 0 
? 'up' : 'down'}\n          good={metrics.sortino_ratio > 1}\n        />\n        <MetricCard\n          label=\"Sharpe Ratio\"\n          value={metrics.sharpe_ratio.toFixed(4)}\n          description={t.alphaMining.metrics.sharpeDesc}\n          icon={<TrendingUp className=\"w-4 h-4\" />}\n          trend={metrics.sharpe_ratio > 0 ? 'up' : 'down'}\n          good={metrics.sharpe_ratio > 0.5}\n        />\n        <MetricCard\n          label=\"IC\"\n          value={metrics.ic.toFixed(4)}\n          description={t.alphaMining.metrics.icDesc}\n          icon={<Activity className=\"w-4 h-4\" />}\n          trend={metrics.ic > 0 ? 'up' : 'down'}\n          good={Math.abs(metrics.ic) > 0.03}\n        />\n        <MetricCard\n          label=\"Rank IC\"\n          value={metrics.rank_ic.toFixed(4)}\n          description={t.alphaMining.metrics.icDesc}\n          icon={<BarChart2 className=\"w-4 h-4\" />}\n          trend={metrics.rank_ic > 0 ? 'up' : 'down'}\n          good={Math.abs(metrics.rank_ic) > 0.03}\n        />\n      </div>\n\n      {/* Radar chart & risk metrics */}\n      <div className=\"grid grid-cols-1 lg:grid-cols-2 gap-4\">\n        {/* Radar chart */}\n        <Card>\n          <CardHeader className=\"pb-2\">\n            <CardTitle className=\"text-sm flex items-center gap-2\">\n              <PieChart className=\"w-4 h-4 text-indigo-500\" />\n              {t.alphaMining.metrics.multiDim}\n            </CardTitle>\n          </CardHeader>\n          <CardContent>\n            <div className=\"h-64\">\n              <ResponsiveContainer width=\"100%\" height=\"100%\">\n                <RadarChart data={radarData}>\n                  <PolarGrid stroke=\"#e5e7eb\" />\n                  <PolarAngleAxis \n                    dataKey=\"metric\" \n                    tick={{ fontSize: 11, fill: '#6b7280' }}\n                  />\n                  <PolarRadiusAxis \n                    angle={30} \n                    domain={[0, 100]} \n                    tick={{ 
fontSize: 10 }}\n                  />\n                  <Radar\n                    name={t.alphaMining.metrics.currentFactor}\n                    dataKey=\"value\"\n                    stroke=\"#6366f1\"\n                    fill=\"#6366f1\"\n                    fillOpacity={0.3}\n                    strokeWidth={2}\n                  />\n                  <Tooltip\n                    contentStyle={{\n                      backgroundColor: 'rgba(255, 255, 255, 0.95)',\n                      borderRadius: '8px',\n                      border: '1px solid #e5e7eb',\n                      fontSize: 12,\n                    }}\n                    formatter={(value: number) => [`${value.toFixed(1)}`, t.alphaMining.metrics.currentFactor]}\n                  />\n                </RadarChart>\n              </ResponsiveContainer>\n            </div>\n          </CardContent>\n        </Card>\n\n        {/* Risk metrics */}\n        <Card>\n          <CardHeader className=\"pb-2\">\n            <CardTitle className=\"text-sm flex items-center gap-2\">\n              <AlertTriangle className=\"w-4 h-4 text-amber-500\" />\n              {t.alphaMining.metrics.riskMetrics}\n            </CardTitle>\n          </CardHeader>\n          <CardContent className=\"space-y-4\">\n            {/* Max drawdown */}\n            <div>\n              <div className=\"flex justify-between text-sm mb-1\">\n                <span className=\"text-gray-600\">{t.alphaMining.metrics.maxDrawdown}</span>\n                <span className={metrics.max_drawdown > 0.2 ? 'text-red-600 font-medium' : 'text-gray-800'}>\n                  {(metrics.max_drawdown * 100).toFixed(2)}%\n                </span>\n              </div>\n              <div className=\"w-full bg-gray-200 rounded-full h-2\">\n                <div\n                  className={`h-2 rounded-full ${\n                    metrics.max_drawdown > 0.3 ? 'bg-red-500' :\n                    metrics.max_drawdown > 0.2 ? 
'bg-amber-500' : 'bg-green-500'\n                  }`}\n                  style={{ width: `${Math.min(metrics.max_drawdown * 100, 100)}%` }}\n                />\n              </div>\n              <div className=\"flex justify-between text-xs text-gray-400 mt-0.5\">\n                <span>0%</span>\n                <span>{t.alphaMining.metrics.safe}</span>\n                <span>{t.alphaMining.metrics.danger}</span>\n                <span>100%</span>\n              </div>\n            </div>\n\n            {/* Turnover */}\n            <div>\n              <div className=\"flex justify-between text-sm mb-1\">\n                <span className=\"text-gray-600\">{t.alphaMining.metrics.dailyTurnover}</span>\n                <span className={metrics.turnover > 0.5 ? 'text-amber-600 font-medium' : 'text-gray-800'}>\n                  {(metrics.turnover * 100).toFixed(2)}%\n                </span>\n              </div>\n              <div className=\"w-full bg-gray-200 rounded-full h-2\">\n                <div\n                  className={`h-2 rounded-full ${\n                    metrics.turnover > 0.5 ? 'bg-amber-500' : 'bg-blue-500'\n                  }`}\n                  style={{ width: `${Math.min(metrics.turnover * 100, 100)}%` }}\n                />\n              </div>\n            </div>\n\n            {/* Win rate */}\n            <div>\n              <div className=\"flex justify-between text-sm mb-1\">\n                <span className=\"text-gray-600\">{t.alphaMining.metrics.winRate}</span>\n                <span className={metrics.win_rate > 0.5 ? 'text-green-600 font-medium' : 'text-gray-800'}>\n                  {(metrics.win_rate * 100).toFixed(2)}%\n                </span>\n              </div>\n              <div className=\"w-full bg-gray-200 rounded-full h-2\">\n                <div\n                  className={`h-2 rounded-full ${\n                    metrics.win_rate > 0.55 ? 'bg-green-500' :\n                    metrics.win_rate > 0.5 ? 
'bg-blue-500' : 'bg-gray-400'\n                  }`}\n                  style={{ width: `${metrics.win_rate * 100}%` }}\n                />\n              </div>\n            </div>\n\n            {/* Total return */}\n            <div className=\"pt-2 border-t\">\n              <div className=\"flex justify-between items-center\">\n                <span className=\"text-sm text-gray-600\">{t.alphaMining.metrics.totalReturn}</span>\n                <span className={`text-lg font-bold ${\n                  metrics.total_return > 0 ? 'text-green-600' : 'text-red-600'\n                }`}>\n                  {metrics.total_return > 0 ? '+' : ''}\n                  {(metrics.total_return * 100).toFixed(2)}%\n                </span>\n              </div>\n            </div>\n          </CardContent>\n        </Card>\n      </div>\n\n      {/* Returns curve */}\n      {returnsCurve && returnsCurve.length > 0 && (\n        <Card>\n          <CardHeader className=\"pb-2\">\n            <CardTitle className=\"text-sm flex items-center gap-2\">\n              <TrendingUp className=\"w-4 h-4 text-emerald-500\" />\n              {t.alphaMining.metrics.returnsCurve}\n            </CardTitle>\n            <CardDescription>{t.alphaMining.metrics.returnsDesc}</CardDescription>\n          </CardHeader>\n          <CardContent>\n            <div className=\"h-64\">\n              <ResponsiveContainer width=\"100%\" height=\"100%\">\n                <AreaChart data={returnsCurve}>\n                  <defs>\n                    <linearGradient id=\"colorStrategy\" x1=\"0\" y1=\"0\" x2=\"0\" y2=\"1\">\n                      <stop offset=\"5%\" stopColor=\"#10b981\" stopOpacity={0.3}/>\n                      <stop offset=\"95%\" stopColor=\"#10b981\" stopOpacity={0}/>\n                    </linearGradient>\n                    <linearGradient id=\"colorBenchmark\" x1=\"0\" y1=\"0\" x2=\"0\" y2=\"1\">\n                      <stop offset=\"5%\" stopColor=\"#6b7280\" stopOpacity={0.3}/>\n                    
  <stop offset=\"95%\" stopColor=\"#6b7280\" stopOpacity={0}/>\n                    </linearGradient>\n                  </defs>\n                  <CartesianGrid strokeDasharray=\"3 3\" stroke=\"#e5e7eb\" />\n                  <XAxis \n                    dataKey=\"date\" \n                    tick={{ fontSize: 10 }}\n                    tickFormatter={(value) => value.slice(5)}\n                  />\n                  <YAxis \n                    tick={{ fontSize: 10 }}\n                    tickFormatter={(value) => `${(value * 100).toFixed(0)}%`}\n                  />\n                  <Tooltip\n                    contentStyle={{\n                      backgroundColor: 'rgba(255, 255, 255, 0.95)',\n                      borderRadius: '8px',\n                      border: '1px solid #e5e7eb',\n                      fontSize: 12,\n                    }}\n                    formatter={(value: number, name: string) => [\n                      `${(value * 100).toFixed(2)}%`,\n                      name === 'strategy' ? 
t.alphaMining.metrics.strategy : t.alphaMining.metrics.benchmark\n                    ]}\n                  />\n                  <Legend />\n                  <Area\n                    type=\"monotone\"\n                    dataKey=\"strategy\"\n                    stroke=\"#10b981\"\n                    strokeWidth={2}\n                    fillOpacity={1}\n                    fill=\"url(#colorStrategy)\"\n                    name={t.alphaMining.metrics.strategy}\n                  />\n                  <Area\n                    type=\"monotone\"\n                    dataKey=\"benchmark\"\n                    stroke=\"#6b7280\"\n                    strokeWidth={1}\n                    fillOpacity={1}\n                    fill=\"url(#colorBenchmark)\"\n                    name={t.alphaMining.metrics.benchmark}\n                    strokeDasharray=\"5 5\"\n                  />\n                </AreaChart>\n              </ResponsiveContainer>\n            </div>\n          </CardContent>\n        </Card>\n      )}\n\n      {/* Metric glossary */}\n      <Card className=\"bg-gray-50\">\n        <CardContent className=\"py-3\">\n          <div className=\"grid grid-cols-2 md:grid-cols-4 gap-3 text-xs text-gray-600\">\n            <div><strong>Sortino:</strong> {t.alphaMining.metrics.sortinoDesc}</div>\n            <div><strong>Sharpe:</strong> {t.alphaMining.metrics.sharpeDesc}</div>\n            <div><strong>IC:</strong> {t.alphaMining.metrics.icDesc}</div>\n            <div><strong>Max DD:</strong> {t.alphaMining.metrics.maxDDDesc}</div>\n          </div>\n        </CardContent>\n      </Card>\n    </div>\n  );\n};\n\n// Single metric card\ninterface MetricCardProps {\n  label: string;\n  value: string;\n  description: string;\n  icon: React.ReactNode;\n  trend?: 'up' | 'down';\n  good?: boolean;\n}\n\nconst MetricCard: React.FC<MetricCardProps> = ({\n  label,\n  value,\n  description,\n  icon,\n  trend,\n  good,\n}) => {\n  return (\n    <Card className={good ? 
'border-green-200 bg-green-50/50' : ''}>\n      <CardContent className=\"p-3\">\n        <div className=\"flex items-center gap-2 text-gray-500 mb-1\">\n          {icon}\n          <span className=\"text-xs\">{label}</span>\n        </div>\n        <div className=\"flex items-center gap-1\">\n          <span className={`text-lg font-bold ${\n            good ? 'text-green-600' : trend === 'down' ? 'text-red-600' : ''\n          }`}>\n            {value}\n          </span>\n          {trend === 'up' && <TrendingUp className=\"w-3 h-3 text-green-500\" />}\n          {trend === 'down' && <TrendingDown className=\"w-3 h-3 text-red-500\" />}\n        </div>\n        <div className=\"text-xs text-gray-400 mt-0.5\">{description}</div>\n      </CardContent>\n    </Card>\n  );\n};\n\n// Normalize a metric into the 0-100 range\nfunction normalizeMetric(value: number, min: number, max: number): number {\n  const normalized = ((value - min) / (max - min)) * 100;\n  return Math.max(0, Math.min(100, normalized));\n}\n\n// Factor rating\nfunction getFactorRating(metrics: FactorMetrics, t: any): {\n  label: string;\n  className: string;\n  icon: React.ReactNode;\n} {\n  const score = \n    (metrics.sortino_ratio > 1 ? 25 : metrics.sortino_ratio > 0 ? 15 : 0) +\n    (metrics.sharpe_ratio > 0.5 ? 25 : metrics.sharpe_ratio > 0 ? 15 : 0) +\n    (Math.abs(metrics.ic) > 0.05 ? 25 : Math.abs(metrics.ic) > 0.03 ? 15 : 0) +\n    (metrics.win_rate > 0.55 ? 25 : metrics.win_rate > 0.5 ? 
15 : 0);\n\n  if (score >= 80) {\n    return {\n      label: t.alphaMining.metrics.excellent,\n      className: 'bg-green-100 text-green-700',\n      icon: <TrendingUp className=\"w-3 h-3\" />,\n    };\n  } else if (score >= 50) {\n    return {\n      label: t.alphaMining.metrics.good,\n      className: 'bg-blue-100 text-blue-700',\n      icon: <Activity className=\"w-3 h-3\" />,\n    };\n  } else if (score >= 30) {\n    return {\n      label: t.alphaMining.metrics.average,\n      className: 'bg-amber-100 text-amber-700',\n      icon: <AlertTriangle className=\"w-3 h-3\" />,\n    };\n  } else {\n    return {\n      label: t.alphaMining.metrics.poor,\n      className: 'bg-red-100 text-red-700',\n      icon: <TrendingDown className=\"w-3 h-3\" />,\n    };\n  }\n}\n\nexport default MetricsDashboard;\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/OperatorGrid.tsx",
    "content": "/**\n * DSL 操作符可视化组件\n * \n * 展示 21 个因子操作符，按类别分组显示\n * 支持点击插入到因子表达式输入框\n */\n\nimport React, { useState } from 'react';\nimport { Card } from '../ui/card';\nimport { Badge } from '../ui/badge';\nimport { motion } from 'framer-motion';\nimport { \n  Plus, Minus, X, Divide,\n  ArrowRight, Clock, BarChart2,\n  GitBranch, Maximize, Minimize,\n  Activity, Zap, TrendingUp\n} from 'lucide-react';\nimport { useGlobalI18n } from '@/store/useLanguageStore';\n\n// 操作符分类\ntype OperatorCategory = 'arithmetic' | 'unary' | 'timeseries' | 'conditional' | 'special';\n\ninterface Operator {\n  name: string;\n  arity: number;\n  description: string;\n  category: OperatorCategory;\n  example: string;\n  icon: React.ReactNode;\n}\n\n// 获取操作符图标组件类型\ntype IconComponent = React.ComponentType<{ className?: string }>;\n\n// 操作符图标组件映射\nconst OPERATOR_ICON_COMPONENTS: Record<string, IconComponent> = {\n  ADD: Plus,\n  SUB: Minus,\n  MUL: X,\n  DIV: Divide,\n  NEG: Minus,\n  ABS: Activity,\n  SIGN: ArrowRight,\n  GATE: GitBranch,\n  MAX: Maximize,\n  MIN: Minimize,\n  DELAY1: Clock,\n  DELAY5: Clock,\n  DELTA1: TrendingUp,\n  DELTA5: TrendingUp,\n  MA5: BarChart2,\n  MA10: BarChart2,\n  STD5: Activity,\n  STD10: Activity,\n  JUMP: Zap,\n  DECAY: TrendingUp,\n  MAX3: Maximize,\n};\n\n// 获取操作符图标\nconst getOperatorIcon = (name: string): React.ReactNode => {\n  const IconComponent = OPERATOR_ICON_COMPONENTS[name] || Activity;\n  return <IconComponent className=\"w-4 h-4\" />;\n};\n\n// 获取操作符定义（支持国际化）\nconst getOperators = (t: any): Operator[] => [\n  // 算术运算 (4)\n  { name: 'ADD', arity: 2, description: t.alphaMining.operators.add, category: 'arithmetic', example: 'ADD(x, y) = x + y', icon: getOperatorIcon('ADD') },\n  { name: 'SUB', arity: 2, description: t.alphaMining.operators.sub, category: 'arithmetic', example: 'SUB(x, y) = x - y', icon: getOperatorIcon('SUB') },\n  { name: 'MUL', arity: 2, description: t.alphaMining.operators.mul, category: 'arithmetic', example: 'MUL(x, y) = 
x × y', icon: getOperatorIcon('MUL') },\n  { name: 'DIV', arity: 2, description: t.alphaMining.operators.div, category: 'arithmetic', example: 'DIV(x, y) = x / (y + ε)', icon: getOperatorIcon('DIV') },\n  \n  // Unary (3)\n  { name: 'NEG', arity: 1, description: t.alphaMining.operators.neg, category: 'unary', example: 'NEG(x) = -x', icon: getOperatorIcon('NEG') },\n  { name: 'ABS', arity: 1, description: t.alphaMining.operators.abs, category: 'unary', example: 'ABS(x) = |x|', icon: getOperatorIcon('ABS') },\n  { name: 'SIGN', arity: 1, description: t.alphaMining.operators.sign, category: 'unary', example: 'SIGN(x) = ±1 or 0', icon: getOperatorIcon('SIGN') },\n  \n  // Conditional (3)\n  { name: 'GATE', arity: 3, description: t.alphaMining.operators.gate, category: 'conditional', example: 'GATE(c,x,y) = c>0?x:y', icon: getOperatorIcon('GATE') },\n  { name: 'MAX', arity: 2, description: t.alphaMining.operators.max, category: 'conditional', example: 'MAX(x, y)', icon: getOperatorIcon('MAX') },\n  { name: 'MIN', arity: 2, description: t.alphaMining.operators.min, category: 'conditional', example: 'MIN(x, y)', icon: getOperatorIcon('MIN') },\n  \n  // Time-series (8)\n  { name: 'DELAY1', arity: 1, description: t.alphaMining.operators.delay1, category: 'timeseries', example: 'x[t-1]', icon: getOperatorIcon('DELAY1') },\n  { name: 'DELAY5', arity: 1, description: t.alphaMining.operators.delay5, category: 'timeseries', example: 'x[t-5]', icon: getOperatorIcon('DELAY5') },\n  { name: 'DELTA1', arity: 1, description: t.alphaMining.operators.delta1, category: 'timeseries', example: 'x[t] - x[t-1]', icon: getOperatorIcon('DELTA1') },\n  { name: 'DELTA5', arity: 1, description: t.alphaMining.operators.delta5, category: 'timeseries', example: 'x[t] - x[t-5]', icon: getOperatorIcon('DELTA5') },\n  { name: 'MA5', arity: 1, description: t.alphaMining.operators.ma5, category: 'timeseries', example: 'mean(x[t-4:t])', icon: getOperatorIcon('MA5') },\n  { name: 'MA10', arity: 1, description: 
t.alphaMining.operators.ma10, category: 'timeseries', example: 'mean(x[t-9:t])', icon: getOperatorIcon('MA10') },\n  { name: 'STD5', arity: 1, description: t.alphaMining.operators.std5, category: 'timeseries', example: 'std(x[t-4:t])', icon: getOperatorIcon('STD5') },\n  { name: 'STD10', arity: 1, description: t.alphaMining.operators.std10, category: 'timeseries', example: 'std(x[t-9:t])', icon: getOperatorIcon('STD10') },\n  \n  // Special (3)\n  { name: 'JUMP', arity: 1, description: t.alphaMining.operators.jump, category: 'special', example: t.alphaMining.operators.jumpExample, icon: getOperatorIcon('JUMP') },\n  { name: 'DECAY', arity: 1, description: t.alphaMining.operators.decay, category: 'special', example: 'x+0.8x[-1]+0.6x[-2]', icon: getOperatorIcon('DECAY') },\n  { name: 'MAX3', arity: 1, description: t.alphaMining.operators.max3, category: 'special', example: 'max(x[t:t-2])', icon: getOperatorIcon('MAX3') },\n];\n\n// Feature list\nconst FEATURES = ['RET', 'VOL', 'VOLUME_CHG', 'TURNOVER', 'SENTIMENT', 'NEWS_COUNT'];\n\n// Category config (i18n-aware)\nconst getCategoryConfig = (t: any): Record<OperatorCategory, { label: string; color: string; bgColor: string }> => ({\n  arithmetic: { label: t.alphaMining.operators.categoryArithmetic, color: 'text-blue-600', bgColor: 'bg-blue-50 hover:bg-blue-100' },\n  unary: { label: t.alphaMining.operators.categoryUnary, color: 'text-purple-600', bgColor: 'bg-purple-50 hover:bg-purple-100' },\n  timeseries: { label: t.alphaMining.operators.categoryTimeseries, color: 'text-emerald-600', bgColor: 'bg-emerald-50 hover:bg-emerald-100' },\n  conditional: { label: t.alphaMining.operators.categoryConditional, color: 'text-amber-600', bgColor: 'bg-amber-50 hover:bg-amber-100' },\n  special: { label: t.alphaMining.operators.categorySpecial, color: 'text-rose-600', bgColor: 'bg-rose-50 hover:bg-rose-100' },\n});\n\ninterface OperatorGridProps {\n  onOperatorClick?: (operator: string) => void;\n  onFeatureClick?: (feature: string) => void;\n  compact?: 
boolean;\n}\n\nconst OperatorGrid: React.FC<OperatorGridProps> = ({\n  onOperatorClick,\n  onFeatureClick,\n  compact = false,\n}) => {\n  const t = useGlobalI18n();\n  const OPERATORS = getOperators(t);\n  const CATEGORY_CONFIG = getCategoryConfig(t);\n  const [selectedCategory, setSelectedCategory] = useState<OperatorCategory | 'all'>('all');\n  const [hoveredOp, setHoveredOp] = useState<string | null>(null);\n\n  // Filter by category\n  const filteredOperators = selectedCategory === 'all' \n    ? OPERATORS \n    : OPERATORS.filter(op => op.category === selectedCategory);\n\n  // Group by category\n  const groupedOperators = filteredOperators.reduce((acc, op) => {\n    if (!acc[op.category]) acc[op.category] = [];\n    acc[op.category].push(op);\n    return acc;\n  }, {} as Record<OperatorCategory, Operator[]>);\n\n  return (\n    <div className=\"space-y-4\">\n      {/* Category filter */}\n      <div className=\"flex flex-wrap gap-2\">\n        <Badge\n          variant={selectedCategory === 'all' ? 'default' : 'outline'}\n          className=\"cursor-pointer\"\n          onClick={() => setSelectedCategory('all')}\n        >\n          {t.alphaMining.operators.all} ({OPERATORS.length})\n        </Badge>\n        {(Object.entries(CATEGORY_CONFIG) as [OperatorCategory, typeof CATEGORY_CONFIG.arithmetic][]).map(([key, config]) => (\n          <Badge\n            key={key}\n            variant={selectedCategory === key ? 'default' : 'outline'}\n            className={`cursor-pointer ${selectedCategory === key ? 
'' : config.color}`}\n            onClick={() => setSelectedCategory(key)}\n          >\n            {config.label} ({OPERATORS.filter(o => o.category === key).length})\n          </Badge>\n        ))}\n      </div>\n\n      {/* Feature list */}\n      <Card className=\"p-3\">\n        <h4 className=\"text-sm font-medium text-gray-700 mb-2\">{t.alphaMining.operators.availableFeatures}</h4>\n        <div className=\"flex flex-wrap gap-2\">\n          {FEATURES.map((feature, idx) => (\n            <motion.button\n              key={feature}\n              whileHover={{ scale: 1.05 }}\n              whileTap={{ scale: 0.95 }}\n              className={`px-3 py-1.5 rounded-md text-sm font-mono transition-colors ${\n                idx < 4 \n                  ? 'bg-blue-100 text-blue-700 hover:bg-blue-200' \n                  : 'bg-emerald-100 text-emerald-700 hover:bg-emerald-200'\n              }`}\n              onClick={() => onFeatureClick?.(feature)}\n              title={idx < 4 ? t.alphaMining.operators.techFeature : t.alphaMining.operators.sentimentFeature}\n            >\n              {feature}\n            </motion.button>\n          ))}\n        </div>\n      </Card>\n\n      {/* Operator grid */}\n      {selectedCategory === 'all' ? (\n        // Grouped display\n        Object.entries(groupedOperators).map(([category, ops]) => (\n          <div key={category}>\n            <h4 className={`text-sm font-medium mb-2 ${CATEGORY_CONFIG[category as OperatorCategory].color}`}>\n              {CATEGORY_CONFIG[category as OperatorCategory].label}\n            </h4>\n            <div className={`grid gap-2 ${compact ? 
'grid-cols-4 md:grid-cols-6' : 'grid-cols-2 md:grid-cols-4'}`}>\n              {ops.map((op) => (\n                <OperatorCard\n                  key={op.name}\n                  operator={op}\n                  compact={compact}\n                  isHovered={hoveredOp === op.name}\n                  onHover={() => setHoveredOp(op.name)}\n                  onLeave={() => setHoveredOp(null)}\n                  onClick={() => onOperatorClick?.(op.name)}\n                />\n              ))}\n            </div>\n          </div>\n        ))\n      ) : (\n        // Single category\n        <div className={`grid gap-2 ${compact ? 'grid-cols-4 md:grid-cols-6' : 'grid-cols-2 md:grid-cols-4'}`}>\n          {filteredOperators.map((op) => (\n            <OperatorCard\n              key={op.name}\n              operator={op}\n              compact={compact}\n              isHovered={hoveredOp === op.name}\n              onHover={() => setHoveredOp(op.name)}\n              onLeave={() => setHoveredOp(null)}\n              onClick={() => onOperatorClick?.(op.name)}\n            />\n          ))}\n        </div>\n      )}\n\n      {/* Operator totals */}\n      <div className=\"text-xs text-gray-500 text-center\">\n        {t.alphaMining.operators.totalOperators.replace('{count}', String(OPERATORS.length))} · {t.alphaMining.operators.totalFeatures.replace('{count}', String(FEATURES.length))}\n      </div>\n    </div>\n  );\n};\n\n// Single operator card\ninterface OperatorCardProps {\n  operator: Operator;\n  compact?: boolean;\n  isHovered: boolean;\n  onHover: () => void;\n  onLeave: () => void;\n  onClick: () => void;\n}\n\nconst OperatorCard: React.FC<OperatorCardProps> = ({\n  operator,\n  compact,\n  isHovered,\n  onHover,\n  onLeave,\n  onClick,\n}) => {\n  const t = useGlobalI18n();\n  const CATEGORY_CONFIG = getCategoryConfig(t);\n  const config = CATEGORY_CONFIG[operator.category];\n\n  return (\n    <motion.div\n      whileHover={{ scale: 1.02, y: -2 }}\n      whileTap={{ scale: 0.98 }}\n      
className={`\n        ${config.bgColor} \n        rounded-lg cursor-pointer transition-all duration-200\n        ${isHovered ? 'shadow-md ring-2 ring-offset-1' : ''}\n        ${compact ? 'p-2' : 'p-3'}\n      `}\n      onMouseEnter={onHover}\n      onMouseLeave={onLeave}\n      onClick={onClick}\n    >\n      <div className=\"flex items-center gap-2\">\n        <span className={config.color}>{operator.icon}</span>\n        <span className={`font-mono font-semibold ${compact ? 'text-xs' : 'text-sm'} ${config.color}`}>\n          {operator.name}\n        </span>\n        {!compact && (\n          <Badge variant=\"secondary\" className=\"text-xs ml-auto\">\n            {operator.arity}{t.alphaMining.operators.params}\n          </Badge>\n        )}\n      </div>\n      \n      {!compact && (\n        <>\n          <p className=\"text-xs text-gray-600 mt-1\">{operator.description}</p>\n          <code className=\"text-xs text-gray-500 mt-1 block truncate\" title={operator.example}>\n            {operator.example}\n          </code>\n        </>\n      )}\n    </motion.div>\n  );\n};\n\nexport default OperatorGrid;\nexport { FEATURES };\nexport type { Operator, OperatorCategory };\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/SentimentCompare.tsx",
    "content": "/**\n * 情感融合效果对比组件\n * \n * 对比纯技术因子 vs 情感增强因子的效果：\n * - 左右两栏对比\n * - 指标对比条形图\n * - 改进幅度高亮\n */\n\nimport React, { useState, useCallback } from 'react';\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '../ui/card';\nimport { Button } from '../ui/button';\nimport { Badge } from '../ui/badge';\nimport {\n  BarChart, Bar, XAxis, YAxis, CartesianGrid, Tooltip,\n  ResponsiveContainer, Legend, Cell, ReferenceLine\n} from 'recharts';\nimport {\n  Play, Heart, Cpu, TrendingUp, TrendingDown,\n  ArrowRight, Loader2, Sparkles\n} from 'lucide-react';\nimport { useGlobalI18n } from '@/store/useLanguageStore';\n\ninterface CompareResult {\n  best_score: number;\n  best_formula: string;\n  total_steps: number;\n  num_features: number;\n}\n\ninterface SentimentCompareProps {\n  apiBaseUrl?: string;\n}\n\nconst SentimentCompare: React.FC<SentimentCompareProps> = ({\n  apiBaseUrl = '/api/v1',\n}) => {\n  const t = useGlobalI18n();\n  const [loading, setLoading] = useState(false);\n  const [withSentiment, setWithSentiment] = useState<CompareResult | null>(null);\n  const [withoutSentiment, setWithoutSentiment] = useState<CompareResult | null>(null);\n  const [improvement, setImprovement] = useState<{ score_diff: number; improvement_pct: number } | null>(null);\n  const [numSteps, setNumSteps] = useState(50);\n  const [error, setError] = useState<string | null>(null);\n\n  // 执行对比\n  const runComparison = useCallback(async () => {\n    setLoading(true);\n    setError(null);\n    setWithSentiment(null);\n    setWithoutSentiment(null);\n    setImprovement(null);\n\n    try {\n      const response = await fetch(`${apiBaseUrl}/alpha-mining/compare-sentiment`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({\n          num_steps: numSteps,\n          batch_size: 16,\n        }),\n      });\n\n      if (!response.ok) {\n        throw new Error(`HTTP error! 
status: ${response.status}`);\n      }\n\n      const data = await response.json();\n\n      if (data.success) {\n        setWithSentiment(data.with_sentiment);\n        setWithoutSentiment(data.without_sentiment);\n        setImprovement(data.improvement);\n      } else {\n        throw new Error(t.alphaMining.sentiment.comparisonFailed);\n      }\n    } catch (err: any) {\n      console.error('Comparison error:', err);\n      setError(err.message || t.alphaMining.sentiment.comparisonFailed);\n    } finally {\n      setLoading(false);\n    }\n  }, [apiBaseUrl, numSteps, t]);\n\n  // Comparison bar-chart data\n  const chartData = withSentiment && withoutSentiment ? [\n    {\n      name: 'Best Score',\n      without: withoutSentiment.best_score,\n      with: withSentiment.best_score,\n    },\n    {\n      name: 'Features',\n      without: withoutSentiment.num_features,\n      with: withSentiment.num_features,\n    },\n  ] : [];\n\n  // Whether the improvement is positive\n  const isImproved = improvement && improvement.score_diff > 0;\n\n  return (\n    <Card className=\"w-full\">\n      <CardHeader>\n        <div className=\"flex items-center justify-between\">\n          <div>\n            <CardTitle className=\"flex items-center gap-2\">\n              <Sparkles className=\"w-5 h-5 text-purple-500\" />\n              {t.alphaMining.sentiment.title}\n            </CardTitle>\n            <CardDescription>\n              {t.alphaMining.sentiment.desc}\n            </CardDescription>\n          </div>\n          {improvement && (\n            <Badge className={isImproved ? 'bg-green-100 text-green-700' : 'bg-red-100 text-red-700'}>\n              {isImproved ? <TrendingUp className=\"w-3 h-3 mr-1\" /> : <TrendingDown className=\"w-3 h-3 mr-1\" />}\n              {isImproved ? 
'+' : ''}{improvement.improvement_pct.toFixed(1)}%\n            </Badge>\n          )}\n        </div>\n      </CardHeader>\n\n      <CardContent className=\"space-y-4\">\n        {/* Control panel */}\n        <div className=\"flex items-center gap-4 p-3 bg-gray-50 rounded-lg\">\n          <div className=\"flex items-center gap-2\">\n            <label className=\"text-sm font-medium\">{t.alphaMining.sentiment.steps}:</label>\n            <input\n              type=\"number\"\n              value={numSteps}\n              onChange={(e) => setNumSteps(Number(e.target.value))}\n              min={20}\n              max={200}\n              disabled={loading}\n              className=\"w-20 px-2 py-1 border rounded text-sm\"\n            />\n          </div>\n          <div className=\"flex-1\" />\n          <Button onClick={runComparison} disabled={loading}>\n            {loading ? (\n              <>\n                <Loader2 className=\"w-4 h-4 mr-1 animate-spin\" />\n                {t.alphaMining.sentiment.comparing}\n              </>\n            ) : (\n              <>\n                <Play className=\"w-4 h-4 mr-1\" />\n                {t.alphaMining.sentiment.start}\n              </>\n            )}\n          </Button>\n        </div>\n\n        {/* Comparison results */}\n        {withSentiment && withoutSentiment && (\n          <>\n            {/* Side-by-side comparison cards */}\n            <div className=\"grid grid-cols-1 md:grid-cols-2 gap-4\">\n              {/* Technical-only factors */}\n              <Card className=\"border-blue-200 bg-blue-50/50\">\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm flex items-center gap-2 text-blue-700\">\n                    <Cpu className=\"w-4 h-4\" />\n                    {t.alphaMining.sentiment.techOnly}\n                  </CardTitle>\n                  <CardDescription className=\"text-xs\">\n                    {withoutSentiment.num_features}{t.alphaMining.sentiment.techDesc}\n                  
</CardDescription>\n                </CardHeader>\n                <CardContent>\n                  <div className=\"space-y-3\">\n                    <div>\n                      <div className=\"text-xs text-gray-500\">{t.alphaMining.sentiment.bestFactor}</div>\n                      <code className=\"text-sm font-mono block mt-1 p-2 bg-white rounded border truncate\">\n                        {withoutSentiment.best_formula || t.alphaMining.sentiment.none}\n                      </code>\n                    </div>\n                    <div className=\"flex justify-between items-center\">\n                      <span className=\"text-sm text-gray-600\">Best Score</span>\n                      <span className=\"text-lg font-bold text-blue-600\">\n                        {withoutSentiment.best_score.toFixed(4)}\n                      </span>\n                    </div>\n                  </div>\n                </CardContent>\n              </Card>\n\n              {/* Sentiment-enhanced factors */}\n              <Card className=\"border-emerald-200 bg-emerald-50/50\">\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm flex items-center gap-2 text-emerald-700\">\n                    <Heart className=\"w-4 h-4\" />\n                    {t.alphaMining.sentiment.enhanced}\n                  </CardTitle>\n                  <CardDescription className=\"text-xs\">\n                    {withSentiment.num_features}{t.alphaMining.sentiment.enhancedDesc}\n                  </CardDescription>\n                </CardHeader>\n                <CardContent>\n                  <div className=\"space-y-3\">\n                    <div>\n                      <div className=\"text-xs text-gray-500\">{t.alphaMining.sentiment.bestFactor}</div>\n                      <code className=\"text-sm font-mono block mt-1 p-2 bg-white rounded border truncate\">\n                        {withSentiment.best_formula || t.alphaMining.sentiment.none}\n                      
</code>\n                    </div>\n                    <div className=\"flex justify-between items-center\">\n                      <span className=\"text-sm text-gray-600\">Best Score</span>\n                      <span className=\"text-lg font-bold text-emerald-600\">\n                        {withSentiment.best_score.toFixed(4)}\n                      </span>\n                    </div>\n                  </div>\n                </CardContent>\n              </Card>\n            </div>\n\n            {/* Improvement magnitude */}\n            {improvement && (\n              <Card className={isImproved ? 'bg-green-50 border-green-200' : 'bg-red-50 border-red-200'}>\n                <CardContent className=\"py-4\">\n                  <div className=\"flex items-center justify-between\">\n                    <div className=\"flex items-center gap-3\">\n                      <div className={`p-2 rounded-full ${isImproved ? 'bg-green-100' : 'bg-red-100'}`}>\n                        {isImproved ? (\n                          <TrendingUp className=\"w-5 h-5 text-green-600\" />\n                        ) : (\n                          <TrendingDown className=\"w-5 h-5 text-red-600\" />\n                        )}\n                      </div>\n                      <div>\n                        <div className=\"text-sm font-medium\">\n                          {isImproved ? t.alphaMining.sentiment.improved : t.alphaMining.sentiment.degraded}\n                        </div>\n                        <div className=\"text-xs text-gray-500\">\n                          {t.alphaMining.sentiment.scoreDiff}: {improvement.score_diff > 0 ? '+' : ''}{improvement.score_diff.toFixed(6)}\n                        </div>\n                      </div>\n                    </div>\n                    <div className={`text-3xl font-bold ${isImproved ? 'text-green-600' : 'text-red-600'}`}>\n                      {isImproved ? 
'+' : ''}{improvement.improvement_pct.toFixed(1)}%\n                    </div>\n                  </div>\n                </CardContent>\n              </Card>\n            )}\n\n            {/* Comparison bar chart */}\n            <Card>\n              <CardHeader className=\"pb-2\">\n                <CardTitle className=\"text-sm\">{t.alphaMining.sentiment.comparison}</CardTitle>\n              </CardHeader>\n              <CardContent>\n                <div className=\"h-48\">\n                  <ResponsiveContainer width=\"100%\" height=\"100%\">\n                    <BarChart\n                      data={[{\n                        name: 'Best Score',\n                        [t.alphaMining.sentiment.techOnlyBar]: withoutSentiment.best_score,\n                        [t.alphaMining.sentiment.enhancedBar]: withSentiment.best_score,\n                      }]}\n                      layout=\"vertical\"\n                    >\n                      <CartesianGrid strokeDasharray=\"3 3\" stroke=\"#e5e7eb\" />\n                      <XAxis type=\"number\" tick={{ fontSize: 11 }} />\n                      <YAxis type=\"category\" dataKey=\"name\" tick={{ fontSize: 11 }} width={80} />\n                      <Tooltip\n                        contentStyle={{\n                          backgroundColor: 'rgba(255, 255, 255, 0.95)',\n                          borderRadius: '8px',\n                          border: '1px solid #e5e7eb',\n                          fontSize: 12,\n                        }}\n                        formatter={(value: number) => value.toFixed(4)}\n                      />\n                      <Legend />\n                      <Bar dataKey={t.alphaMining.sentiment.techOnlyBar} fill=\"#3b82f6\" radius={[0, 4, 4, 0]} />\n                      <Bar dataKey={t.alphaMining.sentiment.enhancedBar} fill=\"#10b981\" radius={[0, 4, 4, 0]} />\n                      <ReferenceLine x={0} stroke=\"#666\" />\n                    </BarChart>\n                  
</ResponsiveContainer>\n                </div>\n              </CardContent>\n            </Card>\n\n            {/* Conclusion */}\n            <div className=\"p-4 bg-gray-50 rounded-lg text-sm text-gray-600\">\n              <strong>{t.alphaMining.sentiment.conclusion}</strong>\n              {isImproved ? (\n                <>\n                  {t.alphaMining.sentiment.conclusionPositive}\n                </>\n              ) : (\n                <>\n                  {t.alphaMining.sentiment.conclusionNegative}\n                </>\n              )}\n            </div>\n          </>\n        )}\n\n        {/* Loading state */}\n        {loading && (\n          <div className=\"py-12 text-center\">\n            <Loader2 className=\"w-8 h-8 animate-spin mx-auto text-purple-500 mb-3\" />\n            <p className=\"text-sm text-gray-500\">{t.alphaMining.sentiment.comparingText}</p>\n            <p className=\"text-xs text-gray-400 mt-1\">\n              {t.alphaMining.sentiment.comparingHint} {numSteps} {t.alphaMining.sentiment.stepsText}\n            </p>\n          </div>\n        )}\n\n        {/* Error message */}\n        {error && (\n          <div className=\"p-4 bg-red-50 rounded-lg border border-red-200\">\n            <p className=\"text-sm text-red-600\">{error}</p>\n          </div>\n        )}\n\n        {/* Initial state */}\n        {!loading && !withSentiment && !error && (\n          <div className=\"py-12 text-center text-gray-500\">\n            <Sparkles className=\"w-12 h-12 mx-auto opacity-50 mb-3\" />\n            <p>{t.alphaMining.sentiment.startHint}</p>\n            <p className=\"text-sm mt-1\">\n              {t.alphaMining.sentiment.startDesc}\n            </p>\n          </div>\n        )}\n      </CardContent>\n    </Card>\n  );\n};\n\nexport default SentimentCompare;\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/TrainingMonitor.tsx",
    "content": "/**\n * Real-time training progress monitor component.\n * \n * Subscribes to training progress via SSE and displays in real time:\n * - Training step count / progress\n * - Loss/Reward curves\n * - Current best factor expression\n */\n\nimport React, { useState, useEffect, useCallback, useRef } from 'react';\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '../ui/card';\nimport { Button } from '../ui/button';\nimport { Badge } from '../ui/badge';\nimport { \n  LineChart, Line, XAxis, YAxis, CartesianGrid, Tooltip, \n  ResponsiveContainer, Legend, ReferenceLine \n} from 'recharts';\nimport { \n  Play, Square, RefreshCw, Activity, \n  TrendingUp, Zap, CheckCircle2, AlertCircle \n} from 'lucide-react';\nimport { useGlobalI18n } from '@/store/useLanguageStore';\n\ninterface TrainingMetrics {\n  step: number;\n  progress: number;\n  loss: number;\n  avg_reward: number;\n  max_reward: number;\n  valid_ratio: number;\n  best_score: number;\n  best_formula: string;\n}\n\ninterface TrainingMonitorProps {\n  apiBaseUrl?: string;\n  onTrainingComplete?: (result: { best_score: number; best_formula: string }) => void;\n}\n\ntype TrainingStatus = 'idle' | 'running' | 'completed' | 'error';\n\nconst TrainingMonitor: React.FC<TrainingMonitorProps> = ({\n  apiBaseUrl = '/api/v1',\n  onTrainingComplete,\n}) => {\n  const t = useGlobalI18n();\n  const [status, setStatus] = useState<TrainingStatus>('idle');\n  const [progress, setProgress] = useState(0);\n  const [currentMetrics, setCurrentMetrics] = useState<TrainingMetrics | null>(null);\n  const [history, setHistory] = useState<TrainingMetrics[]>([]);\n  const [error, setError] = useState<string | null>(null);\n  const [numSteps, setNumSteps] = useState(100);\n  const [useSentiment, setUseSentiment] = useState(true);\n  \n  const eventSourceRef = useRef<EventSource | null>(null);\n  const abortControllerRef = useRef<AbortController | null>(null);\n\n  // Start training\n  const startTraining = useCallback(async () => {\n    setStatus('running');\n    setProgress(0);\n    setHistory([]);\n    setError(null);\n    
setCurrentMetrics(null);\n\n    try {\n      // Stream SSE over fetch\n      abortControllerRef.current = new AbortController();\n      \n      const response = await fetch(`${apiBaseUrl}/alpha-mining/mine/stream`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({\n          num_steps: numSteps,\n          use_sentiment: useSentiment,\n          batch_size: 16,\n        }),\n        signal: abortControllerRef.current.signal,\n      });\n\n      if (!response.ok) {\n        throw new Error(`HTTP error! status: ${response.status}`);\n      }\n\n      const reader = response.body?.getReader();\n      if (!reader) {\n        throw new Error('No response body');\n      }\n\n      const decoder = new TextDecoder();\n      let buffer = '';\n\n      while (true) {\n        const { done, value } = await reader.read();\n        if (done) break;\n\n        buffer += decoder.decode(value, { stream: true });\n\n        // Parse SSE events\n        const lines = buffer.split('\\n');\n        buffer = lines.pop() || '';\n\n        let currentEvent = '';\n        let currentData = '';\n\n        for (const line of lines) {\n          if (line.startsWith('event: ')) {\n            currentEvent = line.slice(7);\n          } else if (line.startsWith('data: ')) {\n            currentData = line.slice(6);\n          } else if (line === '' && currentEvent && currentData) {\n            try {\n              const data = JSON.parse(currentData);\n              handleSSEEvent(currentEvent, data);\n            } catch (e) {\n              console.error('Failed to parse SSE data:', currentData);\n            }\n            currentEvent = '';\n            currentData = '';\n          }\n        }\n      }\n    } catch (err: any) {\n      if (err.name !== 'AbortError') {\n        console.error('Training error:', err);\n        setError(err.message || t.alphaMining.training.trainingFailed);\n        setStatus('error');\n  
     }\n    }\n  }, [apiBaseUrl, numSteps, useSentiment, t]);\n\n  // Handle SSE events\n  const handleSSEEvent = useCallback((event: string, data: any) => {\n    switch (event) {\n      case 'start':\n        console.log('Training started:', data);\n        break;\n        \n      case 'progress': {\n        const metrics: TrainingMetrics = {\n          step: data.step,\n          progress: data.progress,\n          loss: data.loss,\n          avg_reward: data.avg_reward,\n          max_reward: data.max_reward,\n          valid_ratio: data.valid_ratio,\n          best_score: data.best_score,\n          best_formula: data.best_formula,\n        };\n        setCurrentMetrics(metrics);\n        setProgress(data.progress);\n        setHistory(prev => [...prev, metrics]);\n        break;\n      }\n\n      case 'complete':\n        setStatus('completed');\n        setProgress(100);\n        onTrainingComplete?.({\n          best_score: data.best_score,\n          best_formula: data.best_formula,\n        });\n        break;\n        \n      case 'error':\n        setError(data.error);\n        setStatus('error');\n        break;\n    }\n  }, [onTrainingComplete]);\n\n  // Stop training\n  const stopTraining = useCallback(() => {\n    abortControllerRef.current?.abort();\n    eventSourceRef.current?.close();\n    setStatus('idle');\n  }, []);\n\n  // Clean up on unmount\n  useEffect(() => {\n    return () => {\n      abortControllerRef.current?.abort();\n      eventSourceRef.current?.close();\n    };\n  }, []);\n\n  // Status color mapping\n  const statusConfig = {\n    idle: { color: 'bg-gray-100 text-gray-600', icon: <Activity className=\"w-4 h-4\" /> },\n    running: { color: 'bg-blue-100 text-blue-600', icon: <RefreshCw className=\"w-4 h-4 animate-spin\" /> },\n    completed: { color: 'bg-green-100 text-green-600', icon: <CheckCircle2 className=\"w-4 h-4\" /> },\n    error: { color: 'bg-red-100 text-red-600', icon: <AlertCircle className=\"w-4 h-4\" /> },\n  };\n\n  return (\n    <Card className=\"w-full\">\n      
<CardHeader>\n        <div className=\"flex items-center justify-between\">\n          <div>\n            <CardTitle className=\"flex items-center gap-2\">\n              <Zap className=\"w-5 h-5 text-amber-500\" />\n              {t.alphaMining.training.title}\n            </CardTitle>\n            <CardDescription>\n              {t.alphaMining.training.desc}\n            </CardDescription>\n          </div>\n          <Badge className={statusConfig[status].color}>\n            {statusConfig[status].icon}\n            <span className=\"ml-1\">\n              {status === 'idle' && t.alphaMining.training.ready}\n              {status === 'running' && t.alphaMining.training.running}\n              {status === 'completed' && t.alphaMining.training.completed}\n              {status === 'error' && t.alphaMining.training.error}\n            </span>\n          </Badge>\n        </div>\n      </CardHeader>\n      \n      <CardContent className=\"space-y-4\">\n        {/* Control panel */}\n        <div className=\"flex flex-wrap items-center gap-4 p-4 bg-gray-50 rounded-lg\">\n          <div className=\"flex items-center gap-2\">\n            <label className=\"text-sm font-medium\">{t.alphaMining.training.steps}:</label>\n            <input\n              type=\"number\"\n              value={numSteps}\n              onChange={(e) => setNumSteps(Number(e.target.value))}\n              min={10}\n              max={1000}\n              disabled={status === 'running'}\n              className=\"w-24 px-2 py-1 border rounded text-sm\"\n            />\n          </div>\n          \n          <div className=\"flex items-center gap-2\">\n            <input\n              type=\"checkbox\"\n              id=\"useSentiment\"\n              checked={useSentiment}\n              onChange={(e) => setUseSentiment(e.target.checked)}\n              disabled={status === 'running'}\n              className=\"rounded\"\n            />\n            <label htmlFor=\"useSentiment\" 
className=\"text-sm\">\n              {t.alphaMining.training.useSentiment}\n            </label>\n          </div>\n          \n          <div className=\"flex-1\" />\n          \n          {status === 'running' ? (\n            <Button variant=\"destructive\" size=\"sm\" onClick={stopTraining}>\n              <Square className=\"w-4 h-4 mr-1\" />\n              {t.alphaMining.training.stop}\n            </Button>\n          ) : (\n            <Button onClick={startTraining} disabled={status !== 'idle'}>\n              <Play className=\"w-4 h-4 mr-1\" />\n              {t.alphaMining.training.start}\n            </Button>\n          )}\n        </div>\n\n        {/* Progress bar */}\n        <div className=\"space-y-1\">\n          <div className=\"flex justify-between text-sm\">\n            <span>{t.alphaMining.training.progress}</span>\n            <span>{progress.toFixed(1)}%</span>\n          </div>\n          <div className=\"w-full bg-gray-200 rounded-full h-2\">\n            <div\n              className=\"bg-blue-600 h-2 rounded-full transition-all duration-300\"\n              style={{ width: `${progress}%` }}\n            />\n          </div>\n          {currentMetrics && (\n            <div className=\"text-xs text-gray-500\">\n              Step {currentMetrics.step} / {numSteps}\n            </div>\n          )}\n        </div>\n\n        {/* Live metrics */}\n        {currentMetrics && (\n          <div className=\"grid grid-cols-2 md:grid-cols-4 gap-3\">\n            <MetricCard label=\"Loss\" value={currentMetrics.loss.toFixed(4)} trend=\"down\" />\n            <MetricCard label=\"Avg Reward\" value={currentMetrics.avg_reward.toFixed(4)} trend=\"up\" />\n            <MetricCard label=\"Best Score\" value={currentMetrics.best_score.toFixed(4)} trend=\"up\" highlight />\n            <MetricCard label=\"Valid Ratio\" value={`${(currentMetrics.valid_ratio * 100).toFixed(1)}%`} />\n          </div>\n        )}\n\n        {/* Current best factor */}\n        
{currentMetrics?.best_formula && (\n          <div className=\"p-3 bg-emerald-50 rounded-lg border border-emerald-200\">\n            <div className=\"text-xs text-emerald-600 font-medium mb-1\">{t.alphaMining.training.bestFactor}</div>\n            <code className=\"text-sm font-mono text-emerald-800\">\n              {currentMetrics.best_formula}\n            </code>\n          </div>\n        )}\n\n        {/* Convergence curves */}\n        {history.length > 0 && (\n          <div className=\"space-y-2\">\n            <h4 className=\"text-sm font-medium\">{t.alphaMining.training.convergence}</h4>\n            <div className=\"h-64\">\n              <ResponsiveContainer width=\"100%\" height=\"100%\">\n                <LineChart data={history}>\n                  <CartesianGrid strokeDasharray=\"3 3\" stroke=\"#e5e7eb\" />\n                  <XAxis \n                    dataKey=\"step\" \n                    tick={{ fontSize: 10 }}\n                    label={{ value: 'Step', position: 'bottom', fontSize: 12 }}\n                  />\n                  <YAxis \n                    yAxisId=\"left\"\n                    tick={{ fontSize: 10 }}\n                    label={{ value: 'Reward', angle: -90, position: 'insideLeft', fontSize: 12 }}\n                  />\n                  <YAxis \n                    yAxisId=\"right\"\n                    orientation=\"right\"\n                    tick={{ fontSize: 10 }}\n                    label={{ value: 'Loss', angle: 90, position: 'insideRight', fontSize: 12 }}\n                  />\n                  <Tooltip\n                    contentStyle={{\n                      backgroundColor: 'rgba(255, 255, 255, 0.95)',\n                      borderRadius: '8px',\n                      border: '1px solid #e5e7eb',\n                      fontSize: 12,\n                    }}\n                  />\n                  <Legend />\n                  <Line\n                    yAxisId=\"left\"\n                    type=\"monotone\"\n            
        dataKey=\"avg_reward\"\n                    stroke=\"#10b981\"\n                    strokeWidth={2}\n                    dot={false}\n                    name=\"Avg Reward\"\n                  />\n                  <Line\n                    yAxisId=\"left\"\n                    type=\"monotone\"\n                    dataKey=\"best_score\"\n                    stroke=\"#f59e0b\"\n                    strokeWidth={2}\n                    dot={false}\n                    name=\"Best Score\"\n                  />\n                  <Line\n                    yAxisId=\"right\"\n                    type=\"monotone\"\n                    dataKey=\"loss\"\n                    stroke=\"#ef4444\"\n                    strokeWidth={1}\n                    dot={false}\n                    name=\"Loss\"\n                    strokeDasharray=\"5 5\"\n                  />\n                  <ReferenceLine yAxisId=\"left\" y={0} stroke=\"#666\" strokeDasharray=\"3 3\" />\n                </LineChart>\n              </ResponsiveContainer>\n            </div>\n          </div>\n        )}\n\n        {/* Error message */}\n        {error && (\n          <div className=\"p-3 bg-red-50 rounded-lg border border-red-200\">\n            <div className=\"text-sm text-red-600\">{error}</div>\n          </div>\n        )}\n      </CardContent>\n    </Card>\n  );\n};\n\n// Metric card component\ninterface MetricCardProps {\n  label: string;\n  value: string;\n  trend?: 'up' | 'down';\n  highlight?: boolean;\n}\n\nconst MetricCard: React.FC<MetricCardProps> = ({ label, value, trend, highlight }) => {\n  return (\n    <div className={`p-3 rounded-lg ${highlight ? 'bg-amber-50 border border-amber-200' : 'bg-gray-50'}`}>\n      <div className=\"text-xs text-gray-500\">{label}</div>\n      <div className={`text-lg font-semibold flex items-center gap-1 ${highlight ? 
'text-amber-600' : ''}`}>\n        {value}\n        {trend === 'up' && <TrendingUp className=\"w-3 h-3 text-green-500\" />}\n        {trend === 'down' && <TrendingUp className=\"w-3 h-3 text-red-500 rotate-180\" />}\n      </div>\n    </div>\n  );\n};\n\nexport default TrainingMonitor;\n"
  },
  {
    "path": "frontend/src/components/alpha-mining/index.ts",
    "content": "/**\n * Alpha Mining component exports\n */\n\nexport { default as OperatorGrid, FEATURES } from './OperatorGrid';\nexport type { Operator, OperatorCategory } from './OperatorGrid';\n\nexport { default as TrainingMonitor } from './TrainingMonitor';\n\nexport { default as MetricsDashboard } from './MetricsDashboard';\nexport type { FactorMetrics } from './MetricsDashboard';\n\nexport { default as SentimentCompare } from './SentimentCompare';\n\nexport { default as AgentDemo } from './AgentDemo';\n"
  },
  {
    "path": "frontend/src/components/ui/badge.tsx",
    "content": "import * as React from \"react\"\nimport { cva, type VariantProps } from \"class-variance-authority\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst badgeVariants = cva(\n  \"inline-flex items-center rounded-full border px-2.5 py-0.5 text-xs font-semibold transition-colors focus:outline-none focus:ring-2 focus:ring-ring focus:ring-offset-2\",\n  {\n    variants: {\n      variant: {\n        default:\n          \"border-transparent bg-primary text-primary-foreground hover:bg-primary/80\",\n        secondary:\n          \"border-transparent bg-secondary text-secondary-foreground hover:bg-secondary/80\",\n        destructive:\n          \"border-transparent bg-destructive text-destructive-foreground hover:bg-destructive/80\",\n        outline: \"text-foreground\",\n        success: \"border-transparent bg-green-100 text-green-800 dark:bg-green-900/30 dark:text-green-400\",\n        warning: \"border-transparent bg-yellow-100 text-yellow-800 dark:bg-yellow-900/30 dark:text-yellow-400\",\n      },\n    },\n    defaultVariants: {\n      variant: \"default\",\n    },\n  }\n)\n\nexport interface BadgeProps\n  extends React.HTMLAttributes<HTMLDivElement>,\n    VariantProps<typeof badgeVariants> {}\n\nfunction Badge({ className, variant, ...props }: BadgeProps) {\n  return (\n    <div className={cn(badgeVariants({ variant }), className)} {...props} />\n  )\n}\n\nexport { Badge, badgeVariants }\n\n"
  },
  {
    "path": "frontend/src/components/ui/button.tsx",
    "content": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class-variance-authority\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst buttonVariants = cva(\n  \"inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-md text-sm font-medium ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none [&_svg]:size-4 [&_svg]:shrink-0\",\n  {\n    variants: {\n      variant: {\n        default: \"bg-primary text-primary-foreground hover:bg-primary/90\",\n        destructive:\n          \"bg-destructive text-destructive-foreground hover:bg-destructive/90\",\n        outline:\n          \"border border-input bg-background hover:bg-accent hover:text-accent-foreground\",\n        secondary:\n          \"bg-secondary text-secondary-foreground hover:bg-secondary/80\",\n        ghost: \"hover:bg-accent hover:text-accent-foreground\",\n        link: \"text-primary underline-offset-4 hover:underline\",\n      },\n      size: {\n        default: \"h-10 px-4 py-2\",\n        sm: \"h-9 rounded-md px-3\",\n        lg: \"h-11 rounded-md px-8\",\n        icon: \"h-10 w-10\",\n      },\n    },\n    defaultVariants: {\n      variant: \"default\",\n      size: \"default\",\n    },\n  }\n)\n\nexport interface ButtonProps\n  extends React.ButtonHTMLAttributes<HTMLButtonElement>,\n    VariantProps<typeof buttonVariants> {\n  asChild?: boolean\n}\n\nconst Button = React.forwardRef<HTMLButtonElement, ButtonProps>(\n  ({ className, variant, size, asChild = false, ...props }, ref) => {\n    const Comp = asChild ? 
Slot : \"button\"\n    return (\n      <Comp\n        className={cn(buttonVariants({ variant, size, className }))}\n        ref={ref}\n        {...props}\n      />\n    )\n  }\n)\nButton.displayName = \"Button\"\n\nexport { Button, buttonVariants }\n\n"
  },
  {
    "path": "frontend/src/components/ui/card.tsx",
    "content": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst Card = React.forwardRef<\n  HTMLDivElement,\n  React.HTMLAttributes<HTMLDivElement>\n>(({ className, ...props }, ref) => (\n  <div\n    ref={ref}\n    className={cn(\n      \"rounded-lg border bg-card text-card-foreground shadow-sm\",\n      className\n    )}\n    {...props}\n  />\n))\nCard.displayName = \"Card\"\n\nconst CardHeader = React.forwardRef<\n  HTMLDivElement,\n  React.HTMLAttributes<HTMLDivElement>\n>(({ className, ...props }, ref) => (\n  <div\n    ref={ref}\n    className={cn(\"flex flex-col space-y-1.5 p-6\", className)}\n    {...props}\n  />\n))\nCardHeader.displayName = \"CardHeader\"\n\nconst CardTitle = React.forwardRef<\n  HTMLParagraphElement,\n  React.HTMLAttributes<HTMLHeadingElement>\n>(({ className, ...props }, ref) => (\n  <h3\n    ref={ref}\n    className={cn(\n      \"text-2xl font-semibold leading-none tracking-tight\",\n      className\n    )}\n    {...props}\n  />\n))\nCardTitle.displayName = \"CardTitle\"\n\nconst CardDescription = React.forwardRef<\n  HTMLParagraphElement,\n  React.HTMLAttributes<HTMLParagraphElement>\n>(({ className, ...props }, ref) => (\n  <p\n    ref={ref}\n    className={cn(\"text-sm text-muted-foreground\", className)}\n    {...props}\n  />\n))\nCardDescription.displayName = \"CardDescription\"\n\nconst CardContent = React.forwardRef<\n  HTMLDivElement,\n  React.HTMLAttributes<HTMLDivElement>\n>(({ className, ...props }, ref) => (\n  <div ref={ref} className={cn(\"p-6 pt-0\", className)} {...props} />\n))\nCardContent.displayName = \"CardContent\"\n\nconst CardFooter = React.forwardRef<\n  HTMLDivElement,\n  React.HTMLAttributes<HTMLDivElement>\n>(({ className, ...props }, ref) => (\n  <div\n    ref={ref}\n    className={cn(\"flex items-center p-6 pt-0\", className)}\n    {...props}\n  />\n))\nCardFooter.displayName = \"CardFooter\"\n\nexport { Card, CardHeader, CardFooter, CardTitle, CardDescription, CardContent 
}\n\n"
  },
  {
    "path": "frontend/src/components/ui/dropdown-menu.tsx",
    "content": "import * as React from \"react\"\nimport * as DropdownMenuPrimitive from \"@radix-ui/react-dropdown-menu\"\nimport { Check, ChevronRight } from \"lucide-react\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst DropdownMenu = DropdownMenuPrimitive.Root\n\nconst DropdownMenuTrigger = DropdownMenuPrimitive.Trigger\n\nconst DropdownMenuGroup = DropdownMenuPrimitive.Group\n\nconst DropdownMenuPortal = DropdownMenuPrimitive.Portal\n\nconst DropdownMenuSub = DropdownMenuPrimitive.Sub\n\nconst DropdownMenuRadioGroup = DropdownMenuPrimitive.RadioGroup\n\nconst DropdownMenuSubTrigger = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.SubTrigger>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.SubTrigger> & {\n    inset?: boolean\n  }\n>(({ className, inset, children, ...props }, ref) => (\n  <DropdownMenuPrimitive.SubTrigger\n    ref={ref}\n    className={cn(\n      \"flex cursor-default select-none items-center rounded-sm px-2 py-1.5 text-sm outline-none focus:bg-slate-100 data-[state=open]:bg-slate-100\",\n      inset && \"pl-8\",\n      className\n    )}\n    {...props}\n  >\n    {children}\n    <ChevronRight className=\"ml-auto h-4 w-4\" />\n  </DropdownMenuPrimitive.SubTrigger>\n))\nDropdownMenuSubTrigger.displayName =\n  DropdownMenuPrimitive.SubTrigger.displayName\n\nconst DropdownMenuSubContent = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.SubContent>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.SubContent>\n>(({ className, ...props }, ref) => (\n  <DropdownMenuPrimitive.SubContent\n    ref={ref}\n    className={cn(\n      \"z-50 min-w-[8rem] overflow-hidden rounded-md border border-slate-200 bg-white p-1 text-slate-950 shadow-lg data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 data-[state=closed]:zoom-out-95 data-[state=open]:zoom-in-95 data-[side=bottom]:slide-in-from-top-2 
data-[side=left]:slide-in-from-right-2 data-[side=right]:slide-in-from-left-2 data-[side=top]:slide-in-from-bottom-2\",\n      className\n    )}\n    {...props}\n  />\n))\nDropdownMenuSubContent.displayName =\n  DropdownMenuPrimitive.SubContent.displayName\n\nconst DropdownMenuContent = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.Content>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.Content>\n>(({ className, sideOffset = 4, ...props }, ref) => (\n  <DropdownMenuPrimitive.Portal>\n    <DropdownMenuPrimitive.Content\n      ref={ref}\n      sideOffset={sideOffset}\n      className={cn(\n        \"z-50 min-w-[8rem] overflow-hidden rounded-md border border-slate-200 bg-white p-1 text-slate-950 shadow-md data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 data-[state=closed]:zoom-out-95 data-[state=open]:zoom-in-95 data-[side=bottom]:slide-in-from-top-2 data-[side=left]:slide-in-from-right-2 data-[side=right]:slide-in-from-left-2 data-[side=top]:slide-in-from-bottom-2\",\n        className\n      )}\n      {...props}\n    />\n  </DropdownMenuPrimitive.Portal>\n))\nDropdownMenuContent.displayName = DropdownMenuPrimitive.Content.displayName\n\nconst DropdownMenuItem = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.Item>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.Item> & {\n    inset?: boolean\n  }\n>(({ className, inset, ...props }, ref) => (\n  <DropdownMenuPrimitive.Item\n    ref={ref}\n    className={cn(\n      \"relative flex cursor-default select-none items-center rounded-sm px-2 py-1.5 text-sm outline-none transition-colors focus:bg-slate-100 focus:text-slate-900 data-[disabled]:pointer-events-none data-[disabled]:opacity-50\",\n      inset && \"pl-8\",\n      className\n    )}\n    {...props}\n  />\n))\nDropdownMenuItem.displayName = DropdownMenuPrimitive.Item.displayName\n\nconst DropdownMenuCheckboxItem = 
React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.CheckboxItem>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.CheckboxItem>\n>(({ className, children, checked, ...props }, ref) => (\n  <DropdownMenuPrimitive.CheckboxItem\n    ref={ref}\n    className={cn(\n      \"relative flex cursor-default select-none items-center rounded-sm py-1.5 pl-8 pr-2 text-sm outline-none transition-colors focus:bg-slate-100 focus:text-slate-900 data-[disabled]:pointer-events-none data-[disabled]:opacity-50\",\n      className\n    )}\n    checked={checked}\n    {...props}\n  >\n    <span className=\"absolute left-2 flex h-3.5 w-3.5 items-center justify-center\">\n      <DropdownMenuPrimitive.ItemIndicator>\n        <Check className=\"h-4 w-4\" />\n      </DropdownMenuPrimitive.ItemIndicator>\n    </span>\n    {children}\n  </DropdownMenuPrimitive.CheckboxItem>\n))\nDropdownMenuCheckboxItem.displayName =\n  DropdownMenuPrimitive.CheckboxItem.displayName\n\nconst DropdownMenuRadioItem = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.RadioItem>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.RadioItem>\n>(({ className, children, ...props }, ref) => (\n  <DropdownMenuPrimitive.RadioItem\n    ref={ref}\n    className={cn(\n      \"relative flex cursor-default select-none items-center rounded-sm py-1.5 pl-8 pr-2 text-sm outline-none transition-colors focus:bg-slate-100 focus:text-slate-900 data-[disabled]:pointer-events-none data-[disabled]:opacity-50\",\n      className\n    )}\n    {...props}\n  >\n    <span className=\"absolute left-2 flex h-3.5 w-3.5 items-center justify-center\">\n      <DropdownMenuPrimitive.ItemIndicator>\n        <div className=\"h-2 w-2 rounded-full bg-current\" />\n      </DropdownMenuPrimitive.ItemIndicator>\n    </span>\n    {children}\n  </DropdownMenuPrimitive.RadioItem>\n))\nDropdownMenuRadioItem.displayName = DropdownMenuPrimitive.RadioItem.displayName\n\nconst DropdownMenuLabel = 
React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.Label>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.Label> & {\n    inset?: boolean\n  }\n>(({ className, inset, ...props }, ref) => (\n  <DropdownMenuPrimitive.Label\n    ref={ref}\n    className={cn(\n      \"px-2 py-1.5 text-sm font-semibold\",\n      inset && \"pl-8\",\n      className\n    )}\n    {...props}\n  />\n))\nDropdownMenuLabel.displayName = DropdownMenuPrimitive.Label.displayName\n\nconst DropdownMenuSeparator = React.forwardRef<\n  React.ElementRef<typeof DropdownMenuPrimitive.Separator>,\n  React.ComponentPropsWithoutRef<typeof DropdownMenuPrimitive.Separator>\n>(({ className, ...props }, ref) => (\n  <DropdownMenuPrimitive.Separator\n    ref={ref}\n    className={cn(\"-mx-1 my-1 h-px bg-slate-200\", className)}\n    {...props}\n  />\n))\nDropdownMenuSeparator.displayName = DropdownMenuPrimitive.Separator.displayName\n\nconst DropdownMenuShortcut = ({\n  className,\n  ...props\n}: React.HTMLAttributes<HTMLSpanElement>) => {\n  return (\n    <span\n      className={cn(\"ml-auto text-xs tracking-widest opacity-60\", className)}\n      {...props}\n    />\n  )\n}\nDropdownMenuShortcut.displayName = \"DropdownMenuShortcut\"\n\nexport {\n  DropdownMenu,\n  DropdownMenuTrigger,\n  DropdownMenuContent,\n  DropdownMenuItem,\n  DropdownMenuCheckboxItem,\n  DropdownMenuRadioItem,\n  DropdownMenuLabel,\n  DropdownMenuSeparator,\n  DropdownMenuShortcut,\n  DropdownMenuGroup,\n  DropdownMenuPortal,\n  DropdownMenuSub,\n  DropdownMenuSubContent,\n  DropdownMenuSubTrigger,\n  DropdownMenuRadioGroup,\n}\n\n"
  },
  {
    "path": "frontend/src/components/ui/sheet.tsx",
    "content": "import * as React from \"react\"\nimport { X } from \"lucide-react\"\nimport { cn } from \"@/lib/utils\"\n\ninterface SheetContextValue {\n  open: boolean\n  onOpenChange: (open: boolean) => void\n}\n\nconst SheetContext = React.createContext<SheetContextValue | undefined>(undefined)\n\nconst Sheet = ({ \n  open, \n  onOpenChange, \n  children \n}: { \n  open: boolean\n  onOpenChange: (open: boolean) => void\n  children: React.ReactNode \n}) => {\n  React.useEffect(() => {\n    const handleEscape = (e: KeyboardEvent) => {\n      if (e.key === 'Escape' && open) {\n        onOpenChange(false)\n      }\n    }\n    if (open) {\n      document.addEventListener('keydown', handleEscape)\n      document.body.style.overflow = 'hidden'\n    } else {\n      document.body.style.overflow = ''\n    }\n    return () => {\n      document.removeEventListener('keydown', handleEscape)\n      document.body.style.overflow = ''\n    }\n  }, [open, onOpenChange])\n\n  return (\n    <SheetContext.Provider value={{ open, onOpenChange }}>\n      {children}\n    </SheetContext.Provider>\n  )\n}\n\nconst SheetOverlay = React.forwardRef<\n  HTMLDivElement,\n  React.HTMLAttributes<HTMLDivElement>\n>(({ className, ...props }, ref) => {\n  const context = React.useContext(SheetContext)\n  if (!context) return null\n  \n  if (!context.open) return null\n\n  return (\n    <div\n      ref={ref}\n      className={cn(\n        \"fixed inset-0 z-50 bg-black/50 transition-opacity\",\n        context.open ? 
\"opacity-100\" : \"opacity-0\",\n        className\n      )}\n      onClick={() => context.onOpenChange(false)}\n      {...props}\n    />\n  )\n})\nSheetOverlay.displayName = \"SheetOverlay\"\n\ninterface SheetContentProps extends React.HTMLAttributes<HTMLDivElement> {\n  side?: \"top\" | \"bottom\" | \"left\" | \"right\"\n}\n\nconst SheetContent = React.forwardRef<\n  HTMLDivElement,\n  SheetContentProps\n>(({ side = \"right\", className, children, ...props }, ref) => {\n  const context = React.useContext(SheetContext)\n\n  // Track visibility in state so the close animation can play before unmount\n  const [isVisible, setIsVisible] = React.useState(context?.open ?? false)\n\n  React.useEffect(() => {\n    if (context?.open) {\n      setIsVisible(true)\n    } else {\n      // Delay hiding so the close animation has time to finish\n      const timer = setTimeout(() => setIsVisible(false), 300)\n      return () => clearTimeout(timer)\n    }\n  }, [context?.open])\n\n  // Hooks must run unconditionally, so the missing-context and hidden checks come after them\n  if (!context || !isVisible) return null\n\n  return (\n    <>\n      <SheetOverlay />\n      <div\n        ref={ref}\n        className={cn(\n          \"fixed z-50 gap-4 bg-white p-6 shadow-lg transition-transform duration-300 ease-in-out\",\n          side === \"right\" && \"inset-y-0 right-0 h-full w-full border-l sm:w-3/4 sm:max-w-2xl lg:max-w-4xl\",\n          side === \"left\" && \"inset-y-0 left-0 h-full w-full border-r sm:w-3/4 sm:max-w-sm\",\n          side === \"top\" && \"inset-x-0 top-0 border-b\",\n          side === \"bottom\" && \"inset-x-0 bottom-0 border-t\",\n          className\n        )}\n        style={{\n          transform: context.open \n            ? \"translateX(0)\" \n            : side === \"right\" \n              ? \"translateX(100%)\" \n              : side === \"left\" \n                ? 
\"translateX(-100%)\" \n                : \"translateX(0)\",\n        }}\n        {...props}\n      >\n        {children}\n        <button\n          onClick={() => context.onOpenChange(false)}\n          className=\"absolute right-4 top-4 rounded-sm opacity-70 ring-offset-white transition-opacity hover:opacity-100 focus:outline-none focus:ring-2 focus:ring-slate-950 focus:ring-offset-2 disabled:pointer-events-none\"\n        >\n          <X className=\"h-4 w-4\" />\n          <span className=\"sr-only\">Close</span>\n        </button>\n      </div>\n    </>\n  )\n})\nSheetContent.displayName = \"SheetContent\"\n\nconst SheetHeader = ({\n  className,\n  ...props\n}: React.HTMLAttributes<HTMLDivElement>) => (\n  <div\n    className={cn(\n      \"flex flex-col space-y-2 text-center sm:text-left\",\n      className\n    )}\n    {...props}\n  />\n)\nSheetHeader.displayName = \"SheetHeader\"\n\nconst SheetFooter = ({\n  className,\n  ...props\n}: React.HTMLAttributes<HTMLDivElement>) => (\n  <div\n    className={cn(\n      \"flex flex-col-reverse sm:flex-row sm:justify-end sm:space-x-2\",\n      className\n    )}\n    {...props}\n  />\n)\nSheetFooter.displayName = \"SheetFooter\"\n\nconst SheetTitle = React.forwardRef<\n  HTMLHeadingElement,\n  React.HTMLAttributes<HTMLHeadingElement>\n>(({ className, ...props }, ref) => (\n  <h2\n    ref={ref}\n    className={cn(\"text-lg font-semibold text-slate-950\", className)}\n    {...props}\n  />\n))\nSheetTitle.displayName = \"SheetTitle\"\n\nconst SheetDescription = React.forwardRef<\n  HTMLParagraphElement,\n  React.HTMLAttributes<HTMLParagraphElement>\n>(({ className, ...props }, ref) => (\n  <p\n    ref={ref}\n    className={cn(\"text-sm text-slate-500\", className)}\n    {...props}\n  />\n))\nSheetDescription.displayName = \"SheetDescription\"\n\nexport {\n  Sheet,\n  SheetOverlay,\n  SheetContent,\n  SheetHeader,\n  SheetFooter,\n  SheetTitle,\n  SheetDescription,\n}\n"
  },
  {
    "path": "frontend/src/components/ui/tabs.tsx",
    "content": "import * as React from \"react\"\nimport * as TabsPrimitive from \"@radix-ui/react-tabs\"\nimport { cn } from \"@/lib/utils\"\n\nconst Tabs = TabsPrimitive.Root\n\nconst TabsList = React.forwardRef<\n  React.ElementRef<typeof TabsPrimitive.List>,\n  React.ComponentPropsWithoutRef<typeof TabsPrimitive.List>\n>(({ className, ...props }, ref) => (\n  <TabsPrimitive.List\n    ref={ref}\n    className={cn(\n      \"inline-flex h-10 items-center justify-center rounded-md bg-muted p-1 text-muted-foreground\",\n      className\n    )}\n    {...props}\n  />\n))\nTabsList.displayName = TabsPrimitive.List.displayName\n\nconst TabsTrigger = React.forwardRef<\n  React.ElementRef<typeof TabsPrimitive.Trigger>,\n  React.ComponentPropsWithoutRef<typeof TabsPrimitive.Trigger>\n>(({ className, ...props }, ref) => (\n  <TabsPrimitive.Trigger\n    ref={ref}\n    className={cn(\n      \"inline-flex items-center justify-center whitespace-nowrap rounded-sm px-3 py-1.5 text-sm font-medium ring-offset-background transition-all focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 data-[state=active]:bg-background data-[state=active]:text-foreground data-[state=active]:shadow-sm\",\n      className\n    )}\n    {...props}\n  />\n))\nTabsTrigger.displayName = TabsPrimitive.Trigger.displayName\n\nconst TabsContent = React.forwardRef<\n  React.ElementRef<typeof TabsPrimitive.Content>,\n  React.ComponentPropsWithoutRef<typeof TabsPrimitive.Content>\n>(({ className, ...props }, ref) => (\n  <TabsPrimitive.Content\n    ref={ref}\n    className={cn(\n      \"mt-2 ring-offset-background focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2\",\n      className\n    )}\n    {...props}\n  />\n))\nTabsContent.displayName = TabsPrimitive.Content.displayName\n\nexport { Tabs, TabsList, TabsTrigger, TabsContent }\n"
  },
  {
    "path": "frontend/src/context/NewsToolbarContext.tsx",
    "content": "import React, { createContext, useContext, useState } from 'react'\n\ninterface ToolbarContent {\n  left?: React.ReactNode | null\n  right?: React.ReactNode | null\n}\n\ninterface NewsToolbarContextValue {\n  content: ToolbarContent\n  setContent: (content: ToolbarContent) => void\n}\n\nconst NewsToolbarContext = createContext<NewsToolbarContextValue>({\n  content: {},\n  setContent: () => {},\n})\n\nexport const NewsToolbarProvider = ({\n  children,\n}: {\n  children: React.ReactNode\n}) => {\n  const [content, setContent] = useState<ToolbarContent>({})\n\n  return (\n    <NewsToolbarContext.Provider value={{ content, setContent }}>\n      {children}\n    </NewsToolbarContext.Provider>\n  )\n}\n\nexport const useNewsToolbar = () => useContext(NewsToolbarContext)\n\n\n"
  },
  {
    "path": "frontend/src/hooks/useDebounce.ts",
    "content": "import { useState, useEffect } from 'react'\n\n/**\n * useDebounce Hook\n * \n * Delays updates to a rapidly changing value (such as search input) to avoid\n * triggering computations or API requests too frequently.\n * \n * @param value - the value to debounce\n * @param delay - delay in milliseconds, defaults to 500ms\n * @returns the debounced value\n * \n * @example\n * const [searchTerm, setSearchTerm] = useState('')\n * const debouncedSearchTerm = useDebounce(searchTerm, 300)\n * \n * useEffect(() => {\n *   // Runs only after the user has stopped typing for 300ms\n *   fetchSearchResults(debouncedSearchTerm)\n * }, [debouncedSearchTerm])\n */\nexport function useDebounce<T>(value: T, delay: number = 500): T {\n  const [debouncedValue, setDebouncedValue] = useState<T>(value)\n\n  useEffect(() => {\n    // Start a timer that updates debouncedValue after the delay\n    const timer = setTimeout(() => {\n      setDebouncedValue(value)\n    }, delay)\n\n    // Cleanup: if value changes again within the delay, clear the previous timer\n    return () => {\n      clearTimeout(timer)\n    }\n  }, [value, delay])\n\n  return debouncedValue\n}\n"
  },
  {
    "path": "frontend/src/index.css",
    "content": "@tailwind base;\n@tailwind components;\n@tailwind utilities;\n\n@layer base {\n  :root {\n    --background: 0 0% 100%;\n    --foreground: 222.2 84% 4.9%;\n    --card: 0 0% 100%;\n    --card-foreground: 222.2 84% 4.9%;\n    --popover: 0 0% 100%;\n    --popover-foreground: 222.2 84% 4.9%;\n    --primary: 221.2 83.2% 53.3%;\n    --primary-foreground: 210 40% 98%;\n    --secondary: 210 40% 96.1%;\n    --secondary-foreground: 222.2 47.4% 11.2%;\n    --muted: 210 40% 96.1%;\n    --muted-foreground: 215.4 16.3% 46.9%;\n    --accent: 210 40% 96.1%;\n    --accent-foreground: 222.2 47.4% 11.2%;\n    --destructive: 0 84.2% 60.2%;\n    --destructive-foreground: 210 40% 98%;\n    --border: 214.3 31.8% 91.4%;\n    --input: 214.3 31.8% 91.4%;\n    --ring: 221.2 83.2% 53.3%;\n    --radius: 0.5rem;\n  }\n\n  .dark {\n    --background: 222.2 84% 4.9%;\n    --foreground: 210 40% 98%;\n    --card: 222.2 84% 4.9%;\n    --card-foreground: 210 40% 98%;\n    --popover: 222.2 84% 4.9%;\n    --popover-foreground: 210 40% 98%;\n    --primary: 217.2 91.2% 59.8%;\n    --primary-foreground: 222.2 47.4% 11.2%;\n    --secondary: 217.2 32.6% 17.5%;\n    --secondary-foreground: 210 40% 98%;\n    --muted: 217.2 32.6% 17.5%;\n    --muted-foreground: 215 20.2% 65.1%;\n    --accent: 217.2 32.6% 17.5%;\n    --accent-foreground: 210 40% 98%;\n    --destructive: 0 62.8% 30.6%;\n    --destructive-foreground: 210 40% 98%;\n    --border: 217.2 32.6% 17.5%;\n    --input: 217.2 32.6% 17.5%;\n    --ring: 224.3 76.3% 48%;\n  }\n}\n\n@layer base {\n  * {\n    @apply border-border;\n  }\n  body {\n    @apply bg-background text-foreground;\n  }\n}\n\n"
  },
  {
    "path": "frontend/src/layout/MainLayout.tsx",
    "content": "import { Outlet, Link, useLocation } from 'react-router-dom'\nimport { Home, Newspaper, TrendingUp, Activity, Settings, Brain } from 'lucide-react'\nimport { cn } from '@/lib/utils'\nimport ModelSelector from '@/components/ModelSelector'\nimport { NewsToolbarProvider, useNewsToolbar } from '@/context/NewsToolbarContext'\nimport { useLanguageStore, useGlobalI18n } from '@/store/useLanguageStore'\n\nconst navigationConfig = [\n  { key: 'home', href: '/', icon: Home },\n  { key: 'news', href: '/news', icon: Newspaper },\n  { key: 'stock', href: '/stock', icon: TrendingUp },\n  { key: 'alphaMining', href: '/alpha-mining', icon: Brain },\n  { key: 'agents', href: '/agents', icon: Activity },\n  { key: 'tasks', href: '/tasks', icon: Settings },\n]\n\nexport default function MainLayout() {\n  return (\n    <NewsToolbarProvider>\n      <MainLayoutInner />\n    </NewsToolbarProvider>\n  )\n}\n\nfunction MainLayoutInner() {\n  const location = useLocation()\n  const { content } = useNewsToolbar()\n  const { lang, setLang } = useLanguageStore()\n  const t = useGlobalI18n()\n  \n  const isNewsPage =\n    location.pathname === '/news' || location.pathname.startsWith('/news/')\n\n  return (\n    <div className=\"flex h-screen bg-gray-50\">\n      {/* Sidebar */}\n      <div className=\"w-64 bg-white border-r border-gray-200 flex flex-col\">\n        {/* Logo */}\n        <div className=\"h-16 flex items-center px-6 border-b border-gray-200\">\n          <h1 className=\"text-xl font-bold bg-gradient-to-r from-blue-600 to-purple-600 bg-clip-text text-transparent\">\n            🎯 FinnewsHunter\n          </h1>\n        </div>\n\n        {/* Navigation */}\n        <nav className=\"flex-1 px-4 py-4 space-y-1\">\n          {navigationConfig.map((item) => {\n            const Icon = item.icon\n            const name = t.nav[item.key as keyof typeof t.nav]\n            const isActive = location.pathname === item.href ||\n              (item.href !== '/' && 
location.pathname.startsWith(item.href))\n\n            return (\n              <Link\n                key={item.key}\n                to={item.href}\n                className={cn(\n                  'flex items-center gap-3 px-3 py-2 rounded-lg text-sm font-medium transition-colors',\n                  isActive\n                    ? 'bg-blue-50 text-blue-700'\n                    : 'text-gray-700 hover:bg-gray-100'\n                )}\n              >\n                <Icon className=\"w-5 h-5\" />\n                {name}\n              </Link>\n            )\n          })}\n        </nav>\n\n        {/* Footer info */}\n        <div className=\"p-4 border-t border-gray-200\">\n          <div className=\"text-xs text-gray-500\">\n            {t.header.poweredBy} <span className=\"font-semibold\">AgenticX</span>\n          </div>\n        </div>\n      </div>\n\n      {/* Main content area */}\n      <div className=\"flex-1 flex flex-col overflow-hidden\">\n        {/* Top bar */}\n        <header className=\"h-16 bg-white border-b border-gray-200 flex items-center justify-between px-6 gap-4\">\n          {/* Left: search box or title */}\n          <div className=\"flex-1 max-w-xl\">\n            {isNewsPage ? (\n              content.left || <h1 className=\"text-xl font-semibold text-gray-900\">\n                {lang === 'zh' ? '实时新闻流' : 'Real-time News Feed'}\n              </h1>\n            ) : (\n              <h1 className=\"text-xl font-semibold text-gray-900\">{t.header.title}</h1>\n            )}\n          </div>\n          \n          {/* Right: toolbar */}\n          <div className=\"flex items-center gap-4\">\n            {/* Language toggle */}\n            <div className=\"flex items-center gap-1 bg-gray-100 rounded-lg p-1\">\n              <button\n                onClick={() => setLang('en')}\n                className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${\n                  lang === 'en' \n                    ? 
'bg-white text-gray-900 shadow-sm' \n                    : 'text-gray-600 hover:text-gray-900'\n                }`}\n              >\n                EN\n              </button>\n              <button\n                onClick={() => setLang('zh')}\n                className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${\n                  lang === 'zh' \n                    ? 'bg-white text-gray-900 shadow-sm' \n                    : 'text-gray-600 hover:text-gray-900'\n                }`}\n              >\n                中文\n              </button>\n            </div>\n            \n            <ModelSelector />\n            {isNewsPage && content.right}\n          </div>\n        </header>\n\n        {/* Page content */}\n        <main className=\"flex-1 overflow-auto\">\n          <Outlet />\n        </main>\n      </div>\n    </div>\n  )\n}\n"
  },
  {
    "path": "frontend/src/lib/api-client.ts",
    "content": "import axios from 'axios'\nimport type {\n  News,\n  Analysis,\n  CrawlTask,\n  TaskStats,\n  CrawlRequest,\n  CrawlResponse,\n  AnalysisResponse,\n  StockOverview,\n  StockNewsItem,\n  SentimentTrendPoint,\n  KLineDataPoint,\n  RealtimeQuote,\n  DebateRequest,\n  DebateResponse,\n  AgentLogEntry,\n  AgentMetrics,\n  AgentInfo,\n  WorkflowInfo,\n} from '@/types/api'\n\nconst API_BASE_URL = import.meta.env.VITE_API_BASE_URL || 'http://localhost:8000/api/v1'\n\nconst apiClient = axios.create({\n  baseURL: API_BASE_URL,\n  timeout: 60000,\n  headers: {\n    'Content-Type': 'application/json',\n  },\n})\n\n// Request interceptor\napiClient.interceptors.request.use(\n  (config) => {\n    // An auth token could be attached here\n    return config\n  },\n  (error) => {\n    return Promise.reject(error)\n  }\n)\n\n// Response interceptor\napiClient.interceptors.response.use(\n  (response) => response,\n  (error) => {\n    // Detailed error logging\n    if (error.response) {\n      // The server returned an error response\n      console.error('API Error Response:', {\n        status: error.response.status,\n        statusText: error.response.statusText,\n        data: error.response.data,\n        url: error.config?.url,\n      })\n    } else if (error.request) {\n      // The request was sent but no response was received\n      console.error('API Error Request:', {\n        message: error.message,\n        url: error.config?.url,\n        baseURL: error.config?.baseURL,\n        timeout: error.code === 'ECONNABORTED' ? 
'Request timeout' : 'Network error',\n      })\n    } else {\n      // Error in the request configuration\n      console.error('API Error Config:', error.message)\n    }\n    return Promise.reject(error)\n  }\n)\n\n/**\n * News API - Phase 2 upgraded version\n */\nexport const newsApi = {\n  /**\n   * Phase 2: fetch the latest news (smart caching + auto refresh)\n   */\n  getLatestNews: async (params?: {\n    source?: string\n    limit?: number\n    force_refresh?: boolean\n  }): Promise<News[]> => {\n    const response = await apiClient.get<any>('/news/latest', { params })\n    // The Phase 2 API returns { success, data: News[], ... }\n    // Compatibility: if the response is an envelope object, extract its data field; otherwise return it directly\n    if (response.data && typeof response.data === 'object' && 'data' in response.data) {\n      return response.data.data\n    }\n    return response.data\n  },\n\n  /**\n   * Phase 2: force-refresh news\n   */\n  forceRefresh: async (params: { source: string }): Promise<{ success: boolean; message: string }> => {\n    const response = await apiClient.post('/news/refresh', null, { params })\n    return response.data\n  },\n\n  /**\n   * Fetch the news list (with filters)\n   */\n  getNewsList: async (params?: {\n    skip?: number\n    limit?: number\n    source?: string\n    sentiment?: string\n  }): Promise<News[]> => {\n    const response = await apiClient.get<News[]>('/news/', { params })\n    return response.data\n  },\n\n  /**\n   * Fetch news detail\n   */\n  getNewsDetail: async (newsId: number): Promise<News> => {\n    const response = await apiClient.get<News>(`/news/${newsId}`)\n    return response.data\n  },\n\n  /**\n   * Fetch the raw HTML of a news item\n   */\n  getNewsHtml: async (newsId: number): Promise<{ id: number; title: string; url: string; raw_html: string | null; has_raw_html: boolean }> => {\n    const response = await apiClient.get(`/news/${newsId}/html`)\n    return response.data\n  },\n\n  /**\n   * [Deprecated] Trigger a crawl\n   */\n  crawlNews: async (data: CrawlRequest): Promise<CrawlResponse> => {\n    console.warn('⚠️ crawlNews API is deprecated, use forceRefresh instead')\n    const response = await apiClient.post<CrawlResponse>('/news/crawl', 
data)\n    return response.data\n  },\n\n  /**\n   * Delete a news item\n   */\n  deleteNews: async (newsId: number): Promise<void> => {\n    await apiClient.delete(`/news/${newsId}`)\n  },\n\n  /**\n   * Batch-delete news items\n   */\n  batchDeleteNews: async (newsIds: number[]): Promise<{ success: boolean; message: string; deleted_count: number }> => {\n    const response = await apiClient.post('/news/batch/delete', { news_ids: newsIds })\n    return response.data\n  },\n}\n\n/**\n * Analysis API\n */\nexport const analysisApi = {\n  /**\n   * Trigger news analysis\n   * @param newsId - news ID\n   * @param config - optional LLM config (provider and model)\n   */\n  analyzeNews: async (\n    newsId: number, \n    config?: { provider?: string; model?: string }\n  ): Promise<AnalysisResponse> => {\n    const response = await apiClient.post<AnalysisResponse>(\n      `/analysis/news/${newsId}`,\n      config || {}\n    )\n    return response.data\n  },\n\n  /**\n   * Fetch analysis detail\n   */\n  getAnalysisDetail: async (analysisId: number): Promise<Analysis> => {\n    const response = await apiClient.get<Analysis>(`/analysis/${analysisId}`)\n    return response.data\n  },\n\n  /**\n   * Fetch all analysis results for a news item\n   */\n  getNewsAnalyses: async (newsId: number): Promise<Analysis[]> => {\n    const response = await apiClient.get<Analysis[]>(`/analysis/news/${newsId}/all`)\n    return response.data\n  },\n\n  /**\n   * Batch-analyze news\n   * Note: batch analysis may take a long time, so the timeout is set to 5 minutes\n   */\n  batchAnalyzeNews: async (\n    newsIds: number[],\n    config?: { provider?: string; model?: string }\n  ): Promise<{ success: boolean; message: string; total_count: number; success_count: number; failed_count: number }> => {\n    // Ensure newsIds is a valid non-empty array\n    if (!Array.isArray(newsIds) || newsIds.length === 0) {\n      throw new Error('newsIds must be a non-empty array')\n    }\n    \n    const requestBody: { news_ids: number[]; provider?: string; model?: string } = {\n      news_ids: newsIds\n    }\n    \n    // Only add config values to the request body when they are defined and non-empty\n    if (config) {\n      if (config.provider !== 
undefined && config.provider !== null && config.provider !== '') {\n        requestBody.provider = config.provider\n      }\n      if (config.model !== undefined && config.model !== null && config.model !== '') {\n        requestBody.model = config.model\n      }\n    }\n    \n    // Batch analysis may take a long time, so use a 5-minute timeout\n    const response = await apiClient.post('/analysis/news/batch', requestBody, {\n      timeout: 5 * 60 * 1000  // 5-minute timeout\n    })\n    return response.data\n  },\n}\n\n/**\n * LLM configuration types\n */\nexport interface ModelInfo {\n  value: string\n  label: string\n  description: string\n}\n\nexport interface ProviderInfo {\n  value: string\n  label: string\n  icon: string\n  models: ModelInfo[]\n  has_api_key: boolean\n}\n\nexport interface LLMConfigResponse {\n  default_provider: string\n  default_model: string\n  providers: ProviderInfo[]\n}\n\n/**\n * LLM configuration API\n */\nexport const llmApi = {\n  /**\n   * Fetch the LLM configuration (available providers and models)\n   */\n  getConfig: async (): Promise<LLMConfigResponse> => {\n    const response = await apiClient.get<LLMConfigResponse>('/llm/config')\n    return response.data\n  },\n}\n\n/**\n * Task API\n */\nexport const taskApi = {\n  /**\n   * Fetch the task list\n   */\n  getTaskList: async (params?: {\n    skip?: number\n    limit?: number\n    mode?: string\n    status?: string\n  }): Promise<CrawlTask[]> => {\n    const response = await apiClient.get<CrawlTask[]>('/tasks/', { params })\n    return response.data\n  },\n\n  /**\n   * Fetch task detail\n   */\n  getTaskDetail: async (taskId: number): Promise<CrawlTask> => {\n    const response = await apiClient.get<CrawlTask>(`/tasks/${taskId}`)\n    return response.data\n  },\n\n  /**\n   * Trigger a cold start\n   */\n  triggerColdStart: async (data: {\n    source: string\n    start_page: number\n    end_page: number\n  }): Promise<{ success: boolean; message: string; celery_task_id?: string }> => {\n    const response = await apiClient.post('/tasks/cold-start', data)\n    return response.data\n  },\n\n  /**\n   * Fetch task statistics\n   */\n  
getTaskStats: async (): Promise<TaskStats> => {\n    const response = await apiClient.get<TaskStats>('/tasks/stats/summary')\n    return response.data\n  },\n}\n\n/**\n * Stock analysis API - Phase 2\n */\nexport const stockApi = {\n  /**\n   * Fetch stock overview info\n   */\n  getOverview: async (stockCode: string): Promise<StockOverview> => {\n    const response = await apiClient.get<StockOverview>(`/stocks/${stockCode}`)\n    return response.data\n  },\n\n  /**\n   * Fetch news related to a stock\n   */\n  getNews: async (stockCode: string, params?: {\n    limit?: number\n    offset?: number\n    sentiment?: 'positive' | 'negative' | 'neutral'\n  }): Promise<StockNewsItem[]> => {\n    const response = await apiClient.get<StockNewsItem[]>(`/stocks/${stockCode}/news`, { params })\n    return response.data\n  },\n\n  /**\n   * Fetch the sentiment trend\n   */\n  getSentimentTrend: async (stockCode: string, days: number = 30): Promise<SentimentTrendPoint[]> => {\n    const response = await apiClient.get<SentimentTrendPoint[]>(\n      `/stocks/${stockCode}/sentiment-trend`,\n      { params: { days } }\n    )\n    return response.data\n  },\n\n  /**\n   * Fetch K-line data (real data, via akshare)\n   * @param stockCode stock code\n   * @param period period: daily, 1m, 5m, 15m, 30m, 60m\n   * @param limit number of data points\n   * @param adjust adjustment type: qfq = forward-adjusted, hfq = backward-adjusted, \"\" = unadjusted\n   */\n  getKLineData: async (\n    stockCode: string, \n    period: 'daily' | '1m' | '5m' | '15m' | '30m' | '60m' = 'daily',\n    limit: number = 90,\n    adjust: 'qfq' | 'hfq' | '' = 'qfq'\n  ): Promise<KLineDataPoint[]> => {\n    const response = await apiClient.get<KLineDataPoint[]>(\n      `/stocks/${stockCode}/kline`,\n      { params: { period, limit, adjust } }\n    )\n    return response.data\n  },\n\n  /**\n   * Fetch the real-time quote\n   */\n  getRealtimeQuote: async (stockCode: string): Promise<RealtimeQuote | null> => {\n    const response = await apiClient.get<RealtimeQuote | null>(\n      `/stocks/${stockCode}/realtime`\n    )\n    return response.data\n  },\n\n  /**\n   * Search stocks (from the database)\n   */\n  
searchRealtime: async (query: string, limit: number = 20): Promise<Array<{\n    code: string\n    name: string\n    full_code: string\n    market: string | null\n    industry: string | null\n  }>> => {\n    const response = await apiClient.get('/stocks/search/realtime', {\n      params: { q: query, limit }\n    })\n    return response.data\n  },\n\n  /**\n   * Initialize stock data (fetch from akshare and store in the database)\n   */\n  initStockData: async (): Promise<{\n    success: boolean\n    message: string\n    count: number\n  }> => {\n    const response = await apiClient.post('/stocks/init')\n    return response.data\n  },\n\n  /**\n   * Get the number of stocks in the database\n   */\n  getStockCount: async (): Promise<{ count: number; message: string }> => {\n    const response = await apiClient.get('/stocks/count')\n    return response.data\n  },\n\n  /**\n   * Search stocks from the database\n   */\n  search: async (query: string, limit: number = 10): Promise<Array<{\n    code: string\n    name: string\n    full_code: string | null\n    industry: string | null\n  }>> => {\n    const response = await apiClient.get('/stocks/search/code', {\n      params: { q: query, limit }\n    })\n    return response.data\n  },\n\n  /**\n   * Trigger a targeted crawl task\n   */\n  startTargetedCrawl: async (\n    stockCode: string,\n    stockName: string,\n    days: number = 30\n  ): Promise<{\n    success: boolean\n    message: string\n    task_id?: number\n    celery_task_id?: string\n  }> => {\n    const response = await apiClient.post(`/stocks/${stockCode}/targeted-crawl`, {\n      stock_name: stockName,\n      days\n    })\n    return response.data\n  },\n\n  /**\n   * Query the status of a targeted crawl task\n   */\n  getTargetedCrawlStatus: async (stockCode: string): Promise<{\n    task_id?: number\n    status: string\n    celery_task_id?: string\n    progress?: {\n      current: number\n      total: number\n      message?: string\n    }\n    crawled_count?: number\n    saved_count?: number\n    error_message?: string\n    execution_time?: number\n    started_at?: string\n    completed_at?: string\n  
}> => {\n    const response = await apiClient.get(`/stocks/${stockCode}/targeted-crawl/status`)\n    return response.data\n  },\n\n  /**\n   * Cancel a targeted crawl task\n   */\n  cancelTargetedCrawl: async (stockCode: string): Promise<{\n    success: boolean\n    message: string\n    task_id?: number\n  }> => {\n    const response = await apiClient.post(`/stocks/${stockCode}/targeted-crawl/cancel`)\n    return response.data\n  },\n\n  /**\n   * Clear a stock's news\n   */\n  clearStockNews: async (stockCode: string): Promise<{\n    success: boolean\n    message: string\n    deleted_count?: number\n  }> => {\n    const response = await apiClient.delete(`/stocks/${stockCode}/news`)\n    return response.data\n  },\n}\n\n/**\n * Knowledge graph API\n */\nexport const knowledgeGraphApi = {\n  /**\n   * Fetch a company's knowledge graph\n   */\n  getCompanyGraph: async (stockCode: string): Promise<{\n    stock_code: string\n    stock_name: string\n    graph_exists: boolean\n    stats?: Record<string, number>\n    name_variants: string[]\n    businesses: Array<{\n      name: string\n      type: string\n      status: string\n      description?: string\n    }>\n    industries: string[]\n    products: string[]\n    concepts: string[]\n    search_queries: string[]\n  }> => {\n    const response = await apiClient.get(`/knowledge-graph/${stockCode}`)\n    return response.data\n  },\n\n  /**\n   * Build a company's knowledge graph\n   */\n  buildGraph: async (stockCode: string, forceRebuild: boolean = false): Promise<{\n    success: boolean\n    message: string\n    graph_stats?: Record<string, number>\n  }> => {\n    const response = await apiClient.post(`/knowledge-graph/${stockCode}/build`, {\n      force_rebuild: forceRebuild\n    })\n    return response.data\n  },\n\n  /**\n   * Update a company's knowledge graph\n   */\n  updateGraph: async (stockCode: string): Promise<{\n    success: boolean\n    message: string\n    graph_stats?: Record<string, number>\n  }> => {\n    const response = await apiClient.post(`/knowledge-graph/${stockCode}/update`, {\n      update_from_news: true,\n      
news_limit: 20\n    })\n    return response.data\n  },\n\n  /**\n   * Delete a company's knowledge graph\n   */\n  deleteGraph: async (stockCode: string): Promise<{\n    success: boolean\n    message: string\n  }> => {\n    const response = await apiClient.delete(`/knowledge-graph/${stockCode}`)\n    return response.data\n  },\n}\n\n/**\n * Agent APIs - Phase 2\n */\n// SSE event types\nexport interface SSEDebateEvent {\n  type: 'phase' | 'agent' | 'progress' | 'result' | 'error' | 'complete' | 'task_plan'\n  data: {\n    phase?: string\n    message?: string\n    agent?: string\n    role?: string\n    content?: string\n    is_chunk?: boolean\n    is_start?: boolean\n    is_end?: boolean\n    round?: number\n    max_rounds?: number\n    success?: boolean\n    mode?: string\n    bull_analysis?: any\n    bear_analysis?: any\n    final_decision?: any\n    quick_analysis?: any\n    debate_id?: string\n    execution_time?: number\n    total_rounds?: number\n    debate_history?: any[]\n  }\n}\n\nexport const agentApi = {\n  /**\n   * Trigger a stock debate analysis (non-streaming)\n   * Note: the debate requires multiple LLM calls and can take a while (possibly 2-5 minutes)\n   */\n  runDebate: async (request: DebateRequest): Promise<DebateResponse> => {\n    const response = await apiClient.post<DebateResponse>('/agents/debate', request, {\n      timeout: 300000  // 5-minute timeout, since the debate makes multiple LLM calls\n    })\n    return response.data\n  },\n\n  /**\n   * Streaming debate analysis (SSE)\n   * Uses Server-Sent Events to push the debate process in real time\n   */\n  runDebateStream: (\n    request: DebateRequest,\n    onEvent: (event: SSEDebateEvent) => void,\n    onError?: (error: Error) => void,\n    onComplete?: () => void\n  ): (() => void) => {\n    const controller = new AbortController()\n    \n    // Send the POST request with fetch and consume the SSE response\n    fetch(`${API_BASE_URL}/agents/debate/stream`, {\n      method: 'POST',\n      headers: {\n        'Content-Type': 'application/json',\n      },\n      body: JSON.stringify(request),\n      signal: controller.signal,\n    })\n      .then(async (response) => {\n        if (!response.ok) {\n          throw new Error(`HTTP 
error! status: ${response.status}`)\n        }\n        \n        const reader = response.body?.getReader()\n        if (!reader) {\n          throw new Error('No response body')\n        }\n        \n        const decoder = new TextDecoder()\n        let buffer = ''\n        \n        while (true) {\n          const { done, value } = await reader.read()\n          if (done) break\n          \n          buffer += decoder.decode(value, { stream: true })\n          \n          // Parse SSE events\n          const lines = buffer.split('\\n')\n          buffer = lines.pop() || '' // keep the incomplete trailing line\n          \n          let currentEvent = ''\n          let currentData = ''\n          \n          for (const line of lines) {\n            if (line.startsWith('event: ')) {\n              currentEvent = line.slice(7)\n            } else if (line.startsWith('data: ')) {\n              currentData = line.slice(6)\n            } else if (line === '' && currentEvent && currentData) {\n              // A complete event\n              try {\n                const data = JSON.parse(currentData)\n                onEvent({ type: currentEvent as SSEDebateEvent['type'], data })\n              } catch (e) {\n                console.error('Failed to parse SSE data:', currentData)\n              }\n              currentEvent = ''\n              currentData = ''\n            }\n          }\n        }\n        \n        onComplete?.()\n      })\n      .catch((error) => {\n        if (error.name !== 'AbortError') {\n          console.error('SSE error:', error)\n          onError?.(error)\n        }\n      })\n    \n    // Return a cancel function\n    return () => controller.abort()\n  },\n\n  /**\n   * Debate follow-up (SSE)\n   * Lets the user keep asking questions after the debate ends\n   */\n  followUp: (\n    request: {\n      stock_code: string\n      stock_name?: string\n      question: string\n      target_agent?: string\n      context?: string\n    },\n    onEvent: (event: SSEDebateEvent) => void,\n    onError?: (error: Error) => void,\n    onComplete?: () => void\n  ): (() => 
void) => {\n    const controller = new AbortController()\n    \n    fetch(`${API_BASE_URL}/agents/debate/followup`, {\n      method: 'POST',\n      headers: {\n        'Content-Type': 'application/json',\n      },\n      body: JSON.stringify(request),\n      signal: controller.signal,\n    })\n      .then(async (response) => {\n        if (!response.ok) {\n          throw new Error(`HTTP error! status: ${response.status}`)\n        }\n        \n        const reader = response.body?.getReader()\n        if (!reader) {\n          throw new Error('No response body')\n        }\n        \n        const decoder = new TextDecoder()\n        let buffer = ''\n        \n        while (true) {\n          const { done, value } = await reader.read()\n          if (done) break\n          \n          buffer += decoder.decode(value, { stream: true })\n          \n          const lines = buffer.split('\\n')\n          buffer = lines.pop() || ''\n          \n          let currentEvent = ''\n          let currentData = ''\n          \n          for (const line of lines) {\n            if (line.startsWith('event: ')) {\n              currentEvent = line.slice(7)\n            } else if (line.startsWith('data: ')) {\n              currentData = line.slice(6)\n            } else if (line === '' && currentEvent && currentData) {\n              try {\n                const data = JSON.parse(currentData)\n                onEvent({ type: currentEvent as SSEDebateEvent['type'], data })\n              } catch (e) {\n                console.error('Failed to parse SSE data:', currentData)\n              }\n              currentEvent = ''\n              currentData = ''\n            }\n          }\n        }\n        \n        onComplete?.()\n      })\n      .catch((error) => {\n        if (error.name !== 'AbortError') {\n          console.error('SSE error:', error)\n          onError?.(error)\n        }\n      })\n    \n    return () => controller.abort()\n  },\n\n  /**\n   * Get a debate result\n   */\n  
getDebateResult: async (debateId: string): Promise<DebateResponse> => {\n    const response = await apiClient.get<DebateResponse>(`/agents/debate/${debateId}`)\n    return response.data\n  },\n\n  /**\n   * Get agent execution logs\n   */\n  getLogs: async (params?: {\n    limit?: number\n    agent_name?: string\n    status?: 'started' | 'completed' | 'failed'\n  }): Promise<AgentLogEntry[]> => {\n    const response = await apiClient.get<AgentLogEntry[]>('/agents/logs', { params })\n    return response.data\n  },\n\n  /**\n   * Get agent performance metrics\n   */\n  getMetrics: async (): Promise<AgentMetrics> => {\n    const response = await apiClient.get<AgentMetrics>('/agents/metrics')\n    return response.data\n  },\n\n  /**\n   * Get the execution trajectory of a debate\n   */\n  getTrajectory: async (debateId: string): Promise<Array<{\n    step_id: string\n    step_name: string\n    timestamp: string\n    agent_name?: string\n    output_data?: Record<string, any>\n    status: string\n  }>> => {\n    const response = await apiClient.get(`/agents/trajectory/${debateId}`)\n    return response.data\n  },\n\n  /**\n   * Get the list of available agents\n   */\n  getAvailable: async (): Promise<{\n    agents: AgentInfo[]\n    workflows: WorkflowInfo[]\n  }> => {\n    const response = await apiClient.get('/agents/available')\n    return response.data\n  },\n\n  /**\n   * Execute a search plan (SSE)\n   */\n  executeSearch: (\n    plan: any,\n    onEvent: (event: SSEDebateEvent) => void,\n    onError?: (error: Error) => void,\n    onComplete?: () => void\n  ): (() => void) => {\n    const controller = new AbortController()\n    \n    fetch(`${API_BASE_URL}/agents/search/execute`, {\n      method: 'POST',\n      headers: {\n        'Content-Type': 'application/json',\n      },\n      body: JSON.stringify({ plan }),\n      signal: controller.signal,\n    })\n      .then(async (response) => {\n        if (!response.ok) {\n          throw new Error(`HTTP error! 
status: ${response.status}`)\n        }\n        \n        const reader = response.body?.getReader()\n        if (!reader) {\n          throw new Error('No response body')\n        }\n        \n        const decoder = new TextDecoder()\n        let buffer = ''\n        \n        while (true) {\n          const { done, value } = await reader.read()\n          if (done) break\n          \n          buffer += decoder.decode(value, { stream: true })\n          \n          const lines = buffer.split('\\n')\n          buffer = lines.pop() || ''\n          \n          let currentEvent = ''\n          let currentData = ''\n          \n          for (const line of lines) {\n            if (line.startsWith('event: ')) {\n              currentEvent = line.slice(7)\n            } else if (line.startsWith('data: ')) {\n              currentData = line.slice(6)\n            } else if (line === '' && currentEvent && currentData) {\n              try {\n                const data = JSON.parse(currentData)\n                onEvent({ type: currentEvent as SSEDebateEvent['type'], data })\n              } catch (e) {\n                console.error('Failed to parse SSE data:', currentData)\n              }\n              currentEvent = ''\n              currentData = ''\n            }\n          }\n        }\n        \n        onComplete?.()\n      })\n      .catch((error) => {\n        if (error.name !== 'AbortError') {\n          console.error('SSE error:', error)\n          onError?.(error)\n        }\n      })\n    \n    return () => controller.abort()\n  },\n\n  /**\n   * Clear execution logs (development only)\n   */\n  clearLogs: async (): Promise<{ message: string }> => {\n    const response = await apiClient.delete('/agents/logs')\n    return response.data\n  },\n}\n\n/**\n * Alpha Mining types\n */\nexport interface AlphaMiningFactor {\n  formula: number[]\n  formula_str: string\n  sortino: number\n  sharpe?: number\n  ic?: number\n  discovered_at?: string\n  stock_code?: string\n}\n\nexport interface 
AlphaMiningMetrics {\n  sortino_ratio: number\n  sharpe_ratio: number\n  ic: number\n  rank_ic: number\n  max_drawdown: number\n  turnover: number\n  total_return: number\n  win_rate: number\n  avg_return?: number\n}\n\nexport interface MineRequest {\n  stock_code?: string\n  num_steps: number\n  use_sentiment: boolean\n  batch_size?: number\n}\n\nexport interface EvaluateRequest {\n  formula: string\n  stock_code?: string\n}\n\nexport interface SentimentCompareResult {\n  best_score: number\n  best_formula: string\n  total_steps: number\n  num_features: number\n}\n\nexport interface OperatorInfo {\n  name: string\n  arity: number\n  description: string\n}\n\n/**\n * Alpha Mining APIs\n */\nexport const alphaMiningApi = {\n  /**\n   * Start a factor-mining task (runs in the background)\n   */\n  mine: async (request: MineRequest): Promise<{\n    success: boolean\n    task_id: string\n    message: string\n  }> => {\n    const response = await apiClient.post('/alpha-mining/mine', request)\n    return response.data\n  },\n\n  /**\n   * Stream mining progress over SSE (returns a cancel function)\n   */\n  mineStream: (\n    request: MineRequest,\n    onProgress: (data: {\n      step: number\n      progress: number\n      loss: number\n      avg_reward: number\n      max_reward: number\n      valid_ratio: number\n      best_score: number\n      best_formula: string\n    }) => void,\n    onComplete: (data: { best_score: number; best_formula: string }) => void,\n    onError: (error: string) => void\n  ): (() => void) => {\n    const controller = new AbortController()\n\n    fetch(`${apiClient.defaults.baseURL}/alpha-mining/mine/stream`, {\n      method: 'POST',\n      headers: { 'Content-Type': 'application/json' },\n      body: JSON.stringify(request),\n      signal: controller.signal,\n    })\n      .then(async (response) => {\n        if (!response.ok) throw new Error(`HTTP ${response.status}`)\n        const reader = response.body?.getReader()\n        if (!reader) throw new Error('No body')\n\n        const decoder = new 
TextDecoder()\n        let buffer = ''\n\n        while (true) {\n          const { done, value } = await reader.read()\n          if (done) break\n          buffer += decoder.decode(value, { stream: true })\n\n          const lines = buffer.split('\\n')\n          buffer = lines.pop() || ''\n\n          let event = '', data = ''\n          for (const line of lines) {\n            if (line.startsWith('event: ')) event = line.slice(7)\n            else if (line.startsWith('data: ')) data = line.slice(6)\n            else if (line === '' && event && data) {\n              try {\n                const parsed = JSON.parse(data)\n                if (event === 'progress') onProgress(parsed)\n                else if (event === 'complete') onComplete(parsed)\n                else if (event === 'error') onError(parsed.error)\n              } catch {}\n              event = data = ''\n            }\n          }\n        }\n      })\n      .catch((err) => {\n        if (err.name !== 'AbortError') onError(err.message)\n      })\n\n    return () => controller.abort()\n  },\n\n  /**\n   * Evaluate a factor formula\n   */\n  evaluate: async (request: EvaluateRequest): Promise<{\n    success: boolean\n    formula: string\n    metrics?: AlphaMiningMetrics\n    error?: string\n  }> => {\n    const response = await apiClient.post('/alpha-mining/evaluate', request)\n    return response.data\n  },\n\n  /**\n   * Generate candidate factors\n   */\n  generate: async (batch_size: number = 10, max_len: number = 8): Promise<{\n    success: boolean\n    generated: number\n    valid: number\n    factors: Array<{\n      formula: number[]\n      formula_str: string\n      sortino: number\n      ic: number\n    }>\n  }> => {\n    const response = await apiClient.post('/alpha-mining/generate', { batch_size, max_len })\n    return response.data\n  },\n\n  /**\n   * Get the list of discovered factors\n   */\n  getFactors: async (top_k: number = 20, stock_code?: string): Promise<{\n    success: boolean\n    total: number\n    returned: number\n    factors: 
AlphaMiningFactor[]\n  }> => {\n    const response = await apiClient.get('/alpha-mining/factors', {\n      params: { top_k, stock_code }\n    })\n    return response.data\n  },\n\n  /**\n   * Get task status\n   */\n  getTaskStatus: async (task_id: string): Promise<{\n    task_id: string\n    status: 'pending' | 'running' | 'completed' | 'failed'\n    progress: number\n    result?: { best_factor: string; best_score: number; total_steps: number }\n    error?: string\n  }> => {\n    const response = await apiClient.get(`/alpha-mining/status/${task_id}`)\n    return response.data\n  },\n\n  /**\n   * Get the list of supported operators\n   */\n  getOperators: async (): Promise<{\n    success: boolean\n    features: string[]\n    operators: OperatorInfo[]\n  }> => {\n    const response = await apiClient.get('/alpha-mining/operators')\n    return response.data\n  },\n\n  /**\n   * Compare mining performance with and without sentiment fusion\n   */\n  compareSentiment: async (num_steps: number = 50, batch_size: number = 16): Promise<{\n    success: boolean\n    with_sentiment: SentimentCompareResult\n    without_sentiment: SentimentCompareResult\n    improvement: { score_diff: number; improvement_pct: number }\n  }> => {\n    const response = await apiClient.post('/alpha-mining/compare-sentiment', {\n      num_steps,\n      batch_size\n    })\n    return response.data\n  },\n\n  /**\n   * Agent invocation demo\n   */\n  agentDemo: async (params: {\n    stock_code?: string\n    num_steps: number\n    use_sentiment: boolean\n  }): Promise<{\n    success: boolean\n    agent_name: string\n    tool_name: string\n    input_params: Record<string, any>\n    output: { best_formula: string; best_score: number; total_steps: number } | null\n    execution_time: number\n    logs: string[]\n  }> => {\n    const response = await apiClient.post('/alpha-mining/agent-demo', params)\n    return response.data\n  },\n\n  /**\n   * Delete a task\n   */\n  deleteTask: async (task_id: string): Promise<{ success: boolean; message: string }> => {\n    const response = await 
apiClient.delete(`/alpha-mining/tasks/${task_id}`)\n    return response.data\n  },\n}\n\nexport { apiClient }\nexport default apiClient\n\n"
  },
  {
    "path": "frontend/src/lib/utils.ts",
    "content": "import { type ClassValue, clsx } from \"clsx\"\nimport { twMerge } from \"tailwind-merge\"\n\nexport function cn(...inputs: ClassValue[]) {\n  return twMerge(clsx(inputs))\n}\n\nexport function formatDate(date: string | Date): string {\n  const d = typeof date === 'string' ? new Date(date) : date\n  return new Intl.DateTimeFormat('zh-CN', {\n    year: 'numeric',\n    month: '2-digit',\n    day: '2-digit',\n    hour: '2-digit',\n    minute: '2-digit',\n  }).format(d)\n}\n\nexport interface TimeI18n {\n  justNow: string\n  minutesAgo: string\n  hoursAgo: string\n  daysAgo: string\n}\n\nconst defaultTimeI18n: TimeI18n = {\n  justNow: '刚刚',\n  minutesAgo: '分钟前',\n  hoursAgo: '小时前',\n  daysAgo: '天前',\n}\n\nexport function formatRelativeTime(date: string | Date, i18n?: TimeI18n): string {\n  const t = i18n || defaultTimeI18n\n  const d = typeof date === 'string' ? new Date(date) : date\n  const now = new Date()\n  const diffMs = now.getTime() - d.getTime()\n  const diffMins = Math.floor(diffMs / 60000)\n  \n  if (diffMins < 1) return t.justNow\n  if (diffMins < 60) return `${diffMins}${t.minutesAgo}`\n  \n  const diffHours = Math.floor(diffMins / 60)\n  if (diffHours < 24) return `${diffHours}${t.hoursAgo}`\n  \n  const diffDays = Math.floor(diffHours / 24)\n  if (diffDays < 7) return `${diffDays}${t.daysAgo}`\n  \n  return formatDate(d)\n}\n\n"
  },
  {
    "path": "frontend/src/main.tsx",
    "content": "import React from 'react'\nimport ReactDOM from 'react-dom/client'\nimport { BrowserRouter } from 'react-router-dom'\nimport { QueryClient, QueryClientProvider } from '@tanstack/react-query'\nimport App from './App.tsx'\nimport './index.css'\n\nconst queryClient = new QueryClient({\n  defaultOptions: {\n    queries: {\n      refetchOnWindowFocus: false,\n      retry: 1,\n      staleTime: 5 * 60 * 1000, // 5 minutes\n    },\n  },\n})\n\nReactDOM.createRoot(document.getElementById('root')!).render(\n  <React.StrictMode>\n    <QueryClientProvider client={queryClient}>\n      <BrowserRouter>\n        <App />\n      </BrowserRouter>\n    </QueryClientProvider>\n  </React.StrictMode>,\n)\n\n"
  },
  {
    "path": "frontend/src/pages/AgentMonitorPage.tsx",
    "content": "import { useState, useEffect } from 'react'\nimport { useQuery, useMutation, useQueryClient } from '@tanstack/react-query'\nimport { toast } from 'sonner'\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card'\nimport { Button } from '@/components/ui/button'\nimport { Badge } from '@/components/ui/badge'\nimport { agentApi } from '@/lib/api-client'\nimport {\n  Bot,\n  Activity,\n  CheckCircle2,\n  XCircle,\n  Clock,\n  RefreshCw,\n  Trash2,\n  Play,\n  Zap,\n  GitBranch,\n  MessageSquare,\n  TrendingUp,\n  AlertCircle,\n  ChevronRight,\n  Workflow,\n  ArrowRight,\n  Timer,\n} from 'lucide-react'\nimport type { AgentLogEntry, AgentMetrics, AgentInfo, WorkflowInfo } from '@/types/api'\nimport { useGlobalI18n, useLanguageStore } from '@/store/useLanguageStore'\nimport { formatRelativeTime as formatRelativeTimeUtil } from '@/lib/utils'\n\n// Agent role and description mapping\nconst AGENT_ROLES: Record<string, { roleZh: string; roleEn: string; descZh: string; descEn: string }> = {\n  NewsAnalyst: {\n    roleZh: '金融新闻分析师',\n    roleEn: 'Financial News Analyst',\n    descZh: '分析金融新闻的情感、影响和关键信息',\n    descEn: 'Analyzes sentiment, impact and key information of financial news',\n  },\n  BullResearcher: {\n    roleZh: '看多研究员',\n    roleEn: 'Bull Researcher',\n    descZh: '从积极角度分析股票,发现投资机会',\n    descEn: 'Analyzes stocks from a positive perspective, discovering investment opportunities',\n  },\n  BearResearcher: {\n    roleZh: '看空研究员',\n    roleEn: 'Bear Researcher',\n    descZh: '从风险角度分析股票,识别潜在问题',\n    descEn: 'Analyzes stocks from a risk perspective, identifying potential problems',\n  },\n  InvestmentManager: {\n    roleZh: '投资经理',\n    roleEn: 'Investment Manager',\n    descZh: '综合多方观点,做出投资决策',\n    descEn: 'Integrates multiple viewpoints to make investment decisions',\n  },\n  SearchAnalyst: {\n    roleZh: '搜索分析师',\n    roleEn: 'Search Analyst',\n    descZh: '动态获取数据,支持 AkShare、BochaAI、网页搜索等',\n    descEn: 'Dynamically obtains data, 
supports AkShare, BochaAI, web search, etc.',\n  },\n}\n\n// Workflow description mapping\nconst WORKFLOW_DESCRIPTIONS: Record<string, { descZh: string; descEn: string }> = {\n  NewsAnalysisWorkflow: {\n    descZh: '新闻分析工作流：爬取 -> 清洗 -> 情感分析',\n    descEn: 'News Analysis Workflow: Crawl -> Clean -> Sentiment Analysis',\n  },\n  InvestmentDebateWorkflow: {\n    descZh: '投资辩论工作流：Bull vs Bear 多智能体辩论',\n    descEn: 'Investment Debate Workflow: Bull vs Bear Multi-Agent Debate',\n  },\n}\n\n// Status badge colors\nconst statusColors: Record<string, { bg: string; text: string; border: string }> = {\n  started: { bg: 'bg-blue-100', text: 'text-blue-700', border: 'border-blue-200' },\n  completed: { bg: 'bg-emerald-100', text: 'text-emerald-700', border: 'border-emerald-200' },\n  failed: { bg: 'bg-rose-100', text: 'text-rose-700', border: 'border-rose-200' },\n  active: { bg: 'bg-emerald-100', text: 'text-emerald-700', border: 'border-emerald-200' },\n  inactive: { bg: 'bg-gray-100', text: 'text-gray-700', border: 'border-gray-200' },\n}\n\n// Agent icon mapping\nconst agentIcons: Record<string, React.ReactNode> = {\n  NewsAnalyst: <MessageSquare className=\"w-4 h-4\" />,\n  BullResearcher: <TrendingUp className=\"w-4 h-4\" />,\n  BearResearcher: <AlertCircle className=\"w-4 h-4\" />,\n  InvestmentManager: <Zap className=\"w-4 h-4\" />,\n  DebateWorkflow: <Workflow className=\"w-4 h-4\" />,\n}\n\n// Format a timestamp (legacy; prefer formatRelativeTimeUtil)\nfunction formatTimestamp(timestamp: string, locale: string = 'zh-CN'): string {\n  const date = new Date(timestamp)\n  return date.toLocaleString(locale, {\n    month: '2-digit',\n    day: '2-digit',\n    hour: '2-digit',\n    minute: '2-digit',\n    second: '2-digit',\n  })\n}\n\nexport default function AgentMonitorPage() {\n  const t = useGlobalI18n()\n  const { lang } = useLanguageStore()\n  const queryClient = useQueryClient()\n  const [selectedAgent, setSelectedAgent] = useState<string | null>(null)\n  const [autoRefresh, setAutoRefresh] = useState(true)\n  \n  // 
Get an agent's role and description (i18n)\n  const getAgentInfo = (agentName: string, defaultRole: string, defaultDesc: string) => {\n    const agentInfo = AGENT_ROLES[agentName]\n    if (agentInfo) {\n      return {\n        role: lang === 'zh' ? agentInfo.roleZh : agentInfo.roleEn,\n        description: lang === 'zh' ? agentInfo.descZh : agentInfo.descEn,\n      }\n    }\n    return {\n      role: defaultRole,\n      description: defaultDesc,\n    }\n  }\n  \n  // Get a workflow's description (i18n)\n  const getWorkflowDescription = (workflowName: string, defaultDesc: string) => {\n    const workflowInfo = WORKFLOW_DESCRIPTIONS[workflowName]\n    if (workflowInfo) {\n      return lang === 'zh' ? workflowInfo.descZh : workflowInfo.descEn\n    }\n    return defaultDesc\n  }\n\n  // Fetch performance metrics\n  const { data: metrics, isLoading: metricsLoading, refetch: refetchMetrics } = useQuery({\n    queryKey: ['agent', 'metrics'],\n    queryFn: agentApi.getMetrics,\n    refetchInterval: autoRefresh ? 10000 : false, // auto-refresh every 10 seconds\n    staleTime: 5000,\n  })\n\n  // Fetch execution logs\n  const { data: logs, isLoading: logsLoading, refetch: refetchLogs } = useQuery({\n    queryKey: ['agent', 'logs', selectedAgent],\n    queryFn: () => agentApi.getLogs({\n      limit: 50,\n      agent_name: selectedAgent || undefined,\n    }),\n    refetchInterval: autoRefresh ? 
5000 : false, // auto-refresh every 5 seconds\n    staleTime: 3000,\n  })\n\n  // Fetch available agents\n  const { data: available, isLoading: availableLoading } = useQuery({\n    queryKey: ['agent', 'available'],\n    queryFn: agentApi.getAvailable,\n    staleTime: 60000, // 1 minute\n  })\n\n  // Clear-logs mutation\n  const clearLogsMutation = useMutation({\n    mutationFn: agentApi.clearLogs,\n    onSuccess: (data) => {\n      toast.success(data.message)\n      queryClient.invalidateQueries({ queryKey: ['agent', 'logs'] })\n      queryClient.invalidateQueries({ queryKey: ['agent', 'metrics'] })\n    },\n    onError: (error: Error) => {\n      toast.error(`Failed to clear logs: ${error.message}`)\n    },\n  })\n\n  const handleRefresh = () => {\n    refetchMetrics()\n    refetchLogs()\n    toast.success('Data refreshed')\n  }\n\n  const handleClearLogs = () => {\n    if (window.confirm(t.agents.confirmClearLogs)) {\n      clearLogsMutation.mutate()\n    }\n  }\n\n  // Compute the success rate\n  const successRate = metrics\n    ? ((metrics.successful_executions / metrics.total_executions) * 100 || 0).toFixed(1)\n    : '0'\n\n  return (\n    <div className=\"p-6 space-y-6 bg-gradient-to-br from-slate-50 to-indigo-50 min-h-screen\">\n      {/* Page header */}\n      <div className=\"flex items-center justify-between\">\n        <div>\n          <h1 className=\"text-3xl font-bold tracking-tight text-gray-900 flex items-center gap-3\">\n            <Activity className=\"w-8 h-8 text-indigo-500\" />\n            {t.agents.title}\n          </h1>\n          <p className=\"text-muted-foreground mt-1\">\n            {t.agents.subtitle}\n          </p>\n        </div>\n        <div className=\"flex items-center gap-3\">\n          <Button\n            variant=\"outline\"\n            size=\"sm\"\n            onClick={() => setAutoRefresh(!autoRefresh)}\n            className={autoRefresh ? 'bg-emerald-50 border-emerald-200' : ''}\n          >\n            <RefreshCw className={`w-4 h-4 mr-2 ${autoRefresh ? 'animate-spin' : ''}`} />\n            {autoRefresh ? 
t.agents.autoRefreshing : t.agents.autoRefreshing}\n          </Button>\n          <Button\n            variant=\"outline\"\n            size=\"sm\"\n            onClick={handleRefresh}\n          >\n            <RefreshCw className=\"w-4 h-4 mr-2\" />\n            {t.agents.refresh}\n          </Button>\n          <Button\n            variant=\"outline\"\n            size=\"sm\"\n            onClick={handleClearLogs}\n            className=\"text-rose-600 hover:bg-rose-50\"\n            disabled={clearLogsMutation.isPending}\n          >\n            <Trash2 className=\"w-4 h-4 mr-2\" />\n            {t.agents.clearLogs}\n          </Button>\n        </div>\n      </div>\n\n      {/* Performance metric cards */}\n      <div className=\"grid grid-cols-1 md:grid-cols-4 gap-4\">\n        <Card className=\"bg-white/80 backdrop-blur-sm border-indigo-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.agents.totalExec}</p>\n                <p className=\"text-3xl font-bold text-indigo-600\">\n                  {metrics?.total_executions || 0}\n                </p>\n              </div>\n              <Play className=\"w-10 h-10 text-indigo-500/30\" />\n            </div>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-emerald-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.agents.successExec}</p>\n                <p className=\"text-3xl font-bold text-emerald-600\">\n                  {metrics?.successful_executions || 0}\n                </p>\n              </div>\n              <CheckCircle2 className=\"w-10 h-10 text-emerald-500/30\" />\n            </div>\n            <p className=\"text-xs text-muted-foreground 
mt-2\">\n              {t.agents.successRate} {successRate}%\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-rose-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.agents.failedExec}</p>\n                <p className=\"text-3xl font-bold text-rose-600\">\n                  {metrics?.failed_executions || 0}\n                </p>\n              </div>\n              <XCircle className=\"w-10 h-10 text-rose-500/30\" />\n            </div>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-amber-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.agents.avgTime}</p>\n                <p className=\"text-3xl font-bold text-amber-600\">\n                  {metrics?.avg_execution_time?.toFixed(1) || 0}s\n                </p>\n              </div>\n              <Clock className=\"w-10 h-10 text-amber-500/30\" />\n            </div>\n          </CardContent>\n        </Card>\n      </div>\n\n      <div className=\"grid grid-cols-1 lg:grid-cols-3 gap-6\">\n        {/* Agent list */}\n        <Card className=\"bg-white/90\">\n          <CardHeader>\n            <CardTitle className=\"flex items-center gap-2\">\n              <Bot className=\"w-5 h-5 text-indigo-500\" />\n              {t.agents.availableAgents}\n            </CardTitle>\n            <CardDescription>\n              {t.agents.availableAgentsDesc}\n            </CardDescription>\n          </CardHeader>\n          <CardContent className=\"space-y-4\">\n            {/* Agents */}\n            <div>\n              <h4 className=\"text-sm font-medium text-gray-500 
mb-2\">{t.agents.agents}</h4>\n              <div className=\"space-y-2\">\n                {available?.agents.map((agent) => (\n                  <div\n                    key={agent.name}\n                    className={`p-3 rounded-lg border cursor-pointer transition-all ${\n                      selectedAgent === agent.name\n                        ? 'border-indigo-300 bg-indigo-50'\n                        : 'border-gray-100 hover:border-indigo-200 hover:bg-indigo-50/50'\n                    }`}\n                    onClick={() => setSelectedAgent(selectedAgent === agent.name ? null : agent.name)}\n                  >\n                    <div className=\"flex items-center justify-between\">\n                      <div className=\"flex items-center gap-2\">\n                        <div className={`w-8 h-8 rounded-full flex items-center justify-center ${\n                          agent.status === 'active' ? 'bg-emerald-100 text-emerald-600' : 'bg-gray-100 text-gray-600'\n                        }`}>\n                          {agentIcons[agent.name] || <Bot className=\"w-4 h-4\" />}\n                        </div>\n                        <div>\n                          <p className=\"font-medium text-gray-900 text-sm\">{agent.name}</p>\n                          <p className=\"text-xs text-gray-500\">{getAgentInfo(agent.name, agent.role, agent.description).role}</p>\n                        </div>\n                      </div>\n                      <Badge className={`${statusColors[agent.status].bg} ${statusColors[agent.status].text}`}>\n                        {agent.status === 'active' ? 
t.agents.active : t.agents.inactive}\n                      </Badge>\n                    </div>\n                    <p className=\"text-xs text-gray-500 mt-2\">{getAgentInfo(agent.name, agent.role, agent.description).description}</p>\n                    {metrics?.agent_stats?.[agent.name] && (\n                      <div className=\"flex items-center gap-3 mt-2 text-xs text-gray-400\">\n                        <span>{t.agents.execTimes} {metrics.agent_stats[agent.name].total} {t.agents.times}</span>\n                        <span>•</span>\n                        <span>{t.agents.success} {metrics.agent_stats[agent.name].successful}</span>\n                        {metrics.agent_stats[agent.name].avg_time > 0 && (\n                          <>\n                            <span>•</span>\n                            <span>{t.agents.avg} {metrics.agent_stats[agent.name].avg_time.toFixed(1)}s</span>\n                          </>\n                        )}\n                      </div>\n                    )}\n                  </div>\n                ))}\n              </div>\n            </div>\n\n            {/* Workflows */}\n            <div>\n              <h4 className=\"text-sm font-medium text-gray-500 mb-2\">{t.agents.workflows}</h4>\n              <div className=\"space-y-2\">\n                {available?.workflows.map((workflow) => (\n                  <div\n                    key={workflow.name}\n                    className=\"p-3 rounded-lg border border-gray-100 hover:border-purple-200 hover:bg-purple-50/50 transition-all\"\n                  >\n                    <div className=\"flex items-center gap-2 mb-2\">\n                      <GitBranch className=\"w-4 h-4 text-purple-500\" />\n                      <span className=\"font-medium text-gray-900 text-sm\">{workflow.name}</span>\n                    </div>\n                    <p className=\"text-xs text-gray-500\">{getWorkflowDescription(workflow.name, workflow.description)}</p>\n                  
  <div className=\"flex items-center gap-1 mt-2 flex-wrap\">\n                      {workflow.agents.map((agent, idx) => (\n                        <span key={agent} className=\"flex items-center\">\n                          <Badge variant=\"outline\" className=\"text-xs\">\n                            {agent}\n                          </Badge>\n                          {idx < workflow.agents.length - 1 && (\n                            <ArrowRight className=\"w-3 h-3 text-gray-400 mx-1\" />\n                          )}\n                        </span>\n                      ))}\n                    </div>\n                  </div>\n                ))}\n              </div>\n            </div>\n          </CardContent>\n        </Card>\n\n        {/* 执行日志 */}\n        <Card className=\"lg:col-span-2 bg-white/90\">\n          <CardHeader>\n            <CardTitle className=\"flex items-center justify-between\">\n              <span className=\"flex items-center gap-2\">\n                <Activity className=\"w-5 h-5 text-blue-500\" />\n                {t.agents.execLogs}\n                {selectedAgent && (\n                  <Badge variant=\"outline\" className=\"ml-2\">\n                    {selectedAgent}\n                    <button\n                      onClick={(e) => {\n                        e.stopPropagation()\n                        setSelectedAgent(null)\n                      }}\n                      className=\"ml-1 hover:text-rose-500\"\n                    >\n                      ×\n                    </button>\n                  </Badge>\n                )}\n              </span>\n              <span className=\"text-sm font-normal text-gray-500\">\n                {logs?.length || 0} {t.agents.records}\n              </span>\n            </CardTitle>\n            <CardDescription>\n              {t.agents.execLogsDesc}\n            </CardDescription>\n          </CardHeader>\n          <CardContent>\n            {logsLoading ? 
(\n              <div className=\"flex items-center justify-center py-12\">\n                <RefreshCw className=\"w-8 h-8 animate-spin text-blue-500\" />\n              </div>\n            ) : logs && logs.length > 0 ? (\n              <div className=\"space-y-3 max-h-[600px] overflow-y-auto pr-2\">\n                {logs.map((log) => (\n                  <div\n                    key={log.id}\n                    className={`p-3 rounded-lg border transition-all ${\n                      log.status === 'completed'\n                        ? 'border-emerald-100 bg-emerald-50/30'\n                        : log.status === 'failed'\n                        ? 'border-rose-100 bg-rose-50/30'\n                        : 'border-blue-100 bg-blue-50/30'\n                    }`}\n                  >\n                    <div className=\"flex items-start justify-between gap-3\">\n                      <div className=\"flex items-start gap-3\">\n                        <div className={`w-8 h-8 rounded-full flex items-center justify-center flex-shrink-0 ${\n                          log.status === 'completed'\n                            ? 'bg-emerald-100 text-emerald-600'\n                            : log.status === 'failed'\n                            ? 'bg-rose-100 text-rose-600'\n                            : 'bg-blue-100 text-blue-600'\n                        }`}>\n                          {log.status === 'completed' ? (\n                            <CheckCircle2 className=\"w-4 h-4\" />\n                          ) : log.status === 'failed' ? 
(\n                            <XCircle className=\"w-4 h-4\" />\n                          ) : (\n                            <Play className=\"w-4 h-4\" />\n                          )}\n                        </div>\n                        <div>\n                          <div className=\"flex items-center gap-2 flex-wrap\">\n                            <span className=\"font-medium text-gray-900\">\n                              {log.agent_name}\n                            </span>\n                            {log.agent_role && (\n                              <span className=\"text-xs text-gray-500\">\n                                ({getAgentInfo(log.agent_name || '', log.agent_role, '').role})\n                              </span>\n                            )}\n                            <Badge className={`${statusColors[log.status].bg} ${statusColors[log.status].text} text-xs`}>\n                              {log.status === 'completed' ? t.tasks.completed : log.status === 'failed' ? 
t.tasks.failed : t.tasks.running}\n                            </Badge>\n                          </div>\n                          <p className=\"text-sm text-gray-600 mt-1\">\n                            {log.action.replace(/_/g, ' ')}\n                          </p>\n                          {log.details && Object.keys(log.details).length > 0 && (\n                            <div className=\"mt-2 text-xs text-gray-500 bg-gray-50 p-2 rounded\">\n                              {Object.entries(log.details).map(([key, value]) => (\n                                <div key={key} className=\"flex gap-2\">\n                                  <span className=\"font-medium\">{key}:</span>\n                                  <span>{String(value)}</span>\n                                </div>\n                              ))}\n                            </div>\n                          )}\n                        </div>\n                      </div>\n                      <div className=\"text-right flex-shrink-0\">\n                        <p className=\"text-xs text-gray-400\">\n                          {formatRelativeTimeUtil(log.timestamp, t.time)}\n                        </p>\n                        {log.execution_time && (\n                          <p className=\"text-xs text-gray-500 flex items-center gap-1 mt-1\">\n                            <Timer className=\"w-3 h-3\" />\n                            {log.execution_time.toFixed(1)}s\n                          </p>\n                        )}\n                      </div>\n                    </div>\n                  </div>\n                ))}\n              </div>\n            ) : (\n              <div className=\"text-center py-12 text-gray-500\">\n                <Activity className=\"w-16 h-16 mx-auto opacity-30 mb-4\" />\n                <p className=\"text-lg\">{t.agents.noLogs}</p>\n                <p className=\"text-sm mt-2\">\n                  {t.agents.noLogsHint}\n                </p>\n      
        </div>\n            )}\n          </CardContent>\n        </Card>\n      </div>\n\n      {/* 最近活动时间线 */}\n      {metrics?.recent_activity && metrics.recent_activity.length > 0 && (\n        <Card className=\"bg-white/90\">\n          <CardHeader>\n            <CardTitle className=\"flex items-center gap-2\">\n              <Clock className=\"w-5 h-5 text-purple-500\" />\n              {t.agents.recentActivity || 'Recent Activity'}\n            </CardTitle>\n          </CardHeader>\n          <CardContent>\n            <div className=\"flex items-center gap-2 overflow-x-auto pb-2\">\n              {metrics.recent_activity.map((activity, index) => (\n                <div\n                  key={index}\n                  className={`flex-shrink-0 px-3 py-2 rounded-lg border ${statusColors[activity.status]?.bg} ${statusColors[activity.status]?.border}`}\n                >\n                  <div className=\"flex items-center gap-2\">\n                    <div className={`w-2 h-2 rounded-full ${\n                      activity.status === 'completed' ? 'bg-emerald-500' :\n                      activity.status === 'failed' ? 'bg-rose-500' : 'bg-blue-500'\n                    }`} />\n                    <span className=\"text-sm font-medium\">{activity.agent_name}</span>\n                  </div>\n                  <p className=\"text-xs text-gray-500 mt-1\">\n                    {activity.action.replace(/_/g, ' ')}\n                  </p>\n                  <p className=\"text-xs text-gray-400\">\n                    {formatRelativeTimeUtil(activity.timestamp, t.time)}\n                  </p>\n                </div>\n              ))}\n            </div>\n          </CardContent>\n        </Card>\n      )}\n    </div>\n  )\n}\n"
  },
  {
    "path": "frontend/src/pages/AlphaMiningPage.tsx",
    "content": "/**\n * Alpha Mining 因子挖掘页面（增强版）\n * \n * 技术亮点展示：\n * - 符号回归 + RL: Transformer 策略网络 + REINFORCE 算法\n * - DSL 系统: 21 个时序/算术/条件操作符\n * - 情感融合: 支持新闻情感特征增强因子效果\n * - 完整评估: Sortino/Sharpe/IC/Rank IC 等指标\n * - AgenticX 集成: BaseTool 封装，支持 Agent 调用\n */\n\nimport React, { useState, useEffect, useCallback } from 'react';\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '../components/ui/card';\nimport { Button } from '../components/ui/button';\nimport { Badge } from '../components/ui/badge';\nimport { alphaMiningApi } from '../lib/api-client';\nimport type { AlphaMiningFactor, AlphaMiningMetrics } from '../lib/api-client';\nimport {\n  OperatorGrid,\n  TrainingMonitor,\n  MetricsDashboard,\n  SentimentCompare,\n  AgentDemo,\n} from '../components/alpha-mining';\nimport {\n  Tabs,\n  TabsContent,\n  TabsList,\n  TabsTrigger,\n} from '@/components/ui/tabs';\nimport {\n  Zap, Code, BarChart2, Heart, Bot,\n  RefreshCw, ChevronRight, Sparkles, Brain\n} from 'lucide-react';\nimport { useLanguageStore } from '@/store/useLanguageStore';\n\n// ============================================================================\n// 国际化文案\n// ============================================================================\n\nconst i18n = {\n  zh: {\n    title: 'Alpha因子挖掘',\n    subtitle: '',\n    tabs: {\n      overview: '概览',\n      training: '训练',\n      evaluate: '评估',\n      sentiment: '情感融合',\n      agent: 'Agent',\n    },\n    techBadges: {\n      rl: '符号回归 + RL',\n      dsl: '21 个 DSL 操作符',\n      sentiment: '情感融合',\n      metrics: '完整评估体系',\n      agent: 'AgenticX 集成',\n    },\n    dsl: {\n      title: 'DSL 操作符系统',\n      desc: '21 个可组合操作符，支持算术/时序/条件运算',\n    },\n    factors: {\n      title: '已发现的因子',\n      desc: '按 Sortino Ratio 排序的最优因子',\n      empty: '暂无已发现的因子',\n      emptyHint: '去\"训练\"标签页启动因子挖掘',\n      loading: '加载中...',\n    },\n    arch: {\n      title: '系统架构',\n      features: '特征数据',\n      featuresDesc: '行情 + 情感',\n      generator: 
'AlphaGenerator',\n      generatorDesc: 'Transformer',\n      vm: 'FactorVM',\n      vmDesc: 'StackVM 执行',\n      evaluator: 'Evaluator',\n      evaluatorDesc: '回测评估',\n      rl: 'REINFORCE',\n      rlDesc: '策略梯度',\n    },\n    evaluate: {\n      expression: '因子表达式',\n      placeholder: '点击下方操作符构建表达式，如: ADD(RET, MA5(VOL))',\n      button: '评估因子',\n      evaluating: '评估中...',\n      operators: '操作符',\n    },\n  },\n  en: {\n    title: 'Alpha Mining',\n    subtitle: '',\n    tabs: {\n      overview: 'Overview',\n      training: 'Training',\n      evaluate: 'Evaluate',\n      sentiment: 'Sentiment',\n      agent: 'Agent',\n    },\n    techBadges: {\n      rl: 'Symbolic Regression + RL',\n      dsl: '21 DSL Operators',\n      sentiment: 'Sentiment Fusion',\n      metrics: 'Full Evaluation',\n      agent: 'AgenticX Integration',\n    },\n    dsl: {\n      title: 'DSL Operator System',\n      desc: '21 composable operators for arithmetic/timeseries/conditional operations',\n    },\n    factors: {\n      title: 'Discovered Factors',\n      desc: 'Top factors ranked by Sortino Ratio',\n      empty: 'No factors discovered yet',\n      emptyHint: 'Go to \"Training\" tab to start factor mining',\n      loading: 'Loading...',\n    },\n    arch: {\n      title: 'System Architecture',\n      features: 'Features',\n      featuresDesc: 'Price + Sentiment',\n      generator: 'AlphaGenerator',\n      generatorDesc: 'Transformer',\n      vm: 'FactorVM',\n      vmDesc: 'StackVM Executor',\n      evaluator: 'Evaluator',\n      evaluatorDesc: 'Backtesting',\n      rl: 'REINFORCE',\n      rlDesc: 'Policy Gradient',\n    },\n    evaluate: {\n      expression: 'Factor Expression',\n      placeholder: 'Click operators below to build expression, e.g.: ADD(RET, MA5(VOL))',\n      button: 'Evaluate',\n      evaluating: 'Evaluating...',\n      operators: 'Operators',\n    },\n  },\n};\n\n// ============================================================================\n// 主页面组件\n// 
============================================================================\n\nconst AlphaMiningPage: React.FC = () => {\n  const { lang } = useLanguageStore();\n  const [activeTab, setActiveTab] = useState('overview');\n  const [factors, setFactors] = useState<AlphaMiningFactor[]>([]);\n  const [isLoadingFactors, setIsLoadingFactors] = useState(true);\n  const [evaluateFormula, setEvaluateFormula] = useState('');\n  const [evaluateResult, setEvaluateResult] = useState<AlphaMiningMetrics | null>(null);\n  const [isEvaluating, setIsEvaluating] = useState(false);\n\n  const t = i18n[lang];\n\n  // 加载已发现的因子\n  const loadFactors = useCallback(async () => {\n    setIsLoadingFactors(true);\n    try {\n      const response = await alphaMiningApi.getFactors(20);\n      setFactors(response.factors || []);\n    } catch (error) {\n      console.error('Failed to load factors:', error);\n    } finally {\n      setIsLoadingFactors(false);\n    }\n  }, []);\n\n  useEffect(() => {\n    loadFactors();\n  }, [loadFactors]);\n\n  // 评估因子\n  const handleEvaluate = useCallback(async () => {\n    if (!evaluateFormula.trim()) return;\n    \n    setIsEvaluating(true);\n    try {\n      const response = await alphaMiningApi.evaluate({ formula: evaluateFormula });\n      if (response.success && response.metrics) {\n        setEvaluateResult(response.metrics);\n      }\n    } catch (error) {\n      console.error('Evaluate error:', error);\n    } finally {\n      setIsEvaluating(false);\n    }\n  }, [evaluateFormula]);\n\n  // 插入操作符到表达式\n  const handleOperatorClick = (op: string) => {\n    setEvaluateFormula(prev => prev ? `${prev} ${op}` : op);\n  };\n\n  // 插入特征到表达式\n  const handleFeatureClick = (feature: string) => {\n    setEvaluateFormula(prev => prev ? 
`${prev} ${feature}` : feature);\n  };\n\n  // 训练完成回调\n  const handleTrainingComplete = useCallback((result: { best_score: number; best_formula: string }) => {\n    loadFactors(); // 刷新因子列表\n    if (result.best_formula) {\n      setEvaluateFormula(result.best_formula);\n    }\n  }, [loadFactors]);\n\n  return (\n    <div className=\"container mx-auto px-4 py-6 max-w-7xl\">\n      {/* 页面标题 */}\n      <div className=\"mb-6\">\n        <div className=\"flex items-center gap-3\">\n          <div className=\"p-2 bg-gradient-to-br from-amber-400 to-orange-500 rounded-lg\">\n            <Brain className=\"w-6 h-6 text-white\" />\n          </div>\n          <div>\n            <h1 className=\"text-2xl font-bold\">{t.title}</h1>\n            {t.subtitle && <p className=\"text-gray-600 text-sm\">{t.subtitle}</p>}\n          </div>\n        </div>\n        \n        {/* 技术亮点标签 */}\n        <div className=\"flex flex-wrap gap-2 mt-4\">\n          <TechBadge icon={<Zap className=\"w-3 h-3\" />} label={t.techBadges.rl} />\n          <TechBadge icon={<Code className=\"w-3 h-3\" />} label={t.techBadges.dsl} />\n          <TechBadge icon={<Heart className=\"w-3 h-3\" />} label={t.techBadges.sentiment} />\n          <TechBadge icon={<BarChart2 className=\"w-3 h-3\" />} label={t.techBadges.metrics} />\n          <TechBadge icon={<Bot className=\"w-3 h-3\" />} label={t.techBadges.agent} />\n        </div>\n      </div>\n\n      {/* 主内容区 - Tab 切换 */}\n      <Tabs value={activeTab} onValueChange={setActiveTab} className=\"space-y-6\">\n        <TabsList className=\"grid grid-cols-5 w-full max-w-2xl\">\n          <TabsTrigger value=\"overview\" className=\"gap-1\">\n            <Sparkles className=\"w-4 h-4\" />\n            {t.tabs.overview}\n          </TabsTrigger>\n          <TabsTrigger value=\"training\" className=\"gap-1\">\n            <Zap className=\"w-4 h-4\" />\n            {t.tabs.training}\n          </TabsTrigger>\n          <TabsTrigger value=\"evaluate\" 
className=\"gap-1\">\n            <BarChart2 className=\"w-4 h-4\" />\n            {t.tabs.evaluate}\n          </TabsTrigger>\n          <TabsTrigger value=\"sentiment\" className=\"gap-1\">\n            <Heart className=\"w-4 h-4\" />\n            {t.tabs.sentiment}\n          </TabsTrigger>\n          <TabsTrigger value=\"agent\" className=\"gap-1\">\n            <Bot className=\"w-4 h-4\" />\n            {t.tabs.agent}\n          </TabsTrigger>\n        </TabsList>\n\n        {/* 概览 Tab */}\n        <TabsContent value=\"overview\" className=\"space-y-6\">\n          <div className=\"grid grid-cols-1 lg:grid-cols-2 gap-6\">\n            {/* DSL 操作符展示 */}\n            <Card>\n              <CardHeader>\n                <CardTitle className=\"flex items-center gap-2\">\n                  <Code className=\"w-5 h-5 text-blue-500\" />\n                  {t.dsl.title}\n                </CardTitle>\n                <CardDescription>{t.dsl.desc}</CardDescription>\n              </CardHeader>\n              <CardContent>\n                <OperatorGrid\n                  onOperatorClick={handleOperatorClick}\n                  onFeatureClick={handleFeatureClick}\n                  compact\n                />\n              </CardContent>\n            </Card>\n\n            {/* 已发现的因子 */}\n            <Card>\n              <CardHeader>\n                <div className=\"flex items-center justify-between\">\n                  <div>\n                    <CardTitle className=\"flex items-center gap-2\">\n                      <Sparkles className=\"w-5 h-5 text-amber-500\" />\n                      {t.factors.title}\n                    </CardTitle>\n                    <CardDescription>{t.factors.desc}</CardDescription>\n                  </div>\n                  <Button variant=\"outline\" size=\"sm\" onClick={loadFactors}>\n                    <RefreshCw className={`w-4 h-4 ${isLoadingFactors ? 
'animate-spin' : ''}`} />\n                  </Button>\n                </div>\n              </CardHeader>\n              <CardContent>\n                {isLoadingFactors ? (\n                  <div className=\"text-center py-8 text-gray-500\">\n                    {t.factors.loading}\n                  </div>\n                ) : factors.length === 0 ? (\n                  <div className=\"text-center py-8 text-gray-500\">\n                    <Sparkles className=\"w-10 h-10 mx-auto opacity-50 mb-2\" />\n                    <p>{t.factors.empty}</p>\n                    <p className=\"text-sm mt-1\">{t.factors.emptyHint}</p>\n                  </div>\n                ) : (\n                  <div className=\"space-y-2 max-h-96 overflow-y-auto\">\n                    {factors.slice(0, 10).map((factor, idx) => (\n                      <FactorCard\n                        key={idx}\n                        factor={factor}\n                        rank={idx + 1}\n                        onSelect={() => setEvaluateFormula(factor.formula_str)}\n                      />\n                    ))}\n                  </div>\n                )}\n              </CardContent>\n            </Card>\n          </div>\n\n          {/* 系统架构说明 */}\n          <Card className=\"bg-gradient-to-r from-indigo-50 to-purple-50\">\n            <CardContent className=\"py-6\">\n              <h3 className=\"font-semibold mb-4 flex items-center gap-2\">\n                <Brain className=\"w-5 h-5 text-indigo-500\" />\n                {t.arch.title}\n              </h3>\n              <div className=\"flex items-center justify-center gap-2 flex-wrap\">\n                <ArchNode label={t.arch.features} sub={t.arch.featuresDesc} />\n                <ChevronRight className=\"w-4 h-4 text-gray-400\" />\n                <ArchNode label={t.arch.generator} sub={t.arch.generatorDesc} highlight />\n                <ChevronRight className=\"w-4 h-4 text-gray-400\" />\n                <ArchNode 
label={t.arch.vm} sub={t.arch.vmDesc} />\n                <ChevronRight className=\"w-4 h-4 text-gray-400\" />\n                <ArchNode label={t.arch.evaluator} sub={t.arch.evaluatorDesc} />\n                <ChevronRight className=\"w-4 h-4 text-gray-400\" />\n                <ArchNode label={t.arch.rl} sub={t.arch.rlDesc} highlight />\n              </div>\n            </CardContent>\n          </Card>\n        </TabsContent>\n\n        {/* 训练 Tab */}\n        <TabsContent value=\"training\">\n          <TrainingMonitor onTrainingComplete={handleTrainingComplete} />\n        </TabsContent>\n\n        {/* 评估 Tab */}\n        <TabsContent value=\"evaluate\" className=\"space-y-6\">\n          <div className=\"grid grid-cols-1 lg:grid-cols-3 gap-6\">\n            {/* 左侧：操作符和输入 */}\n            <div className=\"space-y-4\">\n              <Card>\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm\">{t.evaluate.expression}</CardTitle>\n                </CardHeader>\n                <CardContent>\n                  <textarea\n                    value={evaluateFormula}\n                    onChange={(e) => setEvaluateFormula(e.target.value)}\n                    placeholder={t.evaluate.placeholder}\n                    className=\"w-full px-3 py-2 border rounded-md font-mono text-sm h-24\"\n                  />\n                  <Button\n                    onClick={handleEvaluate}\n                    disabled={isEvaluating || !evaluateFormula.trim()}\n                    className=\"w-full mt-2\"\n                  >\n                    {isEvaluating ? 
t.evaluate.evaluating : t.evaluate.button}\n                  </Button>\n                </CardContent>\n              </Card>\n\n              <Card>\n                <CardHeader className=\"pb-2\">\n                  <CardTitle className=\"text-sm\">{t.evaluate.operators}</CardTitle>\n                </CardHeader>\n                <CardContent>\n                  <OperatorGrid\n                    onOperatorClick={handleOperatorClick}\n                    onFeatureClick={handleFeatureClick}\n                    compact\n                  />\n                </CardContent>\n              </Card>\n            </div>\n\n            {/* 右侧：评估结果 */}\n            <div className=\"lg:col-span-2\">\n              <MetricsDashboard\n                metrics={evaluateResult}\n                formula={evaluateFormula}\n                loading={isEvaluating}\n              />\n            </div>\n          </div>\n        </TabsContent>\n\n        {/* 情感融合 Tab */}\n        <TabsContent value=\"sentiment\">\n          <SentimentCompare />\n        </TabsContent>\n\n        {/* Agent Tab */}\n        <TabsContent value=\"agent\">\n          <AgentDemo />\n        </TabsContent>\n      </Tabs>\n    </div>\n  );\n};\n\n// ============================================================================\n// 子组件\n// ============================================================================\n\n// 技术亮点徽章\nconst TechBadge: React.FC<{ icon: React.ReactNode; label: string }> = ({ icon, label }) => (\n  <Badge variant=\"outline\" className=\"gap-1 px-2 py-1\">\n    {icon}\n    {label}\n  </Badge>\n);\n\n// 因子卡片\ninterface FactorCardProps {\n  factor: AlphaMiningFactor;\n  rank: number;\n  onSelect: () => void;\n}\n\nconst FactorCard: React.FC<FactorCardProps> = ({ factor, rank, onSelect }) => {\n  const getSortinoColor = (sortino: number) => {\n    if (sortino > 1) return 'text-green-600 bg-green-50';\n    if (sortino > 0) return 'text-amber-600 bg-amber-50';\n    return 'text-red-600 
bg-red-50';\n  };\n\n  return (\n    <div\n      className=\"p-3 border rounded-lg hover:shadow-sm transition-shadow cursor-pointer\"\n      onClick={onSelect}\n    >\n      <div className=\"flex items-start justify-between gap-2\">\n        <div className=\"flex items-center gap-2\">\n          <span className=\"text-xs text-gray-400 font-medium\">#{rank}</span>\n          <code className=\"text-sm font-mono truncate max-w-[200px]\" title={factor.formula_str}>\n            {factor.formula_str}\n          </code>\n        </div>\n        <Badge className={`text-xs ${getSortinoColor(factor.sortino)}`}>\n          {factor.sortino.toFixed(3)}\n        </Badge>\n      </div>\n      {factor.discovered_at && (\n        <div className=\"text-xs text-gray-400 mt-1\">\n          {new Date(factor.discovered_at).toLocaleString()}\n        </div>\n      )}\n    </div>\n  );\n};\n\n// 架构节点\nconst ArchNode: React.FC<{ label: string; sub: string; highlight?: boolean }> = ({\n  label,\n  sub,\n  highlight,\n}) => (\n  <div className={`\n    px-4 py-2 rounded-lg text-center\n    ${highlight ? 'bg-indigo-100 border-2 border-indigo-300' : 'bg-white border border-gray-200'}\n  `}>\n    <div className={`text-sm font-medium ${highlight ? 'text-indigo-700' : ''}`}>{label}</div>\n    <div className=\"text-xs text-gray-500\">{sub}</div>\n  </div>\n);\n\nexport default AlphaMiningPage;\n"
  },
  {
    "path": "frontend/src/pages/Dashboard.tsx",
    "content": "import { useQuery } from '@tanstack/react-query'\nimport { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card'\nimport { Badge } from '@/components/ui/badge'\nimport { Button } from '@/components/ui/button'\nimport { newsApi, taskApi } from '@/lib/api-client'\nimport { TrendingUp, Newspaper, Activity, Clock } from 'lucide-react'\nimport { useState, useMemo, useEffect } from 'react'\nimport { formatRelativeTime } from '@/lib/utils'\nimport NewsDetailDrawer from '@/components/NewsDetailDrawer'\nimport { useGlobalI18n, useLanguageStore } from '@/store/useLanguageStore'\nimport { useCallback } from 'react'\n\n// 新闻源配置\nconst NEWS_SOURCES = [\n  { key: 'all', nameZh: '全部来源', nameEn: 'All Sources', icon: '📰' },\n  { key: 'sina', nameZh: '新浪财经', nameEn: 'Sina Finance', icon: '🌐' },\n  { key: 'tencent', nameZh: '腾讯财经', nameEn: 'Tencent Finance', icon: '🐧' },\n  { key: 'jwview', nameZh: '金融界', nameEn: 'JRJ', icon: '💰' },\n  { key: 'eeo', nameZh: '经济观察网', nameEn: 'EEO', icon: '📊' },\n  { key: 'caijing', nameZh: '财经网', nameEn: 'Caijing', icon: '📈' },\n  { key: 'jingji21', nameZh: '21经济网', nameEn: '21Jingji', icon: '📉' },\n  { key: 'nbd', nameZh: '每日经济新闻', nameEn: 'NBD', icon: '📰' },\n  { key: 'yicai', nameZh: '第一财经', nameEn: 'Yicai', icon: '🎯' },\n  { key: '163', nameZh: '网易财经', nameEn: '163 Finance', icon: '📧' },\n  { key: 'eastmoney', nameZh: '东方财富', nameEn: 'Eastmoney', icon: '💎' },\n]\n\n// 后端可能返回的中文 source 名称到 key 的映射\nconst SOURCE_NAME_TO_KEY: Record<string, string> = {\n  '全部来源': 'all',\n  '新浪财经': 'sina',\n  '腾讯财经': 'tencent',\n  '金融界': 'jwview',\n  '经济观察网': 'eeo',\n  '财经网': 'caijing',\n  '21经济网': 'jingji21',\n  '每日经济新闻': 'nbd',\n  '第一财经': 'yicai',\n  '网易财经': '163',\n  '东方财富': 'eastmoney',\n  '东方财富网': 'eastmoney', // 后端可能返回的变体\n  '同花顺财经': 'tonghuashun',\n  '证券时报': 'securities_times',\n  '证券之星': 'stockstar',\n  '中金在线': 'cnfol',\n  '澎湃新闻': 'thepaper',\n  '证券时报网': 'securities_times_online',\n  '北京商报': 'bbtnews',\n  
'卡车之家': 'truckhome',\n  'sogou': 'sogou',\n}\n\n// 扩展的新闻源配置（包含后端可能返回的其他来源）\nconst EXTENDED_NEWS_SOURCES: Record<string, { nameZh: string; nameEn: string; icon: string }> = {\n  tonghuashun: { nameZh: '同花顺财经', nameEn: 'Tonghuashun Finance', icon: '📊' },\n  securities_times: { nameZh: '证券时报', nameEn: 'Securities Times', icon: '📰' },\n  stockstar: { nameZh: '证券之星', nameEn: 'Stockstar', icon: '⭐' },\n  cnfol: { nameZh: '中金在线', nameEn: 'CNFOL', icon: '💼' },\n  thepaper: { nameZh: '澎湃新闻', nameEn: 'The Paper', icon: '📰' },\n  securities_times_online: { nameZh: '证券时报网', nameEn: 'Securities Times Online', icon: '📰' },\n  bbtnews: { nameZh: '北京商报', nameEn: 'Beijing Business Today', icon: '📰' },\n  truckhome: { nameZh: '卡车之家', nameEn: 'Truck Home', icon: '🚚' },\n  sogou: { nameZh: '搜狗', nameEn: 'Sogou', icon: '🔍' },\n}\n\nexport default function Dashboard() {\n  const t = useGlobalI18n()\n  const { lang } = useLanguageStore()\n  const [selectedSource, setSelectedSource] = useState<string>('all')\n  const [selectedNewsId, setSelectedNewsId] = useState<number | null>(null)\n  const [drawerOpen, setDrawerOpen] = useState(false)\n\n  // 获取新闻源图标\n  const getSourceIcon = useCallback((sourceValue: string) => {\n    // 1. 先尝试直接匹配 key\n    const sourceByKey = NEWS_SOURCES.find(s => s.key === sourceValue)\n    if (sourceByKey) {\n      return sourceByKey.icon\n    }\n    \n    // 2. 尝试通过中文名称映射到 key\n    const mappedKey = SOURCE_NAME_TO_KEY[sourceValue]\n    if (mappedKey) {\n      const source = NEWS_SOURCES.find(s => s.key === mappedKey)\n      if (source) {\n        return source.icon\n      }\n      // 如果在扩展配置中\n      const extendedSource = EXTENDED_NEWS_SOURCES[mappedKey]\n      if (extendedSource) {\n        return extendedSource.icon\n      }\n    }\n    \n    // 3. 尝试在扩展配置中直接查找\n    const extendedSource = EXTENDED_NEWS_SOURCES[sourceValue]\n    if (extendedSource) {\n      return extendedSource.icon\n    }\n    \n    // 4. 
默认图标\n    return '📰'\n  }, [])\n  \n  // 获取新闻源名称（支持中文 source 名称映射）\n  const getSourceName = useCallback((sourceValue: string) => {\n    // 1. 先尝试直接匹配 key\n    const sourceByKey = NEWS_SOURCES.find(s => s.key === sourceValue)\n    if (sourceByKey) {\n      return t.nav.home === '首页' ? sourceByKey.nameZh : sourceByKey.nameEn\n    }\n    \n    // 2. 尝试通过中文名称映射到 key\n    const mappedKey = SOURCE_NAME_TO_KEY[sourceValue]\n    if (mappedKey) {\n      const source = NEWS_SOURCES.find(s => s.key === mappedKey)\n      if (source) {\n        return t.nav.home === '首页' ? source.nameZh : source.nameEn\n      }\n      // 如果在扩展配置中\n      const extendedSource = EXTENDED_NEWS_SOURCES[mappedKey]\n      if (extendedSource) {\n        return t.nav.home === '首页' ? extendedSource.nameZh : extendedSource.nameEn\n      }\n    }\n    \n    // 3. 尝试在扩展配置中直接查找\n    const extendedSource = EXTENDED_NEWS_SOURCES[sourceValue]\n    if (extendedSource) {\n      return t.nav.home === '首页' ? extendedSource.nameZh : extendedSource.nameEn\n    }\n    \n    // 4. 如果都不匹配，返回原值（可能是英文或未知来源）\n    return sourceValue\n  }, [t])\n\n  // 监听自定义事件，用于从相关新闻跳转\n  useEffect(() => {\n    const handleNewsSelect = (e: CustomEvent<number>) => {\n      setSelectedNewsId(e.detail)\n      setDrawerOpen(true)\n    }\n    window.addEventListener('news-select', handleNewsSelect as EventListener)\n    return () => {\n      window.removeEventListener('news-select', handleNewsSelect as EventListener)\n    }\n  }, [])\n\n  const { data: newsList } = useQuery({\n    queryKey: ['news', 'dashboard', selectedSource],\n    queryFn: () => newsApi.getLatestNews({ \n      source: selectedSource === 'all' ? 
undefined : selectedSource, \n      limit: 100\n    }),\n  })\n\n  const { data: taskStats } = useQuery({\n    queryKey: ['tasks', 'stats'],\n    queryFn: () => taskApi.getTaskStats(),\n    refetchInterval: 10000, // 每10秒刷新\n  })\n\n  // 按来源统计新闻数量\n  const newsStats = useMemo(() => {\n    if (!newsList) return []\n    const stats = new Map<string, number>()\n    newsList.forEach(news => {\n      stats.set(news.source, (stats.get(news.source) || 0) + 1)\n    })\n    return Array.from(stats.entries()).map(([source, count]) => ({\n      source,\n      count,\n      name: getSourceName(source),\n      icon: getSourceIcon(source)\n    })).sort((a, b) => b.count - a.count)\n  }, [newsList, getSourceName, getSourceIcon])\n\n  return (\n    <div className=\"p-6 space-y-6\">\n      <div className=\"flex items-center justify-between\">\n        <div>\n          <h1 className=\"text-3xl font-bold tracking-tight\">{t.dashboard.title}</h1>\n          <p className=\"text-muted-foreground\">\n            {t.dashboard.subtitle}\n          </p>\n        </div>\n      </div>\n\n      {/* 统计卡片 */}\n      <div className=\"grid gap-4 md:grid-cols-2 lg:grid-cols-4\">\n        <Card>\n          <CardHeader className=\"flex flex-row items-center justify-between space-y-0 pb-2\">\n            <CardTitle className=\"text-sm font-medium\">\n              {t.dashboard.totalNews}\n            </CardTitle>\n            <Newspaper className=\"h-4 w-4 text-muted-foreground\" />\n          </CardHeader>\n          <CardContent>\n            <div className=\"text-2xl font-bold\">{taskStats?.total_news_saved || 0}</div>\n            <p className=\"text-xs text-muted-foreground\">\n              {t.dashboard.savedToDb}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card>\n          <CardHeader className=\"flex flex-row items-center justify-between space-y-0 pb-2\">\n            <CardTitle className=\"text-sm font-medium\">\n              {t.dashboard.totalTasks}\n            
</CardTitle>\n            <Activity className=\"h-4 w-4 text-muted-foreground\" />\n          </CardHeader>\n          <CardContent>\n            <div className=\"text-2xl font-bold\">{taskStats?.total || 0}</div>\n            <p className=\"text-xs text-muted-foreground\">\n              {t.dashboard.recentCompleted} {taskStats?.recent_completed || 0} {t.dashboard.units}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card>\n          <CardHeader className=\"flex flex-row items-center justify-between space-y-0 pb-2\">\n            <CardTitle className=\"text-sm font-medium\">\n              {t.dashboard.crawlRate}\n            </CardTitle>\n            <TrendingUp className=\"h-4 w-4 text-muted-foreground\" />\n          </CardHeader>\n          <CardContent>\n            <div className=\"text-2xl font-bold\">\n              {taskStats && taskStats.total > 0\n                ? (((taskStats.by_status?.completed || 0) / taskStats.total) * 100).toFixed(1)\n                : '0.0'}%\n            </div>\n            <p className=\"text-xs text-muted-foreground\">\n              {taskStats?.by_status?.completed || 0} / {taskStats?.total || 0}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card>\n          <CardHeader className=\"flex flex-row items-center justify-between space-y-0 pb-2\">\n            <CardTitle className=\"text-sm font-medium\">\n              {t.dashboard.liveMonitor}\n            </CardTitle>\n            <Clock className=\"h-4 w-4 text-muted-foreground\" />\n          </CardHeader>\n          <CardContent>\n            <div className=\"text-2xl font-bold text-green-600\">{t.dashboard.running}</div>\n            <p className=\"text-xs text-muted-foreground\">\n              {t.dashboard.autoInterval}\n            </p>\n          </CardContent>\n        </Card>\n      </div>\n\n      {/* Source statistics */}\n      {newsStats.length > 0 && (\n        <Card>\n          <CardHeader>\n            
<CardTitle>{t.dashboard.newsStats}</CardTitle>\n            <CardDescription>{t.dashboard.newsStatsDesc}</CardDescription>\n          </CardHeader>\n          <CardContent>\n            <div className=\"grid grid-cols-2 md:grid-cols-3 lg:grid-cols-5 gap-3\">\n              {newsStats.map(stat => (\n                <Card key={stat.source} className=\"p-4 hover:shadow-md transition-shadow\">\n                  <div className=\"flex flex-col items-center gap-2\">\n                    <span className=\"text-3xl\">{stat.icon}</span>\n                    <span className=\"text-sm font-medium text-center\">{stat.name}</span>\n                    <span className=\"text-2xl font-bold text-blue-600\">{stat.count}</span>\n                  </div>\n                </Card>\n              ))}\n            </div>\n          </CardContent>\n        </Card>\n      )}\n\n      {/* Source filter */}\n      <Card>\n        <CardHeader>\n          <CardTitle>{t.dashboard.latestNews}</CardTitle>\n          <CardDescription>{t.dashboard.latestNewsDesc}</CardDescription>\n        </CardHeader>\n        <CardContent className=\"space-y-4\">\n          {/* Filters */}\n          <div className=\"flex flex-wrap gap-2 p-3 bg-slate-50 rounded-lg\">\n            {NEWS_SOURCES.map(source => (\n              <Button\n                key={source.key}\n                variant={selectedSource === source.key ? 
'default' : 'outline'}\n                size=\"sm\"\n                onClick={() => setSelectedSource(source.key)}\n                className=\"text-xs\"\n              >\n                <span className=\"mr-1\">{source.icon}</span>\n                {getSourceName(source.key)}\n                {source.key !== 'all' && newsStats.find(s => s.source === source.key) && (\n                  <Badge variant=\"secondary\" className=\"ml-2\">\n                    {newsStats.find(s => s.source === source.key)?.count}\n                  </Badge>\n                )}\n              </Button>\n            ))}\n          </div>\n\n          {/* News list */}\n          {newsList && newsList.length > 0 ? (\n            <div className=\"space-y-3 max-h-[600px] overflow-y-auto\">\n              {newsList.slice(0, 20).map((news) => (\n                <div \n                  key={news.id} \n                  className=\"flex items-start gap-4 p-4 hover:bg-gray-50 rounded-lg transition-colors border border-gray-100 cursor-pointer\"\n                  onClick={() => {\n                    setSelectedNewsId(news.id)\n                    setDrawerOpen(true)\n                  }}\n                >\n                  <div className=\"flex-1\">\n                    <h3 className=\"font-medium leading-tight\">{news.title}</h3>\n                    <p className=\"text-sm text-gray-600 mt-1 line-clamp-2\">\n                      {news.content}\n                    </p>\n                    <div className=\"flex items-center gap-4 mt-2 text-xs text-gray-500\">\n                      <span className=\"flex items-center gap-1\">\n                        <span>{getSourceIcon(news.source)}</span>\n                        <span>{getSourceName(news.source)}</span>\n                      </span>\n                      <span>⏰ {formatRelativeTime(news.publish_time || news.created_at, t.time)}</span>\n                      {news.stock_codes && news.stock_codes.length > 0 && (\n                        <span 
className=\"flex items-center gap-1\">\n                          📈 \n                          {news.stock_codes.slice(0, 3).map(code => (\n                            <Badge key={code} variant=\"outline\" className=\"text-xs\">\n                              {code}\n                            </Badge>\n                          ))}\n                          {news.stock_codes.length > 3 && (\n                            <span className=\"text-xs text-gray-400\">\n                              +{news.stock_codes.length - 3}\n                            </span>\n                          )}\n                        </span>\n                      )}\n                    </div>\n                  </div>\n                </div>\n              ))}\n            </div>\n          ) : (\n            <div className=\"text-center py-8 text-gray-500\">\n              {selectedSource === 'all' ? t.dashboard.noNews : t.dashboard.noNewsFrom}\n            </div>\n          )}\n        </CardContent>\n      </Card>\n\n      {/* News detail drawer */}\n      <NewsDetailDrawer\n        newsId={selectedNewsId}\n        open={drawerOpen}\n        onOpenChange={(open) => {\n          setDrawerOpen(open)\n          if (!open) {\n            // Delay clearing newsId to avoid flicker during the close animation\n            setTimeout(() => setSelectedNewsId(null), 300)\n          }\n        }}\n      />\n    </div>\n  )\n}\n"
  },
  {
    "path": "frontend/src/pages/NewsListPage.tsx",
    "content": "import { useState, useEffect, useMemo, useRef, useCallback } from 'react'\nimport { useQuery, useMutation, useQueryClient } from '@tanstack/react-query'\nimport { toast } from 'sonner'\nimport { Card, CardContent, CardHeader, CardTitle, CardFooter } from '@/components/ui/card'\nimport { Button } from '@/components/ui/button'\nimport { Badge } from '@/components/ui/badge'\nimport { newsApi, analysisApi } from '@/lib/api-client'\nimport { formatRelativeTime } from '@/lib/utils'\nimport { RefreshCw, Sparkles, Calendar, Newspaper, TrendingUp, RefreshCcw, ChevronDown, ChevronUp, CheckCircle2, XCircle, MinusCircle, HelpCircle, Search, X, Check, Minus } from 'lucide-react'\nimport NewsDetailDrawer from '@/components/NewsDetailDrawer'\nimport { useNewsToolbar } from '@/context/NewsToolbarContext'\nimport { useDebounce } from '@/hooks/useDebounce'\nimport HighlightText from '@/components/HighlightText'\nimport { useModelConfig } from '@/components/ModelSelector'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\ntype FilterType = 'all' | 'pending' | 'positive' | 'negative' | 'neutral'\n\n// Standalone search box that manages its own state, so typing does not remount it on every keystroke\nfunction SearchBox({ onSearch }: { onSearch: (query: string) => void }) {\n  const t = useGlobalI18n()\n  const [localQuery, setLocalQuery] = useState('')\n  const isComposingRef = useRef(false)\n\n  const handleChange = (e: React.ChangeEvent<HTMLInputElement>) => {\n    const value = e.target.value\n    setLocalQuery(value)\n    // Outside IME composition, update the search query directly\n    if (!isComposingRef.current) {\n      onSearch(value)\n    }\n  }\n\n  const handleCompositionEnd = (e: React.CompositionEvent<HTMLInputElement>) => {\n    isComposingRef.current = false\n    // After IME composition ends, update the search query\n    onSearch(e.currentTarget.value)\n  }\n\n  const handleClear = () => {\n    setLocalQuery('')\n    onSearch('')\n  }\n\n  return (\n    <div className=\"relative w-full\">\n      <Search className=\"absolute left-3 top-1/2 -translate-y-1/2 w-4 h-4 text-gray-400\" />\n      
<input\n        type=\"text\"\n        placeholder={t.news.search}\n        value={localQuery}\n        onCompositionStart={() => {\n          isComposingRef.current = true\n        }}\n        onCompositionEnd={handleCompositionEnd}\n        onChange={handleChange}\n        className=\"w-full pl-10 pr-10 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500 h-10\"\n      />\n      {localQuery && (\n        <button\n          onClick={handleClear}\n          className=\"absolute right-3 top-1/2 -translate-y-1/2 text-gray-400 hover:text-gray-600 transition-colors\"\n          aria-label=\"清除搜索\"\n        >\n          <X className=\"w-4 h-4\" />\n        </button>\n      )}\n    </div>\n  )\n}\n\n// News source config (i18n handled inside the component)\nconst NEWS_SOURCES = [\n  { key: 'all', nameZh: '全部来源', nameEn: 'All Sources', icon: '📰' },\n  { key: 'sina', nameZh: '新浪财经', nameEn: 'Sina Finance', icon: '🌐' },\n  { key: 'tencent', nameZh: '腾讯财经', nameEn: 'Tencent Finance', icon: '🐧' },\n  { key: 'jwview', nameZh: '金融界', nameEn: 'JRJ', icon: '💰' },\n  { key: 'eeo', nameZh: '经济观察网', nameEn: 'EEO', icon: '📊' },\n  { key: 'caijing', nameZh: '财经网', nameEn: 'Caijing', icon: '📈' },\n  { key: 'jingji21', nameZh: '21经济网', nameEn: '21Jingji', icon: '📉' },\n  { key: 'nbd', nameZh: '每日经济新闻', nameEn: 'NBD', icon: '📰' },\n  { key: 'yicai', nameZh: '第一财经', nameEn: 'Yicai', icon: '🎯' },\n  { key: '163', nameZh: '网易财经', nameEn: '163 Finance', icon: '📧' },\n  { key: 'eastmoney', nameZh: '东方财富', nameEn: 'Eastmoney', icon: '💎' },\n]\n\n// Map from Chinese source names the backend may return to keys\nconst SOURCE_NAME_TO_KEY: Record<string, string> = {\n  '全部来源': 'all',\n  '新浪财经': 'sina',\n  '腾讯财经': 'tencent',\n  '金融界': 'jwview',\n  '经济观察网': 'eeo',\n  '财经网': 'caijing',\n  '21经济网': 'jingji21',\n  '每日经济新闻': 'nbd',\n  '第一财经': 'yicai',\n  '网易财经': '163',\n  '东方财富': 'eastmoney',\n  '东方财富网': 'eastmoney', // variant the backend may return\n  '同花顺财经': 'tonghuashun', // other sources the backend may return\n  '证券时报': 'securities_times',\n  '证券之星': 'stockstar',\n  '中金在线': 
'cnfol',\n  '澎湃新闻': 'thepaper',\n  '证券时报网': 'securities_times_online',\n  '北京商报': 'bbtnews',\n  '卡车之家': 'truckhome',\n  'sogou': 'sogou',\n}\n\n// Extended source config (other sources the backend may return)\nconst EXTENDED_NEWS_SOURCES: Record<string, { nameZh: string; nameEn: string; icon: string }> = {\n  tonghuashun: { nameZh: '同花顺财经', nameEn: 'Tonghuashun Finance', icon: '📊' },\n  securities_times: { nameZh: '证券时报', nameEn: 'Securities Times', icon: '📰' },\n  stockstar: { nameZh: '证券之星', nameEn: 'Stockstar', icon: '⭐' },\n  cnfol: { nameZh: '中金在线', nameEn: 'CNFOL', icon: '💼' },\n  thepaper: { nameZh: '澎湃新闻', nameEn: 'The Paper', icon: '📰' },\n  securities_times_online: { nameZh: '证券时报网', nameEn: 'Securities Times Online', icon: '📰' },\n  bbtnews: { nameZh: '北京商报', nameEn: 'Beijing Business Today', icon: '📰' },\n  truckhome: { nameZh: '卡车之家', nameEn: 'Truck Home', icon: '🚚' },\n  sogou: { nameZh: '搜狗', nameEn: 'Sogou', icon: '🔍' },\n}\n\nexport default function NewsListPage() {\n  const t = useGlobalI18n()\n  const queryClient = useQueryClient()\n  const [expandedStocks, setExpandedStocks] = useState<Set<number>>(new Set())\n  const [gridCols, setGridCols] = useState(3)\n  const [activeFilter, setActiveFilter] = useState<FilterType>('all')\n  const [activeSource, setActiveSource] = useState<string>('all') // new: source filter\n  const [analyzingNewsId, setAnalyzingNewsId] = useState<number | null>(null)\n  const [isRefreshing, setIsRefreshing] = useState(false) // manually managed refresh state\n  const [selectedNewsId, setSelectedNewsId] = useState<number | null>(null)\n  const [drawerOpen, setDrawerOpen] = useState(false)\n  const [searchQuery, setSearchQuery] = useState('') // search keyword\n  const debouncedSearchQuery = useDebounce(searchQuery, 300) // debounced\n  const [selectedNewsIds, setSelectedNewsIds] = useState<Set<number>>(new Set()) // batch selection state\n  const [lastSelectedNewsId, setLastSelectedNewsId] = useState<number | null>(null) // last selected news ID (for Shift range selection)\n  \n  // Resolve the icon for a news source\n  const getSourceIcon = useCallback((sourceValue: string) 
=> {\n    // 1. Try matching the key directly\n    const sourceByKey = NEWS_SOURCES.find(s => s.key === sourceValue)\n    if (sourceByKey) {\n      return sourceByKey.icon\n    }\n    \n    // 2. Try mapping a Chinese name to a key\n    const mappedKey = SOURCE_NAME_TO_KEY[sourceValue]\n    if (mappedKey) {\n      const source = NEWS_SOURCES.find(s => s.key === mappedKey)\n      if (source) {\n        return source.icon\n      }\n      // If it is in the extended config\n      const extendedSource = EXTENDED_NEWS_SOURCES[mappedKey]\n      if (extendedSource) {\n        return extendedSource.icon\n      }\n    }\n    \n    // 3. Try the extended config directly\n    const extendedSource = EXTENDED_NEWS_SOURCES[sourceValue]\n    if (extendedSource) {\n      return extendedSource.icon\n    }\n    \n    // 4. Default icon\n    return '📰'\n  }, [])\n  \n  // Resolve the display name for a news source (supports Chinese source-name mapping)\n  const getSourceName = useCallback((sourceValue: string) => {\n    // 1. Try matching the key directly\n    const sourceByKey = NEWS_SOURCES.find(s => s.key === sourceValue)\n    if (sourceByKey) {\n      return t.nav.home === '首页' ? sourceByKey.nameZh : sourceByKey.nameEn\n    }\n    \n    // 2. Try mapping a Chinese name to a key\n    const mappedKey = SOURCE_NAME_TO_KEY[sourceValue]\n    if (mappedKey) {\n      const source = NEWS_SOURCES.find(s => s.key === mappedKey)\n      if (source) {\n        return t.nav.home === '首页' ? source.nameZh : source.nameEn\n      }\n      // If it is in the extended config\n      const extendedSource = EXTENDED_NEWS_SOURCES[mappedKey]\n      if (extendedSource) {\n        return t.nav.home === '首页' ? extendedSource.nameZh : extendedSource.nameEn\n      }\n    }\n    \n    // 3. Try the extended config directly\n    const extendedSource = EXTENDED_NEWS_SOURCES[sourceValue]\n    if (extendedSource) {\n      return t.nav.home === '首页' ? extendedSource.nameZh : extendedSource.nameEn\n    }\n    \n    // 4. If nothing matches, return the raw value (possibly English or an unknown source)\n    return sourceValue\n  }, [t])\n  \n  // useCallback keeps the onSearch reference stable, so SearchBox does not re-render\n  const handleSearch = useCallback((query: string) => {\n    setSearchQuery(query)\n  }, [])\n  const { setContent } = useNewsToolbar()\n  const modelConfig = useModelConfig() // currently selected model config\n\n  // Listen for a custom event used to jump here from related news\n  useEffect(() => {\n    const handleNewsSelect = (e: CustomEvent<number>) => {\n      setSelectedNewsId(e.detail)\n      setDrawerOpen(true)\n    }\n    window.addEventListener('news-select', handleNewsSelect as EventListener)\n    return () => {\n      window.removeEventListener('news-select', handleNewsSelect as EventListener)\n    }\n  }, [])\n\n  // Clear selection state when the news source changes\n  useEffect(() => {\n    setSelectedNewsIds(new Set())\n    setLastSelectedNewsId(null)\n  }, [activeSource])\n\n  // Phase 2: auto-poll the latest news (1-minute refresh)\n  const { data: newsList, isLoading, isError, error } = useQuery({\n    queryKey: ['news', 'latest', activeSource],\n    queryFn: () => newsApi.getLatestNews({ \n      source: activeSource === 'all' ? 
undefined : activeSource, \n      limit: 200  // higher limit to show more news\n    }),\n    staleTime: 1 * 60 * 1000,  // data considered fresh for 1 minute\n    refetchInterval: 1 * 60 * 1000,  // auto-refresh every minute\n    refetchIntervalInBackground: true,  // refresh in the background too\n    retry: 3,  // retry 3 times on failure\n    retryDelay: (attemptIndex) => Math.min(1000 * 2 ** attemptIndex, 30000),  // exponential backoff\n    onError: (error: Error) => {\n      console.error('Failed to fetch news:', error)\n      toast.error(`加载新闻失败: ${error.message}`)\n    },\n  })\n\n  // dataUpdatedAt is kept here; it can later drive a global last-refreshed display\n\n  // Phase 2: force-refresh mutation\n  const refreshMutation = useMutation({\n    mutationFn: newsApi.forceRefresh,\n    onSuccess: () => {\n      toast.success('爬取任务已触发，正在获取最新新闻...')\n      // Wait longer for the crawl to finish (per the logs, it takes roughly 60-120 seconds)\n      const checkInterval = setInterval(() => {\n        queryClient.invalidateQueries({ queryKey: ['news', 'latest'] })\n      }, 5000) // check every 5 seconds\n      \n      // Stop polling and finish after 2 minutes\n      setTimeout(() => {\n        clearInterval(checkInterval)\n        queryClient.invalidateQueries({ queryKey: ['news', 'latest'] })\n        setIsRefreshing(false) // end the refreshing state\n        toast.success('刷新完成！')\n      }, 120000) // 120 seconds\n    },\n    onError: (error: Error) => {\n      setIsRefreshing(false) // end the refreshing state on error too\n      toast.error(`刷新失败: ${error.message}`)\n    },\n  })\n\n  // Analyze-news mutation\n  const analyzeMutation = useMutation({\n    mutationFn: (newsId: number) => analysisApi.analyzeNews(newsId, modelConfig),\n    onSuccess: (data) => {\n      setAnalyzingNewsId(null)\n      if (data.success) {\n        toast.success(t.news.analysisComplete)\n        queryClient.invalidateQueries({ queryKey: ['news'] })\n      } else {\n        toast.error(`${t.news.analysisFailed}: ${data.error}`)\n      }\n    },\n    onError: (error: Error) => {\n      setAnalyzingNewsId(null)\n      toast.error(`${t.news.analysisFailed}: ${error.message}`)\n    },\n  })\n\n  // Batch-analyze mutation\n  const batchAnalyzeMutation = useMutation({\n    mutationFn: (newsIds: number[]) => 
analysisApi.batchAnalyzeNews(newsIds, modelConfig),\n    onSuccess: (data) => {\n      if (data.success) {\n        const message = t.news.analysisComplete\n          .replace('{success}', data.success_count.toString())\n          .replace('{failed}', data.failed_count.toString())\n        toast.success(message)\n        queryClient.invalidateQueries({ queryKey: ['news'] })\n      } else {\n        toast.error(data.message || '批量分析失败')\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`批量分析失败: ${error.message}`)\n    },\n  })\n\n  const handleBatchAnalyze = useCallback(() => {\n    if (selectedNewsIds.size === 0) return\n    batchAnalyzeMutation.mutate(Array.from(selectedNewsIds))\n  }, [selectedNewsIds, batchAnalyzeMutation])\n\n  const handleBatchReanalyze = useCallback(() => {\n    // Re-analysis uses the same API\n    handleBatchAnalyze()\n  }, [handleBatchAnalyze])\n\n  // Batch-delete mutation\n  const batchDeleteMutation = useMutation({\n    mutationFn: (newsIds: number[]) => newsApi.batchDeleteNews(newsIds),\n    onSuccess: (data) => {\n      if (data.success) {\n        toast.success(data.message || t.news.deleteSelected)\n        setSelectedNewsIds(new Set()) // clear selection state\n        setLastSelectedNewsId(null) // clear the last selected item\n        queryClient.invalidateQueries({ queryKey: ['news'] })\n      } else {\n        toast.error(data.message || '删除失败')\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`删除失败: ${error.message}`)\n    },\n  })\n\n  // Toggle a news item's selection state\n  const toggleNewsSelection = useCallback((newsId: number) => {\n    setSelectedNewsIds(prev => {\n      const newSet = new Set(prev)\n      if (newSet.has(newsId)) {\n        newSet.delete(newsId)\n      } else {\n        newSet.add(newsId)\n      }\n      return newSet\n    })\n  }, [])\n\n  // Range selection (for Shift-click)\n  const selectRange = useCallback((startId: number, endId: number, newsList: Array<{ id: number }>) => {\n    if (!newsList || newsList.length === 0) return\n    \n    const startIndex = 
newsList.findIndex(n => n.id === startId)\n    const endIndex = newsList.findIndex(n => n.id === endId)\n    \n    if (startIndex === -1 || endIndex === -1) return\n    \n    const minIndex = Math.min(startIndex, endIndex)\n    const maxIndex = Math.max(startIndex, endIndex)\n    \n    setSelectedNewsIds(prev => {\n      const newSet = new Set(prev)\n      for (let i = minIndex; i <= maxIndex; i++) {\n        newSet.add(newsList[i].id)\n      }\n      return newSet\n    })\n  }, [])\n\n  // Clear all selections\n  const clearSelection = useCallback(() => {\n    setSelectedNewsIds(new Set())\n    setLastSelectedNewsId(null)\n  }, [])\n\n  // Batch delete\n  const handleBatchDelete = useCallback(() => {\n    if (selectedNewsIds.size === 0) {\n      return\n    }\n\n    const count = selectedNewsIds.size\n    const confirmMessage = t.news.confirmDelete.replace('{count}', count.toString())\n    \n    if (window.confirm(confirmMessage)) {\n      batchDeleteMutation.mutate(Array.from(selectedNewsIds))\n    }\n  }, [selectedNewsIds, batchDeleteMutation, t])\n\n  const handleForceRefresh = () => {\n    if (isRefreshing) {\n      toast.warning(t.news.crawling)\n      return\n    }\n    \n    setIsRefreshing(true) // set the refreshing state immediately to block further clicks\n    refreshMutation.mutate({ source: 'sina' })\n  }\n\n  // Mount the search box + refresh button into the top toolbar\n  useEffect(() => {\n    // Use the standalone SearchBox component, which manages its own state,\n    // so changes to searchQuery do not remount the input\n    const searchBox = <SearchBox onSearch={handleSearch} />\n\n    const refreshButton = (\n      <Button\n        onClick={handleForceRefresh}\n        disabled={isRefreshing}\n        variant=\"outline\"\n        size=\"sm\"\n        className=\"h-10 rounded-lg border-gray-300 shadow-sm\"\n      >\n        <RefreshCw\n          className={`w-4 h-4 mr-2 ${isRefreshing ? 'animate-spin' : ''}`}\n        />\n        {isRefreshing ? 
t.news.crawlingProgress : t.news.refreshNow}\n      </Button>\n    )\n\n    setContent({ left: searchBox, right: refreshButton })\n\n    return () => {\n      setContent({ left: null, right: null })\n    }\n  }, [isRefreshing, setContent, handleSearch])\n\n  const handleAnalyze = (newsId: number) => {\n    setAnalyzingNewsId(newsId)\n    analyzeMutation.mutate(newsId)\n  }\n\n  const toggleStockExpand = (newsId: number) => {\n    setExpandedStocks(prev => {\n      const newSet = new Set(prev)\n      if (newSet.has(newsId)) {\n        newSet.delete(newsId)\n      } else {\n        newSet.add(newsId)\n      }\n      return newSet\n    })\n  }\n\n  // Dynamically compute cards per row so each card stays close to square\n  useEffect(() => {\n    const calculateGridCols = () => {\n      const containerWidth = window.innerWidth - 48 // minus left/right padding (24px * 2)\n      const idealCardWidth = 380 // ideal card width; with min-h-[480px] it is close to square\n      const gap = 24 // gap-6 = 24px\n      \n      // How many columns fit\n      let cols = Math.floor((containerWidth + gap) / (idealCardWidth + gap))\n      \n      // Clamp to a sensible range\n      cols = Math.max(1, Math.min(cols, 5))\n      \n      setGridCols(cols)\n    }\n\n    calculateGridCols()\n    window.addEventListener('resize', calculateGridCols)\n    return () => window.removeEventListener('resize', calculateGridCols)\n  }, [])\n\n  // Compute content line count based on how many stocks are attached\n  const getContentLines = (stockCount: number, isExpanded: boolean) => {\n    if (stockCount === 0) {\n      return 8 // no stocks: show more content\n    }\n    if (isExpanded || stockCount > 6) {\n      return 3 // expanded or many stocks: show less content\n    }\n    if (stockCount <= 3) {\n      return 6 // few stocks: show more content\n    }\n    return 5 // default: medium amount of content\n  }\n\n  const getSentimentBadge = (score: number | null) => {\n    if (score === null) return null\n    if (score > 0.1) {\n      return (\n        <Badge className=\"bg-green-100 text-green-800 hover:bg-green-100 border-green-200\">\n          <span className=\"mr-1\">😊</span>\n          {t.news.positive} {score.toFixed(2)}\n        </Badge>\n  
    )\n    }\n    if (score < -0.1) {\n      return (\n        <Badge className=\"bg-red-100 text-red-800 hover:bg-red-100 border-red-200\">\n          <span className=\"mr-1\">😰</span>\n          {t.news.negative} {score.toFixed(2)}\n        </Badge>\n      )\n    }\n    return (\n      <Badge variant=\"outline\" className=\"bg-gray-50 text-gray-700\">\n        <span className=\"mr-1\">😐</span>\n        {t.news.neutral} {score.toFixed(2)}\n      </Badge>\n    )\n  }\n\n  // Filter news (sentiment + search)\n  const filteredNews = useMemo(() => {\n    if (!newsList) return []\n    \n    const query = debouncedSearchQuery.toLowerCase().trim()\n    \n    return newsList.filter(news => {\n      // 1. Sentiment filter\n      let sentimentMatch = true\n      switch (activeFilter) {\n        case 'pending':\n          sentimentMatch = news.sentiment_score === null\n          break\n        case 'positive':\n          sentimentMatch = news.sentiment_score !== null && news.sentiment_score > 0.1\n          break\n        case 'negative':\n          sentimentMatch = news.sentiment_score !== null && news.sentiment_score < -0.1\n          break\n        case 'neutral':\n          sentimentMatch = news.sentiment_score !== null && news.sentiment_score >= -0.1 && news.sentiment_score <= 0.1\n          break\n        default:\n          sentimentMatch = true\n      }\n      \n      // 2. Search match (passes automatically when there is no query)\n      if (!query) return sentimentMatch\n      \n      const titleMatch = news.title.toLowerCase().includes(query)\n      const contentMatch = news.content.toLowerCase().includes(query)\n      const codeMatch = news.stock_codes?.some(code => code.toLowerCase().includes(query)) || false\n      const sourceMatch = getSourceName(news.source).toLowerCase().includes(query)\n      \n      const searchMatch = titleMatch || contentMatch || codeMatch || sourceMatch\n      \n      // 3. Return the intersection\n      return sentimentMatch && searchMatch\n    })\n  }, [newsList, activeFilter, debouncedSearchQuery, getSourceName])\n\n  // Compute select-all state\n  const isAllSelected = useMemo(() => {\n    if (!filteredNews || filteredNews.length === 0) return false\n    return filteredNews.every(news => selectedNewsIds.has(news.id))\n  }, [filteredNews, selectedNewsIds])\n\n  const isPartiallySelected = useMemo(() => {\n    if (!filteredNews || filteredNews.length === 0) return false\n    const selectedCount = filteredNews.filter(news => selectedNewsIds.has(news.id)).length\n    return selectedCount > 0 && selectedCount < filteredNews.length\n  }, [filteredNews, selectedNewsIds])\n\n  // Select-all / deselect-all handler\n  const handleSelectAll = useCallback(() => {\n    if (!filteredNews) return\n    if (isAllSelected) {\n      // Deselect all: only unselect the currently filtered news\n      setSelectedNewsIds(prev => {\n        const newSet = new Set(prev)\n        filteredNews.forEach(news => newSet.delete(news.id))\n        return newSet\n      })\n      setLastSelectedNewsId(null)\n    } else {\n      // Select all: select every filtered news item\n      setSelectedNewsIds(prev => {\n        const newSet = new Set(prev)\n        filteredNews.forEach(news => newSet.add(news.id))\n        return newSet\n      })\n      // Set the last selected item to the last one in the filtered list\n      if (filteredNews.length > 0) {\n        setLastSelectedNewsId(filteredNews[filteredNews.length - 1].id)\n      }\n    }\n  }, [filteredNews, isAllSelected])\n\n  // Card style classes\n  const getCardStyle = (sentiment: number | null) => {\n    const baseStyle = \"flex flex-col transition-all duration-300 border min-w-0 h-full hover:shadow-lg hover:-translate-y-1\"\n    \n    if (sentiment === null) {\n      return `${baseStyle} bg-white border-gray-200 hover:border-primary/30`\n    }\n\n    if (sentiment > 0.1) {\n      // Positive: vivid green gradient background + dark green border\n      return `${baseStyle} bg-gradient-to-br from-emerald-100 to-white border-emerald-300 hover:border-emerald-400 hover:shadow-emerald-200/60`\n    }\n    \n    if (sentiment < -0.1) {\n      // Negative: vivid red gradient background + dark red border\n      return `${baseStyle} bg-gradient-to-br from-rose-100 to-white border-rose-300 hover:border-rose-400 hover:shadow-rose-200/60`\n    }\n\n    // Neutral: clear blue/gray gradient background + dark gray border\n    return `${baseStyle} bg-gradient-to-br from-slate-100 to-white border-slate-300 hover:border-slate-400 hover:shadow-slate-200/60`\n  }\n\n  // Re-analyze button style\n  const getAnalyzeButtonStyle = (sentiment: number | null) => {\n    if (sentiment === null) {\n      return \"w-full bg-primary hover:bg-primary/90 text-white shadow-sm hover:shadow transition-all\"\n    }\n    if (sentiment > 0.1) {\n      return \"w-full border-emerald-200 text-emerald-700 hover:bg-emerald-50 hover:border-emerald-300 hover:text-emerald-800 transition-colors\"\n    }\n    if (sentiment < -0.1) {\n      return \"w-full border-rose-200 text-rose-700 hover:bg-rose-50 hover:border-rose-300 hover:text-rose-800 transition-colors\"\n    }\n    return \"w-full border-slate-200 text-slate-700 hover:bg-slate-50 hover:border-slate-300 hover:text-slate-800 transition-colors\"\n  }\n\n  return (\n    <div className=\"flex flex-col h-full overflow-hidden\">\n      {/* Fixed top area: filter bar and batch action bar */}\n      <div className=\"flex-shrink-0 p-6 pb-4 space-y-4 bg-white border-b border-gray-200 z-10\">\n        {/* Filter bar: news source + sentiment */}\n        <Card className=\"border-gray-200 shadow-sm\">\n        <CardHeader className=\"pb-4\">\n          <div className=\"flex flex-wrap items-center gap-3\">\n            {/* News source filter */}\n            <div className=\"flex flex-wrap items-center gap-1.5 bg-blue-50 p-1 rounded-lg border border-blue-200\">\n              {NEWS_SOURCES.map((source) => (\n                <Button\n                  key={source.key}\n                  variant={activeSource === source.key ? 'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveSource(source.key)}\n                  className={\n                    activeSource === source.key\n                      ? 
'bg-white text-blue-600 shadow-sm hover:bg-white/90 text-xs'\n                      : 'text-slate-600 hover:text-blue-600 text-xs'\n                  }\n                >\n                  <span className=\"mr-1\">{source.icon}</span>\n                  {getSourceName(source.key)}\n                </Button>\n              ))}\n            </div>\n            \n            {/* Sentiment filter */}\n            <div className=\"flex flex-wrap items-center gap-1 bg-slate-50 p-1 rounded-lg border border-slate-200\">\n              {/* Select-all checkbox */}\n              <button\n                onClick={handleSelectAll}\n                className={`flex items-center gap-1.5 px-2 py-1 rounded h-8 transition-colors ${\n                  isAllSelected \n                    ? 'bg-blue-100 text-blue-700' \n                    : isPartiallySelected\n                    ? 'bg-blue-50 text-blue-600'\n                    : 'hover:bg-gray-100 text-gray-600'\n                }`}\n                aria-label={isAllSelected ? t.news.deselectAll : t.news.selectAll}\n              >\n                <div className={`w-4 h-4 rounded border-2 flex items-center justify-center transition-all ${\n                  isAllSelected \n                    ? 'bg-blue-500 border-blue-500' \n                    : isPartiallySelected\n                    ? 'bg-blue-100 border-blue-500'\n                    : 'border-gray-300 bg-white'\n                }`}>\n                  {isAllSelected && <Check className=\"w-3 h-3 text-white\" />}\n                  {isPartiallySelected && <Minus className=\"w-3 h-3 text-blue-600\" />}\n                </div>\n                <span className=\"text-xs font-medium\">\n                  {isAllSelected ? t.news.deselectAll : t.news.selectAll}\n                </span>\n              </button>\n              \n                <Button\n                  variant={activeFilter === 'all' ? 
'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveFilter('all')}\n                className={`h-8 ${\n                  activeFilter === 'all'\n                    ? 'bg-white text-primary shadow-sm hover:bg-white/90'\n                    : 'text-slate-600 hover:text-slate-900'\n                }`}\n                >\n                  {t.news.all}\n                </Button>\n                <Button\n                  variant={activeFilter === 'pending' ? 'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveFilter('pending')}\n                className={`h-8 ${\n                  activeFilter === 'pending'\n                    ? 'bg-white text-orange-600 shadow-sm hover:bg-white/90'\n                    : 'text-slate-600 hover:text-orange-600'\n                }`}\n                >\n                  <HelpCircle className=\"w-3.5 h-3.5 mr-1.5\" />\n                  {t.news.pending}\n                </Button>\n                <Button\n                  variant={activeFilter === 'positive' ? 'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveFilter('positive')}\n                className={`h-8 ${\n                  activeFilter === 'positive'\n                    ? 'bg-white text-emerald-600 shadow-sm hover:bg-white/90'\n                    : 'text-slate-600 hover:text-emerald-600'\n                }`}\n                >\n                  <CheckCircle2 className=\"w-3.5 h-3.5 mr-1.5\" />\n                  {t.news.positive}\n                </Button>\n                <Button\n                  variant={activeFilter === 'negative' ? 'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveFilter('negative')}\n                className={`h-8 ${\n                  activeFilter === 'negative'\n                    ? 
'bg-white text-rose-600 shadow-sm hover:bg-white/90'\n                    : 'text-slate-600 hover:text-rose-600'\n                }`}\n                >\n                  <XCircle className=\"w-3.5 h-3.5 mr-1.5\" />\n                  {t.news.negative}\n                </Button>\n                <Button\n                  variant={activeFilter === 'neutral' ? 'default' : 'ghost'}\n                  size=\"sm\"\n                  onClick={() => setActiveFilter('neutral')}\n                className={`h-8 ${\n                  activeFilter === 'neutral'\n                    ? 'bg-white text-slate-600 shadow-sm hover:bg-white/90'\n                    : 'text-slate-600 hover:text-slate-900'\n                }`}\n                >\n                  <MinusCircle className=\"w-3.5 h-3.5 mr-1.5\" />\n                  {t.news.neutral}\n              </Button>\n            </div>\n          </div>\n        </CardHeader>\n      </Card>\n\n      {/* Batch action bar */}\n      {selectedNewsIds.size > 0 && (\n        <Card className=\"border-blue-200 bg-blue-50 shadow-sm\">\n          <CardContent className=\"p-4\">\n            <div className=\"flex items-center justify-between\">\n              <div className=\"flex items-center gap-4\">\n                <span className=\"text-sm font-medium text-gray-700\">\n                  {t.news.selectedItems.replace('{count}', selectedNewsIds.size.toString())}\n                </span>\n                <Button\n                  variant=\"ghost\"\n                  size=\"sm\"\n                  onClick={clearSelection}\n                  className=\"text-blue-600 hover:text-blue-700 hover:bg-blue-100\"\n                >\n                  {t.news.cancelSelection}\n                </Button>\n              </div>\n              <div className=\"flex items-center gap-2\">\n                {/* Show different analyze buttons depending on the active filter */}\n                {activeFilter === 'pending' ? 
(\n                  <Button\n                    onClick={handleBatchAnalyze}\n                    disabled={batchAnalyzeMutation.isPending}\n                    size=\"sm\"\n                    className=\"bg-blue-600 hover:bg-blue-700 text-white\"\n                  >\n                    {batchAnalyzeMutation.isPending ? (\n                      <>\n                        <RefreshCw className=\"w-4 h-4 mr-2 animate-spin\" />\n                        {t.news.analyzingSelected.replace('{count}', selectedNewsIds.size.toString())}\n                      </>\n                    ) : (\n                      <>\n                        <Sparkles className=\"w-4 h-4 mr-2\" />\n                        {t.news.analyzeAll}\n                      </>\n                    )}\n                  </Button>\n                ) : (activeFilter === 'positive' || activeFilter === 'negative' || activeFilter === 'neutral') ? (\n                  <Button\n                    onClick={handleBatchReanalyze}\n                    disabled={batchAnalyzeMutation.isPending}\n                    size=\"sm\"\n                    className=\"bg-blue-600 hover:bg-blue-700 text-white\"\n                  >\n                    {batchAnalyzeMutation.isPending ? 
(\n                      <>\n                        <RefreshCw className=\"w-4 h-4 mr-2 animate-spin\" />\n                        {t.news.analyzingSelected.replace('{count}', selectedNewsIds.size.toString())}\n                      </>\n                    ) : (\n                      <>\n                        <RefreshCcw className=\"w-4 h-4 mr-2\" />\n                        {t.news.reanalyzeAll}\n                      </>\n                    )}\n                  </Button>\n                ) : null}\n                \n                {/* Delete button */}\n                <Button\n                  variant=\"destructive\"\n                  size=\"sm\"\n                  onClick={handleBatchDelete}\n                  disabled={batchDeleteMutation.isPending}\n                  className=\"bg-red-600 hover:bg-red-700 text-white\"\n                >\n                  {batchDeleteMutation.isPending ? (\n                    <>\n                      <RefreshCw className=\"w-4 h-4 mr-2 animate-spin\" />\n                      {t.common.loading}\n                    </>\n                  ) : (\n                    t.news.deleteSelected\n                  )}\n                </Button>\n              </div>\n            </div>\n          </CardContent>\n        </Card>\n      )}\n      </div>\n\n      {/* Scrollable news list area */}\n      <div className=\"flex-1 overflow-y-auto p-6 pt-4 min-h-0\">\n        <div \n          className=\"grid gap-6\"\n          style={{\n            gridTemplateColumns: `repeat(${gridCols}, minmax(0, 1fr))`\n          }}\n        >\n        {isLoading ? (\n          <div className=\"col-span-full text-center py-12 text-gray-500\">\n            <div className=\"inline-block animate-spin rounded-full h-8 w-8 border-b-2 border-primary\"></div>\n            <p className=\"mt-4\">{t.common.loading}</p>\n          </div>\n        ) : isError ? 
(\n          <div className=\"col-span-full text-center py-12\">\n            <div className=\"text-red-500 mb-4\">\n              <XCircle className=\"w-12 h-12 mx-auto mb-2\" />\n              <p className=\"text-lg font-semibold\">加载失败</p>\n              <p className=\"text-sm mt-2 text-gray-600\">{error?.message || '未知错误'}</p>\n            </div>\n            <Button\n              onClick={() => queryClient.invalidateQueries({ queryKey: ['news', 'latest'] })}\n              variant=\"outline\"\n            >\n              <RefreshCw className=\"w-4 h-4 mr-2\" />\n              重试\n            </Button>\n          </div>\n        ) : filteredNews && filteredNews.length > 0 ? (\n          filteredNews.map((news) => (\n            <Card \n              key={news.id} \n              className={`${getCardStyle(news.sentiment_score)} cursor-pointer hover:shadow-lg transition-shadow relative ${\n                selectedNewsIds.has(news.id) ? 'border-blue-500 border-2' : ''\n              }`}\n              onClick={(e) => {\n                // Stop clicks on buttons and the selection checkbox from bubbling\n                if ((e.target as HTMLElement).closest('button') || \n                    (e.target as HTMLElement).closest('.selection-checkbox')) {\n                  return\n                }\n                \n                const isCommandOrCtrl = e.metaKey || e.ctrlKey\n                const isShift = e.shiftKey\n                \n                // Command/Ctrl + click: multi-select mode\n                if (isCommandOrCtrl) {\n                  e.preventDefault()\n                  toggleNewsSelection(news.id)\n                  setLastSelectedNewsId(news.id)\n                  return\n                }\n                \n                // Shift + click: range selection\n                if (isShift) {\n                  e.preventDefault()\n                  if (lastSelectedNewsId !== null) {\n                    selectRange(lastSelectedNewsId, news.id, filteredNews)\n                  } else {\n                    // No previous selection; just toggle the current item
\n                    toggleNewsSelection(news.id)\n                  }\n                  setLastSelectedNewsId(news.id)\n                  return\n                }\n                \n                // Plain click: toggle selection if already selected, otherwise open the detail drawer\n                if (selectedNewsIds.has(news.id)) {\n                  toggleNewsSelection(news.id)\n                  setLastSelectedNewsId(null)\n                } else {\n                  setSelectedNewsId(news.id)\n                  setDrawerOpen(true)\n                }\n              }}\n            >\n              <CardHeader className=\"pb-2 flex-shrink-0 relative\">\n                {/* Selection checkbox */}\n                <button\n                  className={`selection-checkbox absolute top-2 right-2 w-5 h-5 rounded-full flex items-center justify-center transition-all z-10 ${\n                    selectedNewsIds.has(news.id)\n                      ? 'bg-blue-500 hover:bg-blue-600'\n                      : 'border-2 border-gray-300 hover:border-gray-400 bg-white'\n                  }`}\n                  onClick={(e) => {\n                    e.stopPropagation()\n                    const isCommandOrCtrl = e.metaKey || e.ctrlKey\n                    const isShift = e.shiftKey\n                    \n                    if (isCommandOrCtrl || isShift) {\n                      // With modifier keys, use the same logic as the card click\n                      if (isShift && lastSelectedNewsId !== null) {\n                        selectRange(lastSelectedNewsId, news.id, filteredNews)\n                      } else {\n                        toggleNewsSelection(news.id)\n                      }\n                      setLastSelectedNewsId(news.id)\n                    } else {\n                      // Plain click on the checkbox\n                      toggleNewsSelection(news.id)\n                      setLastSelectedNewsId(news.id)\n                    }\n                  }}\n                  aria-label={selectedNewsIds.has(news.id) ? 
'取消选择' : '选择'}\n                >\n                  {selectedNewsIds.has(news.id) && (\n                    <Check className=\"w-3 h-3 text-white\" />\n                  )}\n                </button>\n                <CardTitle className=\"text-base leading-tight font-semibold text-gray-900 line-clamp-2 mb-1.5 min-h-[44px] pr-7\">\n                  <HighlightText text={news.title} highlight={debouncedSearchQuery} />\n                </CardTitle>\n                <div className=\"flex items-center gap-2 text-xs text-gray-500\">\n                  <div className=\"flex items-center gap-1\">\n                    <Calendar className=\"w-3 h-3\" />\n                    <span>{formatRelativeTime(news.publish_time || news.created_at, t.time)}</span>\n                  </div>\n                  <span>•</span>\n                  <div className=\"flex items-center gap-1\">\n                    <span>{getSourceIcon(news.source)}</span>\n                    <span>{getSourceName(news.source)}</span>\n                  </div>\n                </div>\n              </CardHeader>\n              \n              <CardContent className=\"flex-1 flex flex-col pb-3 pt-2 overflow-hidden\">\n                <div \n                  className=\"text-sm text-gray-600 mb-3 leading-relaxed flex-shrink-0\"\n                  style={{\n                    display: '-webkit-box',\n                    WebkitLineClamp: getContentLines(\n                      news.stock_codes?.length || 0,\n                      expandedStocks.has(news.id)\n                    ),\n                    WebkitBoxOrient: 'vertical',\n                    overflow: 'hidden'\n                  }}\n                >\n                  <HighlightText text={news.content} highlight={debouncedSearchQuery} />\n                </div>\n                \n                <div className=\"mt-auto space-y-2\">\n                  {news.stock_codes && news.stock_codes.length > 0 && (\n                    <div 
className=\"space-y-1.5\">\n                      <div className=\"flex flex-wrap gap-1.5\">\n                        {(expandedStocks.has(news.id) \n                          ? news.stock_codes \n                          : news.stock_codes.slice(0, 6)\n                        ).map((code) => (\n                          <Badge \n                            key={code} \n                            variant=\"outline\" \n                            className=\"text-xs bg-blue-50 text-blue-700 border-blue-200 hover:bg-blue-100 px-2 py-0.5\"\n                          >\n                            <TrendingUp className=\"w-3 h-3 mr-0.5\" />\n                            {code}\n                          </Badge>\n                        ))}\n                      </div>\n                      {news.stock_codes.length > 6 && (\n                        <button\n                          onClick={() => toggleStockExpand(news.id)}\n                          className=\"text-xs text-primary hover:text-primary/80 flex items-center gap-0.5 transition-colors pt-0.5\"\n                        >\n                          {expandedStocks.has(news.id) ? 
(\n                            <>\n                              <ChevronUp className=\"w-3 h-3\" />\n                              {t.news.collapse} ({news.stock_codes.length} {t.news.stocks})\n                            </>\n                          ) : (\n                            <>\n                              <ChevronDown className=\"w-3 h-3\" />\n                              {t.news.expandMore} ({news.stock_codes.length - 6})\n                            </>\n                          )}\n                        </button>\n                      )}\n                    </div>\n                  )}\n\n                  {news.sentiment_score !== null && (\n                    <div className=\"flex items-center pt-0.5\">\n                      {getSentimentBadge(news.sentiment_score)}\n                    </div>\n                  )}\n                </div>\n              </CardContent>\n\n              <CardFooter className=\"pt-2 pb-4 px-6 flex-shrink-0\">\n                <Button\n                  onClick={() => handleAnalyze(news.id)}\n                  disabled={analyzingNewsId === news.id}\n                  size=\"sm\"\n                  className={getAnalyzeButtonStyle(news.sentiment_score)}\n                  variant={news.sentiment_score !== null ? 'outline' : 'default'}\n                >\n                  {analyzingNewsId === news.id ? (\n                    <>\n                      <RefreshCw className=\"w-4 h-4 mr-2 animate-spin\" />\n                      {t.news.analyzing}\n                    </>\n                  ) : news.sentiment_score !== null ? 
(\n                    <>\n                      <RefreshCcw className=\"w-4 h-4 mr-2\" />\n                      {t.news.reanalyze}\n                    </>\n                  ) : (\n                    <>\n                      <Sparkles className=\"w-4 h-4 mr-2\" />\n                      {t.news.analyze}\n                    </>\n                  )}\n                </Button>\n              </CardFooter>\n            </Card>\n          ))\n        ) : (\n          <div className=\"col-span-full text-center py-16\">\n            <div className=\"text-gray-400 mb-2\">\n              {debouncedSearchQuery ? (\n                <Search className=\"w-16 h-16 mx-auto opacity-50\" />\n              ) : (\n                <Newspaper className=\"w-16 h-16 mx-auto opacity-50\" />\n              )}\n            </div>\n            {debouncedSearchQuery ? (\n              <>\n                <p className=\"text-gray-500 text-lg\">{t.news.noNewsFound} \"{debouncedSearchQuery}\" {t.news.relatedNews}</p>\n                <p className=\"text-gray-400 text-sm mt-1\">{t.news.tryOtherKeywords}</p>\n              </>\n            ) : (\n              <>\n                <p className=\"text-gray-500 text-lg\">{t.news.noNews}</p>\n                <p className=\"text-gray-400 text-sm mt-1\">{t.news.pleaseCrawl}</p>\n              </>\n            )}\n          </div>\n        )}\n        </div>\n      </div>\n\n      {/* News detail drawer */}\n      <NewsDetailDrawer\n        newsId={selectedNewsId}\n        open={drawerOpen}\n        onOpenChange={(open) => {\n          setDrawerOpen(open)\n          if (!open) {\n            // Clear newsId after a delay to avoid flicker during the close animation\n            setTimeout(() => setSelectedNewsId(null), 300)\n          }\n        }}\n      />\n    </div>\n  )\n}\n\n"
  },
  {
    "path": "frontend/src/pages/StockAnalysisPage.tsx",
    "content": "import { useState, useEffect, useMemo, useRef, useCallback } from 'react'\nimport { useParams, useNavigate } from 'react-router-dom'\nimport { useQuery, useMutation, useQueryClient } from '@tanstack/react-query'\nimport { toast } from 'sonner'\nimport { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card'\nimport { Button } from '@/components/ui/button'\nimport { Badge } from '@/components/ui/badge'\nimport { stockApi, agentApi, knowledgeGraphApi, SSEDebateEvent } from '@/lib/api-client'\nimport { formatRelativeTime } from '@/lib/utils'\nimport NewsDetailDrawer from '@/components/NewsDetailDrawer'\nimport { useGlobalI18n, useLanguageStore } from '@/store/useLanguageStore'\nimport DebateChatRoom, { ChatMessage, ChatRole } from '@/components/DebateChatRoom'\nimport DebateHistorySidebar from '@/components/DebateHistorySidebar'\nimport { useDebateStore, DebateSession } from '@/store/useDebateStore'\nimport type { MentionTarget } from '@/components/MentionInput'\nimport {\n  TrendingUp,\n  TrendingDown,\n  Minus,\n  Newspaper,\n  BarChart3,\n  MessageSquare,\n  RefreshCw,\n  Calendar,\n  Swords,\n  Bot,\n  ThumbsUp,\n  ThumbsDown,\n  Scale,\n  Loader2,\n  Activity,\n  ArrowLeft,\n  Download,\n  CheckCircle2,\n  AlertCircle,\n  ChevronDown,\n  Copy,\n  FileDown,\n  Settings,\n  Trash2,\n  Network,\n  Building2,\n  StopCircle,\n  History,\n} from 'lucide-react'\nimport {\n  XAxis,\n  YAxis,\n  CartesianGrid,\n  Tooltip,\n  ResponsiveContainer,\n  Bar,\n  Legend,\n  ComposedChart,\n  Line,\n} from 'recharts'\nimport KLineChart from '@/components/KLineChart'\nimport type { DebateResponse } from '@/types/api'\nimport ReactMarkdown from 'react-markdown'\nimport remarkGfm from 'remark-gfm'\nimport { DebateModeSelector } from '@/components/DebateConfig'\n\n// Extract the bare numeric code from a prefixed stock code\nconst extractCode = (fullCode: string): string => {\n  const code = fullCode.toUpperCase()\n  if (code.startsWith('SH') || code.startsWith('SZ')) {\n    
return code.slice(2)\n  }\n  return code\n}\n\n// K-line period configuration\ntype KLinePeriod = 'daily' | '1m' | '5m' | '15m' | '30m' | '60m'\nconst getPeriodOptions = (t: any): { value: KLinePeriod; label: string; limit: number }[] => [\n  { value: 'daily', label: t.stockDetail.dailyK, limit: 120 },\n  { value: '60m', label: t.stockDetail.min60, limit: 200 },\n  { value: '30m', label: t.stockDetail.min30, limit: 200 },\n  { value: '15m', label: t.stockDetail.min15, limit: 200 },\n  { value: '5m', label: t.stockDetail.min5, limit: 300 },\n  { value: '1m', label: t.stockDetail.min1, limit: 400 },\n]\n\n// Price-adjustment type configuration\ntype KLineAdjust = 'qfq' | 'hfq' | ''\nconst getAdjustOptions = (t: any): { value: KLineAdjust; label: string; tip: string }[] => [\n  { value: 'qfq', label: t.stockDetail.qfq, tip: t.stockDetail.qfqTip },\n  { value: '', label: t.stockDetail.noAdjust, tip: t.stockDetail.noAdjustTip },\n  { value: 'hfq', label: t.stockDetail.hfq, tip: t.stockDetail.hfqTip },\n]\n\n// Targeted crawl task status types\ntype CrawlTaskStatus = 'idle' | 'pending' | 'running' | 'completed' | 'failed'\n\ninterface CrawlTaskState {\n  status: CrawlTaskStatus\n  taskId?: number\n  progress?: {\n    current: number\n    total: number\n    message?: string\n  }\n  error?: string\n}\n\nexport default function StockAnalysisPage() {\n  const t = useGlobalI18n()\n  const { lang } = useLanguageStore()\n  const { code } = useParams<{ code: string }>()\n  const navigate = useNavigate()\n  const queryClient = useQueryClient()\n  const [debateResult, setDebateResult] = useState<DebateResponse | null>(null)\n  const [klinePeriod, setKlinePeriod] = useState<KLinePeriod>('daily')\n  const [klineAdjust, setKlineAdjust] = useState<KLineAdjust>('qfq')  // Default to forward-adjusted prices, matching mainstream Chinese charting software\n  const [crawlTask, setCrawlTask] = useState<CrawlTaskState>({ status: 'idle' })\n  const [selectedNewsId, setSelectedNewsId] = useState<number | null>(null)\n  const [drawerOpen, setDrawerOpen] = useState(false)\n  const [newsDisplayCount, setNewsDisplayCount] = 
useState(12) // Show 12 items by default\n  const [newsExpanded, setNewsExpanded] = useState(true) // Whether the news section is expanded\n  const [debateMode, setDebateMode] = useState<string>('parallel') // Debate mode\n  const [showModelSelector, setShowModelSelector] = useState(false) // Model selector visibility\n  const [showKnowledgeGraph, setShowKnowledgeGraph] = useState(true) // Whether to show the knowledge graph\n  \n  // Streaming debate state\n  const [isStreaming, setIsStreaming] = useState(false)\n  const [streamPhase, setStreamPhase] = useState<string>('')\n  const [streamingContent, setStreamingContent] = useState<{\n    bull: string\n    bear: string\n    manager: string\n    quick: string\n  }>({ bull: '', bear: '', manager: '', quick: '' })\n  const [activeAgent, setActiveAgent] = useState<string | null>(null)\n  const [currentRound, setCurrentRound] = useState<{ round: number; maxRounds: number } | null>(null)\n  const [chatMessages, setChatMessages] = useState<ChatMessage[]>([])\n  const currentMessageIdRef = useRef<string | null>(null)\n  const cancelStreamRef = useRef<(() => void) | null>(null)\n  const chatMessagesRef = useRef<ChatMessage[]>([])\n  \n  // Keep the ref in sync\n  useEffect(() => {\n    chatMessagesRef.current = chatMessages\n  }, [chatMessages])\n  \n  const stockCode = code?.toUpperCase() || 'SH600519'\n  const pureCode = extractCode(stockCode)\n  \n  // Debate history store\n  const { \n    currentSession,\n    startSession, \n    addMessage: addMessageToStore, \n    syncMessages,\n    getStockSessions,\n    loadSession,\n    clearStockHistory,\n    syncToBackend,\n    loadFromBackend,\n    saveAnalysisResult,\n    updateSessionStatus,\n    deleteSession,\n    getLatestInProgressSession\n  } = useDebateStore()\n  \n  // History sidebar state\n  const [showHistorySidebar, setShowHistorySidebar] = useState(false)\n  \n  // Get this stock's history sessions (subscribe directly to the store so data changes propagate automatically)\n  const allSessions = useDebateStore(state => state.sessions)\n  const historySessions = useMemo(() => allSessions[stockCode] || [], [stockCode, allSessions])\n  \n  // Load history from the backend on page load\n  useEffect(() => {\n    
loadFromBackend(stockCode)\n  }, [stockCode, loadFromBackend])\n\n  // On page load, check for an unfinished session and offer to restore it\n  useEffect(() => {\n    const checkAndRestoreSession = () => {\n      const inProgressSession = getLatestInProgressSession(stockCode)\n      if (inProgressSession && inProgressSession.messages.length > 0) {\n        // Unfinished session found; ask the user whether to restore it\n        const shouldRestore = window.confirm(\n          `${t.stockDetail.detectIncompleteSession || '检测到有未完成的'}${inProgressSession.mode === 'realtime_debate' ? t.stockDetail.realtimeDebate : t.stockDetail.analysis || '分析'}${t.stockDetail.session || '会话'}（${inProgressSession.messages.length} ${t.stockDetail.messages || '条消息'}），${t.stockDetail.restore || '是否恢复'}？`\n        )\n        if (shouldRestore) {\n          restoreSessionState(inProgressSession)\n          toast.success(t.stockDetail.sessionRestored)\n        } else {\n          // Mark as interrupted\n          updateSessionStatus('interrupted')\n        }\n      } else if (inProgressSession && inProgressSession.analysisResult) {\n        // Session already has an analysis result; restore it directly\n        restoreSessionState(inProgressSession)\n      }\n    }\n    \n    // Delay execution to make sure the store data has loaded\n    const timer = setTimeout(checkAndRestoreSession, 500)\n    return () => clearTimeout(timer)\n  }, [stockCode])\n\n  // Restore session state into the page\n  const restoreSessionState = useCallback((session: DebateSession) => {\n    // Restore the mode\n    setDebateMode(session.mode)\n    \n    // Restore chat messages (needs a type conversion)\n    if (session.messages.length > 0) {\n      const restoredMessages: ChatMessage[] = session.messages.map(m => ({\n        id: m.id,\n        role: m.role as ChatRole,\n        content: m.content,\n        timestamp: new Date(m.timestamp),\n        round: m.round,\n        isStreaming: false\n      }))\n      setChatMessages(restoredMessages)\n    }\n    \n    // Restore analysis results (parallel/quick modes)\n    if (session.analysisResult) {\n      setStreamingContent({\n        bull: session.analysisResult.bull || '',\n        bear: session.analysisResult.bear || '',\n        manager: 
session.analysisResult.manager || '',\n        quick: session.analysisResult.quick || ''\n      })\n      \n      // If there is a final decision, set debateResult\n      if (session.analysisResult.finalDecision || session.analysisResult.bull || session.analysisResult.bear) {\n        setDebateResult({\n          success: true,\n          stock_code: session.stockCode,\n          stock_name: session.stockName,\n          mode: session.mode as 'parallel' | 'realtime_debate' | 'quick_analysis',\n          bull_analysis: session.analysisResult.bull ? {\n            success: true,\n            agent_name: 'BullResearcher',\n            stance: 'bull',\n            analysis: session.analysisResult.bull\n          } : undefined,\n          bear_analysis: session.analysisResult.bear ? {\n            success: true,\n            agent_name: 'BearResearcher',\n            stance: 'bear',\n            analysis: session.analysisResult.bear\n          } : undefined,\n          final_decision: session.analysisResult.finalDecision ? {\n            success: true,\n            agent_name: 'InvestmentManager',\n            rating: session.analysisResult.finalDecision.rating,\n            decision: session.analysisResult.finalDecision.decision\n          } : undefined,\n          quick_analysis: session.analysisResult.quick ? 
{\n            success: true,\n            analysis: session.analysisResult.quick\n          } : undefined,\n          execution_time: session.analysisResult.executionTime\n        })\n      }\n    }\n    \n    // Load the session into the store\n    loadSession(session.stockCode, session.id)\n  }, [loadSession])\n\n  // Get the current period configuration\n  const PERIOD_OPTIONS = getPeriodOptions(t)\n  const ADJUST_OPTIONS = getAdjustOptions(t)\n  const currentPeriodConfig = PERIOD_OPTIONS.find(p => p.value === klinePeriod) || PERIOD_OPTIONS[0]\n\n  // Fetch the stock name (queried from the database)\n  const { data: stockInfo } = useQuery({\n    queryKey: ['stock', 'info', pureCode],\n    queryFn: () => stockApi.searchRealtime(pureCode, 1),\n    staleTime: 24 * 60 * 60 * 1000, // Cache for 24 hours\n  })\n  \n  // Stock name: prefer the query result, otherwise show the code\n  const stockName = stockInfo?.[0]?.name || stockCode\n\n  // Fetch the stock overview\n  const { data: overview, isLoading: overviewLoading, refetch: refetchOverview } = useQuery({\n    queryKey: ['stock', 'overview', stockCode],\n    queryFn: () => stockApi.getOverview(stockCode),\n    staleTime: 5 * 60 * 1000,\n  })\n\n  // Fetch related news\n  const { data: newsList, isLoading: newsLoading } = useQuery({\n    queryKey: ['stock', 'news', stockCode],\n    queryFn: () => stockApi.getNews(stockCode, { limit: 200 }), // Fetch extra data and paginate on the frontend\n    staleTime: 5 * 60 * 1000,\n  })\n\n  // Compute the sorted news to display (newest first)\n  const displayedNews = useMemo(() => {\n    if (!newsList) return []\n    const sorted = [...newsList].sort((a, b) => {\n      const timeA = a.publish_time ? new Date(a.publish_time).getTime() : 0\n      const timeB = b.publish_time ? 
new Date(b.publish_time).getTime() : 0\n      return timeB - timeA // Sort descending (newest first)\n    })\n    return sorted.slice(0, newsDisplayCount)\n  }, [newsList, newsDisplayCount])\n\n  // Whether there are more news items to show\n  const hasMoreNews = (newsList?.length || 0) > newsDisplayCount\n  \n  // Whether historical news data exists\n  const hasHistoryNews = newsList && newsList.length > 0\n\n  // Get the news card style (based on sentiment score)\n  const getNewsCardStyle = (sentiment: number | null) => {\n    const baseStyle = \"flex flex-col transition-all duration-300 border min-w-0 h-full hover:shadow-lg hover:-translate-y-1 cursor-pointer\"\n    \n    if (sentiment === null) {\n      return `${baseStyle} bg-white border-gray-200 hover:border-blue-300`\n    }\n\n    if (sentiment > 0.1) {\n      // Positive: green gradient\n      return `${baseStyle} bg-gradient-to-br from-emerald-50 to-white border-emerald-200 hover:border-emerald-400 hover:shadow-emerald-200/60`\n    }\n    \n    if (sentiment < -0.1) {\n      // Negative: red gradient\n      return `${baseStyle} bg-gradient-to-br from-rose-50 to-white border-rose-200 hover:border-rose-400 hover:shadow-rose-200/60`\n    }\n\n    // Neutral: blue-gray gradient\n    return `${baseStyle} bg-gradient-to-br from-slate-50 to-white border-slate-200 hover:border-slate-400 hover:shadow-slate-200/60`\n  }\n\n  // Fetch the sentiment trend\n  const { data: sentimentTrend, isLoading: trendLoading } = useQuery({\n    queryKey: ['stock', 'sentiment-trend', stockCode],\n    queryFn: () => stockApi.getSentimentTrend(stockCode, 30),\n    staleTime: 5 * 60 * 1000,\n  })\n\n  // Fetch the knowledge graph\n  const { data: knowledgeGraph, isLoading: kgLoading, refetch: refetchKG } = useQuery({\n    queryKey: ['knowledge-graph', stockCode],\n    queryFn: () => knowledgeGraphApi.getCompanyGraph(stockCode),\n    staleTime: 10 * 60 * 1000, // Cache for 10 minutes\n  })\n\n  // Fetch K-line data, supporting multiple periods and adjustment types\n  const { data: klineData, isLoading: klineLoading, refetch: refetchKline } = useQuery({\n    queryKey: ['stock', 'kline', stockCode, klinePeriod, currentPeriodConfig.limit, klineAdjust],\n    queryFn: async () => {\n      const 
actualAdjust = klinePeriod === 'daily' ? klineAdjust : ''\n      console.log(`🔍 Fetching kline data: code=${stockCode}, period=${klinePeriod}, limit=${currentPeriodConfig.limit}, adjust=${actualAdjust}`)\n      \n      const data = await stockApi.getKLineData(\n        stockCode, \n        klinePeriod, \n        currentPeriodConfig.limit,\n        actualAdjust\n      )\n      \n      if (data && data.length > 0) {\n        console.log(`✅ Received ${data.length} kline data points, latest: ${data[data.length - 1].date}, close: ${data[data.length - 1].close}`)\n      } else {\n        console.warn(`⚠️ Received empty kline data`)\n      }\n      \n      return data\n    },\n    staleTime: 0, // Disable caching; always refetch to avoid stale data\n    gcTime: 0, // Drop the cache immediately (React Query v5 renamed cacheTime to gcTime)\n  })\n\n  // Debate mutation (non-streaming fallback)\n  const debateMutation = useMutation({\n    mutationFn: (mode: string) => agentApi.runDebate({\n      stock_code: stockCode,\n      stock_name: stockName,\n      mode: mode as 'parallel' | 'realtime_debate' | 'quick_analysis',\n      language: lang,\n    }),\n    onSuccess: (data) => {\n      setDebateResult(data)\n      if (data.success) {\n        toast.success(t.stockDetail.debateComplete)\n      } else {\n        toast.error(`辩论失败: ${data.error}`)\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`辩论失败: ${error.message}`)\n    },\n  })\n\n  // Map agent names to chat roles\n  const agentToRole = useCallback((agent: string): ChatRole => {\n    switch (agent) {\n      case 'BullResearcher': return 'bull'\n      case 'BearResearcher': return 'bear'\n      case 'InvestmentManager': return 'manager'\n      case 'DataCollector': return 'data_collector'\n      case 'QuickAnalyst': return 'manager' // The quick analyst uses the manager role\n      default: return 'system'\n    }\n  }, [])\n\n  // Handle SSE events\n  const handleSSEEvent = useCallback((event: SSEDebateEvent) => {\n    console.log('SSE Event:', event.type, event.data)\n    \n    switch (event.type) {\n      case 'task_plan':\n        
// Search plan event\n        const plan = event.data as any\n        setChatMessages(prev => {\n          // Find the last message; if it is the data collector's in-progress message, replace it\n          const lastMsg = prev[prev.length - 1]\n          if (lastMsg && lastMsg.role === 'data_collector' && !lastMsg.content) {\n            return prev.map(msg => \n              msg.id === lastMsg.id \n                ? { ...msg, searchPlan: plan, searchStatus: 'pending' } \n                : msg\n            )\n          }\n          // Otherwise append a new message\n          return [...prev, {\n            id: `plan-${Date.now()}`,\n            role: 'data_collector' as ChatRole,\n            content: '',\n            timestamp: new Date(),\n            searchPlan: plan,\n            searchStatus: 'pending'\n          }]\n        })\n        break\n\n      case 'phase':\n        setStreamPhase(event.data.phase || '')\n        // Update round info\n        if (event.data.round && event.data.max_rounds) {\n          setCurrentRound({ round: event.data.round, maxRounds: event.data.max_rounds })\n          \n          // Realtime debate mode: add a round system message\n          if (debateMode === 'realtime_debate') {\n            setChatMessages(prev => [...prev, {\n              id: `system-round-${event.data.round}`,\n              role: 'system' as ChatRole,\n              content: `📢 ${t.debateRoom.roundPrefix} ${event.data.round}/${event.data.max_rounds} ${t.debateRoom.roundSuffix}${t.debateRoom.roundStarted}`,\n              timestamp: new Date()\n            }])\n          }\n        }\n        if (event.data.phase === 'complete') {\n          toast.success(t.stockDetail.debateComplete)\n          // Add a completion message\n          if (debateMode === 'realtime_debate') {\n            setChatMessages(prev => [...prev, {\n              id: 'system-complete',\n              role: 'system' as ChatRole,\n              content: `✅ ${t.debateRoom.debateEnded}`,\n              timestamp: new Date()\n            }])\n          }\n        }\n        if (event.data.phase === 'data_collection' && debateMode === 
'realtime_debate') {\n          setChatMessages(prev => [...prev, {\n            id: 'system-start',\n            role: 'system' as ChatRole,\n            content: `🎬 ${t.debateRoom.debateStarted}`,\n            timestamp: new Date()\n          }])\n        }\n        break\n        \n      case 'agent':\n        const { agent, content, is_start, is_end, is_chunk, round } = event.data\n        const chatRole = agentToRole(agent || '')\n        \n        if (is_start) {\n          setActiveAgent(agent || null)\n          \n          // Realtime debate mode: create a new message\n          if (debateMode === 'realtime_debate') {\n            const newMsgId = `msg-${Date.now()}-${agent}`\n            currentMessageIdRef.current = newMsgId\n            setChatMessages(prev => [...prev, {\n              id: newMsgId,\n              role: chatRole,\n              content: '',\n              timestamp: new Date(),\n              round: round,\n              isStreaming: true\n            }])\n          }\n          \n          // Legacy logic: round markers for the split-pane mode\n          if (round && debateMode !== 'realtime_debate') {\n            setStreamingContent(prev => {\n              const key = agent === 'BullResearcher' ? 'bull' \n                        : agent === 'BearResearcher' ? 'bear'\n                        : null\n              if (key && round > 1) {\n                const roundMarker = lang === 'zh' \n                  ? `\\n\\n---\\n**【第${round}轮】**\\n`\n                  : `\\n\\n---\\n**【Round ${round}】**\\n`\n                return { ...prev, [key]: prev[key as keyof typeof prev] + roundMarker }\n              }\n              return prev\n            })\n          }\n        } else if (is_end) {\n          setActiveAgent(null)\n          \n          // Realtime debate mode: mark the message complete\n          if (debateMode === 'realtime_debate' && currentMessageIdRef.current) {\n            setChatMessages(prev => prev.map(msg => \n              msg.id === currentMessageIdRef.current \n                ? 
{ ...msg, isStreaming: false }\n                : msg\n            ))\n            currentMessageIdRef.current = null\n          }\n        } else if (is_chunk && content) {\n          // Realtime debate mode: append to the current message\n          if (debateMode === 'realtime_debate' && currentMessageIdRef.current) {\n            setChatMessages(prev => prev.map(msg => \n              msg.id === currentMessageIdRef.current \n                ? { ...msg, content: msg.content + content }\n                : msg\n            ))\n          }\n          \n          // Legacy logic: split-pane mode\n          setStreamingContent(prev => {\n            const key = agent === 'BullResearcher' ? 'bull' \n                      : agent === 'BearResearcher' ? 'bear'\n                      : agent === 'InvestmentManager' ? 'manager'\n                      : agent === 'QuickAnalyst' ? 'quick'\n                      : null\n            if (key) {\n              return { ...prev, [key]: prev[key as keyof typeof prev] + content }\n            }\n            return prev\n          })\n        }\n        \n        // Handle DataCollector's non-streaming messages\n        if (agent === 'DataCollector' && content && !is_chunk && debateMode === 'realtime_debate') {\n          setChatMessages(prev => [...prev, {\n            id: `data-collector-${Date.now()}`,\n            role: 'data_collector' as ChatRole,\n            content: content,\n            timestamp: new Date()\n          }])\n        }\n        break\n        \n      case 'result':\n        // Final result\n        setDebateResult({\n          success: event.data.success || false,\n          stock_code: stockCode,\n          stock_name: stockName,\n          mode: event.data.mode as any,\n          bull_analysis: event.data.bull_analysis,\n          bear_analysis: event.data.bear_analysis,\n          final_decision: event.data.final_decision,\n          quick_analysis: event.data.quick_analysis,\n          debate_id: event.data.debate_id,\n          execution_time: event.data.execution_time\n        })\n        
setIsStreaming(false)\n        setCurrentRound(null)\n        \n        // Save the analysis result to the store (for history restore)\n        saveAnalysisResult({\n          bull: event.data.bull_analysis?.analysis,\n          bear: event.data.bear_analysis?.analysis,\n          manager: event.data.final_decision?.decision,\n          quick: event.data.quick_analysis?.analysis,\n          finalDecision: event.data.final_decision ? {\n            rating: event.data.final_decision.rating,\n            decision: event.data.final_decision.decision\n          } : undefined,\n          executionTime: event.data.execution_time\n        })\n        break\n        \n      case 'error':\n        toast.error(`Debate failed: ${event.data.message}`)\n        setIsStreaming(false)\n        setCurrentRound(null)\n        // Add an error message\n        if (debateMode === 'realtime_debate') {\n          setChatMessages(prev => [...prev, {\n            id: 'system-error',\n            role: 'system' as ChatRole,\n            content: `❌ Error: ${event.data.message}`,\n            timestamp: new Date()\n          }])\n        }\n        break\n    }\n  }, [stockCode, stockName, debateMode, agentToRole])\n\n  // Handle follow-up SSE events\n  const handleFollowUpEvent = useCallback((event: SSEDebateEvent) => {\n    console.log('FollowUp Event:', event.type, event.data)\n    \n    switch (event.type) {\n      case 'task_plan':\n        const plan = event.data as any\n        setChatMessages(prev => [...prev, {\n          id: `plan-${Date.now()}`,\n          role: 'data_collector' as ChatRole,\n          content: '',\n          timestamp: new Date(),\n          searchPlan: plan,\n          searchStatus: 'pending'\n        }])\n        setIsStreaming(false) // The plan is generated; stop streaming and wait for confirmation\n        break\n\n      case 'agent':\n        const { agent, content, is_start, is_end, is_chunk } = event.data\n        const chatRole = agentToRole(agent || '')\n        \n        if (is_start) {\n          setActiveAgent(agent || null)\n          // Create a new message\n          const newMsgId = 
`followup-${Date.now()}-${agent}`\n          currentMessageIdRef.current = newMsgId\n          setChatMessages(prev => [...prev, {\n            id: newMsgId,\n            role: chatRole,\n            content: '',\n            timestamp: new Date(),\n            isStreaming: true\n          }])\n        } else if (is_end) {\n          setActiveAgent(null)\n          // Mark the message complete\n          if (currentMessageIdRef.current) {\n            setChatMessages(prev => prev.map(msg => \n              msg.id === currentMessageIdRef.current \n                ? { ...msg, isStreaming: false }\n                : msg\n            ))\n            currentMessageIdRef.current = null\n          }\n          setIsStreaming(false)\n        } else if (is_chunk && content) {\n          // Append to the current message\n          if (currentMessageIdRef.current) {\n            setChatMessages(prev => prev.map(msg => \n              msg.id === currentMessageIdRef.current \n                ? { ...msg, content: msg.content + content }\n                : msg\n            ))\n          }\n        }\n        break\n        \n      case 'complete':\n        setIsStreaming(false)\n        break\n        \n      case 'error':\n        toast.error(`Reply failed: ${event.data.message}`)\n        setIsStreaming(false)\n        break\n    }\n  }, [agentToRole])\n\n  // Handle a user-sent message (supports @ mentions)\n  const handleUserSendMessage = useCallback((content: string, mentions?: MentionTarget[]) => {\n    // Append the user message to the chat\n    const userMessage: ChatMessage = {\n      id: `user-${Date.now()}`,\n      role: 'user' as ChatRole,\n      content: content,\n      timestamp: new Date()\n    }\n    setChatMessages(prev => [...prev, userMessage])\n    \n    // Sync to the store\n    if (currentSession) {\n      addMessageToStore(userMessage)\n    }\n    \n    // Role display-name mapping\n    const roleNames: Record<string, string> = {\n      bull: t.debateHistory.roleNames.bull,\n      bear: t.debateHistory.roleNames.bear,\n      manager: t.debateHistory.roleNames.manager,\n      
data_collector: t.debateHistory.roleNames.data_collector,\n      user: t.debateHistory.roleNames.user,\n      system: t.stockDetail.history === '历史' ? '系统' : 'System' // crude locale check against the zh label\n    }\n    \n    // Build context (extracted from prior chat messages)\n    const contextSummary = chatMessages\n      .filter(m => m.role !== 'system' && m.role !== 'user')\n      .slice(-6) // Last 6 messages\n      .map(m => `【${roleNames[m.role] || m.role}】${m.content.slice(0, 200)}`)\n      .join('\\n')\n    \n    // Start the streaming request\n    setIsStreaming(true)\n    \n    const cancel = agentApi.followUp(\n      {\n        stock_code: stockCode,\n        stock_name: stockName,\n        question: content,\n        context: contextSummary\n      },\n      handleFollowUpEvent,\n      (error) => {\n        toast.error(`Reply failed: ${error.message}`)\n        setIsStreaming(false)\n      },\n      () => {\n        setIsStreaming(false)\n      }\n    )\n    \n    // Store the cancel function\n    cancelStreamRef.current = cancel\n  }, [stockCode, stockName, chatMessages, handleFollowUpEvent])\n\n  // Handle confirming a search\n  const handleConfirmSearch = useCallback((plan: any, msgId: string) => {\n    // Mark the message as executing\n    setChatMessages(prev => prev.map(msg => \n      msg.id === msgId ? { ...msg, searchStatus: 'executing' } : msg\n    ))\n    \n    setIsStreaming(true)\n    \n    // Execute the search\n    agentApi.executeSearch(\n      plan,\n      (event) => {\n        if (event.type === 'agent') {\n          // Search results returned\n          const { content } = event.data\n          setChatMessages(prev => prev.map(msg => \n            msg.id === msgId \n              ? 
{ ...msg, content: content || '', searchStatus: 'completed' } \n              : msg\n          ))\n          \n          // Sync to the store\n          if (currentSession) {\n            const updatedMsg = chatMessages.find(m => m.id === msgId)\n            if (updatedMsg) {\n              addMessageToStore({ ...updatedMsg, content: content || '', searchStatus: 'completed' })\n            }\n          }\n        }\n      },\n      (error) => {\n        toast.error(`Search execution failed: ${error.message}`)\n        setIsStreaming(false)\n        setChatMessages(prev => prev.map(msg => \n          msg.id === msgId ? { ...msg, searchStatus: 'pending' } : msg\n        ))\n      },\n      () => {\n        setIsStreaming(false)\n        // Sync messages to the store first, then persist to the backend\n        syncMessages(chatMessagesRef.current)\n        syncToBackend(stockCode)\n      }\n    )\n  }, [stockCode, currentSession, chatMessages, addMessageToStore, syncMessages, syncToBackend])\n\n  // Handle cancelling a search\n  const handleCancelSearch = useCallback((msgId: string) => {\n    setChatMessages(prev => prev.map(msg => \n      msg.id === msgId ? 
{ ...msg, searchStatus: 'cancelled' } : msg\n    ))\n    toast.info(t.stockDetail.searchCancelled)\n  }, [])\n\n  const handleStartDebate = useCallback(() => {\n    // Reset state\n    setDebateResult(null)\n    setStreamingContent({ bull: '', bear: '', manager: '', quick: '' })\n    setStreamPhase('')\n    setActiveAgent(null)\n    setCurrentRound(null)\n    setChatMessages([]) // Reset chat messages\n    currentMessageIdRef.current = null\n    setIsStreaming(true)\n    \n    // Create a new debate session\n    startSession(stockCode, stockName, debateMode)\n    \n    // Cancel any previous stream\n    if (cancelStreamRef.current) {\n      cancelStreamRef.current()\n    }\n    \n    // Start a new streaming debate\n    const cancel = agentApi.runDebateStream(\n      {\n        stock_code: stockCode,\n        stock_name: stockName,\n        mode: debateMode as 'parallel' | 'realtime_debate' | 'quick_analysis',\n        language: lang,\n      },\n      handleSSEEvent,\n      (error) => {\n        toast.error(`Debate failed: ${error.message}`)\n        setIsStreaming(false)\n        updateSessionStatus('interrupted')\n      },\n      () => {\n        // On completion, save the analysis result and sync to the backend\n        console.log('🏁 Debate completed!')\n        console.log('🏁 chatMessagesRef.current:', chatMessagesRef.current.length, 'messages')\n        console.log('🏁 Message roles:', chatMessagesRef.current.map(m => m.role))\n        \n        setIsStreaming(false)\n        updateSessionStatus('completed')\n        // Use the ref to read the latest message list and batch-sync it to the store\n        syncMessages(chatMessagesRef.current)\n        // Then sync to the backend\n        syncToBackend(stockCode)\n      }\n    )\n    \n    cancelStreamRef.current = cancel\n  }, [stockCode, stockName, debateMode, handleSSEEvent, startSession, syncMessages, syncToBackend])\n  \n  // Cancel the stream when the component unmounts\n  useEffect(() => {\n    return () => {\n      if (cancelStreamRef.current) {\n        cancelStreamRef.current()\n      }\n    }\n  }, [])\n\n  // Periodically save streaming content to the store (so a refresh doesn't lose it)\n  useEffect(() => {\n    if (!isStreaming) return\n    \n    const saveInterval = setInterval(() => 
{\n      // Save the current analysis content (parallel/quick modes)\n      if (streamingContent.bull || streamingContent.bear || streamingContent.manager || streamingContent.quick) {\n        saveAnalysisResult({\n          bull: streamingContent.bull || undefined,\n          bear: streamingContent.bear || undefined,\n          manager: streamingContent.manager || undefined,\n          quick: streamingContent.quick || undefined\n        })\n      }\n    }, 3000) // Save every 3 seconds\n    \n    return () => clearInterval(saveInterval)\n  }, [isStreaming, streamingContent, saveAnalysisResult])\n\n  // Realtime debate mode: sync all completed messages to the store\n  useEffect(() => {\n    if (debateMode !== 'realtime_debate' || chatMessages.length === 0 || !currentSession) return\n    \n    // Find messages that are complete but not yet in the store\n    const storeMessageIds = new Set(currentSession.messages.map(m => m.id))\n    const completedMessages = chatMessages.filter(m => \n      !m.isStreaming && // completed\n      (m.content || m.searchPlan) && // has content\n      !storeMessageIds.has(m.id) // not yet in the store\n    )\n    \n    // Add them to the store one by one\n    for (const msg of completedMessages) {\n      addMessageToStore(msg)\n    }\n  }, [chatMessages, debateMode, currentSession, addMessageToStore])\n\n  // Targeted-crawl task status query\n  const { data: crawlStatus, refetch: refetchCrawlStatus } = useQuery({\n    queryKey: ['stock', 'targeted-crawl-status', stockCode],\n    queryFn: () => stockApi.getTargetedCrawlStatus(stockCode),\n    enabled: crawlTask.status === 'running' || crawlTask.status === 'pending',\n    refetchInterval: (crawlTask.status === 'running' || crawlTask.status === 'pending') ? 
2000 : false, // Poll every 2 seconds while pending/running\n    staleTime: 0,\n  })\n\n  // Watch for crawl-status changes\n  useEffect(() => {\n    // Only act when we have a status and the current task is in progress\n    if (crawlStatus && (crawlTask.status === 'running' || crawlTask.status === 'pending')) {\n      // Important: make sure task_id matches, so we never consume a stale task's status\n      const isMatchingTask = !crawlTask.taskId || !crawlStatus.task_id || crawlTask.taskId === crawlStatus.task_id\n      \n      if (!isMatchingTask) {\n        console.warn('Task ID mismatch, ignoring status update', { \n          currentTaskId: crawlTask.taskId, \n          statusTaskId: crawlStatus.task_id \n        })\n        return\n      }\n      \n      if (crawlStatus.status === 'completed') {\n        setCrawlTask({ \n          status: 'completed', \n          taskId: crawlStatus.task_id,\n          progress: { current: 100, total: 100, message: t.stockDetail.crawlComplete }\n        })\n        // Force-refresh the news list (bypass cache)\n        queryClient.resetQueries({ queryKey: ['stock', 'news', stockCode] })\n        queryClient.resetQueries({ queryKey: ['stock', 'overview', stockCode] })\n        // Refetch immediately\n        queryClient.refetchQueries({ queryKey: ['stock', 'news', stockCode], type: 'all' })\n        queryClient.refetchQueries({ queryKey: ['stock', 'overview', stockCode], type: 'all' })\n        toast.success(`${t.stockDetail.crawlSuccess} ${crawlStatus.saved_count || 0} ${t.stockDetail.newsItems}`)\n      } else if (crawlStatus.status === 'failed') {\n        setCrawlTask({ \n          status: 'failed', \n          taskId: crawlStatus.task_id,\n          error: crawlStatus.error_message || t.stockDetail.crawlFailed\n        })\n        toast.error(`${t.stockDetail.crawlFailed}: ${crawlStatus.error_message || t.stockDetail.unknownError}`)\n      } else if (crawlStatus.status === 'running' || crawlStatus.status === 'pending') {\n        // Update progress and the real taskId\n        setCrawlTask(prev => ({\n          ...prev,\n          status: crawlStatus.status as CrawlTaskStatus,\n          taskId: crawlStatus.task_id || 
prev.taskId,\n          progress: crawlStatus.progress || prev.progress\n        }))\n      }\n    }\n  }, [crawlStatus, crawlTask.status, crawlTask.taskId, stockCode, queryClient])\n\n  // On page load, check whether a task is already in progress\n  useEffect(() => {\n    const checkExistingTask = async () => {\n      try {\n        const status = await stockApi.getTargetedCrawlStatus(stockCode)\n        // Only restore tasks that are running or pending\n        if (status && (status.status === 'running' || status.status === 'pending')) {\n          setCrawlTask({\n            status: status.status as CrawlTaskStatus,\n            taskId: status.task_id,\n            progress: status.progress\n          })\n        } else {\n          // Any other status (completed/failed/idle) resets to idle\n          setCrawlTask({ status: 'idle' })\n        }\n      } catch {\n        // No task in progress; stay idle\n        setCrawlTask({ status: 'idle' })\n      }\n    }\n    checkExistingTask()\n  }, [stockCode])\n\n  // Targeted-crawl mutation\n  const targetedCrawlMutation = useMutation({\n    mutationFn: () => stockApi.startTargetedCrawl(stockCode, stockName),\n    onSuccess: (data) => {\n      if (data.success) {\n        // Task started; set pending (the backend has already created the task record)\n        setCrawlTask({ \n          status: 'pending', \n          taskId: data.task_id!,  // task_id is guaranteed to exist here\n          progress: { current: 0, total: 100, message: t.stockDetail.taskCreated }\n        })\n        toast.success(t.stockDetail.crawlTaskStarted)\n        // Start polling immediately (no delay needed; the task record already exists)\n        refetchCrawlStatus()\n      } else if (data.task_id) {\n        // A task is already in progress; restore to its state\n        setCrawlTask({ \n          status: 'running', \n          taskId: data.task_id,\n          progress: { current: 0, total: 100, message: t.stockDetail.crawlingInProgress }\n        })\n        toast.info(t.stockDetail.crawlTaskExists)\n        // Fetch the task status immediately\n        refetchCrawlStatus()\n      } else {\n        setCrawlTask({ status: 'failed', error: data.message })\n        toast.error(`Failed to start: ${data.message}`)\n      }\n    },\n    
onError: (error: Error) => {\n      setCrawlTask({ status: 'failed', error: error.message })\n      toast.error(`Failed to start: ${error.message}`)\n    },\n  })\n\n  const handleStartCrawl = () => {\n    // Reset state and clear the previous taskId\n    setCrawlTask({ status: 'pending', taskId: undefined })\n    targetedCrawlMutation.mutate()\n  }\n\n  const handleStopCrawl = async () => {\n    if (window.confirm(t.stockDetail.stopCrawlConfirm)) {\n      try {\n        // Call the backend API to cancel the task\n        const result = await stockApi.cancelTargetedCrawl(stockCode)\n        if (result.success) {\n          setCrawlTask({ status: 'idle' })\n          toast.info(result.message || t.stockDetail.crawlTaskStopped)\n        } else {\n          toast.error(result.message || t.stockDetail.crawlTaskStopFailed)\n        }\n      } catch (error: any) {\n        console.error('Failed to cancel crawl task:', error)\n        // Even if the backend call fails, reset the frontend state\n        setCrawlTask({ status: 'idle' })\n        toast.info(t.stockDetail.crawlTaskStopped)\n      }\n    }\n  }\n\n  // Clear-news mutation\n  const clearNewsMutation = useMutation({\n    mutationFn: () => stockApi.clearStockNews(stockCode),\n    onSuccess: (data) => {\n      if (data.success) {\n        toast.success(`${t.stockDetail.newsCleared} ${data.deleted_count || 0} ${t.stockDetail.newsItems}`)\n        // Force-refresh the news list\n        queryClient.resetQueries({ queryKey: ['stock', 'news', stockCode] })\n        queryClient.resetQueries({ queryKey: ['stock', 'overview', stockCode] })\n        queryClient.refetchQueries({ queryKey: ['stock', 'news', stockCode], type: 'all' })\n        queryClient.refetchQueries({ queryKey: ['stock', 'overview', stockCode], type: 'all' })\n      } else {\n        toast.error(`Clear failed: ${data.message}`)\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`Clear failed: ${error.message}`)\n    },\n  })\n\n  const handleClearNews = () => {\n    if (window.confirm(`${t.stockDetail.clearNewsConfirm}${stockName}${t.stockDetail.clearNewsConfirmEnd}`)) {\n      
clearNewsMutation.mutate()\n    }\n  }\n\n  // Sentiment trend indicator\n  const getTrendIcon = (trend: string) => {\n    switch (trend) {\n      case 'up':\n        return <TrendingUp className=\"w-5 h-5 text-emerald-500\" />\n      case 'down':\n        return <TrendingDown className=\"w-5 h-5 text-rose-500\" />\n      default:\n        return <Minus className=\"w-5 h-5 text-gray-500\" />\n    }\n  }\n\n  const getSentimentColor = (score: number | null) => {\n    if (score === null) return 'gray'\n    if (score > 0.1) return 'emerald'\n    if (score < -0.1) return 'rose'\n    return 'amber'\n  }\n\n  const getSentimentLabel = (score: number | null) => {\n    if (score === null) return t.stockDetail.unknown\n    if (score > 0.3) return t.stockDetail.strongBull\n    if (score > 0.1) return t.stockDetail.positive\n    if (score < -0.3) return t.stockDetail.strongBear\n    if (score < -0.1) return t.stockDetail.negative\n    return t.stockDetail.neutral\n  }\n\n  // Copy content to the clipboard\n  const handleCopyContent = (content: string, label: string) => {\n    navigator.clipboard.writeText(content).then(() => {\n      toast.success(`${label}${t.stockDetail.copy}`)\n    }).catch(() => {\n      toast.error(`${t.stockDetail.copy} failed`)\n    })\n  }\n\n  // Export content to a local file\n  const handleExportToFile = (content: string, filename: string) => {\n    const blob = new Blob([content], { type: 'text/markdown;charset=utf-8' })\n    const url = URL.createObjectURL(blob)\n    const link = document.createElement('a')\n    link.href = url\n    link.download = filename\n    document.body.appendChild(link)\n    link.click()\n    document.body.removeChild(link)\n    URL.revokeObjectURL(url)\n    toast.success(`${t.stockDetail.export} succeeded`)\n  }\n\n  return (\n    <div className=\"p-6 space-y-6 bg-gradient-to-br from-slate-50 to-blue-50 min-h-screen\">\n      {/* Header / title area */}\n      <div className=\"flex items-center justify-between gap-4 flex-wrap\">\n        <div className=\"flex items-center gap-6\">\n        <div>\n          
<div className=\"flex items-center gap-3\">\n            <h1 className=\"text-3xl font-bold tracking-tight text-gray-900\">\n              {stockName}\n            </h1>\n            <Badge variant=\"outline\" className=\"text-base px-3 py-1 bg-white\">\n              {stockCode}\n            </Badge>\n          </div>\n          <p className=\"text-muted-foreground mt-1 flex items-center gap-2\">\n            <Activity className=\"w-4 h-4\" />\n            {t.stockDetail.title}\n          </p>\n        </div>\n        </div>\n        \n        <div className=\"flex items-center gap-3\">\n          {/* History button */}\n          {historySessions.length > 0 && (\n            <Button\n              variant=\"outline\"\n              size=\"sm\"\n              onClick={() => setShowHistorySidebar(true)}\n              className=\"gap-2 hover:bg-indigo-50 border-indigo-200 text-indigo-600\"\n            >\n              <History className=\"w-4 h-4\" />\n              {t.stockDetail.history} ({historySessions.length})\n            </Button>\n          )}\n          {/* Back button */}\n          <Button\n            variant=\"outline\"\n            size=\"sm\"\n            onClick={() => navigate('/stock')}\n            className=\"gap-2 hover:bg-gray-100\"\n          >\n            <ArrowLeft className=\"w-4 h-4\" />\n            {t.stockDetail.backToSearch}\n          </Button>\n        </div>\n      </div>\n\n      {/* Knowledge graph card */}\n      {showKnowledgeGraph && knowledgeGraph && knowledgeGraph.graph_exists && (\n        <Card className=\"bg-gradient-to-r from-purple-50 to-blue-50 border-purple-200\">\n          <CardHeader>\n            <div className=\"flex items-start justify-between\">\n              <div>\n                <CardTitle className=\"flex items-center gap-2 text-purple-800\">\n                  <Network className=\"w-5 h-5 text-purple-600\" />\n                  {t.stockDetail.knowledgeGraph}\n                </CardTitle>\n                <CardDescription className=\"mt-1.5\">\n                  {t.stockDetail.knowledgeGraphDesc}\n                </CardDescription>\n              </div>\n              <Button\n                variant=\"ghost\"\n                size=\"sm\"\n                onClick={() => refetchKG()}\n                className=\"h-8 px-2\"\n                title=\"Refresh graph\"\n              >\n                <RefreshCw className={`w-3.5 h-3.5 ${kgLoading ? 'animate-spin' : ''}`} />\n              </Button>\n            </div>\n          </CardHeader>\n          <CardContent className=\"space-y-3\">\n            {/* Name variants */}\n            {knowledgeGraph.name_variants && knowledgeGraph.name_variants.length > 0 && (\n              <div>\n                <p className=\"text-xs text-gray-500 mb-1\">{t.stockDetail.nameVariants}</p>\n                <div className=\"flex flex-wrap gap-1\">\n                  {knowledgeGraph.name_variants.map((variant, idx) => (\n                    <Badge key={idx} variant=\"outline\" className=\"text-xs bg-white\">\n                      {variant}\n                    </Badge>\n                  ))}\n                </div>\n              </div>\n            )}\n            \n            {/* Business lines */}\n            {knowledgeGraph.businesses && knowledgeGraph.businesses.length > 0 && (\n              <div>\n                <p className=\"text-xs text-gray-500 mb-1\">{t.stockDetail.mainBusiness}</p>\n                <div className=\"flex flex-wrap gap-1\">\n                  {knowledgeGraph.businesses\n                    .filter(b => b.status === 'active')\n                    .slice(0, 5)\n                    .map((business, idx) => (\n                      <Badge \n                        key={idx} \n                        className={`text-xs ${\n                          business.type === 'new' \n                            ? 
'bg-emerald-100 text-emerald-700' \n                            : 'bg-blue-100 text-blue-700'\n                        }`}\n                        title={business.description || business.name}\n                      >\n                        {business.type === 'new' && '🆕 '}\n                        {business.name}\n                      </Badge>\n                    ))}\n                </div>\n              </div>\n            )}\n            \n            {/* Related concepts */}\n            {knowledgeGraph.concepts && knowledgeGraph.concepts.length > 0 && (\n              <div>\n                <p className=\"text-xs text-gray-500 mb-1\">{t.stockDetail.relatedConcepts}</p>\n                <div className=\"flex flex-wrap gap-1\">\n                  {knowledgeGraph.concepts.slice(0, 6).map((concept, idx) => (\n                    <Badge key={idx} className=\"text-xs bg-purple-100 text-purple-700\">\n                      {concept}\n                    </Badge>\n                  ))}\n                </div>\n              </div>\n            )}\n            \n            {/* Search strategy */}\n            {knowledgeGraph.search_queries && knowledgeGraph.search_queries.length > 0 && (\n              <div>\n                <p className=\"text-xs text-gray-500 mb-1\">{t.stockDetail.concurrentQueries}（{knowledgeGraph.search_queries.length}{t.stockDetail.queries}）</p>\n                <div className=\"text-xs text-gray-600 bg-white rounded p-2 max-h-20 overflow-y-auto\">\n                  {knowledgeGraph.search_queries.slice(0, 3).map((query, idx) => (\n                    <div key={idx} className=\"truncate\">• {query}</div>\n                  ))}\n                  {knowledgeGraph.search_queries.length > 3 && (\n                    <div className=\"text-gray-400\">... 
{knowledgeGraph.search_queries.length - 3} more</div>\n                  )}\n                </div>\n              </div>\n            )}\n          </CardContent>\n        </Card>\n      )}\n\n      {/* Overview cards */}\n      <div className=\"grid grid-cols-1 md:grid-cols-4 gap-4\">\n        <Card className=\"bg-white/80 backdrop-blur-sm border-blue-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.stockDetail.relatedNews}</p>\n                <p className=\"text-2xl font-bold text-blue-600\">\n                  {overview?.total_news || 0}\n                </p>\n              </div>\n              <Newspaper className=\"w-8 h-8 text-blue-500/50\" />\n            </div>\n            <p className=\"text-xs text-muted-foreground mt-2\">\n              {t.stockDetail.analyzed} {overview?.analyzed_news || 0} {t.stockDetail.items}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-emerald-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.stockDetail.overallSentiment}</p>\n                <p className={`text-2xl font-bold text-${getSentimentColor(overview?.avg_sentiment ?? null)}-600`}>\n                  {overview?.avg_sentiment != null \n                    ? (overview.avg_sentiment > 0 ? 
'+' : '') + overview.avg_sentiment.toFixed(2)\n                    : '--'}\n                </p>\n              </div>\n              <BarChart3 className={`w-8 h-8 text-${getSentimentColor(overview?.avg_sentiment ?? null)}-500/50`} />\n            </div>\n            <p className=\"text-xs text-muted-foreground mt-2\">\n              {getSentimentLabel(overview?.avg_sentiment ?? null)}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-purple-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.stockDetail.recent7d}</p>\n                <p className={`text-2xl font-bold text-${getSentimentColor(overview?.recent_sentiment ?? null)}-600`}>\n                  {overview?.recent_sentiment != null\n                    ? (overview.recent_sentiment > 0 ? '+' : '') + overview.recent_sentiment.toFixed(2)\n                    : '--'}\n                </p>\n              </div>\n              {getTrendIcon(overview?.sentiment_trend || 'stable')}\n            </div>\n            <p className=\"text-xs text-muted-foreground mt-2 flex items-center gap-1\">\n              {t.stockDetail.trend}：\n              {overview?.sentiment_trend === 'up' && <span className=\"text-emerald-600\">{t.stockDetail.up} ↑</span>}\n              {overview?.sentiment_trend === 'down' && <span className=\"text-rose-600\">{t.stockDetail.down} ↓</span>}\n              {overview?.sentiment_trend === 'stable' && <span className=\"text-gray-600\">{t.stockDetail.stable} →</span>}\n            </p>\n          </CardContent>\n        </Card>\n\n        <Card className=\"bg-white/80 backdrop-blur-sm border-orange-100\">\n          <CardContent className=\"pt-6\">\n            <div className=\"flex items-center justify-between\">\n              <div>\n                <p className=\"text-sm text-muted-foreground\">{t.stockDetail.latestNews}</p>\n                <p className=\"text-lg font-medium text-gray-700\">\n                  {overview?.last_news_time \n                    ? formatRelativeTime(overview.last_news_time, t.time)\n                    : t.stockDetail.none}\n                </p>\n              </div>\n              <Calendar className=\"w-8 h-8 text-orange-500/50\" />\n            </div>\n          </CardContent>\n        </Card>\n      </div>\n\n          {/* K-line (candlestick) chart */}\n          <Card className=\"bg-white/90\">\n            <CardHeader className=\"pb-2\">\n              <div className=\"flex items-center justify-between flex-wrap gap-4\">\n                <div>\n              <CardTitle className=\"flex items-center gap-2\">\n                <TrendingUp className=\"w-5 h-5 text-blue-500\" />\n                    {t.stockDetail.kline}\n              </CardTitle>\n              <CardDescription>\n                    {t.stockDetail.dataSource}：akshare · {ADJUST_OPTIONS.find(o => o.value === klineAdjust)?.label || t.stockDetail.qfq} · {t.stockDetail.supportZoom}\n              </CardDescription>\n                </div>\n                {klineData && klineData.length > 0 && (\n                  <div className=\"flex items-center gap-4 text-sm\">\n                    <div className=\"flex items-center gap-1\">\n                      <span className=\"text-gray-500\">{t.stockDetail.close}：</span>\n                      <span className={`font-semibold ${\n                        klineData[klineData.length - 1].change_percent !== undefined &&\n                        klineData[klineData.length - 1].change_percent! >= 0\n                          ? 
'text-rose-600'\n                          : 'text-emerald-600'\n                      }`}>\n                        ¥{klineData[klineData.length - 1].close.toFixed(2)}\n                      </span>\n                    </div>\n                    {klineData[klineData.length - 1].change_percent !== undefined && (\n                      <div className=\"flex items-center gap-1\">\n                        <span className=\"text-gray-500\">{t.stockDetail.change}：</span>\n                        <Badge className={\n                          klineData[klineData.length - 1].change_percent! >= 0\n                            ? 'bg-rose-100 text-rose-700'\n                            : 'bg-emerald-100 text-emerald-700'\n                        }>\n                          {klineData[klineData.length - 1].change_percent! >= 0 ? '+' : ''}\n                          {klineData[klineData.length - 1].change_percent!.toFixed(2)}%\n                        </Badge>\n                      </div>\n                    )}\n                    {klineData[klineData.length - 1].turnover !== undefined && (\n                      <div className=\"flex items-center gap-1\">\n                        <span className=\"text-gray-500\">{t.stockDetail.volume}：</span>\n                        <span className=\"font-medium\">\n                          {(klineData[klineData.length - 1].turnover! / 100000000).toFixed(2)}{t.stockDetail.billion}\n                        </span>\n                      </div>\n                    )}\n                  </div>\n                )}\n              </div>\n              {/* Period and adjustment selectors */}\n              <div className=\"flex items-center gap-1 mt-3 pt-3 border-t border-gray-100 flex-wrap\">\n                <span className=\"text-sm text-gray-500 mr-2\">{t.stockDetail.period}：</span>\n                {PERIOD_OPTIONS.map((option) => (\n                  <Button\n                    key={option.value}\n                    variant={klinePeriod === option.value ? 
'default' : 'ghost'}\n                    size=\"sm\"\n                    onClick={() => setKlinePeriod(option.value)}\n                    className={`h-7 px-3 text-xs ${\n                      klinePeriod === option.value \n                        ? 'bg-blue-600 hover:bg-blue-700' \n                        : 'hover:bg-gray-100'\n                    }`}\n                  >\n                    {option.label}\n                  </Button>\n                ))}\n                \n                {/* Adjustment type selector (daily K-line only) */}\n                {klinePeriod === 'daily' && (\n                  <>\n                    <span className=\"text-gray-300 mx-2\">|</span>\n                    <span className=\"text-sm text-gray-500 mr-2\" title=\"Forward adjustment removes price gaps from dividends and stock splits, keeping the K-line continuous\">\n                      {t.stockDetail.adjust}：\n                    </span>\n                    {ADJUST_OPTIONS.map((option) => (\n                      <Button\n                        key={option.value}\n                        variant={klineAdjust === option.value ? 'default' : 'ghost'}\n                        size=\"sm\"\n                        onClick={() => setKlineAdjust(option.value)}\n                        title={option.tip}\n                        className={`h-7 px-3 text-xs ${\n                          klineAdjust === option.value \n                            ? 
'bg-amber-600 hover:bg-amber-700' \n                            : 'hover:bg-gray-100'\n                        }`}\n                      >\n                        {option.label}\n                        {option.value === 'qfq' && <span className=\"ml-1 text-[10px] opacity-70\">{t.stockDetail.recommendLabel || 'Recommend'}</span>}\n                      </Button>\n                    ))}\n                  </>\n                )}\n                \n                <Button\n                  variant=\"ghost\"\n                  size=\"sm\"\n                  onClick={() => refetchKline()}\n                  disabled={klineLoading}\n                  className=\"h-7 px-2 ml-2\"\n                >\n                  <RefreshCw className={`w-3.5 h-3.5 ${klineLoading ? 'animate-spin' : ''}`} />\n                </Button>\n              </div>\n            </CardHeader>\n            <CardContent>\n              {klineLoading ? (\n                <div className=\"h-[550px] flex items-center justify-center\">\n                  <Loader2 className=\"w-8 h-8 animate-spin text-blue-500\" />\n                </div>\n              ) : klineData && klineData.length > 0 ? 
(\n                <KLineChart\n                  data={klineData}\n                  height={550}\n                  showVolume={true}\n                  showMA={klinePeriod === 'daily'}\n                  showMACD={false}\n                  theme=\"light\"\n                  period={klinePeriod}\n                />\n              ) : (\n                <div className=\"h-[550px] flex flex-col items-center justify-center text-gray-500\">\n                  <BarChart3 className=\"w-12 h-12 opacity-50 mb-3\" />\n                  <p>{t.stockDetail.noKline}</p>\n                  <p className=\"text-sm mt-1\">{t.stockDetail.checkCode}</p>\n                </div>\n              )}\n          </CardContent>\n        </Card>\n\n      {/* Related news */}\n      <Card className=\"bg-white/90\">\n          <CardHeader>\n            <div className=\"flex items-start justify-between\">\n              <div className=\"flex-1\">\n                <div className=\"flex items-center justify-between\">\n                  <div>\n                    <CardTitle className=\"flex items-center gap-2\">\n                      <Newspaper className=\"w-5 h-5 text-blue-500\" />\n                      {t.stockDetail.news}\n                    </CardTitle>\n                    <CardDescription className=\"mt-1.5\">\n                      {t.stockDetail.newsContain} {stockCode} {t.stockDetail.newsTotal} {newsList && `（${t.stockDetail.newsTotal}${newsList.length}${t.stockDetail.items}）`}\n                    </CardDescription>\n                  </div>\n                  {/* Expand/collapse button */}\n                  <Button\n                    variant=\"ghost\"\n                    size=\"sm\"\n                    onClick={() => {\n                      setNewsExpanded(!newsExpanded)\n                      if (newsExpanded) {\n                        // Reset to 12 items when collapsing\n                        setNewsDisplayCount(12)\n                      }\n                    }}\n                    className=\"gap-2\"\n           
       >\n                    <ChevronDown className={`w-4 h-4 transition-transform ${newsExpanded ? '' : 'rotate-180'}`} />\n                    {newsExpanded ? t.stockDetail.fold : t.stockDetail.expand}\n                  </Button>\n                </div>\n              </div>\n              {/* Targeted crawl button group */}\n              <div className=\"flex items-center gap-2\">\n                {/* One-click clear button - shown only when news exists */}\n                {hasHistoryNews && (\n                  <Button\n                    variant=\"ghost\"\n                    size=\"sm\"\n                    onClick={handleClearNews}\n                    disabled={clearNewsMutation.isPending || crawlTask.status === 'running' || crawlTask.status === 'pending'}\n                    className=\"gap-2 text-rose-600 hover:text-rose-700 hover:bg-rose-50\"\n                    title={t.stockDetail.history === '历史' ? '清除该股票的所有新闻' : 'Clear all news for this stock'}\n                  >\n                    {clearNewsMutation.isPending ? (\n                      <>\n                        <Loader2 className=\"w-4 h-4 animate-spin\" />\n                        <span>{t.stockDetail.history === '历史' ? '清除中...' : 'Clearing...'}</span>\n                      </>\n                    ) : (\n                      <>\n                        <Trash2 className=\"w-4 h-4\" />\n                        <span>{t.stockDetail.clearData}</span>\n                      </>\n                    )}\n                  </Button>\n                )}\n                \n                {crawlTask.status === 'completed' && (\n                  <span className=\"flex items-center gap-1 text-xs text-emerald-600\">\n                    <CheckCircle2 className=\"w-3.5 h-3.5\" />\n                    {t.stockDetail.crawlComplete}\n                  </span>\n                )}\n                {crawlTask.status === 'failed' && (\n                  <span className=\"flex items-center gap-1 text-xs text-rose-600\">\n                    <AlertCircle className=\"w-3.5 h-3.5\" />\n                    {t.stockDetail.crawlFailed}\n                  </span>\n                
)}\n                {crawlTask.status === 'running' || crawlTask.status === 'pending' ? (\n                  <>\n                    <Button\n                      variant=\"outline\"\n                      size=\"sm\"\n                      disabled\n                      className=\"gap-2\"\n                    >\n                      <Loader2 className=\"w-4 h-4 animate-spin\" />\n                      <span>{t.stockDetail.crawling}</span>\n                      {crawlTask.progress && (\n                        <span className=\"text-xs text-gray-500\">\n                          {crawlTask.progress.message || `${crawlTask.progress.current}%`}\n                        </span>\n                      )}\n                    </Button>\n                    <Button\n                      variant=\"ghost\"\n                      size=\"sm\"\n                      onClick={handleStopCrawl}\n                      className=\"gap-2 text-rose-600 hover:text-rose-700 hover:bg-rose-50\"\n                    >\n                      <StopCircle className=\"w-4 h-4\" />\n                      <span>{t.stockDetail.stop}</span>\n                    </Button>\n                  </>\n                ) : (\n                  <Button\n                    variant=\"outline\"\n                    size=\"sm\"\n                    onClick={handleStartCrawl}\n                    disabled={targetedCrawlMutation.isPending}\n                    className=\"gap-2\"\n                  >\n                    <Download className=\"w-4 h-4\" />\n                    {hasHistoryNews ? t.stockDetail.updateCrawl : t.stockDetail.targetCrawl}\n                  </Button>\n                )}\n              </div>\n            </div>\n          </CardHeader>\n          <CardContent>\n            {newsLoading ? 
(\n              <div className=\"flex items-center justify-center py-12\">\n                <Loader2 className=\"w-8 h-8 animate-spin text-blue-500\" />\n              </div>\n            ) : newsList && newsList.length > 0 ? (\n              newsExpanded ? (\n                <div className=\"space-y-4\">\n                  {/* Card grid layout */}\n                  <div className=\"grid gap-4 grid-cols-1 md:grid-cols-2 lg:grid-cols-3\">\n                  {displayedNews.map((news) => (\n                    <Card\n                      key={news.id}\n                      className={getNewsCardStyle(news.sentiment_score)}\n                      onClick={() => {\n                        setSelectedNewsId(news.id)\n                        setDrawerOpen(true)\n                      }}\n                    >\n                      <CardHeader className=\"pb-2 flex-shrink-0\">\n                        <CardTitle className=\"text-sm leading-tight font-semibold text-gray-900 line-clamp-2 min-h-[40px]\">\n                          {news.title}\n                        </CardTitle>\n                        <div className=\"flex items-center gap-2 text-xs text-gray-500 mt-1\">\n                          <Calendar className=\"w-3 h-3\" />\n                          <span>{news.publish_time ? 
formatRelativeTime(news.publish_time, t.time) : t.stockDetail.unknown}</span>\n                          <span>•</span>\n                          <span>{news.source}</span>\n                        </div>\n                      </CardHeader>\n                      \n                      <CardContent className=\"flex-1 flex flex-col pb-3 pt-1 overflow-hidden\">\n                        <p \n                          className=\"text-sm text-gray-600 leading-relaxed flex-1\"\n                          style={{\n                            display: '-webkit-box',\n                            WebkitLineClamp: 3,\n                            WebkitBoxOrient: 'vertical',\n                            overflow: 'hidden'\n                          }}\n                        >\n                          {news.content}\n                        </p>\n                        \n                        {/* Footer tag area */}\n                        <div className=\"flex items-center justify-between mt-3 pt-2 border-t border-gray-100\">\n                          <div className=\"flex items-center gap-1.5\">\n                            {news.sentiment_score !== null && (\n                              <Badge \n                                className={`text-xs px-2 py-0.5 ${\n                                  news.sentiment_score > 0.1 ? 'bg-emerald-100 text-emerald-700 border-emerald-200' :\n                                  news.sentiment_score < -0.1 ? 'bg-rose-100 text-rose-700 border-rose-200' :\n                                  'bg-amber-100 text-amber-700 border-amber-200'\n                                }`}\n                              >\n                                {news.sentiment_score > 0.1 ? `📈 ${t.stockDetail.positive}` : \n                                 news.sentiment_score < -0.1 ? 
`📉 ${t.stockDetail.negative}` : `➖ ${t.stockDetail.neutral}`}\n                              </Badge>\n                            )}\n                            {news.has_analysis && (\n                              <Badge variant=\"outline\" className=\"text-xs px-2 py-0.5\">\n                                {t.stockDetail.analyzed}\n                              </Badge>\n                            )}\n                          </div>\n                          {news.sentiment_score !== null && (\n                            <span className=\"text-xs text-gray-400\">\n                              {news.sentiment_score > 0 ? '+' : ''}{news.sentiment_score.toFixed(2)}\n                            </span>\n                          )}\n                        </div>\n                      </CardContent>\n                    </Card>\n                  ))}\n                </div>\n                \n                  {/* Load more button */}\n                  {hasMoreNews && (\n                    <div className=\"text-center pt-4\">\n                      <Button\n                        variant=\"outline\"\n                        onClick={() => setNewsDisplayCount(prev => prev + 12)}\n                        className=\"gap-2 hover:bg-blue-50\"\n                      >\n                        <ChevronDown className=\"w-4 h-4\" />\n                        {t.stockDetail.loadMore} ({t.stockDetail.remaining} {(newsList?.length || 0) - newsDisplayCount} {t.stockDetail.items})\n                      </Button>\n                    </div>\n                  )}\n                  \n                  {/* All-items-shown notice */}\n                  {!hasMoreNews && newsList && newsList.length > 12 && (\n                    <div className=\"text-center pt-4 text-sm text-gray-400\">\n                      {t.stockDetail.showAll} {newsList.length} {t.stockDetail.items}\n                    </div>\n                  )}\n                </div>\n              ) : (\n                <div 
className=\"text-center py-8 text-gray-500\">\n                  <p className=\"text-sm\">{t.stockDetail.newsFolded}</p>\n                </div>\n              )\n            ) : (\n              <div className=\"text-center py-12 text-gray-500\">\n                <Newspaper className=\"w-12 h-12 mx-auto opacity-50 mb-3\" />\n                <p>{t.stockDetail.noRelatedNews}</p>\n                <p className=\"text-sm mt-1\">{t.stockDetail.clickCrawl}</p>\n                </div>\n              )}\n            </CardContent>\n          </Card>\n\n          {/* Sentiment trend chart */}\n          <Card className=\"bg-white/90\">\n            <CardHeader>\n              <CardTitle className=\"flex items-center gap-2\">\n                <MessageSquare className=\"w-5 h-5 text-purple-500\" />\n                {t.stockDetail.sentimentTrend}\n              </CardTitle>\n              <CardDescription>\n                {t.stockDetail.sentimentDesc}\n              </CardDescription>\n            </CardHeader>\n            <CardContent>\n              {trendLoading ? (\n                <div className=\"h-64 flex items-center justify-center\">\n                  <Loader2 className=\"w-8 h-8 animate-spin text-purple-500\" />\n                </div>\n              ) : sentimentTrend && sentimentTrend.length > 0 ? 
(\n                <ResponsiveContainer width=\"100%\" height={300}>\n                  <ComposedChart data={sentimentTrend}>\n                    <CartesianGrid strokeDasharray=\"3 3\" stroke=\"#e5e7eb\" />\n                    <XAxis \n                      dataKey=\"date\" \n                      tick={{ fontSize: 10 }}\n                      tickFormatter={(value) => value.slice(5)}\n                    />\n                    <YAxis \n                      yAxisId=\"left\"\n                      domain={[-1, 1]}\n                      tick={{ fontSize: 10 }}\n                    />\n                    <YAxis \n                      yAxisId=\"right\"\n                      orientation=\"right\"\n                      tick={{ fontSize: 10 }}\n                    />\n                    <Tooltip\n                      contentStyle={{\n                        backgroundColor: 'rgba(255, 255, 255, 0.95)',\n                        borderRadius: '8px',\n                        border: '1px solid #e5e7eb',\n                      }}\n                    />\n                    <Legend />\n                    <Bar \n                      yAxisId=\"right\"\n                      dataKey=\"positive_count\" \n                      stackId=\"a\" \n                      fill=\"#10b981\" \n                      name={t.stockDetail.positive}\n                    />\n                    <Bar \n                      yAxisId=\"right\"\n                      dataKey=\"neutral_count\" \n                      stackId=\"a\" \n                      fill=\"#f59e0b\" \n                      name={t.stockDetail.neutral}\n                    />\n                    <Bar \n                      yAxisId=\"right\"\n                      dataKey=\"negative_count\" \n                      stackId=\"a\" \n                      fill=\"#ef4444\" \n                      name={t.stockDetail.negative}\n                    />\n                    <Line\n                      yAxisId=\"left\"\n       
               type=\"monotone\"\n                      dataKey=\"avg_sentiment\"\n                      stroke=\"#8b5cf6\"\n                      strokeWidth={2}\n                      dot={false}\n                      name={t.stockDetail.avgSentiment}\n                    />\n                  </ComposedChart>\n                </ResponsiveContainer>\n              ) : (\n                <div className=\"h-64 flex items-center justify-center text-gray-500\">\n                  {t.stockDetail.history === '历史' ? '暂无数据' : 'No data'}\n                </div>\n              )}\n            </CardContent>\n          </Card>\n\n      {/* Bull vs Bear debate */}\n        <div className=\"space-y-6\">\n          {/* Debate trigger button */}\n          <Card className=\"bg-gradient-to-r from-emerald-50 to-rose-50 border-none\">\n            <CardContent className=\"py-6\">\n              <div className=\"flex flex-col gap-4\">\n                <div className=\"flex items-center justify-between\">\n                  <div className=\"flex items-center gap-4\">\n                    <div className=\"flex -space-x-2\">\n                      <div className=\"w-12 h-12 rounded-full bg-emerald-500 flex items-center justify-center text-white shadow-lg\">\n                        <ThumbsUp className=\"w-6 h-6\" />\n                      </div>\n                      <div className=\"w-12 h-12 rounded-full bg-rose-500 flex items-center justify-center text-white shadow-lg\">\n                        <ThumbsDown className=\"w-6 h-6\" />\n                      </div>\n                    </div>\n                    <div>\n                      <h3 className=\"font-semibold text-gray-900\">{t.stockDetail.bullBear}</h3>\n                      <p className=\"text-sm text-gray-500\">\n                        {t.stockDetail.bullBearDesc}\n                      </p>\n                    </div>\n                  </div>\n                  <Button\n                    onClick={handleStartDebate}\n                    disabled={isStreaming || 
debateMutation.isPending}\n                    className=\"bg-gradient-to-r from-emerald-500 to-rose-500 hover:from-emerald-600 hover:to-rose-600\"\n                  >\n                    {isStreaming || debateMutation.isPending ? (\n                      <>\n                        <Loader2 className=\"w-4 h-4 mr-2 animate-spin\" />\n                        {t.stockDetail.debating}\n                      </>\n                    ) : (\n                      <>\n                        <Swords className=\"w-4 h-4 mr-2\" />\n                        {t.stockDetail.startDebate}\n                      </>\n                    )}\n                  </Button>\n                </div>\n                {/* Debate mode selector */}\n                <div className=\"flex items-center gap-3 pt-2 border-t border-gray-100\">\n                  <span className=\"text-sm text-gray-500\">{t.stockDetail.analysisMode}:</span>\n                  <DebateModeSelector\n                    value={debateMode}\n                    onChange={setDebateMode}\n                    disabled={debateMutation.isPending}\n                  />\n                </div>\n              </div>\n            </CardContent>\n          </Card>\n\n          {/* Streaming debate in progress - live output */}\n          {isStreaming && (\n            <>\n              {/* Phase indicator - shown only outside chat-room mode */}\n              {debateMode !== 'realtime_debate' && (\n                <div className=\"flex items-center justify-between mb-4\">\n                  <div className=\"flex items-center gap-2\">\n                    <Loader2 className=\"w-4 h-4 animate-spin text-blue-500\" />\n                    <span className=\"text-sm text-blue-600 font-medium\">\n                      {streamPhase === 'start' && (t.stockDetail.history === '历史' ? '正在初始化...' : 'Initializing...')}\n                      {streamPhase === 'data_collection' && (t.stockDetail.history === '历史' ? '📊 数据专员正在搜集资料...' 
: '📊 Data Collector is gathering materials...')}\n                      {streamPhase === 'analyzing' && `🚀 ${t.stockDetail.quickAnalysis || 'Quick Analysis'}...`}\n                      {streamPhase === 'parallel_analysis' && `⚡ Bull/Bear ${t.stockDetail.parallelAnalysis}...`}\n                      {streamPhase === 'debate' && `🎭 ${t.stockDetail.realtimeDebate}...`}\n                      {streamPhase === 'decision' && `⚖️ ${t.stockDetail.managerDecision}...`}\n                      {streamPhase === 'complete' && (t.stockDetail.history === '历史' ? '✅ 分析完成' : '✅ Analysis Complete')}\n                    </span>\n                  </div>\n                </div>\n              )}\n\n              {/* Quick analysis mode - streaming */}\n              {debateMode === 'quick_analysis' && (\n                <Card className=\"bg-gradient-to-r from-blue-50 to-cyan-50 border-none\">\n                  <CardHeader className=\"pb-3\">\n                    <CardTitle className=\"flex items-center gap-2 text-blue-700\">\n                      <div className={`w-10 h-10 rounded-full bg-blue-100 flex items-center justify-center ${activeAgent === 'QuickAnalyst' ? 'animate-pulse ring-2 ring-blue-400' : ''}`}>\n                        <Activity className=\"w-5 h-5 text-blue-600\" />\n                      </div>\n                      🚀 {t.stockDetail.quickAnalysis || 'Quick Analysis'}\n                      {activeAgent === 'QuickAnalyst' && <span className=\"text-xs bg-blue-200 px-2 py-0.5 rounded animate-pulse\">{t.stockDetail.history === '历史' ? '输出中...' : 'Outputting...'}</span>}\n                    </CardTitle>\n                    <CardDescription>\n                      <Bot className=\"w-3 h-3 inline mr-1\" />\n                      QuickAnalyst · {t.stockDetail.quickAnalysis || 'Quick Analysis'}\n                    </CardDescription>\n                  </CardHeader>\n                  <CardContent>\n                    {streamingContent.quick ? 
(\n                      <div className=\"prose prose-sm max-w-none prose-headings:text-blue-800\">\n                        <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                          {streamingContent.quick}\n                        </ReactMarkdown>\n                        {activeAgent === 'QuickAnalyst' && <span className=\"inline-block w-2 h-4 bg-blue-500 animate-pulse ml-1\" />}\n                      </div>\n                    ) : (\n                      <div className=\"flex flex-col items-center justify-center py-12 text-gray-500\">\n                        <Loader2 className=\"w-10 h-10 animate-spin text-blue-500 mb-4\" />\n                        <p className=\"text-sm font-medium\">{t.stockDetail.waitingAnalysis}</p>\n                      </div>\n                    )}\n                  </CardContent>\n                </Card>\n              )}\n\n              {/* Realtime debate mode - chat room UI */}\n              {debateMode === 'realtime_debate' && (\n                <DebateChatRoom\n                  messages={chatMessages}\n                  onSendMessage={handleUserSendMessage}\n                  isDebating={isStreaming}\n                  currentRound={currentRound}\n                  activeAgent={activeAgent}\n                  stockName={stockName}\n                  historySessions={historySessions}\n                  onLoadSession={(sessionId) => {\n                    const session = loadSession(stockCode, sessionId)\n                    if (session) {\n                      setChatMessages(session.messages)\n                      toast.success(t.stockDetail.historySessionLoaded)\n                    }\n                  }}\n                  onClearHistory={() => {\n                    clearStockHistory(stockCode)\n                    toast.success(t.stockDetail.allHistoryCleared)\n                  }}\n                  onConfirmSearch={handleConfirmSearch}\n                  onCancelSearch={handleCancelSearch}\n   
             />\n              )}\n\n              {/* Parallel mode - split columns */}\n              {debateMode === 'parallel' && (\n                <div className=\"grid grid-cols-1 lg:grid-cols-2 gap-6\">\n                  {/* Bull view - streaming */}\n                  <Card className={`bg-white/90 border-l-4 border-l-emerald-500 ${activeAgent === 'BullResearcher' ? 'ring-2 ring-emerald-400' : ''}`}>\n                    <CardHeader className=\"pb-3\">\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-emerald-700\">\n                            <div className={`w-8 h-8 rounded-full bg-emerald-100 flex items-center justify-center ${activeAgent === 'BullResearcher' ? 'animate-pulse' : ''}`}>\n                              <ThumbsUp className=\"w-4 h-4 text-emerald-600\" />\n                            </div>\n                            {t.stockDetail.bullView}\n                            {activeAgent === 'BullResearcher' && <span className=\"text-xs bg-emerald-200 px-2 py-0.5 rounded animate-pulse\">{t.stockDetail.outputting}</span>}\n                          </CardTitle>\n                          <CardDescription>\n                            <Bot className=\"w-3 h-3 inline mr-1\" />\n                            BullResearcher · {t.stockDetail.bullView}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        {streamingContent.bull && (\n                          <div className=\"flex items-center gap-1\">\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleCopyContent(streamingContent.bull, t.stockDetail.bullView)}\n                              className=\"h-8 px-2\"\n                              
title={t.stockDetail.copy}\n                            >\n                              <Copy className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleExportToFile(\n                                streamingContent.bull, \n                                `${stockName}_${t.stockDetail.bullView}_${new Date().toISOString().slice(0,10)}.md`\n                              )}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.export}\n                            >\n                              <FileDown className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={handleStartDebate}\n                              disabled={isStreaming}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.regenerate}\n                            >\n                              <RefreshCw className=\"w-3.5 h-3.5\" />\n                            </Button>\n                          </div>\n                        )}\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      {streamingContent.bull ? 
(\n                        <div className=\"prose prose-sm max-w-none prose-headings:text-emerald-800 max-h-96 overflow-y-auto\">\n                          <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                            {streamingContent.bull}\n                          </ReactMarkdown>\n                          {activeAgent === 'BullResearcher' && <span className=\"inline-block w-2 h-4 bg-emerald-500 animate-pulse ml-1\" />}\n                        </div>\n                      ) : (\n                        <div className=\"flex flex-col items-center justify-center py-12 text-gray-500\">\n                          <Loader2 className=\"w-8 h-8 animate-spin text-emerald-500 mb-4\" />\n                          <p className=\"text-sm\">{t.stockDetail.waitingAnalysis}</p>\n                        </div>\n                      )}\n                    </CardContent>\n                  </Card>\n\n                  {/* Bear view - streaming */}\n                  <Card className={`bg-white/90 border-l-4 border-l-rose-500 ${activeAgent === 'BearResearcher' ? 'ring-2 ring-rose-400' : ''}`}>\n                    <CardHeader className=\"pb-3\">\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-rose-700\">\n                            <div className={`w-8 h-8 rounded-full bg-rose-100 flex items-center justify-center ${activeAgent === 'BearResearcher' ? 
'animate-pulse' : ''}`}>\n                              <ThumbsDown className=\"w-4 h-4 text-rose-600\" />\n                            </div>\n                            {t.stockDetail.bearView}\n                            {activeAgent === 'BearResearcher' && <span className=\"text-xs bg-rose-200 px-2 py-0.5 rounded animate-pulse\">{t.stockDetail.outputting}</span>}\n                          </CardTitle>\n                          <CardDescription>\n                            <Bot className=\"w-3 h-3 inline mr-1\" />\n                            BearResearcher · {t.stockDetail.bearView}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        {streamingContent.bear && (\n                          <div className=\"flex items-center gap-1\">\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleCopyContent(streamingContent.bear, t.stockDetail.bearView)}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.copy}\n                            >\n                              <Copy className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleExportToFile(\n                                streamingContent.bear, \n                                `${stockName}_${t.stockDetail.bearView}_${new Date().toISOString().slice(0,10)}.md`\n                              )}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.export}\n                            >\n                              <FileDown className=\"w-3.5 h-3.5\" />\n                            </Button>\n             
               <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={handleStartDebate}\n                              disabled={isStreaming}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.regenerate}\n                            >\n                              <RefreshCw className=\"w-3.5 h-3.5\" />\n                            </Button>\n                          </div>\n                        )}\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      {streamingContent.bear ? (\n                        <div className=\"prose prose-sm max-w-none prose-headings:text-rose-800 max-h-96 overflow-y-auto\">\n                          <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                            {streamingContent.bear}\n                          </ReactMarkdown>\n                          {activeAgent === 'BearResearcher' && <span className=\"inline-block w-2 h-4 bg-rose-500 animate-pulse ml-1\" />}\n                        </div>\n                      ) : (\n                        <div className=\"flex flex-col items-center justify-center py-12 text-gray-500\">\n                          <Loader2 className=\"w-8 h-8 animate-spin text-rose-500 mb-4\" />\n                          <p className=\"text-sm\">等待分析...</p>\n                        </div>\n                      )}\n                    </CardContent>\n                  </Card>\n\n                  {/* Investment manager decision - streaming */}\n                  <Card className={`lg:col-span-2 bg-gradient-to-r from-blue-50 to-indigo-50 border-none ${activeAgent === 'InvestmentManager' ? 
'ring-2 ring-indigo-400' : ''}`}>\n                    <CardHeader className=\"pb-3\">\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-indigo-700\">\n                            <div className={`w-10 h-10 rounded-full bg-indigo-100 flex items-center justify-center ${activeAgent === 'InvestmentManager' ? 'animate-pulse' : ''}`}>\n                              <Scale className=\"w-5 h-5 text-indigo-600\" />\n                            </div>\n                            {t.stockDetail.managerDecision}\n                            {activeAgent === 'InvestmentManager' && <span className=\"text-xs bg-indigo-200 px-2 py-0.5 rounded animate-pulse\">{t.stockDetail.deciding}</span>}\n                          </CardTitle>\n                          <CardDescription>\n                            <Bot className=\"w-3 h-3 inline mr-1\" />\n                            InvestmentManager · {t.stockDetail.managerDecision}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        {streamingContent.manager && (\n                          <div className=\"flex items-center gap-1\">\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={() => handleCopyContent(streamingContent.manager, t.stockDetail.managerDecision)}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.copy}\n                            >\n                              <Copy className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              
onClick={() => handleExportToFile(\n                                streamingContent.manager, \n                                `${stockName}_${t.stockDetail.managerDecision}_${new Date().toISOString().slice(0,10)}.md`\n                              )}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.export}\n                            >\n                              <FileDown className=\"w-3.5 h-3.5\" />\n                            </Button>\n                            <Button\n                              variant=\"ghost\"\n                              size=\"sm\"\n                              onClick={handleStartDebate}\n                              disabled={isStreaming}\n                              className=\"h-8 px-2\"\n                              title={t.stockDetail.regenerate}\n                            >\n                              <RefreshCw className=\"w-3.5 h-3.5\" />\n                            </Button>\n                          </div>\n                        )}\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      {streamingContent.manager ? 
(\n                        <div className=\"prose prose-sm max-w-none prose-headings:text-indigo-800\">\n                          <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                            {streamingContent.manager}\n                          </ReactMarkdown>\n                          {activeAgent === 'InvestmentManager' && <span className=\"inline-block w-2 h-4 bg-indigo-500 animate-pulse ml-1\" />}\n                        </div>\n                      ) : (\n                        <div className=\"flex flex-col items-center justify-center py-8 text-gray-500\">\n                          <Loader2 className=\"w-10 h-10 animate-spin text-indigo-500 mb-4\" />\n                          <p className=\"text-sm font-medium\">{t.stockDetail.waitingDecision}</p>\n                        </div>\n                      )}\n                    </CardContent>\n                  </Card>\n                </div>\n              )}\n            </>\n          )}\n\n          {/* Debate results */}\n          {!debateMutation.isPending && debateResult && debateResult.success && (\n            <>\n              {/* Quick analysis results */}\n              {debateResult.mode === 'quick_analysis' && debateResult.quick_analysis && (\n                <Card className=\"bg-gradient-to-br from-blue-50 to-cyan-50 border-blue-200\">\n                  <CardHeader>\n                    <CardTitle className=\"flex items-center gap-2 text-blue-800\">\n                      <div className=\"w-10 h-10 rounded-full bg-blue-100 flex items-center justify-center\">\n                        <Activity className=\"w-5 h-5 text-blue-600\" />\n                      </div>\n                      🚀 {t.stockDetail.quickAnalysis} {t.stockDetail.result}\n                    </CardTitle>\n                    <CardDescription className=\"flex items-center gap-4\">\n                      <span>\n                        <Bot className=\"w-3 h-3 inline mr-1\" />\n                        QuickAnalyst · 
{t.stockDetail.quickAnalysis}\n                      </span>\n                      {debateResult.execution_time && (\n                        <span className=\"text-xs bg-blue-100 px-2 py-0.5 rounded\">\n                          {t.stockDetail.executionTime} {debateResult.execution_time.toFixed(1)}s\n                        </span>\n                      )}\n                    </CardDescription>\n                  </CardHeader>\n                  <CardContent>\n                    <div className=\"prose prose-sm max-w-none prose-headings:text-blue-800 prose-headings:font-semibold\">\n                      <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                        {debateResult.quick_analysis.analysis || t.stockDetail.analysisComplete}\n                      </ReactMarkdown>\n                    </div>\n                  </CardContent>\n                </Card>\n              )}\n\n              {/* Realtime debate results - chat room view */}\n              {debateResult.mode === 'realtime_debate' && chatMessages.length > 0 && (\n                <div className=\"space-y-4\">\n                  <DebateChatRoom\n                    messages={chatMessages}\n                    onSendMessage={handleUserSendMessage}\n                    isDebating={false}\n                    currentRound={null}\n                    activeAgent={null}\n                    stockName={stockName}\n                    historySessions={historySessions}\n                    onLoadSession={(sessionId) => {\n                      const session = loadSession(stockCode, sessionId)\n                      if (session) {\n                        setChatMessages(session.messages)\n                        toast.success(t.stockDetail.historySessionLoaded)\n                      }\n                    }}\n                    onClearHistory={() => {\n                      clearStockHistory(stockCode)\n                      toast.success(t.stockDetail.allHistoryCleared)\n                    }}\n                    
onConfirmSearch={handleConfirmSearch}\n                    onCancelSearch={handleCancelSearch}\n                  />\n                  {/* Investment manager decision summary */}\n                  {debateResult.final_decision && (\n                    <Card className=\"bg-gradient-to-br from-blue-50 to-purple-50 border-blue-200\">\n                      <CardHeader>\n                        <CardTitle className=\"flex items-center gap-2 text-blue-800\">\n                          <div className=\"w-10 h-10 rounded-full bg-blue-100 flex items-center justify-center\">\n                            <Scale className=\"w-5 h-5 text-blue-600\" />\n                          </div>\n                          📊 {t.stockDetail.managerDecision}\n                          {debateResult.final_decision?.rating && (\n                            <Badge \n                              className={`ml-2 ${\n                                debateResult.final_decision.rating === '强烈推荐' || debateResult.final_decision.rating === '推荐' ||\n                                debateResult.final_decision.rating === t.stockDetail.stronglyRec || debateResult.final_decision.rating === t.stockDetail.recommend ||\n                                debateResult.final_decision.rating === 'Strongly Recommend' || debateResult.final_decision.rating === 'Recommend'\n                                  ? 'bg-emerald-500' \n                                  : debateResult.final_decision.rating === '中性' || debateResult.final_decision.rating === 'Neutral'\n                                  ? 
'bg-amber-500'\n                                  : 'bg-rose-500'\n                              }`}\n                            >\n                              {debateResult.final_decision.rating}\n                            </Badge>\n                          )}\n                        </CardTitle>\n                      </CardHeader>\n                    </Card>\n                  )}\n                </div>\n              )}\n\n              {/* Parallel analysis results */}\n              {(debateResult.mode === 'parallel' || !debateResult.mode) && (\n                <div className=\"grid grid-cols-1 lg:grid-cols-2 gap-6\">\n                  {/* Bull view */}\n                  <Card className=\"bg-white/90 border-l-4 border-l-emerald-500\">\n                    <CardHeader className=\"pb-3\">\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-emerald-700\">\n                            <div className=\"w-8 h-8 rounded-full bg-emerald-100 flex items-center justify-center\">\n                              <ThumbsUp className=\"w-4 h-4 text-emerald-600\" />\n                            </div>\n                            {t.stockDetail.bullView}\n                          </CardTitle>\n                          <CardDescription>\n                            <Bot className=\"w-3 h-3 inline mr-1\" />\n                            {debateResult.bull_analysis?.agent_name || 'BullResearcher'} · {t.stockDetail.bullResearcher}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        <div className=\"flex items-center gap-1\">\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleCopyContent(debateResult.bull_analysis?.analysis || 
'', t.stockDetail.bullView)}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.copy}\n                          >\n                            <Copy className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleExportToFile(\n                              debateResult.bull_analysis?.analysis || '', \n                              `${stockName}_${t.stockDetail.bullView}_${new Date().toISOString().slice(0,10)}.md`\n                            )}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.export}\n                          >\n                            <FileDown className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={handleStartDebate}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.regenerate}\n                          >\n                            <RefreshCw className=\"w-3.5 h-3.5\" />\n                          </Button>\n                        </div>\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      <div className=\"prose prose-sm max-w-none prose-headings:text-emerald-800 prose-headings:font-semibold\">\n                        <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                          {debateResult.bull_analysis?.analysis || t.stockDetail.analysisGenerating}\n                        </ReactMarkdown>\n                      </div>\n                    </CardContent>\n                  </Card>\n\n                  {/* Bear view */}\n                  <Card 
className=\"bg-white/90 border-l-4 border-l-rose-500\">\n                    <CardHeader className=\"pb-3\">\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-rose-700\">\n                            <div className=\"w-8 h-8 rounded-full bg-rose-100 flex items-center justify-center\">\n                              <ThumbsDown className=\"w-4 h-4 text-rose-600\" />\n                            </div>\n                            {t.stockDetail.bearView}\n                          </CardTitle>\n                          <CardDescription>\n                            <Bot className=\"w-3 h-3 inline mr-1\" />\n                            {debateResult.bear_analysis?.agent_name || 'BearResearcher'} · {t.stockDetail.bearResearcher}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        <div className=\"flex items-center gap-1\">\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleCopyContent(debateResult.bear_analysis?.analysis || '', t.stockDetail.bearView)}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.copy}\n                          >\n                            <Copy className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleExportToFile(\n                              debateResult.bear_analysis?.analysis || '', \n                              `${stockName}_${t.stockDetail.bearView}_${new Date().toISOString().slice(0,10)}.md`\n                            )}\n           
                 className=\"h-8 px-2\"\n                            title={t.stockDetail.export}\n                          >\n                            <FileDown className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={handleStartDebate}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.regenerate}\n                          >\n                            <RefreshCw className=\"w-3.5 h-3.5\" />\n                          </Button>\n                        </div>\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      <div className=\"prose prose-sm max-w-none prose-headings:text-rose-800 prose-headings:font-semibold\">\n                        <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                          {debateResult.bear_analysis?.analysis || t.stockDetail.analysisGenerating}\n                        </ReactMarkdown>\n                      </div>\n                    </CardContent>\n                  </Card>\n\n                  {/* Final decision */}\n                  <Card className=\"lg:col-span-2 bg-gradient-to-br from-blue-50 to-purple-50 border-blue-200\">\n                    <CardHeader>\n                      <div className=\"flex items-start justify-between gap-2\">\n                        <div className=\"flex-1\">\n                          <CardTitle className=\"flex items-center gap-2 text-blue-800\">\n                            <div className=\"w-10 h-10 rounded-full bg-blue-100 flex items-center justify-center\">\n                              <Scale className=\"w-5 h-5 text-blue-600\" />\n                            </div>\n                            {t.stockDetail.managerDecision}\n                            {debateResult.final_decision?.rating && (\n    
                          <Badge \n                                className={`ml-2 ${\n                                  debateResult.final_decision.rating === '强烈推荐' || debateResult.final_decision.rating === '推荐' ||\n                                  debateResult.final_decision.rating === t.stockDetail.stronglyRec || debateResult.final_decision.rating === t.stockDetail.recommend ||\n                                  debateResult.final_decision.rating === 'Strongly Recommend' || debateResult.final_decision.rating === 'Recommend'\n                                    ? 'bg-emerald-500'\n                                    : debateResult.final_decision.rating === '回避' || debateResult.final_decision.rating === '谨慎' ||\n                                      debateResult.final_decision.rating === t.stockDetail.avoid || debateResult.final_decision.rating === t.stockDetail.caution ||\n                                      debateResult.final_decision.rating === 'Avoid' || debateResult.final_decision.rating === 'Caution'\n                                    ? 
'bg-rose-500'\n                                    : 'bg-amber-500'\n                                }`}\n                              >\n                                {debateResult.final_decision.rating}\n                              </Badge>\n                            )}\n                          </CardTitle>\n                          <CardDescription className=\"flex items-center gap-4\">\n                            <span>\n                              <Bot className=\"w-3 h-3 inline mr-1\" />\n                              {debateResult.final_decision?.agent_name || 'InvestmentManager'} · {t.stockDetail.investmentManager}\n                            </span>\n                            {debateResult.execution_time && (\n                              <span className=\"text-xs bg-blue-100 px-2 py-0.5 rounded\">\n                                {t.stockDetail.executionTime} {debateResult.execution_time.toFixed(1)}s\n                              </span>\n                            )}\n                          </CardDescription>\n                        </div>\n                        {/* Action buttons */}\n                        <div className=\"flex items-center gap-1\">\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleCopyContent(debateResult.final_decision?.decision || '', t.stockDetail.managerDecision)}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.copy}\n                          >\n                            <Copy className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={() => handleExportToFile(\n                              debateResult.final_decision?.decision || '', \n                              
`${stockName}_${t.stockDetail.managerDecision}_${new Date().toISOString().slice(0,10)}.md`\n                            )}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.export}\n                          >\n                            <FileDown className=\"w-3.5 h-3.5\" />\n                          </Button>\n                          <Button\n                            variant=\"ghost\"\n                            size=\"sm\"\n                            onClick={handleStartDebate}\n                            className=\"h-8 px-2\"\n                            title={t.stockDetail.regenerate}\n                          >\n                            <RefreshCw className=\"w-3.5 h-3.5\" />\n                          </Button>\n                        </div>\n                      </div>\n                    </CardHeader>\n                    <CardContent>\n                      <div className=\"prose prose-sm max-w-none prose-headings:text-blue-800 prose-headings:font-semibold\">\n                        <ReactMarkdown remarkPlugins={[remarkGfm]}>\n                          {debateResult.final_decision?.decision || t.stockDetail.decisionGenerating}\n                        </ReactMarkdown>\n                      </div>\n                    </CardContent>\n                  </Card>\n                </div>\n              )}\n            </>\n          )}\n\n          {/* Debate failed */}\n          {debateResult && !debateResult.success && (\n            <Card className=\"bg-rose-50 border-rose-200\">\n              <CardContent className=\"py-6\">\n                <p className=\"text-rose-700\">{t.stockDetail.debateFailed}: {debateResult.error}</p>\n              </CardContent>\n            </Card>\n          )}\n\n          {/* Initial state */}\n          {!debateResult && !debateMutation.isPending && (\n            <Card className=\"bg-gray-50\">\n              <CardContent className=\"py-12 text-center 
text-gray-500\">\n                <Swords className=\"w-16 h-16 mx-auto opacity-50 mb-4\" />\n                <p className=\"text-lg\">{t.stockDetail.clickDebate}</p>\n                <p className=\"text-sm mt-2\">\n                  {t.stockDetail.debateDesc}\n                </p>\n              </CardContent>\n            </Card>\n          )}\n        </div>\n\n      {/* News detail drawer */}\n      <NewsDetailDrawer\n        newsId={selectedNewsId}\n        open={drawerOpen}\n        onOpenChange={(open) => {\n          setDrawerOpen(open)\n          if (!open) {\n            // Delay clearing newsId to avoid flicker during the close animation\n            setTimeout(() => setSelectedNewsId(null), 300)\n          }\n        }}\n      />\n      \n      {/* History sidebar */}\n      <DebateHistorySidebar\n        sessions={historySessions}\n        currentSessionId={currentSession?.id}\n        onLoadSession={(session) => {\n          restoreSessionState(session)\n          setShowHistorySidebar(false)\n          toast.success(`${t.stockDetail.historySessionLoaded || '已加载历史会话'}：${session.mode === 'realtime_debate' ? t.stockDetail.realtimeDebate : session.mode === 'parallel' ? t.stockDetail.parallelAnalysis : (t.stockDetail.quickAnalysis || 'Quick Analysis')}`)\n        }}\n        onDeleteSession={(sessionId) => {\n          deleteSession(stockCode, sessionId)\n          toast.success(t.stockDetail.sessionDeleted)\n        }}\n        onClearHistory={() => {\n          clearStockHistory(stockCode)\n          setDebateResult(null)\n          setStreamingContent({ bull: '', bear: '', manager: '', quick: '' })\n          setChatMessages([])\n          toast.success(t.stockDetail.allHistoryCleared)\n        }}\n        isOpen={showHistorySidebar}\n        onToggle={() => setShowHistorySidebar(!showHistorySidebar)}\n      />\n    </div>\n  )\n}\n"
  },
  {
    "path": "frontend/src/pages/StockSearchPage.tsx",
    "content": "/**\n * Stock search entry page\n * Style inspired by the Manus/ChatGPT conversation entry\n */\nimport { useState, useCallback, useRef, useEffect } from 'react'\nimport { useQuery, useMutation, useQueryClient } from '@tanstack/react-query'\nimport { useNavigate } from 'react-router-dom'\nimport { stockApi } from '@/lib/api-client'\nimport { cn } from '@/lib/utils'\nimport { \n  Search, \n  Loader2, \n  Database, \n  RefreshCw, \n  TrendingUp,\n  Sparkles,\n  ArrowRight,\n  BarChart3\n} from 'lucide-react'\nimport { toast } from 'sonner'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\nexport default function StockSearchPage() {\n  const t = useGlobalI18n()\n  const [keyword, setKeyword] = useState('')\n  const [isOpen, setIsOpen] = useState(false)\n  const [selectedIndex, setSelectedIndex] = useState(-1)\n  const inputRef = useRef<HTMLInputElement>(null)\n  const listRef = useRef<HTMLDivElement>(null)\n  const navigate = useNavigate()\n  const queryClient = useQueryClient()\n\n  // Fetch stock count\n  const { data: stockCount } = useQuery({\n    queryKey: ['stock-count'],\n    queryFn: () => stockApi.getStockCount(),\n    staleTime: 60 * 1000,\n  })\n\n  // Initialize stock data\n  const initMutation = useMutation({\n    mutationFn: () => stockApi.initStockData(),\n    onSuccess: (data) => {\n      if (data.success) {\n        toast.success(`成功导入 ${data.count} 只股票！`)\n        queryClient.invalidateQueries({ queryKey: ['stock-count'] })\n        queryClient.invalidateQueries({ queryKey: ['stock-search'] })\n      } else {\n        toast.error(data.message)\n      }\n    },\n    onError: (error: Error) => {\n      toast.error(`初始化失败: ${error.message}`)\n    },\n  })\n\n  // Search query\n  const { data: searchResults, isLoading } = useQuery({\n    queryKey: ['stock-search', keyword],\n    queryFn: () => stockApi.searchRealtime(keyword, 15),\n    enabled: keyword.length >= 1,\n    staleTime: 30 * 1000,\n  })\n\n  // Handle stock selection\n  const handleSelect = useCallback((stock: { code: string; name: string; full_code: string 
}) => {\n    setKeyword('')\n    setIsOpen(false)\n    setSelectedIndex(-1)\n    navigate(`/stock/${stock.full_code}`)\n  }, [navigate])\n\n  // Keyboard navigation\n  const handleKeyDown = useCallback((e: React.KeyboardEvent) => {\n    if (!searchResults || searchResults.length === 0) return\n\n    switch (e.key) {\n      case 'ArrowDown':\n        e.preventDefault()\n        setSelectedIndex(prev => \n          prev < searchResults.length - 1 ? prev + 1 : 0\n        )\n        break\n      case 'ArrowUp':\n        e.preventDefault()\n        setSelectedIndex(prev => \n          prev > 0 ? prev - 1 : searchResults.length - 1\n        )\n        break\n      case 'Enter':\n        e.preventDefault()\n        if (selectedIndex >= 0 && searchResults[selectedIndex]) {\n          handleSelect(searchResults[selectedIndex])\n        }\n        break\n      case 'Escape':\n        setIsOpen(false)\n        setSelectedIndex(-1)\n        break\n    }\n  }, [searchResults, selectedIndex, handleSelect])\n\n  // Close on outside click\n  useEffect(() => {\n    const handleClickOutside = (e: MouseEvent) => {\n      if (\n        inputRef.current &&\n        !inputRef.current.contains(e.target as Node) &&\n        listRef.current &&\n        !listRef.current.contains(e.target as Node)\n      ) {\n        setIsOpen(false)\n      }\n    }\n\n    document.addEventListener('mousedown', handleClickOutside)\n    return () => document.removeEventListener('mousedown', handleClickOutside)\n  }, [])\n\n  // Scroll to the selected item\n  useEffect(() => {\n    if (selectedIndex >= 0 && listRef.current) {\n      const selectedItem = listRef.current.children[selectedIndex] as HTMLElement\n      if (selectedItem) {\n        selectedItem.scrollIntoView({ block: 'nearest' })\n      }\n    }\n  }, [selectedIndex])\n\n  // Hot stock examples\n  const hotStocks = [\n    { code: '600519', name: '贵州茅台', full_code: 'SH600519' },\n    { code: '000001', name: '平安银行', full_code: 'SZ000001' },\n    { code: '601318', name: '中国平安', full_code: 'SH601318' },\n    { 
code: '000858', name: '五粮液', full_code: 'SZ000858' },\n  ]\n\n  return (\n    <div className=\"min-h-[calc(100vh-120px)] flex flex-col items-center justify-center px-4 bg-gradient-to-br from-slate-50 via-blue-50/30 to-indigo-50/50\">\n      {/* Title area */}\n      <div className=\"text-center mb-10 animate-in fade-in-0 slide-in-from-bottom-4 duration-500\">\n        <div className=\"flex items-center justify-center gap-3 mb-4\">\n          <div className=\"w-12 h-12 rounded-2xl bg-gradient-to-br from-blue-500 to-indigo-600 flex items-center justify-center shadow-lg shadow-blue-500/25\">\n            <BarChart3 className=\"w-6 h-6 text-white\" />\n          </div>\n        </div>\n        <h1 className=\"text-4xl font-bold text-gray-900 tracking-tight mb-3\">\n          {t.stock.title}\n        </h1>\n        <p className=\"text-lg text-gray-500 max-w-md mx-auto\">\n          {t.stock.subtitle}\n        </p>\n      </div>\n\n      {/* Search box area */}\n      <div className=\"w-full max-w-2xl relative animate-in fade-in-0 slide-in-from-bottom-6 duration-500 delay-100\">\n        <div className={cn(\n          'relative bg-white rounded-2xl shadow-xl shadow-gray-200/50',\n          'border border-gray-100',\n          'transition-all duration-300',\n          isOpen && keyword.length >= 1 ? 
'rounded-b-none' : ''\n        )}>\n          {/* Search icon */}\n          <Search className=\"absolute left-5 top-1/2 -translate-y-1/2 w-5 h-5 text-gray-400\" />\n          \n          {/* Input field */}\n          <input\n            ref={inputRef}\n            type=\"text\"\n            value={keyword}\n            onChange={(e) => {\n              setKeyword(e.target.value)\n              setIsOpen(true)\n              setSelectedIndex(-1)\n            }}\n            onFocus={() => setIsOpen(true)}\n            onKeyDown={handleKeyDown}\n            placeholder={t.stock.searchPlaceholder}\n            className={cn(\n              'w-full pl-14 pr-14 py-5 text-lg',\n              'border-none rounded-2xl',\n              'focus:outline-none focus:ring-0',\n              'placeholder:text-gray-400',\n              'transition-all duration-200'\n            )}\n            autoFocus\n          />\n          \n          {/* Right-side icon */}\n          <div className=\"absolute right-4 top-1/2 -translate-y-1/2 flex items-center gap-2\">\n            {isLoading ? (\n              <Loader2 className=\"w-5 h-5 text-gray-400 animate-spin\" />\n            ) : keyword.length > 0 ? 
(\n              <div className=\"w-8 h-8 rounded-lg bg-blue-600 flex items-center justify-center cursor-pointer hover:bg-blue-700 transition-colors\">\n                <ArrowRight className=\"w-4 h-4 text-white\" />\n              </div>\n            ) : (\n              <Sparkles className=\"w-5 h-5 text-gray-300\" />\n            )}\n          </div>\n        </div>\n\n        {/* Search results dropdown */}\n        {isOpen && keyword.length >= 1 && (\n          <div\n            ref={listRef}\n            className={cn(\n              'absolute z-50 w-full',\n              'bg-white rounded-b-2xl shadow-xl shadow-gray-200/50',\n              'border border-t-0 border-gray-100',\n              'max-h-[400px] overflow-y-auto',\n              'animate-in fade-in-0 duration-150'\n            )}\n          >\n            {isLoading ? (\n              <div className=\"flex items-center justify-center py-10 text-gray-500\">\n                <Loader2 className=\"w-5 h-5 animate-spin mr-2\" />\n                {t.stock.searching}\n              </div>\n            ) : searchResults && searchResults.length > 0 ? (\n              <div className=\"py-2\">\n                {searchResults.map((stock, index) => (\n                  <div\n                    key={stock.code}\n                    onClick={() => handleSelect(stock)}\n                    className={cn(\n                      'flex items-center justify-between px-5 py-4 cursor-pointer',\n                      'transition-colors duration-100',\n                      selectedIndex === index\n                        ? 
'bg-blue-50'\n                        : 'hover:bg-gray-50'\n                    )}\n                  >\n                    <div className=\"flex items-center gap-4\">\n                      <div className=\"w-10 h-10 rounded-xl bg-gradient-to-br from-blue-100 to-indigo-100 flex items-center justify-center\">\n                        <TrendingUp className=\"w-5 h-5 text-blue-600\" />\n                      </div>\n                      <div className=\"flex flex-col\">\n                        <span className=\"font-semibold text-gray-900\">\n                          {stock.name}\n                        </span>\n                        <span className=\"text-sm text-gray-500\">\n                          {stock.full_code}\n                        </span>\n                      </div>\n                    </div>\n                    <div className=\"flex items-center gap-2\">\n                      {stock.market && (\n                        <span className=\"text-xs px-2 py-1 bg-gray-100 text-gray-600 rounded-lg\">\n                          {stock.market}\n                        </span>\n                      )}\n                      {stock.industry && (\n                        <span className=\"text-xs text-gray-500\">\n                          {stock.industry}\n                        </span>\n                      )}\n                      <ArrowRight className=\"w-4 h-4 text-gray-300\" />\n                    </div>\n                  </div>\n                ))}\n              </div>\n            ) : (\n              <div className=\"py-10 text-center\">\n                {stockCount && stockCount.count === 0 ? 
(\n                  <div className=\"space-y-4\">\n                    <Database className=\"w-12 h-12 mx-auto text-gray-300\" />\n                    <p className=\"text-gray-500 font-medium\">{t.stock.emptyDb}</p>\n                    <p className=\"text-sm text-gray-400\">{t.stock.initTip}</p>\n                    <button\n                      onClick={(e) => {\n                        e.stopPropagation()\n                        initMutation.mutate()\n                      }}\n                      disabled={initMutation.isPending}\n                      className={cn(\n                        'inline-flex items-center gap-2 px-5 py-2.5 text-sm font-medium rounded-xl',\n                        'bg-blue-600 text-white hover:bg-blue-700',\n                        'disabled:opacity-50 disabled:cursor-not-allowed',\n                        'transition-colors shadow-lg shadow-blue-500/25'\n                      )}\n                    >\n                      {initMutation.isPending ? (\n                        <>\n                          <Loader2 className=\"w-4 h-4 animate-spin\" />\n                          {t.stock.importing}\n                        </>\n                      ) : (\n                        <>\n                          <RefreshCw className=\"w-4 h-4\" />\n                          {t.stock.initBtn}\n                        </>\n                      )}\n                    </button>\n                  </div>\n                ) : (\n                  <div>\n                    <p className=\"text-gray-500 font-medium\">{t.stock.notFound}</p>\n                    <p className=\"text-sm text-gray-400 mt-1\">{t.stock.tryInput}</p>\n                  </div>\n                )}\n              </div>\n            )}\n            \n            {/* Keyboard shortcut hints */}\n            <div className=\"px-5 py-3 border-t border-gray-100 bg-gray-50/50\">\n              <div className=\"flex items-center gap-5 text-xs text-gray-400\">\n                <span 
className=\"flex items-center gap-1\">\n                  <kbd className=\"px-1.5 py-0.5 bg-white border border-gray-200 rounded text-gray-500 shadow-sm\">↑↓</kbd>\n                  <span>{t.stock.nav}</span>\n                </span>\n                <span className=\"flex items-center gap-1\">\n                  <kbd className=\"px-1.5 py-0.5 bg-white border border-gray-200 rounded text-gray-500 shadow-sm\">Enter</kbd>\n                  <span>{t.stock.select}</span>\n                </span>\n                <span className=\"flex items-center gap-1\">\n                  <kbd className=\"px-1.5 py-0.5 bg-white border border-gray-200 rounded text-gray-500 shadow-sm\">Esc</kbd>\n                  <span>{t.stock.close}</span>\n                </span>\n              </div>\n            </div>\n          </div>\n        )}\n      </div>\n\n      {/* Hot stock suggestions */}\n      {!isOpen && (\n        <div className=\"mt-10 animate-in fade-in-0 slide-in-from-bottom-8 duration-500 delay-200\">\n          <p className=\"text-sm text-gray-400 text-center mb-4\">{t.stock.hotStocks}</p>\n          <div className=\"flex flex-wrap justify-center gap-3\">\n            {hotStocks.map((stock) => (\n              <button\n                key={stock.code}\n                onClick={() => navigate(`/stock/${stock.full_code}`)}\n                className={cn(\n                  'flex items-center gap-2 px-4 py-2.5 rounded-xl',\n                  'bg-white border border-gray-100 shadow-sm',\n                  'hover:border-blue-200 hover:bg-blue-50/50 hover:shadow-md',\n                  'transition-all duration-200',\n                  'group'\n                )}\n              >\n                <TrendingUp className=\"w-4 h-4 text-gray-400 group-hover:text-blue-500 transition-colors\" />\n                <span className=\"font-medium text-gray-700 group-hover:text-blue-600 transition-colors\">\n                  {stock.name}\n                </span>\n                <span className=\"text-xs 
text-gray-400\">\n                  {stock.full_code}\n                </span>\n              </button>\n            ))}\n          </div>\n        </div>\n      )}\n\n      {/* Feature overview */}\n      <div className=\"mt-16 grid grid-cols-3 gap-8 max-w-2xl animate-in fade-in-0 slide-in-from-bottom-10 duration-500 delay-300\">\n        <div className=\"text-center\">\n          <div className=\"w-10 h-10 mx-auto rounded-xl bg-blue-100 flex items-center justify-center mb-3\">\n            <BarChart3 className=\"w-5 h-5 text-blue-600\" />\n          </div>\n          <p className=\"text-sm font-medium text-gray-700\">{t.stock.kline}</p>\n          <p className=\"text-xs text-gray-400 mt-1\">{t.stock.klineDesc}</p>\n        </div>\n        <div className=\"text-center\">\n          <div className=\"w-10 h-10 mx-auto rounded-xl bg-purple-100 flex items-center justify-center mb-3\">\n            <Sparkles className=\"w-5 h-5 text-purple-600\" />\n          </div>\n          <p className=\"text-sm font-medium text-gray-700\">{t.stock.aiSentiment}</p>\n          <p className=\"text-xs text-gray-400 mt-1\">{t.stock.aiSentimentDesc}</p>\n        </div>\n        <div className=\"text-center\">\n          <div className=\"w-10 h-10 mx-auto rounded-xl bg-emerald-100 flex items-center justify-center mb-3\">\n            <TrendingUp className=\"w-5 h-5 text-emerald-600\" />\n          </div>\n          <p className=\"text-sm font-medium text-gray-700\">{t.stock.debate}</p>\n          <p className=\"text-xs text-gray-400 mt-1\">{t.stock.debateDesc}</p>\n        </div>\n      </div>\n    </div>\n  )\n}\n\n"
  },
  {
    "path": "frontend/src/pages/TaskManagerPage.tsx",
    "content": "import { useQuery } from '@tanstack/react-query'\nimport { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card'\nimport { Badge } from '@/components/ui/badge'\nimport { taskApi } from '@/lib/api-client'\nimport { formatRelativeTime } from '@/lib/utils'\nimport { useGlobalI18n } from '@/store/useLanguageStore'\n\nexport default function TaskManagerPage() {\n  const t = useGlobalI18n()\n  const { data: tasks, isLoading } = useQuery({\n    queryKey: ['tasks', 'list'],\n    queryFn: () => taskApi.getTaskList({ limit: 20 }),\n    refetchInterval: 5000, // refresh every 5 seconds\n  })\n\n  const getStatusBadge = (status: string) => {\n    const variants = {\n      completed: 'success' as const,\n      running: 'default' as const,\n      pending: 'secondary' as const,\n      failed: 'destructive' as const,\n    }\n    const labels = {\n      completed: `✅ ${t.tasks.completed}`,\n      running: `⏳ ${t.tasks.running}`,\n      pending: `⏸️ ${t.tasks.pending}`,\n      failed: `❌ ${t.tasks.failed}`,\n    }\n    return <Badge variant={variants[status as keyof typeof variants] || 'outline'}>{labels[status as keyof typeof labels] || status}</Badge>\n  }\n\n  return (\n    <div className=\"p-6 space-y-6\">\n      <div>\n        <h1 className=\"text-3xl font-bold tracking-tight\">{t.tasks.title}</h1>\n        <p className=\"text-muted-foreground\">{t.tasks.subtitle}</p>\n      </div>\n\n      <div className=\"space-y-4\">\n        {isLoading ? (\n          <div className=\"text-center py-12 text-gray-500\">{t.tasks.loading}</div>\n        ) : tasks && tasks.length > 0 ? 
(\n          tasks.map((task) => (\n            <Card key={task.id}>\n              <CardHeader>\n                <div className=\"flex items-center justify-between\">\n                  <CardTitle className=\"text-base\">\n                    {t.tasks.task} #{task.id} - {task.source}\n                  </CardTitle>\n                  <div className=\"flex items-center gap-2\">\n                    {getStatusBadge(task.status)}\n                    <Badge variant=\"outline\">{task.mode === 'realtime' ? `⚡ ${t.tasks.realtime}` : `🥶 ${t.tasks.coldStart}`}</Badge>\n                  </div>\n                </div>\n              </CardHeader>\n              <CardContent>\n                <div className=\"grid grid-cols-2 md:grid-cols-4 gap-4 text-sm\">\n                  <div>\n                    <div className=\"text-gray-500\">{t.tasks.crawled}</div>\n                    <div className=\"font-medium\">{task.crawled_count}</div>\n                  </div>\n                  <div>\n                    <div className=\"text-gray-500\">{t.tasks.saved}</div>\n                    <div className=\"font-medium\">{task.saved_count}</div>\n                  </div>\n                  <div>\n                    <div className=\"text-gray-500\">{t.tasks.duration}</div>\n                    <div className=\"font-medium\">\n                      {task.execution_time ? 
`${task.execution_time.toFixed(2)}s` : '-'}\n                    </div>\n                  </div>\n                  <div>\n                    <div className=\"text-gray-500\">{t.tasks.createdAt}</div>\n                    <div className=\"font-medium\">{formatRelativeTime(task.created_at, t.time)}</div>\n                  </div>\n                </div>\n\n                {task.progress && task.progress.percentage && (\n                  <div className=\"mt-4\">\n                    <div className=\"flex justify-between text-xs text-gray-500 mb-1\">\n                      <span>{t.tasks.progress}</span>\n                      <span>{task.progress.percentage}%</span>\n                    </div>\n                    <div className=\"w-full bg-gray-200 rounded-full h-2\">\n                      <div\n                        className=\"bg-blue-600 h-2 rounded-full transition-all\"\n                        style={{ width: `${task.progress.percentage}%` }}\n                      />\n                    </div>\n                  </div>\n                )}\n              </CardContent>\n            </Card>\n          ))\n        ) : (\n          <div className=\"text-center py-12 text-gray-500\">\n            {t.tasks.noTasks}\n          </div>\n        )}\n      </div>\n    </div>\n  )\n}\n\n"
  },
  {
    "path": "frontend/src/store/useDebateStore.ts",
    "content": "import { create } from 'zustand'\nimport { persist } from 'zustand/middleware'\n\n// Chat message types (kept consistent with DebateChatRoom)\nexport type ChatRole = 'user' | 'bull' | 'bear' | 'manager' | 'system' | 'data_collector' | 'search'\n\nexport interface ChatMessage {\n  id: string\n  role: ChatRole\n  content: string\n  timestamp: Date\n  round?: number\n  isStreaming?: boolean\n  mentions?: string[] // @ mentions in the message\n  searchPlan?: any // search plan\n  searchStatus?: 'pending' | 'executing' | 'completed' | 'cancelled'\n}\n\n// Analysis result (stores output of the parallel/quick analysis modes)\nexport interface AnalysisResult {\n  bull?: string\n  bear?: string\n  manager?: string\n  quick?: string\n  finalDecision?: {\n    rating?: string\n    decision?: string\n  }\n  executionTime?: number\n}\n\n// Debate session\nexport interface DebateSession {\n  id: string\n  stockCode: string\n  stockName: string\n  messages: ChatMessage[]\n  mode: string\n  createdAt: Date\n  updatedAt: Date\n  // New: result of the parallel/quick analysis modes\n  analysisResult?: AnalysisResult\n  // New: session status\n  status?: 'in_progress' | 'completed' | 'interrupted'\n}\n\n// Session format for local storage (dates must be serialized)\ninterface SerializedSession {\n  id: string\n  stockCode: string\n  stockName: string\n  messages: Array<Omit<ChatMessage, 'timestamp'> & { timestamp: string }>\n  mode: string\n  createdAt: string\n  updatedAt: string\n}\n\ninterface DebateStore {\n  // Current session\n  currentSession: DebateSession | null\n  // Session history, indexed by stock code\n  sessions: Record<string, DebateSession[]>\n  \n  // Actions\n  startSession: (stockCode: string, stockName: string, mode: string) => string\n  addMessage: (message: ChatMessage) => void\n  updateMessage: (messageId: string, updates: Partial<ChatMessage>) => void\n  clearCurrentSession: () => void\n  \n  // Bulk-sync messages (syncs everything at once when a debate finishes)\n  syncMessages: (messages: ChatMessage[]) => void\n  \n  // New: save the analysis result (parallel/quick analysis modes)\n  saveAnalysisResult: (result: AnalysisResult) => void\n  // New: update session status\n  updateSessionStatus: (status: 'in_progress' | 'completed' | 'interrupted') => void\n  // New: restore a session into page state\n  restoreSession: (sessionId: string) => DebateSession | null\n  // New: get the latest unfinished session\n  getLatestInProgressSession: (stockCode: string) => DebateSession | null\n  \n  // History management\n  loadSession: (stockCode: string, sessionId?: string) => DebateSession | null\n  getStockSessions: (stockCode: string) => DebateSession[]\n  deleteSession: (stockCode: string, sessionId: string) => void\n  clearStockHistory: (stockCode: string) => Promise<void>\n  \n  // Sync to backend (optional)\n  syncToBackend: (stockCode: string) => Promise<void>\n  loadFromBackend: (stockCode: string) => Promise<void>\n}\n\n// Serialize a session (for persistence)\nconst serializeSession = (session: DebateSession): SerializedSession => ({\n  ...session,\n  messages: session.messages.map(m => ({\n    ...m,\n    timestamp: m.timestamp.toISOString()\n  })),\n  createdAt: session.createdAt.toISOString(),\n  updatedAt: session.updatedAt.toISOString()\n})\n\n// Deserialize a session (restore from persistence)\nconst deserializeSession = (session: SerializedSession): DebateSession => ({\n  ...session,\n  messages: session.messages.map(m => ({\n    ...m,\n    timestamp: new Date(m.timestamp)\n  })),\n  createdAt: new Date(session.createdAt),\n  updatedAt: new Date(session.updatedAt)\n})\n\nexport const useDebateStore = create<DebateStore>()(\n  persist(\n    (set, get) => ({\n      currentSession: null,\n      sessions: {},\n      \n      startSession: (stockCode, stockName, mode) => {\n        const sessionId = `debate-${stockCode}-${Date.now()}`\n        const newSession: DebateSession = {\n          id: sessionId,\n          stockCode,\n          stockName,\n          messages: [],\n          mode,\n          createdAt: new Date(),\n          updatedAt: new Date(),\n          status: 'in_progress'\n        }\n        \n        set(state => ({\n          currentSession: newSession,\n          sessions: {\n            ...state.sessions,\n            [stockCode]: [\n              newSession,\n              ...(state.sessions[stockCode] || []).slice(0, 9) // keep at most 10 historical sessions\n    
        ]\n          }\n        }))\n        \n        return sessionId\n      },\n      \n      addMessage: (message) => {\n        set(state => {\n          if (!state.currentSession) return state\n          \n          const updatedSession = {\n            ...state.currentSession,\n            messages: [...state.currentSession.messages, message],\n            updatedAt: new Date()\n          }\n          \n          // Also update the record in sessions\n          const stockCode = updatedSession.stockCode\n          const updatedSessions = (state.sessions[stockCode] || []).map(s =>\n            s.id === updatedSession.id ? updatedSession : s\n          )\n          \n          return {\n            currentSession: updatedSession,\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            }\n          }\n        })\n      },\n      \n      // Bulk-sync messages (replaces all messages of the current session)\n      syncMessages: (messages) => {\n        set(state => {\n          if (!state.currentSession) return state\n          \n          // Filtering: keep any message with content, and force-mark it as non-streaming\n          const validMessages = messages\n            .filter(m => m.content || m.searchPlan || m.role === 'system')\n            .map(m => ({\n              ...m,\n              isStreaming: false // force-mark as finished\n            }))\n          \n          const updatedSession = {\n            ...state.currentSession,\n            messages: validMessages,\n            updatedAt: new Date()\n          }\n          \n          const stockCode = updatedSession.stockCode\n          const updatedSessions = (state.sessions[stockCode] || []).map(s =>\n            s.id === updatedSession.id ? 
updatedSession : s\n          )\n          \n          return {\n            currentSession: updatedSession,\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            }\n          }\n        })\n      },\n      \n      updateMessage: (messageId, updates) => {\n        set(state => {\n          if (!state.currentSession) return state\n          \n          const updatedMessages = state.currentSession.messages.map(m =>\n            m.id === messageId ? { ...m, ...updates } : m\n          )\n          \n          const updatedSession = {\n            ...state.currentSession,\n            messages: updatedMessages,\n            updatedAt: new Date()\n          }\n          \n          const stockCode = updatedSession.stockCode\n          const updatedSessions = (state.sessions[stockCode] || []).map(s =>\n            s.id === updatedSession.id ? updatedSession : s\n          )\n          \n          return {\n            currentSession: updatedSession,\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            }\n          }\n        })\n      },\n      \n      clearCurrentSession: () => {\n        set({ currentSession: null })\n      },\n      \n      // Save the analysis result (parallel/quick analysis modes)\n      saveAnalysisResult: (result) => {\n        set(state => {\n          if (!state.currentSession) return state\n          \n          const updatedSession = {\n            ...state.currentSession,\n            analysisResult: result,\n            updatedAt: new Date()\n          }\n          \n          const stockCode = updatedSession.stockCode\n          const updatedSessions = (state.sessions[stockCode] || []).map(s =>\n            s.id === updatedSession.id ? 
updatedSession : s\n          )\n          \n          return {\n            currentSession: updatedSession,\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            }\n          }\n        })\n      },\n      \n      // Update session status\n      updateSessionStatus: (status) => {\n        set(state => {\n          if (!state.currentSession) return state\n          \n          const updatedSession = {\n            ...state.currentSession,\n            status,\n            updatedAt: new Date()\n          }\n          \n          const stockCode = updatedSession.stockCode\n          const updatedSessions = (state.sessions[stockCode] || []).map(s =>\n            s.id === updatedSession.id ? updatedSession : s\n          )\n          \n          return {\n            currentSession: updatedSession,\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            }\n          }\n        })\n      },\n      \n      // Restore a session\n      restoreSession: (sessionId) => {\n        const state = get()\n        for (const stockCode of Object.keys(state.sessions)) {\n          const session = state.sessions[stockCode].find(s => s.id === sessionId)\n          if (session) {\n            set({ currentSession: session })\n            return session\n          }\n        }\n        return null\n      },\n      \n      // Get the latest unfinished session\n      getLatestInProgressSession: (stockCode) => {\n        const state = get()\n        const stockSessions = state.sessions[stockCode] || []\n        return stockSessions.find(s => s.status === 'in_progress') || null\n      },\n      \n      loadSession: (stockCode, sessionId) => {\n        const state = get()\n        const stockSessions = state.sessions[stockCode] || []\n        \n        if (sessionId) {\n          const session = stockSessions.find(s => s.id === sessionId)\n          if (session) {\n            set({ currentSession: session })\n            
return session\n          }\n        }\n        \n        // If no sessionId is given, return the most recent session\n        if (stockSessions.length > 0) {\n          const latestSession = stockSessions[0]\n          set({ currentSession: latestSession })\n          return latestSession\n        }\n        \n        return null\n      },\n      \n      getStockSessions: (stockCode) => {\n        return get().sessions[stockCode] || []\n      },\n      \n      deleteSession: (stockCode, sessionId) => {\n        set(state => {\n          const updatedSessions = (state.sessions[stockCode] || []).filter(\n            s => s.id !== sessionId\n          )\n          \n          return {\n            sessions: {\n              ...state.sessions,\n              [stockCode]: updatedSessions\n            },\n            // If the deleted session is the current one, clear it\n            currentSession: state.currentSession?.id === sessionId \n              ? null \n              : state.currentSession\n          }\n        })\n      },\n      \n      clearStockHistory: async (stockCode) => {\n        // 1. Clear the local store first\n        set(state => {\n          const { [stockCode]: _, ...rest } = state.sessions\n          return {\n            sessions: rest,\n            currentSession: state.currentSession?.stockCode === stockCode\n              ? null\n              : state.currentSession\n          }\n        })\n        \n        // 2. Also clear the history stored in the backend database\n        try {\n          const response = await fetch(`/api/v1/agents/debate/history/${stockCode}`, {\n            method: 'DELETE'\n          })\n          if (response.ok) {\n            console.log('✅ Backend history cleared')\n          } else {\n            console.error('❌ Failed to clear backend history')\n          }\n        } catch (error) {\n          console.error('❌ Error clearing backend history:', error)\n        }\n      },\n      \n      // Sync to backend\n      syncToBackend: async (stockCode) => {\n        const state = get()\n        const sessions = state.sessions[stockCode]\n        \n        console.log('💾 syncToBackend called for:', stockCode)\n        console.log('💾 Sessions count:', sessions?.length || 0)\n        \n        if (!sessions || sessions.length === 0) {\n          console.warn('⚠️ syncToBackend: no sessions to sync')\n          return\n        }\n        \n        // Log each session's message count\n        sessions.forEach((s, i) => {\n          console.log(`💾 Session ${i}: ${s.id}, messages: ${s.messages.length}`)\n          console.log(`💾 Session ${i} roles:`, s.messages.map(m => m.role))\n        })\n        \n        try {\n          const serialized = sessions.map(serializeSession)\n          console.log('💾 Sending to backend:', JSON.stringify(serialized).slice(0, 500) + '...')\n          \n          const response = await fetch(`/api/v1/agents/debate/history`, {\n            method: 'POST',\n            headers: { 'Content-Type': 'application/json' },\n            body: JSON.stringify({\n              stock_code: stockCode,\n              sessions: serialized\n            })\n          })\n          \n          if (!response.ok) {\n            console.error('Failed to sync debate history to backend')\n          } else {\n            console.log('✅ Synced to backend successfully')\n          }\n        } catch (error) {\n          console.error('Error syncing debate history:', error)\n        }\n      },\n      \n      // Load from backend\n      loadFromBackend: async (stockCode) => {\n        
console.log('📥 loadFromBackend called for:', stockCode)\n        \n        try {\n          const response = await fetch(`/api/v1/agents/debate/history/${stockCode}`)\n          \n          if (response.ok) {\n            const data = await response.json()\n            console.log('📥 Loaded from backend:', data)\n            \n            if (data.sessions && data.sessions.length > 0) {\n              const sessions = data.sessions.map(deserializeSession)\n              console.log('📥 Deserialized sessions:', sessions.length)\n              sessions.forEach((s: any, i: number) => {\n                console.log(`📥 Session ${i}: ${s.id}, messages: ${s.messages.length}`)\n                console.log(`📥 Session ${i} roles:`, s.messages.map((m: any) => m.role))\n              })\n              \n              set(state => ({\n                sessions: {\n                  ...state.sessions,\n                  [stockCode]: sessions\n                }\n              }))\n            } else {\n              console.log('📥 No sessions in response')\n            }\n          } else {\n            console.error('📥 Failed to load:', response.status)\n          }\n        } catch (error) {\n          console.error('Error loading debate history from backend:', error)\n        }\n      }\n    }),\n    {\n      name: 'finnews-debate-history',\n      // Custom serialization\n      serialize: (state) => {\n        const serialized = {\n          ...state,\n          state: {\n            ...state.state,\n            currentSession: state.state.currentSession \n              ? 
serializeSession(state.state.currentSession)\n              : null,\n            sessions: Object.fromEntries(\n              Object.entries(state.state.sessions).map(([k, v]) => [\n                k,\n                (v as DebateSession[]).map(serializeSession)\n              ])\n            )\n          }\n        }\n        return JSON.stringify(serialized)\n      },\n      // Custom deserialization\n      deserialize: (str) => {\n        const parsed = JSON.parse(str)\n        return {\n          ...parsed,\n          state: {\n            ...parsed.state,\n            currentSession: parsed.state.currentSession\n              ? deserializeSession(parsed.state.currentSession)\n              : null,\n            sessions: Object.fromEntries(\n              Object.entries(parsed.state.sessions).map(([k, v]) => [\n                k,\n                (v as SerializedSession[]).map(deserializeSession)\n              ])\n            )\n          }\n        }\n      }\n    }\n  )\n)\n\n"
  },
  {
    "path": "frontend/src/store/useLanguageStore.ts",
    "content": "/**\n * Global language state management\n */\n\nimport { create } from 'zustand';\nimport { persist } from 'zustand/middleware';\n\nexport type Lang = 'zh' | 'en';\n\ninterface LanguageState {\n  lang: Lang;\n  setLang: (lang: Lang) => void;\n  toggleLang: () => void;\n}\n\nexport const useLanguageStore = create<LanguageState>()(\n  persist(\n    (set, get) => ({\n      lang: 'zh',\n      setLang: (lang) => set({ lang }),\n      toggleLang: () => set({ lang: get().lang === 'zh' ? 'en' : 'zh' }),\n    }),\n    {\n      name: 'finnews-language',\n    }\n  )\n);\n\n// Global i18n strings\nexport const globalI18n = {\n  zh: {\n    nav: {\n      home: '首页',\n      news: '新闻流',\n      stock: '个股分析',\n      alphaMining: 'Alpha因子挖掘',\n      agents: '智能体监控',\n      tasks: '任务管理',\n    },\n    header: {\n      title: 'FinnewsHunter',\n      poweredBy: 'Powered by',\n    },\n    dashboard: {\n      title: '仪表盘',\n      subtitle: '金融新闻智能分析平台 - Powered by AgenticX',\n      totalNews: '总新闻数',\n      savedToDb: '已保存到数据库',\n      totalTasks: '总任务数',\n      recentCompleted: '最近完成',\n      units: '个',\n      crawlRate: '爬取成功率',\n      liveMonitor: '实时监控',\n      running: '运行中',\n      autoInterval: '每1分钟自动爬取',\n      newsStats: '新闻来源统计',\n      newsStatsDesc: '各新闻源的内容数量分布',\n      latestNews: '最新新闻',\n      latestNewsDesc: '最近爬取的新闻动态',\n      allSources: '全部来源',\n      noNews: '暂无新闻数据，请先爬取新闻',\n      noNewsFrom: '暂无来自该来源的新闻',\n    },\n    news: {\n      search: '搜索新闻、股票代码...',\n      all: '全部',\n      pending: '待分析',\n      positive: '利好',\n      negative: '利空',\n      neutral: '中性',\n      items: '条新闻',\n      source: '来源',\n      analyzing: '分析中...',\n      reanalyze: '重新分析',\n      analyze: '分析',\n      analysisFailed: '分析失败',\n      crawling: '正在爬取中，请稍候...',\n      refreshNow: '立即刷新',\n      crawlingProgress: '爬取中...(约2分钟)',\n      collapse: '收起',\n      expandMore: '展开更多',\n      stocks: '只股票',\n      noNews: '暂无新闻',\n      noNewsFound: '没有找到与',\n      
relatedNews: '相关的新闻',\n      tryOtherKeywords: '试试其他关键词，如股票代码或公司名称',\n      pleaseCrawl: '请先爬取新闻',\n      selectedItems: '已选择 {count} 项',\n      cancelSelection: '取消选择',\n      deleteNews: '删除新闻',\n      deleteSelected: '删除选中',\n      confirmDelete: '确定要删除选中的 {count} 条新闻吗？此操作不可恢复。',\n      selectAll: '全选',\n      deselectAll: '取消全选',\n      analyzeAll: '全部分析',\n      reanalyzeAll: '重新分析',\n      analyzingSelected: '正在分析选中的 {count} 条新闻...',\n      analysisComplete: '分析完成！成功 {success} 条，失败 {failed} 条',\n    },\n    stock: {\n      title: '个股智能分析',\n      subtitle: '输入股票代码或名称，开启 AI 驱动的投资洞察',\n      searchPlaceholder: '搜索股票代码或名称...',\n      searching: '搜索中...',\n      notFound: '未找到匹配的股票',\n      tryInput: '尝试输入股票代码或名称',\n      emptyDb: '股票数据库为空',\n      initTip: '点击下方按钮初始化股票数据',\n      initBtn: '初始化股票数据',\n      importing: '正在导入股票数据...',\n      hotStocks: '热门股票',\n      kline: 'K线分析',\n      klineDesc: '多周期行情数据',\n      aiSentiment: 'AI 情感分析',\n      aiSentimentDesc: '新闻舆情智能解读',\n      debate: '多空辩论',\n      debateDesc: 'Bull vs Bear 对决',\n      nav: '导航',\n      select: '选择',\n      close: '关闭',\n    },\n    agents: {\n      title: '智能体监控台',\n      subtitle: '实时查看智能体执行状态、性能指标和思考链',\n      autoRefreshing: '自动刷新中',\n      refresh: '手动刷新',\n      clearLogs: '清空日志',\n      totalExec: '总执行次数',\n      successExec: '成功执行',\n      successRate: '成功率',\n      failedExec: '失败次数',\n      avgTime: '平均耗时',\n      availableAgents: '可用智能体',\n      availableAgentsDesc: '系统中已注册的智能体和工作流',\n      agents: '智能体',\n      workflows: '工作流',\n      active: '活跃',\n      inactive: '未激活',\n      execLogs: '执行日志',\n      execLogsDesc: '实时智能体执行日志和状态追踪',\n      records: '条记录',\n      noLogs: '暂无执行日志',\n      noLogsHint: '执行分析任务或辩论后，日志将在此显示',\n      execTimes: '执行',\n      times: '次',\n      success: '成功',\n      avg: '平均',\n      recentActivity: '最近活动',\n      confirmClearLogs: '确定要清空所有执行日志吗？此操作不可恢复。',\n    },\n    tasks: {\n      title: '任务管理',\n      subtitle: '爬取任务监控和管理',\n      task: '任务',\n    
  completed: '已完成',\n      running: '运行中',\n      pending: '待执行',\n      failed: '失败',\n      realtime: '实时',\n      coldStart: '冷启动',\n      crawled: '爬取数',\n      saved: '保存数',\n      duration: '耗时',\n      createdAt: '创建时间',\n      progress: '进度',\n      noTasks: '暂无任务记录',\n      loading: '加载中...',\n    },\n    common: {\n      loading: '加载中...',\n      noData: '暂无数据',\n      confirm: '确定',\n      cancel: '取消',\n    },\n    time: {\n      justNow: '刚刚',\n      minutesAgo: '分钟前',\n      hoursAgo: '小时前',\n      daysAgo: '天前',\n    },\n    model: {\n      loading: '加载中...',\n      notConfigured: '未配置LLM',\n      selectModel: '选择模型',\n      selectTip: '选择模型 · 兼顾质量与成本',\n      noApiKey: '未配置API Key',\n      current: '当前',\n    },\n    debateRoom: {\n      title: '投资辩论',\n      titlePlaceholder: '多空辩论室',\n      subtitle: '多方 vs 空方 · 投资经理主持',\n      roundPrefix: '第',\n      roundSuffix: '轮',\n      typing: '正在输入...',\n      thinking: '思考中...',\n      noMessages: '尚无消息',\n      clickStartDebate: '点击「开始辩论」启动多空对决',\n      canSpeakDuringDebate: '您也可以在辩论过程中发言提问',\n      debateInProgress: '辩论进行中,输入 @提及智能体...',\n      mentionTip: '提示:使用@多方辩手@空方辩手可以指定角色回答',\n      roundStarted: '轮辩论开始',\n      debateEnded: '辩论结束，投资经理已做出最终决策',\n      debateStarted: '辩论开始，数据专员正在准备资料...',\n      searchPlanConfirm: '搜索计划确认',\n      searchPlanExecuting: '正在搜索中...',\n      searchPlanCompleted: '执行完成',\n      searchPlanCancel: '取消',\n      searchPlanConfirmBtn: '确认执行',\n      estimatedTime: '预计耗时',\n      seconds: '秒',\n    },\n    mentionInput: {\n      placeholder: '输入消息，使用 @ 提及智能体或数据源...',\n      agents: '智能体',\n      sources: '数据源',\n      stocks: '股票',\n    },\n    debateHistory: {\n      history: '历史',\n      noMessages: '尚无消息',\n      messages: '条消息',\n      justNow: '刚刚',\n      minutesAgo: '分钟前',\n      hoursAgo: '小时前',\n      daysAgo: '天前',\n      today: '今天',\n      yesterday: '昨天',\n      thisWeek: '本周',\n      older: '更早',\n      expandHistory: '展开历史记录',\n      continueDebate: '继续辩论',\n  
    delete: '删除',\n      searchPlaceholder: '搜索历史记录...',\n      noMatchingRecords: '未找到匹配的记录',\n      noHistoryYet: '暂无历史记录',\n      tryOtherKeywords: '尝试其他关键词',\n      historyAutoSave: '开始辩论后会自动保存',\n      roleNames: {\n        user: '我',\n        bull: '多方',\n        bear: '空方',\n        manager: '经理',\n        data_collector: '数据专员',\n      },\n    },\n    stockDetail: {\n      title: '个股分析 · 智能体驱动的投资决策',\n      relatedNews: '关联新闻',\n      analyzed: '已分析',\n      items: '条',\n      overallSentiment: '整体情感',\n      recent7d: '近7天情感',\n      unknown: '未知',\n      trend: '趋势',\n      up: '上升',\n      down: '下降',\n      stable: '稳定',\n      latestNews: '最新新闻',\n      none: '暂无',\n      kline: 'K线图 · 真实行情',\n      dataSource: '数据来源',\n      supportZoom: '支持缩放拖拽',\n      close: '收盘',\n      change: '涨跌',\n      volume: '成交额',\n      billion: '亿',\n      period: '周期',\n      adjust: '复权',\n      daily: '日K',\n      dailyK: '日K',\n      min60: '60分',\n      min30: '30分',\n      min15: '15分',\n      min5: '5分',\n      min1: '1分',\n      qfq: '前复权',\n      qfqTip: '消除除权缺口，保持走势连续（推荐）',\n      noAdjust: '不复权',\n      noAdjustTip: '显示真实交易价格，会有除权缺口',\n      hfq: '后复权',\n      hfqTip: '以上市首日为基准，价格可能很高',\n      recommendLabel: 'Recommend',\n      timeLabel: '时间',\n      openLabel: '开',\n      highLabel: '高',\n      lowLabel: '低',\n      closeLabel: '收',\n      volumeLabel: '量',\n      parallelAnalysis: '并行分析',\n      parallelAnalysisDesc: 'Bull/Bear并行分析，投资经理汇总决策',\n      realtimeDebate: '实时辩论',\n      realtimeDebateDesc: '四人实时对话，投资经理主持，多空双方交替发言',\n      quickAnalysis: '快速分析',\n      quickAnalysisDesc: '单一分析师快速给出建议，适合时间紧迫场景',\n      result: '结果',\n      historySessionLoaded: '已加载历史会话',\n      detectIncompleteSession: '检测到有未完成的',\n      session: '会话',\n      messages: '条消息',\n      restore: '是否恢复',\n      analysis: '分析',\n      analysisModeConfig: '分析模式配置',\n      default: '默认',\n      parallelExecution: '并行执行',\n      about2to3min: '约2-3分钟',\n      realtimeDialogue: '实时对话',\n    
  fourAgents: '4位智能体',\n      about5to10min: '约5-10分钟',\n      singleAgent: '单智能体',\n      about1min: '约1分钟',\n      advancedConfig: '高级配置',\n      maxExecutionTime: '最大执行时间',\n      seconds: '秒',\n      maxDebateRounds: '最大辩论回合数',\n      rounds: '轮',\n      managerCanInterrupt: '投资经理可打断辩论',\n      collectDataBeforeDebate: '辩论前搜集数据',\n      executionTime: '耗时',\n      news: '关联新闻',\n      newsContain: '包含',\n      newsTotal: '条',\n      fold: '折叠',\n      expand: '展开',\n      clearData: '清除数据',\n      clearing: '清除中...',\n      crawlComplete: '爬取完成',\n      crawlFailed: '爬取失败',\n      crawling: '爬取中...',\n      stop: '停止',\n      updateCrawl: '更新爬取',\n      targetCrawl: '定向爬取',\n      noRelatedNews: '暂无关联新闻',\n      clickCrawl: '点击「定向爬取」获取该股票的相关新闻',\n      loadMore: '继续扩展',\n      remaining: '还有',\n      showAll: '已显示全部',\n      newsFolded: '新闻已折叠，点击\"展开\"查看',\n      sentimentTrend: '新闻情感趋势',\n      sentimentDesc: '近30天新闻情感分布与平均值',\n      positive: '利好',\n      negative: '利空',\n      neutral: '中性',\n      avgSentiment: '平均情感',\n      bullBear: 'Bull vs Bear 智能体辩论',\n      bullBearDesc: '看多研究员 vs 看空研究员，投资经理综合裁决',\n      startDebate: '开始辩论',\n      debating: '辩论中...',\n      analysisMode: '分析模式',\n      bullView: '看多观点',\n      bearView: '看空观点',\n      managerDecision: '投资经理决策',\n      waitingAnalysis: '等待分析...',\n      waitingDecision: '等待多空分析完成后进行决策...',\n      clickDebate: '点击\"开始辩论\"启动智能体分析',\n      debateDesc: '系统将自动调用 Bull/Bear 研究员进行多角度分析，并由投资经理给出综合决策',\n      backToSearch: '返回搜索',\n      history: '历史',\n      copy: '复制',\n      export: '导出',\n      regenerate: '重新生成',\n      stronglyRec: '强烈推荐',\n      recommend: '推荐',\n      avoid: '回避',\n      caution: '谨慎',\n      strongBull: '强烈利好',\n      strongBear: '强烈利空',\n      noKline: '暂无K线数据',\n      checkCode: '请检查股票代码是否正确',\n      sessionRestored: '已恢复上次会话',\n      debateComplete: '辩论分析完成！',\n      outputting: '输出中...',\n      deciding: '决策中...',\n      analysisComplete: '分析完成',\n      analysisGenerating: 
'分析生成中...',\n      decisionGenerating: '决策生成中...',\n      debateFailed: '辩论分析失败',\n      sessionDeleted: '已删除会话',\n      allHistoryCleared: '已清除所有历史记录',\n      searchCancelled: '已取消搜索任务',\n      crawlTaskStarted: '定向爬取任务已启动',\n      crawlingInProgress: '正在爬取中...',\n      crawlTaskExists: '该股票已有正在进行的爬取任务，正在同步状态...',\n      crawlTaskStopped: '已停止爬取任务',\n      crawlTaskStopFailed: '停止任务失败',\n      newsCleared: '已清除',\n      newsItems: '条新闻',\n      clearNewsConfirm: '确定要清除「',\n      clearNewsConfirmEnd: '」的所有新闻吗？此操作不可恢复！',\n      stopCrawlConfirm: '确定要停止当前的爬取任务吗？',\n      knowledgeGraph: '知识图谱 · 智能检索',\n      knowledgeGraphDesc: '基于多维度关键词并发检索，提升召回率',\n      nameVariants: '名称变体',\n      mainBusiness: '主营业务',\n      relatedConcepts: '关联概念',\n      concurrentQueries: '并发检索查询',\n      bullResearcher: '看多研究员',\n      bearResearcher: '看空研究员',\n      investmentManager: '投资经理',\n      generatingSearchPlan: '正在生成搜索计划...',\n      deleteSessionConfirm: '确定要删除这条记录吗？',\n      clearAllHistoryConfirm: '确定要清除所有历史记录吗？此操作不可恢复！',\n      clearAllRecords: '清除所有记录',\n      crawlSuccess: '定向爬取完成！新增',\n      unknownError: '未知错误',\n      taskCreated: '任务已创建，等待执行...',\n    },\n    alphaMining: {\n      training: {\n        title: 'RL 训练监控',\n        desc: 'Transformer + REINFORCE 算法实时训练进度',\n        ready: '就绪',\n        running: '训练中',\n        completed: '完成',\n        error: '错误',\n        steps: '训练步数',\n        useSentiment: '使用情感特征',\n        stop: '停止',\n        start: '开始训练',\n        progress: '训练进度',\n        bestFactor: '当前最优因子',\n        convergence: '收敛曲线',\n        trainingFailed: '训练失败',\n      },\n      metrics: {\n        noData: '暂无评估数据',\n        hint: '请先评估一个因子表达式',\n        currentFactor: '当前因子',\n        multiDim: '多维度评估',\n        riskMetrics: '风险指标',\n        maxDrawdown: '最大回撤',\n        safe: '安全',\n        danger: '危险',\n        dailyTurnover: '日均换手率',\n        winRate: '胜率',\n        totalReturn: '累计收益',\n        returnsCurve: '收益曲线',\n        returnsDesc: '策略累计收益 
vs 基准',\n        strategy: '策略',\n        benchmark: '基准',\n        metricDesc: '指标说明',\n        sortinoDesc: 'Sortino: 越高越好，>1优秀',\n        sharpeDesc: 'Sharpe: 越高越好，>0.5良好',\n        icDesc: 'IC: 绝对值>0.03有效',\n        maxDDDesc: 'Max DD: <20%安全',\n        excellent: '优秀',\n        good: '良好',\n        average: '一般',\n        poor: '较差',\n        lowTurnover: '低换手',\n      },\n      sentiment: {\n        title: '情感融合效果对比',\n        desc: '对比纯技术因子 vs 情感增强因子的挖掘效果',\n        steps: '训练步数',\n        comparing: '对比中...',\n        start: '开始对比',\n        techOnly: '纯技术因子',\n        techDesc: '个特征（RET, VOL, VOLUME_CHG, TURNOVER）',\n        enhanced: '情感增强因子',\n        enhancedDesc: '个特征（+SENTIMENT, NEWS_COUNT）',\n        bestFactor: '最优因子',\n        none: '无',\n        improvement: '改进幅度',\n        improved: '情感特征提升了因子效果',\n        degraded: '情感特征降低了因子效果',\n        scoreDiff: 'Score 差异',\n        comparison: 'Score 对比',\n        techOnlyBar: '纯技术',\n        enhancedBar: '情感增强',\n        conclusion: '结论：',\n        conclusionPositive: '情感特征（SENTIMENT, NEWS_COUNT）对因子挖掘有正向贡献，建议在实际应用中开启情感融合功能。',\n        conclusionNegative: '本次实验中情感特征未能提升效果，可能原因包括：样本量不足、情感数据噪音、训练步数过少等。建议增加训练步数后重试。',\n        comparingText: '正在进行对比实验...',\n        comparingHint: '分别训练纯技术因子和情感增强因子，每种',\n        stepsText: '步',\n        startHint: '点击\"开始对比\"运行情感融合实验',\n        startDesc: '将分别训练纯技术因子和情感增强因子进行效果对比',\n        comparisonFailed: '对比失败',\n      },\n      agent: {\n        title: 'AgenticX Agent 调用演示',\n        desc: '展示 Agent 如何调用 AlphaMiningTool 进行因子挖掘',\n        success: '成功',\n        failed: '失败',\n        toolParams: 'Tool 参数',\n        stockCode: '股票代码（可选）',\n        stockPlaceholder: '如 SH600519',\n        steps: '训练步数',\n        useSentiment: '使用情感特征',\n        executing: '执行中...',\n        execute: '执行 Agent 调用',\n        inputParams: '输入参数',\n        output: '输出结果',\n        executionTime: '耗时',\n        bestFactor: '最优因子',\n        logs: '执行日志',\n        codeExample: 'Python 调用示例',\n      
  executeFailed: '执行失败',\n        startHint: '配置参数后点击\"执行 Agent 调用\"',\n        startDesc: '将演示 QuantitativeAgent 如何通过 AlphaMiningTool 进行因子挖掘',\n        miningTask: '为 {code} 挖掘量化因子',\n        createAgent: '创建 Agent',\n        registerTool: '注册 Tool',\n        executeMining: '执行因子挖掘',\n      },\n      operators: {\n        all: '全部',\n        availableFeatures: '可用特征',\n        techFeature: '技术特征',\n        sentimentFeature: '情感特征',\n        totalOperators: '共 {count} 个操作符',\n        totalFeatures: '{count} 个特征',\n        params: '参',\n        categoryArithmetic: '算术运算',\n        categoryUnary: '一元运算',\n        categoryTimeseries: '时序运算',\n        categoryConditional: '条件运算',\n        categorySpecial: '特殊运算',\n        add: '加法',\n        sub: '减法',\n        mul: '乘法',\n        div: '除法（安全）',\n        neg: '取负',\n        abs: '绝对值',\n        sign: '符号函数',\n        gate: '条件选择',\n        max: '取最大',\n        min: '取最小',\n        delay1: '延迟1期',\n        delay5: '延迟5期',\n        delta1: '1期差分',\n        delta5: '5期差分',\n        ma5: '5期均线',\n        ma10: '10期均线',\n        std5: '5期标准差',\n        std10: '10期标准差',\n        jump: '跳跃检测',\n        jumpExample: '检测>3σ异常值',\n        decay: '衰减加权',\n        max3: '3期最大',\n      },\n    },\n  },\n  en: {\n    nav: {\n      home: 'Home',\n      news: 'News Feed',\n      stock: 'Stock Analysis',\n      alphaMining: 'Alpha Mining',\n      agents: 'Agent Monitor',\n      tasks: 'Task Manager',\n    },\n    header: {\n      title: 'FinnewsHunter',\n      poweredBy: 'Powered by',\n    },\n    dashboard: {\n      title: 'Dashboard',\n      subtitle: 'Financial News AI Analytics Platform - Powered by AgenticX',\n      totalNews: 'Total News',\n      savedToDb: 'Saved to database',\n      totalTasks: 'Total Tasks',\n      recentCompleted: 'Recently completed',\n      units: '',\n      crawlRate: 'Crawl Success Rate',\n      liveMonitor: 'Live Monitor',\n      running: 'Running',\n      autoInterval: 'Auto crawl every minute',\n      
newsStats: 'News Source Stats',\n      newsStatsDesc: 'Content distribution by news source',\n      latestNews: 'Latest News',\n      latestNewsDesc: 'Recently crawled news',\n      allSources: 'All Sources',\n      noNews: 'No news data, please crawl news first',\n      noNewsFrom: 'No news from this source',\n    },\n    news: {\n      search: 'Search news, stock codes...',\n      all: 'All',\n      pending: 'Pending',\n      positive: 'Positive',\n      negative: 'Negative',\n      neutral: 'Neutral',\n      items: 'items',\n      source: 'Source',\n      analyzing: 'Analyzing...',\n      reanalyze: 'Re-analyze',\n      analyze: 'Analyze',\n      analysisFailed: 'Analysis failed',\n      crawling: 'Crawling in progress, please wait...',\n      refreshNow: 'Refresh Now',\n      crawlingProgress: 'Crawling... (~2 min)',\n      collapse: 'Collapse',\n      expandMore: 'Expand More',\n      stocks: 'stocks',\n      noNews: 'No news',\n      noNewsFound: 'No news found for',\n      relatedNews: '',\n      tryOtherKeywords: 'Try other keywords like stock codes or company names',\n      pleaseCrawl: 'Please crawl news first',\n      selectedItems: 'Selected {count} items',\n      cancelSelection: 'Cancel Selection',\n      deleteNews: 'Delete News',\n      deleteSelected: 'Delete Selected',\n      confirmDelete: 'Are you sure you want to delete {count} selected news? This action cannot be undone.',\n      selectAll: 'Select All',\n      deselectAll: 'Deselect All',\n      analyzeAll: 'Analyze All',\n      reanalyzeAll: 'Re-analyze All',\n      analyzingSelected: 'Analyzing {count} selected news...',\n      analysisComplete: 'Analysis complete! 
{success} succeeded, {failed} failed',\n    },\n    stock: {\n      title: 'Stock Intelligence',\n      subtitle: 'Enter stock code or name for AI-powered investment insights',\n      searchPlaceholder: 'Search stock code or name...',\n      searching: 'Searching...',\n      notFound: 'No matching stocks found',\n      tryInput: 'Try entering stock code or name',\n      emptyDb: 'Stock database is empty',\n      initTip: 'Click below to initialize stock data',\n      initBtn: 'Initialize Stock Data',\n      importing: 'Importing stock data...',\n      hotStocks: 'Popular Stocks',\n      kline: 'K-Line Analysis',\n      klineDesc: 'Multi-period market data',\n      aiSentiment: 'AI Sentiment',\n      aiSentimentDesc: 'News sentiment analysis',\n      debate: 'Bull vs Bear',\n      debateDesc: 'Bull vs Bear debate',\n      nav: 'Navigate',\n      select: 'Select',\n      close: 'Close',\n    },\n    agents: {\n      title: 'Agent Monitor',\n      subtitle: 'Real-time agent execution status, metrics and reasoning chain',\n      autoRefreshing: 'Auto-refreshing',\n      refresh: 'Refresh',\n      clearLogs: 'Clear Logs',\n      totalExec: 'Total Executions',\n      successExec: 'Successful',\n      successRate: 'Success Rate',\n      failedExec: 'Failed',\n      avgTime: 'Avg Time',\n      availableAgents: 'Available Agents',\n      availableAgentsDesc: 'Registered agents and workflows',\n      agents: 'Agents',\n      workflows: 'Workflows',\n      active: 'Active',\n      inactive: 'Inactive',\n      execLogs: 'Execution Logs',\n      execLogsDesc: 'Real-time agent execution logs and status',\n      records: 'records',\n      noLogs: 'No execution logs',\n      noLogsHint: 'Logs will appear here after running analysis or debates',\n      execTimes: 'Executions',\n      times: '',\n      success: 'Success',\n      avg: 'Avg',\n      recentActivity: 'Recent Activity',\n      confirmClearLogs: 'Are you sure you want to clear all execution logs? 
This action cannot be undone.',\n    },\n    tasks: {\n      title: 'Task Manager',\n      subtitle: 'Crawl task monitoring and management',\n      task: 'Task',\n      completed: 'Completed',\n      running: 'Running',\n      pending: 'Pending',\n      failed: 'Failed',\n      realtime: 'Realtime',\n      coldStart: 'Cold Start',\n      crawled: 'Crawled',\n      saved: 'Saved',\n      duration: 'Duration',\n      createdAt: 'Created',\n      progress: 'Progress',\n      noTasks: 'No tasks',\n      loading: 'Loading...',\n    },\n    common: {\n      loading: 'Loading...',\n      noData: 'No data',\n      confirm: 'Confirm',\n      cancel: 'Cancel',\n    },\n    time: {\n      justNow: 'just now',\n      minutesAgo: ' min ago',\n      hoursAgo: ' hours ago',\n      daysAgo: ' days ago',\n    },\n    model: {\n      loading: 'Loading...',\n      notConfigured: 'LLM not configured',\n      selectModel: 'Select Model',\n      selectTip: 'Select Model - Balance quality & cost',\n      noApiKey: 'API Key not configured',\n      current: 'Current',\n    },\n    debateRoom: {\n      title: 'Investment Debate',\n      titlePlaceholder: 'Bull vs Bear Debate Room',\n      subtitle: 'Bull vs Bear · Investment Manager moderates',\n      roundPrefix: 'Round',\n      roundSuffix: '',\n      typing: 'is typing...',\n      thinking: 'Thinking...',\n      noMessages: 'No messages yet',\n      clickStartDebate: 'Click \"Start Debate\" to initiate bull-bear confrontation',\n      canSpeakDuringDebate: 'You can also speak and ask questions during the debate',\n      debateInProgress: 'Debate in progress, enter @ to mention agents...',\n      mentionTip: 'Tip: Use @BullDebater @BearDebater to specify a role for replies',\n      roundStarted: 'round debate started',\n      debateEnded: 'Debate ended, Investment Manager has made final decision',\n      debateStarted: 'Debate started, Data Collector is preparing materials...',\n      searchPlanConfirm: 'Search Plan Confirmation',\n      
searchPlanExecuting: 'Searching...',\n      searchPlanCompleted: 'Execution completed',\n      searchPlanCancel: 'Cancel',\n      searchPlanConfirmBtn: 'Confirm Execution',\n      estimatedTime: 'Estimated time',\n      seconds: 's',\n    },\n    mentionInput: {\n      placeholder: 'Enter message, use @ to mention agents or data sources...',\n      agents: 'Agents',\n      sources: 'Data Sources',\n      stocks: 'Stocks',\n    },\n    debateHistory: {\n      history: 'History',\n      noMessages: 'No messages yet',\n      messages: 'messages',\n      justNow: 'just now',\n      minutesAgo: 'min ago',\n      hoursAgo: 'hours ago',\n      daysAgo: 'days ago',\n      today: 'Today',\n      yesterday: 'Yesterday',\n      thisWeek: 'This Week',\n      older: 'Older',\n      expandHistory: 'Expand history',\n      continueDebate: 'Continue debate',\n      delete: 'Delete',\n      searchPlaceholder: 'Search history...',\n      noMatchingRecords: 'No matching records',\n      noHistoryYet: 'No history yet',\n      tryOtherKeywords: 'Try other keywords',\n      historyAutoSave: 'History will be saved after starting debate',\n      roleNames: {\n        user: 'Me',\n        bull: 'Bull',\n        bear: 'Bear',\n        manager: 'Manager',\n        data_collector: 'Data Collector',\n      },\n    },\n    stockDetail: {\n      title: 'Stock Analysis - Agent-driven Investment Decisions',\n      relatedNews: 'Related News',\n      analyzed: 'Analyzed',\n      items: '',\n      overallSentiment: 'Overall Sentiment',\n      recent7d: '7-Day Sentiment',\n      unknown: 'Unknown',\n      trend: 'Trend',\n      up: 'Rising',\n      down: 'Falling',\n      stable: 'Stable',\n      latestNews: 'Latest News',\n      none: 'None',\n      kline: 'K-Line Chart - Real Market Data',\n      dataSource: 'Data source',\n      supportZoom: 'Supports zoom & drag',\n      close: 'Close',\n      change: 'Change',\n      volume: 'Volume',\n      billion: 'B',\n      period: 'Period',\n      adjust: 
'Adjust',\n      daily: 'Daily',\n      dailyK: 'Daily',\n      min60: '60min',\n      min30: '30min',\n      min15: '15min',\n      min5: '5min',\n      min1: '1min',\n      qfq: 'Forward Adjusted',\n      qfqTip: 'Eliminates ex-dividend gaps, maintains continuity (Recommended)',\n      noAdjust: 'No Adjustment',\n      noAdjustTip: 'Shows actual trading prices, may have ex-dividend gaps',\n      hfq: 'Backward Adjusted',\n      hfqTip: 'Based on IPO date, prices may be very high',\n      recommendLabel: 'Recommend',\n      timeLabel: 'Time',\n      openLabel: 'Open',\n      highLabel: 'High',\n      lowLabel: 'Low',\n      closeLabel: 'Close',\n      volumeLabel: 'Volume',\n      parallelAnalysis: 'Parallel Analysis',\n      parallelAnalysisDesc: 'Bull/Bear parallel analysis, Investment Manager summarizes decision',\n      realtimeDebate: 'Real-time Debate',\n      realtimeDebateDesc: 'Four agents real-time dialogue, Investment Manager moderates, Bull/Bear alternate',\n      quickAnalysis: 'Quick Analysis',\n      quickAnalysisDesc: 'Single analyst quick recommendation, suitable for time-sensitive scenarios',\n      result: 'Result',\n      historySessionLoaded: 'Loaded history session',\n      detectIncompleteSession: 'Detected incomplete',\n      session: 'session',\n      messages: 'messages',\n      restore: 'Restore?',\n      analysis: 'Analysis',\n      analysisModeConfig: 'Analysis Mode Config',\n      default: 'Default',\n      parallelExecution: 'Parallel Execution',\n      about2to3min: '~2-3 min',\n      realtimeDialogue: 'Real-time Dialogue',\n      fourAgents: '4 Agents',\n      about5to10min: '~5-10 min',\n      singleAgent: 'Single Agent',\n      about1min: '~1 min',\n      advancedConfig: 'Advanced Config',\n      maxExecutionTime: 'Max Execution Time',\n      seconds: 's',\n      maxDebateRounds: 'Max Debate Rounds',\n      rounds: 'rounds',\n      managerCanInterrupt: 'Manager Can Interrupt',\n      collectDataBeforeDebate: 'Collect Data Before 
Debate',\n      executionTime: 'Time',\n      news: 'Related News',\n      newsContain: 'Contains',\n      newsTotal: '',\n      fold: 'Collapse',\n      expand: 'Expand',\n      clearData: 'Clear Data',\n      clearing: 'Clearing...',\n      crawlComplete: 'Crawl Complete',\n      crawlFailed: 'Crawl Failed',\n      crawling: 'Crawling...',\n      stop: 'Stop',\n      updateCrawl: 'Update Crawl',\n      targetCrawl: 'Target Crawl',\n      noRelatedNews: 'No related news',\n      clickCrawl: 'Click \"Target Crawl\" to fetch news for this stock',\n      loadMore: 'Load More',\n      remaining: '',\n      showAll: 'Showing all',\n      newsFolded: 'News collapsed, click \"Expand\" to view',\n      sentimentTrend: 'News Sentiment Trend',\n      sentimentDesc: '30-day sentiment distribution and average',\n      positive: 'Positive',\n      negative: 'Negative',\n      neutral: 'Neutral',\n      avgSentiment: 'Avg Sentiment',\n      bullBear: 'Bull vs Bear Agent Debate',\n      bullBearDesc: 'Bull Researcher vs Bear Researcher, Investment Manager decides',\n      startDebate: 'Start Debate',\n      debating: 'Debating...',\n      analysisMode: 'Analysis Mode',\n      bullView: 'Bull View',\n      bearView: 'Bear View',\n      managerDecision: 'Manager Decision',\n      waitingAnalysis: 'Waiting for analysis...',\n      waitingDecision: 'Waiting for bull/bear analysis to complete...',\n      clickDebate: 'Click \"Start Debate\" to begin agent analysis',\n      debateDesc: 'System will call Bull/Bear researchers for multi-angle analysis, with Investment Manager making final decision',\n      backToSearch: 'Back to Search',\n      history: 'History',\n      copy: 'Copy',\n      export: 'Export',\n      regenerate: 'Regenerate',\n      stronglyRec: 'Strongly Recommend',\n      recommend: 'Recommend',\n      avoid: 'Avoid',\n      caution: 'Caution',\n      strongBull: 'Strong Positive',\n      strongBear: 'Strong Negative',\n      noKline: 'No K-line data',\n      
checkCode: 'Please check if the stock code is correct',\n      sessionRestored: 'Session restored',\n      debateComplete: 'Debate analysis complete!',\n      outputting: 'Outputting...',\n      deciding: 'Deciding...',\n      analysisComplete: 'Analysis complete',\n      analysisGenerating: 'Analysis generating...',\n      decisionGenerating: 'Decision generating...',\n      debateFailed: 'Debate analysis failed',\n      sessionDeleted: 'Session deleted',\n      allHistoryCleared: 'All history cleared',\n      searchCancelled: 'Search task cancelled',\n      crawlTaskStarted: 'Targeted crawl task started',\n      crawlingInProgress: 'Crawling in progress...',\n      crawlTaskExists: 'This stock already has a crawl task in progress, syncing status...',\n      crawlTaskStopped: 'Crawl task stopped',\n      crawlTaskStopFailed: 'Failed to stop task',\n      newsCleared: 'Cleared',\n      newsItems: 'news items',\n      clearNewsConfirm: 'Are you sure you want to clear all news for \"',\n      clearNewsConfirmEnd: '\"? This action cannot be undone!',\n      stopCrawlConfirm: 'Are you sure you want to stop the current crawl task?',\n      knowledgeGraph: 'Knowledge Graph · Intelligent Retrieval',\n      knowledgeGraphDesc: 'Concurrent retrieval based on multi-dimensional keywords to improve recall',\n      nameVariants: 'Name Variants',\n      mainBusiness: 'Main Business',\n      relatedConcepts: 'Related Concepts',\n      concurrentQueries: 'Concurrent Retrieval Queries',\n      bullResearcher: 'Bull Researcher',\n      bearResearcher: 'Bear Researcher',\n      investmentManager: 'Investment Manager',\n      generatingSearchPlan: 'Generating search plan...',\n      deleteSessionConfirm: 'Are you sure you want to delete this record?',\n      clearAllHistoryConfirm: 'Are you sure you want to clear all history? This action cannot be undone!',\n      clearAllRecords: 'Clear All Records',\n      crawlSuccess: 'Targeted crawl complete! 
Added',\n      unknownError: 'Unknown error',\n      taskCreated: 'Task created, waiting for execution...',\n    },\n    alphaMining: {\n      training: {\n        title: 'RL Training Monitor',\n        desc: 'Transformer + REINFORCE algorithm real-time training progress',\n        ready: 'Ready',\n        running: 'Training',\n        completed: 'Completed',\n        error: 'Error',\n        steps: 'Training Steps',\n        useSentiment: 'Use Sentiment Features',\n        stop: 'Stop',\n        start: 'Start Training',\n        progress: 'Training Progress',\n        bestFactor: 'Current Best Factor',\n        convergence: 'Convergence Curve',\n        trainingFailed: 'Training failed',\n      },\n      metrics: {\n        noData: 'No evaluation data',\n        hint: 'Please evaluate a factor expression first',\n        currentFactor: 'Current Factor',\n        multiDim: 'Multi-dimensional Evaluation',\n        riskMetrics: 'Risk Metrics',\n        maxDrawdown: 'Max Drawdown',\n        safe: 'Safe',\n        danger: 'Danger',\n        dailyTurnover: 'Daily Turnover',\n        winRate: 'Win Rate',\n        totalReturn: 'Total Return',\n        returnsCurve: 'Returns Curve',\n        returnsDesc: 'Strategy cumulative returns vs benchmark',\n        strategy: 'Strategy',\n        benchmark: 'Benchmark',\n        metricDesc: 'Metric Description',\n        sortinoDesc: 'Sortino: Higher is better, >1 excellent',\n        sharpeDesc: 'Sharpe: Higher is better, >0.5 good',\n        icDesc: 'IC: |value|>0.03 effective',\n        maxDDDesc: 'Max DD: <20% safe',\n        excellent: 'Excellent',\n        good: 'Good',\n        average: 'Average',\n        poor: 'Poor',\n        lowTurnover: 'Low Turnover',\n      },\n      sentiment: {\n        title: 'Sentiment Fusion Comparison',\n        desc: 'Compare pure technical factors vs sentiment-enhanced factors',\n        steps: 'Training Steps',\n        comparing: 'Comparing...',\n        start: 'Start Comparison',\n        
techOnly: 'Pure Technical Factors',\n        techDesc: ' features (RET, VOL, VOLUME_CHG, TURNOVER)',\n        enhanced: 'Sentiment-Enhanced Factors',\n        enhancedDesc: ' features (+SENTIMENT, NEWS_COUNT)',\n        bestFactor: 'Best Factor',\n        none: 'None',\n        improvement: 'Improvement',\n        improved: 'Sentiment features improved factor performance',\n        degraded: 'Sentiment features degraded factor performance',\n        scoreDiff: 'Score Difference',\n        comparison: 'Score Comparison',\n        techOnlyBar: 'Technical Only',\n        enhancedBar: 'With Sentiment',\n        conclusion: 'Conclusion:',\n        conclusionPositive: 'Sentiment features (SENTIMENT, NEWS_COUNT) contribute positively to factor mining. It is recommended to enable sentiment fusion in practical applications.',\n        conclusionNegative: 'In this experiment, sentiment features did not improve performance. Possible reasons include insufficient sample size, sentiment data noise, or too few training steps. It is recommended to increase training steps and retry.',\n        comparingText: 'Comparison experiment in progress...',\n        comparingHint: 'Training pure technical factors and sentiment-enhanced factors separately,',\n        stepsText: ' steps each',\n        startHint: 'Click \"Start Comparison\" to run sentiment fusion experiment',\n        startDesc: 'Will train pure technical factors and sentiment-enhanced factors separately for comparison',\n        comparisonFailed: 'Comparison failed',\n      },\n      agent: {\n        title: 'AgenticX Agent Call Demo',\n        desc: 'Demonstrates how Agent calls AlphaMiningTool for factor mining',\n        success: 'Success',\n        failed: 'Failed',\n        toolParams: 'Tool Parameters',\n        stockCode: 'Stock Code (Optional)',\n        stockPlaceholder: 'e.g. 
SH600519',\n        steps: 'Training Steps',\n        useSentiment: 'Use Sentiment Features',\n        executing: 'Executing...',\n        execute: 'Execute Agent Call',\n        inputParams: 'Input Parameters',\n        output: 'Output Result',\n        executionTime: 'Execution Time',\n        bestFactor: 'Best Factor',\n        logs: 'Execution Logs',\n        codeExample: 'Python Call Example',\n        executeFailed: 'Execution failed',\n        startHint: 'Configure parameters and click \"Execute Agent Call\"',\n        startDesc: 'Will demonstrate how QuantitativeAgent performs factor mining through AlphaMiningTool',\n        miningTask: 'Mine quantitative factors for {code}',\n        createAgent: 'Create Agent',\n        registerTool: 'Register Tool',\n        executeMining: 'Execute factor mining',\n      },\n      operators: {\n        all: 'All',\n        availableFeatures: 'Available Features',\n        techFeature: 'Technical Feature',\n        sentimentFeature: 'Sentiment Feature',\n        totalOperators: '{count} Operators',\n        totalFeatures: '{count} Features',\n        params: ' params',\n        categoryArithmetic: 'Arithmetic',\n        categoryUnary: 'Unary',\n        categoryTimeseries: 'Time Series',\n        categoryConditional: 'Conditional',\n        categorySpecial: 'Special',\n        add: 'Addition',\n        sub: 'Subtraction',\n        mul: 'Multiplication',\n        div: 'Division (Safe)',\n        neg: 'Negate',\n        abs: 'Absolute Value',\n        sign: 'Sign Function',\n        gate: 'Conditional Select',\n        max: 'Maximum',\n        min: 'Minimum',\n        delay1: 'Delay 1 Period',\n        delay5: 'Delay 5 Periods',\n        delta1: '1-Period Difference',\n        delta5: '5-Period Difference',\n        ma5: '5-Period Moving Average',\n        ma10: '10-Period Moving Average',\n        std5: '5-Period Standard Deviation',\n        std10: '10-Period Standard Deviation',\n        jump: 'Jump Detection',\n        
jumpExample: 'Detect >3σ outliers',\n        decay: 'Decay Weighted',\n        max3: '3-Period Maximum',\n      },\n    },\n  },\n};\n\nexport const useGlobalI18n = () => {\n  const { lang } = useLanguageStore();\n  return globalI18n[lang];\n};\n"
  },
  {
    "path": "frontend/src/store/useNewsStore.ts",
    "content": "import { create } from 'zustand'\nimport type { News } from '@/types/api'\n\ninterface NewsStore {\n  newsList: News[]\n  selectedNews: News | null\n  setNewsList: (news: News[]) => void\n  setSelectedNews: (news: News | null) => void\n  updateNews: (newsId: number, updates: Partial<News>) => void\n}\n\nexport const useNewsStore = create<NewsStore>((set) => ({\n  newsList: [],\n  selectedNews: null,\n  \n  setNewsList: (news) => set({ newsList: news }),\n  \n  setSelectedNews: (news) => set({ selectedNews: news }),\n  \n  updateNews: (newsId, updates) =>\n    set((state) => ({\n      newsList: state.newsList.map((news) =>\n        news.id === newsId ? { ...news, ...updates } : news\n      ),\n    })),\n}))\n\n"
  },
  {
    "path": "frontend/src/store/useTaskStore.ts",
    "content": "import { create } from 'zustand'\nimport type { CrawlTask, TaskStats } from '@/types/api'\n\ninterface TaskStore {\n  tasks: CrawlTask[]\n  taskStats: TaskStats | null\n  setTasks: (tasks: CrawlTask[]) => void\n  setTaskStats: (stats: TaskStats) => void\n  addTask: (task: CrawlTask) => void\n  updateTask: (taskId: number, updates: Partial<CrawlTask>) => void\n}\n\nexport const useTaskStore = create<TaskStore>((set) => ({\n  tasks: [],\n  taskStats: null,\n  \n  setTasks: (tasks) => set({ tasks }),\n  \n  setTaskStats: (stats) => set({ taskStats: stats }),\n  \n  addTask: (task) =>\n    set((state) => ({\n      tasks: [task, ...state.tasks],\n    })),\n  \n  updateTask: (taskId, updates) =>\n    set((state) => ({\n      tasks: state.tasks.map((task) =>\n        task.id === taskId ? { ...task, ...updates } : task\n      ),\n    })),\n}))\n\n"
  },
  {
    "path": "frontend/src/types/api.ts",
    "content": "/**\n * API type definitions\n * Kept consistent with the backend API response structures\n */\n\nexport interface News {\n  id: number\n  title: string\n  content: string\n  url: string\n  source: string\n  publish_time: string | null\n  created_at: string\n  stock_codes: string[] | null\n  sentiment_score: number | null\n  author: string | null\n  keywords: string[] | null\n}\n\nexport interface Analysis {\n  id: number\n  news_id: number\n  agent_name: string\n  agent_role: string | null\n  analysis_result: string\n  summary: string | null\n  sentiment: 'positive' | 'negative' | 'neutral' | null\n  sentiment_score: number | null\n  confidence: number | null\n  execution_time: number | null\n  created_at: string\n}\n\nexport interface CrawlTask {\n  id: number\n  celery_task_id: string | null\n  mode: 'cold_start' | 'realtime' | 'targeted'\n  status: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled'\n  source: string\n  config: Record<string, any> | null\n  progress: {\n    current_page?: number\n    total_pages?: number\n    percentage?: number\n  } | null\n  current_page: number | null\n  total_pages: number | null\n  result: Record<string, any> | null\n  crawled_count: number\n  saved_count: number\n  error_message: string | null\n  execution_time: number | null\n  created_at: string\n  started_at: string | null\n  completed_at: string | null\n}\n\nexport interface TaskStats {\n  total: number\n  by_status: Record<string, number>\n  by_mode: Record<string, number>\n  recent_completed: number\n  total_news_crawled: number\n  total_news_saved: number\n}\n\nexport interface CrawlRequest {\n  source: string\n  start_page: number\n  end_page: number\n}\n\nexport interface CrawlResponse {\n  success: boolean\n  message: string\n  crawled_count: number\n  saved_count: number\n  source: string\n}\n\nexport interface AnalysisResponse {\n  success: boolean\n  analysis_id?: number\n  news_id: number\n  sentiment?: string\n  sentiment_score?: number\n  confidence?: number\n  summary?: string\n  execution_time?: number\n  error?: string\n}\n\n// ============ Phase 2: stock analysis types ============\n\nexport interface StockOverview {\n  code: string\n  name: string | null\n  total_news: number\n  analyzed_news: number\n  avg_sentiment: number | null\n  recent_sentiment: number | null\n  sentiment_trend: 'up' | 'down' | 'stable'\n  last_news_time: string | null\n}\n\nexport interface StockNewsItem {\n  id: number\n  title: string\n  content: string\n  url: string\n  source: string\n  publish_time: string | null\n  sentiment_score: number | null\n  has_analysis: boolean\n}\n\nexport interface SentimentTrendPoint {\n  date: string\n  avg_sentiment: number\n  news_count: number\n  positive_count: number\n  negative_count: number\n  neutral_count: number\n}\n\nexport interface KLineDataPoint {\n  timestamp: number  // timestamp in milliseconds\n  date: string\n  open: number\n  high: number\n  low: number\n  close: number\n  volume: number\n  turnover?: number  // trading value\n  change_percent?: number  // price change percentage\n  change_amount?: number  // price change amount\n  amplitude?: number  // amplitude\n  turnover_rate?: number  // turnover rate\n}\n\nexport interface RealtimeQuote {\n  code: string\n  name: string\n  price: number\n  change_percent: number\n  change_amount: number\n  volume: number\n  turnover: number\n  high: number\n  low: number\n  open: number\n  prev_close: number\n}\n\n// ============ Phase 2: agent debate types ============\n\nexport interface DebateRequest {\n  stock_code: string\n  stock_name?: string\n  context?: string\n  provider?: string\n  model?: string\n  mode?: 'parallel' | 'realtime_debate' | 'quick_analysis'  // debate mode\n  language?: 'zh' | 'en'  // language setting; controls the language of AI responses\n}\n\nexport interface AgentAnalysis {\n  success: boolean\n  agent_name: string\n  agent_role?: string\n  stance: 'bull' | 'bear'\n  analysis?: string\n  error?: string\n  timestamp?: string\n}\n\nexport interface FinalDecision {\n  success: boolean\n  agent_name: string\n  agent_role?: string\n  decision?: string\n  rating?: string\n  error?: string\n  timestamp?: string\n}\n\nexport interface TrajectoryStep {\n  step: string\n  timestamp: string\n  data: Record<string, any>\n}\n\nexport interface QuickAnalysisResult {\n  success: boolean\n  analysis?: string\n  timestamp?: string\n  error?: string\n}\n\nexport interface DebateHistoryItem {\n  round: number\n  agent: string\n  type: string\n  content: string\n}\n\nexport interface DebateResponse {\n  success: boolean\n  debate_id?: string\n  stock_code: string\n  stock_name?: string\n  mode?: 'parallel' | 'realtime_debate' | 'quick_analysis'\n  bull_analysis?: AgentAnalysis\n  bear_analysis?: AgentAnalysis\n  final_decision?: FinalDecision\n  quick_analysis?: QuickAnalysisResult\n  debate_history?: DebateHistoryItem[]\n  trajectory?: TrajectoryStep[]\n  execution_time?: number\n  error?: string\n}\n\n// ============ Phase 2: agent monitoring types ============\n\nexport interface AgentLogEntry {\n  id: string\n  timestamp: string\n  agent_name: string\n  agent_role?: string\n  action: string\n  status: 'started' | 'completed' | 'failed'\n  details?: Record<string, any>\n  execution_time?: number\n}\n\nexport interface AgentMetrics {\n  total_executions: number\n  successful_executions: number\n  failed_executions: number\n  avg_execution_time: number\n  agent_stats: Record<string, {\n    total: number\n    successful: number\n    failed: number\n    avg_time: number\n  }>\n  recent_activity: Array<{\n    timestamp: string\n    agent_name: string\n    action: string\n    status: string\n  }>\n}\n\nexport interface AgentInfo {\n  name: string\n  role: string\n  description: string\n  status: 'active' | 'inactive'\n}\n\nexport interface WorkflowInfo {\n  name: string\n  description: string\n  agents: string[]\n  status: 'active' | 'inactive'\n}\n\n"
  },
  {
    "path": "frontend/tailwind.config.js",
    "content": "/** @type {import('tailwindcss').Config} */\nexport default {\n  darkMode: [\"class\"],\n  content: [\n    './pages/**/*.{ts,tsx}',\n    './components/**/*.{ts,tsx}',\n    './app/**/*.{ts,tsx}',\n    './src/**/*.{ts,tsx}',\n  ],\n  prefix: \"\",\n  theme: {\n    container: {\n      center: true,\n      padding: \"2rem\",\n      screens: {\n        \"2xl\": \"1400px\",\n      },\n    },\n    extend: {\n      colors: {\n        border: \"hsl(var(--border))\",\n        input: \"hsl(var(--input))\",\n        ring: \"hsl(var(--ring))\",\n        background: \"hsl(var(--background))\",\n        foreground: \"hsl(var(--foreground))\",\n        primary: {\n          DEFAULT: \"hsl(var(--primary))\",\n          foreground: \"hsl(var(--primary-foreground))\",\n        },\n        secondary: {\n          DEFAULT: \"hsl(var(--secondary))\",\n          foreground: \"hsl(var(--secondary-foreground))\",\n        },\n        destructive: {\n          DEFAULT: \"hsl(var(--destructive))\",\n          foreground: \"hsl(var(--destructive-foreground))\",\n        },\n        muted: {\n          DEFAULT: \"hsl(var(--muted))\",\n          foreground: \"hsl(var(--muted-foreground))\",\n        },\n        accent: {\n          DEFAULT: \"hsl(var(--accent))\",\n          foreground: \"hsl(var(--accent-foreground))\",\n        },\n        popover: {\n          DEFAULT: \"hsl(var(--popover))\",\n          foreground: \"hsl(var(--popover-foreground))\",\n        },\n        card: {\n          DEFAULT: \"hsl(var(--card))\",\n          foreground: \"hsl(var(--card-foreground))\",\n        },\n      },\n      borderRadius: {\n        lg: \"var(--radius)\",\n        md: \"calc(var(--radius) - 2px)\",\n        sm: \"calc(var(--radius) - 4px)\",\n      },\n      keyframes: {\n        \"accordion-down\": {\n          from: { height: \"0\" },\n          to: { height: \"var(--radix-accordion-content-height)\" },\n        },\n        \"accordion-up\": {\n          from: { height: 
\"var(--radix-accordion-content-height)\" },\n          to: { height: \"0\" },\n        },\n      },\n      animation: {\n        \"accordion-down\": \"accordion-down 0.2s ease-out\",\n        \"accordion-up\": \"accordion-up 0.2s ease-out\",\n      },\n    },\n  },\n  plugins: [require(\"tailwindcss-animate\")],\n}\n\n"
  },
  {
    "path": "frontend/tsconfig.json",
    "content": "{\n  \"compilerOptions\": {\n    \"target\": \"ES2020\",\n    \"useDefineForClassFields\": true,\n    \"lib\": [\"ES2020\", \"DOM\", \"DOM.Iterable\"],\n    \"module\": \"ESNext\",\n    \"skipLibCheck\": true,\n\n    /* Bundler mode */\n    \"moduleResolution\": \"bundler\",\n    \"allowImportingTsExtensions\": true,\n    \"resolveJsonModule\": true,\n    \"isolatedModules\": true,\n    \"noEmit\": true,\n    \"jsx\": \"react-jsx\",\n\n    /* Linting */\n    \"strict\": true,\n    \"noUnusedLocals\": true,\n    \"noUnusedParameters\": true,\n    \"noFallthroughCasesInSwitch\": true,\n    \n    /* Path mapping */\n    \"baseUrl\": \".\",\n    \"paths\": {\n      \"@/*\": [\"./src/*\"]\n    }\n  },\n  \"include\": [\"src\"],\n  \"references\": [{ \"path\": \"./tsconfig.node.json\" }]\n}\n\n"
  },
  {
    "path": "frontend/tsconfig.node.json",
    "content": "{\n  \"compilerOptions\": {\n    \"composite\": true,\n    \"skipLibCheck\": true,\n    \"module\": \"ESNext\",\n    \"moduleResolution\": \"bundler\",\n    \"allowSyntheticDefaultImports\": true\n  },\n  \"include\": [\"vite.config.ts\"]\n}\n\n"
  },
  {
    "path": "frontend/vite.config.ts",
    "content": "import { defineConfig } from 'vite'\nimport react from '@vitejs/plugin-react-swc'\nimport path from 'path'\n\n// https://vitejs.dev/config/\nexport default defineConfig({\n  plugins: [react()],\n  resolve: {\n    alias: {\n      '@': path.resolve(__dirname, './src'),\n    },\n  },\n  server: {\n    port: 3000,\n    proxy: {\n      '/api': {\n        target: 'http://localhost:8000',\n        changeOrigin: true,\n      },\n    },\n  },\n})\n\n"
  },
  {
    "path": "legacy_v1/.deepsource.toml",
    "content": "version = 1\n\n[[analyzers]]\nname = \"python\"\n\n  [analyzers.meta]\n  runtime_version = \"3.x.x\""
  },
  {
    "path": "legacy_v1/Chinese_Stop_Words.txt",
    "content": "(Chinese stop-word list. The original file appears to be GBK/GB2312-encoded; in this dump it was decoded with the wrong codec, so the word entries are unrecoverable mojibake and are omitted here. The file holds one stop word, punctuation mark, or markup token per CRLF-terminated line.)"
  },
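The stop-word file above renders as mojibake because it is stored in a legacy Chinese encoding (GBK/GB2312) but was decoded with the wrong codec. A minimal sketch of how such a file can be read correctly; `load_stopwords` and its parameters are illustrative helpers, not part of the repo:

```python
import os
import tempfile

def load_stopwords(path, encoding="gbk"):
    """Read one stop word per line, skipping blank lines."""
    with open(path, "r", encoding=encoding, errors="ignore") as f:
        return {line.strip() for line in f if line.strip()}

# Round-trip demo: write a tiny GBK-encoded file of three common stop
# words (CRLF-separated, like the original file), then load it back.
words = ["的", "了", "和"]
tmp = tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False)
tmp.write("\r\n".join(words).encode("gbk"))
tmp.close()
try:
    stops = load_stopwords(tmp.name)
    print(sorted(stops))
finally:
    os.remove(tmp.name)
```

Opening the file with `encoding="utf-8"` or `"latin-1"` instead would reproduce exactly the kind of garbled output seen above.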
  {
    "path": "legacy_v1/Crawler/__init__.py",
    "content": "\n"
  },
  {
    "path": "legacy_v1/Crawler/crawler_cnstock.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\nCreated on Sat Feb 3 13:41:50 2018\n\n@author: Damon Li\n\"\"\"\n\nimport time, re, requests\nfrom concurrent import futures\nfrom bs4 import BeautifulSoup\nfrom pymongo import MongoClient\nimport Text_Analysis.text_mining as tm\n\nimport gevent\nfrom gevent import monkey,pool\nmonkey.patch_all()\n\n\nclass WebCrawlFromcnstock(object):\n    '''Crawl company news from 'http://company.cnstock.com/company/scp_gsxw/1',\n                               'http://ggjd.cnstock.com/gglist/search/qmtbbdj/1',\n                               'http://ggjd.cnstock.com/gglist/search/ggkx/1' website.\n\n    # Arguments:\n        totalPages: Number of pages to be crawled.\n        Range: Chunk size; the totalPages pages are split into totalPages/Range\n               chunks for concurrent processing.\n        ThreadsNum: Number of threads to start.\n        dbName: Name of the database.\n        colName: Name of the collection.\n        IP: IP address of the MongoDB server.\n        PORT: Port number of the MongoDB server.\n    '''\n\n    def __init__(self,**kwarg):\n        self.ThreadsNum = kwarg['ThreadsNum']\n        self.dbName = kwarg['dbName']\n        self.colName = kwarg['collectionName']\n        self.IP = kwarg['IP']\n        self.PORT = kwarg['PORT']\n        self.Prob = .5\n        self.realtimeNewsURL = []\n        self.tm = tm.TextMining(IP=\"localhost\",PORT=27017)\n\n    def ConnDB(self):\n        '''Connect to MongoDB.\n        '''\n        Conn = MongoClient(self.IP, self.PORT)\n        db = Conn[self.dbName]\n        self._collection = db.get_collection(self.colName)\n\n    def countchn(self,string):\n        '''Count Chinese characters and calculate the proportion of Chinese text.\n\n        # Arguments:\n            string: Each part of the crawled page, as parsed by BeautifulSoup.\n        '''\n        pattern = re.compile(u'[\u1100-\uFFFD]+?')\n        result = pattern.findall(string)\n        chnnum = len(result)\n        possible = chnnum/len(str(string))\n        return (chnnum, possible)\n\n    def getUrlInfo(self,url):\n        '''Analyze a news page and extract its date and body text.\n        '''\n        respond = requests.get(url)\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\n        bs = BeautifulSoup(respond.text, \"lxml\")\n        span_list = bs.find_all('span')\n        part = bs.find_all('p')\n        article = ''\n        date = ''\n        for span in span_list:\n            if 'class' in span.attrs and span['class'] == ['timer']:\n                date = span.text\n                break\n\n        for paragraph in part:\n            chnstatus = self.countchn(str(paragraph))\n            possible = chnstatus[1]\n            if possible > self.Prob:\n               article += str(paragraph)\n\n        while article.find('<') != -1 and article.find('>') != -1:\n              string = article[article.find('<'):article.find('>')+1]\n              article = article.replace(string,'')\n        while article.find('\\u3000') != -1:\n              article = article.replace('\\u3000','')\n\n        article = ' '.join(re.split(' +|\\n+', article)).strip()\n\n        return date, article\n\n    def GenPagesLst(self,totalPages,Range,initPageID):\n        '''Generate the list of page-number ranges using the Range parameter.\n        '''\n        PageLst = []\n        k = initPageID\n        while k+Range-1 <= totalPages:\n            PageLst.append((k,k+Range-1))\n            k += Range\n        if k <= totalPages:\n            PageLst.append((k,totalPages))\n        return PageLst\n\n    def CrawlHistoryCompanyNews(self,startPage,endPage,url_Part_1):\n        '''Crawl historical company news from startPage to endPage.\n        '''\n        self.ConnDB()\n        AddressLst = self.extractData(['Address'])[0]\n        if AddressLst == []:\n            urls = []\n            for pageId in range(startPage,endPage+1):\n                urls.append(url_Part_1 + str(pageId))\n            for url in urls:\n                print(url)\n                resp = requests.get(url)\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding\n                bs = BeautifulSoup(resp.text, \"lxml\")\n                a_list = bs.find_all('a')\n                for a in a_list:\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\n                    and a['href'].find('http://company.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span'):\n                        date, article = self.getUrlInfo(a['href'])\n                        while article == '' and self.Prob >= .1:\n                            self.Prob -= .1\n                            date, article = self.getUrlInfo(a['href'])\n                        self.Prob = .5\n                        if article != '':\n                            data = {'Date' : date,\n                                    'Address' : a['href'],\n                                    'Title' : a['title'],\n                                    'Article' : article}\n                            self._collection.insert_one(data)\n        else:\n            urls = []\n            for pageId in range(startPage,endPage+1):\n                urls.append(url_Part_1 + str(pageId))\n            for url in urls:\n                print(' <Re-Crawl url> ', url)\n                resp = requests.get(url)\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding\n                bs = BeautifulSoup(resp.text, \"lxml\")\n                a_list = bs.find_all('a')\n                for a in a_list:\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\n                    and a['href'].find('http://company.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span'):\n                        if a['href'] not in AddressLst:\n                            date, article = 
self.getUrlInfo(a['href'])\n                            while article == '' and self.Prob >= .1:\n                                self.Prob -= .1\n                                date, article = self.getUrlInfo(a['href'])\n                            self.Prob =.5\n                            if article != '':\n                                data = {'Date' : date,\n                                        'Address' : a['href'],\n                                        'Title' : a['title'],\n                                        'Article' : article}\n                                self._collection.insert_one(data)\n\n    def CrawlRealtimeCompanyNews(self,url_part_lst):\n        '''Continue crawling company news from first website page \n           every once in a while and extract the useful information, \n           including summary, key words, released date, related stock \n           codes list and main body.\n        '''\n        doc_lst = []\n        self.ConnDB()\n        self._AddressLst = self.extractData(['Address'])[0]\n        for url_Part in url_part_lst:\n            url = url_Part + str(1)\n            resp = requests.get(url)\n            resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \n            bs = BeautifulSoup(resp.text, \"lxml\")\n            a_list = bs.find_all('a')\n            if len(self.realtimeNewsURL) == 0:\n                for a in a_list:\n                    if ('href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\n                    and a['href'].find('http://company.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span')) or ('href' in a.attrs and 'target' in a.attrs \\\n                    and 'title' in a.attrs and a['href'].find('http://ggjd.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span')):\n                        if a['href'] not in self._AddressLst:\n                            self.realtimeNewsURL.append(a['href'])\n           
                 date, article = self.getUrlInfo(a['href'])\n                            while article == '' and self.Prob >= .1:\n                                self.Prob -= .1\n                                date, article = self.getUrlInfo(a['href'])\n                            self.Prob =.5\n                            if article != '':\n                                data = {'Date' : date,\n                                        'Address' : a['href'],\n                                        'Title' : a['title'],\n                                        'Article' : article}\n                                self._collection.insert_one(data)\n                                doc_lst.append(a['title'] + ' ' + article)\n                                print(' [' + date + '] ' + a['title'])\n            else:\n                for a in a_list:\n                    if ('href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\n                    and a['href'].find('http://company.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span')) or ('href' in a.attrs and 'target' in a.attrs \\\n                    and 'title' in a.attrs and a['href'].find('http://ggjd.cnstock.com/company/') != -1 \\\n                    and a.parent.find('span')):\n                        if a['href'] not in self.realtimeNewsURL and a['href'] not in self._AddressLst:\n                            self.realtimeNewsURL.append(a['href'])\n                            date, article = self.getUrlInfo(a['href'])\n                            while article == '' and self.Prob >= .1:\n                                self.Prob -= .1\n                                date, article = self.getUrlInfo(a['href'])\n                            self.Prob =.5\n                            if article != '':\n                                data = {'Date' : date,\n                                        'Address' : a['href'],\n                                        'Title' : 
a['title'],\n                                        'Article' : article}\n                                self._collection.insert_one(data)\n                                doc_lst.append(a['title'] + ' ' + article)\n                                print(' [' + date + '] ' + a['title'])\n        return doc_lst\n\n    def extractData(self,tag_list):\n        '''Extract the distinct values of each tag in 'tag_list'.\n        '''\n        data = []\n        for tag in tag_list:\n            data.append(self._collection.distinct(tag))\n        return data\n\n    def coroutine_run(self,totalPages,Range,initPageID,**kwarg):\n        '''Run crawling jobs as gevent coroutines.\n        '''\n        jobs = []\n        page_ranges_lst = self.GenPagesLst(totalPages,Range,initPageID)\n        for page_range in page_ranges_lst:\n            jobs.append(gevent.spawn(self.CrawlHistoryCompanyNews,page_range[0],page_range[1],kwarg['url_Part_1']))\n        gevent.joinall(jobs)\n\n    def multi_threads_run(self,**kwarg):\n        '''Run crawling jobs in a thread pool.\n        '''\n        page_ranges_lst = self.GenPagesLst(kwarg['totalPages'],kwarg['Range'],kwarg['initPageID'])\n        print(' Using ' + str(self.ThreadsNum) + ' threads for collecting news ... ')\n        with futures.ThreadPoolExecutor(max_workers=self.ThreadsNum) as executor:\n            future_to_url = {executor.submit(self.CrawlHistoryCompanyNews,page_range[0],page_range[1],kwarg['url_Part_1']) : \\\n                             ind for ind, page_range in enumerate(page_ranges_lst)}\n\n    def classifyRealtimeStockNews(self):\n        '''Crawl and classify news (articles/documents) every 60s.\n        '''\n        while True:\n            print(' * start crawling news from CNSTOCK ... 
')\n            doc_list = self.CrawlRealtimeCompanyNews(['http://company.cnstock.com/company/scp_gsxw/',\\\n                                                    'http://ggjd.cnstock.com/gglist/search/qmtbbdj/',\\\n                                                    'http://ggjd.cnstock.com/gglist/search/ggkx/']) #\n            print(' * finish crawling ... ')\n            if len(doc_list) != 0:\n                self.tm.classifyRealtimeStockNews(doc_list)\n            time.sleep(60)"
  },
  {
    "path": "legacy_v1/Crawler/crawler_jrj.py",
    "content": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nCreated on Sat Feb 3 13:41:50 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport time, re, requests, datetime\r\nfrom concurrent import futures\r\nfrom bs4 import BeautifulSoup\r\nfrom pymongo import MongoClient\r\nimport Text_Analysis.text_mining as tm\r\nfrom bson.objectid import ObjectId\r\n\r\nimport gevent\r\nfrom gevent import monkey,pool\r\nmonkey.patch_all()\r\n\r\n\r\nclass WebCrawlFromjrj(object):\r\n    '''Crawl company news from the JRJ daily news index pages under 'http://stock.jrj.com.cn/xwk/'.\r\n\r\n    # Arguments:\r\n        startDate: First date (YYYY-MM-DD) to crawl.\r\n        endDate: Last date (YYYY-MM-DD) to crawl.\r\n        Range: Chunk size; the date list is split into parts of Range dates\r\n               for concurrent processing.\r\n        ThreadsNum: Number of threads to start.\r\n        dbName: Name of the database.\r\n        colName: Name of the collection.\r\n        IP: IP address of the MongoDB server.\r\n        PORT: Port number of the MongoDB server.\r\n    '''\r\n\r\n    def __init__(self,*arg,**kwarg):\r\n        self.startDate = arg[0]\r\n        self.endDate = arg[1]\r\n        self.Range = arg[2]\r\n        self.ThreadsNum = kwarg['ThreadsNum']\r\n        self.dbName = kwarg['dbName']\r\n        self.colName = kwarg['collectionName']\r\n        self.IP = kwarg['IP']\r\n        self.PORT = kwarg['PORT']\r\n        self.Prob = .5\r\n        self.realtimeNewsURL = []\r\n        self.tm = tm.TextMining(IP=\"localhost\",PORT=27017)\r\n\r\n    def getEveryDay(self,begin_date,end_date):\r\n        '''Get the list of dates from 'begin_date' to 'end_date' on the calendar.\r\n        '''\r\n        date_list = []\r\n        begin_date = datetime.datetime.strptime(begin_date, \"%Y-%m-%d\")\r\n        end_date = datetime.datetime.strptime(end_date,\"%Y-%m-%d\")\r\n        while begin_date <= end_date:\r\n            date_str = begin_date.strftime(\"%Y-%m-%d\")\r\n            date_list.append(date_str)\r\n            begin_date += 
datetime.timedelta(days=1)  \r\n        return date_list  \r\n\r\n    def countchn(self,string):\r\n        '''Count Chinese numbers and calculate the frequency of Chinese occurrence.\r\n\r\n        # Arguments:\r\n            string: Each part of crawled website analyzed by BeautifulSoup.\r\n        '''\r\n        pattern = re.compile(u'[\\u1100-\\uFFFDh]+?')\r\n        result = pattern.findall(string)\r\n        chnnum = len(result)\r\n        possible = chnnum/len(str(string))\r\n        return (chnnum, possible)\r\n\r\n    def getUrlInfo(self,url,specificDate):\r\n        '''Analyze website and extract useful information.\r\n        '''\r\n        respond = requests.get(url)\r\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\r\n        bs = BeautifulSoup(respond.text, \"lxml\")\r\n        meta_list = bs.find_all('meta')\r\n        span_list = bs.find_all('span')\r\n        part = bs.find_all('p')\r\n        article = ''\r\n        date = ''\r\n        NotFoundPage = False\r\n        for span in span_list:\r\n            for child in span.children:\r\n                if child == 'jrj_final_date_start':\r\n                    date = span.text.replace('\\r','').replace('\\n','')\r\n                    if date.find('年') != -1:\r\n                        date = date.replace('年','-').replace('月','-').replace('日','')\r\n                    break\r\n            break\r\n        if date == '':\r\n            date = specificDate\r\n\r\n        for p in part:\r\n            if p.text.find('页面没有找到') != -1:\r\n               NotFoundPage = True\r\n               break\r\n\r\n        if not NotFoundPage:\r\n            for paragraph in part:\r\n                chnstatus = self.countchn(str(paragraph))\r\n                possible = chnstatus[1]\r\n                if possible > self.Prob:\r\n                   article += str(paragraph)\r\n\r\n            while article.find('<') != -1 and article.find('>') != -1:\r\n                  string 
= article[article.find('<'):article.find('>')+1]\r\n                  article = article.replace(string,'')\r\n            while article.find('\\u3000') != -1:\r\n                  article = article.replace('\\u3000','')\r\n\r\n            article = ' '.join(re.split(' +|\\n+', article)).strip() \r\n\r\n        return date, article, NotFoundPage\r\n\r\n    def GenDatesLst(self):\r\n        '''Divide date list into parts using Range parameter.\r\n        '''\r\n        DatesLst = self.getEveryDay(self.startDate,self.endDate)\r\n        NewDatesLst = []\r\n        k = 0\r\n        while k < len(DatesLst):\r\n            if k+self.Range >= len(DatesLst):\r\n                break\r\n            else:\r\n                NewDatesLst.append(DatesLst[k:k+self.Range])\r\n                k += self.Range \r\n        NewDatesLst.append(DatesLst[k:])\r\n        return NewDatesLst\r\n\r\n    def findPagesOfSpecificDate(self,firstUrl,date):\r\n        '''Find the number of web pages for a specific date.\r\n\r\n        # Arguments:\r\n            firstUrl: The first web page of the specific date.\r\n            date: Designated date.\r\n        '''\r\n        respond = requests.get(firstUrl)\r\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\r\n        bs = BeautifulSoup(respond.text, \"lxml\")\r\n        a_list = bs.find_all('a')\r\n        Nums = 1\r\n        for a in a_list:\r\n            if 'href' in a.attrs and 'target' in a.attrs:\r\n                if a['href'].find(date.replace('-','') + '_') != -1 and a.text.isdigit():\r\n                    Nums += 1\r\n        return Nums\r\n\r\n    def CrawlRealtimeCompanyNews(self,today_Date): \r\n        '''Continue crawling company news from the first website page\r\n           every once in a while and extract the useful information,\r\n           including the release date, title and main body.\r\n        '''\r\n        doc_lst = []\r\n        if 
len(self.realtimeNewsURL) == 0:\r\n            self.ConnDB()\r\n            self._AddressLst = self.extractData(['Address'])[0]\r\n            urlsAndDates = []\r\n            url_Part_1 = 'http://stock.jrj.com.cn/xwk/'\r\n            url_Part_2 = '_1.shtml'\r\n            firstUrl = url_Part_1 + today_Date.replace('-','')[0:6] + '/' + today_Date.replace('-','') + url_Part_2\r\n            Nums = self.findPagesOfSpecificDate(firstUrl,today_Date)\r\n            for num in range(1,Nums+1):\r\n                urlsAndDates.append((url_Part_1 + today_Date.replace('-','')[0:6] + '/' + today_Date.replace('-','') \\\r\n                    + '_' + str(num) + '.shtml', today_Date))\r\n            for url, specificDate in urlsAndDates:\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('/' + specificDate.replace('-','')[0:4] + '/' + specificDate.replace('-','')[4:6] + '/') != -1:\r\n                        if a['href'] not in self._AddressLst:\r\n                            self.realtimeNewsURL.append(a['href'])\r\n                            date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            while article == '' and self.Prob >= .1 and not NotFoundPage:\r\n                                self.Prob -= .1\r\n                                date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            self.Prob =.5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a.string,\r\n                      
                  'Article' : article}\r\n                                self._collection.insert_one(data)\r\n                                doc_lst.append(a.string + ' ' + article)\r\n                                print(' [' + date + '] ' + a.string)\r\n        else:\r\n            urlsAndDates = []\r\n            url_Part_1 = 'http://stock.jrj.com.cn/xwk/'\r\n            url_Part_2 = '_1.shtml'\r\n            firstUrl = url_Part_1 + today_Date.replace('-','')[0:6] + '/' + today_Date.replace('-','') + url_Part_2\r\n            Nums = self.findPagesOfSpecificDate(firstUrl,today_Date)\r\n            for num in range(1,Nums+1):\r\n                urlsAndDates.append((url_Part_1 + today_Date.replace('-','')[0:6] + '/' + today_Date.replace('-','') \\\r\n                    + '_' + str(num) + '.shtml', today_Date))\r\n            for url, specificDate in urlsAndDates:\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('/' + specificDate.replace('-','')[0:4] + '/' + specificDate.replace('-','')[4:6] + '/') != -1:\r\n                        if a['href'] not in self._AddressLst and a['href'] not in self.realtimeNewsURL:\r\n                            self.realtimeNewsURL.append(a['href'])\r\n                            date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            while article == '' and self.Prob >= .1 and not NotFoundPage:\r\n                                self.Prob -= .1\r\n                                date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            self.Prob =.5\r\n                            if article != '':\r\n                      
          data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a.string,\r\n                                        'Article' : article}\r\n                                self._collection.insert_one(data)\r\n                                doc_lst.append(a.string + ' ' + article)\r\n                                print(' [' + date + '] ' + a.string)\r\n        return doc_lst\r\n\r\n    def CrawlHistoryCompanyNews(self,datelst):\r\n        '''Crawl historical company news \r\n        '''\r\n        self.ConnDB()\r\n        AddressLst = self.extractData(['Address'])[0]\r\n        if AddressLst == []:\r\n            urlsAndDates = []\r\n            url_Part_1 = 'http://stock.jrj.com.cn/xwk/'\r\n            url_Part_2 = '_1.shtml'\r\n            for date in datelst:\r\n                firstUrl = url_Part_1 + date.replace('-','')[0:6] + '/' + date.replace('-','') + url_Part_2\r\n                Nums = self.findPagesOfSpecificDate(firstUrl,date)\r\n                for num in range(1,Nums+1):\r\n                    urlsAndDates.append((url_Part_1 + date.replace('-','')[0:6] + '/' + date.replace('-','') \\\r\n                        + '_' + str(num) + '.shtml', date))\r\n            for url, specificDate in urlsAndDates:\r\n                print(url)\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('/' + specificDate.replace('-','')[0:4] + '/' + specificDate.replace('-','')[4:6] + '/') != -1:\r\n                        date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                        while article == '' and self.Prob >= .1 and not 
NotFoundPage:\r\n                            self.Prob -= .1\r\n                            date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                        self.Prob =.5\r\n                        if article != '':\r\n                            data = {'Date' : date,\r\n                                    'Address' : a['href'],\r\n                                    'Title' : a.string,\r\n                                    'Article' : article}\r\n                            self._collection.insert_one(data)\r\n        else:\r\n            urlsAndDates = []\r\n            url_Part_1 = 'http://stock.jrj.com.cn/xwk/'\r\n            url_Part_2 = '_1.shtml'\r\n            for date in datelst:\r\n                firstUrl = url_Part_1 + date.replace('-','')[0:6] + '/' + date.replace('-','') + url_Part_2\r\n                Nums = self.findPagesOfSpecificDate(firstUrl,date)\r\n                for num in range(1,Nums+1):\r\n                    urlsAndDates.append((url_Part_1 + date.replace('-','')[0:6] + '/' + date.replace('-','') \\\r\n                        + '_' + str(num) + '.shtml', date))\r\n            for url, specificDate in urlsAndDates:\r\n                print(' <Re-Crawl url> ', url)\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('/' + specificDate.replace('-','')[0:4] + '/' + specificDate.replace('-','')[4:6] + '/') != -1:\r\n                        if a['href'] not in AddressLst:\r\n                            date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            while article == '' and self.Prob >= .1 and not NotFoundPage:\r\n                            
    self.Prob -= .1\r\n                                date, article, NotFoundPage = self.getUrlInfo(a['href'],specificDate)\r\n                            self.Prob = .5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a.string,\r\n                                        'Article' : article}\r\n                                self._collection.insert_one(data)\r\n\r\n    def ConnDB(self):\r\n        '''Connect to MongoDB.\r\n        '''\r\n        Conn = MongoClient(self.IP, self.PORT) \r\n        db = Conn[self.dbName]\r\n        self._collection = db.get_collection(self.colName)\r\n\r\n    def extractData(self,tag_list):\r\n        '''Extract the distinct values of each tag in 'tag_list'.\r\n        '''\r\n        data = []\r\n        for tag in tag_list:\r\n            # Query the distinct values directly; no need to build code strings for exec().\r\n            data.append(self._collection.distinct(tag))\r\n        return data\r\n\r\n    def StockCodeDuplicateRemoval(self):\r\n        '''Discarded.\r\n        '''\r\n        Conn = MongoClient(self.IP, self.PORT) \r\n        db = Conn[self.dbName]\r\n        collection = db.get_collection(self.colName)\r\n        idLst = collection.distinct('_id')\r\n        relevantStockSeries = []\r\n        for _id in idLst:\r\n            data = collection.find_one({'_id':ObjectId(_id)})\r\n            if 'relevantStock' in data.keys():\r\n                relevantStock = collection.find_one({'_id':ObjectId(_id)})['relevantStock']\r\n                if len(relevantStock) > 1:\r\n                    relevantStockCodeDuplicateRemoval = list(set(relevantStock))\r\n                    collection.update_one({\"_id\":_id},{\"$set\":{\"relevantStock\":' '.join(relevantStockCodeDuplicateRemoval)}})\r\n                    print(relevantStockCodeDuplicateRemoval)\r\n                    
break\r\n                if len(relevantStock) == 1:\r\n                    print(relevantStock)\r\n                    print(len(relevantStock))\r\n                    break\r\n        print('Duplicate removal finished successfully ... ')\r\n\r\n    def coroutine_run(self):\r\n        '''Coroutines running.\r\n        '''\r\n        jobs = []\r\n        dateLst = self.GenDatesLst()\r\n        for datelst in dateLst:\r\n            jobs.append(gevent.spawn(self.CrawlHistoryCompanyNews,datelst))\r\n        gevent.joinall(jobs) \r\n\r\n    def multi_threads_run(self,**kwarg):\r\n        '''Multi-threading running.\r\n        '''\r\n        dateLst = self.GenDatesLst()\r\n        print(' Using ' + str(self.ThreadsNum) + ' threads for collecting news ... ')\r\n        with futures.ThreadPoolExecutor(max_workers=self.ThreadsNum) as executor:\r\n            future_to_url = {executor.submit(self.CrawlHistoryCompanyNews,datelst) : \\\r\n                             ind for ind, datelst in enumerate(dateLst)}  \r\n\r\n    def classifyRealtimeStockNews(self):\r\n        '''Continue crawling and classifying news (articles/documents) every 60s.\r\n        '''\r\n        while True:\r\n            # Refresh the date on every pass so the crawler rolls over at midnight.\r\n            today_Date = datetime.datetime.now().strftime('%Y-%m-%d')\r\n            print(' * start crawling news from JRJ ... ')\r\n            doc_list = self.CrawlRealtimeCompanyNews(today_Date)\r\n            print(' * finish crawling ... ')\r\n            if len(doc_list) != 0:\r\n                self.tm.classifyRealtimeStockNews(doc_list)\r\n            time.sleep(60)\r\n"
  },
  {
    "path": "legacy_v1/Crawler/crawler_nbd.py",
    "content": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nCreated on Tue Jan 23 17:19:50 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport re, os, time, requests\r\nfrom bs4 import BeautifulSoup\r\nimport pymongo, threading, traceback\r\n\r\nimport gevent\r\nfrom gevent import monkey,pool\r\nmonkey.patch_all()\r\n\r\n\r\nclass WebCrawlFromNBD(object):\r\n    '''Crawl company news from 'http://stocks.nbd.com.cn/columns/275' website.\r\n\r\n    # Arguments:\r\n        totalPages: Number of pages set to be crawled.\r\n        Range: Divide total web pages into totalPages/Range parts \r\n               for multi-threading processing.\r\n        ThreadsNum: Number of threads to be started.\r\n        dbName: Name of database.\r\n        colName: Name of collection.\r\n        IP: Local IP address.\r\n        PORT: Port number corresponding to IP address.\r\n    '''\r\n\r\n\r\n    def __init__(self,*arg,**kwarg):\r\n        self.totalPages = arg[0] #totalPages\r\n        self.Range = arg[1] #Range\r\n        self.ThreadsNum = kwarg['ThreadsNum']\r\n        self.dbName = kwarg['dbName']\r\n        self.colName = kwarg['collectionName']\r\n        self.IP = kwarg['IP']\r\n        self.PORT = kwarg['PORT']\r\n        self.url_lst_withoutArticles = []\r\n        self.title_lst_withoutArticles = []\r\n        self.url_lst_withoutNews = []\r\n        self.CrawledUrlsID = []\r\n        self.filePath = os.path.dirname(os.path.realpath(__file__))\r\n\r\n    def countchn(self,string):\r\n        '''Count Chinese characters and calculate the frequency of Chinese occurrence.\r\n\r\n        # Arguments:\r\n            string: Each part of crawled website analyzed by BeautifulSoup.\r\n        '''\r\n        pattern = re.compile(u'[\\u1100-\\uFFFDh]+?')\r\n        result = pattern.findall(string)\r\n        chnnum = len(result)\r\n        possible = chnnum/len(str(string))\r\n        return (chnnum, possible)\r\n\r\n    def getUrlInfo(self,url):\r\n        '''Analyze website and 
extract useful information.\r\n        '''\r\n        respond = requests.get(url)\r\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\r\n        bs = BeautifulSoup(respond.text, \"lxml\")\r\n        span_list = bs.find_all('span')\r\n        part = bs.find_all('p')\r\n        article = ''\r\n        date = ''\r\n\r\n        for span in span_list:\r\n            if 'class' in span.attrs and span.text and span['class'] == ['time']:\r\n                    string = span.text.split()\r\n                    for dt in string:\r\n                        if dt.find('-') != -1:\r\n                            date += dt + ' '\r\n                        elif dt.find(':') != -1:\r\n                            date += dt\r\n                    break\r\n\r\n        for paragraph in part:\r\n            chnstatus = self.countchn(str(paragraph))\r\n            possible = chnstatus[1]\r\n            if possible > 0.5:\r\n               article += str(paragraph)\r\n\r\n        while article.find('<') != -1 and article.find('>') != -1:\r\n              string = article[article.find('<'):article.find('>')+1]\r\n              article = article.replace(string,'')\r\n        while article.find('\\u3000') != -1:\r\n              article = article.replace('\\u3000','')\r\n\r\n        article = ' '.join(re.split(' +|\\n+', article)).strip() \r\n\r\n        return article, date\r\n\r\n    def GenPagesLst(self):\r\n        '''Generate page number list using Range parameter.\r\n        '''\r\n        PageLst = []\r\n        k = 1\r\n        while k+self.Range-1 <= self.totalPages:\r\n            PageLst.append((k,k+self.Range-1))\r\n            k += self.Range\r\n        if k <= self.totalPages:  # append the leftover pages when totalPages is not a multiple of Range\r\n            PageLst.append((k,self.totalPages))\r\n        return PageLst\r\n\r\n    def ReCrawlNews(self,url_list):\r\n        '''Re-crawl pages that did not return anything.\r\n\r\n        # Arguments:\r\n          url_list: List of web pages that 
without any values.\r\n        '''\r\n        try:\r\n          nums = 1\r\n          ulst = []\r\n          while url_list != []:\r\n             ulst.append(url_list[0])\r\n             print(' <Re-Crawl News> ', url_list[0])\r\n             if nums > 10:\r\n                print(' <!> wait 1s before request url again ...')\r\n                time.sleep(1)\r\n                nums = 1\r\n             resp = requests.get(url_list[0])\r\n             resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n             bs = BeautifulSoup(resp.text, \"lxml\")\r\n             a_list = bs.find_all('a')\r\n             if a_list != []:\r\n               for a in a_list:\r\n                   if 'click-statistic' in a.attrs and a.string \\\r\n                   and a['click-statistic'].find('Article_') != -1 \\\r\n                   and a['href'].find('http://www.nbd.com.cn/articles/') != -1:\r\n                       article, date = self.getUrlInfo(a['href'])\r\n                       if date == '' or article == '':\r\n                          self.url_lst_withoutArticles.append(a['href'])\r\n                          self.title_lst_withoutArticles.append(a.string)\r\n                       elif date != '' and article != '':\r\n                           data = {'date' : date,\r\n                                   'address' : a['href'],\r\n                                   'title' : a.string,\r\n                                   'Article' : article}\r\n                           self.collection.insert_one(data)\r\n                           self.CrawledUrlsID.append(int(url_list[0].split('/')[-1]))\r\n               url_list.remove(url_list[0])\r\n             if len(ulst) >= 2 and ulst[-1] == ulst[-2]:\r\n                nums += 1\r\n          return self.url_lst_withoutArticles, self.title_lst_withoutArticles\r\n        except Exception:\r\n            traceback.print_exc()\r\n\r\n    def ReCrawlArticles(self,url_list,title_list):\r\n        
'''Continue crawling urls without main information return.\r\n\r\n        # Arguments:\r\n          url_list: List of urls without getting any articles(main body).\r\n          title_list: List of urls without crawling any titles.\r\n        '''\r\n        nums = 1\r\n        ulst = []\r\n        while url_list != []:\r\n            ulst.append(url_list[0])\r\n            print(' <Re-Crawl Articles> ', url_list[0])\r\n            if nums > 10:\r\n              print(' <!> wait 1s before request url again ...')\r\n              time.sleep(1)\r\n              nums = 1\r\n            article, date = self.getUrlInfo(url_list[0])\r\n            if date != '' and article != '':\r\n               data = {'date' : date,\r\n                       'address' : url_list[0],\r\n                       'title' : title_list[0],\r\n                       'Article' : article}\r\n               print(' remove ' + url_list[0] + ' successfully ... ')\r\n               url_list.remove(url_list[0])\r\n               title_list.remove(title_list[0])\r\n               self.collection.insert_one(data)\r\n            if len(ulst) >= 2 and ulst[-1] == ulst[-2]:\r\n               nums += 1\r\n\r\n    def CrawlCompanyNews(self,startPage,endPage):\r\n        '''Crawl historical company news \r\n        '''\r\n        self.ConnDB()\r\n        AddressLst = self.extractData(['address'])[0]\r\n        if AddressLst == []:\r\n          urls = []\r\n          url_Part = 'http://stocks.nbd.com.cn/columns/275/page/' \r\n          for pageId in range(startPage,endPage+1):\r\n              urls.append(url_Part + str(pageId))\r\n          for url in urls:\r\n              print(url)\r\n              resp = requests.get(url)\r\n              resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n              bs = BeautifulSoup(resp.text, \"lxml\")\r\n              a_list = bs.find_all('a')\r\n              if a_list == []:\r\n                self.url_lst_withoutNews.append(url)\r\n     
         else:\r\n                for a in a_list:\r\n                    if 'click-statistic' in a.attrs and a.string \\\r\n                    and a['click-statistic'].find('Article_') != -1 \\\r\n                    and a['href'].find('http://www.nbd.com.cn/articles/') != -1:\r\n                        article, date = self.getUrlInfo(a['href'])\r\n                        if date == '' or article == '':\r\n                           self.url_lst_withoutArticles.append(a['href'])\r\n                           self.title_lst_withoutArticles.append(a.string)\r\n                        elif date != '' and article != '':\r\n                            data = {'date' : date,\r\n                                    'address' : a['href'],\r\n                                    'title' : a.string,\r\n                                    'Article' : article}\r\n                            self.collection.insert_one(data)\r\n                            self.CrawledUrlsID.append(int(url.split('/')[-1]))\r\n        else:\r\n          urls = []\r\n          url_Part = 'http://stocks.nbd.com.cn/columns/275/page/' \r\n          for pageId in range(startPage,endPage+1):\r\n              urls.append(url_Part + str(pageId))\r\n          for url in urls:\r\n              print(' <Re-Crawl url> ', url)\r\n              resp = requests.get(url)\r\n              resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n              bs = BeautifulSoup(resp.text, \"lxml\")\r\n              a_list = bs.find_all('a')\r\n              if a_list == []:\r\n                self.url_lst_withoutNews.append(url)\r\n              else:\r\n                for a in a_list:\r\n                    if 'click-statistic' in a.attrs and a.string \\\r\n                    and a['click-statistic'].find('Article_') != -1 \\\r\n                    and a['href'].find('http://www.nbd.com.cn/articles/') != -1:\r\n                        if a['href'] not in AddressLst:\r\n                         
   article, date = self.getUrlInfo(a['href'])\r\n                            if date == '' or article == '':\r\n                               self.url_lst_withoutArticles.append(a['href'])\r\n                               self.title_lst_withoutArticles.append(a.string)\r\n                            elif date != '' and article != '':\r\n                                data = {'date' : date,\r\n                                        'address' : a['href'],\r\n                                        'title' : a.string,\r\n                                        'Article' : article}\r\n                                self.collection.insert_one(data)\r\n                                self.CrawledUrlsID.append(int(url.split('/')[-1]))\r\n\r\n    def ConnDB(self):\r\n        '''Connect to MongoDB.\r\n        '''\r\n        client = pymongo.MongoClient(self.IP, self.PORT)\r\n        mydb = client[self.dbName]\r\n        self.collection = mydb.get_collection(self.colName)\r\n\r\n    def extractData(self,tag_list):\r\n        '''Extract the distinct values of each tag in 'tag_list'.\r\n        '''\r\n        data = []\r\n        for tag in tag_list:\r\n            # Query the distinct values directly; no need to build code strings for exec().\r\n            data.append(self.collection.distinct(tag))\r\n        return data\r\n\r\n    def single_run(self):\r\n        '''Single threading running.\r\n        '''\r\n        page_ranges_lst = self.GenPagesLst()\r\n        for ind, page_range in enumerate(page_ranges_lst):\r\n            self.CrawlCompanyNews(page_range[0],page_range[1]) \r\n        return self.url_lst_withoutNews\r\n\r\n    def multi_threads_run(self):\r\n        '''Multi-threading running.\r\n        '''\r\n        page_ranges_lst = self.GenPagesLst()\r\n        th_lst = []\r\n        for page_range in page_ranges_lst:\r\n            thread = threading.Thread(target=self.CrawlCompanyNews,\\\r\n                                      args=(page_range[0],page_range[1]))\r\n            
th_lst.append(thread)\r\n        for thread in th_lst:\r\n            thread.start()\r\n        for thread in th_lst:\r\n            thread.join()\r\n        return self.url_lst_withoutNews\r\n\r\n    def coroutine_run(self):\r\n        '''Coroutines running.\r\n        '''\r\n        jobs = []\r\n        page_ranges_lst = self.GenPagesLst()\r\n        for page_range in page_ranges_lst:\r\n            jobs.append(gevent.spawn(self.CrawlCompanyNews,page_range[0],page_range[1]))\r\n        gevent.joinall(jobs) \r\n        return self.url_lst_withoutNews"
  },
  {
    "path": "legacy_v1/Crawler/crawler_sina.py",
    "content": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nCreated on Mon Jan 22 10:01:40 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport time, re, requests\r\nfrom concurrent import futures\r\nfrom bs4 import BeautifulSoup\r\nfrom pymongo import MongoClient\r\nimport Text_Analysis.text_mining as tm\r\n\r\nimport gevent\r\nfrom gevent import monkey,pool\r\nmonkey.patch_all()\r\n\r\n\r\nclass WebCrawlFromSina(object):\r\n    '''Crawl company news from 'http://roll.finance.sina.com.cn/finance/zq1/ssgs/index.shtml' website.\r\n\r\n    # Arguments:\r\n        totalPages: Number of pages set to be crawled (int type).\r\n        Range: Divide total web pages into totalPages/Range parts \r\n               for multi-threading processing (int type).\r\n        ThreadsNum: Number of threads to be started (int type).\r\n        dbName: Name of database (string type).\r\n        colName: Name of collection (string type).\r\n        IP: Local IP address (string type).\r\n        PORT: Port number corresponding to IP address (int type).\r\n    '''\r\n\r\n    def __init__(self,*arg,**kwarg):\r\n        self.totalPages = arg[0] #totalPages\r\n        self.Range = arg[1] #Range\r\n        self.ThreadsNum = kwarg['ThreadsNum']\r\n        self.dbName = kwarg['dbName']\r\n        self.colName = kwarg['collectionName']\r\n        self.IP = kwarg['IP']\r\n        self.PORT = kwarg['PORT']\r\n        self.Prob = .5\r\n        self.realtimeNewsURL = []\r\n        self.tm = tm.TextMining(IP=\"localhost\",PORT=27017)\r\n\r\n    def countchn(self,string):\r\n        '''Count Chinese characters and calculate the frequency of Chinese occurrence.\r\n\r\n        # Arguments:\r\n            string: Each part of crawled website analyzed by BeautifulSoup.\r\n        '''\r\n        pattern = re.compile(u'[\\u1100-\\uFFFDh]+?')\r\n        result = pattern.findall(string)\r\n        chnnum = len(result)\r\n        possible = chnnum/len(str(string))\r\n        return (chnnum, possible)\r\n\r\n    def getUrlInfo(self,url):\r\n        '''Analyze website and extract useful information.\r\n        '''\r\n        respond = requests.get(url)\r\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\r\n        bs = BeautifulSoup(respond.text, \"lxml\")\r\n        meta_list = bs.find_all('meta')\r\n        span_list = bs.find_all('span')\r\n        part = bs.find_all('p')\r\n        article = ''\r\n        date = ''\r\n        summary = ''\r\n        keyWords = ''\r\n        stockCodeLst = ''\r\n        for meta in meta_list:\r\n            if 'name' in meta.attrs and meta['name'] == 'description':\r\n                summary = meta['content']\r\n            elif 'name' in meta.attrs and meta['name'] == 'keywords':\r\n                keyWords = meta['content']\r\n            if summary != '' and keyWords != '':\r\n                break\r\n        for span in span_list:\r\n            if 'class' in span.attrs:\r\n                if span['class'] == ['date'] or span['class'] == ['time-source']:\r\n                    string = span.text.split()\r\n                    for dt in string:\r\n                        if dt.find('年') != -1:\r\n                            date += dt.replace('年','-').replace('月','-').replace('日',' ')\r\n                        elif dt.find(':') != -1:\r\n                            date += dt\r\n                    break\r\n            if 'id' in span.attrs and span['id'] == 'pub_date':\r\n                string = span.text.split()\r\n                for dt in string:\r\n                    if dt.find('年') != -1:\r\n                        date += dt.replace('年','-').replace('月','-').replace('日',' ')\r\n                    elif dt.find(':') != -1:\r\n                        date += dt\r\n                break\r\n        for span in span_list:\r\n            if 'id' in span.attrs and span['id'].find('stock_') != -1:\r\n                stockCodeLst += span['id'][8:] + ' '\r\n\r\n        for paragraph in part:\r\n            chnstatus = self.countchn(str(paragraph))\r\n            possible = chnstatus[1]\r\n            # Prob: standard frequency of Chinese occurrence among the parts of one\r\n            # news article, used to judge whether a part belongs to the main body.\r\n            if possible > self.Prob:\r\n               article += str(paragraph)\r\n\r\n        time1 = time.time()\r\n        while article.find('<') != -1 and article.find('>') != -1:\r\n              string = article[article.find('<'):article.find('>')+1]\r\n              article = article.replace(string,'')\r\n              time2 = time.time()\r\n              if time2 - time1 > 60:\r\n                print(' [*] Loop timed out after 60s, breaking out ... ')\r\n                break\r\n\r\n        time1 = time.time()\r\n        while article.find('\\u3000') != -1:\r\n              article = article.replace('\\u3000','')\r\n              time2 = time.time()\r\n              if time2 - time1 > 60:\r\n                print(' [*] Loop timed out after 60s, breaking out ... 
')\r\n                break\r\n\r\n        article = ' '.join(re.split(' +|\\n+', article)).strip() \r\n\r\n        return summary, keyWords, date, stockCodeLst, article\r\n\r\n    def GenPagesLst(self):\r\n        '''Generate page number list using Range parameter.\r\n        '''\r\n        PageLst = []\r\n        k = 1\r\n        while k+self.Range-1 <= self.totalPages:\r\n            PageLst.append((k,k+self.Range-1))\r\n            k += self.Range\r\n        if k <= self.totalPages:  # append the leftover pages when totalPages is not a multiple of Range\r\n            PageLst.append((k,self.totalPages))\r\n        return PageLst\r\n\r\n    def CrawlRealtimeCompanyNews(self,firstPage): \r\n        '''Continue crawling company news from the first website page\r\n           every once in a while and extract the useful information, \r\n           including summary, key words, released date, related stock \r\n           codes list and main body.\r\n        '''\r\n        doc_lst = []\r\n        if len(self.realtimeNewsURL) == 0:\r\n            self.ConnDB()\r\n            self._AddressLst = self.extractData(['Address'])[0]\r\n            resp = requests.get(firstPage)\r\n            resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n            bs = BeautifulSoup(resp.text, \"lxml\")\r\n            a_list = bs.find_all('a')\r\n            for a in a_list:\r\n                if 'href' in a.attrs and a.string and \\\r\n                a['href'].find('http://finance.sina.com.cn/stock/s/') != -1:\r\n                    if a['href'] not in self._AddressLst:\r\n                        self.realtimeNewsURL.append(a['href'])\r\n                        summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        while article == '' and self.Prob >= .1:\r\n                            self.Prob -= .1\r\n                            summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        self.Prob = .5\r\n                        
if article != '':\r\n                            data = {'Date' : date,\r\n                                    'Address' : a['href'],\r\n                                    'Title' : a.string,\r\n                                    'Keywords' : keyWords,\r\n                                    'Summary' : summary,\r\n                                    'Article' : article,\r\n                                    'RelevantStock' : stockCodeLst}\r\n                            self._collection.insert_one(data)\r\n                            doc_lst.append(a.string + ' ' + summary + ' ' + article)\r\n                            print(' [' + date + '] ' + a.string)\r\n        else:\r\n            resp = requests.get(firstPage)\r\n            resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n            bs = BeautifulSoup(resp.text, \"lxml\")\r\n            a_list = bs.find_all('a')\r\n            for a in a_list:\r\n                if 'href' in a.attrs and a.string and \\\r\n                a['href'].find('http://finance.sina.com.cn/stock/s/') != -1:\r\n                    if a['href'] not in self.realtimeNewsURL and a['href'] not in self._AddressLst:\r\n                        self.realtimeNewsURL.append(a['href'])\r\n                        summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        while article == '' and self.Prob >= .1:\r\n                            self.Prob -= .1\r\n                            summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        self.Prob =.5\r\n                        if article != '':\r\n                            data = {'Date' : date,\r\n                                    'Address' : a['href'],\r\n                                    'Title' : a.string,\r\n                                    'Keywords' : keyWords,\r\n                                    'Summary' : summary,\r\n                                  
  'Article' : article,\r\n                                    'RelevantStock' : stockCodeLst}\r\n                            self._collection.insert_one(data)\r\n                            doc_lst.append(a.string + ' ' + summary + ' ' + article)\r\n                            print(' [' + date + '] ' + a.string)\r\n        return doc_lst\r\n\r\n    def CrawlHistoryCompanyNews(self,startPage,endPage):\r\n        '''Crawl historical company news \r\n        '''\r\n        self.ConnDB()\r\n        AddressLst = self.extractData(['Address'])[0]\r\n        if AddressLst == []:\r\n            urls = []\r\n            url_Part_1 = 'http://roll.finance.sina.com.cn/finance/zq1/ssgs/index_' \r\n            url_Part_2 = '.shtml'\r\n            for pageId in range(startPage,endPage+1):\r\n                urls.append(url_Part_1 + str(pageId) + url_Part_2)\r\n            for url in urls:\r\n                print(url)\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('http://finance.sina.com.cn/stock/s/') != -1:\r\n                        summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        while article == '' and self.Prob >= .1:\r\n                            self.Prob -= .1\r\n                            summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                        self.Prob =.5\r\n                        if article != '':\r\n                            data = {'Date' : date,\r\n                                    'Address' : a['href'],\r\n                                    'Title' : a.string,\r\n                                    'Keywords' : keyWords,\r\n        
                            'Summary' : summary,\r\n                                    'Article' : article,\r\n                                    'RelevantStock' : stockCodeLst}\r\n                            self._collection.insert_one(data)\r\n        else:\r\n            urls = []\r\n            url_Part_1 = 'http://roll.finance.sina.com.cn/finance/zq1/ssgs/index_' \r\n            url_Part_2 = '.shtml'\r\n            for pageId in range(startPage,endPage+1):\r\n                urls.append(url_Part_1 + str(pageId) + url_Part_2)\r\n            for url in urls:\r\n                print(' <Re-Crawl url> ', url)\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and a.string and \\\r\n                    a['href'].find('http://finance.sina.com.cn/stock/s/') != -1:\r\n                        if a['href'] not in AddressLst:\r\n                            summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                            while article == '' and self.Prob >= .1:\r\n                                self.Prob -= .1\r\n                                summary, keyWords, date, stockCodeLst, article = self.getUrlInfo(a['href'])\r\n                            self.Prob =.5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a.string,\r\n                                        'Keywords' : keyWords,\r\n                                        'Summary' : summary,\r\n                                        'Article' : article,\r\n                                        'RelevantStock' : stockCodeLst}\r\n   
                             self._collection.insert_one(data)\r\n\r\n    def ConnDB(self):\r\n        '''Connect to mongodb.\r\n        '''\r\n        Conn = MongoClient(self.IP, self.PORT)\r\n        db = Conn[self.dbName]\r\n        self._collection = db.get_collection(self.colName)\r\n\r\n    def extractData(self,tag_list):\r\n        '''Extract the distinct values of each tag in 'tag_list' into a list.\r\n        '''\r\n        data = []\r\n        for tag in tag_list:\r\n            data.append(self._collection.distinct(tag))\r\n        return data\r\n\r\n    def single_run(self):\r\n        '''Single-threaded run.\r\n        '''\r\n        page_ranges_lst = self.GenPagesLst()\r\n        for ind, page_range in enumerate(page_ranges_lst):\r\n            self.CrawlHistoryCompanyNews(page_range[0],page_range[1])\r\n\r\n    def coroutine_run(self):\r\n        '''Coroutine-based run.\r\n        '''\r\n        jobs = []\r\n        page_ranges_lst = self.GenPagesLst()\r\n        for page_range in page_ranges_lst:\r\n            jobs.append(gevent.spawn(self.CrawlHistoryCompanyNews,page_range[0],page_range[1]))\r\n        gevent.joinall(jobs)\r\n\r\n    def multi_threads_run(self,**kwarg):\r\n        '''Multi-threaded run.\r\n        '''\r\n        page_ranges_lst = self.GenPagesLst()\r\n        print(' Using ' + str(self.ThreadsNum) + ' threads for collecting news ... ')\r\n        with futures.ThreadPoolExecutor(max_workers=self.ThreadsNum) as executor:\r\n            future_to_url = {executor.submit(self.CrawlHistoryCompanyNews,page_range[0],page_range[1]) : \\\r\n                             ind for ind, page_range in enumerate(page_ranges_lst)}\r\n\r\n    def classifyRealtimeStockNews(self):\r\n        '''Continuously crawl and classify news (articles/documents) every 60s.\r\n        '''\r\n        while True:\r\n            print(' * start crawling news from SINA ... 
')\r\n            doc_list = self.CrawlRealtimeCompanyNews('http://roll.finance.sina.com.cn/finance/zq1/ssgs/index_1.shtml') #\r\n            print(' * finish crawling ... ')\r\n            if len(doc_list) != 0:\r\n                self.tm.classifyRealtimeStockNews(doc_list)\r\n            time.sleep(60)\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromSina(5000,100,ThreadsNum=4,IP=\"localhost\",PORT=27017,\\\r\n        dbName=\"Sina_Stock\",collectionName=\"sina_news_company\")\r\n    web_crawl_obj.coroutine_run()  #web_crawl_obj.single_run() #web_crawl_obj.multi_threads_run()"
  },
  {
    "path": "legacy_v1/Crawler/crawler_stcn.py",
    "content": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nCreated on Sat Feb 3 13:41:50 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport time, re, requests, datetime\r\nfrom concurrent import futures\r\nfrom bs4 import BeautifulSoup\r\nfrom pymongo import MongoClient\r\nimport Text_Analysis.text_mining as tm\r\n\r\nimport gevent\r\nfrom gevent import monkey,pool\r\nmonkey.patch_all()\r\n\r\n\r\nclass WebCrawlFromstcn(object):\r\n    '''Crawl company news from 'http://company.stcn.com/gsxw/1.shtml',\r\n                                'http://stock.stcn.com/xingu/1.shtml',\r\n                                'http://stock.stcn.com/zhuli/1.shtml',\r\n                                'http://stock.stcn.com/bankuai/1.shtml',\r\n                                'http://stock.stcn.com/dapan/1.shtml' website.\r\n\r\n    # Arguments:\r\n        totalPages: Number of pages set to be crawled.\r\n        Range: Divide total web pages into totalPages/Range parts \r\n               for multi-threading processing.\r\n        ThreadsNum: Number of threads needed to be start.\r\n        dbName: Name of database.\r\n        colName: Name of collection.\r\n        IP: Local IP address.\r\n        PORT: Port number corresponding to IP address.\r\n    '''\r\n\r\n    def __init__(self,**kwarg):\r\n        self.ThreadsNum = kwarg['ThreadsNum']\r\n        self.dbName = kwarg['dbName']\r\n        self.colName = kwarg['collectionName']\r\n        self.IP = kwarg['IP']\r\n        self.PORT = kwarg['PORT']\r\n        self.Prob = .5\r\n        self.realtimeNewsURL = []\r\n        self.tm = tm.TextMining(IP=\"localhost\",PORT=27017)\r\n\r\n    def countchn(self,string):\r\n        '''Count Chinese numbers and calculate the frequency of Chinese occurrence.\r\n\r\n        # Arguments:\r\n            string: Each part of crawled website analyzed by BeautifulSoup.\r\n        '''\r\n        pattern = re.compile(u'[\\u1100-\\uFFFDh]+?')\r\n        result = pattern.findall(string)\r\n        chnnum = 
len(result)\r\n        possible = chnnum/len(str(string))\r\n        return (chnnum, possible)\r\n\r\n    def getUrlInfo(self,url):\r\n        '''Analyze website and extract useful information.\r\n        '''\r\n        respond = requests.get(url)\r\n        respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\r\n        bs = BeautifulSoup(respond.text, \"lxml\")\r\n        div_list = bs.find_all('div')\r\n        part = bs.find_all('p')\r\n        article = ''\r\n        date = ''\r\n        for div in div_list:\r\n            if 'class' in div.attrs and div['class'] == ['info']:\r\n                date = div.text.split(' ')[0] + ' ' + div.text.split(' ')[1]\r\n                break\r\n\r\n        for paragraph in part:\r\n            chnstatus = self.countchn(str(paragraph))\r\n            possible = chnstatus[1]\r\n            if possible > self.Prob:\r\n               article += str(paragraph)\r\n\r\n        while article.find('<') != -1 and article.find('>') != -1:\r\n              string = article[article.find('<'):article.find('>')+1]\r\n              article = article.replace(string,'')\r\n        while article.find('\\u3000') != -1:\r\n              article = article.replace('\\u3000','')\r\n\r\n        article = ' '.join(re.split(' +|\\n+', article)).strip()\r\n\r\n        return date, article\r\n\r\n    def GenPagesLst(self,totalPages,Range,initPageID):\r\n        '''Generate page number list using Range parameter.\r\n        '''\r\n        PageLst = []\r\n        k = initPageID\r\n        while k+Range-1 <= totalPages:\r\n            PageLst.append((k,k+Range-1))\r\n            k += Range\r\n        if k <= totalPages:\r\n            PageLst.append((k,totalPages))\r\n        return PageLst\r\n\r\n    def CrawlRealtimeCompanyNews(self,url_part_lst):\r\n        '''Continue crawling company news from first website page\r\n           every once in a while and extract the useful information, \r\n           including summary, 
key words, released date, related stock \r\n           codes list and main body.\r\n        '''\r\n        doc_lst = []\r\n        self.ConnDB()\r\n        self._AddressLst = self.extractData(['Address'])[0]\r\n        for url_Part in url_part_lst:\r\n            url = url_Part + str(1) + '.shtml'\r\n            resp = requests.get(url)\r\n            resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n            bs = BeautifulSoup(resp.text, \"lxml\")\r\n            a_list = bs.find_all('a')\r\n            if len(self.realtimeNewsURL) == 0:\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://company.stcn.com/') != -1 \\\r\n                    and a.parent.find('span') or ('href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://stock.stcn.com/') != -1 \\\r\n                    and a.parent.find('span')):\r\n                        if a['href'] not in self._AddressLst:\r\n                            self.realtimeNewsURL.append(a['href'])\r\n                            date, article = self.getUrlInfo(a['href'])\r\n                            while article == '' and self.Prob >= .1:\r\n                                self.Prob -= .1\r\n                                date, article = self.getUrlInfo(a['href'])\r\n                            self.Prob =.5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a['title'],\r\n                                        'Article' : article}\r\n                                self._collection.insert_one(data)\r\n                                doc_lst.append(a['title'] + ' ' + article)\r\n                                print(' [' + date + '] ' + 
a['title'])\r\n            else:\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://company.stcn.com/') != -1 \\\r\n                    and a.parent.find('span') or ('href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://stock.stcn.com/') != -1 \\\r\n                    and a.parent.find('span')):\r\n                        if a['href'] not in self.realtimeNewsURL and a['href'] not in self._AddressLst:\r\n                            self.realtimeNewsURL.append(a['href'])\r\n                            date, article = self.getUrlInfo(a['href'])\r\n                            while article == '' and self.Prob >= .1:\r\n                                self.Prob -= .1\r\n                                date, article = self.getUrlInfo(a['href'])\r\n                            self.Prob =.5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a['title'],\r\n                                        'Article' : article}\r\n                                self._collection.insert_one(data)\r\n                                doc_lst.append(a['title'] + ' ' + article)\r\n                                print(' [' + date + '] ' + a['title'])  \r\n        return doc_lst\r\n\r\n    def CrawlCompanyNews(self,startPage,endPage,url_Part_1):\r\n        '''Crawl historical company news \r\n        '''\r\n        self.ConnDB()\r\n        AddressLst = self.extractData(['Address'])[0]\r\n        if AddressLst == []:\r\n            urls = []\r\n            for pageId in range(startPage,endPage+1):\r\n                urls.append(url_Part_1 + str(pageId) + '.shtml')\r\n            for url in urls:\r\n                print(url)\r\n    
            resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://company.stcn.com/') != -1 \\\r\n                    and a.parent.find('span'):\r\n                        date, article = self.getUrlInfo(a['href'])\r\n                        while article == '' and self.Prob >= .1:\r\n                            self.Prob -= .1\r\n                            date, article = self.getUrlInfo(a['href'])\r\n                        self.Prob =.5\r\n                        if article != '':\r\n                            data = {'Date' : date,\r\n                                    'Address' : a['href'],\r\n                                    'Title' : a['title'],\r\n                                    'Article' : article}\r\n                            self._collection.insert_one(data)\r\n        else:\r\n            urls = []\r\n            for pageId in range(startPage,endPage+1):\r\n                urls.append(url_Part_1 + str(pageId) + '.shtml')\r\n            for url in urls:\r\n                print(' <Re-Crawl url> ', url)\r\n                resp = requests.get(url)\r\n                resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding \r\n                bs = BeautifulSoup(resp.text, \"lxml\")\r\n                a_list = bs.find_all('a')\r\n                for a in a_list:\r\n                    if 'href' in a.attrs and 'target' in a.attrs and 'title' in a.attrs \\\r\n                    and a['href'].find('http://company.stcn.com/') != -1 \\\r\n                    and a.parent.find('span'):\r\n                        if a['href'] not in AddressLst:\r\n                            date, article = 
self.getUrlInfo(a['href'])\r\n                            while article == '' and self.Prob >= .1:\r\n                                self.Prob -= .1\r\n                                date, article = self.getUrlInfo(a['href'])\r\n                            self.Prob = .5\r\n                            if article != '':\r\n                                data = {'Date' : date,\r\n                                        'Address' : a['href'],\r\n                                        'Title' : a['title'],\r\n                                        'Article' : article}\r\n                                self._collection.insert_one(data)\r\n\r\n    def ConnDB(self):\r\n        '''Connect to mongodb.\r\n        '''\r\n        Conn = MongoClient(self.IP, self.PORT)\r\n        db = Conn[self.dbName]\r\n        self._collection = db.get_collection(self.colName)\r\n\r\n    def extractData(self,tag_list):\r\n        '''Extract the distinct values of each tag in 'tag_list' into a list.\r\n        '''\r\n        data = []\r\n        for tag in tag_list:\r\n            data.append(self._collection.distinct(tag))\r\n        return data\r\n\r\n    def coroutine_run(self,totalPages,Range,initPageID,**kwarg):\r\n        '''Coroutine-based run.\r\n        '''\r\n        jobs = []\r\n        page_ranges_lst = self.GenPagesLst(totalPages,Range,initPageID)\r\n        for page_range in page_ranges_lst:\r\n            jobs.append(gevent.spawn(self.CrawlCompanyNews,page_range[0],page_range[1],kwarg['url_Part_1']))\r\n        gevent.joinall(jobs)\r\n\r\n    def multi_threads_run(self,**kwarg):\r\n        '''Multi-threaded run.\r\n        '''\r\n        page_ranges_lst = self.GenPagesLst(kwarg['totalPages'],kwarg['Range'],kwarg['initPageID'])\r\n        print(' Using ' + str(self.ThreadsNum) + ' threads for collecting news ... 
')\r\n        with futures.ThreadPoolExecutor(max_workers=self.ThreadsNum) as executor:\r\n            future_to_url = {executor.submit(self.CrawlCompanyNews,page_range[0],page_range[1],kwarg['url_Part_1']) : \\\r\n                             ind for ind, page_range in enumerate(page_ranges_lst)}\r\n\r\n    def classifyRealtimeStockNews(self):\r\n        '''Continuously crawl and classify news (articles/documents) every 60s.\r\n        '''\r\n        while True:\r\n            print(' * start crawling news from STCN ... ')\r\n            doc_list = self.CrawlRealtimeCompanyNews(['http://company.stcn.com/gsxw/',\\\r\n                                                'http://stock.stcn.com/xingu/',\\\r\n                                                'http://stock.stcn.com/zhuli/',\\\r\n                                                'http://stock.stcn.com/bankuai/',\\\r\n                                                'http://stock.stcn.com/dapan/'])\r\n            print(' * finish crawling ... ')\r\n            if len(doc_list) != 0:\r\n                self.tm.classifyRealtimeStockNews(doc_list)\r\n            time.sleep(60)\r\n"
  },
  {
    "path": "legacy_v1/Crawler/crawler_tushare.py",
    "content": "import pymongo\r\nimport tushare as ts\r\nimport datetime\r\nimport time\r\nimport math\r\nimport traceback\r\n\r\nclass CrawlStockData(object):\r\n\tdef __init__(self,**kwarg):\r\n\t\tself.IP = kwarg['IP']\r\n\t\tself.PORT = kwarg['PORT']\r\n\t\tself.ConnDB()\r\n\t\tself.stockDailyPath = 'D:\\\\stock_daliy'\r\n\r\n\tdef ConnDB(self):\r\n\t\tself._Conn = pymongo.MongoClient(self.IP, self.PORT) \r\n\r\n\tdef extractData(self,dbName,colName,tag_list):\r\n\t\tdb = self._Conn[dbName]\r\n\t\tcollection = db.get_collection(colName)\r\n\t\tdata = []\r\n\t\tfor tag in tag_list:\r\n\t\t\texec(tag + \" = collection.distinct('\" + tag + \"')\")\r\n\t\t\texec(\"data.append(\" + tag + \")\")\r\n\t\treturn data\r\n\r\n\tdef getStockBasicFromTushare(self,dbName,colName):\r\n\t\tdb = self._Conn[dbName]\r\n\t\tcollection = db.get_collection(colName)\r\n\t\tstock_basic_info = ts.get_stock_basics()\r\n\t\tfor i in range(len(stock_basic_info)):\r\n\t\t\tdata = {stock_basic_info.index.name : stock_basic_info.index[i]}\r\n\t\t\tdata.update({'name' : stock_basic_info['name'][i]})\r\n\t\t\tdata.update({'industry' : stock_basic_info['industry'][i]})\r\n\t\t\tdata.update({'area' : stock_basic_info['area'][i]})\r\n\t\t\tdata.update({'pe' : stock_basic_info['pe'][i]})\r\n\t\t\tdata.update({'outstanding' : stock_basic_info['outstanding'][i]})\r\n\t\t\tdata.update({'totals' : stock_basic_info['totals'][i]})\r\n\t\t\tdata.update({'totalAssets' : stock_basic_info['totalAssets'][i]})\r\n\t\t\tdata.update({'liquidAssets' : stock_basic_info['liquidAssets'][i]})\r\n\t\t\tdata.update({'fixedAssets' : stock_basic_info['fixedAssets'][i]})\r\n\t\t\tdata.update({'reserved' : stock_basic_info['reserved'][i]})\r\n\t\t\tdata.update({'reservedPerShare' : stock_basic_info['reservedPerShare'][i]})\r\n\t\t\tdata.update({'esp' : stock_basic_info['esp'][i]})\r\n\t\t\tdata.update({'bvps' : stock_basic_info['bvps'][i]})\r\n\t\t\tdata.update({'pb' : 
stock_basic_info['pb'][i]})\r\n\t\t\tdata.update({'undp' : stock_basic_info['undp'][i]})\r\n\t\t\tdata.update({'perundp' : stock_basic_info['perundp'][i]})\r\n\t\t\tdata.update({'rev' : stock_basic_info['rev'][i]})\r\n\t\t\tdata.update({'profit' : stock_basic_info['profit'][i]})\r\n\t\t\tdata.update({'gpr' : stock_basic_info['gpr'][i]})\r\n\t\t\tdata.update({'npr' : stock_basic_info['npr'][i]})\r\n\t\t\tdata.update({'holders' : stock_basic_info['holders'][i]})\r\n\t\t\t#detail = dict(zip(stock_basic_info.columns, [stock_basic_info[j][i] for j in stock_basic_info.columns]))\r\n\t\t\tcollection.insert_one(data)\r\n\r\n\tdef renewStockBasic(self):\r\n\t\tpass\r\n\r\n\tdef getStockTickHistory(self,dbName,stockCode):\r\n\t\ttry:\r\n\t\t\tdb = self._Conn[dbName]\r\n\t\t\tcollection = db.get_collection(stockCode)\r\n\t\t\tdate = self.extractData(\"NBD\",\"nbd_news_company\",['date'])[0]\r\n\t\t\tbegin_date = min(date).split(' ')[0]\r\n\t\t\tdate_list = self.getCalendar(begin_date)\r\n\t\t\tfor dt in date_list:\r\n\t\t\t\ttickDataOfEachDate = ts.get_tick_data(stockCode,date=dt)\r\n\t\t\t\tif not math.isnan(tickDataOfEachDate['price'][0]): #exist data at that day\r\n\t\t\t\t\tdata = {}\r\n\t\t\t\t\tfor i in range(len(tickDataOfEachDate)-1,-1,-1):\r\n\t\t\t\t\t\tdata.update({'date' : dt})\r\n\t\t\t\t\t\tdata.update({'time' : tickDataOfEachDate['time'][i]})\r\n\t\t\t\t\t\tdata.update({'price' : tickDataOfEachDate['price'][i]})\r\n\t\t\t\t\t\tdata.update({'change' : tickDataOfEachDate['change'][i]})\r\n\t\t\t\t\t\tdata.update({'volume' : int(tickDataOfEachDate['volume'][i])})\r\n\t\t\t\t\t\tdata.update({'amount' : int(tickDataOfEachDate['amount'][i])})\r\n\t\t\t\t\t\tdata.update({'type' : tickDataOfEachDate['type'][i]})\r\n\t\t\t\t\t\tcollection.insert_one(data)\r\n\t\t\t\t\t\tdata = {}\r\n\t\t\t\tprint(dt + ' crawl finished ... 
')\r\n\t\texcept Exception:\r\n\t\t\ttraceback.print_exc()\r\n\r\n\tdef getStockDayHistory(self,dbName,stockCode):\r\n\t\tdb = self._Conn[dbName]\r\n\t\tcollection = db.get_collection(stockCode)\r\n\t\tPath = self.stockDailyPath + '\\\\' + stockCode + '.txt'\r\n\t\tdata = []\r\n\t\tfor row in open(Path,'r'):\r\n\t\t\tline = row.split()\r\n\t\t\tdata.append(line)\r\n\t\tDict = {}\r\n\t\tfor i in range(len(data)):\r\n\t\t\tif len(data[i]) > 1:\r\n\t\t\t\tDict.update({'date' : data[i][0]})\r\n\t\t\t\tDict.update({'open' : data[i][1]})\r\n\t\t\t\tDict.update({'high' : data[i][2]})\r\n\t\t\t\tDict.update({'low' : data[i][3]})\r\n\t\t\t\tDict.update({'close' : data[i][4]})\r\n\t\t\t\tDict.update({'volume' : data[i][5]})\r\n\t\t\t\tDict.update({'turnover' : data[i][6]})\r\n\t\t\t\tcollection.insert_one(Dict)\r\n\t\t\t\tDict = {}\r\n\r\n\tdef getCalendar(self,begin_date):\r\n\t\tdate_list = []\r\n\t\tbegin_date = datetime.datetime.strptime(begin_date, \"%Y-%m-%d\")\r\n\t\tend_date = datetime.datetime.strptime(time.strftime('%Y-%m-%d',time.localtime(time.time())), \"%Y-%m-%d\")\r\n\t\twhile begin_date <= end_date:\r\n\t\t\tdate_str = begin_date.strftime(\"%Y-%m-%d\")\r\n\t\t\tdate_list.append(date_str)\r\n\t\t\tbegin_date += datetime.timedelta(days=1)\r\n\t\treturn date_list\r\n\r\n\tdef isUnique(self, List):\r\n\t\tn = len(List)\r\n\t\tfor i in range(n):\r\n\t\t\tif List.count(List[i]) != 1:  # check whether List[i] appears more than once\r\n\t\t\t\treturn False\r\n\t\treturn True\r\n\r\n\tdef getStockTickRealtime(self):\r\n\t\tpass\r\n\r\n\r\n"
  },
  {
    "path": "legacy_v1/README_OLD.md",
    "content": "# 上市公司新闻文本分析与分类预测\n\n ![image](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/assets/images/FINNEWS-HUNTER.jpg)\n\n[![Star History Chart](https://api.star-history.com/svg?repos=DemonDamon/Listed-company-news-crawl-and-text-analysis&type=Date)]([https://star-history.com/#linhandev/dataset&Date](https://star-history.com/#DemonDamon/Listed-company-news-crawl-and-text-analysis&Date))\n\n-------------------------------\n\n## 简介\n\n上市公司新闻文本分析与分类预测的基本步骤如下：\n\n - 从新浪财经、每经网、金融界、中国证券网、证券时报网上，爬取上市公司（个股）的历史新闻文本数据（包括时间、网址、标题、正文）\n - 从Tushare上获取沪深股票日线数据（开、高、低、收、成交量和持仓量）和基本信息（包括股票代码、股票名称、所属行业、所属地区、PE值、总资产、流动资产、固定资产、留存资产等）\n - 对抓取的新闻文本按照，去停用词、加载新词、分词的顺序进行处理\n - 利用前两步中所获取的股票名称和分词后的结果，抽取出每条新闻里所包含的（0支、1支或多支）股票名称，并将所对应的所有股票代码，组合成与该条新闻相关的股票代码列表，并在历史数据表中增加一列相关股票代码数据\n - 从历史新闻数据库中抽取与某支股票相关的所有新闻文本，利用该支股票的日线数据（比如某一天发布的消息，在设定N天后如果价格上涨则认为是利好消息，反之则是利空消息）给每条新闻贴上“利好”和“利空”的标签，并存储到新的数据库中（或导出到CSV文件）\n - 实时抓取新闻数据，判断与该新闻相关的股票有哪些，利用上一步的结果，对与某支股票相关的所有历史新闻文本（已贴标签）进行文本分析（构建新的特征集），然后利用SVM（或随机森林）分类器对文本分析结果进行训练（如果已保存训练模型，可选择重新训练或直接加载模型），最后利用训练模型对实时抓取的新闻数据进行分类预测\n\n开发环境`Python-v3(3.6)`：\n\n - gensim==3.2.0\n - jieba==0.39\n - scikit-learn==0.19.1\n - pandas==0.20.0\n - numpy==1.13.3+mkl\n - scipy==0.19.0\n - pymongo==3.6.0\n - beautifulsoup4==4.6.0\n - tushare==1.1.1\n - requests==2.18.4\n - gevent==1.2.1\n\n## 文本处理 -> [text_processing.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Text_Analysis/text_processing.py)\n\n - 文本处理包括去停用词处理、加载新词、中文分词、去掉出现次数少的分词\n - 生成字典和Bow向量，并基于Gensim转化模型（LSI、LDA、TF-IDF）转化Bow向量\n - 计算文本相似度\n - 打印词云\n\n## 文本挖掘 -> [text_mining.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Text_Analysis/text_mining.py)\n\n - 从新闻文本中抽取特定信息，并贴上新的文本标签方便往后训练模型\n - 从数据库中抽取与某支股票相关的所有新闻文本\n - 将贴好标签的历史新闻进行分类训练，利用训练好的模型对实时抓取的新闻文本进行分类预测\n\n## 新闻爬取 -> 
[crawler_cnstock.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_cnstock.py), [crawler_jrj.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_jrj.py), [crawler_nbd.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_nbd.py), [crawler_sina.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_sina.py), [crawler_stcn.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_stcn.py)\n\n - 分析网站结构，多线程（或协程）爬取上市公司历史新闻数据\n\n## Tushare数据提取 -> [crawler_tushare.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/run_crawler_tushare.py)\n\n - 获取沪深所有股票的基本信息，包括股票代码、股票名称、所属行业、所属地区等\n\n## 用法\n\n - 配好运行环境以及安装MongoDB，最好再安装一个MongoDB的可视化管理工具Studio 3T\n - 先运行[crawler_cnstock.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_cnstock.py), [crawler_jrj.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_jrj.py), [crawler_nbd.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_nbd.py), [crawler_sina.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_sina.py), [crawler_stcn.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_stcn.py)这5个py文件，而且可能因为对方服务器没有响应而重复多次运行这几个文件才能抓取大量的历史数据\n - 接着运行[crawler_tushare.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/run_crawler_tushare.py)从Tushare获取基本信息和股票价格\n - 最后运行[run_main.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/run_main.py)文件，其中有4个步骤，除了第1步初始化外，其他几步最好单独运行\n - 注意：所有程序都必须在文件所在目录下运行\n\n## 更新目标\n\n 
由于之前的项目代码是在初学Python的时候写的，很多写法都是入门级别，因此为了提高整体项目的质量，除了优化代码细节和已有的功能模块之外，还加入了多个功能模块，来支撑未来更加智能化和个性化的金融分析与交易。\n - 完成初步构想，重构该项目，将项目分成8大模块，分别是`数据获取模块`，`数据清洗与预处理模块`，`大数据可视化模块`，`基于机器学习的文本挖掘模块`，`金融知识图谱构建模块`，`任务导向多轮对话模块`，`金融交易模块`，`通用服务模块`\n (备注：项目在完善之后会重新更名为`Finnews Hunter`，命名的来源是出于对`《全职猎人》`的喜爱，与项目本质的结合，其中`Finnews`是`Financial News`的简写。上面提到的8个模块，分别由`《全职猎人》`中的本人最喜爱的8位角色命名，分别是\n - `数据获取模块`               -> [Gon](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Gon) -> `网页爬虫、各种数据源API调用等`\n - `数据清洗与预处理模块`       -> [Killua](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Killua) -> `数据清洗、数据转换(数据采样、类型转换、归一化等)、数据描述(数据可视化)、特征选择与组合(熵增益和分支定界等)、特征抽取(主成分分析、线性判别分析等)`\n - `大数据可视化模块`           -> [Kurapika](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Kurapika) -> `基于多个可视化模块进行封装，包括提供Web可视化界面`\n - `自然语言处理模块`           -> [Leorio](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Leorio) -> `中文分词、词性标注、实体识别`\n - `基于机器学习的文本挖掘模块` -> [Hisoka](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Hisoka)  -> ``\n - `金融知识图谱构建模块`       -> [Chrollo](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Chrollo) -> ``\n - `任务导向多轮对话模块`       -> [Illumi](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Illumi) -> ``\n - `金融交易模块`               -> [Feitan](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Feitan) -> ``\n - `基础与Web服务模块`          -> [Kite](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src/Kite) -> `基础服务集，包括基本参数配置文件(.py)、数据库的构建与连接、日志打印与收集、多线程服务、Web服务框架搭建以及其他函数`)\n \n ## 更新日志\n - 注意：  \n   - 以下例子均需在代码根目录[src](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/tree/main/src)下执行  \n   - 先安装好MongoDB用作存储数据库，以及Redis用做简单的消息队列\n   - 
Before running the demos below, set the parameters in [config.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Kite/config.py)\n   \n - Replaced [crawler_tushare.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_tushare.py) with [stockinfospyder.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/stockinfospyder.py); run it directly to fetch historical stock price data, updated after 15:30 each day (only daily data is collected for now)\n    - example-1 fetch historical price data via the [AkShare](https://www.akshare.xyz/zh_CN/latest/) API and enable real-time updates\n    ```\n    from Kite import config\n    from Gon.stockinfospyder import StockInfoSpyder\n\n    stock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\n    # Fetch history for a given period, e.g.: stock_info_spyder.get_historical_news(start_date=\"20150101\", end_date=\"20201204\")\n    # If no period is given and the database already holds some data, fetching resumes from the latest stored\n    # date; e.g. with sh600000 prices stored up to 2020-12-03, prices from 2020-12-04 to now are fetched automatically\n    stock_info_spyder.get_historical_news()\n    ```\n    - example-2 enable automatic updates of all stock price data (currently only daily data, after 15:30)\n    ```\n    from Kite import config\n    from Gon.stockinfospyder import StockInfoSpyder\n\n    stock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\n    stock_info_spyder.get_realtime_news()\n    ```\n - Replaced [crawler_cnstock.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_cnstock.py) with [cnstockspyder.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/cnstockspyder.py); run it directly to fetch historical news from 中国证券网 (cnstock.com) and keep collecting in real time\n    - example-1 crawl historical news, then deduplicate and remove NULL records\n    ```\n    import time\n    import logging\n    from Kite import config\n    from Killua.denull import DeNull\n    from Killua.deduplication import Deduplication\n    from Gon.cnstockspyder import CnStockSpyder\n\n    cnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n    for url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n        logging.info(\"start crawling {} ...\".format(url_to_be_crawled))\n        cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn)\n        logging.info(\"finished ...\")\n        time.sleep(30)\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n    ```\n    - example-2 update the news database in real time and push new records onto the redis message queue for downstream processing\n    ```\n    import time, logging, threading\n    from Kite import config\n    from Kite.database import Database\n    from Killua.denull import DeNull\n    from Killua.deduplication import Deduplication\n    from Gon.cnstockspyder import CnStockSpyder\n\n    obj = Database()\n    df = obj.get_data(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK, keys=[\"Date\", \"Category\"])\n\n    cnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n    # Backfill history first: e.g. if news was crawled up to 2020-12-01 and the real-time crawler starts on\n    # 2020-12-23, news from 2020-12-02 to 2020-12-23 is crawled automatically beforehand\n    for url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n        # look up the timestamp of the latest record in this category\n        latest_date_in_db = max(df[df.Category == type_chn][\"Date\"].to_list())\n        cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn, start_date=latest_date_in_db)\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n\n    # start real-time crawling in parallel threads\n    thread_list = []\n    for url, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n        thread = threading.Thread(target=cnstock_spyder.get_realtime_news, args=(url, type_chn, 60))\n        thread_list.append(thread)\n    for thread in thread_list:\n        thread.start()\n    for thread in thread_list:\n        thread.join()\n    ```\n - Replaced [crawler_jrj.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_jrj.py) with [jrjspyder.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/jrjspyder.py); run it directly to fetch historical news from 金融界 (jrj.com.cn) and keep collecting in real time\n    - example-1 crawl historical news, then deduplicate and remove NULL records\n    ```\n    from Kite import config\n    from Killua.denull import DeNull\n    from Killua.deduplication import Deduplication\n    from Gon.jrjspyder import JrjSpyder\n\n    jrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n    jrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ, start_date=\"2015-01-01\")\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n    ```\n    - example-2 with some history already crawled, update the news database in real time and push new records onto the redis message queue\n    ```\n    from Kite import config\n    from Gon.jrjspyder import JrjSpyder\n\n    jrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n    jrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ)  # backfill crawled data up to the latest date\n    jrj_spyder.get_realtime_news()\n    ```\n - Replaced [crawler_nbd.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_nbd.py) with [nbdspyder.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/nbdspyder.py); run it directly to fetch historical news from 每经网 (nbd.com.cn) and keep collecting in real time\n    - example-1 crawl historical news, then deduplicate and remove NULL records\n    ```\n    from Kite import config\n    from Killua.denull import DeNull\n    from Killua.deduplication import Deduplication\n    from Gon.nbdspyder import NbdSpyder\n\n    nbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n    nbd_spyder.get_historical_news(start_page=684)\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n    ```\n    - example-2 with some history already crawled, update the news database in real time and push new records onto the redis message queue\n    ```\n    from Kite import config\n    from Killua.denull import DeNull\n    from Killua.deduplication import Deduplication\n    from Gon.nbdspyder import NbdSpyder\n\n    # With no historical data, crawling starts from scratch; otherwise it resumes from the latest stored time,\n    # e.g. if the most recent news timestamp is \"2020-12-09 20:37:10\", crawling resumes from that time\n    nbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n    nbd_spyder.get_historical_news()\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n\n    nbd_spyder.get_realtime_news()\n    ```\n - Replaced [crawler_sina.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/Crawler/crawler_sina.py) with [sinaspyder.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/sinaspyder.py); run it directly to fetch historical news from 新浪财经 (Sina Finance) (not yet updated)\n - Stopped maintaining the `证券时报网` (stcn.com) crawler (the old code no longer works); added crawlers for `网易财经` (NetEase Finance) and `凤凰财经` (Phoenix Finance) (not yet updated)\n - Added [buildstocknewsdb.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Killua/buildstocknewsdb.py): once enough news text has been crawled from 每经网, 中国证券网 and 金融界, it builds a dedicated news database for each stock and labels every article over 3/5/10/15/30/60-day horizons according to the stock price; see the comments on lines 111-116 of [buildstocknewsdb.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Killua/buildstocknewsdb.py) for the exact criteria\n    - example-1 extract from the historical news databases to build and label a per-stock news database\n    ```\n    from Kite import config\n    from Killua.buildstocknewsdb import GenStockNewsDB\n\n    gen_stock_news_db = GenStockNewsDB()\n    gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n    gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n    gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n    ```\n    - example-2 listen on the redis message queue and store each new article into the news databases of all related stocks\n    ```\n    from Kite import config\n    from Killua.buildstocknewsdb import 
GenStockNewsDB\n\n    gen_stock_news_db = GenStockNewsDB()\n    gen_stock_news_db.listen_redis_queue()\n    ```\n - Added [realtime_spyder_startup.bat](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/realtime_spyder_startup.bat), which launches the following programs at the same time:\n    - multiple crawler instances, including [realtime_starter_cnstock.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/realtime_starter_cnstock.py), [realtime_starter_jrj.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/realtime_starter_jrj.py), [realtime_starter_nbd.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/realtime_starter_nbd.py), etc.\n    - the all-stock price data updater [realtime_starter_stock_price.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/realtime_starter_stock_price.py)\n    - the redis message-queue listener [realtime_starter_redis_queue.py](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/Gon/realtime_starter_redis_queue.py)\n - Added [realtime_spyder_stopall.bat](https://github.com/DemonDamon/Listed-company-news-crawl-and-text-analysis/blob/main/src/realtime_spyder_stopall.bat) to stop all crawler processes in one go\n - Before this update, the jieba segmentation system was used, and entity recognition required constantly maintaining a custom word list to stay accurate; after the update, FinBERT, a BERT model pre-trained on financial text, is used to recognize entities in the financial domain\n\n# FinnewsHunter (Reborn)\n\nAn enterprise-grade multi-agent financial decision platform built on the AgenticX framework.\n\n## Project Status\n\n🚧 **Refactoring in progress** 🚧\n\nThis project is undergoing a major refactor, upgrading from a collection of standalone scripts to a modern micro-service architecture.\n\n- **Legacy code**: archived under the `legacy_v1/` directory.\n- **Refactoring plan**: see [planning.md](../../planning.md).\n\n## Tech Stack\n\n- **Backend**: Python, FastAPI, AgenticX (Orchestrator, Debate, Tools)\n- **Frontend**: TypeScript, React\n- **Algorithms**: sklearn, PyTorch, vllm\n\n## Quick Start\n\n### Backend Development\n\n1. Enter the backend directory:\n   ```bash\n   cd backend\n   ```\n2. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n3. Start the service:\n   ```bash\n   uvicorn app.main:app --reload\n   ```\n\n## Directory Layout\n\n```\nFinnewsHunter/\n├── backend/            # FastAPI backend service\n│   ├── app/            # application code\n│   └── tests/          # test cases\n├── frontend/           # React frontend app (to be initialized)\n├── legacy_v1/          # legacy code archive\n├── docs/               # project documentation\n└── README.md           # project overview\n```"
  },
  {
    "path": "legacy_v1/Text_Analysis/__init__.py",
    "content": "\n"
  },
  {
    "path": "legacy_v1/Text_Analysis/text_mining.py",
    "content": "# -*- coding: UTF-8 -*- \r\n\"\"\"\r\nCreated on Sat Jan 20 10:20:33 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport os, re, csv, time, warnings, threading\r\nfrom pymongo import MongoClient\r\nimport pandas as pd\r\nimport numpy as np\r\nfrom scipy.sparse import csr_matrix\r\nfrom bson.objectid import ObjectId\r\nimport Text_Analysis.text_processing as tp\r\nfrom gensim import corpora, utils\r\n\r\nfrom sklearn import svm\r\nfrom sklearn.ensemble import RandomForestClassifier \r\nfrom sklearn.externals import joblib\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.metrics import classification_report\r\nimport sklearn.exceptions\r\nfrom sklearn.preprocessing import OneHotEncoder\r\n\r\nwarnings.filterwarnings(\"ignore\", category=sklearn.exceptions.UndefinedMetricWarning)\r\nwarnings.filterwarnings(\"ignore\", category=Warning, module='sklearn')\r\nwarnings.filterwarnings(\"ignore\", category=UserWarning, module='gensim')\r\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning, module='gensim')\r\n\r\nclass TextMining(object):\r\n\t'''Text analysis and prediction functions class.\r\n\r\n\t# Arguments:\r\n\t\tIP: IP address of mongodb database.\r\n\t\tPORT: Port number corresponding to IP.\r\n\t'''\r\n\r\n\tdef __init__(self,**kwarg): \r\n\t\tself.IP = kwarg['IP']\r\n\t\tself.PORT = kwarg['PORT']\r\n\t\tself.ConnDB()\r\n\t\tself.tp = tp.TextProcessing(os.getcwd() + '\\\\' + 'Chinese_Stop_Words.txt', \\\r\n\t\t\tos.getcwd() + '\\\\' + 'finance_dict.txt')\r\n\t\tif not os.path.exists(os.getcwd() + '\\\\' + 'stock_dict_file'):\r\n\t\t\tos.makedirs(os.getcwd() + '\\\\' + 'stock_dict_file')\r\n\t\tself.DictPath = os.getcwd() + '\\\\' + 'stock_dict_file'\r\n\r\n\tdef ConnDB(self):\r\n\t\t'''Connect to the mongodb.\r\n\t\t'''\r\n\t\tself._Conn = MongoClient(self.IP, self.PORT) \r\n\r\n\tdef extractData(self,dbName,colName,tag_list):\r\n\t\t'''Extract data from specific collection of specific database.\r\n\r\n\t\t# 
Arguments:\r\n\t\t\tdbName: Name of database.\r\n\t\t\tcolName: Name of collection.\r\n\t\t\ttag_list: List of tags that need to be extracted.\r\n\t\t'''\r\n\t\tdb = self._Conn[dbName]\r\n\t\tcollection = db.get_collection(colName)\r\n\t\tdata = []\r\n\t\tDict = {}\r\n\t\tfor tag in tag_list:\r\n\t\t\texec(tag + \" = collection.distinct('\" + tag + \"')\")\r\n\t\t\texec(\"data.append(\" + tag + \")\")\r\n\t\t\texec(\"Dict.update({'\" + tag + \"' : np.array(\" + tag + \")})\")\r\n\t\tdataFrame = pd.DataFrame(Dict,columns=tag_list)\r\n\t\treturn dataFrame\r\n\r\n\tdef extractStockCodeFromArticle(self,dbName,colName):\r\n\t\t'''Extract the stocks mentioned by each news(articles/documents).\r\n\r\n\t\t# Arguments:\r\n\t\t\tdbName: Name of database.\r\n\t\t\tcolName: Name of collection.\r\n\t\t'''\r\n\t\tdb = self._Conn[dbName]\r\n\t\tcollection = db.get_collection(colName)\r\n\t\tidLst = self.extractData(dbName,colName,['_id'])._id\r\n\t\tdata = self.extractData(\"Stock\",\"Basic_Info\",['name','code'])\r\n\t\tarticles = []\r\n\t\tfor _id in idLst:\r\n\t\t\tif dbName == 'NBD_Stock':\r\n\t\t\t\ttitle = collection.find_one({'_id':ObjectId(_id)})['title']\r\n\t\t\telse:\r\n\t\t\t\ttitle = collection.find_one({'_id':ObjectId(_id)})['Title']\r\n\t\t\tarticle = collection.find_one({'_id':ObjectId(_id)})['Article']\r\n\t\t\tarticles.append(title + ' ' + article)\r\n\t\ttoken, _, _ = self.tp.genDictionary(articles,saveDict=False)\r\n\t\tj = 0\r\n\t\tfor tk in token:\r\n\t\t\trelevantStockName = []\r\n\t\t\trelevantStockCode = []\r\n\t\t\tfor k in range(len(tk)):\r\n\t\t\t\tif len(tk[k]) >= 3 and tk[k] in list(data.name):\r\n\t\t\t\t\trelevantStockName.append(tk[k]) \r\n\t\t\t\t\trelevantStockCode.append(list(data[(data.name == tk[k])].code)[0]) \r\n\t\t\tif len(relevantStockCode) != 0:\r\n\t\t\t\trelevantStockCodeDuplicateRemoval = list(set(relevantStockCode))\r\n\t\t\t\tcollection.update({\"_id\":idLst[j]},{\"$set\":{\"relevantStock\":\\\r\n\t\t\t\t\t' 
'.join(relevantStockCodeDuplicateRemoval)}})\r\n\t\t\t# print(' [*] finished ' + str(j+1) + ' ... ')\r\n\t\t\tj += 1\r\n\r\n\tdef extractStockCodeFromRealtimeNews(self,documents):\r\n\t\t'''Extract stocks mentioined by real-time crawled news(articles/documents), \r\n\t\t\tand return the list of corresponding codes.\r\n\r\n\t\t# Arguments:\r\n\t\t\tdocuments: Real-time crawled news(articles/documents).\r\n\t\t'''\r\n\t\tstock_basic_info = self.extractData(\"Stock\",\"Basic_Info\",['name','code'])\r\n\t\ttoken_list = self.tp.jieba_tokenize(documents)\r\n\t\trelevant_stock_list = []\r\n\t\tfor tokens in token_list:\r\n\t\t\trelevantStockCode = []\r\n\t\t\tfor tk in tokens:\r\n\t\t\t\tif len(tk) >= 3 and tk in list(stock_basic_info.name):\r\n\t\t\t\t\trelevantStockCode.append(list(stock_basic_info[(stock_basic_info.name == tk)].code)[0]) \r\n\t\t\trelevant_stock_list.append(list(set(relevantStockCode))) \r\n\t\treturn relevant_stock_list\r\n\r\n\tdef judgeGoodOrBadNews(self,stockCode,date,judgeTerm):\r\n\t\t'''Label the historical news(articles/documents) with 'Bad', 'Good' or 'Neutral'.\r\n\r\n\t\t# Arguments:\r\n\t\t\tstockCode: Code of specific stock.\r\n\t\t\tdate: Date at which released the specific news.\r\n\t\t\tjudgeTerm: Interval after which compare the close price with that at the released date.\r\n\t\t'''\r\n\t\tdb = self._Conn['Stock']\r\n\t\tcollection = db.get_collection(stockCode)\r\n\t\tdateLst = self.extractData(\"Stock\",stockCode,['date']).date\r\n\t\tdays = 0\r\n\t\tCloseLst = []\r\n\t\tfor dt in dateLst:\r\n\t\t\tif dt >= date:\r\n\t\t\t\tCloseLst.append(float(collection.find_one({'date':dt})['close']))\r\n\t\t\t\tif days >= judgeTerm:\r\n\t\t\t\t\tbreak\r\n\t\t\t\tdays += 1\r\n\t\tif CloseLst[-1] > CloseLst[0]:\r\n\t\t\tcharacter = '利好'\r\n\t\telif CloseLst[-1] < CloseLst[0]:\r\n\t\t\tcharacter = '利空'\r\n\t\telse:\r\n\t\t\tcharacter = '中立'\r\n\t\treturn character\r\n\r\n\tdef getNewsOfSpecificStock(self,dbColLst,stockCode,**kwarg):\r\n\t\t'''Get 
news related to specific stock from historical database.\r\n\r\n\t\t# Arguments:\r\n\t\t\tdbColLst: List of databases and collections, eg: [(db_1,col_1),(db_2,col_2),...,(db_N,col_N)].\r\n\t\t\tstockCode: Code of specific stock.\r\n\t\t\texport: List parameters deciding the ways of exporting('csv' or 'database')\r\n\t\t\t\t\tand file path of saving, eg: export=['csv','.\\\\file'].\r\n\t\t'''\r\n\t\tif kwarg['export'][0] == 'csv':\r\n\t\t\twith open(kwarg['export'][1] + '\\\\' + stockCode + '.csv', 'a+', newline='',encoding='utf-8') as file:\r\n\t\t\t\tfieldnames = ['date','address','title','article']\r\n\t\t\t\twriter = csv.DictWriter(file, fieldnames=fieldnames)\r\n\t\t\t\twriter.writeheader()\r\n\t\t\t\tfor dbName,colName in dbColLst:\r\n\t\t\t\t\tdb = self._Conn[dbName]\r\n\t\t\t\t\tcollection = db.get_collection(colName)\r\n\t\t\t\t\tidLst = self.extractData(dbName,colName,['_id'])._id\r\n\t\t\t\t\tif dbName == 'Sina_Stock':\r\n\t\t\t\t\t\tfor _id in idLst:\r\n\t\t\t\t\t\t\tkeys = ' '.join([k for k in collection.find_one({'_id':ObjectId(_id)}).keys()])\r\n\t\t\t\t\t\t\tif keys.find('RelevantStock') != -1:\r\n\t\t\t\t\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['RelevantStock'].find(stockCode) != -1:\r\n\t\t\t\t\t\t\t\t\tprint('     ' + collection.find_one({'_id':ObjectId(_id)})['Title'])\r\n\t\t\t\t\t\t\t\t\twriter.writerow({'date':collection.find_one({'_id':ObjectId(_id)})['Date'], \\\r\n\t\t\t\t\t\t\t\t\t\t'address':collection.find_one({'_id':ObjectId(_id)})['Address'], \\\r\n\t\t\t\t\t\t\t\t\t\t'title':collection.find_one({'_id':ObjectId(_id)})['Title'], \\\r\n\t\t\t\t\t\t\t\t\t\t'article':collection.find_one({'_id':ObjectId(_id)})['Article']})\r\n\t\t\t\t\telif dbName == 'NBD':\r\n\t\t\t\t\t\tfor _id in idLst:\r\n\t\t\t\t\t\t\tkeys = ' '.join([k for k in collection.find_one({'_id':ObjectId(_id)}).keys()])\r\n\t\t\t\t\t\t\tif keys.find('relevantStock') != -1:\r\n\t\t\t\t\t\t\t\tif 
collection.find_one({'_id':ObjectId(_id)})['relevantStock'].find(stockCode) != -1:\r\n\t\t\t\t\t\t\t\t\tprint('     ' + collection.find_one({'_id':ObjectId(_id)})['title'])\r\n\t\t\t\t\t\t\t\t\twriter.writerow({'date':collection.find_one({'_id':ObjectId(_id)})['date'], \\\r\n\t\t\t\t\t\t\t\t\t\t'address':collection.find_one({'_id':ObjectId(_id)})['address'], \\\r\n\t\t\t\t\t\t\t\t\t\t'title':collection.find_one({'_id':ObjectId(_id)})['title'], \\\r\n\t\t\t\t\t\t\t\t\t\t'article':collection.find_one({'_id':ObjectId(_id)})['Article']})\r\n\t\t\t\t\tprint(' [*] extracting ' + stockCode + ' news from ' + dbName + ' database to CSV file successfully ... ')\r\n\t\telif kwarg['export'][0] == 'database': #new database\r\n\t\t\tfor dbName,colName in dbColLst:\r\n\t\t\t\tdb = self._Conn[dbName]\r\n\t\t\t\tcollection = db.get_collection(colName)\r\n\t\t\t\tidLst = self.extractData(dbName,colName,['_id'])._id\r\n\t\t\t\tif dbName == 'NBD_Stock':\r\n\t\t\t\t\tnewdb = self._Conn[kwarg['export'][1]]\r\n\t\t\t\t\tnewcollection = newdb.get_collection(kwarg['export'][2])\r\n\t\t\t\t\tfor _id in idLst:\r\n\t\t\t\t\t\tkeys = ' '.join([k for k in collection.find_one({'_id':ObjectId(_id)}).keys()])\r\n\t\t\t\t\t\tif keys.find('relevantStock') != -1:\r\n\t\t\t\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['relevantStock'].find(stockCode) != -1:\r\n\t\t\t\t\t\t\t\tcharacter = self.judgeGoodOrBadNews(stockCode,\\\r\n\t\t\t\t\t\t\t\t\tcollection.find_one({'_id':ObjectId(_id)})['date'].split(' ')[0].replace('-',''),kwarg['judgeTerm'])\r\n\r\n\t\t\t\t\t\t\t\t# print('     ' + collection.find_one({'_id':ObjectId(_id)})['title'] + '(' + character + ')')\r\n\r\n\t\t\t\t\t\t\t\tdata = {'Date' : collection.find_one({'_id':ObjectId(_id)})['date'],\r\n\t\t\t\t\t\t\t\t\t\t'Address' : collection.find_one({'_id':ObjectId(_id)})['address'],\r\n\t\t\t\t\t\t\t\t\t\t'Title' : collection.find_one({'_id':ObjectId(_id)})['title'],\r\n\t\t\t\t\t\t\t\t\t\t'Article' : 
collection.find_one({'_id':ObjectId(_id)})['Article'],\r\n\t\t\t\t\t\t\t\t\t\t'Character' : character}\r\n\t\t\t\t\t\t\t\tnewcollection.insert_one(data) \r\n\t\t\t\telif dbName == 'Sina_Stock':\r\n\t\t\t\t\tnewdb = self._Conn[kwarg['export'][1]]\r\n\t\t\t\t\tnewcollection = newdb.get_collection(kwarg['export'][2])\r\n\t\t\t\t\tfor _id in idLst:\r\n\t\t\t\t\t\tkeys = ' '.join([k for k in collection.find_one({'_id':ObjectId(_id)}).keys()])\r\n\t\t\t\t\t\tif keys.find('RelevantStock') != -1:\r\n\t\t\t\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['RelevantStock'].find(stockCode) != -1:\r\n\t\t\t\t\t\t\t\tcharacter = self.judgeGoodOrBadNews(stockCode,\\\r\n\t\t\t\t\t\t\t\t\tcollection.find_one({'_id':ObjectId(_id)})['Date'].split(' ')[0].replace('-',''),kwarg['judgeTerm'])\r\n\r\n\t\t\t\t\t\t\t\t# print('     ' + collection.find_one({'_id':ObjectId(_id)})['Title'] + '(' + character + ')')\r\n\r\n\t\t\t\t\t\t\t\tdata = {'Date' : collection.find_one({'_id':ObjectId(_id)})['Date'],\r\n\t\t\t\t\t\t\t\t\t\t'Address' : collection.find_one({'_id':ObjectId(_id)})['Address'],\r\n\t\t\t\t\t\t\t\t\t\t'Title' : collection.find_one({'_id':ObjectId(_id)})['Title'],\r\n\t\t\t\t\t\t\t\t\t\t'Article' : collection.find_one({'_id':ObjectId(_id)})['Article'],\r\n\t\t\t\t\t\t\t\t\t\t'Character' : character}\r\n\t\t\t\t\t\t\t\tnewcollection.insert_one(data)\r\n\t\t\t\telse:\r\n\t\t\t\t\tnewdb = self._Conn[kwarg['export'][1]]\r\n\t\t\t\t\tnewcollection = newdb.get_collection(kwarg['export'][2])\r\n\t\t\t\t\tfor _id in idLst:\r\n\t\t\t\t\t\tkeys = ' '.join([k for k in collection.find_one({'_id':ObjectId(_id)}).keys()])\r\n\t\t\t\t\t\tif keys.find('relevantStock') != -1:\r\n\t\t\t\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['relevantStock'].find(stockCode) != -1:\r\n\t\t\t\t\t\t\t\tcharacter = self.judgeGoodOrBadNews(stockCode,\\\r\n\t\t\t\t\t\t\t\t\tcollection.find_one({'_id':ObjectId(_id)})['Date'].split(' ')[0].replace('-',''),kwarg['judgeTerm'])\r\n\r\n\t\t\t\t\t\t\t\t# 
print('     ' + collection.find_one({'_id':ObjectId(_id)})['Title'] + '(' + character + ')')\r\n\r\n\t\t\t\t\t\t\t\tdata = {'Date' : collection.find_one({'_id':ObjectId(_id)})['Date'],\r\n\t\t\t\t\t\t\t\t\t\t'Address' : collection.find_one({'_id':ObjectId(_id)})['Address'],\r\n\t\t\t\t\t\t\t\t\t\t'Title' : collection.find_one({'_id':ObjectId(_id)})['Title'],\r\n\t\t\t\t\t\t\t\t\t\t'Article' : collection.find_one({'_id':ObjectId(_id)})['Article'],\r\n\t\t\t\t\t\t\t\t\t\t'Character' : character}\r\n\t\t\t\t\t\t\t\tnewcollection.insert_one(data)\t\r\n\t\t\t\tprint(' [' + stockCode + '] ' + dbName + ' has been extracted successfully ... ')\r\n\r\n\tdef classifyHistoryStockNews(self,dbName,stockCode,**kwarg):\r\n\t\t'''Build classifier from historical news(articles/documents) of specific stock.\r\n\r\n\t\t# Arguments:\r\n\t\t\tdbName: Name of database.\r\n\t\t\tstockCode: Code of specific stock.\r\n\t\t\trenewDict: Renew the dictionary created by historical news(articles/documents) of\r\n\t\t\t\t\t\tspecific stock or not(bool type).\r\n\t\t\tmodelType: Transformation model type, including 'lsi', 'lda' and 'None', 'None' means TF-IDF mmodel.\r\n\t\t\ttfDim: The number of topics that will be extracted from each news(articles/documents). 
\r\n\t\t\trenewModel: Re-train the transformation models or not(bool type).\r\n\t\t\tClassifier: The name of classifier, including 'SVM' and 'RandomForest' so far.\r\n\t\t\tParams: The parameters of classifier, detail refer to the setting of classifier parameters of scikit-learn module.\r\n\t\t'''\r\n\t\tif kwarg['renewDict']:\r\n\t\t\tif not os.path.exists(self.DictPath+'\\\\'+stockCode):\r\n\t\t\t\tos.makedirs(self.DictPath+'\\\\'+stockCode)\r\n\t\t\tdb = self._Conn[dbName]\r\n\t\t\tcollection = db.get_collection(stockCode)\r\n\t\t\tidLst = self.extractData(dbName,stockCode,['_id'])._id\r\n\t\t\tarticles = []\r\n\t\t\tcharacters = []\r\n\t\t\tfor _id in idLst:\r\n\t\t\t\tarticles.append(collection.find_one({'_id':ObjectId(_id)})['Article'])\r\n\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利好\":\r\n\t\t\t\t\tcharacters.append(1)\r\n\t\t\t\telif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利空\":\r\n\t\t\t\t\tcharacters.append(-1)\r\n\t\t\t\telse:\r\n\t\t\t\t\tcharacters.append(0)\r\n\t\t\tself.tp.genDictionary(articles,saveDict=True,saveDictPath=self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict',\\\r\n\t\t\t\tsaveBowvec=True,saveBowvecPath=self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm',returnValue=False)\r\n\t\t\tprint(' [*] renew the dictionary and bow-vector successfully ... 
')\r\n\t\telif not os.path.exists(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict') \\\r\n\t\tor not os.path.exists(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm'):\r\n\t\t\tif not os.path.exists(self.DictPath+'\\\\'+stockCode):\r\n\t\t\t\tos.makedirs(self.DictPath+'\\\\'+stockCode)\r\n\t\t\tdb = self._Conn[dbName]\r\n\t\t\tcollection = db.get_collection(stockCode)\r\n\t\t\tidLst = self.extractData(dbName,stockCode,['_id'])._id\r\n\t\t\tarticles = []\r\n\t\t\tcharacters = []\r\n\t\t\tfor _id in idLst:\r\n\t\t\t\tarticles.append(collection.find_one({'_id':ObjectId(_id)})['Article'])\r\n\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利好\":\r\n\t\t\t\t\tcharacters.append(1)\r\n\t\t\t\telif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利空\":\r\n\t\t\t\t\tcharacters.append(-1)\r\n\t\t\t\telse:\r\n\t\t\t\t\tcharacters.append(0)\r\n\t\t\tself.tp.genDictionary(articles,saveDict=True,saveDictPath=self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict',\\\r\n\t\t\t\tsaveBowvec=True,saveBowvecPath=self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm',returnValue=False)\r\n\t\t\tprint(' [*] generate and save the dictionary and bow-vector successfully ... 
')\r\n\t\telse:\r\n\t\t\tdb = self._Conn[dbName]\r\n\t\t\tcollection = db.get_collection(stockCode)\r\n\t\t\tidLst = self.extractData(dbName,stockCode,['_id'])._id\r\n\t\t\tcharacters = []\r\n\t\t\tfor _id in idLst:\r\n\t\t\t\tif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利好\":\r\n\t\t\t\t\tcharacters.append(1)\r\n\t\t\t\telif collection.find_one({'_id':ObjectId(_id)})['Character'] == \"利空\":\r\n\t\t\t\t\tcharacters.append(-1)\r\n\t\t\t\telse:\r\n\t\t\t\t\tcharacters.append(0)\r\n\t\tdictionary = corpora.Dictionary.load(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict')\r\n\t\tbowvec = corpora.MmCorpus(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm')\r\n\t\tprint(' [*] load dictionary and bow-vector successfully ... ')\r\n\t\t_, modelVec = self.tp.CallTransformationModel(dictionary,bowvec,modelType=kwarg['modelType'],\\\r\n\t\t\ttfDim=kwarg['tfDim'],renewModel=kwarg['renewModel'],modelPath=self.DictPath+'\\\\'+stockCode+'\\\\')\r\n\t\tCSRMatrix = self.ConvertToCSRMatrix(modelVec)\r\n\t\ttrain_X, train_Y, test_X, test_Y = self.genTrainingSet(CSRMatrix,characters)\r\n\t\tif kwarg['Classifier'] == 'SVM':\r\n\t\t\tself.SVMClassifier(train_X,train_Y,test_X,test_Y,kwarg['Params'],['precision'],stockCode)\r\n\t\tif kwarg['Classifier'] == 'RandomForest':\r\n\t\t\tself.RdForestClassifier(train_X,train_Y,test_X,test_Y,kwarg['Params'],['precision'],stockCode)\r\n\t\treturn self._precise\r\n\r\n\tdef classifyRealtimeStockNews(self,doc_list):\r\n\t\t'''Classify real-time news(articles/documents) of specific stock.\r\n\r\n\t\t#Arguments:\r\n\t\t\tdoc_list: List of real-time news(articles/documents) crawled from specific websites.\r\n\t\t'''\r\n\t\tprint(' * extract relevant stock codes from latest crawled news ... 
')\r\n\t\trelevant_stock_list = self.extractStockCodeFromRealtimeNews(doc_list)\r\n\t\tif len(relevant_stock_list) != 0:\r\n\t\t\ttfDim = 200\r\n\t\t\tfor i, code_list in enumerate(relevant_stock_list):\r\n\t\t\t\tfor code in code_list:\r\n\r\n\t\t\t\t\tprint(' * load SVM parameters (gamma & C) ... ')\r\n\t\t\t\t\tParams_svm = {'kernel': ['rbf'], 'gamma': [10, 20, 50, 100, 150, 200], \\\r\n\t\t\t\t\t\t'C': [10, 15, 20, 30, 50, 100]}\r\n\r\n\t\t\t\t\tprint(' * use historical news to build SVM model of ' + code + ' ... ')\r\n\t\t\t\t\tself.classifyHistoryStockNews(\"Stock_News\",code,modelType='lda',tfDim=tfDim,renewDict=False,\\\r\n\t\t\t\t\t\t\trenewModel=False,Classifier='SVM',Params=Params_svm) #code=\"600740\"\r\n\r\n\t\t\t\t\tprint(' * load historical dictionary of ' + code + ' ...')\r\n\t\t\t\t\tdictionary = corpora.Dictionary.load(os.getcwd() + '\\\\' + 'stock_dict_file\\\\' + code + '\\\\' + code + '_dict.dict')\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * tokenize latest crawled news ... ')\r\n\t\t\t\t\ttoken = self.tp.jieba_tokenize(doc_list)\r\n\r\n\t\t\t\t\tprint(' * create bow-vector of latest news of ' + code + ' ... ')\r\n\t\t\t\t\tbowvec_doc = [dictionary.doc2bow(text) for text in token]\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * load bow-vector of historical news of ' + code + ' ... ')\r\n\t\t\t\t\tbowvec_all = list(corpora.MmCorpus(os.getcwd() + '\\\\' + 'stock_dict_file\\\\' + code + '\\\\' + code + '_bowvec.mm'))\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * extend latest bow-vector to historical bow-vector of ' + code + ' ... ')\r\n\t\t\t\t\tbowvec_all.extend(bowvec_doc)\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * create new lda model of ' + code + ' ... ')\r\n\t\t\t\t\t_, NewmodelVec = self.tp.CallTransformationModel(dictionary,bowvec_all,modelType='lda',\\\r\n\t\t\t\t\t\t\t\t\ttfDim=200,renewModel=False,modelPath=os.getcwd() + '\\\\' + 'stock_dict_file\\\\' + code + '\\\\')\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * convert latest lda vector to CSR matrix of ' + code + ' ... 
')\r\n\t\t\t\t\tNewCSRMatrix = self.ConvertToCSRMatrix(NewmodelVec)\r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * load SVM model of ' + code + ' ... ')\r\n\t\t\t\t\tclf = joblib.load(os.getcwd() + '\\\\' + 'stock_dict_file\\\\' + code + '\\\\' + code + '_svm.pkl') \r\n\t\t\t\t\t\r\n\t\t\t\t\tprint(' * predicting ... ')\r\n\t\t\t\t\tif clf.predict(NewCSRMatrix[i-2,:])[0] == 1:\r\n\t\t\t\t\t\tprint('   《' + doc_list[i].split(' ')[0] + \"》\" + '对' + code + '是利好消息 ...')\r\n\t\t\t\t\telif clf.predict(NewCSRMatrix[i-2,:])[0] == -1:\r\n\t\t\t\t\t\tprint('   《' + doc_list[i].split(' ')[0] + \"》\" + '对' + code + '是利空消息 ...')\r\n\t\t\t\t\telse:\r\n\t\t\t\t\t\tprint('   《' + doc_list[i].split(' ')[0] + \"》\" + '对' + code + '是中立消息 ...')\r\n\t\telse:\r\n\t\t\tprint(' * no relevant stock found ... ')\r\n\r\n
\tdef SVMClassifier(self,train_X,train_Y,test_X,test_Y,tuned_parameters,scores,stockCode):\r\n\t\t'''SVM classifier.\r\n\r\n\t\t# Arguments:\r\n\t\t\ttrain_X: Training feature data.\r\n\t\t\ttrain_Y: Training label data.\r\n\t\t\ttest_X: Test feature data.\r\n\t\t\ttest_Y: Test label data.\r\n\t\t\ttuned_parameters: Parameter grid for the classifier; see the scikit-learn GridSearchCV documentation.\r\n\t\t\tscores: Optimization targets; see the scikit-learn scoring documentation.\r\n\t\t\tstockCode: Code of specific stock.\r\n\t\t'''\r\n\t\tfor score in scores:\r\n\t\t\tif not os.path.exists(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_svm.pkl'):\r\n\t\t\t\tclf = GridSearchCV(svm.SVC(), tuned_parameters, cv=5, scoring='%s_weighted' % score) # build the GridSearchCV classifier with 5-fold cross-validation\r\n\t\t\t\tclf.fit(train_X, train_Y) # run k-fold cross-validation on the training set only and keep the best parameters\r\n\t\t\t\tjoblib.dump(clf, self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_svm.pkl')\r\n\t\t\t\tprint(clf.best_params_) # print the best model parameters\r\n\t\t\telse:\r\n\t\t\t\tclf = joblib.load(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_svm.pkl') \r\n\t\t\t# for params, mean_score, scores in clf.grid_scores_:\r\n\t\t\t# \tprint(\"%0.3f (+/-%0.03f) for %r\" % (mean_score, scores.std() * 2, params))\r\n\t\t\ttrain_pred = clf.predict(train_X) \r\n\t\t\ttest_pred = clf.predict(test_X) # evaluate the generalization of the best model on the test set\r\n\t\t\tprint(classification_report(test_Y, test_pred))\r\n\r\n\t\tprecise_train = 0\r\n\t\tfor k in range(len(train_pred)):\r\n\t\t\tif train_pred[k] == train_Y[k]:\r\n\t\t\t\tprecise_train += 1\r\n\t\tprecise_test = 0\r\n\t\tfor k in range(len(test_pred)):\r\n\t\t\tif test_pred[k] == test_Y[k]:\r\n\t\t\t\tprecise_test += 1\r\n\t\tprint(' [*] train_pred:', precise_train/len(train_Y), ', test_pred:', precise_test/len(test_pred))\r\n\t\tprint(' ' + '-' * 50)\r\n\t\tself._precise = precise_test/len(test_pred)\r\n\r\n
\tdef RdForestClassifier(self,train_X,train_Y,test_X,test_Y,tuned_parameters,scores,stockCode):\r\n\t\t'''Random Forest classifier.\r\n\r\n\t\t# Arguments:\r\n\t\t\ttrain_X: Training feature data.\r\n\t\t\ttrain_Y: Training label data.\r\n\t\t\ttest_X: Test feature data.\r\n\t\t\ttest_Y: Test label data.\r\n\t\t\ttuned_parameters: Parameter grid for the classifier; see the scikit-learn GridSearchCV documentation.\r\n\t\t\tscores: Optimization targets; see the scikit-learn scoring documentation.\r\n\t\t\tstockCode: Code of specific stock.\r\n\t\t'''\r\n\t\tfor score in scores:\r\n\t\t\tif not os.path.exists(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_rdf.pkl'):\r\n\t\t\t\tclf = GridSearchCV(RandomForestClassifier(random_state=14), tuned_parameters, cv=5, scoring='%s_weighted' % score) # build the GridSearchCV classifier with 5-fold cross-validation\r\n\t\t\t\tclf.fit(train_X, train_Y) # run k-fold cross-validation on the training set only and keep the best parameters\r\n\t\t\t\tjoblib.dump(clf, self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_rdf.pkl')\r\n\t\t\t\tprint(clf.best_params_) # print the best model parameters\r\n\t\t\telse:\r\n\t\t\t\tclf = joblib.load(self.DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_rdf.pkl') \r\n\t\t\t# for params, mean_score, scores in clf.grid_scores_:\r\n\t\t\t# \tprint(\"%0.3f (+/-%0.03f) for %r\" % (mean_score, scores.std() * 2, params))\r\n\t\t\ttrain_pred = clf.predict(train_X) \r\n\t\t\ttest_pred = clf.predict(test_X) # evaluate the generalization of the best model on the test set\r\n\t\t\tprint(classification_report(test_Y, test_pred))\r\n\t\tprecise_train = 0\r\n\t\tfor k in range(len(train_pred)):\r\n\t\t\tif train_pred[k] == train_Y[k]:\r\n\t\t\t\tprecise_train += 1\r\n\t\tprecise_test = 0\r\n\t\tfor k in range(len(test_pred)):\r\n\t\t\tif test_pred[k] == test_Y[k]:\r\n\t\t\t\tprecise_test += 1\r\n\t\tprint(' [*] train_pred:', precise_train/len(train_Y), ', test_pred:', precise_test/len(test_pred))\r\n\t\tprint(' ' + '-' * 50)\r\n\t\tself._precise = precise_test/len(test_pred)\r\n\r\n
\tdef ConvertToCSRMatrix(self,modelVec):\r\n\t\t'''Convert an LDA/LSI model vector into a dense matrix (via a CSR sparse matrix) that SciPy and NumPy can consume.\r\n\t\t\r\n\t\t# Arguments:\r\n\t\t\tmodelVec: Transformation model vector, such as an LDA, TF-IDF or LSI model vector.\r\n\t\t'''\r\n\t\tdata = []\r\n\t\trows = []\r\n\t\tcols = []\r\n\t\tself._line_count = 0\r\n\t\tfor line in modelVec:  \r\n\t\t\tfor elem in line:\r\n\t\t\t\trows.append(self._line_count)\r\n\t\t\t\tcols.append(elem[0])\r\n\t\t\t\tdata.append(elem[1])\r\n\t\t\tself._line_count += 1\r\n\t\tsparse_matrix = csr_matrix((data,(rows,cols))) \r\n\t\tmatrix = sparse_matrix.toarray() \r\n\t\treturn matrix\r\n\r\n\tdef genTrainingSet(self,X,Y):\r\n\t\t'''Randomly split the data into training and test sets (roughly 80/20).\r\n\r\n\t\t# Arguments:\r\n\t\t\tX: Feature set.\r\n\t\t\tY: Label set.\r\n\t\t'''\r\n\t\trarray=np.random.random(size=self._line_count)\r\n\t\ttrain_X = []\r\n\t\ttrain_Y = []\r\n\t\ttest_X = []\r\n\t\ttest_Y = []\r\n\t\tfor i in range(self._line_count):\r\n\t\t\tif rarray[i]<0.8:\r\n\t\t\t\ttrain_X.append(X[i,:])\r\n\t\t\t\ttrain_Y.append(Y[i])\r\n\t\t\telse:\r\n\t\t\t\ttest_X.append(X[i,:])\r\n\t\t\t\ttest_Y.append(Y[i])\r\n\t\treturn train_X,train_Y,test_X,test_Y\r\n"
  },
  {
    "path": "legacy_v1/Text_Analysis/text_processing.py",
    "content": "# -*- coding: UTF-8 -*- \r\n\"\"\"\r\nCreated on Fri Feb 23 12:37:46 2018\r\n\r\n@author: Damon Li\r\n\"\"\"\r\n\r\nimport numpy as np\r\n\r\nimport jieba, os\r\nfrom collections import defaultdict\r\nfrom gensim import corpora,similarities,models,matutils,utils\r\n\r\n\r\nclass TextProcessing(object):\r\n    '''Text pre-processing functions class.\r\n\r\n    # Arguments\r\n        chnSTWPath: Chinese stop words txt file path.\r\n        finance_dict: latest finance-related words txt file path.\r\n    '''\r\n\r\n    def __init__(self,chnSTWPath,finance_dict):\r\n        self.chnSTWPath = chnSTWPath\r\n        self.finance_dict = finance_dict\r\n\r\n    def renewFinanceDict(self,new_Word_list):\r\n        '''Add the latest necessary financial words to the financial dictionary\r\n            to improve the tokenization result.\r\n\r\n        # Arguments:\r\n            new_Word_list: New financial words list, e.g. [\"区块链\", \"离岸金融\"].\r\n        '''\r\n        with open(self.finance_dict,'a',encoding='utf-8') as file:\r\n            for word in new_Word_list:\r\n                file.write(word + '\\n')\r\n\r\n    def getchnSTW(self):\r\n        '''Load the stop words txt file.\r\n        '''\r\n        stopwords = [line.strip() for line in open(self.chnSTWPath, 'r').readlines()]  \r\n        return stopwords\r\n\r\n    def jieba_tokenize(self,documents): \r\n        '''Cut the documents into sequences of independent words.\r\n\r\n        # Arguments:\r\n            documents: List of news(articles).\r\n        '''\r\n        chnSTW = self.getchnSTW()\r\n        corpora_documents = []\r\n        jieba.load_userdict(self.finance_dict)\r\n        for item_text in documents: \r\n            outstr = []\r\n            sentence_seged = list(jieba.cut(item_text))\r\n            for word in sentence_seged:  \r\n                if word not in chnSTW and word != '\\t' \\\r\n                and word != ' ':  \r\n                    outstr.append(word)\r\n            corpora_documents.append(outstr)\r\n
        return corpora_documents\r\n\r\n    def RemoveWordAppearOnce(self,corpora_documents):\r\n        '''Remove the words that appear only once among all the tokenized news(articles).\r\n\r\n        # Arguments:\r\n             corpora_documents: List of tokenized news(articles).\r\n        '''\r\n        frequency = defaultdict(int)  \r\n        for text in corpora_documents:  \r\n            for token in text:      \r\n                frequency[token] += 1 \r\n        corpora_documents = [[token for token in text if frequency[token] > 1]  for text in corpora_documents] \r\n        return corpora_documents\r\n\r\n    def genDictionary(self,documents,**kwarg):\r\n        '''Generate the dictionary and bow-vectors of all tokenized news(articles).\r\n\r\n        # Arguments:\r\n            documents: List of news(articles).\r\n            saveDict: Save dictionary or not(bool type).\r\n            saveBowvec: Save bow-vector or not(bool type).\r\n            returnValue: Return value or not(bool type).\r\n        '''\r\n        self._raw_documents = documents\r\n        token = self.jieba_tokenize(documents) #jieba tokenize\r\n        #corpora_documents = self.RemoveWordAppearOnce(token)  # remove the words appearing once in the dictionary\r\n        self._dictionary = corpora.Dictionary(token)  # generate dictionary using tokenized documents  \r\n        if kwarg['saveDict']:\r\n            self._dictionary.save(kwarg['saveDictPath']) # store the dictionary, for future reference\r\n        self._BowVecOfEachDoc = [self._dictionary.doc2bow(text) for text in token]  # convert tokenized documents to vectors\r\n        if kwarg['saveBowvec']:\r\n            corpora.MmCorpus.serialize(kwarg['saveBowvecPath'], self._BowVecOfEachDoc)  # store to disk, for later use\r\n        if kwarg['returnValue']:\r\n            return token, self._dictionary, self._BowVecOfEachDoc\r\n\r\n    def CallTransformationModel(self,Dict,Bowvec,**kwarg):\r\n        '''Invoke specific transformation 
models of Gensim module.\r\n\r\n        # Arguments:\r\n            Dict: Dictionary made by all tokenized news(articles/documents).\r\n            Bowvec: Bow-vector created by all tokenized news(articles/documents).\r\n            modelType: Transformation model type, including 'lsi', 'lda' and 'None', 'None' means the TF-IDF model.\r\n            tfDim: The number of topics that will be extracted from each news(articles/documents). \r\n            renewModel: Re-train the transformation models or not(bool type).\r\n            modelPath: The path of saving trained transformation models.\r\n        '''\r\n        if kwarg['renewModel']:\r\n            tfidf = models.TfidfModel(Bowvec)  # initialize tfidf model\r\n            tfidfVec = tfidf[Bowvec] # use the model to transform whole corpus\r\n            tfidf.save(kwarg['modelPath']+\"tfidf_model.tfidf\")\r\n            if kwarg['modelType'] == 'lsi':\r\n                model = models.LsiModel(tfidfVec, id2word=Dict, num_topics=kwarg['tfDim']) # initialize an LSI transformation\r\n                modelVec = model[tfidfVec] # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi\r\n                model.save(kwarg['modelPath']+\"lsi_model.lsi\") # keep the save path consistent with the reload branch below\r\n            elif kwarg['modelType'] == 'lda':\r\n                model = models.LdaModel(tfidfVec, id2word=Dict, num_topics=kwarg['tfDim'])\r\n                modelVec = model[tfidfVec] # sparse LDA vector of each document; element values are the weights of the corresponding topics\r\n                model.save(kwarg['modelPath']+\"lda_model.lda\") # keep the save path consistent with the reload branch below\r\n            elif kwarg['modelType'] == 'None': \r\n                model = tfidf\r\n                modelVec = tfidfVec\r\n        else:\r\n            if not os.path.exists(kwarg['modelPath']+\"tfidf_model.tfidf\"):\r\n                tfidf = models.TfidfModel(Bowvec)  # initialize tfidf model\r\n                tfidfVec = tfidf[Bowvec] #\r\n                tfidf.save(kwarg['modelPath']+\"tfidf_model.tfidf\")\r\n            else:\r\n                tfidf = models.TfidfModel.load(kwarg['modelPath']+\"tfidf_model.tfidf\") \r\n                tfidfVec = tfidf[Bowvec] # use the model to transform whole corpus\r\n            if kwarg['modelType'] == 'lsi':\r\n                if not os.path.exists(kwarg['modelPath']+\"lsi_model.lsi\"):\r\n                    tfidf = models.TfidfModel.load(kwarg['modelPath']+\"tfidf_model.tfidf\") \r\n                    tfidfVec = tfidf[Bowvec] # use the model to transform whole corpus\r\n                    model = models.LsiModel(tfidfVec, id2word=Dict, num_topics=kwarg['tfDim']) # initialize an LSI transformation\r\n                    modelVec = model[tfidfVec] # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi\r\n                    model.save(kwarg['modelPath']+\"lsi_model.lsi\") # same for tfidf, lda, ...\r\n                else:\r\n                    model = models.LsiModel.load(kwarg['modelPath']+\"lsi_model.lsi\")\r\n                    modelVec = model[tfidfVec] \r\n            elif kwarg['modelType'] == 'lda':\r\n                if not os.path.exists(kwarg['modelPath']+\"lda_model.lda\"):\r\n                    tfidf = models.TfidfModel.load(kwarg['modelPath']+\"tfidf_model.tfidf\") \r\n                    tfidfVec = tfidf[Bowvec] # use the model to transform whole corpus\r\n                    model = models.LdaModel(tfidfVec, id2word=Dict, num_topics=kwarg['tfDim'])\r\n                    modelVec = model[tfidfVec] # sparse LDA vector of each document; element values are the weights of the corresponding topics\r\n                    model.save(kwarg['modelPath']+\"lda_model.lda\") # same for tfidf, lda, ...\r\n                else:\r\n                    model = models.LdaModel.load(kwarg['modelPath']+\"lda_model.lda\")\r\n                    modelVec = model[tfidfVec] \r\n            elif kwarg['modelType'] == 'None': \r\n                model = tfidf\r\n                modelVec = tfidfVec\r\n        return tfidfVec, modelVec\r\n\r\n    def CalSim(self,test_document,Type,best_num):\r\n        '''Calculate similarities between the test document and all news(articles/documents).\r\n\r\n        # Arguments:\r\n            test_document: Raw test document (a string).\r\n            Type: Models of calculating similarities.\r\n            best_num: refer to the 'num_best' parameter in the Gensim module.\r\n        '''\r\n        if Type == 'Similarity-tfidf-index':\r\n            tfidf = models.TfidfModel(self._BowVecOfEachDoc)  \r\n            tfidfVec = tfidf[self._BowVecOfEachDoc]\r\n            self._num_features = len(self._dictionary.token2id.keys())\r\n            self._similarity = similarities.Similarity(Type, tfidfVec, \\\r\n                num_features=self._num_features,num_best=best_num)  \r\n            test_cut_raw = list(jieba.cut(test_document))  \r\n            test_BowVecOfEachDoc = self._dictionary.doc2bow(test_cut_raw) \r\n            self._test_BowVecOfEachDoc = tfidf[test_BowVecOfEachDoc]\r\n        elif Type == 'Similarity-LSI-index':\r\n            lsi_model = models.LsiModel(self._BowVecOfEachDoc)  \r\n            corpus_lsi = lsi_model[self._BowVecOfEachDoc]\r\n            self._num_features = len(self._dictionary.token2id.keys())\r\n            self._similarity = similarities.Similarity(Type, corpus_lsi, \\\r\n                num_features=self._num_features,num_best=best_num)  \r\n            test_cut_raw = list(jieba.cut(test_document))  \r\n            test_BowVecOfEachDoc = self._dictionary.doc2bow(test_cut_raw) \r\n            self._test_BowVecOfEachDoc = lsi_model[test_BowVecOfEachDoc]\r\n        # self.Print_CalSim()  # NOTE: Print_CalSim is not defined in this class\r\n        IdLst = []\r\n        SimRltLst = []\r\n        SimTxLst = []\r\n        for Id, Sim in self._similarity[self._test_BowVecOfEachDoc]:\r\n            IdLst.append(Id)\r\n            SimRltLst.append(Sim)\r\n            SimTxLst.append(self._raw_documents[Id])\r\n        return IdLst,SimTxLst,SimRltLst\r\n\r\n    def PrintWorfCloud(self,documents,backgroundImgPath,fontPath):\r\n        '''Print out the word cloud of all 
news(articles/documents).\r\n\r\n        # Arguments:\r\n            documents: Overall raw documents.\r\n            backgroundImgPath: Background image path.\r\n            fontPath: The path of windows fonts that used to create the word-cloud.\r\n        '''\r\n        from scipy.misc import imread\r\n        import matplotlib.pyplot as plt\r\n        from wordcloud import WordCloud\r\n        corpora_documents = self.jieba_tokenize(documents) #分词\r\n        for k in range(len(corpora_documents)):\r\n            corpora_documents[k] = ' '.join(corpora_documents[k])\r\n        corpora_documents = ' '.join(corpora_documents)\r\n        color_mask = imread(backgroundImgPath) #\"C:\\\\Users\\\\lenovo\\\\Desktop\\\\Text_Mining\\\\3.jpg\"\r\n        cloud = WordCloud(font_path=fontPath,mask=color_mask,background_color='white',\\\r\n                          max_words=2000,max_font_size=40) #\"C:\\\\Windows\\\\Fonts\\\\simhei.ttf\"\r\n        word_cloud = cloud.generate(corpora_documents) \r\n        plt.imshow(word_cloud, interpolation='bilinear')\r\n        plt.axis(\"off\")\r\n\r\nif __name__ == '__main__':\r\n    tp = TextProcessing(os.getcwd() + '\\\\' + 'Chinese_Stop_Words.txt', \\\r\n    os.getcwd() + '\\\\' + 'finance_dict.txt')\r\n    doc = ['中央、地方支持政策频出,煤炭行业站上了风口 券商研报浩如烟海，投资线索眼花缭乱，第一财经推出\\\r\n            《一财研选》产品，挖掘研报精华，每期梳理5条投资线索，便于您短时间内获取有价值的信息。专业团队\\\r\n            每周日至每周四晚8点准时“上新”，\\\r\n            助您投资顺利！1．中央、地方支持政策频出，这个行业站上了风口！（信达证券）近年来，利好住房租赁\\\r\n            市场发展的政策频频发布，顶层设计趋于完善。信达证券指出，2015年以来，住建部、国务院等机构相继出\\\r\n            台政策支持住房租赁市场发展，地方积极跟进，试点城市全部出台相关方案支持当地住房租赁市场发展。除\\\r\n            此之外，“租购同权”保障承租人享受公共服务的权益，稳定租赁关系，利好长租公寓发展。除政策利好长租\\\r\n            公寓外，需求的逐步释放对长租公寓市场形成支撑。信达证券研究发现，人口向核心一、二线城市流动趋势不\\\r\n            减，高房价刺激购房需求转向租房需求、首次置业年龄抬升、高校毕业生租房需求增加等因素将刺激长租公寓\\\r\n            需求进一步释放。总体而言，住房租赁市场容量逾万亿且具备区域性特征。2017年8月，国土资源部、住房和城\\\r\n            乡建设部联合印发《利用集体建设用地建设租赁住房试点方案》，选择13个试点城市推进利用集体建设用地建\\\r\n            
设租赁住房，各地“只租不售”地块频出，彰显政府发展住房租赁市场决心。类REITs产品盘活租赁资产，解决\\\r\n            长租融资痛点，上述举措能够有效增加租赁住房供给。伴随政策利好，多主体纷纷进军住房租赁市场。信达证\\\r\n            券指出，截至目前，房企、房地产中介、专业租赁机构、连锁酒店、金融机构和互联网公司均已涉足住宅租赁市\\\r\n            场。其中，房企多采用自持物业的重资产运营方式，中介机构及其他公司多以轻资产运营方式为主，从房源获\\\r\n            取的角度看，集中与分散并行。信达证券指出，当前我国租赁住房的发展还处于初步阶段，多主体参与、多模式\\\r\n            并存。参与各方均凭借自身比较优势切入住房租赁领域。未来，房企、互联网公司、金融机构存在巨大的合作空间。\\\r\n            在市场细分的前提下，增值服务的提供将成为住房租赁市场发展的关键。信达证券推荐关注招商蛇口(21.100, \\\r\n            -1.43, -6.35%)（001979.SZ）、万科A(31.270, -1.48, -4.52%)（000002.SZ）、世联行(8.700, -0.87,\\\r\n             -9.09%)（002285.SZ）、昆百大A(7.510, -0.05, -0.66%)（000560.SZ）、天健集团(9.330, -0.56, -5.66%)\\\r\n            （000090.SZ）。2．煤炭库存创八年新低，缺煤升级，高煤价仍将持续（中银国际）截至1月30日，秦皇岛5500大\\\r\n            卡山西优混动力煤报755元，跳涨2%，再超预期，并创近6年新高，此轮上涨持续了10周时间，累计涨幅达13%。煤炭\\\r\n            行业是本栏重点追踪的行业板块，近期的大涨验证了此前选摘的多家研究机构的观点，今天我们再来看一下中银国际\\\r\n            对板块未来表现的分析观点。中银国际指出，六大电厂日耗量周均81万吨，环比增加9%，库存天数由13天下降至10.9天\\\r\n            ，为近8年新低，库存下降至899万吨，为近7年新低。缺煤情况非常突出。经济的强韧性叠加寒冷冰雪天气推升需求超预\\\r\n            期是主因，供应侧在年关生产积极性不高、运输不畅是辅因，且短期较难明显缓解，2月初地方矿也面临陆续放假，在\\\r\n            这种情况下煤价有继续攀高的可能。中银国际认为此轮煤价上涨包含着较多非季节性因素：六大电厂日耗从2017年12月\\\r\n            开始同比增幅都在10%以上，这还是在有工业限产的情况下，这是非常高的数字，在2017年7~8月旺季的同比增幅也只\\\r\n            有15%左右。经济较好下的需求超预期历来是煤炭股最好的催化剂。尽管2月份由于春节因素可能价格会回落，但在2018\\\r\n            年缺煤明显的情况下，幅度不会太大，高煤价还会继续维持。3月初两会召开，安全形势再度紧张，煤炭的供应仍然会偏\\\r\n            紧，在叠加3月15日后限产解除，限产解除前后下游补库存，高煤价可能会贯穿整个一季度。中银国际指出，2017年1月秦\\\r\n            皇岛煤价均价只有602元，2018年1月的均价为726元，同比增长21%，动力煤公司一季度的业绩大概率会上调。尽管后续煤\\\r\n            价调控的压力在加大，但近期效果可能不明显，中期有待观察。煤炭板块2018年市盈率15倍，估值不贵，且存在继续上调\\\r\n            盈利预测和估值下行的可能，股价仍有空间。继续推荐动力煤龙头陕西煤业(8.340, -0.77, -8.45%)（601225.SH）、\\\r\n            兖州煤业(15.150, -1.24, -7.57%)（600803.SH）、中国神华(24.290, -1.16, -4.56%)（601088.SH），以及优质\\\r\n            的国企改革兼并重组题材股潞安环能(11.590, -1.11, -8.74%)（601699.SH）、山西焦化(12.420, -1.38, -10.00%\\\r\n            )（600740.SH）、山煤国际(4.520, -0.50, -9.96%)（600546.SH）、阳泉煤业(7.780, -0.86, 
-9.95%)（600348.SH）\\\r\n            。',\\\r\n            '郭文仓到重点工程项目督导检查 2月2日,公司党委书记、董事长、总经理郭文仓,公司董事,股份公司副总经理、总工程师、\\\r\n            郭毅民,股份公司副总经理张国富、柴高贵及相关单位负责人到焦化厂煤场全封闭和1#—4#干熄焦等重点工程项目建设工地\\\r\n            督导检查施工进度和安全工作情况。郭文仓一行实地查看并详细了解了现场施工情况,询问了施工队伍人员状况,他说,\\\r\n            煤场全封闭项目和1#—4#干熄焦项目是公司的重点环保项目,一定要力争将重点工程项目建成精品工程、一流环保标杆项目\\\r\n            。近日天气寒冷,又临近春节,煤场全封闭项目进入收尾的关键阶段,施工负责人要紧绷安全弦,加强现场安全管理,从细节抓\\\r\n            起,消除隐患,确保收尾工作安全稳定顺利。1#—4#干熄焦项目在大面积开工的重要时期,一定要统筹安排项目进度和质量\\\r\n            管理,落实好冬季防护措施,管控好每一道施工环节,目前尤其要注重人员的思想状况,做到不安全不施工,保证施工安全和人\\\r\n            员人身安全,确保项目“安全无事故、质量全达标、进度按计划、投资不超概、投产即达效、竣工不留尾、审计无问题、廉政建\\\r\n            设好”,为公司打造成全国独立焦化旗舰企业奠定坚实的基础。']\r\n    DictPath = os.getcwd() + '\\\\' + 'stock_dict_file'\r\n    stockCode = '600740'\r\n    print(DictPath)\r\n    print(DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict')\r\n    print(DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm')\r\n    if not os.path.exists(DictPath+'\\\\'+stockCode):\r\n        os.makedirs(DictPath+'\\\\'+stockCode)\r\n    tp.genDictionary(doc,saveDict=True,saveDictPath=DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_dict.dict',\\\r\n        saveBowvec=True,saveBowvecPath=DictPath+'\\\\'+stockCode+'\\\\'+stockCode+'_bowvec.mm',returnValue=False)"
  },
  {
    "path": "legacy_v1/finance_dict.txt",
    "content": ""
  },
  {
    "path": "legacy_v1/run_crawler_cnstock.py",
    "content": "from Crawler.crawler_cnstock import WebCrawlFromcnstock\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromcnstock(IP=\"localhost\",PORT=27017,ThreadsNum=4,\\\r\n        dbName=\"Cnstock_Stock\",collectionName=\"cnstock_news_company\")\r\n    web_crawl_obj.coroutine_run(621,10,1,url_Part_1='http://company.cnstock.com/company/scp_gsxw/') #web_crawl_obj.multi_threads_run()\r\n    web_crawl_obj.coroutine_run(112,10,0,url_Part_1='http://ggjd.cnstock.com/gglist/search/qmtbbdj/')\r\n    web_crawl_obj.coroutine_run(116,10,0,url_Part_1='http://ggjd.cnstock.com/gglist/search/ggkx/')\r\n"
  },
  {
    "path": "legacy_v1/run_crawler_jrj.py",
    "content": "from Crawler.crawler_jrj import WebCrawlFromjrj\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromjrj(\"2009-01-05\",\"2018-02-03\",100,ThreadsNum=4,IP=\"localhost\",PORT=27017,\\\r\n        dbName=\"Jrj_Stock\",collectionName=\"jrj_news_company\")\r\n    web_crawl_obj.coroutine_run()  #web_crawl_obj.single_run() #web_crawl_obj.multi_threads_run()"
  },
  {
    "path": "legacy_v1/run_crawler_nbd.py",
    "content": "from Crawler.crawler_nbd import WebCrawlFromNBD\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromNBD(2871,10,ThreadsNum=4,IP=\"localhost\",PORT=27017,dbName='NBD_Stock',\\\r\n      collectionName=\"nbd_news_company\")\r\n    url_lst_withoutNews = web_crawl_obj.coroutine_run() #web_crawl_obj.single_run() #web_crawl_obj.multi_threads_run()\r\n    url_lst_withoutArticles, title_lst_withoutArticles = [], []  # avoid NameError when there is nothing to re-crawl\r\n    if url_lst_withoutNews != []:\r\n       print(' -------------------- Re-Crawl News List Pages -------------------- ')\r\n       url_lst_withoutArticles, title_lst_withoutArticles = web_crawl_obj.ReCrawlNews(url_lst_withoutNews)\r\n    if url_lst_withoutArticles != [] or title_lst_withoutArticles != []:\r\n       print(' -------------------- Re-Crawl Article Pages -------------------- ')\r\n       web_crawl_obj.ReCrawlArticles(url_lst_withoutArticles,title_lst_withoutArticles)"
  },
  {
    "path": "legacy_v1/run_crawler_sina.py",
    "content": "from Crawler.crawler_sina import WebCrawlFromSina\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromSina(5000,100,ThreadsNum=4,IP=\"localhost\",PORT=27017,\\\r\n        dbName=\"Sina_Stock\",collectionName=\"sina_news_company\")\r\n    web_crawl_obj.coroutine_run()  #web_crawl_obj.single_run() #web_crawl_obj.multi_threads_run()"
  },
  {
    "path": "legacy_v1/run_crawler_stcn.py",
    "content": "from Crawler.crawler_stcn import WebCrawlFromstcn\r\n\r\nif __name__ == '__main__':\r\n    web_crawl_obj = WebCrawlFromstcn(IP=\"localhost\",PORT=27017,ThreadsNum=4,\\\r\n        dbName=\"Stcn_Stock\",collectionName=\"stcn_news_company\")\r\n    web_crawl_obj.coroutine_run(20,1,1,url_Part_1='http://company.stcn.com/gsxw/') \r\n    web_crawl_obj.coroutine_run(20,1,1,url_Part_1='http://stock.stcn.com/xingu/')\r\n    web_crawl_obj.coroutine_run(20,1,1,url_Part_1='http://stock.stcn.com/zhuli/')\r\n    web_crawl_obj.coroutine_run(20,1,1,url_Part_1='http://stock.stcn.com/bankuai/')\r\n    web_crawl_obj.coroutine_run(20,1,1,url_Part_1='http://stock.stcn.com/dapan/')"
  },
  {
    "path": "legacy_v1/run_crawler_tushare.py",
    "content": "import time\r\n\r\nfrom Crawler.crawler_tushare import CrawlStockData\r\n\r\nif __name__ == '__main__':\r\n\tt1 = time.time()\r\n\t# Initiate\r\n\tObj = CrawlStockData(IP=\"localhost\",PORT=27017)\r\n\t# Get basic info of stocks\r\n\tObj.getStockBasicFromTushare(\"Stock\",\"Basic_Info\")\r\n\t# Extract stocks' code\r\n\tCode = Obj.extractData('Stock','Basic_Info',['code'])[0]\r\n\t# Get stock price from Tushare\r\n\tfor stockcode in Code:\r\n\t\tObj.getStockDayHistory('Stock',stockcode)\r\n\t\tprint(' [*] ' + stockcode + ' has finished storing ... ')\r\n\tt2 = time.time()\r\n\tprint(' running time:', t2 - t1)"
  },
  {
    "path": "legacy_v1/run_main.py",
    "content": "import time, datetime, threading\r\nfrom concurrent import futures\r\n\r\nfrom Crawler.crawler_sina import WebCrawlFromSina\r\nfrom Crawler.crawler_jrj import WebCrawlFromjrj\r\nfrom Crawler.crawler_cnstock import WebCrawlFromcnstock\r\nfrom Crawler.crawler_stcn import WebCrawlFromstcn\r\n\r\nimport Text_Analysis.text_mining as tm\r\n\r\ndef crawlers(web):\r\n\tif web == 'sina':\r\n\t\tweb_crawl_obj = WebCrawlFromSina(5000,100,ThreadsNum=4,IP=\"localhost\",PORT=27017,\\\r\n\t\t\tdbName=\"Sina_Stock\",collectionName=\"sina_news_company\")\r\n\t\tweb_crawl_obj.classifyRealtimeStockNews()\r\n\telif web == 'jrj':\r\n\t\tweb_crawl_obj = WebCrawlFromjrj(\"2009-01-05\",\"2018-02-03\",100,ThreadsNum=4,IP=\"localhost\",PORT=27017,\\\r\n\t\t\tdbName=\"Jrj_Stock\",collectionName=\"jrj_news_company\")\r\n\t\tweb_crawl_obj.classifyRealtimeStockNews()\r\n\telif web == 'cnstock':\r\n\t\tweb_crawl_obj = WebCrawlFromcnstock(IP=\"localhost\",PORT=27017,ThreadsNum=4,\\\r\n\t\t\tdbName=\"Cnstock_Stock\",collectionName=\"cnstock_news_company\")\r\n\t\tweb_crawl_obj.classifyRealtimeStockNews()\r\n\telif web == 'stcn':\r\n\t\tweb_crawl_obj = WebCrawlFromstcn(IP=\"localhost\",PORT=27017,ThreadsNum=4,\\\r\n\t\t\tdbName=\"Stcn_Stock\",collectionName=\"stcn_news_company\")\r\n\t\tweb_crawl_obj.classifyRealtimeStockNews()\r\n\r\nif __name__ == '__main__':\r\n\t# Step 1. Initiate\r\n\ttext_mining_obj = tm.TextMining(IP=\"localhost\",PORT=27017)\r\n\r\n\t# Step 2. 
Extract relevant stock codes of news(articles/documents) from all databases\r\n\ttext_mining_obj.extractStockCodeFromArticle(\"NBD_Stock\",\"nbd_news_company\") # extract related stock codes from NBD (每经网) news\r\n\ttext_mining_obj.extractStockCodeFromArticle(\"Cnstock_Stock\",\"cnstock_news_company\") # extract related stock codes from cnstock (中国证券网) news\r\n\ttext_mining_obj.extractStockCodeFromArticle(\"Stcn_Stock\",\"stcn_news_company\") # extract related stock codes from stcn (证券时报网) news\r\n\ttext_mining_obj.extractStockCodeFromArticle(\"Jrj_Stock\",\"jrj_news_company\") # extract related stock codes from jrj (金融界) news\r\n\r\n\t# Step 3. Extract all news related to specific stock to new database(this step will take a long time)\r\n\tcodeLst = text_mining_obj.extractData(\"Stock\",\"Basic_Info\",['code']).code\r\n\tRange = 10\r\n\tIdx = 0\r\n\twhile Idx < len(codeLst):\r\n\t\tthread_lst = []\r\n\t\tfor stockcode in codeLst[Idx:Idx+Range]:\r\n\t\t\tthread = threading.Thread(target=text_mining_obj.getNewsOfSpecificStock,\\\r\n\t\t\t\targs=([(\"NBD_Stock\",\"nbd_news_company\"),(\"Sina_Stock\",\"sina_news_company\"),\\\r\n\t\t\t\t(\"Cnstock_Stock\",\"cnstock_news_company\"),(\"Stcn_Stock\",\"stcn_news_company\"),(\"Jrj_Stock\",\\\r\n\t\t\t\t\"jrj_news_company\")],stockcode),kwargs={\"export\":['database','Stock_News',stockcode],\"judgeTerm\":3})\r\n\t\t\tthread_lst.append(thread)\r\n\t\tfor thread in thread_lst:\r\n\t\t\tthread.start()\r\n\t\tfor thread in thread_lst:\r\n\t\t\tthread.join()\r\n\t\tprint(' [*] have extracted ' + ', '.join(codeLst[Idx:Idx+Range]))\r\n\t\tIdx += Range\r\n\tthread_lst = []\r\n\tfor stockcode in codeLst[Idx:]:\r\n\t\tthread = threading.Thread(target=text_mining_obj.getNewsOfSpecificStock,\\\r\n\t\t\targs=([(\"NBD_Stock\",\"nbd_news_company\"),(\"Sina_Stock\",\"sina_news_company\"),\\\r\n\t\t\t(\"Cnstock_Stock\",\"cnstock_news_company\"),(\"Stcn_Stock\",\"stcn_news_company\"),(\"Jrj_Stock\",\\\r\n\t\t\t\"jrj_news_company\")],stockcode),kwargs={\"export\":['database','Stock_News',stockcode],\"judgeTerm\":3})\r\n\t\tthread_lst.append(thread)\r\n\tfor thread in thread_lst:\r\n\t\tthread.start()\r\n\tfor thread in thread_lst:\r\n\t\tthread.join()\r\n\tprint(' [*] have extracted ' + ', '.join(codeLst[Idx:]))\r\n\r\n\t# Step 4. Crawl real-time news from 'web_list' and make classification\r\n\tweb_list = ['sina','jrj','cnstock','stcn']\r\n\twith futures.ThreadPoolExecutor(max_workers=4) as executor:\r\n\t\tfuture_to_url = {executor.submit(crawlers,param) : \\\r\n\t\t\tind for ind, param in enumerate(web_list)}\r\n"
  },
  {
    "path": "legacy_v1/src/Gon/__init__.py",
    "content": "import os\nimport sys\n\n\ndef add_path(path):\n    if path not in sys.path:\n        sys.path.insert(0, path)\n\n\n# add `./src` dir to system path\nsrc_dir_1 = os.path.abspath(os.path.join(os.getcwd(), \"../\"))\n\n# add `./src/Gon` dir to system path\nsrc_dir_2 = os.path.dirname(__file__)\n\nadd_path(src_dir_1)\nadd_path(src_dir_2)"
  },
  {
    "path": "legacy_v1/src/Gon/cnstockspyder.py",
    "content": "\"\"\"\n中国证券网：https://www.cnstock.com\n公司聚焦：https://company.cnstock.com/company/scp_gsxw\n公告解读：https://ggjd.cnstock.com/gglist/search/qmtbbdj\n公告快讯：https://ggjd.cnstock.com/gglist/search/ggkx\n利好公告：https://ggjd.cnstock.com/company/scp_ggjd/tjd_sdlh\n\"\"\"\n\nimport __init__\n\nfrom spyder import Spyder\n\nfrom Kite import utils\nfrom Kite import config\nfrom Kite.database import Database\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\n\nfrom Leorio.tokenization import Tokenization\n\nimport re\nimport time\nimport json\nimport redis\nimport random\nimport logging\nimport threading\nfrom bs4 import BeautifulSoup\nfrom selenium import webdriver\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass CnStockSpyder(Spyder):\n\n    def __init__(self, database_name, collection_name):\n        super(CnStockSpyder, self).__init__()\n        self.db_obj = Database()\n        self.col = self.db_obj.conn[database_name].get_collection(collection_name)\n        self.terminated_amount = 0\n        self.db_name = database_name\n        self.col_name = collection_name\n        self.tokenization = Tokenization(import_module=\"jieba\", user_dict=config.USER_DEFINED_DICT_PATH)\n        self.redis_client = redis.StrictRedis(host=config.REDIS_IP,\n                                              port=config.REDIS_PORT,\n                                              db=config.CACHE_NEWS_REDIS_DB_ID)\n\n    def get_url_info(self, url):\n        try:\n            bs = utils.html_parser(url)\n        except Exception:\n            return False\n        span_list = bs.find_all(\"span\")\n        part = bs.find_all(\"p\")\n        article = \"\"\n        date = \"\"\n        for span in span_list:\n            if \"class\" in span.attrs and span[\"class\"] == [\"timer\"]:\n                date 
= span.text\n                break\n        for paragraph in part:\n            chn_status = utils.count_chn(str(paragraph))\n            possible = chn_status[1]\n            if possible > self.is_article_prob:\n                article += str(paragraph)\n        while article.find(\"<\") != -1 and article.find(\">\") != -1:\n            string = article[article.find(\"<\"):article.find(\">\")+1]\n            article = article.replace(string, \"\")\n        while article.find(\"\\u3000\") != -1:\n            article = article.replace(\"\\u3000\", \"\")\n        article = \" \".join(re.split(\" +|\\n+\", article)).strip()\n\n        return [date, article]\n\n    def get_historical_news(self, url, category_chn=None, start_date=None):\n        \"\"\"\n        :param url: 爬虫网页\n        :param category_chn: 所属类别, 中文字符串, 包括'公司聚焦', '公告解读', '公告快讯', '利好公告'\n        :param start_date: 数据库中category_chn类别新闻最近一条数据的时间\n        \"\"\"\n        assert category_chn is not None\n        driver = webdriver.Chrome(executable_path=config.CHROME_DRIVER)\n        btn_more_text = \"\"\n        crawled_urls_list = self.extract_data([\"Url\"])[0]\n        logging.info(\"historical data length -> {} ... 
\".format(len(crawled_urls_list)))\n        # crawled_urls_list = []\n        driver.get(url)\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        if start_date is None:\n            while btn_more_text != \"没有更多\":\n                more_btn = driver.find_element_by_id('j_more_btn')\n                btn_more_text = more_btn.text\n                logging.info(\"1-{}\".format(more_btn.text))\n                if btn_more_text == \"加载更多\":\n                    more_btn.click()\n                    time.sleep(random.random())  # sleep random time less 1s\n                elif btn_more_text == \"加载中...\":\n                    time.sleep(random.random()+2)\n                    more_btn = driver.find_element_by_id('j_more_btn')\n                    btn_more_text = more_btn.text\n                    logging.info(\"2-{}\".format(more_btn.text))\n                    if btn_more_text == \"加载更多\":\n                        more_btn.click()\n                else:\n                    more_btn.click()\n                    break\n            bs = BeautifulSoup(driver.page_source, \"html.parser\")\n            for li in bs.find_all(\"li\", attrs={\"class\": [\"newslist\"]}):\n                a = li.find_all(\"h2\")[0].find(\"a\")\n                if a[\"href\"] not in crawled_urls_list:\n                    result = self.get_url_info(a[\"href\"])\n                    while not result:\n                        self.terminated_amount += 1\n                        if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                            # 始终无法爬取的URL保存起来\n                            with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                file.write(\"{}\\n\".format(a[\"href\"]))\n  
                          logging.info(\"rejected by remote server more than {} times, \"\n                                         \"and the failed url has been written to {}\"\n                                         .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                                                 config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                            break\n                        logging.info(\"rejected by remote server, request {} again after \"\n                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                        time.sleep(60 * self.terminated_amount)\n                        result = self.get_url_info(a[\"href\"])\n                    if not result:\n                        # the crawl failed\n                        logging.info(\"[FAILED] {} {}\".format(a[\"title\"], a[\"href\"]))\n                    else:\n                        # a result was returned but the article is empty\n                        date, article = result\n                        while article == \"\" and self.is_article_prob >= .1:\n                            self.is_article_prob -= .1\n                            result = self.get_url_info(a[\"href\"])\n                            while not result:\n                                self.terminated_amount += 1\n                                if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                                    # save URLs that could never be crawled\n                                    with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                        file.write(\"{}\\n\".format(a[\"href\"]))\n                                    logging.info(\"rejected by remote server more than {} times, \"\n                                                 \"and the failed url has been written to {}\"\n                                                 .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                
                                         config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                                    break\n                                logging.info(\"rejected by remote server, request {} again after \"\n                                             \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                time.sleep(60 * self.terminated_amount)\n                                result = self.get_url_info(a[\"href\"])\n                            date, article = result\n                        self.is_article_prob = .5\n                        if article != \"\":\n                            related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                              name_code_dict)\n                            data = {\"Date\": date,\n                                    \"Category\": category_chn,\n                                    \"Url\": a[\"href\"],\n                                    \"Title\": a[\"title\"],\n                                    \"Article\": article,\n                                    \"RelatedStockCodes\": \" \".join(related_stock_codes_list)}\n                            # self.col.insert_one(data)\n                            self.db_obj.insert_data(self.db_name, self.col_name, data)\n                            logging.info(\"[SUCCESS] {} {} {}\".format(date, a[\"title\"], a[\"href\"]))\n        else:\n            # when start_date is not None, backfill historical data\n            is_click_button = True\n            start_get_url_info = False\n            tmp_a = None\n            while is_click_button:\n                bs = BeautifulSoup(driver.page_source, \"html.parser\")\n                for li in bs.find_all(\"li\", attrs={\"class\": [\"newslist\"]}):\n                    a = li.find_all(\"h2\")[0].find(\"a\")\n                    if tmp_a is not None and 
a[\"href\"] != tmp_a:\n                        continue\n                    elif tmp_a is not None and a[\"href\"] == tmp_a:\n                        start_get_url_info = True\n                    if start_get_url_info:\n                        date, _ = self.get_url_info(a[\"href\"])\n                        if date <= start_date:\n                            is_click_button = False\n                            break\n                tmp_a = a[\"href\"]\n                if is_click_button:\n                    more_btn = driver.find_element_by_id('j_more_btn')\n                    more_btn.click()\n            # every news item from the top of the page down to tmp_a is new; tmp_a itself is excluded\n            bs = BeautifulSoup(driver.page_source, \"html.parser\")\n            for li in bs.find_all(\"li\", attrs={\"class\": [\"newslist\"]}):\n                a = li.find_all(\"h2\")[0].find(\"a\")\n                if a[\"href\"] != tmp_a:\n                    result = self.get_url_info(a[\"href\"])\n                    while not result:\n                        self.terminated_amount += 1\n                        if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                            # save URLs that could never be crawled\n                            with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                file.write(\"{}\\n\".format(a[\"href\"]))\n                            logging.info(\"rejected by remote server more than {} times, \"\n                                         \"and the failed url has been written to {}\"\n                                         .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                                                 config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                            break\n                        logging.info(\"rejected by remote server, request {} again after \"\n                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                       
 time.sleep(60 * self.terminated_amount)\n                        result = self.get_url_info(a[\"href\"])\n                    if not result:\n                        # the crawl failed\n                        logging.info(\"[FAILED] {} {}\".format(a[\"title\"], a[\"href\"]))\n                    else:\n                        # a result was returned but the article is empty\n                        date, article = result\n                        while article == \"\" and self.is_article_prob >= .1:\n                            self.is_article_prob -= .1\n                            result = self.get_url_info(a[\"href\"])\n                            while not result:\n                                self.terminated_amount += 1\n                                if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                                    # save URLs that could never be crawled\n                                    with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                        file.write(\"{}\\n\".format(a[\"href\"]))\n                                    logging.info(\"rejected by remote server more than {} times, \"\n                                                 \"and the failed url has been written to {}\"\n                                                 .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                                                         config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                                    break\n                                logging.info(\"rejected by remote server, request {} again after \"\n                                             \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                time.sleep(60 * self.terminated_amount)\n                                result = self.get_url_info(a[\"href\"])\n                            date, article = result\n                        self.is_article_prob = .5\n                        if 
article != \"\":\n                            related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                              name_code_dict)\n                            data = {\"Date\": date,\n                                    \"Category\": category_chn,\n                                    \"Url\": a[\"href\"],\n                                    \"Title\": a[\"title\"],\n                                    \"Article\": article,\n                                    \"RelatedStockCodes\": \" \".join(related_stock_codes_list)}\n                            # self.col.insert_one(data)\n                            self.db_obj.insert_data(self.db_name, self.col_name, data)\n                            logging.info(\"[SUCCESS] {} {} {}\".format(date, a[\"title\"], a[\"href\"]))\n                else:\n                    break\n        driver.quit()\n\n    def get_realtime_news(self, url, category_chn=None, interval=60):\n        logging.info(\"start real-time crawling of URL -> {}, request every {} secs ... 
\".format(url, interval))\n        assert category_chn is not None\n        # TODO: since the amount of data crawled from cnstock is small, deduplication currently scans all historical data; the strategy will be revised later\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        crawled_urls = self.db_obj.get_data(self.db_name,\n                                            self.col_name,\n                                            keys=[\"Url\"])[\"Url\"].to_list()\n        while True:\n            # poll this URL at a fixed interval\n            bs = utils.html_parser(url)\n            for li in bs.find_all(\"li\", attrs={\"class\": [\"newslist\"]}):\n                a = li.find_all(\"h2\")[0].find(\"a\")\n                if a[\"href\"] not in crawled_urls:  # latest_3_days_crawled_href\n                    result = self.get_url_info(a[\"href\"])\n                    while not result:\n                        self.terminated_amount += 1\n                        if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                            # save URLs that could never be crawled\n                            with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                file.write(\"{}\\n\".format(a[\"href\"]))\n                            logging.info(\"rejected by remote server more than {} times, \"\n                                         \"and the failed url has been written to {}\"\n                                         .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                                                 config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                            break\n                        logging.info(\"rejected by remote server, request {} again after \"\n                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n 
                       time.sleep(60 * self.terminated_amount)\n                        result = self.get_url_info(a[\"href\"])\n                    if not result:\n                        # the crawl failed\n                        logging.info(\"[FAILED] {} {}\".format(a[\"title\"], a[\"href\"]))\n                    else:\n                        # a result was returned but the article is empty\n                        date, article = result\n                        while article == \"\" and self.is_article_prob >= .1:\n                            self.is_article_prob -= .1\n                            result = self.get_url_info(a[\"href\"])\n                            while not result:\n                                self.terminated_amount += 1\n                                if self.terminated_amount > config.CNSTOCK_MAX_REJECTED_AMOUNTS:\n                                    # save URLs that could never be crawled\n                                    with open(config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                        file.write(\"{}\\n\".format(a[\"href\"]))\n                                    logging.info(\"rejected by remote server more than {} times, \"\n                                                 \"and the failed url has been written to {}\"\n                                                 .format(config.CNSTOCK_MAX_REJECTED_AMOUNTS,\n                                                         config.RECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH))\n                                    break\n                                logging.info(\"rejected by remote server, request {} again after \"\n                                             \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                time.sleep(60 * self.terminated_amount)\n                                result = self.get_url_info(a[\"href\"])\n                            date, article = result\n                        self.is_article_prob = .5\n           
             if article != \"\":\n                            related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                              name_code_dict)\n                            self.db_obj.insert_data(self.db_name, self.col_name,\n                                                    {\"Date\": date,\n                                                     \"Category\": category_chn,\n                                                     \"Url\": a[\"href\"],\n                                                     \"Title\": a[\"title\"],\n                                                     \"Article\": article,\n                                                     \"RelatedStockCodes\": \" \".join(related_stock_codes_list)})\n                            self.redis_client.lpush(config.CACHE_NEWS_LIST_NAME, json.dumps(\n                                {\"Date\": date,\n                                 \"Category\": category_chn,\n                                 \"Url\": a[\"href\"],\n                                 \"Title\": a[\"title\"],\n                                 \"Article\": article,\n                                 \"RelatedStockCodes\": \" \".join(related_stock_codes_list),\n                                 \"OriDB\": config.DATABASE_NAME,\n                                 \"OriCOL\": config.COLLECTION_NAME_CNSTOCK\n                                 }\n                            ))\n                            logging.info(\"[SUCCESS] {} {} {}\".format(date, a[\"title\"], a[\"href\"]))\n                            crawled_urls.append(a[\"href\"])\n            # logging.info(\"sleep {} secs then request {} again ... 
\".format(interval, url))\n            time.sleep(interval)\n\n\n# \"\"\"\n# Example-1:\n# Crawl historical news data\n# \"\"\"\n# if __name__ == '__main__':\n#     import time\n#     import logging\n#     from Kite import config\n#     from Killua.denull import DeNull\n#     from Killua.deduplication import Deduplication\n#     from Gon.cnstockspyder import CnStockSpyder\n#\n#     cnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n#     for url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n#         logging.info(\"start crawling {} ...\".format(url_to_be_crawled))\n#         cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn)\n#         logging.info(\"finished ...\")\n#         time.sleep(30)\n#\n#     Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n#     DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n\n\n# \"\"\"\n# Example-2:\n# Crawl real-time news data\n# \"\"\"\n# if __name__ == '__main__':\n#     import time, logging, threading\n#     from Kite import config\n#     from Kite.database import Database\n#     from Killua.denull import DeNull\n#     from Killua.deduplication import Deduplication\n#     from Gon.cnstockspyder import CnStockSpyder\n#\n#     obj = Database()\n#     df = obj.get_data(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK, keys=[\"Date\", \"Category\"])\n#\n#     cnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n#     # Backfill historical data first: e.g. if data has been crawled up to 2020-12-01 but the real-time\n#     # crawler is started on 2020-12-23, news from 2020-12-02 to 2020-12-23 is fetched automatically first\n#     for url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n#         # query the date of the latest record for type_chn\n#         latest_date_in_db = max(df[df.Category == type_chn][\"Date\"].to_list())\n#         cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn, start_date=latest_date_in_db)\n#\n#     Deduplication(config.DATABASE_NAME, 
config.COLLECTION_NAME_CNSTOCK).run()\n#     DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n#\n#     # start multiple threads for parallel real-time crawling\n#     thread_list = []\n#     for url, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n#         thread = threading.Thread(target=cnstock_spyder.get_realtime_news, args=(url, type_chn, 60))\n#         thread_list.append(thread)\n#     for thread in thread_list:\n#         thread.start()\n#     for thread in thread_list:\n#         thread.join()\n"
  },
  {
    "path": "legacy_v1/src/Gon/history_starter_cnstock.py",
    "content": "import __init__\n\nimport time\nimport logging\n\nfrom Kite import config\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\nfrom Killua.buildstocknewsdb import GenStockNewsDB\n\nfrom Gon.cnstockspyder import CnStockSpyder\n\n\n# 1. Crawl historical data\ncnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\nfor url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n    logging.info(\"start crawling {} ...\".format(url_to_be_crawled))\n    cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn)\n    logging.info(\"finished ...\")\n    time.sleep(30)\n\n# 2. Deduplicate the historical data\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n\n# 3. Drop rows that contain null values from the historical data\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n\n# 4. Build a new database: for each stock, store every news item that mentions it and label the item \"利好\" (positive), \"利空\" (negative) or \"中性\" (neutral)\ngen_stock_news_db = GenStockNewsDB()\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n\n"
  },
  {
    "path": "legacy_v1/src/Gon/history_starter_jrj.py",
    "content": "import __init__\n\nfrom Kite import config\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\nfrom Killua.buildstocknewsdb import GenStockNewsDB\n\nfrom Gon.jrjspyder import JrjSpyder\n\n\n# 1. Crawl historical data\njrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\njrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ, start_date=\"2015-01-01\")\n\n# 2. Deduplicate the historical data\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n\n# 3. Drop rows that contain null values from the historical data\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n\n# 4. Build a new database: for each stock, store every news item that mentions it and label the item \"利好\" (positive), \"利空\" (negative) or \"中性\" (neutral)\ngen_stock_news_db = GenStockNewsDB()\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n"
  },
  {
    "path": "legacy_v1/src/Gon/history_starter_nbd.py",
    "content": "import __init__\n\nfrom Kite import config\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\nfrom Killua.buildstocknewsdb import GenStockNewsDB\n\nfrom Gon.nbdspyder import NbdSpyder\n\n\n# 1. Crawl historical data\nnbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\nnbd_spyder.get_historical_news(start_page=684)\n\n# 2. Deduplicate the historical data\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n\n# 3. Drop rows that contain null values from the historical data\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n\n# 4. Build a new database: for each stock, store every news item that mentions it and label the item \"利好\" (positive), \"利空\" (negative) or \"中性\" (neutral)\ngen_stock_news_db = GenStockNewsDB()\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n"
  },
  {
    "path": "legacy_v1/src/Gon/history_starter_stock_price.py",
    "content": "import __init__\n\nfrom Kite import config\n\nfrom Gon.stockinfospyder import StockInfoSpyder\n\n\nstock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\n\n# Fetch historical data for a given date range, e.g. stock_info_spyder.get_historical_news(start_date=\"20150101\", end_date=\"20201204\")\n# If no date range is given and the database already holds some data, fetching resumes from the day after the\n# latest stored date up to now, e.g. if price data for sh600000 already exists up to 2020-12-03 and no dates\n# are set, prices for sh600000 from 2020-12-04 to the present are fetched automatically\nstock_info_spyder.get_historical_news()\n"
  },
  {
    "path": "legacy_v1/src/Gon/ifengspyder.py",
    "content": "\"\"\"\nPhoenix Finance (ifeng): https://finance.ifeng.com\nListed companies: https://finance.ifeng.com/shanklist/1-62-83-\nMarket commentary: https://finance.ifeng.com/shanklist/1-62-85-\nSecurities news: https://finance.ifeng.com/shanklist/1-62-84-\n\"\"\""
  },
  {
    "path": "legacy_v1/src/Gon/jrjspyder.py",
    "content": "\"\"\"\nJRJ (Financial World): http://www.jrj.com.cn\nAll news in the stock channel: http://stock.jrj.com.cn/xwk/202012/20201203_1.shtml\n\"\"\"\n\nimport __init__\n\nfrom spyder import Spyder\n\nfrom Kite import utils\nfrom Kite import config\nfrom Kite.database import Database\n\nfrom Leorio.tokenization import Tokenization\n\nimport time\nimport json\nimport redis\nimport datetime\nimport logging\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass JrjSpyder(Spyder):\n\n    def __init__(self, database_name, collection_name):\n        super(JrjSpyder, self).__init__()\n        self.db_obj = Database()\n        self.col = self.db_obj.conn[database_name].get_collection(collection_name)\n        self.terminated_amount = 0\n        self.db_name = database_name\n        self.col_name = collection_name\n        self.tokenization = Tokenization(import_module=\"jieba\", user_dict=config.USER_DEFINED_DICT_PATH)\n        self.redis_client = redis.StrictRedis(host=config.REDIS_IP,\n                                              port=config.REDIS_PORT,\n                                              db=config.CACHE_NEWS_REDIS_DB_ID)\n\n    def get_url_info(self, url, specific_date):\n        try:\n            bs = utils.html_parser(url)\n        except Exception:\n            return False\n        date = \"\"\n        for span in bs.find_all(\"span\"):\n            if span.contents[0] == \"jrj_final_date_start\":\n                date = span.text.replace(\"\\r\", \"\").replace(\"\\n\", \"\")\n                break\n        if date == \"\":\n            date = specific_date\n        article = \"\"\n        for p in bs.find_all(\"p\"):\n            if not p.find_all(\"jrj_final_daohang_start\") and p.attrs == {} and \\\n                    not p.find_all(\"input\") and not p.find_all(\"a\", attrs={\"class\": \"red\"}) and not p.find_all(\"i\") and not 
p.find_all(\"span\"):\n                # if p.contents[0] != \"jrj_final_daohang_start1\" and p.attrs == {} and \\\n                #         not p.find_all(\"input\") and not p.find_all(\"a\", attrs={\"class\": \"red\"}) and not p.find_all(\"i\"):\n                article += p.text.replace(\"\\r\", \"\").replace(\"\\n\", \"\").replace(\"\\u3000\", \"\")\n\n        return [date, article]\n\n    def get_historical_news(self, url, start_date=None, end_date=None):\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n\n        crawled_urls_list = []\n        if end_date is None:\n            end_date = datetime.datetime.now().strftime(\"%Y-%m-%d\")\n\n        if start_date is None:\n            # if start_date is None, backfill from the latest date in the historical database up to today\n            # e.g. history_latest_date_str -> \"2020-12-08\"\n            #      history_latest_date_dt -> datetime.date(2020, 12, 8)\n            #      start_date -> \"2020-12-09\"\n            history_latest_date_list = self.db_obj.get_data(self.db_name,\n                                                            self.col_name,\n                                                            keys=[\"Date\"])[\"Date\"].to_list()\n            if len(history_latest_date_list) != 0:\n                history_latest_date_str = max(history_latest_date_list).split(\" \")[0]\n                history_latest_date_dt = datetime.datetime.strptime(history_latest_date_str, \"%Y-%m-%d\").date()\n                offset = datetime.timedelta(days=1)\n                start_date = (history_latest_date_dt + offset).strftime('%Y-%m-%d')\n            else:\n                start_date = config.JRJ_REQUEST_DEFAULT_DATE\n\n        dates_list = utils.get_date_list_from_range(start_date, end_date)\n        
dates_separated_into_ranges_list = utils.gen_dates_list(dates_list, config.JRJ_DATE_RANGE)\n\n        for dates_range in dates_separated_into_ranges_list:\n            for date in dates_range:\n                first_url = \"{}/{}/{}_1.shtml\".format(url, date.replace(\"-\", \"\")[0:6], date.replace(\"-\", \"\"))\n                max_pages_num = utils.search_max_pages_num(first_url, date)\n                for num in range(1, max_pages_num + 1):\n                    _url = \"{}/{}/{}_{}.shtml\".format(url, date.replace(\"-\", \"\")[0:6], date.replace(\"-\", \"\"), str(num))\n                    bs = utils.html_parser(_url)\n                    a_list = bs.find_all(\"a\")\n                    for a in a_list:\n                        if \"href\" in a.attrs and a.string and \\\n                                a[\"href\"].find(\"/{}/{}/\".format(date.replace(\"-\", \"\")[:4],\n                                                                date.replace(\"-\", \"\")[4:6])) != -1:\n                            if a[\"href\"] not in crawled_urls_list:\n                                # if the title does not contain phrases like \"收盘\" (market close) or \"报于\" (closed at), the news can be saved; news with such titles is mostly machine-generated\n                                if a.string.find(\"收盘\") == -1 and a.string.find(\"报于\") == -1 and \\\n                                        a.string.find(\"新三板挂牌上市\") == -1:\n                                    result = self.get_url_info(a[\"href\"], date)\n                                    while not result:\n                                        self.terminated_amount += 1\n                                        if self.terminated_amount > config.JRJ_MAX_REJECTED_AMOUNTS:\n                                            # save URLs that could never be crawled\n                                            with open(config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                file.write(\"{}\\n\".format(a[\"href\"]))\n                                            logging.info(\"rejected by remote server more 
than {} times, \"\n                                                         \"and the failed url has been written to {}\"\n                                                         .format(config.JRJ_MAX_REJECTED_AMOUNTS,\n                                                                 config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH))\n                                            break\n                                        logging.info(\"rejected by remote server, request {} again after \"\n                                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                        time.sleep(60 * self.terminated_amount)\n                                        result = self.get_url_info(a[\"href\"], date)\n                                    if not result:\n                                        # the crawl failed\n                                        logging.info(\"[FAILED] {} {}\".format(a.string, a[\"href\"]))\n                                    else:\n                                        # a result was returned but the article is empty\n                                        article_specific_date, article = result\n                                        while article == \"\" and self.is_article_prob >= .1:\n                                            self.is_article_prob -= .1\n                                            result = self.get_url_info(a[\"href\"], date)\n                                            while not result:\n                                                self.terminated_amount += 1\n                                                if self.terminated_amount > config.JRJ_MAX_REJECTED_AMOUNTS:\n                                                    # save URLs that could never be crawled\n                                                    with open(config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                        file.write(\"{}\\n\".format(a[\"href\"]))\n              
                                      logging.info(\"rejected by remote server more than {} times, \"\n                                                                 \"and the failed url has been written to {}\"\n                                                                 .format(config.JRJ_MAX_REJECTED_AMOUNTS,\n                                                                         config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH))\n                                                    break\n                                                logging.info(\"rejected by remote server, request {} again after \"\n                                                             \"{} seconds...\".format(a[\"href\"],\n                                                                                    60 * self.terminated_amount))\n                                                time.sleep(60 * self.terminated_amount)\n                                                result = self.get_url_info(a[\"href\"], date)\n                                            article_specific_date, article = result\n                                        self.is_article_prob = .5\n                                        if article != \"\":\n                                                related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                                                  name_code_dict)\n                                                data = {\"Date\": article_specific_date,\n                                                        \"Url\": a[\"href\"],\n                                                        \"Title\": a.string,\n                                                        \"Article\": article,\n                                                        \"RelatedStockCodes\": \" \".join(related_stock_codes_list)}\n                               
                 # self.col.insert_one(data)\n                                                self.db_obj.insert_data(self.db_name, self.col_name, data)\n                                                logging.info(\"[SUCCESS] {} {} {}\".format(article_specific_date,\n                                                                                         a.string,\n                                                                                         a[\"href\"]))\n                                    self.terminated_amount = 0  # reset this counter after the crawl finishes\n                                else:\n                                    logging.info(\"[QUIT] {}\".format(a.string))\n\n    def get_realtime_news(self, interval=60):\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        # crawled_urls_list = []\n        is_change_date = False\n        last_date = datetime.datetime.now().strftime(\"%Y-%m-%d\")\n        while True:\n            today_date = datetime.datetime.now().strftime(\"%Y-%m-%d\")\n            if today_date != last_date:\n                is_change_date = True\n                last_date = today_date\n            if is_change_date:\n                # crawled_urls_list = []\n                utils.batch_lpop(self.redis_client,\n                                 config.CACHE_SAVED_NEWS_JRJ_TODAY_VAR_NAME,\n                                 self.redis_client.llen(config.CACHE_SAVED_NEWS_JRJ_TODAY_VAR_NAME))\n                is_change_date = False\n            _url = \"{}/{}/{}_1.shtml\".format(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ,\n                                             today_date.replace(\"-\", \"\")[0:6],\n                                             today_date.replace(\"-\", \"\"))\n            max_pages_num = 
utils.search_max_pages_num(_url, today_date)\n            for num in range(1, max_pages_num + 1):\n                _url = \"{}/{}/{}_{}.shtml\".format(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ,\n                                                  today_date.replace(\"-\", \"\")[0:6],\n                                                  today_date.replace(\"-\", \"\"),\n                                                  str(num))\n                bs = utils.html_parser(_url)\n                a_list = bs.find_all(\"a\")\n                for a in a_list:\n                    if \"href\" in a.attrs and a.string and \\\n                            a[\"href\"].find(\"/{}/{}/\".format(today_date.replace(\"-\", \"\")[:4],\n                                                            today_date.replace(\"-\", \"\")[4:6])) != -1:\n                        # if a[\"href\"] not in crawled_urls_list:\n                        if a[\"href\"] not in self.redis_client.lrange(config.CACHE_SAVED_NEWS_JRJ_TODAY_VAR_NAME, 0, -1):\n                            # only save news whose titles do not contain \"收盘\", \"报于\", etc., since news with such titles is mostly machine-generated\n                            if a.string.find(\"收盘\") == -1 and a.string.find(\"报于\") == -1 and \\\n                                    a.string.find(\"新三板挂牌上市\") == -1:\n                                result = self.get_url_info(a[\"href\"], today_date)\n                                while not result:\n                                    self.terminated_amount += 1\n                                    if self.terminated_amount > config.JRJ_MAX_REJECTED_AMOUNTS:\n                                        # save URLs that could never be crawled\n                                        with open(config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                            file.write(\"{}\\n\".format(a[\"href\"]))\n                                        logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                     \"and the 
failed url has been written in path {}\"\n                                                     .format(config.JRJ_MAX_REJECTED_AMOUNTS,\n                                                             config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH))\n                                        break\n                                    logging.info(\"rejected by remote server, request {} again after \"\n                                                 \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                    time.sleep(60 * self.terminated_amount)\n                                    result = self.get_url_info(a[\"href\"], today_date)\n                                if not result:\n                                    # crawl failed\n                                    logging.info(\"[FAILED] {} {}\".format(a.string, a[\"href\"]))\n                                else:\n                                    # a result was returned but the article is empty\n                                    article_specific_date, article = result\n                                    while article == \"\" and self.is_article_prob >= .1:\n                                        self.is_article_prob -= .1\n                                        result = self.get_url_info(a[\"href\"], today_date)\n                                        while not result:\n                                            self.terminated_amount += 1\n                                            if self.terminated_amount > config.JRJ_MAX_REJECTED_AMOUNTS:\n                                                # save URLs that could never be crawled\n                                                with open(config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                    file.write(\"{}\\n\".format(a[\"href\"]))\n                                                logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                             
\"and the failed url has been written in path {}\"\n                                                             .format(config.JRJ_MAX_REJECTED_AMOUNTS,\n                                                                     config.RECORD_JRJ_FAILED_URL_TXT_FILE_PATH))\n                                                break\n                                            logging.info(\"rejected by remote server, request {} again after \"\n                                                         \"{} seconds...\".format(a[\"href\"],\n                                                                                60 * self.terminated_amount))\n                                            time.sleep(60 * self.terminated_amount)\n                                            result = self.get_url_info(a[\"href\"], today_date)\n                                        article_specific_date, article = result\n                                    self.is_article_prob = .5\n                                    if article != \"\":\n                                        related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                                          name_code_dict)\n                                        self.db_obj.insert_data(self.db_name, self.col_name,\n                                                                {\"Date\": article_specific_date,\n                                                                 \"Url\": a[\"href\"],\n                                                                 \"Title\": a.string,\n                                                                 \"Article\": article,\n                                                                 \"RelatedStockCodes\": \" \".join(related_stock_codes_list)})\n                                        self.redis_client.lpush(config.CACHE_NEWS_LIST_NAME, json.dumps(\n           
                                 {\"Date\": article_specific_date,\n                                             \"Url\": a[\"href\"],\n                                             \"Title\": a.string,\n                                             \"Article\": article,\n                                             \"RelatedStockCodes\": \" \".join(related_stock_codes_list),\n                                             \"OriDB\": config.DATABASE_NAME,\n                                             \"OriCOL\": config.COLLECTION_NAME_JRJ\n                                             }\n                                        ))\n                                        logging.info(\"[SUCCESS] {} {} {}\".format(article_specific_date,\n                                                                                 a.string,\n                                                                                 a[\"href\"]))\n                                self.terminated_amount = 0  # reset this counter after each crawl\n                            else:\n                                logging.info(\"[QUIT] {}\".format(a.string))\n                            # crawled_urls_list.append(a[\"href\"])\n                            self.redis_client.lpush(config.CACHE_SAVED_NEWS_JRJ_TODAY_VAR_NAME, a[\"href\"])\n            # logging.info(\"sleep {} secs then request again ... 
\".format(interval))\n            time.sleep(interval)\n\n\n# \"\"\"\n# Example-1:\n# Crawl historical news data\n# \"\"\"\n# if __name__ == \"__main__\":\n#     jrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n#     jrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ, start_date=\"2015-01-01\")\n#\n#     Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n#     DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n\n\n# \"\"\"\n# Example-2:\n# Crawl real-time news data\n# \"\"\"\n# if __name__ == '__main__':\n#     from Kite import config\n#     from Gon.jrjspyder import JrjSpyder\n#\n#     jrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n#     jrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ)  # backfill crawled data up to the latest date\n#     jrj_spyder.get_realtime_news()\n"
  },
  {
    "path": "legacy_v1/src/Gon/kill_realtime_spyder_tasks.py",
    "content": "import __init__\n\nimport os\nimport wmi\nimport redis\nimport logging\n\nfrom Kite import config\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass KillPyTasks(object):\n\n    def __init__(self):\n        self.redis_client = redis.StrictRedis(config.REDIS_IP,\n                                              port=config.REDIS_PORT,\n                                              db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\n        for _id in range(self.redis_client.llen(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR)):\n            proc = self.get_python_process(param=self.redis_client.lindex(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, _id).decode())\n            for p in proc:\n                self.killtask(p.Handle)\n                self.print_pid_info(p)\n        for _ in range(self.redis_client.llen(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR)):\n            self.redis_client.lpop(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR)\n\n    @staticmethod\n    def killtask(pid):\n        os.system(f\"taskkill /F /pid {pid} -t\")\n\n    @staticmethod\n    def get_python_process(prop=\"python.exe\", param=None):\n        output = []\n        w = wmi.WMI()\n        for proc in w.Win32_Process(name=prop):\n            if param is None:\n                output.append(proc)\n            else:\n                if str(proc.CommandLine).find(param) >= 0:\n                    output.append(proc)\n        return output\n\n    @staticmethod\n    def print_pid_info(process):\n        logging.info(\"{} | {} | {} -> killed ... \".format(process.Handle, process.Caption, process.CommandLine))\n\n\nif __name__ == \"__main__\":\n    KillPyTasks()"
  },
  {
    "path": "legacy_v1/src/Gon/money163spyder.py",
    "content": "\"\"\"\nNetEase Money: https://money.163.com\nIndividual stock news: http://money.163.com/special/g/00251LR5/gptj.html\nMarket news: http://money.163.com/special/00251LR5/cpznList.html\nIndustry sectors: http://money.163.com/special/00251LJV/hyyj.html\n\"\"\""
  },
  {
    "path": "legacy_v1/src/Gon/nbdspyder.py",
    "content": "\"\"\"\nNBD (National Business Daily): http://www.nbd.com.cn\nA-share news: http://stocks.nbd.com.cn/columns/275/page/1\n\"\"\"\n\nimport __init__\n\nfrom spyder import Spyder\n\nfrom Kite import utils\nfrom Kite import config\nfrom Kite.database import Database\n\nfrom Leorio.tokenization import Tokenization\n\nimport re\nimport time\nimport json\nimport redis\nimport logging\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass NbdSpyder(Spyder):\n\n    def __init__(self, database_name, collection_name):\n        super(NbdSpyder, self).__init__()\n        self.db_obj = Database()\n        self.col = self.db_obj.conn[database_name].get_collection(collection_name)\n        self.terminated_amount = 0\n        self.db_name = database_name\n        self.col_name = collection_name\n        self.tokenization = Tokenization(import_module=\"jieba\", user_dict=config.USER_DEFINED_DICT_PATH)\n        self.redis_client = redis.StrictRedis(host=config.REDIS_IP,\n                                              port=config.REDIS_PORT,\n                                              db=config.CACHE_NEWS_REDIS_DB_ID)\n\n    def get_url_info(self, url):\n        try:\n            bs = utils.html_parser(url)\n        except Exception:\n            return False\n        span_list = bs.find_all(\"span\")\n        part = bs.find_all(\"p\")\n        article = \"\"\n        date = \"\"\n        for span in span_list:\n            if \"class\" in span.attrs and span.text and span[\"class\"] == [\"time\"]:\n                string = span.text.split()\n                for dt in string:\n                    if dt.find(\"-\") != -1:\n                        date += dt + \" \"\n                    elif dt.find(\":\") != -1:\n                        date += dt\n                break\n        for paragraph in part:\n            chn_status = 
utils.count_chn(str(paragraph))\n            possible = chn_status[1]\n            if possible > self.is_article_prob:\n                article += str(paragraph)\n        while article.find(\"<\") != -1 and article.find(\">\") != -1:\n            string = article[article.find(\"<\"):article.find(\">\")+1]\n            article = article.replace(string, \"\")\n        while article.find(\"\\u3000\") != -1:\n            article = article.replace(\"\\u3000\", \"\")\n        article = \" \".join(re.split(\" +|\\n+\", article)).strip()\n\n        return [date, article]\n\n    def get_historical_news(self, start_page=684):\n        date_list = self.db_obj.get_data(self.db_name, self.col_name, keys=[\"Date\"])[\"Date\"].to_list()\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        if len(date_list) == 0:\n            # no historical data yet, so crawl from scratch\n            crawled_urls_list = []\n            page_urls = [\"{}/{}\".format(config.WEBSITES_LIST_TO_BE_CRAWLED_NBD, page_id)\n                         for page_id in range(start_page, 0, -1)]\n            for page_url in page_urls:\n                bs = utils.html_parser(page_url)\n                a_list = bs.find_all(\"a\")\n                for a in a_list:\n                    if \"click-statistic\" in a.attrs and a.string \\\n                            and a[\"click-statistic\"].find(\"Article_\") != -1 \\\n                            and a[\"href\"].find(\"http://www.nbd.com.cn/articles/\") != -1:\n                        if a[\"href\"] not in crawled_urls_list:\n                            result = self.get_url_info(a[\"href\"])\n                            while not result:\n                                self.terminated_amount += 1\n                                if self.terminated_amount > 
config.NBD_MAX_REJECTED_AMOUNTS:\n                                    # save URLs that could never be crawled\n                                    with open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                        file.write(\"{}\\n\".format(a[\"href\"]))\n                                    logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                 \"and the failed url has been written in path {}\"\n                                                 .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                         config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                    break\n                                logging.info(\"rejected by remote server, request {} again after \"\n                                             \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                time.sleep(60 * self.terminated_amount)\n                                result = self.get_url_info(a[\"href\"])\n                            if not result:\n                                # crawl failed\n                                logging.info(\"[FAILED] {} {}\".format(a.string, a[\"href\"]))\n                            else:\n                                # a result was returned but the article is empty\n                                date, article = result\n                                while article == \"\" and self.is_article_prob >= .1:\n                                    self.is_article_prob -= .1\n                                    result = self.get_url_info(a[\"href\"])\n                                    while not result:\n                                        self.terminated_amount += 1\n                                        if self.terminated_amount > config.NBD_MAX_REJECTED_AMOUNTS:\n                                            # save URLs that could never be crawled\n                                            with 
open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                file.write(\"{}\\n\".format(a[\"href\"]))\n                                            logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                         \"and the failed url has been written in path {}\"\n                                                         .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                                 config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                            break\n                                        logging.info(\"rejected by remote server, request {} again after \"\n                                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                        time.sleep(60 * self.terminated_amount)\n                                        result = self.get_url_info(a[\"href\"])\n                                    date, article = result\n                                self.is_article_prob = .5\n                                if article != \"\":\n                                    related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                                      name_code_dict)\n                                    data = {\"Date\": date,\n                                            # \"PageId\": page_url.split(\"/\")[-1],\n                                            \"Url\": a[\"href\"],\n                                            \"Title\": a.string,\n                                            \"Article\": article,\n                                            \"RelatedStockCodes\": \" \".join(related_stock_codes_list)}\n                                    # self.col.insert_one(data)\n             
                       self.db_obj.insert_data(self.db_name, self.col_name, data)\n                                    logging.info(\"[SUCCESS] {} {} {}\".format(date, a.string, a[\"href\"]))\n        else:\n            is_stop = False\n            start_date = max(date_list)\n            page_start_id = 1\n            while not is_stop:\n                page_url = \"{}/{}\".format(config.WEBSITES_LIST_TO_BE_CRAWLED_NBD, page_start_id)\n                bs = utils.html_parser(page_url)\n                a_list = bs.find_all(\"a\")\n                for a in a_list:\n                    if \"click-statistic\" in a.attrs and a.string \\\n                            and a[\"click-statistic\"].find(\"Article_\") != -1 \\\n                            and a[\"href\"].find(\"http://www.nbd.com.cn/articles/\") != -1:\n                        result = self.get_url_info(a[\"href\"])\n                        while not result:\n                            self.terminated_amount += 1\n                            if self.terminated_amount > config.NBD_MAX_REJECTED_AMOUNTS:\n                                # save URLs that could never be crawled\n                                with open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                    file.write(\"{}\\n\".format(a[\"href\"]))\n                                logging.info(\"rejected by remote server longer than {} minutes, \"\n                                             \"and the failed url has been written in path {}\"\n                                             .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                     config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                break\n                            logging.info(\"rejected by remote server, request {} again after \"\n                                         \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                            time.sleep(60 * 
self.terminated_amount)\n                            result = self.get_url_info(a[\"href\"])\n                        if not result:\n                            # crawl failed\n                            logging.info(\"[FAILED] {} {}\".format(a.string, a[\"href\"]))\n                        else:\n                            # a result was returned but the article is empty\n                            date, article = result\n                            if date > start_date:\n                                while article == \"\" and self.is_article_prob >= .1:\n                                    self.is_article_prob -= .1\n                                    result = self.get_url_info(a[\"href\"])\n                                    while not result:\n                                        self.terminated_amount += 1\n                                        if self.terminated_amount > config.NBD_MAX_REJECTED_AMOUNTS:\n                                            # save URLs that could never be crawled\n                                            with open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                file.write(\"{}\\n\".format(a[\"href\"]))\n                                            logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                         \"and the failed url has been written in path {}\"\n                                                         .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                                 config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                            break\n                                        logging.info(\"rejected by remote server, request {} again after \"\n                                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                        time.sleep(60 * self.terminated_amount)\n                               
         result = self.get_url_info(a[\"href\"])\n                                    date, article = result\n                                self.is_article_prob = .5\n                                if article != \"\":\n                                    related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                                      name_code_dict)\n                                    data = {\"Date\": date,\n                                            \"Url\": a[\"href\"],\n                                            \"Title\": a.string,\n                                            \"Article\": article,\n                                            \"RelatedStockCodes\": \" \".join(related_stock_codes_list)}\n                                    self.db_obj.insert_data(self.db_name, self.col_name, data)\n                                    logging.info(\"[SUCCESS] {} {} {}\".format(date, a.string, a[\"href\"]))\n                            else:\n                                is_stop = True\n                                break\n                if not is_stop:\n                    page_start_id += 1\n\n    def get_realtime_news(self, interval=60):\n        page_url = \"{}/1\".format(config.WEBSITES_LIST_TO_BE_CRAWLED_NBD)\n        logging.info(\"start real-time crawling of URL -> {}, request every {} secs ... 
\".format(page_url, interval))\n        name_code_df = self.db_obj.get_data(config.STOCK_DATABASE_NAME,\n                                            config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                            keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        # crawled_urls = []\n        date_list = self.db_obj.get_data(self.db_name, self.col_name, keys=[\"Date\"])[\"Date\"].to_list()\n        latest_date = max(date_list)\n        while True:\n            # poll this URL at a fixed interval\n            # if len(crawled_urls) > 100:\n            #     # keep the list at 100 entries to avoid excessive memory use\n            #     crawled_urls.pop(0)\n            if self.redis_client.llen(config.CACHE_SAVED_NEWS_NBD_TODAY_VAR_NAME) > 100:\n                # keep the cached list at 100 entries to avoid excessive memory use\n                self.redis_client.rpop(config.CACHE_SAVED_NEWS_NBD_TODAY_VAR_NAME)\n            bs = utils.html_parser(page_url)\n            a_list = bs.find_all(\"a\")\n            for a in a_list:\n                if \"click-statistic\" in a.attrs and a.string \\\n                        and a[\"click-statistic\"].find(\"Article_\") != -1 \\\n                        and a[\"href\"].find(\"http://www.nbd.com.cn/articles/\") != -1:\n                    # if a[\"href\"] not in crawled_urls:\n                    if a[\"href\"] not in self.redis_client.lrange(config.CACHE_SAVED_NEWS_NBD_TODAY_VAR_NAME, 0, -1):\n                        result = self.get_url_info(a[\"href\"])\n                        while not result:\n                            self.terminated_amount += 1\n                            if self.terminated_amount > config.NBD_MAX_REJECTED_AMOUNTS:\n                                # save URLs that could never be crawled\n                                with open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                    file.write(\"{}\\n\".format(a[\"href\"]))\n                                logging.info(\"rejected by remote server longer than {} 
minutes, \"\n                                             \"and the failed url has been written in path {}\"\n                                             .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                     config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                break\n                            logging.info(\"rejected by remote server, request {} again after \"\n                                         \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                            time.sleep(60 * self.terminated_amount)\n                            result = self.get_url_info(a[\"href\"])\n                        if not result:\n                            # crawl failed\n                            logging.info(\"[FAILED] {} {}\".format(a.string, a[\"href\"]))\n                        else:\n                            # a result was returned but the article is empty\n                            date, article = result\n                            if date > latest_date:\n                                while article == \"\" and self.is_article_prob >= .1:\n                                    self.is_article_prob -= .1\n                                    result = self.get_url_info(a[\"href\"])\n                                    while not result:\n                                        self.terminated_amount += 1\n                                        if self.terminated_amount > config.NBD_MAX_REJECTED_AMOUNTS:\n                                            # save URLs that could never be crawled\n                                            with open(config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH, \"a+\") as file:\n                                                file.write(\"{}\\n\".format(a[\"href\"]))\n                                            logging.info(\"rejected by remote server longer than {} minutes, \"\n                                                         \"and the failed url has been written in path {}\"\n                 
                                        .format(config.NBD_MAX_REJECTED_AMOUNTS,\n                                                                 config.RECORD_NBD_FAILED_URL_TXT_FILE_PATH))\n                                            break\n                                        logging.info(\"rejected by remote server, request {} again after \"\n                                                     \"{} seconds...\".format(a[\"href\"], 60 * self.terminated_amount))\n                                        time.sleep(60 * self.terminated_amount)\n                                        result = self.get_url_info(a[\"href\"])\n                                    date, article = result\n                                self.is_article_prob = .5\n                                if article != \"\":\n                                    related_stock_codes_list = self.tokenization.find_relevant_stock_codes_in_article(article,\n                                                                                                                      name_code_dict)\n                                    self.db_obj.insert_data(self.db_name, self.col_name,\n                                                            {\"Date\": date,\n                                                             # \"PageId\": page_url.split(\"/\")[-1],\n                                                             \"Url\": a[\"href\"],\n                                                             \"Title\": a.string,\n                                                             \"Article\": article,\n                                                             \"RelatedStockCodes\": \" \".join(related_stock_codes_list)})\n                                    self.redis_client.lpush(config.CACHE_NEWS_LIST_NAME, json.dumps(\n                                        {\"Date\": date,\n                                         # \"PageId\": page_url.split(\"/\")[-1],\n                                       
  \"Url\": a[\"href\"],\n                                         \"Title\": a.string,\n                                         \"Article\": article,\n                                         \"RelatedStockCodes\": \" \".join(related_stock_codes_list),\n                                         \"OriDB\": config.DATABASE_NAME,\n                                         \"OriCOL\": config.COLLECTION_NAME_NBD\n                                         }\n                                    ))\n                                    # crawled_urls.append(a[\"href\"])\n                                    self.redis_client.lpush(config.CACHE_SAVED_NEWS_NBD_TODAY_VAR_NAME, a[\"href\"])\n                                    logging.info(\"[SUCCESS] {} {} {}\".format(date, a.string, a[\"href\"]))\n            # logging.info(\"sleep {} secs then request again ... \".format(interval))\n            time.sleep(interval)\n\n\n# \"\"\"\n# Example-1:\n# Crawl historical news data\n# \"\"\"\n# if __name__ == \"__main__\":\n#     nbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n#     nbd_spyder.get_historical_news(start_page=684)\n#\n#     Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n#     DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n\n\n# \"\"\"\n# Example-2:\n# Crawl real-time news data\n# \"\"\"\n# if __name__ == '__main__':\n#     from Kite import config\n#\n#     from Killua.denull import DeNull\n#     from Killua.deduplication import Deduplication\n#\n#     from Gon.nbdspyder import NbdSpyder\n#\n#     # If there is no historical data, crawl from scratch; otherwise resume from the most recent timestamp,\n#     # e.g. if the latest news in the database is \"2020-12-09 20:37:10\", crawling resumes from that time\n#     nbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n#     nbd_spyder.get_historical_news()\n#\n#     Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n#     DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n#\n#     nbd_spyder.get_realtime_news()\n"
  },
  {
    "path": "legacy_v1/src/Gon/realtime_starter_cnstock.py",
    "content": "import __init__\n\nimport time\nimport redis\nimport logging\nimport threading\n\nfrom Kite import config\nfrom Kite.database import Database\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication \n\nfrom Gon.cnstockspyder import CnStockSpyder\n\n\nredis_client = redis.StrictRedis(config.REDIS_IP,\n                                 port=config.REDIS_PORT,\n                                 db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\nredis_client.lpush(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, \"realtime_starter_cnstock.py\")\n\nobj = Database()\ndf = obj.get_data(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK, keys=[\"Date\", \"Category\"])\n\ncnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n# Backfill historical data first: e.g. if data has been crawled up to 2020-12-01 but the real-time\n# crawler is started on 2020-12-23, news from 2020-12-02 to 2020-12-23 is fetched automatically first\nfor url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n    # query the timestamp of the most recent record for type_chn\n    latest_date_in_db = max(df[df.Category == type_chn][\"Date\"].to_list())\n    cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn, start_date=latest_date_in_db)\n\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n\n# start multithreaded real-time crawling in parallel\nthread_list = []\nfor url, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n    thread = threading.Thread(target=cnstock_spyder.get_realtime_news, args=(url, type_chn, 60))\n    thread_list.append(thread)\nfor thread in thread_list:\n    thread.start()\nfor thread in thread_list:\n    thread.join()"
  },
  {
    "path": "legacy_v1/src/Gon/realtime_starter_jrj.py",
    "content": "import __init__\n\nimport redis\n\nfrom Kite import config\n\nfrom Gon.jrjspyder import JrjSpyder\n\n\nredis_client = redis.StrictRedis(config.REDIS_IP,\n                                 port=config.REDIS_PORT,\n                                 db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\nredis_client.lpush(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, \"realtime_starter_jrj.py\")\n\njrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\njrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ)  # backfill crawled data up to the latest date\njrj_spyder.get_realtime_news()"
  },
  {
    "path": "legacy_v1/src/Gon/realtime_starter_nbd.py",
    "content": "import __init__\n\nimport redis\n\nfrom Kite import config\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\n\nfrom Gon.nbdspyder import NbdSpyder\n\n\nredis_client = redis.StrictRedis(config.REDIS_IP,\n                                 port=config.REDIS_PORT,\n                                 db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\nredis_client.lpush(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, \"realtime_starter_nbd.py\")\n\n# If there is no historical data, crawl from scratch; otherwise resume from the most recent\n# timestamp, e.g. if the latest news in the database is dated \"2020-12-09 20:37:10\", start there\nnbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\nnbd_spyder.get_historical_news()\n\n# Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n# DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n\nnbd_spyder.get_realtime_news()"
  },
  {
    "path": "legacy_v1/src/Gon/realtime_starter_redis_queue.py",
    "content": "import __init__\n\nimport redis\n\nfrom Kite import config\n\nfrom Killua.buildstocknewsdb import GenStockNewsDB\n\n\nredis_client = redis.StrictRedis(config.REDIS_IP,\n                                 port=config.REDIS_PORT,\n                                 db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\nredis_client.lpush(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, \"realtime_starter_redis_queue.py\")\n\ngen_stock_news_db = GenStockNewsDB()\ngen_stock_news_db.listen_redis_queue()"
  },
  {
    "path": "legacy_v1/src/Gon/realtime_starter_stock_price.py",
    "content": "import __init__\n\nimport redis\n\nfrom Kite import config\n\nfrom Gon.stockinfospyder import StockInfoSpyder\n\n\nredis_client = redis.StrictRedis(config.REDIS_IP,\n                                 port=config.REDIS_PORT,\n                                 db=config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID)\nredis_client.lpush(config.CACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR, \"realtime_starter_stock_price.py\")\n\nstock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\nstock_info_spyder.get_realtime_news()"
  },
  {
    "path": "legacy_v1/src/Gon/sinaspyder.py",
    "content": "\"\"\"\nSina Finance: https://finance.sina.com.cn\nCompany news: https://finance.sina.com.cn/roll/index.d.html?cid=56592&page=1\nIndividual stock commentary: https://finance.sina.com.cn/roll/index.d.html?cid=56588&page=1\nMarket overview: https://finance.sina.com.cn/roll/index.d.html?cid=56589&page=1\nCompany research: http://stock.finance.sina.com.cn/stock/go.php/vReport_List/kind/company/index.phtml?p=1\nMarket research: https://finance.sina.com.cn/roll/index.d.html?cid=56605&page=1\nInstitutional activity: https://finance.sina.com.cn/roll/index.d.html?cid=56615&page=1\nIndustry research: http://stock.finance.sina.com.cn/stock/go.php/vReport_List/kind/industry/index.phtml?p=1\nInvestment strategy: http://stock.finance.sina.com.cn/stock/go.php/vReport_List/kind/strategy/index.phtml?p=1\n\"\"\"\n\nimport __init__\nfrom spyder import Spyder\n"
  },
  {
    "path": "legacy_v1/src/Gon/spyder.py",
    "content": "class Spyder(object):\n\n    def __init__(self):\n        self.is_article_prob = .5\n\n    def extract_data(self, tag_list):\n        # collect the distinct values of each tag from the collection\n        # (self.col is expected to be provided by subclasses)\n        return [self.col.distinct(tag) for tag in tag_list]\n\n    def query_news(self, _key, param):\n        # fuzzy query via regex match\n        return self.col.find({_key: {'$regex': \".*{}.*\".format(param)}})\n\n    def get_url_info(self, url):\n        pass\n\n    def get_historical_news(self, url):\n        pass\n\n    def get_realtime_news(self, url):\n        pass"
  },
  {
    "path": "legacy_v1/src/Gon/stockinfospyder.py",
    "content": "\"\"\"\nhttps://www.akshare.xyz/zh_CN/latest/\n\"\"\"\n\nimport __init__\n\nimport os\nimport time\nimport redis\nimport logging\nimport datetime\nfrom spyder import Spyder\n\nfrom pandas._libs.tslibs.timestamps import Timestamp\n\nfrom Kite.database import Database\nfrom Kite import config\n\nimport akshare as ak\n\nimport tushare as ts\nts.set_token(config.TUSHARE_TOKEN)\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass StockInfoSpyder(Spyder):\n\n    def __init__(self, database_name, collection_name):\n        super(StockInfoSpyder, self).__init__()\n        self.db_obj = Database()\n        self.col_basic_info = self.db_obj.get_collection(database_name, collection_name)\n        self.database_name = database_name\n        self.collection_name = collection_name\n        self.start_program_date = datetime.datetime.now().strftime(\"%Y%m%d\")\n        self.redis_client = redis.StrictRedis(host=\"localhost\",\n                                              port=6379,\n                                              db=config.REDIS_CLIENT_FOR_CACHING_STOCK_INFO_DB_ID)\n        self.redis_client.set(\"today_date\", datetime.datetime.now().strftime(\"%Y-%m-%d\"))\n\n    def get_stock_code_info(self):\n        # TODO: needs to be refreshed every six months\n        stock_info_df = ak.stock_info_a_code_name()  # fetch the code and name of all A-share stocks\n        stock_symbol_code = ak.stock_zh_a_spot().get([\"symbol\", \"code\"])  # fetch the symbol and code of all A-share stocks\n        for _id in range(stock_info_df.shape[0]):\n            _symbol = stock_symbol_code[stock_symbol_code.code == stock_info_df.iloc[_id].code].symbol.values\n            if len(_symbol) != 0:\n                _dict = {\"symbol\": _symbol[0]}\n                _dict.update(stock_info_df.iloc[_id].to_dict())\n                self.col_basic_info.insert_one(_dict)\n\n    def get_historical_news(self, start_date=None, end_date=None, freq=\"day\"):\n        if end_date is None:\n            end_date = datetime.datetime.now().strftime(\"%Y%m%d\")\n        stock_symbol_list = self.col_basic_info.distinct(\"symbol\")\n        if len(stock_symbol_list) == 0:\n            self.get_stock_code_info()\n            stock_symbol_list = self.col_basic_info.distinct(\"symbol\")\n        if freq == \"day\":\n            start_stock_code = 0 if self.redis_client.get(\"start_stock_code\") is None else int(self.redis_client.get(\"start_stock_code\").decode())\n            for symbol in stock_symbol_list:\n                if int(symbol[2:]) > start_stock_code:\n                    if start_date is None:\n                        # if the symbol already has historical data, fetch prices from the most\n                        # recent date in the database up to now; otherwise fetch everything from\n                        # the default start date (2015-01-01) up to now\n                        _latest_date = self.redis_client.get(symbol)\n                        if _latest_date is None:\n                            symbol_start_date = config.STOCK_PRICE_REQUEST_DEFAULT_DATE\n                        else:\n                            tmp_date_dt = datetime.datetime.strptime(_latest_date.decode(), \"%Y-%m-%d\").date()\n                            offset = datetime.timedelta(days=1)\n                            symbol_start_date = (tmp_date_dt + offset).strftime('%Y%m%d')\n                    else:\n                        symbol_start_date = start_date\n\n                    if symbol_start_date < end_date:\n                        stock_zh_a_daily_hfq_df = ak.stock_zh_a_daily(symbol=symbol,\n                                                                      start_date=symbol_start_date,\n                                                                      end_date=end_date,\n                                                                      adjust=\"qfq\")\n                        stock_zh_a_daily_hfq_df.insert(0, 'date', stock_zh_a_daily_hfq_df.index.tolist())\n                        stock_zh_a_daily_hfq_df.index = range(len(stock_zh_a_daily_hfq_df))\n                        _col = self.db_obj.get_collection(self.database_name, symbol)\n                        for _id in range(stock_zh_a_daily_hfq_df.shape[0]):\n                            _tmp_dict = stock_zh_a_daily_hfq_df.iloc[_id].to_dict()\n                            _tmp_dict.pop(\"outstanding_share\")\n                            _tmp_dict.pop(\"turnover\")\n                            _col.insert_one(_tmp_dict)\n                            self.redis_client.set(symbol, str(_tmp_dict[\"date\"]).split(\" \")[0])\n\n                        logging.info(\"{} finished saving from {} to {} ... \".format(symbol, symbol_start_date, end_date))\n                self.redis_client.set(\"start_stock_code\", int(symbol[2:]))\n            self.redis_client.set(\"start_stock_code\", 0)\n        elif freq == \"week\":\n            pass\n        elif freq == \"month\":\n            pass\n        elif freq == \"5mins\":\n            pass\n        elif freq == \"15mins\":\n            pass\n        elif freq == \"30mins\":\n            pass\n        elif freq == \"60mins\":\n            pass\n\n    def get_realtime_news(self, freq=\"day\"):\n        while True:\n            if_updated = input(\"Has the stock price dataset been updated today? (Y/N) \\n\")\n            if if_updated == \"Y\":\n                self.redis_client.set(\"is_today_updated\", \"1\")\n                break\n            elif if_updated == \"N\":\n                self.redis_client.set(\"is_today_updated\", \"\")\n                break\n        self.get_historical_news()  # backfill data for all stocks up to the latest date\n        while True:\n            if freq == \"day\":\n                time_now = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n                if time_now.split(\" \")[0] != self.redis_client.get(\"today_date\").decode():\n                    self.redis_client.set(\"today_date\", time_now.split(\" \")[0])\n                    self.redis_client.set(\"is_today_updated\", \"\")  # past midnight, reset the flag to mark today as not yet updated\n                if not bool(self.redis_client.get(\"is_today_updated\").decode()):\n                    update_time = \"{} {}\".format(time_now.split(\" \")[0], \"15:30:00\")\n                    if time_now >= update_time:\n                        stock_zh_a_spot_df = ak.stock_zh_a_spot()  # download today's daily quotes\n                        for _id, sym in enumerate(stock_zh_a_spot_df[\"symbol\"]):\n                            _col = self.db_obj.get_collection(self.database_name, sym)\n                            _tmp_dict = {}\n                            _tmp_dict.update({\"date\": Timestamp(\"{} 00:00:00\".format(time_now.split(\" \")[0]))})\n                            _tmp_dict.update({\"open\": stock_zh_a_spot_df.iloc[_id].open})\n                            _tmp_dict.update({\"high\": stock_zh_a_spot_df.iloc[_id].high})\n                            _tmp_dict.update({\"low\": stock_zh_a_spot_df.iloc[_id].low})\n                            _tmp_dict.update({\"close\": stock_zh_a_spot_df.iloc[_id].trade})\n                            _tmp_dict.update({\"volume\": stock_zh_a_spot_df.iloc[_id].volume})\n                            _col.insert_one(_tmp_dict)\n                            self.redis_client.set(sym, time_now.split(\" \")[0])\n                            logging.info(\"finished updating {} price data of {} ... \".format(sym, time_now.split(\" \")[0]))\n                        self.redis_client.set(\"is_today_updated\", \"1\")\n        # TODO: after updating stock price data, the stock news label database should be updated as well\n\n\n# if __name__ == \"__main__\":\n#     from Kite import config\n#     from Gon.stockinfospyder import StockInfoSpyder\n#\n#     stock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\n#\n#     # fetch historical data for a given period, e.g.: stock_info_spyder.get_historical_news(start_date=\"20150101\", end_date=\"20201204\")\n#     # without an explicit period, if the database already holds partial data, fetching resumes\n#     # from the most recent stored date; e.g. if sh600000 prices are stored up to 2020-12-03,\n#     # prices from 2020-12-04 up to now are fetched automatically\n#     # stock_info_spyder.get_historical_news()\n#\n#     # start automatic updating of all stock price data (currently only daily data after 15:30)\n#     stock_info_spyder.get_realtime_news()\n"
  },
  {
    "path": "legacy_v1/src/Hisoka/classifier.py",
    "content": "import __init__\nimport logging\nimport warnings\n\nfrom Kite import config\n\nimport joblib\nfrom sklearn import svm\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.metrics import classification_report\nimport sklearn.exceptions\n\nlogging.basicConfig(level=logging.INFO,\n                    format=\"%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s\",\n                    datefmt=\"%a, %d %b %Y %H:%M:%S\")\n\nwarnings.filterwarnings(\"ignore\", category=sklearn.exceptions.UndefinedMetricWarning)\nwarnings.filterwarnings(\"ignore\", category=Warning, module='sklearn')\nwarnings.filterwarnings(\"ignore\", category=UserWarning, module='gensim')\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning, module='gensim')\n\n\nclass Classifier(object):\n\n    def __init__(self):\n        self.scores = config.CLASSIFIER_SCORE_LIST\n\n    def train(self, train_x, train_y, test_x, test_y, model_type=\"svm\", model_save_path=None):\n        assert len(self.scores) != 0\n        clf = None\n        for score in self.scores:\n            # 'cv': 5-fold cross-validation for the grid search\n            # 'refit': defaults to True; after the search, the estimator is refit on the whole\n            #          training set with the best parameters found by cross-validation, and that\n            #          refit model is used for the final performance evaluation\n            if model_type == \"svm\":\n                tuned_parameters = config.SMV_TUNED_PARAMTERS\n                clf = GridSearchCV(svm.SVC(),\n                                   tuned_parameters,\n                                   cv=5,\n                                   scoring=score,\n                                   refit=\"AUC\")\n            elif model_type == \"rdforest\":\n                tuned_parameters = config.RDFOREST_TUNED_PARAMTERS\n                clf = GridSearchCV(RandomForestClassifier(random_state=10),\n                                   tuned_parameters,\n                                   cv=5,\n                                   scoring=score,\n                                   refit=\"AUC\")\n            # k-fold only on the training set, returning the model with the best parameters\n            clf.fit(train_x, train_y)\n            if model_save_path is not None:\n                joblib.dump(clf, model_save_path)\n            # log the best model parameters\n            logging.info(\"the best params: {}\".format(clf.best_params_))\n            train_pred = clf.predict(train_x)\n            test_pred = clf.predict(test_x)  # evaluate the generalization of the best model on the test set\n            logging.info(\"\\n{}\".format(classification_report(test_y, test_pred)))\n            precise_train = 0\n            for k in range(len(train_pred)):\n                if train_pred[k] == train_y[k]:\n                    precise_train += 1\n            precise_test = 0\n            for k in range(len(test_pred)):\n                if test_pred[k] == test_y[k]:\n                    precise_test += 1\n            logging.info('train_accuracy: {}  test_accuracy: {}'\n                         .format(str(round(precise_train / len(train_y), 4)),\n                                 str(round(precise_test / len(test_pred), 4))))\n            self._precise = precise_test / len(test_pred)\n        assert clf is not None\n        return clf\n\n    @staticmethod\n    def model_load(classifier_save_path):\n        return joblib.load(classifier_save_path)"
  },
  {
    "path": "legacy_v1/src/Killua/__init__.py",
    "content": "import os\nimport sys\n\n\ndef add_path(path):\n    if path not in sys.path:\n        sys.path.insert(0, path)\n\n\n# add `./src` dir to system path\nsrc_dir = os.path.abspath(os.path.join(os.getcwd(), \"../\"))\n\nadd_path(src_dir)"
  },
  {
    "path": "legacy_v1/src/Killua/buildstocknewsdb.py",
    "content": "import __init__\n\nimport json\nimport redis\nimport logging\nimport datetime\nimport akshare as ak\n\nfrom Kite import config\nfrom Kite.database import Database\n\nfrom Leorio.tokenization import Tokenization\nfrom Leorio.topicmodelling import TopicModelling\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass GenStockNewsDB(object):\n\n    def __init__(self):\n        self.database = Database()\n        # 获取从1990-12-19至2020-12-31股票交易日数据\n        self.trade_date = ak.tool_trade_date_hist_sina()[\"trade_date\"].tolist()\n        self.label_range = {3: \"3DaysLabel\",\n                            5: \"5DaysLabel\",\n                            10: \"10DaysLabel\",\n                            15: \"15DaysLabel\",\n                            30: \"30DaysLabel\",\n                            60: \"60DaysLabel\"}\n        self.redis_client = redis.StrictRedis(host=config.REDIS_IP,\n                                              port=config.REDIS_PORT,\n                                              db=config.CACHE_NEWS_REDIS_DB_ID)\n        self.redis_client.set(\"today_date\", datetime.datetime.now().strftime(\"%Y-%m-%d\"))\n        self.redis_client.delete(\"stock_news_num_over_{}\".format(config.MINIMUM_STOCK_NEWS_NUM_FOR_ML))\n        self._stock_news_nums_stat()\n\n    def get_all_news_about_specific_stock(self, database_name, collection_name):\n        # 获取collection_name的key值，看是否包含RelatedStockCodes，如果没有说明，没有做将新闻中所涉及的\n        # 股票代码保存在新的一列\n        _keys_list = list(next(self.database.get_collection(database_name, collection_name).find()).keys())\n        if \"RelatedStockCodes\" not in _keys_list:\n            tokenization = Tokenization(import_module=\"jieba\", user_dict=\"./Leorio/financedict.txt\")\n            tokenization.update_news_database_rows(database_name, collection_name)\n        # 
创建stock_code为名称的collection\n        stock_symbol_list = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                                   config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                                   keys=[\"symbol\"])[\"symbol\"].to_list()\n        col_names = self.database.connect_database(config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE).list_collection_names(session=None)\n        for symbol in stock_symbol_list:\n            if symbol not in col_names:\n                # if int(symbol[2:]) > 837:\n                _collection = self.database.get_collection(config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE, symbol)\n                _tmp_num_stat = 0\n                for row in self.database.get_collection(database_name, collection_name).find():  # 迭代器\n                    if symbol[2:] in row[\"RelatedStockCodes\"].split(\" \"):\n                        # 返回新闻发布后n天的标签\n                        _tmp_dict = {}\n                        for label_days, key_name in self.label_range.items():\n                            _tmp_res = self._label_news(\n                                datetime.datetime.strptime(row[\"Date\"].split(\" \")[0], \"%Y-%m-%d\"), symbol, label_days)\n                            _tmp_dict.update({key_name: _tmp_res})\n                        _data = {\"Date\": row[\"Date\"],\n                                 \"Url\": row[\"Url\"],\n                                 \"Title\": row[\"Title\"],\n                                 \"Article\": row[\"Article\"],\n                                 \"OriDB\": database_name,\n                                 \"OriCOL\": collection_name}\n                        _data.update(_tmp_dict)\n                        _collection.insert_one(_data)\n                        _tmp_num_stat += 1\n                logging.info(\"there are {} news mentioned {} in {} collection need to be fetched ... 
\"\n                             .format(_tmp_num_stat, symbol, collection_name))\n            # else:\n            #     logging.info(\"{} has fetched all related news from {}...\".format(symbol, collection_name))\n\n    def listen_redis_queue(self):\n        # 监听redis消息队列，当新的实时数据过来时，根据\"RelatedStockCodes\"字段，将新闻分别保存到对应的股票数据库\n        # e.g.:缓存新的一条数据中，\"RelatedStockCodes\"字段数据为\"603386 603003 600111 603568\"，则将该条新闻分别\n        # 都存进这四支股票对应的数据库中\n        crawled_url_today = set()\n        while True:\n            date_now = datetime.datetime.now().strftime(\"%Y-%m-%d\")\n            if date_now != self.redis_client.get(\"today_date\").decode():\n                crawled_url_today = set()\n                self.redis_client.set(\"today_date\", date_now)\n            if self.redis_client.llen(config.CACHE_NEWS_LIST_NAME) != 0:\n                data = json.loads(self.redis_client.lindex(config.CACHE_NEWS_LIST_NAME, -1))\n                if data[\"Url\"] not in crawled_url_today:  # 排除重复插入冗余文本\n                    crawled_url_today.update({data[\"Url\"]})\n                    if data[\"RelatedStockCodes\"] != \"\":\n                        for stock_code in data[\"RelatedStockCodes\"].split(\" \"):\n                            # 将新闻分别送进相关股票数据库\n                            symbol = \"sh{}\".format(stock_code) if stock_code[0] == \"6\" else \"sz{}\".format(stock_code)\n                            _collection = self.database.get_collection(config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE, symbol)\n                            _tmp_dict = {}\n                            for label_days, key_name in self.label_range.items():\n                                _tmp_res = self._label_news(\n                                    datetime.datetime.strptime(data[\"Date\"].split(\" \")[0], \"%Y-%m-%d\"), symbol, label_days)\n                                _tmp_dict.update({key_name: _tmp_res})\n                            _data = {\"Date\": data[\"Date\"],\n                                     
\"Url\": data[\"Url\"],\n                                     \"Title\": data[\"Title\"],\n                                     \"Article\": data[\"Article\"],\n                                     \"OriDB\": data[\"OriDB\"],\n                                     \"OriCOL\": data[\"OriCOL\"]}\n                            _data.update(_tmp_dict)\n                            _collection.insert_one(_data)\n                            logging.info(\"the real-time fetched news {}, which was saved in [DB:{} - COL:{}] ...\".format(data[\"Title\"],\n                                                                                                                         config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE,\n                                                                                                                         symbol))\n                            #\n                            # if symbol.encode() in self.redis_client.lrange(\"stock_news_num_over_{}\".format(config.MINIMUM_STOCK_NEWS_NUM_FOR_ML), 0, -1):\n                            #     label_name = \"3DaysLabel\"\n                            #     # classifier_save_path = \"{}_classifier.pkl\".format(symbol)\n                            #     ori_dict_path = \"{}_docs_dict.dict\".format(symbol)\n                            #     bowvec_save_path = \"{}_bowvec.mm\".format(symbol)\n                            #\n                            #     topicmodelling = TopicModelling()\n                            #     chn_label = topicmodelling.classify_stock_news(data[\"Article\"],\n                            #                                                    config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE,\n                            #                                                    symbol,\n                            #                                                    label_name=label_name,\n                            #                                                    topic_model_type=\"lsi\",\n 
                           #                                                    classifier_model=\"rdforest\",  # rdforest / svm\n                            #                                                    ori_dict_path=ori_dict_path,\n                            #                                                    bowvec_save_path=bowvec_save_path)\n                            #     logging.info(\n                            #         \"document '{}...' was classified with label '{}' for symbol {} ... \".format(\n                            #             data[\"Article\"][:20], chn_label, symbol))\n\n                    self.redis_client.rpop(config.CACHE_NEWS_LIST_NAME)\n                    logging.info(\"now pop {} from redis queue of [DB:{} - KEY:{}] ... \".format(data[\"Title\"],\n                                                                                               config.CACHE_NEWS_REDIS_DB_ID,\n                                                                                               config.CACHE_NEWS_LIST_NAME))\n\n    def _label_news(self, date, symbol, n_days):\n        \"\"\"\n        :param date: datetime.datetime, the news publication date (year-month-day only, no time of day), e.g. datetime.datetime(2015, 1, 5, 0, 0)\n        :param symbol: str, the stock symbol, e.g. sh600000\n        :param n_days: int, label the news by the price n_days after publication; e.g. if the closing price has risen n_days after the news was published, the news is considered positive\n        \"\"\"\n        # look up the price data on the publication date\n        this_date_data = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                                symbol,\n                                                query={\"date\": date})\n        # handle the case where the publication date is a non-trading day with no price data:\n        # search backwards, e.g. if the news was published on Saturday 2020-12-12, use the\n        # closing price of 2020-12-11 instead\n        tmp_date = date\n        if this_date_data is None:\n            i = 1\n            while this_date_data is None and i <= 10:\n                tmp_date = date - datetime.timedelta(days=i)\n                # only query the database if the date is a trading day; if this_date_data is\n                # still NULL afterwards, the database has no data for that trading day\n                if tmp_date.strftime(\"%Y-%m-%d\") in self.trade_date:\n                    this_date_data = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                                            symbol,\n                                                            query={\"date\": tmp_date})\n                i += 1\n        try:\n            close_price_this_date = this_date_data[\"close\"][0]\n        except Exception:\n            close_price_this_date = None\n        # handle the case where the date n_days after publication is a non-trading day or has no\n        # data: search forwards, e.g. if the news was published on 2020-12-08, five days later is\n        # Sunday 2020-12-13, so use the closing price of Monday 2020-12-14 instead\n        new_date = date + datetime.timedelta(days=n_days)\n        n_days_later_data = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                                   symbol,\n                                                   query={\"date\": new_date})\n        if n_days_later_data is None:\n            i = 1\n            while n_days_later_data is None and i <= 10:\n                new_date = date + datetime.timedelta(days=n_days+i)\n                if new_date.strftime(\"%Y-%m-%d\") in self.trade_date:\n                    n_days_later_data = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                                               symbol,\n                                                               query={\"date\": new_date})\n                i += 1\n        try:\n            close_price_n_days_later = n_days_later_data[\"close\"][0]\n        except Exception:\n            close_price_n_days_later = None\n        # Labeling rules:\n        # (1) n_days <= 10: a rise (fall) of more than 3% marks the news as positive (negative); within 3% it is neutral\n        # (2) 10 < n_days <= 15: the threshold is 5%\n        # (3) 15 < n_days <= 30: the threshold is 10%\n        # (4) 30 < n_days <= 60: the threshold is 15%\n        # Note: neutral news is defined as news that is quickly absorbed by the market, with no lasting impact\n        param = 0.01\n        if n_days <= 10:\n            param = 0.03\n        elif 10 < n_days <= 15:\n            param = 0.05\n        elif 15 < n_days <= 30:\n            param = 0.10\n        elif 30 < n_days <= 60:\n            param = 0.15\n        if close_price_this_date is not None and close_price_n_days_later is not None:\n            if (close_price_n_days_later - close_price_this_date) / close_price_this_date > param:\n                return \"利好\"\n            elif (close_price_n_days_later - close_price_this_date) / close_price_this_date < -param:\n                return \"利空\"\n            else:\n                return \"中性\"\n        else:\n            return \"\"\n\n    def _stock_news_nums_stat(self):\n        cols_list = self.database.connect_database(config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE).list_collection_names(session=None)\n        for sym in cols_list:\n            if self.database.get_collection(config.ALL_NEWS_OF_SPECIFIC_STOCK_DATABASE, sym).estimated_document_count() > config.MINIMUM_STOCK_NEWS_NUM_FOR_ML:\n                self.redis_client.lpush(\"stock_news_num_over_{}\".format(config.MINIMUM_STOCK_NEWS_NUM_FOR_ML), sym)\n\n\nif __name__ == \"__main__\":\n    from Kite import config\n    from Killua.buildstocknewsdb import GenStockNewsDB\n\n    gen_stock_news_db = GenStockNewsDB()\n    # gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\n    # gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\n    # gen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n\n    # gen_stock_news_db.listen_redis_queue()\n"
  },
  {
    "path": "legacy_v1/src/Killua/deduplication.py",
    "content": "import __init__\n\nfrom Kite.database import Database\nfrom Kite import utils\n\nimport logging\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass Deduplication(object):\n\n    def __init__(self, database_name, collection_name):\n        self.database = Database()\n        self.database_name = database_name\n        self.collection_name = collection_name\n        self.delete_num = 0\n\n    def run(self):\n        date_list = self.database.get_data(self.database_name,\n                                           self.collection_name,\n                                           keys=[\"Date\"])[\"Date\"].tolist()\n        collection = self.database.get_collection(self.database_name, self.collection_name)\n        date_list.sort()  # ascending order\n        start_date, end_date = min(date_list).split(\" \")[0], max(date_list).split(\" \")[0]\n        for _date in utils.get_date_list_from_range(start_date, end_date):\n            # fetch the data for this date and deduplicate by URL\n            try:\n                data_df = self.database.get_data(self.database_name,\n                                                 self.collection_name,\n                                                 query={\"Date\": {\"$regex\": _date}})\n            except Exception:\n                continue\n            if data_df is None:\n                continue\n            data_df_drop_duplicate = data_df.drop_duplicates([\"Url\"])\n            for _id in list(set(data_df[\"_id\"]) - set(data_df_drop_duplicate[\"_id\"])):\n                collection.delete_one({'_id': _id})\n                self.delete_num += 1\n        logging.info(\"DB:{} - COL:{} originally had {} rows; {} duplicates have now been deleted ... \"\n                     .format(self.database_name, self.collection_name, str(len(date_list)), self.delete_num))\n\n\nif __name__ == \"__main__\":\n    from Killua.deduplication import Deduplication\n    from Kite import config\n\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n    Deduplication(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n"
  },
  {
    "path": "legacy_v1/src/Killua/denull.py",
    "content": "\"\"\"\nDelete rows that contain null values from the database.\n\"\"\"\n\nimport __init__\nimport logging\n\nfrom Kite.database import Database\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass DeNull(object):\n\n    def __init__(self, database_name, collection_name):\n        self.database = Database()\n        self.database_name = database_name\n        self.collection_name = collection_name\n        self.delete_num = 0\n\n    def run(self):\n        collection = self.database.get_collection(self.database_name, self.collection_name)\n        for row in collection.find():\n            for _key in list(row.keys()):\n                if _key != \"RelatedStockCodes\" and row[_key] == \"\":\n                    collection.delete_one({'_id': row[\"_id\"]})\n                    self.delete_num += 1\n                    break\n        logging.info(\"deleted {} news documents containing NULL values from the {} collection ... \"\n                     .format(self.delete_num, self.collection_name))\n\n\nif __name__ == \"__main__\":\n    from Killua.denull import DeNull\n    from Kite import config\n\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\n    DeNull(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()"
  },
  {
    "path": "legacy_v1/src/Kite/__init__.py",
    "content": "import os\nimport sys\n\n\ndef add_path(path):\n    if path not in sys.path:\n        sys.path.insert(0, path)\n\n\nthis_dir = os.path.dirname(__file__)\n\n# add `./src/Kite` dir to system path\nadd_path(this_dir)"
  },
  {
    "path": "legacy_v1/src/Kite/config.py",
    "content": "MONGODB_IP = \"localhost\"\nMONGODB_PORT = 27017\nREDIS_IP = \"localhost\"\nREDIS_PORT = 6379\nTHREAD_NUMS_FOR_SPYDER = 4\n\nDATABASE_NAME = \"finnewshunter\"\n\nCOLLECTION_NAME_CNSTOCK = \"cnstock\"\nCHROME_DRIVER = \"./chromedriver.exe\"\n# WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK = {\"https://company.cnstock.com/company/scp_gsxw\": \"公司聚焦\",\n#                                        \"https://ggjd.cnstock.com/gglist/search/qmtbbdj\": \"公告解读\",\n#                                        \"https://ggjd.cnstock.com/gglist/search/ggkx\": \"公告快讯\",\n#                                        \"https://ggjd.cnstock.com/company/scp_ggjd/tjd_sdlh\": \"利好公告\"}\nWEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK = {\"https://company.cnstock.com/company/scp_gsxw\": \"公司聚焦\",\n                                       \"http://ggjd.cnstock.com/company/scp_ggjd/tjd_bbdj\": \"公告解读\",\n                                       \"http://ggjd.cnstock.com/company/scp_ggjd/tjd_ggkx\": \"公告快讯\",\n                                       \"https://ggjd.cnstock.com/company/scp_ggjd/tjd_sdlh\": \"利好公告\"}\nRECORD_CNSTOCK_FAILED_URL_TXT_FILE_PATH = \"D:/workfiles/gpu-cloud-backup/Listed-company-news-crawl-and-text-analysis/src/Gon/cnstock_failed_urls.txt\"\nCNSTOCK_MAX_REJECTED_AMOUNTS = 10\n\nCOLLECTION_NAME_JRJ = \"jrj\"\nJRJ_DATE_RANGE = 100\nWEBSITES_LIST_TO_BE_CRAWLED_JRJ = \"http://stock.jrj.com.cn/xwk\"\nRECORD_JRJ_FAILED_URL_TXT_FILE_PATH = \"D:/workfiles/gpu-cloud-backup/Listed-company-news-crawl-and-text-analysis/src/Gon/jrj_failed_urls.txt\"\nJRJ_MAX_REJECTED_AMOUNTS = 10\nJRJ_REQUEST_DEFAULT_DATE = \"2015-01-01\"\nCACHE_SAVED_NEWS_JRJ_TODAY_VAR_NAME = \"cache_news_queue_jrj\"\n\nCOLLECTION_NAME_NBD = \"nbd\"\nWEBSITES_LIST_TO_BE_CRAWLED_NBD = \"http://stocks.nbd.com.cn/columns/275/page\"\nRECORD_NBD_FAILED_URL_TXT_FILE_PATH = \"D:/workfiles/gpu-cloud-backup/Listed-company-news-crawl-and-text-analysis/src/Gon/nbd_failed_urls.txt\"\nNBD_TOTAL_PAGES_NUM = 684\nNBD_MAX_REJECTED_AMOUNTS = 
10\nCACHE_SAVED_NEWS_NBD_TODAY_VAR_NAME = \"cache_news_queue_nbd\"\n\nTUSHARE_TOKEN = \"97fbc4c73727b5d171ca6670cbc4af8b0a3de5fbab74b52f30b598cc\"\nSTOCK_DATABASE_NAME = \"stock\"\nCOLLECTION_NAME_STOCK_BASIC_INFO = \"basic_info\"\nSTOCK_PRICE_REQUEST_DEFAULT_DATE = \"20150101\"\nREDIS_CLIENT_FOR_CACHING_STOCK_INFO_DB_ID = 1\n\nALL_NEWS_OF_SPECIFIC_STOCK_DATABASE = \"stocknews\"\n\nTOPIC_NUMBER = 200\nSVM_TUNED_PARAMTERS = {\"kernel\": [\"rbf\"], \"gamma\": [10, 20, 50, 100, 150, 200], \"C\": [10, 15, 20, 30, 50, 100]}\nRDFOREST_TUNED_PARAMTERS = {\"n_estimators\": [1, 2, 3, 4, 5, 10],\n                            \"criterion\": [\"gini\", \"entropy\"],\n                            \"max_features\": [\"auto\", \"sqrt\"]}\nCLASSIFIER_SCORE_LIST = [\"f1_weighted\"]\nUSER_DEFINED_DICT_PATH = \"D:/workfiles/gpu-cloud-backup/Listed-company-news-crawl-and-text-analysis/src/Leorio/financedict.txt\"\nCHN_STOP_WORDS_PATH = \"D:/workfiles/gpu-cloud-backup/Listed-company-news-crawl-and-text-analysis/src/Leorio/chnstopwords.txt\"\n\nCACHE_NEWS_REDIS_DB_ID = 0\nCACHE_NEWS_LIST_NAME = \"cache_news_waiting_for_classification\"\n\nCACHE_RECORED_OPENED_PYTHON_PROGRAM_DB_ID = 0\nCACHE_RECORED_OPENED_PYTHON_PROGRAM_VAR = \"opened_python_scripts\"\n\nMINIMUM_STOCK_NEWS_NUM_FOR_ML = 1000"
  },
  {
    "path": "legacy_v1/src/Kite/database.py",
    "content": "from pymongo import MongoClient\nimport pandas as pd\n\n\nclass Database(object):\n\n\tdef __init__(self, ip=\"localhost\", port=27017):\n\t\tself.ip = ip\n\t\tself.port = port\n\t\tself.conn = MongoClient(self.ip, self.port)\n\n\tdef connect_database(self, database_name):\n\t\treturn self.conn[database_name]\n\n\tdef get_collection(self, database_name, collection_name):\n\t\treturn self.connect_database(database_name).get_collection(collection_name)\n\n\tdef insert_data(self, database_name, collection_name, data_dict):\n\t\tdatabase = self.conn[database_name]\n\t\tcollection = database.get_collection(collection_name)\n\t\tcollection.insert_one(data_dict)\n\n\tdef update_row(self, database_name, collection_name, query, new_values):\n\t\tassert isinstance(query, dict)\n\t\tassert isinstance(new_values, dict)\n\t\tdatabase = self.conn[database_name]\n\t\tcollection = database.get_collection(collection_name)\n\t\tcollection.update_one(query, {\"$set\": new_values})\n\n\tdef get_data(self, database_name, collection_name, max_data_request=None, query=None, keys=None):\n\t\t# e.g.:\n\t\t# ExampleObj = Database()\n\t\t# ExampleObj.get_data(\"finnewshunter\", \"nbd\", query={\"Date\": {\"$regex\": \"2014\"}}, keys=[\"Url\", \"Title\"])\n\t\tdatabase = self.conn[database_name]\n\t\tcollection = database.get_collection(collection_name)\n\t\tif query:\n\t\t\tassert isinstance(query, dict)\n\t\telse:\n\t\t\tquery = {}\n\t\tif keys:\n\t\t\tassert isinstance(keys, list)\n\t\telse:\n\t\t\tkeys = []\n\t\tif max_data_request:\n\t\t\tassert isinstance(max_data_request, int)\n\t\telse:\n\t\t\tmax_data_request = float(\"inf\")\n\t\ttry:\n\t\t\tif len(keys) != 0:\n\t\t\t\t_dict = {_key: [] for _key in keys}\n\t\t\t\tdata = collection.find(query) if len(query) != 0 else collection.find()\n\t\t\t\tfor _id, row in enumerate(data):\n\t\t\t\t\tif _id + 1 <= max_data_request:\n\t\t\t\t\t\tfor _key in 
keys:\n\t\t\t\t\t\t\t_dict[_key].append(row[_key])\n\t\t\t\t\telse:\n\t\t\t\t\t\tbreak\n\t\t\telse:\n\t\t\t\t# data = collection.find()\n\t\t\t\tdata = collection.find(query) if len(query) != 0 else collection.find()\n\t\t\t\tdata_keys = list(\n\t\t\t\t\tnext(data).keys())  # ['_id', 'Date', 'PageId', 'Url', 'Title', 'Article', 'RelevantStockCodes']\n\t\t\t\t_dict = {_key: [] for _key in data_keys}\n\t\t\t\tfor _id, row in enumerate(collection.find(query) if len(query) != 0 else collection.find()):\n\t\t\t\t\tif _id + 1 <= max_data_request:\n\t\t\t\t\t\tfor _key in data_keys:\n\t\t\t\t\t\t\t_dict[_key].append(row[_key])\n\t\t\t\t\telse:\n\t\t\t\t\t\tbreak\n\t\t\treturn pd.DataFrame(_dict)\n\t\texcept Exception:\n\t\t\treturn None\n\n\tdef drop_db(self, database):\n\t\tself.conn.drop_database(database)\n\n\n'''\nUsage example:\n\nfrom database import Database\n\nExampleObj = Database()\nExampleObj.insert_data(\"cnstock\", \"cnstock_col\", {\"name\": \"sena\", \"id\": 136})\nExampleObj.drop_db(\"cnstock\")\n'''\n\n"
  },
  {
    "path": "legacy_v1/src/Kite/log.py",
    "content": ""
  },
  {
    "path": "legacy_v1/src/Kite/utils.py",
    "content": "import re\nimport datetime\nimport requests\nimport numpy as np\nfrom bs4 import BeautifulSoup\nfrom scipy.sparse import csr_matrix\n\n\ndef generate_pages_list(total_pages, page_range, init_page_id):\n    page_list = list()\n    k = init_page_id\n\n    while k + page_range - 1 <= total_pages:\n        page_list.append((k, k + page_range - 1))\n        k += page_range\n\n    # append the remaining pages that do not fill a whole range\n    if k <= total_pages:\n        page_list.append((k, total_pages))\n\n    return page_list\n\n\ndef count_chn(string):\n    '''Count Chinese characters and calculate the ratio of Chinese characters in the string.\n\n    # Arguments:\n        string: Each part of crawled website analyzed by BeautifulSoup.\n    '''\n    pattern = re.compile(u'[\\u1100-\\uFFFD]+?')\n    result = pattern.findall(string)\n    chn_num = len(result)\n    possible = chn_num / len(str(string))\n\n    return chn_num, possible\n\n\ndef get_date_list_from_range(begin_date, end_date):\n    '''Get date list from 'begin_date' to 'end_date' on the calendar.\n    '''\n    date_list = list()\n    begin_date = datetime.datetime.strptime(begin_date, \"%Y-%m-%d\")\n    end_date = datetime.datetime.strptime(end_date, \"%Y-%m-%d\")\n    while begin_date <= end_date:\n        date_str = begin_date.strftime(\"%Y-%m-%d\")\n        date_list.append(date_str)\n        begin_date += datetime.timedelta(days=1)\n\n    return date_list\n\n\ndef gen_dates_list(date_list, date_range):\n    date_list_latest = list()\n    k = 0\n    while k < len(date_list):\n        if k + date_range >= len(date_list):\n            break\n        else:\n            date_list_latest.append(date_list[k: k + date_range])\n            k += date_range\n    date_list_latest.append(date_list[k:])\n\n    return date_list_latest\n\n\ndef get_date_before(n_days):\n    \"\"\"\n    Get the date n_days before today; e.g. if today is 2020-12-25 and n_days=1, return \"2020-12-24\".\n    :param n_days: number of days to go back, e.g. n_days=1 means one day before today\n    \"\"\"\n    today = datetime.datetime.now()\n    # compute the offset\n    offset = 
datetime.timedelta(days=-n_days)\n    # get the date we want\n    re_date = (today + offset).strftime('%Y-%m-%d')\n    return re_date\n\n\ndef search_max_pages_num(first_url, date):\n    \"\"\"\n    Mainly for the JRJ website (stock.jrj.com.cn).\n    When searching news by date, e.g. the news of 2020-01-01, the link\n    http://stock.jrj.com.cn/xwk/202001/20200101_1.shtml\n    is the first page returned by the search; from that page we can find that the\n    maximum number of pages returned is 4, i.e. there are 4 pages of news lists for 2020-01-01.\n    :param first_url: first URL returned for that date, e.g. 'http://stock.jrj.com.cn/xwk/202001/20200101_1.shtml'\n    :param date: date, e.g. '2020-01-01'\n    \"\"\"\n    respond = requests.get(first_url)\n    respond.encoding = BeautifulSoup(respond.content, \"lxml\").original_encoding\n    bs = BeautifulSoup(respond.text, \"lxml\")\n    a_list = bs.find_all(\"a\")\n    max_pages_num = 1\n    for a in a_list:\n        if \"href\" in a.attrs and \"target\" in a.attrs:\n            if a[\"href\"].find(date.replace(\"-\", \"\") + \"_\") != -1 \\\n                    and a.text.isdigit():\n                max_pages_num += 1\n\n    return max_pages_num\n\n\ndef html_parser(url):\n    resp = requests.get(url)\n    resp.encoding = BeautifulSoup(resp.content, \"lxml\").original_encoding\n    bs = BeautifulSoup(resp.text, \"lxml\")\n\n    return bs\n\n\ndef get_chn_stop_words(path):\n    '''Load the stop words txt file.\n    '''\n    with open(path, 'r') as f:\n        stopwords = [line.strip() for line in f.readlines()]\n\n    return stopwords\n\n\ndef convert_to_csr_matrix(model_vector):\n    \"\"\"\n    Convert LDA(LSI) model vector to CSR sparse matrix, that could be accepted by Scipy and Numpy.\n\n    # Arguments:\n        model_vector: Transformation model vector, such as LDA model vector, tfidf model vector or lsi model vector.\n    \"\"\"\n    data = []\n    rows = []\n    cols = []\n    _line_count = 0\n    for line in model_vector:  # line=[(int, float), (int, float), ...]\n        for elem in line:  # elem=(int, float)\n            rows.append(_line_count)\n            cols.append(elem[0])\n            data.append(elem[1])\n        _line_count += 1\n    
sparse_matrix = csr_matrix((data, (rows, cols)))\n    matrix = sparse_matrix.toarray()  # <class 'numpy.ndarray'>\n\n    return matrix\n\n\ndef generate_training_set(x, y, split=0.8):\n    rand = np.random.random(size=x.shape[0])\n    train_x = []\n    train_y = []\n    test_x = []\n    test_y = []\n    for i in range(x.shape[0]):\n        if rand[i] < split:\n            train_x.append(x[i, :])\n            train_y.append(y[i])\n        else:\n            test_x.append(x[i, :])\n            test_y.append(y[i])\n    return train_x, train_y, test_x, test_y\n\n\ndef is_contain_chn(word):\n    \"\"\"\n    Check whether the given string contains Chinese characters.\n    :param word: string to check\n    :return: True if it contains Chinese characters, False otherwise\n    \"\"\"\n    zh_pattern = re.compile(u'[\\u4e00-\\u9fa5]+')\n    if zh_pattern.search(word):\n        return True\n    else:\n        return False\n\n\ndef batch_lpop(client, key, n):\n    # pop the first n elements from a Redis list and return them\n    p = client.pipeline()\n    p.lrange(key, 0, n-1)\n    p.ltrim(key, n, -1)\n    return p.execute()[0]\n"
  },
  {
    "path": "legacy_v1/src/Kite/webserver.py",
    "content": ""
  },
  {
    "path": "legacy_v1/src/Leorio/__init__.py",
    "content": "import os\nimport sys\n\n\ndef add_path(path):\n    if path not in sys.path:\n        sys.path.insert(0, path)\n\n\n# add `./src` dir to system path\nsrc_dir = os.path.abspath(os.path.join(os.getcwd(), \"../\"))\nadd_path(src_dir)"
  },
  {
    "path": "legacy_v1/src/Leorio/chnstopwords.txt",
    "content": "\nÿ\n\nǰ\nת\nλ\n\n֤ȯ\n\n\n\n\n\n\nο\n\n\nΥ߱ؾ\n\n\n\n£\n\n\n\n:\n\n \n&\n*\nһһ\n~~~~\n\n. \n\n.һ\n./\n-- \n\n\n\nۣ\n\nۢݣݣ\nۢ٣ģ\n\nP\n\n//\n\n\nۢڣ\nۢڣ\n\n}\nҲ \n\n\nۢ٢ޣ\nۢڣ£ \nۢ٣\nۢܣ\nۢ٢ۣ\nۣۢ\nۣ\n \n \nۢڣ\n \n \nۢ٢\n \n\nۢݣ\nۢڣ \nۢܣ\nۢڢۣ\nۣۢ\nۢܣ\nۢ٢ݣ\nۢ٢ߣ\nۢ٣\nʣ \nۢ٢\nۢ٢ܣ\nۢ٣\nۢڣ\nۢڢ\nۢڢ٣\nۢ٣ã\nۣۢ\nۣۢ\nۢڢݣ\nۢڢڣ\nһ.\nۢ٣\n.\nۣ\nۢ٣£\n/\nۢ٣\nۣۢ\nۢ٢٣\nۢܣ\nۢܣ\nۣۢ\nۢݣ\nۢ٣\nۢڢ\nۢڢߣ\nۢ٣\nۢڣ\n\nݣ\n://\n\nۢڢ\nۢݣ\n\n\n...\n...................\n\nڣأƣɣԣ\nۣۢƣ\n\nۢ٣\nݡġ䣽 \nȦա\n\n\nڣ\n\nۢۢ٣\nң̣\nۢ٣ţ\n\nۣݣ\n\n. \nۢڣ\nۢ\nۢڢߣ\nۢڢڣ\nۣۢ\nۢ٣\nۢ٣£\nۢ٣\nۢ٣\nۢ٣\nۢ٢ڣ\nۢڣ\n\nۢ\n\nۢ٣\nۢڣ\nۢڢޣ\nۣۢ\nۢڢ\n\n\n\nԪ\nۢڢ\n\n  \nۢ٣\n::\nۢڣ\nۣۢ\nۢܣ\nۢݣ\nۢޣ\nۢߣ\nۢ\nۢ \n\n\n?\n\n\n\n\n\n\n\n\n\n\n\n\n,\n\n'\n? \n\n\n\n? \n\n<\n>\n\n\n\n\n[\n]\n(\n)\n-\n+\n\n\n\n/\n\n\n\n\n\n\n\n\n\n\n\n\n\"\n;\n#\n@\n\n\n\nգ\n \n\n\n\nsub\nexp \nsup\nsub\nLex \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n=\n\n\n\n\n\n\n\n\nۢݣ\nۢݣ\nۢڣ\n \nۢڣǣ\nۢ٣\ṇ\n \nۣ\n......\n\n\n\nʵϰ\n\n\n\n\nѽ\nӴ\n\n\n\n\n\n\n\n\n\n\n\n\n\nȷ\n\n\n\n˴\n\n\n\n˵\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nȻ\n\n\nΩ\n\nֻ\n\n\n\n\n\n\n\n֮\n\n\n\n˼\n\n\nӶ\n\n\n\n\n\n\n\n\n\nĻ\n\nȵ\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n˵\n֮\nǵ\nͽ\n\n\nµ\n\n\n\n\n\nλ\n\n\n\n\n\n\nʴ\nȻ\n\n\n\nȻ\n\n\n\n\n\n\n\nδ\nο\nʱ\n\n\n\n\n\n\n\n\n仰˵\n֮\n\n\n\n\n\n\n\n\n\n\n\n\nʹ\n\nʱ\n\n\nȻ\n\n̶\n֮\n\n\nʹ\n\n\n\n֮\n\n\n\n\n\n\n\n\n\n\n\n\n˵\n\n˵\n˵\nʼ\n\n\n\n\nɼ\n\n\n\n\n\n\n\n\n\n\nͬ\n\n\n\n\n\nһ\n\n\n\n˵\n˵\nð\nô\nÿ\nÿ\n\nĪ\nĳ\nĳ\nĳЩ\n\n\nı\nĶ\nĸ\n\n\n\n\nЩ\n\n\nǱ\nǶ\nǸ\nǻ\n\nô\nôЩ\nô\nʱ\nЩ\n\n\n\n\n\n\n\n\n\n\n\nԸ\nŶ\nŻ\nž\n\n\nƾ\nƾ\n\n\n\n\n\nһ\n\n\n\n\n\nǡǡ෴\nǰ\nǰ\n\nȻ\nȻ\nȻ\n\n˼\n\nκ\nƾ\n\n\n\n\n\n\n\n\n\n\nɶ\n\n\n\nʹ\n\nô\n\nʡ\nʱ\nʲô\nʲô\nʹ\n\nǵ\n\n˭\n˭֪\n˳\n˳\nƵ\n\nȻ\n˵\n\n\n\n\n\n\n\n\n\n\n\n\n\nȻ\nȻ\n\nʹ\n\n\nͨ\nͬ\nͬʱ\n\nһ\n\n\nΪ\nΪ\nΪ\nΪʲô\nΪ\nι\n\n\n\n\nغ\nں\n\n\n\n\n\nԶ\n\n\n\n\nѽ\n\n\n\nҪ\nҪ\nҪȻ\nҪ\nҪô\nҪ\nҲ\nҲ\nҲ\nһ\nһ\nһ\nһ\nһ\nһ\nһ\nһ\n\n\n\n\nԱ\nԼ\n\n\n\n\nֻ\n\n\n\nΪ\nӴ\n\n\nɴ˿ɼ\n\n\nе\nй\nЩ\n\n\n\nǺ\n\nͬʱ\n\n\nԽ\n\n\n˵\n\n\n\n\n\n\n\nô\nô\nô\n\nզ\n\n\n\n\n\n\n\n\n˵\n\nô\nô\nôЩ\nô\nʱ\nЩ\n\n\
n֨\n֮\n֮\n֮\n֮һ\nֻ\nֻ\nֻҪ\nֻ\n\n\nλ\n\n\n\nԴ\nԸ\nԸ\nԼ\nԼ\n\n\nܵ\nܵ˵\nܵ˵\n֮ܶ\n֮\n\n\nȻ\nʹ\n\nΪ\n\n\n\n\n\n\n\n\n\n\n\n\nѽ\nӴ\n\n\nҰ\nŰ\n\n\n\n\nʱ\n˵\n\n\n\nȻ\n˳\nװ\n\n\n\n\n\n\n\n\n\n\n\n\n\n˵\n\nϾ\n\nض\nؽ\n\n\n\n\n\nû\nû\n\n\nȻ\n\n\n\n\n\nò\n\n\n\n\n\n\n\n\n\n\n\nɿ\nɿ\n\n\n\n\n\nܲ\n\n\nȻĻ\n\n\nʤ\nʱ\n\nͬ\n\nҪ\n\n\n\n\n\n\nֺ\nɵ\n\nֶ\nô\n\n֪\nֹ\nֹһ\n\n\n\nԵ\n\nһ\n\n\nԵ\n˵\n˵ú\nȥ\n˵\n\n\n\nҹ\n\nñ\nû\n\n\n\n\n\n\n˻\nʤ\n\n϶\n\nȻ\n\n\n伫\n\n\n\n\n\n\nȥ\n\n˶\n\n\nȥ\nȴ\n\n\nϢ\n\n˵\n\n\n\n\n˺\n\nε\nҴ\nӲ\nӴ\nӴԺ\nӹŵ\nӹ\nӽԺ\nӿ\n\n\n\nͷ\nδ\n޵\nС\n\n\n\n絽\n\n\n\n\n\nﵩ\n\n촰˵\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nԼ\n\n\n\n\n\n\n\n\n\nԸ\nָ֮\n\n\n\nڶ\nȻ\nͥ\nͷ\n\n\n\n\n˵\n\n\n\n˶\nĿǰΪֹ\nͷ\nͷ\n\n\nȷ\nȵ\n\n\n\n\n\nȻ\n\n\n\nȻ\nʱ\n\n\n\n\n\n\nǰ\n\n\n\n\n˵\nû˵\n\n\n\n\n֮Ȼ\n֮\n\n\n\n\nǳ\nǵ\n\nڷ\nͷ\n\nȻ\n\n\n\n\n¸\nõ\n\nϿ\n粻\n\n\n\n\nղ\nպ\n\nߵ\n\n\nҹ\n\nʽ\n\n\nһ\nΪ\nȻ\n\n\nƵ\n\n\nʶ\n\n\n\nֲ\n߳\n\n\n\n\n\n\n\nޱ\n\n\nα\nγ\nη\nο\nֶΪ\n\nֹ\n\nܶ\n\nȻ\n\n\n\nȻ\n\n\n\n˵\n\nȻ\n\nȻ\n\nͬ\n\n\n\n\n\n\n\n\n\n\n\nΪ\nҴ\n\n\n˵\n\n\n\n\n...\n֮\n\n\n\n֮\n֮\nֱ\n\n\n\nҪ\n\nϱ\nΪ\n\n\nԿ\nȻ\n\n\n\n\nʱ\n\n\n\n\nȥ\n\n\n\n\n\n\n\n\nȻ\n\nĽ\nľ\n\n\n\n\nȻ\n\nʹ\n͵\n\nȻ\n\nٷ\nݳ\nݴ\nʵ\n˵\n֪\nϤ\n˵\n\n\n\n\n\n\n\n\n\n\n\n\nȥ\n\nɺ\n\n\n\nҪ\n\nü\n\n\n\n\n\nϴ\nʵʵ\n\n۴\n\n\n\nӦ\n\n\n\n\n\nʱ\n\n\n\n\n\n\nٵ\n\n\nһ\n·\n\nŴ\nŴ\n\n\nʶ\nȻ\n\nԼ\n΢\nΪ\n˵\n\n\n\nû\nû\nÿ\nÿÿ\nÿʱÿ\nȻ\nȻ\nĪ\nĪ\nĪ\nĪ\nĬĬ\nĬȻ\n\nĩ\n\nѵ\nѵ\nѹ\n˵\n\n긴һ\n\nż\nż\n\n\n\nƩ\nƫƫ\nƹ\nƽ\n\n\nͨ\n\nʵ\n\n\n\n\n\nͷ\n\n\n\nֹ\n\nǡ\nǡ\nǡǡ\nǡ\nǡ\nǡ\nǧ\n\nǧ\nǧǧ\n\nв\nĪ\n\n\n\n׿\n\n\n\n\n\n\n̼\n֮\n\n\nȡ\nȥ\nȨʱ\nȫ\nȫ\nȫ\nȫȻ\nȫ\nȻ\n\n\nԾ\nȻ\nոһ\nռ\nս\n\n\n糣\n˵ȵ\n\n\n\nǰ\n\n\n\n\n\nͷ\nɪɪ\nɳɳ\n\n\nȥ\nһ.\nһһ\nһ\nһ\nһЩ\nһ\nһͨ\nһ\nһ\nһʱ\nһ\nһƬ\nһ\nһֱ\nһ\nһ\nһת\nһ\nһ\n\n\n\n\n\nȥ\n\n\nһ\n\n\n\n\n\nȻ\n\n\n\n\n˵\nר\nҲ˵\n˵\nϸ\n\n\nС\nм\nḻ\nΪ\nΪʲ\nΪֹ\nΪ\n\nҪ\n\n\n֮ǰ\n֮\n֮\nҲ˵\nҲ\n˽\nȡ\n\nƶ\nЩ\n\n\n\n\nʲ\n\n\n\n\n\n\n\n\n\n\n\n\nΪ\nǰ\nԺ\n\n\nԹ\n\n\n\nͼ\nΰ\nƺ\n\n\n\n\n\nʹ\nʹ\n\n\n\nٽ\n\n\nȻ\n\n\nԪ\nȲ\nȺ\n\n\nȫ\nȫ\nȫ\nͬ\n\n\n֮\n\n\n\n\nٴ\n˵\n\n׼\n\n\n\n\nֱ\n\n\n\n\nǰ\nǰ\nǰ\n\nǿ\nʮ\n\nȴ\nȴ\nԭ\nּ\nʱ\n˫\
nӦ\nӳ\nȡ\nܵ\n\nϤ\nֻ\nֻ\nֻ\nֻ\n\nٿ\n\n\n\n\nͬһ\nͬ\n\n\n\nʹ\nΧ\nǺ\n\nΨ\nॵ\n\n\nٺ\n\n\n\n\n\n\n\n\nô\n\n\n\n\n\nʧȥ\n\n\n\nõ\n\nͬ\n\nʼ\n\n\n֪\nǵ\n\n\nȫ\nȫ\n\nʵ\nʵ\n\n\n\nӦ\nԴ\nԷ\nԱ\nС\n\n\n\n\n\nҪ\n\n\n޴\n\n\n\nѾ\n\nͰ\n\n\n\n\n㷺\nӦ\nӦ\nӦ\n\n\nչ\n\nǿ\nǿ\n\nǰ\n\nʱ\nγ\n\nʱ\n\n\n\n\nó\nõ\n\nȻ\nҪ\n\n\n\nܽ\n\n\nΩ\n˼\nԸ\nΪ\n\nҵ\n\nԻ\nս\n\n\n\nν\n\n\n\n/\n\n\n\n\n޷\n\n\nȷ\nǲ\n\nǷ\nȻ\n\nͨ\nձ\n\n\n\n\n\n\n\n\n\n\n\n\nм\n\nЧ\nʱ\nе\nе\n\n\nĩ##ĩ\n\n\n˵\n\nĳĳ\n\nӭ\n\nֵ\n\n\n\n\n˵\n˴\nʱ\n˴\nÿ\nÿ\nÿ\nȼ\nȽ\nûκ\nע\n\n\n\nȻ\nر\n\nص\n\n\nִ\n\n\n\n\n\n\nɴ\nĿǰ\nֱ\nֱ\n\n\n෴\nͬ\n\nӦ\n൱\n\n\n\n\n\n\n\nգ\nӺ\n\n֪\nȷ\n\n\nƶ\nͻ\nͻȻ\n\n\nڶ\n\nϰ\n\n\n̺\n\nά\n\nϵ\nܷ\nܹ\nԺ\nԴ\n\n\n\n\n\n\n\nΧ\nĪȻ\n\nΪ\nж\n\nʾ\nҪ\n涨\n\nƩ\nΪ\n\nʶ\n\n\n\n˵\n˵\n˵˵\n\n\n˭\n˭\n\n\n\nת\nת\nת\nﵽ\nѸ\nȥ\n\n\nҪ\nһ\n\n\n\n\n\n\n\n\nӦ\nʵ\n\n\n\nͨ\n\n\n⵽\nѭ\n\nǰ\n\n\nȡ\n\nش\n\nҪ\n\n\nֹ\n\n\n\nʱ\n\nѵ˵\n\nҪ\n\nǶ\n\n "
  },
  {
    "path": "legacy_v1/src/Leorio/financedict.txt",
    "content": "备付金\n余额宝\n佣金宝\n前海\nC轮融资\n区块链\n数字货币\n去中心化\n正虹科技\n千山药机\n常山北明\n华菱精工\n蓝晓科技\n兴化股份\n红墙股份\n世荣兆业\n奥飞数据\n万兴科技\n德邦股份\n海辰药业\n宣亚国际\n长亮科技\n蓝色光标\n翔港科技\n永吉股份\n天永智能\n成飞集成\n北特科技\n科顺股份\n三五互联\n哈空调\n新宁物流\n湖南投资\n华联控股\n上海雅仕\n海澜之家\n富祥股份\n药石科技\n神雾环保\n新城控股\n上峰水泥\n旗滨集团\n久吾高科\n天虹股份\n横店影视\n天泽信息\n华发股份\n四川双马\n国发股份\n中国国航\n万年青\n复旦复华\n信达地产\n光启技术\n中设集团\n山西焦化\n象屿股份\n南京银行\n安迪苏\n神雾节能\n罗普斯金\n展鹏科技\n罗 牛 山\n中石科技\n真视通\n金发拉比\n葛洲坝\n大唐电信\n劲胜智能\n*ST金宇\n智飞生物\n科力远\n东方通信\n英可瑞\n*ST东海A\n阳光股份\n中房股份\n南华仪器\n顺网科技\n天邦股份\n先导智能\n南方航空\n华斯股份\n森马服饰\n尚品宅配\n彩虹股份\n珠江实业\n中交地产\n光华科技\n云南城投\n诚志股份\n信息发展\n泰格医药\n飞乐音响\n永悦科技\n中国化学\n宏昌电子\n东北电气\n南山控股\n我武生物\n天威视讯\n康隆达\n协鑫集成\n中旗股份\n海峡股份\n古越龙山\n爱建集团\n阳 光 城\n百合花\n格力电器\n楚江新材\n瀛通通讯\n*ST云网\n天健集团\n掌阅科技\n中坚科技\n中欣氟材\n得利斯\n海天味业\n滨江集团\n久其软件\n当代明诚\n吉比特\n中源协和\n华友钴业\n格力地产\n冠农股份\n重庆啤酒\n华英农业\n珠海港\n杭氧股份\n海螺水泥\n世茂股份\n京山轻机\n华联综超\n威孚高科\n井神股份\n华鑫股份\n华录百纳\n生 意 宝\n开山股份\n华新水泥\n飞利信\n南大光电\n众信旅游\n重庆建工\n奥马电器\n雷曼股份\n招商蛇口\n一汽轿车\n镇海股份\n北新建材\n世龙实业\n中南文化\n海汽集团\n*ST匹凸\n六国化工\n掌趣科技\n北大荒\n中国建筑\n健友股份\n大晟文化\n中远海特\n首旅酒店\n中国人寿\n金牌厨柜\n金地集团\n风语筑\n海大集团\n精测电子\n吉宏股份\n中海油服\n金自天正\n湘潭电化\n东方雨虹\n新元科技\n先达股份\n烽火通信\n唐人神\n首开股份\n创业软件\n华鲁恒升\n老板电器\n欧普照明\n新 希 望\n金健米业\n高鸿股份\n恒大高新\n九强生物\n盛天网络\n五洲交通\n中国高科\n哈工智能\n科达洁能\n新南洋\n大商股份\n东方财富\n江河集团\n大华股份\n中青宝\n天玑科技\n高升控股\n同仁堂\n安德利\n万方发展\n田中精机\n合盛硅业\n通源石油\n湖南海利\n广州港\n华西能源\n蓝盾股份\n聚灿光电\n辉隆股份\n未名医药\n柯利达\n傲农生物\n塔牌集团\n金 融 街\nST云维\n山西证券\n蓝思科技\n中国长城\n易见股份\n新日股份\n三诺生物\nS佳通\n吉艾科技\n电工合金\n山鹰纸业\n金科股份\n南 玻Ａ\n创新股份\n华胜天成\nST景谷\n三全食品\n新钢股份\n银座股份\n新华保险\n神马股份\n沱牌舍得\n中国武夷\n云南锗业\n国旅联合\n元成股份\n北陆药业\n赫美集团\n卧龙地产\n上港集团\n康得新\n福建水泥\n滨海能源\n保龄宝\n金冠电气\n蓝光发展\n梅雁吉祥\n大连重工\n当代东方\n冀东装备\n大秦铁路\n福星股份\n欧派家居\n众应互联\n绿景控股\n华东重机\n通达股份\n波导股份\n京汉股份\n电子城\n华伍股份\n大连圣亚\n皮阿诺\n美利云\n冀东水泥\n三峡新材\n奇精机械\n海量数据\n恒基达鑫\n金杯电工\n金陵体育\n音飞储存\n上海银行\n振东制药\n沙河股份\n康跃科技\n利尔化学\n梦百合\n凯伦股份\n*ST昌九\n会稽山\n苏垦农发\n汇洁股份\n华菱星马\n杰克股份\n万达信息\n华策影视\n银亿股份\n三毛派神\n登海种业\n盐 田 港\n上工申贝\n沃森生物\n中国石化\n中材国际\n玲珑轮胎\n天华超净\n鸿博股份\n吉峰农机\n众源新材\n志邦股份\n光洋股份\n柳 
工\n中南建设\n博彦科技\n光力科技\n美亚柏科\n兰州民百\n宝鼎科技\n东湖高新\n美亚光电\n华帝股份\n智度股份\n美丽生态\n中远海控\n东港股份\n江阴银行\n宝新能源\n建发股份\n众兴菌业\n仟源医药\n祁连山\n*ST昌鱼\n常山药业\n贝达药业\n建新股份\n三六五网\n宝色股份\n龙马环卫\n粤泰股份\n钧达股份\n天晟新材\n晨鸣纸业\n金 螳 螂\n双鹭药业\n中国太保\n达威股份\n光韵达\n界龙实业\n华泰股份\n天创时尚\n尖峰集团\n迪马股份\n探路者\n强力新材\n纳思达\n立霸股份\n创维数字\n华谊集团\n浙江交科\n盐湖股份\n广州发展\n风神股份\n新湖中宝\n湖南发展\n华夏幸福\n片仔癀\n中信银行\n蓝英装备\n万通地产\n华讯方舟\n奥佳华\n捷成股份\n山煤国际\n海南橡胶\n柘中股份\n九阳股份\n鱼跃医疗\n全筑股份\n新开源\n香江控股\n交大昂立\n东方网力\n元隆雅图\n派思股份\n沃施股份\n唐德影视\n天康生物\n恒瑞医药\n三安光电\n东方时尚\n冰川网络\n华瑞股份\n天山股份\n海峡环保\n长方集团\n申通地铁\n万和电气\n电广传媒\n航天长峰\n中国海诚\n梦舟股份\n涪陵电力\n铁流股份\n青岛海尔\n力源信息\n金字火腿\n梦洁股份\n健康元\n张 裕Ａ\n万盛股份\n共达电声\n贤丰控股\n桂东电力\n工大高新\n雅戈尔\n设研院\n联美控股\n南京高科\n华天科技\n奥飞娱乐\n航天电子\n荣盛发展\n柳钢股份\n暴风集团\n爱迪尔\n博雅生物\n航天电器\n道明光学\n机器人\n泛微网络\n龙元建设\n鼎捷软件\n岱勒新材\n华业资本\n鸿特精密\n中元股份\n科伦药业\n海南高速\n中科曙光\n科达股份\n长信科技\n海航创新\n星光农机\n美诺华\n龙江交通\n江泉实业\n大亚圣象\n中集集团\n天源迪科\n富安娜\n佛山照明\n财信发展\n三维丝\n美的集团\n双汇发展\n东方钽业\n兰太实业\n敦煌种业\n国际实业\n激智科技\n凯龙股份\n深科技\n恒锋工具\n兆日科技\n青龙管业\n时代万恒\n洽洽食品\n顺发恒业\n美凯龙\n银信科技\n京投发展\n兴发集团\n梅花生物\n川大智胜\n云意电气\n金枫酒业\n利君股份\n科泰电源\n数据港\n天地源\n三维通信\n上实发展\n伟明环保\n中国平安\n信雅达\n天广中茂\n绿地控股\n金逸影视\n粤高速Ａ\n天神娱乐\n香雪制药\n九牧王\n浙大网新\n北京银行\n贵州茅台\n同力水泥\n天目药业\n隆平高科\n三棵树\n冠城大通\n天能重工\n华兰生物\n陕西黑猫\n厦门国贸\n易联众\n台基股份\n永安行\n老百姓\n腾龙股份\n用友网络\n北京城建\n再升科技\n皖江物流\n旺能环境\n昆仑万维\n江苏银行\n国联水产\n沙隆达Ａ\n爱乐达\n广州浪奇\n*ST准油\n水井坊\n聚隆科技\n华谊兄弟\n安妮股份\n五 粮 液\n博汇纸业\n金洲慈航\n苏 泊 尔\n中国交建\n亚宝药业\n吉林化纤\n金路集团\n同洲电子\n二三四五\n凤形股份\n东方通\n齐峰新材\n深圳华强\n明星电缆\n建设银行\n安彩高科\n北信源\n海正药业\n亚泰集团\n鼎信通讯\n木林森\n万里石\n家家悦\n金陵饭店\n华中数控\n达 意 隆\n万马股份\n南风股份\n卫宁健康\n洋河股份\n金晶科技\n中国重汽\n辉煌科技\n东兴证券\n多伦科技\n太化股份\n瑞斯康达\n招商轮船\n雏鹰农牧\n恒生电子\n巴安水务\n宁夏建材\n东莞控股\n杭州银行\n深圳机场\n冠昊生物\n瑞茂通\n贵人鸟\n招商证券\n华侨城Ａ\n方正科技\n华孚时尚\n龙津药业\n拓普集团\n天原集团\n东晶电子\n江铃汽车\n新澳股份\n天坛生物\n安正时尚\n隆基股份\n名雕股份\n长盈精密\n澳柯玛\n网达软件\n粤 水 电\n华夏银行\n现代制药\n金科文化\n润达医疗\n赛摩电气\n花园生物\n福建高速\n三友化工\n无锡银行\n长春经开\n易尚展示\n太极股份\n京华激光\n中毅达\n滨化股份\n一拖股份\n银河生物\n长航凤凰\n科士达\n全 聚 德\n神州泰岳\n华电重工\n中农立华\n上海家化\n永艺股份\n森特股份\n中国铁建\n顺鑫农业\n紫鑫药业\n中信海直\n山东路桥\n深物业A\n上柴股份\n克来机电\n长城汽车\n汉威科技\n亚盛集团\n福田汽车\n申万宏源\n广州酒家\n埃斯顿\n煌上煌\n同花顺\n鲁商置业\n七 匹 
狼\n桐昆股份\n绵石投资\n易德龙\n上海物贸\n伊利股份\n合锻智能\n华贸物流\n上海三毛\n东阿阿胶\n睿康股份\n奋达科技\n云南能投\n游族网络\n杰瑞股份\n中国中铁\n青岛啤酒\n黑猫股份\n梅安森\n同方股份\n绿盟科技\n创意信息\n浪潮软件\n浙能电力\n来伊份\n华星创业\n兰石重装\n重庆路桥\n西水股份\n维维股份\n新华百货\n中直股份\n莎普爱思\n中国石油\n康盛股份\n中海达\n哈高科\n景兴纸业\n众合科技\n首钢股份\n红旗连锁\n川环科技\n美尔雅\n中远海能\n赛为智能\n三星医疗\n银邦股份\n爱施德\n光大银行\n浙江富润\n西藏发展\n荣科科技\n万业企业\n芭田股份\n三一重工\n银禧科技\n广宇集团\n神州高铁\n常熟银行\n证通电子\n天瑞仪器\n国祯环保\n中国神华\n洁美科技\n中国汽研\n兴业银行\n法 尔 胜\n金花股份\n东吴证券\n中洲控股\n新 大 陆\n海普瑞\n*ST柳化\n天富能源\n昌红科技\n海南瑞泽\n*ST宝实\n杰恩设计\n铁龙物流\n三湘印象\n张家界\n金禾实业\n中远海发\n阳光照明\n新泉股份\n歌力思\n榕基软件\n厦门港务\n上海机电\n泸州老窖\n澄星股份\n靖远煤电\n白云机场\n宁波港\n正丹股份\n物产中大\n襄阳轴承\n天夏智慧\n浙江美大\n恒立液压\n顾家家居\n华润双鹤\n中航光电\n千金药业\n圣农发展\n佳讯飞鸿\n宇通客车\n继峰股份\n保利地产\n天润曲轴\n广誉远\n深纺织Ａ\n南方汇通\n奥特佳\n利安隆\n北京文化\n长江润发\n新五丰\n华舟应急\n鲁阳节能\n拓尔思\n国药一致\n徐家汇\n科新机电\n印纪传媒\n千禾味业\n汇川技术\n雪榕生物\n华远地产\n上海临港\n元力股份\n欢瑞世纪\n汉鼎宇佑\n金新农\n透景生命\n振华重工\n理工光科\n新乡化纤\n世纪星源\n云煤能源\n海兴电力\n天茂集团\n莱美药业\n同有科技\n福耀玻璃\n中钨高新\n索菲亚\n宋城演艺\n交运股份\n中体产业\n星星科技\n鹏博士\n乐凯新材\n广发证券\n歌华有线\n三维股份\n一汽夏利\n上海机场\n新农开发\n希努尔\n乐普医疗\n浙数文化\n东方新星\n闽发铝业\n深南电路\n豪迈科技\n陆家嘴\n海鸥卫浴\n东富龙\n中国银行\n东北证券\n中国国旅\n交通银行\n通富微电\n四维图新\n厦门空港\n永和智控\n易华录\n广弘控股\n山东海化\n亿晶光电\n周大生\n重庆百货\n棒杰股份\n益丰药房\n新华龙\n鸿利智汇\n拓日新能\n齐心集团\n思创医惠\n小康股份\n艾比森\n山推股份\n王府井\n晶方科技\n雪 莱 特\n振静股份\n华纺股份\n*ST坊展\n宏大爆破\n二六三\n龙净环保\n承德露露\n迎驾贡酒\n丰林集团\n粤宏远Ａ\n大众交通\n锡业股份\n骆驼股份\n科大智能\n燕京啤酒\n大港股份\n四创电子\n獐子岛\n龙头股份\n海利生物\n炬华科技\n迪安诊断\n光线传媒\n锦江股份\n齐翔腾达\n鞍重股份\n汇通能源\n凯恩股份\n汉邦高科\n新 海 宜\n四川金顶\n华域汽车\n利欧股份\n苏常柴Ａ\n太极实业\n海欣股份\n大连港\n杭齿前进\n航民股份\n广东甘化\n人民网\n日盈电子\n世联行\n天润数娱\n贵绳股份\n云南白药\n中新赛克\n远方信息\n融钰集团\n锦江投资\n易成新能\n中水渔业\n沈阳化工\n江海股份\n楚天科技\n华联股份\n东材科技\n兴源环境\n澳洋科技\n民生银行\n江苏阳光\n洪城水业\n华宏科技\n神州长城\nST常林\n农发种业\n美芝股份\n旋极信息\n首航节能\n通鼎互联\n凯美特气\n渤海轮渡\n山河药辅\n王子新材\n新界泵业\n汉缆股份\n星辉娱乐\n重庆水务\n三维工程\n美好置业\n健帆生物\n兆驰股份\n通化东宝\n乐山电力\n天鹅股份\n渝 开 发\n欣龙控股\n长江投资\n丽珠集团\n青海华鼎\n湖北广电\n东南网架\n黑牡丹\n上汽集团\n东方明珠\n实丰文化\n康恩贝\n宜宾纸业\n海默科技\n海油工程\n中科金财\n东华科技\n国投电力\n太平鸟\n合众思壮\n天津港\n*ST新城\n星宇股份\n工商银行\n弘宇股份\n光明乳业\n西藏城投\n申科股份\n延华智能\n露天煤业\n岭南控股\n*ST青松\n华金资本\n永太科技\n中国电建\n国药股份\n星源材质\n西安旅游\n佳隆股份\n金力泰\n金盾股份\n四方股份\n上海建工\n云投生态\n怡达股份\n宝信软件\n广电电气\n日照港\n海南椰岛\n大龙地产\n富春股份\n*ST 
中绒\n新亚制程\n建投能源\n浙江震元\n华懋科技\n广电网络\n锦州港\n金证股份\n太安堂\n今世缘\n商赢环球\n多喜爱\n冠豪高新\n凯利泰\n永高股份\n东方精工\n黔轮胎Ａ\n文投控股\n高伟达\n中原传媒\n北京科锐\n黄山旅游\n菲达环保\n博信股份\n长城影视\n华闻传媒\n通策医疗\n小天鹅Ａ\n徐工机械\n陕西煤业\n天地科技\n合金投资\n济民制药\n亚星客车\n御银股份\n海欣食品\n韩建河山\n联创电子\n宁波精达\n合诚股份\n力生制药\n京运通\n润邦股份\n亚通股份\n新华医疗\n东诚药业\n世纪瑞尔\n普邦股份\n万润股份\n招商银行\n中国国贸\n华宇软件\n锦龙股份\n沧州大化\n强生控股\n兖州煤业\n浙商证券\n阳光电源\n摩恩电气\n旷达科技\n*ST丹科\n中远海科\n轻纺城\n申能股份\n南京医药\n中国中车\n长久物流\n南卫股份\n中华企业\n德威新材\n飞荣达\n茂业通信\n览海投资\n鹿港文化\n酒鬼酒\n长电科技\n龙泉股份\n沃特股份\n金河生物\n大元泵业\n天房发展\n利亚德\n金鹰股份\n*ST爱富\n史丹利\n福建金森\n安徽水利\n亚太实业\n扬子新材\n初灵信息\n航天机电\n中衡设计\n福能股份\n华东医药\n万孚生物\n威帝股份\n仙琚制药\n亚邦股份\n东方航空\n南京化纤\n桂林旅游\n苏交科\n珠江控股\n同达创业\n白云电器\n浪潮信息\n飞科电器\n国民技术\n金莱特\n丰元股份\n华鹏飞\n西藏旅游\n环能科技\n神思电子\n白云山\n山东章鼓\n川投能源\n上海莱士\n北部湾港\n中航地产\n国投中鲁\n莱宝高科\n欣旺达\n中航机电\n古井贡酒\n大豪科技\n润和软件\n乐凯胶片\n微光股份\n安硕信息\n海立股份\n三圣股份\n科林电气\n*ST宏盛\n博敏电子\n新文化\n方直科技\n金固股份\n安记食品\n山东出版\n帝龙文化\n创新医疗\n三聚环保\n博思软件\n新华文轩\n百川能源\n瑞康医药\n正平股份\n长荣股份\n海通证券\n应流股份\n神开股份\n津膜科技\n国机通用\n西部黄金\n中泰化学\n贵阳银行\n凤凰光学\n金利华电\n三特索道\n华东电脑\n萃华珠宝\n浙江仙通\n南洋股份\n德尔股份\n上海沪工\n乐心医疗\n中信证券\n四方冷链\n卫 士 通\n九鼎投资\n必康股份\n麦趣尔\n宜华健康\n巨人网络\n平治信息\n科达利\n兆易创新\n城地股份\n步长制药\n嘉澳环保\n朗迪集团\n五洲新春\n科森科技\n杭电股份\n东方电缆\n引力传媒\n司太立\n集友股份\n维力医疗\n圣达生物\n德新交运\n赛福天\n山东华鹏\n大唐发电\n凤凰传媒\n嘉泽新能\n中国中冶\n中国铝业\n*ST锐电\n陕鼓动力\n君正集团\n中国西电\n晋亿实业\n宁波热电\n渤海活塞\n江苏有线\n*ST嘉陵\n洛阳玻璃\n石化油服\n厦华电子\n星湖科技\n*ST京城\n人民同泰\n新华传媒\n益民集团\n中路股份\n*ST厦工\n华北制药\n山西汾酒\n天业股份\n天津磁卡\n宁波海运\n保税科技\n鲁银投资\n汉商集团\n天海投资\n一汽富维\n实达集团\nS*ST前锋\n绿庭投资\n中船防务\n奥瑞德\n哈药股份\n豫园股份\n富控互动\n申达股份\n鹏起科技\n惠泉啤酒\n中珠医疗\n国睿科技\n老白干酒\n时代出版\n莫高股份\n狮头股份\n栖霞建设\n宏达矿业\n海航基础\n腾达建设\n驰宏锌锗\n天药股份\n信威集团\n瑞贝卡\n*ST海润\n盘江股份\n广东明珠\n天科股份\n三房巷\n通葡股份\n正源股份\n亚星化学\n营口港\nXD万华化\n广汇汽车\n华仪电气\n江苏舜天\n重庆港九\n亿利洁能\n嘉化能源\n航天信息\n外运发展\n赣粤高速\n国电南自\n大湖股份\n广汇能源\nST成城\n中昌数据\n民丰特纸\n赤天化\n瀚叶股份\n海航控股\n江苏吴中\n华资实业\n国中水务\n安通控股\n太原重工\n永泰能源\n宝硕股份\n中国船舶\n*ST新亿\n太极集团\n西宁特钢\n*ST天成\n大名城\n东方金钰\n中葡股份\n海泰发展\n东风科技\n宋都股份\n康欣新材\n宁波联合\n四川路桥\n东风汽车\n朗新科技\n隆盛科技\n中孚信息\n民德电子\n南京聚隆\n新雷能\n贝斯特\n会畅通讯\n朗科智能\n辰安科技\n山鼎设计\n迈克生物\n康拓红外\n双杰电气\n鲍斯股份\n航新科技\n中光防雷\n迦南科技\n三环集团\n腾信股份\n飞天诚信\n光环新网\n光一科技\n麦捷科技\n邦讯技术\n聚飞光电\n吴通控股\n华昌达\n海联讯\n新莱应材\n飞力达\n纳川股份\n福安药业\n佳士科技\
n通裕重工\n智慧松德\n迪威迅\n新研股份\n科融环境\n量子高科\n星普医科\n大富科技\n锦富技术\n锐奇股份\n易世达\n坚瑞沃能\n盛运环保\n康芝药业\n华谊嘉信\n世纪鼎利\n福瑞股份\n华力创通\n回天新材\n上海凯宝\n梅泰诺\n金龙机电\n宝德股份\n立思辰\n盈趣科技\n香山股份\n麦格米特\n凯中精密\n普路通\n南兴装备\n万达电影\n中矿资源\n葵花药业\n燕塘乳业\n奥瑞金\n美盛文化\n顾地科技\n猛狮科技\n德联集团\n万润科技\n民盛金科\n三垒股份\n瑞和股份\n艾格拉斯\n亚夏汽车\nST龙力\n八菱科技\n圣阳股份\n中京电子\n雷柏科技\n群兴玩具\n顺灏股份\n三七互娱\n千红制药\n东方铁塔\n鸿路钢构\n云图控股\n林州重机\n海源机械\n光正集团\n天桥起重\n日发精机\n恺英网络\n达华智能\n涪陵榨菜\n科林环保\n金正大\n益生股份\n天马精化\n壹桥股份\n龙星化工\n江苏神通\n尤夫股份\n胜利精密\n凯撒文化\n中原特钢\n达实智能\n爱仕达\n建研集团\n信邦制药\n南洋科技\n东山精密\n千方科技\n亚太药业\n台海核电\n神剑股份\n森源电气\n富临运业\n顺丰控股\n漫步者\n高乐股份\n潮宏基\n海宁皮城\n人人乐\n*ST三泰\n博云新材\n大 东 南\n德奥通航\n升达林业\n步 步 高\n合兴包装\n恒康医疗\n特 尔 佳\n利达光电\n巴士在线\n深圳惠程\n中航三鑫\n常铝股份\n新光圆成\n恒星科技\n天马股份\n三变科技\n广博股份\n浔兴股份\n山河智能\n万邦德\n沙钢股份\n凯瑞德\n云南旅游\n轴研科技\n久联发展\n丽江旅游\n华信国际\n东信和平\n霞客环保\n德豪润达\n华邦健康\n华润三九\n中弘股份\n中通客车\n凯迪生态\n中粮生化\n山大华特\n*ST天化\n云内动力\n现代投资\n东凌国际\n云南铜业\n吉电股份\n陕西金叶\n冰轮环境\n云铝股份\n凯撒旅游\n长江证券\n*ST平能\n通化金马\n浩物股份\n新华制药\n南风化工\n苏宁环球\n恒逸石化\n厦门信达\n*ST华泽\n建新矿业\n东方电子\n海航投资\n平潭发展\n太阳能\n海南海药\n供销大集\n航天发展\n中天金融\n粤电力Ａ\n万泽股份\n万 家 乐\n美菱电器\n荣安地产\n国际医学\n华塑控股\n鄂武商Ａ\n渤海金控\n胜利股份\n华数传媒\n广聚能源\n皇庭国际\n泛海控股\n中国天楹\n神州数码\n中粮地产\n深深房Ａ\n深赤湾Ａ\n深深宝Ａ\n深中华A\n全新好\n深振业Ａ\n华测导航\nST生化\n和仁科技\n牧原股份\n传艺科技\n庄园牧场\n浩云科技\n华钰矿业\n元祖股份\n万邦达\n曲江文旅\n贵航股份\n汉森制药\n长江电力\n吉祥航空\n华仁药业\n金通灵\n红蜻蜓\n万东医疗\n新日恒力\n光大证券\n伊力特\n张江高科\n中南传媒\n捷顺科技\n瀚蓝环境\n维宏股份\n精锻科技\n深华发Ａ\n曲美家居\n中威电子\n景嘉微\n安信信托\n赢时胜\n天翔环境\n永利股份\n中金环境\n达志科技\n东方日升\n金明精机\n金龙汽车\n兰州黄河\n湘电股份\n国机汽车\n奇信股份\n龙大肉食\n中山公用\n杭锅股份\n视觉中国\n恒信东方\n南天信息\n福成股份\n特变电工\n江苏国信\n深天地Ａ\n北京城乡\n广日股份\n宏图高科\n中兴商业\n宜华生活\n潍柴重机\n文山电力\n尚荣医疗\n羚锐制药\n围海股份\n好利来\n优博讯\n远达环保\n精伦电子\n慈文传媒\n安井食品\n隧道股份\n恒丰纸业\n黑牛食品\n雄韬股份\n东阳光科\n兄弟科技\n华铁股份\n农 产 品\n雷鸣科化\n翠微股份\n山东威达\nST南化\n百利科技\n*ST沪科\n博深工具\n清水源\n新天然气\n信捷电气\n哈森股份\n钱江生化\n杭钢股份\n奥克股份\n马应龙\n丰乐种业\n登云股份\n三角轮胎\n新开普\n永鼎股份\n奥拓电子\n嘉欣丝绸\n华自科技\n新朋股份\n文科园林\n四川九洲\n美联新材\n三元股份\n柏堡龙\n茂业商业\n正邦科技\n新力金融\n深圳能源\n悦达投资\n四方达\n川化股份\n南京公用\n朗姿股份\n招商公路\n广汽集团\n小商品城\n金石东方\n上海环境\n中核钛白\n雪峰科技\n光电股份\n集智股份\n国元证券\n本钢板材\n名家汇\n鲁 
泰Ａ\n西安饮食\n南京新百\n华扬联众\n数字政通\n新大洲Ａ\n北辰实业\n仁和药业\n南威软件\n德尔未来\n奥维通信\n博实股份\n凌云股份\n东江环保\n中环股份\n青青稞酒\n华统股份\n皖能电力\n天龙股份\n荃银高科\n新世界\n越秀金控\n龙韵股份\n利源精制\n英飞拓\n奇正藏药\n金亚科技\n丽鹏股份\n超图软件\n金安国纪\n晨光文具\n新疆浩源\n卓郎智能\n东风股份\n洪涛股份\n南都电源\n上海九百\n江南高纤\n吴江银行\n航发科技\n浦东建设\n科大国创\n汇中股份\n林海股份\n永贵电器\n*ST智慧\n比亚迪\n泰达股份\n华茂股份\n蓝科高新\n深高速\n宁波富邦\n和而泰\n银轮股份\n昆药集团\n力星股份\n双环传动\n兰花科创\n城投控股\n哈尔斯\n路畅科技\n上海电力\n人福医药\n汉得信息\n数码科技\n潍柴动力\n联环药业\n三 力 士\n启明星辰\n四川成渝\n杭州解百\n科锐国际\n共进股份\n三峡水利\n北大医药\n东土科技\n神奇制药\n丰原药业\n读者传媒\n中粮糖业\n雪人股份\n富奥股份\n凤竹纺织\n桂林三金\n天沃科技\n鹏翎股份\n福达股份\n龙宇燃油\n广东鸿图\n兴业证券\n神州信息\n浙江广厦\n春兴精工\n恒力股份\n姚记扑克\n同济堂\n双箭股份\n漳州发展\n紫光股份\n裕兴股份\n天龙光电\n九 芝 堂\n三鑫医疗\n秀强股份\n兴业股份\n天银机电\n石基信息\n大东方\n安控科技\n恒泰实达\n华昌化工\n吉林高速\n津滨发展\n远东传动\n常青股份\n宜通世纪\n宝鹰股份\n中国联通\n德美化工\n民生控股\n第一创业\n北方国际\n惠而浦\n道恩股份\n加加食品\n西昌电力\n中新科技\n皖新传媒\n金一文化\n汉王科技\n*ST沈机\n鲁信创投\n广汇物流\n快克股份\n国投资本\n诺 普 信\n幸福蓝海\n中航电子\n浦东金桥\n科远股份\n舒泰神\n乔治白\n京威股份\n兴民智通\n惠发股份\n闰土股份\n泰胜风能\n皇氏集团\n国金证券\n瑞尔特\n科力尔\n吉林敖东\n天喻信息\n新华联\nST慧球\n宜安科技\n西部证券\n中色股份\n苏州高新\n平高电气\n智云股份\n宝钢股份\n际华集团\n晋西车轴\n山东高速\n津劝业\n新纶科技\n丰华股份\n大禹节水\n欧亚集团\n东音股份\n金徽酒\n华能国际\n*ST上普\n博闻科技\n精准信息\n天壕环境\n江化微\n雪浪环境\n利德曼\n东华软件\n昆百大Ａ\n中电广通\n*ST运盛\n摩登大道\n亿利达\n长白山\n上海医药\n中航重机\n中电鑫龙\n思源电气\n杭萧钢构\n佳发安泰\n金隅集团\n远兴能源\n安居宝\n精艺股份\n江苏国泰\n山东金泰\n天业通联\n康达尔\n三超新材\n中原环保\n安车检测\n中持股份\n西部矿业\n通润装备\n铜陵有色\n开润股份\n诚迈科技\n大西洋\n克明面业\n首商股份\n武汉控股\n巨轮智能\n珠江啤酒\n华安证券\n美康生物\n乐金健康\n精华制药\n九洲电气\n菲林格尔\n华达科技\n中装建设\n游久游戏\n健民集团\n北部湾旅\n申华控股\n宝光股份\n大康农业\n春兰股份\n风范股份\n以岭药业\n百隆东方\n软控股份\n金智科技\n海螺型材\n百联股份\n中原高速\n商业城\n国海证券\n中国软件\n闽东电力\n富春环保\n恒银金融\n吉林森工\n莱茵体育\n哈投股份\n楚天高速\n金运激光\n西南证券\n川仪股份\n欧浦智网\n皖天然气\n爱康科技\n西藏矿业\n方大化工\n文化长城\n万 科Ａ\n郴电国际\n南宁百货\n开元股份\n联明股份\n宝莱特\n雄塑科技\n创力集团\n联发股份\n国统股份\n华东科技\n成都路桥\n紫金矿业\n祥源文化\n泰合健康\n中飞股份\n仙坛股份\n宁波高发\n中原证券\n西藏药业\n广晟有色\n宝胜股份\n朗源股份\n华峰超纤\n奥康国际\n国轩高科\n汤臣倍健\n盛通股份\n新华网\n力帆股份\n天圣制药\n环旭电子\n通宝能源\n恒立实业\n山东药玻\n云赛智联\n华映科技\n贵糖股份\n旭光股份\n新 华 都\n兔 宝 
宝\n宜昌交运\n广信材料\n广泽股份\n开创国际\n长青集团\n南宁糖业\n大洋电机\n上海电气\n林洋能源\n任子行\n四环生物\n黔源电力\n中国动力\n三雄极光\n纽威股份\n双星新材\n绿城水务\n民和股份\n东睦股份\n诚意药业\n大恒科技\n绿茵生态\n安利股份\n和邦生物\n日上集团\n中化国际\n隆基机械\n青岛双星\n东安动力\n中视传媒\n开开实业\n卧龙电气\n中恒集团\n天宸股份\n中信重工\n益佰制药\n东方海洋\n如意集团\n银鸽投资\n富森美\n中国医药\n圆通速递\n开滦股份\n慈星股份\n中煤能源\n宁沪高速\n泰豪科技\n浙江世宝\n中际旭创\n迪森股份\n长城动漫\n烽火电子\n万向德农\n双良节能\n佛塑科技\n双成药业\n海格通信\n双象股份\n南岭民爆\n合肥百货\n寒锐钴业\n江南化工\n杭叉集团\n特 力Ａ\n万顺股份\n上海电影\n金种子酒\n中电环保\n苏州固锝\n中炬高新\n爱普股份\n合康新能\n科斯伍德\n友阿股份\n华海药业\n中泰股份\n先河环保\n博世科\n亚厦股份\n嘉应制药\n海康威视\n*ST河化\n中文在线\n惠达卫浴\n青海春天\n南方传媒\n国新能源\n新集能源\n长园集团\n第一医药\n新美星\n欣天科技\n福鞍股份\n太平洋\n中航高科\n长源电力\n鲁西化工\n宏创控股\n光迅科技\n东易日盛\n贵州百灵\n宁波富达\n绿康生化\n国泰君安\n龙源技术\n新野纺织\n长缆科技\n江南水务\n安源煤业\n长安汽车\n华电国际\n华建集团\n美达股份\n申通快递\n豫能控股\n聚龙股份\n恩华药业\n晓程科技\n中工国际\n亚太科技\n方正证券\n中牧股份\n珠江钢琴\n神宇股份\n红阳能源\n天音控股\n航发控制\n浙江鼎力\n北纬科技\n奥联电子\n中铁工业\n徕木股份\n吉鑫科技\n明星电力\n国农科技\n花王股份\n华微电子\n九州通\n天目湖\n拓斯达\n鸿达兴业\n广生堂\n今飞凯达\n广深铁路\n北玻股份\n恒宝股份\n赛升药业\n恒为科技\n江淮汽车\n达安基因\n海越股份\n唐山港\n向日葵\n汇源通信\n莱茵生物\n道道全\n四川长虹\n智光电气\n融捷股份\n健盛集团\n灵康药业\n长生生物\n万丰奥威\n五矿资本\n外高桥\n启迪古汉\n凤凰股份\n鑫茂科技\n赛轮金宇\n节能风电\n华虹计通\n浙江医药\n毅昌股份\n百花村\n康缘药业\n梦网集团\n岳阳林纸\n济川药业\n海信科龙\n朗玛信息\n银泰资源\n苏利股份\n西藏天路\n永新股份\n报 喜 鸟\n嘉寓股份\n京泉华\n新时达\n汇冠股份\n国瓷材料\n九洲药业\n浙江东方\n上海梅林\n江苏雷利\n科隆股份\n西部创业\n大同煤业\n海虹控股\n*ST郑煤\n国电电力\n盾安环境\n我乐家居\n时代新材\n瑞凌股份\n明家联合\n东方电气\n中成股份\n沪电股份\n深圳燃气\n中国重工\n湖北能源\n东方集团\n圣邦股份\n西部牧业\n航天通信\n安琪酵母\n东北制药\n好当家\n日月股份\n华明装备\n海亮股份\n星云股份\n金山股份\n赛托生物\n安诺其\n积成电子\n西王食品\n长高集团\n桃李面包\n海印股份\n佳沃股份\n京蓝科技\n百大集团\n九安医疗\n通程控股\n四川美丰\n九有股份\n怡 亚 通\n京天利\n普利制药\n深天马Ａ\n吉视传媒\n辽宁成大\n泰尔股份\n中国电影\n阳泉煤业\n联络互动\n万林股份\n金鸿控股\n日出东方\n东旭光电\n中国银河\n理邦仪器\n北斗星通\n峨眉山Ａ\n红 宝 丽\n漳泽电力\n复星医药\n五矿发展\n太空板业\n文一科技\n兴业科技\n内蒙华电\n博济医药\n生物股份\n清新环境\n新北洋\n福斯特\n道氏技术\n特发信息\n长江传媒\n浙江众成\n国美通讯\n崇达技术\n中富通\n维尔利\n弘业股份\n春秋航空\n汇鸿集团\n友好集团\n江西铜业\n苏试试验\n太阳纸业\n德宏股份\n艾华集团\n裕同科技\n海德股份\n乾照光电\n卫信康\n康斯特\n众业达\n国风塑业\n鹭燕医药\n众泰汽车\n麦达数字\n弘讯科技\n大连电瓷\n亿帆医药\n新洋丰\n五洋科技\n智慧能源\n华西股份\n康尼机电\n中 关 
村\n特锐德\n中国核建\n豫光金铅\n艾迪精密\n新兴铸管\n上海石化\n理工环科\n雅本化学\n中超控股\n河钢股份\n四通股份\n石大胜华\n黑芝麻\n中能电气\n浩丰科技\n远大智能\n内蒙一机\n苏大维格\n南京熊猫\n兴蓉环境\n中化岩土\n中钢国际\n黄河旋风\n康美药业\n邦宝益智\n凯乐科技\n文峰股份\n广百股份\n武汉中商\n数字认证\n西部建设\n*ST华菱\n佳都科技\n*ST中基\n电科院\n铜峰电子\n飞马国际\n华泰证券\n航发动力\n黄山胶囊\n三元达\n高能环境\n中原内配\n恒天海龙\n宝钢包装\n天润乳业\n通产丽星\n岷江水电\n拉芳家化\n赞宇科技\n瑞特股份\n三联虹普\n宏润建设\n金海环境\n珈伟股份\n航天工程\n精达股份\n蓝黛传动\n中来股份\n岭南园林\n科华恒盛\n南通锻压\n银河电子\n宝通科技\n华立股份\n庞大集团\n中国核电\n腾邦国际\n建艺集团\n康强电子\n青岛金王\n荣泰健康\n凯盛科技\n北京利尔\n盈峰环境\n奥 特 迅\n福日电子\n宗申动力\n京东方Ａ\n濮耐股份\n中潜股份\n*ST三维\n中亚股份\n*ST一重\n*ST松江\n京能电力\n江山股份\n综艺股份\n巨化股份\n华媒控股\n洪都航空\n红宇新材\n海思科\n北方华创\n宝泰隆\n中科创达\n思维列控\n安靠智电\n思特奇\n司尔特\n山东矿机\n高德红外\n华脉科技\n凌霄泵业\n新潮能源\n柳州医药\n中顺洁柔\n华能水电\n宏达新材\n祥龙电业\n启迪设计\n南山铝业\n惠伦晶体\n银河磁体\n华锦股份\n中储股份\n良信电器\n中科三环\n碧水源\n红豆股份\n火炬电子\n玉龙股份\n德赛电池\n得邦照明\n巨星科技\n骅威文化\n溢多利\n久远银海\n迪瑞医疗\n国恩股份\n润欣科技\n同和药业\n超华科技\n茂化实华\n钱江水利\n亿通科技\n奥普光电\n联创互联\n海洋王\n海马汽车\n通宇通讯\n青松股份\n曙光股份\n中联重科\n紫光国芯\n陕天然气\n惠威科技\n国星光电\n久之洋\n金城医药\n炼石有色\n三川智慧\n万讯自控\n可立克\n雪迪龙\n三丰智能\n合肥城建\n启明信息\n模塑科技\n东方国信\n海南矿业\n桂冠电力\n博晖创新\n龙溪股份\n宁波建工\n全通教育\n亚振家居\n国信证券\n钢研高纳\n达刚路机\n*ST重钢\n山东钢铁\n恒泰艾普\n维科精华\n经纬纺机\n网宿科技\n吉药控股\n抚顺特钢\n海利尔\n出版传媒\n亚太股份\n荣之联\n珍宝岛\n宁波银行\n星徽精密\n全志科技\n中闽能源\n温州宏丰\n大冷股份\n蓝焰控股\n华体科技\n云天化\n东宝生物\n广济药业\n拓维信息\n科华控股\n中再资环\n泰禾集团\n三德科技\n宏发股份\n运达科技\n川润股份\n博瑞传播\n皖通科技\n湘邮科技\n汇顶科技\n思美传媒\n岱美股份\n沃华医药\n日播时尚\n恒通股份\n精工钢构\n太龙药业\n泰和新材\n昊华能源\n华电能源\n瑞泰科技\n华天酒店\n新黄浦\n许继电气\n渝三峡Ａ\n广安爱众\n安泰集团\n永辉超市\n天保基建\n艾德生物\n能科股份\n东华测试\n宝钛股份\n贵广网络\n盛路通信\n永安药业\n悦心健康\n久立特材\n中润资源\n新联电子\n好想你\n长海股份\n金陵药业\n万集科技\n秦川机床\n佛慈制药\n荣丰控股\n广联达\n诺邦股份\n华灿光电\n东方创业\n坚朗五金\n伟星股份\n新天科技\n金浦钛业\n英特集团\n东方电热\n英洛华\n华光股份\n安科生物\n东软载波\n海王生物\n跃岭股份\n威华股份\n高盟新材\n汉钟精机\n焦点科技\n标准股份\n仁智股份\n海翔药业\n南华生物\n扬帆新材\n瑞丰高材\n乐惠国际\n航天动力\n起步股份\n高新兴\n秦安股份\n特一药业\n路通视信\n诺力股份\n延长化建\n古鳌科技\n中百集团\n赛隆药业\n中国宝安\n南方轴承\n西部材料\n三花智控\n惠博普\nST新梅\n明牌珠宝\n苏州科达\n东方锆业\n建设机械\n天域生态\n富临精工\n龙建股份\n海伦哲\n安徽合力\n中新药业\n皖维高新\n韵达股份\n耀皮玻璃\n海陆重工\n牧高笛\n英搏尔\n数源科技\n金圆股份\n莲花健康\n合力泰\n安泰科技\n中钢天源\n平安银行\n大众公用\n三利谱\n华平股份\n宏达高科\n万通智控\n恒顺众昇\n华铁科技\n传化智联\n东软集团\n国光股份\n同济科技\n天山生物\n晶盛机电\n金信诺\n百润股份\n今天国际\n金龙羽\n天宝食品\n刚泰控股\n*ST普林\n河北宣工\n中航飞机\n海伦钢琴\n惠天热电\n日海通讯\n环球印务\n顶点软件
\n中国卫星\n中宠股份\n世纪华通\n方正电机\n威 尔 泰\n联建光电\n比音勒芬\n禾丰牧业\n陕国投Ａ\n多氟多\n海波重科\n伟隆股份\n创元科技\n赛象科技\n香溢融通\n雅克科技\n宏辉果蔬\n新疆天业\n华丽家族\n长城电工\n坤彩科技\n和佳股份\n山东地矿\n中航电测\n海能达\n太钢不锈\n东方网络\n海鸥股份\n全柴动力\n洲际油气\n德展健康\n广田集团\n科华生物\n厦门钨业\n城市传媒\n利民股份\n嘉麟杰\n同大股份\n联化科技\n国检集团\n中文传媒\n诺德股份\n科陆电子\n天汽模\n章源钨业\n振华科技\nST明科\n金刚玻璃\n红日药业\n沧州明珠\n中鼎股份\n金轮股份\n东方银星\n亚泰国际\n弘亚数控\n财通证券\n松发股份\n嘉诚国际\n兰生股份\n塞力斯\n大北农\n隆华节能\n通用股份\nGQY视讯\n中电电机\n*ST大控\n苏州恒久\n康泰生物\n中科新材\n凯文教育\n天马科技\n紫江企业\n中恒电气\n胜宏科技\n华意压缩\n平煤股份\n宁波中百\n联创光电\n鲁亿通\n恒通科技\n同兴达\n包钢股份\n潞安环能\n杰赛科技\n大冶特钢\n广西广电\n法拉电子\n茶花股份\n道森股份\n得润电子\n粤 传 媒\n清源股份\n天铁股份\n瑞普生物\n三星新材\n东方证券\n银之杰\n金牛化工\n飞亚达Ａ\n蒙草生态\n分众传媒\n孚日股份\n迅游科技\n金麒麟\n江山欧派\n浙富控股\n大金重工\n顺络电子\n隆鑫通用\n中航资本\n广电运通\n华工科技\n华鼎股份\n温氏股份\n科大讯飞\n上海能源\n长鹰信质\n双塔食品\n水星家纺\n勤上股份\n鸣志电器\n方盛制药\n大通燃气\n宁波华翔\n汇金通\n青山纸业\n湖南天雁\n星期六\n美邦服饰\n艾艾精工\n明泰铝业\n星网锐捷\n新宝股份\n中马传动\n宏盛股份\n天顺风能\n博士眼镜\n禾望电气\n至正股份\n钱江摩托\n富瀚微\n天首发展\n鼎龙股份\n秦港股份\n动力源\n天通股份\n甘肃电投\n国盛金控\n江特电机\n远大控股\n澳洋顺昌\n首创股份\n两面针\n宁波东力\n科信技术\n*ST大有\n远光软件\n创兴资源\n格林美\n金钼股份\n佩蒂股份\n东珠景观\n新 和 成\n易事特\n*ST紫学\n置信电气\n武进不锈\n江西长运\n神力股份\n金贵银业\n博通股份\n北矿科技\n安奈儿\n科迪乳业\n红 太 阳\n*ST万里\n硕贝德\n康力电梯\n航天晨光\n冀中能源\n荣华实业\n中央商场\n嘉事堂\n英威腾\n星帅尔\n凯普生物\n斯莱克\n农业银行\n常熟汽饰\n龙蟒佰利\n东方能源\n万里马\n万安科技\n老凤祥\n美锦能源\n永创智能\n一心堂\n新疆众和\n新安股份\n桂发祥\n智慧农业\n松芝股份\n奥翔药业\n海兰信\n高争民爆\n郑煤机\n远 望 谷\n长春燃气\n酒钢宏兴\n世名科技\n中航沈飞\n乾景园林\n正业科技\n爱尔眼科\n香梨股份\nST信通\n英唐智控\n大庆华科\n中国科传\n利群股份\n上海凤凰\n振华股份\n博威合金\n盛洋科技\n美尚生态\n华正新材\n世运电路\n圣龙股份\n海特高新\n冠福股份\n键桥通讯\n硅宝科技\n罗顿发展\n汇纳科技\n海联金汇\n株冶集团\n苏宁云商\n大连友谊\n金岭矿业\n华测检测\n连云港\n和科达\n京新药业\n国泰集团\n合纵科技\n通光线缆\n方大炭素\n安科瑞\n怡球资源\n国创高新\n海 利 得\n菲利华\n银宝山新\n北新路桥\n电魂网络\n威创股份\n诚益通\n世嘉科技\n搜于特\n威海广泰\n市北高新\n美晨生态\n鼎汉技术\n江南嘉捷\n安 纳 达\n通威股份\n亚星锚链\n迪生力\n深 赛 格\n*ST墨龙\n园城黄金\n雷迪克\n浙江永强\n兆丰股份\n九华旅游\n威龙股份\n濮阳惠成\nST仰帆\n渤海股份\n普丽盛\n蓝丰生化\n卫星石化\n天和防务\n南 京 
港\n景峰医药\n石化机械\n天舟文化\n金桥信息\n盈方微\n耐威科技\n亿联网络\n博创科技\n南钢股份\n超声电子\nST山水\n中油资本\n棕榈股份\n正元智慧\n日科化学\n号百控股\n华荣股份\n劲拓股份\n海信电器\n天士力\n电连技术\n巨力索具\n鞍钢股份\n同益股份\n泰晶科技\n格尔软件\n恒源煤电\n北方导航\n赛意信息\n华银电力\n横河模具\n博腾股份\n永清环保\n英飞特\n长青股份\n德艺文创\n三晖电气\n劲嘉股份\n联得装备\n金诚信\n保变电气\n中信国安\n昊志机电\n凯众股份\n纳尔股份\n天宇股份\n卓翼科技\n京能置业\n好莱客\n新华锦\n正泰电器\n吉华集团\n兴森科技\n视源股份\n神州易桥\n同为股份\n*ST圣莱\n云海金属\n泰山石油\n沃尔核材\n马钢股份\n海天精工\n沪宁股份\n誉衡药业\n正海磁材\n恒润股份\n美年健康\n全信股份\n康弘药业\n高澜股份\n正裕工业\n辰欣药业\n神农基因\n大理药业\n卫光生物\n阳煤化工\n赢合科技\n金太阳\n睿能科技\n英派斯\n氯碱化工\n百川股份\n韶能股份\n启迪桑德\n雷科防务\n上海洗霸\n世纪天鸿\n先锋新材\n光大嘉宝\n中科电气\n超讯通信\n国电南瑞\n快乐购\n深大通\n华升股份\n优德精密\n四通新材\n富满电子\n亚玛顿\n依顿电子\n碳元科技\n三祥新材\n百傲化学\n九鼎新材\n中利集团\n杉杉股份\n哈三联\n基蛋生物\n美克家居\n新宏泰\n西仪股份\n华控赛格\n航天科技\n金财互联\n杭州高新\n斯太尔\n友邦吊顶\n荣晟环保\n新奥股份\n中孚实业\n大参林\n当升科技\n中青旅\n宝莫股份\n太阳电缆\n东华能源\n如通股份\n苏博特\n浙江龙盛\n信立泰\n上海天洋\n浦发银行\n广宇发展\n亚光科技\n飞鹿股份\n晨化股份\n深南电A\n聚光科技\n法兰泰克\n中公高科\n新能泰山\n三木集团\n力盛赛车\n*ST中安\n海顺新材\n联泰环保\n大连热电\n中国中期\n鹏鹞环保\n皖通高速\n天奇股份\n君禾股份\n宁波韵升\n益盛药业\n新易盛\n精功科技\n贝因美\n东方园林\n西山煤电\n光莆股份\n焦作万方\n佳创视讯\n三夫户外\n汇嘉时代\n美盈森\n鹏辉能源\n绝味食品\n博天环境\n铁汉生态\n百洋股份\n通达动力\nTCL 
集团\n兆新股份\n中金黄金\n美思德\n伟星新材\n拓邦股份\n三江购物\n东方市场\n高新发展\n寿仙谷\n龙洲股份\n金达威\n永兴特钢\n天华院\n中兵红箭\n农尚环境\n宏达股份\n海得控制\n中材节能\n维格娜丝\n和晶科技\n浙江东日\n天龙集团\n广信股份\n大丰实业\n岳阳兴长\n恒锋信息\n中核科技\n泰禾光电\n福晶科技\n双林股份\n先进数通\n五矿稀土\n均胜电子\n富邦股份\n东旭蓝天\n厚普股份\n开能环保\n长春一东\n中天科技\n金域医学\n威星智能\n金能科技\n华峰氨纶\n合力科技\n麦迪电气\n欧比特\n亚威股份\n中金岭南\n中国出版\n丹邦科技\n爱司凯\n开立医疗\n深桑达Ａ\n华阳集团\n至纯科技\n深圳新星\n乐歌股份\n朗博科技\n阳普医疗\n天孚通信\n金风科技\n金洲管道\n康惠制药\n熊猫金控\n新光药业\n盛屯矿业\n太辰光\n江中药业\n秋林集团\n富瑞特装\n恒华科技\n方大特钢\n兴业矿业\n八一钢铁\n容大感光\n宝馨科技\n露笑科技\n天海防务\n晶瑞股份\n川金诺\n上海亚虹\n亿纬锂能\n罗莱生活\n贵研铂业\n百达精工\n深冷股份\n锌业股份\n创业环保\n振芯科技\n尔康制药\n鄂尔多斯\n电光科技\n新筑股份\n雅百特\n北方稀土\n山东黄金\n瑞丰光电\n穗恒运Ａ\n新疆火炬\n湘油泵\n龙蟠科技\n移为通信\n康德莱\n美力科技\n辉丰股份\n捷荣技术\n金发科技\n嘉凯城\n安凯客车\n藏格控股\n万里扬\n雄帝科技\n诚邦股份\n新通联\n东尼电子\n北巴传媒\n醋化股份\n万向钱潮\n广东榕泰\n奥士康\n口子窖\n景旺电子\n创源文化\n*ST弘高\n西部资源\n金卡智能\n熙菱信息\n佐力药业\n飞凯材料\n省广股份\n天赐材料\n普利特\n四方精创\n欧普康视\n完美世界\n创业黑马\n赤峰黄金\n蓝帆医疗\n北方股份\n普洛药业\n天际股份\n恒邦股份\n石英股份\n新宙邦\n浪莎股份\n上海贝岭\n翰宇药业\n韶钢松山\n盐津铺子\n设计总院\n森霸股份\n开尔新材\n红星发展\n乐通股份\n重庆燃气\n中广核技\n新宏泽\n戴维医疗\n鹏欣资源\n东方中科\n晨光生物\n麦迪科技\n日机密封\n德赛西威\n上海钢联\n有研新材\n华通医药\n凌钢股份\n依米康\n地尔汉宇\n北讯集团\n三钢闽光\n帝王洁具\n快意电梯\n正海生物\n中国巨石\n大千生态\n康达新材\n恒顺醋业\n经纬电材\n中大力德\n皇马科技\n洪汇新材\n横店东磁\n超频三\n新天药业\n先锋电子\n江粉磁材\n大族激光\n新坐标\n南极电商\n森远股份\n安阳钢铁\n台华新材\n蓝海华腾\n中材科技\n朗科科技\n金鸿顺\n歌尔股份\n通合科技\n智能自控\n纵横通信\n华铭智能\n中油工程\n达安股份\n银星能源\n翔鹭钨业\n大立科技\n永东股份\n凯发电气\n永安林业\n春风动力\n空港股份\n星网宇达\n中捷资源\n武汉凡谷\n伊之密\n长江通信\n南国置业\n常宝股份\n江龙船艇\n鲁北化工\n盛讯达\n丝路视觉\n美格智能\n新劲刚\n阿石创\n银江股份\n金银河\n国脉科技\n蒙娜丽莎\n豪能股份\n必创科技\n辅仁药业\n国科微\n泰嘉股份\n中船科技\n北化股份\n大烨智能\n赣能股份\n中通国脉\n中设股份\n梅轮电梯\n天顺股份\n勘设股份\n富煌钢构\n西陇科学\n华大基因\n英 力 
特\n宝利国际\n恒林股份\n新凤鸣\n海川智能\n联诚精密\n天齐锂业\n金雷风电\n*ST新赛\n光威复材\n中环装备\n大博医疗\n金溢科技\n正川股份\n华源控股\n雅化集团\n康旗股份\n罗平锌电\n华锋股份\n德创环保\n红相电力\n双环科技\n晨丰科技\n浙商中拓\n宇顺电子\n神火股份\n中兴通讯\n珀莱雅\n中颖电子\n捷捷微电\n生益科技\n昭衍新药\n中天能源\n广哈通信\n兴齐眼药\n汇金股份\n广和通\n长春高新\n春秋电子\n联合光电\n亨通光电\n延江股份\n光明地产\n金瑞矿业\n智动力\n长盛轴承\n昇兴股份\n洲明科技\n友讯达\n中广天择\n*ST东数\n荣盛石化\n东宏股份\n华森制药\n索通发展\n英维克\n西泵股份\n宏达电子\n闻泰科技\n东方嘉盛\n湖南黄金\n安洁科技\n莱绅通灵\n杭州园林\n贝瑞基因\n银龙股份\n华凯创意\n一品红\n国光电器\n中环环保\n欧菲科技\n高科石化\n意华股份\n威唐工业\n新国都\n茂硕电源\n光库科技\n澄天伟业\n精研科技\n剑桥科技\n璞泰来\n韦尔股份\n跨境通\n天成自控\n水晶光电\n喜临门\n博迈科\n天安新材\n信隆健康\n江丰电子\n高斯贝尔\n美都能源\n立讯精密\n普莱柯\n东杰智能\n盛达矿业\n新经典\n江苏索普\n金辰股份\n扬农化工\n新晨科技\n和顺电气\n旭升股份\n和胜股份\n润禾材料\n北京君正\n莱克电气\n建研院\n金石资源\n东百集团\n金杯汽车\n同德化工\n英联股份\n伊戈尔\n光弘科技\n拉夏贝尔\n盛弘股份\n苏奥传感\n迪贝电气\n赛腾股份\n佳力图\n爱柯迪\n赣锋锂业\n广东骏亚\n丽岛新材\n东方材料\n泰瑞机器\n大业股份\n上海新阳\n国芳集团\n盘龙药业\n润都股份\n长川科技\n科创信息\n冀凯股份\n吉大通信\n湖北宜化\n铭普光磁\n安图生物\n银都股份\n九典制药\n亚士创能\n万隆光电\n振江股份\n晨曦航空\n西藏珠峰\n祥和实业\n华信新材\n凯莱英\n立昂技术\n陇神戎发\n鲁抗医药\n亚翔集成\n科创新源\n维业股份\n潜能恒信\n贝肯能源\n阳谷华泰\n畅联股份\n众生药业\n百利电气\n宇环数控\n阿科力\n白银有色\n士兰微\n易明医药\n*ST众和\n方大集团\n中科信息\n张家港行\n双一科技\n好太太\n索菱股份\n集泰股份\n川恒股份\n洛阳钼业\n汇金科技\n原尚股份\n晶华新材\n佛燃股份\n百华悦邦\n英科医疗\n洛凯股份\n*ST佳电\n三孚股份\n中曼石油\n*ST德力\n建科院\n康普顿\n*ST中富\n香飘飘\nST保千里\n安达维尔\n盛和资源\n德生科技\n永福股份\n海特生物\n金奥博\n新余国科\n信维通信\n深康佳Ａ\n国立科技\n科恒股份\n风华高科\n万马科技\n华通热力\n扬杰科技\n弘信电子\n西菱动力\n名臣健康\n科蓝软件\n山东赫达\n保隆科技\n贵州燃气\n皇台酒业\n南纺股份\n顺威股份\n乐视网\n豫金刚石\n太龙照明\n海达股份\n步森股份\n成都银行\n*ST昆机\n*ST吉恩\n御家汇\n明阳电路\n华西证券\n*ST建峰\n*ST钒钛\n*ST烯碳\n嘉友国际\n中源家居\n淳中科技\n南都物业\n养元饮品\nST网力\n天风证券\n沪硅产业\n新乳业\n山鹰国际\n湘佳股份\n明德生物\n新强联\n东阳光\n中建环能\n东方盛虹\n河钢资源\n达刚控股\n青松建化\n*ST熊猫\n宁德时代\n*ST宏图\n上海凯鑫\n科拓生物\n贝仕达克\n时空科技\n华峰铝业\n泰禾智能\n聚合顺\n首航高科\n江苏租赁\n鼎胜新材\n蔚蓝生物\n*ST联络\n双林生物\n欧菲光\n天味食品\n吉翔股份\n长虹华意\n长源东谷\n天润工业\n*ST梦舟\n*ST中南\n中贝通信\n瀚川智能\n弘高创意\n中国电研\n海晨股份\n普元信息\n京粮控股\n米奥会展\n苏州龙杰\n安道麦A\n成都燃气\n*ST金正\n硕世生物\n上海瀚讯\n公牛集团\n凯赛生物\n森麒麟\n雷曼光电\n*ST大晟\n帝尔激光\n*ST济堂\n红相股份\n凯迪退\n城地香江\n南兴股份\n妙可蓝多\n宏和科技\n圣济堂\n中盐化工\n*ST藏格\n华软科技\n南  
玻Ａ\n长城证券\n帅丰电器\n上机数控\n品渥食品\n协和电子\n顺利办\n奥特维\n*ST界龙\n三美股份\n广联航空\n爱美客\n华特气体\n联创股份\n青农商行\n钢研纳克\n倍加洁\n丰山集团\n中国通号\n中粮资本\n睿创微纳\n宇晶股份\n奥海科技\n杭可科技\n东岳硅材\n锦江酒店\n罗博特科\n银泰黄金\n易天股份\n百亚股份\n*ST劝业\n传音控股\n苏农银行\nST华嵘\n罗欣药业\nST冠福\n佳禾智能\n*ST众泰\n中科软\n青岛银行\n甬金股份\n众望布艺\n瑞联新材\n浙江力诺\n海信视像\n爱旭股份\n福光股份\n京沪高铁\n申昊科技\n美畅股份\n甘源食品\n天箭科技\n国新健康\n国茂股份\n竞业达\n今创集团\n科瑞技术\n甘咨询\nST浩源\n久量股份\n创世纪\n*ST奋达\nST新海\n*ST天娱\n锦泓集团\n阿拉丁\n良信股份\n*ST赫美\n伟思医疗\n睿智医药\n若羽臣\n蓝盾光电\n中铁装配\nST安泰\n每日互动\n科达制造\n华铁应急\n金宏气体\n麦克奥迪\n帝科股份\n汉嘉设计\n*ST东电\n永新光学\n天融信\n奥来德\n阿尔特\n我爱我家\n*ST江泉\n*ST湘电\n飞亚达\n五方光电\n鸿合科技\n*ST同洲\n*ST安通\n保力新\n国华网安\n海星股份\n智莱科技\nST宇顺\nST沪科\n中微公司\n*ST宜生\n龙腾光电\n*ST华塑\n天智航\n和远气体\nST通葡\nST厦华\n中天火箭\nST地矿\n*ST鼎龙\n中船应急\n祥鑫科技\nST中捷\nST中安\n迈得医疗\n金科环境\n奥普家居\n冠盛股份\n昊海生科\n微芯生物\n城建发展\n德恩精工\n天准科技\n当虹科技\n中国一重\n石头科技\n天山铝业\n侨银环保\n凯撒旅业\n凯迪股份\n福莱特\n*ST西发\n*ST力帆\n思瑞浦\n山大地纬\n欣锐科技\n海目星\n孚能科技\n力合科技\n长阳科技\n科思股份\n光正眼科\n中国广核\n光峰科技\nST摩登\n安克创新\n爱朋医疗\nST安凯\n运达股份\n*ST华仪\n广大特材\n大洋生物\n*ST胜利\n绿的谐波\n迈瑞医疗\n安恒信息\n晨光新材\n长城科技\n朝阳科技\n太空智造\n金春股份\n*ST金洲\n渤海租赁\n交大思诺\n吉贝尔\n华丰股份\n百邦科技\n南京证券\nST中基\n昂立教育\n亿华通\n三泰控股\n仙乐健康\n雷赛智能\n电声股份\n科威尔\n*ST麦趣\n*ST海华\nST巴士\n广电计量\n福然德\n*ST中天\n中泰证券\n华夏航空\n大智慧\n红塔证券\n*ST中昌\n威胜信息\n晶丰明源\n奥福环保\n国联股份\n国网英大\n沪光股份\n日久光电\nST昌鱼\nST瑞德\n福蓉科技\n映翰通\n汉宇集团\n康辰药业\n首都在线\n三盛教育\n惠程科技\n先惠技术\n龙磁科技\n科德教育\n捷佳伟创\n雪龙集团\n天合光能\n卓胜微\n*ST林重\nST柳化\n郑州银行\n立昂微\n*ST聚力\n宝丽迪\n贵州轮胎\n华神科技\nST华鼎\n姚记科技\n固德威\n*ST盐湖\n亚联发展\n*ST天润\n*ST东科\n山东玻纤\n*ST中新\n博汇股份\nST游久\n嘉元科技\n恒银科技\n谱尼测试\n派克新材\n*ST经开\nST宏盛\n铁岭新城\n*ST环球\n万德斯\n筑博设计\n申联生物\n中天精装\nST德豪\n天元股份\n*ST时万\n万泰生物\n国瑞科技\n岭南股份\n淮河能源\n晶澳科技\n新产业\n锦浪科技\n*ST华映\n*ST友谊\n特宝生物\n中信出版\n华东数控\n长飞光纤\n药明康德\n晶晨股份\n优彩资源\n旭光电子\n豪悦护理\n天宜上佳\n路德环境\n中达安\n利通电子\n迈为股份\nST圣莱\n锦和商业\n中国外运\n捷强装备\n冰山冷热\n锐科激光\n地铁设计\n新媒股份\n数知科技\n上能电气\n克劳斯\n迪普科技\n金博股份\n祥生医疗\n卡倍亿\n卓越新能\n五洲特纸\n迪威尔\nST金刚\n国林科技\n仲景食品\n柏楚电子\n*ST新光\n长沙银行\n威派格\n天正电气\nST凯瑞\n*ST飞乐\n嘉必优\n宏柏新材\n豪美新材\n当代文体\n张  
裕Ａ\n大胜达\n百奥泰\n指南针\n奥美医疗\n澜起科技\nST云投\n七一二\nC海融\n亚普股份\n越博动力\n华民股份\n宸展光电\nST抚钢\n中迪投资\n飞龙股份\n*ST升达\n美瑞新材\n仕佳光子\n*ST大洲\n中铝国际\n通达电气\n*ST海陆\n佰奥智能\n*ST金鸿\n春光科技\n南新制药\nST电能\n淮北矿业\n*ST金钰\n爱博医疗\n南大环境\n海越能源\n日月明\n浙海德曼\n东鹏控股\n松霖科技\n宇新股份\n中电兴发\n金力永磁\n开普检测\nST罗普\n*ST欧浦\n国网信通\n三友医疗\n三角防务\nC亿田\n芯朋微\n西麦食品\n稳健医疗\n中岩大地\n*ST海创\n宇信科技\n容百科技\n杰普特\n锦鸡股份\n小熊电器\n八亿时空\n华辰装备\n振德医疗\n中芯国际\n国联证券\n寒武纪\n*ST刚泰\n*ST拉夏\n佰仁医疗\n*ST美讯\n宝丰能源\n艾可蓝\n锦盛新材\n*ST皇台\n博瑞医药\n恒实科技\nST中葡\n招商港口\n*ST秦机\n德力股份\n鲁商发展\n铁科轨道\n天地数码\n泰永长征\n万华化学\n恒力石化\n*ST东洋\n科翔股份\n德方纳米\n高测股份\n芯原股份\n敏芯股份\n铜牛信息\n帝欧家居\n中科星图\nST禾盛\n紫光国微\n*ST中华A\n明新旭腾\n大宏立\n松炀资源\n鹏鼎控股\n*ST成城\n金山办公\n仁东控股\n乐鑫科技\n领益智造\n招商南油\n拉卡拉\n盛德鑫泰\n三达膜\n长鸿高科\n交建股份\n回盛生物\n苏盐井神\n*ST大港\n福能东方\n*ST安信\n燕麦科技\n柯力传感\n*ST德奥\n新智认知\nST猛狮\n吉峰科技\n华致酒行\n巴比食品\n深信服\nST椰岛\n金石亚药\n日海智能\nST天成\n宏力达\n中新集团\n*ST雪莱\n金富科技\n*ST华讯\n捷昌驱动\nST狮头\nST天龙\n厦门象屿\n八方股份\n爱丽家居\n均瑶健康\n大为股份\n泰和科技\n麒盛科技\n四会富仕\n招商积余\n瑞松科技\n苑东生物\n大地熊\n航天宏图\n*ST融捷\n玉禾田\n立华股份\n龙利得\n居然之家\n天下秀\n芯源微\n致远互联\n大东海A\n贝斯美\n君实生物\n圣湘生物\n渝农商行\n安宁股份\n珠海中富\n华创阳安\nST金花\n大东南\n*ST永泰\nST威龙\n日辰股份\n四方科技\nST国重装\n斯达半导\n旗天科技\n建龙微纳\n洁特生物\n心脉医疗\n奇安信\nST坊展\n英杰电气\n复洁环保\n*ST节能\n豆神教育\n锐新科技\n泉阳泉\n友发集团\n健之佳\nST金泰\n七彩化学\n汇创达\n北汽蓝谷\n*ST银河\n*ST天夏\n*ST永林\n和佳医疗\n川能动力\n派生科技\n兴图新科\n昂利康\n新诺威\n*ST富控\n航天彩虹\n攀钢钒钛\n青岛中程\n*ST交昂\n*ST康得\n开能健康\n*ST联合\n鲁  泰Ａ\n重药控股\n直真科技\n惠发食品\n矩子科技\n泛亚微透\n图南股份\n海能实业\n*ST中珠\n翔丰华\n*ST群兴\n瑞晟智能\n*ST科陆\n力鼎光电\n中国中免\n国光连锁\n珈伟新能\n海容冷链\nST人乐\n中信特钢\n法狮龙\n澳弘电子\n天臣医疗\n奥赛康\n慧辰资讯\n北摩高科\n华阳国际\nST仁智\nST索菱\n景津环保\n科安达\n东方环宇\n新洁能\n恒铭达\n中科海讯\n瑞达期货\n晶科科技\nST乐凯\n海航科技\n建霖家居\n中胤时尚\n亚世光电\n国安达\n国盛智科\n爱克股份\n中山金马\n*ST博信\n芒果超媒\n长城军工\n上纬新材\n唐源电气\n西部超导\n苏宁易购\n地素时尚\nST天圣\n金雷股份\n丹化科技\n前沿生物\n华润微\n万顺新材\n辽宁能源\n中信建投\n巨星农牧\nST中孚\n万通发展\n*ST科林\n中国卫通\nTCL科技\n隆利科技\nST舍得\n万胜智能\n启迪环境\n圣元环保\n*ST雅博\n赛摩智能\n金冠股份\nST创兴\n有友食品\n安徽建工\n耐普矿机\n双飞股份\n浩洋股份\n北元集团\n卧龙电驱\n彤程新材\n力合微\n中密控股\n*ST瀚叶\n宏川智慧\n奕瑞科技\n迦南智能\n华图山鼎\n海象新材\n文灿股份\n*ST夏利\n声迅股份\n东来技术\nST庞大\n江苏新能\n安集科技\n工业富联\n联瑞新材\nST天雁\n国新文化\nST长投\n秦川物联\n*ST胜尔\n蒙泰高新\n华兴源创\n*ST贵人\n松井股份\n渤海汽车\n中谷物流\n柳    工\n蓝特光学\n新城市\n金达莱\n卓易信息\n伯特利  
\n浙矿股份\n苏州银行\n泽达易盛\n五洋停车\n航锦科技\n*ST北能\n尚纬股份\n*ST商城\n*ST银亿\n长江健康\n金现代\n雪天盐业\n贵州三力\n科沃斯\n松原股份\n康平科技\n湘财股份\n天禾股份\n锐明技术\n瑞鹄模具\n*ST北讯\n顺钠股份\n绿色动力\n昇辉科技\n德马科技\n熊猫乳品\nST八菱\n金龙鱼\n中公教育\n越剑智能\n嘉美包装\n中金公司\nST东网\n赛轮轮胎\n伟时电子\n*ST晨鑫\n紫天科技\n中创环保\n汇得科技\n保利联合\n财富趋势\n博杰股份\n盈康生命\n三峰环境\n壶化股份\n普门科技\n有方科技\n北鼎股份\n*ST蓝丰\n因赛集团\n佳云科技\n豪森股份\n国盾量子\n兴瑞科技\n泽璟制药\n科前生物\n天地在线\n恒久科技\n密尔克卫\n中国人保\n左江科技\n良品铺子\n三只松鼠\n彩讯股份\n新疆交建\n华盛昌\n*ST当代\n天迈科技\n昊华科技\n京源环保\n同庆楼\n永兴材料\n威尔药业\n龙软科技\n瑞玛工业\n蠡湖股份\nST天首\n*ST荣华\n郑中设计\n恩捷股份\n华光环能\nST毅昌\n德林海\n芯海科技\n大悦城\n宝兰德\n*ST信通\n鸿远电子\n亿嘉和\nST百花\n震安科技\n博汇科技\n天奈科技\n豪尔赛\n江航装备\n久日新材\n亚钾国际\n*ST中商\n欧陆通\n狄耐克\n粤桂股份\n长虹美菱\n苏美达\n南亚新材\n芯能科技\n顺博合金\n光云科技\n*ST九有\n锋尚文化\n中嘉博创\n康希诺\n康龙化成\n*ST高升\n顶固集创\nST运盛\n济南高新\n葫芦娃\n新天绿能\n天普股份\n经纬辉开\n沃格光电\n特  力Ａ\n宝明科技\nST毅达\n森霸传感\n青岛港\n维信诺\n西安银行\n科博达\n永冠新材\n博睿数据\n鸿泉物联\n*ST天马\n青鸟消防\n新化股份\n赛特新材\n三人行\n正帆科技\n佳发教育\n神州细胞\n深南股份\n蓝黛科技\nST宜化\n紫金银行\n*ST长城\n*ST康盛\n奥园美谷\n润建股份\n欣贺股份\n金田铜业\n江苏北人\n准油股份\n甘李药业\n埃夫特\n优刻得\n凌志软件\n利扬芯片\n荣联科技\n威奥股份\n佳电股份\n康泰医学\n德利股份\n*ST飞马\n华文食品\n*ST利源\n中粮科技\n*ST恒康\n长华股份\n金时科技\n大叶股份\n中国海防\n交控科技\n华林证券\n派瑞股份\n美迪西\n艾力斯\n格林达\n博深股份\n热景生物\n创源股份\n神驰机电\n新亚强\nST云网\n佳华科技\n*ST辉丰\nST生物\n海南发展\n惠云钛业\n山东墨龙\n维康药业\n*ST江特\n东珠生态\n海信家电\n博通集成\n方邦股份\n*ST海源\nST远程\n美吉姆\n丽江股份\n国城矿业\n海油发展\n天阳科技\n震有科技\n新兴装备\n朗进科技\n万  
科Ａ\n赛科希德\n酷特智能\nST罗顿\n华熙生物\n建科机械\nST尤夫\n万里股份\n*ST斯太\n惠城环保\n重庆钢铁\n雅运股份\n华翔股份\n安必平\n兰剑智能\n京北方\n华菱钢铁\n*ST敦种\n华业香料\n大有能源\n京基智农\n*ST目药\n康华生物\n海昌新材\n中航西飞\n南华期货\n金海高科\n福昕软件\n维科技术\n九洲集团\n*ST实达\n艾迪药业\n华峰测控\n上海沿浦\n*ST亚振\n世华科技\n山科智能\n值得买\n华达新材\n洪通燃气\n道通科技\n拱东医疗\n泉峰汽车\n测绘股份\n丸美股份\n中信博\n天利科技\n西域旅游\n锋龙股份\nST科迪\n科思科技\n神农科技\n铂科新材\n捷安高科\n共创草坪\n胜蓝股份\n紫晶存储\n中船汉光\n瑞丰新材\n瑞芯微\n日丰股份\n丽人丽妆\n壹网壹创\n赛微电子\n深粮控股\n移远通信\n聚辰股份\n杰美特\n延安必康\n华培动力\n起帆电缆\n和顺石油\nST南风\n*ST金贵\nST昌九\n沃尔德\n盟升电子\n创业慧康\n中光学\n厦门银行\n仙鹤股份\n东方生物\n融捷健康\n步科股份\n安博通\n奥锐特\n芯瑞达\n邮储银行\n中控技术\n爱婴室\n赛伍技术\n三六零\n华光新材\n协鑫能科\n*ST乐通\n泰坦科技\n志邦家居\n山石网科\n建业股份\nST步森\n览海医疗\n炼石航空\n神马电力\n盛达资源\n联赢激光\nST亚邦\n*ST辅仁\n中创物流\nST亚星\n柳药股份\n福达合金\nST双环\n垒知集团\n省广集团\n新金路\n盛视科技\n泸天化\n虹软科技\n*ST围海\n一汽解放\nST岩石\n铂力特\n郑州煤电\n紫光学大\n金丹科技\n科远智慧\nST康美\n*ST盈方\n斯迪克\nST海马\n中银证券\n中简科技\n华宝股份\n*ST勤上\n聆达股份\n会通股份\n甘化科工\n键凯科技\n盛新锂能\n天奥电子\n元利科技\n华闻集团\n海尔智家\n万林物流\n华设集团\n*ST兆新\n莱伯泰科\n神工股份\n*ST六化\n东亚药业\nST加加\n泰林生物\n协创数据\n宇瞳光学\n海鸥住工\n南微医学\nC朗特\n赛诺医疗\n澳洋健康\n*ST长动\n宁水集团\n成都先导\n云涌科技\n城发环境\n天津普林\n海尔生物\n复旦张江\n皖仪科技\n山西路桥\n富祥药业\n开普云\n恒誉环保\n隆华科技\n聚杰微纤\n浙商银行\n新赛股份\n佛燃能源\n清溢光电\n明阳智能\n新大正\n新光光电\n富通鑫茂\n天邑股份\n三生国健\n新农股份\n海峡创新\n新致软件\n兆威机电\n海融科技\n确成股份\nC兆龙\n联泓新科\n朗特智能\nC凯龙\n博迁新材\nC润阳\n同兴环保\n西上海\nC研奥\n塞力医疗\n特发服务\n*ST中孚\n*ST鑫科\n派能科技\n舒华体育\n明微电子\n启迪药业\n蔚蓝锂芯\n明冠新材\n国机精工\n健麾信息\n鼎通科技\n三旺通信\n晋控电力\n悦康药业\n晋控煤业\n东贝集团\n伟创电气\nC天秦\n开元教育\n中伟股份\n一鸣食品\n思进智能\n华旺科技\n欧科亿\n振邦智能\n杭华股份\n彩虹集团\n南山智尚\n山西焦煤\n亿田智能\n科兴制药\n恒玄科技\n中晶科技\n立方制药\n南凌科技\n吉大正元\n航亚科技\n森林包装\n福立旺\n汉马科技\n通源环境\n兆龙互连\n星徽股份\n凯龙高科\n西大门\n侨银股份\n华峰化学\n研奥股份\nC法本\n奥普特\n润阳科技\nC火星人\n远东股份\n天秦装备\n鹏都农牧\n天原股份\n"
  },
  {
    "path": "legacy_v1/src/Leorio/tokenization.py",
    "content": "import __init__\n\nfrom Kite.database import Database\nfrom Kite import config\nfrom Kite import utils\n\nimport jieba\nimport pkuseg\nimport logging\n\nlogging.basicConfig(level=logging.INFO,\n                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n                    datefmt='%a, %d %b %Y %H:%M:%S')\n\n\nclass Tokenization(object):\n\n    def __init__(self, import_module=\"jieba\", user_dict=None, chn_stop_words_dir=None):\n        #self.database = Database().conn[config.DATABASE_NAME]  #.get_collection(config.COLLECTION_NAME_CNSTOCK)\n        self.database = Database()\n        self.import_module = import_module\n        self.user_dict = user_dict\n        if self.user_dict:\n            self.update_user_dict(self.user_dict)\n        if chn_stop_words_dir:\n            self.stop_words_list = utils.get_chn_stop_words(chn_stop_words_dir)\n        else:\n            self.stop_words_list = list()\n\n    def update_user_dict(self, old_user_dict_dir, new_user_dict_dir=None):\n        # 将缺失的(或新的)股票名称、金融新词等，添加进金融词典中\n        word_list = []\n        with open(old_user_dict_dir, \"r\", encoding=\"utf-8\") as file:\n            for row in file:\n                word_list.append(row.split(\"\\n\")[0])\n        name_code_df = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                              config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                              keys=[\"name\", \"code\"])\n        new_words_list = list(set(name_code_df[\"name\"].tolist()))\n        for word in new_words_list:\n            if word not in word_list:\n                word_list.append(word)\n        new_user_dict_dir = old_user_dict_dir if not new_user_dict_dir else new_user_dict_dir\n        with open(new_user_dict_dir, \"w\", encoding=\"utf-8\") as file:\n            for word in word_list:\n                file.write(word + \"\\n\")\n\n    def cut_words(self, text):\n        outstr = 
list()\n        sentence_seged = None\n        if self.import_module == \"jieba\":\n            if self.user_dict:\n                jieba.load_userdict(self.user_dict)\n            sentence_seged = list(jieba.cut(text))\n        elif self.import_module == \"pkuseg\":\n            seg = pkuseg.pkuseg(user_dict=self.user_dict)  # 添加自定义词典\n            sentence_seged = seg.cut(text)  # 进行分词\n        if sentence_seged:\n            for word in sentence_seged:\n                if word not in self.stop_words_list \\\n                        and word != \"\\t\" \\\n                        and word != \" \" \\\n                        and utils.is_contain_chn(word)\\\n                        and len(word) > 1:\n                    outstr.append(word)\n            return outstr\n        else:\n            return False\n\n    def find_relevant_stock_codes_in_article(self, article, stock_name_code_dict):\n        stock_codes_set = list()\n        cut_words_list = self.cut_words(article)\n        if cut_words_list:\n            for word in cut_words_list:\n                try:\n                    stock_codes_set.append(stock_name_code_dict[word])\n                except Exception:\n                    pass\n        return list(set(stock_codes_set))\n\n    def update_news_database_rows(self,\n                                  database_name,\n                                  collection_name,\n                                  incremental_column_name=\"RelatedStockCodes\"):\n        name_code_df = self.database.get_data(config.STOCK_DATABASE_NAME,\n                                              config.COLLECTION_NAME_STOCK_BASIC_INFO,\n                                              keys=[\"name\", \"code\"])\n        name_code_dict = dict(name_code_df.values)\n        data = self.database.get_collection(database_name, collection_name).find()\n        for row in data:\n            # if row[\"Date\"] > \"2019-05-20 00:00:00\":\n            # 在新增数据中，并不存在更新列，但是旧数据中已存在更新列，因此需要\n        
    # 判断数据结构中是否包含该incremental_column_name字段\n            if incremental_column_name not in row.keys():\n                related_stock_codes_list = self.find_relevant_stock_codes_in_article(\n                                             row[\"Article\"], name_code_dict)\n                self.database.update_row(database_name,\n                                         collection_name,\n                                         {\"_id\": row[\"_id\"]},\n                                         {incremental_column_name: \" \".join(related_stock_codes_list)}\n                                         )\n                logging.info(\"[{} -> {} -> {}] updated {} key value ... \"\n                             .format(database_name, collection_name, row[\"Date\"], incremental_column_name))\n            else:\n                logging.info(\"[{} -> {} -> {}] has already existed {} key value ... \"\n                             .format(database_name, collection_name, row[\"Date\"], incremental_column_name))\n\n\nif __name__ == \"__main__\":\n    tokenization = Tokenization(import_module=\"jieba\",\n                                user_dict=\"financedict.txt\",\n                                chn_stop_words_dir=\"chnstopwords.txt\")\n    # documents_list = \\\n    #     [\n    #         \"中央、地方支持政策频出,煤炭行业站上了风口 券商研报浩如烟海，投资线索眼花缭乱，\\\n    #         第一财经推出《一财研选》产品，挖掘研报精华，每期梳理5条投资线索，便于您短时间内获\\\n    #         取有价值的信息。专业团队每周日至每周四晚8点准时“上新”，助您投资顺利！\",\n    #         \"郭文仓到重点工程项目督导检查 2月2日,公司党委书记、董事长、总经理郭文仓,公司董事,\\\n    #         股份公司副总经理、总工程师、郭毅民,股份公司副总经理张国富、柴高贵及相关单位负责人到\\\n    #         焦化厂煤场全封闭和干熄焦等重点工程项目建设工地督导检查施工进度和安全工作情况。\"\n    #     ]\n    # for text in documents_list:\n    #     cut_words_list = tokenization.cut_words(text)\n    #     print(cut_words_list)\n    # tokenization.update_news_database_rows(config.DATABASE_NAME, \"jrj\")\n"
  },
  {
    "path": "legacy_v1/src/Leorio/topicmodelling.py",
    "content": "import __init__\n\nimport os\nimport time\n\nfrom Kite import config\nfrom Kite import utils\nfrom Kite.database import Database\nfrom Leorio.tokenization import Tokenization\nfrom Hisoka.classifier import Classifier\n\nfrom sklearn import preprocessing\n\nfrom gensim import corpora\nfrom gensim import models\nfrom gensim.matutils import corpus2dense\n\nimport logging\nlogging.basicConfig(level=logging.INFO,\n                    format=\"%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s\",\n                    datefmt=\"%a, %d %b %Y %H:%M:%S\")\n\n\nclass TopicModelling(object):\n\n    def __init__(self):\n        self.tokenization = Tokenization(import_module=\"jieba\",\n                                         user_dict=config.USER_DEFINED_DICT_PATH,\n                                         chn_stop_words_dir=config.CHN_STOP_WORDS_PATH)\n        self.database = Database()\n        self.classifier = Classifier()\n\n    def create_dictionary(self,\n                          raw_documents_list,\n                          save_path=None,\n                          is_saved=False):\n        \"\"\"\n        将文中每个词汇关联唯一的ID，因此需要定义词汇表\n        :param: raw_documents_list, 原始语料列表，每个元素即文本，如[\"洗尽铅华...\", \"风雨赶路人...\", ...]\n        :param: savepath, corpora.Dictionary对象保存路径\n        \"\"\"\n        documents_token_list = []\n        for doc in raw_documents_list:\n            documents_token_list.append(self.tokenization.cut_words(doc))\n        _dict = corpora.Dictionary(documents_token_list)\n        # 找到只出现一次的token\n        once_items = [_dict[tokenid] for tokenid, docfreq in _dict.dfs.items() if docfreq == 1]\n        # 在documents_token_list的每一条语料中，删除只出现一次的token\n        for _id, token_list in enumerate(documents_token_list):\n            documents_token_list[_id] = list(filter(lambda token: token not in once_items, token_list))\n        # 极端情况，某一篇语料所有token只出现一次，这样该篇新闻语料的token列表就变为空，因此删除掉\n        documents_token_list = [token_list for 
token_list in documents_token_list if (len(token_list) != 0)]\n        # 找到只出现一次的token对应的id\n        once_ids = [tokenid for tokenid, docfreq in _dict.dfs.items() if docfreq == 1]\n        # 删除仅出现一次的词\n        _dict.filter_tokens(once_ids)\n        # 消除id序列在删除词后产生的不连续的缺口\n        _dict.compactify()\n        if is_saved and save_path:\n            _dict.save(save_path)\n            logging.info(\"new generated dictionary saved in path -> {} ...\".format(save_path))\n\n        return _dict, documents_token_list\n\n    def renew_dictionary(self,\n                         old_dict_path,\n                         new_raw_documents_list,\n                         new_dict_path=None,\n                         is_saved=False):\n        documents_token_list = []\n        for doc in new_raw_documents_list:\n            documents_token_list.append(self.tokenization.cut_words(doc))\n        _dict = corpora.Dictionary.load(old_dict_path)\n        _dict.add_documents(documents_token_list)\n        if new_dict_path:\n            old_dict_path = new_dict_path\n        if is_saved:\n            _dict.save(old_dict_path)\n            logging.info(\"updated dictionary by another raw documents serialized in {} ... 
\".format(old_dict_path))\n\n        return _dict, documents_token_list\n\n    def create_bag_of_word_representation(self,\n                                          raw_documents_list,\n                                          old_dict_path=None,\n                                          new_dict_path=None,\n                                          bow_vector_save_path=None,\n                                          is_saved_dict=False):\n        if old_dict_path:\n            # 如果存在旧的语料词典，就在原先词典的基础上更新，增加未见过的词\n            corpora_dictionary, documents_token_list = self.renew_dictionary(old_dict_path,\n                                                                             raw_documents_list,\n                                                                             new_dict_path=new_dict_path)\n        else:\n            # 否则重新创建词典\n            start_time = time.time()\n            corpora_dictionary, documents_token_list = self.create_dictionary(raw_documents_list,\n                                                                              save_path=new_dict_path,\n                                                                              is_saved=is_saved_dict)\n            end_time = time.time()\n            logging.info(\"there are {} mins spent to create a new dictionary ... \".format((end_time-start_time)/60))\n        # 根据新词典对文档(或语料)生成对应的词袋向量\n        start_time = time.time()\n        bow_vector = [corpora_dictionary.doc2bow(doc_token) for doc_token in documents_token_list]\n        end_time = time.time()\n        logging.info(\"there are {} mins spent to calculate bow-vector ... 
\".format((end_time - start_time) / 60))\n        if bow_vector_save_path:\n            corpora.MmCorpus.serialize(bow_vector_save_path, bow_vector)\n\n        return documents_token_list, corpora_dictionary, bow_vector\n\n    @staticmethod\n    def transform_vectorized_corpus(corpora_dictionary,\n                                    bow_vector,\n                                    model_type=\"lda\",\n                                    model_save_path=None):\n        # 如何没有保存任何模型，重新训练的情况下，可以选择该函数\n        model_vector = None\n        if model_type == \"lsi\":\n            # LSI(Latent Semantic Indexing)模型，将文本从词袋向量或者词频向量(更好)，转为一个低维度的latent空间\n            # 对于现实语料，目标维度在200-500被认为是\"黄金标准\"\n            model_tfidf = models.TfidfModel(bow_vector)\n            # model_tfidf.save(\"model_tfidf.tfidf\")\n            tfidf_vector = model_tfidf[bow_vector]\n            model = models.LsiModel(tfidf_vector,\n                                    id2word=corpora_dictionary,\n                                    num_topics=config.TOPIC_NUMBER)  # 初始化模型\n            model_vector = model[tfidf_vector]\n            if model_save_path:\n                model.save(model_save_path)\n        elif model_type == \"lda\":\n            model = models.LdaModel(bow_vector,\n                                    id2word=corpora_dictionary,\n                                    num_topics=config.TOPIC_NUMBER)  # 初始化模型\n            model_vector = model[bow_vector]\n            if model_save_path:\n                model.save(model_save_path)\n        elif model_type == \"tfidf\":\n            model = models.TfidfModel(bow_vector)  # 初始化\n            # model = models.TfidfModel.load(\"model_tfidf.tfidf\")\n            model_vector = model[bow_vector]  # 将整个语料进行转换\n            if model_save_path:\n                model.save(model_save_path)\n\n        return model_vector\n\n    def classify_stock_news(self,\n                            unseen_raw_document,\n                            
database_name,\n                            collection_name,\n                            label_name=\"60DaysLabel\",\n                            topic_model_type=\"lda\",\n                            classifier_model=\"svm\",\n                            ori_dict_path=None,\n                            bowvec_save_path=None,\n                            is_saved_bow_vector=False):\n        historical_raw_documents_list = []\n        Y = []\n        for row in self.database.get_collection(database_name, collection_name).find():\n            if label_name in row.keys():\n                if row[label_name] != \"\":\n                    historical_raw_documents_list.append(row[\"Article\"])\n                    Y.append(row[label_name])\n        logging.info(\"fetch symbol '{}' historical news with label '{}' from [DB:'{}' - COL:'{}'] ... \"\n                     .format(collection_name, label_name, database_name, collection_name))\n\n        le = preprocessing.LabelEncoder()\n        Y = le.fit_transform(Y)\n        logging.info(\"encode historical label list by sklearn preprocessing for training ... \")\n        label_name_list = le.classes_  # ['中性' '利好' '利空'] -> [0, 1, 2]\n\n        # 根据历史新闻数据库创建词典，以及计算每个历史新闻的词袋向量；如果历史数据库创建的字典存在，则加载进内存\n        # 用未见过的新闻tokens去更新该词典\n        if not os.path.exists(ori_dict_path):\n            if not os.path.exists(bowvec_save_path):\n                _, _, historical_bow_vec = self.create_bag_of_word_representation(historical_raw_documents_list,\n                                                                                  new_dict_path=ori_dict_path,\n                                                                                  bow_vector_save_path=bowvec_save_path,\n                                                                                  is_saved_dict=True)\n                logging.info(\"create dictionary of historical news, and serialized in path -> {} ... 
\".format(ori_dict_path))\n                logging.info(\"create bow-vector of historical news, and serialized in path -> {} ... \".format(bowvec_save_path))\n            else:\n                _, _, _ = self.create_bag_of_word_representation(historical_raw_documents_list,\n                                                                 new_dict_path=ori_dict_path,\n                                                                 is_saved_dict=True)\n                logging.info(\"create dictionary of historical news, and serialized in path -> {} ... \".format(ori_dict_path))\n        else:\n            if not os.path.exists(bowvec_save_path):\n                _, _, historical_bow_vec = self.create_bag_of_word_representation(historical_raw_documents_list,\n                                                                                  new_dict_path=ori_dict_path,\n                                                                                  bow_vector_save_path=bowvec_save_path,\n                                                                                  is_saved_dict=True)\n                logging.info(\"historical news dictionary existed, which saved in path -> {}, but not the historical bow-vector\"\n                             \" ... \".format(ori_dict_path))\n            else:\n                historical_bow_vec_mmcorpus = corpora.MmCorpus(bowvec_save_path)  # type -> <gensim.corpora.mmcorpus.MmCorpus>\n                historical_bow_vec = []\n                for _bow in historical_bow_vec_mmcorpus:\n                    historical_bow_vec.append(_bow)\n                logging.info(\"both historical news dictionary and bow-vector existed, load historical bow-vector to memory ... 
\")\n\n        start_time = time.time()\n        updated_dictionary_with_old_and_unseen_news, unssen_documents_token_list = self.renew_dictionary(ori_dict_path,\n                                                                                                         [unseen_raw_document],\n                                                                                                         is_saved=True)\n        end_time = time.time()\n        logging.info(\"renew dictionary with unseen news tokens, and serialized in path -> {}, \"\n                     \"which took {} mins ... \".format(ori_dict_path, (end_time-start_time)/60))\n\n        unseen_bow_vector = [updated_dictionary_with_old_and_unseen_news.doc2bow(doc_token) for doc_token in\n                             unssen_documents_token_list]\n        updated_bow_vector_with_old_and_unseen_news = []\n        updated_bow_vector_with_old_and_unseen_news.extend(historical_bow_vec)\n        updated_bow_vector_with_old_and_unseen_news.extend(unseen_bow_vector)\n        # 原先updated_bow_vector_with_old_and_unseen_news是list类型，\n        # 但是经过下面序列化后重新加载进来的类型是gensim.corpora.mmcorpus.MmCorpus\n        if is_saved_bow_vector and bowvec_save_path:\n            corpora.MmCorpus.serialize(bowvec_save_path,\n                                       updated_bow_vector_with_old_and_unseen_news)  # 保存更新后的bow向量，即包括新旧新闻的bow向量集\n        logging.info(\"combined bow vector(type -> 'list') generated by historical news with unseen bow \"\n                     \"vector to create a new one ... 
\")\n\n        if topic_model_type == \"lsi\":\n            start_time = time.time()\n            updated_tfidf_model_vector = self.transform_vectorized_corpus(updated_dictionary_with_old_and_unseen_news,\n                                                                          updated_bow_vector_with_old_and_unseen_news,\n                                                                          model_type=\"tfidf\")  # type -> <gensim.interfaces.TransformedCorpus object>\n            end_time = time.time()\n            logging.info(\"regenerated TF-IDF model vector by updated dictionary and updated bow-vector, \"\n                         \"which took {} mins ... \".format((end_time-start_time)/60))\n\n            start_time = time.time()\n            model = models.LsiModel(updated_tfidf_model_vector,\n                                    id2word=updated_dictionary_with_old_and_unseen_news,\n                                    num_topics=config.TOPIC_NUMBER)  # 初始化模型\n            model_vector = model[updated_tfidf_model_vector]  # type -> <gensim.interfaces.TransformedCorpus object>\n            end_time = time.time()\n            logging.info(\"regenerated LSI model vector space by updated TF-IDF model vector space, \"\n                         \"which took {} mins ... \".format((end_time-start_time)/60))\n        elif topic_model_type == \"lda\":\n            start_time = time.time()\n            model_vector = self.transform_vectorized_corpus(updated_dictionary_with_old_and_unseen_news,\n                                                            updated_bow_vector_with_old_and_unseen_news,\n                                                            model_type=\"lda\")\n            end_time = time.time()\n            logging.info(\"regenerated LDA model vector space by updated dictionary and bow-vector, \"\n                         \"which took {} mins ... 
\".format((end_time-start_time)/60))\n\n        # 将gensim.interfaces.TransformedCorpus类型的lsi模型向量转为numpy矩阵\n        start_time = time.time()\n        latest_matrix = corpus2dense(model_vector,\n                                     num_terms=model_vector.obj.num_terms).T\n        end_time = time.time()\n        logging.info(\"transform {} model vector space to numpy.adarray, \"\n                     \"which took {} mins ... \".format(topic_model_type.upper(), (end_time-start_time)/60))\n\n        # 利用历史数据的话题模型向量(或特征)，进一步训练新闻分类器\n        start_time = time.time()\n        train_x, train_y, test_x, test_y = utils.generate_training_set(latest_matrix[:-1, :], Y)\n        clf = self.classifier.train(train_x, train_y, test_x, test_y, model_type=classifier_model)\n        end_time = time.time()\n        logging.info(\"finished training by sklearn {} using latest {} model vector space, which took {} mins ... \"\n                     .format(classifier_model.upper(), topic_model_type.upper(), (end_time-start_time)/60))\n\n        label_id = clf.predict(latest_matrix[-1, :].reshape(1, -1))[0]\n\n        return label_name_list[label_id]\n\n\nif __name__ == \"__main__\":\n    label_name = \"3DaysLabel\"\n    database_name = \"stocknews\"\n    # sh600004的数据量比较少，可作为跑通代码流程的参数；sz000001的数据量比较大，处理起来也较慢，可以作为后续案例测试\n    collection_name = \"sz000001\"\n    classifier_save_path = \"{}_classifier.pkl\".format(collection_name)\n    ori_dict_path = \"{}_docs_dict.dict\".format(collection_name)\n    bowvec_save_path = \"{}_bowvec.mm\".format(collection_name)\n\n    # 对(未见过的)新闻进行分类\n    # unseen_raw_documents_list = [\"智通财经APP讯，白云机场(600004.SH)发布公告，公司2020年11月起降40278架次，\\\n    #                               同比下降2.47%;旅客吞吐量约501.4万人次，同比下降19.31%;货邮吞吐量约17.32万\\\n    #                               吨，同比下降1.27%。此外，公司2020年累计起降约33.2万架次，同比下降26.07%;旅\\\n    #                               客吞吐量约3890.14万人次，同比下降42.00%;货邮吞吐量约158.12万吨，同比下降9.14%。\",\n    #                              \"格隆汇 9 月 
1日丨白云机场(600004.SH)公布，公司收到中国证券监督管理委员会于2020\\\n    #                               年8月20日出具的《中国证监会行政许可项目审查一次反馈意见通知书》(202137号)。根据\\\n    #                               《反馈意见》的相关要求，白云机场控股股东广东省机场管理集团有限公司(“机场集团”)\\\n    #                               于2020年8月31日出具了《广东省机场管理集团有限公司关于不存在减持广州白云国际机场股\\\n    #                               份有限公司股票行为或减持计划的承诺函》，具体内容如下：鉴于机场集团拟以现金的方式参\\\n    #                               与认购本次白云机场非公开发行的A股股票。机场集团现作出如下承诺：1、自白云机场本次发\\\n    #                               行定价基准日(即2020年4月28日)前六个月至本承诺函出具之日，机场集团及机场集团控制的关\\\n    #                               联方未出售或以任何方式减持白云机场的任何股票。2、自本承诺函出具之日起至白云机场本次发\\\n    #                               行完成后六个月期间内，机场集团及机场集团控制的关联方将不会出售或以任何方式减持所持有的\\\n    #                               白云机场的任何股票，也不存在减持白云机场股票的计划。3、机场集团及机场集团控制的关联方\\\n    #                               不存在违反《中华人民共和国证券法》第四十四条的情形。如有违反，机场集团因减持股票所得收\\\n    #                               益将归白云机场所有。4、本承诺函自签署之日起对机场集团具有约束力，若机场集团或机场集团\\\n    #                               控制的关联方违反上述承诺发生减持情况，则减持所得全部收益归白云机场所有，机场集团依法\\\n    #                               承担由此产生的法律责任。\",\n    #                              \"格隆汇11月27日丨白云机场(600004.SH)公布，为增强上市公司经营独立性、业务及资产完整性，\\\n    #                              提升公司盈利能力与运行保障能力，扩展白云机场物流业务发展空间，同时减少关联交易，确保上\\\n    #                              市公司利益最大化，公司拟实施如下交易：机场集团以所持有的航合公司100%的股权以及铂尔曼酒\\\n    #                              店、澳斯特酒店相应的经营性资产及负债与上市公司所持有的物流公司51%的股权进行资产置换，差\\\n    #                              额部分以现金补足。其中航合公司100%股权作价7.54亿元，铂尔曼酒店经营性资产及负债作价2.28\\\n    #                              亿元，澳斯特酒店经营性资产及负债作价3950.01万元，物流公司51%股权作价8.57亿元，上市公司\\\n    #                              需向机场集团以现金方式支付差额1.64亿元。本次交易完成后，公司将持有航合公司100%股权、铂\\\n    #                              尔曼酒店和澳斯特酒店经营性资产及负债、物流公司49%股权；机场集团将持有物流公司51%股权。\\\n    #                              本次交易除上述资产置换外，还包括：(1)上市公司与机场集团重新划分国内航空主业收入中旅客服\\\n    #                              务费(以下简称“旅客服务费”)的分成比例，由上市公司占85%、机场集团占15%，变更为上市公司\\\n    #                              
占100%，机场集团不再享有旅客服务费分成，2018年15%旅客服务费对应金额为1.19亿元；及(2)上\\\n    #                              市公司将按物流公司年营业收入的4%向物流公司收取经营权使用费。2018年，模拟计算物流公司营\\\n    #                              业收入4%对应的经营权使用费为2536.07万元。本次资产置换交易完成后，上市公司2018年备考口径\\\n    #                              净利润、归母净利润、净资产、归母净资产和每股收益都将增厚约5%，2018年备考每股收益将从\\\n    #                              0.5457元每股增厚至0.5717元每股。为充分保障上市公司及中小股东利益，机场集团同意，自本次\\\n    #                              资产置换交割之日起五年内，上市公司享有一次回购物流公司股权的权利，即上市公司有权要求机\\\n    #                              场集团将本次交易取得的全部物流公司股权(对应同等金额的注册资本金额，包括在此基础上进行\\\n    #                              配股、转增、折股等所取得的股权)按届时评估值转让给上市公司。因此，上市公司在本次资产置\\\n    #                              换中拥有充分的主动权，可以选择重新取得物流公司的控制权。据悉，旅客服务费是公司主营航空\\\n    #                              性业务收入的重要组成部分，对业务完整性具有重要意义。旅客服务费全部由上市公司享有后，将\\\n    #                              较大幅度增加上市公司的收入、利润和现金流水平。受益于粤港澳大湾区规划及白云机场T2航站楼\\\n    #                              启用，旅客吞吐量逐年提升。未来随着白云机场的T3航站楼及新跑道的建设推进，旅客吞吐量还将\\\n    #                              进一步提升，15%旅客服务费对应收入将随之提升，并为公司贡献更多业绩增长空间。\"]\n\n    unseen_raw_documents_list = [\"格隆汇6月23日丨平安银行(000001.SZ)公布，近日收到《中国银保监会关于平安银行变更注册资本\\\n                                 的批复》(银保监复〔2020〕342号)，中国银行保险监督管理委员会同意本行将注册资本由人民币\\\n                                 17, 170, 411, 366元增加至19, 405, 918, 198元，并修改本行章程相应条款。\",\n                                 \"平安银行(000001,股吧)(000001.SZ)公布，公司于2020年8月19日收到《中国银保监会关于平安理\\\n                                 财有限责任公司开业的批复》(银保监复〔2020〕513号)，中国银行保险监督管理委员会(简称“中\\\n                                 国银保监会”)已批准公司全资子公司平安理财有限责任公司(简称“平安理财”)开业。根据中国银\\\n                                 保监会批复，平安理财注册资本为50亿元人民币，注册地为深圳市，主要从事发行公募理财产品、\\\n                                 发行私募理财产品、理财顾问和咨询等资产管理相关业务。　　近年来，公司以打造“中国最卓越\\\n                                 、全球领先的智能化零售银行”为战略目标，坚持“科技引领、零售突破、对公做精”十二字策略\\\n                                 方针，强化“综合金融”、“科技赋能”两大核心优势，打造数字化银行、生态银行、平台银行三\\\n                                 张名片，推动发展迈向新台阶。在此基础上，稳步推进资产管理和理财业务转型，综合服务能力不\\\n               
                  断提升，规模、质量、效益实现协调发展。设立平安理财是本行严格落实监管要求、促进理财业务\\\n                                 健康发展、推动理财业务回归本源的重要举措。平安理财将秉持“受人之托，代客理财”的服务宗\\\n                                 旨，深耕理财市场，为客户提供更优质的资管产品和财富管理服务，助力实体经济高质量发展。下\\\n                                 一步，公司将按照法律法规相关要求严格履行有关程序，推动平安理财尽快开业运营。\",\n                                 \"格隆汇5月26日丨平安银行(000001.SZ)公布，经中国银行保险监督管理委员会和中国人民银行批准\\\n                                 ，公司于近日在全国银行间债券市场成功发行了总额为300亿元人民币的小型微型企业贷款专项金融\\\n                                 债券。该期债券发行总规模为人民币300亿元，为3年期固定利率债券，票面利率为2.30%，募集资金\\\n                                 将依据适用法律和监管部门的批准，专项用于发放小型微型企业贷款，其中部分将用于发放与新冠\\\n                                 肺炎疫情防控相关的小微企业贷款，加大对小型微型企业信贷支持力度，推动小型微型企业业务稳\\\n                                 健、健康发展。\"]\n\n    topicmodelling = TopicModelling()\n    for unseen_doc in unseen_raw_documents_list:\n        chn_label = topicmodelling.classify_stock_news(unseen_doc,\n                                                       database_name,\n                                                       collection_name,\n                                                       label_name=label_name,\n                                                       topic_model_type=\"lsi\",\n                                                       classifier_model=\"rdforest\",  # rdforest / svm\n                                                       ori_dict_path=ori_dict_path,\n                                                       bowvec_save_path=bowvec_save_path)\n        logging.info(\"document '{}...' was classified with label '{}' for symbol {} ... 
\".format(unseen_doc[:20], chn_label, collection_name))\n\n    # lsi Tue, 15 Dec 2020 14:54:08 classifier.py[line:54] INFO train_pred: 0.9829  test_pred: 0.703 (只是去掉停用词、tab符以及空格符) 30DaysLabel\n    # lsi Tue, 15 Dec 2020 17:00:58 classifier.py[line:54] INFO train_pred: 0.9852  test_pred: 0.7492(去掉不含中文的词以及只有一个字符的词) 30DaysLabel\n    # lda Tue, 15 Dec 2020 17:29:56 classifier.py[line:54] INFO train_pred: 0.9498  test_pred: 0.7426(去掉不含中文的词以及只有一个字符的词) 30DaysLabel\n    # lsi Wed, 16 Dec 2020 15:57:28 classifier.py[line:54] INFO train_pred: 0.9872  test_pred: 0.7478(修改create_dictionary后) 30DaysLabel\n    # lsi Wed, 16 Dec 2020 17:14:57 classifier.py[line:54] INFO train_pred: 0.9777  test_pred: 0.7247(修改create_dictionary后) 3DaysLabel\n    # lsi Wed, 16 Dec 2020 17:30:15 classifier.py[line:54] INFO train_pred: 0.9883  test_pred: 0.7123(修改create_dictionary后) 60DaysLabel\n"
  },
  {
    "path": "legacy_v1/src/__init__.py",
    "content": ""
  },
  {
    "path": "legacy_v1/src/history_spyder_startup.bat",
    "content": "cd ./Gon\npython ./history_starter_stock_price.py\nstart python ./history_starter_cnstock.py\nstart python ./history_starter_nbd.py\nstart python ./history_starter_jrj.py"
  },
  {
    "path": "legacy_v1/src/main.py",
    "content": "import time\nimport logging\n\nfrom Kite import config\n\nfrom Gon.jrjspyder import JrjSpyder\nfrom Gon.nbdspyder import NbdSpyder\nfrom Gon.cnstockspyder import CnStockSpyder\nfrom Gon.stockinfospyder import StockInfoSpyder\n\nfrom Killua.denull import DeNull\nfrom Killua.deduplication import Deduplication\nfrom Killua.buildstocknewsdb import GenStockNewsDB\n\n\n# 1. 爬取历史数据\nstock_info_spyder = StockInfoSpyder(config.STOCK_DATABASE_NAME, config.COLLECTION_NAME_STOCK_BASIC_INFO)\nstock_info_spyder.get_historical_news(start_date=\"2020-01-01\")\n\ncnstock_spyder = CnStockSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\nfor url_to_be_crawled, type_chn in config.WEBSITES_LIST_TO_BE_CRAWLED_CNSTOCK.items():\n    logging.info(\"start crawling {} ...\".format(url_to_be_crawled))\n    cnstock_spyder.get_historical_news(url_to_be_crawled, category_chn=type_chn)\n    logging.info(\"finished ...\")\n    time.sleep(30)\n\njrj_spyder = JrjSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\njrj_spyder.get_historical_news(config.WEBSITES_LIST_TO_BE_CRAWLED_JRJ, start_date=\"2020-01-01\")\n\nnbd_spyder = NbdSpyder(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\nnbd_spyder.get_historical_news(60)\n\n\n# 2. 针对历史数据进行去重清洗\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\nDeduplication(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n\n\n# 3. 将历史数据中包含null值的行去掉\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK).run()\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_NBD).run()\nDeNull(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ).run()\n\n\n# 4. 
创建新的数据库，针对每一个股票，将所有涉及该股票的新闻都保存在新的数据库，并贴好\"利好\",\"利空\"和\"中性\"标签\ngen_stock_news_db = GenStockNewsDB()\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_CNSTOCK)\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_NBD)\ngen_stock_news_db.get_all_news_about_specific_stock(config.DATABASE_NAME, config.COLLECTION_NAME_JRJ)\n\n\n# 5. 开启实时爬取新闻数据\n"
  },
  {
    "path": "legacy_v1/src/realtime_spyder_startup.bat",
    "content": "@echo off\n:again\ncls\necho =========================== Please select programs below to run ===========================\necho 1 ./Gon/realtime_starter_cnstock.py\necho 2 ./Gon/realtime_starter_jrj.py\necho 3 ./Gon/realtime_starter_nbd.py\necho 4 ./Gon/realtime_starter_stock_price.py\necho 5 run all\necho.\necho Please input number 1-5:\nset /p num=\n\nif \"%num%\"==\"1\" (\ncd ./Gon\nstart python ./realtime_starter_redis_queue.py\nstart python ./realtime_starter_cnstock.py\n)\n\nif \"%num%\"==\"2\" (\ncd ./Gon\nstart python ./realtime_starter_redis_queue.py\nstart python ./realtime_starter_jrj.py\n)\n\nif \"%num%\"==\"3\" (\ncd ./Gon\nstart python ./realtime_starter_redis_queue.py\nstart python ./realtime_starter_nbd.py\n)\n\nif \"%num%\"==\"4\" (\ncd ./Gon\nstart python ./realtime_starter_redis_queue.py\nstart python ./realtime_starter_stock_price.py\n)\n\nif \"%num%\"==\"5\" (\ncd ./Gon\nstart python ./realtime_starter_redis_queue.py\nstart python ./realtime_starter_cnstock.py\nstart python ./realtime_starter_nbd.py\nstart python ./realtime_starter_jrj.py\nstart python ./realtime_starter_stock_price.py\n)\n"
  },
  {
    "path": "legacy_v1/src/realtime_spyder_stopall.bat",
    "content": "cd ./Gon\nstart python ./kill_realtime_spyder_tasks.py"
  },
  {
    "path": "reset_all_data.sh",
    "content": "#!/bin/bash\n# 一键清空所有数据并重新开始爬取\n\nset -e\n\necho \"==========================================\"\necho \"  FinnewsHunter 数据重置脚本\"\necho \"==========================================\"\necho \"\"\necho \"⚠️  警告：此操作将删除所有新闻和任务数据！\"\necho \"⚠️  此操作不可恢复！\"\necho \"\"\nread -p \"确认要清空所有数据吗？(yes/no): \" confirm\n\nif [ \"$confirm\" != \"yes\" ]; then\n    echo \"❌ 操作已取消\"\n    exit 0\nfi\n\necho \"\"\necho \"开始清空数据...\"\necho \"\"\n\n# 1. 清空PostgreSQL数据\necho \"[1/4] 清空PostgreSQL数据...\"\ndocker exec finnews_postgres psql -U finnews -d finnews_db <<EOF\n-- 清空新闻表\nDELETE FROM news;\n-- 清空任务表\nDELETE FROM crawl_tasks;\n-- 清空分析表（如果存在）\nDELETE FROM analyses;\n-- 重置自增ID\nALTER SEQUENCE news_id_seq RESTART WITH 1;\nALTER SEQUENCE crawl_tasks_id_seq RESTART WITH 1;\nALTER SEQUENCE analyses_id_seq RESTART WITH 1;\n-- 显示结果\nSELECT 'news表', COUNT(*) FROM news;\nSELECT 'crawl_tasks表', COUNT(*) FROM crawl_tasks;\nEOF\n\necho \"✅ PostgreSQL数据已清空\"\necho \"\"\n\n# 2. 清空Redis缓存\necho \"[2/4] 清空Redis缓存...\"\ndocker exec finnews_redis redis-cli FLUSHDB\necho \"✅ Redis缓存已清空\"\necho \"\"\n\n# 3. 清空Celery调度文件\necho \"[3/4] 清空Celery调度文件...\"\nrm -f backend/celerybeat-schedule\nrm -rf backend/celerybeat-schedule.db\necho \"✅ Celery调度文件已清空\"\necho \"\"\n\n# 4. 重启所有服务\necho \"[4/4] 重启服务...\"\ncd \"$(dirname \"$0\")\"\ndocker compose -f deploy/docker-compose.dev.yml restart celery-worker celery-beat\n\necho \"\"\necho \"==========================================\"\necho \"  ✨ 数据重置完成！\"\necho \"==========================================\"\necho \"\"\necho \"📋 状态：\"\necho \"  - PostgreSQL: 已清空\"\necho \"  - Redis: 已清空\"\necho \"  - Celery: 已重启\"\necho \"\"\necho \"🚀 下一步：\"\necho \"  1. Celery Beat 每1分钟会自动爬取10个新闻源\"\necho \"  2. 约5-10分钟后可在前端查看新数据\"\necho \"  3. 访问 http://localhost:3000 查看进度\"\necho \"\"\necho \"==========================================\"\n\n"
  },
  {
    "path": "thirdparty/DISC-FinLLM.md",
    "content": "# DISC-FinLLM - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\n```\n/home/ubuntu/DISC-FinLLM\n|-- .git/ (Git version control metadata)\n|-- LICENSE (Project license)\n|-- README-en.md (English documentation and project overview)\n|-- README.md (Chinese documentation and project overview)\n|-- cli_demo.py (Command-line interface demonstration and entry point)\n|-- web_demo.py (Web interface demonstration and entry point)\n|-- requirements.txt (Python package dependencies)\n|-- data/ (Contains JSON data files for different model components)\n|   |-- README.md\n|   |-- computing_part.json (Data for the financial computing module)\n|   |-- consulting_part.json (Data for the financial consulting module)\n|   |-- retrieval_part.json (Data for the financial knowledge retrieval module)\n|   |-- task_part.json (Data for the financial text analysis module)\n|-- eval/ (Contains evaluation data and the core evaluation logic)\n|   |-- README.md\n|   |-- computing_eval.json (Evaluation data for the computing module)\n|   |-- retriever_eval.json (Evaluation data for the retrieval module)\n|   |-- evaluator/ (Core module for all evaluation logic)\n|       |-- README.md\n|       |-- autoeval.py (Script for automated evaluation)\n|       |-- evaluate.py (Main evaluation script)\n|       |-- finllm.py (Core class/functions for interacting with the FinLLM)\n|       |-- preprocess.py (Script for data preprocessing before evaluation)\n|       |-- utils.py (Utility functions for evaluation)\n|-- images/ (Contains images used in the documentation)\n|   |-- README.md\n|   |-- data_en.png\n|   |-- data_zh.png\n|   |-- example_consult.gif\n|   |-- example_retrieval.gif\n|   |-- example_task.gif\n|   |-- example_tool.gif\n|   |-- lora_en.png\n|   |-- lora_zh.png\n|   |-- model_en.png\n|   |-- model_zh.png\n```\n\nThe project structure is concise and clearly organized, primarily focusing on demonstration, data, 
and evaluation. The root directory contains the main entry points (`cli_demo.py`, `web_demo.py`) and configuration files. The `data/` directory holds the instruction-tuning data for the four expert modules: financial consulting, text analysis, computing, and knowledge retrieval. The `eval/` directory is dedicated to model assessment, with the critical `evaluator/` subdirectory housing the core Python logic for evaluating the model's performance across different tasks. The `images/` folder contains visual assets for the documentation. This clean separation of concerns facilitates easy navigation and maintenance.\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/DISC-FinLLM`: Contains the main application entry points (`cli_demo.py`, `web_demo.py`) that demonstrate the model's capabilities and orchestrate the high-level flow.\n*   `/home/ubuntu/DISC-FinLLM/eval/evaluator`: Contains the core Python classes and functions (`finllm.py`, `evaluate.py`, `autoeval.py`, `preprocess.py`, `utils.py`) responsible for loading the model, running evaluations, and handling data preparation. This is the heart of the model's operational and assessment logic.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module 1: Root/Demo Module (`/home/ubuntu/DISC-FinLLM`)\n\n### Core Responsibility\nThis module serves as the primary interface layer for the DISC-FinLLM, providing two distinct demonstration entry points: a command-line interface (`cli_demo.py`) and a web-based interface (`web_demo.py`). 
Its core function is to load the pre-trained FinLLM model and tokenizer, manage the conversation history, and facilitate real-time interaction with the user, including support for streaming responses.\n\n### Key Files and Functions\n*   **`cli_demo.py`**: Provides a simple, interactive terminal chat interface.\n    *   `init_model()`: Loads the model and tokenizer from the \"Go4miii/DISC-FinLLM\" path using `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library. It sets `torch_dtype=torch.float16` and `device_map=\"auto\"` for efficient loading.\n    *   `clear_screen()`: Handles terminal clearing and prints the welcome message in Chinese, defining the basic commands (`exit`, `clear`, `stream`).\n    *   `main()`: The main chat loop, handling user input, command parsing, and calling the model's `chat` method for response generation, with optional streaming.\n*   **`web_demo.py`**: Implements a graphical chat interface using the `streamlit` framework.\n    *   `init_model()`: Similar to the CLI version, but decorated with `@st.cache_resource` to ensure the large model is loaded only once across sessions.\n    *   `clear_chat_history()`: Clears the `st.session_state.messages`.\n    *   `init_chat_history()`: Initializes the chat history and displays previous messages in the Streamlit interface.\n    *   `main()`: The main web application logic, handling user input via `st.chat_input` and displaying the model's streaming response in a chat message container.\n\n### Core Implementation and Dependencies\nThe core implementation relies heavily on the **Hugging Face `transformers`** library. The model loading process is standardized:\n1.  Load the model: `AutoModelForCausalLM.from_pretrained(...)`\n2.  Load the tokenizer: `AutoTokenizer.from_pretrained(...)`\n3.  
Load generation configuration: `GenerationConfig.from_pretrained(...)`\n\nThe key interaction is the custom `model.chat(tokenizer, messages, stream=True)` method, which is assumed to be implemented within the model's `trust_remote_code` or a custom wrapper, providing a clean, multi-turn chat API.\n\n**Dependencies**: `torch`, `transformers`, `colorama` (for CLI), `streamlit` (for Web).\n\n## Module 2: Evaluation/Core Logic Module (`/home/ubuntu/DISC-FinLLM/eval/evaluator`)\n\n### Core Responsibility\nThis module contains the comprehensive framework for evaluating the performance of the DISC-FinLLM and other comparable models on the BBT-FinCUGE financial NLP benchmark. It abstracts the LLM interaction, manages model-specific configurations, handles dataset preprocessing, and implements task-specific evaluation metrics.\n\n### Key Files and Functions\n*   **`finllm.py`**: **LLM Abstraction and Model Wrappers**.\n    *   `DISCFINLLMBase` (Abstract Base Class): Defines the contract for all LLM wrappers with an abstract `generate(self, prompt: str) -> str` method.\n    *   Concrete Classes: Implements wrappers for various models like `DISCVFINLLMChatGLM26B`, `DISCVFINLLMBaichuan13BChat`, etc. 
These classes handle model-specific loading (including **LoRA** fine-tuning via `peft.PeftModel`), tokenization, and the actual generation call.\n*   **`evaluate.py`**: **Evaluation Logic and Prompt Engineering**.\n    *   Multiple `*Evaluator` Classes (e.g., `FinFEEvaluator`, `FinQAEvaluator`): Each class is responsible for a specific financial task (e.g., sentiment analysis, QA).\n    *   `__init__`: Loads the task-specific evaluation data and few-shot instruction samples.\n    *   `build_zero_shot_prompt` / `build_few_shot_prompt`: Implements prompt engineering by constructing the input text based on predefined templates and few-shot examples.\n    *   `evaluate`: Calculates the final metric (e.g., accuracy for sentiment, F1 for QA) by comparing model predictions (`preds`) with ground truth (`golds`).\n    *   `run_evaluation`: The main evaluation loop, iterating over all data samples, generating responses using the injected `llm.generate()` method, and calculating both zero-shot and few-shot metrics.\n*   **`autoeval.py`**: **Evaluation Orchestration**.\n    *   `model_lists` and `Eval_datasets`: Dictionaries mapping string names to the respective model and evaluator classes, implementing a **Factory Pattern**.\n    *   `main` block: Parses command-line arguments for model name, LoRA path, and dataset. 
It instantiates the chosen `llm` and `evaluator` and calls `evaluator().run_evaluation(llm)`.\n*   **`preprocess.py`**: **Data Preparation**.\n    *   `BBTFinCUGE` class: Manages the downloading and processing of the raw BBT-FinCUGE datasets.\n    *   `download_all()`: Uses `requests` to fetch raw JSON data from a GitHub repository.\n    *   `process_*` methods (e.g., `process_finfe`): Converts the raw dataset format into a standardized list of instances with `id`, `input`, `gold_answer`, and `source` fields.\n*   **`utils.py`**: **Utility Functions**.\n    *   `write_json`, `load_json`: Standardized JSON file I/O.\n    *   `_mixed_segmentation`, `_remove_punctuation`: Text cleaning and tokenization utilities, crucial for Chinese NLP tasks, using `nltk.word_tokenize`.\n    *   `_find_lcs`, `_compute_f1_score`: Implements the Longest Common Subsequence (LCS) algorithm and F1 score calculation, which is the core metric for generative tasks like QA.\n\n### Dependencies and Error/Performance\n**Dependencies**: `transformers`, `peft`, `torch`, `argparse`, `tqdm`, `requests`, `inspect`, `random`, `nltk`.\n**Performance**: The use of `torch.float16` and `device_map=\"auto\"` in model loading across all modules is a key performance optimization for large models on GPU. The `tqdm` library is used in `evaluate.py` to provide progress bars, enhancing user experience during long evaluation runs.\n**Error Handling**: Basic file existence checks are present in `preprocess.py` (`if not os.path.exists(file_path)`). 
The `evaluate.py` includes assertions (`assert len(golds) == len(preds)`) to ensure data integrity before metric calculation.\n\n### Module PlantUML Diagrams\n\n### Module 1: Root/Demo Module\n\n```plantuml\n@startuml\ntitle Root/Demo Module (cli_demo.py & web_demo.py)\n\nclass AutoModelForCausalLM\nclass AutoTokenizer\nclass GenerationConfig\nclass torch\nclass streamlit as st\nclass colorama\n\npackage \"Demo Scripts\" {\n    class cli_demo {\n        + init_model()\n        + clear_screen()\n        + main()\n    }\n\n    class web_demo {\n        + @st.cache_resource init_model()\n        + clear_chat_history()\n        + init_chat_history()\n        + main()\n    }\n}\n\ncli_demo ..> AutoModelForCausalLM : loads\ncli_demo ..> AutoTokenizer : loads\ncli_demo ..> GenerationConfig : loads\ncli_demo ..> torch : uses\ncli_demo ..> colorama : uses\n\nweb_demo ..> AutoModelForCausalLM : loads\nweb_demo ..> AutoTokenizer : loads\nweb_demo ..> GenerationConfig : loads\nweb_demo ..> torch : uses\nweb_demo ..> st : uses\n\nAutoModelForCausalLM <.. cli_demo : model.chat()\nAutoModelForCausalLM <.. web_demo : model.chat()\n\n@enduml\n```\n\n### Module 2: Evaluation/Core Logic Module\n\n```plantuml\n@startuml\ntitle Evaluation/Core Logic Module (eval/evaluator)\n\nabstract class DISCFINLLMBase {\n    + generate(prompt: str): str {abstract}\n}\n\npackage \"LLM Wrappers (finllm.py)\" {\n    class DISCVFINLLMChatGLM26B\n    class DISCVFINLLMBaichuan13BChat\n    class FinGPTv3\n    DISCFINLLMBase <|-- DISCVFINLLMChatGLM26B\n    DISCFINLLMBase <|-- DISCVFINLLMBaichuan13BChat\n    DISCFINLLMBase <|-- FinGPTv3\n}\n\npackage \"Data Preprocessing (preprocess.py)\" {\n    class BBTFinCUGE {\n        + download_all()\n        + process_finfe()\n        + process_finqa()\n        .. 
other process methods ..\n    }\n}\n\npackage \"Evaluation Logic (evaluate.py)\" {\n    class FinFEEvaluator {\n        + build_zero_shot_prompt()\n        + build_few_shot_prompt()\n        + evaluate(golds, preds)\n        + run_evaluation(llm)\n    }\n    class FinQAEvaluator\n    class FinCQAEvaluator\n    .. other Evaluators ..\n\n    FinFEEvaluator ..> BBTFinCUGE : loads instruct samples\n    FinFEEvaluator ..> DISCFINLLMBase : calls generate()\n}\n\npackage \"Utilities (utils.py)\" {\n    class Utils {\n        + write_json()\n        + load_json()\n        + _mixed_segmentation()\n        + _find_lcs()\n        + _compute_f1_score()\n    }\n}\n\npackage \"Orchestration (autoeval.py)\" {\n    class AutoEval {\n        + model_lists\n        + Eval_datasets\n        + main()\n    }\n}\n\nAutoEval --> DISCFINLLMBase : instantiates model\nAutoEval --> FinFEEvaluator : instantiates evaluator\nFinFEEvaluator ..> Utils : uses metrics/text processing\nBBTFinCUGE ..> Utils : uses load/write_json\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe DISC-FinLLM project is structured around a **modular, multi-expert design philosophy** centered on a clear separation of concerns between the LLM interaction, task-specific evaluation, and application demonstration.\n\nThe **core abstraction** is the `DISCFINLLMBase` abstract class defined in `finllm.py`. This class establishes a standardized interface (`generate(prompt: str) -> str`) for all underlying Large Language Models (LLMs), effectively decoupling the evaluation and application logic from the specific model implementation (e.g., ChatGLM, Baichuan, Bloomz). This allows the system to be easily extended to support new base models or different fine-tuned versions without modifying the evaluation framework.\n\nThe **design philosophy** is a **\"Model-as-a-Service\"** approach within the evaluation context. 
The LLM is treated as a black-box component that accepts a prompt and returns a response. The complexity of model loading, LoRA weight merging, and device management is encapsulated within the concrete model wrapper classes (e.g., `DISCVFINLLMBaichuan13BChat`). This encapsulation promotes code reusability and maintainability. Furthermore, the project implicitly follows a **Multi-Expert System** design, where the four data files (`consulting_part.json`, `task_part.json`, etc.) suggest the model is fine-tuned for distinct financial sub-tasks, which is then validated by the corresponding task-specific evaluators in `evaluate.py`.\n\nThe **lifecycle management** of the application is straightforward:\n1.  **Data Preparation**: The `preprocess.py` script manages the initial lifecycle phase by downloading and transforming raw BBT-FinCUGE data into a standardized format for evaluation.\n2.  **Model Loading**: The model is loaded once at the start of the application, either via `init_model()` in the demo scripts or via the `autoeval.py` orchestrator. Crucially, the use of `torch.float16` and `device_map=\"auto\"` ensures efficient, memory-optimized loading onto available hardware.\n3.  **Execution**:\n    *   **Demo Lifecycle**: The demo scripts maintain a continuous loop, managing conversation history (`messages` list) and repeatedly calling the model's `chat` method for each user turn.\n    *   **Evaluation Lifecycle**: The `autoeval.py` script orchestrates the evaluation, instantiating the chosen model and evaluator, running the full `run_evaluation` loop, and finally writing the metrics to a JSON file.\n\n#### 3.1.2. Component Interactions\n\nThe project exhibits two primary interaction flows: the **Demonstration Flow** and the **Evaluation Flow**.\n\n## 1. Demonstration Flow (e.g., `cli_demo.py`)\nThis flow is a direct, synchronous interaction between the user interface and the LLM.\n1.  
**Initialization**: `cli_demo.py` calls `init_model()` to load the model and tokenizer.\n2.  **User Input**: The user provides a `prompt`.\n3.  **Request**: The script appends the user's prompt to the `messages` history.\n4.  **Generation**: The script calls the model's custom `model.chat(tokenizer, messages, stream=True)` method.\n5.  **Response**: The model generates a response, which is either printed as a stream (in `cli_demo.py`) or updated in a placeholder (in `web_demo.py`).\n6.  **History Update**: The model's response is appended to the `messages` history, maintaining the conversational context.\n\n## 2. Evaluation Flow (`autoeval.py` Orchestration)\nThis flow is more complex, involving multiple components to systematically test the LLM.\n1.  **Orchestration**: `autoeval.py` instantiates a specific `DISCFINLLMBase` implementation (`llm`) and one or more `*Evaluator` instances.\n2.  **Data Access**: The `*Evaluator` (e.g., `FinFEEvaluator`) loads its task-specific evaluation data (`finfe-eval.jsonl`) and few-shot samples (`instruct_samples.json`) using helper functions from `utils.py`.\n3.  **Prompt Engineering**: Inside `*Evaluator.run_evaluation()`, for each data sample, the appropriate prompt construction method (`build_zero_shot_prompt` or `build_few_shot_prompt`) is called. This is where the task-specific instruction and context are formatted for the LLM.\n4.  **LLM Interaction**: The evaluator calls `llm.generate(input_text)` on the model wrapper. This is the critical communication point, abstracting the underlying model's API.\n5.  **Metric Calculation**: The evaluator collects the model's predictions (`preds`) and compares them to the ground truth (`golds`). It uses utility functions from `utils.py` (e.g., `_remove_punctuation`, `_find_lcs`) to clean text and calculate metrics like F1 score or accuracy.\n6.  
**Result Reporting**: The final metrics are returned to `autoeval.py`, which then aggregates and writes the results to a JSON file using `utils.write_json`.\n\nThe communication pattern between the `*Evaluator` and the `DISCFINLLMBase` is a clear example of the **Strategy Pattern**, where the evaluation logic (context) uses the model wrapper (strategy) to perform the generation task.\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\ntitle DISC-FinLLM Overall Architecture\n\nskinparam componentStyle rectangle\n\npackage \"Application Layer\" {\n    [cli_demo.py] as CLI\n    [web_demo.py] as WEB\n}\n\npackage \"Core Model Abstraction\" {\n    abstract class DISCFINLLMBase\n    [Model Wrappers (finllm.py)] as WRAPPER\n    DISCFINLLMBase <|-- WRAPPER\n}\n\npackage \"Evaluation Framework\" {\n    [autoeval.py] as ORCHESTRATOR\n    [evaluate.py] as EVAL_LOGIC\n    [preprocess.py] as PREPROCESS\n    [utils.py] as UTILS\n    [Task Evaluators (e.g., FinFEEvaluator)] as EVALUATOR\n    EVAL_LOGIC ..> EVALUATOR\n}\n\npackage \"External Dependencies\" {\n    [Hugging Face Transformers] as HF\n    [PEFT (LoRA)] as PEFT\n    [BBT-FinCUGE Data] as DATA\n}\n\nCLI --> WRAPPER : loads & interacts\nWEB --> WRAPPER : loads & interacts\n\nORCHESTRATOR --> WRAPPER : instantiates LLM\nORCHESTRATOR --> EVALUATOR : instantiates Task Logic\n\nEVALUATOR --> WRAPPER : calls generate()\nEVALUATOR --> UTILS : uses metrics/helpers\nPREPROCESS --> DATA : downloads\nPREPROCESS --> UTILS : uses I/O\n\nWRAPPER --> HF : uses AutoModel/Tokenizer\nWRAPPER --> PEFT : loads LoRA weights\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase, particularly the evaluation framework, leverages several fundamental design patterns to manage complexity and promote extensibility.\n\n## 1. 
Factory Pattern (Simple Factory)\n*   **Description**: The Factory Pattern is used to create objects without exposing the instantiation logic to the client.\n*   **Implementation**: In `autoeval.py`, the dictionaries `model_lists` and `Eval_datasets` act as simple factories.\n*   **Code Example (`autoeval.py`):**\n    ```python\n    # Factory for LLM models\n    model_lists = {\n        'chatglm-6b': DISCVFINLLMChatGLM6B,\n        'baichuan-13b-chat': DISCVFINLLMBaichuan13BChat,\n        # ...\n    }\n    # Factory for Evaluators\n    Eval_datasets = {\n        'finfe': FinFEEvaluator,\n        'finqa': FinQAEvaluator,\n        # ...\n    }\n    # Client code instantiates based on string key\n    llm = model_lists.get(model_name)(device, lora_path)\n    # ...\n    evaluator = Eval_datasets.get(eval_data)\n    ```\n\n## 2. Abstract Factory / Template Method Pattern\n*   **Description**: The Abstract Factory pattern provides an interface for creating families of related or dependent objects without specifying their concrete classes. The Template Method pattern defines the skeleton of an algorithm in the superclass but lets subclasses override specific steps.\n*   **Implementation**: The `DISCFINLLMBase` abstract class defines the common interface (`generate`), while each concrete model wrapper (e.g., `DISCVFINLLMBaichuan13BChat`) implements the specific steps for model loading, tokenization, and generation logic, which varies significantly between models (e.g., ChatGLM's `chat` method vs. Baichuan's prompt templating).\n\n## 3. Strategy Pattern\n*   **Description**: The Strategy Pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable.\n*   **Implementation**: The `*Evaluator` classes (the context) use the `DISCFINLLMBase` instance (`llm`, the strategy) to perform the text generation. The evaluation logic remains the same regardless of which concrete LLM implementation is used.\n\n#### 3.3.2. 
Project Highlights\n\nThe DISC-FinLLM project demonstrates several key design strengths, primarily focused on rigorous evaluation and model flexibility.\n\n*   **Comprehensive Evaluation Framework**: The most significant highlight is the dedicated, multi-task evaluation framework. By integrating the BBT-FinCUGE benchmark and creating distinct `*Evaluator` classes for tasks like sentiment analysis (`FinFE`), question answering (`FinQA`), and relation extraction (`FinRE`), the project ensures a **systematic and reproducible assessment** of the LLM's performance across the financial domain.\n*   **Model Agnosticism via Abstraction**: The use of the `DISCFINLLMBase` abstract class provides excellent **extensibility**. New LLMs (e.g., Llama, Qwen) can be integrated simply by creating a new concrete wrapper class that implements the `generate` method, without altering the core evaluation or demonstration logic.\n*   **LoRA Fine-Tuning Support**: The model wrappers in `finllm.py` are designed to support **LoRA (Low-Rank Adaptation)** fine-tuning out-of-the-box via the `peft` library. This allows developers to load a base model and merge LoRA weights dynamically, which is crucial for efficient experimentation and deployment of specialized financial models.\n*   **Dual Interface for Demonstration**: Providing both a **Command-Line Interface (`cli_demo.py`)** and a **Web Interface (`web_demo.py`)** using Streamlit enhances the project's **accessibility and usability**. This dual approach caters to both developers who prefer a quick terminal check and end-users who need a more polished, graphical demonstration.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nWhile the project is well-structured, several areas could be improved to enhance performance, architectural robustness, and code quality.\n\n## 1. 
Architectural Optimization: Model Loading\n*   **Suggestion**: Implement a **Singleton Pattern** or a dedicated **Model Manager** class for the LLM.\n*   **Reasoning**: Currently, model loading logic is duplicated across the demo scripts and the evaluation wrappers, with each wrapper repeating near-identical loading code. A Singleton would ensure the large LLM is loaded only once per process, centralizing resource management and reducing memory overhead.\n\n## 2. Code Quality: Refactoring `evaluate.py`\n*   **Suggestion**: Introduce a common `BaseEvaluator` class in `evaluate.py` to factor out shared methods like `__init__`, `run_evaluation`, and prompt-building logic.\n*   **Reasoning**: The current `evaluate.py` file is excessively long (nearly 1000 lines) due to the high degree of code duplication across the many `*Evaluator` classes. Abstracting the common structure (loading data, iterating samples, calling `llm.generate`, calculating metrics) would significantly reduce file size and improve maintainability.\n\n## 3. Robustness and Error Handling\n*   **Suggestion**: Enhance error handling, particularly in `preprocess.py` and model loading.\n*   **Reasoning**: The `preprocess.py` download function only prints an error message on failure (`print('failed to download dataset {}, {}'.format(eval_dataset, e))`) but does not raise an exception or retry. In a production environment, network failures should be handled with retries or graceful failure. Similarly, model loading should include more robust exception handling for missing files or incompatible hardware.\n\n## 4. 
Performance: Text Processing\n*   **Suggestion**: Replace the dependency on `nltk` for simple Chinese segmentation and punctuation removal in `utils.py` with a lighter, custom regex-based function or a more modern, dedicated Chinese NLP library like `jieba`.\n*   **Reasoning**: The current implementation relies on `nltk.word_tokenize`, which may not be optimized for Chinese text and introduces a heavy dependency for simple tasks. A more streamlined approach could improve the performance of the metric calculation step.\n\n#### 3.4.2. Secondary Development Guide\n\nThis guide outlines the best path for developers looking to explore, modify, or extend the DISC-FinLLM project.\n\n## 1. Code Exploration and Entry Points\n*   **Application Flow**: Start with `cli_demo.py` to understand how the model is loaded (`init_model`) and how the chat loop is managed. This is the simplest entry point for testing model responses.\n*   **Evaluation Flow**: The core logic is orchestrated by `autoeval.py`. Examine this file to see how models and evaluators are instantiated using the Factory Pattern.\n*   **Model Abstraction**: Study `eval/evaluator/finllm.py`. This file is crucial for understanding how different LLMs are wrapped and how LoRA weights are integrated.\n\n## 2. Extending Model Support\nTo integrate a new LLM (e.g., Llama-3):\n1.  Create a new class in `finllm.py` (e.g., `DISCVFINLLMLlama3`) inheriting from `DISCFINLLMBase`.\n2.  Implement the `__init__` method to handle the specific model and tokenizer loading for Llama-3, including any necessary `trust_remote_code` or LoRA integration.\n3.  Implement the `generate(prompt: str)` method, ensuring it correctly formats the prompt and calls the model's generation function to return a clean string response.\n4.  Add the new class to the `model_lists` dictionary in `autoeval.py`.\n\n## 3. Adding a New Evaluation Task\nTo add a new financial NLP task:\n1.  
Create a new class in `evaluate.py` (e.g., `FinNewTaskEvaluator`) following the structure of existing evaluators.\n2.  Define the `zero_shot_prompts` and `few_shot_prompts` templates specific to the new task.\n3.  Implement the `evaluate(golds, preds)` static method to calculate the correct metric (e.g., F1, accuracy, exact match) for the task, leveraging helper functions in `utils.py`.\n4.  Add the new evaluator class to the `Eval_datasets` dictionary in `autoeval.py`.\n\n## 4. Customizing Data and Metrics\n*   **Data**: The `preprocess.py` script is the place to modify how raw data is converted into the standardized `input`/`gold_answer` format.\n*   **Metrics**: The `utils.py` file contains the core logic for text cleaning (`_mixed_segmentation`) and metric calculation (`_compute_f1_score`). Modifications here will affect all generative evaluation tasks.\n\n"
  },
  {
    "path": "thirdparty/ElegantRL.md",
    "content": "# ElegantRL - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe ElegantRL project is structured to separate the core reinforcement learning logic from examples, documentation, and utility components. The core logic resides primarily in the `elegantrl` directory, which is further divided into functional modules: `agents`, `envs`, and `train`.\n\n```\n/home/ubuntu/ElegantRL\n|____.github/             # GitHub configuration files (e.g., FUNDING.yml)\n|____docs/                # Documentation source files (using Sphinx/reStructuredText)\n|____elegantrl/           # Core Reinforcement Learning Library\n| |______init__.py        # Package initialization\n| |____agents/            # Implementations of various DRL agents (AgentBase, AgentPPO, AgentSAC, etc.)\n| |____envs/              # Custom and wrapper environments (StockTradingEnv, CustomGymEnv, etc.)\n| |____train/             # Core training components (config, evaluator, replay_buffer, run)\n|____examples/            # Scripts demonstrating how to use the library with different algorithms and environments\n|____figs/                # Figures and images used in documentation and README\n|____helloworld/          # Simple, single-file examples for quick start and tutorials\n|____requirements.txt     # Python dependencies\n|____rlsolver/            # A separate, specialized solver component, likely for combinatorial optimization (CO) problems\n|____unit_tests/          # Test files for agents, environments, and training components\n```\n\nThe primary focus is on the `elegantrl` directory, which contains the fundamental components of the DRL library. The separation into `agents`, `envs`, and `train` enforces a clear modular design, making the codebase maintainable and extensible. 
The top-level folders like `examples`, `helloworld`, and `unit_tests` serve to support the core library by providing usage demonstrations and ensuring code quality. The `rlsolver` folder suggests a specialized application of the DRL framework to optimization problems.\n\n### 1.2. Core Folders for Analysis\n\n- **elegantrl/agents**: Contains the base class `AgentBase` and concrete implementations for various Deep Reinforcement Learning (DRL) algorithms, including on-policy (PPO, A2C) and off-policy (SAC, TD3, DDPG, DQN) methods, as well as multi-agent extensions (MADDPG, MAPPO, QMix, VDN).\n- **elegantrl/envs**: Houses custom and specialized environment implementations, such as `StockTradingEnv` for financial applications and wrappers for vectorized environments.\n- **elegantrl/train**: Manages the training infrastructure, including configuration (`config.py`), the main execution logic (`run.py`), experience storage (`replay_buffer.py`), and performance monitoring (`evaluator.py`).\n\n## Phase 2: Module-by-Module Deep Analysis\n\n### 1. Module: `elegantrl/agents`\n\n**Core Responsibility:** Implements the core logic for Deep Reinforcement Learning (DRL) agents, defining the interaction between the agent and the environment, and managing the policy and value networks.\n\n**Key Files and Functions:**\n- **`AgentBase.py`**: Defines the abstract base class `AgentBase` for all DRL agents. It handles initialization parameters (network dimensions, environment info, hyperparameters), device management (CPU/GPU), exploration logic (`explore_env`, `explore_action`), network update boilerplate (`update_net`, `optimizer_backward`, `soft_update`), and utility network classes (`ActorBase`, `CriticBase`, `build_mlp`).\n- **`AgentPPO.py`**: Implements the **Proximal Policy Optimization (PPO)** algorithm, an on-policy method. It extends `AgentBase` and includes specific logic for Generalized Advantage Estimation (GAE), ratio clipping, and entropy regularization. 
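The GAE step mentioned above can be sketched in a few lines. This is a minimal illustration of the backward recursion; `compute_gae` and its signature are hypothetical stand-ins, not ElegantRL's exact `get_advantages` API.

```python
# Illustrative sketch of Generalized Advantage Estimation (GAE).
# Names and signature are hypothetical, not ElegantRL's implementation.
def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Backward pass over one trajectory: A_t = delta_t + gamma*lam*A_{t+1}."""
    advantages = [0.0] * len(rewards)
    next_value, next_advantage = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        next_advantage = delta + gamma * lam * next_advantage * mask
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```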
It also contains `AgentA2C` as a simpler variant.\n- **`AgentSAC.py`**: Implements the **Soft Actor-Critic (SAC)** algorithm, an off-policy, maximum entropy DRL method. It uses an ensemble of critics (`CriticEnsemble`) and includes logic for automatic temperature parameter (`alpha`) adjustment.\n- **`AgentTD3.py`**: Implements the **Twin Delayed DDPG (TD3)** algorithm, an off-policy method that improves upon DDPG with clipped double Q-learning and delayed policy updates. It includes `AgentDDPG` as a simpler variant.\n- **`AgentDQN.py`**: Implements **Deep Q-Network (DQN)** and its variants (Double DQN, Dueling DQN) for discrete action spaces.\n- **`MAgent*.py`**: Contains multi-agent extensions like `MAgentMADDPG`, `MAgentMAPPO`, `MAgentQMix`, and `MAgentVDN`, which adapt single-agent algorithms for multi-agent systems.\n\n**Core Implementation Details:**\n- **Network Abstraction**: Agents rely on `ActorBase` and `CriticBase` (defined in `AgentBase.py`) which are essentially wrappers around PyTorch `nn.Module`s built using the `build_mlp` utility.\n- **Exploration**: The `explore_env` method is central, handling the collection of trajectories from the environment, distinguishing between single-environment (`_explore_one_env`) and vectorized environment (`_explore_vec_env`) scenarios.\n- **Update Logic**: The `update_net` method orchestrates the training. The core difference between on-policy (PPO) and off-policy (SAC, TD3) agents is evident here: PPO calculates advantages and reward sums from the collected batch, while off-policy agents sample from the `ReplayBuffer`.\n\n### 2. 
Module: `elegantrl/envs`\n\n**Core Responsibility:** Provides custom and specialized environment interfaces, particularly for financial and multi-agent tasks, and handles the creation of vectorized environments.\n\n**Key Files and Functions:**\n- **`CustomGymEnv.py`**: A template or wrapper for integrating custom environments that follow the OpenAI Gym/Gymnasium interface.\n- **`StockTradingEnv.py`**: A specialized environment for financial reinforcement learning, a key feature of the AI4Finance foundation. It defines the state, action, and reward space for a stock trading problem.\n- **`PlanIsaacGymEnv.py`**: Integration with NVIDIA's Isaac Gym for highly parallelized, high-performance simulation environments.\n- **`PointChasingEnv.py`**: A simple multi-agent environment used for testing and demonstration of multi-agent algorithms.\n\n**Core Implementation Details:**\n- **Standard Interface**: All environments adhere to the standard `reset()` and `step()` methods, ensuring compatibility with the `AgentBase`'s exploration logic.\n- **Vectorization**: The concept of a vectorized environment (`VecEnv` in `config.py`) is crucial, allowing multiple environment instances to run in parallel, which is essential for the \"Massively Parallel\" aspect of ElegantRL.\n\n### 3. Module: `elegantrl/train`\n\n**Core Responsibility:** Manages the overall training workflow, configuration, data storage, and performance evaluation.\n\n**Key Files and Functions:**\n- **`config.py`**: Defines the `Config` class, which holds all hyperparameters and environment metadata. It includes logic to automatically determine if an agent is on-policy or off-policy (`get_if_off_policy`) and contains the `VecEnv` and `SubEnv` classes for parallel environment execution using Python's `multiprocessing.Pipe` and `Process`.\n- **`replay_buffer.py`**: Implements the `ReplayBuffer` class for off-policy algorithms. 
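To make the experience-storage idea concrete, here is a deliberately simplified uniform-sampling buffer. It is illustrative only: ElegantRL's `ReplayBuffer` stores pre-allocated tensors per environment sequence, and `SimpleReplayBuffer` is a hypothetical name.

```python
import random
from collections import deque

# Minimal uniform-sampling replay buffer (illustration only; ElegantRL's
# ReplayBuffer is tensor-based and optionally uses PER via SumTree).
class SimpleReplayBuffer:
    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)  # oldest transitions evicted first

    def update(self, items):
        """Append a batch of (state, action, reward, done) transitions."""
        self.buffer.extend(items)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)
```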
It supports both standard sampling and **Prioritized Experience Replay (PER)** using the `SumTree` data structure.\n- **`run.py`**: Contains the main entry points for training (`train_agent`, `train_agent_single_process`, `train_agent_multiprocessing`). It defines the `Learner`, `Worker`, and `EvaluatorProc` classes for distributed training using Python's `multiprocessing`.\n- **`evaluator.py`**: Implements the `Evaluator` class for logging, saving checkpoints, and calculating performance metrics (average return, steps, loss values). It supports both single and vectorized environment evaluation and includes utilities for plotting the learning curve.\n\n**Core Implementation Details:**\n- **Parallelism**: The multi-process architecture in `run.py` is the backbone of ElegantRL's \"Massively Parallel\" claim. `Worker` processes collect experience from environments, and the `Learner` process updates the agent's networks, communicating via `Pipe`s.\n- **Data Flow**: In off-policy training, `Worker`s send collected experience to the `Learner`, which stores it in the `ReplayBuffer` and samples batches for network updates. In on-policy training, the collected experience is used directly for a few epochs of updates before being discarded.\n\n### Module PlantUML Diagrams\n\n### 1. 
`elegantrl/agents` Module Diagram (Simplified Core)\n\n```puml\n@startuml\nskinparam classAttributeIconVisible false\n\nabstract class AgentBase {\n    + if_discrete: bool\n    + if_off_policy: bool\n    + net_dims: list\n    + state_dim: int\n    + action_dim: int\n    + device: torch.device\n    + act: ActorBase\n    + cri: CriticBase\n    + act_optimizer: Adam\n    + cri_optimizer: Adam\n    + explore_env(env, horizon_len)\n    + explore_action(state)\n    + update_net(buffer)\n    + update_objectives(buffer, update_t)\n    + soft_update(target_net, current_net, tau)\n}\n\nabstract class ActorBase extends nn.Module {\n    + net: nn.Sequential\n    + forward(state)\n    + get_action(state)\n}\n\nabstract class CriticBase extends nn.Module {\n    + net: nn.Sequential\n    + forward(state, action)\n    + get_q_values(state, action)\n}\n\nclass AgentPPO extends AgentBase {\n    + ratio_clip: float\n    + lambda_gae_adv: float\n    + get_advantages(states, rewards, undones, unmasks, values)\n}\n\nclass AgentSAC extends AgentBase {\n    + num_ensembles: int\n    + alpha_log: Parameter\n}\n\nclass AgentTD3 extends AgentBase {\n    + update_freq: int\n    + policy_noise_std: float\n}\n\nclass ActorPPO extends ActorBase {\n    + action_std_log: Parameter\n    + state_norm(state)\n    + get_logprob_entropy(state, action)\n}\n\nclass CriticPPO extends CriticBase {\n    + state_norm(state)\n}\n\nclass CriticEnsemble extends CriticBase {\n    + decoder_qs: list\n    + get_q_values(state, action)\n}\n\nAgentBase <|-- AgentPPO\nAgentBase <|-- AgentSAC\nAgentBase <|-- AgentTD3\nAgentBase <|-- AgentDDPG\nAgentBase <|-- AgentDQN\n\nActorBase <|-- ActorPPO\nCriticBase <|-- CriticPPO\nCriticBase <|-- CriticEnsemble\n\nAgentPPO *-- ActorPPO : uses\nAgentPPO *-- CriticPPO : uses\nAgentSAC *-- ActorSAC : uses\nAgentSAC *-- CriticEnsemble : uses\nAgentTD3 *-- Actor : uses\nAgentTD3 *-- CriticTwin : uses\n\n@enduml\n```\n\n### 2. 
`elegantrl/train` Module Diagram (Core Components)\n\n```puml\n@startuml\nskinparam classAttributeIconVisible false\n\nclass Config {\n    + num_envs: int\n    + agent_class: class\n    + env_class: class\n    + gamma: float\n    + learning_rate: float\n    + batch_size: int\n    + horizon_len: int\n    + buffer_size: int\n    + gpu_id: int\n    + init_before_training()\n    + get_if_off_policy()\n}\n\nclass SumTree {\n    + buf_len: int\n    + tree: Tensor\n    + update_ids(data_ids, prob)\n    + important_sampling(batch_size, beg, end, per_beta)\n}\n\nclass ReplayBuffer {\n    + max_size: int\n    + num_seqs: int\n    + states: Tensor\n    + actions: Tensor\n    + if_use_per: bool\n    + sum_trees: list[SumTree]\n    + update(items)\n    + sample(batch_size)\n    + sample_for_per(batch_size)\n}\n\nclass Evaluator {\n    + cwd: str\n    + total_step: int\n    + max_r: float\n    + recorder: list\n    + evaluate_and_save(actor, steps, exp_r, logging_tuple)\n    + save_training_curve_jpg()\n}\n\nclass SubEnv extends Process {\n    + sub_pipe0: Pipe\n    + vec_pipe1: Pipe\n    + run()\n}\n\nclass VecEnv {\n    + num_envs: int\n    + sub_envs: list[SubEnv]\n    + sub_pipe1s: list[Pipe]\n    + vec_pipe0: Pipe\n    + reset()\n    + step(action)\n}\n\nclass Worker extends Process {\n    + worker_pipe: Pipe\n    + learner_pipe: Pipe\n    + run()\n}\n\nclass Learner extends Process {\n    + recv_pipe: Pipe\n    + send_pipes: list[Pipe]\n    + run()\n}\n\nConfig *-- ReplayBuffer : configures\nReplayBuffer *-- SumTree : uses (for PER)\nConfig *-- VecEnv : creates\nVecEnv *-- SubEnv : manages\nLearner *-- ReplayBuffer : updates\nLearner *-- Worker : communicates\nLearner *-- EvaluatorProc : communicates\nWorker *-- VecEnv : uses\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. 
Core Abstractions\n\nThe ElegantRL architecture is built around a set of highly modular and decoupled abstractions, primarily focused on the Actor-Critic paradigm and parallel execution.\n\n1.  **Agent (`AgentBase`)**: The central abstraction for any DRL algorithm. It encapsulates the policy (`act`), value function (`cri`), optimization logic, and exploration strategy. Concrete implementations like `AgentPPO` and `AgentSAC` inherit from this base class, ensuring a consistent interface for the training loop.\n2.  **Network (`ActorBase`, `CriticBase`)**: These define the neural network structures for the policy and value functions, respectively. They are decoupled from the agent logic, allowing for flexible network designs (e.g., `CriticTwin` for TD3, `CriticEnsemble` for SAC).\n3.  **Configuration (`Config`)**: A single source of truth for all hyperparameters, environment details, and training settings. This abstraction simplifies experiment management and ensures consistency across the entire framework.\n4.  **Experience Storage (`ReplayBuffer`, `SumTree`)**: Manages the collection and sampling of experience. The inclusion of `SumTree` for Prioritized Experience Replay (PER) highlights the focus on sample efficiency.\n5.  **Parallelism Components (`Learner`, `Worker`, `VecEnv`)**: These are the core components enabling the \"Massively Parallel\" design. The `Learner` handles model updates, while `Worker`s handle environment interaction, and `VecEnv` manages multiple environment instances in parallel processes (`SubEnv`).\n\n**Design Philosophy: Massively Parallel and Modular DRL**\nElegantRL's design philosophy is centered on two main pillars:\n\n1.  **Decoupled Parallelism**: The framework adopts a clear separation between the **data collection** (exploration) and the **model update** (learning) phases, a design common in high-throughput DRL systems. 
`Worker` processes run in parallel to collect massive amounts of experience, which is then asynchronously sent to the `Learner` process for efficient GPU-based training. This maximizes hardware utilization and significantly speeds up training.\n2.  **Modularity and Extensibility**: The codebase is highly modular, with clear boundaries between the `agents`, `envs`, and `train` components. This modularity makes it easy to implement new algorithms (by extending `AgentBase`), integrate new environments, or swap out core components like the `ReplayBuffer`.\n\n**Lifecycle Management**\nThe training lifecycle is managed by the `run.py` module:\n\n1.  **Initialization**: The `Config` object is initialized, and the `Learner`, `Worker`s, and `EvaluatorProc` processes are instantiated.\n2.  **Exploration (Worker)**: Each `Worker` process continuously interacts with its assigned `VecEnv` instances, collecting trajectories.\n3.  **Learning (Learner)**: The `Learner` receives batches of experience from all `Worker`s. It stores them in the `ReplayBuffer`, samples a batch, calculates the loss, updates the networks, and soft-updates the target networks.\n4.  **Synchronization**: The `Learner` periodically sends the updated policy network parameters back to the `Worker`s.\n5.  **Evaluation (Evaluator)**: The `Evaluator` process runs evaluation episodes, logs performance metrics, and handles model checkpointing.\n\n#### 3.1.2. Component Interactions\n\nThe inter-component communication is primarily handled by Python's `multiprocessing.Pipe` for inter-process communication (IPC), enabling the asynchronous and parallel nature of the framework.\n\n| Component | Role | Communication Pattern | Data Flow |\n| :--- | :--- | :--- | :--- |\n| **Worker** | Experience Collector | Sends data to `Learner` via `Pipe`. Receives model parameters from `Learner` via `Pipe`. | Trajectories (states, actions, rewards, etc.) -> `Learner`. Latest `Actor` state dict -> `Worker`. 
|\n| **Learner** | Model Updater | Receives data from `Worker`s. Sends model to `Worker`s and `Evaluator`. | Trajectories from `Worker`s -> `ReplayBuffer`. Sampled batches from `ReplayBuffer` -> `Agent` for update. |\n| **VecEnv** | Parallel Environment Manager | Manages multiple `SubEnv` processes using `Pipe`s. | Actions from `Worker` -> `SubEnv`. New states, rewards, dones from `SubEnv` -> `Worker`. |\n| **ReplayBuffer** | Experience Storage | Accessed exclusively by the `Learner` process. | Stores trajectories from `Worker`s. Provides sampled batches to `Learner`'s `Agent`. |\n| **Evaluator** | Performance Monitor | Receives training statistics from `Learner` via `Pipe`. | Training metrics (step, avgR, losses) -> `Evaluator`. |\n\n**Key Interaction Flow (Off-Policy Training):**\n\n1.  **Exploration**: `Worker` receives the latest `Actor` from `Learner`.\n2.  **Data Collection**: `Worker` calls `agent.explore_env(VecEnv)`, which executes `VecEnv.step()` across all `SubEnv`s in parallel, collecting a batch of trajectories.\n3.  **Data Transfer**: `Worker` sends the collected trajectories (e.g., 2048 steps * 8 environments) to the `Learner` via a `Pipe`.\n4.  **Storage**: `Learner` receives the data and calls `ReplayBuffer.update()`.\n5.  **Learning**: `Learner` repeatedly calls `ReplayBuffer.sample()` and passes the batch to `agent.update_net()`.\n6.  **Synchronization**: After a set number of learning steps, `Learner` sends the updated `Actor` weights back to the `Worker`s.\n7.  **Monitoring**: Periodically, `Learner` sends performance metrics to the `Evaluator` for logging and checkpointing.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam defaultFontName Courier\nskinparam classAttributeIconVisible false\nskinparam packageStyle rectangle\n\ntitle ElegantRL Overall Architecture\n\npackage \"elegantrl.train\" {\n    class Config\n    class ReplayBuffer\n    class Evaluator\n    class Learner extends Process\n    class Worker extends Process\n    class VecEnv\n    class SubEnv extends Process\n}\n\npackage \"elegantrl.agents\" {\n    abstract class AgentBase\n    abstract class ActorBase\n    abstract class CriticBase\n}\n\npackage \"elegantrl.envs\" {\n    class Environment\n}\n\n' Relationships\n\n' 1. Configuration and Initialization\nConfig .> AgentBase : configures\nConfig .> ReplayBuffer : configures\nConfig .> VecEnv : configures\n\n' 2. Agent Core\nAgentBase <|-- AgentPPO\nAgentBase <|-- AgentSAC\nAgentBase <|-- AgentTD3\nAgentBase *-- ActorBase : uses\nAgentBase *-- CriticBase : uses\n\n' 3. Training Loop Components\nLearner *-- AgentBase : updates\nLearner *-- ReplayBuffer : manages\nLearner .> Evaluator : sends stats (Pipe)\n\nWorker .> AgentBase : uses for exploration\nWorker *-- VecEnv : collects data\n\n' 4. Inter-Process Communication (IPC)\nWorker .> Learner : sends data (Pipe)\nLearner .> Worker : sends model (Pipe)\n\n' 5. Environment Interaction\nVecEnv *-- SubEnv : manages parallel instances\nVecEnv .> Environment : wraps/uses\n\n' 6. Data Flow\nReplayBuffer .> AgentBase : samples data\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nElegantRL leverages several established software and reinforcement learning design patterns to achieve its modularity, stability, and performance goals.\n\n1.  
**Actor-Critic Pattern (Reinforcement Learning Pattern)**\n    *   **Description**: Separates the policy (Actor) that selects actions from the value function (Critic) that estimates the expected return.\n    *   **Implementation**:\n        *   `AgentBase` is the abstract base for the entire pattern.\n        *   `ActorBase` and `CriticBase` define the network interfaces.\n        *   **Example (AgentPPO.py)**: The `AgentPPO` class explicitly instantiates `self.act = ActorPPO(...)` and `self.cri = CriticPPO(...)`, and the `update_objectives` method uses both to calculate the actor and critic losses.\n\n2.  **Target Network Pattern (Reinforcement Learning Pattern)**\n    *   **Description**: Used in off-policy algorithms (DDPG, TD3, SAC) to stabilize training by using a separate, delayed-update copy of the Q-network.\n    *   **Implementation**:\n        *   The `AgentBase` constructor initializes `self.act_target` and `self.cri_target`.\n        *   The static method `AgentBase.soft_update(target_net, current_net, tau)` implements the exponential moving average (EMA) update rule.\n        *   **Example (AgentTD3.py)**: The `update_objectives` method calculates the target Q-value using `next_q = self.cri_target.get_q_values(next_state, next_action).min(dim=1)[0]`.\n\n3.  **Factory Method Pattern (Software Design Pattern)**\n    *   **Description**: Defines an interface for creating an object, but lets subclasses alter the type of objects that will be created.\n    *   **Implementation**:\n        *   The `Config` object stores `self.agent_class` and `self.env_class`.\n        *   The `run.py` module uses these classes to instantiate the actual objects: `agent = args.agent_class(...)` and `env = build_env(args.env_class, ...)`.\n\n4.  
**Strategy Pattern (Software Design Pattern)**\n    *   **Description**: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.\n    *   **Implementation**:\n        *   The core training loop in `run.py` interacts only with the `AgentBase` interface (`agent.explore_env`, `agent.update_net`).\n        *   The specific implementation is encapsulated within the concrete strategy classes (`AgentPPO`, `AgentSAC`), making them interchangeable.\n\n5.  **Observer Pattern (Software Design Pattern)**\n    *   **Description**: Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.\n    *   **Implementation**:\n        *   The `Learner` acts as the Subject, generating updated model parameters.\n        *   The `Worker`s and `Evaluator` act as Observers, receiving the updated model parameters (or performance data) via the IPC `Pipe`s.\n\n#### 3.3.2. Project Highlights\n\nElegantRL's design includes several innovative features that contribute to its high performance and usability:\n\n*   **Massively Parallel Architecture (Cloud-Native DRL)**: The core highlight is the clear separation of concerns into `Learner` (GPU-heavy computation) and multiple `Worker`s (CPU-heavy environment interaction), communicating via IPC. This design is highly scalable and is explicitly optimized for cloud-native DRL applications, allowing for efficient utilization of multi-core CPUs and single/multi-GPU setups.\n*   **Vectorized Environment Support (`VecEnv`)**: The framework natively supports running multiple environment instances in parallel within a single `Worker` process, dramatically increasing the data throughput (samples per second) and reducing the wall-clock time required for training. 
This is a crucial feature for on-policy algorithms like PPO.\n*   **Prioritized Experience Replay (PER) with `SumTree`**: The implementation of PER in `replay_buffer.py` using a dedicated `SumTree` data structure is a highlight. It ensures that the most \"surprising\" or high-error transitions are sampled more frequently, leading to faster convergence and better sample efficiency for off-policy methods.\n*   **Unified Agent Interface (`AgentBase`)**: By abstracting the core DRL logic into `AgentBase`, the framework provides a clean, consistent API for all algorithms (PPO, SAC, TD3, DQN, etc.). This significantly lowers the barrier to entry for users wanting to compare or switch between different algorithms.\n*   **Financial Reinforcement Learning Focus**: The inclusion of specialized environments like `StockTradingEnv` and the project's association with the AI4Finance-Foundation indicate a strong focus on applying DRL to complex financial problems, which often require the stability and efficiency ElegantRL provides.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nBased on the code structure and design, the following areas could be considered for improvement:\n\n1.  **Standardize Environment Interface**: The `elegantrl/envs` module contains custom environment implementations. While functional, adopting the latest Gymnasium API standards more strictly, possibly through a dedicated wrapper layer, would improve compatibility with the broader RL ecosystem and future-proof the environment integrations.\n2.  **Configuration Management**: The `Config` class is a simple data container. For large-scale experiments, migrating to a more robust configuration management system (e.g., Hydra, Gin-config) would allow for easier tracking, overriding, and composition of hyperparameter sets, especially for the multi-GPU and multi-process setups.\n3.  
**Network Abstraction for Complex Architectures**: The current network building utility (`build_mlp`) is limited to simple Multi-Layer Perceptrons. Expanding the network module to include more complex, pre-built architectures (e.g., ResNets, attention-based models) or a more flexible network composition API would simplify the implementation of state-of-the-art DRL agents that require specialized network structures.\n4.  **Asynchronous Communication Overhead**: The reliance on Python's `multiprocessing.Pipe` for IPC, while simple, can introduce serialization/deserialization overhead, especially when transferring large batches of data (tensors) between `Worker` and `Learner`. Investigating more efficient IPC mechanisms like shared memory (e.g., Python's `multiprocessing.shared_memory`, PyTorch's shared-memory tensors, or Ray) could further reduce latency and increase the overall throughput.\n5.  **Type Hinting and Documentation**: While type hints are present, expanding their use, especially in the core `AgentBase` and `run.py` components, along with more comprehensive docstrings, would significantly improve code readability and maintainability for secondary developers.\n\n#### 3.4.2. Secondary Development Guide\n\nFor developers looking to extend or build upon the ElegantRL framework, the following guide provides the best path for code exploration and secondary development:\n\n1.  **Implement a New Agent (Algorithm)**:\n    *   **Start with `AgentBase.py`**: Create a new class (e.g., `AgentNewRL`) that inherits from `AgentBase`.\n    *   **Define Networks**: Implement the specific Actor and Critic network architectures required by the new algorithm (e.g., `ActorNewRL`, `CriticNewRL`), inheriting from `ActorBase` and `CriticBase`.\n    *   **Override `__init__`**: Initialize the new agent, setting algorithm-specific hyperparameters and instantiating the new networks.\n    *   **Override `update_objectives`**: This is the most critical step. 
Implement the algorithm's core loss functions and optimization steps here.\n\n2.  **Integrate a New Environment**:\n    *   **Follow Gym/Gymnasium Standard**: Ensure the new environment implements the standard `__init__`, `reset`, and `step` methods.\n    *   **Use `elegantrl/envs` as a Template**: If the environment is complex, use `StockTradingEnv.py` as a template for structuring the state, action, and reward logic.\n    *   **Vectorization**: Ensure the environment is compatible with the `VecEnv` wrapper defined in `config.py` for high throughput.\n\n3.  **Explore the Training Workflow**:\n    *   **Configuration**: All experiments start with `config.py`. Understand how to set `agent_class`, `env_class`, and key hyperparameters.\n    *   **Execution**: The `run.py` module is the entry point. Focus on the `train_agent_multiprocessing` function to understand how `Learner` and `Worker` processes are launched and communicate.\n    *   **Data Flow**: Trace the data from `Worker.run()` (collection) through the `Pipe` to `Learner.run()` (storage and update) to fully grasp the parallel data pipeline.\n\n4.  **Debugging and Monitoring**:\n    *   **Logging**: Use the `Evaluator` in `evaluator.py` to monitor training progress.\n    *   **PyTorch Debugging**: Standard PyTorch debugging techniques can be applied directly within the `update_objectives` methods.\n\n"
  },
  {
    "path": "thirdparty/FinCast-fts.md",
    "content": "# FinCast-fts - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\n**Project Name:** FinCast-fts\n**Project Path:** /home/ubuntu/FinCast-fts\n\n```\n/home/ubuntu/FinCast-fts\n|____.git/ (EXCLUDE: Git version control metadata)\n|____.gitattributes (EXCLUDE: Git configuration)\n|____.gitignore (EXCLUDE: Files to ignore for Git)\n|____Inference/ (EXCLUDE: Jupyter notebook for model inference and demonstration)\n| |____inference_future.ipynb\n|____LICENSE (EXCLUDE: Project license)\n|____README.md (EXCLUDE: Project documentation)\n|____dep_install.sh (EXCLUDE: Script for dependency installation)\n|____env_setup.sh (EXCLUDE: Script for environment setup)\n|____experiments/ (EXCLUDE: Scripts for running long-horizon benchmarks and evaluations)\n| |____long_horizon_benchmarks/\n| | |____Freq_map_eval.py\n| | |____run_eval_ffm.py\n| | |____run_eval_ffm_dataset.py\n| | |____run_eval_ffm_stock.py\n|____notebooks/ (EXCLUDE: Jupyter notebook for result summary and visualization)\n| |____result_summary.ipynb\n|____paper.pdf (EXCLUDE: Associated research paper)\n|____peft_Fincast/ (CORE: Implementation for Parameter-Efficient Fine-Tuning (PEFT) integration)\n| |____peft_injector.py\n|____pics/ (EXCLUDE: Example images)\n| |____example1_APPL.png\n| |____example2_ETHUSD.png\n|____requirement_v2.txt (EXCLUDE: Project dependencies list)\n|____scripts/ (EXCLUDE: Shell scripts for running PEFT and evaluation)\n| |____Fincast_PEFT/\n| | |____local_4090_t1.sh\n| |____Fincast_eval/\n| | |____eval_stock_loop.sh\n| | |____eval_stock_loop_supervised42.sh\n|____setup.py (EXCLUDE: Python package setup file)\n|____src/ (CORE: Main source code directory)\n| |______init__.py\n| |____data_tools/ (CORE: Data loading, processing, and batch sampling utilities)\n| | |____Inference_dataset.py\n| | |____TSdataset.py\n| | |____batch_sampler.py\n| | |____batch_sampler_ddp.py\n| |____ffm/ (CORE: Core Financial Foundation Model 
(FFM) implementation)\n| | |______init__.py\n| | |____data_loader.py\n| | |____ffm_base.py\n| | |____ffm_torch_moe.py\n| | |____pytorch_patched_decoder_MOE.py\n| | |____time_features.py\n| | |____xreg_lib.py\n| |____st_moe_pytorch/ (CORE: Implementation of the Spatio-Temporal Mixture of Experts (ST-MoE) layer)\n| | |______init__.py\n| | |____distributed.py\n| | |____st_moe_pytorch.py\n| |____tools/ (CORE: General utility functions, metrics, model utils, and visualization)\n| | |______init__.py\n| | |____inference_utils.py\n| | |____metrics.py\n| | |____model_utils.py\n| | |____result_vis_plt.ipynb\n| | |____utils.py\n| |____unit_test/ (EXCLUDE: Contains a unit test script)\n| | |____BS_DDP_tc4.py\n```\n\nThe project is organized into five core logical modules under the root and `src/` directory: `peft_Fincast` for model adaptation, `src/data_tools` for data pipeline, `src/ffm` for the core model logic, `src/st_moe_pytorch` for the MoE implementation, and `src/tools` for utilities. The rest of the folders contain non-core elements like scripts, notebooks, and documentation.\n```\n\n### 1.2. Core Folders for Analysis\n\n- `/home/ubuntu/FinCast-fts/peft_Fincast`: Implementation for Parameter-Efficient Fine-Tuning (PEFT) integration.\n- `/home/ubuntu/FinCast-fts/src/data_tools`: Data loading, processing, and batch sampling utilities.\n- `/home/ubuntu/FinCast-fts/src/ffm`: Core Financial Foundation Model (FFM) implementation.\n- `/home/ubuntu/FinCast-fts/src/st_moe_pytorch`: Spatio-Temporal Mixture of Experts (ST-MoE) layer implementation.\n- `/home/ubuntu/FinCast-fts/src/tools`: General utility functions, metrics, and model utilities.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module Analysis\n\nThe FinCast-fts project is structured around a core deep learning model, the Financial Foundation Model (FFM), and its supporting infrastructure for data handling, training utilities, and inference. 
The architecture is heavily influenced by the TimesFM design, with significant modifications to incorporate a Spatio-Temporal Mixture of Experts (ST-MoE) layer.\n\n### 1. Module: `peft_Fincast` (Parameter-Efficient Fine-Tuning)\n\n*   **Files**: `peft_injector.py`\n*   **Core Responsibility**: This module is responsible for integrating Parameter-Efficient Fine-Tuning (PEFT), specifically LoRA (Low-Rank Adaptation) or DoRA, into the pre-trained FFM. This allows for efficient fine-tuning of the large model on downstream tasks by only training a small fraction of new parameters.\n*   **Key Implementation Details**:\n    *   **`wrap_with_peft` Function**: The main entry point, which takes the base model and LoRA hyperparameters (`lora_r`, `lora_alpha`, `lora_dropout`, `lora_targets_preset`). It uses the external `peft` library's `LoraConfig` and `get_peft_model` to inject the adapters.\n    *   **Target Selection (`_default_targets`)**: Defines presets for selecting which linear layers (`nn.Linear`) within the FFM should receive LoRA adapters. Presets include:\n        *   `attn`: Targets the attention mechanism's query/key/value projection (`qkv_proj`) and output projection (`o_proj`).\n        *   `attn_mlp`: Extends `attn` to include the feed-forward layers in both the input and horizon blocks.\n        *   `attn_mlp_gating`: Further extends to include the MoE gating mechanism (`moe.gate.to_gates`), indicating a focus on routing behavior.\n        *   `experts_heavy`: Targets the most parameters by including the experts themselves (`experts.experts`, `gate_proj`, `down_proj`).\n\n### 2. 
Module: `src/data_tools` (Data Handling and Batching)\n\n*   **Files**: `Inference_dataset.py`, `TSdataset.py`, `batch_sampler.py`, `batch_sampler_ddp.py`\n*   **Core Responsibility**: Manages the entire data pipeline, from reading raw CSV files to preparing batched, windowed, and optionally masked time-series data for both training and inference.\n*   **Key Implementation Details**:\n    *   **`TimeSeriesDataset_MultiCSV_train_Production` (`TSdataset.py`)**: The primary training dataset class. It reads multiple CSVs, converts multi-column data into a collection of univariate series, applies Z-score normalization (`sklearn.preprocessing.StandardScaler`), and generates sliding windows with a configurable stride (`data_slice_interval`) and variable context lengths (`possible_context_lengths`). It also implements input masking (`mask_ratio`) for potential pre-training objectives.\n    *   **`TimeSeriesDataset_SingleCSV_Inference` (`Inference_dataset.py`)**: A specialized dataset for inference on a single CSV, supporting both \"last window\" and \"sliding window\" modes. It returns metadata for traceability, which is crucial for post-inference analysis and plotting.\n    *   **`GroupByLengthBatchSampler_Production` (`batch_sampler.py`)**: A custom PyTorch `BatchSampler` that groups samples by their context length (`get_length`). This is a critical optimization, as it eliminates the need for padding within a batch, maximizing GPU efficiency for the Transformer architecture.\n    *   **`GroupByLengthBatchSampler_DDP` (`batch_sampler_ddp.py`)**: Extends the batch sampler for Distributed Data Parallel (DDP) training, ensuring that all ranks process a synchronized, deterministically shuffled subset of the data.\n\n### 3. 
Module: `src/ffm` (Financial Foundation Model Core)\n\n*   **Files**: `data_loader.py`, `ffm_base.py`, `ffm_torch_moe.py`, `pytorch_patched_decoder_MOE.py`, `time_features.py`, `xreg_lib.py`\n*   **Core Responsibility**: Contains the model definition, configuration, base API, and components for handling time-series features and external regressors.\n*   **Key Implementation Details**:\n    *   **`FFmBase` (`ffm_base.py`)**: Defines the abstract interface for the FFM API, including shared utilities like `_normalize` and `_renormalize` for per-time-series normalization. It also includes the complex logic for integrating **eXogenous Regressors (XReg)**, supporting two modes: \"timesfm + xreg\" (forecast residuals) and \"xreg + timesfm\" (forecast on residuals).\n    *   **`FFmTorch` (`ffm_torch_moe.py`)**: The concrete PyTorch implementation of the FFM API. It initializes the core model (`PatchedTimeSeriesDecoder_MOE`) and implements the inference loop (`_forecast`), handling checkpoint loading (including compiled models) and device placement (CPU/GPU).\n    *   **`PatchedTimeSeriesDecoder_MOE` (`pytorch_patched_decoder_MOE.py`)**: The main model class. 
It implements the Transformer-based decoder architecture, which operates on time-series patches.\n        *   **Patching**: Input time-series are reshaped into patches (`[B, N, P]`) before being passed to the transformer.\n        *   **Feature Injection**: It uses a `ResidualBlock` (`input_ff_layer`) to project the concatenated time-series patch and padding mask (`[P*2]`) into the model's hidden dimension.\n        *   **Frequency Embedding**: A learnable embedding (`freq_emb`) is added to the input to condition the model on the time-series frequency (e.g., high, medium, low).\n        *   **Output Head**: A final `ResidualBlock` (`horizon_ff_layer`) projects the transformer output to the prediction horizon, outputting both the mean and multiple quantiles.\n    *   **`TimesFMDecoderLayer` (`pytorch_patched_decoder_MOE.py`)**: The core building block of the transformer stack. It consists of:\n        *   **Attention**: `TimesFMAttention` (a standard multi-head attention with RMSNorm).\n        *   **Mixture of Experts (MoE)**: `SparseMoEBlock` (from `st_moe_pytorch`) is used as the feed-forward network, which is the key architectural innovation.\n    *   **`TimeCovariates` (`time_features.py`)**: Extracts a rich set of time-based features (minute, hour, day of week/month/year, month/week of year) and optional holiday features, which are then normalized.\n\n### 4. Module: `src/st_moe_pytorch` (Spatio-Temporal MoE)\n\n*   **Files**: `distributed.py`, `st_moe_pytorch.py`\n*   **Core Responsibility**: Provides the implementation for the Mixture of Experts (MoE) layer, which is integrated into the FFM's transformer blocks. 
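The top-K routing that this module provides can be sketched with a deliberately minimal, self-contained layer. This is an illustration only: the class name `TinyTopKMoE` is invented, and it omits the Gumbel noise, expert-capacity limits, and auxiliary `balance_loss`/`router_z_loss` terms that the real `TopNGating` implements.

```python
import torch
import torch.nn as nn


class TinyTopKMoE(nn.Module):
    """Minimal top-k MoE feed-forward layer (illustrative sketch, not the repo's API)."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, dim] -> gate logits: [tokens, num_experts]
        probs = self.gate(x).softmax(dim=-1)
        # pick the top-k experts per token and renormalize their weights
        weights, idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

The dense per-expert loop above is the conceptual equivalent of the dispatch/combine tensors used by the real implementation, which batch tokens per expert instead of masking.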
This module is adapted from a general-purpose MoE library.\n*   **Key Implementation Details**:\n    *   **`MoE` (`st_moe_pytorch.py`)**: The main MoE class, composed of a `TopNGating` router and an `Experts` container.\n        *   **`TopNGating`**: The router computes raw gate logits, applies Gumbel noise (during training), and uses a differentiable top-K selection to choose the top `top_n` experts for each token. It also calculates auxiliary losses (`balance_loss`, `router_z_loss`) to encourage balanced expert usage.\n        *   **`Experts`**: A container for the individual `Expert` modules (which are simple MLPs). It handles the dispatching of tokens to the selected experts and combining the outputs.\n    *   **`SparseMoEBlock`**: Wraps the `MoE` layer, adding pre- and post-feed-forward layers (`ff_before`, `ff_after`) and a residual connection, which is noted in the source code as a stabilization technique.\n    *   **`distributed.py`**: Contains utility functions (`all_gather_variable_dim`, `AllGatherFunction`) for handling distributed communication (All-Gather) of variable-sized tensors, necessary for efficient distributed training of MoE models.\n\n### 5. Module: `src/tools` (Utilities)\n\n*   **Files**: `inference_utils.py`, `metrics.py`, `model_utils.py`, `utils.py`\n*   **Core Responsibility**: Provides miscellaneous utilities for model loading, evaluation, metrics calculation, and visualization.\n*   **Key Implementation Details**:\n    *   **`inference_utils.py`**: Contains the high-level `FinCast_Inference` class, which orchestrates the entire inference process: dataset creation, model loading, running the `DataLoader`, and post-processing the results. 
It also includes functions for plotting (`plot_last_outputs`) and saving outputs to CSV.\n    *   **`metrics.py`**: Implements standard time-series evaluation metrics using NumPy, including MAE, MSE, RMSE, MAPE, MSPE, RSE, and CORR.\n    *   **`model_utils.py`**: Simple helper to instantiate the FFM model (`FFM`) and its configuration (`FFmHparams`) from a checkpoint path.\n    *   **`utils.py`**: Provides logging and parameter counting utilities (`log_model_statistics`) for tracking model size and configuration.\n\n---\n## Module PlantUML Diagrams\n\n### 1. Module: `peft_Fincast`\n\n```puml\n@startuml peft_Fincast\nskinparam classAttributeIconSize 0\n\npackage \"peft_Fincast\" {\n    class peft_injector {\n        + wrap_with_peft(model, ...)\n        -- Private --\n        - _default_targets(model, preset)\n        - resolve_linear_targets(model, patterns)\n        - _unfreeze_all_params(model)\n    }\n}\n\npackage \"External: peft\" {\n    class LoraConfig\n    class get_peft_model\n}\n\npackage \"External: torch\" {\n    class nn.Module\n    class nn.Linear\n}\n\npeft_injector ..> LoraConfig : uses\npeft_injector ..> get_peft_model : uses\npeft_injector ..> nn.Module : operates on\npeft_injector ..> nn.Linear : targets\n\n@enduml\n```\n\n### 2. 
Module: `src/data_tools`\n\n```puml\n@startuml data_tools\nskinparam classAttributeIconSize 0\n\npackage \"data_tools\" {\n    class TimeSeriesDataset_MultiCSV_train_Production {\n        + __init__(...)\n        + __len__()\n        + get_length(idx)\n        + __getitem__(idx)\n        -- Private --\n        - _read_csvs()\n        - _prepare_index_records()\n    }\n\n    class TimeSeriesDataset_SingleCSV_Inference {\n        + __init__(...)\n        + __len__()\n        + get_length(idx)\n        + __getitem__(idx)\n        -- Private --\n        - _make_meta(series_idx, window_start)\n    }\n\n    class GroupByLengthBatchSampler_Production {\n        + __init__(dataset, batch_size, ...)\n        + __iter__()\n        + __len__()\n    }\n\n    class GroupByLengthBatchSampler_DDP {\n        + __init__(dataset, batch_size, ...)\n        + __iter__()\n        + __len__()\n        + set_epoch(epoch)\n    }\n\n    object function {\n        + freq_reader(file_path, freq_dict, mode)\n    }\n}\n\nTimeSeriesDataset_MultiCSV_train_Production ..> function : uses freq_reader\nTimeSeriesDataset_SingleCSV_Inference ..> function : uses freq_reader\nGroupByLengthBatchSampler_Production ..> TimeSeriesDataset_MultiCSV_train_Production : operates on\nGroupByLengthBatchSampler_DDP ..> TimeSeriesDataset_MultiCSV_train_Production : operates on\n\nTimeSeriesDataset_MultiCSV_train_Production .up.|> torch.utils.data.Dataset\nTimeSeriesDataset_SingleCSV_Inference .up.|> torch.utils.data.Dataset\nGroupByLengthBatchSampler_DDP .up.|> torch.utils.data.Sampler\nGroupByLengthBatchSampler_Production .up.|> torch.utils.data.BatchSampler\n\n@enduml\n```\n\n### 3. 
Module: `src/ffm` (Core Model)\n\n```puml\n@startuml ffm_core\nskinparam classAttributeIconSize 0\n\npackage \"ffm\" {\n    class FFmHparams << (D,orchid) dataclass >> {\n        + context_len : int\n        + horizon_len : int\n        + num_experts : int\n        + gating_top_n : int\n        + ...\n    }\n\n    abstract class FFmBase {\n        + __init__(hparams, checkpoint, ...)\n        + forecast(...)\n        + forecast_on_df(...)\n        -- Private --\n        - _preprocess(inputs, freq)\n        - _forecast(...)\n    }\n\n    class FFmTorch {\n        + __init__(hparams, checkpoint, ...)\n        + load_from_checkpoint_ffm(checkpoint)\n        + model_eval_mode()\n        -- Private --\n        - _forecast(...)\n    }\n\n    class PatchedTimeSeriesDecoder_MOE {\n        + config : FFMConfig\n        + input_ff_layer : ResidualBlock\n        + horizon_ff_layer : ResidualBlock\n        + stacked_transformer : StackedDecoder\n        + decode(...)\n        + forward(...)\n        -- Private --\n        - _preprocess_input(...)\n        - _postprocess_output(...)\n        - _forward_transform(...)\n        - _reverse_transform(...)\n    }\n\n    class TimeSeriesdata << (T,yellow) TensorFlow >> {\n        + __init__(...)\n        + train_gen()\n        + test_val_gen(mode, shift)\n        + tf_dataset(mode, shift)\n    }\n\n    class TimeCovariates {\n        + __init__(datetimes, ...)\n        + get_covariates()\n    }\n\n    class BatchedInContextXRegLinear {\n        + fit(...)\n        + create_covariate_matrix(...)\n    }\n}\n\nFFmTorch --|> FFmBase\nFFmTorch o-- PatchedTimeSeriesDecoder_MOE : wraps\nFFmBase o-- FFmHparams : config\nFFmBase ..> BatchedInContextXRegLinear : uses for XReg\nTimeSeriesdata ..> TimeCovariates : uses\nPatchedTimeSeriesDecoder_MOE ..> FFMConfig : config\nPatchedTimeSeriesDecoder_MOE ..> StackedDecoder : contains\nPatchedTimeSeriesDecoder_MOE ..> ResidualBlock : contains\n\n@enduml\n```\n\n### 4. 
Module: `src/st_moe_pytorch` (Spatio-Temporal MoE)\n\n```puml\n@startuml st_moe_pytorch\nskinparam classAttributeIconSize 0\n\npackage \"st_moe_pytorch\" {\n    class MoE {\n        + gate : TopNGating\n        + experts : Experts\n        + forward(x, ...) : MixtureOfExpertsReturn\n    }\n\n    class SparseMoEBlock {\n        + moe : MoE\n        + ff_before : Expert\n        + ff_after : Expert\n        + forward(x, ...) : MixtureOfExpertsReturn\n    }\n\n    class TopNGating {\n        + to_gates : nn.Linear\n        + forward(x, ...) : dispatch_tensor, combine_tensor, ...\n    }\n\n    class Experts {\n        + experts : ModuleList<Expert>\n        + forward(x, ...)\n    }\n\n    class Expert {\n        + gate_proj : nn.Linear\n        + down_proj : nn.Linear\n        + forward(x, paddings)\n    }\n\n    class AllGatherFunction << (F,darkgreen) Distributed >>\n    class AllGather << (M,darkgreen) Distributed >>\n}\n\nSparseMoEBlock o-- MoE\nMoE o-- TopNGating\nMoE o-- Experts\nExperts o-- Expert\nTopNGating ..> AllGather : uses (indirectly via distributed utils)\n\n@enduml\n```\n\n### 5. Module: `src/tools` (Utilities)\n\n```puml\n@startuml tools\nskinparam classAttributeIconSize 0\n\npackage \"tools\" {\n    class FinCast_Inference {\n        + __init__(config)\n        + run_inference(...)\n        -- Private --\n        - _make_inference_loader(...)\n    }\n\n    object function {\n        + plot_last_outputs(...)\n        + _save_outputs_to_csv(...)\n        + get_model_api(...)\n        + log_model_statistics(...)\n        + MAE, MSE, RMSE, MAPE, RSE, CORR\n    }\n}\n\nFinCast_Inference ..> data_tools.TimeSeriesDataset_SingleCSV_Inference : creates\nFinCast_Inference ..> ffm.FFmTorch : loads model API\nFinCast_Inference ..> function : uses utilities\n\n@enduml\n```\n\n
## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\n## Core Abstractions, Design Philosophy, and Lifecycle Management\n\nThe FinCast-fts project implements a sophisticated architecture for financial time-series forecasting, centered on the **Financial Foundation Model (FFM)**. Its design is characterized by a set of powerful abstractions and a clear philosophy focused on scalability, efficiency, and predictive richness.\n\n### Core Abstractions\n\nThe system is built upon four primary abstractions that govern how time-series data is processed and modeled:\n\n1.  **Time-Series Patch**: The fundamental unit of data input is not a single time step but a **patch** (defined by `patch_len`, typically 32). The input time-series is segmented into a sequence of overlapping or non-overlapping patches, transforming the 1D series into a 2D sequence (`[N_patches, Patch_len]`). This patching mechanism is a core component of the underlying TimesFM architecture, enabling the Transformer to process local temporal patterns efficiently.\n2.  **Frequency Embedding**: The model explicitly handles time-series data of varying frequencies (e.g., high, medium, low) by introducing a **Frequency Embedding** (`freq_emb`). 
This categorical embedding is added to the input representation, allowing the single FFM to condition its internal weights and attention mechanisms based on the inherent periodicity and characteristics of the input data.\n3.  **Spatio-Temporal Mixture of Experts (ST-MoE)**: This is the central architectural innovation. The traditional Feed-Forward Network (FFN) within the Transformer block is replaced by a **SparseMoEBlock**. This abstraction allows the model to scale its parameter count dramatically (via multiple \"experts\") while maintaining a constant computational cost during inference. For any given input token (patch), a router selects only the top $K$ experts to process the data, enabling high capacity with sparse activation.\n4.  **Quantile Forecast**: The model's output is abstracted beyond a simple point prediction (mean/median). The final layer predicts a full set of **Quantiles** (e.g., 0.1, 0.5, 0.9), providing a complete predictive distribution. This is essential for financial applications where risk assessment and uncertainty quantification are critical.\n\n### Design Philosophy\n\nThe project's design adheres to three key philosophical tenets:\n\n*   **Foundation Model Paradigm**: The FFM is designed as a large, pre-trained model capable of zero-shot or few-shot generalization across diverse financial time-series datasets. The goal is to capture universal temporal patterns and financial market dynamics, making it a powerful base model for various downstream tasks.\n*   **Efficiency and Scalability**: The combination of **ST-MoE** and **Parameter-Efficient Fine-Tuning (PEFT)** drives the efficiency philosophy. ST-MoE ensures that the model can scale its capacity (number of experts) without a proportional increase in computational load. 
PEFT, implemented via the `peft_Fincast` module, allows for rapid, low-resource fine-tuning by only training small, low-rank adapters (LoRA) instead of the entire massive model.\n*   **Data-Centric Optimization**: The use of the custom `GroupByLengthBatchSampler` is a pragmatic design choice to maximize hardware utilization. By grouping time-series samples by their context length, the system eliminates the need for zero-padding within batches, ensuring that all computation is meaningful and accelerating the training process significantly.\n\n### Lifecycle Management\n\nThe project's lifecycle is clearly delineated across its modules:\n\n| Phase | Module(s) Responsible | Key Components |\n| :--- | :--- | :--- |\n| **Data Ingestion & Preparation** | `src/data_tools`, `src/ffm/time_features.py` | `TSdataset`, `Inference_dataset`, `TimeCovariates` |\n| **Training Optimization** | `src/data_tools` | `GroupByLengthBatchSampler_Production`, `GroupByLengthBatchSampler_DDP` |\n| **Model Definition & Training** | `src/ffm`, `src/st_moe_pytorch` | `PatchedTimeSeriesDecoder_MOE`, `MoE`, `Expert` |\n| **Model Adaptation** | `peft_Fincast` | `peft_injector.py` (LoRA/DoRA) |\n| **Inference & Evaluation** | `src/tools` | `FinCast_Inference`, `metrics.py`, `plot_last_outputs` |\n\nThe `FinCast_Inference` class acts as the central orchestrator for the inference lifecycle, managing the loading of the model, the data flow from the `Inference_dataset`, and the final post-processing and visualization of the quantile forecasts.\n\n#### 3.1.2. Component Interactions\n\n## Component Interactions, Data Flow, and Communication Patterns\n\nThe FinCast-fts architecture is a tightly integrated system where data flows sequentially from raw input through data preparation, model processing, and finally to output generation. The core interaction pattern is a pipeline-style data transformation, with a critical internal loop governed by the Mixture of Experts (MoE) mechanism.\n\n### 1. 
Data Flow Pipeline\n\nThe overall data flow can be broken down into three main stages:\n\n| Stage | Source Module | Destination Module | Data Transformation |\n| :--- | :--- | :--- | :--- |\n| **Input & Preprocessing** | Raw CSV Files | `src/data_tools` | Raw time-series data is read, normalized (Z-score), and segmented into context windows and future horizons. Time features (e.g., day of week, month) are extracted by `TimeCovariates` and potentially used as eXogenous Regressors (XReg). |\n| **Model Forward Pass** | `src/data_tools` (Batches) | `src/ffm` (Model) | Batches of time-series windows (`x_context`, `x_padding`, `freq`) are fed into the `PatchedTimeSeriesDecoder_MOE`. The input is patched, normalized, and embedded with frequency information. |\n| **Output & Post-processing** | `src/ffm` (Forecasts) | `src/tools` | The model outputs a tensor of mean and quantile forecasts. This is denormalized, sliced to the required horizon, and then processed by `FinCast_Inference` for saving to CSV or visualization (`plot_last_outputs`). |\n\n### 2. Core Model Interaction: The Transformer Block with MoE\n\nThe most complex interaction occurs within the `PatchedTimeSeriesDecoder_MOE` (the FFM). Each layer of the `StackedDecoder` (a `TimesFMDecoderLayer`) involves a sequence of interactions:\n\n1.  **Input**: The hidden state (`hidden_states`) from the previous layer enters the current layer.\n2.  **Attention**: The hidden state first passes through the **TimesFMAttention** module. This is a standard self-attention mechanism, where the input interacts with itself to capture long-range temporal dependencies.\n3.  **Normalization**: The output of the attention block is normalized using **RMSNorm** before entering the MoE block.\n4.  
**MoE Routing (Sparse Activation)**:\n    *   The normalized hidden state enters the **SparseMoEBlock**.\n    *   The **TopNGating** module (the router) calculates the probability of sending the token (patch) to each expert.\n    *   It selects the top $K$ experts (e.g., $K=2$) based on these probabilities.\n    *   A **Dispatch Tensor** is created, which maps each token to its selected expert(s) and their position within the expert's mini-batch.\n5.  **Expert Computation**:\n    *   The tokens are dispatched to the **Experts** module.\n    *   Each expert (a simple MLP) processes its assigned subset of tokens in parallel.\n6.  **MoE Combination**:\n    *   The **Combine Tensor** (containing the weights from the router) is used to aggregate the outputs from the activated experts back into the original sequence order and dimension.\n7.  **Output**: The combined output is added to the input via a residual connection, and the process repeats for the next layer.\n\nThis sparse activation pattern is the key communication pattern: it ensures that only a small, dynamic subset of the model's total parameters is activated for any given input, enabling the model's high capacity.\n\n### 3. Communication Patterns (Distributed)\n\nThe `src/st_moe_pytorch/distributed.py` module reveals the project's design for handling distributed training (DDP), which is essential for scaling MoE models:\n\n*   **All-Gather for Variable-Sized Tensors**: The `AllGather` class and its underlying `AllGatherFunction` are designed to collect tensors from all Distributed Data Parallel (DDP) ranks. 
Crucially, it handles **variable sequence lengths** (`all_gather_variable_dim`).\n    *   In a typical MoE setup, the tokens dispatched to an expert on one GPU might have a different batch size than the tokens dispatched to the same expert on another GPU.\n    *   The `AllGather` mechanism ensures that the necessary data is collected across all ranks, padded to a uniform size (`max_size`), and then unpadded after the operation, allowing for correct processing and gradient flow in a distributed environment.\n\nThis pattern is a low-level optimization to ensure that the MoE's routing and expert computation can be correctly synchronized and scaled across multiple GPUs.\n\n### 4. External Regressor (XReg) Interaction\n\nThe `FFmBase` class includes complex logic for integrating external regressors using `xreg_lib.py`. This interaction is highly configurable:\n\n*   **Data Preparation**: The `BatchedInContextXRegLinear` class prepares the time-series data (`targets`) and the external covariates (numerical, categorical, static, dynamic) into a flattened, batched matrix format (`x_train`, `x_test`).\n*   **Two-Way Interaction**:\n    *   **Mode 1 (`timesfm + xreg`)**: The FFM forecasts the time-series, and the XReg model is trained on the *residuals* (the difference between the FFM's forecast and the true value). The final forecast is the FFM output plus the XReg residual forecast.\n    *   **Mode 2 (`xreg + timesfm`)**: The XReg model is trained on the *raw time-series*. The FFM is then trained on the *residuals* (the difference between the XReg model's forecast and the true value). The final forecast is the XReg output plus the FFM residual forecast.\n\nThis flexible interaction pattern allows the FFM to focus on complex, non-linear temporal dependencies while offloading the modeling of linear, exogenous effects to a simpler, more interpretable linear regression model.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml FinCast_Architecture_v4\n!theme toy\n\ntitle FinCast-fts Overall Architecture\n\n' Define Modules (Packages)\npackage \"Data Pipeline (src/data_tools)\" as Data {\n    class TSdataset\n    class Inference_dataset\n    class BatchSampler\n}\n\npackage \"Model Core (src/ffm)\" as FFM {\n    class FFmTorch\n    class PatchedTimeSeriesDecoder_MOE\n    class TimeCovariates\n    class BatchedInContextXRegLinear\n}\n\npackage \"MoE Implementation (src/st_moe_pytorch)\" as MoE {\n    class SparseMoEBlock\n    class MoE_Router\n    class Expert_MLP\n}\n\npackage \"Utilities & Inference (src/tools)\" as Tools {\n    class FinCast_Inference\n    class Metrics\n}\n\npackage \"Adaptation (peft_Fincast)\" as PEFT {\n    class peft_injector\n}\n\n' External Entities\n[Raw CSV Data] as RawData\n[External Libraries] as ExtLibs\n\n' 1. Data Flow\nRawData --> TSdataset : Reads\nTSdataset --> FFmTorch : Supplies Batches\n\n' 2. Model Instantiation and Configuration\nFFmTorch o-- PatchedTimeSeriesDecoder_MOE : Instantiates\n\n' 3. Model Structure (FFM)\nPatchedTimeSeriesDecoder_MOE o-- SparseMoEBlock : Uses (in Transformer Layer)\nPatchedTimeSeriesDecoder_MOE ..> TimeCovariates : Uses for Time Features\nPatchedTimeSeriesDecoder_MOE ..> BatchedInContextXRegLinear : Uses for XReg\n\n' 4. MoE Structure\nSparseMoEBlock o-- MoE_Router : Routes Tokens\nSparseMoEBlock o-- Expert_MLP : Executes Computation\n\n' 5. Inference and Output\nFinCast_Inference ..> Inference_dataset : Uses Dataset\nFinCast_Inference ..> FFmTorch : Calls Forecast API\nFinCast_Inference ..> Metrics : Calculates Performance\nFFmTorch --> FinCast_Inference : Returns Forecasts\n\n' 6. Adaptation\npeft_injector ..> PatchedTimeSeriesDecoder_MOE : Wraps Model for Fine-Tuning\n\n' 7. Data Flow within Data Module\nTSdataset ..> BatchSampler : Uses for Batching\n\n' 8. 
External Dependencies\nExtLibs .up.> MoE : (einops, torch.distributed)\nExtLibs .up.> PEFT : (peft library)\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\n## Design Patterns\n\nThe FinCast-fts codebase employs several established software design patterns and specialized architectural patterns common in deep learning to achieve modularity, flexibility, and performance.\n\n### 1. Architectural Pattern: Mixture of Experts (MoE)\n\nThe core architectural pattern is the **Mixture of Experts (MoE)**, which is implemented in the `src/st_moe_pytorch` module and integrated into the FFM's transformer layers.\n\n*   **Pattern**: Replaces the standard Feed-Forward Network (FFN) with a collection of expert networks and a trainable gating network (router).\n*   **Implementation**:\n    *   The `MoE` class in `st_moe_pytorch.py` encapsulates the entire mechanism.\n    *   The `TopNGating` component acts as the router, using a softmax over logits to determine the weight of each expert for a given token.\n    *   The `Expert` class represents the individual, specialized MLPs.\n*   **Code Example (from `st_moe_pytorch.py`):**\n    ```python\n    # MoE class initialization\n    self.gate = TopNGating(...)\n    self.experts = Experts(...)\n\n    # MoE forward pass\n    dispatch_tensor, combine_tensor, balance_loss, router_z_loss = self.gate(x, ...)\n    expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)\n    expert_outputs = self.experts(expert_inputs, ...)\n    output = einsum('b e c d, b n e c -> b n d', expert_outputs, combine_tensor)\n    ```\n\n### 2. 
Structural Pattern: Adapter\n\nThe **Adapter Pattern** is used to reconcile the core model implementation with the desired external API interface.\n\n*   **Pattern**: Converts the interface of a class into another interface clients expect.\n*   **Implementation**: The `FFmTorch` class (`ffm_torch_moe.py`) acts as an adapter, inheriting from the abstract `FFmBase` (`ffm_base.py`) and wrapping the concrete PyTorch model (`PatchedTimeSeriesDecoder_MOE`). This allows the model to conform to the TimesFM-inspired API (`forecast`, `forecast_on_df`) while using a custom PyTorch implementation.\n\n### 3. Behavioral Pattern: Strategy\n\nThe integration of eXogenous Regressors (XReg) follows the **Strategy Pattern**, allowing the user to select one of two distinct XReg integration methods at runtime.\n\n*   **Pattern**: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.\n*   **Implementation**: The `FFmBase` class's `forecast_with_xreg` method accepts an `xreg_mode` parameter (`\"timesfm + xreg\"` or `\"xreg + timesfm\"`), which determines the strategy for combining the FFM forecast with the linear regressor (`BatchedInContextXRegLinear`).\n\n### 4. Creational Pattern: Factory Method\n\nA simple form of the **Factory Method Pattern** is used for model instantiation.\n\n*   **Pattern**: Defines an interface for creating an object, but lets subclasses decide which class to instantiate.\n*   **Implementation**: The `get_model_FFM` function in `src/tools/model_utils.py` centralizes the logic for creating the FFM model instance (`FFM`) and its configuration (`FFmHparams`) from a checkpoint path, abstracting the complex setup from the main inference logic.\n\n### 5. 
Idiomatic Pattern: Skip Connections (Residual Block)\n\nThe **Residual Block** pattern is fundamental to the stability and training of deep neural networks.\n\n*   **Pattern**: Adds the input of a layer to its output, bypassing one or more layers.\n*   **Implementation**:\n    *   The `ResidualBlock` class in `pytorch_patched_decoder_MOE.py` explicitly implements this pattern for the input and horizon feed-forward layers.\n    *   The `TimesFMDecoderLayer` and `SparseMoEBlock` also utilize residual connections around their main computational units (attention and MoE).\n*   **Code Example (from `pytorch_patched_decoder_MOE.py`):**\n    ```python\n    class ResidualBlock(nn.Module):\n        # ... (hidden_layer, output_layer, residual_layer defined)\n        def forward(self, x):\n            hidden = self.hidden_layer(x)\n            output = self.output_layer(hidden)\n            residual = self.residual_layer(x)\n            return output + residual # The skip connection\n    ```\n\n#### 3.3.2. Project Highlights\n\n## Project Highlights\n\nThe FinCast-fts project showcases several innovative features and design choices that contribute to its effectiveness, extensibility, and efficiency in financial time-series forecasting.\n\n*   **Spatio-Temporal Mixture of Experts (ST-MoE) Integration**:\n    *   **Highlight**: The core innovation is the seamless integration of the MoE architecture into the Transformer decoder, replacing the standard FFN. This allows the model to achieve a massive parameter count (high capacity) while maintaining a low, constant computational cost during the forward pass (sparse activation).\n    *   **Benefit**: This is crucial for foundation models, as it enables the FFM to learn highly specialized patterns (experts) for different types of time-series or market regimes without becoming prohibitively slow or expensive to run. 
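The capacity/compute split behind this claim is easy to quantify with back-of-envelope arithmetic. The helper below and all figures in it are hypothetical, not FinCast's actual configuration:

```python
def moe_param_counts(d_model, d_ff, num_experts, top_k):
    """Stored vs. per-token-active parameters for one MoE feed-forward layer.

    Counts only the two expert projection matrices (d_model -> d_ff and
    d_ff -> d_model); biases, the router, and attention are ignored for brevity.
    """
    per_expert = 2 * d_model * d_ff
    total = num_experts * per_expert     # capacity: grows with the expert count
    active = top_k * per_expert          # compute: fixed by the routing budget
    return total, active

# Hypothetical sizes: 16 experts, but only the top-2 run per token.
total, active = moe_param_counts(d_model=1024, d_ff=4096, num_experts=16, top_k=2)
```

With 16 experts and top-2 routing, the layer stores 8x the parameters it activates per token, which is why an MoE layer can raise capacity without a matching rise in inference cost.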
The `st_moe_pytorch` module, with its custom `TopNGating` and auxiliary loss functions, ensures the experts are used efficiently and balanced during training.\n\n*   **Efficient Training via Length-Based Batching**:\n    *   **Highlight**: The use of the custom `GroupByLengthBatchSampler` in `src/data_tools` is a significant performance optimization. This sampler groups time-series samples with identical context lengths into the same batch.\n    *   **Benefit**: In a Transformer architecture, padding is a major source of wasted computation. By eliminating intra-batch padding, the project maximizes the utilization of GPU memory and compute, leading to faster training times and higher throughput, especially when dealing with time-series of varying lengths.\n\n*   **Parameter-Efficient Fine-Tuning (PEFT) Support**:\n    *   **Highlight**: The dedicated `peft_Fincast` module provides first-class support for PEFT techniques like LoRA and DoRA. It includes predefined presets (`attn`, `attn_mlp_gating`, `experts_heavy`) to target specific layers for adapter injection.\n    *   **Benefit**: This design choice directly addresses the challenge of fine-tuning large foundation models. Instead of retraining the entire FFM, users can fine-tune a small set of parameters (the adapters) for a new task, drastically reducing training time, memory footprint, and storage requirements for task-specific models. This enhances the model's **extensibility** to new financial datasets.\n\n*   **Comprehensive Time-Series Feature Engineering**:\n    *   **Highlight**: The `TimeCovariates` class in `src/ffm/time_features.py` extracts a rich, normalized set of temporal features (e.g., minute-of-hour, day-of-year, holiday proximity).\n    *   **Benefit**: This feature set provides the model with explicit, high-quality information about the time context, which is vital for financial data where seasonality and calendar effects (like holidays) are strong predictors. 
This design improves the model's **flexibility** and predictive power across different time granularities.\n\n*   **Quantile Forecasting for Risk Management**:\n    *   **Highlight**: The model's output head is designed to predict not just the mean, but a full distribution of quantiles (e.g., 0.1 to 0.9).\n    *   **Benefit**: In finance, point forecasts are often insufficient. By providing a full predictive distribution, the FFM enables advanced risk management, Value-at-Risk (VaR) calculations, and confidence interval estimation, making the model's output more **actionable** for trading and investment strategies.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\n## Improvement Suggestions\n\nBased on the comprehensive analysis of the FinCast-fts codebase, the following suggestions are proposed to address potential performance bottlenecks, optimize the architecture, and enhance code quality.\n\n### 1. Performance Bottlenecks and Optimization\n\n| Area | Suggestion | Rationale and Impact |\n| :--- | :--- | :--- |\n| **Data Loading (CPU)** | Implement a more efficient data loading mechanism for large-scale datasets, potentially using Apache Arrow or Parquet format instead of CSV. | The current implementation in `TSdataset.py` relies on `pd.read_csv` and `np.vstack`, which can be slow and memory-intensive for massive financial datasets. Using columnar formats and memory-mapped files can significantly reduce I/O overhead and memory usage. |\n| **XReg Solver** | Replace the JAX-based `BatchedInContextXRegLinear` with a PyTorch-native or highly optimized C++/CUDA linear algebra solver (e.g., using `torch.linalg.solve`). | The current XReg implementation in `xreg_lib.py` uses JAX, which introduces a dependency on a separate ecosystem and requires data transfer between PyTorch (model) and JAX (XReg). A unified PyTorch solution would eliminate this overhead and simplify the dependency stack. 
|\n| **MoE Dispatch** | Optimize the MoE dispatch and combine operations for GPU. | The `st_moe_pytorch` module relies heavily on `einsum` and tensor manipulation (`rearrange`, `pack`, `unpack`). While flexible, these operations can be less performant than highly optimized custom CUDA kernels used in production-grade MoE implementations (e.g., Fairseq's Fused MoE). Investigating a fused kernel implementation for the dispatch/combine steps could yield significant speedups. |\n\n### 2. Architecture Optimization\n\n*   **Decouple FFM from XReg**: The tight coupling of the FFM (`FFmBase`) with the XReg logic makes the core model API complex. It is recommended to separate the XReg functionality into a standalone wrapper class that takes a trained FFM model and applies the XReg logic externally. This would simplify the `FFmBase` interface and make the core model more modular.\n*   **Standardize Configuration Management**: The current configuration is spread across `FFmHparams` (dataclass) and `FFMConfig` (dataclass). It is recommended to consolidate all hyperparameters into a single, canonical configuration class (e.g., using `dataclasses` or `pydantic`) and pass this single object throughout the system. This improves clarity and reduces the risk of inconsistent parameter settings.\n*   **Refactor `pytorch_patched_decoder_MOE.py`**: This file is excessively large (over 800 lines) and contains multiple classes (`FFMConfig`, `TimesFMAttention`, `TimesFMDecoderLayer`, `PatchedTimeSeriesDecoder_MOE`). Breaking this file into smaller, more focused modules (e.g., `attention.py`, `decoder_layer.py`, `model.py`) would significantly improve code navigation and maintainability.\n\n### 3. Code Quality and Maintainability\n\n*   **Type Hinting and Docstrings**: While type hints are present, consistency can be improved, especially in utility functions and complex tensor manipulation code. 
Comprehensive docstrings following a standard format (e.g., Google or NumPy style) should be added to all public methods and classes, particularly in the `st_moe_pytorch` module, which is complex due to its distributed nature.\n*   **Remove Redundant TensorFlow Code**: The `src/ffm/data_loader.py` file contains a TensorFlow-based data loader (`TimeSeriesdata`). Since the rest of the project is PyTorch-native, this file appears to be vestigial code from the original TimesFM project. It should be removed or clearly marked as deprecated to avoid confusion and unnecessary dependencies.\n*   **Consistent Naming Conventions**: The project uses a mix of naming conventions (e.g., `FFmTorch`, `PatchedTimeSeriesDecoder_MOE`, `peft_injector`). Adopting a consistent style (e.g., all classes using `PascalCase` and all functions using `snake_case`) across all modules would enhance readability.\n\n#### 3.4.2. Secondary Development Guide\n\n## Secondary Development Guide\n\nThis guide provides a structured approach for exploring the FinCast-fts codebase and conducting secondary development, such as fine-tuning, adding new features, or integrating new data sources.\n\n### 1. Code Exploration Path\n\nTo understand the project, follow the data flow and model architecture sequentially:\n\n1.  **Data Preparation (`src/data_tools`)**:\n    *   Start with `src/data_tools/TSdataset.py` to understand how raw CSV data is converted into univariate time-series and how sliding windows are generated for training.\n    *   Examine `src/data_tools/batch_sampler.py` to grasp the length-based batching optimization, which is crucial for efficient training.\n2.  **Model Core and Configuration (`src/ffm`)**:\n    *   Review `src/ffm/ffm_base.py` and `src/ffm/ffm_torch_moe.py` to understand the high-level API and model loading process.\n    *   The core model logic is in `src/ffm/pytorch_patched_decoder_MOE.py`. 
Focus on the `PatchedTimeSeriesDecoder_MOE` class, particularly the `_preprocess_input` method (patching, normalization) and the `forward` method (Transformer stack, frequency embedding).\n3.  **Architectural Innovation (`src/st_moe_pytorch`)**:\n    *   Deep dive into `src/st_moe_pytorch/st_moe_pytorch.py`. This module defines the MoE mechanism. Understanding the `TopNGating` (router) and `MoE` (expert dispatch/combine) is key to modifying the model's capacity or routing behavior.\n\n### 2. Best Practices for Fine-Tuning (PEFT)\n\nThe recommended path for secondary development is **Parameter-Efficient Fine-Tuning (PEFT)** using the provided `peft_Fincast` module.\n\n*   **Select a Target Preset**: Use the `peft_injector.py` to wrap your pre-trained FFM. Start with a minimal preset like `\"attn\"` or `\"attn_mlp\"` to ensure stability. For maximum capacity increase, use `\"experts_heavy\"`.\n*   **Hyperparameter Tuning**: Focus on tuning the LoRA rank (`lora_r`) and alpha (`lora_alpha`). A higher rank increases the number of trainable parameters and model capacity but also increases memory usage.\n*   **Training Loop**: The fine-tuning process should be identical to the original training loop, but only the LoRA adapter parameters will have `requires_grad=True`.\n\n### 3. Adding New Features\n\n*   **New Time Features**: To add a new temporal covariate (e.g., lunar cycle, specific market hours), modify the `TimeCovariates` class in `src/ffm/time_features.py`. Ensure the new feature is correctly normalized and added to the output DataFrame.\n*   **New Exogenous Regressors (XReg)**: If you are adding new external data (e.g., sentiment scores, macroeconomic indicators), ensure they are prepared in the `FFmBase`'s `forecast_with_xreg` method and integrated into the `BatchedInContextXRegLinear` in `src/ffm/xreg_lib.py`. 
This requires providing the new data as `dynamic_numerical_covariates` or `static_numerical_covariates` to the XReg fitting process.\n*   **Custom Expert**: To experiment with a different expert architecture (e.g., a different activation function or a deeper MLP), modify the `Expert` class definition in `src/st_moe_pytorch/st_moe_pytorch.py`. Ensure the input and output dimensions remain consistent with the model's `hidden_size`.\n\n"
  },
  {
    "path": "thirdparty/FinGPT.md",
    "content": "# FinGPT - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe FinGPT repository is structured as a collection of distinct, yet related, sub-projects, each focusing on a specific financial application of Large Language Models (LLMs). This modular structure facilitates independent development and deployment of different FinLLM capabilities.\n\n```\n/home/ubuntu/FinGPT/\n├── fingpt/\n│   ├── FinGPT_Benchmark/             # Module 1: Benchmarking and Fine-tuning Utilities\n│   │   ├── benchmarks/               # Contains scripts for various financial NLP benchmarks (e.g., ConvFinQA, FiQA).\n│   │   ├── data/                     # Data download and preparation scripts for benchmarks.\n│   │   ├── train_lora.py             # Script for LoRA-based fine-tuning of models on benchmark datasets.\n│   │   └── utils.py                  # Utility functions for model path parsing, dataset loading, and tokenization.\n│   ├── FinGPT_FinancialReportAnalysis/ # Module 2: Financial Report Analysis (RAG)\n│   │   ├── reportanalysis.ipynb      # Jupyter notebook demonstrating the RAG analysis flow.\n│   │   └── utils/                    # Core RAG implementation, including document formatting and clustering (Raptor).\n│   │       ├── earning_calls.py      # Utilities for processing earning call transcripts.\n│   │       ├── format_pdf.py         # Utilities for formatting PDF documents.\n│   │       └── rag.py                # Core implementation of the Recursive Abstractive Clustering (Raptor) RAG system.\n│   ├── FinGPT_Forecaster/            # Module 3: Financial Forecasting\n│   │   ├── AAAI-Good-Data/           # Sub-module for a specific dataset/training configuration (e.g., AAAI paper data).\n│   │   ├── FinGPT-Forecaster-Chinese/ # Sub-module for Chinese-specific forecasting data and models.\n│   │   ├── app.py                    # Streamlit or Flask application for the forecaster interface.\n│   │ 
  ├── data_pipeline.py          # Script for data acquisition, prompt generation, and dataset creation.\n│   │   ├── data.py                   # Core data preparation functions.\n│   │   ├── indices.py                # Definitions of financial indices (DOW, EURO-STOXX, CRYPTO).\n│   │   └── prompt.py                 # Functions for generating prompts for the LLM.\n│   ├── FinGPT_MultiAgentsRAG/        # Module 4: Multi-Agent RAG and Evaluation (Experimental)\n│   │   ├── Evaluation_methods/       # Contains evaluation scripts (HaluEval, MMLU, TruthfulQA).\n│   │   ├── Fine_tune_model/          # Notebooks for fine-tuning models (e.g., GLM2, Llama2).\n│   │   ├── MultiAgents/              # Notebooks demonstrating multi-agent inference.\n│   │   └── RAG/                      # Notebooks for RAG implementation.\n│   ├── FinGPT_Others/                # Module 5: Miscellaneous/Older Projects\n│   │   ├── FinGPT_Low_Code_Development/ # Low-code development examples.\n│   │   ├── FinGPT_Robo_Advisor/      # Robo-advisor examples.\n│   │   └── FinGPT_Trading/           # Trading examples.\n│   ├── FinGPT_RAG/                   # Module 6: General RAG and Data Scraping\n│   │   ├── instruct-FinGPT/          # Scripts for supervised fine-tuning (SFT) and inference.\n│   │   └── multisource_retrieval/    # Web scraping and data retrieval utilities.\n│   │       ├── external_LLMs/        # Utilities for external LLM integration.\n│   │       ├── scrapers/             # Specific web scrapers (Yahoo, CNBC, Google, etc.).\n│   │       └── utils/                # Classification and formatting utilities.\n│   ├── FinGPT_Sentiment_Analysis_v1/ # Module 7: Sentiment Analysis (Older Version)\n│   └── FinGPT_Sentiment_Analysis_v3/ # Module 8: Sentiment Analysis (Latest Version)\n│       ├── benchmark/                # Benchmarking notebooks.\n│       ├── data/                     # Data preparation notebooks.\n│   │   └── training_parallel/        # Parallel training scripts (e.g., 
using DeepSpeed).\n├── requirements.txt                  # Project dependencies.\n└── setup.py                          # Installation script.\n```\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/FinGPT/fingpt/FinGPT_Benchmark`: Contains the infrastructure for evaluating and fine-tuning FinLLMs on various financial NLP tasks. It includes utilities for data preparation, model loading, and LoRA-based training.\n*   `/home/ubuntu/FinGPT/fingpt/FinGPT_FinancialReportAnalysis/utils`: Houses the core logic for the RAG system applied to financial documents, notably the **Raptor** (Recursive Abstractive Clustering) implementation for document chunking and summarization.\n*   `/home/ubuntu/FinGPT/fingpt/FinGPT_Forecaster`: Contains the complete pipeline for financial forecasting, from data acquisition and prompt engineering to dataset creation for model training.\n*   `/home/ubuntu/FinGPT/fingpt/FinGPT_RAG/multisource_retrieval`: The primary module for web scraping and multi-source data retrieval, which is a critical component for feeding real-time financial news into the LLM.\n*   `/home/ubuntu/FinGPT/fingpt/FinGPT_Sentiment_Analysis_v3`: The latest implementation for sentiment analysis model training, including parallel training configurations and benchmarking tools.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n### Module 1: FinGPT_Benchmark\n- **Core Responsibility**: Provides a standardized environment for fine-tuning and evaluating various base LLMs (Llama2, ChatGLM2, Qwen, etc.) on financial tasks using the LoRA technique.\n- **Key Files**:\n    - `utils.py`: Defines model-specific LoRA target modules (`lora_module_dict`), prompt templates (`template_dict`), model path parsing (`parse_model_name`), and a robust dataset loading mechanism (`load_dataset`) that supports replication and remote/local loading.\n    - `train_lora.py`: The main training script. 
It loads the model, tokenizer, and dataset, applies LoRA configuration, and uses the Hugging Face `Trainer` with DeepSpeed for efficient, parallelized fine-tuning. It also integrates with **WandB** for experiment tracking.\n- **Implementation Details**: The `tokenize` function in `utils.py` is critical, handling the concatenation of instruction, input, and output, and ensuring the sequence length does not exceed the model's maximum length, a common challenge in LLM fine-tuning. The use of `parse_model_name` centralizes the mapping between a simple model name (e.g., 'llama2') and its corresponding Hugging Face repository path.\n\n### Module 2: FinGPT_FinancialReportAnalysis/utils\n- **Core Responsibility**: Implements the **Raptor** (Recursive Abstractive Clustering) RAG framework for processing large financial documents (like earning call transcripts or PDFs) by recursively clustering and summarizing text chunks to create a hierarchical index.\n- **Key Files**:\n    - `rag.py`: Contains the `Raptor` class. This class uses **UMAP** for dimensionality reduction and **Gaussian Mixture Model (GMM)** with **BIC** for optimal cluster determination. The key methods are `recursive_embed_cluster_summarize` and `text_spliter`, which implement the hierarchical chunking and summarization process.\n    - `format_pdf.py`: Handles the initial processing and formatting of PDF documents.\n    - `earning_calls.py`: Contains specific logic for handling earning call data.\n- **Implementation Details**: The `Raptor` class is a sophisticated implementation of hierarchical RAG. It first splits the text using `RecursiveCharacterTextSplitter`, then iteratively applies embedding, UMAP reduction, GMM clustering (using BIC for optimal cluster count), and LLM-based summarization. 
This recursive process creates a multi-layered knowledge base, significantly improving the context quality for RAG queries on long documents.\n\n### Module 3: FinGPT_Forecaster\n- **Core Responsibility**: Manages the end-to-end pipeline for generating structured financial forecasting datasets suitable for LLM fine-tuning.\n- **Key Files**:\n    - `data_pipeline.py`: The orchestrator. It defines the flow: 1) Acquire data for symbols in a given index (DOW, EURO, CRYPTO) via `prepare_data_for_symbol`. 2) Generate prompts and query an external LLM (GPT-4) for forecasts/rationales via `query_gpt4`. 3) Transform the results into a final training dataset via `create_dataset`.\n    - `indices.py`: Simple file defining lists of stock/crypto symbols for different indices.\n    - `prompt.py`: Contains the logic for constructing the detailed, structured prompts used to query the external LLM for forecasting.\n- **Implementation Details**: The pipeline is a strong example of using an LLM for data labeling and rationale generation. The `query_gpt4` function is the bottleneck, as it relies on an external, non-deterministic API call to enrich the raw financial data with LLM-generated forecasts and explanations, which are then used as the \"output\" for the fine-tuning dataset.\n\n### Module 4: FinGPT_RAG/multisource_retrieval\n- **Core Responsibility**: A comprehensive web scraping and data retrieval layer designed to gather real-time financial news from multiple sources, which serves as the knowledge base for the RAG system.\n- **Key Files**:\n    - `news_scraper.py`: The main scraping logic. It uses `requests` and `BeautifulSoup` for static scraping and includes logic for handling various financial news sites (Seeking Alpha, Reuters, Bloomberg, Yahoo, CNBC, MarketWatch). 
It also contains a `select_column_and_classify` function, suggesting an interactive or GUI-driven workflow for data labeling.\n    - `scrapers/`: Sub-directory containing site-specific scraping implementations (e.g., `scrape_yahoo.py`, `scrape_cnbc.py`).\n    - `external_LLMs/`: Utilities for tokenization and interaction with external LLMs (e.g., ChatGPT, g4f).\n- **Implementation Details**: The scraping logic is highly decentralized, with a central dispatcher (`scraping_by_url` in `news_scraper.py`) delegating to site-specific scrapers. This design is necessary due to the varied HTML structures of different news sites but makes the system fragile to website changes. The use of `similarity_score` attempts to filter for relevance before extracting the full article text.\n\n### Module 5: FinGPT_Sentiment_Analysis_v3\n- **Core Responsibility**: Provides the latest, optimized training pipeline for sentiment analysis models, focusing on efficiency and parallel processing.\n- **Key Files**:\n    - `training_parallel/train_lora.py`: A specialized LoRA training script, similar to the benchmark one but with custom `ModifiedTrainer` and `data_collator` classes. The `ModifiedTrainer` overrides `compute_loss` and `prediction_step` to handle the specific input/output format of the sentiment task, and customizes `save_model` to only save the LoRA adapter weights. It is configured for DeepSpeed and parallel training.\n- **Implementation Details**: The custom `ModifiedTrainer` is a key feature, allowing the project to bypass the standard Hugging Face Trainer's assumptions about loss calculation and model saving, which is often necessary when working with specialized models like ChatGLM or when only saving adapter weights. 
The `data_collator` handles padding and label masking specific to the sentiment fine-tuning task.\n\n### Module PlantUML Diagrams\n\n@startuml FinGPT_Benchmark\ntitle FinGPT_Benchmark Module Class Diagram\n\npackage \"HuggingFace/PEFT\" {\n    class AutoModelForCausalLM\n    class AutoTokenizer\n    class TrainingArguments\n    class Trainer\n    class LoraConfig\n    class get_peft_model\n}\n\npackage \"Datasets\" {\n    class Dataset\n    class concatenate_datasets\n}\n\npackage \"Benchmark Utilities\" {\n    class Utils {\n        + template_dict: Dict\n        + lora_module_dict: Dict\n        + get_prompt(template, instruction, input_text)\n        + tokenize(args, tokenizer, feature)\n        + parse_model_name(name, from_remote)\n        + load_dataset(names, from_remote)\n    }\n    class TrainLoRA {\n        - main(args)\n    }\n}\n\nTrainLoRA ..> Utils : uses\nTrainLoRA ..> AutoModelForCausalLM : loads\nTrainLoRA ..> AutoTokenizer : loads\nTrainLoRA ..> TrainingArguments : configures\nTrainLoRA ..> Trainer : initializes\nTrainLoRA ..> LoraConfig : configures\nTrainLoRA ..> get_peft_model : applies\nTrainLoRA ..> concatenate_datasets : combines\nUtils ..> Dataset : loads\nUtils ..> AutoTokenizer : uses in tokenize\n\n@enduml\n\n@startuml FinGPT_FinancialReportAnalysis_RAG\ntitle FinGPT_FinancialReportAnalysis RAG Module Class Diagram\n\npackage \"LangChain/Utils\" {\n    class ChatPromptTemplate\n    class StrOutputParser\n    class RecursiveCharacterTextSplitter\n}\n\npackage \"Clustering/Reduction\" {\n    class UMAP\n    class GaussianMixture\n}\n\nclass Raptor {\n    - model: LLM\n    - embd: Embeddings\n    + global_cluster_embeddings(embeddings, dim)\n    + local_cluster_embeddings(embeddings, dim)\n    + get_optimal_clusters(embeddings) : int\n    + GMM_cluster(embeddings, threshold) : Tuple[labels, n_clusters]\n    + perform_clustering(embeddings, dim, threshold) : List[np.ndarray]\n    + embed(texts) : np.ndarray\n    + embed_cluster_texts(texts) 
: DataFrame\n    + fmt_txt(df) : str\n    + embed_cluster_summarize_texts(texts, level) : Tuple[DataFrame, DataFrame]\n    + recursive_embed_cluster_summarize(texts, level, n_levels) : Dict\n    + text_spliter(text, chunk_size_tok, level, n_levels) : List[str]\n}\n\nRaptor ..> UMAP : uses for reduction\nRaptor ..> GaussianMixture : uses for clustering\nRaptor ..> ChatPromptTemplate : uses for summarization prompt\nRaptor ..> StrOutputParser : uses for summarization output\nRaptor ..> RecursiveCharacterTextSplitter : uses for initial chunking\nRaptor \"1\" *-- \"1\" UMAP\nRaptor \"1\" *-- \"1\" GaussianMixture\nRaptor \"1\" *-- \"1\" ChatPromptTemplate\nRaptor \"1\" *-- \"1\" StrOutputParser\nRaptor \"1\" *-- \"1\" RecursiveCharacterTextSplitter\n\n@enduml\n\n@startuml FinGPT_Forecaster\ntitle FinGPT_Forecaster Module Class Diagram\n\npackage \"Data Components\" {\n    class Indices {\n        + DOW_30: List[str]\n        + EURO_STOXX_50: List[str]\n        + CRYPTO: List[str]\n    }\n    class Data {\n        + prepare_data_for_symbol(symbol, data_dir, start_date, end_date, with_basics)\n        + query_gpt4(index, data_dir, start_date, end_date, min_past_weeks, max_past_weeks, with_basics)\n        + create_dataset(index, data_dir, start_date, end_date, train_ratio, with_basics)\n    }\n    class Prompt {\n        + get_all_prompts(index, data_dir, start_date, end_date, min_past_weeks, max_past_weeks, with_basics)\n    }\n    class DataInferenceFetch {\n        + get_curday()\n        + fetch_all_data()\n        + get_all_prompts_online()\n    }\n}\n\nclass DataPipeline {\n    + main(args)\n}\n\nDataPipeline ..> Indices : uses\nDataPipeline ..> Data : uses\nDataPipeline ..> Prompt : uses\nDataPipeline ..> DataInferenceFetch : uses\n\n@enduml\n\n@startuml FinGPT_RAG_MultisourceRetrieval\ntitle FinGPT_RAG Multisource Retrieval Module Class Diagram\n\npackage \"Web Scraping Tools\" {\n    class BeautifulSoup\n    class requests_get\n    class split_sentence\n    
class similarity_score\n}\n\npackage \"Site Specific Scrapers\" {\n    class ScrapeYahoo\n    class ScrapeCNBC\n    class ScrapeMarketScreener\n    class ScrapeGoogle\n}\n\nclass NewsScraper {\n    + scraping_by_url(link, subject) : Tuple[url, subject]\n    + scrape_bloomberg(subject) : List[str]\n    + scrape_reuters(subject) : Tuple[url, subject]\n    + scrape_market_watch_article_page(url, subject) : Tuple[url, subject]\n    + select_column_and_classify() : void\n}\n\nNewsScraper ..> BeautifulSoup : uses\nNewsScraper ..> requests_get : uses\nNewsScraper ..> split_sentence : uses\nNewsScraper ..> similarity_score : uses\nNewsScraper ..> ScrapeYahoo : delegates\nNewsScraper ..> ScrapeCNBC : delegates\nNewsScraper ..> ScrapeMarketScreener : delegates\nNewsScraper ..> ScrapeGoogle : delegates\n\n@enduml\n\n@startuml FinGPT_Sentiment_Analysis_v3\ntitle FinGPT_Sentiment_Analysis_v3 Training Module Class Diagram\n\npackage \"HuggingFace/PEFT\" {\n    class AutoModel\n    class AutoTokenizer\n    class TrainingArguments\n    class Trainer\n    class LoraConfig\n    class get_peft_model\n}\n\nclass ModifiedTrainer extends Trainer {\n    + compute_loss(model, inputs, return_outputs=False)\n    + prediction_step(model, inputs, prediction_loss_only, ignore_keys)\n    + save_model(output_dir)\n}\n\nclass CastOutputToFloat {\n    + forward(x)\n}\n\nclass TrainLoRA {\n    + main()\n}\n\nclass DataCollator {\n    + data_collator(features: list) : dict\n}\n\nTrainLoRA ..> AutoModel : loads\nTrainLoRA ..> AutoTokenizer : loads\nTrainLoRA ..> TrainingArguments : configures\nTrainLoRA ..> ModifiedTrainer : initializes\nTrainLoRA ..> LoraConfig : configures\nTrainLoRA ..> get_peft_model : applies\nModifiedTrainer ..> DataCollator : uses (via trainer init)\nTrainLoRA ..> DataCollator : uses\n\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. 
Core Abstractions\n\nThe FinGPT project is built upon a **modular, LLM-centric, and data-driven design philosophy**, aiming to provide an accessible, open-source framework for financial LLMs. The core abstractions are centered around three main pillars: **Parameter-Efficient Fine-Tuning (PEFT)**, **Hierarchical Retrieval-Augmented Generation (RAG)**, and **End-to-End Data Pipelines**.\n\nThe **LoRA Adapter** is the central abstraction for the model layer. Instead of fine-tuning the entire large language model, the project utilizes LoRA (Low-Rank Adaptation) to inject a small number of trainable parameters into the base LLM (e.g., Llama2, ChatGLM2). This abstraction allows for efficient domain adaptation with minimal computational resources, making the project highly accessible. The `lora_module_dict` in `FinGPT_Benchmark/utils.py` explicitly manages which modules of different base models are targeted for adaptation, demonstrating a flexible approach to model heterogeneity.\n\nThe **Raptor (Recursive Abstractive Clustering)** system, implemented in `FinGPT_FinancialReportAnalysis/utils/rag.py`, is the key abstraction for handling large, unstructured financial documents. It abstracts the complex process of document chunking, embedding, dimensionality reduction (UMAP), optimal clustering (GMM/BIC), and recursive summarization into a single, hierarchical RAG index. This allows the LLM to retrieve context from multiple levels of abstraction (raw text, cluster summaries, meta-summaries), significantly improving the quality of grounded responses.\n\nThe **Data Pipeline** abstraction, exemplified by `FinGPT_Forecaster/data_pipeline.py`, manages the entire lifecycle of creating a structured dataset. This pipeline abstracts data acquisition, prompt engineering, external LLM querying (e.g., GPT-4 for labeling/rationales), and final dataset transformation into a sequential, reproducible process.\n\nThe project’s **lifecycle management** follows a clear sequence:\n1.  
**Data Acquisition**: Raw financial data (news, reports) is gathered via the `multisource_retrieval` layer.\n2.  **Data Preparation**: Data is cleaned, structured, and transformed into domain-specific datasets (Forecasting, Sentiment) or hierarchical RAG indices (Raptor).\n3.  **Model Adaptation**: Base LLMs are fine-tuned using the LoRA Adapter via the `train_lora.py` scripts.\n4.  **Application**: The adapted FinLLM is deployed within application agents (Forecaster, Sentiment Classifier, RAG Query Engine) to serve end-user tasks.\n\n#### 3.1.2. Component Interactions\n\nThe FinGPT architecture is characterized by a unidirectional, layered data flow, starting from external sources and culminating in the application layer.\n\n**Data Flow:**\n1.  **External Sources** (Websites, APIs, PDFs) feed into the **Data Acquisition Layer** (`multisource_retrieval`).\n2.  The **Scraper/Retriever** component extracts raw text and links.\n3.  Raw text is routed to two main paths:\n    *   **Structured Dataset Path**: Text is processed by `data_pipeline.py` (Forecaster) or similar scripts (Sentiment) to generate `instruction` and `output` pairs, often involving an external LLM (GPT-4) for initial labeling or rationale generation. This results in a Hugging Face `Dataset` object.\n    *   **RAG Index Path**: Large documents are processed by the **Raptor** component (`rag.py`), which generates a multi-level index of summaries and embeddings.\n4.  The **Fine-Tuning Layer** (`train_lora.py`) consumes the structured `Dataset` and applies the LoRA Adapter to the **Base LLM**.\n5.  
The resulting **FinLLM Core** (Base LLM + LoRA Adapter) is used by the **Application Agents** (RAG Query Engine, Forecaster Agent, Sentiment Classifier) for inference.\n\n**Communication Patterns:**\n*   **Hugging Face Ecosystem**: The primary communication pattern for model training is the Hugging Face `Trainer` class, which manages the entire training loop, including data loading, optimization, and checkpointing. This is heavily integrated with the **PEFT** library for LoRA.\n*   **LangChain-Style Chains**: The RAG component in `rag.py` uses a functional chain pattern (`prompt | self.model | StrOutputParser()`) for summarization, a pattern popularized by LangChain, demonstrating a clear separation of prompt, model, and output parsing.\n*   **Inter-Module Python Calls**: Data flow within the pipelines (e.g., `data_pipeline.py` calling `indices.py`, `data.py`, and `prompt.py`) relies on standard Python function and class imports, maintaining a tightly coupled but clear execution sequence.\n*   **External API Calls**: The system communicates with external services for two main purposes: web scraping (`requests`, `BeautifulSoup` in `news_scraper.py`) and external LLM querying (e.g., `query_gpt4` in `data.py`, which is assumed to make an API call).\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml FinGPT_Overall_Architecture\ntitle FinGPT Overall Architecture\n\nskinparam componentStyle rectangle\n\npackage \"1. Data Acquisition Layer\" as DataAcquisition {\n    [Multisource Retrieval] as Scraper\n    [Data Fetchers] as Fetchers\n    [Financial News Sources] as Sources\n    Sources --> Scraper : Scrapes raw data\n    Scraper --> Fetchers : Provides raw data\n}\n\npackage \"2. 
Data Processing & Preparation\" as DataProcessing {\n    [Forecaster Data Pipeline] as ForecasterDP\n    [Sentiment Data Preparation] as SentimentDP\n    [Document Chunking & Clustering] as Raptor\n    [Financial Documents (PDFs)] as Docs\n    \n    Fetchers --> ForecasterDP : Structured data\n    Fetchers --> SentimentDP : Labeled data\n    Docs --> Raptor : Unstructured text\n}\n\npackage \"3. Model Fine-Tuning Layer\" as FineTuning {\n    [Base LLM (e.g., Llama2)] as BaseLLM\n    [LoRA Adapter] as Adapter\n    [Training Scripts (DeepSpeed)] as Trainer\n    \n    ForecasterDP --> Trainer : Forecasting Dataset\n    SentimentDP --> Trainer : Sentiment Dataset\n    Trainer --> Adapter : Fine-tunes weights\n    BaseLLM <--> Adapter : Loads adapter\n}\n\npackage \"4. Application & Inference Layer\" as Application {\n    [FinLLM Core] as FinLLM\n    [RAG Query Engine] as RAGEngine\n    [Forecasting Agent] as ForecasterAgent\n    [Sentiment Classifier] as SentimentAgent\n    \n    BaseLLM -[hidden]right-> Adapter\n    BaseLLM --> FinLLM : Core Model\n    Adapter --> FinLLM : Domain Knowledge\n    \n    Raptor --> RAGEngine : Hierarchical Index\n    FinLLM --> RAGEngine : Contextual Generation\n    \n    FinLLM --> ForecasterAgent : Prediction\n    FinLLM --> SentimentAgent : Classification\n    \n}\n\n' Interactions\nDataAcquisition --> DataProcessing : Raw Data Flow\nDataProcessing --> FineTuning : Structured Datasets\nDataProcessing --> Application : Knowledge Base (Raptor Index)\n\nRAGEngine .> FinLLM : Queries for grounded response\nForecasterAgent .> FinLLM : Queries for prediction\nSentimentAgent .> FinLLM : Queries for classification\n\n[User/API] --> ForecasterAgent\n[User/API] --> SentimentAgent\n[User/API] --> RAGEngine\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe FinGPT codebase employs several established software design patterns to manage complexity and promote modularity:\n\n1.  
**Adapter Pattern (LoRA)**:\n    *   **Description**: The LoRA mechanism acts as an adapter, allowing a new interface (domain-specific fine-tuning) to be used with an existing class (the frozen base LLM).\n    *   **Implementation**: In `FinGPT_Benchmark/train_lora.py`, the `LoraConfig` and `get_peft_model` functions wrap the `AutoModelForCausalLM` instance, effectively adapting its behavior for financial tasks without modifying its massive original weights.\n    *   **Code Example**:\n        ```python\n        # FinGPT_Benchmark/train_lora.py\n        peft_config = LoraConfig(\n            task_type=TaskType.CAUSAL_LM,\n            r=8,\n            lora_alpha=32,\n            target_modules=lora_module_dict[args.base_model], # The adaptation logic\n            # ...\n        )\n        model = get_peft_model(model, peft_config) # The adapter application\n        ```\n\n2.  **Pipeline Pattern (Data Flow)**:\n    *   **Description**: A sequence of processing steps where the output of one step becomes the input of the next.\n    *   **Implementation**: The `main` function in `FinGPT_Forecaster/data_pipeline.py` clearly defines the pipeline stages: Acquire Data -> Generate Prompt/Query GPT-4 -> Transform to Training Format.\n    *   **Code Example**:\n        ```python\n        # FinGPT_Forecaster/data_pipeline.py (Simplified)\n        # 1. Acquire data\n        for symbol in tqdm(index):\n            prepare_data_for_symbol(symbol, data_dir, start_date, end_date, with_basics=with_basics)\n        # 2. Generate prompt and query GPT-4\n        query_gpt4(index, data_dir, start_date, end_date, min_past_weeks, max_past_weeks, with_basics=with_basics)\n        # 3. Transform into training format\n        dataset = create_dataset(index, data_dir, start_date, end_date, train_ratio, with_basics=with_basics)\n        ```\n\n3.  
**Strategy Pattern (Model Configuration)**:\n    *   **Description**: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.\n    *   **Implementation**: The `lora_module_dict` in `FinGPT_Benchmark/utils.py` holds different strategies (target modules) for applying LoRA based on the specific base model architecture (e.g., `chatglm2` uses `query_key_value`, while `llama2` uses `q_proj`, `k_proj`, `v_proj`).\n    *   **Code Example**:\n        ```python\n        # FinGPT_Benchmark/utils.py\n        lora_module_dict = {\n            'chatglm2': ['query_key_value'],\n            'llama2': ['q_proj', 'k_proj', 'v_proj'],\n            # ...\n        }\n        # ...\n        target_modules=lora_module_dict[args.base_model],\n        ```\n\n4.  **Composite Pattern (Raptor RAG)**:\n    *   **Description**: Composes objects into tree structures to represent part-whole hierarchies.\n    *   **Implementation**: The `recursive_embed_cluster_summarize` function in `rag.py` recursively processes summaries from one level as the \"documents\" for the next level, creating a hierarchical index where a cluster summary is a composite of its underlying document chunks.\n\n#### 3.3.2. Project Highlights\n\nThe FinGPT project demonstrates several innovative features that enhance its utility and flexibility in the financial domain:\n\n*   **Hierarchical RAG with Raptor**: The most innovative feature is the **Raptor** RAG system. By combining **UMAP** (dimensionality reduction) and **Gaussian Mixture Models (GMM)** for clustering, it creates a multi-level index of document summaries. This allows the RAG engine to retrieve not just granular text chunks but also high-level conceptual summaries, leading to more coherent and contextually rich answers from the LLM.\n*   **Accessibility through PEFT**: The core focus on **LoRA-based fine-tuning** significantly lowers the barrier to entry for financial LLM development. 
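The efficiency gain is easy to quantify with back-of-the-envelope arithmetic (illustrative numbers: a 4096-dimensional projection at Llama2-7B scale, and the rank `r=8` shown in the `LoraConfig` example above):

```python
# Rough LoRA parameter count for a single weight matrix (illustrative numbers,
# not taken from the repository's configs).
d = 4096                 # hidden size of one attention projection
r = 8                    # LoRA rank, matching the r=8 in the LoraConfig example
full_params = d * d      # parameters in the frozen base weight matrix
lora_params = 2 * d * r  # trainable A (d x r) and B (r x d) adapter matrices
ratio = lora_params / full_params  # fraction of weights that are trainable
```

Here the trainable fraction per adapted matrix is about 0.4%, which is why fine-tuning can fit on modest hardware.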
It allows researchers and developers to adapt massive models to financial tasks using consumer-grade GPUs, promoting the open-source spirit of the project.\n*   **End-to-End Financial Forecasting Pipeline**: The `FinGPT_Forecaster` module provides a complete, runnable example of how to convert raw market data into a structured, LLM-ready dataset, including the crucial step of using an external LLM for generating rationales and labels. This is a highly valuable, innovative feature for quantitative finance.\n*   **Robust Multisource Data Retrieval**: The dedicated `multisource_retrieval` component, with its site-specific scrapers (Yahoo, CNBC, Bloomberg), ensures the LLM can be grounded in up-to-date, real-world financial news, which is critical for time-sensitive financial applications.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nWhile the project is robust, several areas could be improved to enhance performance, maintainability, and architectural clarity:\n\n*   **Standardization and Code Consolidation**:\n    *   **Suggestion**: Consolidate the redundant `train_lora.py` and `utils.py` files found in multiple sub-projects (`FinGPT_Benchmark`, `FinGPT_Forecaster`, `FinGPT_Sentiment_Analysis_v3`).\n    *   **Benefit**: Reduces code duplication, simplifies maintenance, and ensures a single source of truth for core utilities like `tokenize` and `load_dataset`.\n*   **External Dependency Abstraction**:\n    *   **Suggestion**: Abstract the external LLM calls (e.g., `query_gpt4` in `data.py`) into a dedicated, configurable service layer (e.g., an `ExternalLLMService` class).\n    *   **Benefit**: Decouples the data pipeline from specific LLM providers, making it easier to switch between GPT-4, Claude, or other models, and simplifies API key management.\n*   **RAG System Optimization**:\n    *   **Suggestion**: The Raptor RAG system is computationally intensive due to UMAP and GMM clustering. 
Implement caching for the clustered embeddings and summaries, especially for static documents like financial reports.\n    *   **Benefit**: Reduces processing time and cost for repeated queries or application restarts.\n*   **Web Scraping Robustness**:\n    *   **Suggestion**: The `news_scraper.py` is highly dependent on HTML structure. Implement more resilient scraping techniques (e.g., using a general-purpose content extraction library) and add robust retry logic with exponential backoff to handle transient network errors and rate limits.\n\n#### 3.4.2. Secondary Development Guide\n\nFor developers looking to explore or extend the FinGPT codebase, the following path is recommended:\n\n1.  **Initial Exploration (Fine-Tuning)**:\n    *   Start by examining the **FinGPT_Benchmark** module. The `utils.py` file is essential for understanding model-specific configurations (LoRA targets) and data handling.\n    *   Review `train_lora.py` to grasp the standard fine-tuning workflow using Hugging Face and LoRA. This is the template for all model adaptation tasks.\n\n2.  **Understanding Data Flow (Forecasting)**:\n    *   The **FinGPT_Forecaster** module provides the clearest example of an end-to-end pipeline. Analyze `data_pipeline.py` to see how raw data is transformed into a structured dataset suitable for LLM training.\n\n3.  **Secondary Development - New Application Agent**:\n    *   To create a new financial application (e.g., a Merger & Acquisition Agent), the best approach is to reuse the existing components:\n        *   **Data**: Use the `multisource_retrieval` scrapers to gather M&A news.\n        *   **Model**: Use the `FinGPT_Benchmark/train_lora.py` script to fine-tune a base LLM on a new M&A-specific dataset.\n        *   **RAG**: If the task involves large documents (e.g., SEC filings), integrate the **Raptor** system from `FinGPT_FinancialReportAnalysis/utils/rag.py` to build the knowledge base.\n\n4.  
**Contribution Focus**:\n    *   Focus contributions on developing new, robust scrapers in the `multisource_retrieval/scrapers` directory or creating new, standardized financial datasets for the community.\n    *   When adding new models, ensure the `lora_module_dict` in the core `utils.py` is updated with the correct target modules.\n\n"
  },
  {
    "path": "thirdparty/FinGenius.md",
    "content": "# FinGenius - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe FinGenius project exhibits a clean, modular structure typical of a well-organized Python application, with a clear separation of concerns between the core framework, agents, environments, and external capabilities.\n\n```\n/home/ubuntu/FinGenius\n├── config/                 # Configuration files for LLM settings and MCP server endpoints.\n│   ├── config.example.toml # Primary configuration for LLM, logging, and general settings.\n│   └── mcp.example.json    # Configuration for Model Context Protocol (MCP) server addresses.\n├── docs/                   # Documentation and visual assets (architecture diagrams, flow charts).\n├── main.py                 # The application's entry point and primary orchestration script.\n├── requirements.txt        # Lists all Python dependencies (e.g., pydantic, akshare, loguru).\n└── src/                    # The core source code directory.\n    ├── agent/              # **Core Module 1: Agent Definitions**\n    │   ├── base.py         # Defines BaseAgent, the abstract foundation for all agents.\n    │   ├── react.py        # Implements the ReAct (Reasoning and Acting) pattern.\n    │   ├── mcp.py          # Defines MCPAgent, integrating the Model Context Protocol.\n    │   └── [specialized].py# Contains the concrete, domain-specific agents (e.g., chip_analysis.py).\n    ├── environment/        # **Core Module 2: Execution Contexts**\n    │   ├── base.py         # Defines BaseEnvironment and the EnvironmentFactory.\n    │   ├── research.py     # Implements the Research Phase (data collection and analysis).\n    │   └── battle.py       # Implements the Battle Phase (adversarial debate and voting).\n    ├── tool/               # **Core Module 3: External Capabilities**\n    │   ├── base.py         # Defines BaseTool and ToolCollection, the tool interface.\n    │   ├── battle.py       # The 
tool agents use to interact within the BattleEnvironment.\n    │   ├── search/         # Contains various web search tools (Baidu, Google, DuckDuckGo).\n    │   └── [specialized].py# Contains tools for financial data fetching (e.g., big_deal_analysis.py).\n    ├── mcp/                # **Core Module 4: MCP Server Stubs**\n    │   └── [server].py     # Contains stubs for the specialized financial data servers (e.g., sentiment_server.py).\n    ├── prompt/             # **Core Module 5: Agent Prompts**\n    │   └── [agent_name].py # Stores the extensive system and next-step prompts for each agent.\n    ├── schema.py           # Pydantic models for data structures (Message, Memory, AgentState).\n    ├── llm.py              # Wrapper for LLM API calls.\n    └── logger.py           # Configuration for the loguru logging system.\n```\n\nThe structure clearly separates the core framework (`src/`), configuration (`config/`), and entry point (`main.py`). The `src/` directory is further divided into functional modules: `agent` for the actors, `environment` for the stages, `tool` for the capabilities, and `prompt` for the agent's \"mindset.\" This organization adheres to the principles of modular design and separation of concerns, which is essential for a complex multi-agent system.\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinGenius/src/agent`: Contains the definitions for all specialized AI agents, including the base classes (`BaseAgent`, `ReActAgent`, `ToolCallAgent`, `MCPAgent`) and the domain-specific agents (e.g., `ChipAnalysisAgent`, `HotMoneyAgent`).\n*   `/home/ubuntu/FinGenius/src/environment`: Defines the two core operational environments (`ResearchEnvironment`, `BattleEnvironment`) and their base class (`BaseEnvironment`), which manage agent execution and interaction flow.\n*   `/home/ubuntu/FinGenius/src/tool`: Houses the definitions for all external capabilities and internal actions available to the agents, such as data fetching tools (`BigDealAnalysisTool`) and interaction tools (`Battle`, `Terminate`).\n*   `/home/ubuntu/FinGenius/src/mcp`: Contains the logic for the Model Context Protocol (MCP) integration, including the client-side logic used by `MCPAgent` and the server-side stubs for the specialized financial data services.\n*   `/home/ubuntu/FinGenius/src/prompt`: Stores the extensive system and next-step prompt templates (in Python string format) used to guide the behavior and reasoning of the various agents.\n*   `/home/ubuntu/FinGenius/src`: Contains core utility files and foundational classes like `llm.py`, `logger.py`, `schema.py`, and the main entry point logic.\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe FinGenius project is structured around five core Python modules, each serving a distinct purpose in the multi-agent system.\n\n### 1. 
`src/agent` Module (The Actors)\nThis module defines the entire agent hierarchy, from the abstract base to the specialized financial experts.\n\n*   **Files Enumerated:** `base.py`, `react.py`, `toolcall.py`, `mcp.py`, `chip_analysis.py`, `big_deal_analysis.py`, `hot_money.py`, `risk_control.py`, `sentiment.py`, `technical_analysis.py`, `report.py`.\n*   **Core Responsibility:** To provide the foundational logic for agent execution, memory management, LLM interaction, and to define the specific roles and capabilities of each financial expert agent.\n*   **Key Implementation Details:**\n    *   **`BaseAgent` (`base.py`):** Implements the main `run()` loop, state transitions (`AgentState`), and memory updates. It includes logic to detect and handle a \"stuck state\" (duplicate responses) by modifying the `next_step_prompt`.\n    *   **`ReActAgent` (`react.py`):** Overrides `step()` to implement the **ReAct pattern**, parsing the LLM's response to determine if the next action is a `thought` or a `tool_call`.\n    *   **`MCPAgent` (`mcp.py`):** The final base class, which integrates the `MCPClient` for specialized tool access. All domain agents inherit from this, ensuring they are \"MCP-enabled.\"\n    *   **Specialized Agents:** Agents like `ChipAnalysisAgent` and `BigDealAnalysisAgent` are simple, highly-configured classes. Their primary implementation is setting their unique `name`, `description`, `system_prompt`, and the specific `ToolCollection` they are allowed to use. This adheres to the **Strategy Pattern**.\n\n### 2. 
`src/environment` Module (The Stage)\nThis module defines the execution contexts that govern agent interaction and the overall workflow.\n\n*   **Files Enumerated:** `base.py`, `research.py`, `battle.py`.\n*   **Core Responsibility:** To manage the lifecycle of agents, define the rules of engagement, and orchestrate the two-phase analysis process (Research and Battle).\n*   **Key Implementation Details:**\n    *   **`BaseEnvironment` (`base.py`):** Provides the abstract interface and a factory (`EnvironmentFactory`) for creating environments. It manages the registration and retrieval of agents.\n    *   **`ResearchEnvironment` (`research.py`):** Manages the initial data collection. Its `run()` method executes all specialized agents, typically in parallel, and aggregates their final reports into a single `research_results` dictionary.\n    *   **`BattleEnvironment` (`battle.py`):** Implements the core innovation: the adversarial debate. It uses the **`BattleState`** class to track the debate history, agent order, and voting results. The `run()` method manages the multi-round debate, constructing a **cumulative context** (research results + previous speeches) for each agent before its turn. It acts as a **Mediator** for agent communication via the `Battle` tool.\n\n### 3. 
`src/tool` Module (The Capabilities)\nThis module provides the external and internal actions available to the agents, serving as the interface between the LLM-driven logic and the external world.\n\n*   **Files Enumerated:** `base.py`, `terminate.py`, `tool_collection.py`, `battle.py`, `big_deal_analysis.py`, `chip_analysis.py`, `search/` (various web search tools).\n*   **Core Responsibility:** To define a standard interface (`BaseTool`) for all capabilities and to implement the logic for data fetching, web searching, and inter-agent communication.\n*   **Key Implementation Details:**\n    *   **`BaseTool` (`base.py`):** An abstract class that defines the `name`, `description`, `parameters` (for LLM function calling), and the `async execute()` method. It also includes utility classes like `ToolResult` and `ToolFailure`.\n    *   **`ToolCollection` (`tool_collection.py`):** A container class that holds all available tools for an agent, mapping tool names to instances and providing the list of tool schemas to the LLM.\n    *   **`BigDealAnalysisTool` (`big_deal_analysis.py`):** A specialized tool that wraps the `akshare` library to fetch and process big order fund flow data, including a simple retry mechanism for unstable API calls.\n    *   **`Battle` (`battle.py`):** A unique tool that allows agents to `speak` and `vote` within the `BattleEnvironment`, acting as the communication channel for the debate.\n\n### 4. `src/mcp` Module (The Protocol Integration)\nThis module handles the Model Context Protocol (MCP) integration, which is key to accessing specialized financial data.\n\n*   **Files Enumerated:** `__init__.py`, `battle_server.py`, `big_deal_analysis_server.py`, `server.py`, etc.\n*   **Core Responsibility:** To define the server-side stubs for the specialized financial data services. 
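As a dependency-free illustration of the "schema plus handler" shape such a stub takes (the endpoint name, fields, and handler below are hypothetical, not the project's actual MCP framework API):

```python
# Hypothetical server-stub sketch: an endpoint schema paired with a handler.
# The real project presumably builds on an MCP server framework; this only
# illustrates the input/output contract idea.
SENTIMENT_ENDPOINT = {
    "name": "analyze_sentiment",
    "input": {"stock_code": "str"},
    "output": {"score": "float", "summary": "str"},
}

def analyze_sentiment_stub(stock_code: str) -> dict:
    # A deployed server would aggregate real news sentiment; the stub only
    # returns a response matching the declared output schema.
    return {"score": 0.0, "summary": f"stub result for {stock_code}"}
```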
These stubs are likely used in a separate deployment environment but are included here to define the protocol endpoints that the `MCPAgent`s are designed to call.\n*   **Key Implementation Details:** The files primarily contain `MCPServer` implementations (or stubs) for services like `sentiment_server` and `chip_analysis_server`, defining the expected input and output schemas for the financial data APIs.\n\n### 5. `src/prompt` Module (The Agent Mindset)\nThis module contains the extensive, Chinese-language prompt templates that define the personality, role, and instructions for each agent.\n\n*   **Files Enumerated:** `battle.py`, `big_deal_analysis.py`, `chip_analysis.py`, `hot_money.py`, `risk_control.py`, `sentiment.py`, `technical_analysis.py`, etc.\n*   **Core Responsibility:** To provide the system prompts (`SYSTEM_PROMPT`) and next-step prompts (`NEXT_STEP_PROMPT_ZN`) that guide the LLM's behavior within the ReAct loop, ensuring the agents adhere to their specialized financial roles and the rules of the environment. 
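A minimal sketch of how such a prompt module can be laid out (the strings here are short English placeholders, whereas the project's actual prompts are extensive and Chinese-language; the constant names are modeled on those mentioned above):

```python
# Hypothetical prompt-module layout, mirroring the system-prompt /
# next-step-prompt split described above. Real prompts are far longer.
SYSTEM_PROMPT = (
    "You are a sentiment analysis expert for the A-share market. "
    "Analyze news and social-media signals for the target stock."
)

NEXT_STEP_PROMPT = (
    "Given the data gathered so far, decide whether to call another tool "
    "or to deliver your final report."
)

def build_system_prompt(stock_code: str, stock_name: str) -> str:
    """Bind a concrete stock to the role prompt before the ReAct loop starts."""
    return f"{SYSTEM_PROMPT}\nTarget: {stock_name} ({stock_code})"
```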
The prompts are critical for the project's A-share market specialization.\n\n### Module PlantUML Diagrams\n\n## Agent Module PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Monospaced\nskinparam defaultFontSize 12\n\npackage \"src.agent\" {\n    abstract class BaseAgent {\n        + name: str\n        + memory: Memory\n        + state: AgentState\n        + run(request)\n        + {abstract} step()\n        + is_stuck()\n    }\n\n    abstract class ReActAgent {\n        + step()\n        - _parse_llm_response()\n    }\n\n    abstract class ToolCallAgent {\n        + available_tools: ToolCollection\n        + step()\n        - _execute_tool(tool_call)\n    }\n\n    class MCPAgent {\n        + mcp_client: MCPClient\n    }\n\n    class ChipAnalysisAgent\n    class BigDealAnalysisAgent\n    class HotMoneyAgent\n    class RiskControlAgent\n    class SentimentAgent\n    class TechnicalAnalysisAgent\n    class ReportAgent\n\n    BaseAgent <|-- ReActAgent\n    ReActAgent <|-- ToolCallAgent\n    ToolCallAgent <|-- MCPAgent\n\n    MCPAgent <|-- ChipAnalysisAgent\n    MCPAgent <|-- BigDealAnalysisAgent\n    MCPAgent <|-- HotMoneyAgent\n    MCPAgent <|-- RiskControlAgent\n    MCPAgent <|-- SentimentAgent\n    MCPAgent <|-- TechnicalAnalysisAgent\n    MCPAgent <|-- ReportAgent\n\n    BaseAgent ..> [src.schema.Memory] : uses\n    ToolCallAgent ..> [src.tool.ToolCollection] : manages\n    MCPAgent ..> [src.mcp.MCPClient] : uses\n}\n@enduml\n```\n\n## Environment Module PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Monospaced\nskinparam defaultFontSize 12\n\npackage \"src.environment\" {\n    abstract class BaseEnvironment {\n        + name: str\n        + agents: Dict[str, BaseAgent]\n        + register_agent(agent)\n        + {abstract} run()\n    }\n\n    class ResearchEnvironment {\n        + run()\n        - _create_agents()\n        - 
_aggregate_results()\n    }\n\n    class BattleEnvironment {\n        + battle_state: BattleState\n        + run()\n        + handle_speak(agent_id, speak)\n        + handle_vote(agent_id, vote)\n        - _get_cumulative_context()\n    }\n\n    class BattleState {\n        + agent_order: List[str]\n        + debate_history: List[Dict]\n        + final_votes: Dict[str, str]\n        + _recalculate_vote_results()\n    }\n\n    class EnvironmentFactory {\n        + {static} create_environment(type, agents)\n    }\n\n    BaseEnvironment <|-- ResearchEnvironment\n    BaseEnvironment <|-- BattleEnvironment\n\n    BattleEnvironment o-- BattleState : manages\n\n    BaseEnvironment ..> [src.agent.BaseAgent] : contains\n    EnvironmentFactory ..> BaseEnvironment : creates\n}\n@enduml\n```\n\n## Tool Module PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Monospaced\nskinparam defaultFontSize 12\n\npackage \"src.tool\" {\n    abstract class BaseTool {\n        + name: str\n        + description: str\n        + parameters: Dict\n        + {abstract} execute(**kwargs)\n        + to_param()\n    }\n\n    class ToolResult {\n        + output: Any\n        + error: Optional[str]\n    }\n\n    class ToolCollection {\n        + tools: Dict[str, BaseTool]\n        + get_tool_schemas()\n        + execute_tool(name, **kwargs)\n    }\n\n    class Terminate\n    class Battle {\n        + agent_id: str\n        + controller: BattleEnvironment\n        + execute(speak, vote)\n    }\n    class BigDealAnalysisTool {\n        + execute(stock_code)\n        - _safe_fetch(akshare_func)\n    }\n    class ChipAnalysisTool\n    class CreateChatCompletion\n    class WebSearchTool\n\n    BaseTool <|-- Terminate\n    BaseTool <|-- Battle\n    BaseTool <|-- BigDealAnalysisTool\n    BaseTool <|-- ChipAnalysisTool\n    BaseTool <|-- CreateChatCompletion\n    BaseTool <|-- WebSearchTool\n\n    ToolCollection o-- BaseTool : aggregates\n    
BaseTool ..> ToolResult : returns\n    Battle ..> [src.environment.BattleEnvironment] : interacts with (controller)\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe FinGenius architecture is built upon a set of well-defined core abstractions that facilitate the multi-agent, dual-environment design.\n\n**1. Agent Hierarchy (The Actors):**\nThe agent system follows a clear inheritance chain, embodying the **Strategy Pattern** and **Template Method Pattern**.\n*   **`BaseAgent` (`src/agent/base.py`):** The foundational abstract class. It provides core agent capabilities: state management (`AgentState`), memory (`Memory`), logging, and the main execution loop (`run()`). It enforces the abstract method `step()`, which is the single unit of work for any agent.\n*   **`ReActAgent` (`src/agent/react.py`):** Implements the **ReAct (Reasoning and Acting) pattern**. It extends `BaseAgent` by structuring the `step()` method to alternate between internal thought (reasoning) and external action (tool use).\n*   **`ToolCallAgent` (`src/agent/toolcall.py`):** Extends `ReActAgent` to manage and execute tools. It handles the parsing of LLM responses for function calls and the execution of the tools contained within the `ToolCollection`.\n*   **`MCPAgent` (`src/agent/mcp.py`):** The final, specialized base class. It extends `ToolCallAgent` to integrate the **Model Context Protocol (MCP)**, allowing agents to access specialized financial data servers via `MCPClient`. All domain-specific agents (e.g., `ChipAnalysisAgent`) inherit from this class.\n\n**2. Environment Hierarchy (The Stage):**\nThe environments define the context and rules of interaction for the agents.\n*   **`BaseEnvironment` (`src/environment/base.py`):** The abstract base class for all environments. It manages a collection of agents (`self.agents`) and defines the abstract `run()` method. 
It also includes an `EnvironmentFactory` for creating specific environment types.\n*   **`ResearchEnvironment` (`src/environment/research.py`):** Implements the data collection and initial analysis phase. It is responsible for initializing the specialized agents and running them to gather their individual reports.\n*   **`BattleEnvironment` (`src/environment/battle.py`):** Implements the adversarial validation phase. It manages the structured debate, tracks the debate history, and records agent votes using the **`BattleState`** class. This environment acts as a **Mediator**, controlling the flow of communication between agents.\n\n**3. Data and Utility Abstractions:**\n*   **`Memory` and `Message` (`src/schema.py`):** These Pydantic models define the structure for agent memory and communication. `Memory` stores a list of `Message` objects, which adhere to the OpenAI chat format (system, user, assistant, tool roles).\n*   **`BaseTool` and `ToolCollection` (`src/tool/base.py`):** `BaseTool` is the abstract interface for all external capabilities, enforcing the `execute()` method. `ToolCollection` is a container that maps tool names to `BaseTool` instances, simplifying tool management for agents.\n*   **`LLM` (`src/llm.py`):** A wrapper class for interacting with the Large Language Model API, centralizing LLM configuration and call logic.\n\nThe design philosophy is a modular, layered approach, separating the core agent logic, the interaction protocols (environments), and the external capabilities (tools). This separation of concerns ensures high extensibility, allowing new agents, tools, or even new debate formats to be introduced with minimal impact on the core framework. The use of Pydantic for data models enforces strict data validation and structure across the system.\n\n#### 3.1.2. Component Interactions\n\nThe FinGenius system operates on a two-stage, sequential pipeline: **Research** followed by **Battle**. 
The entire process is orchestrated by `main.py`.\n\n**1. Initialization and Research Phase (Data Collection & Analysis):**\n*   **`main.py`** acts as the orchestrator. It initializes the `EnvironmentFactory` to create the `ResearchEnvironment` and a team of specialized `MCPAgent`s (e.g., `ChipAnalysisAgent`, `HotMoneyAgent`).\n*   **`ResearchEnvironment.run()`** executes the agents, typically in parallel or in a defined sequence.\n*   **`MCPAgent.run()`** initiates the agent's ReAct loop, calling `step()` repeatedly.\n*   **`ToolCallAgent.step()`** (inherited by `MCPAgent`) is the core of the interaction. It sends the current memory and prompt to the `LLM` to decide on the next action.\n*   **LLM** responds with a `tool_call` (e.g., `big_deal_analysis_tool`).\n*   **`ToolCallAgent`** executes the tool via the **`ToolCollection`**.\n*   **`BigDealAnalysisTool.execute()`** (a specialized `BaseTool`) uses external libraries like `akshare` to fetch real-time financial data. This is the primary external data flow.\n*   The tool returns a `ToolResult` (structured data) to the agent.\n*   The agent incorporates the tool result into its memory and continues the ReAct loop until it decides to `Terminate`.\n*   The `ResearchEnvironment` collects the final output from all agents into a comprehensive `research_results` dictionary.\n\n**2. 
Battle Phase (Adversarial Validation & Decision):**\n*   **`main.py`** then initializes the `BattleEnvironment`, passing the `research_results` as context.\n*   **`BattleEnvironment.run()`** starts the multi-round debate, managed by the `BattleState`.\n*   Agents are instructed to speak and vote using the **`Battle`** tool.\n*   **`MCPAgent`** receives the full research context and the debate history (cumulative context) and uses the `Battle` tool to submit its argument (`speak`) and final decision (`vote`).\n*   **`Battle.execute()`** is handled by the `BattleEnvironment`'s controller, which records the speech in the `debate_history` and updates the `BattleState`'s `final_votes`.\n*   After a set number of rounds, the `BattleEnvironment` synthesizes the final conclusion based on the vote results (`vote_results` in `BattleState`).\n\n**3. Final Reporting:**\n*   The final decision and report are passed back to `main.py`, which uses the `ReportAgent` (or a similar mechanism) to format the output into a structured HTML or JSON report for the user.\n\nThe communication pattern is primarily **sequential orchestration** (`main.py` -> Research -> Battle) with **internal parallel execution** (agents running concurrently in the `ResearchEnvironment`) and a **Mediator pattern** (`BattleEnvironment` managing agent interactions via the `Battle` tool).\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Monospaced\nskinparam defaultFontSize 12\n\npackage \"FinGenius\" {\n    package \"src\" {\n        package \"agent\" {\n            abstract class BaseAgent\n            abstract class ReActAgent\n            abstract class ToolCallAgent\n            class MCPAgent\n            class ChipAnalysisAgent\n            class BigDealAnalysisAgent\n            class HotMoneyAgent\n            class RiskControlAgent\n            class SentimentAgent\n            class TechnicalAnalysisAgent\n            class ReportAgent\n        }\n\n        package \"environment\" {\n            abstract class BaseEnvironment\n            class ResearchEnvironment\n            class BattleEnvironment\n            class EnvironmentFactory\n            class BattleState\n        }\n\n        package \"tool\" {\n            abstract class BaseTool\n            class ToolCollection\n            class Terminate\n            class Battle\n            class BigDealAnalysisTool\n            class ChipAnalysisTool\n            class CreateChatCompletion\n            class FinancialDeepSearchTool\n            class WebSearchTool\n        }\n\n        package \"mcp\" {\n            class MCPClient\n            class MCPServer\n        }\n\n        package \"core\" {\n            class LLM\n            class Memory\n            class Message\n            class AgentState\n        }\n\n        [main.py]\n    }\n}\n\n' Inheritance\nBaseAgent <|-- ReActAgent\nReActAgent <|-- ToolCallAgent\nToolCallAgent <|-- MCPAgent\nMCPAgent <|-- ChipAnalysisAgent\nMCPAgent <|-- BigDealAnalysisAgent\nMCPAgent <|-- HotMoneyAgent\nMCPAgent <|-- RiskControlAgent\nMCPAgent <|-- SentimentAgent\nMCPAgent <|-- TechnicalAnalysisAgent\nMCPAgent <|-- ReportAgent\n\nBaseEnvironment <|-- ResearchEnvironment\nBaseEnvironment <|-- BattleEnvironment\n\n' Dependencies\nBaseAgent ..> LLM : 
uses\nBaseAgent ..> Memory : uses\nBaseAgent ..> AgentState : manages\nMCPAgent ..> MCPClient : uses\nToolCallAgent ..> ToolCollection : manages\nToolCollection o-- BaseTool : aggregates\n\nResearchEnvironment o-- MCPAgent : contains (Research Team)\nBattleEnvironment o-- MCPAgent : contains (Battle Team)\nBattleEnvironment ..> BattleState : manages\nBattleEnvironment ..> Battle : uses (Tool)\n\n[main.py] ..> EnvironmentFactory : creates\n[main.py] ..> ResearchEnvironment : runs\n[main.py] ..> BattleEnvironment : runs\n\nBaseTool <|-- Battle\nBaseTool <|-- BigDealAnalysisTool\nBaseTool <|-- ChipAnalysisTool\nBaseTool <|-- Terminate\n\n' Data Flow / Interaction\n[main.py] --> ResearchEnvironment : Start Analysis\nResearchEnvironment --> MCPAgent : Execute Step\nMCPAgent --> ToolCollection : Call Tool\nToolCollection --> BaseTool : Execute\nResearchEnvironment --> BattleEnvironment : Pass Results\nBattleEnvironment --> MCPAgent : Debate Round\nMCPAgent --> Battle : Speak/Vote\nBattleEnvironment --> [main.py] : Final Report\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe FinGenius project effectively utilizes several key design patterns to manage complexity, promote modularity, and implement the multi-agent logic.\n\n**1. Template Method Pattern (Agent Hierarchy):**\nThe agent structure is a classic example of the **Template Method Pattern**, realized through a layered inheritance chain.\n*   **Implementation:** The inheritance chain `BaseAgent` -> `ReActAgent` -> `ToolCallAgent` -> `MCPAgent` defines a fixed sequence of responsibilities. `BaseAgent` handles the execution loop, `ReActAgent` injects the reasoning/acting logic, and `ToolCallAgent` adds tool execution. 
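The layering reads like a textbook template method; a minimal sketch (assumed shapes and return values, not the project's exact code):

```python
from abc import ABC, abstractmethod
from typing import List


class BaseAgent(ABC):
    """Owns the invariant execution loop -- the template method."""

    def run(self, max_steps: int = 10) -> List[str]:
        trace: List[str] = []
        for _ in range(max_steps):
            result = self.step()  # the varying part, supplied by subclasses
            trace.append(result)
            if result == 'terminate':
                break
        return trace

    @abstractmethod
    def step(self) -> str:
        """One unit of work for the agent."""


class ReActAgent(BaseAgent):
    """Refines step() into a reason-then-act cycle."""

    def step(self) -> str:
        return self.act(self.think())

    def think(self) -> str:
        return 'decide which tool to call'

    def act(self, thought: str) -> str:
        # A ToolCallAgent subclass would parse tool calls and execute them here.
        return 'terminate'
```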
`BaseAgent.run()` provides the invariant skeleton of the execution loop, while the abstract `step()` is the hook method that each level refines.\n*   **Example:** `MCPAgent`'s `step()` method calls `ToolCallAgent`'s logic, which in turn relies on `ReActAgent`'s logic to decide whether to reason or call a tool.\n\n**2. Strategy Pattern (Specialized Agents):**\nThe domain-specific agents (e.g., `ChipAnalysisAgent`, `HotMoneyAgent`) are concrete strategies that implement the agent interface defined by `MCPAgent`.\n*   **Implementation:** Each specialized agent is configured with a unique `system_prompt` and a specific `ToolCollection` containing only the tools relevant to its domain (e.g., `ChipAnalysisAgent` gets `ChipAnalysisTool`).\n*   **Example:** The difference between a `RiskControlAgent` and a `SentimentAgent` is primarily their system prompt (strategy) and the set of tools they are allowed to use (capabilities).\n\n**3. Mediator Pattern (BattleEnvironment):**\nThe `BattleEnvironment` acts as a mediator, controlling the interactions between the agents during the debate phase.\n*   **Implementation:** Agents do not communicate directly. Instead, they use the **`Battle`** tool, which routes their `speak` and `vote` actions to the `BattleEnvironment`'s controller. The environment then updates the shared `BattleState` and broadcasts the new context to the next agent.\n*   **Example:** When an agent calls `battle(speak=\"...\", vote=\"bullish\")`, the `BattleEnvironment` processes this, records it in `debate_history`, and then constructs the cumulative context for the next agent, ensuring controlled, structured communication.\n\n**4. 
Factory Method Pattern (EnvironmentFactory):**\nThe `EnvironmentFactory` is responsible for creating and initializing the correct environment type (`ResearchEnvironment` or `BattleEnvironment`) based on an input parameter.\n*   **Implementation:** The static method `EnvironmentFactory.create_environment(environment_type, ...)` encapsulates the logic for instantiating the correct environment class and registering the necessary agents. This decouples the client (`main.py`) from the concrete environment classes.\n\n**5. Adapter Pattern (BaseTool and ToolCollection):**\nThe `BaseTool` and `ToolCollection` serve as an adapter layer to integrate external capabilities (like `akshare` or the `Battle` mechanism) into the LLM's function-calling interface.\n*   **Implementation:** `BaseTool.to_param()` converts the Python class definition into the required JSON schema for the LLM. The `execute()` method then adapts the LLM's call into the actual Python function logic.\n\n| Pattern | Component | Role in FinGenius |\n| :--- | :--- | :--- |\n| **Template Method** | `BaseAgent` | Defines the skeleton of the agent's execution loop (`run`, `step`). |\n| **Strategy** | Specialized Agents | Each agent is a strategy with a unique prompt and toolset for a specific financial domain. |\n| **Mediator** | `BattleEnvironment` | Controls and structures the communication and debate flow between agents. |\n| **Factory Method** | `EnvironmentFactory` | Centralizes the creation and initialization of `Research` and `Battle` environments. |\n| **Adapter** | `BaseTool` / `ToolCollection` | Adapts external functions and internal logic for the LLM's function-calling interface. |\n\n#### 3.3.2. Project Highlights\n\nThe FinGenius project stands out due to its innovative approach to financial analysis, leveraging a sophisticated multi-agent architecture tailored for the Chinese A-share market.\n\n*   **Research–Battle Dual-Environment Architecture:** This is the core innovation. 
The system separates the process into two distinct phases: the **Research Environment** for parallel, specialized data collection and analysis, and the **Battle Environment** for adversarial validation. This dual structure ensures that the final conclusion is not just a summary of individual findings but a synthesis derived from a structured, competitive debate, significantly reducing the risk of LLM \"hallucination.\"\n*   **A-Share Market Specialization and Localization:** The project is explicitly designed to overcome the \"water-soil incompatibility\" (i.e., poor local fit) of general-purpose AI in the Chinese financial context. This is achieved through:\n    *   **Specialized Agents:** Agents like the **Hot Money Agent (游资agent)** and **Chip Agent (筹码agent)** are based on unique A-share market concepts (e.g., Dragon and Tiger Lists, chip distribution).\n    *   **Localized Tools:** Integration with Chinese financial data APIs like `akshare` and localized search tools (Baidu search) ensures relevance and accuracy.\n    *   **Chinese Prompts:** The use of extensive, high-quality Chinese system prompts in `src/prompt` ensures the LLM's reasoning is grounded in the correct market terminology and context.\n*   **Cumulative Debate Mechanism:** The `BattleEnvironment` implements a sophisticated debate structure where each agent's argument is informed by the full research context and the speeches of all preceding agents in the current round. This **cumulative context** fosters a deeper, more context-aware discussion, simulating a real-world, progressive analysis process.\n*   **Modular and Extensible Design:** The clear separation of concerns using the **Agent-Environment-Tool** architecture (Strategy and Factory patterns) makes the system highly extensible. 
Adding a new financial expert (Agent) or a new data source (Tool) requires minimal changes to the core framework, primarily involving configuration and inheritance.\n*   **Robust State and Memory Management:** The use of Pydantic models for `Message`, `Memory`, and `BattleState` enforces strict data structure and validation. The `BaseAgent`'s built-in logic to detect and handle \"stuck states\" (duplicate responses) enhances the robustness of the autonomous execution loop.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe FinGenius project is architecturally sound, but several areas can be optimized for performance, robustness, and maintainability.\n\n**1. Performance and Robustness:**\n*   **Asynchronous Data Fetching and Caching:** The current tool implementations, particularly those relying on external APIs like `akshare` (e.g., `BigDealAnalysisTool`), appear to use synchronous calls within an `async` framework. While the `execute` method is `async`, the internal `_with_retry` and `_safe_fetch` functions use `time.sleep()`, which blocks the event loop.\n    *   **Suggestion:** Refactor all external API calls to use `aiohttp` or an asynchronous wrapper for `akshare` to prevent blocking the main event loop, significantly improving concurrency in the `ResearchEnvironment`. Implement a time-to-live (TTL) cache (e.g., using Redis) for frequently requested financial data to reduce redundant API calls and improve speed.\n*   **Tool Execution Timeout:** The `ToolCallAgent` should implement a strict timeout mechanism for tool execution to prevent a single unresponsive tool from stalling the entire agent's `run()` loop.\n\n**2. Architecture Optimization:**\n*   **Dynamic Tool Registration:** The `ToolCollection` is currently a static container. 
For a highly extensible system, consider implementing a dynamic tool discovery mechanism (e.g., using Python entry points or a configuration file) that automatically loads tools into the `ToolCollection` based on the agent's configuration, rather than requiring manual import and instantiation in each agent file.\n*   **Environment State Management:** The `BattleState` is a large Pydantic model. While effective, for long-running debates, consider offloading the `battle_history` and `debate_history` to a persistent store (e.g., a database) to reduce memory footprint and enable recovery from crashes.\n\n**3. Code Quality and Maintainability:**\n*   **Prompt Management Refinement:** The system prompts are stored as large Python string variables in `src/prompt/*.py`. This is difficult to manage and version control.\n    *   **Suggestion:** Consolidate prompts into a structured format (e.g., YAML or JSON files) or use a dedicated prompt management library. This would allow for easier localization, versioning, and separation of prompt content from Python logic.\n*   **Type Hinting Consistency:** While Pydantic is used extensively, the use of `Any` in critical areas (e.g., `controller: Optional[Any]` in `Battle` tool) reduces type safety. Replace `Any` with specific protocol classes or forward references to improve static analysis and code clarity.\n*   **Error Handling in Tools:** The `_safe_fetch` function in `BigDealAnalysisTool` returns `None` on failure. While safe, this can lead to silent failures.\n    *   **Suggestion:** Tools should return a `ToolFailure` object with a detailed error message, allowing the agent's ReAct loop to explicitly reason about the failure and attempt a recovery strategy, rather than simply receiving `None` data.\n\n#### 3.4.2. Secondary Development Guide\n\nThe FinGenius project is highly modular, making secondary development straightforward by focusing on the three core components: **Agents**, **Tools**, and **Environments**.\n\n### 1. 
Code Exploration Path\nTo understand the system flow, follow this path:\n1.  **Entry Point:** Start with `main.py` to see the high-level orchestration: environment creation, sequential execution of Research and Battle phases, and final report generation.\n2.  **Environment Flow:** Examine `src/environment/research.py` and `src/environment/battle.py` to understand the rules and data flow for each phase.\n3.  **Agent Logic:** Study the agent hierarchy in `src/agent/base.py` and `src/agent/toolcall.py` to grasp the ReAct loop and tool-calling mechanism.\n4.  **Capabilities:** Review `src/tool/base.py` and the specific tool implementations (e.g., `src/tool/big_deal_analysis.py`) to see how external data is fetched and processed.\n\n### 2. Adding a New Specialized Agent\nTo introduce a new financial expert (e.g., a \"Policy Agent\"):\n1.  **Define the Agent:** Create a new file (e.g., `src/agent/policy.py`) inheriting from `MCPAgent`.\n    ```python\n    # Illustrative imports; adjust paths to the project's actual layout.\n    from pydantic import Field\n\n    from src.agent.mcp import MCPAgent\n    from src.prompt.policy import POLICY_SYSTEM_PROMPT  # define this prompt\n    from src.tool import PolicyTool, Terminate, ToolCollection\n\n    class PolicyAgent(MCPAgent):\n        name: str = \"policy_agent\"\n        description: str = \"分析宏观政策和行业监管变动。\"  # \"Analyze macro policy and industry regulation changes.\"\n        system_prompt: str = POLICY_SYSTEM_PROMPT\n        available_tools: ToolCollection = Field(\n            default_factory=lambda: ToolCollection(PolicyTool(), Terminate())\n        )\n    ```\n2.  **Create Necessary Tools:** If the agent needs new capabilities, create a `BaseTool` implementation (e.g., `PolicyTool`) in `src/tool/`.\n3.  **Register the Agent:** Modify `src/environment/research.py`'s `_create_agents` method to instantiate and include the new `PolicyAgent` in the research team.\n\n### 3. Adding a New Tool (External Capability)\nTo integrate a new data source or function:\n1.  **Define the Tool:** Create a new file (e.g., `src/tool/new_data_source.py`) inheriting from `BaseTool`.\n2.  **Implement Execution:** Implement the `async def execute(...)` method, which contains the logic for interacting with the external service (e.g., a new financial API).\n3.  
**Update Agent Toolset:** Add the new tool to the `ToolCollection` of the relevant specialized agent(s) in `src/agent/`.\n\n### 4. Configuration\n*   **LLM Configuration:** Modify `config/config.example.toml` to change the LLM model, API key, and other parameters.\n*   **MCP Configuration:** Adjust `config/mcp.example.json` to configure the endpoints for the specialized financial data servers that the `MCPAgent`s connect to.\n\nBy adhering to the established agent hierarchy and the Tool/Environment separation, new features can be added with high confidence and minimal side effects.\n\n"
  },
  {
    "path": "thirdparty/FinRL-Meta.md",
"content": "# FinRL-Meta - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe project structure is highly modular, with the core logic encapsulated within the `meta/` directory. This design facilitates clear separation of concerns between data handling, environment simulation, and agent implementation.\n\n```\nFinRL-Meta/\n├── meta/                               # Core library source code for FinRL-Meta framework.\n│   ├── agents/                         # DRL Agent implementations and wrappers for various DRL libraries (ElegantRL, RLLib, Stable-Baselines3).\n│   ├── config.py                       # Global configuration constants, including ticker lists, time zones, and API key placeholders.\n│   ├── data_processor.py               # The Facade class that orchestrates the entire data pipeline, selecting and running the appropriate data source processor.\n│   ├── data_processors/                # Module containing concrete implementations for fetching and cleaning data from different financial APIs.\n│   │   ├── _base.py                    # Abstract base class defining the common interface for all data processors (Strategy Pattern).\n│   │   ├── yahoofinance.py             # Implementation for fetching data from Yahoo Finance.\n│   │   ├── binance.py                  # Implementation for fetching data from Binance.\n│   │   └── ...                         # Other data source implementations (Alpaca, Tushare, etc.).\n│   ├── env_crypto_trading/             # Module for cryptocurrency trading environments.\n│   │   ├── env_multiple_crypto.py      # Multi-asset cryptocurrency trading environment, adhering to the OpenAI Gym interface.\n│   │   ├── env_btc_ccxt.py             # Single-asset Bitcoin trading environment.\n│   │   └── alpaca_paper_trade_multicrypto.py # Interface for live/paper trading execution using the Alpaca API.\n│   └── env_execution_optimizing/       # Module for specialized execution optimization problems.\n│       └── liquidation/                # Sub-module for the optimal liquidation problem.\n│           ├── env_execution_optimizing.py # Market environment based on the Almgren-Chriss model.\n│           ├── ddpg_agent.py           # Implementation of the DDPG agent for continuous control.\n│           └── model.py                # Neural network definitions (Actor and Critic) for the DDPG agent.\n├── README.md                           # Project documentation and usage examples.\n├── setup.py                            # Python package setup file.\n└── ...                                 # Non-core files (e.g., examples, notebooks, docs).\n```\n\nThe structure clearly delineates the **Data Layer** (`data_processors/`), the **Environment Layer** (`env_crypto_trading/`, `env_execution_optimizing/`), and the **Agent Layer** (`agents/`, `liquidation/ddpg_agent.py`), supporting the project's modular design philosophy. The use of a central `data_processor.py` and `config.py` provides global control and configuration points. The separation of environments into distinct domains (crypto trading vs. execution optimizing) allows for specialized modeling of market dynamics.\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinRL-Meta/meta/data_processors`: Contains the core logic for fetching, cleaning, and transforming financial market data from various sources.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinRL-Meta/meta/env_crypto_trading`: Contains the reinforcement learning environments and live trading interfaces for cryptocurrency portfolio management.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinRL-Meta/meta/env_execution_optimizing/liquidation`: Contains the specialized environment and DRL agent implementation for the optimal trade execution (liquidation) problem.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinRL-Meta/meta/agents`: Contains the wrappers and base classes for integrating various external DRL libraries.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module Analysis\n\n### 1. Module: `meta/agents`\n*   **Files Enumerated**: `elegantrl_models.py`, `rllib_models.py`, `stablebaselines3_models.py`.\n*   **Module Core Responsibility**: To provide a standardized interface and wrappers for integrating various external Deep Reinforcement Learning (DRL) libraries (ElegantRL, RLLib, Stable-Baselines3) with the FinRL-Meta environments. This module abstracts the library-specific agent creation and training logic.\n*   **Key File Identification**:\n    *   `stablebaselines3_models.py`: Contains the `DRLAgent` class, which acts as a wrapper for Stable-Baselines3 algorithms (e.g., A2C, PPO, DDPG). It handles the creation of the agent, training, and testing, providing a unified API for the main workflow.\n    *   `elegantrl_models.py`: Provides similar wrappers for ElegantRL agents.\n    *   `rllib_models.py`: Provides wrappers for RLLib agents.\n*   **Core Implementation**: The `DRLAgent` classes typically take an environment, a model name, and hyperparameters. 
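In spirit, the wrapper exposes a two-call workflow (`get_model`, then `train_model`). A toy sketch of that shape, where `FakeModel` stands in for a Stable-Baselines3 algorithm class such as PPO, so only the workflow (not the real signatures) is illustrated:

```python
# Toy sketch of the DRLAgent wrapper shape; FakeModel stands in for a
# Stable-Baselines3 class, so only the two-call workflow is illustrated.
class FakeModel:
    def __init__(self, env: object) -> None:
        self.env = env
        self.trained_steps = 0

    def learn(self, total_timesteps: int) -> 'FakeModel':
        self.trained_steps = total_timesteps
        return self


MODELS = {'ppo': FakeModel}  # the real module maps names to SB3 classes


class DRLAgent:
    def __init__(self, env: object) -> None:
        self.env = env

    def get_model(self, model_name: str) -> FakeModel:
        return MODELS[model_name](self.env)

    def train_model(self, model: FakeModel, total_timesteps: int = 10_000) -> FakeModel:
        return model.learn(total_timesteps=total_timesteps)


agent = DRLAgent(env=None)
model = agent.get_model('ppo')
trained = agent.train_model(model, total_timesteps=1_000)
```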
They encapsulate the boilerplate code for agent initialization, model saving/loading, and the training loop (`train_model`, `get_model`).\n*   **Dependencies**: Depends heavily on external DRL libraries (Stable-Baselines3, ElegantRL, RLLib) and the custom environments defined in the `meta/env_...` modules.\n\n### 2. Module: `meta/data_processors`\n*   **Files Enumerated**: `_base.py`, `alpaca.py`, `binance.py`, `ccxt.py`, `tushare.py`, `yahoofinance.py`.\n*   **Module Core Responsibility**: To provide concrete implementations for fetching, cleaning, and transforming raw financial data from various sources into a standardized format (Pandas DataFrame) and ultimately into NumPy arrays for the RL environments.\n*   **Key File Identification**:\n    *   `_base.py`: Defines the abstract base class `_Base`, which outlines the common interface (`download_data`, `clean_data`, `add_technical_indicator`, `df_to_array`) that all concrete processors must implement. This is the core of the Strategy Pattern.\n    *   `yahoofinance.py`: Implements data fetching using the `yfinance` library, including specific logic for price adjustment and handling time intervals.\n    *   `binance.py`: Implements data fetching from the Binance exchange, handling specific API calls and data aggregation logic.\n*   **Core Implementation**: The `download_data` methods handle API interaction. The `clean_data` methods are crucial for filling missing values and ensuring data integrity. The `df_to_array` method transforms the final DataFrame into the required NumPy arrays (`price_array`, `tech_array`, `turbulence_array`) for the RL environment.\n*   **Dependencies**: Depends on external data libraries (`yfinance`, `ccxt`, `tushare`, `alpaca_trade_api`) and common data science libraries (`pandas`, `numpy`).\n\n### 3. 
Module: `meta/env_crypto_trading`\n*   **Files Enumerated**: `alpaca_paper_trade_multicrypto.py`, `create_crypto_env.py`, `env_btc_ccxt.py`, `env_multiple_crypto.py`.\n*   **Module Core Responsibility**: To define the simulation environments for cryptocurrency trading and provide an interface for live/paper trading execution.\n*   **Key File Identification**:\n    *   `env_multiple_crypto.py`: Defines the `CryptoEnv` class, the primary multi-asset RL environment. It implements the core `reset()` and `step()` methods, managing the portfolio state (cash, stocks) and calculating the reward based on asset value change.\n    *   `alpaca_paper_trade_multicrypto.py`: Defines `AlpacaPaperTradingMultiCrypto`, which acts as the execution layer. It loads a trained DRL policy, fetches real-time data, infers an action, and executes trades via the Alpaca API.\n*   **Core Implementation**: The `CryptoEnv.step()` method contains the critical trading logic: action normalization (to handle large price differences), transaction cost calculation, and portfolio update. The state is constructed by stacking normalized cash, stocks, and a lookback window of technical indicators.\n*   **Dependencies**: Depends on the `meta/data_processors` for data, and external libraries like `gym`, `numpy`, `pandas`, and `alpaca_trade_api`.\n\n### 4. 
Module: `meta/env_execution_optimizing/liquidation`\n*   **Files Enumerated**: `ddpg_agent.py`, `env_execution_optimizing.py`, `model.py`, `utils.py`.\n*   **Module Core Responsibility**: To provide a specialized environment and DRL agent for the optimal trade execution problem, specifically the Almgren-Chriss liquidation model.\n*   **Key File Identification**:\n    *   `env_execution_optimizing.py`: Defines `MarketEnvironment`, which models the stock price dynamics under market impact (permanent and temporary) and calculates the reward based on the Almgren-Chriss utility function.\n    *   `ddpg_agent.py`: Defines the `Agent` class, a standard implementation of the DDPG algorithm, including `Actor` and `Critic` networks, `ReplayBuffer`, and `OUNoise`.\n*   **Core Implementation**: The `MarketEnvironment.step()` method is the core, implementing the price evolution and market impact equations. The DDPG `Agent.learn()` method implements the standard DDPG update rules for the Actor and Critic networks.\n*   **Dependencies**: Depends on `numpy`, `torch`, and standard DRL components.\n\n### Module PlantUML Diagrams\n\n\n@startuml\ntitle Agents Module (Stable-Baselines3)\n\nabstract class BaseCallback {\n    + _on_step()\n}\n\nclass TensorboardCallback {\n    + _on_step(): bool\n}\n\nclass DRLAgent {\n    + __init__(env)\n    + get_model(model_name, policy, policy_kwargs, model_kwargs, verbose, seed)\n    + train_model(model, tb_log_name, total_timesteps)\n    + DRL_prediction(model, environment)\n    + DRL_prediction_load_from_file(model_name, environment, cwd)\n}\n\nclass DRLEnsembleAgent {\n    + __init__(df, train_period, ...)\n    + get_model(model_name, env, ...)\n    + train_model(model, model_name, tb_log_name, iter_num, total_timesteps)\n    + get_validation_sharpe(iteration, model_name)\n    + DRL_validation(model, test_data, test_env, test_obs)\n    + DRL_prediction(model, name, last_state, iter_num, ...)\n    + run_ensemble_strategy(A2C_model_kwargs, 
PPO_model_kwargs, DDPG_model_kwargs, timesteps_dict)\n}\n\nTensorboardCallback --|> BaseCallback\nDRLAgent ..> MODELS : uses\nDRLEnsembleAgent ..> MODELS : uses\nDRLEnsembleAgent ..> DRLAgent : uses methods\n\nnote right of DRLAgent::get_model\n  Initializes SB3 model (A2C, PPO, DDPG, SAC, TD3)\n  Handles action noise configuration\nend note\n\nnote right of DRLEnsembleAgent::run_ensemble_strategy\n  Core logic for rolling-window training\n  and model selection based on Sharpe ratio\nend note\n\n@enduml\n\n@startuml\nskinparam classAttributeIconVisible true\n\npackage \"Data Processors\" {\n    enum DataSource {\n        akshare\n        alpaca\n        alphavantage\n        baostock\n        binance\n        ccxt\n        iexcloud\n        joinquant\n        quandl\n        quantconnect\n        ricequant\n        tushare\n        wrds\n        yahoofinance\n    }\n\n    abstract class _Base {\n        + data_source: str\n        + start_date: str\n        + end_date: str\n        + time_interval: str\n        + dataframe: pd.DataFrame\n        --\n        + download_data(ticker_list: List[str])\n        + clean_data()\n        + fillna()\n        + add_technical_indicator(tech_indicator_list: List[str])\n        + add_turbulence()\n        + calculate_turbulence(): pd.DataFrame\n        + add_vix()\n        + df_to_array(tech_indicator_list: List[str], if_vix: bool)\n        + calc_nonstandard_time_interval(): str\n        + transfer_standard_ticker_to_nonstandard(ticker: str): str\n        + save_data(path)\n        + load_data(path)\n    }\n\n    class DataProcessor {\n        - processor: _Base\n        + data_source: DataSource\n        + start_date: str\n        + end_date: str\n        + time_interval: str\n        + dataframe: pd.DataFrame\n        --\n        + __init__(data_source: DataSource, ...)\n        + download_data(ticker_list)\n        + clean_data()\n        + add_technical_indicator(tech_indicator_list: List[str])\n        + add_turbulence()\n 
       + add_vix()\n        + df_to_array(if_vix: bool): np.array\n        + data_split(df, start, end)\n        + fillna()\n        + run(ticker_list: str, technical_indicator_list: List[str], if_vix: bool)\n    }\n\n    class Yahoofinance {\n        + download_data(ticker_list: List[str])\n    }\n\n    class Alpaca {\n        + api: tradeapi.REST\n        + download_data(ticker_list)\n        + clean_data()\n        + get_trading_days(start, end)\n    }\n\n    class Binance {\n        + download_data(ticker_list: List[str])\n        + dataframe_with_limit(symbol)\n        + fetch_n_combine(startDate, endDate, tickers)\n    }\n\n    class Tushare {\n        + token: str\n        + adj: str\n        + download_data(ticker_list: List[str])\n    }\n\n    DataProcessor o-- _Base : delegates\n    _Base <|-- Yahoofinance\n    _Base <|-- Alpaca\n    _Base <|-- Binance\n    _Base <|-- Tushare\n    DataProcessor o-- DataSource : uses\n}\n\n@enduml\n\n@startuml\nskinparam classAttributeIconVisible true\n\npackage \"RL Environments (meta.envs)\" {\n\n    package \"Crypto Trading\" {\n        class CryptoEnv {\n            + lookback: int\n            + initial_cash: float\n            + buy_cost_pct: float\n            + sell_cost_pct: float\n            + price_array: np.ndarray\n            + tech_array: np.ndarray\n            + stocks: np.ndarray\n            --\n            + __init__(config, lookback, initial_capital, ...)\n            + reset(): np.ndarray\n            + step(actions): (np.ndarray, float, bool, None)\n            + get_state(): np.ndarray\n            - _generate_action_normalizer()\n        }\n\n        class BitcoinEnv {\n            + stock_dim: int = 1\n            + initial_account: float\n            + transaction_fee_percent: float\n            --\n            + __init__(...)\n            + reset(): np.ndarray\n            + step(action): (np.ndarray, float, bool, None)\n            + draw_cumulative_return(...)\n            - load_data(...)\n  
      }\n\n        class AlpacaPaperTradingMultiCrypto {\n            - alpaca: tradeapi.REST\n            - act: AgentPPO.act\n            - CCTX_time_interval: str\n            - time_interval: int\n            - stocks: np.ndarray\n            - cash: float\n            --\n            + __init__(...)\n            + run()\n            + trade()\n            + get_state()\n            + submitOrder(qty, stock, side, resp)\n        }\n\n        class create_crypto_env {\n            + create_train_env(...)\n            + create_test_env(...)\n        }\n\n        CryptoEnv <.. create_crypto_env : creates\n        BitcoinEnv .up.|> CryptoEnv : specialized single-asset env (conceptual)\n        AlpacaPaperTradingMultiCrypto ..> CryptoEnv : uses concepts (state/action space)\n        AlpacaPaperTradingMultiCrypto ..> meta.data_processors.Ccxt : data source\n        AlpacaPaperTradingMultiCrypto ..> elegantrl.agent.AgentPPO : loads agent\n    }\n\n    package \"Execution Optimizing\" {\n        class Agent << (A, #FF7700) DDPG Agent >> {\n            + state_size: int\n            + action_size: int\n            - actor_local: Actor\n            - critic_local: Critic\n            - noise: OUNoise\n            - memory: ReplayBuffer\n            --\n            + __init__(state_size, action_size, random_seed)\n            + step(state, action, reward, next_state, done)\n            + act(state, add_noise=True)\n            + learn(experiences, gamma)\n            + soft_update(local_model, target_model, tau)\n        }\n\n        class OUNoise {\n            - mu: np.ndarray\n            - theta: float\n            - sigma: float\n            --\n            + __init__(size, seed, mu, theta, sigma)\n            + reset()\n            + sample()\n        }\n\n        class ReplayBuffer {\n            - memory: deque\n            - experience: namedtuple\n            --\n            + __init__(action_size, buffer_size, batch_size, seed)\n            + add(state, action, 
reward, next_state, done)\n            + sample()\n        }\n\n        Agent *-- OUNoise : uses\n        Agent *-- ReplayBuffer : uses\n        Agent ..> Actor : trains\n        Agent ..> Critic : trains\n    }\n}\n\n@enduml\n\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\n## Core Abstractions, Design Philosophy, and Lifecycle Management\n\nThe FinRL-Meta project is built upon a highly modular and layered architecture, primarily following the **Facade** and **Strategy** design patterns to achieve flexibility and extensibility. The core abstractions revolve around three main components: Data, Environment, and Agent.\n\n### 1. Data Abstraction\nThe data layer abstracts the complex process of connecting to various financial data sources (e.g., Yahoo Finance, Binance, Alpaca) into a unified interface.\n\n*   **`DataSource` Enum**: This is the fundamental abstraction, listing all supported data providers (`akshare`, `alpaca`, `yahoofinance`, etc.).\n*   **`_Base` Class**: An abstract base class (`meta/data_processors/_base.py`) that defines the common interface for all concrete data processors. It includes core methods like `download_data()`, `clean_data()`, `add_technical_indicator()`, and `df_to_array()`. This enforces a standard contract across all data sources.\n*   **`DataProcessor` Class**: This acts as a **Facade** (`meta/data_processor.py`). It takes a `DataSource` enum in its constructor and dynamically instantiates the corresponding concrete processor (e.g., `Yahoofinance`, `Binance`). Its `run()` method orchestrates the entire data pipeline: download, clean, add indicators, and transform the data into NumPy arrays (`price_array`, `tech_array`, `turbulence_array`) suitable for the RL environment.\n\n### 2. 
Environment Abstraction\nThe environment layer provides a standard interface for the Deep Reinforcement Learning (DRL) agents, adhering to the OpenAI Gym standard (`reset`, `step`).\n\n*   **`CryptoEnv` / `BitcoinEnv`**: These classes (`meta/env_crypto_trading/env_multiple_crypto.py`, `meta/env_crypto_trading/env_btc_ccxt.py`) abstract the trading logic, portfolio management, and reward calculation. They manage the state space (cash, holdings, technical indicators) and the action space (buy/sell/hold).\n*   **State Representation**: The state is a flattened NumPy array, typically a concatenation of normalized cash, normalized stock holdings, and a lookback window of normalized technical indicators. This design choice simplifies the state space for DRL algorithms.\n\n### 3. Agent and Execution Abstraction\nThe agent layer is designed to be decoupled from the core framework, allowing for easy integration of external DRL libraries (e.g., ElegantRL).\n\n*   **`DDPG_Agent`**: A concrete implementation of a DRL agent, demonstrating the use of the **Actor-Critic** architecture for continuous action spaces. It uses helper classes like `ReplayBuffer` and `OUNoise`.\n*   **`AlpacaPaperTradingMultiCrypto`**: This class in the execution layer acts as a bridge between the trained DRL policy and a live trading API (Alpaca). It handles the real-time data fetching, state construction, policy inference, and order submission, managing the entire **live trading lifecycle**.\n\n### Lifecycle Management\nThe typical lifecycle involves:\n1.  **Initialization**: `DataProcessor` is initialized with a `DataSource` and time parameters.\n2.  **Data Preparation**: `DataProcessor.run()` fetches and processes historical data, outputting NumPy arrays.\n3.  **Environment Setup**: An environment (`CryptoEnv`) is instantiated with the processed data arrays.\n4.  **Training/Testing**: A DRL agent interacts with the environment using `reset()` and `step()` methods.\n5.  
**Deployment (Live Trading)**: The trained agent's policy is loaded into an execution class (`AlpacaPaperTradingMultiCrypto`), which runs a continuous loop to fetch real-time data, generate actions, and execute trades. The `run()` method in this class manages the continuous trading loop.\n\n#### 3.1.2. Component Interactions\n\n## Component Interactions, Data Flow, and Communication Patterns\n\nThe FinRL-Meta architecture is characterized by a clear separation of concerns, with data flowing sequentially from the Data Layer to the Environment Layer, and control/action signals flowing between the Environment and the Agent Layer.\n\n### 1. Data Flow (Offline/Training Phase)\n\nThe primary data flow during the offline training phase is a one-way pipeline from the data source to the reinforcement learning environment.\n\n| Source Component | Target Component | Data Format | Communication Pattern | Description |\n| :--- | :--- | :--- | :--- | :--- |\n| **Data Processor** | **RL Environment** | NumPy Arrays | Synchronous Call | The `DataProcessor.run()` method orchestrates the data pipeline, culminating in the output of three key NumPy arrays: `price_array`, `tech_array`, and `turbulence_array`. These arrays, which represent the entire historical dataset, are passed directly to the `CryptoEnv` constructor. |\n| **RL Environment** | **DRL Agent** | NumPy Array (State) | Synchronous Call | In each `step()` call, the `CryptoEnv` calculates the next state (`get_state()`) and returns it to the DRL agent. The state is a flattened, normalized vector of market data and portfolio information. |\n\nThe `DataProcessor` acts as a **Strategy Pattern** selector, dynamically choosing a concrete data source module (e.g., `Yahoofinance`, `Binance`) based on the `DataSource` enum provided by the user. 
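The enum-keyed strategy selection described above can be sketched as follows. This is a minimal illustration only: the stub classes and the `PROCESSOR_DICT` name are hypothetical stand-ins for the real `meta/data_processors` classes and the actual mapping in `meta/data_processor.py`.

```python
from enum import Enum

# Hypothetical stand-ins for the concrete processors (the Strategies)
class YahoofinanceStub:
    def download_data(self, ticker_list):
        return {'source': 'yahoofinance', 'tickers': ticker_list}

class BinanceStub:
    def download_data(self, ticker_list):
        return {'source': 'binance', 'tickers': ticker_list}

class DataSource(Enum):
    yahoofinance = 'yahoofinance'
    binance = 'binance'

# The facade keys a mapping by the enum to pick the concrete strategy
PROCESSOR_DICT = {
    DataSource.yahoofinance: YahoofinanceStub,
    DataSource.binance: BinanceStub,
}

class DataProcessorSketch:
    def __init__(self, data_source):
        # Dynamic instantiation of the selected strategy object
        self.processor = PROCESSOR_DICT[data_source]()

    def download_data(self, ticker_list):
        # Every call is delegated to the chosen processor
        return self.processor.download_data(ticker_list)

dp = DataProcessorSketch(DataSource.binance)
print(dp.download_data(['BTC/USDT'])['source'])  # -> binance
```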
This ensures that the downstream components (the RL environments) only interact with the standardized NumPy array format, completely decoupling them from the complexities of external APIs.\n\n### 2. Control Flow (Training Phase)\n\nThe control flow adheres strictly to the standard **OpenAI Gym interface** for reinforcement learning.\n\n1.  **Initialization**: The DRL training loop calls `env.reset()`. The environment initializes the portfolio (cash, stocks) and returns the initial state vector.\n2.  **Action Selection**: The DRL agent receives the state and uses its neural network policy (`Actor.forward()`) to select an action (a continuous vector of target stock allocations).\n3.  **State Transition**: The DRL training loop calls `env.step(action)`.\n4.  **Environment Logic**: Inside `env.step()`, the environment:\n    *   Applies the action (simulates trades, updating `cash` and `stocks`).\n    *   Calculates the reward (change in total asset value).\n    *   Advances the time step.\n    *   Determines the next state (`get_state()`).\n    *   Checks for termination (`done`).\n5.  **Feedback**: The environment returns `(next_state, reward, done, info)` to the agent, closing the loop.\n\n### 3. Communication Patterns (Online/Live Trading Phase)\n\nThe `AlpacaPaperTradingMultiCrypto` class manages the real-time interaction with external services, introducing asynchronous and external API communication.\n\n1.  **Real-Time Data Fetch**: The `get_state()` method within the live trading class uses a data processor (specifically `Ccxt` in the example) to fetch the latest market data via HTTP requests to the exchange API (e.g., Binance). This is a synchronous, blocking call to retrieve the necessary historical lookback window.\n2.  **Policy Inference**: The fetched data is transformed into the state vector, which is then passed to the loaded DRL policy (`self.act(s_tensor)`). This is a local, synchronous operation.\n3.  
**Trade Execution**: The resulting action is translated into market orders. The `submitOrder()` method uses the Alpaca API (`alpaca.submit_order()`) to send the order to the broker. This is typically an external, asynchronous HTTP call, although the provided code wraps it in a `threading.Thread` and uses `join()` to make it functionally synchronous within the main loop, ensuring one trade is processed before the next time step.\n\nThis layered design ensures that the core RL logic remains clean and platform-agnostic, while the complexity of external data fetching and live execution is encapsulated in dedicated modules.\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam componentStyle rectangle\nskinparam classAttributeIconVisible true\n\ntitle FinRL-Meta High-Level Architecture\n\npackage \"Data Layer\" {\n    class DataSource << (E, #ADD8E6) Enum >>\n    abstract class _Base << (A, #ADD8E6) Base Processor >>\n    class DataProcessor << (F, #ADD8E6) Facade >>\n    class Yahoofinance << (C, #ADD8E6) Concrete Processor >>\n    class Binance << (C, #ADD8E6) Concrete Processor >>\n    ' ... 
other concrete processors ...\n}\n\npackage \"Environment Layer\" {\n    class CryptoEnv << (E, #90EE90) RL Environment >> {\n        + price_array: np.ndarray\n        + tech_array: np.ndarray\n        + stocks: np.ndarray\n        --\n        + reset()\n        + step(actions)\n        + get_state()\n    }\n    class MarketEnvironment << (E, #90EE90) Liquidation Env >> {\n        + shares_remaining\n        + timeHorizon\n        --\n        + step(action)\n    }\n}\n\npackage \"Agent Layer\" {\n    class DDPG_Agent << (A, #FFB6C1) Deep RL Agent >>\n    class Actor << (N, #FFB6C1) Neural Network >>\n    class Critic << (N, #FFB6C1) Neural Network >>\n    class OUNoise << (H, #FFB6C1) Helper >>\n    class ReplayBuffer << (H, #FFB6C1) Helper >>\n}\n\npackage \"Execution Layer\" {\n    class AlpacaPaperTradingMultiCrypto << (T, #FFA07A) Trading Interface >> {\n        - alpaca: tradeapi.REST\n        - act: Agent.act\n        --\n        + run()\n        + trade()\n        + get_state()\n    }\n}\n\n' Relationships\n\n' Data Flow\nDataProcessor .up.> DataSource : uses\nDataProcessor .right.> _Base : delegates\nYahoofinance .up.|> _Base\nBinance .up.|> _Base\n\nDataProcessor --> CryptoEnv : feeds (price, tech, turbulence arrays)\nDataProcessor --> MarketEnvironment : feeds (implicitly via parameters)\n\n' Environment to Agent\nCryptoEnv .left.> DDPG_Agent : state/reward/action space\n\n' Agent Internals\nDDPG_Agent *-- Actor : trains/uses\nDDPG_Agent *-- Critic : trains/uses\nDDPG_Agent *-- OUNoise\nDDPG_Agent *-- ReplayBuffer\n\n' Execution Flow\nAlpacaPaperTradingMultiCrypto .up.> CryptoEnv : conceptual interface\nAlpacaPaperTradingMultiCrypto .up.> DDPG_Agent : loads/uses policy (act)\nAlpacaPaperTradingMultiCrypto .up.> Binance : data fetching (via Ccxt)\nAlpacaPaperTradingMultiCrypto .up.> Alpaca : trade execution\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. 
Design Patterns\n\n## Design Patterns Used in the Codebase\n\nThe FinRL-Meta project effectively utilizes several software design patterns to achieve modularity, flexibility, and maintainability, particularly in handling diverse data sources and complex reinforcement learning components.\n\n### 1. Facade Pattern (DataProcessor)\nThe `DataProcessor` class (`meta/data_processor.py`) serves as a **Facade** to the entire data processing subsystem. It provides a simple, unified interface (`run()`, `download_data()`, `clean_data()`) for the complex operations of fetching, cleaning, and transforming data from multiple sources.\n\n*   **Implementation**: The `DataProcessor.__init__` method takes a `DataSource` enum and dynamically instantiates the appropriate concrete processor (e.g., `Yahoofinance`, `Binance`). All subsequent method calls on `DataProcessor` are delegated to the internal concrete processor instance.\n*   **Code Example (meta/data_processor.py)**:\n    ```python\n    class DataProcessor:\n        def __init__(self, data_source: DataSource, ...):\n            # ... dynamic instantiation logic ...\n            self.processor = processor_dict.get(self.data_source)(...)\n\n        def download_data(self, ticker_list):\n            self.processor.download_data(ticker_list=ticker_list)\n            self.dataframe = self.processor.dataframe\n    ```\n\n### 2. Strategy Pattern (Data Processors)\nThe various data source classes (e.g., `Yahoofinance`, `Alpaca`, `Binance`) implement the **Strategy Pattern**. They all inherit from the abstract base class `_Base` (`meta/data_processors/_base.py`), which defines the common interface (the \"Strategy\"). 
Each concrete class provides its own specific implementation (the \"Concrete Strategy\") for methods like `download_data()` and `clean_data()`, tailored to the requirements of its respective API.\n\n*   **Implementation**: The `_Base` class defines the contract, and classes like `Yahoofinance` and `Binance` provide the specific logic for their data fetching and cleaning. The `DataProcessor` (the \"Context\") selects and uses the appropriate strategy object.\n*   **Code Example (meta/data_processors/_base.py)**:\n    ```python\n    class _Base:\n        def download_data(self, ticker_list: List[str]):\n            pass # Defined in concrete classes\n    ```\n\n### 3. Actor-Critic Pattern (DDPG Agent)\nThe Deep Deterministic Policy Gradient (DDPG) agent implementation (`meta/env_execution_optimizing/liquidation/ddpg_agent.py`) is a prime example of the **Actor-Critic** architecture, a fundamental pattern in Reinforcement Learning.\n\n*   **Implementation**: The agent consists of two main neural networks:\n    *   **Actor (`Actor` class)**: The policy network that takes the state as input and outputs the action (the policy).\n    *   **Critic (`Critic` class)**: The value network that takes the state and action as input and outputs the Q-value (the value function).\n*   **Code Example (meta/env_execution_optimizing/liquidation/ddpg_agent.py)**:\n    ```python\n    # Actor Network (w/ Target Network)\n    self.actor_local = Actor(state_size, action_size, random_seed).to(device)\n    # Critic Network (w/ Target Network)\n    self.critic_local = Critic(state_size, action_size, random_seed).to(device)\n\n    # In learn method:\n    # Q_targets = r + γ * critic_target(next_state, actor_target(next_state))\n    ```\n\n### 4. Template Method Pattern (RL Environment)\nThe base environment structure, particularly in `CryptoEnv` and `BitcoinEnv`, follows the **Template Method Pattern**. 
The base class defines the skeleton of the algorithm (`reset`, `step`) but defers the implementation of specific steps (like state normalization or action scaling) to helper methods or configuration parameters.\n\n*   **Implementation**: The `step()` method in `CryptoEnv` is the template, which calls the concrete implementation of action normalization via `_generate_action_normalizer()` and applies the core trading logic. The overall structure is inherited from the OpenAI Gym interface, which itself is a form of the Template Method.\n*   **Code Example (meta/env_crypto_trading/env_multiple_crypto.py)**:\n    ```python\n    class CryptoEnv:\n        # ...\n        def step(self, actions) -> (np.ndarray, float, bool, None):\n            # Template step 1: Normalize action (deferred to helper)\n            for i in range(self.action_dim):\n                norm_vector_i = self.action_norm_vector[i]\n                actions[i] = actions[i] * norm_vector_i\n\n            # Template step 2: Execute trades (core logic)\n            # ... sell logic ...\n            # ... buy logic ...\n\n            # Template step 3: Update state and calculate reward (core logic)\n            # ...\n    ```\n\n#### 3.3.2. Project Highlights\n\n## Project Highlights: Innovative Features, Extensibility, and Flexibility Design\n\nThe FinRL-Meta project exhibits several innovative features and strong design choices that contribute to its extensibility and flexibility, making it a robust platform for financial reinforcement learning research and application.\n\n*   **Unified Data Pipeline Abstraction**:\n    *   **Innovation**: The use of the `DataProcessor` Facade over a set of concrete data source strategies (`Yahoofinance`, `Binance`, etc.) is a major highlight. 
This design abstracts away the heterogeneity of financial data APIs, which often have different data formats, time zone conventions, and rate limits.\n    *   **Flexibility**: Researchers can easily add support for a new data source by simply creating a new class that inherits from `_Base` and implementing the required methods. The core RL environment remains completely unaware of the data source's origin, only consuming the standardized NumPy arrays.\n\n*   **Decoupled RL Environment and Agent**:\n    *   **Extensibility**: The core RL environments (`CryptoEnv`, `MarketEnvironment`) are designed to be agnostic to the specific DRL algorithm used. They adhere to the standard OpenAI Gym interface (`reset`, `step`), which is the universal contract for RL. This allows the project to seamlessly integrate agents from different DRL libraries (e.g., ElegantRL, Stable-Baselines3, RLlib), as seen in the `AlpacaPaperTradingMultiCrypto` class, which dynamically loads the policy.\n    *   **Innovation**: The environment state space is carefully engineered to be a fixed-size, normalized vector, making it directly compatible with standard deep learning models (e.g., fully connected layers in the Actor/Critic networks). The normalization factors (e.g., `cash * 2**-18`) are hardcoded to scale the state variables into a manageable range for neural network training.\n\n*   **Real-Time Trading Integration**:\n    *   **Innovation**: The inclusion of the `AlpacaPaperTradingMultiCrypto` module demonstrates a clear path from research to real-world application. This module encapsulates the complexity of live trading, including API communication, order submission, and real-time state construction. 
It bridges the gap between a simulated environment and a live paper trading account.\n    *   **Flexibility**: By separating the trading logic from the core RL environment, the project allows for different execution strategies (e.g., market orders, limit orders, different brokers) to be implemented without modifying the core training environment.\n\n*   **Domain-Specific Environment Modeling**:\n    *   **Innovation**: The `MarketEnvironment` for execution optimization, based on the Almgren-Chriss model, is a sophisticated, domain-specific environment. It models complex financial phenomena like **permanent and temporary market impact** and uses a reward function based on the change in the Almgren-Chriss utility function. This highlights the project's focus on advanced financial modeling beyond simple portfolio management.\n    *   **Extensibility**: The environment is parameterized with financial constants (`ANNUAL_VOLAT`, `BID_ASK_SP`, `LLAMBDA1`), allowing researchers to easily modify the market dynamics to test the robustness of their agents under different simulated conditions.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\n## Improvement Suggestions: Performance, Architecture, and Code Quality\n\nBased on the comprehensive analysis of the FinRL-Meta codebase, the following suggestions are proposed to enhance performance, optimize the architecture, and improve overall code quality.\n\n### 1. Performance Bottlenecks and Optimization\n\n| Area | Bottleneck/Issue | Suggestion for Improvement |\n| :--- | :--- | :--- |\n| **Data Processing (Pandas)** | Excessive use of `pd.concat()` and `df.append()` in data processors (e.g., `Yahoofinance.download_data`). These operations create new DataFrames in memory, leading to significant performance degradation and memory overhead, especially with large datasets. 
| **Pre-allocate Lists and Concatenate Once**: Instead of appending to a DataFrame in a loop, collect the individual DataFrames into a Python list and perform a single `pd.concat(list_of_dfs)` operation outside the loop. |\n| **State Normalization** | Hardcoded magic numbers for state normalization (e.g., `cash * 2**-18`, `stocks * 2**-3`) are used across multiple environment files (`CryptoEnv`, `BitcoinEnv`). This makes tuning and debugging difficult. | **Centralize Normalization Constants**: Define all normalization constants in a single configuration file (e.g., `meta/config.py`) and load them dynamically. This improves maintainability and allows for easier hyperparameter tuning of the state space. |\n| **Live Trading Latency** | The `AlpacaPaperTradingMultiCrypto.trade()` method uses `threading.Thread` with `join()` for `submitOrder`. This effectively makes the order submission synchronous and blocks the main trading loop, increasing latency. | **Asynchronous Order Submission**: Implement true asynchronous order submission using `asyncio` and non-blocking API calls (if supported by the Alpaca SDK) or a dedicated, non-blocking worker queue/process for trade execution. |\n\n### 2. Architecture Optimization\n\n*   **Formalize the Environment Base Class**: Currently, the RL environments (`CryptoEnv`, `BitcoinEnv`) do not explicitly inherit from a common abstract base class, other than the implicit contract of the OpenAI Gym interface.\n    *   **Suggestion**: Introduce a formal `BaseEnv` class in `meta/envs/_base.py` that inherits from `gym.Env` (or a modern equivalent) and defines abstract methods for `_calculate_reward()`, `_update_portfolio()`, and `_get_state()`. This would enforce a stricter contract and improve the clarity of the environment's responsibilities.\n*   **Decouple DRL Library Loading**: The `AlpacaPaperTradingMultiCrypto` class contains hardcoded imports and logic for `elegantrl` (lines 7164-7175). 
This tightly couples the execution layer to a specific DRL framework.\n    *   **Suggestion**: Use a **Factory Pattern** to load the agent. The execution class should only accept a path to a saved model and a configuration, and a separate utility function should handle the framework-specific loading and policy instantiation.\n\n### 3. Code Quality and Maintainability\n\n*   **Consistent Type Hinting**: While some files use type hints, consistency is lacking across the entire codebase.\n    *   **Suggestion**: Adopt comprehensive Python type hinting for all function signatures and class attributes. This significantly improves code readability, enables static analysis tools, and reduces runtime errors.\n*   **Magic Number Elimination**: The `MarketEnvironment` in the execution optimization module is heavily parameterized with financial constants (e.g., `LLAMBDA1 = 1e-6`, `NUM_N = 60`).\n    *   **Suggestion**: Move all these constants to a dedicated configuration file or a class-level attribute with clear documentation, making the environment's parameters transparent and easily adjustable.\n*   **Refactor `AlpacaPaperTradingMultiCrypto` State Logic**: The state construction logic in `get_state()` is complex, involving multiple array stacking and normalization steps.\n    *   **Suggestion**: Encapsulate the state construction into a dedicated `StateBuilder` class or a static method. This would isolate the complex logic and make the state representation easier to verify and modify.\n\n#### 3.4.2. Secondary Development Guide\n\n## Secondary Development Guide: Best Practices for Code Exploration and Extension\n\nThis guide provides a structured approach for developers looking to explore, modify, or extend the FinRL-Meta codebase.\n\n### 1. Code Exploration Path\n\nStart your exploration by focusing on the three core layers of the architecture:\n\n1.  
**Configuration and Entry Point (`meta/config.py` and `meta/data_processor.py`)**:\n    *   Examine `meta/config.py` to understand the global constants, default ticker lists, and time zone settings.\n    *   Review `meta/data_processor.py` to grasp how data sources are selected and the standardized data arrays (`price_array`, `tech_array`, `turbulence_array`) are generated. This is the **input** to the entire RL system.\n\n2.  **Environment Layer (`meta/env_crypto_trading/`)**:\n    *   Focus on `meta/env_crypto_trading/env_multiple_crypto.py` (`CryptoEnv`). This is the heart of the simulation.\n    *   Analyze the `__init__`, `reset()`, and `step(actions)` methods to understand the state space definition, reward function, and transaction logic (cost calculation, portfolio update).\n\n3.  **Agent/Execution Layer (`meta/env_execution_optimizing/` and `meta/env_crypto_trading/`)**:\n    *   For DRL implementation details, study `meta/env_execution_optimizing/liquidation/ddpg_agent.py` and `model.py` to see the Actor-Critic network structure and training loop.\n    *   For real-world application, examine `meta/env_crypto_trading/alpaca_paper_trade_multicrypto.py` to understand how a trained policy is deployed for live trading.\n\n### 2. Best Practices for Extension\n\n*   **Adding a New Data Source**:\n    1.  Create a new file in `meta/data_processors/` (e.g., `new_source.py`).\n    2.  Define a class that inherits from `meta/data_processors/_base._Base`.\n    3.  Implement the required methods, especially `download_data()` and `clean_data()`, ensuring the final `self.dataframe` adheres to the expected format (columns: `time`, `open`, `high`, `low`, `close`, `volume`, `tic`).\n    4.  Update the `DataSource` enum and the `processor_dict` mapping in `meta/data_processor.py` to include your new class.\n\n*   **Creating a New Trading Environment**:\n    1.  Create a new file in `meta/envs/` (e.g., `env_forex_trading.py`).\n    2.  
Define a new environment class (e.g., `ForexEnv`) that mimics the structure of `CryptoEnv`, implementing `reset()` and `step()`.\n    3.  Crucially, redefine the **state space** (`self.state_dim`) and **action space** (`self.action_dim`) to match the requirements of the new domain (e.g., different asset types, different technical indicators).\n    4.  Adjust the reward function and transaction cost logic to reflect the new market's characteristics.\n\n*   **Integrating a New DRL Algorithm**:\n    1.  Ensure your new algorithm's policy can be loaded and called with a NumPy state array to return a NumPy action array.\n    2.  If integrating into the live trading module, modify the agent loading section in `AlpacaPaperTradingMultiCrypto.__init__` to correctly load your new model and expose the `self.act` function.\n    3.  If the new algorithm requires a different environment interface (e.g., discrete action space), you will need to create a new environment wrapper that translates the continuous actions of the existing environments into the required format.\n\n"
  },
  {
    "path": "thirdparty/FinRL.md",
    "content": "# FinRL - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe FinRL project structure is organized into a core Python package (`finrl`) and several supporting directories, following a clear separation of concerns for a machine learning framework.\n\n```\nFinRL/\n├── .git/                     # Git version control metadata (Excluded from analysis)\n├── .github/                  # GitHub configuration (e.g., issue templates, workflows) (Excluded)\n├── docker/                   # Docker setup for containerized environments (Excluded)\n├── docs/                     # Documentation source files (Excluded)\n├── examples/                 # Jupyter notebooks and scripts demonstrating usage (Excluded)\n├── figs/                     # Project figures and logos (Excluded)\n├── finrl/                    # **CORE SOURCE CODE PACKAGE** - The heart of the framework\n│   ├── agents/               # **DRL Agents and Wrappers**: Integrates and adapts various DRL libraries (Stable-Baselines3, ElegantRL, RLlib) to the FinRL environment interface.\n│   │   ├── elegantrl/        # Integration with ElegantRL DRL library\n│   │   ├── portfolio_optimization/ # Specific agents for portfolio optimization tasks\n│   │   ├── rllib/            # Integration with RLlib DRL library\n│   │   └── stablebaselines3/ # Integration with Stable-Baselines3 DRL library\n│   ├── applications/         # **Financial Application Templates**: Provides end-to-end examples and specific configurations for different financial tasks.\n│   │   ├── cryptocurrency_trading/\n│   │   ├── high_frequency_trading/\n│   │   ├── portfolio_allocation/\n│   │   └── stock_trading/    # Example implementations for stock trading, including ensemble methods\n│   ├── meta/                 # **Meta/Environment Components**: The infrastructure layer for data and environment modeling.\n│   │   ├── data_processors/  # Data acquisition and feature 
engineering from various sources (Yahoo, Alpaca, etc.)\n│   │   ├── env_*/            # Custom OpenAI Gym environments for different financial tasks (stock trading, crypto, portfolio)\n│   │   ├── paper_trading/    # Real-time/paper trading integration (e.g., Alpaca)\n│   │   └── preprocessor/     # Legacy/alternative data downloaders\n│   ├── config.py             # Global configuration constants (dates, indicators, model params)\n│   ├── main.py               # Main entry point for CLI (train, test, trade modes)\n│   ├── train.py              # Core DRL training workflow logic\n│   ├── trade.py              # Core trading workflow logic (backtesting/paper trading)\n│   └── plot.py               # Utility for plotting results and performance metrics\n├── unit_tests/               # Unit tests (Excluded)\n└── ...                       # Other configuration files (README, LICENSE, setup.py, etc.) (Excluded)\n```\n\nThe structure is highly modular, with the `finrl` package acting as the primary container. The **`meta`** module handles the crucial task of transforming raw financial data into a standardized Reinforcement Learning problem (State, Action, Reward), while the **`agents`** module abstracts the complexity of different DRL algorithms. The top-level files (`main.py`, `train.py`, `trade.py`) serve as the **orchestration layer**, tying these components together to execute the full DRL pipeline. This design ensures that the core logic is separated from configuration, data handling, and algorithm implementation.\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinRL/finrl`: The root of the core package, containing entry points and global configurations.\n*   `/home/ubuntu/FinRL/finrl/agents`: The module responsible for integrating and wrapping various DRL libraries (Stable-Baselines3, ElegantRL, RLlib) into a unified `DRLAgent` interface.\n*   `/home/ubuntu/FinRL/finrl/meta`: The meta-module that provides the necessary infrastructure for DRL in finance, including data processing, custom Gym environments, and paper trading interfaces.\n*   `/home/ubuntu/FinRL/finrl/applications`: Contains application-specific, end-to-end examples and templates for different financial tasks.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## 1. Core/Entry Module (`finrl`)\n\n**Module Core Responsibility**: This module serves as the **entry point** and **orchestrator** for the entire FinRL workflow. It defines global configurations and implements the high-level logic for the three main modes of operation: `train`, `test`, and `trade`.\n\n**Key File Identification**:\n*   `config.py`: Defines all global constants, including data directories (`DATA_SAVE_DIR`), date ranges (`TRAIN_START_DATE`), technical indicators (`INDICATORS` - e.g., `macd`, `rsi_30`), and default DRL model hyperparameters (`A2C_PARAMS`, `PPO_PARAMS`, etc.). This file centralizes all experiment parameters.\n*   `main.py`: The command-line interface entry point. It parses the `--mode` argument (`train`, `test`, `trade`) and calls the corresponding function from `finrl.train`, `finrl.test`, or `finrl.trade`. It ensures necessary directories are created for saving data and models.\n*   `train.py`: Implements the DRL training pipeline. It orchestrates the data flow: `DataProcessor` -> `download_data` -> `clean_data` -> `add_technical_indicator` -> `df_to_array` -> `StockTradingEnv` configuration -> `DRLAgent` initialization and `train_model`. 
It supports conditional loading of agents from `elegantrl`, `rllib`, or `stable_baselines3`.\n*   `trade.py`: Implements the trading pipeline, supporting two sub-modes: `backtesting` (which delegates to `finrl.test`) and `paper_trading` (which uses the `AlpacaPaperTrading` class from the `meta` module).\n\n## 2. Meta/Environment Module (`finrl/meta`)\n\n**Module Core Responsibility**: This is the **infrastructure layer** that adapts financial data and tasks into the standard Reinforcement Learning paradigm (Gym environments). It handles data acquisition, feature engineering, and the definition of the trading environment's state, action, and reward space.\n\n**Key File Identification**:\n*   `data_processor.py`: The main facade class, `DataProcessor`. It acts as a factory/wrapper for various data source-specific processors (e.g., `YahooFinanceProcessor`, `AlpacaProcessor`). It provides a unified interface for data downloading, cleaning, adding technical indicators, and converting the final DataFrame into the NumPy arrays (`price_array`, `tech_array`, `turbulence_array`) required by the Gym environments.\n*   `data_processors/processor_yahoofinance.py`: A concrete implementation of a data processor. 
It uses the `yfinance` library (and potentially Selenium for scraping) to fetch data and includes methods for data cleaning and feature engineering (e.g., adding the VIX index).\n*   `env_stock_trading/env_stocktrading.py`: The core custom Gym environment, `StockTradingEnv`.\n    *   **State Space**: A 1D NumPy array representing `[cash, stock_price_1, ..., stock_price_N, stock_shares_1, ..., stock_shares_N, tech_indicator_1, ..., tech_indicator_M, turbulence]`.\n    *   **Action Space**: A continuous `Box` space, where each element corresponds to the percentage of total assets to allocate to a stock (ranging from -1 to 1, representing sell/buy).\n    *   **Reward Function**: The reward is the change in the total portfolio value (cash + stock holdings) between the current step and the previous step, scaled by `reward_scaling`.\n    *   **Turbulence**: The environment incorporates a **turbulence index** (`risk_indicator_col`) to model market volatility. If turbulence exceeds a threshold, the agent is forced to liquidate all positions, a critical risk management mechanism.\n\n## 3. Agent Module (`finrl/agents`)\n\n**Module Core Responsibility**: This module provides the necessary **wrappers and interfaces** to seamlessly integrate popular DRL libraries (Stable-Baselines3, ElegantRL, RLlib) with the custom FinRL Gym environments. This abstracts the DRL implementation details from the main workflow.\n\n**Key File Identification**:\n*   `stablebaselines3/models.py`: Defines the `DRLAgent` class, which wraps SB3 models (A2C, PPO, SAC, TD3, DDPG). It uses the **Adapter Pattern** to make SB3 algorithms conform to the FinRL training and prediction interface. The `DRL_prediction` method handles the testing/backtesting loop using the trained model on a vectorized environment (`DummyVecEnv`).\n*   `elegantrl/models.py`: Defines the `DRLAgent` class for ElegantRL integration. 
This wrapper is more tightly coupled with the environment's internal arrays (`price_array`, `tech_array`) as ElegantRL uses a custom `Config` object for environment and agent setup.\n*   `portfolio_optimization/algorithms.py`: Contains specific algorithms for portfolio optimization, demonstrating the framework's flexibility beyond standard stock trading.\n\n## 4. Application Module (`finrl/applications`)\n\n**Module Core Responsibility**: This module provides **ready-to-use, end-to-end examples** for various financial tasks. These files serve as templates and demonstrations, showing how to combine the `meta` (data/env) and `agents` (DRL models) modules to solve a specific problem.\n\n**Key File Identification**:\n*   `stock_trading/ensemble_stock_trading.py`: A key example demonstrating the use of an **ensemble strategy** where multiple DRL agents (e.g., PPO, A2C, DDPG) are trained and their performance is validated to select the best one for trading. This highlights a key feature of the FinRL framework.\n*   Other files (e.g., `cryptocurrency_trading`, `portfolio_allocation`) provide specialized configurations and environment settings for those specific domains, showcasing the framework's adaptability.\n\n### Module PlantUML Diagrams\n\n@startuml Module_Meta\ntitle FinRL Meta Module (Data and Environment)\n\npackage \"finrl.meta\" {\n    class DataProcessor {\n        - processor: AbstractProcessor\n        + __init__(data_source, ...)\n        + download_data(...)\n        + clean_data(...)\n        + add_technical_indicator(...)\n        + add_turbulence(...)\n        + add_vix(...)\n        + df_to_array(...) 
: price_array, tech_array, turbulence_array\n    }\n\n    package \"data_processors\" {\n        interface AbstractProcessor {\n            + download_data()\n            + clean_data()\n            + add_technical_indicator()\n            + add_turbulence()\n            + add_vix()\n            + df_to_array()\n        }\n        class YahooFinanceProcessor\n        class AlpacaProcessor\n        class WrdsProcessor\n    }\n\n    package \"env_stock_trading\" {\n        class StockTradingEnv {\n            - df: DataFrame\n            - state: np.array\n            - day: int\n            - initial_amount: int\n            - asset_memory: list\n            + __init__(...)\n            + step(actions) : state, reward, done, info\n            + reset() : state\n            + _sell_stock(index, action)\n            + _buy_stock(index, action)\n            + get_sb_env() : DummyVecEnv\n        }\n        StockTradingEnv -up-|> gym.Env\n    }\n\n    package \"paper_trading\" {\n        class AlpacaPaperTrading {\n            - api_key\n            - api_secret\n            - model\n            + run()\n        }\n    }\n}\n\nDataProcessor o-- AbstractProcessor : uses\nYahooFinanceProcessor -up-|> AbstractProcessor\nAlpacaProcessor -up-|> AbstractProcessor\nWrdsProcessor -up-|> AbstractProcessor\n\nStockTradingEnv ..> DataProcessor : receives arrays from df_to_array()\nAlpacaPaperTrading ..> StockTradingEnv : uses for state/action logic\n\n@enduml\n\n@startuml Module_Agents\ntitle FinRL Agents Module (DRL Wrappers)\n\npackage \"finrl.agents\" {\n    interface DRLAgentInterface {\n        + get_model(model_name, ...)\n        + train_model(model, ...)\n        + DRL_prediction(model, environment)\n    }\n\n    package \"stablebaselines3\" {\n        class DRLAgent_SB3 {\n            - env: StockTradingEnv\n            + get_model(model_name, ...)\n            + train_model(model, ...)\n            + DRL_prediction(model, environment)\n        }\n        class 
TensorboardCallback\n    }\n\n    package \"elegantrl\" {\n        class DRLAgent_ElegantRL {\n            - env_config\n            + get_model(model_name, model_kwargs)\n            + train_model(model, cwd, total_timesteps)\n        }\n    }\n\n    package \"rllib\" {\n        class DRLAgent_RLlib {\n            + get_model(model_name)\n            + train_model(model, ...)\n        }\n    }\n}\n\nDRLAgent_SB3 -up-|> DRLAgentInterface\nDRLAgent_ElegantRL -up-|> DRLAgentInterface\nDRLAgent_RLlib -up-|> DRLAgentInterface\n\nDRLAgent_SB3 ..> TensorboardCallback : uses\nDRLAgent_SB3 ..> StockTradingEnv : wraps/uses\nDRLAgent_ElegantRL ..> StockTradingEnv : wraps/uses\n\n@enduml\n\n@startuml Module_Core\ntitle FinRL Core Module (Orchestration)\n\npackage \"finrl\" {\n    class Config {\n        + TRAIN_START_DATE\n        + INDICATORS\n        + PPO_PARAMS\n        + ...\n    }\n\n    class Main {\n        + main()\n        + build_parser()\n    }\n\n    class Train {\n        + train(...)\n    }\n\n    class Trade {\n        + trade(...)\n    }\n}\n\nMain ..> Config : reads constants\nMain ..> Train : calls train()\nMain ..> Trade : calls trade()\n\nTrain ..> DataProcessor : uses for data prep\nTrain ..> StockTradingEnv : instantiates environment\nTrain ..> DRLAgentInterface : uses for model training\n\nTrade ..> StockTradingEnv : instantiates environment\nTrade ..> DRLAgentInterface : uses for prediction (backtesting)\nTrade ..> AlpacaPaperTrading : uses for paper trading\n\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe FinRL framework is fundamentally built on the **Reinforcement Learning (RL) Paradigm** applied to quantitative finance, adhering closely to the **OpenAI Gym interface** for environment standardization.\n\n**Core Abstractions**:\n1.  **Data Processor**: This serves as an abstraction layer over diverse financial data sources (Yahoo Finance, Alpaca, WRDS, etc.). 
It is responsible for standardizing raw data into a clean, feature-engineered format (DataFrame) suitable for the RL environment. This abstraction ensures the core DRL logic remains independent of the data source.\n2.  **Environment (`StockTradingEnv`)**: This is the central abstraction that models the financial market as a Markov Decision Process (MDP). It rigorously defines the three core components of the RL problem:\n    *   **State**: The observation space, which includes cash, stock prices, stock shares, technical indicators, and the market turbulence index.\n    *   **Action**: The action space, a continuous `Box` representing the normalized allocation of total assets to each stock (ranging from -1 for selling to 1 for buying).\n    *   **Reward**: The immediate reward, calculated as the change in the total portfolio value (cash + stock holdings) between time steps.\n3.  **DRL Agent Wrapper (`DRLAgent`)**: This is a critical abstraction over different DRL libraries (Stable-Baselines3, ElegantRL, RLlib). It allows users to swap out the underlying DRL algorithm with minimal code changes, promoting modularity, experimentation, and comparison of different algorithms on the same financial task.\n\n**Design Philosophy**:\n*   **Modularity and Extensibility**: The clear separation of concerns between Data (`DataProcessor`), Environment (`Env`), and Algorithm (`DRLAgent`) is the cornerstone of the design. This structure allows for easy extension: new data sources require only a new processor implementation, new financial tasks require a new Gym environment, and new DRL algorithms require a new `DRLAgent` wrapper.\n*   **Risk-Awareness**: The framework demonstrates a focus on real-world risk management by explicitly including a **turbulence index** in the state space. 
The environment's logic includes a mechanism for forced liquidation of all positions if market turbulence exceeds a predefined threshold, a crucial feature for financial stability.\n*   **Ensemble Learning Focus**: The design encourages the use of ensemble strategies, as evidenced by the application templates, to mitigate the high variance and improve the robustness of DRL models in volatile financial markets.\n\n**Lifecycle Management**:\nThe lifecycle is managed by the core orchestration scripts (`main.py`, `train.py`, `trade.py`). The process flows from configuration (`config.py`) -> data preparation (`DataProcessor`) -> environment setup (`StockTradingEnv`) -> model training (`DRLAgent`) -> model persistence (saving trained models) -> and finally, deployment for backtesting or paper trading. This sequential, modular lifecycle ensures reproducibility and clear debugging paths.\n\n#### 3.1.2. Component Interactions\n\nThe FinRL system follows a clear, sequential data flow, primarily orchestrated by the `train.py` and `trade.py` scripts, ensuring a structured pipeline from data to decision-making.\n\n**1. Data Acquisition and Preprocessing**:\nThe process begins in `train.py` which calls the `DataProcessor` (from `finrl/meta/data_processor.py`). The `DataProcessor` acts as a facade, instantiating a source-specific processor (e.g., `YahooFinanceProcessor` in `finrl/meta/data_processors/processor_yahoofinance.py`). This processor fetches raw financial data, cleans it, adds technical indicators, and incorporates market volatility measures like the VIX index. The final output is a set of three NumPy arrays: `price_array`, `tech_array`, and `turbulence_array`, which are passed back to the core workflow.\n\n**2. Environment Initialization**:\nThese NumPy arrays are used to configure and instantiate the custom Gym environment, typically `StockTradingEnv` (from `finrl/meta/env_stock_trading/env_stocktrading.py`). 
The environment uses these arrays to define its state space and to simulate the passage of time (days), making the financial market an accessible Markov Decision Process (MDP) for the DRL agent.\n\n**3. Training Loop (Agent-Environment Interaction)**:\nThe `train.py` script initializes the appropriate `DRLAgent` wrapper (e.g., `DRLAgent_SB3` from `finrl/agents/stablebaselines3/models.py`) and calls its `train_model()` method.\n*   **Interaction**: The DRL model interacts with the `StockTradingEnv` by calling `env.step(action)`. The DRL model outputs an `action` (a normalized portfolio allocation vector).\n*   **Execution**: The `StockTradingEnv.step()` method executes the simulated trade, updates the portfolio state (`self.state`), calculates the `reward` (change in portfolio value), and returns the new state, reward, and terminal status to the DRL algorithm.\n*   Training results are logged via the `TensorboardCallback` for monitoring.\n\n**4. Testing/Trading Loop**:\nThe `trade.py` or `test.py` scripts handle post-training execution.\n*   The trained model is loaded via `DRLAgent.DRL_prediction()`.\n*   The model predicts an action for each day in the test/trade period, and the environment is stepped through.\n*   For performance evaluation, the `asset_memory` and `actions_memory` are recorded.\n*   For **paper trading**, the `AlpacaPaperTrading` class in `finrl/meta/paper_trading/alpaca.py` continuously monitors the market and executes trades via the Alpaca API based on the DRL model's predictions, bridging the gap between simulation and real-world application.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml FinRL_Architecture\ntitle FinRL Overall Architecture\n\nskinparam component {\n  BackgroundColor<<Core>> LightBlue\n  BorderColor<<Core>> Blue\n  BackgroundColor<<Meta>> LightGreen\n  BorderColor<<Meta>> Green\n  BackgroundColor<<Agent>> LightYellow\n  BorderColor<<Agent>> Orange\n}\n\ncomponent [Config] <<Core>> as C\ncomponent [Main Entry Point] <<Core>> as M\ncomponent [Train Workflow] <<Core>> as T\ncomponent [Trade Workflow] <<Core>> as TR\n\npackage \"finrl.meta\" <<Meta>> {\n    component [DataProcessor] <<Meta>> as DP\n    component [Data Sources] <<Meta>> as DS\n    component [StockTradingEnv (Gym)] <<Meta>> as E\n    component [Paper Trading Interface] <<Meta>> as PT\n}\n\npackage \"finrl.agents\" <<Agent>> {\n    component [DRLAgent (SB3, ERL, RLlib)] <<Agent>> as A\n    component [DRL Libraries] <<Agent>> as DRL\n}\n\nM --> C : Reads global parameters\nM --> T : Calls train()\nM --> TR : Calls trade()\n\nT --> DP : 1. Initializes\nDP --> DS : 2. Downloads & Preprocesses Data\nDP --> T : 3. Returns price/tech/turbulence arrays\n\nT --> E : 4. Instantiates Environment (with arrays)\nT --> A : 5. Initializes DRL Agent (with Env)\nA --> DRL : 6. Trains Model\n\nTR --> E : Instantiates Environment\nTR --> A : Loads Trained Model\nTR --> PT : Executes Paper Trading (if mode=paper_trading)\n\nE .right.> A : State/Action/Reward Loop (step())\nA .left.> E : State/Action/Reward Loop (predict())\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe FinRL codebase effectively utilizes several software design patterns to achieve its goals of modularity, extensibility, and separation of concerns.\n\n1.  **Adapter Pattern**\n    *   **Description**: This pattern allows the interface of an existing class to be used as another interface. 
In FinRL, it is used to unify the interfaces of disparate DRL libraries.\n    *   **Implementation**: The `DRLAgent` classes in `finrl/agents/stablebaselines3/models.py`, `finrl/agents/elegantrl/models.py`, and `finrl/agents/rllib/models.py` all conform to a common interface (`get_model`, `train_model`, `DRL_prediction`). Each class adapts the specific API calls of its underlying DRL library (SB3, ElegantRL, or RLlib) to this single, unified interface, allowing the core `train.py` script to treat them interchangeably.\n\n2.  **Factory Method Pattern (Implicit)**\n    *   **Description**: This pattern provides an interface for creating objects in a superclass, but allows subclasses to alter the type of objects that will be created.\n    *   **Implementation**: The `DataProcessor` class in `finrl/meta/data_processor.py` acts as a simple factory. Based on the `data_source` string passed to its constructor (e.g., `\"alpaca\"`, `\"yahoofinance\"`), it dynamically instantiates the correct concrete data processor object (e.g., `AlpacaProcessor`, `YahooFinanceProcessor`).\n    *   **Code Example (from `data_processor.py`)**:\n        ```python\n        class DataProcessor:\n            def __init__(self, data_source, ...):\n                if data_source == \"alpaca\":\n                    self.processor = Alpaca(...)\n                elif data_source == \"yahoofinance\":\n                    self.processor = YahooFinance()\n                # ... other data sources\n        ```\n\n3.  **Strategy Pattern**\n    *   **Description**: This pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable. Strategy lets the algorithm vary independently from the clients that use it.\n    *   **Implementation**: The overall training workflow in `train.py` allows the user to select a \"strategy\" (the DRL algorithm, e.g., PPO, SAC, DDPG) and the DRL library (e.g., `stable_baselines3`, `elegantrl`) at runtime. 
The `train` function then dynamically loads and uses the corresponding `DRLAgent` and DRL model based on these parameters, enabling easy comparison of different trading strategies.\n\n#### 3.3.2. Project Highlights\n\nThe FinRL framework includes several innovative features and design choices that enhance its utility and flexibility for financial reinforcement learning:\n\n*   **Unified DRL Framework**: FinRL provides a single, consistent API that abstracts away the differences between multiple state-of-the-art DRL libraries, including Stable-Baselines3, ElegantRL, and RLlib. This allows researchers and practitioners to easily switch between and compare algorithms (e.g., PPO, SAC, DDPG) without modifying the core data or environment logic.\n*   **Financial Market Modeling with Risk Awareness**: The custom Gym environments, such as `StockTradingEnv`, are specifically tailored for finance. They incorporate essential real-world elements like **transaction costs** (`buy_cost_pct`, `sell_cost_pct`) and, critically, a **turbulence index**. This index is used to model market volatility, and the environment enforces a **risk-management mechanism** (forced liquidation) when turbulence exceeds a threshold, making the simulation more realistic and risk-aware.\n*   **Data Source Agnosticism**: Through the `DataProcessor` abstraction, the framework achieves a high degree of data source agnosticism. The same DRL pipeline can be run on data from various providers (Yahoo Finance, Alpaca, WRDS, etc.) by simply changing a configuration parameter, significantly reducing the effort required for data integration.\n*   **Real-World Readiness and Paper Trading**: The inclusion of a dedicated `trade.py` module with the `AlpacaPaperTrading` class provides a direct and seamless path from backtesting to live paper trading. 
This feature is a major highlight, enabling users to test their trained agents in a simulated live market environment before committing real capital.\n*   **Ensemble Learning Support**: The framework is explicitly designed to facilitate the training and validation of multiple agents, supporting robust **ensemble strategies** (as demonstrated in `ensemble_stock_trading.py`). This is a key feature for improving the stability and performance of DRL models in the highly stochastic financial domain.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe FinRL framework is robust, but several areas can be optimized to improve performance, maintainability, and flexibility:\n\n1.  **Environment Performance and Vectorization**:\n    *   **Issue**: The core `StockTradingEnv` in `env_stocktrading.py` is implemented using standard Python/Pandas/NumPy logic, which can be slow for high-frequency or large-scale backtesting due to Python's overhead in the simulation loop.\n    *   **Suggestion**: Implement a fully **vectorized environment** for training. This involves processing all time steps for all assets in parallel using NumPy or a library like JAX/PyTorch, drastically reducing the number of Python function calls and improving training speed. The current `DummyVecEnv` wrapper only vectorizes the environment interface, not the internal simulation logic.\n\n2.  **Data Acquisition Reliability and Brittle Code**:\n    *   **Issue**: The `YahooFinanceProcessor` shows a mix of `yfinance` library usage and brittle web scraping techniques (Selenium/BeautifulSoup) for data acquisition. Web scraping is highly susceptible to breaking when the target website's structure changes.\n    *   **Suggestion**: Standardize data acquisition to rely solely on stable, official APIs (like Alpaca, which is already integrated) or robust data providers. Remove the reliance on Selenium/scraping to ensure long-term stability and maintainability of the data pipeline.\n\n3.  
**Configuration Management Modernization**:\n    *   **Issue**: The use of global constants in `config.py` is simple but limits the flexibility required for complex, reproducible experiments. Modifying a global constant affects all parts of the code.\n    *   **Suggestion**: Adopt a modern configuration management library like **Hydra** or use **Pydantic Settings**. This would allow for structured, hierarchical configuration files (YAML/JSON), easy command-line overrides, and better separation of configuration from the core codebase, making experiment tracking and parameter tuning more robust.\n\n4.  **Code Quality and Documentation**:\n    *   **Issue**: While type hints are present, the documentation, particularly docstrings for the core `DRLAgent` methods and environment parameters, could be more comprehensive.\n    *   **Suggestion**: Enforce a documentation standard (e.g., NumPy or Google style docstrings) for all public methods and classes. This will significantly improve code clarity and reduce the learning curve for secondary developers.\n\n#### 3.4.2. Secondary Development Guide\n\nThe FinRL framework is designed for extensibility, making secondary development straightforward by focusing on the three core modular components: Data, Environment, and Agent.\n\n1.  **Start with `config.py`**:\n    *   The first step for any new experiment is to define the scope by modifying the global constants in `finrl/config.py`. This includes setting the `TRAIN_START_DATE`, `TRAIN_END_DATE`, the list of `INDICATORS`, and the hyperparameters for the DRL models (e.g., `PPO_PARAMS`).\n\n2.  **Define the Task (Environment)**:\n    *   For standard tasks (stock trading, crypto), use the existing environments in `finrl/meta/env_stock_trading`.\n    *   To create a new financial task (e.g., options trading, futures), create a new custom Gym environment class that inherits from `gym.Env` and defines the unique state, action, and reward mechanisms specific to that task. 
Ensure the `step()` method correctly calculates the reward and updates the state based on the action.\n\n3.  **Prepare Data (DataProcessor)**:\n    *   If your data source is supported (Yahoo, Alpaca, etc.), use the existing `DataProcessor` facade.\n    *   To integrate a new data source, create a new `processor_yourname.py` file in `finrl/meta/data_processors`. This new class must implement the required methods: `download_data`, `clean_data`, `add_technical_indicator`, and crucially, `df_to_array` to convert the data into the NumPy arrays expected by the environment.\n\n4.  **Select/Implement Agent**:\n    *   Choose a DRL library (Stable-Baselines3 is recommended for its comprehensive documentation). The `DRLAgent` wrappers handle the integration.\n    *   To add a new DRL algorithm not currently supported, extend the appropriate `DRLAgent` class in `finrl/agents` and implement the `get_model`, `train_model`, and `DRL_prediction` methods to wrap the new algorithm's API.\n\n5.  **Execute via `main.py`**:\n    *   Use the command-line interface (`python main.py --mode=train`) to execute the workflow. The orchestration logic in `main.py`, `train.py`, and `trade.py` will handle the rest, ensuring the data, environment, and agent are correctly linked.\n\nThis modular approach ensures that developers can focus on one component at a time without needing to rewrite the entire pipeline.\n\n"
  },
  {
    "path": "thirdparty/FinRobot.md",
    "content": "# FinRobot - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\n\n```\n\n### 1.2. Core Folders for Analysis\n\n\n\n## Phase 2: Module-by-Module Deep Analysis\n\n\n\n### Module PlantUML Diagrams\n\n\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\n\n\n#### 3.1.2. Component Interactions\n\n\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\n\n\n#### 3.3.2. Project Highlights\n\n\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\n\n\n#### 3.4.2. Secondary Development Guide\n\n\n\n"
  },
  {
    "path": "thirdparty/FinceptTerminal.md",
    "content": "# FinceptTerminal - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\n/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal (Root of the project)\n|-- .github (Configuration for GitHub workflows and templates)\n|-- docs (Project documentation, likely Docusaurus or similar)\n|-- fincept-terminal-desktop (The main application source code)\n|   |-- public (Static assets for the frontend)\n|   |-- src-tauri (Rust backend code for Tauri)\n|   |   |-- src (Core Rust source files)\n|   |   |   |-- commands (Tauri commands for data fetching and utilities, over 30 data sources)\n|   |   |   |-- data_sources (Rust-side data source implementations)\n|   |   |   |-- utils (Utility functions, notably the Python execution bridge)\n|   |   |   |-- lib.rs (Main Rust library, process management, IPC setup)\n|   |   |   |-- main.rs (Tauri entry point)\n|   |-- src (TypeScript/React frontend code)\n|   |   |-- assets (Frontend static assets)\n|   |   |-- components (Reusable UI components)\n|   |   |   |-- tabs (Major feature views like data-mapping, trading, portfolio, node-editor)\n|   |   |   |-- ui (Design system components)\n|   |   |-- constants (Application-wide configuration values)\n|   |   |-- contexts (React Context providers for global state)\n|   |   |-- hooks (Custom React hooks for logic reuse)\n|   |   |-- lib (Frontend utility functions)\n|   |   |-- services (Core business logic and data orchestration)\n|   |   |   |-- backtesting (Logic for backtesting strategies)\n|   |   |   |-- websocket (Real-time data handling)\n|   |   |   |-- trading (Order management logic)\n|   |   |-- stockBrokers (Brokerage API integration adapters, e.g., ZerodhaKite)\n|   |   |-- types (TypeScript interfaces and type definitions)\n|   |   |-- App.tsx (Main React application component)\n|-- images (Marketing and documentation images)\n\nThe project structure clearly delineates the **Hybrid Architecture**. 
The `fincept-terminal-desktop` directory houses the core application, split into the `src-tauri` (Rust backend) and `src` (React/TypeScript frontend) folders. This separation of concerns is fundamental, with the Rust layer managing system-level tasks and the Python bridge, while the TypeScript layer handles the rich user interface and business logic via services. The extensive `commands` directory in the Rust backend highlights the project's focus on being a comprehensive financial data aggregator.\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src-tauri/src`: The core Rust backend, handling IPC, process management, and data source delegation.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/components`: The React frontend's presentation layer, including all UI elements and feature views.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/services`: The frontend's business logic layer, containing core features like workflow management, backtesting, and trading.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/stockBrokers`: Brokerage integration adapters, implementing the Adapter Pattern for trading.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/types`: Shared TypeScript interfaces and type definitions for application-wide data structures.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/constants`: Application-wide configuration values and magic strings.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/contexts`: React Context providers for global state management.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/FinceptTerminal/fincept-terminal-desktop/src/hooks`: Custom React hooks for logic reuse across components.\n\n## Phase 2: Module-by-Module 
Deep Analysis\n\n## Module 1: `src-tauri/src` (Rust Backend)\n\n**Core Responsibility:** The Rust backend, built with Tauri, serves as the **core application logic and data gateway**. Its primary function is to manage system-level interactions, handle inter-process communication (IPC) with the frontend, and act as a secure, performant bridge to various external data sources and computational backends (like Python). It is responsible for managing the lifecycle of external processes, such as the MCP (Model Context Protocol) server.\n\n**Key Files and Functions:**\n*   `lib.rs`: Defines the core state management (`MCPState` and `MCPProcess`) for external processes.\n*   `commands/mod.rs`: The central registry for all Tauri commands, revealing **extensive data source integration** (e.g., `yfinance`, `polygon`, `fred`, `worldbank`).\n*   `utils/python.rs`: A critical file that implements the logic to locate and execute the Python interpreter across different operating systems, confirming that the Rust backend delegates data fetching and heavy computation to Python scripts.\n\n**Core Implementation & Dependencies:** The module uses Rust and Tauri, relying on `std::sync::{Arc, Mutex}` for safe, concurrent management of external processes. 
Tauri's `#[tauri::command]` macro is used extensively to expose Rust functions to the TypeScript/React frontend.\n\n## Module 2: `src/components` (Frontend UI Components)\n\n**Core Responsibility:** This module contains the React/TypeScript components that form the user interface, responsible for visual presentation and user interaction.\n\n**Key Files and Functions (Inferred from Directory Structure):**\n*   `components/tabs/*`: Contains the main feature views, such as `data-mapping`, `equity-research`, `node-editor`, `portfolio`, and `trading`, indicating a highly modular, tab-based application structure.\n*   `components/charts`: Dedicated components for financial data visualization.\n\n**Core Implementation & Dependencies:** Built with TypeScript and React, the components rely on the Tauri API (`@tauri-apps/api`) to call the Rust commands for data and system interaction.\n\n## Module 3: `src/services` (Frontend Business Logic)\n\n**Core Responsibility:** This module encapsulates the complex business logic and data orchestration for the frontend, separating it from the presentation layer.\n\n**Key Files and Functions:**\n*   `workflowService.ts`: Manages the creation, storage, execution, and state of user-defined **workflows**, suggesting a core feature is a visual programming or automation tool.\n*   `services/backtesting`: Contains logic for financial backtesting, likely integrating with Python libraries like `vectorbt` or `lean`.\n*   `services/websocket`: Handles real-time data streaming, essential for a financial terminal.\n\n**Core Implementation & Dependencies:** This module implements the **Service Layer** pattern and depends on the Tauri IPC layer to communicate with the Rust backend for data and process control.\n\n## Module 4: `src/stockBrokers` (Brokerage Integration)\n\n**Core Responsibility:** Provides a standardized interface for connecting to and interacting with various stock brokerage APIs.\n\n**Key Files and Functions:**\n*   
`stockBrokers/india/zerodhaKite`: A concrete implementation for a specific Indian brokerage, indicating a focus on the Indian market or a modular design for regional expansion.\n\n**Core Implementation & Dependencies:** The module likely uses the **Adapter Pattern** to normalize the different brokerage APIs into a single interface used by the `trading` service.\n\n## Module 5: `src/types` (Shared Data Structures)\n\n**Core Responsibility:** Defines the core TypeScript data structures and interfaces used across the entire frontend application, ensuring type safety and consistency. This adheres to the **Single Source of Truth** principle for data types.\n\n## Module 6: `src/lib`, `src/hooks`, `src/constants`, `src/contexts` (Utilities and State)\n\n**Core Responsibility:** Contains common utilities, custom React hooks, application-wide constants, and React context providers for global state. This module uses the **Context Pattern** for dependency injection and state management throughout the frontend.\n\n**Conclusion:** The project is a **hybrid desktop application** built with **Tauri (Rust) and React/TypeScript**. 
The Rust backend acts as a secure data API gateway, leveraging Python for data fetching, while the React frontend provides a rich, modular, tab-based user interface with core features like **workflow automation**, **backtesting**, and **brokerage integration**.\n\n### Module PlantUML Diagrams\n\n# Rust Backend Module (`src-tauri/src`)\n\n@startuml\ntitle Rust Backend Module (`src-tauri/src`)\n\npackage \"Core Logic\" {\n    class AppHandle\n    class MCPState {\n        - processes: Mutex<HashMap<String, MCPProcess>>\n    }\n    class MCPProcess {\n        - child: Child\n        - stdin: Arc<Mutex<ChildStdin>>\n        - response_rx: Receiver<String>\n    }\n    class SpawnResult\n    interface TauriCommand\n}\n\npackage \"Utilities\" {\n    class PythonUtils {\n        + get_python_path(app: &AppHandle)\n        + execute_python_command(...)\n    }\n}\n\npackage \"Commands\" {\n    class YFinanceCommand <<TauriCommand>>\n    class PolygonCommand <<TauriCommand>>\n    class FredCommand <<TauriCommand>>\n    ' ... 
many other data source commands\n}\n\nAppHandle \"1\" -- \"1\" MCPState : manages\nMCPState \"1\" -- \"*\" MCPProcess : contains\nMCPProcess \"1\" -- \"1\" SpawnResult : returns status\nAppHandle \"1\" -- \"1\" PythonUtils : uses\nYFinanceCommand ..> PythonUtils : executes script via\nYFinanceCommand ..> AppHandle : requires\nTauriCommand <|-- YFinanceCommand\nTauriCommand <|-- PolygonCommand\nTauriCommand <|-- FredCommand\n\n@enduml\n\n# Frontend Services Module (`src/services`)\n\n@startuml\ntitle Frontend Services Module (`src/services`)\n\nclass Workflow {\n    + id: string\n    + name: string\n    + nodes: any[]\n    + edges: any[]\n    + status: 'idle' | 'running' | 'completed' | 'error' | 'draft'\n}\n\nclass WorkflowService {\n    - workflows: Map<string, Workflow>\n    - runningWorkflows: Set<string>\n    + saveWorkflow(workflow: Workflow)\n    + runWorkflow(workflowId: string)\n    + cleanupRunningWorkflows()\n}\n\nclass BacktestingService {\n    + runBacktest(strategy: Strategy)\n}\n\nclass WebsocketService {\n    + connect(url: string)\n    + subscribe(symbol: string)\n}\n\nclass TradingService {\n    + placeOrder(order: Order)\n}\n\nWorkflowService \"1\" -- \"*\" Workflow : manages\nBacktestingService ..> TradingService : may simulate orders\nWebsocketService ..> TradingService : provides real-time data\nWorkflowService ..> TauriIPC : calls Rust commands\nBacktestingService ..> TauriIPC : calls Rust commands\n\n@enduml\n\n# Brokerage Integration Module (`src/stockBrokers`)\n\n@startuml\ntitle Brokerage Integration Module (`src/stockBrokers`)\n\ninterface BrokerAdapter {\n    + authenticate(credentials: any)\n    + getQuote(symbol: string)\n    + placeOrder(order: Order)\n}\n\nclass ZerodhaKiteAdapter {\n    - apiKey: string\n    - accessToken: string\n    + authenticate(credentials: any)\n    + getQuote(symbol: string)\n    + placeOrder(order: Order)\n}\n\nBrokerAdapter <|.. 
ZerodhaKiteAdapter\nTradingService \"1\" --> \"1\" BrokerAdapter : uses adapter pattern\n\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe FinceptTerminal project is built on a **Hybrid Desktop Architecture** using **Tauri**, which is the foundational design philosophy. This approach leverages the performance, security, and native capabilities of **Rust** for the backend logic while providing a rich, cross-platform user interface with **React/TypeScript**.\n\n1.  **Tauri IPC Command:** The `#[tauri::command]` macro in Rust is the central abstraction for all application functionality. It creates a clean, asynchronous boundary between the UI and the system/data layer. Every data request, process management call, and utility function is exposed through this unified IPC mechanism.\n2.  **Data Gateway:** The collection of Rust commands acts as a comprehensive **Data Gateway**. Instead of implementing all data fetching logic in Rust, the project abstracts the data source itself. Each command (e.g., `yfinance`, `fred`, `polygon`) represents a specific external API, standardizing access to a vast array of financial and economic data.\n3.  **Workflow Object:** In the frontend, the `Workflow` object is a critical abstraction. It represents a user-defined, executable sequence of operations (a visual programming model). The object encapsulates the nodes, edges, status, and results of a user's automated task, making complex analysis and trading strategies manageable.\n4.  **Python Execution Bridge:** The `utils/python.rs` module is an abstraction that hides the complexity of finding and executing Python scripts across different operating systems. 
This allows the Rust layer to seamlessly delegate data fetching and heavy computational tasks to the extensive Python data science ecosystem.\n\n### Design Philosophy\n\nThe architecture adheres to a **Layered Design** with a strong emphasis on **Extensibility** and **Separation of Concerns**:\n*   **Presentation Layer (React/TS):** Handles UI and user interaction.\n*   **Service Layer (TypeScript Services):** Encapsulates business logic (`WorkflowService`, `BacktestingService`).\n*   **Application Layer (Rust Backend):** Manages IPC, process lifecycle, and data orchestration.\n*   **Data Layer (Python Scripts/External APIs):** Handles the actual data retrieval and processing.\n\n### Lifecycle Management\n\nThe application lifecycle is managed by Tauri and the custom services:\n*   **Application Startup:** Tauri initializes the Rust backend, which then prepares system resources, including the potential startup and state management of the **MCP (Model Context Protocol) server** (managed by `MCPState` and `MCPProcess` in `lib.rs`).\n*   **Workflow Lifecycle:** The `WorkflowService` manages the state of user workflows, persisting them in local storage. The Rust backend is responsible for the lifecycle of any external processes spawned by a running workflow, ensuring they are properly cleaned up upon application exit or workflow termination (`cleanup_running_workflows`).\n\n#### 3.1.2. 
Component Interactions\n\nThe application's dynamic behavior is driven by a sophisticated set of communication patterns:\n\n| Component | Interacts With | Communication Pattern | Data Flow Example |\n| :--- | :--- | :--- | :--- |\n| **React UI** | **Rust Backend** | Tauri IPC (Asynchronous Command Calls) | User clicks \"Fetch Data\" -> React calls `tauri.invoke('execute_yfinance_command', ...)` |\n| **Rust Backend** | **Python Scripts** | Process Spawning and Standard I/O | `YFinanceCommand` calls `execute_python_command` -> Rust spawns Python process -> Python prints JSON to stdout -> Rust captures stdout and returns it via IPC. |\n| **Rust Backend** | **MCP Server** | Inter-Process Communication (Custom Protocol) | Rust manages the `MCPProcess` lifecycle and communicates with it via its stdin/stdout streams, as suggested by the `MCPProcess` struct fields (`child`, `stdin`, `response_rx`). |\n| **Frontend Services** | **External APIs** | WebSocket (Real-time) | `WebsocketService` connects directly to a market data provider, pushing updates to the UI components for live charting. |\n| **Trading Service** | **Broker Adapters** | Adapter Pattern (Method Calls) | `TradingService` calls `brokerAdapter.placeOrder()`, which is implemented by a specific adapter like `ZerodhaKiteAdapter`. |\n\nThe primary data flow for historical and fundamental data is a **chain of delegation**: **Frontend -> Rust IPC -> Python Bridge -> External API**. This pattern ensures that the computationally heavy and data-intensive tasks are handled by the most appropriate tool (Python's data science stack), while the Rust layer maintains control and security. The use of JSON serialization is implied at every boundary (Python stdout, Tauri IPC) to ensure structured data transfer.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\ntitle FinceptTerminal Overall Architecture\n\nskinparam componentStyle rectangle\n\npackage \"Frontend (React/TypeScript)\" {\n    [UI Components] as UI\n    [Services] as S\n    [Data Types] as T\n    [Custom Hooks] as H\n    [Constants] as C\n    [Contexts] as CX\n}\n\npackage \"Backend (Rust/Tauri)\" {\n    [Tauri IPC Layer] as IPC\n    [Python Execution Bridge] as PyBridge\n    [MCP Server Manager] as MCPM\n    [Data Source Commands] as DSC\n}\n\npackage \"External Systems\" {\n    [Python Data Stack] as PDS\n    [External Data APIs] as APIs\n    [MCP Server] as MCP\n    [Brokerage APIs] as BAPI\n}\n\nUI --> S : calls business logic\nS --> IPC : invokes commands (via tauri.invoke)\nIPC --> DSC : routes data requests\nIPC --> MCPM : manages server lifecycle\nDSC --> PyBridge : delegates data fetching\nPyBridge --> PDS : executes Python scripts\nPDS --> APIs : fetches raw data\n\nS --> [Websocket Service] : real-time data\n[Websocket Service] --> APIs : streams data\n\nS --> [Trading Service] : order execution\n[Trading Service] --> [Broker Adapter] : uses standardized interface\n[Broker Adapter] --> BAPI : sends orders\n\nMCPM --> MCP : manages process (stdin/stdout)\n\nT <.. UI : defines data structures\nT <.. S : defines data structures\nH <.. UI : provides reusable logic\nCX <.. UI : provides global state\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe FinceptTerminal codebase employs several established design patterns to manage complexity, ensure modularity, and promote extensibility:\n\n1.  **Adapter Pattern (Brokerage Integration):**\n    *   **Description:** This pattern allows the interface of an existing class to be used as another interface. This is implemented in the `src/stockBrokers` module.\n    *   **Implementation:** The `TradingService` interacts with a generic `BrokerAdapter` interface. 
Concrete classes like `ZerodhaKiteAdapter` implement this interface, translating generic trading commands into the specific API calls required by the respective brokerage.\n\n2.  **Bridge Pattern (Rust-Python Interoperability):**\n    *   **Description:** Decouples an abstraction from its implementation so that the two can vary independently.\n    *   **Implementation:** The **Rust IPC Layer** acts as the abstraction (the \"what\" to do, e.g., \"get YFinance data\"). The **Python Execution Bridge** (`utils/python.rs`) and the underlying Python scripts act as the implementation (the \"how\" to do it).\n\n3.  **Service Layer Pattern (Frontend Business Logic):**\n    *   **Description:** Defines an application's boundary and a set of available operations from the perspective of the client.\n    *   **Implementation:** Modules like `WorkflowService`, `BacktestingService`, and `TradingService` in `src/services` encapsulate complex business rules and orchestration logic, shielding the UI components from direct interaction with the IPC layer.\n\n4.  **Command Pattern (Tauri IPC):**\n    *   **Description:** Encapsulates a request as an object, allowing for uniform handling of operations.\n    *   **Implementation:** Each `#[tauri::command]` in the Rust backend is a concrete command object that the frontend can invoke via `tauri.invoke()`, treating every backend operation uniformly.\n\n#### 3.3.2. Project Highlights\n\nThe FinceptTerminal project exhibits several innovative features and design choices that contribute to its extensibility and flexibility:\n\n*   **Hybrid Architecture for Best-of-Breed Tooling:** The combination of **Rust (Tauri)** for system performance and security, and **Python** for data science and financial libraries, is a significant highlight. 
This allows the application to deliver a fast, native desktop experience while leveraging the vast, specialized ecosystem of Python for financial analysis (e.g., `pandas`, `yfinance`, `vectorbt`).\n*   **Extensive Data Source Integration:** The sheer number of dedicated data source commands (over 30, including `fred`, `worldbank`, `sec`, `polygon`, and various government sources) is a major feature. This design makes the terminal a **universal financial data aggregator**, providing a single, standardized interface for disparate data APIs.\n*   **Workflow Automation Engine:** The presence of the `WorkflowService` and the implied `node-editor` component suggests a powerful, user-facing **visual programming environment**. This feature allows users to automate complex analytical tasks and trading strategies without writing code, greatly enhancing the application's utility and appeal to a broader user base.\n*   **Modular Brokerage Integration:** The use of the **Adapter Pattern** in `src/stockBrokers` ensures that the application is not locked into a single trading provider. This modularity is key to its flexibility, allowing for rapid expansion into new markets (e.g., the existing `zerodhaKite` for India) and future-proofing against changes in brokerage APIs.\n*   **Model Context Protocol (MCP) Integration:** The management of an **MCP Server** in the Rust backend indicates a forward-looking design for integrating external AI/ML models or custom computational backends. This provides a clear path for future expansion into advanced, AI-powered financial insights.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe current architecture is robust and highly extensible, but a few areas could be optimized for performance, scalability, and maintainability:\n\n1.  
**Optimize the Rust-Python Bridge Performance:** The current model of delegating via `execute_python_command` introduces significant overhead from process startup and inter-process communication (IPC) latency. A key optimization would be to transition from spawning a new Python process for every data request to a **persistent Python server** (e.g., using a lightweight Flask/FastAPI server or a dedicated Jupyter kernel managed by the Rust backend). This would allow the Python environment to be initialized once, drastically reducing latency for subsequent data requests.\n\n2.  **Implement a Standardized Data Source Interface:** The current `commands/mod.rs` lists many specific commands, which could lead to code duplication. Creating a generic `DataSource` trait or interface in Rust that all data source commands must implement would enforce consistency, simplify the addition of new data sources, and allow for meta-operations like caching or rate-limiting to be applied uniformly.\n\n3.  **Enhance Workflow Persistence and State Management:** Migrating workflow storage from the browser's `localStorage` (as seen in `workflowService.ts`) to a more robust local database such as **SQLite** (e.g., via a Tauri SQL plugin) is recommended. This provides better scalability for a large number of complex workflows, ensures data integrity, and allows for more complex querying and indexing of workflow metadata.\n\n4.  **Refactor Frontend State Management:** For complex, application-wide state (e.g., user settings, active data connections), adopting a dedicated global state management library (e.g., Redux Toolkit, Zustand) to replace or augment the current reliance on React Contexts would provide better debugging tools, predictable state changes, and easier management of asynchronous operations.\n\n#### 3.4.2. 
Secondary Development Guide\n\nFor developers looking to extend or maintain the FinceptTerminal codebase, the following practices are recommended:\n\n1.  **Understand the Hybrid Flow:** Always trace a feature request through the three main layers: **React UI** (user interaction) -> **TypeScript Service** (business logic) -> **Rust IPC Command** (data gateway) -> **Python Script** (data execution). Start with the `src/App.tsx` for the frontend entry and `src-tauri/src/lib.rs` for the backend entry.\n\n2.  **Adding a New Data Source:**\n    *   **Step 1 (Python):** Write a new Python script (e.g., `new_api_data.py`) that handles the API request and returns the result as a JSON string to standard output.\n    *   **Step 2 (Rust):** Create a new file in `src-tauri/src/commands/` (e.g., `new_api.rs`). Implement a new `#[tauri::command]` that calls `utils::python::execute_python_command` with the new Python script's name and arguments.\n    *   **Step 3 (Frontend):** Create a new service or function in the TypeScript layer that calls the new Tauri command via `tauri.invoke()`.\n\n3.  **Integrating a New Stock Broker:**\n    *   Create a new adapter class in `src/stockBrokers/` (e.g., `src/stockBrokers/europe/degiroAdapter.ts`). This class **MUST** implement the generic `BrokerAdapter` interface to ensure compatibility with the `TradingService`.\n\n4.  **Extending the Workflow Engine:** New analytical or trading nodes are added by defining their structure and logic in the `src/components/tabs/node-editor` module. The execution logic for these nodes will primarily live in the `WorkflowService`, which orchestrates calls to the relevant services or directly to the Rust IPC layer.\n\n"
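Step 1 of the "Adding a New Data Source" recipe above can be sketched in Python. This is a hypothetical script (the name `new_api_data.py` and the `success`/`data` envelope are illustrative assumptions, not FinceptTerminal's actual wire format); the only convention taken from the analysis is that the script prints a single JSON object to stdout for the Rust bridge to capture.

```python
#!/usr/bin/env python3
"""Hypothetical data-source script for Step 1 above: fetch data and print
one JSON object to stdout so the Rust bridge can capture it."""
import json
import sys


def fetch_quote(symbol: str) -> dict:
    # A real script would call the external API here (e.g., via `requests`);
    # a static stub keeps the I/O contract visible.
    return {"symbol": symbol, "price": 123.45, "currency": "USD"}


def main(argv) -> None:
    try:
        symbol = argv[1] if len(argv) > 1 else "AAPL"
        payload = {"success": True, "data": fetch_quote(symbol)}
    except Exception as exc:
        # Errors are also reported as structured JSON on stdout, so the
        # Rust side always receives a parseable response.
        payload = {"success": False, "error": str(exc)}
    print(json.dumps(payload))


if __name__ == "__main__":
    main(sys.argv)
```

The Rust command from Step 2 would then spawn this script through the Python bridge, parse the captured stdout as JSON, and return it to the frontend over Tauri IPC (Step 3).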
  },
  {
    "path": "thirdparty/Kronos.md",
    "content": "# Kronos - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe Kronos project exhibits a clear, modular structure typical of a modern deep learning project, organized around the core components of data processing, model definition, training, and deployment.\n\n```\n/home/ubuntu/Kronos_project\n├── .git/                               # Git version control metadata (Excluded from analysis)\n├── LICENSE                             # Project license file\n├── README.md                           # Project overview and documentation\n├── examples/                           # Scripts demonstrating model usage and prediction\n│   ├── data/                           # Sample financial data for examples\n│   └── prediction_*.py                 # Example scripts for single and batch prediction\n├── figures/                            # Image assets for documentation (e.g., overview.png)\n├── finetune/                           # **Core Module 2: Data Pipeline and Training**\n│   ├── config.py                       # Centralized configuration for all hyperparameters and paths\n│   ├── dataset.py                      # PyTorch Dataset for Qlib data, handles sliding window and sampling\n│   ├── qlib_data_preprocess.py         # Script for loading, cleaning, and splitting Qlib financial data\n│   ├── train_predictor.py              # Main script for training the Kronos Causal Language Model\n│   ├── train_tokenizer.py              # Main script for training the KronosTokenizer (VQ-VAE)\n│   └── utils/                          # Utility functions for DDP setup, logging, and model size calculation\n├── finetune_csv/                       # Alternative finetuning pipeline for CSV data (secondary/legacy)\n├── model/                              # **Core Module 1: Model Definition**\n│   ├── __init__.py                     # Module initialization\n│   ├── kronos.py                       # Defines the main 
Kronos and KronosTokenizer models, and KronosPredictor\n│   └── module.py                       # Defines all fundamental building blocks (Transformer, Quantizer, Embeddings)\n├── requirements.txt                    # Python dependencies for the core project\n├── tests/                              # Unit and regression tests (Excluded from core analysis)\n└── webui/                              # **Core Module 3: Web Interface for Inference**\n    ├── app.py                          # Flask application logic, API endpoints for data/model loading and prediction\n    ├── run.py                          # Startup script for the Flask application\n    ├── start.sh                        # Shell script for starting the web server\n    └── templates/                      # HTML templates for the frontend (e.g., index.html)\n```\n\nThe structure clearly separates concerns: the `model` directory contains the reusable intellectual property (the model architecture), `finetune` contains the ML engineering pipeline (data and training), and `webui` contains the deployment and demonstration layer. This separation ensures high modularity and maintainability. The `examples` and `tests` directories provide necessary context for usage and validation.\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/Kronos_project/model`: Contains the core PyTorch model definitions, including the main `Kronos` and `KronosTokenizer` classes, the quantization logic, and all fundamental Transformer components. This is the heart of the project's intellectual property.\n*   `/home/ubuntu/Kronos_project/finetune`: Contains the complete data preparation and model training pipeline. 
This includes configuration (`config.py`), dataset handling (`dataset.py`), data preprocessing (`qlib_data_preprocess.py`), and the main training scripts (`train_tokenizer.py`, `train_predictor.py`).\n*   `/home/ubuntu/Kronos_project/finetune/utils`: Contains utility functions essential for the training process, such as DDP setup, seed setting, model size calculation, and time formatting.\n*   `/home/ubuntu/Kronos_project/webui`: Contains the Flask-based web application for demonstrating the model's inference capabilities. This includes the main application logic (`app.py`), the startup script (`run.py`), and the frontend template (`templates/index.html`).\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe Kronos project is composed of three primary core modules: `model`, `finetune`, and `webui`.\n\n### 1. Model Module (`/home/ubuntu/Kronos_project/model`)\n\nThis module is the core intellectual property, defining the architecture for the hierarchical quantization and causal language model.\n\n| File | Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `module.py` | Defines all fundamental building blocks for the Transformer and Quantization components. | `BSQuantizer`, `TransformerBlock`, `RMSNorm`, `RotaryPositionalEmbedding`, `HierarchicalEmbedding`, `DualHead` |\n| `kronos.py` | Defines the main model classes that assemble the components from `module.py`. | `KronosTokenizer`, `Kronos`, `KronosPredictor` |\n\n#### Implementation Details\n*   **Quantization**: The `BSQuantizer` implements the **Binary Spherical Quantization** technique, which is a variant of VQ-VAE. It maps continuous data to discrete indices, crucial for the language modeling approach. 
The `DifferentiableEntropyFunction` is used to enforce codebook usage and prevent collapse, a key challenge in VQ-VAE training.\n*   **Transformer Components**: The model uses a standard Transformer block (`TransformerBlock`) but replaces the traditional LayerNorm with **RMSNorm** and uses **Rotary Positional Embedding (RoPE)** for positional encoding, which are common optimizations in modern LLMs (e.g., Llama).\n*   **Hierarchical Tokenization**: `KronosTokenizer` and `HierarchicalEmbedding` implement the core idea of splitting the financial data into two token streams ($S_1$ for coarse, $S_2$ for fine) to improve the quality of the discrete representation.\n*   **Prediction**: The `KronosPredictor` class encapsulates the entire inference process, including the complex **autoregressive generation** loop, which involves repeated calls to the `Kronos` model and the tokenizer's `decode` method.\n\n### 2. Finetune Module (`/home/ubuntu/Kronos_project/finetune`)\n\nThis module manages the entire machine learning pipeline, from data ingestion to model training.\n\n| File | Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `config.py` | Centralizes all configuration parameters. | `Config` |\n| `qlib_data_preprocess.py` | Handles data loading from Qlib, feature calculation, and train/val/test splitting. | `QlibDataPreprocessor` |\n| `dataset.py` | Implements the PyTorch `Dataset` for time-series sampling. | `QlibDataset` |\n| `train_tokenizer.py` | Training script for the `KronosTokenizer` (VQ-VAE). | `train_model` (Tokenizer) |\n| `train_predictor.py` | Training script for the `Kronos` Causal Language Model. 
| `train_model` (Predictor) |\n\n#### Implementation Details\n*   **Data Ingestion**: `QlibDataPreprocessor` uses the Qlib library to access financial data, calculates the `amt` (amount) feature, and ensures the data is correctly windowed for the lookback and prediction horizons defined in `Config`.\n*   **Sampling Strategy**: `QlibDataset` implements a non-standard sampling strategy where `__getitem__` ignores the input index and instead draws a **random sample** from a pre-computed pool of all possible sliding windows. This is a robust method for training on large, continuous time-series data.\n*   **Training Loop**: Both training scripts use **Distributed Data Parallel (DDP)** and a standard optimization loop with **AdamW** and a **OneCycleLR** scheduler. The tokenizer training minimizes reconstruction and quantization loss, while the predictor training minimizes cross-entropy loss on the next token prediction.\n\n### 3. WebUI Module (`/home/ubuntu/Kronos_project/webui`)\n\nThis module provides a demonstration and inference interface using the Flask framework.\n\n| File | Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `app.py` | Defines the Flask application, API routes, and model loading/prediction logic. | `load_model`, `predict`, `create_prediction_chart` |\n| `run.py` | Startup script that checks dependencies and launches the Flask server. | `main` |\n\n#### Implementation Details\n*   **API Endpoints**: Key endpoints include `/api/load-model` (loads a model from a predefined list of configurations), `/api/data-files` (lists available data), and `/api/predict` (performs the core prediction).\n*   **Inference Integration**: The `predict` endpoint uses the `KronosPredictor` to generate forecasts. 
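The sampling controls this endpoint exposes (temperature and top-p) typically behave as in the following generic nucleus-sampling sketch; this is not the project's actual code, and the `sample_top_p` helper is a hypothetical name:

```python
import numpy as np

def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=None):
    """Generic nucleus (top-p) sampling step: temperature rescales the
    logits, then sampling is restricted to the smallest set of tokens
    whose cumulative probability reaches top_p."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]      # most probable token first
    cum = np.cumsum(probs[order])
    keep = cum <= top_p
    keep[0] = True                       # always keep the top token
    kept = order[keep]
    p = probs[kept] / probs[kept].sum()  # renormalize over the nucleus
    return int(rng.choice(kept, p=p))

tok = sample_top_p([0.0, 0.0, 10.0], temperature=1.0, top_p=0.5)
```

Lower temperature and smaller top-p both make generation more deterministic, which matters when averaging multiple forecast samples.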
It handles input data validation, time-series slicing, and parameter passing (temperature, top-p, sample count) for the generative model.\n*   **Visualization**: The `create_prediction_chart` function uses the **Plotly** library to generate interactive candlestick charts, visualizing the historical data, the predicted future data, and optionally, the actual future data for comparison. The chart is returned as a JSON object for the frontend.\n\n### Module PlantUML Diagrams\n\n@startuml module_model\ntitle Core Model Module (/model)\n\npackage \"Quantization\" {\n    class DifferentiableEntropyFunction <<Function>>\n    class BinarySphericalQuantizer {\n        + embed_dim: int\n        + beta: float\n        + gamma0: float\n        + gamma: float\n        + zeta: float\n        + quantize(z)\n        + forward(z)\n        + soft_entropy_loss(z)\n    }\n    class BSQuantizer {\n        + s1_bits: int\n        + s2_bits: int\n        + codebook_dim: int\n        + forward(z, half)\n    }\n}\n\npackage \"Transformer Components\" {\n    class RMSNorm\n    class FeedForward\n    class RotaryPositionalEmbedding\n    class MultiHeadAttentionWithRoPE\n    class MultiHeadCrossAttentionWithRoPE\n    class TransformerBlock {\n        + norm1: RMSNorm\n        + self_attn: MultiHeadAttentionWithRoPE\n        + norm2: RMSNorm\n        + ffn: FeedForward\n        + forward(x)\n    }\n}\n\npackage \"Embeddings & Heads\" {\n    class FixedEmbedding\n    class TemporalEmbedding {\n        + minute_embed: Embed\n        + hour_embed: Embed\n        ...\n        + forward(x)\n    }\n    class HierarchicalEmbedding {\n        + emb_s1: nn.Embedding\n        + emb_s2: nn.Embedding\n        + fusion_proj: nn.Linear\n        + split_token(token_ids)\n        + forward(token_ids)\n    }\n    class DualHead {\n        + proj_s1: nn.Linear\n        + proj_s2: nn.Linear\n        + compute_loss(...)\n        + forward(x)\n        + cond_forward(x2)\n    }\n    class DependencyAwareLayer {\n   
     + cross_attn: MultiHeadCrossAttentionWithRoPE\n        + forward(hidden_states, sibling_embed)\n    }\n}\n\npackage \"Main Models\" {\n    class KronosTokenizer <<PyTorchModelHubMixin>> {\n        + encoder: nn.ModuleList\n        + decoder: nn.ModuleList\n        + tokenizer: BSQuantizer\n        + embed: nn.Linear\n        + head: nn.Linear\n        + forward(x)\n        + encode(x)\n        + decode(x)\n    }\n    class Kronos <<PyTorchModelHubMixin>> {\n        + embedding: HierarchicalEmbedding\n        + time_emb: TemporalEmbedding\n        + transformer: nn.ModuleList\n        + dep_layer: DependencyAwareLayer\n        + head: DualHead\n        + forward(s1_ids, s2_ids, stamp)\n    }\n    class KronosPredictor {\n        + model: Kronos\n        + tokenizer: KronosTokenizer\n        + generate(...)\n        + predict(...)\n        + predict_batch(...)\n    }\n}\n\n' Relationships\nBinarySphericalQuantizer <.. DifferentiableEntropyFunction : uses\nBSQuantizer *-- BinarySphericalQuantizer : wraps\nKronosTokenizer *-- BSQuantizer : uses\nKronosTokenizer *-- TransformerBlock : uses (encoder/decoder)\nKronos *-- HierarchicalEmbedding : uses\nKronos *-- TemporalEmbedding : uses\nKronos *-- TransformerBlock : uses\nKronos *-- DependencyAwareLayer : uses\nKronos *-- DualHead : uses\nTransformerBlock *-- MultiHeadAttentionWithRoPE : uses\nTransformerBlock *-- FeedForward : uses\nMultiHeadAttentionWithRoPE *-- RotaryPositionalEmbedding : uses\nMultiHeadCrossAttentionWithRoPE *-- RotaryPositionalEmbedding : uses\nDependencyAwareLayer *-- MultiHeadCrossAttentionWithRoPE : uses\nTemporalEmbedding *-- FixedEmbedding : uses (or nn.Embedding)\nKronosPredictor *-- Kronos : aggregates\nKronosPredictor *-- KronosTokenizer : aggregates\n\n@enduml\n\n@startuml module_finetune\ntitle Finetune Module (/finetune)\n\npackage \"Configuration\" {\n    class Config {\n        + qlib_data_path: str\n        + lookback_window: int\n        + predict_window: int\n        + 
train_time_range: list\n        + batch_size: int\n        + tokenizer_learning_rate: float\n        + predictor_learning_rate: float\n    }\n}\n\npackage \"Data Pipeline\" {\n    class QlibDataPreprocessor {\n        + initialize_qlib()\n        + load_qlib_data()\n        + prepare_dataset()\n    }\n    class QlibDataset <<PyTorch Dataset>> {\n        + __init__(data_type)\n        + __len__()\n        + __getitem__(idx)\n        + set_epoch_seed(epoch)\n    }\n}\n\npackage \"Training Scripts\" {\n    class TrainingUtils\n    class TrainTokenizer {\n        + main()\n        + train_model()\n    }\n    class TrainPredictor {\n        + main()\n        + train_model()\n    }\n}\n\n' Relationships\nConfig <.. QlibDataPreprocessor : Reads parameters\nConfig <.. QlibDataset : Reads parameters\nConfig <.. TrainTokenizer : Reads parameters\nConfig <.. TrainPredictor : Reads parameters\nQlibDataPreprocessor --> QlibDataset : Prepares and saves data\nQlibDataset <.. TrainTokenizer : Provides batches\nQlibDataset <.. 
TrainPredictor : Provides batches\nTrainTokenizer ..> TrainingUtils : Uses DDP setup/utilities\nTrainPredictor ..> TrainingUtils : Uses DDP setup/utilities\nTrainTokenizer ..> KronosTokenizer : Trains\nTrainPredictor ..> Kronos : Trains\nTrainPredictor ..> KronosTokenizer : Uses for tokenization\n\n@enduml\n\n@startuml module_webui\ntitle WebUI Module (/webui)\n\npackage \"Frontend\" {\n    [index.html] as IH\n}\n\npackage \"Backend\" {\n    class FlaskApp <<app.py>> {\n        + load_data_files()\n        + load_data_file(path)\n        + save_prediction_results(...)\n        + create_prediction_chart(...)\n        + / (index)\n        + /api/load-model (POST)\n        + /api/predict (POST)\n    }\n    class StartupScript <<run.py>> {\n        + check_dependencies()\n        + main()\n    }\n}\n\n' Relationships\nStartupScript --> FlaskApp : Launches server\nFlaskApp --> IH : Renders template\nFlaskApp ..> KronosPredictor : Uses for prediction\nFlaskApp ..> pandas : Data handling\nFlaskApp ..> plotly : Chart generation\n\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe Kronos project is built upon a **Hierarchical Quantization and Transformer-based Language Modeling** philosophy, specifically adapted for financial time-series data.\n\n1.  **Hierarchical Quantization (BSQuantizer)**: The project's most critical abstraction is the use of a **Binary Spherical Quantizer (BSQuantizer)**, implemented in `model/module.py`. 
TrainPredictor : Provides batches\nTrainTokenizer ..> TrainingUtils : Uses DDP setup/utilities\nTrainPredictor ..> TrainingUtils : Uses DDP setup/utilities\nTrainTokenizer ..> KronosTokenizer : Trains\nTrainPredictor ..> Kronos : Trains\nTrainPredictor ..> KronosTokenizer : Uses for tokenization\n\n@enduml\n\n@startuml module_webui\ntitle WebUI Module (/webui)\n\npackage \"Frontend\" {\n    [index.html] as IH\n}\n\npackage \"Backend\" {\n    class FlaskApp <<app.py>> {\n        + load_data_files()\n        + load_data_file(path)\n        + save_prediction_results(...)\n        + create_prediction_chart(...)\n        + / (index)\n        + /api/load-model (POST)\n        + /api/predict (POST)\n    }\n    class StartupScript <<run.py>> {\n        + check_dependencies()\n        + main()\n    }\n}\n\n' Relationships\nStartupScript --> FlaskApp : Launches server\nFlaskApp --> IH : Renders template\nFlaskApp ..> KronosPredictor : Uses for prediction\nFlaskApp ..> pandas : Data handling\nFlaskApp ..> plotly : Chart generation\n\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe Kronos project is built upon a **Hierarchical Quantization and Transformer-based Language Modeling** philosophy, specifically adapted for financial time-series data.\n\n1.  **Hierarchical Quantization (BSQuantizer)**: The project's most critical abstraction is the use of a **Binary Spherical Quantizer (BSQuantizer)**, implemented in `model/module.py`. 
This acts as a Vector Quantized Variational Autoencoder (VQ-VAE) component, translating continuous financial data (OHLCV) into discrete, hierarchical tokens.\n    *   **BSQuantizer** is a wrapper around `BinarySphericalQuantizer`, which maps a continuous vector `z` to a binary vector `zhat` (values in $\\{-1, 1\\}$) and calculates a quantization loss.\n    *   The **Hierarchical** aspect is achieved by splitting the total codebook dimension into two parts: $S_1$ bits (coarse token) and $S_2$ bits (fine token). This allows for a two-stage tokenization and prediction process, improving efficiency and representation power.\n\n2.  **KronosTokenizer (VQ-VAE)**: This class (`model/kronos.py`) implements the full VQ-VAE structure. It consists of:\n    *   **Encoder**: A stack of Transformer blocks that maps the input time-series data ($X$) to a latent representation ($Z$).\n    *   **Quantizer**: The `BSQuantizer` which converts $Z$ into discrete token indices ($S_1$ and $S_2$).\n    *   **Decoder**: A stack of Transformer blocks that reconstructs the original input ($X'$) from the quantized representation.\n    *   The tokenizer's primary role is to learn a compact, discrete representation of the continuous financial data, effectively creating a **financial vocabulary**.\n\n3.  
**Kronos (Causal Language Model)**: This is the main prediction model (`model/kronos.py`), which operates entirely in the discrete token space, similar to a standard decoder-only Transformer (like GPT).\n    *   It uses **HierarchicalEmbedding** to combine the $S_1$ and $S_2$ token IDs into a single embedding vector.\n    *   It incorporates **TemporalEmbedding** to inject time-based features (minute, hour, day, month) into the sequence.\n    *   It uses a stack of **TransformerBlock**s for sequence modeling.\n    *   The **DualHead** predicts the probability distribution over the next $S_1$ and $S_2$ tokens, enabling the causal prediction of future financial states.\n\n### Design Philosophy\n\nThe project follows a **Two-Stage Training** and **Tokenization-First** philosophy:\n1.  **Tokenization (Unsupervised)**: The `KronosTokenizer` is trained first to learn the optimal discrete representation of the financial data, minimizing reconstruction error and maximizing codebook usage (entropy loss). This is an unsupervised pre-training step.\n2.  **Prediction (Supervised/Self-Supervised)**: The `Kronos` model is then trained on the sequence of generated tokens using a causal language modeling objective. 
This decouples the continuous-to-discrete mapping from the sequence prediction task, which is a common and powerful technique in modern generative AI (e.g., VQ-GAN, VQ-Diffusion).\n\n### Lifecycle Management\n\nThe lifecycle is managed through standard PyTorch and Hugging Face practices:\n*   **Configuration**: The `Config` class (`finetune/config.py`) centralizes all hyperparameters, paths, and data splitting logic, ensuring a single source of truth for the entire pipeline.\n*   **Distributed Training**: The training scripts (`train_tokenizer.py`, `train_predictor.py`) are built for **Distributed Data Parallel (DDP)** using `torch.distributed`, enabling efficient scaling across multiple GPUs.\n*   **Model Persistence**: Both `KronosTokenizer` and `Kronos` inherit from `PyTorchModelHubMixin`, allowing them to use the `save_pretrained` and `from_pretrained` methods for easy serialization and loading from local paths or the **Hugging Face Hub**.\n*   **Inference Abstraction**: The `KronosPredictor` class provides a clean, high-level API (`predict`, `predict_batch`) that abstracts away the underlying tokenization, model inference, and denormalization steps, making the model easy to integrate into applications like the `webui`.\n\n#### 3.1.2. Component Interactions\n\nThe Kronos project architecture is centered around the **Kronos** and **KronosTokenizer** models, which are trained and then deployed for inference via a web interface. The core interaction flows are divided into three main phases: Data Preparation, Model Training, and Model Inference.\n\n### 1. Data Preparation and Loading\nThe data pipeline is managed within the `finetune` module, specifically by `QlibDataPreprocessor` and `QlibDataset`.\n*   **QlibDataPreprocessor** (`finetune/qlib_data_preprocess.py`) interacts with the external **Qlib Data Source** to load raw financial data (OHLCV). 
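Downstream of this preprocessing, the per-sample windowing and instance-level normalization described for `QlibDataset` can be sketched as follows (the `make_window_sample` helper and its parameter names are illustrative, not the project's actual API; the normalization and clipping formulas follow the ones quoted later in this document):

```python
import numpy as np

def make_window_sample(series, start, lookback, predict_len, clip=5.0):
    """Slice one (lookback + horizon) window from a [T, features] series,
    then normalize it with the window's own statistics (instance-level
    normalization) and clip outliers."""
    w = series[start:start + lookback + predict_len]
    mean, std = w.mean(axis=0), w.std(axis=0)
    x = (w - mean) / (std + 1e-5)              # normalize per instance
    return np.clip(x, -clip, clip), mean, std  # keep stats for denormalization

x, mean, std = make_window_sample(np.arange(40.0).reshape(20, 2), 0, 8, 2)
```

Returning the per-window mean/std is what makes denormalization of model outputs possible at inference time.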
It processes this data by calculating additional features (like `amt`), performing data cleaning, and splitting it into train, validation, and test sets.\n*   The processed data is saved as pickled files (`.pkl`).\n*   **QlibDataset** (`finetune/dataset.py`) loads these pickled files and implements a sliding window mechanism to extract time-series samples. Crucially, it performs **instance-level normalization** on the fly for each sample, ensuring the model receives standardized input.\n\n### 2. Model Training (Tokenizer and Predictor)\nThe training process uses a two-stage approach: first training the tokenizer, then the predictor.\n*   **Tokenizer Training** (`finetune/train_tokenizer.py`): The script loads the `QlibDataset` and trains the **KronosTokenizer** (`model/kronos.py`). The input financial data (`x`) is passed to the tokenizer's `forward` method, which performs:\n    1.  Linear embedding (`self.embed`).\n    2.  Encoder Transformer blocks.\n    3.  Quantization embedding (`self.quant_embed`).\n    4.  Binary Spherical Quantization (`self.tokenizer`).\n    5.  Decoder Transformer blocks.\n    The loss function is a combination of **reconstruction loss** (MSE between input `x` and reconstructed output `z`) and the **BSQuantizer loss** (`bsq_loss`), which includes a commitment loss and an entropy penalty.\n*   **Predictor Training** (`finetune/train_predictor.py`): This script first loads the *finetuned* `KronosTokenizer` in evaluation mode. It then loads and trains the **Kronos** model (`model/kronos.py`).\n    1.  The input financial data (`x`) is passed to the *frozen* tokenizer's `encode` method to get the hierarchical token IDs (`s1_ids`, `s2_ids`).\n    2.  These token IDs, along with the time features (`stamp`), are fed into the `Kronos` model's `forward` method.\n    3.  The model uses a standard **causal language modeling** objective, predicting the next token (`s1_ids[t+1]`, `s2_ids[t+1]`) based on the current sequence.\n    4.  
The loss is calculated using the **DualHead**'s `compute_loss` method, which applies cross-entropy loss to both S1 and S2 token logits.\n\n### 3. Model Inference (Web UI)\nThe web interface provides a user-friendly way to perform predictions.\n*   **Model Loading** (`webui/app.py`): The Flask application loads the `Kronos` model and `KronosTokenizer` using the `from_pretrained` method (suggesting a dependency on **Hugging Face Hub**). It then instantiates the **KronosPredictor** class, which encapsulates the entire inference logic.\n*   **Prediction** (`webui/app.py` -> `KronosPredictor.predict`):\n    1.  The user's input data (`df`) is pre-processed, including normalization using the mean/std of the input data itself.\n    2.  The pre-processed data is passed to `KronosPredictor.generate`.\n    3.  The `generate` method performs **autoregressive sampling**. In each step, it uses the `Kronos` model to predict the next token IDs, which are then decoded back into the financial data space using the `KronosTokenizer.decode` method.\n    4.  The output is denormalized and returned as a prediction DataFrame.\n*   **Visualization**: The results are formatted into a Plotly chart (candlestick) for display in the `index.html` template.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml architecture_diagram\ntitle Kronos Project Overall Architecture\n\nskinparam component {\n  FontSize 14\n  FontName Arial\n  BorderColor #383838\n  BackgroundColor #F0F0F0\n}\n\nskinparam package {\n  BorderColor #1E90FF\n  BackgroundColor #ADD8E6\n}\n\npackage \"Data Pipeline (finetune)\" as Finetune {\n    [Config] as C\n    [QlibDataPreprocessor] as QDP\n    [QlibDataset] as QD\n    [train_tokenizer.py] as TT\n    [train_predictor.py] as TP\n}\n\npackage \"Core Model (model)\" as Model {\n    [KronosTokenizer] as KT\n    [Kronos] as K\n    [KronosPredictor] as KP\n    [BSQuantizer] as BSQ\n    [Transformer Blocks] as TB\n}\n\npackage \"Web Interface (webui)\" as WebUI {\n    [app.py (Flask App)] as FA\n    [run.py (Startup)] as RU\n    [index.html (Frontend)] as IH\n}\n\npackage \"External Services\" as External {\n    [Qlib Data Source] as QDS\n    [Hugging Face Hub] as HF\n    [Comet ML] as CM\n}\n\n' Data Flow and Dependencies\nQDS --> QDP : Loads raw financial data\nQDP --> QD : Generates processed datasets (.pkl)\nQD --> TT : Provides training/validation batches\nQD --> TP : Provides training/validation batches\n\n' Training Flow\nTT --> KT : Trains Tokenizer (VQ-VAE loss)\nTP --> K : Trains Predictor (Cross-entropy loss)\nKT --> BSQ : Uses Quantizer\nK --> KT : Uses Tokenizer for token IDs\nKT --> HF : Loads/Saves pretrained/finetuned Tokenizer\nK --> HF : Loads/Saves pretrained/finetuned Predictor\nTT --> CM : Logs training metrics\nTP --> CM : Logs training metrics\n\n' Prediction/Inference Flow\nFA --> KP : Loads model and performs prediction\nKP --> K : Uses Kronos model for generation\nKP --> KT : Uses KronosTokenizer for encoding/decoding\nFA --> IH : Renders web page\nRU --> FA : Starts Flask server\n\n' High-Level Relationships\nModel <.. Finetune : Core logic used for training\nModel <.. WebUI : Core logic used for inference\n\n@enduml\n```\n\n### 3.3. 
Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe Kronos codebase employs several established software and machine learning design patterns to manage complexity, ensure modularity, and facilitate training.\n\n### 1. Two-Stage Training Pattern (ML Pattern)\nThis pattern is fundamental to the project's structure, separating the representation learning from the sequence prediction task.\n\n*   **Implementation**:\n    *   **Stage 1 (Representation Learning)**: `KronosTokenizer` is trained as a VQ-VAE using `finetune/train_tokenizer.py`. It learns to compress continuous financial data into discrete tokens.\n    *   **Stage 2 (Sequence Modeling)**: `Kronos` is trained as a Causal Language Model using `finetune/train_predictor.py`. It learns to predict the next token in a sequence, using the vocabulary learned in Stage 1.\n\n### 2. Adapter Pattern (Structural Pattern)\nThe `KronosPredictor` class acts as an adapter, simplifying the complex interaction between the `Kronos` model and `KronosTokenizer` for external users (like the `webui`).\n\n*   **Implementation**:\n    ```python\n    # model/kronos.py (inside KronosPredictor.__init__)\n    self.model = model       # The Kronos Causal LM\n    self.tokenizer = tokenizer # The KronosTokenizer VQ-VAE\n    \n    # model/kronos.py (inside KronosPredictor.predict)\n    # The predictor handles the full pipeline:\n    # 1. Pre-processing/Normalization\n    # 2. Autoregressive generation (calling model and tokenizer repeatedly)\n    # 3. Denormalization\n    pred_df = predictor.predict(...)\n    ```\n    The `predict` method hides the intricate steps of token encoding, autoregressive loop, token decoding, and denormalization from the caller.\n\n### 3. 
Composite Pattern (Structural Pattern)\nThe Transformer architecture naturally lends itself to the Composite pattern, where a complex structure is built from a tree of simpler, uniform components.\n\n*   **Implementation**:\n    ```python\n    # model/kronos.py (inside Kronos.__init__)\n    self.transformer = nn.ModuleList([\n        TransformerBlock(...)\n        for _ in range(self.n_layers)\n    ])\n    \n    # model/module.py (TransformerBlock)\n    class TransformerBlock(nn.Module):\n        def __init__(self, ...):\n            self.self_attn = MultiHeadAttentionWithRoPE(...)\n            self.ffn = FeedForward(...)\n            # ...\n    ```\n    The `Kronos` model is a composite of `TransformerBlock`s, which are themselves composites of `MultiHeadAttentionWithRoPE` and `FeedForward` layers.\n\n### 4. Strategy Pattern (Behavioral Pattern)\nThe `QlibDataset` uses a strategy for sampling data, which is essential for training on large, time-series datasets where a full epoch is impractical.\n\n*   **Implementation**:\n    ```python\n    # finetune/dataset.py (QlibDataset.__getitem__)\n    # The index `idx` passed to __getitem__ is ignored.\n    # Instead, a random index is drawn from the pre-computed pool of indices.\n    random_idx = self.py_rng.randint(0, len(self.indices) - 1)\n    symbol, start_idx = self.indices[random_idx]\n    # ... extract data window\n    ```\n    This implements a **Random Sampling Strategy** over all possible sliding windows, ensuring that the model sees a diverse set of samples in each training iteration, regardless of the batch size or epoch definition.\n\n### 5. 
Dependency Injection (Software Pattern)\nThe training scripts inject the necessary configuration and models into the training functions.\n\n*   **Implementation**:\n    ```python\n    # finetune/train_predictor.py (main function)\n    config_instance = Config()\n    main(config_instance.__dict__)\n    \n    # ...\n    # train_model(model, tokenizer, device, config, ...)\n    ```\n    The `Config` object is created once and its parameters are passed around, ensuring that all parts of the training pipeline (data loading, model initialization, optimization) use the same set of parameters. The `tokenizer` is explicitly loaded and passed to `train_model` for use by the `model`, demonstrating a clear dependency chain.\n\n#### 3.3.2. Project Highlights\n\nThe Kronos project demonstrates several innovative and flexible design choices:\n\n*   **Financial Time-Series as a Language (Innovation)**: The core innovation is the application of a **Causal Language Model (CLM)**, typically used for text generation, to financial time-series prediction. This is achieved by:\n    *   **Tokenization**: Using the `KronosTokenizer` (a VQ-VAE variant) to translate continuous OHLCV data into discrete tokens, creating a \"financial vocabulary.\"\n    *   **Prediction**: Training the `Kronos` CLM to predict the next token in the sequence, which is equivalent to predicting the next price movement. 
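The autoregressive loop at the heart of this approach is schematically simple; in the sketch below, `model_step` is a hypothetical stand-in for the Kronos forward pass plus token sampling:

```python
def autoregressive_forecast(model_step, context, horizon):
    """Schematic autoregressive generation: each new token is predicted
    from everything generated so far, then appended to the running
    context before the next step."""
    tokens = list(context)
    for _ in range(horizon):
        tokens.append(model_step(tokens))  # predict next token from prefix
    return tokens[len(context):]           # return only the forecast part

# With a trivial "model" that emits the last token plus one:
preds = autoregressive_forecast(lambda t: t[-1] + 1, [1, 2, 3], 3)
```

The real loop is more involved because each predicted (S1, S2) token pair must also be decoded back to price space, but the control flow is the same.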
This leverages the powerful sequence modeling capabilities of the Transformer architecture.\n\n*   **Hierarchical Quantization (Flexibility)**: The use of $S_1$ (coarse) and $S_2$ (fine) bits in the `BSQuantizer` and `HierarchicalEmbedding` provides a flexible and robust tokenization scheme.\n    *   This allows the model to first capture the **macro-level** price movements ($S_1$) and then refine the prediction with **micro-level** details ($S_2$), potentially improving both stability and accuracy.\n\n*   **Modern Transformer Components (Extensibility)**: The model incorporates state-of-the-art components from the LLM domain:\n    *   **RMSNorm** and **Rotary Positional Embedding (RoPE)** are used in the `TransformerBlock`. These choices are known to improve training stability and performance in large-scale sequence models, making the architecture easily extensible to larger model sizes.\n\n*   **Decoupled and Reusable Architecture (Extensibility)**: The clear separation between the `model` definition, the `finetune` pipeline, and the `webui` deployment layer ensures high reusability.\n    *   The `KronosPredictor` class is a clean, single-entry point for inference, allowing the core model to be easily integrated into other applications (e.g., trading bots, backtesting platforms) without needing to rewrite the complex autoregressive sampling logic.\n\n*   **Distributed Training Ready (Flexibility)**: The training scripts are fully configured for **Distributed Data Parallel (DDP)**, making the project immediately scalable for training on massive financial datasets using multi-GPU or multi-node setups. This is a critical feature for a foundation model approach.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe Kronos project is well-engineered, but several areas could be optimized for performance, robustness, and modern ML practices.\n\n### 1. 
Performance Bottlenecks and Optimization\n*   **Data Loading Efficiency**: The `QlibDataset.__getitem__` method performs **instance-level normalization** and clipping (`x = (x - x_mean) / (x_std + 1e-5)`, `x = np.clip(x, -self.config.clip, self.config.clip)`) for *every sample* during training. While correct, this repeated calculation and conversion from NumPy to PyTorch can be a bottleneck.\n    *   **Suggestion**: Pre-calculate and save the mean/std for each time series during the `QlibDataPreprocessor` stage. Normalize the data once and store it. The `__getitem__` method would then only need to perform the final clipping and tensor conversion.\n*   **Quantizer Entropy Loss**: The `BinarySphericalQuantizer` calculates a complex entropy penalty (`soft_entropy_loss` or `get_hard_per_sample_entropy`). The `soft_entropy_loss` involves `torch.einsum` and `softmax` over a potentially large codebook, which can be computationally intensive.\n    *   **Suggestion**: Profile the `BSQuantizer` forward pass. If the entropy calculation is a major overhead, consider simplifying the loss term or implementing a more efficient approximation, especially during the later stages of training.\n\n### 2. Architecture Optimization\n*   **Decoupling Tokenizer and Predictor**: The `KronosPredictor` class requires both the `Kronos` model and the `KronosTokenizer` to be loaded. The tokenizer is only used for the `encode` and `decode` steps.\n    *   **Suggestion**: Merge the necessary tokenization/detokenization logic (specifically `indices_to_bits` and the embedding/head layers) directly into the `Kronos` model's `generate` method. This would allow the `Kronos` model to be a single, self-contained unit for inference, simplifying deployment and reducing the number of objects to manage.\n\n### 3. Code Quality and Robustness\n*   **Configuration Management**: The `Config` class uses a flat structure (`self.epochs`, `self.batch_size`). 
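For comparison with the flat layout, a nested grouping of configuration fields might look like the sketch below; the field names, defaults, and class names are illustrative assumptions, not the project's actual values:

```python
from dataclasses import dataclass, field

@dataclass
class DataConfig:
    # Hypothetical data-related parameters, grouped together
    lookback_window: int = 90
    predict_window: int = 10

@dataclass
class TrainConfig:
    # Hypothetical optimization parameters, grouped together
    epochs: int = 30
    batch_size: int = 64

@dataclass
class KronosConfig:
    data: DataConfig = field(default_factory=DataConfig)
    train: TrainConfig = field(default_factory=TrainConfig)

cfg = KronosConfig()
# Accessed as cfg.data.lookback_window, cfg.train.batch_size, etc.
```

Tools like Hydra or Pydantic layer validation and file-based overrides on top of exactly this kind of structure.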
For a project of this complexity, a nested configuration system (e.g., using **Hydra** or **Pydantic**) would improve readability and prevent naming conflicts.\n    *   **Suggestion**: Refactor `config.py` to use a structured configuration format, grouping parameters logically (e.g., `config.data.lookback_window`, `config.model.d_model`).\n*   **Error Handling in WebUI**: The `webui/app.py` has several `try...except Exception as e` blocks that catch all exceptions and return a generic 500 error.\n    *   **Suggestion**: Implement more granular exception handling. For example, catch `FileNotFoundError` for data loading, `ValueError` for invalid parameters, and a custom `ModelNotLoadedError` for prediction attempts, providing more informative error messages to the user.\n*   **Type Hinting and Documentation**: While some files are well-documented, extending type hints across all function signatures (especially in `model/module.py`) would improve code maintainability and static analysis. The `KronosPredictor` methods, in particular, would benefit from explicit return type hints.\n\n#### 3.4.2. Secondary Development Guide\n\nThe Kronos project is a well-structured, two-stage financial time-series prediction model. Secondary development should focus on the following areas:\n\n### 1. Code Exploration Path\n*   **Core Logic**: Start with `/home/ubuntu/Kronos_project/model/kronos.py` and `/home/ubuntu/Kronos_project/model/module.py`. These files define the `Kronos` and `KronosTokenizer` models, the core components (`TransformerBlock`, `BSQuantizer`), and the prediction logic (`KronosPredictor`). Understanding the **Hierarchical Quantization** mechanism in `BSQuantizer` is paramount.\n*   **Configuration**: Review `/home/ubuntu/Kronos_project/finetune/config.py` to understand all hyperparameters, data paths, and time-splitting logic. 
All major modifications to the training process start here.\n*   **Data Pipeline**: Examine `/home/ubuntu/Kronos_project/finetune/qlib_data_preprocess.py` and `/home/ubuntu/Kronos_project/finetune/dataset.py` to see how raw Qlib data is transformed into the normalized, windowed samples used for training.\n\n### 2. Secondary Development Best Practices\n*   **Feature Engineering**: To introduce new financial features, modify `QlibDataPreprocessor.load_qlib_data` to calculate the features and update `Config.feature_list`. The input dimension (`d_in`) of `KronosTokenizer` will need to be updated accordingly.\n*   **Model Architecture Modification**: Changes to the Transformer block structure (e.g., adding a new layer type, changing attention mechanism) should be implemented in `model/module.py` and then integrated into `Kronos` or `KronosTokenizer` in `model/kronos.py`.\n*   **Quantization Scheme**: Experimenting with the quantization parameters (e.g., `s1_bits`, `s2_bits`, `beta`, `gamma`) requires modifying `Config` and observing the impact on the `KronosTokenizer` training loss (reconstruction vs. entropy).\n*   **Distributed Training**: The training scripts (`train_tokenizer.py`, `train_predictor.py`) are DDP-ready. Use `torchrun` to launch training for maximum efficiency. Ensure all data loading and logging is handled correctly by the rank 0 process to avoid redundancy.\n*   **Inference Integration**: For integrating the model into a new application, use the `KronosPredictor` class. It is designed to handle the full inference lifecycle, from data normalization to autoregressive generation, requiring only the input DataFrame and timestamps.\n\n### 3. Testing\n*   The `tests` folder contains a regression test (`test_kronos_regression.py`). Any changes to the core model logic should be validated against this test to ensure numerical stability and prevent unintended side effects. New features should be accompanied by new unit tests.\n\n"
  },
  {
    "path": "thirdparty/Lean.md",
    "content": "# Lean - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe project structure is organized around the core components of a quantitative trading engine, emphasizing a clear separation of concerns between financial abstractions, the algorithm execution layer, and the technical analysis library.\n\n```\n/home/ubuntu/FinnewsHunter/thirdparty/Lean/\n├── Algorithm/             # Core execution layer and strategy framework\n│   ├── Framework/         # Modular components (Alpha, Portfolio, Execution, Risk)\n│   └── QCAlgorithm.cs     # The main base class for all trading strategies\n├── Common/                # Fundamental financial abstractions and data structures\n│   ├── Data/              # Base data types (IBaseData, Slice)\n│   ├── Orders/            # Order types and transaction management\n│   ├── Securities/        # Security definition, holdings, and financial models\n│   └── Interfaces/        # Core interfaces (IAlgorithm, IFeeModel, etc.)\n├── Indicators/            # Comprehensive library of technical analysis indicators\n│   ├── CandlestickPatterns/ # Specific folder for candlestick pattern recognition\n│   └── IndicatorBase.cs   # The generic base class for all indicators\n└── ... (Other folders like Data, Configuration, Logging, etc.)\n```\n\nThe `Common` folder serves as the foundation, defining the core language of the system (Security, Order, Data). The `Algorithm` folder builds upon this foundation to provide the execution environment (`QCAlgorithm`) and the modular strategy pipeline (Framework). The `Indicators` folder provides the necessary tools for signal generation. This structure promotes high cohesion within modules and low coupling between them, making the system highly maintainable and extensible.\n```\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/Lean/Common`: Contains the core financial abstractions, data structures, and interfaces for the entire engine, including `Security`, `Order`, and various financial models.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/Lean/Algorithm`: Houses the main execution logic, primarily the `QCAlgorithm` base class, and the modular Algorithm Framework components (Alpha, Portfolio, Execution, Risk, Universe Selection).\n*   `/home/ubuntu/FinnewsHunter/thirdparty/Lean/Indicators`: A comprehensive library of technical analysis indicators, built on a generic, composable base class (`IndicatorBase<T>`).\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module Analysis - Common\n\nThe `Common` module is the foundational layer of the LEAN engine, housing the core financial abstractions and data structures that are utilized throughout the entire system.\n\n### Core Responsibility\nThe module's primary responsibility is to define the **financial domain model** and provide the fundamental building blocks for representing market data, financial instruments, and trading actions.\n\n### Key Abstractions and Implementation Details\n\n#### 1. Financial Instruments (`Securities` sub-folder)\n*   **`Security.cs`**: This is the central class representing a tradable asset. It aggregates all necessary financial models (`IFeeModel`, `IFillModel`, `ISlippageModel`, `IBuyingPowerModel`) via the **Strategy Pattern**. This allows for highly realistic and customizable simulation of brokerage behavior. It also contains the `SecurityHolding` object, which tracks the current position.\n*   **`SecurityHolding.cs`**: Encapsulates the portfolio position for a single security, tracking `Quantity`, `AveragePrice`, and calculating real-time metrics like `HoldingsValue` and `UnrealizedProfit`.\n\n#### 2. Trading Actions (`Orders` sub-folder)\n*   **`Order.cs`**: An abstract base class for all order types. 
It defines common properties like `Symbol`, `Quantity`, `Type`, and `Status`. Concrete implementations like `MarketOrder`, `LimitOrder`, and `StopMarketOrder` inherit from this, providing the specific logic for each order type.\n\n#### 3. Data Structures\n*   **`IBaseData`**: The fundamental interface for all data points consumed by the engine (e.g., `TradeBar`, `Tick`). This ensures a uniform data handling mechanism.\n*   **`Slice`**: A container object passed to the algorithm's `OnData` method, holding all new `IBaseData` objects for the current time step.\n\n### Dependencies\nThe `Common` module is largely self-contained, with its interfaces serving as dependencies for higher-level modules like `Algorithm` and `Indicators`.\n\n## Module Analysis - Algorithm\n\nThe `Algorithm` module is the core execution layer of the LEAN engine, providing the base class (`QCAlgorithm`) that users extend to write their trading strategies. It integrates all the foundational components from the `Common` module and introduces the **Algorithm Framework** for modular strategy design.\n\n### Core Responsibility\nThe module's primary responsibility is to serve as the **entry point and execution context** for a trading strategy. It manages the lifecycle of the algorithm (initialization, data handling, event processing, termination) and provides a rich API for interacting with the market, portfolio, and data.\n\n### Key Abstractions and Implementation Details\n\n#### 1. The Core Algorithm (`QCAlgorithm.cs`)\n*   **Inheritance and Interfaces**: `QCAlgorithm` inherits from `MarshalByRefObject` and implements the crucial `IAlgorithm` interface.\n*   **Initialization (`Initialize`, `PostInitialize`)**: The constructor sets up all core managers (`Securities`, `Transactions`, `Portfolio`, `Schedule`, `UniverseManager`) and databases (`MarketHoursDatabase`, `SymbolPropertiesDatabase`). 
The `Initialize()` method is the user's entry point for setting up data subscriptions, cash, and start/end dates.\n*   **Data Handling (`OnData`, `OnEndOfTimeStep`)**: The `OnData(Slice slice)` method is the main event loop handler, receiving a `Slice` object containing all new data for the current time step. `OnEndOfTimeStep()` allows for post-data processing before the next step.\n*   **Security Management (`AddSecurity`, `RemoveSecurity`)**: Provides overloaded methods like `AddEquity`, `AddForex`, `AddFuture`, etc., which internally call `AddSecurity<T>`. This process involves creating `SubscriptionDataConfig` objects and initializing the `Security` object using the configured `ISecurityInitializer`.\n\n#### 2. The Algorithm Framework (Sub-folders)\nThe framework implements a **pipeline design pattern** for modular strategy creation, allowing users to compose a strategy from interchangeable components:\n*   **`Selection` (Universe Selection)**: Defines how the set of tradable assets is chosen and maintained.\n*   **`Alphas` (Alpha Model)**: Responsible for generating trading **insights** (predictions of price movement).\n*   **`Portfolio` (Portfolio Construction)**: Translates the generated insights into target portfolio weights.\n*   **`Execution` (Execution Model)**: Determines how to convert portfolio targets into actual market orders.\n*   **`Risk` (Risk Management)**: Applies risk checks and constraints to the generated orders.\n\n### Dependencies\nThe `Algorithm` module is heavily dependent on the interfaces and classes defined in the `Common` module, particularly `IAlgorithm`, `Security`, `Order`, and the various model interfaces (`IFeeModel`, `IFillModel`, etc.).\n\n## Module Analysis - Indicators\n\nThe `Indicators` module is a comprehensive library for technical analysis, providing a vast collection of financial indicators and patterns. 
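The split this module makes in `IndicatorBase<T>` — a fixed `Update` flow in the base class, with only the math deferred to an abstract `ComputeNextValue` — is the Template Method Pattern. A minimal, language-agnostic sketch of that idea in Python (class and method names here are simplified stand-ins, not LEAN's actual C# API):

```python
from abc import ABC, abstractmethod


class IndicatorBase(ABC):
    """Template method: bookkeeping is fixed here, the math is deferred."""

    def __init__(self, name: str):
        self.name = name
        self.samples = 0
        self.current = None

    def update(self, value: float) -> bool:
        # Fixed skeleton: count the sample, delegate the calculation,
        # then report readiness (mirrors Update -> ComputeNextValue).
        self.samples += 1
        self.current = self.compute_next_value(value)
        return self.is_ready

    @property
    def is_ready(self) -> bool:
        return self.samples > 0

    @abstractmethod
    def compute_next_value(self, value: float) -> float:
        """The only method a concrete indicator must implement."""


class RunningMean(IndicatorBase):
    """Toy concrete indicator supplying just the deferred step."""

    def __init__(self):
        super().__init__("MEAN")
        self._total = 0.0

    def compute_next_value(self, value: float) -> float:
        self._total += value
        return self._total / self.samples


m = RunningMean()
for px in (10.0, 12.0, 14.0):
    m.update(px)
print(m.current)  # 12.0
```

The base class never needs to change when new indicators are added; each concrete indicator contributes only its `compute_next_value`.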
Its design is highly modular, centered around a generic base class that simplifies the creation and chaining of new indicators.\n\n### Core Responsibility\nThe module's primary function is to provide **reusable, stateful technical analysis components** that can be easily integrated into any trading algorithm. It abstracts the complex mathematical calculations and data management required for indicators, allowing users to focus on strategy logic.\n\n### Key Abstractions and Implementation Details\n\n#### 1. Indicator Base Classes\n*   **`IndicatorBase.cs`**: This file defines the core hierarchy:\n    *   **`IndicatorBase` (Abstract)**: The non-generic base class implementing the `IIndicator` interface. It manages common properties like `Name`, `Samples`, `IsReady`, and the `Window` (a `RollingWindow<IndicatorDataPoint>`) to store historical output values. It also handles the `Updated` event, a key mechanism for chaining indicators.\n    *   **`IndicatorBase<T>` (Abstract Generic)**: The generic base class where `T` is the input data type. It implements the core `Update(IBaseData input)` method, which handles data type validation, time-series checks, and calls the abstract `ComputeNextValue(T input)` method, which is the sole responsibility of the concrete indicator implementation. This is a clear example of the **Template Method Pattern**.\n\n#### 2. 
Indicator Types and Composition\n*   **Simple Indicators**: The vast majority of files (e.g., `SimpleMovingAverage.cs`, `RelativeStrengthIndex.cs`) implement the `IndicatorBase<T>` and override `ComputeNextValue`.\n*   **Composite Indicators**: Indicators that combine the output of other indicators, demonstrating the **Decorator Pattern** and **Composite Pattern** (e.g., `CompositeIndicator.cs`).\n*   **Specialized Indicators**: Sub-folders like `CandlestickPatterns` contain classes for pattern recognition.\n\n### Dependencies\nThe module primarily depends on the `Common` module for its core data structures (`IBaseData`, `IndicatorDataPoint`, `RollingWindow`) and the `QuantConnect.Data.Consolidators` namespace for data aggregation.\n\n### Code Highlights\n*   **Indicator Chaining**: The design facilitates easy chaining of indicators using operator overloading, allowing users to write concise code like `var macd = EMA(12) - EMA(26)`.\n*   **Type Safety and Generics**: The use of generics in `IndicatorBase<T>` ensures that each indicator is strongly typed to the data it consumes.\n\n### Module PlantUML Diagrams\n\n@startuml Common Module\ntitle Common Module - Core Abstractions\n\ninterface IAlgorithm\ninterface ISecurityInitializer\ninterface IFeeModel\ninterface IFillModel\ninterface ISlippageModel\ninterface IBuyingPowerModel\ninterface ICurrencyConverter\n\nclass Security {\n    + Symbol : Symbol\n    + Type : SecurityType\n    + Price : decimal\n    + Holdings : SecurityHolding\n    + Exchange : SecurityExchange\n    + FeeModel : IFeeModel\n    + FillModel : IFillModel\n    + SlippageModel : ISlippageModel\n    + BuyingPowerModel : IBuyingPowerModel\n    + SetLeverage(leverage: decimal)\n    + Update(data: IReadOnlyList<BaseData>)\n}\n\nclass SecurityHolding {\n    - _security : Security\n    - _currencyConverter : ICurrencyConverter\n    + Quantity : decimal\n    + AveragePrice : decimal\n    + HoldingsCost : decimal\n    + HoldingsValue : decimal\n    + IsLong 
: bool\n    + IsShort : bool\n    + Update(price: decimal)\n}\n\nabstract class Order {\n    + Id : int\n    + Symbol : Symbol\n    + Quantity : decimal\n    + Price : decimal\n    + Type : OrderType\n    + Status : OrderStatus\n    + Properties : IOrderProperties\n    + GetValue(security: Security) : decimal\n}\n\nclass MarketOrder extends Order\nclass LimitOrder extends Order\nclass StopMarketOrder extends Order\n\nIAlgorithm <|-- Security : uses\nSecurity *-- SecurityHolding : contains\nSecurity *-- IFeeModel : uses\nSecurity *-- IFillModel : uses\nSecurity *-- ISlippageModel : uses\nSecurity *-- IBuyingPowerModel : uses\nSecurityHolding *-- ICurrencyConverter : uses\nOrder *-- Symbol : contains\nOrder *-- IOrderProperties : contains\n@enduml\n\n@startuml Algorithm Module\ntitle Algorithm Module - Framework Components\n\ninterface IAlgorithm\ninterface IAlphaModel\ninterface IPortfolioConstructionModel\ninterface IExecutionModel\ninterface IRiskManagementModel\ninterface IUniverseSelectionModel\n\nclass QCAlgorithm {\n    - Securities : SecurityManager\n    - Portfolio : SecurityPortfolioManager\n    - Transactions : SecurityTransactionManager\n    + Initialize()\n    + OnData(slice: Slice)\n    + AddEquity(ticker: string) : Equity\n    + SetAlpha(model: IAlphaModel)\n    + SetPortfolioConstruction(model: IPortfolioConstructionModel)\n    + SetExecution(model: IExecutionModel)\n    + SetRiskManagement(model: IRiskManagementModel)\n    + SetUniverseSelection(model: IUniverseSelectionModel)\n}\n\nclass AlphaModel implements IAlphaModel\nclass PortfolioConstructionModel implements IPortfolioConstructionModel\nclass ExecutionModel implements IExecutionModel\nclass RiskManagementModel implements IRiskManagementModel\nclass UniverseSelectionModel implements IUniverseSelectionModel\n\nQCAlgorithm .up.|> IAlgorithm\nQCAlgorithm \"1\" *-- \"1\" IAlphaModel : uses\nQCAlgorithm \"1\" *-- \"1\" IPortfolioConstructionModel : uses\nQCAlgorithm \"1\" *-- \"1\" IExecutionModel 
: uses\nQCAlgorithm \"1\" *-- \"1\" IRiskManagementModel : uses\nQCAlgorithm \"1\" *-- \"1\" IUniverseSelectionModel : uses\n\nAlphaModel .up.|> IAlphaModel\nPortfolioConstructionModel .up.|> IPortfolioConstructionModel\nExecutionModel .up.|> IExecutionModel\nRiskManagementModel .up.|> IRiskManagementModel\nUniverseSelectionModel .up.|> IUniverseSelectionModel\n@enduml\n\n@startuml Indicators Module\ntitle Indicators Module - Core Hierarchy\n\ninterface IIndicator {\n    + Name : string\n    + IsReady : bool\n    + Current : IndicatorDataPoint\n    + Update(input: IBaseData) : bool\n    + Reset()\n}\n\nabstract class IndicatorBase {\n    + Window : RollingWindow<IndicatorDataPoint>\n    + Samples : long\n    + OnUpdated(consolidated: IndicatorDataPoint)\n}\n\nabstract class \"IndicatorBase<T>\" as IndicatorBaseT {\n    # ComputeNextValue(input: T) : decimal\n    + Update(input: IBaseData) : bool\n}\n\nclass SimpleMovingAverage {\n    - _window : RollingWindow<IndicatorDataPoint>\n    # ComputeNextValue(input: IndicatorDataPoint) : decimal\n}\n\nclass RelativeStrengthIndex {\n    - _up : RelativeMovingAverage\n    - _down : RelativeMovingAverage\n    # ComputeNextValue(input: IndicatorDataPoint) : decimal\n}\n\nclass CompositeIndicator {\n    - _indicatorA : IIndicator\n    - _indicatorB : IIndicator\n}\n\nIIndicator <|-- IndicatorBase\nIndicatorBase <|-- IndicatorBaseT\nIndicatorBaseT <|-- SimpleMovingAverage\nIndicatorBaseT <|-- RelativeStrengthIndex\nIndicatorBaseT <|-- CompositeIndicator\n\nCompositeIndicator \"1\" *-- \"2\" IIndicator : aggregates\nSimpleMovingAverage ..> RollingWindow : uses\nRelativeStrengthIndex ..> RelativeMovingAverage : uses\n@enduml\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe LEAN engine's architecture is built upon a set of fundamental abstractions that define the lifecycle and components of a quantitative trading strategy. 
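The Algorithm Framework's staged pipeline (Alpha → Portfolio Construction → Risk → Execution) can be caricatured in a few lines of plain Python. Every name below is a toy stand-in for illustration, not LEAN's API; the point is only the sequential hand-off of insights, targets, and orders:

```python
from dataclasses import dataclass
from typing import List


# Minimal stand-ins for the framework's data carriers (hypothetical shapes).
@dataclass
class Insight:
    symbol: str
    direction: int  # +1 = up, -1 = down


@dataclass
class PortfolioTarget:
    symbol: str
    weight: float


def alpha_model(prices: dict) -> List[Insight]:
    # Toy signal: long anything trading above 100.
    return [Insight(s, 1 if p > 100 else -1) for s, p in prices.items()]


def portfolio_construction(insights: List[Insight]) -> List[PortfolioTarget]:
    # Equal-weight the bullish insights.
    longs = [i for i in insights if i.direction > 0]
    w = 1.0 / len(longs) if longs else 0.0
    return [PortfolioTarget(i.symbol, w) for i in longs]


def risk_management(targets: List[PortfolioTarget],
                    cap: float = 0.5) -> List[PortfolioTarget]:
    # Constrain any single position to the cap.
    return [PortfolioTarget(t.symbol, min(t.weight, cap)) for t in targets]


def execution(targets: List[PortfolioTarget]) -> List[str]:
    # Convert final targets into (string-rendered) orders.
    return [f"MarketOrder({t.symbol}, weight={t.weight:.2f})" for t in targets]


# One OnData step: data flows through the four models in sequence.
orders = execution(risk_management(portfolio_construction(alpha_model(
    {"AAPL": 150.0, "XYZ": 90.0}))))
print(orders)  # ['MarketOrder(AAPL, weight=0.50)']
```

Because each stage consumes only the previous stage's output, any one of them can be swapped without touching the others — the separation of concerns the framework is built around.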
The design philosophy centers on **modularity, extensibility, and event-driven processing**.\n\n### Core Abstractions\nThe system is anchored by three primary financial abstractions:\n\n1.  **`Security`**: Represents a financial instrument (e.g., Equity, Forex, Future). It is the central object for all market-related data and configuration. Crucially, it aggregates all models that define its behavior: `IFeeModel`, `IFillModel`, `ISlippageModel`, and `IBuyingPowerModel`. This aggregation follows the **Strategy Pattern**, allowing for easy customization of trading costs and constraints per asset.\n2.  **`Order`**: An abstract base class for all trade requests (e.g., `MarketOrder`, `LimitOrder`). It encapsulates the intent to trade a specific quantity of a `Security`. The separation of `Order` from its execution logic is key to the system's transaction management.\n3.  **`IAlgorithm` (Implemented by `QCAlgorithm`)**: The main control plane. It manages the algorithm's state, portfolio, and the flow of data. Its lifecycle is strictly defined:\n    *   **Initialization**: The user-defined `Initialize()` method sets up the trading environment (cash, data subscriptions).\n    *   **Execution**: The engine repeatedly calls `OnData(Slice slice)` to push new market data to the algorithm.\n    *   **Termination**: The algorithm can be stopped via external events or reaching the backtest end date.\n\n### Design Philosophy: The Algorithm Framework\nThe engine adopts a **modular pipeline** philosophy, formalized in the Algorithm Framework, which breaks down the complex task of strategy creation into five interchangeable components:\n\n| Component | Interface | Responsibility |\n| :--- | :--- | :--- |\n| **Universe Selection** | `IUniverseSelectionModel` | Determines the set of tradable assets. |\n| **Alpha Model** | `IAlphaModel` | Generates trading signals/predictions (`Insight` objects). 
|\n| **Portfolio Construction** | `IPortfolioConstructionModel` | Translates signals into target portfolio weights. |\n| **Risk Management** | `IRiskManagementModel` | Filters or adjusts orders based on risk constraints. |\n| **Execution** | `IExecutionModel` | Converts final targets into executable `Order` objects. |\n\nThis design promotes **Separation of Concerns** and allows developers to mix and match components, greatly enhancing the platform's flexibility and research capabilities.\n\n#### 3.1.2. Component Interactions\n\nThe system operates on an **event-driven architecture**, where market data drives the entire process. The primary interaction flow is a sequence of transformations, starting with raw data and ending with a trade order.\n\n### Data Flow Sequence\n\n1.  **Data Ingestion and Consolidation**: Raw market data (Ticks, TradeBars) is loaded and passed through `IDataConsolidator` instances (often implicitly created by the `QCAlgorithm.AddSecurity` methods).\n2.  **Algorithm Update**: The engine calls `QCAlgorithm.OnData(Slice slice)`. The `Slice` object contains all new data points for the current time step.\n3.  **Universe Selection**: The `IUniverseSelectionModel` processes the data to determine which assets are currently active and tradable. This results in a dynamic list of `Security` objects.\n4.  **Alpha Generation**: The `IAlphaModel` consumes the data and generates a collection of `Insight` objects, which are predictions about the direction and magnitude of an asset's price movement.\n5.  **Portfolio Construction**: The `IPortfolioConstructionModel` takes the generated `Insight` objects and the current portfolio state, and calculates a set of desired `PortfolioTarget` objects (e.g., \"I want 10% of my portfolio in AAPL\").\n6.  
**Risk Management**: The `IRiskManagementModel` intercepts the proposed `PortfolioTarget` objects and applies any necessary risk constraints (e.g., maximum drawdown, position size limits), potentially modifying or rejecting the targets.\n7.  **Order Execution**: The `IExecutionModel` takes the final, risk-adjusted targets and translates them into concrete `Order` objects (e.g., `MarketOrder` for 100 shares).\n8.  **Transaction Processing**: The `SecurityTransactionManager` receives the `Order` and applies the various models attached to the `Security` (Fee, Fill, Slippage) to simulate the trade execution and update the `SecurityHolding` and `SecurityPortfolioManager` state.\n\n### Key Communication Patterns\n\n*   **Observer Pattern (Indicators)**: Indicators use the `Updated` event to notify dependent indicators of a new value. This allows for complex indicator chains (e.g., RSI of an EMA) to update automatically.\n*   **Dependency Injection (Models)**: The `QCAlgorithm` class uses setter methods (`SetAlpha`, `SetExecution`, etc.) to inject concrete implementations of the strategy interfaces, adhering to the **Inversion of Control** principle.\n*   **Manager-Centric Access**: Core state is managed by dedicated classes (`SecurityManager`, `SecurityPortfolioManager`, `SecurityTransactionManager`) which are exposed as properties on `QCAlgorithm`, centralizing access to the trading environment.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml Overall Architecture\ntitle LEAN Engine Core Architecture\n\npackage Common {\n    [Security]\n    [Order]\n    [IBaseData]\n    [IAlgorithm]\n}\n\npackage Algorithm {\n    [QCAlgorithm]\n    [Alpha Model]\n    [Portfolio Model]\n    [Execution Model]\n}\n\npackage Indicators {\n    [IndicatorBase]\n    [SimpleMovingAverage]\n    [RelativeStrengthIndex]\n}\n\n[QCAlgorithm] ..> [Security] : manages\n[QCAlgorithm] ..> [Order] : creates\n[QCAlgorithm] ..> [IBaseData] : consumes\n[QCAlgorithm] ..> [Alpha Model] : sets\n[QCAlgorithm] ..> [Portfolio Model] : sets\n[QCAlgorithm] ..> [Execution Model] : sets\n\n[Alpha Model] ..> [IndicatorBase] : uses for signals\n[IndicatorBase] ..> [IBaseData] : consumes\n\n[QCAlgorithm] .up.|> [IAlgorithm]\n\n' High-level dependencies\nAlgorithm .up.|> Common : Core Financial Abstractions\nIndicators .up.|> Common : Data Structures (IBaseData)\nAlgorithm .up.> Indicators : Strategy Logic\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe LEAN engine leverages several established design patterns to achieve its modularity, extensibility, and maintainability.\n\n| Pattern | Description | Implementation Example |\n| :--- | :--- | :--- |\n| **Strategy Pattern** | Defines a family of algorithms, encapsulates each one, and makes them interchangeable. | The `Security` class aggregates various strategy interfaces: `IFeeModel`, `IFillModel`, `ISlippageModel`, and `IBuyingPowerModel`. A user can swap a `ConstantFeeModel` for a `PercentageFeeModel` without changing the core `Security` logic. |\n| **Template Method Pattern** | Defines the skeleton of an algorithm in a base class, deferring some steps to subclasses. 
| The `IndicatorBase<T>` abstract class defines the `Update` method's flow (data validation, sample counting, event firing) but delegates the core calculation to the abstract `ComputeNextValue(T input)` method, which must be implemented by every concrete indicator (e.g., `SimpleMovingAverage`). |\n| **Composite Pattern** | Composes objects into tree structures to represent part-whole hierarchies. | The `CompositeAlphaModel` allows multiple `IAlphaModel` instances to be treated as a single model. Similarly, the `CompositeIndicator` allows combining indicators (e.g., `IndicatorA + IndicatorB`). |\n| **Observer Pattern** | Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically. | The `IndicatorBase` class exposes the `Updated` event. This is used extensively for **indicator chaining**, where one indicator (the subject) updates, and a dependent indicator (the observer) automatically receives the new value for its own calculation. |\n| **Factory Method Pattern** | Defines an interface for creating an object, but lets subclasses decide which class to instantiate. | The `Securities.CreateSecurity` method acts as a factory, using the `SecurityType` to determine which concrete `Security` subclass (`Equity`, `Future`, `Option`, etc.) to instantiate. |\n\n#### 3.3.2. Project Highlights\n\nThe LEAN engine's design incorporates several innovative features that contribute to its power and flexibility as a backtesting and live trading platform.\n\n*   **Modular Algorithm Framework**: The separation of the trading strategy into five distinct, interchangeable models (`Alpha`, `Portfolio`, `Execution`, `Risk`) is the most significant highlight. 
This modularity allows for rapid prototyping, component-level testing, and the creation of highly sophisticated, multi-component strategies.\n*   **Extensive Indicator Library and Chaining**: The `Indicators` module, with over 200 implementations, is a robust feature. The use of operator overloading on the `IndicatorBase` class (e.g., `indicatorA + indicatorB`) enables a highly intuitive and concise syntax for creating complex, chained technical analysis signals.\n*   **Data Abstraction and Uniformity**: The use of `IBaseData` and `Slice` ensures that all data types (Ticks, TradeBars, custom data) are handled uniformly by the `QCAlgorithm`. This abstraction simplifies the process of integrating new data sources without modifying the core algorithm logic.\n*   **Python Integration Support**: The design explicitly supports polyglot programming, with partial classes in `QCAlgorithm.Python.cs` dedicated to bridging the C# core with Python algorithms via `Python.Runtime`. This allows a single engine to serve both C# and Python developers, greatly expanding its user base.\n*   **Model-Driven Financial Logic**: By externalizing financial logic (fees, fills, slippage, buying power) into dedicated model interfaces, the engine achieves a high degree of realism and configurability. A user can simulate different brokerage environments simply by injecting different model implementations into the `Security` object.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nBased on the deep analysis of the core modules, the following suggestions are offered to optimize the architecture, improve performance, and enhance code quality.\n\n### Performance Bottlenecks and Optimization\n1.  **Parallelize Algorithm Framework Execution**: The current event-driven model processes the Algorithm Framework pipeline (Alpha, Portfolio, Execution) sequentially within the `OnData` call. For algorithms managing a large universe of securities, this can become a bottleneck. 
The **Alpha Model** and **Portfolio Construction Model** could be executed in parallel across different securities or insights, provided thread-safe data access is ensured. This would require a careful refactoring of the data managers to support concurrent reads.\n2.  **Optimize `RollingWindow` for Memory**: While the `RollingWindow` is efficient, for extremely long backtests or high-frequency data, the memory footprint of hundreds of indicators, each maintaining its own window, can be substantial. Exploring memory-mapped files or a more centralized, shared history service for indicators could reduce memory pressure.\n\n### Architecture Optimization\n1.  **Formalize Dependency Injection (DI)**: The `QCAlgorithm` currently uses direct setter methods (`SetAlpha`, `SetExecution`) to configure the strategy. While functional, adopting a more formal DI container pattern (e.g., using a lightweight IoC container) would make the system configuration more robust, testable, and easier to manage for complex deployments. This would decouple the `QCAlgorithm` from the concrete implementation details of the models.\n2.  **Strongly-Typed Event Handling**: The `OnCommand` method uses the `dynamic` keyword, which bypasses compile-time type checking and can lead to runtime errors. Refactoring this to use a strongly-typed command pattern with a dedicated command dispatcher would significantly improve code quality and maintainability.\n\n#### 3.4.2. Secondary Development Guide\n\nFor developers looking to extend or build upon the LEAN engine, the following path is recommended for code exploration and secondary development:\n\n### 1. 
Code Exploration Starting Point\n*   **Core Abstraction**: Begin by examining `Common/Securities/Security.cs` and `Common/Orders/Order.cs` to understand the fundamental financial objects.\n*   **Control Flow**: Next, study `Algorithm/QCAlgorithm.cs` and `Common/Interfaces/IAlgorithm.cs` to grasp the algorithm's lifecycle, the `Initialize()` setup, and the `OnData(Slice)` event loop.\n\n### 2. Implementing a New Trading Strategy\nThe most effective way to implement a new strategy is by creating custom components for the **Algorithm Framework**:\n\n*   **New Alpha Model**: To generate a new trading signal, create a class that implements `IAlphaModel`. The core logic will reside in the `Update` method, which should return a list of `Insight` objects based on the input data.\n*   **New Risk Management Model**: To enforce a custom risk rule (e.g., a maximum daily loss), implement `IRiskManagementModel`. This model's logic will filter or modify the `PortfolioTarget` objects before they are converted into orders.\n\n### 3. Creating a Custom Technical Indicator\nTo add a new technical indicator to the library:\n\n1.  **Inherit from `IndicatorBase<T>`**: Choose the appropriate input type `T` (e.g., `TradeBar` or `IndicatorDataPoint`).\n2.  **Implement `ComputeNextValue(T input)`**: This is the single, critical method where the indicator's mathematical calculation must be implemented.\n3.  **Chain Dependencies**: If the new indicator depends on others (e.g., a new moving average), use the existing indicator factory methods and the operator overloading feature to easily chain them within the constructor. For example: `_dependency = new SimpleMovingAverage(period).Of(inputData);`.\n\n"
  },
  {
    "path": "thirdparty/README.md",
    "content": "# Third-Party Open-Source Financial Agent Frameworks\n\nThis folder contains the major open-source financial agent frameworks as of November 2025. Below is the list of successfully cloned repositories.\n\n## Successfully Cloned Repositories (20)\n\n### I. Multi-Agent Collaboration Frameworks\n\n#### 1. TradingAgents Series\n- **TradingAgents** - Multi-role professional division of labor (analysts/researchers/traders)\n- **TradingAgents-CN** - Optimized for Chinese users; supports A-shares/HK stocks/domestic LLMs\n\n#### 2. FinRL Ecosystem\n- **FinRL** - The first open-source financial reinforcement learning framework, three-layer architecture\n- **FinRL-Meta** - Market environment library with 300+ real trading environments\n- **ElegantRL** - Lightweight, efficient DRL library; the algorithm engine behind FinRL\n\n#### 3. Academic Research Oriented\n- **FinRobot** - Four-layer architecture with 16 specialized agents\n- **DISC-FinLLM** - Fudan DISC team; simulates an investment research team (NLPCC 2025 award)\n\n### II. LLM + Financial Agent Frameworks\n\n#### 1. General-Purpose Financial LLM Agents\n- **FinGPT** - Financial LLM foundation supporting a variety of downstream tasks\n\n#### 2. Domain-Specific Agents\n- **investor-agent** - MCP protocol server dedicated to investment analysis\n- **agentic-trading** - Google ADK demo showcasing A2A interoperability\n- **FinGenius** - A-share specific, game-theoretic decision mechanism\n\n### III. Quantitative Trading + Agent Integration Frameworks\n\n#### 1. Mature Trading Platforms\n- **vnpy** - Full-featured quant trading framework with modular design\n- **qlib** - Microsoft's AI-oriented quantitative investment platform\n- **backtrader** - Classic backtesting framework with strategy development support\n\n#### 2. Professional Tool Integrations\n- **panda_quantflow** - Visual workflows with node-based orchestration\n- **FinceptTerminal** - CLI tool for technical/fundamental/sentiment analysis\n\n### IV. Foundation Models & Data Frameworks\n\n#### 1. Financial Foundation Models\n- **Kronos** - K-line (candlestick) foundation model pre-trained on 45 exchanges\n- **FinCast-fts** - Time-series forecasting foundation model trained on 20B data points\n\n### V. Notable Projects & Tools\n\n#### 1. Development Toolchain\n- **awesome-quant** - Curated list of quant resources\n- **Lean** - QuantConnect algorithmic trading engine\n\n## Repositories That Failed to Clone (11)\n\nThe following repositories failed to clone (most were not found or no longer exist):\n\n1. **TradingAgents-Lite** - github.com/TauricResearch/TradingAgents-Lite\n2. **HedgeAgents** - github.com/HedgeAgents/HedgeAgents\n3. **FinMem** - github.com/FinMem/FinMem\n4. **FinArena** - github.com/FinArena/FinArena\n5. **FinHEAR** - github.com/FinHEAR/FinHEAR\n6. **PulseReddit** - github.com/PulseReddit/PulseReddit\n7. **mbt_gym** - github.com/mbt_gym/mbt_gym\n8. **Agent-Trading-Arena** - github.com/Agent-Trading-Arena/Agent-Trading-Arena\n9. **AI-Hedge-Fund** - github.com/AI-Hedge-Fund/AI-Hedge-Fund\n10. **OpenBBTerminal** - github.com/OpenBB-finance/OpenBBTerminal (repository too large; clone timed out)\n\n## Notes\n\n- Total clone attempts: 31 repositories\n- Successfully cloned: 20 repositories\n- Failure causes: repository missing, deleted, or renamed (10); repository too large, timed out (1)\n- Clone date: November 30, 2025\n\n## Recommendations\n\nChoose a framework according to your specific needs:\n- **Multi-agent collaboration**: prefer TradingAgents, FinRL, FinRobot\n- **Chinese-market support**: TradingAgents-CN, FinGenius\n- **Quantitative trading**: vnpy, qlib, backtrader\n- **Academic research**: FinRL, DISC-FinLLM, Kronos\n- **Tool integration**: Lean, awesome-quant\n"
  },
  {
    "path": "thirdparty/TradingAgents-CN.md",
    "content": "# TradingAgents-CN - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe project, **TradingAgents-CN**, is a comprehensive, multi-agent LLM financial trading framework. Its structure is highly modular, separating the core business logic (agents, data, graph) from the infrastructure (config, docker) and the user interface (web, cli).\n\nThe root directory `/home/ubuntu/TradingAgents-CN` contains:\n*   `tradingagents/`: The core Python package containing all the business logic. This is the heart of the application.\n*   `web/`: A Streamlit-based web application for user interaction, visualization, and management.\n*   `cli/`: Command-line interface tools for running the agents and managing the system.\n*   `config/`: Contains configuration files and default settings.\n*   `data/`: Storage for logs, cache, and other runtime data.\n*   `docker/`: Docker-related files for containerization and deployment.\n*   `tests/`: Unit and integration tests (excluded from analysis).\n*   `docs/`: Documentation files.\n\nThe core logic resides within the `tradingagents` package, which is further subdivided into critical modules:\n*   `tradingagents/agents`: Implements the various LLM-powered agents (Analyst, Researcher, Manager, Trader) and their associated state and utility classes.\n*   `tradingagents/dataflows`: Manages all data acquisition, caching, and external data provider interfaces, abstracting away the complexity of financial data sources (Tushare, yfinance, etc.).\n*   `tradingagents/config`: Centralized configuration management, including API keys, model pricing, and database connections.\n*   `tradingagents/graph`: Implements the multi-agent orchestration logic, likely using a state machine or graph-based framework like LangGraph to manage the flow of analysis and debate.\n*   `tradingagents/utils`: General utility functions, including logging, stock validation, and tool logging.\n\nThe separation of concerns is clearly enforced, with the `tradingagents` package providing the backend service and `web` and `cli` acting as frontends.\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/TradingAgents-CN/tradingagents/agents`: Core implementation of the multi-agent system, including analysts, researchers, managers, and the trader.\n*   `/home/ubuntu/TradingAgents-CN/tradingagents/dataflows`: Data abstraction layer, handling external data providers, caching, and data completeness checks.\n*   `/home/ubuntu/TradingAgents-CN/tradingagents/config`: Configuration management, model pricing, API key handling, and database connection setup.\n*   `/home/ubuntu/TradingAgents-CN/tradingagents/graph`: Orchestration logic for the multi-agent workflow, defining the state machine and conditional transitions.\n*   `/home/ubuntu/TradingAgents-CN/web`: The Streamlit-based web interface, including UI components, session management, and analysis runners.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## 1. tradingagents/agents (Multi-Agent System Core)\n\n**Core Responsibility**: Implements the various LLM-powered agents that perform financial analysis, debate, risk management, and trading. 
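This module's per-agent pattern — a `create_*` factory that binds a role-specific system prompt plus a tool subset and returns a state-updating node function — can be sketched minimally in Python. The LLM is stubbed and all names below are illustrative, not the project's actual signatures:

```python
from typing import Callable, Dict

# Simplified stand-in for the project's AgentState TypedDict.
AgentState = Dict[str, str]


def create_market_analyst(llm: Callable[[str], str]) -> Callable[[AgentState], AgentState]:
    """Factory mirroring the create_*_analyst pattern: bind a role prompt,
    return a graph node that reads state, calls the LLM, and writes a report."""
    system_prompt = "You are a professional China market analyst. Summarize: {ticker}"

    def node(state: AgentState) -> AgentState:
        # Invoke the (stubbed) LLM and merge the result back into the state,
        # as the real node does with fields like `china_market_report`.
        report = llm(system_prompt.format(ticker=state["ticker"]))
        return {**state, "market_report": report}

    return node


# A stub LLM so the sketch runs without API keys.
def fake_llm(prompt: str) -> str:
    return f"REPORT[{prompt.split(': ')[-1]}]"


analyst = create_market_analyst(fake_llm)
result = analyst({"ticker": "600519.SH"})
print(result["market_report"])  # REPORT[600519.SH]
```

In the real project the node would build a LangChain `ChatPromptTemplate` and invoke a tool-bound chat model, but the contract is the same: state in, enriched state out, which is what lets a graph orchestrator chain agents together.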
It defines the roles and the communication structure for the multi-agent system.\n\n**Key Files**:\n*   `agents/__init__.py`: Exports all agent creation functions and utility classes.\n*   `agents/analysts/`: Contains specialized analysts: `fundamentals_analyst.py`, `market_analyst.py`, `news_analyst.py`, `social_media_analyst.py`.\n*   `agents/researchers/`: Implements the `bull_researcher.py` and `bear_researcher.py` for investment debate.\n*   `agents/managers/`: Contains `research_manager.py` and `risk_manager.py` for high-level coordination.\n*   `agents/trader/trader.py`: The final decision-making agent responsible for generating trading signals.\n*   `agents/utils/agent_states.py`: Defines the core state structures (`AgentState`, `InvestDebateState`, `RiskDebateState`) used by the graph orchestrator.\n*   `agents/utils/toolkit.py`: Defines the `Toolkit` class, which bundles all available data access functions for the agents to use.\n\n**Core Implementation**:\nThe agents are implemented as functions (e.g., `create_china_market_analyst`) that return a node function for a graph-based workflow (likely LangGraph). Each agent function:\n1.  Defines a highly specialized **system message** (prompt) tailored to its role (e.g., \"专业的中国股市分析师\", i.e., \"a professional Chinese stock-market analyst\").\n2.  Selects a subset of **tools** from the global `Toolkit` relevant to its task.\n3.  Constructs a LangChain `ChatPromptTemplate` using `MessagesPlaceholder` to maintain conversation history.\n4.  
The node function takes the `state` (an `AgentState` dictionary) as input, invokes the LLM with the prompt and tools, and updates the state with the result (e.g., `china_market_report`).\n\n**Dependencies**:\n*   **Internal**: Heavily depends on `tradingagents.dataflows.interface` for data access (via the `Toolkit`), `tradingagents.utils.logging_init` for logging, and `tradingagents.agents.utils.agent_states` for state management.\n*   **External**: `langchain_core.prompts`, `langchain_core.messages`, and the underlying LLM library (e.g., `openai`, `dashscope`).\n\n## 2. tradingagents/dataflows (Data Abstraction Layer)\n\n**Core Responsibility**: Provides a unified, cached interface for accessing diverse financial data sources, including market data, news, and fundamentals, with a strong focus on Chinese market data.\n\n**Key Files**:\n*   `dataflows/interface.py`: Defines the public API for data access (e.g., `get_china_stock_data_unified`, `get_YFin_data`).\n*   `dataflows/cache/`: Contains various caching implementations (`file_cache.py`, `db_cache.py`, `integrated.py`, `adaptive.py`). The `IntegratedCacheManager` provides a fallback mechanism (MongoDB/Redis -> File).\n*   `dataflows/providers/china/`: Specific providers for the Chinese market (`tushare.py`, `akshare.py`, `baostock.py`).\n*   `dataflows/providers/us/`: Providers for the US market (`yfinance.py`, `alpha_vantage_common.py`).\n*   `dataflows/data_source_manager.py`: Manages the selection and configuration of different data providers.\n\n**Core Implementation**:\nThe module employs the **Adapter Pattern** to standardize access to disparate data sources. The `interface.py` acts as the Facade, exposing simple functions that internally manage:\n1.  **Caching**: All data retrieval functions first check the `IntegratedCacheManager`.\n2.  **Provider Selection**: For Chinese data, it can switch between Tushare, AKShare, and BaoStock via `switch_china_data_source`.\n3.  
**Data Standardization**: Raw data from providers is converted into a standardized format (often a string or a Pandas DataFrame) before being returned.\n\n**Dependencies**:\n*   **Internal**: `tradingagents.config.database_manager` for database connection details, `tradingagents.utils.logging_manager`.\n*   **External**: `pandas`, `yfinance`, `tushare`, `akshare`, `pymongo` (optional for DB cache).\n\n## 3. tradingagents/config (Configuration and Settings)\n\n**Core Responsibility**: Centralized management of application settings, API keys, LLM model configurations, and token usage tracking.\n\n**Key Files**:\n*   `config/config_manager.py`: The main class `ConfigManager` handles loading and saving configurations from JSON files or MongoDB. **Note**: This file is marked as `DEPRECATED`, suggesting a migration to a service-based approach.\n*   `config/database_manager.py`: Manages the connection to the MongoDB database for persistent storage of configuration and usage data.\n*   `config/usage_models.py`: Defines data classes (`ModelConfig`, `PricingConfig`, `UsageRecord`) for structured configuration data.\n*   `config/providers_config.py`: Contains specific configuration logic for different LLM providers (e.g., DashScope, OpenAI).\n\n**Core Implementation**:\nThe `ConfigManager` uses a **Fallback Strategy**: it attempts to load configuration from environment variables (`.env` file), then from local JSON files, and finally from a MongoDB instance if configured. It includes logic for:\n1.  **API Key Validation**: Specifically for OpenAI keys (`validate_openai_api_key_format`).\n2.  **Token Tracking**: The `TokenTracker` class records LLM usage based on `UsageRecord` objects, allowing for cost calculation using `PricingConfig`.\n\n**Dependencies**:\n*   **Internal**: `tradingagents.utils.logging_manager`.\n*   **External**: `dotenv`, `pymongo` (optional), `dataclasses`.\n\n## 4. 
tradingagents/graph (Agent Orchestration)\n\n**Core Responsibility**: Defines the state machine and conditional logic that orchestrates the flow of analysis and decision-making among the various agents.\n\n**Key Files**:\n*   `graph/trading_graph.py`: The main class `TradingAgentsGraph` that constructs and compiles the workflow graph.\n*   `graph/conditional_logic.py`: Contains the crucial conditional functions (e.g., `should_continue_market`, `should_continue_debate`) that determine the next node in the graph based on the current `AgentState`.\n*   `graph/setup.py`: Handles the initialization and setup of the graph, including adding nodes and edges.\n*   `graph/propagation.py`: Defines the `Propagator` class, which is responsible for passing the final analysis results to the `ResearchManager`.\n\n**Core Implementation**:\nThe module uses the **State Machine Pattern** (likely implemented with LangGraph) to manage the complex multi-step process:\n1.  **Analysis Phase**: Parallel execution of analysts (Market, News, Social, Fundamentals).\n2.  **Debate Phase**: Sequential, turn-based interaction between `BullResearcher` and `BearResearcher` (controlled by `should_continue_debate`).\n3.  **Management Phase**: `ResearchManager` synthesizes the debate and analysis reports.\n4.  **Risk Phase**: `RiskManager` and debators (`RiskyDebator`, `SafeDebator`) discuss the final recommendation.\n5.  **Trading Phase**: `Trader` makes the final decision.\n\nThe `ConditionalLogic` class is critical for preventing infinite loops in the LLM tool-calling process by checking tool call counts and report completeness.\n\n**Dependencies**:\n*   **Internal**: `tradingagents.agents.utils.agent_states`, all agent creation functions from `tradingagents.agents`.\n*   **External**: `langgraph`.\n\n## 5. 
web (Web Interface)\n\n**Core Responsibility**: Provides a user-friendly Streamlit web interface for configuring the system, running analyses, and viewing results and logs.\n\n**Key Files**:\n*   `web/app.py`: The main entry point for the Streamlit application.\n*   `web/run_web.py`: Script to launch the web application.\n*   `web/components/`: Contains reusable Streamlit components (`analysis_form.py`, `analysis_results.py`, `sidebar.py`).\n*   `web/utils/analysis_runner.py`: Handles the asynchronous execution of the `TradingAgentsGraph` workflow.\n*   `web/utils/auth_manager.py`: Manages user authentication and session state.\n*   `web/utils/progress_tracker.py`: Tracks the progress of the long-running analysis workflow and updates the UI.\n\n**Core Implementation**:\nThe web interface is built on Streamlit, leveraging its reactive nature and component model. The `analysis_runner.py` is key, as it encapsulates the execution of the core business logic (`TradingAgentsGraph`) in a separate thread or process to prevent the UI from blocking. Session management and persistence are handled by classes like `FileSessionManager` and `RedisSessionManager` (in `web/utils/`).\n\n**Dependencies**:\n*   **Internal**: `tradingagents.config.config_manager`, `tradingagents.graph.trading_graph`.\n*   **External**: `streamlit`, `redis` (optional).\n\n### Module PlantUML Diagrams\n\n## 1. tradingagents/agents\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"Agents Module\" {\n    abstract class BaseAgent {\n        + llm\n        + toolkit\n        + create_node()\n    }\n\n    class Toolkit {\n        + get_china_stock_data()\n        + get_YFin_data()\n        + get_finnhub_news()\n        + ... (many data access methods)\n    }\n\n    class AgentState <<TypedDict>> {\n        + messages: list\n        + company_of_interest: str\n        + trade_date: str\n        + market_report: str\n        + sentiment_report: str\n        + ... 
(all reports)\n    }\n\n    class ChinaMarketAnalyst\n    class FundamentalsAnalyst\n    class NewsAnalyst\n    class SocialMediaAnalyst\n\n    class BullResearcher\n    class BearResearcher\n\n    class ResearchManager\n    class RiskManager\n\n    class Trader\n\n    BaseAgent <|-- ChinaMarketAnalyst\n    BaseAgent <|-- FundamentalsAnalyst\n    BaseAgent <|-- NewsAnalyst\n    BaseAgent <|-- SocialMediaAnalyst\n    BaseAgent <|-- BullResearcher\n    BaseAgent <|-- BearResearcher\n    BaseAgent <|-- ResearchManager\n    BaseAgent <|-- RiskManager\n    BaseAgent <|-- Trader\n\n    ChinaMarketAnalyst ..> Toolkit : uses\n    FundamentalsAnalyst ..> Toolkit : uses\n    ResearchManager ..> AgentState : reads/writes\n    Trader ..> AgentState : reads/writes\n}\n@enduml\n```\n\n## 2. tradingagents/dataflows\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"Dataflows Module\" {\n    interface DataInterface {\n        + get_china_stock_data_unified()\n        + get_YFin_data()\n        + get_finnhub_news()\n        + ...\n    }\n\n    abstract class BaseStockDataProvider {\n        + get_stock_data()\n        + get_fundamentals()\n    }\n\n    package \"Providers\" {\n        class TushareProvider\n        class AKShareProvider\n        class YFinanceUtils\n        class AlphaVantageCommon\n    }\n\n    package \"Cache\" {\n        class StockDataCache <<File Cache>>\n        class DatabaseCacheManager\n        class IntegratedCacheManager {\n            - primary_backend\n            - fallback_backend\n            + get_data()\n            + set_data()\n        }\n    }\n\n    class DataSourceManager {\n        + switch_china_data_source()\n        + get_provider()\n    }\n\n    DataInterface .> IntegratedCacheManager : uses\n    DataInterface .> DataSourceManager : uses\n\n    BaseStockDataProvider <|-- TushareProvider\n    BaseStockDataProvider <|-- AKShareProvider\n    BaseStockDataProvider <|-- YFinanceUtils\n\n    
IntegratedCacheManager o-- StockDataCache : fallback\n    IntegratedCacheManager o-- DatabaseCacheManager : primary\n}\n@enduml\n```\n\n## 3. tradingagents/config\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"Config Module\" {\n    class ConfigManager <<Deprecated>> {\n        - models_file: Path\n        - pricing_file: Path\n        - mongodb_storage: MongoDBStorage\n        + load_models()\n        + get_api_key()\n        + validate_openai_api_key_format()\n    }\n\n    class TokenTracker {\n        + record_usage(record: UsageRecord)\n        + calculate_cost()\n    }\n\n    class DatabaseManager {\n        + get_client()\n        + get_database()\n    }\n\n    class ModelConfig <<DataClass>>\n    class PricingConfig <<DataClass>>\n    class UsageRecord <<DataClass>>\n\n    ConfigManager .> DatabaseManager : uses\n    TokenTracker .> PricingConfig : uses\n    TokenTracker .> UsageRecord : stores\n}\n@enduml\n```\n\n## 4. tradingagents/graph\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"Graph Module\" {\n    class TradingAgentsGraph {\n        + build_graph()\n        + compile()\n    }\n\n    class ConditionalLogic {\n        + should_continue_market()\n        + should_continue_debate()\n        + should_continue_risk_analysis()\n    }\n\n    class GraphSetup {\n        + add_nodes()\n        + add_edges()\n    }\n\n    class Propagator {\n        + propagate_analysis_results()\n    }\n\n    class AgentState <<from agents>>\n\n    TradingAgentsGraph o-- ConditionalLogic : uses\n    TradingAgentsGraph o-- GraphSetup : uses\n    ConditionalLogic .> AgentState : reads\n    Propagator .> AgentState : writes\n}\n@enduml\n```\n\n## 5. 
web\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"Web Module\" {\n    class StreamlitApp <<Main App>>\n    class AnalysisRunner {\n        + run_analysis_async()\n        - execute_graph(graph)\n    }\n\n    class ProgressTracker {\n        + update_progress()\n        + get_status()\n    }\n\n    class AuthManager {\n        + login()\n        + is_logged_in()\n        + logout()\n    }\n\n    class FileSessionManager\n    class RedisSessionManager\n\n    StreamlitApp o-- AnalysisRunner : triggers\n    AnalysisRunner .> ProgressTracker : updates\n    StreamlitApp o-- AuthManager : uses\n    AuthManager .> FileSessionManager : uses\n    AuthManager .> RedisSessionManager : uses\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe **TradingAgents-CN** project is built upon a robust **Multi-Agent System (MAS) architecture**, leveraging the **LangGraph** framework for orchestration. The core design philosophy is centered on **Role-Based Specialization** and **Data Abstraction**, mirroring the structure of a professional financial research firm.\n\n**Core Abstractions**:\n1.  **Agent (Node)**: Each agent (e.g., `ChinaMarketAnalyst`, `BullResearcher`, `Trader`) is an independent, specialized component implemented as a LangGraph node. This enforces the **Single Responsibility Principle (SRP)**, making each agent's prompt and toolset highly focused.\n2.  **State (`AgentState`)**: A central, mutable data structure (`TypedDict`) that holds the collective knowledge and progress of the entire workflow. It acts as the shared memory and communication channel between all agents. This is a critical abstraction for the state machine pattern.\n3.  **Toolkit**: A collection of standardized data access functions that are passed to the agents. This abstraction decouples the agent's reasoning logic from the complexity of data retrieval and caching.\n4.  
**Data Provider/Interface**: The `dataflows` module abstracts external financial APIs (Tushare, yfinance) behind a unified interface, allowing the core system to be agnostic to the underlying data source.\n\n**Design Philosophy**:\n*   **Modularity and Decoupling**: The system is cleanly separated into five major modules (`agents`, `dataflows`, `config`, `graph`, `web`), minimizing cross-module dependencies and facilitating maintenance and scaling.\n*   **State Machine Orchestration**: The use of LangGraph in the `graph` module provides a deterministic, traceable, and conditional execution flow, which is essential for complex, multi-step decision-making processes like financial analysis and debate.\n*   **Chinese Market Focus**: The project is explicitly tailored for the Chinese market, with specialized agents (`ChinaMarketAnalyst`) and data providers (`TushareProvider`, `AKShareProvider`), which is a key differentiator from its original counterpart.\n*   **Caching and Fallback**: The `IntegratedCacheManager` in `dataflows` ensures performance and resilience by providing multiple layers of caching (MongoDB/Redis, File) and a graceful fallback mechanism when primary data sources fail.\n\n**Lifecycle Management**:\nThe lifecycle of a single analysis run is managed by the `TradingAgentsGraph`:\n1.  **Initialization**: The `AgentState` is initialized with the target stock and date.\n2.  **Parallel Analysis**: Multiple analysts run concurrently, fetching data via the `Toolkit` and writing their reports to the `AgentState`.\n3.  **Sequential Debate**: The `BullResearcher` and `BearResearcher` engage in a turn-based debate, with the `ConditionalLogic` controlling the flow until a consensus or maximum rounds are reached.\n4.  **Synthesis and Decision**: The `ResearchManager` and `RiskManager` synthesize the reports and debate outcomes, leading to the final trading decision by the `Trader`.\n5.  
**Termination**: The graph terminates, and the final decision is propagated to the `web` module for display.\n\n#### 3.1.2. Component Interactions\n\nThe system's communication is primarily mediated by the central **AgentState** object, following a **Shared State/Blackboard Pattern**.\n\n**Key Interaction Flows**:\n\n1.  **Agent-to-Data Interaction (Data Flow)**:\n    *   **Agent**: An agent (e.g., `FundamentalsAnalyst`) needs data.\n    *   **Communication**: The agent calls a function from the injected `Toolkit` (e.g., `toolkit.get_china_stock_data`).\n    *   **Data Flow**: `Toolkit` -> `dataflows.interface` -> `IntegratedCacheManager` (Cache Check) -> `DataSourceManager` (Provider Selection) -> External API (e.g., Tushare).\n    *   **Response**: The data is returned to the agent, which then uses it to generate its report.\n\n2.  **Agent-to-Agent Interaction (Orchestration Flow)**:\n    *   **Communication**: Agents communicate indirectly by reading from and writing to the shared `AgentState`.\n    *   **Example (Debate)**:\n        *   `BullResearcher` reads the initial `AgentState` and writes its bullish argument to the state's message history.\n        *   The `ConditionalLogic` checks the state and routes the flow to the `BearResearcher`.\n        *   `BearResearcher` reads the state (including the bullish argument) and writes its bearish counter-argument.\n        *   This loop continues until the `ConditionalLogic` determines the debate is complete, then routes to the `ResearchManager`.\n\n3.  
**Frontend-to-Backend Interaction**:\n    *   **Frontend (`web` module)**: The user submits a request via the Streamlit UI.\n    *   **Communication**: The `AnalysisRunner` in the `web` module asynchronously executes the `TradingAgentsGraph`.\n    *   **Data Flow**: The `ProgressTracker` (in `web/utils`) monitors the graph's execution state and updates the UI in real time, providing feedback to the user.\n\n**Communication Patterns**:\n*   **Shared State (Blackboard)**: The primary pattern for agent-to-agent communication, enabling asynchronous and decoupled interactions.\n*   **Facade/Adapter**: Used in the `dataflows` module to simplify and standardize access to complex external data sources.\n*   **Tool-Use/Function Calling**: The LLM agents use the `Toolkit` functions via the LLM's function-calling capability, allowing the model to decide when and how to retrieve data.\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam handwritten true\nskinparam classAttributeIconVisible false\n\ntitle TradingAgents-CN Overall Architecture\n\nrectangle \"External Financial APIs\" as APIs\n\npackage \"tradingagents\" {\n    package \"Dataflows (Data Abstraction)\" as Dataflows {\n        interface DataInterface\n        class IntegratedCacheManager\n        class DataSourceManager\n        DataInterface .> IntegratedCacheManager\n        DataInterface .> DataSourceManager\n    }\n\n    package \"Config (Settings & Pricing)\" as Config {\n        class ConfigManager\n        class TokenTracker\n    }\n\n    package \"Agents (LLM Nodes)\" as Agents {\n        class Toolkit\n        class Analysts\n        class Researchers\n        class Managers\n        class Trader\n        Toolkit .> Dataflows : uses\n    }\n\n    package \"Graph (Orchestration)\" as Graph {\n        class TradingAgentsGraph\n        class ConditionalLogic\n        class AgentState <<Shared State>>\n        TradingAgentsGraph o-- ConditionalLogic\n        
TradingAgentsGraph o-- AgentState\n        Analysts -> AgentState : writes report\n        Researchers -> AgentState : writes debate\n        Managers -> AgentState : reads/writes\n        Trader -> AgentState : reads/writes\n    }\n\n    Dataflows .> APIs : fetches data\n    Agents .> Toolkit : calls tools\n    Agents .> Config : reads LLM settings\n    Graph .> Agents : orchestrates\n}\n\npackage \"Web (Streamlit Frontend)\" as Web {\n    class StreamlitApp\n    class AnalysisRunner\n    StreamlitApp -> AnalysisRunner : triggers\n    AnalysisRunner -> Graph : executes\n}\n\nWeb .> Graph : monitors progress\nConfig .> APIs : stores API keys\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase effectively utilizes several established design patterns to manage complexity and promote maintainability.\n\n| Design Pattern | Description | Specific Implementation in TradingAgents-CN |\n| :--- | :--- | :--- |\n| **State Machine** | An object whose behavior is determined by its state, transitioning between states based on inputs. | Implemented using **LangGraph** in `tradingagents/graph`. The `TradingAgentsGraph` defines the nodes (agents) and edges (transitions), with `ConditionalLogic` handling the state-based routing (e.g., continuing a debate or moving to the next phase). |\n| **Adapter** | Converts the interface of a class into another interface clients expect. | The `dataflows` module uses this pattern extensively. `TushareProvider`, `AKShareProvider`, and `YFinanceUtils` all adapt their respective external APIs to conform to the internal `BaseStockDataProvider` interface. |\n| **Facade** | Provides a unified interface to a set of interfaces in a subsystem. | `dataflows/interface.py` acts as a facade, providing simple, high-level functions (e.g., `get_china_stock_data_unified`) that hide the complexity of caching, provider selection, and API calls. 
|\n| **Strategy** | Defines a family of algorithms, encapsulates each one, and makes them interchangeable. | The caching system in `dataflows/cache` implements this. The `IntegratedCacheManager` can switch between `FileCache`, `DatabaseCacheManager`, and `AdaptiveCacheSystem` based on configuration (`TA_CACHE_STRATEGY`). |\n| **Singleton** | Ensures a class has only one instance and provides a global point of access to it. | Implicitly used for core managers like `ConfigManager` and `DataSourceManager`, which are often initialized once and accessed globally via helper functions (e.g., `get_data_source_manager()`). |\n\n#### 3.3.2. Project Highlights\n\nThe project demonstrates several innovative and well-designed features that enhance its functionality, extensibility, and flexibility.\n\n*   **Advanced Multi-Agent Orchestration**: The use of LangGraph for a complex financial workflow is a significant highlight. The system goes beyond simple sequential processing by incorporating **parallel analysis**, **turn-based debate** (`BullResearcher` vs. `BearResearcher`), and **conditional routing** based on the state. This sophisticated orchestration allows for more nuanced and robust decision-making.\n*   **Comprehensive Data Abstraction and Resilience**: The `dataflows` module is exceptionally well-structured.\n    *   **Unified Interface**: Standardizes access across diverse data sources (China, US, HK).\n    *   **Integrated Caching**: The multi-layered caching system (`IntegratedCacheManager`) with file and database fallbacks ensures high performance and resilience against temporary API outages.\n    *   **Chinese Market Specialization**: The inclusion of multiple Chinese data providers (Tushare, AKShare, BaoStock) and the ability to switch between them is crucial for a China-focused application.\n*   **Extensible LLM Provider Support**: The `llm_adapters` module is designed for easy integration of new LLMs (DashScope, DeepSeek, Google, OpenAI). 
The use of a shared OpenAI-compatible base class (`openai_compatible_base.py`) ensures that new providers can be added with minimal changes to the core agent logic.\n*   **Detailed Configuration and Cost Tracking**: The `config` module's ability to track token usage and calculate costs based on model-specific pricing (`PricingConfig`) is a valuable feature for managing the operational expenses of an LLM-intensive application.\n*   **User-Friendly Web Interface**: The Streamlit-based `web` module provides a clean, interactive front-end for running analyses, which is essential for usability. The `AnalysisRunner` and `ProgressTracker` effectively handle the long-running nature of the agent workflow, providing a good user experience.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe architecture is robust, but several areas can be optimized for performance, maintainability, and future-proofing.\n\n## Performance Bottlenecks and Optimization\n1.  **Graph Execution Time**: The sequential nature of the debate phase (`BullResearcher` vs. `BearResearcher`) and the risk discussion phase is a major time sink, as each turn requires a full LLM call.\n    *   **Suggestion**: Implement a **\"Fast-Track\" Conditional Logic** where the debate is skipped or shortened if the initial analysis reports (Market, Fundamentals, News) are highly consistent or if the stock is deemed low-risk early on.\n2.  **Deprecated Config Manager**: The `ConfigManager` is marked as deprecated, but its usage persists. This creates technical debt and potential for configuration conflicts (JSON vs. MongoDB).\n    *   **Suggestion**: Complete the migration to the new service-based configuration system (`app.services.config_service.ConfigService`) and fully remove the deprecated `tradingagents/config` module to simplify the configuration loading process and centralize state management.\n3.  
**Data Serialization Overhead**: The `AgentState` is a large, shared object that is passed between all nodes. If the underlying LangGraph implementation serializes and deserializes this state for each transition, the overhead can become significant, especially with large reports.\n    *   **Suggestion**: Investigate using a more efficient, in-memory state management solution (e.g., Redis) for the `AgentState` during a single graph run, or use LangGraph's built-in memory management features to optimize state persistence.\n\n## Architecture Optimization and Code Quality\n1.  **Toolkit Decoupling**: The `Toolkit` class is a monolithic collection of all data access functions. While convenient, it violates the **Interface Segregation Principle (ISP)**, as every agent receives a massive toolkit, most of which it does not need.\n    *   **Suggestion**: Refactor the `Toolkit` into smaller, role-specific interfaces (e.g., `MarketDataToolkit`, `NewsDataToolkit`, `FundamentalsToolkit`). Each agent should only be injected with the specific toolkit it requires.\n2.  **Explicit Data Models**: The system heavily relies on passing strings and dictionaries (e.g., reports, dataframes serialized as strings) within the `AgentState`. This is error-prone and lacks type safety.\n    *   **Suggestion**: Introduce explicit Pydantic models for all data structures passed between agents (e.g., `MarketReportModel`, `DebateSummaryModel`). This will improve code quality, enable better validation, and simplify the logic in the `ResearchManager` and `RiskManager`.\n3.  **Unified Logging**: While a unified logging system exists, the conditional logic functions in `graph/conditional_logic.py` are heavily polluted with detailed logging for debugging deadlocks.\n    *   **Suggestion**: Abstract the deadlock detection and logging into a dedicated `GraphMonitor` or `DebugMiddleware` class, keeping the `ConditionalLogic` clean and focused solely on the transition logic.\n\n#### 3.4.2. 
Secondary Development Guide\n\nThis guide outlines the best path for developers looking to explore the codebase or extend its functionality.\n\n## 1. Code Exploration Path\nThe project's complexity is best understood by following the execution flow from the entry point to the core logic.\n\n| Step | Module to Explore | Key Files | Focus |\n| :--- | :--- | :--- | :--- |\n| **1. Entry Point** | `web` or `cli` | `web/app.py`, `cli/main.py` | Understand how the application starts and how user input is collected. |\n| **2. Orchestration** | `graph` | `graph/trading_graph.py`, `graph/conditional_logic.py` | This is the **most critical** module. Trace the `TradingAgentsGraph` definition to see the sequence of analysis, debate, and decision-making. |\n| **3. Agent Logic** | `agents` | `agents/analysts/`, `agents/researchers/` | Select an agent (e.g., `fundamentals_analyst.py`) to see how the system prompt is constructed, which tools are used, and how the report is generated. |\n| **4. Data Access** | `dataflows` | `dataflows/interface.py`, `dataflows/cache/integrated.py` | Examine the `Toolkit` and trace a data call (e.g., `get_china_stock_data_unified`) to understand the caching and provider abstraction layers. |\n| **5. Configuration** | `config` | `config/config_manager.py`, `config/usage_models.py` | Understand how API keys and model pricing are loaded and managed. |\n\n## 2. Extending Functionality\n\n### A. Adding a New Agent Role\n1.  **Create Agent File**: Create a new file in `tradingagents/agents/` (e.g., `macro_analyst.py`).\n2.  **Define Logic**: Implement a function (e.g., `create_macro_analyst`) that defines the system prompt, selects necessary tools from the `Toolkit`, and returns a LangGraph node function.\n3.  **Update Graph**: Modify `tradingagents/graph/trading_graph.py` to add the new agent as a node and define its edges (e.g., run it in parallel with other analysts).\n4.  
**Update State**: If the new agent generates a new type of report, update the `AgentState` in `tradingagents/agents/utils/agent_states.py` to include a field for the new report.\n\n### B. Integrating a New Data Source\n1.  **Create Provider**: Create a new file in `tradingagents/dataflows/providers/` (e.g., `europe/eurostat_provider.py`) that inherits from `BaseStockDataProvider` and implements the required data retrieval methods.\n2.  **Update Manager**: Modify `dataflows/data_source_manager.py` to register the new provider.\n3.  **Update Interface**: Add a new function to `dataflows/interface.py` to expose the new data source via the unified interface.\n4.  **Update Toolkit**: Add the new interface function to the `Toolkit` class in `tradingagents/agents/utils/toolkit.py` so agents can access it.\n\n### C. Modifying the Workflow\nTo change the decision-making process (e.g., adding a pre-screening step), modify the `TradingAgentsGraph` in `tradingagents/graph/trading_graph.py` and adjust the transition logic in `tradingagents/graph/conditional_logic.py`. This is the central control point for the entire system.\n\n"
  },
  {
    "path": "thirdparty/TradingAgents.md",
    "content": "# TradingAgents - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe project is structured to support a multi-agent system for financial trading, built primarily on Python and the LangGraph framework. The core logic is cleanly separated into three main components: `agents`, `dataflows`, and `graph`, reflecting a clear separation of concerns between the cognitive layer, the data access layer, and the orchestration layer.\n\n```\n/home/ubuntu/TradingAgents\n|-- cli/\n|   |-- main.py             # Command-line interface entry point for running the agent system.\n|   |-- models.py           # Data models for CLI arguments (e.g., Pydantic models).\n|   |-- utils.py            # Utility functions for the CLI.\n|-- main.py                 # Main entry point for the core application logic (likely for non-CLI use).\n|-- tradingagents/          # Core package for the multi-agent framework\n|   |-- agents/             # The \"Brain\" - Defines all LLM-powered agents and their logic.\n|   |   |-- analysts/       # Specialized agents for data analysis (Market, News, Fundamentals).\n|   |   |-- managers/       # Agents responsible for managing the debate flow (Research, Risk).\n|   |   |-- researchers/    # Agents for the investment debate (Bull, Bear).\n|   |   |-- risk_mgmt/      # Agents for the risk debate (Risky, Safe, Neutral).\n|   |   |-- trader/         # The final decision-making agent.\n|   |   |-- utils/          # Agent-specific utilities, state definitions, and memory (ChromaDB).\n|   |-- dataflows/          # The \"Data Layer\" - Handles all external API interactions and data fetching.\n|   |   |-- alpha_vantage/  # Specific implementations for Alpha Vantage data access.\n|   |   |-- y_finance.py    # Implementation for Yahoo Finance data access.\n|   |   |-- interface.py    # Core abstraction layer for routing data requests to vendors.\n|   |   |-- config.py       # Configuration 
management for data vendors.\n|   |-- graph/              # The \"Orchestration Layer\" - Implements the multi-agent workflow using LangGraph.\n|   |   |-- conditional_logic.py # Defines state-based routing logic for the graph.\n|   |   |-- propagation.py  # Handles state initialization and passing data between nodes.\n|   |   |-- reflection.py   # Logic for post-trade reflection and memory update.\n|   |   |-- trading_graph.py# Main class to build and run the LangGraph state machine.\n|   |-- default_config.py   # Default settings for the entire framework.\n```\n\nThe `tradingagents` directory is the heart of the project, containing the three main modules: `agents` (the cognitive layer), `dataflows` (the data layer), and `graph` (the orchestration layer). The `cli` directory provides a command-line interface for interacting with the core framework. This structure is highly modular, facilitating maintenance and extension.\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/TradingAgents/tradingagents/agents`: Contains the implementation of all specialized LLM agents, including analysts, researchers, risk debators, and the final trader. This module is responsible for the decision-making logic and memory management.\n*   `/home/ubuntu/TradingAgents/tradingagents/dataflows`: Contains the data abstraction layer, which handles fetching financial and news data from various external vendors (e.g., Alpha Vantage, Yahoo Finance). It provides a unified toolset for the agents.\n*   `/home/ubuntu/TradingAgents/tradingagents/graph`: Contains the LangGraph-based orchestration logic, defining the state machine, conditional transitions, and the overall flow of the multi-agent system.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n### Module: `tradingagents/agents`\n**Core Responsibility:** To implement the specialized roles of the multi-agent system, process reports, engage in structured debates, and make final trading decisions. 
It acts as the **cognitive layer** of the framework.\n\n**File Enumeration:**\n*   `agents/utils/agent_states.py`: Defines `AgentState` (the global state), `InvestDebateState`, and `RiskDebateState` using `TypedDict` for LangGraph.\n*   `agents/utils/memory.py`: Implements `FinancialSituationMemory` using `chromadb` for vector-based retrieval of past trading lessons.\n*   `agents/analysts/*`: Implements four key analysts (`fundamentals`, `market`, `news`, `social_media`) using LangChain's `ChatPromptTemplate` and tool-binding to generate comprehensive reports.\n*   `agents/researchers/*`: Implements the `bull_researcher` and `bear_researcher` for the investment debate.\n*   `agents/risk_mgmt/*`: Implements the three risk debators (`aggresive_debator`, `conservative_debator`, `neutral_debator`).\n*   `agents/trader/trader.py`: Implements the final `trader_node` which uses the accumulated reports and memory to output a `FINAL TRANSACTION PROPOSAL: **BUY/HOLD/SELL**`.\n\n**Implementation Details:**\nEach agent is implemented as a function (e.g., `create_fundamentals_analyst(llm)`) that returns a LangGraph node function. This node function takes the current `state` as input, uses an LLM with a highly specialized system prompt and bound tools, and returns an updated `state` dictionary. The use of `FinancialSituationMemory` is a key feature, allowing agents to learn from past, similar market situations via a Retrieval-Augmented Generation (RAG) approach. The agents' system prompts are highly detailed, guiding the LLM to act in a specific role (e.g., \"You are a Bull Analyst advocating for investing in the stock\") and to use the provided data reports to build their arguments.\n\n### Module: `tradingagents/dataflows`\n**Core Responsibility:** To abstract and manage all external data sources, providing a consistent, tool-friendly interface for the agents. 
It is the **data access layer**.\n\n**File Enumeration:**\n*   `dataflows/interface.py`: Contains the crucial `route_to_vendor` function, which dynamically selects the correct data fetching function based on the configured vendor for a given data type (e.g., `core_stock_apis`, `fundamental_data`).\n*   `dataflows/config.py`: Manages the runtime configuration, allowing users to switch data vendors without changing agent code.\n*   `dataflows/alpha_vantage_common.py`: Provides low-level utilities like `_make_api_request` and error handling (`AlphaVantageRateLimitError`).\n*   `dataflows/alpha_vantage_*.py`: Specific implementations for fetching stock, indicator, fundamental, and news data from Alpha Vantage.\n\n**Implementation Details:**\nThe module employs a **Strategy Pattern** via the `route_to_vendor` function. This decouples the data request from the data source implementation. For example, `get_stock_data` in `agents/utils/core_stock_tools.py` calls `route_to_vendor(\"get_stock_data\", ...)` which then routes the call to the appropriate vendor-specific function (e.g., `alpha_vantage_stock.get_stock`). This design ensures the agent logic remains clean and vendor-agnostic. The Alpha Vantage implementations handle API key retrieval, request formatting, and response parsing (including CSV filtering via Pandas).\n\n### Module: `tradingagents/graph`\n**Core Responsibility:** To define the sequential and conditional workflow of the multi-agent system, ensuring a structured, multi-step decision-making process. 
It is the **orchestration layer**.\n\n**File Enumeration:**\n*   `graph/trading_graph.py`: The main class `TradingAgentsGraph` that builds the LangGraph `StateGraph`.\n*   `graph/conditional_logic.py`: Implements the conditional routing functions (e.g., `should_continue_debate`, `should_continue_risk_analysis`) that determine the next node in the graph based on the current state (e.g., number of debate rounds completed, presence of tool calls).\n*   `graph/propagation.py`: Contains `Propagator` class for state initialization (`create_initial_state`) and managing graph arguments.\n*   `graph/reflection.py`: Contains `Reflector` class for post-trade analysis, generating reflections using an LLM, and updating the `FinancialSituationMemory`.\n\n**Implementation Details:**\nThe system uses a **State Machine** architecture implemented with LangGraph. The flow is highly structured: it begins with a research phase (tool-call loops), moves to a two-party investment debate (Bull/Bear), then to a Trader decision, followed by a three-party risk debate (Risky/Safe/Neutral), and concludes with a final decision and a reflection phase to update the memory. 
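
The turn-based debate phases are governed by small routing functions over the shared state. The sketch below is a minimal, stdlib-only illustration of that pattern; the field names mirror those mentioned in this analysis (`count`, `latest_speaker`), but the node labels, round limit, and logic are simplified assumptions, not the project's exact code:

```python
# Illustrative sketch (not the project's exact code) of a fixed-round,
# turn-based debate router driven by a 'count' field and the last speaker.

def make_debate_router(max_rounds):
    def route(state):
        debate = state['investment_debate_state']
        # One round = one Bull turn plus one Bear turn, tracked by 'count'.
        if debate['count'] >= 2 * max_rounds:
            return 'research_manager'  # debate over; hand off to the judge
        # Alternate turns: whoever did not speak last goes next.
        return 'bear' if debate['latest_speaker'] == 'bull' else 'bull'
    return route

# Simulate one full round:
route = make_debate_router(max_rounds=1)
state = {'investment_debate_state': {'count': 0, 'latest_speaker': ''}}
turns = []
while (speaker := route(state)) != 'research_manager':
    turns.append(speaker)
    state['investment_debate_state']['count'] += 1
    state['investment_debate_state']['latest_speaker'] = speaker

print(turns)  # ['bull', 'bear']
```

With `max_rounds=2` the same router yields bull, bear, bull, bear before handing control to the manager node.
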
The `ConditionalLogic` is critical, using counters (`count`) and the last speaker (`latest_speaker`) to manage the turn-based, fixed-round debates.\n\n### Module PlantUML Diagrams\n\n### Module: `tradingagents/agents`\n\n```plantuml\n@startuml\nskinparam linetype polyline\nskinparam linetype ortho\n\npackage \"Agents Module\" {\n    \n    class AgentState <<TypedDict>> {\n        + company_of_interest: str\n        + trade_date: str\n        + market_report: str\n        + fundamentals_report: str\n        + investment_debate_state: InvestDebateState\n        + risk_debate_state: RiskDebateState\n        ...\n    }\n    \n    class InvestDebateState <<TypedDict>> {\n        + history: str\n        + count: int\n        ...\n    }\n    \n    class RiskDebateState <<TypedDict>> {\n        + history: str\n        + latest_speaker: str\n        + count: int\n        ...\n    }\n    \n    class FinancialSituationMemory <<ChromaDB/OpenAI>> {\n        + __init__(name, config)\n        + get_embedding(text)\n        + add_situations(situations_and_advice)\n        + get_memories(current_situation)\n    }\n    \n    abstract class AnalystAgent {\n        + create_agent(llm)\n        --\n        - system_message: str\n        - tools: list\n    }\n    \n    AnalystAgent <|-- FundamentalsAnalyst\n    AnalystAgent <|-- MarketAnalyst\n    AnalystAgent <|-- NewsAnalyst\n    AnalystAgent <|-- SocialMediaAnalyst\n    \n    abstract class DebatorAgent {\n        + create_debator(llm, memory)\n        --\n        - prompt: str (contextual)\n    }\n    \n    DebatorAgent <|-- BullResearcher\n    DebatorAgent <|-- BearResearcher\n    DebatorAgent <|-- RiskyAnalyst\n    DebatorAgent <|-- SafeAnalyst\n    DebatorAgent <|-- NeutralAnalyst\n    \n    class Trader {\n        + create_trader(llm, memory)\n    }\n    \n    ' Relationships\n    AnalystAgent ..> AgentState : updates reports\n    DebatorAgent ..> AgentState : reads reports, updates debate state\n    Trader ..> AgentState : 
reads reports, updates plan\n    \n    BullResearcher ..> FinancialSituationMemory : uses memory\n    BearResearcher ..> FinancialSituationMemory : uses memory\n    Trader ..> FinancialSituationMemory : uses memory\n    \n    AgentState *-- InvestDebateState\n    AgentState *-- RiskDebateState\n}\n@enduml\n```\n\n### Module: `tradingagents/dataflows`\n\n```plantuml\n@startuml\nskinparam linetype polyline\nskinparam linetype ortho\n\npackage \"DataFlows Module\" {\n    \n    class Config {\n        + initialize_config()\n        + set_config(config)\n        + get_config()\n    }\n    \n    class Interface <<route_to_vendor>> {\n        + route_to_vendor(function_name, *args)\n    }\n    \n    abstract class DataVendor {\n        + get_stock_data(...)\n        + get_indicators(...)\n        + get_fundamentals(...)\n        ...\n    }\n    \n    class AlphaVantageVendor {\n        - _make_api_request(...)\n        - _filter_csv_by_date_range(...)\n    }\n    \n    class YFinanceVendor {\n        + get_stock_data(...)\n    }\n    \n    Interface --> Config : reads vendor config\n    Interface .up.> DataVendor : dynamic dispatch\n    \n    DataVendor <|-- AlphaVantageVendor\n    DataVendor <|-- YFinanceVendor\n    \n    AlphaVantageVendor ..> AlphaVantageCommon : uses helpers\n    \n    package \"AlphaVantage Helpers\" {\n        class AlphaVantageCommon {\n            + _make_api_request()\n            + _filter_csv_by_date_range()\n        }\n    }\n}\n@enduml\n```\n\n### Module: `tradingagents/graph`\n\n```plantuml\n@startuml\nskinparam linetype polyline\nskinparam linetype ortho\n\npackage \"Graph Module\" {\n    \n    class TradingAgentsGraph {\n        + __init__(llm, memory, ...)\n        + build_graph()\n        + run_graph(company, date)\n        --\n        - graph: StateGraph\n    }\n    \n    class ConditionalLogic {\n        + should_continue_market(state)\n        + should_continue_debate(state)\n        + should_continue_risk_analysis(state)\n    }\n    
\n    class Propagator {\n        + create_initial_state(company, date)\n    }\n    \n    class Reflector {\n        + __init__(quick_thinking_llm)\n        + reflect_bull_researcher(...)\n        + reflect_trader(...)\n        + _get_reflection_prompt()\n    }\n    \n    TradingAgentsGraph *-- ConditionalLogic : uses for edges\n    TradingAgentsGraph *-- Propagator : uses for setup\n    TradingAgentsGraph *-- Reflector : uses for final node\n    \n    Reflector ..> FinancialSituationMemory : updates memory\n    \n    note right of TradingAgentsGraph\n        Built using LangGraph\n        Nodes: Analysts, Debators, Trader, Managers, Tools\n        Edges: ConditionalLogic\n    end note\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe **TradingAgents** framework is built upon three core abstractions that define its architecture: **Specialized Agents**, **State Machine Orchestration**, and **Vendor-Agnostic Data Access**.\n\n1.  **Specialized Agents (Cognitive Layer):**\n    *   **Abstraction:** Each component of a traditional trading firm (analyst, researcher, risk manager, trader) is abstracted into a dedicated, LLM-powered agent.\n    *   **Design Philosophy:** The system adopts a **Role-Based Multi-Agent System** philosophy, where complex tasks are broken down into sub-tasks handled by experts. This mimics real-world organizational structure, leading to more robust and less hallucinated outputs.\n    *   **Implementation:** Agents are implemented as LangGraph nodes, each with a highly specific system prompt and a limited set of bound tools. For example, the `MarketAnalyst` is only given tools for technical indicators and stock data, ensuring focus.\n\n2.  
**State Machine Orchestration (Graph Layer):**\n    *   **Abstraction:** The entire trading decision process is abstracted as a **Finite State Machine (FSM)** using the LangGraph library.\n    *   **Design Philosophy:** This ensures a structured, non-linear, and auditable workflow. The process is divided into distinct phases: **Research**, **Investment Debate**, **Trader Decision**, and **Risk Debate**. Conditional logic dictates the flow, allowing for dynamic looping (e.g., continuing a debate until a round limit is reached) and tool-call resolution.\n    *   **Lifecycle Management:** The `AgentState` (`tradingagents/agents/utils/agent_states.py`) acts as the single source of truth, managing the entire lifecycle of a trading session. All information (reports, debate history, final decision) is propagated through this state object.\n\n3.  **Vendor-Agnostic Data Access (Data Layer):**\n    *   **Abstraction:** All external data fetching is abstracted behind a unified `route_to_vendor` interface.\n    *   **Design Philosophy:** This implements the **Strategy Pattern**, making the framework flexible and resilient to changes in data providers. Agents request data using generic tool names (e.g., `get_stock_data`), and the `dataflows` module handles the routing to the currently configured vendor (e.g., Alpha Vantage or YFinance). This promotes **loose coupling** between the cognitive and data layers.\n\n#### 3.1.2. Component Interactions\n\n\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam linetype polyline\nskinparam linetype ortho\n\npackage \"TradingAgents Framework\" {\n    \n    package \"1. Orchestration Layer (graph)\" as Graph {\n        class TradingAgentsGraph\n        class ConditionalLogic\n        class Propagator\n        class Reflector\n    }\n\n    package \"2. 
Cognitive Layer (agents)\" as Agents {\n        class AgentState <<TypedDict>>\n        class FinancialSituationMemory <<ChromaDB/OpenAI>>\n        \n        package \"Analysts\" {\n            class MarketAnalyst\n            class NewsAnalyst\n            class FundamentalsAnalyst\n            class SocialMediaAnalyst\n        }\n        \n        package \"Debators\" {\n            class BullResearcher\n            class BearResearcher\n            class RiskyAnalyst\n            class SafeAnalyst\n            class NeutralAnalyst\n        }\n        \n        class Trader\n        \n        package \"Tools\" {\n            class DataTools <<LangChain Tool>>\n        }\n    }\n\n    package \"3. Data Layer (dataflows)\" as DataFlows {\n        class Interface <<route_to_vendor>>\n        class Config\n        package \"Vendors\" {\n            class AlphaVantage\n            class YFinance\n        }\n    }\n    \n    ' Relationships\n    \n    ' Graph to Agents/State\n    TradingAgentsGraph --> AgentState : manages state\n    TradingAgentsGraph --> ConditionalLogic : uses for routing\n    TradingAgentsGraph --> Propagator : initializes state\n    TradingAgentsGraph --> Reflector : updates memory\n    \n    ' Agents to DataFlows/Tools\n    MarketAnalyst ..> DataTools : uses\n    FundamentalsAnalyst ..> DataTools : uses\n    DataTools --> Interface : routes request\n    \n    ' Memory\n    Reflector --> FinancialSituationMemory : updates\n    Trader ..> FinancialSituationMemory : retrieves lessons\n    BullResearcher ..> FinancialSituationMemory : retrieves lessons\n    BearResearcher ..> FinancialSituationMemory : retrieves lessons\n    \n    ' DataFlows\n    Interface --> AlphaVantage : calls vendor\n    Interface --> YFinance : calls vendor\n    \n    ' Agent Flow (Simplified)\n    Analysts --> Debators : generate reports\n    Debators --> Trader : generate investment plan\n    Trader --> Debators : generates initial decision\n    Debators --> Graph : final 
decision\n}\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase effectively utilizes several software design patterns to manage complexity and promote flexibility.\n\n| Pattern | Description | Implementation in TradingAgents |\n| :--- | :--- | :--- |\n| **Strategy Pattern** | Defines a family of algorithms, encapsulates each one, and makes them interchangeable. | Implemented in the `dataflows` module. The `route_to_vendor` function in `interface.py` acts as the context, dynamically selecting a concrete strategy (e.g., `AlphaVantageVendor` or `YFinanceVendor`) based on the configuration to fulfill a data request (e.g., `get_stock_data`). |\n| **State Machine Pattern** | An object whose behavior is determined by its internal state, and which transitions between states based on input. | Implemented using the **LangGraph** library in the `graph` module. The `AgentState` is the state object, and the `ConditionalLogic` class defines the transition rules (edges) between the agent nodes (states). This ensures a structured, multi-step decision-making process. |\n| **Chain of Responsibility Pattern** | Passes a request along a chain of handlers. Each handler decides either to process the request or pass it to the next handler in the chain. | Implicitly used in the **Research Phase** of the graph. The request for information flows through a sequence of analyst agents (Market -> News -> Fundamentals), where each agent adds its specialized report to the shared `AgentState` before passing the state to the next. |\n| **Observer Pattern** | An object (subject) maintains a list of its dependents (observers) and notifies them automatically of any state changes. | Used in the **Debate Phases**. The shared `AgentState` acts as the subject. 
When one debator (e.g., `BullResearcher`) updates the state with a new argument, the next debator (e.g., `BearResearcher`) is notified (triggered by the graph edge) and reacts to the new state. |\n| **Retrieval-Augmented Generation (RAG)** | Augments an LLM with an external knowledge base to improve the quality of generated responses. | Implemented via the `FinancialSituationMemory` class in `agents/utils/memory.py`. This class uses ChromaDB and OpenAI embeddings to store and retrieve past trading situations and their outcomes, allowing the Trader and Researchers to \"learn from experience.\" |\n\n#### 3.3.2. Project Highlights\n\n\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nBased on the analysis, the following areas present opportunities for performance, architecture, and code quality improvements:\n\n*   **Performance Bottleneck: Synchronous Data Fetching:** The current data fetching model appears to be synchronous, as each analyst node must complete its tool calls before the next node can execute. For a multi-agent system, converting the data fetching in the `dataflows` module to use **asynchronous I/O (asyncio)** would allow multiple data requests (e.g., fetching news, fundamentals, and technical data) to run concurrently, drastically reducing the overall research phase time.\n*   **Architecture Optimization: Centralized Tool Definition:** The LangChain tools are currently defined in `agents/utils/*_tools.py` and act as wrappers around `dataflows.interface.route_to_vendor`. While functional, a cleaner separation would be to define the tools directly within the `dataflows` module, making the `dataflows` module a self-contained, tool-exposing service. This would simplify the `agents` module and reinforce the data layer's role.\n*   **Code Quality: Configuration Management:** The configuration logic in `dataflows/config.py` uses global variables (`_config`, `DATA_DIR`), which can lead to hard-to-track side effects. 
A better practice would be to use a dedicated configuration object (e.g., a Singleton or a Pydantic settings class) that is explicitly passed to the `TradingAgentsGraph` constructor and then injected into the necessary components (like `FinancialSituationMemory` and `Interface`).\n*   **Robustness: Vendor Fallback Mechanism:** The `dataflows` module currently routes to a single configured vendor. Implementing a **Decorator Pattern** or an explicit fallback mechanism within `route_to_vendor` would enhance robustness. If the primary vendor (e.g., Alpha Vantage) fails due to a rate limit or API error, the system could automatically attempt the request with a secondary vendor (e.g., YFinance) before failing the agent task.\n\n#### 3.4.2. Secondary Development Guide\n\nFor a developer looking to extend or modify the TradingAgents framework, the following steps provide the best path for code exploration and secondary development:\n\n1.  **Understand the State:** Start by reviewing `tradingagents/agents/utils/agent_states.py`. The `AgentState` class is the central data structure. Understanding what data is available and how it is structured is crucial for modifying any agent or graph node.\n2.  **Trace the Workflow:** Examine `tradingagents/graph/trading_graph.py` to understand the sequence of operations. The `build_graph` method clearly defines the nodes (agents) and the edges (transitions). This is the primary file for modifying the overall decision-making flow.\n3.  **Modify Agent Behavior (Prompts):** To change an agent's reasoning or output format, modify the system prompt within the agent's definition (e.g., in `tradingagents/agents/analysts/market_analyst.py`). Ensure the new prompt still guides the LLM to use its bound tools correctly.\n4.  
**Add New Data Sources (DataFlows):** To integrate a new data API:\n    *   Create a new file in `tradingagents/dataflows` (e.g., `my_new_vendor.py`) with functions that match the required signatures (e.g., `get_stock_data`).\n    *   Update the configuration in `tradingagents/default_config.py` to include the new vendor as an option.\n    *   The existing agent tools will automatically route requests to the new vendor if it is selected in the configuration, thanks to the Strategy Pattern in `dataflows/interface.py`.\n5.  **Extend Learning (Memory):** To enhance the system's memory, examine `tradingagents/graph/reflection.py` and `tradingagents/agents/utils/memory.py`. The `Reflector` is where the \"lessons\" are generated. Modifying the `reflection_system_prompt` can change the quality and focus of the stored memories, directly impacting future agent decisions.\n6.  **Extend Debate Logic:** To add a new debator or change the debate rules, modify the agent creation functions in `tradingagents/agents/researchers` or `tradingagents/agents/risk_mgmt`, and update the conditional logic in `tradingagents/graph/conditional_logic.py` to include the new agent in the turn-based flow.\n\n"
  },
  {
    "path": "thirdparty/TrendRadar.md",
    "content": "# TrendRadar - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\n```\n/\n├── config/                     # Application configuration files (config.yaml, frequency_words.txt)\n├── docker/                     # Docker and containerization setup (Dockerfile, docker-compose.yml, manage.py)\n├── main.py                     # Main application entry point\n├── mcp_server/                 # Core package for the Model Context Protocol (MCP) server\n│   ├── server.py               # FastAPI server definition and API endpoints\n│   ├── services/               # Service layer for business logic\n│   │   ├── cache_service.py    # Caching mechanisms\n│   │   ├── data_service.py     # Data access and persistence\n│   │   └── parser_service.py   # News data parsing and processing\n│   ├── tools/                  # Callable tools exposed to the MCP Agent\n│   │   ├── analytics.py        # Data analysis and reporting tools\n│   │   ├── config_mgmt.py      # Configuration management tools\n│   │   ├── data_query.py       # Data querying tools\n│   │   ├── search_tools.py     # Search functionality tools\n│   │   └── system.py           # System interaction tools\n│   └── utils/                  # General utility functions\n│       ├── date_parser.py      # Date and time parsing helpers\n│       ├── errors.py           # Custom exception definitions\n│       └── validators.py       # Data validation logic\n├── output/                     # Storage for processed data and logs\n└── requirements.txt            # Python dependency list\n```\n```\n\n### 1.2. 
Core Folders for Analysis\n\n- `/home/ubuntu/TrendRadar/mcp_server` (MCP Server Core)\n- `/home/ubuntu/TrendRadar/mcp_server/services` (Services Layer)\n- `/home/ubuntu/TrendRadar/mcp_server/tools` (Agent Tools)\n- `/home/ubuntu/TrendRadar/mcp_server/utils` (Utilities)\n- `/home/ubuntu/TrendRadar/main.py` (Application Entry Point)\n- `/home/ubuntu/TrendRadar/config` (Configuration Files - to be analyzed with the code that uses them)\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module Analysis: MCP Server Core (/home/ubuntu/TrendRadar/mcp_server)\n\n### Module Core Responsibility\nThe `mcp_server` module serves as the **Model Context Protocol (MCP) Server** entry point and tool dispatcher for the TrendRadar project. It is built on the `FastMCP 2.0` framework, enabling the exposure of 13 high-level, data-centric tools to an external AI Agent. Its primary functions are server lifecycle management (supporting `stdio` and `http` transport) and delegating tool requests to specialized service classes.\n\n### Key File Identification\n- **`server.py`**: The main file. It defines the `FastMCP` application instance, registers all 13 MCP tools using the `@mcp.tool` decorator, and contains the `run_server` function for server startup.\n- **`__init__.py`**: Standard package initializer, primarily used for relative imports.\n\n### Code Detail Analysis\n- **Core Implementation (Tool Dispatch):** The `_get_tools` function implements a **Singleton Pattern** to ensure that instances of the underlying tool classes (e.g., `DataQueryTools`, `AnalyticsTools`) are created only once upon the first request. This ensures consistent state and resource management across all tool calls. 
Each registered MCP tool function (e.g., `get_latest_news`) acts as a thin, asynchronous wrapper that validates input, calls the corresponding method on the singleton tool instance, and formats the result as a JSON string.\n- **Critical Utility (`resolve_date_range`):** The `resolve_date_range` tool is a standout feature, designed to be called *first* by the Agent. It uses the `DateParser` utility to reliably translate natural language date expressions (e.g., \"this week\", \"last 7 days\") into a standardized `{\"start\": \"YYYY-MM-DD\", \"end\": \"YYYY-MM-DD\"}` format. This standardization is crucial for the consistency and accuracy of all subsequent data-querying and analysis tools.\n- **Dependencies:** The module is tightly coupled with the `FastMCP` library and depends on all tool classes in the `tools` sub-module and utility classes like `DateParser` and `MCPError` from the `utils` sub-module.\n\n## Module Analysis: Services Layer (/home/ubuntu/TrendRadar/mcp_server/services)\n\n### Module Core Responsibility\nThe `services` module implements the core business logic for data handling, caching, and news parsing. It acts as the **data access and processing layer**, abstracting the raw file system storage and configuration details from the higher-level tool logic.\n\n### Key File Identification\n- **`data_service.py`**: The central service for data retrieval. It handles reading raw news data from the file system (`output` directory), parsing the data, and aggregating it for query tools. 
It also manages configuration parsing (`config.yaml` and `frequency_words.txt`).\n- **`cache_service.py`**: Provides a simple, in-memory caching mechanism (`SimpleCache`) to speed up repeated data access, especially for the large file-based data store.\n- **`parser_service.py`**: Contains the logic (`DataParserService`) for parsing the raw `.txt` files generated by the crawler, extracting titles, ranks, and platform information.\n\n### Code Detail Analysis\n- **Core Implementation (Data Persistence):** The `DataService` class is the primary interface for data. It uses the `Path` object from `pathlib` to manage file paths, indicating a modern Python approach to file system interaction. The raw data is stored in a hierarchical structure: `output/{date_folder}/txt/{timestamp}.txt`. The `_get_data_by_date` method is critical, as it handles cache lookups, file system traversal, and data aggregation from multiple time-stamped files into a single daily view.\n- **Caching Mechanism:** The `CacheService` implements a basic **Least Recently Used (LRU)**-like cache using a dictionary, with a Time-To-Live (TTL) mechanism to ensure data freshness. This is essential for a file-based data store to avoid slow disk I/O on every request.\n- **Configuration Parsing:** `DataService` includes methods (`parse_yaml_config`, `parse_frequency_words`) to read and structure the application's configuration and keyword lists, which are then used by other services and tools. The keyword parsing supports complex logic with required (`+`) and filter (`!`) words.\n- **Dependencies:** This module depends heavily on `pathlib` for file operations, `yaml` for configuration parsing, and custom exceptions from `mcp_server.utils.errors` (e.g., `DataNotFoundError`, `FileParseError`). 
It also uses the `CacheService` internally.\n\n## Module Analysis: Tools Layer (/home/ubuntu/TrendRadar/mcp_server/tools)\n\n### Module Core Responsibility\nThe `tools` module contains the **concrete implementations** of the business logic that are exposed as MCP tools. Each file defines a class that groups related functionalities, acting as the bridge between the high-level MCP server wrappers (`server.py`) and the low-level data services (`services`).\n\n### Key File Identification\n- **`data_query.py`**: Implements basic data retrieval methods like `get_latest_news`, `get_trending_topics`, and `get_news_by_date`. It relies heavily on `DataService` for data access.\n- **`analytics.py`**: Implements advanced data analysis tools, including `analyze_topic_trend_unified`, `analyze_data_insights_unified`, `analyze_sentiment`, `find_similar_news`, and `generate_summary_report`. This is the core of the project's intelligence.\n- **`search_tools.py`**: Implements various search functionalities, such as `search_news_unified` (supporting exact, fuzzy, and entity search) and `search_related_news_history`.\n- **`config_mgmt.py`**: Provides the `get_current_config` method, which uses `DataService` to read and return configuration settings.\n- **`system.py`**: Implements system-level operations like `get_system_status` and the crucial `trigger_crawl` method, which simulates a news crawl and handles data saving.\n\n### Code Detail Analysis\n- **Core Implementation (Tool Classes):** All tool classes (e.g., `DataQueryTools`, `AnalyticsTools`) inherit from a base class (implied or a simple structure) and are initialized with a `project_root`. They encapsulate the logic for processing tool arguments, calling the necessary services, and applying business rules.\n- **`AnalyticsTools` Logic:** This class is complex, implementing unified methods that dispatch to different internal analysis logic based on the `analysis_type` or `insight_type` parameter. 
This **Strategy Pattern** allows the MCP tool interface to remain clean while supporting multiple analytical modes (e.g., trend, lifecycle, viral).\n- **`SystemManagementTools` (`trigger_crawl`):** This method is a key component of the project's data pipeline. It simulates a news crawl, aggregates the results, and, if `save_to_local` is true, persists the data to the `output` directory in a structured format (`.txt` and `.html` files). It includes logic for generating a simple HTML report for visualization.\n- **Dependencies:** The tools layer is the primary consumer of the `services` layer (`DataService`, `CacheService`, `ParserService`) and the `utils` layer (`DateParser`, `MCPError`). It also uses external libraries like `datetime` and `json`.\n\n## Module Analysis: Utilities Layer (/home/ubuntu/TrendRadar/mcp_server/utils)\n\n### Module Core Responsibility\nThe `utils` module provides essential, reusable, low-level functionalities for the entire MCP server, primarily focusing on **date/time manipulation, input validation, and custom error handling**. This ensures data integrity and robust error reporting to the consuming AI Agent.\n\n### Key File Identification\n- **`date_parser.py`**: Contains the `DateParser` class, which is responsible for translating natural language date expressions (e.g., \"yesterday\", \"last week\") into precise date ranges or specific dates. This is a critical component for the `resolve_date_range` MCP tool.\n- **`validators.py`**: A collection of validation functions (e.g., `validate_limit`, `validate_date_range`, `validate_keyword`) used to sanitize and check the parameters passed to the MCP tools. It enforces business rules like date range validity and keyword length limits.\n- **`errors.py`**: Defines a hierarchy of custom exceptions, all inheriting from `MCPError`. 
This allows the server to catch specific errors and return structured, user-friendly error messages to the AI Agent, which is a key requirement for the MCP specification.\n\n### Code Detail Analysis\n- **Core Implementation (Date Parsing):** The `DateParser` class uses the `datetime` module extensively. It defines methods like `resolve_date_range_expression`, which is the core logic for translating natural language into a standard date range object. It also includes validation methods like `validate_date_not_future` and `validate_date_not_too_old` to ensure queries are within the system's data limits.\n- **Error Handling:** The custom exception hierarchy (`MCPError` -> `InvalidParameterError`, `DataNotFoundError`, `FileParseError`) is a clean application of the **Custom Exception Pattern**: a small hierarchy of domain-specific exceptions. By raising specific exceptions, the tool wrappers in `server.py` can consistently format error responses for the Agent.\n- **Input Validation:** The `validators.py` file centralizes all input checks, preventing invalid data from reaching the core business logic. The `validate_date_range` function is particularly important, as it checks for correct format, logical order (start <= end), and ensures the dates are not in the future, even dynamically checking the available data range via `DataService`.\n\n### Module PlantUML Diagrams\n\n### Module Class Diagram Generation (MCP Server Core)\n\n```plantuml\n@startuml\nskinparam ClassAttributeIconStyle none\n\ntitle MCP Server Core Module Diagram\n\npackage \"mcp_server\" {\n    class \"server.py (MCP Dispatcher)\" {\n        + mcp: FastMCP\n        - _tools_instances: Dict\n        + _get_tools(project_root): Dict\n        + run_server(transport, host, port)\n    }\n\n    class \"resolve_date_range (MCP Tool)\"\n    class \"get_latest_news (MCP Tool)\"\n    class \"analyze_topic_trend (MCP Tool)\"\n    ' ... 
10 other MCP tool functions ...\n}\n\npackage \"External\" {\n    class \"FastMCP\"\n}\n\npackage \"mcp_server.tools\" {\n    class \"DataQueryTools\"\n    class \"AnalyticsTools\"\n    class \"SearchTools\"\n    class \"ConfigManagementTools\"\n    class \"SystemManagementTools\"\n}\n\npackage \"mcp_server.utils\" {\n    class \"DateParser\"\n    class \"MCPError\"\n}\n\n\"server.py (MCP Dispatcher)\" ..> \"FastMCP\" : uses\n\"server.py (MCP Dispatcher)\" ..> \"DataQueryTools\" : instantiates\n\"server.py (MCP Dispatcher)\" ..> \"AnalyticsTools\" : instantiates\n\"server.py (MCP Dispatcher)\" ..> \"SearchTools\" : instantiates\n\"server.py (MCP Dispatcher)\" ..> \"ConfigManagementTools\" : instantiates\n\"server.py (MCP Dispatcher)\" ..> \"SystemManagementTools\" : instantiates\n\n\"resolve_date_range (MCP Tool)\" ..> \"DateParser\" : uses\n\"resolve_date_range (MCP Tool)\" ..> \"MCPError\" : handles\n\n\"get_latest_news (MCP Tool)\" ..> \"DataQueryTools\" : delegates\n\"analyze_topic_trend (MCP Tool)\" ..> \"AnalyticsTools\" : delegates\n' ... 
other tool functions delegate to their respective tool classes ...\n\n@enduml\n```\n\n### Module Class Diagram Generation (Services Layer)\n\n```plantuml\n@startuml\nskinparam ClassAttributeIconStyle none\n\ntitle Services Layer Module Diagram\n\npackage \"mcp_server.services\" {\n    class SimpleCache {\n        - cache: Dict\n        - ttl: int\n        + get(key): Any\n        + set(key, value, ttl): void\n        + delete(key): void\n    }\n\n    class DataParserService {\n        + parse_txt_file(file_path): Tuple\n    }\n\n    class DataService {\n        - project_root: Path\n        - cache: SimpleCache\n        + get_date_folder_name(date): str\n        + parse_yaml_config(config_path): dict\n        + parse_frequency_words(words_file): List[Dict]\n        + _get_data_by_date(date, platform_ids): Tuple\n    }\n}\n\npackage \"mcp_server.utils\" {\n    class DataNotFoundError\n    class FileParseError\n}\n\nDataService *-- SimpleCache : uses\nDataService ..> DataParserService : uses\nDataService ..> DataNotFoundError : raises\nDataService ..> FileParseError : raises\n\n@enduml\n```\n\n### Module Class Diagram Generation (Tools Layer)\n\n```plantuml\n@startuml\nskinparam ClassAttributeIconStyle none\n\ntitle Tools Layer Module Diagram\n\npackage \"mcp_server.tools\" {\n    class BaseTools {\n        # project_root: Path\n    }\n\n    class DataQueryTools\n    class AnalyticsTools\n    class SearchTools\n    class ConfigManagementTools\n    class SystemManagementTools {\n        + trigger_crawl(platforms, save_to_local, include_url)\n        - _generate_simple_html(results, id_to_name, failed_ids, now)\n    }\n}\n\npackage \"mcp_server.services\" {\n    class DataService\n    class CacheService\n    class ParserService\n}\n\nBaseTools <|-- DataQueryTools\nBaseTools <|-- AnalyticsTools\nBaseTools <|-- SearchTools\nBaseTools <|-- ConfigManagementTools\nBaseTools <|-- SystemManagementTools\n\nDataQueryTools ..> DataService : uses\nAnalyticsTools ..> DataService : 
uses\nSearchTools ..> DataService : uses\nConfigManagementTools ..> DataService : uses\n\nSystemManagementTools ..> DataService : uses (for config)\nSystemManagementTools ..> ParserService : uses (for crawl simulation)\n\n@enduml\n```\n\n### Module Class Diagram Generation (Utilities Layer)\n\n```plantuml\n@startuml\nskinparam ClassAttributeIconStyle none\n\ntitle Utilities Layer Module Diagram\n\npackage \"mcp_server.utils\" {\n    abstract class MCPError {\n        + code: str\n        + message: str\n        + suggestion: str\n        + to_dict(): dict\n    }\n\n    class InvalidParameterError\n    class DataNotFoundError\n    class FileParseError\n\n    class DateParser {\n        + resolve_date_range_expression(expression): dict\n        + parse_date_query(date_query): datetime\n        + validate_date_not_future(date): void\n        + validate_date_not_too_old(date, max_days): void\n    }\n\n    class Validators {\n        + validate_limit(limit, default, max_limit): int\n        + validate_date(date_str): datetime\n        + validate_date_range(date_range): tuple\n        + validate_keyword(keyword): str\n        + validate_mode(mode, valid_modes, default): str\n        + validate_date_query(date_query): datetime\n    }\n}\n\nMCPError <|-- InvalidParameterError\nMCPError <|-- DataNotFoundError\nMCPError <|-- FileParseError\n\nValidators ..> DateParser : uses\nValidators ..> InvalidParameterError : raises\nValidators ..> DataNotFoundError : uses (indirectly via DataService)\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe TrendRadar project is fundamentally designed around the **Model Context Protocol (MCP)**, which dictates a clear separation of concerns between the AI Agent (the consumer) and the data/logic server (the provider).\n\n**Core Abstractions:**\n1.  **MCP Tool:** The primary abstraction, represented by the `@mcp.tool` decorated functions in `server.py`. 
These are the public, high-level functions that the AI Agent can call. They are designed to be self-documenting (via docstrings) and return structured JSON, adhering strictly to the MCP specification.\n2.  **Tool Class (e.g., `DataQueryTools`):** A middle-layer abstraction that groups related business logic. These classes are instantiated once (Singleton pattern) and hold the actual implementation of the tool's functionality, acting as the **Service Facade** for the MCP tool wrappers.\n3.  **Service (e.g., `DataService`):** The lowest-level abstraction for business logic, responsible for interacting with external resources like the file system (data persistence) and configuration files. It abstracts away the complexity of data storage and retrieval.\n4.  **Data Persistence:** The file system (`output/`) is abstracted as the primary data store, where data is organized by date and time-stamped files. This is a simple, robust, and easily inspectable form of persistence.\n\n**Design Philosophy:**\nThe design follows a **Layered Architecture** with a strong emphasis on **Tool-Centric Design**.\n-   **Tool Layer (MCP):** Focuses on clear, Agent-friendly interfaces and robust input/output handling.\n-   **Business Logic Layer (Tools):** Focuses on implementing complex features (e.g., unified trend analysis) by orchestrating services.\n-   **Data/Utility Layer (Services & Utils):** Focuses on data integrity, performance (caching), and reusable utilities (date parsing, validation).\nThe **\"Date First\"** philosophy is evident in the `resolve_date_range` tool, which is explicitly recommended for priority calling to ensure all subsequent data queries use standardized, server-validated date parameters.\n\n**Lifecycle Management:**\nThe server lifecycle is managed by the `run_server` function in `server.py`.\n-   **Initialization:** Tool instances are created once via `_get_tools` (Singleton) during server startup.\n-   **Execution:** The server runs indefinitely, listening 
for requests via either `stdio` (for local/testing) or `http` (for production) transport, as managed by the `FastMCP` framework.\n-   **Data Flow:** Data is primarily loaded from the file system into memory/cache upon request, and new data is written back to the file system via the `trigger_crawl` tool.\n\n#### 3.1.2. Component Interactions\n\n**Communication Pattern:**\nThe primary communication pattern is **Request-Response** via the MCP. The AI Agent sends a JSON-RPC request to the server, which identifies the requested tool and parameters, executes the corresponding Python function, and returns a structured JSON response.\n\n**Key Interaction Flows:**\n\n1.  **Date Resolution Flow (Pre-Query):**\n    -   **Agent** calls `resolve_date_range(expression=\"本周\")`.\n    -   **MCP Server** delegates to `DateParser.resolve_date_range_expression`.\n    -   **DateParser** returns a standardized `{\"start\": \"YYYY-MM-DD\", \"end\": \"YYYY-MM-DD\"}` JSON object.\n    -   **Agent** uses this standardized output in subsequent data tools.\n\n2.  **Data Query Flow (e.g., `get_latest_news`):**\n    -   **Agent** calls `get_latest_news(limit=50)`.\n    -   **MCP Server** delegates to `DataQueryTools.get_latest_news`.\n    -   **DataQueryTools** calls `DataService._get_data_by_date` (for today's data).\n    -   **DataService** checks `SimpleCache`.\n    -   **Cache Miss:** `DataService` reads and aggregates multiple `.txt` files from the `output/{date}/txt/` directory, parses them via `DataParserService`, and stores the result in `SimpleCache`.\n    -   **DataService** returns the aggregated data to `DataQueryTools`.\n    -   **MCP Server** returns the final JSON to the Agent.\n\n3.  
**System Management Flow (`trigger_crawl`):**\n    -   **Agent** calls `trigger_crawl(save_to_local=True)`.\n    -   **MCP Server** delegates to `SystemManagementTools.trigger_crawl`.\n    -   **SystemManagementTools** simulates the crawl, aggregates results, and calls file system operations to write the new data to the `output` directory as both `.txt` (raw data) and `.html` (report).\n\n**Data Flow:**\n-   **Input:** User request (via Agent) -> MCP Tool parameters.\n-   **Processing:** MCP Tool -> Tool Class -> Service Class -> Data (File System/Cache).\n-   **Output:** Structured JSON (News lists, analysis results, status reports) -> MCP Tool -> Agent.\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam ClassAttributeIconStyle none\nskinparam packageStyle rectangle\n\ntitle TrendRadar MCP Server Overall Architecture\n\npackage \"External Agent\" as Agent {\n    [AI Agent]\n}\n\npackage \"MCP Server Core (mcp_server)\" as Core {\n    [server.py (Tool Dispatcher)]\n}\n\npackage \"Tools Layer (mcp_server.tools)\" as Tools {\n    [DataQueryTools]\n    [AnalyticsTools]\n    [SearchTools]\n    [ConfigManagementTools]\n    [SystemManagementTools]\n}\n\npackage \"Services Layer (mcp_server.services)\" as Services {\n    [DataService]\n    [CacheService]\n    [ParserService]\n}\n\npackage \"Utilities (mcp_server.utils)\" as Utils {\n    [DateParser]\n    [Validators]\n    [MCPError Hierarchy]\n}\n\npackage \"Data Persistence\" as Data {\n    [config/config.yaml]\n    [output/ (Raw News Files)]\n}\n\n' Interactions\nAgent --> Core : JSON-RPC Request (Tool Call)\nCore --> Tools : Delegates Tool Execution (Singleton Pattern)\n\nTools --> Services : Calls Business Logic\nTools --> Utils : Uses Validation/Date Parsing\n\nServices --> Data : Reads/Writes Data\nServices --> Utils : Uses Error Handling\n\nCore --> Utils : Uses DateParser (resolve_date_range)\n\n' Dependencies\nServices ..> Utils : Imports MCPError\nTools 
..> Services : Depends on DataService\nCore ..> Tools : Instantiates Tool Classes\nCore ..> Utils : Imports DateParser\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\n| Pattern Name | Description | Implementation Details |\n| :--- | :--- | :--- |\n| **Singleton** | Ensures a class has only one instance, and provides a global point of access to it. | Implemented in `mcp_server/server.py` via the `_get_tools` function, which uses a dictionary (`_tools_instances`) to store and return the single instance of each tool class (e.g., `DataQueryTools`, `AnalyticsTools`). |\n| **Facade** | Provides a unified interface to a set of interfaces in a subsystem. | The MCP Tool functions in `server.py` act as a Facade to the complex business logic contained within the `Tools` classes (e.g., `DataQueryTools`). The Agent only sees the simple tool interface. |\n| **Strategy** | Defines a family of algorithms, encapsulates each one, and makes them interchangeable. | Implemented in `AnalyticsTools` via methods like `analyze_topic_trend_unified` and `analyze_data_insights_unified`. These methods accept an `analysis_type` or `insight_type` string, which determines which specific internal analysis algorithm (strategy) to execute. |\n| **Custom Exception** | Defines a hierarchy of domain-specific exceptions for robust error handling. | Implemented in `mcp_server/utils/errors.py` with the base `MCPError` and its subclasses (`InvalidParameterError`, `DataNotFoundError`, `FileParseError`). This allows for structured, machine-readable error responses to the Agent. |\n\n#### 3.3.2. Project Highlights\n\n-   **Model Context Protocol (MCP) Integration:** The use of `FastMCP 2.0` is the primary highlight, instantly transforming the data pipeline into a powerful, AI-callable service. 
This allows any compliant AI Agent to leverage TrendRadar's data and analysis capabilities without needing to understand the underlying Python code or data storage.\n-   **Standardized Date Resolution:** The dedicated `resolve_date_range` tool is an innovative feature that solves a common pain point in Agent-Tool interactions: inconsistent date parsing. By forcing the Agent to use a server-side, validated date range, it ensures data query accuracy and consistency across all analytical tools.\n-   **Unified Tool Interfaces:** The `AnalyticsTools` and `SearchTools` use \"unified\" methods (e.g., `analyze_topic_trend_unified`) that accept a `type` parameter to select the specific analysis mode. This design keeps the number of exposed MCP tools low and the Agent's cognitive load minimal, while maintaining high functional coverage.\n-   **Simple, Inspectable Data Persistence:** Storing data in a structured file system (`output/{date}/txt/{timestamp}.txt`) provides excellent flexibility and inspectability. It avoids the overhead of a database while still allowing for time-series data aggregation. The use of a `SimpleCache` mitigates the performance impact of file I/O.\n-   **Extensibility:** The layered architecture makes the system highly extensible.\n    -   **New Tool:** Add a new method to an existing Tool Class (e.g., `AnalyticsTools`) and register a new wrapper in `server.py`.\n    -   **New Data Source:** The `DataService` and `ParserService` can be extended to handle new data formats or sources without affecting the Tool or MCP layers.\n    -   **New Analysis Strategy:** A new analysis type can be added to `AnalyticsTools` by implementing the new logic and updating the `unified` dispatcher method.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe TrendRadar architecture is well-suited for its current file-based data model and MCP-centric design. However, several areas can be optimized for performance and maintainability:\n\n1.  
**Performance Bottleneck: File I/O and Caching:**\n    -   **Suggestion:** Upgrade the `SimpleCache` to a more robust, dedicated in-memory cache (e.g., Redis or a dedicated LRU cache library). The current cache is simple but lacks features like automatic eviction policies beyond TTL.\n    -   **Optimization:** The `DataService._get_data_by_date` method involves reading and aggregating multiple `.txt` files on a cache miss. For large datasets, this I/O operation will be a significant bottleneck. Consider a background process to pre-aggregate daily data into a single, optimized file format (e.g., Parquet or a compressed JSON line file) to minimize read operations.\n\n2.  **Architecture Optimization: Dependency Management:**\n    -   **Suggestion:** Implement a lightweight **Dependency Injection (DI)** pattern. Currently, `DataService` is instantiated directly in `DataQueryTools`, and `DataService` is implicitly imported in `validators.py` (via a try-except block) to check available dates. This tight coupling makes testing and refactoring difficult.\n    -   **Optimization:** Pass the `DataService` instance to the `Tools` classes during initialization in `server.py`. This would make dependencies explicit and improve unit testability.\n\n3.  **Code Quality and Maintainability:**\n    -   **Suggestion:** Introduce comprehensive **Type Hinting** across all module methods, especially in the `Tools` and `Services` layers. While some type hints exist, full coverage would improve code clarity and enable static analysis tools.\n    -   **Optimization:** Implement unit tests for the complex logic in `AnalyticsTools` and the file parsing logic in `ParserService`. The current system lacks a visible testing suite, which is crucial for a project with complex data processing.\n\n#### 3.4.2. Secondary Development Guide\n\nFor a developer looking to extend the TrendRadar project, the following path is recommended:\n\n1.  
**Code Exploration Path (Bottom-Up):**\n    -   **Start with Data:** Examine `DataService` and the `output/` directory structure to understand how raw news data is stored and retrieved.\n    -   **Understand Utilities:** Review `validators.py` and `date_parser.py` to grasp the input constraints and date standardization logic.\n    -   **Trace Tool Logic:** Follow the flow from a tool wrapper in `server.py` (e.g., `get_latest_news`) down to its implementation in `DataQueryTools` and the subsequent call to `DataService`.\n\n2.  **Adding a New MCP Tool (e.g., `get_top_platforms`):**\n    -   **Step 1: Service Implementation:** If the new tool requires new data logic, add a method to `DataService` (or create a new service class in `mcp_server/services`).\n    -   **Step 2: Tool Class Wrapper:** Add the new method to an existing Tool Class (e.g., `ConfigManagementTools`) or create a new one in `mcp_server/tools`. This method will contain the business logic and call the service.\n    -   **Step 3: MCP Registration:** In `mcp_server/server.py`, define a new asynchronous function decorated with `@mcp.tool`. This function should call the method from the Tool Class instance obtained via `_get_tools()`, handle any `MCPError` exceptions, and return the result as a JSON string.\n\n3.  **Integrating a New Data Source/Crawler:**\n    -   **Step 1: Update Configuration:** Modify `config/config.yaml` to include the new platform ID and configuration details.\n    -   **Step 2: Update `trigger_crawl`:** The `SystemManagementTools.trigger_crawl` method currently simulates the crawl. To integrate a real crawler, the simulation logic must be replaced with the actual crawler invocation, ensuring the output format matches the expected structure for `ParserService`.\n    -   **Step 3: Data Persistence:** Ensure the new crawler's output is saved to the `output/` directory in the expected time-stamped `.txt` format, which is the contract for the `DataService`.\n\n"
  },
  {
    "path": "thirdparty/agentic-trading.md",
    "content": "# agentic-trading - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe project is structured as a multi-component system for agentic trading, utilizing a microservices-like architecture with three main components: `alphabot`, `riskguard`, and `simulator`, all sharing a `common` library.\n\n```\n/home/ubuntu/FinnewsHunter/thirdparty/agentic-trading\n├── Dockerfile.alphabot        # Dockerfile for the AlphaBot service\n├── Dockerfile.riskguard       # Dockerfile for the RiskGuard service\n├── Dockerfile.simulator       # Dockerfile for the Simulator service\n├── LICENSE                    # Project license\n├── README.md                  # Project documentation\n├── alphabot                   # Core module: The primary trading agent (AlphaBot)\n│   ├── __main__.py            # Entry point for running AlphaBot as an A2A server\n│   ├── a2a_risk_tool.py       # Tool for AlphaBot to interact with RiskGuard\n│   ├── agent.py               # Core logic for the AlphaBot agent\n│   └── agent_executor.py      # Executes the agent's logic, integrating with A2A\n├── cloudbuild-*.yaml          # Google Cloud Build configuration files\n├── common                     # Core module: Shared utilities, configuration, and data models\n│   ├── config.py              # Configuration settings for all services\n│   ├── models.py              # Pydantic data models for inter-service communication\n│   └── utils                  # Utility functions\n│       ├── agent_utils.py     # Utilities for Agent-to-Agent (A2A) communication\n│       └── indicators.py      # Technical indicator calculation functions\n├── deploy_cloud_run.sh        # Script for deploying to Google Cloud Run\n├── deploy_local.sh            # Script for local deployment\n├── pyproject.toml             # Project metadata and build configuration\n├── requirements.txt           # Python dependencies\n├── riskguard                  # Core 
module: The risk management agent (RiskGuard)\n│   ├── __main__.py            # Entry point for running RiskGuard as an A2A server\n│   ├── agent.py               # Core logic for the RiskGuard agent\n│   ├── agent_executor.py      # Executes the agent's logic, integrating with A2A\n│   └── rules.py               # Implementation of risk management rules\n├── simulator                  # Core module: Market simulation and FastAPI UI\n│   ├── main.py                # FastAPI application entry point and orchestration\n│   ├── market.py              # Market data simulation and management\n│   ├── portfolio.py           # Portfolio state management\n│   ├── static                 # Static assets (e.g., CSS) - Excluded from code analysis\n│   └── templates              # HTML templates - Excluded from code analysis\n└── tests                      # Unit and integration tests - Excluded from code analysis\n```\nThe project's structure clearly delineates the responsibilities of each component. The `alphabot`, `riskguard`, and `simulator` directories contain the core business logic for the three independent services. The `common` directory acts as the shared library, providing the essential data contracts and utility functions that bind the services together. Configuration and deployment files (`Dockerfile.*`, `cloudbuild-*.yaml`, `deploy_*.sh`) are kept at the root, supporting a modern, containerized deployment workflow. The `tests` directory ensures code quality and correctness for all core modules.\n```\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/agentic-trading/alphabot`: Implements the core trading logic, acting as the primary decision-making agent, utilizing LangChain and an LLM.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/agentic-trading/riskguard`: Implements the risk management and compliance logic, acting as a deterministic gatekeeper for proposed trades.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/agentic-trading/simulator`: Contains the market simulation, portfolio management, and the FastAPI-based orchestration layer for the agents.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/agentic-trading/common`: Provides shared data models (Pydantic), configuration, and utility functions used across all three main components, ensuring a consistent data contract.\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe project is composed of four core modules: `alphabot`, `riskguard`, `simulator`, and `common`. Each module plays a distinct role in the agentic trading system.\n\n### Module: alphabot (Trading Agent)\n*   **Files Enumeration**: `__main__.py`, `a2a_risk_tool.py`, `agent.py`, `agent_executor.py`.\n*   **Core Responsibility**: To act as the **primary trading agent**. It analyzes the current market data and portfolio state, uses an LLM-based reasoning process (via LangChain), and leverages the `RiskGuard` as an external tool to propose a trade signal (buy, sell, or hold) along with the rationale. It is exposed as an A2A service.\n*   **Key File Identification**:\n    *   `agent.py`: Contains the core LLM-based trading logic and the definition of the agent's behavior.\n    *   `a2a_risk_tool.py`: Critical for inter-agent communication, defining the mechanism for AlphaBot to query RiskGuard.\n*   **Core Implementation**: The `AlphaBotAgent` in `agent.py` is a LangChain agent using a `ChatOpenAI` model. It is designed to consume `PortfolioState` and `MarketData` and produce a `TradeSignal`. 
Its core safety mechanism is the provision of the `RiskGuardTool`, which the LLM is prompted to use to validate any potential trade before finalizing its decision.\n*   **Dependencies**: Internal dependencies include `common.models` and `a2a_risk_tool`. External dependencies are primarily the `langchain` framework, `pydantic` for structured output, and `requests` for calling the RiskGuard service.\n*   **Error & Performance**: The primary performance bottleneck is the **synchronous LLM call** within the agent's execution, which blocks the A2A request handler. Error handling is robust, relying on `PydanticOutputParser` to ensure the LLM's output conforms to the required `TradeSignal` schema.\n\n### Module: riskguard (Risk Management Agent)\n*   **Files Enumeration**: `__main__.py`, `agent.py`, `agent_executor.py`, `rules.py`.\n*   **Core Responsibility**: To act as the **risk management agent**. It receives a proposed trade and the current portfolio state, applies a set of predefined, deterministic risk rules, and returns a decision (approved or rejected) with a rationale. It is exposed as an A2A service, primarily consumed by `AlphaBot`.\n*   **Key File Identification**:\n    *   `rules.py`: Contains the critical, deterministic risk logic (`RiskGuardRules`).\n    *   `agent_executor.py`: The A2A service interface, translating A2A requests into risk rule execution.\n*   **Core Implementation**: The `RiskGuardRules` class in `rules.py` implements the risk logic, which is purely rule-based (e.g., Max Position Size, Max Daily Loss) and not LLM-driven. It takes a `ProposedTrade` and `PortfolioState` and returns a `RiskCheckResult`.\n*   **Dependencies**: Internal dependencies include `common.models`. 
External dependencies are minimal, mainly the `uvicorn`/`fastapi` framework for serving the A2A endpoint.\n*   **Error & Performance**: This module is **highly performant** as its logic is purely deterministic and computational, involving no external I/O (other than the network request itself). Error handling is straightforward, relying on Pydantic validation.\n\n### Module: simulator (Orchestrator and Environment)\n*   **Files Enumeration**: `main.py`, `market.py`, `portfolio.py`.\n*   **Core Responsibility**: To act as the **orchestration and environment layer**. It provides a web UI (via FastAPI), simulates the financial market, manages the portfolio state, and drives the simulation loop by calling the `AlphaBot` agent and executing approved trades.\n*   **Key File Identification**:\n    *   `main.py`: The central orchestrator, containing the FastAPI routes and the simulation loop logic.\n    *   `portfolio.py`: Defines the state of the trading system and handles trade execution.\n*   **Core Implementation**: The `main.py` uses `requests` to communicate with the `AlphaBot` A2A service. The simulation loop manages the flow: get state, call `AlphaBot`, execute trade on `Portfolio`. The `Market` class simulates price movements, and the `Portfolio` class handles trade execution and P&L calculation.\n*   **Dependencies**: Internal dependencies include `common.models`, `market`, and `portfolio`. External dependencies are `fastapi` for the web UI and `requests` for inter-agent communication.\n*   **Error & Performance**: The simulation speed is directly limited by the **latency of the AlphaBot LLM call** in each step of the simulation loop. 
The FastAPI server itself is highly performant.\n\n### Module: common (Shared Foundation)\n*   **Files Enumeration**: `config.py`, `models.py`, `utils/agent_utils.py`, `utils/indicators.py`.\n*   **Core Responsibility**: To provide the **shared foundation** for the entire system, ensuring consistency in configuration, data structure, and utility functions, especially those related to inter-agent communication and financial calculations.\n*   **Key File Identification**:\n    *   `models.py`: The **schema contract** for all inter-component communication.\n    *   `utils/agent_utils.py`: Encapsulates the logic for A2A communication.\n*   **Core Implementation**: `models.py` uses **Pydantic** extensively to define all data structures. `agent_utils.py` contains the crucial `call_agent_skill` function, which abstracts the HTTP request to an A2A service and handles Pydantic parsing of the response.\n*   **Dependencies**: External dependencies are `pydantic` and `requests`.\n*   **Error & Performance**: This module is highly performant. The `agent_utils.py` handles network errors and non-200 HTTP responses during A2A calls. 
The `indicators.py` module is a potential area for optimization if it does not use vectorized operations.\n\n### Module PlantUML Diagrams\n\n#### alphabot\n\n```plantuml\n@startuml alphabot\ntitle AlphaBot Module Class Diagram\n\npackage \"common.models\" {\n  class MarketData\n  class PortfolioState\n  class TradeSignal\n  class ProposedTrade\n  class RiskCheckResult\n}\n\npackage \"alphabot\" {\n  class AlphaBotAgent {\n    + run(market_data, portfolio_state)\n    - prompt_template\n  }\n  \n  class RiskGuardTool {\n    + name = \"risk_guard_tool\"\n    + description = \"Tool to check trade risk with RiskGuard\"\n    + _run(proposed_trade) : RiskCheckResult\n  }\n  \n  class AlphaBotAgentExecutor {\n    + execute_skill(skill_id, input_data) : TradeSignal\n  }\n  \n  class MainCLI\n}\n\nAlphaBotAgent \"1\" -- \"1\" RiskGuardTool : uses >\nAlphaBotAgent \"1\" -- \"1\" TradeSignal : outputs >\nAlphaBotAgent \"1\" -- \"1\" MarketData : consumes >\nAlphaBotAgent \"1\" -- \"1\" PortfolioState : consumes >\nRiskGuardTool \"1\" -- \"1\" ProposedTrade : consumes >\nRiskGuardTool \"1\" -- \"1\" RiskCheckResult : outputs >\nAlphaBotAgentExecutor \"1\" -- \"1\" AlphaBotAgent : executes >\nMainCLI \"1\" -- \"1\" AlphaBotAgentExecutor : instantiates >\n\nnote right of AlphaBotAgent\n  LangChain Agent\n  Uses LLM for decision\n  making and rationale.\nend note\n\nnote right of RiskGuardTool\n  Custom LangChain Tool\n  Communicates with RiskGuard\n  A2A service via HTTP.\nend note\n\n@enduml\n```\n\n#### riskguard\n\n```plantuml\n@startuml riskguard\ntitle RiskGuard Module Class Diagram\n\npackage \"common.models\" {\n  class PortfolioState\n  class ProposedTrade\n  class RiskCheckResult\n}\n\npackage \"riskguard\" {\n  class RiskGuardRules {\n    + check_trade_risk(trade, portfolio) : RiskCheckResult\n    - max_position_size_check()\n    - max_daily_loss_check()\n  }\n  \n  class RiskGuardAgentExecutor {\n    + execute_skill(skill_id, input_data) : RiskCheckResult\n  }\n  
\n  class RiskGuardAgent {\n    + run(proposed_trade, portfolio_state) : RiskCheckResult\n  }\n  \n  class MainCLI\n}\n\nRiskGuardAgent \"1\" -- \"1\" RiskGuardRules : delegates to >\nRiskGuardRules \"1\" -- \"1\" ProposedTrade : consumes >\nRiskGuardRules \"1\" -- \"1\" PortfolioState : consumes >\nRiskGuardRules \"1\" -- \"1\" RiskCheckResult : outputs >\nRiskGuardAgentExecutor \"1\" -- \"1\" RiskGuardAgent : executes >\nMainCLI \"1\" -- \"1\" RiskGuardAgentExecutor : instantiates >\n\nnote right of RiskGuardRules\n  Deterministic, rule-based\n  risk checking logic.\nend note\n\n@enduml\n```\n\n#### simulator\n\n```plantuml\n@startuml simulator\ntitle Simulator Module Class Diagram\n\npackage \"common.models\" {\n  class MarketData\n  class PortfolioState\n  class TradeSignal\n}\n\npackage \"simulator\" {\n  class Market {\n    + get_current_data() : MarketData\n    + advance_time()\n    - simulate_price_movement()\n  }\n  \n  class Portfolio {\n    + state : PortfolioState\n    + execute_trade(trade_signal)\n    + calculate_pnl()\n  }\n  \n  class SimulatorApp {\n    + run_simulation_step()\n    + call_alphabot(data) : TradeSignal\n    + call_riskguard(trade) : RiskCheckResult\n  }\n  \n  class MainFastAPIApp\n}\n\nSimulatorApp \"1\" -- \"1\" Market : uses >\nSimulatorApp \"1\" -- \"1\" Portfolio : manages >\nSimulatorApp \"1\" -- \"1\" TradeSignal : consumes >\nSimulatorApp \"1\" -- \"1\" MarketData : produces >\nSimulatorApp \"1\" -- \"1\" PortfolioState : consumes >\nPortfolio \"1\" -- \"1\" PortfolioState : holds >\nMarket \"1\" -- \"1\" MarketData : produces >\nMainFastAPIApp \"1\" -- \"1\" SimulatorApp : orchestrates >\n\nnote right of SimulatorApp\n  FastAPI application\n  Orchestrates the simulation\n  and agent communication.\nend note\n\n@enduml\n```\n\n#### common\n\n```plantuml\n@startuml common\ntitle Common Module Class Diagram\n\npackage \"common.models\" {\n  class MarketData {\n    + symbol: str\n    + price: float\n    + timestamp: 
datetime\n  }\n  \n  class PortfolioState {\n    + cash: float\n    + holdings: Dict[str, float]\n    + total_value: float\n  }\n  \n  class TradeSignal {\n    + action: str (BUY/SELL/HOLD)\n    + symbol: str\n    + quantity: float\n    + rationale: str\n  }\n  \n  class ProposedTrade {\n    + trade_signal: TradeSignal\n    + risk_level: str\n  }\n  \n  class RiskCheckResult {\n    + approved: bool\n    + rationale: str\n  }\n}\n\npackage \"common.utils\" {\n  class AgentUtils {\n    + get_service_url(env_var, default_host, default_port)\n    + call_agent_skill(url, skill_id, input_data)\n  }\n  \n  class Indicators {\n    + calculate_ma(prices, period)\n    + calculate_rsi(prices, period)\n  }\n}\n\nAgentUtils \"1\" -- \"*\" MarketData : communicates >\nAgentUtils \"1\" -- \"*\" TradeSignal : communicates >\nAgentUtils \"1\" -- \"*\" RiskCheckResult : communicates >\n\nnote right of MarketData\n  Pydantic Model\nend note\n\nnote right of AgentUtils\n  Handles A2A HTTP calls\n  and Pydantic parsing.\nend note\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe core of the `agentic-trading` project is defined by a clear set of abstractions, a microservices-like design philosophy, and loop-based lifecycle management.\n\n### Core Abstractions\nThe system's functionality is built upon three primary categories of data models, all centrally defined in `common/models.py` using Pydantic:\n\n1.  **State Models**: These represent the environment and the system's internal condition.\n    *   `MarketData`: Encapsulates the current market information (e.g., price, symbol, timestamp).\n    *   `PortfolioState`: Represents the system's holdings, cash balance, and total value.\n2.  
**Action/Decision Models**: These are the outputs of the agents and the structures used for inter-agent communication.\n    *   `TradeSignal`: The final output of the `AlphaBot` agent, specifying the action (BUY/SELL/HOLD), quantity, and rationale.\n    *   `ProposedTrade`: A structure used internally by `AlphaBot` to propose a trade to `RiskGuard` for validation.\n3.  **Result Models**: These act as gatekeepers and feedback mechanisms.\n    *   `RiskCheckResult`: The output of the `RiskGuard` agent, containing a boolean `approved` status and a detailed `rationale`.\n\n### Design Philosophy\nThe project adheres to a **Microservices-like Agentic Architecture** using the **Agent-to-Agent (A2A) pattern**:\n\n*   **Separation of Concerns**: The system is strictly divided into three independent services: a decision-maker (`AlphaBot`), a gatekeeper (`RiskGuard`), and an environment/orchestrator (`Simulator`). This separation allows for independent development, scaling, and technology choices (e.g., LLM-based decision vs. deterministic rules).\n*   **Data Contract First**: The universal use of **Pydantic models** enforces a strict, validated data contract for all inter-service communication, which is essential for reliability in a distributed system.\n*   **Safety via Tool Use**: The `AlphaBot` agent is designed to be **safe by default** by requiring it to explicitly use the `RiskGuardTool` before finalizing a trade. This design ensures that the LLM's creative decision is always filtered through a deterministic, rule-based safety layer.\n\n### Lifecycle Management\nThe system's operational lifecycle is managed by the `Simulator` component:\n\n1.  **Initialization**: All three services (`AlphaBot`, `RiskGuard`, `Simulator`) are started independently, typically via their respective entry points, exposing A2A endpoints.\n2.  **Simulation Loop**: The `Simulator` drives the core loop, advancing the market time and repeatedly performing the trade decision and execution flow. 
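As a rough sketch, one iteration of that loop could look like this (function and attribute names follow this analysis, not the actual `simulator/main.py`):

```python
# Hypothetical sketch of one Simulator iteration; names mirror this
# analysis of simulator/main.py, not the real implementation.
def run_simulation_step(market, portfolio, call_alphabot):
    data = market.get_current_data()               # 1. collect market state
    signal = call_alphabot(data, portfolio.state)  # 2. A2A trade decision
    if signal['action'] != 'HOLD':                 # 3. execute approved trades
        portfolio.execute_trade(signal)
    market.advance_time()                          # 4. move the clock forward
    return signal
```

A real implementation would also surface the `RiskCheckResult` rationale and logging; this sketch only shows the control flow.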
This loop is the heart of the system's operation.\n3.  **Termination**: The simulation stops when the market data is exhausted or the user manually terminates the `Simulator` application.\n\n#### 3.1.2. Component Interactions\n\nThe primary communication pattern is **Synchronous HTTP Request/Response** following the **Agent-to-Agent (A2A) protocol**. This pattern ensures clear, decoupled communication between the three main services: `Simulator`, `AlphaBot`, and `RiskGuard`.\n\n### Communication Patterns\n*   **Simulator to AlphaBot**: The `Simulator` initiates the trading cycle by making an HTTP POST request to the `AlphaBot` A2A endpoint. This request passes the current `MarketData` and `PortfolioState` (defined in `common/models.py`) to request the `provide_trade_signal` skill.\n*   **AlphaBot to RiskGuard**: This critical interaction is mediated by the `RiskGuardTool` within the `AlphaBotAgent`. When the LLM-based agent decides on a potential trade, it is prompted to use this tool. The tool then makes an internal HTTP POST request to the `RiskGuard` A2A endpoint, passing a `ProposedTrade` to request the `check_trade_risk` skill.\n\n### Key Interaction Flow: Trade Decision and Execution\nThe entire system operates within a simulation loop orchestrated by the `Simulator` (`simulator/main.py`):\n\n1.  **State Collection**: The `Simulator` retrieves the current market data from the `Market` component and the portfolio status from the `Portfolio` component.\n2.  **Decision Request**: The `Simulator` sends the combined state data (`MarketData`, `PortfolioState`) to the `AlphaBot` service via an A2A call.\n3.  **Internal Risk Check**: The `AlphaBotAgent` processes the request using its LLM-based logic. It decides on a potential trade (`ProposedTrade`) and, as a safety measure, is prompted to use the `RiskGuardTool` to validate this trade.\n4.  **Risk Validation**: The `RiskGuardTool` calls the `RiskGuard` service. 
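As an illustration of what such an A2A helper might look like (the request envelope and the `call_agent_skill` signature are assumptions; the real helper lives in `common/utils/agent_utils.py`):

```python
from typing import Any, Dict

def build_skill_request(skill_id: str, input_data: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed envelope: the skill id plus the Pydantic-serialized payload.
    return {'skill_id': skill_id, 'input': input_data}

def call_agent_skill(url: str, skill_id: str, input_data: Dict[str, Any]) -> Dict[str, Any]:
    import requests  # third-party; only needed when actually calling a service
    resp = requests.post(url, json=build_skill_request(skill_id, input_data), timeout=10)
    resp.raise_for_status()
    return resp.json()
```

Splitting payload construction from the HTTP call keeps the envelope testable without a running service.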
`RiskGuard` applies its deterministic rules (`riskguard/rules.py`) to the `ProposedTrade` and returns a `RiskCheckResult` (approved or rejected).\n5.  **Final Signal**: The `AlphaBotAgent` incorporates the `RiskCheckResult` into its final reasoning. If approved, it outputs a `TradeSignal`. If rejected, it typically outputs a \"HOLD\" signal or a revised trade, which is then returned to the `Simulator`.\n6.  **Execution**: The `Simulator` receives the `TradeSignal`. If the signal is not \"HOLD\" and the trade is valid, it calls `Portfolio.execute_trade()`, updating the system's state.\n\n### Data Flow Summary\n| Source Component | Destination Component | Data Model | Purpose |\n| :--- | :--- | :--- | :--- |\n| Simulator | AlphaBot | `MarketData`, `PortfolioState` | Request for a trade decision. |\n| AlphaBot | RiskGuard | `ProposedTrade` | Request for risk validation. |\n| RiskGuard | AlphaBot | `RiskCheckResult` | Risk validation outcome. |\n| AlphaBot | Simulator | `TradeSignal` | Final trading decision. |\n| Simulator | Portfolio | `TradeSignal` | Instruction to execute the trade. |\n| Portfolio | Simulator | `PortfolioState` | Updated state after trade execution. |\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml architecture\ntitle Agentic Trading System Architecture\n\n' Define the main components (modules)\npackage \"Simulator (Orchestrator & UI)\" as Simulator {\n  class SimulatorApp\n  class Market\n  class Portfolio\n}\n\npackage \"AlphaBot (Trading Agent)\" as AlphaBot {\n  class AlphaBotAgent\n  class RiskGuardTool\n}\n\npackage \"RiskGuard (Risk Agent)\" as RiskGuard {\n  class RiskGuardRules\n}\n\npackage \"Common (Shared Models & Utils)\" as Common {\n  class DataModels\n  class AgentUtils\n}\n\n' Define the inter-module dependencies and data flow\n\n' 1. Simulator orchestrates the system\nSimulatorApp --> AlphaBot : 1. Request Trade Signal (HTTP/A2A)\nSimulatorApp --> Portfolio : 3. 
Execute Approved Trade\nSimulatorApp --> Market : 1. Get Market Data\n\n' 2. AlphaBot uses RiskGuard as a tool\nAlphaBotAgent --> RiskGuardTool : Uses Tool\nRiskGuardTool --> RiskGuard : 2. Check Proposed Trade Risk (HTTP/A2A)\n\n' 3. All components rely on Common for models and utilities\nSimulatorApp ..> Common : Uses DataModels, AgentUtils\nAlphaBotAgent ..> Common : Uses DataModels\nRiskGuardRules ..> Common : Uses DataModels\n\n' 4. Data Flow within Simulator\nMarket --> SimulatorApp : Provides MarketData\nPortfolio --> SimulatorApp : Provides PortfolioState\n\n' 5. Data Flow between Agents\nAlphaBotAgent --> DataModels : Outputs TradeSignal\nRiskGuardRules --> DataModels : Outputs RiskCheckResult\n\n' High-level flow note\nnote \"Simulation Loop Flow:\\n1. Simulator requests trade from AlphaBot.\\n2. AlphaBot uses RiskGuardTool to check risk.\\n3. AlphaBot returns TradeSignal.\\n4. Simulator executes trade on Portfolio.\" as FlowNote\nSimulatorApp .. FlowNote\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe `agentic-trading` project effectively utilizes several design patterns to manage complexity and ensure robust inter-service communication:\n\n### 1. Agent Pattern (Specialized Agents)\nThis is the foundational pattern, where the system's intelligence is distributed across specialized, autonomous entities.\n*   **Implementation**: The system features the **AlphaBot Agent** (decision-making) and the **RiskGuard Agent** (gatekeeping).\n*   **Code Example**: The definitions of `AlphaBotAgent` (`alphabot/agent.py`) and `RiskGuardAgent` (`riskguard/agent.py`), both wrapped in an A2A executor, exemplify this pattern.\n\n### 2. 
Service-Oriented Architecture (SOA) / Microservices Pattern\nThe project is decomposed into independent, loosely coupled services that communicate over a network (HTTP/A2A), allowing for independent deployment and scaling.\n*   **Implementation**: Each core component (`alphabot`, `riskguard`, `simulator`) runs as a separate service with its own entry point and is designed to be containerized (indicated by `Dockerfile.*`).\n*   **Code Example**: The `call_agent_skill` function in `common/utils/agent_utils.py` abstracts the HTTP communication layer, treating each agent as a distinct service endpoint.\n\n### 3. Data Transfer Object (DTO) Pattern\nPydantic models are used to transfer data between services, ensuring a well-defined, validated, and self-documenting data structure.\n*   **Implementation**: Models like `MarketData`, `PortfolioState`, `TradeSignal`, and `RiskCheckResult` defined in `common/models.py` act as the DTOs.\n*   **Code Example**: The use of `PydanticOutputParser` in `alphabot/agent.py` ensures the LLM's output strictly conforms to the `TradeSignal` DTO.\n\n### 4. Tool/Function Calling Pattern (LangChain)\nThe LLM-based agent is given access to external functions (tools) it can choose to call to perform actions or gather information.\n*   **Implementation**: The `AlphaBotAgent` is provided with the `RiskGuardTool` (`alphabot/a2a_risk_tool.py`), which encapsulates the logic for calling the separate `RiskGuard` service. This integrates a microservice call directly into the LLM's reasoning chain.\n*   **Code Example**: The `RiskGuardTool` class inherits from a LangChain `BaseTool` and its `_run` method executes the A2A call to the RiskGuard service.\n\n#### 3.3.2. Project Highlights\n\nThe `agentic-trading` project showcases several innovative features and design choices that contribute to its robustness, extensibility, and safety:\n\n*   **Safety-First Agentic Design**: The most significant highlight is the **separation of intelligence and safety**. 
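A dependency-free stand-in shows the shape of that gate (class and method names here are illustrative; the actual implementation is a LangChain `BaseTool` in `alphabot/a2a_risk_tool.py`):

```python
class RiskGuardToolSketch:
    # Illustrative stand-in for the LangChain BaseTool; `risk_checker`
    # abstracts the HTTP A2A call to the RiskGuard service.
    name = 'risk_guard_tool'
    description = 'Check a proposed trade with the RiskGuard service'

    def __init__(self, risk_checker):
        self._check = risk_checker

    def _run(self, proposed_trade: dict) -> dict:
        # Every proposed trade must pass through this gate; a rejection
        # (approved=False) steers the agent toward a HOLD signal.
        return self._check(proposed_trade)
```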
The LLM-driven `AlphaBot` (intelligence) is forced to consult the deterministic, rule-based `RiskGuard` (safety) via the Tool/Function Calling pattern. This design mitigates the risk of LLM hallucinations or irrational decisions by enforcing hard constraints before any trade is executed. This is a critical feature for any real-world trading system.\n*   **Protocol-Oriented Interoperability (A2A)**: By adopting the Agent-to-Agent (A2A) protocol, the project establishes a clear, standardized way for agents to discover and interact with each other. This promotes high interoperability and makes it easy to swap out or add new agents (e.g., a new `DataAgent` or `ExecutionAgent`) without modifying the core logic of existing agents.\n*   **Clear Separation of Concerns**: The three main components (`alphabot`, `riskguard`, `simulator`) are highly decoupled. The `Simulator` is purely the environment and orchestrator, `AlphaBot` is the LLM-based decision engine, and `RiskGuard` is the deterministic rule engine. This separation simplifies testing, maintenance, and technology upgrades for each part.\n*   **Strong Data Contract**: The universal use of Pydantic models in the `common` module provides a robust, self-documenting, and validated data contract across all service boundaries, which is a hallmark of professional microservice design and enhances the flexibility of the system.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe analysis reveals several areas for performance optimization, architectural refinement, and code quality improvement:\n\n1.  **Performance Bottleneck: LLM Latency**:\n    *   **Suggestion**: Introduce **asynchronous execution** for the `AlphaBotAgent`. 
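A minimal sketch of the async style this implies, assuming `httpx` is adopted for the tool call (endpoint and payload shapes are placeholders):

```python
import asyncio

async def check_trade_risk_async(url: str, proposed_trade: dict) -> dict:
    # Non-blocking variant of the RiskGuardTool call; requires httpx.
    import httpx  # third-party dependency, assumed adopted
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.post(url, json=proposed_trade)
        resp.raise_for_status()
        return resp.json()

async def check_many(url: str, trades: list) -> list:
    # Several risk checks can now overlap instead of queueing.
    return await asyncio.gather(*(check_trade_risk_async(url, t) for t in trades))
```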
The current synchronous call to the LLM blocks the A2A request handler in `alphabot`, limiting concurrency.\n    *   **Specific Action**: Refactor `AlphaBotAgentExecutor` to use an asynchronous LangChain agent and make the A2A request handling fully `async` (e.g., using `asyncio` and `httpx` for the `RiskGuardTool` call). This allows the `alphabot` service to handle multiple trade signal requests concurrently while waiting for the LLM response.\n\n2.  **Architecture Optimization: RiskGuard Rules Extensibility**:\n    *   **Suggestion**: Decouple the risk rules from the `RiskGuardRules` class logic to allow for dynamic, external configuration.\n    *   **Specific Action**: Implement a **Strategy Pattern** or a simple configuration loader (e.g., reading rules from a YAML or JSON file) in `riskguard/rules.py`. This enables new risk rules to be added, modified, or disabled without requiring a code change and redeployment of the `riskguard` service.\n\n3.  **Code Quality: Indicator Optimization**:\n    *   **Suggestion**: Ensure the financial calculations in `common/utils/indicators.py` are optimized for performance.\n    *   **Specific Action**: Verify that all indicator calculations leverage vectorized operations using libraries like **NumPy** or **Pandas** instead of pure Python loops. This is crucial for performance when processing large time-series datasets in a real-world scenario.\n\n4.  **Observability and Logging**:\n    *   **Suggestion**: Implement structured logging and distributed tracing across the services.\n    *   **Specific Action**: Use a structured logging library (e.g., `structlog`) and ensure a unique **correlation ID** is passed in the A2A requests (from `Simulator` to `AlphaBot` to `RiskGuard`). This ID would link all log entries for a single trade decision, making debugging and performance monitoring significantly easier.\n\n#### 3.4.2. 
Secondary Development Guide\n\nThis guide outlines the best practices for exploring and extending the `agentic-trading` codebase for secondary development.\n\n1.  **Understand the Core Data Contract**: Begin by thoroughly reviewing `common/models.py`. All inter-service communication is governed by these Pydantic models. Any change to an agent's input or output MUST be reflected here first. Use these models for type hinting in all new code to ensure compatibility and leverage static analysis.\n\n2.  **Extend the Agent Logic (AlphaBot)**: To change the trading strategy or LLM prompt, modify the prompt template and the `AlphaBotAgent` logic in `alphabot/agent.py`. If the new strategy requires external data or actions, create a new LangChain `BaseTool` in `alphabot/` and add it to the agent's tool list. Always ensure the final output strictly adheres to the `TradeSignal` schema.\n\n3.  **Modify Risk Rules (RiskGuard)**: To add or modify a risk constraint, edit the `RiskGuardRules.check_trade_risk` method in `riskguard/rules.py`. Risk rules are deterministic and should be kept simple and fast. New rules MUST be accompanied by unit tests in `tests/riskguard/` to ensure correctness.\n\n4.  **Add a New Agent**: To introduce a new specialized agent (e.g., a `DataAgent` or `ExecutionAgent`), create a new module, define its input/output models in `common/models.py`, implement the agent logic, and wrap it in an A2A executor (`__main__.py`). Finally, update the `Simulator` or another agent to use this new agent as a tool or service.\n\n5.  **Testing and Simulation**: The `simulator` is the primary integration test environment. Use the `simulator/main.py` to run end-to-end tests of your changes. All new logic should have dedicated unit tests in the `tests/` directory.\n\n"
  },
  {
    "path": "thirdparty/awesome-quant.md",
"content": "# awesome-quant - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe project structure is typical for a GitHub \"awesome list\" that includes automation scripts and a static site generator (Quarto) for presentation. The core logic is distributed across a few Python scripts in the root directory, which are responsible for data processing and list maintenance.\n\nThe root directory (`/home/ubuntu/FinnewsHunter/thirdparty/awesome-quant`) serves as the central hub, containing the primary content source (`README.md`), configuration files (`pyproject.toml`, `poetry.lock`), and the executable Python scripts. The scripts `parse.py`, `cranscrape.py`, and `topic.py` form the data processing layer, transforming raw content into structured data. `parse.py` is the most critical, responsible for reading the Markdown list, enriching it with GitHub metadata, and outputting `site/projects.csv`. `cranscrape.py` is a specialized utility for scraping R package data into `cran.csv`.\n\nThe `site/` directory is the dedicated presentation layer, built around the **Quarto** publishing system. It contains the site configuration (`_quarto.yml`), static content pages (`about.qmd`, `index.qmd`), and the dynamic list template (`projects.qmd`). This separation ensures that the content maintenance logic (Python scripts) is decoupled from the content presentation logic (Quarto). The `.github/workflows/` folder contains the CI/CD pipeline (`build.yml`) which automates the execution of the Python scripts and the Quarto site generation, completing the \"Content-as-Code\" lifecycle. This structure is highly efficient for a data-driven, open-source documentation project.\n\n```\n/home/ubuntu/FinnewsHunter/thirdparty/awesome-quant\n├── .git/                 # Git version control metadata\n├── .github/              # GitHub Actions workflows for CI/CD\n│   └── workflows/\n│       └── build.yml     # Workflow to build the Quarto site\n├── .gitignore            # Files to be ignored by Git\n├── .nojekyll             # Configuration to disable Jekyll processing on GitHub Pages\n├── README.md             # Main entry point and introduction to the awesome list\n├── cran.csv              # Output data file from cranscrape.py (list of R packages)\n├── cranscrape.py         # Python script to scrape R package data from CRAN\n├── legacy.txt            # List of legacy or deprecated entries\n├── parse.py              # Main Python script to parse README.md, fetch GitHub commit dates, and generate projects.csv\n├── poetry.lock           # Poetry dependency lock file\n├── pyproject.toml        # Poetry project configuration and dependencies\n├── quants.md             # Secondary Markdown file, likely containing the main list content\n├── recommendation.ipynb  # Jupyter notebook for content recommendation/analysis\n├── site/                 # Directory for the Quarto-based static website generation\n│   ├── .gitignore        # Git ignore for the site build\n│   ├── CODE_OF_CONDUCT.qmd # Quarto document for the Code of Conduct\n│   ├── _quarto.yml       # Quarto site configuration file\n│   ├── about.qmd         # Quarto document for the \"About\" page\n│   ├── index.qmd         # Quarto document for the main index page\n│   ├── projects.csv      # Output data file from parse.py (list of all projects with metadata)\n│   └── projects.qmd      # Quarto document to display the list of projects\n├── styles.css            # Custom CSS for the Quarto site\n└── topic.py              # Python script for topic-related processing (needs further analysis)\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/awesome-quant/` (Root): Contains the main Python scripts (`parse.py`, `cranscrape.py`, `topic.py`) responsible for data extraction, processing, and list maintenance, as well as the primary content files (`README.md`, `quants.md`).\n*   `/home/ubuntu/FinnewsHunter/thirdparty/awesome-quant/site/`: Contains the Quarto documents (`.qmd`) and configuration files necessary to generate the static website, which serves as the final, structured presentation of the awesome list data.\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe project's core functionality is implemented across three main Python scripts and the Quarto-based static site generation directory. These components form a data pipeline for maintaining and publishing the \"awesome-quant\" list.\n\n## 1. Root Directory Scripts (Data Processing Pipeline)\n\n### 1.1. `parse.py` (List Parsing and Metadata Fetcher Module)\n\n*   **File Enumeration:** `parse.py`\n*   **Module Core Responsibility:** This is the central orchestration script. Its primary purpose is to parse the main content file (`README.md`), extract project links and descriptions, concurrently fetch the latest commit date for each GitHub repository using the GitHub API, and compile the final, structured project data into `site/projects.csv`.\n*   **Key File Identification:** `parse.py` is the single key file.\n*   **Core Implementation:**\n    *   **GitHub API Integration:** It initializes a `Github` object from the `pygithub` library using an environment variable (`GITHUB_ACCESS_TOKEN`). This is critical for fetching up-to-date metadata.\n    *   **Concurrency:** The script uses the `threading.Thread` class, which is subclassed into the custom `Project` class. 
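Reduced to a dependency-free sketch, the pattern looks like this (the fetch function stands in for the real GitHub API call):

```python
from threading import Thread

class FetchWorker(Thread):
    # Minimal analogue of parse.py's Project(Thread): each worker fetches
    # one piece of metadata and stores it on the instance for collection.
    def __init__(self, repo: str, fetch):
        super().__init__()
        self.repo = repo
        self._fetch = fetch
        self.result = None

    def run(self):
        self.result = self._fetch(self.repo)

workers = [FetchWorker(r, lambda repo: f'last-commit-for-{repo}')
           for r in ('a/one', 'b/two')]
for w in workers:
    w.start()   # run each fetch in its own thread
for w in workers:
    w.join()    # wait for all fetches to finish
results = [w.result for w in workers]
```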
This allows the time-consuming network calls to the GitHub API (`get_last_commit`) to run in parallel, significantly speeding up the data collection process.\n    *   **Parsing Logic:** It uses two main regular expressions: `ret` for matching Markdown headers (`#+ Title`) to track the project section, and `rex` for matching the list items (`- [project](url) - description`). The script iterates line-by-line through `README.md`, maintaining a stack of section titles (`m_titles`) to categorize each project.\n    *   **Data Structure:** The `Project` thread stores its results in a dictionary (`self.regs`) which is later collected into a list of dictionaries and converted into a `pandas.DataFrame` for final output to CSV.\n*   **Dependencies:** `os`, `re`, `pandas`, `threading`, `github` (PyGithub).\n*   **Error & Performance:** The script includes a basic `try...except` block in `get_last_commit` to handle API errors (e.g., repository not found or API rate limits) and logs the error, preventing a full script crash. The use of multi-threading is the primary performance optimization.\n\n### 1.2. `cranscrape.py` (CRAN Scraper Module)\n\n*   **File Enumeration:** `cranscrape.py`\n*   **Module Core Responsibility:** This script is a specialized data collector focused on R packages relevant to quantitative finance. It scrapes a hardcoded list of CRAN package pages to find associated GitHub links.\n*   **Key File Identification:** `cranscrape.py` is the single key file.\n*   **Core Implementation:**\n    *   **Web Scraping:** It uses the `requests` library to fetch the HTML content of a predefined list of CRAN URLs.\n    *   **Regex Extraction:** A regular expression (`reu = re.compile(r'https://github.com/([\\w-]+/[\\w-]+)')`) is used to extract the GitHub URL from the fetched HTML, assuming the link is present on the package's index page.\n    *   **Output:** The collected data (CRAN URL, GitHub URL, and repository path) is saved to `cran.csv`. 
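The extraction step can be exercised in isolation with the pattern quoted above:

```python
import re

# The GitHub-link pattern quoted from cranscrape.py.
reu = re.compile(r'https://github.com/([\w-]+/[\w-]+)')

html = 'URL: <a href=https://github.com/example-user/example-repo>source</a>'
match = reu.search(html)
repo_path = match.group(1) if match else None
```

Note that the character class `[\w-]` excludes dots, so a repository name containing a period would be captured only up to the first dot.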
This file serves as a supplementary data source, potentially for merging or cross-referencing with the main `projects.csv`.\n*   **Dependencies:** `requests`, `re`, `pandas`.\n\n### 1.3. `topic.py` (GitHub Topic Search Utility)\n\n*   **File Enumeration:** `topic.py`\n*   **Module Core Responsibility:** This is a utility script, likely used for discovery and maintenance, to find new, popular repositories tagged with the 'quant' topic on GitHub.\n*   **Key File Identification:** `topic.py` is the single key file.\n*   **Core Implementation:** It uses the PyGithub `search_repositories` method with a query string (`topic:quant`) and filters the results by star count (using the hardcoded threshold `stargazers_count < 1000`). The results are printed to the console.\n*   **Dependencies:** `os`, `github` (PyGithub).\n\n## 2. `site/` Directory (Static Site Generation Module)\n\n*   **File Enumeration:** `site/.gitignore`, `site/CODE_OF_CONDUCT.qmd`, `site/_quarto.yml`, `site/about.qmd`, `site/index.qmd`, `site/projects.csv`, `site/projects.qmd`.\n*   **Module Core Responsibility:** To transform the collected and processed data (`projects.csv`) and the Quarto Markdown content (`.qmd` files) into a complete, navigable static website for publishing.\n*   **Key File Identification:**\n    *   `_quarto.yml`: Defines the site's configuration, navigation, and output format.\n    *   `projects.qmd`: The template file responsible for reading `projects.csv` and rendering the main list of projects, likely using Quarto's data-driven table features.\n*   **Core Implementation:** The implementation relies entirely on the external **Quarto** publishing system. The `.qmd` files are essentially enhanced Markdown that can embed code (e.g., R or Python) to process data and generate dynamic content, such as tables from the `projects.csv` file. 
This separation of concerns delegates the presentation layer to a dedicated tool.\n*   **Dependencies:** External Quarto system.\n\n### Module PlantUML Diagrams\n\n## Diagram for `parse.py` Module\n\nThis diagram focuses on the main class and its interaction with external services and data structures within the `parse.py` script.\n\n```plantuml\n@startuml parse_module\ntitle parse.py Module Class Diagram\n\n' External Dependencies\npackage \"External Libraries\" {\n  class Github\n  class Thread\n  class DataFrame\n}\n\n' Main Script Logic\nclass ParseScript {\n  - g: Github\n  - projects: list<Project>\n  - ret: Regex (Header)\n  - rex: Regex (List Item)\n  + main_execution()\n}\n\n' Utility Functions\nclass UtilityFunctions {\n  + extract_repo(url: str): str\n  + get_last_commit(repo: str): str\n}\n\n' Core Abstraction for Concurrency\nclass Project {\n  - _match: RegexMatch\n  - _section: str\n  + regs: dict\n  + __init__(match, section)\n  + run(): void\n}\n\n' Relationships\nParseScript --> Github : uses API\nParseScript --> Project : creates and manages threads\nParseScript --> UtilityFunctions : uses helper functions\nParseScript --> DataFrame : outputs final data\nProject ..> Thread : extends\nProject --> UtilityFunctions : calls get_last_commit\nProject --> extract_repo : calls\nUtilityFunctions --> Github : uses API\n\n@enduml\n```\n\n## Diagram for `cranscrape.py` Module\n\nThis script is purely procedural and does not contain classes, so a class diagram is not applicable. 
A functional flow diagram would be more appropriate, but to adhere to the class diagram requirement, a conceptual representation of its main function is provided.\n\n```plantuml\n@startuml cranscrape_module\ntitle cranscrape.py Module Conceptual Diagram\n\n' External Dependencies\npackage \"External Libraries\" {\n  class requests\n  class re\n  class pandas\n}\n\n' Main Function\nclass get_data {\n  + url: str\n  + res: Response\n  + github_url: str\n  + get_data(url): dict\n}\n\n' Main Script Execution\nclass MainExecution {\n  - urls: list<str>\n  - all_data: list<dict>\n  + execute_scraping()\n  + save_to_csv(filename: str)\n}\n\n' Relationships\nMainExecution --> get_data : calls for each URL\nget_data --> requests : fetches URL content\nget_data --> re : extracts GitHub URL\nMainExecution --> pandas : creates DataFrame and saves CSV\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe **awesome-quant** project employs a **Data Pipeline Architecture** centered around the maintenance and publication of a curated list. The design philosophy is one of **separation of concerns** and **data-driven generation**, where raw content is processed by scripts to produce structured data, which is then consumed by a dedicated presentation layer.\n\nThe core abstractions are:\n\n1.  **The Project Entity (`Project` class in `parse.py`):** This is the fundamental unit of data. It abstracts a single entry from the Markdown list, encapsulating its name, URL, description, and crucial metadata like the `last_commit` date, which is fetched concurrently. This abstraction is key to enriching the raw list data.\n2.  **The Data Source (Markdown/CRAN):** The project treats the `README.md` file as the primary, human-editable source of truth. The `cranscrape.py` script introduces a secondary, specialized data source abstraction for R packages on CRAN.\n3.  
**Concurrent Enrichment:** The use of the `threading.Thread` subclass for the `Project` entity is an abstraction over the time-consuming process of external API calls. It abstracts the complexity of parallel execution, allowing the main script to initiate many data-fetching tasks simultaneously.\n\n**Design Philosophy:** The project follows a **\"Content-as-Code\"** philosophy. The human-curated list is maintained in a simple Markdown file (`README.md`), which is then programmatically processed and validated by the Python scripts. This ensures that the list remains up-to-date (via `last_commit` checks) and can be presented in a highly structured, machine-readable format (`projects.csv`) before final publication.\n\n**Lifecycle Management:** The project lifecycle is a simple, three-stage process:\n1.  **Content Authoring:** A maintainer edits `README.md` and `quants.md`.\n2.  **Data Processing:** The Python scripts (`parse.py`, `cranscrape.py`) are executed to generate the structured data files (`projects.csv`, `cran.csv`).\n3.  **Publication:** The Quarto system builds the static site by consuming the structured data and the `.qmd` templates, resulting in the final website. This process is automated via the `.github/workflows/build.yml` GitHub Action.\n\n#### 3.1.2. Component Interactions\n\nThe architecture is a linear data flow pipeline with two main branches: the primary list processing and the secondary CRAN scraping.\n\n**1. Primary Data Flow (List Processing):**\n*   **Input:** `README.md` (raw Markdown list).\n*   **Processor:** `parse.py` reads the Markdown line-by-line, using regular expressions to extract project details (Name, URL, Description).\n*   **External Communication:** For each extracted project URL, `parse.py` initiates a concurrent request to the **GitHub API** (via PyGithub) to fetch the repository's latest commit date. 
This is the main external communication pattern.\n*   **Output:** The enriched data is aggregated and written to `site/projects.csv`.\n\n**2. Secondary Data Flow (CRAN Scraping):**\n*   **Input:** Hardcoded list of CRAN URLs within `cranscrape.py`.\n*   **Processor:** `cranscrape.py` uses the `requests` library to scrape the HTML of each CRAN page.\n*   **External Communication:** Direct HTTP requests to the **CRAN website**.\n*   **Output:** The results, primarily the extracted GitHub URL for each R package, are written to `cran.csv`.\n\n**3. Presentation Flow (Site Generation):**\n*   **Input:** `site/projects.csv` (structured project data) and the Quarto template files (`.qmd`).\n*   **Processor:** The **Quarto** static site generator.\n*   **Output:** The final static website, where `site/projects.qmd` uses the data in `projects.csv` to render a dynamic, sortable table of all projects.\n\n**Communication Patterns:**\n*   **Internal:** File-based communication (e.g., `parse.py` -> `projects.csv` -> Quarto).\n*   **External:** Synchronous API calls (GitHub API) and HTTP requests (CRAN website). The use of **multi-threading** in `parse.py` is a key pattern to mitigate the latency of external communication.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml awesome_quant_architecture\ntitle Awesome-Quant Data Pipeline Architecture\n\n' Define Components\ncomponent [README.md] as README {\n  Source of raw list content\n}\n\ncomponent [CRAN Scraper] as CRAN_SCRAPER {\n  cranscrape.py\n}\n\ncomponent [List Parser] as PARSER {\n  parse.py\n}\n\ncomponent [Structured Data] as DATA {\n  projects.csv\n  cran.csv\n}\n\ncomponent [Quarto Site Generator] as QUARTO {\n  site/\n  .qmd templates\n}\n\n' Define External Systems\ncloud \"External Services\" {\n  [GitHub API] as GITHUB_API\n  [CRAN Website] as CRAN_WEB\n}\n\n' Define Data Flow and Dependencies\nREADME --> PARSER : Reads raw list content\nPARSER --> GITHUB_API : Fetches last commit date (Concurrent)\nPARSER --> DATA : Writes enriched project data\n\nCRAN_SCRAPER --> CRAN_WEB : Scrapes GitHub links\nCRAN_SCRAPER --> DATA : Writes R package data\n\nDATA --> QUARTO : Consumes structured data\n\nQUARTO --> [Final Static Website] : Generates HTML/CSS\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase, though small, effectively utilizes several design patterns to manage complexity, especially around external API interaction and data processing.\n\n1.  **Worker Thread Pattern (Concurrency)**\n    *   **Description:** This pattern is used to execute time-consuming tasks (fetching data from the GitHub API) in parallel, preventing the main thread from blocking and significantly improving the script's execution time.\n    *   **Implementation:** The `Project` class in `parse.py` inherits from `threading.Thread`. Each project entry in the Markdown list is instantiated as a `Project` object, and its `run` method is executed in a separate thread.\n    *   **Code Example (`parse.py`):**\n        ```python\n        class Project(Thread):\n            def __init__(self, match, section):\n                super().__init__()\n                # ... 
initialization ...\n\n            def run(self):\n                # ... calls get_last_commit(repo) which is the blocking network call\n                last_commit = get_last_commit(repo)\n                # ... stores result in self.regs\n        \n        # ... in main execution loop:\n        p = Project(m, ' > '.join(m_titles[1:]))\n        p.start()\n        projects.append(p)\n        ```\n\n2.  **Data-Driven Generation Pattern**\n    *   **Description:** The content presentation is entirely driven by a structured data file (`projects.csv`), which is generated by the processing scripts. This separates the content logic from the presentation logic.\n    *   **Implementation:** The `parse.py` script's final action is to create `projects.csv`. The Quarto site generator then uses this CSV file to dynamically render the project list page (`site/projects.qmd`).\n    *   **Code Example (`parse.py`):**\n        ```python\n        df = pd.DataFrame(projects)\n        df.to_csv('site/projects.csv', index=False)\n        ```\n\n3.  **Template Method Pattern (Implicit)**\n    *   **Description:** The overall maintenance process follows a fixed sequence of steps: Read Source (Markdown) -> Enrich Data (GitHub API) -> Persist Data (CSV) -> Render Presentation (Quarto). The Python scripts provide the \"enrich and persist\" steps, which are customized implementations within a broader, fixed pipeline.\n\n#### 3.3.2. Project Highlights\n\n*   **Automated Content Enrichment:** The most significant highlight is the programmatic fetching of the **last commit date** for every listed repository via the GitHub API. 
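Each enrichment call first needs the `owner/repo` path embedded in the list URL; a minimal sketch of that extraction (the regex and helper name are illustrative, not the actual `parse.py` internals):

```python
import re

# Illustrative pattern; the script's actual regex may differ.
GITHUB_REPO = re.compile(r"https?://github\.com/([^/\s]+)/([^/\s#?]+)")

def repo_path(url):
    """Return 'owner/repo' for a GitHub URL, or None for non-GitHub links."""
    m = GITHUB_REPO.search(url)
    if not m:
        return None
    return "{}/{}".format(m.group(1), m.group(2).removesuffix(".git"))

print(repo_path("https://github.com/wilsonfreitas/awesome-quant"))
# -> wilsonfreitas/awesome-quant
```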
This feature automatically provides a crucial metric for list users—the project's recency and maintenance status—which is a common pain point for manually maintained awesome lists.\n*   **Content-as-Code (CaC) Philosophy:** By treating the list content (`README.md`) as a source file that is parsed and processed by code, the project ensures that the human-readable list and the machine-readable data (`projects.csv`) are always synchronized. This reduces manual effort and potential errors in maintaining a large, dynamic list.\n*   **Decoupled Presentation Layer:** The use of **Quarto** for static site generation provides a professional, feature-rich web interface (e.g., dynamic tables, search, filtering) without requiring complex web development code within the core data processing scripts. The scripts only focus on generating the data, and Quarto handles the complex task of rendering.\n*   **Extensibility via Data Sources:** The architecture is flexible enough to support multiple data sources. The existence of `cranscrape.py` alongside `parse.py` shows that specialized data collection pipelines (e.g., for R packages from CRAN) can be easily added to enrich the final dataset without modifying the core parsing logic.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe current implementation is effective but has several areas for optimization, primarily related to robustness and modern Python practices.\n\n1.  **Robust Error Handling:** The `get_last_commit` function in `parse.py` uses a bare `except:` block. This is a critical anti-pattern as it catches all exceptions, including system errors, making debugging difficult and potentially masking critical failures.\n    *   **Suggestion:** Replace the bare `except:` with specific exception handling for the `PyGithub` library, such as `except UnknownObjectException` for 404 errors or `except RateLimitExceededException` for API limits. This allows for targeted logging and recovery.\n\n2.  
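**Targeted Exception Handling (illustrative sketch):** One concrete shape the fix could take is shown below. The exception names mirror PyGithub's documented classes, but they are defined locally as stand-ins so the sketch runs without the library installed; `get_last_commit` is likewise a hypothetical rewrite, not the actual `parse.py` helper.

```python
# Local stand-ins named after PyGithub's real exception classes
# (github.GithubException.UnknownObjectException, RateLimitExceededException).
class UnknownObjectException(Exception):
    """Stand-in for PyGithub's 404 error."""

class RateLimitExceededException(Exception):
    """Stand-in for PyGithub's rate-limit error."""

def get_last_commit(repo_name, fetch):
    """Degrade per failure mode instead of swallowing everything."""
    try:
        return fetch(repo_name)
    except UnknownObjectException:
        # 404: repository renamed or deleted; record the gap and move on.
        return None
    except RateLimitExceededException:
        # API quota exhausted: fail loudly instead of masking it.
        raise

def missing_repo(name):
    raise UnknownObjectException(name)

print(get_last_commit("user/gone", missing_repo))  # None
```

Handled this way, a missing repository becomes a recorded data gap, while a rate-limit error still aborts the run visibly.

2.  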
**Modern Concurrency Management:** The use of raw `threading.Thread` with a manual polling loop (`while True: checks = [not p.is_alive() for p in projects]`) is functional but verbose and less idiomatic in modern Python.\n    *   **Suggestion:** Refactor the concurrent logic in `parse.py` to use `concurrent.futures.ThreadPoolExecutor`. This abstraction simplifies thread management, automatically handles the waiting process, and results in cleaner, more readable code.\n\n3.  **Configuration Externalization:** The list of CRAN URLs in `cranscrape.py` is hardcoded. This requires modifying the source code to update the list of R packages being scraped.\n    *   **Suggestion:** Move this list to an external configuration file (e.g., a simple text file or a JSON/YAML file). This allows for easier maintenance and updates to the list of R packages without modifying the Python script itself, improving maintainability.\n\n4.  **Markdown Parsing Robustness:** The reliance on regular expressions (`rex` and `ret`) to parse the `README.md` is fragile. A slight change in Markdown formatting could break the script.\n    *   **Suggestion:** Adopt a dedicated Markdown parsing library (e.g., `markdown-it-py` or a similar tool) to create an Abstract Syntax Tree (AST) of the document. This would make the parsing logic resilient to minor formatting variations and future changes in the list's structure.\n\n#### 3.4.2. Secondary Development Guide\n\nThis guide outlines the best practices for exploring and contributing to the **awesome-quant** project.\n\n1.  **Environment Setup:**\n    *   Clone the repository: `gh repo clone wilsonfreitas/awesome-quant`\n    *   Install dependencies using Poetry (as indicated by `pyproject.toml`): `poetry install`\n    *   Set the required environment variable for the GitHub API: `export GITHUB_ACCESS_TOKEN='your_token'`\n\n2.  **Content Contribution (Adding a Project):**\n    *   The primary source of truth is `README.md`. 
Add new entries following the existing Markdown list format: `- [Project Name](https://github.com/user/repo) - A brief description of the project.`\n    *   Ensure the URL is a direct link to the GitHub repository for the `parse.py` script to correctly extract the repository path.\n\n3.  **Data Generation and Validation:**\n    *   Run the main data processing script: `python parse.py`\n    *   This script will concurrently fetch the latest commit date for all projects and generate the `site/projects.csv` file.\n    *   If updating the R package list, run: `python cranscrape.py` to update `cran.csv`.\n\n4.  **Local Site Preview:**\n    *   To view the final output, ensure Quarto is installed and run the site generator from the root directory: `quarto preview site`\n    *   This will build the static site using the newly generated `projects.csv` and launch a local web server for review.\n\n5.  **Script Exploration:**\n    *   Start with `parse.py` to understand the core data flow and the concurrent API interaction logic.\n    *   Examine the `Project` class to see how data is enriched, and the main loop to see how Markdown headers are used to categorize projects.\n\n"
  },
  {
    "path": "thirdparty/backtrader.md",
"content": "# backtrader - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe project structure is typical for a Python package: the main `backtrader/` directory is the core package containing the framework's logic. It is organized into several sub-modules: `analyzers/` for performance metrics, `brokers/` for trade execution simulation and integration, `comminfos/` for commission and slippage models, `dataseries/` for time-series data structures, `feeds/` for data loading mechanisms, `indicators/` for technical analysis tools, `observers/` for backtest monitoring, `strategies/` for user-defined trading logic base classes, and `utils/` for general helper functions. Outside the core package, the repository includes `docs/` for documentation, `examples/` for usage demonstrations, `tests/` for unit and integration testing, and `tools/` for command-line utilities. This modular structure clearly separates the core engine, financial modeling, data handling, and user-facing logic, which is a hallmark of a well-designed, extensible framework.\n\n### 1.2. 
Core Folders for Analysis\n\n- `/home/ubuntu/backtrader/backtrader` (Main Engine and Core Classes)\n- `/home/ubuntu/backtrader/backtrader/analyzers` (Performance Metrics)\n- `/home/ubuntu/backtrader/backtrader/brokers` (Broker Simulation and Integration)\n- `/home/ubuntu/backtrader/backtrader/comminfos` (Commission and Slippage Models)\n- `/home/ubuntu/backtrader/backtrader/dataseries` (Time Series Data Handling)\n- `/home/ubuntu/backtrader/backtrader/feeds` (Data Loading and Management)\n- `/home/ubuntu/backtrader/backtrader/indicators` (Technical Analysis Indicators)\n- `/home/ubuntu/backtrader/backtrader/observers` (Backtest Monitoring and Visualization)\n- `/home/ubuntu/backtrader/backtrader/strategies` (Strategy Definition Base Classes)\n- `/home/ubuntu/backtrader/backtrader/utils` (Helper Functions and Mixins)\n\n## Phase 2: Module-by-Module Deep Analysis\n\n# Module Analysis: Core Engine and Base Classes (`/backtrader`)\n\n## 3. Module Core Responsibility\nThis module, which is the root `backtrader` directory, contains the **core execution engine** and the **fundamental base classes** that define the entire framework's architecture. Its primary responsibility is to manage the backtesting lifecycle, synchronize data, execute trading logic, and simulate market interactions.\n\n## 3.1 Key Files and Responsibilities\n\n| File | Core Responsibility |\n| :--- | :--- |\n| `cerebro.py` | **The Brain/Engine**: Manages the backtesting run, aggregates strategies, data feeds, brokers, and observers. Controls the main execution loop (`run()`). |\n| `strategy.py` | **User Logic Base**: Defines the `Strategy` and `SignalStrategy` base classes where users implement their trading logic (`next()`, `notify_order()`). |\n| `broker.py` | **Market Simulation Base**: Defines `BrokerBase` and `BackBroker` for simulating cash, portfolio value, order submission, and position management. 
|\n| `order.py` | **Transaction Object**: Defines `OrderBase` and `Order` objects, including various order types (Market, Limit, Stop) and their execution details (`OrderExecutionBit`). |\n| `dataseries.py` | **Time-Series Data**: Defines the core data structures like `DataSeries`, `OHLC`, and `OHLCDateTime` which encapsulate financial time-series data. |\n| `indicator.py` | **Technical Analysis Base**: Defines the `Indicator` base class, which is a specialized `LineIterator` for calculating technical values. |\n| `metabase.py` | **Metaclass System**: Contains `MetaParams`, the custom metaclass that enables the powerful parameter and line-based system used throughout the framework. |\n| `linebuffer.py`, `lineiterator.py`, `lineroot.py`, `lineseries.py` | **Data Flow and Synchronization**: Defines the abstract classes (`LineRoot`, `LineIterator`, `LineSeries`) that manage data synchronization, lookback, and calculation dependencies. |\n\n## 4. Code Detail Analysis\n\n### 4.1 Core Implementation: The Line-Based System\n\nThe most critical and innovative part of the `backtrader` core is its **Line-Based System**, implemented through a hierarchy of classes: `LineRoot`, `LineSingle`, `LineMultiple`, and `LineSeries`.\n\n*   **`LineRoot`**: The base class for any object that holds a series of values (a \"line\"). It uses the custom `MetaParams` metaclass to handle class-level parameters.\n*   **`LineIterator`**: A mixin/base class that allows objects (like `Strategy` and `Indicator`) to iterate over their input lines, ensuring synchronization.\n*   **`LineSeries`**: Represents a time-series of values, providing array-like access (`self.lines[0][0]`) and lookback functionality (`self.lines[0][-1]`).\n\nThis system is the foundation for:\n1.  **Data Synchronization**: The `Cerebro` engine uses the `LineIterator` mechanism to ensure that all data feeds, indicators, and strategies are advanced synchronously bar-by-bar.\n2.  
**Lookback and Dependency Management**: Indicators automatically manage their required lookback period (`_minperiod`) by inspecting the dependencies defined in their `__init__` method.\n\n### 4.2 Dependencies\n\nThe core module has strong internal dependencies, forming a tightly coupled system:\n*   **`Cerebro`** depends on:\n    *   `Strategy` (to run the logic)\n    *   `BrokerBase` (to handle execution)\n    *   `DataSeries` and `FeedBase` (to provide data)\n    *   `Observer` (to collect statistics)\n*   **`Strategy`** depends on:\n    *   `BrokerBase` (to submit orders)\n    *   `Order` (to create transactions)\n    *   `Sizer` (to calculate position size)\n    *   `Indicator` (for technical analysis)\n*   **`BrokerBase`** depends on:\n    *   `CommInfoBase` (for commission calculation)\n    *   `Order` and `Position` (for state management)\n\n### 4.3 Error & Performance\n\n*   **Error Handling**: The core uses custom exceptions defined in `errors.py` (e.g., `BacktraderError`, `StrategySkipError`). The `Cerebro` engine is responsible for catching and notifying strategies of order-related errors (e.g., `Margin` error).\n*   **Performance**: The `Cerebro` class includes parameters like `preload` and `runonce` (lines 63-72 in `cerebro.py`) to optimize performance.\n    *   `preload=True`: Loads all data into memory before the backtest starts.\n    *   `runonce=True`: Enables vectorized execution for indicators, significantly speeding up calculations by leveraging NumPy-like operations on the entire data series at once, rather than bar-by-bar. This is a key performance feature.\n\n# Module Analysis: Data Structures and Feeds (`/dataseries` and `/feeds`)\n\n## 3. Module Core Responsibility\nThis module is responsible for the **ingestion, representation, and time-synchronization of financial time-series data**. 
It provides the fundamental data structures (`DataSeries`) and the mechanisms (`AbstractDataBase` and concrete feeds) to load data from various sources (CSV, Pandas, live feeds) and prepare it for the core engine (`Cerebro`).\n\n## 3.1 Key Files and Responsibilities\n\n| File | Core Responsibility |\n| :--- | :--- |\n| `dataseries.py` | Defines the `DataSeries`, `OHLC`, and `OHLCDateTime` classes, which are the canonical data containers for financial data. Also defines `TimeFrame` constants. |\n| `feed.py` | Defines `AbstractDataBase`, the base class for all data feeds. It manages parameters, time-zone handling, date filtering, and live data notifications. |\n| `feeds/btcsv.py` | Implementation for loading data from backtrader's default CSV format. |\n| `feeds/pandafeed.py` | Implementation for seamlessly integrating Pandas DataFrames as a data source. |\n| `feeds/yahoo.py` | Implementation for fetching historical data from Yahoo Finance. |\n| `feeds/ibdata.py` | Implementation for connecting to Interactive Brokers for live and historical data. |\n\n## 4. Code Detail Analysis\n\n### 4.1 Core Implementation: Data Representation and Time Management\n\nThe data module is built upon the core `LineSeries` concept.\n*   **`DataSeries`**: Extends `LineSeries` and introduces standard financial fields as lines: `open`, `high`, `low`, `close`, `volume`, `openinterest`, and `datetime`. This standardization allows indicators and strategies to access data consistently.\n*   **`TimeFrame`**: A utility class defining constants for various time granularities (Ticks, Seconds, Minutes, Days, Weeks, etc.), which are crucial for data aggregation and resampling.\n*   **`AbstractDataBase`**: This class acts as the bridge between the raw data source and the `Cerebro` engine. 
It implements complex logic for:\n    *   **Timezone Handling**: Using parameters like `tz` and `tzinput` to correctly localize and convert timestamps.\n    *   **Date Filtering**: Applying `fromdate` and `todate` to limit the data range.\n    *   **Resampling/Replaying**: It is the base for data resampling and replaying mechanisms (though the implementation details are in `resamplerfilter.py`), allowing users to mix data of different timeframes.\n\n### 4.2 Dependencies\n\nThe module's primary dependency is on the core `LineSeries` for its data structure foundation. It also depends on:\n*   **`backtrader.utils`**: For date/time utilities (`date2num`, `num2date`, `tzparse`).\n*   **`backtrader.tradingcal`**: For integrating market calendars to handle trading sessions and holidays, particularly in `AbstractDataBase`.\n*   **`backtrader.resamplerfilter`**: For the actual logic of changing data granularity.\n\n### 4.3 Error & Performance\n\n*   **Performance**: The `AbstractDataBase` is designed to support the `preload` and `runonce` performance flags from `Cerebro`. For live feeds, it includes a `qcheck` parameter (in `feed.py`) to define a timeout for checking for new data events, which is critical for non-blocking live trading.\n*   **Extensibility**: The design is highly extensible. New data sources only need to inherit from `AbstractDataBase` and implement the logic to parse their specific format and push data into the inherited line series. The `/feeds` directory is a clear example of this pattern.\n\n# Module Analysis: Indicators and Analysis (`/indicators` and `/analyzers`)\n\n## 3. Module Core Responsibility\nThis module provides the **quantitative tools** for the backtesting framework. 
The `/indicators` directory contains the technical analysis components used within strategies, while the `/analyzers` directory contains the performance measurement components used to evaluate the strategy's results.\n\n## 3.1 Key Files and Responsibilities\n\n| Directory | File Example | Core Responsibility |\n| :--- | :--- | :--- |\n| `/backtrader` | `indicator.py` | Defines the `Indicator` base class, which is the foundation for all technical indicators. |\n| `/indicators` | `sma.py`, `rsi.py`, `macd.py` | Implementations of specific technical analysis indicators. |\n| `/backtrader` | `analyzer.py` | Defines the `Analyzer` base class, which hooks into the strategy lifecycle to collect performance data. |\n| `/analyzers` | `sharpe.py` | Calculates the Sharpe Ratio of the strategy's returns. |\n| `/analyzers` | `drawdown.py` | Tracks and calculates the maximum drawdown and related metrics. |\n| `/analyzers` | `tradeanalyzer.py` | Provides detailed statistics on individual trades (wins, losses, duration, etc.). |\n\n## 4. Code Detail Analysis\n\n### 4.1 Core Implementation: Indicator Chaining and Vectorization\n\nIndicators are the primary example of the **Line-Based System** in action.\n*   **Indicator Definition**: An indicator is essentially a specialized `LineSeries` that calculates its output line(s) based on its input line(s) (which can be a data feed line or the output of another indicator).\n*   **`next()` vs. Vectorization**: Indicators are designed to be calculated bar-by-bar in the `next()` method for live/event-driven mode. However, the `Cerebro` engine, when configured with `runonce=True`, leverages the indicator's internal structure to perform calculations in a vectorized (NumPy-like) manner over the entire data series, offering a significant performance boost.\n*   **Chaining**: The `Indicator` class handles the dependency chain automatically. 
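Outside the framework, the effect of such a chain can be mimicked with plain functions (a conceptual sketch only: these are not backtrader classes, and real indicators operate on lines, not Python lists):

```python
# Conceptual mimic of indicator chaining -- NOT backtrader's actual API.
# Each "indicator" consumes an input series and implies its own lookback.
from statistics import mean, pstdev

def sma(series, period):
    """Simple moving average; None until `period` bars have been seen."""
    return [None if i + 1 < period else mean(series[i + 1 - period:i + 1])
            for i in range(len(series))]

def bollinger(series, period=5, dev=2.0):
    """Bands built on top of sma(): the chain inherits the same min period."""
    mid = sma(series, period)
    out = []
    for i, m in enumerate(mid):
        if m is None:
            out.append((None, None, None))
        else:
            sd = pstdev(series[i + 1 - period:i + 1])
            out.append((m - dev * sd, m, m + dev * sd))
    return out

closes = [10, 11, 12, 11, 13, 14, 13]
bands = bollinger(closes, period=5)
print(bands[3])     # (None, None, None): fewer than 5 bars seen
print(bands[4][1])  # 11.4 -- middle band = SMA(5) of the first five closes
```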
For example, a Bollinger Band indicator depends on a Simple Moving Average (SMA) indicator, which in turn depends on a data line (e.g., `data.close`). The framework ensures the SMA is calculated before the Bollinger Band for each bar.\n\n### 4.2 Core Implementation: Analyzer Hooks\n\nThe `Analyzer` class is a powerful example of the **Observer Pattern**.\n*   **Lifecycle Integration**: Analyzers are instantiated within a `Strategy` and automatically receive callbacks from the `Cerebro` engine at key points in the backtest lifecycle:\n    *   `notify_order(order)`: When an order status changes.\n    *   `notify_trade(trade)`: When a trade is opened or closed.\n    *   `notify_cashvalue(cash, value)`: Before each bar's processing.\n    *   `next()`: For bar-by-bar calculations (e.g., tracking daily returns).\n*   **Hierarchical Analysis**: The `Analyzer` class supports a parent-child structure (`_children` list), allowing complex analyzers to be composed of simpler ones, and ensuring notifications are propagated down the hierarchy.\n\n### 4.3 Dependencies\n\n*   **Indicators** depend on:\n    *   `LineSeries` (for data structure).\n    *   `mathsupport` (for mathematical functions).\n    *   Other `Indicator` classes (for chaining).\n*   **Analyzers** depend on:\n    *   `Strategy` (for context and notifications).\n    *   `Trade` and `Order` (for input data).\n    *   `TimeFrame` (for time-based analysis like `TimeFrameAnalyzerBase`).\n\n# Module Analysis: Brokerage and Execution (`/brokers` and `/comminfos`)\n\n## 3. Module Core Responsibility\nThis module is responsible for **simulating the trading environment**, including managing cash, portfolio value, executing orders, and calculating the financial impact of trades (commissions, margin, interest). 
The `/brokers` directory contains the broker implementations, and the core `comminfo.py` file defines the rules for transaction costs.\n\n## 3.1 Key Files and Responsibilities\n\n| File | Core Responsibility |\n| :--- | :--- |\n| `broker.py` (Core) | Defines `BrokerBase`, the abstract interface for all brokers, and `BackBroker`, the default simulated broker used for backtesting. |\n| `comminfo.py` (Core) | Defines `CommInfoBase`, the class responsible for calculating commissions, margin requirements, and interest charges for different asset types (stock-like vs. futures-like). |\n| `brokers/bbroker.py` | Contains the `BackBroker` implementation, which handles the core logic of order matching, position updates, and cash management in a backtesting context. |\n| `brokers/ibbroker.py` | Provides integration with the Interactive Brokers API for live trading. |\n| `sizer.py` (Core) | Defines the `Sizer` base class, which is used by strategies to determine the size of a trade (e.g., `FixedSize`, `PercentSizer`). |\n\n## 4. Code Detail Analysis\n\n### 4.1 Core Implementation: The `CommInfoBase` Engine\n\nThe `CommInfoBase` class is a sophisticated financial modeler. 
It uses a parameter-driven approach to define the financial characteristics of an asset:\n*   **Asset Type**: Determined by `stocklike` and `margin` parameters, allowing it to model both stock/forex (percentage commission, no margin) and futures (fixed commission, margin required) trading.\n*   **Calculations**: It provides methods for:\n    *   `getcommission(size, price)`: Calculates the transaction cost.\n    *   `get_margin(price)`: Calculates the margin required per unit.\n    *   `getsize(price, cash)`: Calculates the maximum tradable size given current cash.\n    *   `profitandloss(size, price, newprice)`: Calculates P&L.\n    *   `get_credit_interest(...)`: Calculates interest on short positions.\n\n### 4.2 Core Implementation: `BackBroker` Execution\n\nThe `BackBroker` (in `bbroker.py`) is the concrete implementation of `BrokerBase` for backtesting.\n*   **Order Matching**: It implements the logic to match submitted orders (`Order` objects) against the incoming data bar. This includes handling various order types (Market, Limit, Stop) and simulating slippage and partial fills.\n*   **State Management**: It maintains the current cash, value, and a dictionary of open `Position` objects for each data feed.\n*   **Cheat-on-Open**: It supports the `cheat_on_open` mechanism, allowing orders to be executed at the opening price of the current bar, which is a common feature in backtesting platforms.\n\n### 4.3 Dependencies\n\n*   **`BrokerBase`** depends on:\n    *   `CommInfoBase` (for all financial calculations).\n    *   `Order` and `Position` (for state and transaction objects).\n    *   `Sizer` (to delegate position sizing logic).\n*   **`CommInfoBase`** is self-contained but relies on the `MetaParams` metaclass for its parameter system.\n\n# Module Analysis: Monitoring and Utilities (`/observers` and `/utils`)\n\n## 3. 
Module Core Responsibility\nThis module provides the **monitoring and visualization** components (`/observers`) and the essential **helper functions and classes** (`/utils`) that support the entire framework. Observers are crucial for collecting data during the backtest for later plotting and analysis, while utilities handle common tasks like date/time manipulation and custom data structures.\n\n## 3.1 Key Files and Responsibilities\n\n| Directory | File Example | Core Responsibility |\n| :--- | :--- | :--- |\n| `/backtrader` | `observer.py` | Defines the `Observer` base class, which is a specialized `LineIterator` for monitoring the backtest state. |\n| `/observers` | `broker.py` | Implements `Broker` observer to track cash and portfolio value over time. |\n| `/observers` | `trades.py` | Implements `Trades` observer to track the entry and exit points of all trades. |\n| `/observers` | `buysell.py` | Implements `BuySell` observer to mark buy and sell signals on the plot. |\n| `/utils` | `py3.py` | Compatibility layer for Python 2/3 differences. |\n| `/utils` | `date.py` | Date and time utilities, including timezone handling and conversion between datetime objects and floating-point numbers used internally. |\n| `/utils` | `autodict.py` | Defines custom dictionary classes like `AutoOrderedDict` for convenient attribute access. |\n\n## 4. Code Detail Analysis\n\n### 4.1 Core Implementation: The Observer Pattern\n\nThe `Observer` class is a specialized `LineIterator` that is attached to a `Strategy` or `Cerebro`.\n*   **Monitoring**: Unlike indicators, observers do not influence the trading logic. Their purpose is purely to record state changes. They implement the `next()` method to capture data points (e.g., portfolio value) at each bar.\n*   **Plotting**: Observers are the primary source of data for the built-in plotting system. 
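In isolation, that recording role can be sketched like this (a conceptual mimic, not backtrader's actual `Observer` API):

```python
# Conceptual mimic of an observer -- NOT backtrader's real Observer class.
class ValueRecorder:
    """Records broker state each bar without influencing trading logic."""
    def __init__(self):
        self.cash, self.value = [], []

    def next(self, broker_cash, portfolio_value):
        self.cash.append(broker_cash)
        self.value.append(portfolio_value)

# Simulated bar-by-bar loop: the engine calls next() once per bar.
rec = ValueRecorder()
for bar_cash, bar_value in [(100.0, 100.0), (90.0, 103.5), (95.0, 101.0)]:
    rec.next(bar_cash, bar_value)

print(rec.value)  # [100.0, 103.5, 101.0] -- one data point per bar
```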
For example, the `Broker` observer tracks the `cash` and `value` lines, which are then plotted to visualize portfolio performance.\n*   **Strategy-Wide vs. Data-Specific**: Observers can be defined to monitor the entire strategy (e.g., portfolio value) or specific data feeds (e.g., a trade observer for a single stock).\n\n### 4.2 Core Implementation: Utilities\n\nThe `/utils` module is vital for maintaining code consistency and cross-platform compatibility.\n*   **Date Handling**: `date.py` is crucial for handling the internal representation of time. `backtrader` uses a floating-point number (Julian date-like) to represent datetimes internally for fast comparison and calculation, and `date.py` provides the necessary conversion functions (`date2num`, `num2date`) and timezone localization (`Localizer`).\n*   **Custom Data Structures**: `autodict.py` provides classes like `AutoOrderedDict`, which allows dictionary keys to be accessed as object attributes (e.g., `bar.close` instead of `bar['close']`), enhancing code readability for users.\n\n### 4.3 Dependencies\n\n*   **Observers** depend on:\n    *   `LineIterator` (for synchronization).\n    *   `Broker` and `Trade` (to extract monitoring data).\n*   **Utilities** are generally self-contained but are imported heavily by almost all other modules in the framework.\n\n### Module PlantUML Diagrams\n\n# Core Engine Module: backtrader/\n```plantuml\n@startuml Core Engine Module\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Core Engine Module: backtrader/\n\n' Metaclasses and Base Classes\nabstract class MetaBase << (M, #ADD1B2) Metaclass >>\nabstract class MetaParams << (M, #ADD1B2) Metaclass >>\nMetaBase <|-- MetaParams\n\nabstract class LineRoot << (R, #FF7700) Base >>\nabstract class LineIterator << (I, #FF7700) Base >>\nabstract class StrategyBase << (S, #FF7700) Base >>\nabstract class BrokerBase << (B, #FF7700) Base >>\nabstract class OrderBase << (O, #FF7700) Base 
>>\nabstract class CommInfoBase << (C, #FF7700) Base >>\n\nLineRoot <|-- LineIterator\nLineIterator <|-- StrategyBase\n\n' Core Components\nclass Cerebro << (C, #FFCC00) Engine >> {\n    + run()\n    + addstrategy()\n    + adddata()\n    + setbroker()\n}\n\nclass Strategy << (S, #33CCFF) User Logic >> {\n    + next()\n    + notify_order()\n    + buy()\n    + sell()\n    + lines\n    + params\n}\n\nclass BackBroker << (B, #AAFFCC) Broker >> {\n    + submit()\n    + getcash()\n    + getvalue()\n}\n\nclass Order << (O, #FF99CC) Transaction >> {\n    + status\n    + size\n    + price\n}\n\nclass Position << (P, #CC99FF) State >> {\n    + size\n    + price\n}\n\n' Relationships\nMetaParams <|-- Cerebro\nMetaParams <|-- BrokerBase\nMetaParams <|-- OrderBase\n\nStrategyBase <|-- Strategy\nBrokerBase <|-- BackBroker\n\nBrokerBase \"1\" o-- \"0..*\" Order : manages\nBrokerBase \"1\" o-- \"0..*\" Position : manages\nBrokerBase \"1\" o-- \"1\" CommInfoBase : uses\n\nCerebro \"1\" o-- \"1..*\" Strategy : runs\nCerebro \"1\" o-- \"1\" BackBroker : uses\nCerebro \"1\" o-- \"1..*\" LineRoot : feeds (Data)\n\nStrategy \"1\" o-- \"1\" BrokerBase : submits orders to\nStrategy \"1\" o-- \"1..*\" LineRoot : uses (Indicators/Data)\n\nLineRoot <|-- DataSeries\nLineRoot <|-- Indicator\n\n@enduml\n```\n\n# Data Structures and Feeds Module: dataseries/feeds\n```plantuml\n@startuml Data Structures and Feeds Module\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Data Structures and Feeds Module: dataseries/feeds\n\n' Core Line System (from Core Module)\nabstract class LineSeries << (L, #FF7700) Core >>\n\n' Data Structures\nclass TimeFrame << (U, #AAFFCC) Utility >> {\n    + Ticks\n    + Minutes\n    + Days\n}\n\nclass DataSeries << (D, #33CCFF) Data Container >> {\n    + lines = (datetime, open, high, low, close, volume, openinterest)\n}\n\nclass OHLC << (D, #33CCFF) Data Container >>\nclass OHLCDateTime << (D, #33CCFF) Data Container 
>>\n\nLineSeries <|-- DataSeries\nDataSeries <|-- OHLC\nOHLC <|-- OHLCDateTime\n\n' Data Feeds\nabstract class AbstractDataBase << (F, #FFCC00) Base Feed >> {\n    + params\n    + _gettz()\n    + islive()\n    + put_notification()\n}\n\nclass CSVFeedBase << (F, #FFCC00) Base Feed >>\nclass PandaFeed << (F, #FFCC00) Concrete Feed >>\nclass YahooFeed << (F, #FFCC00) Concrete Feed >>\n\nOHLCDateTime <|-- AbstractDataBase\nAbstractDataBase <|-- CSVFeedBase\nAbstractDataBase <|-- PandaFeed\nAbstractDataBase <|-- YahooFeed\n\n' Relationships\nAbstractDataBase \"1\" o-- \"1\" TimeFrame : uses\nAbstractDataBase \"1\" o-- \"1\" TradingCalendar : uses\n\n@enduml\n```\n\n# Indicators and Analysis Module: indicators/analyzers\n```plantuml\n@startuml Indicators and Analysis Module\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Indicators and Analysis Module: indicators/analyzers\n\n' Core Components (from Core Module)\nabstract class LineIterator << (I, #FF7700) Core >>\nabstract class Strategy << (S, #33CCFF) Core >>\nabstract class Trade << (T, #FF99CC) Core >>\nabstract class Order << (O, #FF99CC) Core >>\n\n' Indicators\nabstract class IndicatorBase << (I, #AAFFCC) Base >>\nLineIterator <|-- IndicatorBase\n\nclass Indicator << (I, #AAFFCC) Indicator >> {\n    + next()\n    + lines\n}\nIndicatorBase <|-- Indicator\n\nclass SMA << (I, #AAFFCC) Concrete >>\nclass RSI << (I, #AAFFCC) Concrete >>\nclass MACD << (I, #AAFFCC) Concrete >>\n\nIndicator <|-- SMA\nIndicator <|-- RSI\nIndicator <|-- MACD\n\n' Analyzers\nabstract class Analyzer << (A, #FFCC00) Base >> {\n    + notify_order(order)\n    + notify_trade(trade)\n    + get_analysis() : dict\n}\n\nclass TimeFrameAnalyzerBase << (A, #FFCC00) Base >>\nAnalyzer <|-- TimeFrameAnalyzerBase\n\nclass SharpeRatio << (A, #FFCC00) Concrete >>\nclass DrawDown << (A, #FFCC00) Concrete >>\nclass TradeAnalyzer << (A, #FFCC00) Concrete >>\n\nAnalyzer <|-- SharpeRatio\nAnalyzer <|-- DrawDown\nAnalyzer 
<|-- TradeAnalyzer\n\n' Relationships\nStrategy \"1\" o-- \"0..*\" Indicator : uses\nStrategy \"1\" o-- \"0..*\" Analyzer : contains\n\nAnalyzer .> Trade : observes\nAnalyzer .> Order : observes\n\nIndicator \"1\" o-- \"0..*\" Indicator : chains (e.g., BBands uses SMA)\n\n@enduml\n```\n\n# Brokerage and Execution Module: brokers/comminfos\n```plantuml\n@startuml Brokerage and Execution Module\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Brokerage and Execution Module: brokers/comminfos\n\n' Core Components (from Core Module)\nabstract class MetaParams << (M, #ADD1B2) Metaclass >>\nabstract class Order << (O, #FF99CC) Core >>\nabstract class Position << (P, #CC99FF) Core >>\n\n' Commission and Financial Modeling\nclass CommInfoBase << (C, #AAFFCC) Financial Model >> {\n    + params\n    + getcommission(size, price)\n    + get_margin(price)\n    + getsize(price, cash)\n}\n\n' Sizing\nabstract class Sizer << (S, #AAFFCC) Base >> {\n    + _getsizing(data, isbuy, isopen)\n}\n\nclass FixedSize << (S, #AAFFCC) Concrete >>\nclass PercentSizer << (S, #AAFFCC) Concrete >>\n\nSizer <|-- FixedSize\nSizer <|-- PercentSizer\n\n' Brokerage\nabstract class BrokerBase << (B, #FFCC00) Base >> {\n    + submit(order)\n    + cancel(order)\n    + getcash()\n    + getvalue()\n}\n\nclass BackBroker << (B, #FFCC00) Simulated >>\nclass IBBroker << (B, #FFCC00) Live >>\n\nBrokerBase <|-- BackBroker\nBrokerBase <|-- IBBroker\n\n' Relationships\nMetaParams <|-- CommInfoBase\nMetaParams <|-- Sizer\n\nBrokerBase \"1\" o-- \"1\" CommInfoBase : uses\nBrokerBase \"1\" o-- \"0..*\" Order : manages\nBrokerBase \"1\" o-- \"0..*\" Position : manages\n\nStrategy \"1\" o-- \"1\" Sizer : uses to calculate size\n\n@enduml\n```\n\n# Monitoring and Utilities Module: observers/utils\n```plantuml\n@startuml Monitoring and Utilities Module\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Monitoring and Utilities Module: 
observers/utils\n\n' Core Components (from Core Module)\nabstract class LineIterator << (I, #FF7700) Core >>\nabstract class StrategyBase << (S, #FF7700) Core >>\nabstract class BrokerBase << (B, #FFCC00) Core >>\n\n' Observers\nabstract class ObserverBase << (O, #AAFFCC) Base >>\nLineIterator <|-- ObserverBase\n\nclass Observer << (O, #AAFFCC) Observer >> {\n    + next()\n    + plotinfo\n}\nObserverBase <|-- Observer\n\nclass BrokerObserver << (O, #AAFFCC) Concrete >>\nclass TradesObserver << (O, #AAFFCC) Concrete >>\nclass BuySellObserver << (O, #AAFFCC) Concrete >>\n\nObserver <|-- BrokerObserver\nObserver <|-- TradesObserver\nObserver <|-- BuySellObserver\n\n' Utilities\nclass AutoOrderedDict << (U, #ADD1B2) Utility >>\nclass DateUtils << (U, #ADD1B2) Utility >> {\n    + date2num()\n    + num2date()\n    + Localizer()\n}\n\n' Relationships\nStrategyBase \"1\" o-- \"0..*\" Observer : monitors\nBrokerObserver .> BrokerBase : reads state from\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe **core abstraction** of backtrader is the **Line-Based System**, implemented through classes like `LineRoot`, `LineSeries`, and `LineIterator`. This system treats all time-series data—whether raw data, indicator outputs, or observer values—as synchronized \"lines\" of data. This abstraction provides:\n1.  **Automatic Synchronization**: All lines advance together, ensuring that the strategy operates on a consistent view of the market at any given bar.\n2.  **Lookback Management**: The system automatically calculates and enforces the minimum required lookback period (`_minperiod`) for indicators and strategies, preventing look-ahead bias.\n3.  
**Vectorization Support**: The line-based structure allows for seamless switching between bar-by-bar iteration and vectorized calculation (via `runonce=True`), a key performance feature.\n\nThe **design philosophy** is a strong **Separation of Concerns** based on the Model-View-Controller (MVC) pattern, adapted for a backtesting engine:\n*   **Controller (Cerebro)**: The central orchestrator, managing the simulation loop and component registration.\n*   **Model (Data, Broker, Order, Position)**: The state and environment components, representing market data, account status, and transaction details.\n*   **View (Observer, Analyzer)**: The monitoring and reporting components, collecting data for visualization and final analysis.\n*   **User Logic (Strategy)**: The component where the user defines the trading rules, isolated from the core engine mechanics.\n\nThe **lifecycle management** is driven by the `Cerebro.run()` method, which executes an **event-driven loop**:\n1.  **Initialization**: Components are instantiated, parameters are set (often via the `MetaParams` metaclass), and dependencies are resolved.\n2.  **Pre-Run**: Data is optionally preloaded (`preload=True`), and the minimum period for all components is calculated.\n3.  **Main Loop (Bar-by-Bar)**: `Cerebro` advances the clock, pushing the next bar of data to all feeds.\n    *   The `Broker` processes pending orders against the new bar's prices.\n    *   Indicators calculate their new values.\n    *   The `Strategy.next()` method is called, allowing the user to read indicator values and submit new orders.\n    *   `Analyzers` and `Observers` record the new state.\n4.  **Finalization**: The loop ends, and `Analyzers` are instructed to finalize their results.\n\n#### 3.1.2. 
Component Interactions\n\nThe framework's communication is highly structured, relying on method calls and a notification system.\n\n### Key Interaction Flows\n\n| Interaction | Source | Target | Communication Pattern | Description |\n| :--- | :--- | :--- | :--- | :--- |\n| **Data Synchronization** | `Cerebro` | `Data Feed`, `Indicator`, `Strategy` | Sequential Method Call (`next()`) | `Cerebro` drives the simulation by calling `next()` on all components in a fixed order, ensuring data consistency. |\n| **Order Submission** | `Strategy` | `Broker` | Direct Method Call (`broker.buy()`, `broker.sell()`) | The strategy initiates a transaction by calling a method on the `Broker` instance. |\n| **Order/Trade Update** | `Broker` | `Strategy`, `Analyzer`, `Observer` | Notification Method Call (`notify_order()`, `notify_trade()`) | The `Broker` acts as a central event source, notifying all interested parties (Strategy for logic, Analyzer/Observer for recording) about changes in order status or trade execution. |\n| **Indicator Calculation** | `Indicator` | `Indicator` / `Data Feed` | Line Access / Dependency Chain | An indicator reads the lines of its input (data or another indicator) to calculate its own output line. This is managed implicitly by the `LineIterator` system. |\n\n### Data Flow\n\n1.  **Input Data**: Raw data (e.g., CSV, Pandas DataFrame) is loaded by a concrete `Data Feed` (e.g., `PandaFeed`) and converted into the internal `OHLCDateTime` format, which is a collection of synchronized `LineSeries`.\n2.  **Transformation**: The `LineSeries` flow into `Indicator` objects, which transform the raw price data into technical metrics (e.g., SMA, RSI).\n3.  **Decision**: The `Strategy` reads the transformed data (Indicator lines) and raw data lines to make a trading decision.\n4.  **Execution**: The decision is sent to the `Broker` as an `Order` object. The `Broker` uses `CommInfo` to calculate the financial impact (cost, margin).\n5.  
**Feedback**: The `Broker` updates the `Position` and `Trade` objects and sends notifications back to the `Strategy` (to update its state) and the `Analyzer`/`Observer` components (for performance tracking).\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml backtrader Architecture\n\nskinparam classAttributeIconVisible false\nskinparam defaultFontName Courier\n\ntitle Overall backtrader Architecture\n\n' Main Components\ncomponent [Cerebro] as CEREBRO #FFCC00\ncomponent [Strategy] as STRATEGY #33CCFF\ncomponent [Data Feed] as DATA #AAFFCC\ncomponent [Broker] as BROKER #FF7700\ncomponent [Indicator] as INDICATOR #ADD1B2\ncomponent [Analyzer] as ANALYZER #CC99FF\ncomponent [Observer] as OBSERVER #CC99FF\n\n' Sub-Components\n[Order] as ORDER\n[Position] as POSITION\n[CommInfo] as COMMINFO\n\n' Relationships\n' 1. Orchestration by Cerebro\nCEREBRO --> STRATEGY : 1. Runs (addstrategy)\nCEREBRO --> DATA : 2. Feeds (adddata)\nCEREBRO --> BROKER : 3. Uses (setbroker)\nCEREBRO --> ANALYZER : 4. Aggregates Results\n\n' 2. Strategy Logic\nSTRATEGY --> INDICATOR : Reads lines from\nSTRATEGY --> BROKER : Submits (buy/sell)\n\n' 3. Execution and State\nBROKER --> ORDER : Manages\nBROKER --> POSITION : Manages\nBROKER --> COMMINFO : Calculates costs\n\n' 4. Data Flow and Calculation\nDATA --> INDICATOR : Provides input lines\nINDICATOR --> INDICATOR : Chains calculations\n\n' 5. Monitoring and Reporting\nBROKER --> STRATEGY : Notifies (Order/Trade)\nBROKER --> ANALYZER : Notifies (Order/Trade)\nBROKER --> OBSERVER : Notifies (Cash/Value)\n\nSTRATEGY --> ANALYZER : Notifies (Self)\nSTRATEGY --> OBSERVER : Notifies (Self)\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. 
Design Patterns\n\nThe backtrader framework makes extensive use of classic object-oriented design patterns to achieve its flexibility and power.\n\n| Pattern | Description | Implementation Example |\n| :--- | :--- | :--- |\n| **Template Method** | Defines the skeleton of an algorithm in a base class, deferring some steps to subclasses. | The `Strategy` class defines the `__init__`, `next`, `notify_order`, and `stop` methods. Users must override `next()` and optionally others to implement their logic. |\n| **Observer** | Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically. | The `Broker` acts as the subject, notifying `Strategy`, `Analyzer`, and `Observer` objects when an `Order` or `Trade` status changes. |\n| **Chain of Responsibility** | Passes a request along a chain of handlers. | The **Line-Based System** is a form of this. When a line is updated, it triggers the calculation of dependent indicators, which in turn trigger their dependents, forming a calculation chain. |\n| **Factory Method** | Defines an interface for creating an object, but lets subclasses decide which class to instantiate. | The `Cerebro` class acts as a factory for creating and configuring the entire backtesting environment (e.g., `addstrategy`, `adddata`). |\n| **Metaclass (Custom Pattern)** | A non-standard but critical pattern used to inject functionality into classes at creation time. | The `MetaParams` metaclass automatically handles class-level parameters (`params` tuple) and maps them to instance attributes, simplifying parameter management across the entire framework. |\n\n#### 3.3.2. Project Highlights\n\nThe backtrader framework is highly regarded for its innovative design, which provides both high performance and exceptional flexibility.\n\n*   **The Line-Based System**: This is the single most innovative feature. 
By abstracting all time-series data into synchronized \"lines,\" it enables:\n    *   **Indicator Chaining**: Indicators can be effortlessly chained together without manual data passing.\n    *   **Automatic Lookback**: The system automatically determines the minimum required data points for all calculations, preventing runtime errors and look-ahead bias.\n*   **Vectorized Execution (`runonce`)**: The ability to run indicators in a vectorized mode dramatically improves performance for historical backtesting, allowing users to choose between the flexibility of bar-by-bar logic and the speed of batch processing.\n*   **Extensibility via Inheritance**: The framework is designed around abstract base classes (`Strategy`, `BrokerBase`, `AbstractDataBase`, `Analyzer`). Extending the framework is straightforward: users simply subclass the relevant base class and override the necessary methods (e.g., creating a new data feed, a custom broker, or a new indicator).\n*   **Parameter Management (`MetaParams`)**: The custom metaclass system simplifies parameter handling. Users define parameters as a class attribute tuple (`params = (...)`), and the metaclass automatically handles default values, inheritance, and instance attribute assignment, leading to clean, readable strategy code.\n*   **Data Resampling and Replaying**: The built-in support for resampling (aggregating data to a lower frequency, e.g., 1-minute to 1-hour) and replaying (mixing different timeframes, e.g., daily data with 1-minute data) is a powerful feature for complex multi-timeframe strategies.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nWhile backtrader is a mature and powerful library, a few areas could be considered for improvement or modernization.\n\n*   **Performance Bottlenecks (Python GIL)**: The core backtesting loop in `Cerebro` is single-threaded and bound by the Python Global Interpreter Lock (GIL). 
While optimization is available via `runonce`, true parallel backtesting (e.g., running multiple strategies or optimizations simultaneously) is limited by the current architecture.\n    *   **Suggestion**: Explore using `multiprocessing` more aggressively for optimization runs, or refactor core calculation loops to leverage libraries like Numba or Cython for C-level speed.\n*   **Modern Python Typing and Structure**: The codebase was developed before modern Python type hinting became standard. Adopting type hints would significantly improve code clarity, maintainability, and allow for better static analysis.\n    *   **Suggestion**: Introduce type hints across the entire codebase, especially in public interfaces like `Strategy.next()` and `Broker.notify_order()`.\n*   **Data Handling Modernization**: The internal date/time handling uses a custom float-based system and relies on `pytz` for timezones. While functional, modern Python development often prefers native `datetime` objects with built-in timezone support or dedicated libraries like `dateutil`.\n    *   **Suggestion**: Investigate migrating internal date/time representation to a more standard format, potentially leveraging NumPy's `datetime64` for performance gains in vectorized operations.\n*   **Decoupling of Plotting**: The plotting logic is tightly coupled with the `Observer` and `Indicator` classes via the `plotinfo` attribute. This makes it difficult to use alternative plotting backends (e.g., Plotly, Bokeh) without custom wrappers.\n    *   **Suggestion**: Introduce a dedicated Plotting Interface/Adapter layer to decouple the core components from the visualization implementation.\n\n#### 3.4.2. Secondary Development Guide\n\nFor a developer looking to extend or modify the backtrader framework, the following steps and best practices are recommended:\n\n1.  **Understand the Core Abstraction (The Line-Based System)**: Start by reading `lineroot.py`, `lineseries.py`, and `indicator.py`. 
Grasping how data is represented as synchronized lines is fundamental to all other components.\n2.  **Trace the Execution Flow (The Cerebro Loop)**: Examine `cerebro.py` to understand the sequence of events in the `run()` method. This is the master control flow that dictates when `Broker`, `Strategy`, and `Indicator` methods are called.\n3.  **Extending User Logic (Strategy)**:\n    *   To create a new strategy, subclass `backtrader.Strategy`.\n    *   Use the `params` tuple for configuration, not `__init__` arguments directly.\n    *   Implement trading logic in `next()`.\n    *   Handle order and trade status updates in `notify_order()` and `notify_trade()`.\n4.  **Creating a New Indicator**:\n    *   Subclass `backtrader.Indicator`.\n    *   Declare the output lines in the `lines = (...)` class attribute; the input data is whatever is passed to the indicator at instantiation, available as `self.data`/`self.datas`.\n    *   Implement the calculation logic in `next()`. The framework handles the lookback and synchronization automatically.\n5.  **Adding a New Data Source**:\n    *   Subclass `backtrader.feeds.AbstractDataBase`.\n    *   The primary task is to implement the logic to read the source data and push it into the inherited `lines` (especially `datetime`, `open`, `high`, `low`, `close`).\n    *   Pay close attention to timezone handling and data alignment, as this is the most complex part of data integration.\n6.  **Testing**: The `tests/` directory contains a comprehensive suite of unit tests. Any new feature or modification should be accompanied by new tests that validate the expected behavior, particularly for financial calculations and data synchronization.\n7.  **Leverage `MetaParams`**: When creating any new component that requires configuration, use the `MetaParams` pattern (by inheriting from a class that uses it, like `CommInfoBase` or `Strategy`) to ensure consistent and robust parameter handling.\n\n"
  },
  {
    "path": "thirdparty/investor-agent.md",
    "content": "# investor-agent - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe project exhibits a highly streamlined and focused directory structure, typical for a single-purpose Model Context Protocol (MCP) server application.\n\n```\n/home/ubuntu/investor-agent\n├── .git/                     # Git version control metadata, used for tracking changes.\n├── .github/                  # Contains GitHub Actions workflows, specifically for continuous integration and publishing.\n├── .gitignore                # Defines files and directories to be excluded from version control.\n├── .python-version           # Specifies the required Python version (e.g., 3.11) for environment consistency.\n├── LICENSE                   # The open-source license under which the project is distributed.\n├── README.md                 # Primary documentation, providing setup instructions and usage examples.\n├── chat.py                   # The client-side demonstration script, used to interact with the MCP server via an LLM agent.\n├── investor_agent/           # The core Python package containing the server logic.\n│   ├── __init__.py           # Package initialization file (currently empty).\n│   └── server.py             # The main implementation file for the Investor-Agent MCP server.\n├── pyproject.toml            # Project configuration file, managing dependencies, build system, and metadata.\n└── uv.lock                   # Dependency lock file generated by the `uv` package manager, ensuring reproducible builds.\n```\n\n**Annotation:**\nThe structure is minimal, emphasizing a single, core function. The **`investor_agent/`** directory is the sole source code package, with all server logic consolidated into **`server.py`**. This file is responsible for initializing the `FastMCP` server, defining all the financial data tools, and handling external API integrations. The **`chat.py`** file is crucial for understanding the project's execution context, as it demonstrates how an LLM agent connects to and utilizes the MCP server via standard I/O (`MCPServerStdio`). The use of `pyproject.toml` and `uv.lock` indicates a modern approach to Python dependency management, prioritizing reproducible environments. The overall design is highly modular, with the entire financial logic encapsulated within the `investor_agent` package, making it easy to deploy and integrate as a specialized tool.\n\n### 1.2. Core Folders for Analysis\n\n*   **`/home/ubuntu/investor-agent/investor_agent`**: Contains the primary source code for the Investor-Agent MCP server. This is where all the financial data retrieval and analysis tools are implemented and exposed to the Large Language Model (LLM).\n*   **`/home/ubuntu/investor-agent`**: The root directory contains the essential client-side demonstration script (`chat.py`), which is necessary for understanding the project's execution and communication flow with the MCP server.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module: investor_agent (Core MCP Server)\n\n### File Enumeration\n*   `/home/ubuntu/investor-agent/investor_agent/server.py` (748 lines)\n*   `/home/ubuntu/investor-agent/investor_agent/__init__.py` (1 line, empty)\n\n### Core Responsibility\nThe `investor_agent` module functions as a **Model Context Protocol (MCP) server**, providing a specialized suite of financial data and analysis tools to a Large Language Model (LLM). Its core purpose is to abstract the complexities of multiple external financial APIs (Yahoo Finance, Google Trends, Nasdaq, etc.) and data processing libraries (`pandas`, `talib`), exposing their capabilities as clean, reliable, and LLM-consumable functions (tools).\n\n### Key Implementation Details (`server.py`)\n\n#### 1. 
Initialization and Utilities\nThe file begins by initializing the `FastMCP` server instance: `mcp = FastMCP(\"Investor-Agent\", dependencies=[\"yfinance\", \"pandas\", \"pytrends\"])`. It also includes utility functions critical for data integrity and network resilience:\n*   **`api_retry` Decorator**: A unified, robust retry mechanism using `tenacity`. It implements exponential backoff and specifically handles rate-limiting errors (`YFRateLimitError`) and common HTTP errors (5xx, 429), ensuring high reliability for external API calls.\n*   **HTTP Client**: `create_async_client` utilizes `httpx` and `hishel.httpx.AsyncCacheClient` to provide an asynchronous, cached HTTP client, improving performance and reducing load on external servers.\n*   **Data Cleaning**: The `to_clean_csv(df: pd.DataFrame) -> str` function is essential. It cleans DataFrames by removing empty columns and converts the result to a clean, index-less CSV string, which is the project's optimized format for LLM consumption.\n*   **Validation**: Functions like `validate_ticker` and `validate_date` ensure input parameters are correctly formatted before API calls are made.\n\n#### 2. Core MCP Tools (Financial Data Retrieval)\nThe module exposes numerous tools, primarily using `yfinance` and web scraping:\n\n| Tool Name | Core Function | Data Source | Concurrency/Optimization |\n| :--- | :--- | :--- | :--- |\n| `get_ticker_data` | Comprehensive data (info, news, calendar, recommendations). | `yfinance` | Uses `ThreadPoolExecutor` to run multiple blocking `yfinance` calls in parallel. |\n| `get_price_history` | Historical OHLCV data. | `yfinance` | Simple wrapper for `yf_call`. |\n| `get_financial_statements` | Income, balance, and cash flow statements. | `yfinance` | Uses `ThreadPoolExecutor` for parallel fetching of different statement types. |\n| `get_options` | Options chain data. | `yfinance` | Uses `ThreadPoolExecutor` to fetch options chains for multiple expiry dates concurrently. 
|\n| `get_market_movers` | Top gainers, losers, and most active stocks. | Yahoo Finance (Web Scraping) | Uses `fetch_text` and `pandas.read_html` to parse data from Yahoo Finance web pages. |\n| `get_nasdaq_earnings_calendar` | Earnings announcements for a specific date. | Nasdaq API | Uses `fetch_json` with custom headers. |\n| `get_cnn_fear_greed_index` | Current and historical Fear & Greed index data. | CNN API | Uses `fetch_json`. |\n| `get_google_trends` | Relative search interest for keywords. | `pytrends` | Uses the `pytrends` library to build and fetch payload. |\n\n#### 3. Optional Tools (Conditional Registration)\nThe module demonstrates excellent modularity by conditionally registering tools based on dependency availability:\n*   **`calculate_technical_indicator`**: Only registered if `talib` is installed. It uses `yfinance` data and `talib` functions (SMA, EMA, RSI, MACD, BBANDS) to calculate technical indicators, returning both price and indicator data as separate CSV strings.\n*   **`fetch_intraday_data`**: Only registered if `alpaca-py` is installed. It uses Alpaca's API for high-resolution intraday data, requiring `ALPACA_API_KEY` and `ALPACA_API_SECRET` environment variables.\n\n## Client: chat.py (Demonstration Interface)\n\n### File Enumeration\n*   `/home/ubuntu/investor-agent/chat.py` (58 lines)\n\n### Core Responsibility\nThe `chat.py` file is a **client-side demonstration** script. It illustrates the standard pattern for connecting an LLM-based agent to the `Investor-Agent` MCP server, providing a simple, interactive command-line chat interface.\n\n### Key Implementation Details\nThe script uses the `pydantic_ai` library:\n1.  **Server Connection**: It launches the `investor-agent` server as a subprocess using `MCPServerStdio('uv', args=['run', 'investor-agent', 'stdio'], ...)`. This establishes the communication bridge over standard I/O.\n2.  
**Agent Initialization**: An `Agent` is created, and the `MCPServerStdio` instance is passed as a toolset: `agent = Agent(model_identifier, toolsets=[server])`.\n3.  **Chat Loop**: The `main` asynchronous function manages the interactive loop, taking user input and calling `agent.run()` to process the query. This is where the LLM decides whether to use the tools provided by the MCP server. The script ensures a graceful exit and basic error logging.\n\n### Module PlantUML Diagrams\n\n# investor_agent Module Diagram\n\n```plantuml\n@startuml\ntitle Investor-Agent Module Class Diagram\n\npackage \"investor_agent\" {\n    class FastMCP as InvestorAgent {\n        + mcp : FastMCP\n        + run()\n    }\n\n    component \"HTTP Client Utilities\" as HttpUtils {\n        + create_async_client() : AsyncCacheClient\n        + fetch_json(url) : dict\n        + fetch_text(url) : str\n    }\n\n    component \"Data Utilities\" as DataUtils {\n        + validate_ticker(ticker) : str\n        + validate_date(date_str) : date\n        + to_clean_csv(df) : str\n        + api_retry(func) : func\n    }\n\n    component \"YFinance Wrapper\" as YFinanceWrapper {\n        + yf_call(ticker, method, ...)\n        + get_options_chain(ticker, expiry, type) : DataFrame\n    }\n\n    component \"Concurrency Manager\" as ThreadPool {\n        + ThreadPoolExecutor\n    }\n\n    component \"Pandas Data Structure\" as DataFrame {\n        + DataFrame\n    }\n\n    InvestorAgent --> HttpUtils : uses\n    InvestorAgent --> DataUtils : uses\n    InvestorAgent --> YFinanceWrapper : uses\n\n    ' Tools - functions exposed to the LLM\n    package \"MCP Tools\" {\n        class get_ticker_data {\n            + get_ticker_data(ticker, ...) : dict\n        }\n        class get_financial_statements {\n            + get_financial_statements(ticker, ...) : dict\n        }\n        class get_market_movers {\n            + get_market_movers(category, ...) : str\n        }\n        class get_google_trends {\n            + get_google_trends(keywords, ...) : str\n        }\n        class get_options {\n            + get_options(ticker, ...) : str\n        }\n        class get_nasdaq_earnings_calendar {\n            + get_nasdaq_earnings_calendar(date, ...) : str\n        }\n        class get_cnn_fear_greed_index {\n            + get_cnn_fear_greed_index(...) : dict\n        }\n        class calculate_technical_indicator {\n            + calculate_technical_indicator(ticker, indicator, ...) : dict\n        }\n    }\n\n    get_ticker_data ..> YFinanceWrapper : calls\n    get_ticker_data ..> ThreadPool : uses for parallel calls\n    get_financial_statements ..> YFinanceWrapper : calls\n    get_financial_statements ..> ThreadPool : uses for parallel calls\n    get_market_movers ..> HttpUtils : calls fetch_text\n    get_google_trends ..> YFinanceWrapper : calls\n    get_options ..> YFinanceWrapper : calls\n    get_nasdaq_earnings_calendar ..> HttpUtils : calls fetch_json\n    get_cnn_fear_greed_index ..> HttpUtils : calls fetch_json\n\n    ' All tools return or process DataFrames\n    \"MCP Tools\" ..> DataFrame : processes/returns CSV from\n    YFinanceWrapper ..> DataFrame : returns\n    HttpUtils ..> DataFrame : processes HTML tables\n    DataUtils ..> DataFrame : cleans/converts\n}\n\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe Investor-Agent is fundamentally an **LLM-Tooling** project built upon the **Model Context Protocol (MCP)**, which shapes its core abstractions and design philosophy.\n\n**Core Abstractions:**\n1.  **The Tool (MCP Function)**: The most critical abstraction is the function decorated with `@mcp.tool()` in `server.py`. This abstracts away the complexity of API interaction, data cleaning, and error handling, presenting a clean, self-contained, and well-documented interface to the LLM. 
Each tool encapsulates a specific financial data task, such as retrieving historical prices or financial statements.\n2.  **The Data Frame (Pandas)**: Internally, the `pandas.DataFrame` is the central data structure. It abstracts raw, often messy API responses (JSON, HTML tables) into a structured, manipulable tabular format. This allows for consistent data processing, cleaning, and transformation.\n3.  **The CSV String (LLM Output)**: The final output abstraction is the CSV string, generated by the `to_clean_csv` utility. This is a deliberate design choice to ensure the data returned to the LLM is highly structured, easily parsable, and token-efficient, which is crucial for reliable LLM reasoning and cost-effectiveness.\n\n**Design Philosophy:**\nThe project adheres to the philosophy of **\"Specialized, Reliable, and LLM-Optimized.\"**\n*   **Specialized**: The agent focuses exclusively on financial data, integrating multiple specialized libraries (`yfinance`, `pytrends`, `talib`, `alpaca-py`) to provide deep domain expertise.\n*   **Reliable**: Reliability is achieved through the robust, unified **`@api_retry`** mechanism using `tenacity` and the use of an HTTP cache (`hishel`), ensuring resilience against transient network issues and rate-limiting errors common with external financial APIs.\n*   **LLM-Optimized**: Data is aggressively cleaned and structured (CSV format) to maximize the LLM's ability to interpret and use the information effectively, minimizing hallucination and improving accuracy.\n\n**Lifecycle Management:**\nThe lifecycle is managed by the client-server relationship:\n1.  **Server Startup**: The `chat.py` client launches the `investor-agent` server as a subprocess using `MCPServerStdio`. The `FastMCP` instance initializes and registers all `@mcp.tool()` functions.\n2.  **Execution**: The LLM sends a tool call request, which is executed by the server. Blocking I/O is managed by `ThreadPoolExecutor` to maintain server responsiveness.\n3.  
**Shutdown**: The server process is terminated when the client exits, ensuring a clean resource release.\n\n#### 3.1.2. Component Interactions\n\nThe project operates on a clear three-tier architecture: **Client (LLM Agent) -> MCP Server -> External APIs**. This structure ensures a clean separation of concerns, with the MCP Server acting as a specialized financial data broker for the LLM.\n\n**Component Interactions:**\n1.  **Client (`chat.py`) and LLM Agent**: The `chat.py` script serves as the command-line interface, initializing the LLM Agent (`pydantic_ai.Agent`) and injecting the MCP Server as a toolset. The user's text query is passed to the LLM, which then decides whether to invoke one of the available tools.\n2.  **LLM Agent and MCP Server (`server.py`)**: This is the core communication channel, utilizing the **Model Context Protocol (MCP)** over standard I/O. The LLM sends a structured JSON request for a tool call (e.g., `get_ticker_data(ticker=\"TSLA\")`), and the MCP Server responds with a JSON object containing the result.\n3.  **MCP Server and External APIs**: The server interacts with external services using two primary patterns:\n    *   **Synchronous Blocking Calls**: Used for the `yfinance` library (e.g., `yf_call`). To prevent the asynchronous server from blocking, these calls are strategically wrapped in a **`concurrent.futures.ThreadPoolExecutor`** (e.g., in `get_ticker_data`) to execute them concurrently and reduce overall latency.\n    *   **Asynchronous HTTP Calls**: Used for general web scraping and API calls (e.g., CNN Fear & Greed, Nasdaq Earnings). These leverage `httpx` and the `AsyncCacheClient` for non-blocking, cached requests.\n\n**Data Flow:**\nThe data flow is designed to maximize LLM efficiency:\n1.  **Inbound**: User Query (Text) -> LLM Agent -> Tool Call Request (JSON) -> MCP Server.\n2.  
**Processing**: MCP Server receives the request, executes the corresponding Python function, and calls External APIs (e.g., `yfinance.Ticker().history()`). Raw data (JSON, HTML) is received and processed into a **`pandas.DataFrame`**.\n3.  **Outbound**: The `pandas.DataFrame` is passed through the `to_clean_csv()` utility, resulting in a clean, structured **CSV String**. This CSV String (or a structured dictionary) is wrapped in a JSON Tool Call Response and sent back to the LLM Agent for final reasoning and text generation. This CSV-centric output is a critical design choice for reliable LLM consumption.\n\n### 3.2. Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\ntitle Overall Investor-Agent Architecture\n\nskinparam component {\n  BackgroundColor<<LLM>> LightGreen\n  BackgroundColor<<Client>> LightBlue\n  BackgroundColor<<Server>> LightYellow\n  BackgroundColor<<External>> LightCoral\n}\n\ncomponent [LLM Agent] <<LLM>> as LLM\ncomponent [Chat Client] <<Client>> as Client\ncomponent [Investor-Agent MCP Server] <<Server>> as Server\ndatabase [Yahoo Finance API] <<External>> as YF\ndatabase [Google Trends API] <<External>> as GT\ndatabase [CNN Fear & Greed] <<External>> as CNN\ndatabase [Nasdaq Earnings API] <<External>> as NASDAQ\ndatabase [Alpaca Data API] <<External>> as ALPACA\ncomponent [TA-Lib Library] <<External>> as TALIB\n\n' Interactions\nClient --> LLM : User Query (Text)\nLLM --> Server : Tool Call Request (JSON/MCP)\nServer --> LLM : Tool Call Result (CSV/JSON/MCP)\nLLM --> Client : Final Answer (Text)\n\n' Server Internal Dependencies\nServer --> YF : yfinance calls (via ThreadPoolExecutor)\nServer --> GT : pytrends calls\nServer --> CNN : HTTP GET (AsyncCacheClient)\nServer --> NASDAQ : HTTP GET (AsyncCacheClient)\nServer ..> ALPACA : fetch_intraday_data (Optional)\nServer ..> TALIB : calculate_technical_indicator (Optional)\n\n' Core Abstractions\nnote right of Server\n  Core Abstractions:\n  - 
pandas.DataFrame\n  - CSV String Output\n  - @api_retry Decorator\nend note\n\n' Data Flow\nYF --> Server : Raw Financial Data\nGT --> Server : Search Interest Data\nCNN --> Server : Fear & Greed Index Data\nNASDAQ --> Server : Earnings Calendar Data\nServer --> Server : Data Processing (Pandas)\nServer --> LLM : Clean CSV Data\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase, despite its minimal size, effectively utilizes several established software design patterns to enhance robustness, modularity, and maintainability.\n\n1.  **Adapter Pattern**\n    *   **Description**: Converts the interface of a class into another interface clients expect.\n    *   **Implementation**: The entire `server.py` module acts as an Adapter layer. It takes the raw, often complex and inconsistent APIs of `yfinance`, `pytrends`, and various web endpoints, and adapts them into a uniform, simple, and LLM-friendly interface of `@mcp.tool()` functions that return clean CSV strings.\n    *   **Code Example (Implicit in `server.py`):** The `yf_call` function (lines 117-121) is a micro-adapter that wraps various `yfinance.Ticker` methods (`get_info`, `history`, `option_chain`) into a single, retry-enabled function call, standardizing access to the underlying library.\n\n2.  **Decorator Pattern**\n    *   **Description**: Attaches additional responsibilities to an object dynamically and transparently.\n    *   **Implementation**: The **`@api_retry`** decorator (lines 51-64) is a critical implementation of this pattern. 
It wraps core API-calling functions (`fetch_json`, `fetch_text`, `yf_call`) with robust error handling and retry logic using the `tenacity` library, without altering the core logic of the wrapped functions.\n    *   **Code Example (server.py:51-53, 75):**\n        ```python\n        # Decorator definition\n        def api_retry(func):\n            return retry(\n                stop=stop_after_attempt(3),\n                # ... retry logic ...\n            )(func)\n\n        # Decorator usage\n        @api_retry\n        async def fetch_json(url: str, headers: dict | None = None) -> dict:\n            \"\"\"Generic JSON fetcher with retry logic.\"\"\"\n            # ...\n        ```\n\n3.  **Strategy Pattern (Conditional Registration)**\n    *   **Description**: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.\n    *   **Implementation**: The conditional registration of optional tools (`calculate_technical_indicator` and `fetch_intraday_data`) based on the availability of `talib` and `alpaca-py` (lines 613-614, 662-663) allows the server to dynamically switch its available capabilities (strategies) based on the user's installed environment. This makes the core agent lightweight while allowing for powerful extensions.\n    *   **Code Example (server.py:662-663):**\n        ```python\n        # Only register the technical indicator tool if TA-Lib is available\n        if _ta_available:\n            @mcp.tool()\n            def calculate_technical_indicator(...):\n                # ... TA-Lib logic ...\n        ```\n\n#### 3.3.2. Project Highlights\n\nThe Investor-Agent project demonstrates several innovative and flexible design choices that contribute to its effectiveness as an LLM tool.\n\n*   **Robustness and Resilience via Unified Retry Mechanism**:\n    *   The project implements a single, unified **`@api_retry`** decorator using `tenacity`. 
This decorator is applied to all external API calls (`yfinance`, `fetch_json`, `fetch_text`). This centralized approach ensures that the agent can reliably handle the common fragility of external financial APIs, including network timeouts, transient errors, and specific rate-limiting exceptions (`YFRateLimitError`), with automatic exponential backoff.\n\n*   **LLM-Centric Data Formatting (CSV Optimization)**:\n    *   The aggressive use of `pandas` and the custom **`to_clean_csv`** utility is a key highlight. This utility removes empty columns and converts the resulting DataFrame into a clean, index-less CSV string. This structured, minimal format is highly optimized for LLM consumption, minimizing token usage and maximizing the LLM's ability to accurately parse and reason over the data.\n\n*   **Performance Optimization through Concurrency**:\n    *   The strategic use of **`concurrent.futures.ThreadPoolExecutor`** within asynchronous tool functions (e.g., `get_ticker_data`, `get_financial_statements`) is a significant performance feature. It allows multiple blocking `yfinance` calls to execute in parallel, drastically reducing the total latency for comprehensive data requests and preventing the asynchronous server from being blocked.\n\n*   **Extensibility and Modularity via Conditional Registration**:\n    *   The core MCP design promotes extensibility, but the conditional registration of optional tools (e.g., `calculate_technical_indicator` for TA-Lib and `fetch_intraday_data` for Alpaca) is a sophisticated modularity feature. This allows users to install only the dependencies they need, keeping the core agent lightweight while enabling powerful, specialized extensions.\n\n*   **Asynchronous Caching for Web Requests**:\n    *   The use of **`hishel.AsyncCacheClient`** (an `httpx`-compatible cache client from the `hishel` library) provides automatic, persistent caching for all web-scraped and general API data. 
This reduces the load on external servers, speeds up repeated requests for static data, and further enhances the agent's reliability and performance.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe Investor-Agent is well-engineered for robustness, but several architectural and code quality improvements could further enhance its performance and maintainability.\n\n1.  **Asynchronous `yfinance` Integration**:\n    *   **Issue**: The current implementation wraps blocking `yfinance` calls in a `ThreadPoolExecutor`. While functional, this adds overhead and is less idiomatic for an `async` server.\n    *   **Suggestion**: Explore using an asynchronous wrapper for `yfinance` (if available) or migrating to a fully asynchronous financial data library. This would eliminate the need for the `ThreadPoolExecutor`, simplifying the code and improving the server's overall non-blocking performance.\n\n2.  **Standardized Data Output Schema (Pydantic)**:\n    *   **Issue**: Complex tools like `get_ticker_data` return a raw dictionary, which lacks strong type checking and requires the LLM to infer the structure.\n    *   **Suggestion**: Define explicit Pydantic models for the output of complex tools. MCP supports Pydantic models, which would provide a reliable, strongly-typed contract for the LLM, reducing ambiguity and improving the reliability of tool-use reasoning.\n\n3.  **Centralized Configuration Management**:\n    *   **Issue**: API keys and settings (e.g., Alpaca credentials) are managed via environment variables, which can be scattered and difficult to manage in complex deployments.\n    *   **Suggestion**: Implement a dedicated configuration library (e.g., Pydantic Settings or `python-decouple`) to manage settings centrally. This would improve security, allow for environment-specific configuration files, and make the application easier to deploy in various environments.\n\n4.  
**Enhanced Error Reporting Detail**:\n    *   **Issue**: The `api_retry` mechanism is robust, but the final error message to the LLM can be generic (e.g., \"Failed to retrieve data\").\n    *   **Suggestion**: Enhance exception handling within each tool to provide more specific, actionable error messages to the LLM (e.g., \"Ticker 'XYZ' not found on Yahoo Finance\" or \"Invalid date range provided\"). This allows the LLM to better self-correct or provide more informative feedback to the user.\n\n#### 3.4.2. Secondary Development Guide\n\nThe Investor-Agent is highly modular and designed for easy extension via the Model Context Protocol (MCP). Secondary development should focus on adding new `@mcp.tool()` functions to expand the agent's financial data capabilities.\n\n1.  **Setup and Environment**:\n    *   Clone the repository and install dependencies using the project's preferred package manager (e.g., `uv pip install -e .`).\n    *   Set up necessary environment variables, such as `ALPACA_API_KEY` and `ALPACA_API_SECRET`, if the optional Alpaca tool is to be used.\n    *   Test the existing functionality by running the client with `python chat.py`.\n\n2.  **Adding a New Financial Tool**:\n    *   **Locate `investor_agent/server.py`**. All new tool logic must be implemented here.\n    *   **Define the Function**: Create a new asynchronous function that encapsulates the data retrieval logic (e.g., fetching data from a new API).\n    *   **Apply Decorator**: Decorate the function with `@mcp.tool()` to expose it to the LLM.\n    *   **Implement Robustness**: Ensure all external API calls within the function are wrapped with the `@api_retry` decorator to inherit the project's error handling and retry logic.\n    *   **Format Output**: Use `pandas` for data manipulation and ensure the final return value is a clean, structured CSV string via `to_clean_csv(df)` or a well-defined dictionary/Pydantic model, as this is the preferred format for LLM consumption.\n\n3.  
**Dependency Management**:\n    *   If a new tool requires a new library, add it to `pyproject.toml`. If the dependency is optional, implement conditional registration in `server.py` (similar to TA-Lib and Alpaca) to maintain a lightweight core for users who do not need the feature.\n\n"
  },
  {
    "path": "thirdparty/panda_quantflow.md",
    "content": "# panda_quantflow - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe project, `panda_quantflow`, exhibits a clear, multi-layered structure typical of a complex Python application that combines a web interface, a backtesting engine, and a real-time trading service. The root directory contains standard project files (`.gitignore`, `Dockerfile`, `pyproject.toml`) and the main source code under the `src` directory.\n\nThe `src` directory is divided into four primary, high-level modules: `common`, `panda_backtest`, `panda_trading`, and `panda_web`, along with a `utils` module.\n\n*   **`src/common`**: This module serves as the foundational infrastructure layer. It houses generic components like data models for backtest results (`backtest/model`), database connection handlers (`connector` for MongoDB, MySQL, Redis), system configuration (`config`), and logging utilities. This module is designed for reusability across the backtesting and trading services.\n*   **`src/panda_backtest`**: This is the core backtesting engine. It is highly structured, containing the user-facing trading APIs (`api`), the core context and constant definitions (`backtest_common`), a robust exception handling system (`exception`), and the simulated exchange logic (`exchange`) for various asset classes (stock, future, fund). It also includes the logic for order verification and result processing.\n*   **`src/panda_trading`**: This module is dedicated to real-time trading execution. It includes a FastAPI application (`__main__.py`) to manage trade processes, data models for real-time accounts, and the critical integration with external trading gateways, notably the CTP (China Futures Trading) API (`real_trade_api/ctp`). It also contains logic for trade monitoring and routing.\n*   **`src/panda_web`**: This module functions as the application's web backend and API gateway. 
It defines the FastAPI routes (`routes`) for user interaction (backtest management, workflow control, chat), the business logic (`logic`) for handling these routes, and a sophisticated LLM-powered service layer (`services/llm`) for code assistance and strategy generation. The presence of static assets suggests it also serves the frontend.\n*   **`src/utils`**: This module contains miscellaneous, general-purpose utilities that are not specific to trading or backtesting, such as annotation helpers (e.g., for Singleton), data manipulation tools, Redis-based distributed locking, and threading helpers.\n\nThis structure clearly separates concerns: **Infrastructure** (`common`, `utils`), **Simulation Logic** (`panda_backtest`), **Real-Time Execution** (`panda_trading`), and **User Interface/API** (`panda_web`).\n\n```\n/home/ubuntu/panda_quantflow\n├── .git/\n├── Dockerfile\n├── README.md\n├── pyproject.toml\n└── src/\n    ├── common/ (Infrastructure: Models, DB Connectors, Config)\n    │   ├── backtest/model/ (Pydantic models for backtest data)\n    │   ├── config/ (Project configuration)\n    │   ├── connector/ (MongoDB, MySQL, Redis clients)\n    │   ├── cron/ (Scheduled task management)\n    │   ├── logging/ (System and user logging)\n    │   └── utils/ (General common utilities)\n    ├── panda_backtest/ (Backtesting Engine Core)\n    │   ├── api/ (User-facing trading APIs)\n    │   ├── backtest_common/ (Core context, constants, data structures)\n    │   ├── exception/ (Custom exception handling)\n    │   ├── exchange/ (Simulated exchange logic for assets)\n    │   ├── model/ (Quotation and result models)\n    │   ├── order/ (Order verification and building)\n    │   ├── result/ (Backtest result calculation)\n    │   ├── system/ (Core context and time management)\n    │   └── util/ (Backtest-specific utilities)\n    ├── panda_trading/ (Real-Time Trading Service)\n    │   ├── models/ (Real-time trade models)\n    │   ├── real_trade_api/ctp/ (CTP API integration)\n    
│   ├── trading/ (Core real-time trading logic)\n    │   ├── trading_account_monitor/ (Account monitoring service)\n    │   └── trading_route/ (Trade management server)\n    ├── panda_web/ (Web API and LLM Services)\n    │   ├── logic/ (Business logic for API routes)\n    │   ├── messaging/ (RabbitMQ consumers)\n    │   ├── models/ (API request/response models)\n    │   ├── routes/ (FastAPI endpoints)\n    │   └── services/llm/ (LLM Agents and Code Checkers)\n    └── utils/ (General Utilities: Lock, Thread, Time)\n```\n```\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/panda_quantflow/src/common`: Foundational components, including backtest data models, database connectors (MongoDB, Redis, MySQL), configuration, and logging utilities. This forms the core infrastructure layer.\n*   `/home/ubuntu/panda_quantflow/src/panda_backtest`: The comprehensive backtesting engine. It contains the trading APIs, core context management, data structures for accounts and positions, exception handling, and the logic for simulating various financial instruments (stock, future, fund).\n*   `/home/ubuntu/panda_quantflow/src/panda_trading`: The real-time trading module. This includes the FastAPI server for trade management, models for real-time accounts, and the critical integration with external trading gateways, specifically the CTP (China Futures Trading) API.\n*   `/home/ubuntu/panda_quantflow/src/panda_web`: The web application backend. 
It hosts the FastAPI routes, the business logic for managing workflows, and the advanced LLM-powered services (assistants and code checkers).\n*   `/home/ubuntu/panda_quantflow/src/utils`: A collection of general-purpose utility functions, including singleton annotation, file/data manipulation, Redis-based distributed locking, logging factory, and threading utilities.\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe project is logically partitioned into four main Python modules, each with distinct responsibilities, supported by a general `utils` module.\n\n### 1. Module: `src/common`\n**Core Responsibility**: Provides the fundamental infrastructure, data models, and connectivity services used by the higher-level backtesting and trading modules. It acts as the shared foundation of the application.\n\n**Key Files and Responsibilities**:\n*   `backtest/model/*.py`: Defines Pydantic models (`BacktestAccountModel`, `BacktestModel`, `BacktestPositionModel`, etc.) for persistent backtest data, often stored in MongoDB. The custom `PyObjectId` class handles MongoDB ID serialization.\n*   `connector/mongodb_handler.py`: Manages the connection and CRUD operations for MongoDB, the primary data store for models and results.\n*   `connector/redis_client.py`: Provides a client for Redis, used for caching, session management, and message queuing.\n*   `config/config.py`, `config/project.py`: Handles application configuration loading and access.\n*   `logging/system_logger.py`, `logging/user_logger.py`: Implements distinct logging mechanisms for internal system events and user-facing strategy logs.\n\n**Implementation Details**: The module heavily relies on **Pydantic** for data integrity and **MongoDB** as the primary document store. The separation of system and user logging is a crucial feature for a platform where user-written code (strategies) is executed.\n\n### 2. 
Module: `src/panda_backtest`\n**Core Responsibility**: The complete backtesting engine, responsible for simulating trading strategies against historical data, managing virtual accounts, and calculating performance metrics.\n\n**Key Files and Responsibilities**:\n*   `api/api.py`: The main entry point for user-written strategies, exposing high-level functions like `order_shares`, `order_values`, `buy_open`, and `cancel_order`.\n*   `backtest_common/system/context/core_context.py`: The central **Singleton** class that holds the state of the current backtest run.\n*   `backtest_common/exchange/stock/back_test/stock_exchange.py`: Contains the core logic for simulating stock market operations (e.g., handling dividends, splits, and trade execution).\n*   `order/common/order_quotation_verify.py`: Logic to check if an order is valid based on current market data (e.g., price limits).\n*   `result/result_calculate.py`: Computes final backtest performance metrics (Sharpe ratio, Max Drawdown, Alpha, etc.).\n\n**Implementation Details**: The module implements a complex simulation environment. The trading APIs are generic, relying on the `CoreContext` to dispatch actions to the appropriate simulated exchange logic. The extensive use of sub-modules for different asset types (`fund`, `future`, `stock`) and logic layers (`exchange`, `order`, `result`) demonstrates a high degree of modularity.\n\n### 3. 
Module: `src/panda_trading`\n**Core Responsibility**: Manages real-time trading execution, primarily focusing on the integration with external trading gateways like CTP (China Futures Trading Platform).\n\n**Key Files and Responsibilities**:\n*   `__main__.py`: The entry point for the real-time trading service, which runs a **FastAPI** server to receive commands (start/stop trade).\n*   `trading_route/manager/real_trade_manager.py`: Manages the lifecycle of real-time trading processes.\n*   `real_trade_api/ctp/*.py`: Contains the CTP-specific implementation, including the `ctp_trade_api.py` and `ctp_quotation_api.py` for trade and market data, and the corresponding SPI (Service Provider Interface) classes (`ctp_trade_spi.py`, `ctp_quotation_spi.py`) to handle asynchronous callbacks from the CTP gateway.\n*   `models/trading/trading_future_account.py`: Pydantic model for real-time future trading accounts.\n\n**Implementation Details**: This module is a dedicated microservice. It uses a **Producer-Consumer Pattern** where the `panda_web` module acts as the producer (sending start/stop commands) and this module acts as the consumer/executor. The CTP integration is a complex, low-level component that uses a thread-based query mechanism (`qry_thread.py`) to manage synchronous and asynchronous API calls.\n\n### 4. 
Module: `src/panda_web`\n**Core Responsibility**: The user-facing API and business logic layer, including workflow management and advanced LLM-powered features.\n\n**Key Files and Responsibilities**:\n*   `routes/*.py`: Defines the FastAPI endpoints for backtesting, trading, workflow management, and chat.\n*   `logic/workflow_run_logic.py`: Core business logic for initiating and managing strategy runs.\n*   `services/llm/agents/*.py`: Defines the LLM-powered assistants (`backtest_assistant.py`, `code_assistant.py`) that handle user queries and code generation/checking.\n*   `services/llm/code_checker/*.py`: Contains logic for static analysis of user-submitted code, including rule definitions and variable tracking.\n*   `messaging/workflow_consumer.py`: A consumer that listens to the RabbitMQ queue to process workflow execution tasks.\n\n**Implementation Details**: This module is the application's brain. It implements the **Command Pattern** by translating API requests into internal commands (logic calls) and external messages (RabbitMQ). The LLM service is a sophisticated feature, employing a **Chain of Responsibility** or **Strategy Pattern** to route user requests to the correct assistant and then validate the generated code using the `code_checker` sub-module.\n\n### 5. Module: `src/utils`\n**Core Responsibility**: General-purpose, reusable utilities.\n\n**Key Files and Responsibilities**:\n*   `annotation/singleton_annotation.py`: Mechanism to easily apply the Singleton pattern to classes.\n*   `lock/redis_lock.py`: Implements a distributed lock using Redis, essential for coordinating actions across multiple services (e.g., preventing concurrent updates to the same account).\n*   `time/time_util.py`: Utility functions for time and date manipulation.\n\n**Implementation Details**: This module ensures that common, non-domain-specific tasks are centralized and reusable, promoting a DRY (Don't Repeat Yourself) principle across the entire project. 
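To make the role of `utils/lock/redis_lock.py` concrete, here is a minimal sketch of a token-based distributed lock. The class shape, method names, and the in-memory `FakeRedis` stand-in (used so the example runs without a live Redis server) are illustrative assumptions, not the project's actual implementation:

```python
import uuid

class FakeRedis:
    """In-memory stand-in for a Redis client, exposing only the calls the
    sketch needs (a real deployment would use redis-py against a server)."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        # Mirrors redis-py semantics: SET ... NX returns None if the key
        # already exists. The `ex` TTL is accepted but not simulated here.
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

class RedisLock:
    """Token-based distributed lock in the spirit of utils/lock/redis_lock.py."""
    def __init__(self, client, key, ttl_seconds=30):
        self.client = client
        self.key = key
        self.ttl = ttl_seconds
        self.token = uuid.uuid4().hex  # identifies this lock holder

    def acquire(self) -> bool:
        # SET key token NX EX ttl: succeeds only if no one holds the lock.
        return bool(self.client.set(self.key, self.token, nx=True, ex=self.ttl))

    def release(self) -> bool:
        # Compare-and-delete so one service cannot release another's lock.
        # A production version would do this atomically via a Lua script.
        if self.client.get(self.key) == self.token:
            self.client.delete(self.key)
            return True
        return False

client = FakeRedis()
lock_a = RedisLock(client, "account:42")
lock_b = RedisLock(client, "account:42")
print(lock_a.acquire())  # True: first holder wins
print(lock_b.acquire())  # False: the key is already held
lock_a.release()
print(lock_b.acquire())  # True: the lock is free again
```

The expiry (`EX`) guards against a crashed holder keeping the lock forever, which is exactly the concern when coordinating concurrent updates to the same account across services.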
The distributed locking mechanism is critical for the integrity of a multi-service financial platform.\n\n### Module PlantUML Diagrams\n\n### Module: `src/common`\n\n```plantuml\n@startuml\npackage \"common\" {\n  class PyObjectId {\n    + validate(v)\n    + __get_pydantic_core_schema__()\n  }\n\n  package \"backtest.model\" {\n    class BacktestAccountBaseModel\n    class BacktestAccountModel\n    class BacktestBaseModel\n    class BacktestModel\n    class BacktestPositionBaseModel\n    class BacktestPositionModel\n    ' ... other models\n  }\n\n  package \"connector\" {\n    class MongoDBHandler\n    class RedisClient\n    class MySQLClient\n  }\n\n  package \"logging\" {\n    class LogFactory {\n      + get_logger()\n    }\n    class RemoteLogFactory {\n      + get_sr_logger()\n    }\n  }\n\n  BacktestAccountBaseModel <|-- BacktestAccountModel\n  BacktestBaseModel <|-- BacktestModel\n  BacktestPositionBaseModel <|-- BacktestPositionModel\n\n  PyObjectId <.. BacktestAccountModel : uses\n  PyObjectId <.. 
BacktestModel : uses\n\n  MongoDBHandler .> RedisClient : dependency\n  MongoDBHandler .> MySQLClient : dependency\n}\n@enduml\n```\n\n### Module: `src/panda_backtest`\n\n```plantuml\n@startuml\npackage \"panda_backtest\" {\n  package \"api\" {\n    class TradingAPI {\n      + order_shares()\n      + order_values()\n      + buy_open()\n      + sell_open()\n      + cancel_order()\n    }\n  }\n\n  package \"backtest_common.system.context\" {\n    class CoreContext <<Singleton>> {\n      + get_instance()\n      + operation_proxy\n      + strategy_context\n    }\n  }\n\n  package \"order.common\" {\n    class OrderQuotationVerify\n    class OrderRiskControlVerify\n  }\n\n  package \"exchange.stock.back_test\" {\n    class StockExchange\n    class DividendManager\n  }\n\n  package \"result\" {\n    class ResultCalculate\n  }\n\n  CoreContext \"1\" -- \"1\" TradingAPI : provides access to\n  TradingAPI .> CoreContext : uses\n  CoreContext \"1\" -- \"1\" OperationProxy : contains\n  OperationProxy .> OrderQuotationVerify : delegates\n  OperationProxy .> OrderRiskControlVerify : delegates\n  OperationProxy .> StockExchange : delegates\n  StockExchange .> DividendManager : uses\n  TradingAPI .> ResultCalculate : uses\n}\n@enduml\n```\n\n### Module: `src/panda_trading`\n\n```plantuml\n@startuml\npackage \"panda_trading\" {\n  class MainApp <<FastAPI>>\n\n  package \"trading_route.manager\" {\n    class RealTradeManager {\n      + start_trade(run_id)\n      + kill_run_trade(run_id)\n    }\n  }\n\n  package \"real_trade_api.ctp\" {\n    class CTPTradeAPI\n    class CTPQuotationAPI\n    class CTPTradeSPI <<Callback>>\n    class CTPQuotationSPI <<Callback>>\n    class QryThread\n  }\n\n  MainApp --> RealTradeManager : manages\n  RealTradeManager --> CTPTradeAPI : uses\n  CTPTradeAPI .> CTPTradeSPI : registers callback\n  CTPQuotationAPI .> CTPQuotationSPI : registers callback\n  CTPTradeAPI .> QryThread : uses\n  CTPQuotationAPI .> QryThread : uses\n}\n@enduml\n```\n\n### 
Module: `src/panda_web`\n\n```plantuml\n@startuml\npackage \"panda_web\" {\n  package \"routes\" {\n    class WorkflowRoutes <<FastAPI Router>>\n    class BacktestRoute <<FastAPI Router>>\n    class ChatRoutes <<FastAPI Router>>\n  }\n\n  package \"logic\" {\n    class WorkflowRunLogic\n    class WorkflowSaveLogic\n  }\n\n  package \"services.llm.agents\" {\n    class CodeAssistant\n    class BacktestAssistant\n    class PromptsProvider\n  }\n\n  package \"services.llm.code_checker\" {\n    class BaseCodeChecker\n    class BacktestCodeChecker\n    class VariableTracker\n  }\n\n  WorkflowRoutes --> WorkflowRunLogic : calls\n  ChatRoutes --> CodeAssistant : calls\n  ChatRoutes --> BacktestAssistant : calls\n\n  CodeAssistant .> BaseCodeChecker : uses\n  BacktestAssistant .> BaseCodeChecker : uses\n  BaseCodeChecker <|-- BacktestCodeChecker\n  BaseCodeChecker .> VariableTracker : uses\n  CodeAssistant .> PromptsProvider : loads prompts\n}\n@enduml\n```\n\n### Module: `src/utils`\n\n```plantuml\n@startuml\npackage \"utils\" {\n  package \"annotation\" {\n    class SingletonAnnotation <<Decorator>>\n  }\n\n  package \"lock\" {\n    class RedisLock {\n      + acquire()\n      + release()\n    }\n  }\n\n  package \"data\" {\n    class DataUtil\n    class FileUtil\n  }\n\n  package \"time\" {\n    class TimeUtil\n  }\n\n  RedisLock .> RedisClient : requires (from common)\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe **panda_quantflow** project is built upon a set of core abstractions that enforce a clear separation between the trading logic, data management, and application infrastructure.\n\n**1. Core Context and State Management (`CoreContext`)**\nThe most critical abstraction is the **CoreContext** (`panda_backtest/backtest_common/system/context/core_context.py`), which implements the **Singleton Pattern**. 
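A minimal sketch of that singleton access pattern (illustrative only: the class and attribute names echo the earlier module diagram, and the project's actual `get_instance()` may differ):

```python
class CoreContextSketch:
    """Illustrative stand-in for CoreContext's singleton behavior
    (not the project's actual code)."""
    _instance = None

    def __init__(self):
        self.strategy_context = None  # run parameters: dates, capital, ...
        self.operation_proxy = None   # facade for trading actions

    @classmethod
    def get_instance(cls):
        # Lazily create, then always return, the one shared instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

ctx_a = CoreContextSketch.get_instance()
ctx_b = CoreContextSketch.get_instance()
print(ctx_a is ctx_b)  # True: every component sees the same context
```

Because every component resolves state through `get_instance()`, all parts of a run share one context object.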
This class serves as the central registry for all runtime components and state within a single backtest or trading run. It encapsulates:\n*   **Strategy Context**: Holds run-specific parameters (start date, end date, capital, etc.).\n*   **Operation Proxy**: The interface for all trading actions (ordering, canceling, subscribing).\n*   **Data Sources**: Access to historical and real-time quotation data.\n*   **Account/Position Models**: The current state of the trading account.\nThis design ensures that all parts of a running strategy have a consistent, single source of truth for the environment and available actions.\n\n**2. Data Modeling (`BaseModel` and `PyObjectId`)**\nThe project extensively uses **Pydantic's `BaseModel`** for defining data structures, ensuring data validation and clear schema definition across all components.\n*   Classes like `BacktestModel`, `FutureAccountModel`, and `WorkflowModel` define the persistent state of the system.\n*   The custom **`PyObjectId`** class handles the serialization and validation of MongoDB's `ObjectId` within the Pydantic framework, abstracting away the database-specific ID type for cleaner application logic.\n\n**3. Service Interface (`OperationProxy` and LLM Agents)**\nThe **OperationProxy** acts as a facade, providing a unified interface for trading operations while hiding the complexity of order verification, risk control, and exchange interaction. This allows the strategy code to remain clean and agnostic to the underlying execution environment (backtest simulation vs. real-time CTP connection).\n\nThe **LLM Agents** (`panda_web/services/llm/agents`) are a key abstraction for the AI-powered features. Agents like `CodeAssistant` and `BacktestAssistant` abstract the complex logic of interacting with large language models, prompt engineering, and code checking (`code_checker`) into simple, callable services.\n\n**4. 
Lifecycle Management**\nThe lifecycle of a strategy run is managed by the **Workflow Logic** in `panda_web`.\n*   **Start**: A request triggers `workflow_run_logic.py`, which prepares the environment and dispatches the task via RabbitMQ.\n*   **Execution**: The consumer process (`panda_backtest` or `panda_trading`) initializes the `CoreContext` and executes the strategy.\n*   **End**: The execution process saves the final results to MongoDB and updates the run status, completing the lifecycle. The use of `uvicorn` and `FastAPI` in `panda_trading` also suggests a managed, long-running service lifecycle for real-time execution.\n\n#### 3.1.2. Component Interactions\n\nThe system operates as a multi-component service-oriented architecture, primarily communicating through internal Python function calls, database interactions, and a message queue.\n\n**1. Web-to-Logic Flow (Workflow Execution)**\nThe primary entry point is the `panda_web` module, which exposes FastAPI routes (`routes`).\n*   A user request (e.g., to run a backtest or a real-time strategy) hits a route in `panda_web/routes`.\n*   The route calls the corresponding business logic in `panda_web/logic` (e.g., `workflow_run_logic.py`).\n*   The logic layer uses the `common` module's database connectors (e.g., `mongodb_handler.py`) to retrieve workflow and strategy details.\n*   For execution, the logic layer likely sends a message to a message queue (RabbitMQ, via `panda_web/messaging/rabbitmq_client.py`) to trigger the actual backtest or trading service.\n\n**2. 
Backtest Execution Flow**\n*   The `panda_backtest` module, likely running as a consumer process (`workflow_consumer.py`), receives the execution message.\n*   It initializes the core context (`CoreContext` in `panda_backtest/backtest_common/system/context/core_context.py`).\n*   The strategy code is executed, making calls to the high-level trading APIs (`panda_backtest/api/api.py`).\n*   These APIs delegate operations (e.g., `order_shares`, `cancel_order`) to an `OperationProxy` instance within the `CoreContext`.\n*   The `OperationProxy` orchestrates the process:\n    *   **Order Verification**: Calls modules in `panda_backtest/order` (e.g., `order_quotation_verify.py`, `order_risk_control_verify.py`) to check limits and risk.\n    *   **Exchange Simulation**: Interacts with the simulated exchange logic in `panda_backtest/backtest_common/exchange` to update account and position models.\n    *   **Data Persistence**: Uses `common/connector` to save results (`BacktestModel`, `BacktestAccountModel`, etc.) to MongoDB.\n\n**3. 
Real-Time Trading Flow**\n*   The `panda_trading` module runs a dedicated FastAPI server (`panda_trading/__main__.py`) to manage real-time strategies.\n*   The `RealTradeManager` handles starting and stopping trade processes.\n*   The core trading logic (`panda_trading/trading/extensions/real_trade/main.py`) uses the CTP API integration (`panda_trading/real_trade_api/ctp`) to connect to the exchange.\n*   The CTP SPI (Service Provider Interface) classes (`ctp_trade_spi.py`, `ctp_quotation_spi.py`) handle asynchronous communication with the CTP gateway, receiving market data and trade confirmations.\n*   Trade data and account updates are persisted to MongoDB and Redis using the `common` and `utils` modules.\n\n**Key Communication Patterns:**\n*   **API Gateway Pattern**: `panda_web` acts as the gateway, routing requests to specialized internal services.\n*   **Message Queue (Asynchronous)**: Used for decoupling the web request from the long-running backtest/trading execution.\n*   **Shared Database (MongoDB)**: Centralized storage for models, configurations, and results, facilitating communication between services.\n*   **Shared Cache (Redis)**: Used for session management, real-time data caching, and distributed locking (`utils/lock/redis_lock.py`).\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam componentStyle rectangle\n\npackage \"panda_quantflow\" {\n  component [panda_web] as Web {\n    [Routes]\n    [LLM Services]\n    [Workflow Logic]\n  }\n  component [panda_backtest] as Backtest {\n    [APIs]\n    [Exchange Logic]\n    [Order Verification]\n    [Result Processing]\n  }\n  component [panda_trading] as Trading {\n    [Real-Time Trade Server]\n    [CTP API Integration]\n    [Trade Models]\n  }\n  component [common] as Common {\n    [Backtest Models]\n    [DB Connectors]\n    [Configuration]\n    [Logging]\n  }\n  component [utils] as Utils {\n    [Annotations]\n    [Redis Lock]\n    [Time/Data Utils]\n  }\n\n  Web --> Backtest : Backtest Management\n  Web --> Trading : Real-Time Trade Management\n  Web --> Common : Data Models & Config\n  Web --> Utils : Utility Functions\n\n  Backtest --> Common : Models, Connectors\n  Backtest --> Utils : Utility Functions\n\n  Trading --> Common : Models, Connectors\n  Trading --> Utils : Utility Functions\n  Trading --> [CTP Gateway] : Real-Time Trading\n\n  Common --> [MongoDB] : Data Storage\n  Common --> [Redis] : Caching, Locking\n  Common --> [RabbitMQ] : Messaging\n\n  Utils --> [Redis] : Locking\n\n  [CTP Gateway] .> Trading : Market Data/Trade Execution\n}\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe codebase demonstrates the application of several common and domain-specific design patterns to manage complexity and enforce structure.\n\n**1. 
Singleton Pattern**\n*   **Implementation**: The `CoreContext` class (`panda_backtest/backtest_common/system/context/core_context.py`) and various logging/factory classes (e.g., `LogFactory`, `RemoteLogFactory`) use the Singleton pattern to ensure a single, globally accessible instance of a resource.\n*   **Example**: The `CoreContext.get_instance()` method ensures that all parts of a strategy run access the same environment and state, which is crucial for consistent backtesting and trading. The `utils/annotation/singleton_annotation.py` file suggests a custom decorator or mechanism is used to enforce this pattern.\n\n**2. Factory Pattern**\n*   **Implementation**: The `log_factory.py` and `remote_log_factory.py` files in `common/logging` and `panda_backtest/util/log` use the Factory pattern to abstract the creation of logger instances.\n*   **Example**: `LogFactory.get_logger()` returns a configured logger instance, decoupling the logging client code from the specific implementation details of the logging system.\n\n**3. Builder Pattern**\n*   **Implementation**: The exception handling mechanism uses a Builder pattern to construct complex exception objects.\n*   **Example**: `risk_control_exception_builder.py` and `strategy_exception_builder.py` are responsible for assembling detailed, contextualized exception messages and objects, which is a common practice for robust error reporting in financial systems.\n\n**4. Strategy Pattern (Implicit in LLM Services)**\n*   **Implementation**: The LLM services are structured to follow the Strategy pattern, where different \"assistants\" (e.g., `backtest_assistant`, `code_assistant`, `factor_assistant`) implement a common interface (`llm_service.py`) but contain distinct logic for handling different types of user queries.\n*   **Example**: The logic files (`backtest_assistant_stream_logic.py`, `code_assistant_nonstream_logic.py`) represent different strategies for fulfilling the assistant's request (streaming vs. 
non-streaming response).\n\n**5. Data Access Object (DAO) Pattern**\n*   **Implementation**: The database connector classes in `common/connector` (`mongodb_handler.py`, `mysql_client.py`, `redis_client.py`) act as Data Access Objects, abstracting the CRUD operations for their respective databases. This isolates the application logic from the specifics of the data persistence layer.\n\n**6. Adapter Pattern**\n*   **Implementation**: The `panda_trading` module uses an Adapter pattern to interface with the CTP trading gateway.\n*   **Example**: `future_trade_adapter.py` likely translates the project's internal, generic trading commands into the specific data structures and function calls required by the CTP API, allowing the core logic to remain clean. The CTP SPI classes (`ctp_trade_spi.py`, `ctp_quotation_spi.py`) also serve as adapters for the asynchronous CTP callback mechanism.\n\n#### 3.3.2. Project Highlights\n\nThe **panda_quantflow** project features several innovative and well-designed aspects that contribute to its flexibility and power as a quantitative trading platform.\n\n*   **Integrated LLM-Powered Code Assistance**: The inclusion of the `services/llm` module is a significant highlight. It provides a **Code Assistant** and **Backtest Assistant** that can generate, check, and debug user-submitted strategy code. This lowers the barrier to entry for users and significantly enhances the development experience by integrating AI-driven static analysis and code generation directly into the platform's workflow. The `code_checker` sub-module, with its rules and variable tracking, is a robust implementation of this feature.\n\n*   **Clear Separation of Concerns (Backtest vs. Real-Time)**: The architecture strictly separates the simulation environment (`panda_backtest`) from the real-time execution environment (`panda_trading`). 
This is a best practice in quantitative finance, ensuring that the complex, non-deterministic nature of real-time trading (e.g., CTP API callbacks, network latency) does not pollute the deterministic, reproducible environment required for backtesting.\n\n*   **Extensible Trading API via `OperationProxy`**: The core trading functions (e.g., `order_shares`) are exposed through a clean, high-level API in `panda_backtest/api/api.py`. This API delegates all work to an `OperationProxy` within the `CoreContext`. This **Facade/Proxy Pattern** makes the system highly extensible. To support a new exchange or a new backtest feature, one only needs to update the `OperationProxy` implementation without changing the user-facing strategy code.\n\n*   **Robust Infrastructure with Distributed Primitives**: The `common` and `utils` modules provide production-grade infrastructure components. The `RedisLock` utility is critical for maintaining data integrity in a distributed environment, preventing race conditions when multiple services (web, backtest, trading) might be accessing or modifying the same user account or workflow state.\n\n*   **Pydantic-Enforced Data Integrity**: The pervasive use of Pydantic `BaseModel` for all internal and external data structures (models, requests, responses) ensures strong data validation and clear documentation (via FastAPI/Pydantic schema generation). This drastically improves the reliability and maintainability of the entire system.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe project is well-structured but has several areas where architectural clarity, performance, and maintainability could be improved.\n\n**1. Decoupling and Abstraction**\n*   **Suggestion**: Introduce a clearer **Repository Pattern** for data access. Currently, the `common/connector` classes are used directly by logic modules. 
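As a sketch of that suggestion (all class and method names here are hypothetical, not existing project code), a thin repository could wrap the connector like this:

```python
# Hypothetical repository wrapping a MongoDB handler; names are
# illustrative. Business logic depends only on the repository interface,
# so tests can substitute an in-memory fake for a live database.
class WorkflowRepository:
    def __init__(self, db_handler):
        self._db = db_handler  # e.g., the common/connector MongoDB handler

    def find_by_id(self, workflow_id):
        return self._db.find_one("workflow", {"_id": workflow_id})

    def save(self, workflow):
        return self._db.insert_one("workflow", workflow)

class FakeDB:
    """In-memory stand-in used to unit-test logic without MongoDB."""
    def __init__(self):
        self.rows = {}
    def find_one(self, collection, query):
        return self.rows.get(query["_id"])
    def insert_one(self, collection, doc):
        self.rows[doc["_id"]] = doc
        return doc["_id"]

repo = WorkflowRepository(FakeDB())
repo.save({"_id": "wf-1", "name": "demo"})
assert repo.find_by_id("wf-1")["name"] == "demo"
```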
A dedicated repository layer (e.g., `BacktestRepository`, `WorkflowRepository`) would abstract the database operations further, allowing the core logic to be database-agnostic and simplifying unit testing.\n*   **Benefit**: Reduces coupling between business logic and the persistence layer (MongoDB/MySQL).\n\n**2. Consistency in Data Handling and Typing**\n*   **Suggestion**: Enforce consistent use of Python's native `datetime` objects instead of string representations for dates and times in Pydantic models. The current mix of `Optional[str]` and `Optional[datetime]` for time fields (e.g., in `BacktestBaseModel`) is error-prone.\n*   **Benefit**: Improves type safety, simplifies date arithmetic, and prevents runtime parsing errors.\n\n**3. Performance Optimization in Backtesting**\n*   **Suggestion**: The backtesting engine should be reviewed for potential performance bottlenecks, especially in data loading and iteration. Consider using vectorized operations with libraries like NumPy or Polars instead of pure Python loops where possible for high-frequency data processing.\n*   **Benefit**: Significantly speeds up backtest execution, especially for long time periods or high-resolution data.\n\n**4. Clearer Separation of Concerns in `panda_backtest`**\n*   **Suggestion**: The `panda_backtest` module is very large and contains both high-level APIs and low-level exchange simulation logic. Consider splitting it into two distinct modules: `panda_strategy_api` (the user-facing API) and `panda_simulation_core` (the internal engine logic).\n*   **Benefit**: Enforces a cleaner boundary between the user-facing interface and the complex internal mechanics, making both easier to maintain and evolve.\n\n**5. LLM Service Robustness**\n*   **Suggestion**: Implement robust rate-limiting and caching mechanisms for the LLM services in `panda_web/services/llm`. LLM API calls are expensive and slow. 
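A minimal sketch of such a cache, assuming an in-process TTL store keyed by a prompt hash (illustrative only; a production version would more likely live in Redis, which the project already uses):

```python
import hashlib
import time

# Illustrative in-process TTL cache for LLM responses, keyed by a hash of
# the prompt text. Not actual project code; a deployed version would
# typically back this with Redis and add size limits/eviction.
class LLMResponseCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, response)

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        entry = self._store.get(self._key(prompt))
        if entry is None:
            return None
        ts, response = entry
        if time.time() - ts > self.ttl:
            return None  # entry expired; caller falls through to the LLM
        return response

    def put(self, prompt: str, response: str):
        self._store[self._key(prompt)] = (time.time(), response)

cache = LLMResponseCache()
cache.put("check this code", "looks fine")
assert cache.get("check this code") == "looks fine"
assert cache.get("unseen prompt") is None
```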
Caching common code check results or prompt responses can reduce cost and latency.\n*   **Benefit**: Improves application responsiveness and reduces operational costs associated with external API usage.\n\n#### 3.4.2. Secondary Development Guide\n\nFor secondary development, a new developer should focus on the modular structure and core abstractions.\n\n**1. Environment Setup and Configuration**\n*   **Configuration**: Start by understanding `common/config/project.py` and the `.ini` files in `panda_trading/trading_route`. All environment-specific settings (database connections, server IPs, CTP credentials) are managed here.\n*   **Dependencies**: Ensure all required external services (MongoDB, Redis, RabbitMQ) are running, as the system is highly dependent on them for state and messaging.\n\n**2. Code Exploration Path**\n*   **Trading Logic**: To understand how orders are placed, start at `panda_backtest/api/api.py` (for backtest) or `panda_trading/trading/extensions/real_trade/trade` (for real-time). Follow the call chain to the `CoreContext` and `OperationProxy`.\n*   **Data Models**: All persistent data structures are defined using Pydantic in `common/backtest/model` and `panda_backtest/backtest_common/model`. Understanding these models is key to tracing data flow.\n*   **LLM Integration**: New AI features should be added as new \"Agents\" in `panda_web/services/llm/agents`, following the pattern of existing assistants. This involves defining new prompts in `prompts_provider.py` and implementing the logic in `logic`.\n\n**3. 
Best Practices**\n*   **Type Consistency**: Maintain strict type hinting using Pydantic models for all data transfer objects (DTOs) and API requests/responses.\n*   **Contextual Logging**: Use the provided `SRLogger` (Remote Logger) for all strategy-related output, as this is designed for remote monitoring and user-facing logs.\n*   **Avoid Direct DB Calls**: Always interact with the database through the DAO classes in `common/connector` to maintain abstraction and consistency.\n*   **Unit Testing**: The `panda_web/services/llm/tests` directory provides examples of how to test the LLM-powered components; ensure new features are covered with similar tests.\n\n"
  },
  {
    "path": "thirdparty/qlib.md",
"content": "# qlib - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\nThe Qlib project, located at `/home/ubuntu/FinnewsHunter/thirdparty/qlib`, is structured as a comprehensive Python package for quantitative investment research. The top-level directory contains the main source code package (`qlib`), along with supporting directories for documentation, examples, and utilities.\n\nThe library's core functionality is encapsulated within the `qlib` package, which follows a modular design:\n\n*   **`qlib/`**: This is the core Python package containing the entire library's logic. It is subdivided into modules that represent the main functional components of the quantitative investment platform.\n*   **`qlib/backtest`**: The engine for simulating trading strategies, including components for account management, decision making, exchange simulation, position tracking, and performance reporting.\n*   **`qlib/data`**: Manages all aspects of data, from storage and caching to dataset creation, filtering, and time-series operations. 
This module is critical for feature engineering and preventing look-ahead bias.\n*   **`qlib/model`**: Contains the base classes and implementations for machine learning models, trainers, and specialized components like risk models and ensemble methods.\n*   **`qlib/strategy`**: Defines the interface and base classes for implementing trading strategies, acting as the decision-making layer.\n*   **`qlib/workflow`**: Handles the end-to-end management of quantitative experiments, including task recording, result management, and integration with MLflow for reproducibility.\n*   **`qlib/rl`**: Dedicated module for Reinforcement Learning applications, particularly in optimal order execution, providing specialized simulators and trainers.\n*   **`qlib/utils`**: A collection of general-purpose utilities used across the library, including serialization, dynamic object creation, and time/data helpers.\n*   **`examples/`**: Contains various example workflows, tutorials (e.g., Jupyter notebooks), and configuration files demonstrating how to use the Qlib library.\n*   **`scripts/`**: Houses utility scripts, primarily for data collection and management, including data collectors for various sources and scripts for dumping and checking data health.\n*   **`tests/`**: Contains unit and integration tests for various components, ensuring code quality and reliability.\n\n### 1.2. 
Core Folders for Analysis\n\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/backtest`: Core backtesting engine components, including account, position, exchange simulation, and execution logic.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/data`: Data management, storage, feature engineering, and dataset creation.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/model`: Machine learning model abstraction, training infrastructure, and specialized risk/ensemble models.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/strategy`: Trading strategy base classes and interfaces for generating trade decisions.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/workflow`: Experiment management, tracking, and reproducibility via the Recorder system (MLflow-based).\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/rl`: Dedicated components for Reinforcement Learning applications, particularly for optimal order execution.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/qlib/utils`: General utility functions for serialization, dynamic object creation, and parallel processing.\n*   `/home/ubuntu/FinnewsHunter/thirdparty/qlib/scripts`: External data collection and management utilities.\n\n## Phase 2: Module-by-Module Deep Analysis\n\n## Module: qlib/backtest\n\n### Core Responsibility\nThe `qlib/backtest` module is the **core simulation engine** of Qlib, responsible for executing trading strategies against historical data. It manages the trading environment, including the account, positions, market exchange, and the execution of trade decisions. 
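In highly simplified form (an illustrative sketch only; qlib's actual `collect_data_loop` is generator-based and supports nested, multi-frequency executors), this strategy/executor interaction reduces to:

```python
# Simplified strategy/executor loop (illustrative; class names other than
# generate_trade_decision are invented for this sketch).
def backtest_loop(strategy, executor, n_steps):
    execute_result = None
    history = []
    for _ in range(n_steps):
        # The strategy sees the previous step's execution result and
        # proposes the next trade decision; the executor applies it.
        decision = strategy.generate_trade_decision(execute_result)
        execute_result = executor.execute(decision)
        history.append(execute_result)
    return history

class BuyEveryBar:
    def generate_trade_decision(self, prev_result):
        return {"orders": [("SH600000", 100, "BUY")]}

class EchoExecutor:
    def execute(self, decision):
        return {"filled": decision["orders"]}

result = backtest_loop(BuyEveryBar(), EchoExecutor(), n_steps=3)
assert len(result) == 3
```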
Its primary function is to provide a realistic and high-performance simulation of a quantitative investment strategy's performance over time, including the calculation of detailed portfolio and trading metrics.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `backtest.py` | `backtest_loop`, `collect_data_loop` | Defines the main backtesting loop, which orchestrates the interaction between the `BaseStrategy` and `BaseExecutor`. It collects portfolio and indicator metrics at the end of the simulation. |\n| `executor.py` | `BaseExecutor`, `NestedExecutor` | Abstract base for trade execution. `NestedExecutor` implements the critical **nested decision execution** pattern, allowing for multi-frequency or hierarchical strategies (e.g., daily strategy deciding on minute-level execution). |\n| `account.py` | `Account`, `AccumulatedInfo` | Manages the trading account, tracking cash, positions (`BasePosition`), accumulated return, cost, and turnover. It is responsible for updating the account state after each trade and at the end of each trading bar. |\n| `decision.py` | `Order`, `BaseTradeDecision` | Defines the fundamental data structures for trading. `Order` represents a single buy/sell instruction. `BaseTradeDecision` is the abstract interface for a strategy's output, which the executor consumes. |\n| `exchange.py` | `Exchange` | Simulates the market environment, providing price data, checking for stock suspensions, and calculating the actual trade price and volume based on market conditions and order size. |\n| `position.py` | `BasePosition`, `Position` | Tracks the holdings of the account, including the amount and cost basis of each stock. `Position` is the concrete implementation. 
|\n| `report.py` | `Indicator`, `PortfolioMetrics` | Defines the classes for calculating and storing performance metrics, such as alpha, beta, Sharpe ratio, and trading indicators like price advantage (PA) and fulfill rate (FFR). |\n\n### Core Implementation Details\nThe backtesting process is driven by a generator pattern in `backtest.py`'s `collect_data_loop`. The loop iteratively calls the strategy to generate a `BaseTradeDecision` and then passes it to the executor for processing.\n\n**Nested Execution:** The `NestedExecutor` is a key abstraction. It holds a list of sub-executors, each potentially operating at a different frequency (e.g., day, minute). A decision from an outer (slower) executor is passed down to an inner (faster) executor, which then executes the decision over its own calendar steps. The `Account` object is shallow-copied across nested levels, ensuring that while positions are shared, each level can maintain its own set of trading metrics.\n\n**Order Processing:** An `Order` object is created by the strategy and contains the `stock_id`, `amount` (adjusted), `direction`, and time range. The `Exchange` determines the actual `deal_amount` and `trade_price`. The `Account` then updates its cash and position based on the executed trade, and the `AccumulatedInfo` tracks overall trading statistics. 
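The cash/position arithmetic of a single simulated fill can be sketched as follows (a simplified illustration; the real `Exchange`/`Account` logic also handles adjustment factors, volume limits, and separate open/close cost ratios, and the cost ratio here is an assumed value):

```python
# Simplified settlement of one simulated buy fill (illustrative only).
def settle_buy(cash, position, deal_amount, trade_price, cost_ratio=0.0015):
    trade_value = deal_amount * trade_price
    cost = trade_value * cost_ratio   # commission/slippage charge
    cash -= trade_value + cost        # cash falls by value plus cost
    position += deal_amount           # holdings rise by the dealt amount
    return cash, position, cost

cash, pos, fee = settle_buy(cash=100_000.0, position=0.0,
                            deal_amount=100.0, trade_price=10.0)
assert pos == 100.0
assert fee == 1.5
assert cash == 100_000.0 - 1000.0 - 1.5
```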
The `Account`'s `_update_state_from_order` method handles the complex calculation of return (`rtn`) and cost, ensuring that the return calculation is consistent with the end-of-bar earning calculation.\n\n**Example: Order Structure**\nThe `Order` class in `decision.py` uses a `dataclass` and an `IntEnum` (`OrderDir`) for clarity (abridged here, with imports and the enum definition added so the snippet is self-contained):\n```python\nfrom dataclasses import dataclass\nfrom enum import IntEnum\nfrom typing import Optional\n\nimport pandas as pd\n\nclass OrderDir(IntEnum):\n    SELL = 0\n    BUY = 1\n\n@dataclass\nclass Order:\n    stock_id: str\n    amount: float\n    direction: OrderDir  # OrderDir.SELL or OrderDir.BUY\n    start_time: pd.Timestamp\n    end_time: pd.Timestamp\n    deal_amount: float = 0.0\n    factor: Optional[float] = None\n```\n\n### Dependencies\nThe `qlib/backtest` module has critical dependencies on:\n*   **`qlib.strategy.base`**: Depends on `BaseStrategy` to receive trade decisions.\n*   **`qlib.data`**: Implicitly depends on the data module via `Exchange` to fetch market data (prices, volumes, suspension status).\n*   **`qlib.utils`**: Uses `init_instance_by_config` for dynamic object creation (e.g., `BasePosition` implementation) and time utilities like `Freq.parse`.\n*   **`pandas`**: Heavily relies on `pd.Timestamp` and `pd.DataFrame` for time series and data handling.\n\n## Module: qlib/data\n\n### Core Responsibility\nThe `qlib/data` module is the **Data Layer** of the Qlib platform. Its responsibility is to manage all aspects of data, from raw data access and storage to advanced feature engineering and dataset creation for machine learning models. It abstracts the underlying data source and provides a unified, time-series-aware interface for the rest of the system.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `data.py` | `CalendarProvider`, `InstrumentProvider`, `FeatureProvider` | Defines the abstract interfaces for accessing market calendar, instrument lists (stock pools), and raw feature data. It is the foundation for all data access. 
|\n| `dataset/handler.py` | `DataHandlerABC`, `DataHandler` | The central class for data preparation. It loads data via a `DataLoader`, stores it in a multi-index `pd.DataFrame` (indexed by `datetime` and `instrument`), and provides the `fetch` method for accessing processed data slices. |\n| `dataset/processor.py` | `Processor`, `ZScoreNorm`, `CSZScoreNorm` | Defines the base class for all feature engineering and data cleaning steps. Concrete implementations like `ZScoreNorm` and `CSZScoreNorm` (Cross-Sectional Z-Score Normalization) are applied to the data before it is consumed by models. |\n| `dataset/loader.py` | `DataLoader` | Abstract interface for loading raw data into the `DataHandler`. |\n| `storage/` | `FileStorage` | Handles the persistence layer, managing how data is stored and retrieved from disk. |\n\n### Core Implementation Details\nThe data flow is highly structured:\n1.  **Data Loading**: A concrete `DataLoader` implementation fetches raw data.\n2.  **Data Handling**: The `DataHandler` receives the raw data and stores it in a multi-index DataFrame. This structure is fundamental, allowing for efficient time-series and cross-sectional operations.\n3.  **Feature Processing**: A pipeline of `Processor` objects is applied to the data. The `Processor` base class includes a `fit` method (for learning parameters like mean/std from a training set) and a `__call__` method (for applying the transformation). This separation is crucial for preventing **look-ahead bias** in quantitative research. For example, `MinMaxNorm` and `ZScoreNorm` fit their parameters only on the historical data defined by `fit_start_time` and `fit_end_time`.\n4.  
**Data Access**: The `fetch` method of `DataHandler` allows other modules (like `qlib/model` or `qlib/backtest`) to retrieve specific slices of the processed data, typically separated into `raw`, `infer`, and `learn` data keys.\n\n### Dependencies\nThe `qlib/data` module is largely self-contained but relies heavily on:\n*   **`pandas` and `numpy`**: For all data manipulation, especially multi-index DataFrame operations.\n*   **`qlib.utils`**: For serialization (`Serializable`), dynamic object creation (`init_instance_by_config`), and parallel processing (`datetime_groupby_apply`).\n\n## Module: qlib/model\n\n### Core Responsibility\nThe `qlib/model` module provides the **Model Abstraction and Training Infrastructure** for quantitative research. It defines the interfaces for all learnable models, handles the training lifecycle, and includes specialized components for ensemble methods and risk modeling.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `base.py` | `BaseModel`, `Model`, `ModelFT` | Defines the fundamental interfaces. `BaseModel` for prediction, `Model` adds the `fit` method for training on a `Dataset`, and `ModelFT` (Fine-Tunable) adds the `finetune` method. |\n| `trainer.py` | `Trainer`, `TrainerR`, `DelayTrainerR` | Manages the training process for one or more models (tasks). `TrainerR` uses the Qlib Recorder (`R`) for logging and saving models. `DelayTrainerR` supports delayed execution, which is useful for parallelizing the training of multiple models. |\n| `ens/ensemble.py` | `RollingEnsemble` | Provides mechanisms for combining multiple models, often used in a rolling window fashion to simulate real-world deployment. |\n| `riskmodel/structured.py` | `StructuredCovarianceEstimator` | Implements advanced risk modeling techniques, such as estimating the covariance matrix of asset returns using a structured approach (e.g., factor models). 
|\n| `meta/model.py` | `MetaModel` | Supports meta-learning or model-agnostic meta-learning (MAML) approaches, where a model learns to quickly adapt to new tasks. |\n\n### Core Implementation Details\n**Model Abstraction:** The `Model` class enforces a clear separation between the `fit` and `predict` phases. The `fit` method takes a `Dataset` object, which is responsible for providing the processed features and labels. This design ensures that models operate on clean, pre-processed data, decoupling the modeling logic from the data engineering pipeline.\n\n**Training Workflow:** The `Trainer` classes, particularly `TrainerR`, integrate tightly with the `qlib.workflow.R` (Recorder) system. The `task_train` function encapsulates the end-to-end process:\n1.  Start a new Recorder (`R.start`).\n2.  Log the task configuration.\n3.  Initialize the `Model` and `Dataset` from the configuration.\n4.  Call `model.fit(dataset)`.\n5.  Save the trained model and the configured (but not data-dumped) dataset to the Recorder.\n6.  Generate prediction, backtest, and analysis records.\n\nThe `DelayTrainer` concept is an advanced feature that allows the system to quickly create \"placeholders\" for models in the `train` phase and defer the actual, time-consuming model fitting to the `end_train` phase, often executed in parallel or on a separate cluster.\n\n## Module: qlib/workflow\n\n### Core Responsibility\nThe `qlib/workflow` module is the **Experiment Management and Tracking System** of Qlib. It provides a robust, MLflow-based infrastructure for defining, executing, tracking, and reproducing quantitative research experiments. It manages the lifecycle of experiments and individual runs (Recorders).\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `recorder.py` | `Recorder`, `MLflowRecorder` | Defines the interface for logging a single experiment run. 
It handles logging parameters, metrics, tags, and saving artifacts (models, predictions) to the artifact store, typically backed by MLflow. |\n| `expm.py` | `ExpManager`, `MLflowExpManager` | Manages the collection of experiments. It provides methods to create, get, and search for experiments, and handles the activation/deactivation of the current experiment context. |\n| `exp.py` | `Experiment`, `MLflowExperiment` | Defines the interface for an experiment, which is a collection of runs (Recorders). |\n| `task/manage.py` | `TaskManager`, `run_task` | Provides utilities for managing and executing a set of tasks, often used in conjunction with `Trainer` to orchestrate multi-model training. |\n| `record_temp.py` | `SignalRecord`, `PortAnalysisRecord` | Contains concrete implementations of records that can be generated after a model is trained, such as recording prediction signals or backtest analysis results. |\n\n### Core Implementation Details\nThe workflow system is built around the **MLflow** tracking system, which Qlib abstracts with its own interfaces (`Recorder`, `Experiment`, `ExpManager`). This abstraction allows Qlib to add custom logic (like auto-logging uncommitted code or environment variables) while leveraging MLflow's robust backend for tracking and artifact storage.\n\n**Experiment Lifecycle:** The process begins with `ExpManager.start_exp()`, which sets up the context for a new experiment run and returns an active `Recorder`. The `Trainer` then uses this active `Recorder` to log all training details, model parameters, and save the final model object as an artifact using `Recorder.save_objects()`. This ensures that every step of the quantitative research process is traceable and reproducible.\n\n**Task Management:** The `task` submodule is crucial for defining a quantitative workflow as a configuration dictionary (a \"task\"). This configuration typically includes the `dataset`, `model`, and a list of `record` actions to perform. 
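A typical task dictionary follows qlib's `class`/`module_path`/`kwargs` convention for dynamic instantiation (a schematic example; the exact classes, date ranges, and kwargs vary by experiment):

```python
# Schematic qlib task configuration. Each component is described by its
# class name, module path, and constructor kwargs, and is instantiated
# dynamically (via init_instance_by_config) when the task runs.
task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {"loss": "mse"},
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
    "record": [
        {"class": "SignalRecord", "module_path": "qlib.workflow.record_temp"},
    ],
}

assert set(task) == {"model", "dataset", "record"}
```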
The `run_task` function orchestrates the execution of this configuration, ensuring the model is trained and the necessary records (predictions, backtest reports) are generated and logged.\n\n## Module: qlib/rl\n\n### Core Responsibility\nThe `qlib/rl` module is dedicated to **Reinforcement Learning (RL) applications** within the quantitative finance domain, with a strong focus on **Optimal Order Execution (OOE)**. It provides a specialized RL environment, data integration, and a training framework tailored for financial tasks.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `order_execution/simulator_qlib.py` | `SingleAssetOrderExecution` | Implements the RL environment (simulator) for the Single-Asset Order Execution (SAOE) problem, built on top of the core `qlib/backtest` engine. It translates the backtest loop into an RL step-by-step interaction. |\n| `order_execution/strategy.py` | `SAOEStrategy`, `SAOEStateAdapter` | Defines the RL strategy interface and an adapter to convert the backtest state into an RL state (`SAOEState`) that the agent can observe. |\n| `trainer/trainer.py` | `Trainer` | A sophisticated RL training utility (similar to PyTorch Lightning) that manages the training loop, including collecting policy-env interactions, updating the policy, and handling callbacks and logging. |\n| `trainer/vessel.py` | `TrainingVesselBase` | A container that bundles all necessary RL components (policy, simulator, state/action/reward interpreters) for a specific training task. |\n| `utils/finite_env.py` | `FiniteVectorEnv` | Provides a vectorized environment wrapper, allowing multiple RL episodes (simulations) to run in parallel for efficient data collection. |\n\n### Core Implementation Details\n**RL-Backtest Integration:** The `SingleAssetOrderExecution` class is the bridge between the core Qlib backtest and the RL framework. 
It uses the `collect_data_loop` from `qlib/backtest` as a generator, yielding control back to the RL agent at each time step to receive an action (the amount to deal). The `SAOEStateAdapter` converts the complex backtest state (position, account info) into a simplified `SAOEState` (e.g., time remaining, volume remaining) for the agent.\n\n**Training Framework:** The `Trainer` class is designed for complex RL workflows. It operates in \"collect\" iterations, where the agent interacts with the environment (`FiniteVectorEnv`) to gather experience. It supports:\n*   **Vectorized Environments**: Running multiple simulations concurrently for faster data collection.\n*   **Callbacks**: Hooks for custom logic during training (e.g., checkpointing, early stopping).\n*   **State Management**: `state_dict` and `load_state_dict` methods for saving and resuming the entire training state.\n\n**The Training Vessel:** The `TrainingVesselBase` is a key abstraction, ensuring that all components required for an RL task (policy, environment, reward function, etc.) are correctly configured and passed to the `Trainer`.\n\n## Module: qlib/strategy\n\n### Core Responsibility\nThe `qlib/strategy` module defines the **Strategy Abstraction** and the interface for generating trading decisions. It acts as the brain of the backtesting process, deciding what to buy, sell, and when, based on market data and model predictions.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `base.py` | `BaseStrategy` | The abstract base class for all trading strategies. It provides access to the backtesting infrastructure (`trade_calendar`, `trade_position`, `trade_exchange`) and defines the core method `generate_trade_decision`. |\n| `base.py` | `RLStrategy`, `RLIntStrategy` | Specialized base classes for strategies driven by Reinforcement Learning agents. 
`RLIntStrategy` includes `state_interpreter` and `action_interpreter` to bridge the RL agent's state/action space with the Qlib backtesting environment. |\n\n### Core Implementation Details\n**The Strategy-Executor Loop:** The central contract is the `generate_trade_decision` method in `BaseStrategy`. In each step of the backtest loop, the executor calls this method, passing the result of the previous execution step (`execute_result`). The strategy then uses this information, along with market data (via `trade_exchange`) and its current position (`trade_position`), to generate a new `BaseTradeDecision` (which typically contains a list of `Order` objects).\n\n**Infrastructure Access:** `BaseStrategy` is initialized with `LevelInfrastructure` and `CommonInfrastructure`, giving it access to the current state of the simulation. This is a crucial design choice, as it allows strategies to be context-aware without tightly coupling them to the executor's implementation details.\n\n**RL Integration:** The `RLIntStrategy` demonstrates Qlib's extensibility. It wraps an RL `policy` and uses interpreters to:\n1.  **State Interpretation**: Convert the `execute_result` (raw simulation output) into a state representation (`_interpret_state`) suitable for the RL policy.\n2.  **Action Interpretation**: Convert the RL policy's action (`_action`) into a Qlib-compatible `BaseTradeDecision` (`_trade_decision`).\n\nThis pattern is an example of the **Adapter Pattern**, enabling the integration of external components (RL agents) into the core framework.\n\n## Module: qlib/utils\n\n### Core Responsibility\nThe `qlib/utils` module serves as the **Utility and Infrastructure Layer** for the entire Qlib project. 
It provides essential, non-domain-specific functionalities such as object serialization, dynamic object creation, parallel processing helpers, and time/data manipulation routines.\n\n### Key Files and Functions\n| File | Primary Classes/Functions | Responsibility |\n| :--- | :--- | :--- |\n| `serial.py` | `Serializable` | Defines the base class for all objects that need to be pickled/unpickled. It implements custom logic (`__getstate__`, `__setstate__`) to control which attributes are saved, allowing for selective dumping (e.g., saving model parameters but not large dataframes). |\n| `objm.py` | `ObjManager`, `FileManager` | Provides an abstract interface for object management (saving, loading, listing). `FileManager` is a concrete implementation that uses the local file system for object persistence. |\n| `mod.py` | `init_instance_by_config` | A critical utility for Qlib's configuration-driven design. It dynamically loads and instantiates Python objects (classes) based on a configuration dictionary that specifies the class name and module path. |\n| `paral.py` | `ParallelExt` | Contains utilities for parallelizing tasks, often used in data processing or multi-model training. |\n| `time.py` | `Freq` | Provides utilities for handling time frequencies (e.g., 'day', 'minute') and converting between time formats. |\n\n### Core Implementation Details\n**Configuration-Driven Design:** The `init_instance_by_config` function is a cornerstone of Qlib's architecture. It allows users to define complex workflows (data handlers, models, strategies) entirely through configuration files (e.g., YAML), promoting flexibility and reproducibility without writing custom Python code for every new experiment.\n\n**Serialization Control:** The `Serializable` class is essential for experiment tracking. By overriding `__getstate__`, it implements a policy for attribute dumping: attributes starting with `_` are dropped by default, unless `dump_all` is set or they are explicitly included. 
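This dropping convention can be illustrated with a minimal sketch (a simplified stand-in, not Qlib's actual implementation):

```python
import pickle

class ToySerializable:
    """By default, attributes whose names start with "_" are excluded from
    the pickled state; setting dump_all keeps every attribute."""

    def __init__(self) -> None:
        self.dump_all = False              # when True, keep everything
        self.params = {"lr": 0.1}          # public attribute: always kept
        self._cache = list(range(1000))    # transient attribute: dropped

    def __getstate__(self) -> dict:
        if self.dump_all:
            return dict(self.__dict__)
        # Keep only attributes that do not start with an underscore.
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}

obj = ToySerializable()
restored = pickle.loads(pickle.dumps(obj))
print(hasattr(restored, "params"), hasattr(restored, "_cache"))  # → True False
```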
This prevents large, transient objects (like in-memory dataframes) from being saved with the model, keeping experiment artifacts small and manageable.\n\n**Object Management:** The `ObjManager` abstraction, implemented by `FileManager`, is used by the `Recorder` to save and load artifacts (models, predictions) to the artifact store, ensuring that the persistence layer is modular and potentially swappable.\n\n### Module PlantUML Diagrams\n\n## Module: qlib/backtest\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.backtest\" {\n    abstract class BaseTradeDecision\n    class Order {\n        + stock_id: str\n        + amount: float\n        + direction: OrderDir\n    }\n    BaseTradeDecision <|-- Order\n\n    abstract class BaseExecutor {\n        + time_per_step: str\n        + execute(trade_decision)\n        + collect_data(trade_decision)\n    }\n\n    class NestedExecutor {\n        - sub_executors: List[BaseExecutor]\n    }\n    BaseExecutor <|-- NestedExecutor\n\n    class Exchange {\n        + get_close(code, start, end)\n        + check_stock_suspended(code, start, end)\n    }\n\n    class Account {\n        - current_position: BasePosition\n        - accum_info: AccumulatedInfo\n        + update_order(order, trade_val, cost, trade_price)\n    }\n\n    class AccumulatedInfo {\n        + rtn: float\n        + cost: float\n        + to: float\n    }\n\n    abstract class BasePosition\n    class Position\n    BasePosition <|-- Position\n\n    class PortfolioMetrics\n    class Indicator\n\n    Account o-- BasePosition\n    Account o-- AccumulatedInfo\n    Account o-- PortfolioMetrics\n    Account o-- Indicator\n    BaseExecutor o-- Exchange\n    BaseExecutor o-- Account\n}\n@enduml\n```\n\n## Module: qlib/data\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.data\" {\n    abstract class CalendarProvider {\n        + calendar(start_time, end_time, freq)\n        + locate_index(start_time, 
end_time, freq)\n        # load_calendar(freq, future)\n    }\n\n    abstract class InstrumentProvider {\n        + instruments(market, filter_pipe)\n        # list_instruments(instruments, start_time, end_time, freq)\n    }\n\n    abstract class FeatureProvider {\n        + feature(instrument, field, start_time, end_time, freq)\n    }\n\n    abstract class DataHandlerABC {\n        + fetch(selector, level, col_set, data_key)\n    }\n\n    class DataHandler {\n        - _data: pd.DataFrame\n        - data_loader: DataLoader\n        + setup_data()\n        + fetch(...)\n    }\n\n    abstract class DataLoader {\n        + load(instruments, start_time, end_time)\n    }\n\n    abstract class Processor {\n        + fit(df)\n        + __call__(df)\n        + is_for_infer()\n    }\n\n    class ZScoreNorm {\n        - mean_train\n        - std_train\n        + fit(df)\n        + __call__(df)\n    }\n\n    class CSZScoreNorm {\n        + __call__(df)\n    }\n\n    DataHandlerABC <|-- DataHandler\n    DataLoader <.. 
DataHandler : uses\n    Processor <|-- ZScoreNorm\n    Processor <|-- CSZScoreNorm\n\n    note right of DataHandler::fetch\n    Handles multi-index DataFrame\n    (datetime, instrument)\n    end note\n\n    note right of Processor\n    Feature Engineering\n    and Data Cleaning\n    end note\n}\n@enduml\n```\n\n## Module: qlib/model\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.model\" {\n    abstract class BaseModel {\n        + predict()\n    }\n\n    abstract class Model {\n        + fit(dataset: Dataset, reweighter: Reweighter)\n        + predict(dataset: Dataset, segment)\n    }\n\n    abstract class ModelFT {\n        + finetune(dataset: Dataset)\n    }\n\n    class Trainer {\n        + train(tasks)\n        + end_train(models)\n    }\n\n    class TrainerR {\n        - experiment_name\n        + train(tasks)\n    }\n\n    class DelayTrainerR {\n        + train(tasks)\n        + end_train(models)\n    }\n\n    class StructuredCovarianceEstimator {\n        + fit(data)\n    }\n\n    BaseModel <|-- Model\n    Model <|-- ModelFT\n    Trainer <|-- TrainerR\n    TrainerR <|-- DelayTrainerR\n\n    note right of Model::fit\n    Takes Dataset for features/labels\n    and Reweighter for sample weights\n    end note\n\n    note right of TrainerR\n    Integrates with Qlib Recorder (R)\n    for experiment tracking\n    end note\n\n    Model ..> Dataset : uses\n    TrainerR ..> Recorder : manages\n}\n@enduml\n```\n\n## Module: qlib/workflow\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.workflow\" {\n    abstract class Experiment {\n        + create_recorder()\n        + search_records()\n    }\n\n    class MLflowExperiment {\n    }\n\n    abstract class Recorder {\n        + start_run()\n        + end_run()\n        + log_params()\n        + save_objects()\n    }\n\n    class MLflowRecorder {\n    }\n\n    abstract class ExpManager {\n        + start_exp()\n        + end_exp()\n        + 
get_exp()\n    }\n\n    class MLflowExpManager {\n    }\n\n    class TaskManager {\n        + run_task()\n    }\n\n    Experiment <|-- MLflowExperiment\n    Recorder <|-- MLflowRecorder\n    ExpManager <|-- MLflowExpManager\n\n    MLflowExpManager o-- MLflowExperiment : manages\n    MLflowExperiment o-- MLflowRecorder : contains\n\n    note right of MLflowRecorder\n    Wraps MLflow Run\n    for experiment tracking\n    end note\n}\n@enduml\n```\n\n## Module: qlib/rl\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.rl\" {\n    abstract class Simulator {\n        + reset()\n        + step(action)\n        + get_state()\n        + done()\n    }\n\n    class SingleAssetOrderExecution {\n        - _executor: NestedExecutor\n        - _collect_data_loop: Generator\n        + twap_price\n    }\n\n    class SAOEStateAdapter {\n        + saoe_state: SAOEState\n    }\n\n    class Trainer {\n        + fit(vessel: TrainingVesselBase)\n        + test(vessel: TrainingVesselBase)\n        + venv_from_iterator(iterator)\n    }\n\n    abstract class TrainingVesselBase {\n        + train(env)\n        + validate(env)\n        + test(env)\n    }\n\n    class FiniteVectorEnv {\n    }\n\n    Simulator <|-- SingleAssetOrderExecution\n    SingleAssetOrderExecution ..> SAOEStateAdapter : uses\n    Trainer ..> TrainingVesselBase : trains\n    Trainer ..> FiniteVectorEnv : uses\n\n    note right of SingleAssetOrderExecution\n    Bridge between Qlib Backtest\n    and RL Environment\n    end note\n}\n@enduml\n```\n\n## Module: qlib/strategy\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.strategy\" {\n    abstract class BaseStrategy {\n        + trade_calendar: TradeCalendarManager\n        + trade_position: BasePosition\n        + trade_exchange: Exchange\n        + generate_trade_decision(execute_result)\n        + reset(level_infra, common_infra)\n    }\n\n    abstract class RLStrategy {\n        - policy\n    
}\n\n    abstract class RLIntStrategy {\n        - state_interpreter: StateInterpreter\n        - action_interpreter: ActionInterpreter\n        + generate_trade_decision(execute_result)\n    }\n\n    BaseStrategy <|-- RLStrategy\n    RLStrategy <|-- RLIntStrategy\n\n    RLIntStrategy ..> StateInterpreter : uses\n    RLIntStrategy ..> ActionInterpreter : uses\n    BaseStrategy ..> BaseTradeDecision : returns\n    BaseStrategy ..> LevelInfrastructure : uses\n    BaseStrategy ..> CommonInfrastructure : uses\n}\n@enduml\n```\n\n## Module: qlib/utils\n\n```plantuml\n@startuml\nskinparam classAttributeIconVisible false\n\npackage \"qlib.utils\" {\n    abstract class Serializable {\n        + dump_all: bool\n        + to_pickle(path)\n        + load(filepath)\n        # __getstate__()\n    }\n\n    abstract class ObjManager {\n        + save_obj(obj, name)\n        + load_obj(name)\n    }\n\n    class FileManager {\n        - path: Path\n    }\n\n    class ModuleUtils {\n        + init_instance_by_config(config)\n    }\n\n    class ParallelExt {\n        + run_parallel(func, args)\n    }\n\n    Serializable <|-- ObjManager\n    ObjManager <|-- FileManager\n\n    note right of Serializable\n    Custom serialization logic\n    to control which attributes are saved\n    end note\n\n    note right of ModuleUtils\n    Dynamic object instantiation\n    from configuration\n    end note\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe Qlib architecture is built on a **Configuration-Driven, Modular, and Extensible** design philosophy, centered around four core abstractions: **Data, Model, Strategy, and Workflow**.\n\n### Core Abstractions\n1.  **The Dataset/DataHandler Abstraction (`qlib/data`)**: This abstraction is responsible for providing a unified, time-series-aware view of the financial data. 
The `DataHandler` manages a multi-index DataFrame (indexed by `datetime` and `instrument`), and the `Processor` classes enforce a strict separation between feature engineering and model training to prevent look-ahead bias.\n2.  **The Model Abstraction (`qlib/model`)**: The `Model` interface (`Model.fit`, `Model.predict`) decouples the learning algorithm from the data source and the execution environment. This allows for easy integration of various machine learning models (e.g., LightGBM, PyTorch models) into the Qlib ecosystem.\n3.  **The Strategy Abstraction (`qlib/strategy`)**: The `BaseStrategy` defines the decision-making logic, which is separated from the execution mechanics. This allows researchers to focus purely on alpha generation logic, while the `backtest` module handles the complex simulation details.\n4.  **The Workflow/Recorder Abstraction (`qlib/workflow`)**: This abstraction, primarily implemented by the `Recorder` and `ExpManager`, manages the entire lifecycle of a quantitative experiment. 
It ensures **reproducibility** by logging all parameters, metrics, and artifacts (models, predictions) to a persistent store (MLflow), making it possible to trace every result back to its exact configuration and trained model.\n\n### Design Philosophy\nThe primary design intention is to create an **end-to-end platform for quantitative research** that is both **flexible** and **rigorous**.\n*   **Flexibility through Configuration**: The heavy reliance on `init_instance_by_config` (from `qlib/utils/mod.py`) allows users to define complex pipelines entirely through YAML configuration, enabling rapid experimentation and component swapping.\n*   **Rigour through Abstraction**: Strict interfaces (e.g., `Processor` for data cleaning, `BaseStrategy` for decision-making) enforce best practices, such as preventing data leakage and ensuring a clean separation of concerns.\n*   **Extensibility**: The use of abstract base classes (e.g., `BaseExecutor`, `BaseStrategy`, `BaseModel`) and the `contrib` module encourages community contributions and the integration of new algorithms or data sources.\n\n### Lifecycle Management\nThe typical Qlib lifecycle is managed by the `qlib/workflow` module:\n1.  **Configuration**: A task is defined in a configuration file (YAML/Dict).\n2.  **Training**: The `Trainer` initializes the `Dataset` and `Model`, calls `model.fit()`, and logs the results to the `Recorder`.\n3.  **Backtesting**: The trained `Model` is used by a `Strategy` inside the `Backtest` loop to generate trading decisions.\n4.  **Reporting**: The `Backtest` module generates detailed `PortfolioMetrics` and `Indicator` reports, which are then logged by the `Recorder`.\n5.  **Deployment/Inference**: The saved `Model` and `Dataset` configuration can be loaded for online inference, often utilizing the `DelayTrainer` concept for parallel or distributed execution.\n\n#### 3.1.2. 
Component Interactions\n\nThe Qlib system is a tightly integrated pipeline where data flows sequentially from preparation to simulation and finally to reporting.\n\n### Key Interaction Flows\n\n| Source Module | Target Module | Interaction Description |\n| :--- | :--- | :--- |\n| **`qlib/data`** | **`qlib/model`** | **Data Provision for Training**: The `Dataset` object (managed by `DataHandler`) is passed to `Model.fit()`. The `Dataset.prepare()` method provides the model with processed features and labels, ensuring the data is correctly segmented (train/valid/test) and free of look-ahead bias. |\n| **`qlib/model`** | **`qlib/strategy`** | **Prediction Generation**: A trained `Model` is often used within a concrete `BaseStrategy` implementation. The strategy calls `model.predict()` on the current market data slice to generate a prediction signal (e.g., stock scores) which informs the trading decision. |\n| **`qlib/strategy`** | **`qlib/backtest`** | **Decision-Execution Loop**: The `BaseStrategy.generate_trade_decision()` method is called by the `BaseExecutor` in the `backtest` module. The strategy returns a `BaseTradeDecision` (containing `Order` objects), which the `Executor` then attempts to fulfill via the `Exchange`. |\n| **`qlib/backtest`** | **`qlib/backtest`** | **Nested Execution**: The `NestedExecutor` is a critical internal pattern. An outer (e.g., daily) executor passes its decision to an inner (e.g., minute) executor, which breaks down the trade into smaller, higher-frequency steps. This allows for multi-frequency trading simulation. |\n| **`qlib/backtest`** | **`qlib/workflow`** | **Result Logging**: At the end of the backtest, the `Account` and `Report` objects generate `PortfolioMetrics` and `Indicator` data. This data is passed to the `Recorder` to be logged as artifacts and metrics, completing the experiment loop. 
|\n| **`qlib/utils`** | **All Modules** | **Infrastructure Services**: The `qlib/utils` module provides core services like dynamic object creation (`init_instance_by_config`) and controlled object persistence (`Serializable`), which are used ubiquitously across all other modules to maintain the configuration-driven and reproducible nature of the platform. |\n\n### Data Flow\n1.  **Raw Data**: Loaded by `DataLoader` -> Stored in `DataHandler` (Multi-Index DataFrame).\n2.  **Processed Data**: `DataHandler` applies `Processor` pipeline (e.g., `ZScoreNorm`) -> `Dataset` segments the data.\n3.  **Model Input**: `Dataset` provides `(X_train, y_train)` to `Model.fit()`.\n4.  **Signal**: Trained `Model` generates prediction scores (signals).\n5.  **Order**: `Strategy` converts signals into `Order` objects.\n6.  **Execution**: `Executor` processes `Order` via `Exchange` -> updates `Account` and `Position`.\n7.  **Metrics**: `Account` generates `PortfolioMetrics` -> `Recorder` logs results.\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml\nskinparam componentStyle rectangle\nskinparam defaultFontName Courier\nskinparam monochrome true\n\ntitle Qlib Overall Architecture\n\npackage \"qlib.utils\" as Utils {\n    [Serializable]\n    [init_instance_by_config]\n}\n\npackage \"qlib.data\" as Data {\n    [DataHandler]\n    [Processor]\n    [Dataset]\n}\n\npackage \"qlib.model\" as Model {\n    [Model]\n    [Trainer]\n}\n\npackage \"qlib.strategy\" as Strategy {\n    [BaseStrategy]\n    [RLIntStrategy]\n}\n\npackage \"qlib.backtest\" as Backtest {\n    [BaseExecutor]\n    [NestedExecutor]\n    [Account]\n    [Exchange]\n}\n\npackage \"qlib.workflow\" as Workflow {\n    [ExpManager]\n    [Recorder]\n}\n\npackage \"qlib.rl\" as RL {\n    [SAOE Simulator]\n}\n\n' Dependencies and Data Flow\nData --> Model : Provides (Dataset)\nModel --> Strategy : Provides (Prediction Signal)\nStrategy --> Backtest : Provides (Trade Decision/Order)\nBacktest --> Data : Requests (Market Data via Exchange)\nBacktest --> Workflow : Logs (Metrics/Reports)\nModel --> Workflow : Logs (Model Artifacts)\nWorkflow ..> Utils : Uses (Serialization)\nData ..> Utils : Uses (Dynamic Config)\nModel ..> Utils : Uses (Dynamic Config)\nStrategy ..> Utils : Uses (Dynamic Config)\nBacktest ..> Utils : Uses (Dynamic Config)\n\n' Specialized Flows\nRL --> Strategy : Implements (RLIntStrategy)\nRL --> Backtest : Uses (SAOE Simulator wraps Executor)\n\n' Key Abstractions\n[DataHandler] .up.|> [Dataset]\n[Model] .up.|> [Trainer]\n[BaseStrategy] .up.|> [BaseExecutor]\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nQlib extensively uses several fundamental design patterns to achieve its modularity and flexibility.\n\n1.  **Strategy Pattern**:\n    *   **Implementation**: The `BaseStrategy` class in `qlib/strategy` defines the interface for an algorithm (the trading logic). 
Concrete strategies (e.g., `SingleOrderStrategy`) implement this interface. The `BaseExecutor` holds a reference to a `BaseStrategy` and calls its `generate_trade_decision` method, allowing the execution logic to be independent of the decision-making logic.\n    *   **Example**: Different trading strategies can be swapped out simply by changing the configuration passed to the backtest.\n\n2.  **Factory Method / Abstract Factory Pattern (via Configuration)**:\n    *   **Implementation**: The `init_instance_by_config` utility in `qlib/utils/mod.py` acts as a generic factory. It takes a configuration dictionary (which specifies the class path and keyword arguments) and dynamically creates an instance of that class. This pattern is used to instantiate `DataHandler`s, `Model`s, `Strategy`s, and `Executor`s.\n    *   **Benefit**: This is the core mechanism for Qlib's configuration-driven workflow, allowing the system to be assembled from components defined in a YAML file.\n\n3.  **Adapter Pattern**:\n    *   **Implementation**: The `RLIntStrategy` in `qlib/strategy` and the `SAOEStateAdapter` in `qlib/rl` are prime examples. They adapt the complex, internal state of the Qlib backtesting engine into the simplified state/action space required by an external RL agent (policy).\n    *   **Example**: The `SAOEStateAdapter` converts the backtest's `Account` and `Exchange` data into a simple `SAOEState` object for the RL agent.\n\n4.  **Template Method Pattern**:\n    *   **Implementation**: The `BaseExecutor` in `qlib/backtest` defines the skeleton of the backtesting algorithm (`collect_data`), but defers the specific execution details to abstract methods like `_collect_data` which are implemented by concrete executors (e.g., `NestedExecutor`). Similarly, `BaseModel` defines the `fit`/`predict` template.\n\n5.  
**Decorator Pattern (Implicit)**:\n    *   **Implementation**: The `Processor` classes in `qlib/data/dataset/processor.py` wrap the raw data (`DataFrame`) and add new behavior (normalization, cleaning) before passing it to the next stage. A pipeline of processors effectively decorates the data.\n\n#### 3.3.2. Project Highlights\n\n*   **Nested Decision Execution**: This highly innovative feature addresses a critical need in quantitative finance: simulating multi-frequency strategies. The `NestedExecutor` allows a high-level strategy (e.g., daily rebalancing) to delegate execution to a lower-level strategy (e.g., minute-level optimal execution), providing a more realistic and powerful simulation environment.\n*   **Look-Ahead Bias Prevention**: The explicit `fit` method in `Processor` and the segmentation logic in `Dataset` are designed to prevent data leakage. By forcing the normalization parameters (mean, std) to be learned only on the training set and then applied to the test set, Qlib enforces scientific rigor in the research process.\n*   **MLflow-Based Reproducibility**: By abstracting MLflow into the `qlib/workflow` module, Qlib provides first-class support for experiment tracking. Every model, prediction, and backtest result is automatically logged with its full configuration, ensuring that research findings are fully reproducible and auditable.\n*   **RL-Backtest Integration**: The dedicated `qlib/rl` module, which seamlessly integrates the backtesting engine with a modern RL training framework (e.g., PyTorch-based policies), positions Qlib as a cutting-edge platform for research in areas like Optimal Order Execution (OOE) and portfolio management using reinforcement learning.\n*   **Controlled Serialization (`Serializable`)**: The ability to selectively save object attributes is a key flexibility feature. 
It allows large, memory-intensive objects (like raw data) to be excluded from the model artifact, making it practical to save and share thousands of trained models without excessive storage overhead.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nThe Qlib architecture is robust, but a few areas present opportunities for optimization and enhanced clarity.\n\n### Performance Bottlenecks\nThe most significant performance challenge stems from the heavy reliance on **Pandas DataFrames** for all data handling within `qlib/data`. While Pandas is flexible, its memory footprint and performance can degrade substantially with high-frequency or large-scale datasets. Specifically, cross-sectional operations in `qlib/data/dataset/processor.py`, which often involve grouping and applying functions across time steps, can be slow.\n*   **Suggestion**: Investigate integrating high-performance data libraries like **Polars** or **Apache Arrow** for the core data storage and manipulation layers, especially for the multi-index DataFrames in `DataHandler`. This could drastically reduce memory usage and accelerate data processing.\n\n### Architecture Optimization\nThe tight coupling between the `Trainer` in `qlib/model` and the `qlib.workflow.R` (Recorder) abstraction, which is itself a wrapper around MLflow, limits flexibility.\n*   **Suggestion**: Introduce a more generic **Experiment Tracking Interface** that sits between the Qlib components and the MLflow-specific `MLflowRecorder`. This would allow users to easily plug in alternative experiment tracking systems (e.g., Weights & Biases, custom database) without modifying the core `Trainer` logic.\n\n### Code Quality and Clarity\nThe serialization logic in `qlib/utils/serial.py` is overly complex. The `Serializable` class uses a set of implicit rules (`_is_kept`) to determine which attributes to dump, which can be confusing and lead to subtle bugs.\n*   **Suggestion**: Simplify the serialization mechanism. 
Instead of relying on name-based conventions (e.g., attributes starting with `_`), adopt a more explicit approach using Python's built-in `dataclasses` or `attrs` library with explicit field metadata to control serialization. Additionally, the \"trick\" of using a shallow copy of `Account` in `NestedExecutor` should be replaced with a more explicit and well-documented pattern to reduce the risk of future maintenance errors.\n\n#### 3.4.2. Secondary Development Guide\n\nThe Qlib framework is designed for extensibility, primarily through its configuration-driven architecture and well-defined abstract base classes. Secondary development should follow these best practices:\n\n1.  **Understand the Configuration Flow**: All core components are instantiated via `qlib.utils.init_instance_by_config`. To understand a workflow, start by examining the YAML configuration file and tracing how the components (DataHandler, Model, Strategy, Executor) are initialized and linked together.\n\n2.  **Adding a New Model**:\n    *   Inherit from `qlib.model.base.Model`.\n    *   Implement the `fit(dataset)` method, which takes a `Dataset` object. Use `dataset.prepare()` to get the training features and labels (`x_train`, `y_train`) as Pandas DataFrames.\n    *   Implement the `predict(dataset, segment)` method, which should return a Pandas Series or DataFrame of prediction scores.\n\n3.  **Adding a New Strategy**:\n    *   Inherit from `qlib.strategy.base.BaseStrategy`.\n    *   Implement the core method `generate_trade_decision(execute_result)`. This method is called at each time step of the backtest.\n    *   Use the inherited properties like `self.trade_exchange` to fetch current market data and `self.trade_position` to check current holdings. The method must return a `BaseTradeDecision` (typically a list of `Order` objects).\n\n4.  **Debugging and Reproducibility**:\n    *   Leverage the `qlib.workflow.R` (Recorder) system. 
All parameters and artifacts are logged.\n    *   To debug a specific run, use the Recorder's API to load the exact model and task configuration that produced the result, ensuring full reproducibility of the environment.\n    *   For backtesting issues, focus on the interaction between the `generate_trade_decision` method in your strategy and the `execute_result` returned by the executor.\n\n"
  },
  {
    "path": "thirdparty/vnpy.md",
    "content": "# vnpy - In-Depth Source Code Analysis\n\n## Phase 1: Global Scan & Planning\n\n### 1.1. Full Directory Structure\n\n```\nThe project structure is highly modular, separating the core trading logic, event handling, remote communication, and visualization into distinct top-level packages under `/home/ubuntu/vnpy/vnpy`.\n\n```\n/home/ubuntu/vnpy\n├── vnpy/\n│   ├── __init__.py\n│   ├── alpha/             # Alpha research and strategy development tools (Excluded from core analysis)\n│   ├── chart/             # **Core Module: Data Visualization**\n│   │   ├── __init__.py\n│   │   ├── axis.py        # Custom axis for PyQtGraph\n│   │   ├── base.py        # Chart constants and utilities\n│   │   ├── item.py        # Abstract and concrete chart items (CandleItem, VolumeItem)\n│   │   ├── manager.py     # Bar data management and indexing (BarManager)\n│   │   └── widget.py      # Main chart widget and cursor logic (ChartWidget, ChartCursor)\n│   ├── event/             # **Core Module: Event-Driven Architecture**\n│   │   ├── __init__.py\n│   │   └── engine.py      # Event class and asynchronous EventEngine implementation\n│   ├── rpc/               # **Core Module: Remote Procedure Call**\n│   │   ├── __init__.py\n│   │   ├── client.py      # ZeroMQ-based RPC client implementation (REQ/SUB)\n│   │   ├── common.py      # RPC constants (Heartbeat)\n│   │   └── server.py      # ZeroMQ-based RPC server implementation (REP/PUB)\n│   └── trader/            # **Core Module: Trading Engine and Data Model**\n│       ├── __init__.py\n│       ├── app.py         # Base class for application modules (BaseApp)\n│       ├── constant.py    # Trading enums (Direction, Exchange, Status)\n│       ├── database.py    # Database interface (Abstract)\n│       ├── engine.py      # MainEngine, OmsEngine, LogEngine, EmailEngine\n│       ├── event.py       # Trading event constants (EVENT_TICK, EVENT_ORDER)\n│       ├── gateway.py     # Abstract gateway interface (BaseGateway)\n│     
  ├── logger.py      # Logging utility\n│       ├── object.py      # Core data objects (TickData, OrderData, ContractData)\n│       ├── setting.py     # Configuration management\n│       ├── ui/            # User interface components (Qt-based widgets)\n│       └── utility.py     # General utility functions\n```\n\nThe structure clearly delineates responsibilities: `trader` holds the core business logic and data model, `event` provides the architectural backbone, `rpc` enables distributed scaling, and `chart` handles visualization. This modularity is key to the framework's extensibility. Folders like `alpha`, `locale`, and `ui` are present but contain non-core or localized components, while the four identified modules form the essential, language-agnostic core of the trading system.\n\n### 1.2. Core Folders for Analysis\n\n*   `/home/ubuntu/vnpy/vnpy/trader`: The core trading engine, defining data models, the main application orchestrator (`MainEngine`), and the gateway interface (`BaseGateway`).\n*   `/home/ubuntu/vnpy/vnpy/event`: The event-driven core, implementing the central message bus (`EventEngine`) for decoupled communication.\n*   `/home/ubuntu/vnpy/vnpy/rpc`: The remote procedure call module, enabling distributed deployment using ZeroMQ for synchronous function calls and asynchronous data streaming.\n*   `/home/ubuntu/vnpy/vnpy/chart`: The data visualization module, providing optimized charting components for displaying market data.\n\n## Phase 2: Module-by-Module Deep Analysis\n\nThe vn.py framework is structured around a highly decoupled, event-driven architecture, with core functionality segregated into distinct modules. The primary modules analyzed are `vnpy.trader`, `vnpy.event`, `vnpy.rpc`, and `vnpy.chart`.\n\n### 1. 
vnpy.trader: The Core Trading Engine\n\nThe `trader` module is the central nervous system of the framework, defining the fundamental data model, the core application logic, and the interface for external connectivity.\n\n| File | Core Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `object.py` | **Data Model Definition** | `BaseData`, `TickData`, `BarData`, `OrderData`, `ContractData`, `OrderRequest`, `CancelRequest` |\n| `constant.py` | **Trading Constants** | `Direction`, `Exchange`, `Status`, `OrderType`, `Product` (Enums) |\n| `gateway.py` | **External Interface** | `BaseGateway` (Abstract Class), `on_tick`, `send_order` |\n| `engine.py` | **Application Orchestration** | `MainEngine`, `BaseEngine` (Abstract), `OmsEngine`, `LogEngine`, `EmailEngine` |\n\n#### Core Implementation Details\n\n*   **Data Structures (`object.py`)**: All trading data objects are defined as Python `dataclass`es inheriting from `BaseData`. This ensures clear definition of data fields. The use of `vt_symbol`, `vt_orderid`, etc., (e.g., `self.vt_symbol: str = f\"{self.symbol}.{self.exchange.value}\"`) is a key abstraction for creating globally unique identifiers across different gateways.\n*   **Main Engine (`engine.py`)**: The `MainEngine` acts as a **Service Locator** and **Facade**. It manages a collection of `BaseGateway` instances (for connectivity) and `BaseEngine` instances (for functionality like logging, OMS, etc.). It delegates high-level trading operations (e.g., `send_order`) to the appropriate gateway.\n*   **Order Management System (`OmsEngine`)**: This engine is responsible for maintaining the current state of all trading objects (ticks, orders, positions, etc.). It registers handlers for all incoming events (`EVENT_TICK`, `EVENT_ORDER`, etc.) and updates its internal dictionaries (`self.ticks`, `self.orders`). 
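In sketch form, the caching logic can be condensed as follows (a hypothetical, heavily simplified stand-in for illustration, not the actual vnpy classes):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Event:
    type: str
    data: Any

# Simplified stand-in for OmsEngine's event-driven caching (illustrative only).
class MiniOms:
    def __init__(self) -> None:
        self.ticks: dict[str, Any] = {}
        self.orders: dict[str, Any] = {}
        self.active_orders: dict[str, Any] = {}

    def process_tick_event(self, event: Event) -> None:
        tick = event.data
        self.ticks[tick.vt_symbol] = tick  # keep only the latest tick per instrument

    def process_order_event(self, event: Event) -> None:
        order = event.data
        self.orders[order.vt_orderid] = order
        # Maintain a separate view of orders that are still working.
        if order.is_active():
            self.active_orders[order.vt_orderid] = order
        else:
            self.active_orders.pop(order.vt_orderid, None)

    def get_tick(self, vt_symbol: str) -> Any:
        return self.ticks.get(vt_symbol)

    def get_all_active_orders(self) -> list:
        return list(self.active_orders.values())
```

Consumers query these getters instead of keeping their own copies of market or order state.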
This implements the **Repository** pattern, providing a single source of truth for all trading data via methods like `get_tick` and `get_all_active_orders`.\n*   **Gateway Interface (`gateway.py`)**: The `BaseGateway` is an abstract class that defines the mandatory interface for connecting to any trading system. It enforces a callback mechanism (`on_tick`, `on_order`, etc.) which gateways must use to push data back to the `MainEngine` via the `EventEngine`. This is a clear application of the **Adapter** pattern, allowing various vendor APIs to conform to a single internal standard.\n\n### 2. vnpy.event: The Event-Driven Core\n\nThe `event` module provides the foundational event-driven mechanism that decouples all components in the framework.\n\n| File | Core Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `engine.py` | **Event Bus Implementation** | `Event`, `EventEngine`, `EVENT_TIMER`, `register`, `put` |\n\n#### Core Implementation Details\n\n*   **Event Class**: A simple container with `type` (string identifier) and `data` (the payload, e.g., `TickData`).\n*   **EventEngine**: Implements a classic **Publisher-Subscriber (Pub/Sub)** pattern.\n    *   **Producer-Consumer**: It uses a `Queue` (`self._queue`) and a dedicated `_thread` (`self._thread`) to process events asynchronously, ensuring that event generation (e.g., from a gateway) does not block event processing (e.g., by a strategy).\n    *   **Dispatching**: The `_process` method dispatches the `Event` to type-specific handlers (`self._handlers`) and general handlers (`self._general_handlers`).\n    *   **Timer**: A separate `_timer` thread generates periodic `EVENT_TIMER` events, crucial for time-based operations like strategy execution or heartbeat checks.\n\n### 3. 
vnpy.rpc: Remote Procedure Call\n\nThe `rpc` module enables distributed deployment by providing a robust inter-process communication layer based on ZeroMQ.\n\n| File | Core Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `client.py` | **RPC Client** | `RpcClient`, `RemoteException`, `__getattr__` |\n| `server.py` | **RPC Server** | `RpcServer`, `register`, `publish` |\n| `common.py` | **Shared Constants** | `HEARTBEAT_TOPIC`, `HEARTBEAT_INTERVAL`, `HEARTBEAT_TOLERANCE` |\n\n#### Core Implementation Details\n\n*   **Hybrid Communication**: Uses ZeroMQ's `REQ-REP` pattern for synchronous RPC calls (e.g., `RpcClient` calling a function on `RpcServer`) and `PUB-SUB` for asynchronous data streaming (e.g., `RpcServer` publishing market data to `RpcClient`).\n*   **Dynamic Proxy (`RpcClient`)**: The `RpcClient` uses Python's magic method `__getattr__` to dynamically create remote call functions. When a method is called on the client, it serializes the function name and arguments, sends them over the `REQ` socket, and waits for the response.\n*   **Heartbeat**: The `RpcServer` periodically publishes a heartbeat on `HEARTBEAT_TOPIC`, which the `RpcClient` monitors to detect disconnections and call `on_disconnected`.\n\n### 4. vnpy.chart: Data Visualization\n\nThe `chart` module provides the graphical components for displaying market data, built on the PyQtGraph library.\n\n| File | Core Responsibility | Key Classes/Functions |\n| :--- | :--- | :--- |\n| `manager.py` | **Data Management** | `BarManager`, `update_history`, `get_price_range` |\n| `item.py` | **Chart Elements** | `ChartItem` (Abstract), `CandleItem`, `VolumeItem` |\n| `widget.py` | **Main Chart View** | `ChartWidget`, `ChartCursor`, `add_plot`, `add_item` |\n\n#### Core Implementation Details\n\n*   **BarManager**: Responsible for storing and indexing `BarData` objects. 
It manages the mapping between `datetime` and integer index (`self._datetime_index_map`, `self._index_datetime_map`), which is crucial for the x-axis plotting in PyQtGraph. It also caches price and volume ranges for efficient redrawing.\n*   **ChartItem**: Abstract base class for all plottable elements. It uses `QPicture` for optimized drawing of bars, implementing a **Flyweight**-like pattern where each bar's drawing is cached. `CandleItem` and `VolumeItem` inherit from this to implement specific drawing logic.\n*   **ChartWidget**: The main container, inheriting from `pg.PlotWidget`. It manages multiple `pg.PlotItem`s (plots) and `ChartItem`s (data series). It handles user interaction (keyboard/mouse for zooming and panning) and ensures that all plots are linked on the x-axis. The `ChartCursor` is responsible for displaying crosshair and data information.\n\n### Module PlantUML Diagrams\n\n### Module: vnpy.trader\n\n```plantuml\n@startuml vnpy.trader\nskinparam classAttributeIconVisible false\n\npackage vnpy.trader {\n\n    abstract class BaseData {\n        + gateway_name: str\n        + vt_symbol: str\n    }\n\n    class TickData {\n        + symbol: str\n        + exchange: Exchange\n        + datetime: Datetime\n        + last_price: float\n        + bid_price_1: float\n        + ask_price_1: float\n    }\n\n    class BarData {\n        + symbol: str\n        + exchange: Exchange\n        + datetime: Datetime\n        + open_price: float\n        + high_price: float\n        + low_price: float\n        + close_price: float\n    }\n\n    class OrderData {\n        + orderid: str\n        + direction: Direction\n        + status: Status\n        + is_active(): bool\n        + create_cancel_request(): CancelRequest\n    }\n\n    class TradeData {\n        + tradeid: str\n        + orderid: str\n        + price: float\n        + volume: float\n    }\n\n    class ContractData {\n        + name: str\n        + product: Product\n        + size: float\n    }\n\n    
class LogData {\n        + msg: str\n        + level: int\n    }\n\n    class OrderRequest {\n        + create_order_data(orderid, gateway_name): OrderData\n    }\n\n    class CancelRequest {\n        + orderid: str\n    }\n\n    abstract class BaseEngine {\n        + __init__(main_engine, event_engine, name)\n        + close()\n    }\n\n    class MainEngine {\n        + event_engine: EventEngine\n        - gateways: dict<str, BaseGateway>\n        - engines: dict<str, BaseEngine>\n        + add_gateway(gateway_class)\n        + add_engine(engine_class)\n        + send_order(req, gateway_name): str\n        + subscribe(req, gateway_name)\n        + write_log(msg, source)\n    }\n\n    class OmsEngine {\n        - ticks: dict<str, TickData>\n        - orders: dict<str, OrderData>\n        + process_order_event(event)\n        + get_tick(vt_symbol): TickData\n        + get_all_active_orders(): list<OrderData>\n    }\n\n    abstract class BaseGateway {\n        + default_name: str\n        + exchanges: list<Exchange>\n        + connect(setting)\n        + close()\n        + subscribe(req)\n        + send_order(req): str\n        + on_tick(tick: TickData)\n        + on_order(order: OrderData)\n    }\n\n    BaseData <|-- TickData\n    BaseData <|-- BarData\n    BaseData <|-- OrderData\n    BaseData <|-- TradeData\n    BaseData <|-- ContractData\n    BaseData <|-- LogData\n\n    BaseEngine <|-- OmsEngine\n    BaseEngine <|-- LogEngine\n    BaseEngine <|-- EmailEngine\n\n    MainEngine o-- BaseGateway : manages\n    MainEngine o-- BaseEngine : manages\n    MainEngine ..> OmsEngine : delegates data access\n    BaseGateway ..> BaseData : uses\n    BaseGateway ..> OrderRequest : accepts\n    BaseGateway ..> CancelRequest : accepts\n    BaseGateway ..> LogData : generates\n}\n@enduml\n```\n\n### Module: vnpy.event\n\n```plantuml\n@startuml vnpy.event\nskinparam classAttributeIconVisible false\n\npackage vnpy.event {\n\n    class Event {\n        + type: str\n        + data: Any\n  
  }\n\n    class EventEngine {\n        - _queue: Queue\n        - _thread: Thread\n        - _timer: Thread\n        - _handlers: defaultdict<str, list<HandlerType>>\n        + start()\n        + stop()\n        + put(event: Event)\n        + register(type, handler)\n        - _run()\n        - _process(event)\n    }\n\n    EventEngine \"1\" o-- \"0..*\" Event : processes\n    EventEngine \"1\" o-- \"0..*\" HandlerType : dispatches to\n}\n@enduml\n```\n\n### Module: vnpy.rpc\n\n```plantuml\n@startuml vnpy.rpc\nskinparam classAttributeIconVisible false\n\npackage vnpy.rpc {\n\n    class RemoteException {\n        + __init__(value)\n    }\n\n    class RpcClient {\n        - _socket_req: zmq.Socket (REQ)\n        - _socket_sub: zmq.Socket (SUB)\n        + start(req_address, sub_address)\n        + stop()\n        + __getattr__(name): dorpc()\n        + subscribe_topic(topic)\n        + on_disconnected()\n        + callback(topic, data)\n    }\n\n    class RpcServer {\n        - _socket_rep: zmq.Socket (REP)\n        - _socket_pub: zmq.Socket (PUB)\n        - _functions: dict<str, Callable>\n        + start(rep_address, pub_address)\n        + stop()\n        + register(func)\n        + publish(topic, data)\n        + check_heartbeat()\n    }\n\n    RpcClient ..> RpcServer : calls remote function\n    RpcServer .> RpcClient : publishes data\n}\n@enduml\n```\n\n### Module: vnpy.chart\n\n```plantuml\n@startuml vnpy.chart\nskinparam classAttributeIconVisible false\n\npackage vnpy.chart {\n\n    class BarManager {\n        - _bars: dict<datetime, BarData>\n        - _datetime_index_map: dict<datetime, int>\n        + update_history(history)\n        + update_bar(bar)\n        + get_price_range(min_ix, max_ix)\n        + get_bar(ix): BarData\n    }\n\n    abstract class ChartItem {\n        - _manager: BarManager\n        - _bar_picutures: dict<int, QPicture>\n        + update_history(history)\n        + update_bar(bar)\n        + paint(painter, opt, w)\n        + {abstract} 
_draw_bar_picture(ix, bar): QPicture\n        + {abstract} get_y_range(min_ix, max_ix)\n    }\n\n    class CandleItem {\n        + _draw_bar_picture(ix, bar): QPicture\n    }\n\n    class VolumeItem {\n        + _draw_bar_picture(ix, bar): QPicture\n    }\n\n    class ChartWidget {\n        - _manager: BarManager\n        - _plots: dict<str, PlotItem>\n        - _items: dict<str, ChartItem>\n        - _cursor: ChartCursor\n        + add_plot(name, height)\n        + add_item(item_class, name, plot_name)\n        + update_history(history)\n        + move_to_right()\n        - _update_y_range()\n    }\n\n    class ChartCursor {\n        - _widget: ChartWidget\n        + update_info()\n    }\n\n    ChartItem <|-- CandleItem\n    ChartItem <|-- VolumeItem\n\n    ChartWidget o-- BarManager : uses\n    ChartWidget o-- ChartItem : contains\n    ChartWidget o-- ChartCursor : contains\n    ChartItem o-- BarManager : uses\n}\n@enduml\n```\n\n## Phase 3: Overall Architecture & Summary\n\n### 3.1. Overall Architecture Analysis\n\n#### 3.1.1. Core Abstractions\n\nThe vn.py framework is built upon a robust **Event-Driven Architecture (EDA)**, which serves as the core design philosophy, ensuring high decoupling, extensibility, and responsiveness.\n\nThe architecture is defined by five primary abstractions:\n\n1.  **Event (`vnpy.event.Event`)**: The fundamental unit of communication. It is a simple container holding a string `type` (e.g., `EVENT_TICK`, `EVENT_ORDER`) and a `data` payload (e.g., `TickData`, `OrderData`). This abstraction ensures that components communicate without direct knowledge of each other.\n2.  **Event Engine (`vnpy.event.EventEngine`)**: The central message bus and the heart of the EDA. It manages event registration, queuing, and asynchronous dispatching. It operates on a separate thread, ensuring that event generation (e.g., from a gateway) does not block the main application thread or event processing.\n3.  
**Main Engine (`vnpy.trader.MainEngine`)**: The application orchestrator, acting as a **Service Locator** and **Facade**. It is responsible for initializing and managing all other components, including gateways and functional engines. It provides a high-level interface for user operations (e.g., `send_order`, `subscribe`).\n4.  **Base Gateway (`vnpy.trader.BaseGateway`)**: The abstract interface for all external connectivity. It standardizes the communication with various trading systems (brokers, exchanges). It defines mandatory methods for trading operations (`connect`, `send_order`) and a set of callback methods (`on_tick`, `on_order`) used to push data back into the system via the Event Engine.\n5.  **Base Engine (`vnpy.trader.BaseEngine`)**: The abstract interface for all internal functional components (e.g., `OmsEngine`, `LogEngine`). It provides a standardized way for modules to integrate with the `MainEngine` and access the `EventEngine`.\n\n### Design Philosophy\nThe architecture adheres to the following principles:\n\n*   **Decoupling**: The Event Engine completely decouples data producers (Gateways) from data consumers (Engines/Strategies). Components only need to know the event type they are interested in, not the source or other components.\n*   **Extensibility (Open/Closed Principle)**: New trading systems can be integrated simply by implementing the `BaseGateway` interface. New features (e.g., risk management, strategy execution) can be added by implementing a new `BaseEngine` without modifying the core logic.\n*   **Centralized State Management**: The `OmsEngine` (Order Management System Engine) acts as the single source of truth for all current trading data (positions, orders, accounts). All other components query the `OmsEngine` for the latest state, preventing data inconsistencies.\n\n### Lifecycle Management\nThe application lifecycle is managed by the `MainEngine`:\n\n1.  
**Initialization**: The `MainEngine` is instantiated, which in turn initializes and starts the `EventEngine`'s processing and timer threads.\n2.  **Component Registration**: The `MainEngine` registers core `BaseEngine`s (like `OmsEngine`) and then loads and registers `BaseGateway`s and application-specific engines (`BaseApp`s).\n3.  **Connection**: The user calls `MainEngine.connect()` for a specific gateway, which triggers the gateway to establish a connection and query initial data (contracts, positions).\n4.  **Shutdown**: The `MainEngine.close()` method is called. Crucially, it first stops the `EventEngine` to prevent new events, then sequentially calls the `close()` method on all registered engines and gateways for a clean shutdown.\n\n#### 3.1.2. Component Interactions\n\nThe entire system's operation is a continuous loop of data flowing into the system, being processed as events, and resulting in actions flowing out.\n\n### Data Flow: Market Data Ingestion (Asynchronous)\n\n1.  **Gateway Ingestion**: A `BaseGateway` (e.g., a simulated or real-time connection) receives a market data update (e.g., a new tick).\n2.  **Event Creation**: The gateway creates a `TickData` object and wraps it in an `Event` of type `EVENT_TICK`.\n3.  **Event Submission**: The gateway calls `self.event_engine.put(event)`.\n4.  **Event Processing**: The `EventEngine`'s worker thread retrieves the event from the queue.\n5.  **Dispatch to Handlers**:\n    *   **OmsEngine**: Updates its internal `self.ticks` dictionary with the latest data.\n    *   **Strategy Engine (Implied)**: Receives the event to execute its trading logic.\n    *   **Chart Engine (`vnpy.chart`)**: Receives the event to update the real-time chart display.\n\n### Control Flow: Trading Operations (Synchronous Request/Asynchronous Response)\n\n1.  **Request Initiation**: A strategy or the user interface calls a method on the `MainEngine`, such as `MainEngine.send_order(req, gateway_name)`.\n2.  
**Delegation**: The `MainEngine` locates the specified `BaseGateway` and calls `gateway.send_order(req)`.\n3.  **External Communication**: The `BaseGateway` sends the order request to the external trading system. It immediately returns a unique `vt_orderid`.\n4.  **Asynchronous Response**: The external system's response (e.g., order accepted, filled, or rejected) is received by the `BaseGateway`.\n5.  **Internal Update**: The `BaseGateway` creates an `OrderData` or `TradeData` object and pushes it as an `EVENT_ORDER` or `EVENT_TRADE` back into the `EventEngine`.\n6.  **State Update**: The `OmsEngine` processes the event, updating the order's status or recording the trade.\n\n### Communication Patterns\n\n| Pattern | Module | Purpose |\n| :--- | :--- | :--- |\n| **Publish-Subscribe (Pub/Sub)** | `vnpy.event` | Core mechanism for all internal data and state changes. Ensures high decoupling. |\n| **Request-Reply (Req/Rep)** | `vnpy.trader` (MainEngine to Gateway) | Synchronous control flow for sending trading commands (e.g., `send_order`). |\n| **Remote Procedure Call (RPC)** | `vnpy.rpc` | Used for distributed deployment. `RpcClient` uses `REQ/REP` for function calls and `PUB/SUB` for streaming data from the `RpcServer`. |\n| **Facade** | `vnpy.trader.MainEngine` | Simplifies the complex underlying system (multiple gateways and engines) into a single, easy-to-use interface. |\n\n### 3.2. 
Overall Architecture PlantUML Diagram\n\n```plantuml\n@startuml vnpy_architecture\nskinparam classAttributeIconVisible false\n\ntitle vn.py Core Architecture\n\npackage \"External Trading Systems\" {\n    [Broker API] as API\n}\n\npackage \"vnpy.event\" {\n    class EventEngine {\n        + put(event)\n        + register(type, handler)\n    }\n    class Event {\n        + type\n        + data\n    }\n}\n\npackage \"vnpy.trader\" {\n    class MainEngine {\n        + add_gateway()\n        + add_engine()\n        + send_order()\n        + subscribe()\n    }\n\n    abstract class BaseGateway {\n        + connect()\n        + send_order()\n        + on_tick()\n        + on_order()\n    }\n\n    class OmsEngine {\n        + process_order_event()\n        + get_all_orders()\n    }\n\n    class StrategyEngine {\n        + process_tick_event()\n        + process_order_event()\n        + send_order()\n    }\n}\n\npackage \"vnpy.chart\" {\n    class ChartWidget {\n        + update_bar()\n    }\n}\n\n' Relationships\nAPI --> BaseGateway : Data In/Out\n\nBaseGateway .> Event : creates\nBaseGateway -> EventEngine : put(Event)\n\nMainEngine o-- BaseGateway : manages\nMainEngine o-- OmsEngine : manages\nMainEngine o-- StrategyEngine : manages\n\nMainEngine -> BaseGateway : send_order() (Control Flow)\n\nEventEngine -> OmsEngine : dispatch(Event)\nEventEngine -> StrategyEngine : dispatch(Event)\nEventEngine -> ChartWidget : dispatch(Event)\n\nMainEngine .> OmsEngine : get_contract() (State Query)\nStrategyEngine -> MainEngine : send_order() (Action)\n\nnote right of EventEngine\n    The EventEngine is the central\n    message bus, decoupling all components.\nend note\n\n@enduml\n```\n\n### 3.3. Design Patterns & Highlights\n\n#### 3.3.1. Design Patterns\n\nThe vn.py framework leverages several classic software design patterns to achieve its flexibility, scalability, and maintainability.\n\n### 1. 
Observer Pattern (via Event-Driven Architecture)\n*   **Description**: Defines a one-to-many dependency so that when one object (the subject) changes state, all its dependents (observers) are notified.\n*   **Implementation**: Implemented through the **Event Engine**. The `EventEngine` is the Subject, and any function registered to handle an event (e.g., `OmsEngine.process_order_event`) is an Observer.\n*   **Code Example (`vnpy/event/engine.py`)**:\n    ```python\n    # Registration (Observer subscribes to Subject)\n    def register(self, type: str, handler: HandlerType) -> None:\n        handler_list: list = self._handlers[type]\n        if handler not in handler_list:\n            handler_list.append(handler)\n    ```\n\n### 2. Adapter Pattern\n*   **Description**: Allows the interface of an existing class (external API) to be used as another interface (internal standard).\n*   **Implementation**: The `BaseGateway` (`vnpy/trader/gateway.py`) acts as the target interface. Specific gateway implementations adapt the vendor's API calls and data structures to the standardized `BaseGateway` methods and callbacks.\n*   **Code Example (`vnpy/trader/gateway.py`)**:\n    ```python\n    class BaseGateway(ABC):\n        @abstractmethod\n        def connect(self, setting: dict) -> None:\n            \"\"\"Start gateway connection.\"\"\"\n            pass\n    ```\n\n### 3. Facade Pattern\n*   **Description**: Provides a unified interface to a set of interfaces in a subsystem.\n*   **Implementation**: The `MainEngine` (`vnpy/trader/engine.py`) serves as the facade for the entire trading system, hiding the complexity of managing multiple gateways and functional engines.\n*   **Code Example (`vnpy/trader/engine.py`)**:\n    ```python\n    class MainEngine:\n        # ... 
manages gateways and engines internally ...\n        def send_order(self, req: OrderRequest, gateway_name: str) -> str:\n            \"\"\"Send new order request to a specific gateway.\"\"\"\n            gateway: BaseGateway | None = self.get_gateway(gateway_name)\n            if gateway:\n                return gateway.send_order(req)\n            else:\n                return \"\"\n    ```\n\n### 4. Repository Pattern\n*   **Description**: Mediates between the domain and data mapping layers, acting like an in-memory collection of domain objects.\n*   **Implementation**: The `OmsEngine` (`vnpy/trader/engine.py`) acts as the repository for all current trading data (orders, positions, accounts, contracts), centralizing the state.\n*   **Code Example (`vnpy/trader/engine.py` - OmsEngine)**:\n    ```python\n    class OmsEngine(BaseEngine):\n        def __init__(self, main_engine: MainEngine, event_engine: EventEngine) -> None:\n            super().__init__(main_engine, event_engine, \"oms\")\n            self.orders: dict[str, OrderData] = {}\n            self.active_orders: dict[str, OrderData] = {}\n            # ...\n        \n        def get_all_active_orders(self) -> list[OrderData]:\n            \"\"\"Get all active orders.\"\"\"\n            return list(self.active_orders.values())\n    ```\n\n#### 3.3.2. Project Highlights\n\nThe vn.py framework exhibits several innovative and flexible design choices:\n\n*   **Unified Data Model (VT-Symbol)**: The use of the `vt_symbol` (e.g., `symbol.exchange.value`) abstraction in `vnpy/trader/object.py` is a key highlight. It creates a globally unique identifier for every instrument across all connected gateways, simplifying data management and cross-gateway operations.\n*   **High Extensibility via Base Classes**: The core is designed around abstract base classes (`BaseGateway`, `BaseEngine`, `BaseApp`). This makes it exceptionally easy to extend the system by adding new trading interfaces (gateways) or new functional modules (engines/apps) without modifying the core logic. 
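For instance, a new functional engine plugs in by subclassing (a hypothetical sketch with simplified stand-ins for the vnpy base classes, not the real implementation):

```python
from abc import ABC, abstractmethod
from typing import Any

# Simplified stand-in for vnpy.trader.engine.BaseEngine.
class BaseEngine(ABC):
    def __init__(self, main_engine: Any, event_engine: Any, engine_name: str) -> None:
        self.main_engine = main_engine
        self.event_engine = event_engine
        self.engine_name = engine_name

    @abstractmethod
    def close(self) -> None:
        ...

# A hypothetical new feature: added purely by subclassing, never by editing the core.
class OrderCountEngine(BaseEngine):
    def __init__(self, main_engine: Any, event_engine: Any) -> None:
        super().__init__(main_engine, event_engine, 'order_count')
        self.count = 0

    def process_order_event(self, event: Any) -> None:
        self.count += 1  # count every order update seen on the bus

    def close(self) -> None:
        pass
```

New capability arrives purely through subclassing and registration; the core engine code stays untouched.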
This adheres to the Open/Closed Principle.\n*   **Asynchronous RPC for Distributed Deployment**: The `vnpy.rpc` module, utilizing ZeroMQ, provides a built-in solution for distributing the trading system. This allows for separating the high-frequency market data processing (Server) from the strategy execution or UI (Client), enhancing performance and deployment flexibility. The hybrid use of REQ/REP for function calls and PUB/SUB for data streaming is a robust design choice.\n*   **Optimized Charting with PyQtGraph**: The `vnpy.chart` module uses PyQtGraph and the `QPicture` object caching mechanism (`ChartItem._draw_item_picture`) to significantly optimize the rendering of large amounts of historical bar data, ensuring a smooth and responsive user experience even with extensive backtesting results.\n\n### 3.4. Summary & Recommendations\n\n#### 3.4.1. Potential Improvements\n\nBased on the analysis, the following areas could be considered for improvement:\n\n*   **Asynchronous Event Handling for High-Frequency**: While the `EventEngine` uses a separate thread, the event processing loop (`_run` in `vnpy/event/engine.py`) is synchronous. For extremely high-frequency trading (HFT) or scenarios with very high data throughput, switching the event processing to an `asyncio` loop with coroutines could prevent a slow handler from blocking the processing of subsequent events. This would improve performance under heavy load.\n*   **Strict Immutability Enforcement**: The core data objects (`TickData`, `OrderData`, etc.) are defined as `dataclass`es, which implies a design intent for immutability. However, Python's `dataclass`es are mutable by default. 
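The contrast is easy to demonstrate with a hypothetical tick type (illustrative only, not vnpy's actual `TickData`):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenTick:  # hypothetical example, not vnpy's actual TickData
    symbol: str
    last_price: float

tick = FrozenTick('rb2501', 3500.0)
try:
    tick.last_price = 0.0  # rejected: the dataclass is frozen
except FrozenInstanceError:
    pass
```

Any rebinding after construction then raises `FrozenInstanceError`; note that fields assigned in `__post_init__` (such as `vt_symbol`) would need `object.__setattr__` under this change, since ordinary assignment is blocked even inside the class.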
Adding `frozen=True` to the `@dataclass` decorator in `vnpy/trader/object.py` would strictly enforce immutability, preventing accidental modification of critical state data after it has been published, thus enhancing data integrity.\n*   **Dependency Injection for MainEngine**: The `MainEngine` currently instantiates its engines directly (e.g., `self.add_engine(LogEngine)`). This tight coupling makes unit testing harder. Implementing a simple dependency injection pattern where engines are passed to the `MainEngine` constructor or registered via a configuration would improve testability and modularity.\n*   **Standardized Error Handling**: The `MainEngine`'s error handling often relies on logging messages in Chinese (e.g., `self.write_log(_(\"找不到底层接口：{}\").format(gateway_name))`, which translates to \"Cannot find gateway: {}\"). A more standardized, exception-based error propagation mechanism (e.g., custom exceptions for `GatewayNotFound`, `OrderRejected`) would allow for more robust programmatic error handling in strategies and applications, moving beyond simple logging.\n\n#### 3.4.2. Secondary Development Guide\n\nFor developers looking to extend or build upon the vn.py framework, the following path is recommended:\n\n1.  **Understand the Event Flow**: The first step is to grasp the **Event-Driven Architecture**. All data flows through the `EventEngine`. To receive data, you must register a handler for the relevant event type (e.g., `EVENT_TICK`). To send data/commands, you must use the `MainEngine` facade.\n2.  **Develop a New Strategy (App)**:\n    *   Create a new module that inherits from `BaseApp` (if a UI is needed) or directly from `BaseEngine` (for pure backend logic).\n    *   In the engine's `__init__`, register your event handlers with the `EventEngine` (e.g., `event_engine.register(EVENT_TICK, self.on_tick)`).\n    *   Implement the core logic within the handler methods, querying the current state from the `OmsEngine` (e.g., `self.main_engine.get_position(vt_positionid)`).\n3.  
**Integrate a New Gateway**:\n    *   Create a new class that inherits from `BaseGateway` (`vnpy/trader/gateway.py`).\n    *   Implement the abstract methods: `connect()`, `close()`, `subscribe()`, `send_order()`, `cancel_order()`, `query_account()`, and `query_position()`.\n    *   Crucially, translate the external API's data formats into vn.py's standardized `TickData`, `OrderData`, etc., and push each object through the inherited callbacks (`on_tick`, `on_order`, etc.), which publish it to the rest of the system via `self.on_event()`.\n4.  **Utilize the OMS Engine**: Always query the `OmsEngine` via the `MainEngine`'s helper methods (`self.main_engine.get_contract`, `self.main_engine.get_all_active_orders`) to access the current, authoritative state of the system. Do not attempt to maintain separate state copies.\n5.  **Use VT-Symbols**: When referencing any instrument, order, or position, always use the unified VT-Symbol (`vt_symbol`, `vt_orderid`) to ensure compatibility across different gateways.\n\n"
  }
]