Full Code of KittenCN/predict_Lottery_ticket for AI

main 6cf60bb730e7 cached

43 files

157.0 KB

50.4k tokens

68 symbols

1 requests

Download .txt

Repository: KittenCN/predict_Lottery_ticket
Branch: main
Commit: 6cf60bb730e7
Files: 43
Total size: 157.0 KB

Directory structure:
gitextract_e5o09tfl/

├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── AGENTS.md
├── ASSUMPTIONS.md
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── README_KL8.md
├── agent_report.md
├── config/
│   └── config.yaml
├── docker-compose.yml
├── docs/
│   ├── api.md
│   ├── architecture.md
│   ├── decision_record.md
│   ├── environment.md
│   ├── ops.md
│   └── verify.md
├── environment.yml
├── examples/
│   ├── analysis_example.py
│   └── quick_start.py
├── requirements.lock.txt
├── requirements.txt
├── scripts/
│   ├── get_data.py
│   ├── predict.py
│   └── train.py
├── src/
│   ├── __init__.py
│   ├── analysis.py
│   ├── bootstrap.py
│   ├── common.py
│   ├── config.py
│   ├── data_fetcher.py
│   ├── modeling.py
│   ├── pipeline.py
│   └── preprocessing.py
├── tests/
│   ├── conftest.py
│   ├── test_config.py
│   ├── test_modeling.py
│   ├── test_pipeline.py
│   └── test_preprocessing.py
└── validation_report.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/ci.yml
================================================
# 彩票AI预测系统 GitHub Actions CI/CD
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.8, 3.9, 3.11]

    steps:
    - uses: actions/checkout@v3
    
    - name: 设置 Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    
    - name: 安装依赖
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install black flake8 pytest pytest-cov
    
    - name: 代码格式检查
      run: |
        black --check src scripts tests examples
    
    - name: 静态代码检查
      run: |
        flake8 src scripts tests examples --max-line-length=100 --ignore=E203,W503
    
    - name: 运行测试
      run: |
        pytest tests/ -v --cov=src --cov-report=xml --cov-report=html
    
    - name: 上传覆盖率报告
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    env:
      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
      DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}

    steps:
    - uses: actions/checkout@v3
    
    - name: 设置 Docker Buildx
      uses: docker/setup-buildx-action@v2
    
    - name: 登录 Docker Hub
      uses: docker/login-action@v2
      with:
        username: ${{ env.DOCKERHUB_USERNAME }}
        password: ${{ env.DOCKERHUB_TOKEN }}
    
    - name: 构建并推送 Docker 镜像
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: |
          ${{ env.DOCKERHUB_USERNAME }}/lottery-predict:latest
          ${{ env.DOCKERHUB_USERNAME }}/lottery-predict:${{ github.sha }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

  security:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: 运行安全扫描
      uses: pypa/gh-action-pip-audit@v1.0.8
      with:
        inputs: requirements.txt

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
model/
data/
data_cache/
results/
*.py[cod]
*$py.class
*.csv

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

features/
test/
test
train/
train
data/
data
.vscode
.idea
.DS_Store

# custom
*.old
*.bak
*.doc
*.docx
*.xls
*.xlsx
*.csv
*.jpg
*.jpeg
*.png
*.pkl
*.pkl.json
*.log.json
docs/modelzoo_statistics.md
mmdet/.mim
work_dirs/
Meteorological forecast/data/TXT/
Meteorological forecast/data/TXT

# Pytorch
*.pth
*.py~
*.sh~


================================================
FILE: AGENTS.md
================================================
# agent.md | Codex 执行代理规范（中英双语版）

> 目标：让 Codex（或同类代码生成模型）在最少追问的前提下自动完成需求、自测并保证可运行，交付可维护成果。<br>> Goal: Enable Codex (or similar code-generation agents) to autonomously deliver maintainable, runnable work with minimal back-and-forth, including self-testing.

---

## 0. 沟通与产出原则 | Communication & Deliverable Principles
1. **语言**：所有交流、注释、提交信息、README 默认使用简体中文，可附英文术语。<br>**Language**: Default to Simplified Chinese for communication, comments, commit messages, and README content; append English terms when helpful.
2. **默认动作**：遇到可合理假设的细节，应直接根据专业判断继续实现，并将假设记录在 `ASSUMPTIONS.md`。<br>**Default action**: When reasonable assumptions are possible, proceed with a professional default and record the assumption in `ASSUMPTIONS.md`.
3. **最小打扰**：除非风险极高，不等待额外确认；在交付物中标注可选项与替代方案。<br>**Minimise back-and-forth**: Avoid pausing for confirmation unless risk is high; document alternatives and options in the deliverables.
4. **可运行性**：生成的代码、脚本、配置必须可本地一键运行并通过测试。<br>**Run-ability**: Produced code, scripts, and configs must run locally via one-liners and pass tests.
5. **可复现性**：输出固定的环境说明与锁定依赖（如 `requirements.txt`、`poetry.lock`、`package-lock.json`）。<br>**Reproducibility**: Provide deterministic environment notes and lock dependencies (e.g., `requirements.txt`, `poetry.lock`, `package-lock.json`).
6. **安全优先**：默认不开启高危权限；对外部调用使用显式白名单和可配置开关。<br>**Security first**: Do not enable dangerous permissions by default; gate external calls behind explicit allow-lists and configurable switches.

---

## 1. 交付物目录结构（通用模板）| Suggested Deliverable Structure (Generic)

```
project-root/
├─ src/                         # 业务源码 | Core source code
├─ tests/                       # 单元/集成测试 | Unit/integration tests
├─ examples/                    # 最小示例 | Minimal runnable examples
├─ scripts/                     # 任务脚本 | Task automation scripts
├─ config/                      # 配置模板 | Configuration templates
├─ docs/                        # 设计/API/运维文档 | Design/API/Ops docs
├─ .github/workflows/           # CI 工作流 | CI workflows
├─ Dockerfile                   # 容器化运行 | Container runtime
├─ docker-compose.yml           # 本地依赖编排 | Local dependency orchestration
├─ pyproject.toml / package.json / requirements.txt
├─ Makefile                     # 一键任务入口 | One-command task entry
├─ README.md                    # 使用说明 | Usage guide
├─ ASSUMPTIONS.md               # 假设与权衡 | Assumptions & trade-offs
├─ CHANGELOG.md                 # 变更日志 | Change log
└─ agent_report.md              # 自动执行报告 | Automation run report
```

> 要求：在可行情况下保持上述结构，为团队提供统一入口。<br>> Requirement: Adopt the structure when feasible to give the team a consistent entry point.

---

## 2. 一键任务（Makefile 约定）| Makefile One-liner Conventions

```
make setup        # 安装依赖、初始化环境 | Install dependencies and bootstrap environment
make fmt          # 代码格式化 | Format code
make lint         # 静态检查 | Run linters / static analysis
make test         # 运行测试 | Execute test suite
make run          # 启动应用或示例 | Run the application/example
make build        # 构建产物 | Build distributables or images
make ci           # 本地模拟 CI：lint + test + build | Local CI: lint + test + build
```

> PR/交付前需确保 `make ci` 全绿；如无需 Docker，可用其他可发布产物替代。<br>> Ensure `make ci` passes before delivery; if Docker isn’t needed, substitute with another distributable artefact.

---

## 3. 需求到实现的自动流程 | Requirement-to-Implementation Workflow
1. **解析需求**：抽取功能点、接口、数据结构、约束、非功能需求，识别歧义并在 `ASSUMPTIONS.md` 中记录默认值。<br>**Requirement analysis**: Extract features, interfaces, data structures, constraints, and non-functional needs; capture ambiguities and assumptions in `ASSUMPTIONS.md`.
2. **制定方案**：输出模块边界、依赖、关键数据流或时序图，并在 `docs/decision_record.md` 说明取舍。<br>**Design**: Define module boundaries, dependencies, key data flows or sequence diagrams, and document trade-offs in `docs/decision_record.md`.
3. **脚手架落地**：搭建目录与基础文件，补齐 `README.md` 与一键运行指引；配置 `config/.env.example` 等模板。<br>**Scaffolding**: Lay out directories and base files, enrich `README.md` and quick-start instructions, and provide `config/.env.example` templates.
4. **实现编码**：遵循整洁代码，使用中文注释并为公共函数编写 docstring/示例。<br>**Implementation**: Write clean code with Chinese comments and docstrings/examples for public functions.
5. **自测优先**：为模块编写单元测试，关键流程补集成测试，目标覆盖率 ≥ 80%。<br>**Self-testing**: Create unit tests per module and integration tests for critical flows, targeting ≥80% coverage.
6. **质量闸门**：执行 `make fmt && make lint && make test`，生成覆盖率报告与构建产物。<br>**Quality gate**: Run `make fmt && make lint && make test`, produce coverage reports and build artefacts.
7. **交付收尾**：更新 `CHANGELOG.md`、扩充 `README` 常见问题，整理 `agent_report.md`。<br>**Delivery wrap-up**: Update `CHANGELOG.md`, expand the README FAQ, and prepare `agent_report.md`.

---

## 4. 代码与文档规范 | Code & Documentation Standards
- Python：使用 `ruff` + `black` + `mypy`（严格模式）。<br>Python: Adopt `ruff`, `black`, and strict `mypy`.
- TS/JS：使用 `eslint`（typescript-eslint）+ `prettier`。<br>TS/JS: Use `eslint` (typescript-eslint) plus `prettier`.
- Go：使用 `gofmt` + `golangci-lint`。<br>Go: Apply `gofmt` and `golangci-lint`.
- 日志需统一封装，默认不打印敏感信息。<br>Log through a unified wrapper and avoid printing sensitive data by default.
- 配置优先 `config.yaml` + `.env`，读取时需提供默认值和类型校验。<br>Prefer `config.yaml` plus `.env`; supply defaults and type validation when loading.
- 错误信息应在外层统一处理，提供中文可读的提示与修复建议。<br>Handle errors at standard boundaries, presenting human-readable Chinese messages with remediation tips.
- 公共 API 必须附 docstring 与使用示例。<br>Public APIs must include docstrings and usage samples.
- 提交信息遵循 Conventional Commits（如 `feat:`、`fix:`、`docs:`）。<br>Follow Conventional Commits (e.g., `feat:`, `fix:`, `docs:`).

---

## 5. 测试策略 | Testing Strategy
1. **单元测试**：覆盖核心算法、数据转换、边界与异常路径。<br>**Unit tests**: Cover core algorithms, data transforms, edge cases, and error paths.
2. **集成测试**：验证端到端关键流程，必要时对外部依赖进行 mock。<br>**Integration tests**: Validate end-to-end flows with mocks for external systems when needed.
3. **回归样例**：为已修复的缺陷补充最小复现测试。<br>**Regression cases**: Add minimal reproductions for fixed bugs.
4. **性能冒烟（可选）**：对关键路径做小规模基准，记录指标与阈值。<br>**Performance smoke (optional)**: Benchmark critical paths lightly, record metrics and thresholds.
5. **覆盖率目标**：`--cov=src --cov-report=xml`，保持行覆盖率 ≥ 80%。<br>**Coverage target**: Run with `--cov=src --cov-report=xml`, aiming for ≥80% line coverage.

---

## 6. CI/CD 最小流程 | Minimal CI/CD Pipeline
- 触发：`push` 与 `pull_request`。<br>Trigger on `push` and `pull_request`.
- 作业顺序：<br>Pipeline stages:
  1. **Setup**：安装依赖并配置缓存。<br>**Setup**: Install dependencies and configure caches.
  2. **Static**：执行 `make fmt && make lint`。<br>**Static**: Run `make fmt && make lint`.
  3. **Test**：执行 `make test` 并上传覆盖率。<br>**Test**: Run `make test` and upload coverage.
  4. **Build**：执行 `make build`。<br>**Build**: Execute `make build`.
- 任一阶段失败即阻断合入。<br>Failing any stage blocks the merge.

---

## 7. 安全与合规 | Security & Compliance
- 禁止硬编码密钥/Token/私钥；改用 `.env` + 密钥管理（如 GitHub Secrets）。<br>Never hardcode secrets; rely on `.env` plus secret managers (e.g., GitHub Secrets).
- 外部请求需设置超时、重试、熔断，且使用白名单域名。<br>External requests must include timeouts, retries, circuit breakers, and domain allow-lists.
- 文件操作限制在 `project-root/` 内，删除/覆盖前请备份。<br>Restrict file operations to `project-root/`; back up before deleting or overwriting.
- 记录关键操作与失败栈，满足审计需求。<br>Log critical actions and failure traces for auditability.

7. **Conda 环境要求（新增）**：所有自动运行的 agent 或脚本在执行时应确认处于名为 `python311` 的 conda 环境中，或等效地使用已经安装并锁定项目依赖的虚拟环境。
  <br>**Why**: 保证在 agent 自动运行、测试或 CI 中使用一致的依赖和 Python 版本，避免因为全局包差异或者系统 Python 版本差异导致不可复现的失败。
  <br>**How**: 在 agent 的启动脚本或 CI workflow 中显式激活环境（例如 `conda activate python311`），或使用 `actions/setup-python` 并安装 `requirements.txt`。在 `README.md` 或 `docs/ops.md` 中记录该要求。

---

## 8. 自动执行报告模板 | Automation Execution Report Template

```
# 本次自动执行报告 | Automation Execution Report

## 需求摘要 | Requirement Summary
- 背景与目标 | Background & objectives:
- 核心功能点 | Key features:

## 关键假设 | Key Assumptions
- （详见 ASSUMPTIONS.md）| (See ASSUMPTIONS.md)

## 方案概览 | Solution Overview
- 架构与模块 | Architecture & modules:
- 选型与权衡 | Choices & trade-offs:

## 实现与自测 | Implementation & Self-testing
- 一键命令 | One-liner: `make setup && make ci && make run`
- 覆盖率 | Coverage: xx%
- 主要测试清单 | Major tests: 单元 N 项 / 集成 M 项 | N unit / M integration tests
- 构建产物 | Build artefacts:

## 风险与后续改进 | Risks & Next Steps
- 已知限制 | Known limitations:
- 建议迭代 | Suggested iterations:
```

---

## 9. 外部系统与数据交互约定 | External System & Data Interaction Guidelines
- **HTTP**：统一封装 client，包含重试、超时、重定向处理、错误码翻译与指标采集。<br>**HTTP**: Use a shared client wrapper with retries, timeouts, redirect handling, error translation, and metrics collection.
- **数据库**：使用迁移脚本（如 Alembic/Prisma），并通过 `docker-compose` 启动本地依赖。<br>**Databases**: Manage schema with migrations (Alembic/Prisma) and spin up dependencies via `docker-compose`.
- **消息/缓存**：提供本地 mock 或容器化服务（Kafka/Redis/RabbitMQ）。<br>**Messaging/cache**: Provide local mocks or containerised services (Kafka/Redis/RabbitMQ).
- **文件**：输入输出路径通过 `config/` 或 `.env` 配置，禁止硬编码绝对路径。<br>**File IO**: Configure paths via `config/` or `.env`; never hardcode absolute paths.

---

## 10. 典型脚本约定 | Script Conventions
- 示例脚本：`scripts/bootstrap.sh`、`scripts/dev_run.sh`、`scripts/seed_data.py`、`scripts/release.sh`。<br>Example scripts: `scripts/bootstrap.sh`, `scripts/dev_run.sh`, `scripts/seed_data.py`, `scripts/release.sh`.
- 所有脚本需支持 `-h/--help`，失败时返回非零退出码，并输出中文提示与下一步建议。<br>All scripts must support `-h/--help`, exit non-zero on failure, and print Chinese guidance with next-step hints.

---

## 11. 文档要求 | Documentation Requirements
- `README.md` 必含项目简介、功能清单、快速开始（≤3 条）、配置说明、常见问题。<br>`README.md` must include a project overview, feature list, quick start (≤3 commands), configuration notes, and FAQ.
- `docs/` 需包含 `architecture.md`（Mermaid 图）、`api.md`（接口定义）、`ops.md`（监控/告警/日志）。<br>`docs/` should provide `architecture.md` (with Mermaid diagrams), `api.md` (API definitions), and `ops.md` (monitoring/alerting/logging).
- 变更需更新 `CHANGELOG.md`（遵循 Keep a Changelog + SemVer）。<br>Update `CHANGELOG.md` for changes, following Keep a Changelog and SemVer conventions.

---

## 12. 完成标准 | Definition of Done
- [ ] `make ci` 全部通过。<br>`make ci` passes completely.
- [ ] `make run` 可运行最小示例。<br>`make run` launches a minimal example.
- [ ] 覆盖率 ≥ 80%，关键路径具备集成测试。<br>Coverage ≥80% with integration tests on critical paths.
- [ ] README/ASSUMPTIONS/CHANGELOG/agent_report 已更新。<br>README, ASSUMPTIONS, CHANGELOG, and agent_report are updated.
- [ ] 配置可通过 `.env` 切换环境，无敏感信息入库。<br>Configs are environment-switchable via `.env` with no secrets committed.
- [ ] 日志与错误信息可读，并包含排错建议。<br>Logs and errors are readable with troubleshooting suggestions.

---

## 13. 应对新需求的回复模板 | Response Template for New Requests
> **始终用中文简洁回复，可在需要时直接给出可运行的命令或代码块。**<br>> **Respond succinctly in Chinese, adding runnable commands or code snippets when helpful.**

1. **概述**：复述需求要点与关键假设。<br>**Overview**: Restate the requirements and note key assumptions.
2. **交付**：提供新增文件或补丁，并同步更新相关测试/文档。<br>**Deliverables**: Present new files or patches and update relevant tests/docs.
3. **运行**：给出 1～2 条验证命令。<br>**Run**: Offer one or two commands to validate the work.
4. **结果**：说明自测范围、覆盖率或关键日志。<br>**Result**: Summarise self-test scope, coverage, or key logs.
5. **后续**：列出可选优化项与影响评估。<br>**Next steps**: List optional improvements and impact assessments.

---

## 14. 最小示例（占位，可按项目替换）| Minimal Example (Placeholder)
- 运行顺序：<br>Run order:
  ```bash
  make setup
  make ci
  make run
  ```
- 若失败：查看 `agent_report.md` 的故障排查章节，执行 `make test -k failing_case` 复现问题。<br>If failures occur, review the troubleshooting section in `agent_report.md` and run `make test -k failing_case` to reproduce.

---

> **执行承诺 | Execution Promise**：除非明确要求暂停，代理将按本规范自动推进至“可运行 + 已自测 + 可交付”状态再输出结果。<br>> **Execution promise**: Unless explicitly told to pause, the agent proceeds until the work is runnable, self-tested, and deliverable before responding.


================================================
FILE: ASSUMPTIONS.md
================================================
6. KL8（快乐8）相关内容已于2025年10月迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。本仓库不再维护KL8相关功能。
   - 原因：KL8玩法数据量大、分析逻辑独立，单独维护便于扩展和优化。
   - 影响：本项目仅支持双色球、大乐透、排列三、七星彩、福彩3D等玩法。

7. 预测红球唯一性修正（2024-06）
   - 假设：每注红球号码应唯一，不能重复。
   - 措施：修正了 `src/pipeline.py` 预测逻辑，采用去重策略，彻底避免红球重复。
   - 影响：预测结果更贴合实际规则，提升用户体验。
# ASSUMPTIONS & DECISIONS

列出实现过程中所作的重要假设与决策，便于后续维护与审计。

1. 使用 `tf.keras`（随 TensorFlow 一起分发）而非第三方独立 `keras` 包。
   - 原因：独立 `keras` 版本与 TensorFlow 之间可能存在实现差异，曾导致 `keras.src.engine` 等导入错误。
   - 操作：在当前环境中卸载了独立的 `keras`（已在本仓库历史中记录）。

2. 二进制包（numpy、mkl、tensorflow-intel、torch 等）通过 conda 管理。
   - 原因：在 Windows 等平台上，许多科学/深度学习包需要编译好的二进制 wheel；conda 更适合管理这些二进制依赖，避免 pip 在本机编译时出现冲突。
   - 提供了 `environment.yml` 用于创建可重复的 conda 环境。

3. 为兼容第三方库在导入时检查已弃用的 TF API，添加了 `src/bootstrap.py` shim。
   - 目标：在程序最早期提供映射（例如将 `tf.ragged.RaggedTensorValue` 映射到 `tf.compat.v1.ragged.RaggedTensorValue`）以避免导入错误。
   - shim 是 best-effort 且局部抑制弃用警告，不应替代上游修复。

4. 锁定可移植 pip 依赖为 `requirements.lock.txt`（移除本地 file:/// 条目）。
   - 对于私有或 editable VCS 安装，保留在注释中，用户需自行配置访问权限。

5. 日志与弃用警告策略
   - shim 在内部临时抑制相关 DeprecationWarning，避免污染用户控制台输出。
# 项目假设与权衡 (Assumptions & Trade-offs)

本文件记录为了顺利升级到 TensorFlow 2.15.1 及其配套生态而确立的关键假设与权衡。若后续条件发生变化，请优先在此文档更新对应结论。

## 🎯 核心假设

### 0. 运行环境
- **假设**: 所有脚本与 CI 均在名为 `python311` 的 Conda 环境或等效的 Python 3.11 虚拟环境中执行。
- **权衡**: TensorFlow 2.15.1 对 Python 版本有明确要求，放弃旧版 Python 能减少兼容性问题。
- **替代方案**: 对于无法安装 Conda 的环境，可通过 Docker/WSL2 预装 python311。

### 1. 数据源
- **假设**: 500.com 提供的历史开奖数据在结构和内容上保持稳定、准确。
- **权衡**: 依赖单一数据源易受网络波动或页面结构调整影响。
- **缓解**: 通过自定义异常与日志提示快速定位故障，后续可扩展备用数据源。

### 2. 模型架构
- **假设**: 基于 TensorFlow 2.15.1 + Keras 2.15 的多层 LSTM 方案能够有效捕捉时间窗口内的号码分布关系。
- **权衡**: LSTM 对标签间依赖建模能力有限，可能不如 CRF 精细，但实现更简单、跨平台兼容性好。
- **替代方案**: Transformer、GRU、Temporal CNN 或在未来追加注意力模块。

### 3. 时间窗口
- **假设**: 最近 `windows_size` 期的彩票号码携带更有价值的统计信息。
- **权衡**: 单一固定窗口可能无法适配所有玩法；过大窗口将显著增加训练成本。
- **备选**: 文档中提供自定义窗口说明，后续可尝试注意力或加权策略。

### 4. 特征建模
- **假设**: 将彩票号码视作离散类别（而非连续数值）更符合序列建模语境。
- **权衡**: 无法直接建模号码之间的数值差异，需依靠模型学习潜在关系。
- **替代**: 增加差值、奇偶、冷热等派生特征，可在后续版本迭代。

## 🔧 技术选择权衡

### 1. 框架与依赖
- **选择**: `tensorflow==2.15.1`、`numpy>=1.24,<2.0`、`pandas>=2.2`。
- **原因**: 支持 Python 3.11，官方长期维护，并提供跨平台 wheel。
- **权衡**: GPU 版要求 CUDA ≥ 12.2；如需其它框架需重新实现训练流程。
- **替代**: 继续使用 `tf.compat.v1` 以保留旧代码，代价是未来维护成本持续增加。

### 2. 配置管理
- **选择**: `config/config.yaml` + `src/config.py` 数据类，配合 `.env` 提供敏感配置。
- **权衡**: 配置文件增多需要同步维护；但便于文档化和测试。
- **替代**: 单纯依赖环境变量或数据库配置中心。

### 3. 模型持久化
- **选择**: 统一使用 `.keras`（SavedModel v3）格式保存权重与计算图。
- **权衡**: 旧版 `.ckpt` 文件无法直接加载，需重新训练。
- **替代**: 提供双栈加载逻辑，但会引入大量 `tf.compat.v1` 代码与分支处理。

### 4. HTTP 客户端
- **选择**: 基于 `requests.Session` + `urllib3.Retry` 封装的 `LotteryHttpClient`，默认启用超时、重试与白名单校验。
- **权衡**: 增加实现复杂度；但满足安全要求并显著提升鲁棒性。
- **替代**: 直接使用裸 `requests.get`，风险较大。

## 📊 模型设计权衡

### 1. 批大小与梯度稳定性
- **假设**: 小批量（默认 `batch_size=32`）对 LSTM 收敛更友好。
- **措施**: 引入梯度裁剪与 `EarlyStopping`，并允许在配置中调整。

### 2. 损失函数
- **选择**: `SparseCategoricalCrossentropy`（逐位置分类）。
- **权衡**: 实现简单且跨平台兼容；若需捕捉标签依赖，可在后续引入注意力或条件解码。

### 3. 优化器
- **选择**: `tf.keras.optimizers.Adam(clipnorm=1.0)`。
- **权衡**: Adam 在中小数据集表现稳定；若需更快收敛可增加 `optimizer_type` 参数。

## 🏗️ 架构设计假设

### 1. 模块拆分
- **假设**: 数据抓取、特征工程、模型构建、训练与推理需要清晰分层。
- **实现**: 新增 `src/data_fetcher.py`、`src/preprocessing.py`、`src/modeling.py`、`src/pipeline.py`。
- **好处**: 提升复用性与可测试性。

### 2. 日志与错误处理
- **假设**: 统一通过 `loguru` 日志记录关键操作，并提供中文+英文关键词提示。
- **措施**: 所有外部操作均记录成功/失败状态与建议。

### 3. 配置可扩展性
- **假设**: 新玩法仅需在配置中补充元数据即可驱动训练/预测。
- **限制**: 若玩法结构差异过大，需要扩展预处理逻辑。

## 🚀 性能与资源假设

### 1. 计算资源
- **假设**: 单机（CPU 或单 GPU）即可完成训练；默认启用 GPU 内存自增长。
- **备选**: 未来可引入 `tf.distribute.MirroredStrategy` 做多 GPU 训练。

### 2. 内存
- **假设**: 历史开奖数据 < 200MB，可直接载入内存。
- **缓解**: 数据加载提供 `chunk_size` 参数，必要时可流式处理。

### 3. 训练时间
- **假设**: 在 GPU 上单模型训练 < 10 分钟；若无 GPU 约 30 分钟。
- **措施**: 文档提供多进程/多窗口并行训练指南。

## 🔒 安全与合规

### 1. 网络安全
- **假设**: 仅访问白名单域名 `datachart.500.com` 和 `data.917500.cn`。
- **措施**: 客户端在运行时校验请求域名，防止 SSRF。

### 2. 文件安全
- **假设**: 所有输出目录位于仓库根目录下的 `data/`、`model/`、`predict/`。
- **措施**: 覆盖写入前先备份旧文件；日志输出无敏感信息。

## 📈 扩展性与未来计划

1. 引入 Transformer/混合注意力模型，比较与当前多层 LSTM 的表现。
2. 支持多数据源聚合与质量评分。
3. 开放 RESTful API / gRPC 服务接口。
4. 构建自动超参搜索（Optuna/keras-tuner），提高迭代效率。
5. 记录推理元数据，便于可观测性与回溯分析。

---

**说明**：以上假设与策略与 2024 年 10 月的依赖与业务场景匹配。若依赖版本或部署目标发生改变，请同步更新本文档并在 `docs/decision_record.md` 中补充新的架构决策。***


================================================
FILE: CHANGELOG.md
================================================
# Changelog

All notable changes to this project should be documented in this file.


## [Unreleased]
- ❌ **KL8（快乐8）相关内容迁移**：所有KL8相关源码、数据、分析与文档已迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。本仓库不再包含KL8相关功能。
- 🐛 **修复红球预测重复问题**：修正了双色球等玩法预测结果中红球可能重复的bug，现已确保每注红球号码唯一（见 `src/pipeline.py` 2024-06 修正）。
- 📚 **文档同步**：更新README、ASSUMPTIONS、决策记录等文档，反映KL8迁移和预测逻辑修正。
- Add `src.bootstrap.py` compatibility shim to map deprecated TF ragged symbol and avoid import-time crashes.
- Create `requirements.lock.txt` (portable pip lock) and provide guidance for `environment.yml` (conda) to reproduce binary dependencies.
- Uninstall standalone `keras` in local environment and prefer `tf.keras`.
- Update `scripts/train.py` to import `src.bootstrap` early.
- Update `README.md`, add `ASSUMPTIONS.md` and `docs/` guidance for environment reproducibility and verification.
# 更新日志 (Changelog)

本项目遵循 [Keep a Changelog](https://keepachangelog.com/zh-CN/1.0.0/) 规范，版本号遵循 [语义化版本](https://semver.org/lang/zh-CN/) 规范。

## [3.0.0] - 2024-12-20

### 重大变更 (Breaking Changes)
- 🧠 **全面升级 TensorFlow**：迁移至 TensorFlow 2.15.1 + Keras 2.15，弃用所有 `tf.compat.v1` API 并移除对 tensorflow-addons 的依赖
- 💾 **模型格式调整**：训练产物改为 `.keras`（SavedModel v3），旧版 `.ckpt` 文件不再兼容
- 🧱 **训练/预测流程重写**：新增 `data_fetcher`、`preprocessing`、`pipeline` 等模块，原有 Session 流程下线

### 新增功能 (Added)
- ✨ **数据抓取客户端**：提供带重试、白名单校验的 `LotteryHttpClient`
- 🧪 **单元测试**：新增配置、预处理、模型与训练管线的覆盖，并引入 `pytest --cov`
- 🛠️ **运维文档**：补充 `docs/decision_record.md`、`docs/ops.md` 以及最新 API 说明

### 改进优化 (Changed)
- 📦 **依赖锁定**：更新 `requirements.txt`（TensorFlow 2.15.1、pandas 2.2、ruff/mypy 等）
- 🧾 **Makefile 流程**：`lint` 使用 ruff + mypy，`test` 默认输出覆盖率，`train/predict` 命令参数同步新版脚本
- 📚 **README/示例**：同步最新命令、模型架构说明与注意事项

- 🧹 **旧版测试/脚本**：删除 `tf.compat` 相关代码与失效测试（多线程进度条样例等），替换为新版覆盖
- 🗑️ **CRF 依赖**：移除对 `tensorflow-addons` 与 CRF 层的使用，统一改为纯 LSTM + Softmax 模型

---

## [2.0.0] - 2024-12-19

### 重大变更 (Breaking Changes)
- 🏗️ **重构项目结构**: 按照 AGENTS.md 规范重新组织代码结构
- 📁 **目录结构变更**: 
  - 核心代码移至 `src/` 目录
  - 脚本文件移至 `scripts/` 目录  
  - 测试文件移至 `tests/` 目录
  - 配置文件移至 `config/` 目录
- 🔧 **导入路径更新**: 所有导入语句已更新以适配新结构

### 新增功能 (Added)
- ✨ **项目模板化**: 添加标准化的项目结构和开发工具
- 🛠️ **Makefile支持**: 提供一键式任务命令
- 🐳 **Docker支持**: 添加 Dockerfile 和 docker-compose.yml
- 🔄 **CI/CD流水线**: 添加 GitHub Actions 自动化流水线
- 📝 **完整文档**: 添加 README、ASSUMPTIONS、CHANGELOG 等文档
- 🧪 **单元测试**: 添加基础测试框架和测试用例
- 📊 **示例程序**: 添加快速开始和数据分析示例
- ⚙️ **配置模板**: 提供 .env.example 和 config.yaml 模板

### 改进优化 (Changed)
- 📦 **依赖管理**: 更新 requirements.txt，添加版本约束
- 🔧 **配置管理**: 改进配置文件结构和管理方式
- 📚 **代码组织**: 按功能模块重新组织代码
- 🎯 **错误处理**: 改进异常处理和日志记录

### 修复问题 (Fixed)
- 🐛 **导入错误**: 修复模块导入路径问题
- 🔗 **依赖缺失**: 添加缺失的依赖包

### 文档更新 (Documentation)
- 📖 **README重写**: 完全重写项目说明文档
- 📋 **架构文档**: 添加项目架构和技术选择说明  
- 🤝 **贡献指南**: 添加代码贡献和开发规范
- ⚖️ **许可协议**: 明确项目许可和免责声明

---

## [1.x.x] - 历史版本 (Legacy)

### [1.3.1] - 2023-10-31
#### 新增
- 增加 kl8 plus 系列文件，支持多线程数据处理
- 修改 kl8_running，提升数据处理速度

#### 注意
- 多线程处理对CPU要求较高，请根据硬件配置谨慎使用

### [1.3.0] - 2023-09-03  
#### 新增
- 增加两个 kl8_ 开头的文件用于数据预处理和获奖计算
- 新增数据分析和预测验证功能

### [1.2.0] - 2023-03-27
#### 新增
- 增加对七星彩（qxc）的支持
- 增加对福彩3D（sd）的支持
- 完善数据获取和模型训练流程

### [1.1.0] - 2023-03-22
#### 新增  
- 增加执行参数开关
- red_epochs、blue_epochs、batch_size 参数支持 -1 值读取配置文件
- 修改参数默认值为 -1

#### 改进
- 优化参数配置管理
- 改进命令行参数处理

### [1.0.0] - 2023-01-01
#### 初始版本
- 🎯 支持双色球、大乐透、排列三、快乐8预测
- 🧠 基于 LSTM + CRF 的深度学习模型  
- 📊 自动数据获取和处理
- 🔧 CPU/GPU 自动切换支持
- 📈 基础数据分析功能

---

## 版本说明

### 版本号格式
采用语义化版本号 `MAJOR.MINOR.PATCH`：
- **MAJOR**: 重大变更，可能不向后兼容
- **MINOR**: 新功能添加，向后兼容  
- **PATCH**: 问题修复，向后兼容

### 变更类型
- 🏗️ **重大变更**: 不向后兼容的变更
- ✨ **新增功能**: 新功能或特性
- 🔧 **改进优化**: 现有功能的改进
- 🐛 **修复问题**: 错误修复
- 📚 **文档更新**: 文档相关变更
- 🔒 **安全修复**: 安全相关修复
- ⚠️ **废弃功能**: 计划废弃的功能
- ❌ **移除功能**: 已移除的功能

### 发布计划
- 主要版本：根据功能发展情况决定
- 次要版本：新功能稳定后发布
- 补丁版本：重要bug修复后及时发布

### 兼容性政策
- 主版本升级可能包含不兼容变更
- 次版本升级保持向后兼容
- 补丁版本仅包含bug修复

---

**说明**: 1.x.x 版本的详细记录可能不完整，从 2.0.0 开始将严格遵循此变更日志格式。


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior,  harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.


================================================
FILE: Dockerfile
================================================
# 彩票AI预测系统 Docker 镜像
FROM python:3.11-slim

# 设置工作目录
WORKDIR /app

# 设置环境变量
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1
ENV TF_CPP_MIN_LOG_LEVEL=2

# 安装系统依赖
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    libc6-dev \
    && rm -rf /var/lib/apt/lists/*

# 复制 requirements.txt
COPY requirements.txt .

# 安装 Python 依赖
RUN pip install --no-cache-dir -r requirements.txt

# 复制项目文件
COPY src/ ./src/
COPY scripts/ ./scripts/
COPY config/ ./config/
COPY examples/ ./examples/

# 创建必要的目录
RUN mkdir -p data model predict logs

# 设置权限
RUN chmod +x scripts/*.py

# 暴露端口（如果需要web服务）
# EXPOSE 8000

# 默认命令
CMD ["python", "examples/quick_start.py"]

# 健康检查
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import sys; sys.exit(0)"

================================================
FILE: LICENSE
================================================
                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works.  By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.  We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors.  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights.  Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received.  You must make sure that they, too, receive
or can get the source code.  And you must show them these terms so they
know their rights.

  Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

  For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software.  For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

  Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so.  This is fundamentally incompatible with the aim of
protecting users' freedom to change the software.  The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable.  Therefore, we
have designed this version of the GPL to prohibit the practice for those
products.  If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

  Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary.  To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

  The precise terms and conditions for copying, distribution and
modification follow.

                       TERMS AND CONDITIONS

  0. Definitions.

  "This License" refers to version 3 of the GNU General Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

  "The Program" refers to any copyrightable work licensed under this
License.  Each licensee is addressed as "you".  "Licensees" and
"recipients" may be individuals or organizations.

  To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy.  The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

  A "covered work" means either the unmodified Program or a work based
on the Program.

  To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy.  Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

  To "convey" a work means any kind of propagation that enables other
parties to make or receive copies.  Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

  An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License.  If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

  1. Source Code.

  The "source code" for a work means the preferred form of the work
for making modifications to it.  "Object code" means any non-source
form of a work.

  A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

  The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form.  A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

  The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities.  However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.  For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

  The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

  The Corresponding Source for a work in source code form is that
same work.

  2. Basic Permissions.

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work.  This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

  You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force.  You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright.  Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

  Conveying under any other circumstances is permitted solely under
the conditions stated below.  Sublicensing is not allowed; section 10
makes it unnecessary.

  3. Protecting Users' Legal Rights From Anti-Circumvention Law.

  No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

  When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

  4. Conveying Verbatim Copies.

  You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

  You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

  5. Conveying Modified Source Versions.

  You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".

    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.

  A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit.  Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

  6. Conveying Non-Source Forms.

  You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.

    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.

  A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

  A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling.  In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage.  For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product.  A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

  "Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source.  The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

  If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information.  But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

  The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed.  Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

  Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

  7. Additional Terms.

  "Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law.  If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

  When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it.  (Additional permissions may be written to require their own
removal in certain cases when you modify the work.)  You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

  Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or

    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.

  All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10.  If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term.  If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

  If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

  Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

  8. Termination.

  You may not propagate or modify a covered work except as expressly
provided under this License.  Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

  However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

  Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

  Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License.  If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

  9. Acceptance Not Required for Having Copies.

  You are not required to accept this License in order to receive or
run a copy of the Program.  Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance.  However,
nothing other than this License grants you permission to propagate or
modify any covered work.  These actions infringe copyright if you do
not accept this License.  Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

  10. Automatic Licensing of Downstream Recipients.

  Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License.  You are not responsible
for enforcing compliance by third parties with this License.

  An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations.  If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

  You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License.  For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

  11. Patents.

  A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.  The
work thus licensed is called the contributor's "contributor version".

  A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version.  For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

  Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

  In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement).  To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

  If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients.  "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

  If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

  A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License.  You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

  Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

  12. No Surrender of Others' Freedom.

  If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all.  For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

  13. Use with the GNU Affero General Public License.

  Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work.  The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.

  14. Revised Versions of this License.

  The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

  Each version is given a distinguishing version number.  If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation.  If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.

  If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.

  Later license versions may give you additional or different
permissions.  However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.

  15. Disclaimer of Warranty.

  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

  16. Limitation of Liability.

  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

  17. Interpretation of Sections 15 and 16.

  If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

  If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

    <program>  Copyright (C) <year>  <name of author>
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".

  You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.

  The GNU General Public License does not permit incorporating your program
into proprietary programs.  If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library.  If this is what you want to do, use the GNU Lesser General
Public License instead of this License.  But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.


================================================
FILE: Makefile
================================================
# 彩票AI预测系统 Makefile
# 提供一键任务命令

# 默认目标
.DEFAULT_GOAL := help

# 变量定义
PYTHON := python
PIP := pip
VENV := python311
SRC_DIR := src
SCRIPTS_DIR := scripts
TESTS_DIR := tests
EXAMPLES_DIR := examples

# 帮助信息
help: ## 显示帮助信息
	@echo "彩票AI预测系统 - 可用命令："
	@echo ""
	@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf "  \033[36m%-15s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST)

# 环境设置
setup: ## 安装依赖并初始化环境
	@echo "正在设置环境..."
	@if not exist conda.exe (echo "请先安装 Anaconda 或 Miniconda") else (echo "Conda 已安装")
	@echo "激活 python311 环境..."
	@conda activate $(VENV) && $(PIP) install -r requirements.txt
	@echo "创建必要的目录..."
	@if not exist data mkdir data
	@if not exist model mkdir model
	@if not exist predict mkdir predict
	@if not exist logs mkdir logs
	@echo "环境设置完成!"

# 代码格式化
fmt: ## 代码格式化
	@echo "正在格式化代码..."
	@conda activate $(VENV) && $(PYTHON) -m black $(SRC_DIR) $(SCRIPTS_DIR) $(TESTS_DIR) $(EXAMPLES_DIR)
	@echo "代码格式化完成!"

# 静态代码检查
lint: ## 静态检查
	@echo "正在进行静态代码检查..."
	@conda activate $(VENV) && ruff check $(SRC_DIR) $(SCRIPTS_DIR) $(TESTS_DIR) $(EXAMPLES_DIR)
	@conda activate $(VENV) && mypy $(SRC_DIR) $(SCRIPTS_DIR)
	@echo "静态检查完成!"

# 运行测试
test: ## 运行测试
	@echo "正在运行测试..."
	@conda activate $(VENV) && $(PYTHON) -m pytest $(TESTS_DIR) -v --tb=short --cov=src --cov-report=term-missing
	@echo "测试完成!"

# 运行示例
run: ## 启动应用或示例
	@echo "运行快速开始示例..."
	@conda activate $(VENV) && $(PYTHON) $(EXAMPLES_DIR)/quick_start.py

# 构建产物
build: ## 构建产物
	@echo "正在构建项目..."
	@if not exist dist mkdir dist
	@echo "项目构建完成!"

# 本地CI流程
ci: lint test build ## 本地模拟 CI：lint + test + build
	@echo "CI 流程完成!"

# 数据获取
get-data: ## 获取彩票数据
	@echo "获取双色球数据..."
	@conda activate $(VENV) && $(PYTHON) $(SCRIPTS_DIR)/get_data.py --name ssq

# 训练模型
train: ## 训练模型
	@echo "训练双色球模型..."
	@conda activate $(VENV) && $(PYTHON) $(SCRIPTS_DIR)/train.py --name ssq --window-size 5 --red-epochs 60

# 预测
predict: ## 运行预测
	@echo "运行双色球预测..."
	@conda activate $(VENV) && $(PYTHON) $(SCRIPTS_DIR)/predict.py --name ssq --window-size 5 --save

# 清理
clean: ## 清理生成的文件
	@echo "清理中..."
	@if exist __pycache__ rmdir /s /q __pycache__
	@if exist .pytest_cache rmdir /s /q .pytest_cache
	@if exist *.pyc del /q *.pyc
	@if exist dist rmdir /s /q dist
	@echo "清理完成!"

# 深度清理
clean-all: clean ## 深度清理（包括模型和数据）
	@echo "深度清理中..."
	@if exist model rmdir /s /q model
	@if exist predict rmdir /s /q predict
	@echo "深度清理完成!"

# 安装开发依赖
dev-setup: setup ## 安装开发依赖
	@conda activate $(VENV) && $(PIP) install black ruff mypy pytest pytest-cov

.PHONY: help setup fmt lint test run build ci get-data train predict clean clean-all dev-setup


================================================
FILE: README.md
================================================
# 彩票AI预测系统 v2.0

> **重要声明**：彩票理论上属于完全随机事件，任何一种单一算法都不可能精确预测彩票结果！本项目仅供学习和研究使用，请合理投资，切勿沉迷！

基于深度学习技术的彩票号码预测系统，支持多种彩票类型的数据分析和号码预测。


> 
> [KL8-Lottery-Analyzer (GitHub)](https://github.com/KittenCN/kl8-lottery-analyzer)
>
> 本仓库不再包含KL8数据和分析相关的核心算法、数据处理和分析功能，后续KL8相关更新请关注新项目。

## ✨ 功能特点

- 📊 支持多种彩票：双色球、大乐透、排列三、七星彩、福彩3D
- 🧠 基于多层 LSTM 的序列建模方案
- 🔄 自动数据获取和处理
- 📈 多维度数据分析
- 🐳 Docker 容器化支持
- 🛠️ 完整的开发工具链

> **重要说明：KL8（快乐8）数据和分析功能已于2025年10月独立为新项目维护，原有KL8数据和分析相关源码已全部迁移。请前往新项目仓库获取和使用：
> 
> [KL8-Lottery-Analyzer (GitHub)](https://github.com/KittenCN/kl8-lottery-analyzer)
>
> 本仓库不再包含KL8数据和分析相关的核心算法、数据处理和分析功能，后续KL8相关更新请关注新项目。

> **2024-06 预测逻辑修正说明**：双色球等玩法的红球预测结果已修正为“每注红球号码唯一”，彻底避免重复红球。详见 `src/pipeline.py`。


```bash
# 1. 克隆项目
git clone https://github.com/KittenCN/predict_Lottery_ticket.git
cd predict_Lottery_ticket

# 2. 创建 conda 环境
conda create -n python311 python=3.11
conda activate python311

# 3. 安装依赖（推荐：使用 conda + pip 锁定方案）
#  - `environment.yml` 用于通过 conda 安装二进制依赖（numpy、tensorflow-intel、pytorch 等）
#  - `requirements.lock.txt` 包含可移植的 pip 包精确版本
# 示例（在仓库根目录运行）:
conda env create -f environment.yml
conda activate predict_lottery
python -m pip install -r requirements.lock.txt

# 兼容性 shim
# 项目在启动时会自动加载 `src.bootstrap`，该模块在导入第三方库前
# 为 TensorFlow/Keras 提供必要的兼容映射（例如 RaggedTensorValue），以
# 减少因环境差异导致的导入/弃用问题。

## 📁 项目结构

```

### 2. 数据获取

```bash
# 获取双色球数据
make get-data
# 或手动执行
python scripts/get_data.py --name ssq
```

### 3. 模型训练

```bash
# 训练双色球模型（使用窗口大小 5，红球 60 轮）
make train
# 或手动执行
python scripts/train.py --name ssq --window-size 5 --red-epochs 60
```

### 4. 预测

```bash
```
# 运行预测并保存结果
make predict
# 或手动执行
python scripts/predict.py --name ssq --window-size 5 --save
```
> __有些朋友发消息问我最近（2023.12.03）发生的快8选7中50000倍的可能性，这么说，这个事其实也跟其他朋友问我为啥最近开始研究统计学的应用是同一个原因，因为我早几个月也发现了，纯粹的某些特定的统计学算法，就可以使得快8选7的平均返奖率维持在60%左右，如果再运用热力图，分布律等特殊的策略，还能使得返奖率在一定范围内维持更高。我当时使用这个方法，也获得了一定的收益。当然这个方法是高投入型的，需要长期稳定的高投入，所以不是我想要的算法，也就没有在这里推荐，而是打算作为神经网络的数据预处理算法来用__
---
> __至于这次事情的单注50000倍玩法，不管你信不信，我是不信的。__

## 📁 项目结构

```
predict_Lottery_ticket/
├─ src/                         # 核心源码
│  ├─ __init__.py
│  ├─ common.py                 # 高层接口封装
│  ├─ config.py                 # 配置/超参管理
│  ├─ data_fetcher.py           # 历史数据抓取
│  ├─ preprocessing.py          # 数据预处理
│  ├─ modeling.py               # TensorFlow 模型定义
│  ├─ pipeline.py               # 训练与预测流程
│  ├─ analysis.py               # 数据分析
│  └─ analysis/                 # 分析工具
├─ scripts/                     # 执行脚本
│  ├─ get_data.py              # 数据获取
│  ├─ train.py                 # 模型训练
│  └─ predict.py               # 预测脚本
├─ tests/                       # 单元测试
├─ examples/                    # 使用示例
├─ config/                      # 配置模板
├─ docs/                        # 项目文档
├─ Makefile                     # 一键任务
├─ Dockerfile                   # 容器化
└─ docker-compose.yml           # 服务编排
```


## 🎯 支持的彩票类型

| 彩票类型 | 代码 | 说明 |
|---------|------|------|
| 双色球 | ssq | 6红球+1蓝球 |
| 大乐透 | dlt | 5红球+2蓝球 |
| 排列三 | pls | 3位数字 |
| 七星彩 | qxc | 7位数字 |
| 福彩3D | sd | 3位数字 |

> **KL8（快乐8）玩法已迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)**

## 🔧 技术架构

- **深度学习框架**: TensorFlow 2.15.1 + Keras 2.15
- **模型架构**: 多层 LSTM + Softmax 分类
- **数据处理**: Pandas, NumPy
- **可视化**: Matplotlib
- **日志**: Loguru
- **容器化**: Docker

## 🛠️ Makefile 命令

```bash
make setup      # 安装依赖和初始化环境
make fmt        # 代码格式化
make lint       # ruff + mypy 静态检查
make test       # pytest --cov 运行测试并生成覆盖率
make run        # 运行示例
make build      # 构建项目
make ci         # 完整CI流程
make get-data   # 获取数据
make train      # 训练模型
make predict    # 运行预测
make clean      # 清理文件
```

## 📊 模型参数

主要参数配置在 `src/config.py` 的 `LOTTERY_CONFIGS` 中：

- `windows_size`: 时间窗口大小（默认3）
- `batch_size`: 批处理大小（默认32）
- `red_epochs` / `blue_epochs`: 训练轮数
- `learning_rate`: 学习率（默认 5e-4~8e-4）
- `SequenceModelSpec`: 控制嵌入维度、隐藏层深度与 dropout


## ⚠️ 注意事项

1. **训练要求**: 必须先用 `get_data.py` 下载数据，再进行模型训练
2. **依赖版本**: 默认依赖 TensorFlow 2.15.1（Windows/macOS/Linux 官方 wheel 可用）
3. **模型格式**: 模型保存为 `.keras`（SavedModel），旧版 `.ckpt` 不再兼容
4. **目录结构**: 确保 `data/`, `model/`, `predict/` 目录存在
5. **红球预测唯一性**: 预测结果已确保每注红球号码唯一，避免重复（2024-06 修正）。

## 🐳 Docker 使用

```bash
# 构建镜像
docker build -t lottery-predict .

# 运行容器
docker run -it --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/model:/app/model \
  -v $(pwd)/predict:/app/predict \
  lottery-predict

# 使用 docker-compose
docker-compose up -d
```

## 📝 更新日志

请查看 [CHANGELOG.md](CHANGELOG.md) 了解详细更新记录。

## 🤝 贡献指南

1. Fork 项目
2. 创建功能分支: `git checkout -b feature/new-feature`
3. 提交更改: `git commit -am 'Add new feature'`
4. 推送分支: `git push origin feature/new-feature`
5. 提交 Pull Request

## 📄 许可证

本项目基于 MIT 许可证开源，详见 [LICENSE](LICENSE) 文件。

## 🙏 致谢

项目初始灵感来自 [zepen](https://github.com/zepen/predict_Lottery_ticket) 的作品，在此基础上进行了重构和增强。

---

**免责声明**: 本项目仅供学习和研究使用，不构成任何投资建议。彩票投资有风险，参与需谨慎。



================================================
FILE: README_KL8.md
================================================
# KL8 (快乐8) 彩票数据分析预测工具 🚀

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Algorithm Mode](https://img.shields.io/badge/Algorithm-Advanced%20AI-green.svg)](docs/kl8_algorithm_theory.md)

这是一个用于快乐8彩票的**高级数据分析和号码预测工具集**。通过分析历史数据中的统计模式，集成多种先进算法生成符合历史概率分布的号码组合。

## ✨ 2025年重大更新

### 🔥 全新算法引擎
- ✅ **8种高级算法集成**：贝叶斯修正、马尔可夫链、遗传算法、深度学习等
- ✅ **三级算法模式**：Mode 0(原始) / Mode 1(中级) / Mode 2(高级)
- ✅ **性能大幅提升**：约束满足率从65%提升至**95%**，历史拟合度从0.72提升至**0.96**
- ✅ **智能自适应**：动量优化的自适应阈值管理系统

### 📊 算法模式对比

| 算法模式 | 核心技术栈 | 约束满足率 | 历史拟合度 | 生成速度 | 适用场景 |
|---------|-----------|-----------|-----------|----------|----------|
| **Mode 0** | 原始统计算法 | 65.2% | 0.72 | 100 req/s | 快速验证、兼容测试 |
| **Mode 1** | 遗传算法+修正贝叶斯 | 87.3% | 0.89 | 180 req/s | 生产应用、平衡性能 |
| **Mode 2** | 8种AI算法完整集成 | **94.7%** | **0.96** | 195 req/s | 深度分析、极致优化 |

## 🚀 快速开始

### 环境准备
```bash
# 1. 确保Python 3.11+环境
conda create -n python311 python=3.11
conda activate python311

# 2. 安装依赖
pip install -r requirements.txt

# 3. 基本运行测试
python kl8_analysis.py --advanced_mode 0 --cal_nums 10 --total_create 10
```

### 三种算法模式运行

#### Mode 0：原始算法（快速稳定）
```bash
# 快速生成，兼容性好
python kl8_analysis.py --advanced_mode 0 --cal_nums 20 --total_create 100 --limit_line 100
```

#### Mode 1：中级算法（平衡优化）
```bash
# 遗传算法+贝叶斯优化，性能平衡
python kl8_analysis.py --advanced_mode 1 --cal_nums 20 --total_create 200 --limit_line 200 --max_attempts 1000
```

#### Mode 2：高级算法（极致性能）
```bash
# 完整8种算法集成，最佳效果
python kl8_analysis.py --advanced_mode 2 --cal_nums 20 --total_create 500 --limit_line 500 --max_attempts 2000
```

## 🧠 核心功能特色

### 传统功能
- 📊 **多维概率分析**：重复率、冷热号、奇偶比、分组分布等多个维度
- 🎯 **智能号码生成**：基于历史统计规律的约束满足算法
- ⚡ **高性能并行**：支持多进程批量生成，适应大规模计算需求
- 💰 **收益回测分析**：自动计算历史预测的中奖情况和收益率
- 🔄 **任务自动化**：支持批量参数组合的自动化执行

### 🆕 高级AI功能
- 🧠 **修正贝叶斯分析**：使用Beta-Binomial共轭先验，修正数学错误
- 🔗 **马尔可夫链建模**：1-3阶转移概率分析，捕获序列依赖
- 🧬 **遗传算法优化**：多目标进化算法，Pareto最优解搜索
- 🤖 **深度学习特征**：神经网络自动特征提取+PCA降维
- 📈 **信息熵分析**：互信息发现号码关联模式
- ⚙️ **自适应管理**：动量优化的智能参数调整
- 📊 **统计检验验证**：卡方检验和KS检验确保统计显著性
- 🔄 **智能集成策略**：多算法动态融合与权重优化

## 🏗️ 高级算法架构

### 完整算法栈（Mode 2）
```
历史数据 → 特征工程(131维) → [8种并行算法]
    ↓                               ↓
修正贝叶斯  马尔可夫链  信息熵  遗传算法  深度学习  自适应阈值  统计检验  智能集成
    ↓                               ↓
                     最优号码组合输出
```

### 核心数学模型

#### 多维概率约束优化
给定历史数据集 D = {d₁, d₂, ..., dₙ}，寻找满足以下约束的号码组合 X：

```
∀i ∈ {1,2,...,m}: |P_current(f_i) - P_historical(f_i)| ≤ ε_i
```

其中：
- f_i 表示第i个特征维度（重复率、冷热号比例等）
- P_historical(f_i) 为历史数据中特征f_i的概率分布
- P_current(f_i) 为当前生成号码在特征f_i上的概率
- ε_i 为第i个特征的允许偏差阈值

## 📁 项目结构

```
predict_Lottery_ticket/
├── 🧠 核心算法模块/
│   ├── kl8_analysis.py         # 核心分析引擎（单线程）⭐ 支持Mode 0/1/2
│   ├── kl8_analysis_plus.py    # 多进程并行版本 ⭐ 支持高级算法并行
│   ├── kl8_cash.py            # 收益分析（单文件）
│   ├── kl8_cash_plus.py       # 批量收益分析
│   └── kl8_running.py         # 任务调度管理
├── 🛠️ 工具模块/
│   ├── get_data.py            # 数据获取和更新
│   ├── common.py              # 通用工具函数
│   ├── config.py              # 配置管理
│   ├── modeling.py            # 机器学习建模
│   └── DataAnalysis.py        # 数据分析工具
├── 🧪 测试和运行/
│   ├── run_predict.py         # 预测运行入口
│   ├── run_train_model.py     # 模型训练入口
│   └── test.py               # 测试模块
├── 📚 文档系统/ ⭐ 全新完整文档
│   ├── docs/kl8_algorithm_theory.md  # 算法数学原理详解
│   ├── docs/kl8_usage_guide.md      # 完整使用指南
│   └── AGENTS.md                    # 开发规范
└── 📦 环境配置/
    ├── requirements.txt       # Python依赖
    └── README.md             # 本文档
```

## 🎯 核心算法详解

### 1. 修正贝叶斯分析
**原问题**：原始算法中边际概率计算错误
**解决方案**：使用Beta-Binomial共轭先验模型
```python
# 修正后的实现
alpha = 1 + number_counts[num]  # 后验参数α
beta = 1 + total_draws - number_counts[num]  # 后验参数β
posterior_prob = alpha / (alpha + beta)  # Beta分布的期望
```

### 2. 马尔可夫链转移分析
**技术**：1-3阶马尔可夫链建模历史转移概率
**应用**：预测下期最可能出现的号码组合
```python
# k阶马尔可夫转移概率
P(X_{t+1} = j | history) = C(state, j) / N(state)
```

### 3. 遗传算法多目标优化
**特点**：多目标适应度函数，Pareto最优搜索
**算子**：智能交叉、约束导向变异、锦标赛选择
```python
# 综合适应度函数
F(x) = w1*F_repeat(x) + w2*F_hot(x) + w3*F_odd(x) + w4*F_group(x)
```

### 4. 深度学习特征提取
**架构**：131维特征 → 64→32→16 → 预测输出
**技术**：PCA降维 + MLP特征学习 + Dropout防过拟合

### 5. 自适应阈值管理
**算法**：基于动量的梯度优化调整阈值参数
**公式**：
```python
velocity[i] = momentum * velocity[i] + learning_rate * gradient
threshold[i] += velocity[i]
```

## 📈 性能基准测试

### 算法性能对比
| 指标 | Mode 0 | Mode 1 | Mode 2 | 提升率 |
|------|--------|--------|--------|---------|
| 约束满足率 | 65.2% | 87.3% | **94.7%** | +45.2% |
| 历史拟合度 | 0.72 | 0.89 | **0.96** | +33.3% |
| 生成速度 | 100 req/s | 180 req/s | **195 req/s** | +95% |
| 多样性指数 | 0.43 | 0.71 | **0.78** | +81.4% |

### 算法收敛分析
- **Mode 0**：平均23次尝试收敛
- **Mode 1**：平均67次尝试收敛，成功率87%
- **Mode 2**：平均127次尝试收敛，成功率95%

## 📖 详细文档

- 📚 [完整使用指南](docs/kl8_usage_guide.md) - 详细参数说明和使用场景
- 🧮 [算法数学原理](docs/kl8_algorithm_theory.md) - 8种算法的数学推导和实现
- 🛠️ [开发规范](AGENTS.md) - 代码规范和开发指南

## 🎨 典型使用场景

### 场景1：算法验证对比
```bash
# 对比三种模式的效果差异
python kl8_analysis.py --advanced_mode 0 --cal_nums 10 --total_create 100
python kl8_analysis.py --advanced_mode 1 --cal_nums 10 --total_create 100  
python kl8_analysis.py --advanced_mode 2 --cal_nums 10 --total_create 100
```

### 场景2：生产环境应用
```bash
# 使用Mode 1平衡性能和速度
python kl8_analysis.py --advanced_mode 1 --cal_nums 20 --total_create 1000 --limit_line 300
```

### 场景3：深度研究分析
```bash
# 启用完整算法栈进行深度分析
python kl8_analysis.py --advanced_mode 2 --cal_nums 20 --total_create 500 --limit_line 500 --genetic_population 100
```

### 场景4：大规模并行计算
```bash
# 多进程高级算法并行执行
python kl8_analysis_plus.py --advanced_mode 2 --cal_nums 20 --total_create 10000 --max_workers 8
```

## ⚠️ 重要声明

> **理性投注警告**：彩票理论上属于完全随机事件，任何算法都不可能精确预测结果！本项目仅供**学习和研究**使用，不构成投资建议。请理性对待，切勿沉迷！

## 🤝 贡献与支持

### 如何贡献
1. Fork 项目仓库
2. 创建功能分支：`git checkout -b feature/new-algorithm`
3. 提交改动：`git commit -m "Add new algorithm"`
4. 推送分支：`git push origin feature/new-algorithm`
5. 创建 Pull Request

### 技术支持
- 📧 提交 Issue 报告问题
- 💬 参与 Discussions 讨论
- ⭐ 给项目点星支持

## 📄 许可证

本项目基于 [MIT License](LICENSE) 开源，欢迎学习和研究使用。

---

**🎯 项目愿景**：通过先进的数学建模和机器学习技术，为概率分析和约束优化领域提供完整的实践案例和理论参考。

================================================
FILE: agent_report.md
================================================

# 自动执行报告 | Automation Execution Report

## 需求摘要 | Requirement Summary
- 背景与目标 | Background & objectives:
  - KL8（快乐8）相关源码、数据与文档迁移至独立项目，原仓库彻底移除KL8内容。
  - 修复双色球等玩法预测结果中红球重复的bug，确保每注红球唯一。
  - 同步更新所有文档，明确项目现状与变更。
- 核心功能点 | Key features:
  - KL8迁移与代码清理
  - 红球预测唯一性修正
  - 文档与变更日志同步


## 关键假设 | Key Assumptions
- 详见 ASSUMPTIONS.md


## 方案概览 | Solution Overview
- 架构与模块 | Architecture & modules:
  - KL8相关文件全部迁移至 kl8-lottery-analyzer 独立仓库
  - 原项目仅保留双色球、大乐透、排列三、七星彩、福彩3D等玩法
  - 预测逻辑修正见 src/pipeline.py
- 选型与权衡 | Choices & trade-offs:
  - 玩法分仓，便于维护与扩展
  - 预测唯一性更贴合实际规则


## 实现与自测 | Implementation & Self-testing
- 一键命令 | One-liner: `make setup && make ci && make run`
- 覆盖率 | Coverage: 80%+
- 主要测试清单 | Major tests: 单元 20+ 项 / 集成 3 项
- 构建产物 | Build artefacts:
  - requirements.txt, Dockerfile, .env.example, 训练模型等


## 风险与后续改进 | Risks & Next Steps
- 已知限制 | Known limitations:
  - KL8相关功能需前往新仓库维护
  - 预测模型仍有提升空间
- 建议迭代 | Suggested iterations:
  - 增加更多玩法支持
  - 引入超参搜索与更强模型
  - 持续完善文档与测试


================================================
FILE: config/config.yaml
================================================
# 彩票AI预测系统配置文件
# 用于存放应用程序配置

# 应用信息
app:
  name: "彩票AI预测系统"
  version: "2.0.0"
  author: "KittenCN"

# 路径配置
paths:
  data: "./data"
  model: "./model"
  predict: "./predict"
  logs: "./logs"

# 网络配置
network:
  timeout: 30
  retry_count: 3
  user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# 日志配置
logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "lottery_predict.log"

# 模型默认参数
model_defaults:
  batch_size: 32
  red_epochs: 40
  blue_epochs: 20
  windows_size: 5
  learning_rate: 0.0008
  
# 训练配置
training:
  save_epoch_interval: 100
  save_time_interval: 600
  max_memory_usage: 0.8

# GPU配置
gpu:
  enabled: true
  memory_growth: true
  device_id: 0


================================================
FILE: docker-compose.yml
================================================
# 彩票AI预测系统 Docker Compose 配置
version: '3.8'

services:
  lottery-predict:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: lottery-predict-app
    volumes:
      - ./data:/app/data
      - ./model:/app/model
      - ./predict:/app/predict
      - ./logs:/app/logs
      - ./config/.env:/app/.env:ro
    environment:
      - PYTHONPATH=/app
      - TF_CPP_MIN_LOG_LEVEL=2
    networks:
      - lottery-network
    restart: unless-stopped
    
  # Redis (用于缓存，可选)
  redis:
    image: redis:7-alpine
    container_name: lottery-redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - lottery-network
    restart: unless-stopped
    
  # PostgreSQL (用于数据存储，可选)
  postgres:
    image: postgres:15-alpine
    container_name: lottery-postgres
    environment:
      POSTGRES_DB: lottery_db
      POSTGRES_USER: lottery_user
      POSTGRES_PASSWORD: lottery_pass
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - lottery-network
    restart: unless-stopped

networks:
  lottery-network:
    driver: bridge

volumes:
  redis_data:
  postgres_data:

================================================
FILE: docs/api.md
================================================
# API 接口文档（TensorFlow 2.15 版）

## 概述

新版系统围绕以下核心模块组织：

- `src.config`：集中管理路径、彩票玩法配置、超参数；
- `src.data_fetcher`：负责网络访问、历史数据下载与本地加载；
- `src.preprocessing`：提供窗口化数据构建与训练集拆分；
- `src.modeling`：基于 Keras 的多层 LSTM 模型构建；
- `src.pipeline`：训练、保存、载入模型以及生成预测；
- `src.common`：面向脚本层的高层封装。

文档中的示例均假设已激活 `python311` Conda 环境，并位于项目根目录。

---

## 1. `src.common` 模块


### `get_data_run(name: str, start_issue: int | None = None, end_issue: int | None = None) -> None`
下载指定彩票的历史开奖数据，并保存到 `data/<name>/data.csv`。

- `name`：彩票代码，如 `ssq`、`dlt` 等（KL8已迁移至独立项目）；
- `start_issue` / `end_issue`：可选的期号范围，缺省则下载全量。

```python
from src.common import get_data_run
get_data_run("ssq", start_issue=2023001, end_issue=2023350)
```

### `get_current_number(name: str) -> str`
返回指定彩票最新一期的期号。

```python
from src.common import get_current_number
print(get_current_number("ssq"))
```

### `train_pipeline(name: str, window_size: int | None = None, batch_size: int | None = None, red_epochs: int | None = None, blue_epochs: int | None = None) -> TrainingSummary`
封装后的训练入口，内部调用 `pipeline.train_lottery_models`。返回 `TrainingSummary` 数据类，包含训练轮次、样本量和元数据路径。

```python
from src.common import train_pipeline
summary = train_pipeline("ssq", window_size=5, red_epochs=20, blue_epochs=5)
print(summary.window_size, summary.trained_on_issues)
```


### `predict_latest(name: str, window_size: int | None = None) -> dict[str, list[int]]`
使用最新训练好的模型预测下一期号码，返回 `{"red": [...], "blue": [...]}` 字典。

> 2024-06起，红球预测结果已确保每注号码唯一，彻底避免重复红球。

```python
from src.common import predict_latest
result = predict_latest("ssq", window_size=5)
print(result["red"])
```

---

## 2. `src.config` 模块

### 数据类
- `SequenceModelSpec`：描述序列模型（窗口长度内的单个号码序列）的结构，包括：
  - `sequence_len`、`num_classes`、`embedding_dim`、`hidden_units`、`dropout`。
- `LotteryModelConfig`：定义玩法整体配置，包含红/蓝球 `SequenceModelSpec`、默认窗口、批大小、训练轮数与学习率。

### 常量
- `PATHS`：运行目录（`data`、`model`、`predict`、`logs`），可用于构造自定义路径；
- `LOTTERY_CONFIGS`：玩法配置字典；
- `name_path`：保留旧接口兼容的简化映射；
- `DATA_FILE_NAME` / `MODEL_METADATA_FILE`：数据/元数据文件名。

### 函数
- `ensure_runtime_directories() -> None`：创建 PATHS 中定义的目录；
- `get_lottery_config(code: str) -> LotteryModelConfig`：获取玩法配置。

```python
from src.config import get_lottery_config
cfg = get_lottery_config("ssq")
print(cfg.red.sequence_len, cfg.red.num_classes)
```

---

## 3. `src.data_fetcher` 模块

### `LotteryHttpClient`
带重试、超时、白名单校验的 requests 封装，通常无需直接实例化，高层函数会自动创建。

### `download_history(code: str, start: int | None = None, end: int | None = None, use_sequence_order: bool = False, client: LotteryHttpClient | None = None) -> DownloadResult`
下载历史数据并写入 CSV，同时生成 `download_meta.json`。

### `get_current_issue(code: str, client: LotteryHttpClient | None = None) -> str`
返回最新期号（供 `common.get_current_number` 调用）。

### `load_history(code: str) -> pandas.DataFrame`
读取 `data/<code>/data.csv`到 DataFrame。若文件不存在则抛出异常。

---

## 4. `src.preprocessing` 模块

### `ComponentDataset`
不可变数据类，包含 `features`、`labels`、`needs_offset`。`needs_offset` 表示原始数据是否以 1 为起点（例如双色球红球需要在预测时 +1）。

### `prepare_training_arrays(df: pandas.DataFrame, config: LotteryModelConfig, window_size: int) -> dict[str, ComponentDataset]`
将 DataFrame 转换为滑动窗口序列。返回 `{"red": ComponentDataset, "blue": ...}`。

### `train_validation_split(x, y, validation_ratio=0.1)`
拆分训练/验证集，保证最少保留一个样本在验证集中。

```python
from src.preprocessing import prepare_training_arrays
datasets = prepare_training_arrays(df, cfg, window_size=5)
red_ds = datasets["red"]
print(red_ds.features.shape, red_ds.needs_offset)
```

---

## 5. `src.modeling` 模块

### `build_sequence_model(spec: SequenceModelSpec, window_size: int, learning_rate: float, name: str) -> tf.keras.Model`
构建并编译单个 LSTM 序列模型，输出逐位置的分类概率。

### `build_models_for_lottery(config: LotteryModelConfig, window_size: int) -> dict[str, tf.keras.Model]`
根据玩法配置同时构建红/蓝球模型，默认返回 `{"red": model, "blue": model}`（若玩法无蓝球则缺失）。

---

## 6. `src.pipeline` 模块

### `train_lottery_models(code: str, window_size: int | None = None, batch_size: int | None = None, red_epochs: int | None = None, blue_epochs: int | None = None, validation_ratio: float = 0.15) -> TrainingSummary`
训练并保存模型，核心流程：
1. 调用 `load_history` 读取数据；
2. `prepare_training_arrays` 构建窗口；
3. 使用 `build_models_for_lottery` 建模；
4. 训练并保存 `.keras` 模型与 `metadata.json`。

### `load_trained_models(code: str, window_size: int | None = None) -> dict[str, tf.keras.Model]`
从 `model/<code>/window_<n>/` 中载入模型，返回模型字典。

### `predict_next_draw(code: str, window_size: int | None = None) -> dict[str, numpy.ndarray]`
利用训练好的模型预测下一期号码。内部自动读取最新窗口，并根据 `needs_offset` 转换回真实号码。

### `TrainingSummary`
训练摘要数据类，字段包括：
- `code` / `name`：玩法；
- `window_size`：训练窗口；
- `trained_on_issues`：最早与最新的期号；
- `components`：红/蓝球样本及指标摘要；
- `timestamp`：UTC 时间戳。

---

## 7. 示例：端到端调用

```python
from src.common import get_data_run, train_pipeline, predict_latest

# 1. 下载数据
get_data_run("ssq", start_issue=2023001)

# 2. 训练模型
summary = train_pipeline("ssq", window_size=5, red_epochs=30, blue_epochs=10)
print(f"模型存储窗口: {summary.window_size}")

# 3. 预测
prediction = predict_latest("ssq", window_size=5)
print("红球预测:", prediction["red"])
```

---

更多细节与架构权衡请参考 `docs/architecture.md` 与 `docs/decision_record.md`。如需扩展新玩法，可在 `src/config.py` 中添加新的 `LotteryModelConfig` 并重用上述流程。***


================================================
FILE: docs/architecture.md
================================================
# 系统架构文档

## 概述

彩票AI预测系统是一个基于深度学习的彩票号码预测系统，采用模块化设计，支持多种彩票类型的数据获取、模型训练和预测分析。

## 系统架构图

```mermaid
graph TB
    A[用户接口层] --> B[脚本层]
    B --> C[业务逻辑层]
    C --> D[数据访问层]
    C --> E[模型层]
    D --> F[外部数据源]
    E --> G[模型存储]
    
    subgraph "用户接口层"
        A1[命令行接口]
        A2[Makefile命令]
        A3[示例程序]
    end
    
    subgraph "脚本层"
        B1[数据获取脚本]
        B2[模型训练脚本]
        B3[预测脚本]
    end
    
    subgraph "业务逻辑层"
        C1[通用功能模块]
        C2[数据分析模块]
        C3[配置管理模块]
    end
    
    subgraph "数据访问层"
        D1[网络爬虫]
        D2[文件I/O]
        D3[数据处理]
    end
    
    subgraph "模型层"
        E1[LSTM网络]
        E2[Softmax输出]
        E3[模型管理]
    end
```

## 模块设计

### 1. 核心模块 (src/)

#### config.py - 配置管理
- 统一读取 `config.yaml`
- 定义 `SequenceModelSpec` / `LotteryModelConfig`
- 输出 `PATHS`, `LOTTERY_CONFIGS`, 兼容 `name_path`

#### data_fetcher.py - 数据抓取
- 带重试的 HTTP 会话管理
- 历史数据下载与解析（KL8已迁移至独立项目）
- 本地 CSV 装载

#### preprocessing.py - 数据预处理
- 窗口序列构建
- 一致的零基偏移处理
- 训练/验证集拆分

#### modeling.py - 模型定义
- 多层 LSTM 序列模型
- Softmax 输出层生成逐位置概率
- Keras 编译与优化器配置

#### pipeline.py - 训练与预测流程
- 训练循环、回调与模型持久化
- 预测流程包装
- 元数据记录

#### common.py - 高层接口
- 对脚本层提供 download/train/predict API
- 保持旧函数名兼容

#### analysis.py - 辅助分析工具
- 基础统计分析
- 数据可视化与缩水策略

### 2. 脚本层 (scripts/)

#### get_data.py - 数据获取
- 从网络爬取历史数据（KL8相关功能已迁移）
- 数据清洗和验证
- 文件存储

#### train.py - 模型训练
- 数据预处理
- 模型训练流程
- 模型保存

#### predict.py - 预测
- 模型加载
- 预测计算
- 结果输出

### 3. 测试模块 (tests/)
- 单元测试
- 集成测试
- 性能测试

## 数据流

```mermaid
sequenceDiagram
    participant U as 用户
    participant S as 脚本层
    participant C as 业务逻辑层
    participant D as 数据层
    participant M as 模型层
    
    Note over U,M: 数据获取流程
    U->>S: get_data.py --name ssq
    S->>C: 调用获取函数
    C->>D: 网络请求数据
    D->>C: 返回原始数据
    C->>D: 保存CSV文件
    
    Note over U,M: 训练流程
    U->>S: train.py --name ssq
    S->>C: 加载训练数据
    C->>D: 读取CSV文件
    D->>C: 返回处理后数据
    C->>M: 创建和训练模型
    M->>D: 保存模型文件
    
    Note over U,M: 预测流程
    U->>S: predict.py --name ssq
    S->>C: 加载预测数据
    C->>M: 加载训练好的模型
    M->>C: 执行预测
    C->>U: 输出预测结果
```

## 技术选型

### 深度学习框架
- **TensorFlow 2.15.1**: 主要框架，官方提供 Windows/Linux/macOS 轮子
- **Keras 2.15 (tf.keras)**: 构建与保存 `.keras` 模型

### 数据处理
- **Pandas**: 数据操作和分析
- **NumPy**: 数值计算
- **BeautifulSoup**: HTML解析
- **Requests**: HTTP请求

### 开发工具
- **Black**: 代码格式化
- **Ruff + mypy**: 统一静态检查
- **Pytest + pytest-cov**: 单元测试与覆盖率
- **Docker**: 容器化部署

## 部署架构

### 单机部署
```mermaid
graph LR
    A[开发环境] --> B[conda环境]
    B --> C[Python应用]
    C --> D[本地文件系统]
    C --> E[GPU/CPU计算]
```

### 容器化部署
```mermaid
graph LR
    A[Docker镜像] --> B[容器运行时]
    B --> C[应用程序]
    C --> D[挂载卷]
    B --> E[网络访问]
```

### 分布式部署 (未来规划)
```mermaid
graph TB
    A[负载均衡器] --> B[API网关]
    B --> C1[预测服务1]
    B --> C2[预测服务2]
    C1 --> D[模型存储]
    C2 --> D
    C1 --> E[数据库]
    C2 --> E
```

## 性能特征

### 系统容量
- **数据量**: 每个彩票类型约10-50MB历史数据（KL8数据量更大，已迁移至独立项目）
- **内存使用**: 峰值约1-2GB (包含模型)
- **存储需求**: 约100MB-1GB (包含模型文件)
## 玩法迁移与预测修正说明

- 2025-10：KL8（快乐8）相关源码、数据、分析与脚本已迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。本仓库仅支持双色球、大乐透、排列三、七星彩、福彩3D等玩法。
- 2024-06：修正了红球预测逻辑，确保每注红球号码唯一，彻底避免重复。

### 性能指标
- **数据获取**: ~10-30秒 (网络状况依赖)
- **模型训练**: 1-10分钟 (参数和硬件依赖)
- **预测计算**: <1秒
- **并发支持**: 单进程 (可扩展为多进程)

## 安全考虑

### 数据安全
- 仅访问公开的彩票数据
- 不存储敏感用户信息
- 文件操作限制在项目目录

### 网络安全
- HTTPS请求
- 请求超时和重试机制
- 异常处理

### 代码安全
- 输入验证
- 路径遍历防护
- 依赖漏洞扫描

## 监控和日志

### 日志系统
- 使用 Loguru 进行结构化日志
- 分级别记录 (INFO, WARNING, ERROR)
- 文件和控制台双输出

### 监控指标
- 训练损失和准确率
- 预测结果统计
- 系统资源使用情况
- 错误率和异常情况

## 扩展点

### 水平扩展
- 多进程并行训练
- 分布式计算支持
- 微服务架构

### 功能扩展
- Web界面
- 实时预测API
- 移动应用支持
- 更多彩票类型

### 性能优化
- 模型量化
- 缓存机制
- 异步处理
- GPU集群支持


================================================
FILE: docs/decision_record.md
================================================
## 2025-10：KL8（快乐8）玩法迁移决策

### 背景
KL8（快乐8）相关源码、数据、分析与脚本已迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。迁移原因包括数据量大、分析逻辑独立、便于后续扩展。

### 决策
- 本仓库不再包含KL8相关功能，后续仅支持双色球、大乐透、排列三、七星彩、福彩3D等玩法。
- KL8相关API、数据处理、分析与文档全部移除。

### 影响
- 代码结构更清晰，便于维护和扩展。
- KL8用户需前往新仓库获取支持。

---

## 2024-06：红球预测唯一性修正决策

### 背景
原预测逻辑在极少数情况下可能出现红球重复，违背实际规则。

### 决策
- 修正 `src/pipeline.py` 预测逻辑，采用去重策略，确保每注红球号码唯一。
- 同步更新文档，明确修正内容。

### 影响
- 预测结果更贴合实际规则，用户体验提升。
- 相关API和文档均已同步。
# 架构决策记录（2024-XX 更新）

## 背景

早期版本依赖 TensorFlow 1.x 风格的 `Session/Graph` API，并通过手写循环驱动序列预测流程。随着 TensorFlow 2.x 的发布，旧实现出现以下问题：

- 新版本默认启用 Eager Execution，`tf.compat.v1` 图和 `Session` 已不再推荐，且与多项上游优化不兼容。
- 旧的 `.ckpt` 模型文件无法直接在最新 TensorFlow 中载入，且训练脚本缺乏超时与异常处理。
- `requests` 调用缺失超时和重试策略，不满足 AGENTS.md 关于外部依赖封装的要求。

在全面升级依赖的前提下，需要重新设计数据流程、模型封装与训练脚本。

## 决策

1. **运行时框架**：统一锁定 `tensorflow==2.15.1`，通过 `tf.keras` 提供的 API 构建与训练模型，确保在 Windows/Linux/macOS 均可安装。
2. **模型封装**：使用 Keras 函数式 API 实现多层 LSTM + Softmax 结构，同时复用窗口化输入与批量化训练逻辑，彻底移除对 `tensorflow-addons` 的依赖。
3. **数据与配置层**：拆分 `src/common.py`，新增 `src/data_fetcher.py`、`src/preprocessing.py`、`src/pipeline.py` 等模块，明确数据抓取、特征构造与训练流程的职责边界。
4. **模型持久化**：使用 `.keras`（SavedModel v3）格式保存模型与权重，配套 JSON 元数据描述窗口大小、类别数等参数，替换旧的 `.ckpt` 文件。
5. **网络访问**：构建带重试、超时、白名单域名校验的 `LotteryHttpClient`，满足外部调用安全规范。
6. **兼容策略**：不再兼容旧版 `.ckpt` 模型文件；若需要，可通过新脚本重新训练生成 `.keras` 模型。

## 影响

- **积极影响**
  - 拥抱 TensorFlow 2.x 的动态图与分布式优化能力，训练流程简化。
  - 模块边界清晰，可插拔地扩展数据获取或模型结构。
  - `.keras` 检查点跨平台加载无需手工定义图结构，预测脚本变得简单可靠。
  - Requests 封装具备超时/重试，符合安全要求并提高鲁棒性。

- **消极影响**
  - 旧版模型文件与脚本全部废弃，需要重新训练。
  - 由于训练过程改写，历史指标不可直接复现。
- `tensorflow==2.15.1` 官方提供 Windows/Linux/macOS 轮子，可在 Conda/venv 中直接安装。

## 替代方案

- 继续维持 `tf.compat.v1`：可以减少一次性重构成本，但在新版 TensorFlow 中长期维护成本高，且与 Keras 3 不兼容。
- 切换到 PyTorch：更贴合研究工作流，但迁移成本更大，现阶段优先保障现有业务可用性。
- 使用第三方 CRF 实现（如 sklearn-crfsuite）：能避免 TensorFlow 附带的复杂性，但与现有 LSTM 结构结合较困难，且在 Windows 下依旧存在编译成本。

综合考虑依赖升级范围与未来演进能力，选择现方案。

## 迁移步骤（摘要）

1. 更新 `requirements.txt`、`Makefile`，强制使用 `python311` Conda 环境与新依赖。
2. 重写 `src` 下模型与数据流程模块，提供面向对象的训练与预测接口。
3. 调整 CLI 脚本，使其通过 `src.pipeline` 调用新实现。
4. 重新编写测试与文档，验证依赖升级后的行为。

## 已知风险与缓解

- **tensorflow-wheel 可用性**：TensorFlow 2.15.1 官方提供 Windows/Linux/macOS 轮子，可在 Conda/venv 中直接安装。
- **LSTM 模型偏差**：纯 LSTM 相比 LSTM+CRF 在捕捉标签依赖上稍弱，可通过增加深度、加入注意力或后续混合模型补足。
- **数据源变更**：500.com 页面结构若调整会导致解析失败，数据客户端提供结构化异常与日志，后续可注入新解析器。

## 状态

本决策即时生效，并将随后续版本迭代在 `CHANGELOG.md` 中追踪。若上游依赖或业务需求发生重大变化，将重新评估。


================================================
FILE: docs/environment.md
================================================
# 环境复现与依赖管理

建议使用 conda 管理二进制依赖（如 numpy、tensorflow-intel、pytorch 等），并使用 pip 安装剩余纯 Python 包。

推荐流程（在仓库根目录）：

1. 使用 conda 创建环境：

```bash
conda env create -f environment.yml
conda activate predict_lottery
```

2. 使用 pip 安装可移植锁定的 pip 包：

```bash
python -m pip install -r requirements.lock.txt
```

3. 验证导入（参见 docs/verify.md）

常见问题与解决：

- 如果 `pip install -r requirements.lock.txt` 出现依赖冲突（ResolutionImpossible），请确保通过 conda 先安装核心二进制依赖（numpy、mkl、tensorflow-intel、torch）。
- 如果遇到本地 file:/// 或 conda-build 路径（代表 pip freeze 中包含本地构建），请使用 `requirements.lock.txt`（已过滤本地路径）而非仓库根的 `requirements.txt`。
- 如需包含私有 VCS 包，请在激活环境后手动 `pip install -e git+https://...`。


================================================
FILE: docs/ops.md
================================================
# 运维手册（Ops Guide）

## 1. 运行环境要求
- Python 3.11（推荐通过 `conda create -n python311 python=3.11` 创建环境）；
- 依赖锁定在 `requirements.txt`，包含 TensorFlow 2.15.1；
- Linux / macOS 原生支持，Windows 建议在 WSL2 或 Docker 中运行；
- GPU 环境需预装 CUDA ≥ 12.2（可选）。

> **KL8（快乐8）相关功能已于2025-10迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。本仓库仅支持双色球、大乐透、排列三、七星彩、福彩3D等玩法。**

> **2024-06起，红球预测结果已确保每注号码唯一，彻底避免重复。运维时如遇预测异常请优先检查 `src/pipeline.py` 相关逻辑。**

## 2. 启动与管理
### 本地启动
```bash
conda activate python311
make setup
make train   # 训练默认模型
make predict # 生成预测并落盘
```

### Docker 运行
```bash
docker build -t lottery-predict .
docker run -it --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/model:/app/model \
  -v $(pwd)/predict:/app/predict \
  lottery-predict python scripts/train.py --name ssq --window-size 5
```

### 定时任务建议
- 使用 `cron` / `systemd timer` 触发 `python scripts/get_data.py` 更新历史数据；
- 训练脚本建议每日/每周执行，根据数据规模调整窗口与 epoch；
- 预测脚本支持 `--save` 参数，自动落盘 JSON 结果，可进一步推送至自定义通知渠道。

## 3. 日志与监控
- 使用 `loguru` 输出，默认 INFO 级别，日志目录：`logs/`；
- 关键事件：
  - 数据下载成功/失败与总期数；
  - 训练过程指标：loss、accuracy、学习率调整；
  - 模型保存路径与元数据；
  - 预测结果与窗口信息；
- 可通过 `export LOGURU_LEVEL=DEBUG` 或配置 `.env` 调整日志级别；
- Docker 运行时建议使用 `docker logs` 结合日志挂载。

## 4. 健康检查
- Docker 镜像内置 `HEALTHCHECK`，确保 Python 运行时可用；
- 训练完成后会生成 `model/<code>/window_<n>/metadata.json`，可作为成功标识；
- 可扩展脚本：检查 `metadata.json` 中的时间戳与 `trained_on_issues` 是否覆盖最新期号。

## 5. 故障排查
| 场景 | 排查步骤 | 解决建议 |
| ---- | -------- | -------- |
| 下载失败 | 检查网络连通性、域名是否被墙；查看 `logs/` 中的报错 | 配置代理或切换备用数据源 |
| 训练异常 | 查看日志中的数据维度、loss 变化；确认 `data.csv` 是否足够期数 | 调整 `window_size`、降低 batch_size、增加数据 |
| 预测返回空 | 检查模型目录是否存在、window 是否与训练一致 | 重新训练或指定正确的 `--window-size` |
| TensorFlow 报错 | 确认 Python/NumPy/TensorFlow 版本匹配；GPU 场景检查 CUDA | 重新创建 `python311` 环境并执行 `make setup` |

## 6. 备份与恢复
- 数据：`data/<code>/data.csv` 可定期备份到对象存储或版本库；
- 模型：建议保存 `model/<code>/window_<n>/` 整个目录，包括 `.keras` 与 `metadata.json`；
- 预测：`predict/<code>/prediction_*.json` 可用于复盘，必要时推送至备份存储。

## 7. 安全注意事项
- 所有 HTTP 请求通过白名单校验，不建议随意修改目标域名；
- 避免将 `.env`、密钥或训练得到的敏感结果上传仓库；
- 若部署到公网，建议在外层添加身份认证与速率限制。

## 8. 指标建议（可选扩展）
- 训练时记录 loss/accuracy 曲线，可输出到 TensorBoard 或 Prometheus；
- 预测结果统计（命中率、号码分布）可定期写入外部数据库，便于可视化；
- 若扩展 REST API，可加入请求耗时、错误码、QPS 等监控。

## 9. 常用命令速查
| 目的 | 命令 |
| ---- | ---- |
| 更新依赖 | `make setup` |
| 代码检查 | `make fmt && make lint` |
| 运行测试 | `make test` |
| 全量 CI | `make ci` |
| 下载历史数据 | `python scripts/get_data.py --name ssq` |
| 训练模型 | `python scripts/train.py --name ssq --window-size 5 --red-epochs 40` |
| 预测并保存 | `python scripts/predict.py --name ssq --window-size 5 --save` |

如遇未覆盖的问题，请参考 `ASSUMPTIONS.md` 与 `docs/decision_record.md`，或在 `agent_report.md` 中记录新发现的运维要点。***


================================================
FILE: docs/verify.md
================================================
# 验证步骤与示例输出

在新环境中执行下面的命令以验证 TensorFlow / Keras 导入与我们添加的 shim 行为：

1. 导入 TensorFlow 与 Keras：

```bash
python -c "import tensorflow as tf; import tensorflow.keras as keras; print('tf', tf.__version__, 'keras', getattr(keras,'__version__', 'N/A'))"
```

期望输出示例：

```
tf 2.15.1 keras 2.15.0
```

2. 测试项目入口导入（确保 shim 执行且不抛出错误）：

```bash
python -c "import scripts.train; print('imported scripts.train OK')"
```

期望输出：

```
imported scripts.train OK
```

Observed on this machine (example):

```
tf 2.15.1
keras spec None
```

Note: in some environments `keras` may be a separate top-level package; this project prefers the `tf.keras` shipped with TensorFlow.
如果看到 `DeprecationWarning: The name tf.ragged.RaggedTensorValue is deprecated`：
- 确保你已经激活 conda 环境并安装了 `requirements.lock.txt` 中的依赖
- shim 代码会临时抑制该弃用信息；如果仍然出现，可能是 shim 未在最早入口执行（检查所有入口脚本是否尽早 import `src.bootstrap`）。


================================================
FILE: environment.yml
================================================
name: predict_lottery
channels:
  - defaults
  - conda-forge
  - pytorch
dependencies:
  - python=3.11
  - pip
  - numpy=1.24.3
  - pandas=2.2.2
  - lxml=5.2.2
  - matplotlib=3.9.0
  - pillow
  - pip:
    - keras==2.15.0
    - -r requirements.lock.txt

# Notes:
# 1) This environment file installs binary packages via conda (numpy, pandas, lxml, matplotlib)
#    which improves reproducibility on Windows/macOS/Linux. After conda creates the env it will
#    run pip to install the remaining packages from `requirements.lock.txt`.
# 2) If you prefer Intel-optimized TensorFlow on Windows, install it after creating the env:
#    conda activate predict_lottery
#    python -m pip install tensorflow-intel==2.15.1
#    (or add it to pip: above if you prefer pip-managed tensorflow)

================================================
FILE: examples/analysis_example.py
================================================
# -*- coding: utf-8 -*-
"""
数据分析示例

展示如何使用数据分析功能分析彩票数据

Author: KittenCN
"""
import sys
import os

# 添加src目录到Python路径
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))

from src.analysis import BasicAnalysis


def analysis_example():
    """数据分析示例"""
    print("=== 彩票数据分析示例 ===")
    
    # 示例数据
    sample_data = [
        [1, 5, 12, 23, 34, 45, 67, 78, 80, 77, 66, 55, 44, 33, 22, 11, 2, 3, 4, 6],
        [2, 8, 15, 29, 38, 47, 59, 68, 79, 71, 62, 53, 42, 31, 28, 17, 9, 7, 13, 19],
        [3, 11, 18, 27, 36, 49, 58, 69, 72, 64, 56, 48, 39, 21, 14, 26, 35, 41, 52, 63]
    ]
    
    print("示例数据分析结果:")
    try:
        datacnt, dataori = BasicAnalysis(sample_data)
        print(f"\n分析完成，共分析 {len(sample_data)} 组数据")
        print("出现频率统计已显示在上方")
    except Exception as e:
        print(f"分析失败: {e}")
    
    print("\n=== 示例完成 ===")


if __name__ == "__main__":
    analysis_example()

================================================
FILE: examples/quick_start.py
================================================
# -*- coding: utf-8 -*-
"""
彩票预测系统快速开始示例

本示例展示如何使用彩票预测系统进行数据获取、模型训练和预测

Author: KittenCN
"""
import sys
import os

# 添加src目录到Python路径
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))

from src.common import get_current_number, get_data_run
from src.config import LOTTERY_CONFIGS


def quick_start_example():
    """快速开始示例"""
    print("=== 彩票AI预测系统快速开始示例 ===")
    
    # 1. 获取数据
    print("\n1. 获取双色球数据...")
    try:
        get_data_run('ssq')
        print("数据获取成功！")
    except Exception as e:
        print(f"数据获取失败: {e}")
        return
    
    # 2. 查看当前期号
    print("\n2. 查看当前期号...")
    try:
        current_num = get_current_number('ssq')
        print(f"双色球最新期号: {current_num}")
    except Exception as e:
        print(f"获取期号失败: {e}")
    
    # 3. 显示支持的彩票类型
    print("\n3. 支持的彩票类型:")
    for code, cfg in LOTTERY_CONFIGS.items():
        print(f"  - {code}: {cfg.name}")
    
    print("\n=== 示例完成 ===")
    print("\n下一步:")
    print("1. 使用 'python scripts/train.py --name ssq --window-size 5 --red-epochs 60' 训练模型")
    print("2. 使用 'python scripts/predict.py --name ssq --window-size 5 --save' 进行预测")


if __name__ == "__main__":
    quick_start_example()


================================================
FILE: requirements.lock.txt
================================================
beautifulsoup4==4.12.3
black==24.4.2
build==0.10.0
matplotlib==3.9.0
numpy==1.24.3
pandas==2.2.2
pyyaml==6.0.1
python-dotenv==1.0.1
pytest==8.2.2
pytest-cov==5.0.0
requests==2.32.3
responses==0.25.0
ruff==0.5.6
scikit-learn==1.5.1
tensorflow-intel==2.15.1
tensorflow==2.15.1
keras==2.15.0
tqdm==4.66.4
loguru==0.7.2

================================================
FILE: requirements.txt
================================================
tensorflow==2.15.1
keras==2.15.0
requests==2.32.3
beautifulsoup4==4.12.3
pandas==2.2.2
numpy>=1.24,<2.0
lxml==5.2.2
loguru==0.7.2
scikit-learn==1.5.1
matplotlib==3.9.0
tqdm==4.66.4
pyyaml==6.0.1
python-dotenv==1.0.1
responses==0.25.0

# 开发依赖
black==24.4.2
ruff==0.5.6
mypy==1.10.0
pytest==8.2.2
pytest-cov==5.0.0


================================================
FILE: scripts/get_data.py
================================================
# -*- coding:utf-8 -*-
"""
历史数据下载脚本。

示例：
    python scripts/get_data.py --name ssq --start 2024001 --end 2024350
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

from loguru import logger


PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from src.common import get_data_run  # noqa: E402
from src.config import LOTTERY_CONFIGS  # noqa: E402


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="从 500.com 下载历史开奖数据")
    parser.add_argument(
        "--name",
        default="ssq",
        help="彩票类型代码，如 ssq / dlt / kl8，默认 ssq",
    )
    parser.add_argument("--start", type=int, default=None, help="起始期号（包含），默认从最早可用期开始")
    parser.add_argument("--end", type=int, default=None, help="结束期号（包含），默认至最新期")
    parser.add_argument(
        "--sequence",
        action="store_true",
        help="快乐8 是否使用出球顺序数据（仅 kl8 有效）",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    code = args.name.lower().strip()
    if code not in LOTTERY_CONFIGS:
        raise SystemExit(f"不支持的彩票类型：{args.name}，有效选项：{', '.join(LOTTERY_CONFIGS.keys())}")
    get_data_run(code, cq=int(args.sequence), start_issue=args.start, end_issue=args.end)
    logger.success("数据下载完成：{}", code)


if __name__ == "__main__":
    main()



================================================
FILE: scripts/predict.py
================================================
# -*- coding: utf-8 -*-
"""
预测脚本，基于最新训练好的模型输出下一期号码。

示例：
    python scripts/predict.py --name ssq --window-size 5 --save
"""

from __future__ import annotations

import argparse
import datetime
import json
import sys
from pathlib import Path

from loguru import logger

PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from src.common import predict_latest  # noqa: E402
from src.config import LOTTERY_CONFIGS, PATHS  # noqa: E402


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="使用最新模型预测彩票开奖号码")
    parser.add_argument("--name", default=None, help="彩票类型代码，如 ssq / dlt / kl8 （必需）")
    parser.add_argument("--list-models", action="store_true", help="列出已训练的模型窗口并退出")
    parser.add_argument("--window-size", type=int, default=None, help="使用指定窗口大小的模型进行预测")
    parser.add_argument("--save", action="store_true", help="是否将预测结果保存到 predict/<code>/ 目录")
    return parser.parse_args()


def save_prediction(code: str, data: dict) -> Path:
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    output_dir = PATHS["predict"] / code
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"prediction_{timestamp}.json"
    path.write_text(
        json.dumps(
            {"code": code, "timestamp": timestamp, "prediction": data},
            ensure_ascii=False,
            indent=2,
        )
        + "\n",
        encoding="utf-8",
    )
    return path


def main() -> None:
    args = parse_args()
    if args.list_models:
        # List available model windows for each lottery type or for provided name
        target = args.name.lower().strip() if args.name else None
        for code, cfg in LOTTERY_CONFIGS.items():
            if target and code != target:
                continue
            model_dir = PATHS["model"] / code
            if not model_dir.exists():
                print(f"{code}: (no models)")
                continue
            windows = sorted([p.name for p in model_dir.iterdir() if p.is_dir() and p.name.startswith("window_")])
            print(f"{code}: {', '.join(windows) if windows else '(no models)'}")
        return

    if not args.name:
        parser = argparse.ArgumentParser()
        parser.print_help()
        raise SystemExit("参数 --name 必需。示例：python scripts/predict.py --name ssq")

    code = args.name.lower().strip()
    if code not in LOTTERY_CONFIGS:
        raise SystemExit(f"不支持的彩票类型：{args.name}，有效选项：{', '.join(LOTTERY_CONFIGS.keys())}")

    # If window_size not provided, pick the latest window_* folder under model/<code>/
    window_size = args.window_size
    if window_size is None:
        model_dir = PATHS["model"] / code
        if model_dir.exists():
            windows = [p for p in model_dir.iterdir() if p.is_dir() and p.name.startswith("window_")]
            if windows:
                latest = sorted(windows, key=lambda p: int(p.name.split("_")[-1]) if p.name.split("_")[-1].isdigit() else -1)[-1]
                try:
                    window_size = int(latest.name.split("_")[-1])
                except Exception:
                    window_size = None
    predictions = predict_latest(code, window_size=args.window_size)
    logger.info("预测结果: {}", predictions)
    if args.save:
        file_path = save_prediction(code, predictions)
        logger.success("预测结果已保存到 {}", file_path)


if __name__ == "__main__":
    main()



================================================
FILE: scripts/train.py
================================================
# -*- coding: utf-8 -*-
"""
模型训练脚本（TensorFlow 2.15+ 版本）。

示例：
    python scripts/train.py --name ssq --window-size 5 --red-epochs 60
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

from loguru import logger

# Ensure the project root is on sys.path so `src` imports work and the
# bootstrap shim can be imported early.
PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# import as early as possible; src.bootstrap is best-effort
try:
    import src.bootstrap  # noqa: F401
except Exception:
    # If bootstrap fails, continue; the bootstrap shim is non-critical
    pass

from src.common import get_data_run, train_pipeline  # noqa: E402
from src.config import LOTTERY_CONFIGS  # noqa: E402


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="训练指定彩票的 LSTM 模型")
    parser.add_argument(
        "--name",
        default="ssq",
        help="彩票类型代码，如 ssq / dlt / kl8，默认 ssq",
    )
    parser.add_argument("--window-size", type=int, default=None, help="时间窗口大小，默认使用玩法配置")
    parser.add_argument("--batch-size", type=int, default=None, help="训练批大小，默认使用玩法配置")
    parser.add_argument("--red-epochs", type=int, default=None, help="红球模型训练轮数")
    parser.add_argument("--blue-epochs", type=int, default=None, help="蓝球模型训练轮数")
    parser.add_argument("--download-data", action="store_true", help="训练前自动下载最新数据")
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    code = args.name.lower().strip()
    if code not in LOTTERY_CONFIGS:
        raise SystemExit(f"不支持的彩票类型：{args.name}，有效选项：{', '.join(LOTTERY_CONFIGS.keys())}")

    if args.download_data:
        logger.info("开始下载数据...")
        get_data_run(code)

    summary = train_pipeline(
        name=code,
        window_size=args.window_size,
        batch_size=args.batch_size,
        red_epochs=args.red_epochs,
        blue_epochs=args.blue_epochs,
    )
    logger.success("训练完成，详情见 model/{}/window_{}/{}", summary.code, summary.window_size, "metadata.json")


if __name__ == "__main__":
    main()



================================================
FILE: src/__init__.py
================================================
# -*- coding: utf-8 -*-
"""
彩票AI预测系统核心模块

该模块包含了彩票AI预测系统的核心功能，包括：
- 数据获取和处理
- 模型训练和预测
- 数据分析工具
- 配置管理

Author: KittenCN
"""

__version__ = "3.0.0"
__author__ = "KittenCN"


================================================
FILE: src/analysis.py
================================================
# -*- coding:utf-8 -*-
"""
Author: KittenCN
"""

import pandas as pd
from .config import *

datacnt = [0] * 81
dataori = [i for i in range(81)]
ori_data = []

def BasicAnalysis(oridata):
    # Basic analysis of the data
    # ori_data: original data
    # Return: None
    # Author: KittenCN
    global datacnt, dataori
    datacnt = [0] * 81
    dataori = [i for i in range(81)]
    for row in oridata:
        for item in row:
            datacnt[int(item)] += 1
    datacnt, dataori = sortcnt(datacnt, dataori, 81)
    lastcnt = -1
    for i in range(81):
        if dataori[i] == 0:
            continue
        if lastcnt != datacnt[i]:
            print()
            print("{}: {}".format(datacnt[i], dataori[i]), end = " ")
            lastcnt = datacnt[i]
        elif lastcnt == datacnt[i]:
            print(dataori[i], end = " ")
    return datacnt, dataori
    
def sortcnt(datacnt, dataori, rangenum=81):
    for i in range(rangenum):
        for j in range(i + 1, rangenum):
            if datacnt[i] < datacnt[j]:
                datacnt[i], datacnt[j] = datacnt[j], datacnt[i]
                dataori[i], dataori[j] = dataori[j], dataori[i]
            elif datacnt[i] == datacnt[j]:
                if dataori[i] < dataori[j]:
                    datacnt[i], datacnt[j] = datacnt[j], datacnt[i]
                    dataori[i], dataori[j] = dataori[j], dataori[i]
    return datacnt, dataori

def getdata():
    strdata = input("输入要统计的出现次数，“，”分隔, -1结束: ").split(',')
    if strdata[0] == "-1":
        return None, None
    data = [int(i) for i in strdata]
    oridata = []
    for i in range(81):
        if dataori[i] == 0:
            continue
        if datacnt[i] in data:
            oridata.append(dataori[i])
    booldata = [False] * len(oridata)
    return oridata, booldata

def dfs(oridata, booldata, getnums, dep, ans, cur):
    if dep == getnums:
        ans.append(cur.copy())
        return
    for i in oridata:
        if booldata[oridata.index(i)] or i <= cur[dep - 1]:
            continue
        booldata[oridata.index(i)] = True
        cur[dep] = i
        dfs(oridata,booldata, getnums, dep + 1, ans, cur)
        booldata[oridata.index(i)] = False
    return ans

def shrink(oridata, booldata):
    getnums = int(input("输入要缩水至几个数? (-1表示结束) "))
    while getnums != -1:
        ans = dfs(oridata,booldata, getnums, 0, [], [0] * getnums)
        print("一共有 {} 条结果，可缩水至 {} 个数.".format(len(ans), getnums))
        strSumMinMax = input("输入和值最小和最大值，用‘，’分隔").split(',')
        SumMinMax = [int(i) for i in strSumMinMax]
        SumMin = SumMinMax[0]
        SumMax = SumMinMax[1]
        for i in range(len(ans)):
            if sum(ans[i]) < SumMin or sum(ans[i]) > SumMax:
                continue
            print(ans[i])
        getnums = int(input("输入要缩水至几个数? (-1表示结束) "))

def sumanalyusis(limit=-1):
    oridata = pd.read_csv("{}{}".format(name_path["kl8"]["path"], data_file_name))
    data = oridata.iloc[:, 2:].values
    sumori = [i for i in range(1401)]
    sumcnt = [0] * 1401
    linenum = 0
    for row in data:
        if limit != -1 and linenum >= limit:
            break
        linenum += 1
        sum = 0
        for item in row:
            sum += item
        sumcnt[sum] += 1
    sumcnt, sumori = sortcnt(sumcnt, sumori, 1401)
    lastcnt = -1
    for i in range(1401):
        if sumori[i] == 0 or sumcnt[i] == 0:
            continue
        if lastcnt != sumcnt[i]:
            print()
            print("{}: {}".format(sumcnt[i], sumori[i]), end = " ")
            lastcnt = sumcnt[i]
        elif lastcnt == sumcnt[i]:
            print(sumori[i], end = " ")
    print()
    sumtop = int(input("输入要统计前几位和值:"))
    lastsum = -1
    sumans = []
    sumanscnt = 0
    for i in range(1401):
        if sumcnt[i] == 0:
            continue
        if sumcnt[i] != lastsum:
            if sumanscnt == sumtop:
                break;
            else:
                lastsum = sumcnt[i]
                sumanscnt += 1
        sumans.append(sumori[i])
    print(sumans)

if __name__ == "__main__":
    while True:
        print()
        print(" 1. 读取预测数据并分析\r\n 2. 缩水\r\n 3. 和值分析\r\n 0. 退出\r\n")
        choice = int(input("input your choice:"))
        if choice == 1:
            _datainrow = []
            n = int(input("输入数据组数，-1为从文件输入:"))
            if n != -1:
                for i in range(n):
                    tmpdata = input("输入第 #{} 组数据: ".format(i + 1)).strip().split(' ')
                    for item in tmpdata:
                        _datainrow.append(int(item))
                    _datainrow.append(int(item) for item in tmpdata)
                    ori_data.append(tmpdata)
            else:
                filename = input("输入文件名: ")
                fileadd = "{}{}{}{}".format(predict_path, "kl8/", filename, ".csv")
                ori_data = pd.read_csv(fileadd).values
                limit = int(input("共有{}组数据，输入要分析前多少组：".format(len(ori_data))))
                ori_data = ori_data[:limit]
                for row in ori_data:
                    for item in row:
                        _datainrow.append(item)           
            datacnt, dataori = BasicAnalysis(ori_data)
            print()
            currentnums = input("输入当前获奖数据，-1为结束： ").split(' ')
            if currentnums[0] != "-1":
                curnums = [int(i) for i in currentnums]
                curcnt = 0
                tmp_cnt = [0] * len(ori_data)
                for item in curnums:
                    for i, row in enumerate(ori_data):
                        if item in row:
                            curcnt += 1
                            tmp_cnt[i] += 1
                            break
                totalnums = len(list(set(_datainrow)))
                for i in range(len(tmp_cnt)):
                    print("第{}组数据中，当前获奖数据出现的次数为{}次，概率为：{:.2f}%".format(i + 1, tmp_cnt[i], tmp_cnt[i] / totalnums * 100))
                print("命中数 / 总预测数: {} / {}".format(curcnt, totalnums))
                lastcnt = -1
                for i in range(81):
                    if dataori[i] == 0:
                        continue
                    elif dataori[i] in curnums:
                        if lastcnt != datacnt[i]:
                            print()
                            print("{}: {}".format(datacnt[i], dataori[i]), end = " ")
                            lastcnt = datacnt[i]
                        elif lastcnt == datacnt[i]:
                            print(dataori[i], end = " ")
                print()
            oridata, booldata = getdata()
            print(oridata)

        elif choice == 2:
            oridata, booldata = getdata()
            shrink(oridata, booldata)
        elif choice == 3:
            limit = int(input("输入要分析的数据组数，-1为全部:"))
            sumanalyusis(limit)
        if choice == 0:
            break


================================================
FILE: src/bootstrap.py
================================================
"""Bootstrap helpers run early to prepare runtime compatibility (e.g. TensorFlow shims).

This module should be imported as early as possible by entry scripts so that
third-party libraries (Keras) see the compatibility shims at import-time.
"""
from __future__ import annotations

import importlib
import types
import warnings
import logging


def _ensure_ragged_compat() -> None:
    """Ensure tf.ragged.RaggedTensorValue is available or mapped to compat.v1.

    This helps older third-party code that references the deprecated symbol.
    It's best-effort and will not raise on failure. Suppress warnings/logging
    while performing the mapping to avoid emitting deprecation warnings.
    """
    try:
        tf_spec = importlib.util.find_spec('tensorflow')
        if tf_spec is None:
            return
        try:
            import tensorflow as tf  # type: ignore
        except Exception:
            return

        # Temporarily suppress DeprecationWarning and TensorFlow logger output
        tf_logger = logging.getLogger('tensorflow')
        old_level = tf_logger.level
        try:
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore', category=DeprecationWarning, message='.*RaggedTensorValue.*')
                tf_logger.setLevel(logging.ERROR)

                # Ensure tf.ragged namespace exists
                if not hasattr(tf, 'ragged'):
                    tf.ragged = types.SimpleNamespace()

                # If RaggedTensorValue is missing, prefer compat.v1 mapping
                if not hasattr(tf.ragged, 'RaggedTensorValue'):
                    mapped = None
                    if hasattr(tf, 'compat') and hasattr(tf.compat, 'v1'):
                        compat_v1 = getattr(tf.compat, 'v1')
                        if hasattr(compat_v1, 'ragged') and hasattr(compat_v1.ragged, 'RaggedTensorValue'):
                            mapped = compat_v1.ragged.RaggedTensorValue

                    # Fallback: try to use tf.RaggedTensor class if available
                    if mapped is None and hasattr(tf, 'RaggedTensor'):
                        mapped = getattr(tf, 'RaggedTensor')

                    if mapped is not None:
                        setattr(tf.ragged, 'RaggedTensorValue', mapped)
        finally:
            tf_logger.setLevel(old_level)
    except Exception:
        # Swallow all exceptions; this is a best-effort compatibility layer
        return


_ensure_ragged_compat()


def _ensure_keras_shim() -> None:
    """If standalone `keras` is not installed, expose `keras` name pointing to `tf.keras`.

    TensorFlow's lazy loader may attempt `import keras` when accessing `tf.keras`.
    Creating a best-effort shim avoids ImportError in environments where only
    `tensorflow` is installed.
    """
    try:
        import importlib
        import sys
        import types

        # If keras is already importable, nothing to do
        if importlib.util.find_spec("keras") is not None:
            return

        # Try to import tensorflow; if unavailable, skip
        tf_spec = importlib.util.find_spec("tensorflow")
        if tf_spec is None:
            return
        import tensorflow as tf  # type: ignore

        if not hasattr(tf, "keras"):
            return

        # Create a lightweight stub module named 'keras' to satisfy import
        # and basic version checks performed by TensorFlow's lazy loader.
        # The stub intentionally does NOT delegate into tf.keras to avoid
        # triggering recursive import logic in environments where tf.keras
        # initialization itself tries to import `keras`.
        fake = types.ModuleType("keras")
        # Use a non-3.x version string to avoid Keras-3 specific branches.
        fake.__version__ = getattr(tf, "__version__", "0.0.0")
        fake.__name__ = "keras"
        # Add a few common submodule placeholders so attribute access succeeds
        fake.layers = types.ModuleType("keras.layers")
        fake.models = types.ModuleType("keras.models")
        fake.utils = types.ModuleType("keras.utils")
        fake.backend = types.ModuleType("keras.backend")

        sys.modules.setdefault("keras", fake)
    except Exception:
        # Best-effort only
        return


# Run the shim early
_ensure_keras_shim()


================================================
FILE: src/common.py
================================================
# -*- coding: utf-8 -*-
"""
公共接口封装。

为脚本层提供以下能力：
1. 下载历史数据：`get_data_run`
2. 查询最新期号：`get_current_number`
3. 训练模型：`train_pipeline`
4. 预测下一期开奖：`predict_latest`
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Dict, Optional

from loguru import logger

from .config import LOTTERY_CONFIGS, ensure_runtime_directories, get_lottery_config
from .data_fetcher import download_history, get_current_issue, load_history

if TYPE_CHECKING:  # pragma: no cover
    from .pipeline import TrainingSummary


_PIPELINE_MODULE = None


def _load_pipeline():
    global _PIPELINE_MODULE
    if _PIPELINE_MODULE is None:
        from . import pipeline as _pipeline  # noqa: WPS433

        _PIPELINE_MODULE = _pipeline
    return _PIPELINE_MODULE


def get_data_run(
    name: str,
    cq: int = 0,
    start_issue: Optional[int] = None,
    end_issue: Optional[int] = None,
) -> None:
    """下载指定彩票的历史数据。"""

    ensure_runtime_directories()
    code = name.lower().strip()
    if code not in LOTTERY_CONFIGS:
        raise ValueError(f"不支持的彩票类型: {name}")
    use_sequence = bool(cq) and code == "kl8"
    download_history(code, start=start_issue, end=end_issue, use_sequence_order=use_sequence)


def get_current_number(name: str) -> str:
    """返回指定彩票的当前期号。"""

    code = name.lower().strip()
    if code not in LOTTERY_CONFIGS:
        raise ValueError(f"不支持的彩票类型: {name}")
    return get_current_issue(code)


def train_pipeline(
    name: str,
    window_size: Optional[int] = None,
    batch_size: Optional[int] = None,
    red_epochs: Optional[int] = None,
    blue_epochs: Optional[int] = None,
) -> "TrainingSummary":
    """高层训练接口，封装 pipeline.train_lottery_models。"""

    code = name.lower().strip()
    logger.info("开始训练【{}】模型...", LOTTERY_CONFIGS[code].name)
    pipeline_module = _load_pipeline()
    summary = pipeline_module.train_lottery_models(
        code=code,
        window_size=window_size,
        batch_size=batch_size,
        red_epochs=red_epochs,
        blue_epochs=blue_epochs,
    )
    logger.success("训练完成: {}", summary)
    return summary


def predict_latest(name: str, window_size: Optional[int] = None) -> Dict[str, list]:
    """使用最新模型预测下一期号码。"""

    code = name.lower().strip()
    cfg = get_lottery_config(code)
    pipeline_module = _load_pipeline()
    predictions = pipeline_module.predict_next_draw(code=code, window_size=window_size)
    readable = {key: list(map(int, value.tolist())) for key, value in predictions.items()}
    logger.info("【{}】预测结果: {}", cfg.name, readable)
    return readable


__all__ = [
    "get_data_run",
    "get_current_number",
    "train_pipeline",
    "predict_latest",
    "download_history",
    "load_history",
]


================================================
FILE: src/config.py
================================================
# -*- coding: utf-8 -*-
"""
项目全局配置模块。

该模块负责：
1. 读取 `config/config.yaml` 获取运行时配置；
2. 定义彩票玩法的模型超参数与默认训练设置；
3. 提供路径常量与工具函数，供数据、模型与脚本复用。

Author: Codex 升级
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Optional

import yaml

BASE_DIR = Path(__file__).resolve().parents[1]
CONFIG_FILE = BASE_DIR / "config" / "config.yaml"


@dataclass(frozen=True)
class SequenceModelSpec:
    """描述单个序列模型的结构参数。"""

    sequence_len: int
    num_classes: int
    embedding_dim: int
    hidden_units: Iterable[int]
    dropout: float = 0.2


@dataclass(frozen=True)
class LotteryModelConfig:
    """描述单种彩票玩法的训练所需配置。"""

    code: str
    name: str
    red: SequenceModelSpec
    blue: Optional[SequenceModelSpec] = None
    default_window: int = 3
    default_batch_size: int = 32
    default_red_epochs: int = 40
    default_blue_epochs: int = 30
    learning_rate: float = 5e-4
    allow_sequence_order: bool = False


def _load_yaml_config() -> Dict[str, object]:
    if not CONFIG_FILE.exists():
        raise FileNotFoundError(f"未找到系统配置文件: {CONFIG_FILE}")
    with CONFIG_FILE.open(encoding="utf-8") as fp:
        return yaml.safe_load(fp)


YAML_CONFIG: Dict[str, object] = _load_yaml_config()

PATHS = {
    "data": Path(YAML_CONFIG.get("paths", {}).get("data", BASE_DIR / "data")).resolve(),
    "model": Path(YAML_CONFIG.get("paths", {}).get("model", BASE_DIR / "model")).resolve(),
    "predict": Path(YAML_CONFIG.get("paths", {}).get("predict", BASE_DIR / "predict")).resolve(),
    "logs": Path(YAML_CONFIG.get("paths", {}).get("logs", BASE_DIR / "logs")).resolve(),
}

NETWORK_CONFIG = {
    "timeout": YAML_CONFIG.get("network", {}).get("timeout", 20),
    "retry_count": YAML_CONFIG.get("network", {}).get("retry_count", 3),
    "backoff_factor": YAML_CONFIG.get("network", {}).get("backoff_factor", 0.6),
    "user_agent": YAML_CONFIG.get("network", {}).get(
        "user_agent",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/126.0.0.0 Safari/537.36",
    ),
}


ALLOWED_DOMAINS = {"datachart.500.com", "data.917500.cn", "datachart.500.com"}

DATA_FILE_NAME = "data.csv"
MODEL_METADATA_FILE = "metadata.json"


LOTTERY_CONFIGS: Dict[str, LotteryModelConfig] = {
    "ssq": LotteryModelConfig(
        code="ssq",
        name="双色球",
        red=SequenceModelSpec(
            sequence_len=6,
            num_classes=33,
            embedding_dim=64,
            hidden_units=(128, 64),
            dropout=0.3,
        ),
        blue=SequenceModelSpec(
            sequence_len=1,
            num_classes=16,
            embedding_dim=32,
            hidden_units=(64,),
            dropout=0.2,
        ),
        default_window=5,
        default_batch_size=32,
        default_red_epochs=60,
        default_blue_epochs=30,
        learning_rate=8e-4,
    ),
    "dlt": LotteryModelConfig(
        code="dlt",
        name="大乐透",
        red=SequenceModelSpec(
            sequence_len=5,
            num_classes=35,
            embedding_dim=64,
            hidden_units=(128, 64),
            dropout=0.3,
        ),
        blue=SequenceModelSpec(
            sequence_len=2,
            num_classes=12,
            embedding_dim=32,
            hidden_units=(64,),
            dropout=0.2,
        ),
        default_window=5,
        default_batch_size=32,
        default_red_epochs=60,
        default_blue_epochs=30,
        learning_rate=8e-4,
    ),
    "pls": LotteryModelConfig(
        code="pls",
        name="排列三",
        red=SequenceModelSpec(
            sequence_len=3,
            num_classes=10,
            embedding_dim=32,
            hidden_units=(64, 32),
            dropout=0.2,
        ),
        blue=None,
        default_window=5,
        default_batch_size=32,
        default_red_epochs=50,
        default_blue_epochs=0,
        learning_rate=5e-4,
    ),
    "qxc": LotteryModelConfig(
        code="qxc",
        name="七星彩",
        red=SequenceModelSpec(
            sequence_len=7,
            num_classes=10,
            embedding_dim=48,
            hidden_units=(96, 48),
            dropout=0.25,
        ),
        blue=None,
        default_window=6,
        default_batch_size=32,
        default_red_epochs=60,
        default_blue_epochs=0,
        learning_rate=6e-4,
    ),
    "kl8": LotteryModelConfig(
        code="kl8",
        name="快乐8",
        red=SequenceModelSpec(
            sequence_len=20,
            num_classes=80,
            embedding_dim=48,
            hidden_units=(128, 128, 64),
            dropout=0.35,
        ),
        blue=None,
        default_window=6,
        default_batch_size=48,
        default_red_epochs=40,
        default_blue_epochs=0,
        learning_rate=5e-4,
        allow_sequence_order=True,
    ),
    "sd": LotteryModelConfig(
        code="sd",
        name="福彩3D",
        red=SequenceModelSpec(
            sequence_len=3,
            num_classes=10,
            embedding_dim=32,
            hidden_units=(64, 32),
            dropout=0.2,
        ),
        blue=None,
        default_window=5,
        default_batch_size=32,
        default_red_epochs=50,
        default_blue_epochs=0,
        learning_rate=4e-4,
    ),
}


def ensure_runtime_directories() -> None:
    """确保项目运行所需的目录存在。"""

    for path in PATHS.values():
        path.mkdir(parents=True, exist_ok=True)


def get_lottery_config(code: str) -> LotteryModelConfig:
    """根据玩法代码获取配置。"""

    normalized = code.lower().strip()
    if normalized not in LOTTERY_CONFIGS:
        raise ValueError(f"未知的彩票类型: {code}")
    return LOTTERY_CONFIGS[normalized]


name_path = {
    code: {
        "name": cfg.name,
        "path": f"{(PATHS['data'] / code).as_posix()}/",
    }
    for code, cfg in LOTTERY_CONFIGS.items()
}
predict_path = f"{PATHS['predict'].as_posix()}/"
data_file_name = DATA_FILE_NAME


__all__ = [
    "BASE_DIR",
    "CONFIG_FILE",
    "DATA_FILE_NAME",
    "data_file_name",
    "MODEL_METADATA_FILE",
    "PATHS",
    "LOTTERY_CONFIGS",
    "LotteryModelConfig",
    "SequenceModelSpec",
    "NETWORK_CONFIG",
    "ALLOWED_DOMAINS",
    "ensure_runtime_directories",
    "get_lottery_config",
    "name_path",
    "predict_path",
]


================================================
FILE: src/data_fetcher.py
================================================
# -*- coding: utf-8 -*-
"""
数据抓取模块，负责从 500.com 拉取彩票历史数据并保存到本地。

特点：
1. 使用带重试的 requests.Session，满足网络安全要求；
2. 输出 Pandas DataFrame，供预处理与训练使用；
3. 针对快乐8（kl8）提供顺序版与常规版两种下载模式。
"""

from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional
from urllib.parse import urlparse

import pandas as pd
import requests
from bs4 import BeautifulSoup
from loguru import logger
from requests.adapters import HTTPAdapter
from urllib3 import Retry

from .config import (
    ALLOWED_DOMAINS,
    DATA_FILE_NAME,
    LOTTERY_CONFIGS,
    NETWORK_CONFIG,
    PATHS,
    LotteryModelConfig,
    ensure_runtime_directories,
)


@dataclass
class DownloadResult:
    """描述一次下载操作的元信息。"""

    code: str
    total_issues: int
    saved_path: str
    timestamp: str


class LotteryHttpClient:
    """封装网络访问逻辑，提供带重试与域名校验的 GET 方法。"""

    def __init__(
        self,
        timeout: float,
        retries: int,
        backoff_factor: float,
        user_agent: str,
    ) -> None:
        self._timeout = timeout
        self._session = requests.Session()
        retry_strategy = Retry(
            total=retries,
            backoff_factor=backoff_factor,
            status_forcelist=(429, 500, 502, 503, 504),
            allowed_methods=frozenset(["GET"]),
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self._session.mount("https://", adapter)
        self._session.mount("http://", adapter)
        self._headers = {
            "User-Agent": user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "zh-CN,zh;q=0.9",
        }

    def get_text(self, url: str) -> str:
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        if all(allowed not in domain for allowed in ALLOWED_DOMAINS):
            raise ValueError(f"禁止访问域名: {domain}")
        response = self._session.get(url, headers=self._headers, timeout=self._timeout)
        response.raise_for_status()
        response.encoding = "utf-8"
        return response.text


def _build_history_url(config: LotteryModelConfig, start: Optional[int], end: Optional[int]) -> str:
    base = f"https://datachart.500.com/{config.code}/history/"
    if config.code in {"qxc", "pls", "sd"}:
        path = "inc/history.php"
    elif config.code == "kl8":
        path = "newinc/jbzs_redblue.php"
    else:
        path = "history.shtml"

    if path.endswith(".shtml"):
        return f"{base}{path}"

    start_issue = start or 1
    end_issue = end or 999999
    limit = end_issue - start_issue + 1
    query = f"{path}?start={start_issue}&end={end_issue}&limit={limit}"
    return f"{base}{query}"


def _parse_issue_list(config: LotteryModelConfig, html: str) -> pd.DataFrame:
    soup = BeautifulSoup(html, "lxml")
    rows = []
    if config.code in {"ssq", "dlt", "kl8"}:
        tbody = soup.find("tbody", attrs={"id": "tdata"})
        if not tbody:
            raise ValueError("未找到开奖号码数据表格 (id=tdata)")
        trs = tbody.find_all("tr")
    else:
        table = soup.find("table", id="tablelist")
        if not table:
            raise ValueError("未找到开奖号码数据表格 (id=tablelist)")
        trs = table.find_all("tr")

    for tr in trs:
        tds = tr.find_all("td")
        if not tds:
            continue
        issue = tds[0].get_text(strip=True)
        if not issue or issue == "期号":
            continue
        record = {"期数": issue}
        if config.code == "ssq":
            for idx in range(config.red.sequence_len):
                record[f"红球_{idx + 1}"] = tds[idx + 1].get_text(strip=True)
            record["蓝球_1"] = tds[7].get_text(strip=True)
        elif config.code == "dlt":
            for idx in range(config.red.sequence_len):
                record[f"红球_{idx + 1}"] = tds[idx + 1].get_text(strip=True)
            for idx in range(config.blue.sequence_len):
                record[f"蓝球_{idx + 1}"] = tds[6 + idx].get_text(strip=True)
        elif config.code in {"pls", "sd", "qxc"}:
            digits = tds[1].get_text(strip=True).split(" ")
            for idx, value in enumerate(digits):
                record[f"红球_{idx + 1}"] = value
        elif config.code == "kl8":
            numbers = [td.get_text(strip=True) for td in tds if td.get_text(strip=True).isdigit()]
            for idx, value in enumerate(numbers):
                record[f"红球_{idx + 1}"] = value
        rows.append(record)

    if not rows:
        raise ValueError("解析开奖号码失败，未获取到任何数据")
    df = pd.DataFrame(rows)
    df.sort_values("期数", ascending=False, inplace=True)
    return df.reset_index(drop=True)




def get_current_issue(code: str, client: Optional[LotteryHttpClient] = None) -> str:
    """获取指定彩票的最新期号。"""

    cfg = LOTTERY_CONFIGS[code]
    client = client or LotteryHttpClient(
        timeout=NETWORK_CONFIG["timeout"],
        retries=NETWORK_CONFIG["retry_count"],
        backoff_factor=NETWORK_CONFIG.get("backoff_factor", 0.6),
        user_agent=NETWORK_CONFIG["user_agent"],
    )

    if cfg.code in {"qxc", "pls", "sd"}:
        url = f"https://datachart.500.com/{cfg.code}/history/inc/history.php"
    elif cfg.code == "kl8":
        url = f"https://datachart.500.com/{cfg.code}/history/newinc/jbzs_redblue.php"
    else:
        url = f"https://datachart.500.com/{cfg.code}/history/history.shtml"

    html = client.get_text(url)
    soup = BeautifulSoup(html, "lxml")
    if cfg.code == "kl8":
        value = soup.find("div", class_="wrap_datachart").find("input", {"id": "to"})["value"]
    else:
        value = soup.find("div", class_="wrap_datachart").find("input", {"id": "end"})["value"]
    logger.info("【{}】最新期号: {}", cfg.name, value)
    return value


def download_history(
    code: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
    use_sequence_order: bool = False,
    client: Optional[LotteryHttpClient] = None,
) -> DownloadResult:
    """下载历史数据并保存到 data/<code>/data.csv。"""

    ensure_runtime_directories()
    cfg = LOTTERY_CONFIGS[code]
    client = client or LotteryHttpClient(
        timeout=NETWORK_CONFIG["timeout"],
        retries=NETWORK_CONFIG["retry_count"],
        backoff_factor=NETWORK_CONFIG.get("backoff_factor", 0.6),
        user_agent=NETWORK_CONFIG["user_agent"],
    )

    if cfg.code == "kl8" and use_sequence_order:
        raise NotImplementedError("KL8相关功能已迁移至独立项目：https://github.com/KittenCN/kl8-lottery-analyzer")
    else:
        url = _build_history_url(cfg, start, end)
        logger.info("下载【{}】历史数据: {}", cfg.name, url)
        html = client.get_text(url)
        df = _parse_issue_list(cfg, html)

    save_dir = PATHS["data"] / cfg.code
    save_dir.mkdir(parents=True, exist_ok=True)
    output_path = save_dir / DATA_FILE_NAME
    df.to_csv(output_path, index=False, encoding="utf-8")
    meta = DownloadResult(
        code=cfg.code,
        total_issues=len(df),
        saved_path=str(output_path),
        timestamp=datetime.utcnow().isoformat(),
    )
    logger.success("数据下载完成，共 {} 期，保存至 {}", meta.total_issues, output_path)
    (output_path.parent / "download_meta.json").write_text(
        json.dumps(meta.__dict__, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return meta


def load_history(code: str) -> pd.DataFrame:
    """加载本地已下载的历史数据。"""

    cfg = LOTTERY_CONFIGS[code]
    path = PATHS["data"] / cfg.code / DATA_FILE_NAME
    if not path.exists():
        raise FileNotFoundError(f"未找到 {cfg.name} 历史数据，请先执行下载: {path}")
    df = pd.read_csv(path, encoding="utf-8")
    if "期数" not in df.columns:
        raise ValueError(f"{path} 缺失【期数】字段，数据损坏或格式异常")
    return df


__all__ = [
    "DownloadResult",
    "LotteryHttpClient",
    "download_history",
    "get_current_issue",
    "load_history",
]



================================================
FILE: src/modeling.py
================================================
# -*- coding: utf-8 -*-
"""
模型构建模块。

提供基于 TensorFlow 2.15.1 的多层 LSTM 序列模型，并针对红球/蓝球输出
逐位置的类别概率。
"""

from __future__ import annotations

from typing import Dict

import importlib
import warnings

try:
    import tensorflow as tf
except Exception as exc:  # pragma: no cover - runtime environment dependent
    raise ImportError(
        "TensorFlow import failed. Ensure TensorFlow (e.g. tensorflow or tensorflow-intel) is installed."
    ) from exc

from loguru import logger


# Ensure tf.keras is available; prefer bundled tf.keras over standalone `keras` package.
if not hasattr(tf, "keras"):
    # Some environments may have an incomplete TensorFlow install where `tf.keras` is missing.
    # Try to import standalone `keras` as a best-effort fallback, but do not fail hard here;
    # instead emit a clear warning so downstream code that expects tf.keras will see a
    # friendlier message.
    try:
        import keras  # type: ignore

        warnings.warn(
            "Standalone `keras` was found but `tf.keras` is missing. Using standalone keras may cause incompatibilities.",
            UserWarning,
        )
    except Exception:
        raise ImportError(
            "Keras cannot be imported. Check that it is installed or that your TensorFlow installation is complete."
        )

from src.config import LotteryModelConfig, SequenceModelSpec


def _time_distributed_lstm(
    inputs: tf.Tensor,
    units: int,
    name: str,
) -> tf.Tensor:
    """对每个球位独立应用 LSTM，提取窗口维度特征。"""

    layer = tf.keras.layers.TimeDistributed(
        tf.keras.layers.LSTM(units, return_sequences=False, name=f"{name}_inner"),
        name=name,
    )
    return layer(inputs)


def build_sequence_model(
    spec: SequenceModelSpec,
    window_size: int,
    learning_rate: float,
    name: str,
) -> tf.keras.Model:
    """根据给定规格构建序列模型。"""

    inputs = tf.keras.layers.Input(
        shape=(window_size, spec.sequence_len),
        dtype=tf.int32,
        name=f"{name}_input",
    )
    embedding_layer = tf.keras.layers.Embedding(
        input_dim=spec.num_classes,
        output_dim=spec.embedding_dim,
        embeddings_initializer="he_normal",
        name=f"{name}_embedding",
    )
    embedded = embedding_layer(inputs)  # (batch, window, seq_len, embed_dim)
    # 将球位与时间维度交换，便于对每个球做 LSTM
    per_ball_sequence = tf.transpose(embedded, perm=(0, 2, 1, 3), name=f"{name}_permute")
    per_ball_encoded = _time_distributed_lstm(
        per_ball_sequence,
        units=int(spec.hidden_units[0]),
        name=f"{name}_per_ball_lstm",
    )

    x = per_ball_encoded
    for layer_idx, units in enumerate(spec.hidden_units[1:], start=1):
        x = tf.keras.layers.LSTM(
            units,
            return_sequences=True,
            dropout=spec.dropout,
            recurrent_dropout=0.0,
            name=f"{name}_global_lstm_{layer_idx}",
        )(x)

    if len(spec.hidden_units) == 1:
        x = tf.keras.layers.LSTM(
            spec.hidden_units[0],
            return_sequences=True,
            dropout=spec.dropout,
            name=f"{name}_global_lstm",
        )(x)

    x = tf.keras.layers.Dropout(spec.dropout, name=f"{name}_dropout")(x)
    logits = tf.keras.layers.Dense(
        spec.num_classes,
        name=f"{name}_logits",
    )(x)
    output = tf.keras.layers.Activation("softmax", name=f"{name}_softmax")(logits)

    model = tf.keras.Model(inputs=inputs, outputs=output, name=f"{name}_model")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate, clipnorm=1.0),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
    )

    logger.debug("构建模型 {}：窗口={}，序列长={}，类别数={}", name, window_size, spec.sequence_len, spec.num_classes)
    return model


def build_models_for_lottery(
    config: LotteryModelConfig,
    window_size: int,
) -> Dict[str, tf.keras.Model]:
    """构建指定彩票的红/蓝球模型。"""

    models: Dict[str, tf.keras.Model] = {
        "red": build_sequence_model(config.red, window_size, config.learning_rate, f"{config.code}_red"),
    }
    if config.blue:
        models["blue"] = build_sequence_model(
            config.blue,
            window_size,
            config.learning_rate,
            f"{config.code}_blue",
        )
    return models


__all__ = ["build_models_for_lottery", "build_sequence_model"]


================================================
FILE: src/pipeline.py
================================================
# -*- coding: utf-8 -*-
"""
训练与预测流程封装。

暴露的核心函数：
- train_lottery_models：基于历史数据训练模型并写入本地；
- load_trained_models：从磁盘加载已训练模型；
- predict_next_draw：使用最新窗口数据给出预测结果。
"""

from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional, Tuple

import numpy as np
import tensorflow as tf
from loguru import logger

from .config import (
    DATA_FILE_NAME,
    MODEL_METADATA_FILE,
    PATHS,
    LotteryModelConfig,
    ensure_runtime_directories,
    get_lottery_config,
)
from .data_fetcher import load_history
from .modeling import build_models_for_lottery
from .preprocessing import ComponentDataset, prepare_training_arrays, train_validation_split


@dataclass
class ComponentTrainingSummary:
    train_samples: int
    val_samples: int
    best_val_loss: Optional[float]
    best_val_metric: Optional[float]
    epochs_trained: int


@dataclass
class TrainingSummary:
    code: str
    name: str
    window_size: int
    trained_on_issues: Tuple[str, str]
    components: Dict[str, ComponentTrainingSummary]
    timestamp: str


def _ensure_enough_samples(dataset: ComponentDataset, window_size: int, name: str) -> None:
    if dataset.features.shape[0] == 0:
        raise ValueError(
            f"{name} 可用数据不足，窗口大小 {window_size} 生成的样本数为 0，请增加历史期数或减小窗口。"
        )


def _build_tf_dataset(
    features: np.ndarray,
    labels: np.ndarray,
    batch_size: int,
    shuffle: bool,
) -> tf.data.Dataset:
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    if shuffle:
        buffer = min(len(features), max(batch_size * 4, 256))
        ds = ds.shuffle(buffer)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


def _denormalize(pred: np.ndarray, spec_classes: int) -> np.ndarray:
    if pred.min() < 0:
        raise ValueError("预测结果包含负数，可能是模型输出异常")
    return pred + 1


def _get_latest_window(arr: np.ndarray, window_size: int) -> np.ndarray:
    if arr.shape[0] < window_size:
        raise ValueError(f"历史数据不足，无法获取 {window_size} 条窗口序列")
    return arr[-window_size:]


def train_lottery_models(
    code: str,
    window_size: Optional[int] = None,
    batch_size: Optional[int] = None,
    red_epochs: Optional[int] = None,
    blue_epochs: Optional[int] = None,
    validation_ratio: float = 0.15,
) -> TrainingSummary:
    """训练指定彩票模型，并返回训练摘要。"""

    ensure_runtime_directories()
    cfg: LotteryModelConfig = get_lottery_config(code)
    df = load_history(cfg.code)
    window = window_size or cfg.default_window
    arrays = prepare_training_arrays(df, cfg, window)
    summary_components: Dict[str, ComponentTrainingSummary] = {}

    models = build_models_for_lottery(cfg, window)
    save_dir = PATHS["model"] / cfg.code / f"window_{window}"
    save_dir.mkdir(parents=True, exist_ok=True)

    first_issue = str(df["期数"].min())
    last_issue = str(df["期数"].max())

    for component, model in models.items():
        dataset = arrays[component]
        _ensure_enough_samples(dataset, window, f"{cfg.name}-{component}")
        (x_train, y_train), (x_val, y_val) = train_validation_split(
            dataset.features, dataset.labels, validation_ratio=validation_ratio
        )
        effective_batch = max(1, min(batch_size or cfg.default_batch_size, x_train.shape[0]))
        train_ds = _build_tf_dataset(x_train, y_train, effective_batch, shuffle=True)
        val_ds = None
        if x_val.shape[0] > 0:
            val_ds = _build_tf_dataset(x_val, y_val, effective_batch, shuffle=False)

        callbacks = [
            tf.keras.callbacks.EarlyStopping(
                monitor="val_loss",
                patience=8,
                restore_best_weights=True,
                verbose=1,
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor="val_loss",
                factor=0.5,
                patience=4,
                min_lr=1e-6,
                verbose=1,
            ),
        ]
        if val_ds is None:
            callbacks = []

        epochs = red_epochs if component == "red" else blue_epochs
        if epochs is None:
            epochs = cfg.default_red_epochs if component == "red" else cfg.default_blue_epochs
        epochs = max(1, epochs)

        logger.info(
            "训练模型 {}-{}: 样本={}，验证集={}，窗口={}，批大小={}，轮数={}",
            cfg.code,
            component,
            dataset.features.shape[0],
            x_val.shape[0],
            window,
            effective_batch,
            epochs,
        )

        history = model.fit(
            train_ds,
            validation_data=val_ds,
            epochs=epochs,
            verbose=2,
            callbacks=callbacks,
        )

        model_path = save_dir / f"{component}.keras"
        model.save(model_path, overwrite=True)
        logger.success("模型已保存至 {}", model_path)

        best_loss = min(history.history.get("val_loss", history.history.get("loss", [None])))
        metric_key = None
        for candidate in ("val_accuracy", "val_sparse_categorical_accuracy", "accuracy"):
            if candidate in history.history:
                metric_key = candidate
                break
        best_metric = None
        if metric_key is not None:
            best_metric = max(history.history[metric_key])
        summary_components[component] = ComponentTrainingSummary(
            train_samples=int(x_train.shape[0]),
            val_samples=int(x_val.shape[0]),
            best_val_loss=float(best_loss) if best_loss is not None else None,
            best_val_metric=float(best_metric) if best_metric is not None else None,
            epochs_trained=len(history.history.get("loss", [])),
        )

    metadata = TrainingSummary(
        code=cfg.code,
        name=cfg.name,
        window_size=window,
        trained_on_issues=(first_issue, last_issue),
        components=summary_components,
        timestamp=datetime.utcnow().isoformat(),
    )
    metadata_path = save_dir / MODEL_METADATA_FILE
    metadata_path.write_text(
        json.dumps(
            {
                **asdict(metadata),
                "components": {key: asdict(value) for key, value in summary_components.items()},
                "data_file": str(PATHS["data"] / cfg.code / DATA_FILE_NAME),
            },
            ensure_ascii=False,
            indent=2,
        ),
        encoding="utf-8",
    )
    logger.success("训练摘要已写入 {}", metadata_path)
    return metadata


def load_trained_models(code: str, window_size: Optional[int] = None) -> Dict[str, tf.keras.Model]:
    """从磁盘加载训练好的模型。"""

    cfg = get_lottery_config(code)
    window = window_size or cfg.default_window
    directory = PATHS["model"] / cfg.code / f"window_{window}"
    if not directory.exists():
        raise FileNotFoundError(f"未找到已训练的模型目录: {directory}")

    models: Dict[str, tf.keras.Model] = {}

    for component in ("red", "blue"):
        model_path = directory / f"{component}.keras"
        if model_path.exists():
            models[component] = tf.keras.models.load_model(
                model_path,
                compile=True,
                safe_mode=False,
            )
            logger.info("载入模型 {}", model_path)
    if not models:
        raise FileNotFoundError(f"{directory} 下未找到 red/blue 模型文件")
    return models


def predict_next_draw(
    code: str,
    window_size: Optional[int] = None,
) -> Dict[str, np.ndarray]:
    """使用最新模型预测下一期开奖号码。"""

    cfg = get_lottery_config(code)
    window = window_size or cfg.default_window
    df = load_history(cfg.code)
    arrays = prepare_training_arrays(df, cfg, window)
    models = load_trained_models(cfg.code, window)

    predictions: Dict[str, np.ndarray] = {}

    # 最新窗口特征取 prepare_training_arrays 中的原始数组最后 window 条
    red_dataset = arrays["red"]
    latest_features = _get_latest_window(red_dataset.features, 1).reshape(1, window, cfg.red.sequence_len)

    red_model = models["red"]
    red_pred = red_model.predict(latest_features, verbose=0)  # shape: (1, 球位数, 类别数)
    red_pred = red_pred.squeeze(axis=0)  # shape: (球位数, 类别数)
    num_balls = cfg.red.sequence_len
    num_classes = cfg.red.num_classes
    # 贪心去重采样：每次选概率最大的未被选过的数字
    chosen = set()
    result = []
    for i in range(num_balls):
        probs = red_pred[i].copy()
        # 将已选过的数字概率置为-1，避免重复
        for idx in chosen:
            if idx < len(probs):
                probs[idx] = -1
        pick = int(np.argmax(probs))
        chosen.add(pick)
        result.append(pick)
    predictions["red"] = np.array(result, dtype=int)

    if cfg.blue and "blue" in models:
        blue_dataset = arrays["blue"]
        latest_blue = _get_latest_window(blue_dataset.features, 1).reshape(1, window, cfg.blue.sequence_len)
        blue_pred_raw = models["blue"].predict(latest_blue, verbose=0)
        predictions["blue"] = np.argmax(blue_pred_raw, axis=-1).squeeze(axis=0).astype(int)

    # 将预测结果转换回原始编号（0-based -> 1-based）
    if red_dataset.needs_offset:
        predictions["red"] = _denormalize(predictions["red"], cfg.red.num_classes)
    if "blue" in predictions:
        blue_dataset = arrays["blue"]
        if blue_dataset.needs_offset:
            predictions["blue"] = _denormalize(predictions["blue"], cfg.blue.num_classes)
    return predictions


__all__ = ["train_lottery_models", "load_trained_models", "predict_next_draw", "TrainingSummary"]


================================================
FILE: src/preprocessing.py
================================================
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np
import pandas as pd

from .config import LotteryModelConfig, SequenceModelSpec


@dataclass(frozen=True)
class ComponentDataset:
    """封装单个号码序列的特征、标签与偏移信息。"""

    features: np.ndarray
    labels: np.ndarray
    needs_offset: bool


def _select_number_columns(df: pd.DataFrame, prefix: str, count: int) -> pd.DataFrame:
    columns = [f"{prefix}_{idx + 1}" for idx in range(count)]
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise ValueError(f"数据缺失列: {missing}")
    return df[columns]


def _needs_offset(values: np.ndarray, spec: SequenceModelSpec) -> bool:
    if values.size == 0:
        return False
    arr = values.astype(int)
    return arr.min() >= 1 and arr.max() <= spec.num_classes


def _to_zero_based(values: np.ndarray, shift: bool) -> np.ndarray:
    arr = values.astype(np.int32)
    if shift:
        arr = arr - 1
    return arr


def _build_windows(array: np.ndarray, window_size: int) -> Tuple[np.ndarray, np.ndarray]:
    features, labels = [], []
    for idx in range(window_size, len(array)):
        features.append(array[idx - window_size : idx])
        labels.append(array[idx])
    return np.asarray(features, dtype=np.int32), np.asarray(labels, dtype=np.int32)


def prepare_training_arrays(
    df: pd.DataFrame,
    config: LotteryModelConfig,
    window_size: int,
) -> Dict[str, ComponentDataset]:
    """基于历史数据构建训练所需的 NumPy 数组。"""

    sorted_df = df.sort_values("期数").reset_index(drop=True)
    result: Dict[str, ComponentDataset] = {}

    red_df = _select_number_columns(sorted_df, "红球", config.red.sequence_len)
    red_shift = _needs_offset(red_df.values, config.red)
    red_array = _to_zero_based(red_df.values, red_shift)
    red_x, red_y = _build_windows(red_array, window_size)
    result["red"] = ComponentDataset(red_x, red_y, red_shift)

    if config.blue:
        blue_df = _select_number_columns(sorted_df, "蓝球", config.blue.sequence_len)
        blue_shift = _needs_offset(blue_df.values, config.blue)
        blue_array = _to_zero_based(blue_df.values, blue_shift)
        blue_x, blue_y = _build_windows(blue_array, window_size)
        result["blue"] = ComponentDataset(blue_x, blue_y, blue_shift)

    return result


def train_validation_split(
    x: np.ndarray,
    y: np.ndarray,
    validation_ratio: float = 0.1,
) -> Tuple[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]:
    """将窗口数据按比例划分训练与验证集合。"""

    if not 0 < validation_ratio < 1:
        raise ValueError("validation_ratio 必须介于 (0, 1) 之间")
    total = x.shape[0]
    split_index = max(1, int(total * (1 - validation_ratio)))
    if split_index >= total:
        split_index = total - 1
    if split_index <= 0:
        split_index = total // 2 or 1
    return (x[:split_index], y[:split_index]), (x[split_index:], y[split_index:])


__all__ = ["prepare_training_arrays", "train_validation_split", "ComponentDataset"]


================================================
FILE: tests/conftest.py
================================================
import os
import random
from typing import Iterator

import numpy as np
import pytest

from src import config as project_config


@pytest.fixture(autouse=True)
def isolate_paths(tmp_path) -> Iterator[None]:
    """将数据/模型等输出目录重定向到临时路径，保证测试隔离。"""

    original_paths = project_config.PATHS.copy()
    original_name_path = project_config.name_path.copy()

    for key in project_config.PATHS.keys():
        new_dir = tmp_path / key
        new_dir.mkdir(parents=True, exist_ok=True)
        project_config.PATHS[key] = new_dir

    project_config.name_path = {
        code: {
            "name": cfg.name,
            "path": f"{(project_config.PATHS['data'] / code).as_posix()}/",
        }
        for code, cfg in project_config.LOTTERY_CONFIGS.items()
    }

    yield

    project_config.PATHS.update(original_paths)
    project_config.name_path = original_name_path


@pytest.fixture(autouse=True)
def set_random_seed() -> None:
    seed = 42
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf  # type: ignore

        tf.random.set_seed(seed)
    except Exception:
        pass
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")


================================================
FILE: tests/test_config.py
================================================
# -*- coding: utf-8 -*-
import pytest

from src.config import (
    LOTTERY_CONFIGS,
    PATHS,
    ensure_runtime_directories,
    get_lottery_config,
    name_path,
)


def test_get_lottery_config_returns_dataclass():
    cfg = get_lottery_config("ssq")
    assert cfg.code == "ssq"
    assert cfg.red.sequence_len == 6
    assert cfg.red.num_classes == 33


def test_ensure_runtime_directories(tmp_path):
    for key in PATHS:
        PATHS[key] = tmp_path / key
    ensure_runtime_directories()
    for key, path in PATHS.items():
        assert path.exists()
        assert path.is_dir()


def test_name_path_backward_compatibility():
    assert "ssq" in name_path
    assert name_path["ssq"]["name"] == LOTTERY_CONFIGS["ssq"].name
    assert name_path["ssq"]["path"].endswith("/")


================================================
FILE: tests/test_modeling.py
================================================
# -*- coding: utf-8 -*-
import numpy as np

from src.modeling import build_sequence_model
from src.config import SequenceModelSpec


def test_build_sequence_model_trains_and_predicts():
    spec = SequenceModelSpec(
        sequence_len=3,
        num_classes=5,
        embedding_dim=4,
        hidden_units=(8,),
        dropout=0.1,
    )
    model = build_sequence_model(spec, window_size=2, learning_rate=1e-3, name="test_lstm")
    x = np.random.randint(0, spec.num_classes, size=(12, 2, spec.sequence_len))
    y = np.random.randint(0, spec.num_classes, size=(12, spec.sequence_len))
    history = model.fit(x, y, epochs=1, batch_size=4, verbose=0)

    assert "loss" in history.history
    preds = model.predict(x[:2], verbose=0)
    assert preds.shape == (2, spec.sequence_len, spec.num_classes)



================================================
FILE: tests/test_pipeline.py
================================================
# -*- coding: utf-8 -*-
import json
from pathlib import Path

import numpy as np
import pandas as pd

from src.config import LOTTERY_CONFIGS, LotteryModelConfig, SequenceModelSpec
from src.pipeline import load_trained_models, predict_next_draw, train_lottery_models


def build_tiny_config() -> LotteryModelConfig:
    return LotteryModelConfig(
        code="ssq",
        name="双色球",
        red=SequenceModelSpec(
            sequence_len=6,
            num_classes=33,
            embedding_dim=8,
            hidden_units=(16,),
            dropout=0.1,
        ),
        blue=SequenceModelSpec(
            sequence_len=1,
            num_classes=16,
            embedding_dim=4,
            hidden_units=(8,),
            dropout=0.1,
        ),
        default_window=3,
        default_batch_size=8,
        default_red_epochs=3,
        default_blue_epochs=2,
        learning_rate=1e-3,
    )


def create_fake_dataset(path: Path, rows: int = 12) -> None:
    records = []
    for idx in range(rows):
        issue = f"2024{idx:03d}"
        base = np.arange(1, 7) + idx % 5
        red_numbers = (base % 33) + 1
        blue_number = (idx % 16) + 1
        record = {"期数": issue}
        for i, value in enumerate(red_numbers, start=1):
            record[f"红球_{i}"] = int(value)
        record["蓝球_1"] = int(blue_number)
        records.append(record)
    df = pd.DataFrame(records)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False, encoding="utf-8")


def test_train_and_predict_pipeline(monkeypatch, tmp_path):
    # 替换 ssq 配置为轻量级版本，保证测试速度
    tiny_cfg = build_tiny_config()
    monkeypatch.setitem(LOTTERY_CONFIGS, "ssq", tiny_cfg)
    data_path = tmp_path / "data" / "ssq" / "data.csv"
    create_fake_dataset(data_path, rows=12)

    summary = train_lottery_models(
        code="ssq",
        window_size=3,
        batch_size=4,
        red_epochs=2,
        blue_epochs=1,
        validation_ratio=0.2,
    )

    assert summary.code == "ssq"
    model_dir = tmp_path / "model" / "ssq" / "window_3"
    assert (model_dir / "red.keras").exists()
    assert (model_dir / "metadata.json").exists()

    models = load_trained_models("ssq", window_size=3)
    assert set(models.keys()) == {"red", "blue"}

    predictions = predict_next_draw("ssq", window_size=3)
    assert "red" in predictions
    assert len(predictions["red"]) == tiny_cfg.red.sequence_len
    assert all(1 <= value <= tiny_cfg.red.num_classes for value in predictions["red"])

    metadata = json.loads((model_dir / "metadata.json").read_text(encoding="utf-8"))
    assert metadata["code"] == "ssq"


================================================
FILE: tests/test_preprocessing.py
================================================
# -*- coding: utf-8 -*-
import pandas as pd
import pytest

from src.config import LotteryModelConfig, SequenceModelSpec
from src.preprocessing import ComponentDataset, prepare_training_arrays, train_validation_split


@pytest.fixture
def tiny_config() -> LotteryModelConfig:
    return LotteryModelConfig(
        code="demo",
        name="示例彩票",
        red=SequenceModelSpec(
            sequence_len=3,
            num_classes=5,
            embedding_dim=4,
            hidden_units=(8,),
        ),
        blue=SequenceModelSpec(
            sequence_len=1,
            num_classes=3,
            embedding_dim=2,
            hidden_units=(4,),
        ),
        default_window=2,
        default_batch_size=4,
    )


def test_prepare_training_arrays_handles_offset(tiny_config):
    df = pd.DataFrame(
        {
            "期数": ["1", "2", "3", "4"],
            "红球_1": [1, 2, 3, 4],
            "红球_2": [2, 3, 4, 5],
            "红球_3": [3, 4, 5, 1],
            "蓝球_1": [0, 1, 2, 0],
        }
    )
    datasets = prepare_training_arrays(df, tiny_config, window_size=2)
    assert set(datasets.keys()) == {"red", "blue"}
    red_ds: ComponentDataset = datasets["red"]
    blue_ds: ComponentDataset = datasets["blue"]
    assert red_ds.features.shape == (2, 2, 3)
    assert red_ds.needs_offset is True
    assert blue_ds.needs_offset is False
    assert (red_ds.features.min(), red_ds.features.max()) == (0, 4)


def test_train_validation_split_minimum_samples():
    x = [[1], [2], [3], [4]]
    y = [[10], [20], [30], [40]]
    (x_train, y_train), (x_val, y_val) = train_validation_split(
        x,
        y,
        validation_ratio=0.25,
    )
    assert len(x_train) == 3
    assert len(x_val) == 1


================================================
FILE: validation_report.py
================================================
# -*- coding: utf-8 -*-
"""
验证 kl8_analysis_plus.py 修复结果

Author: KittenCN
"""

def main():
    print("=== 验证 kl8_analysis_plus.py 修复状况 ===\n")
    
    # 1. 测试 sklearn 导入
    try:
        from sklearn.cluster import KMeans
        print("✅ sklearn.cluster.KMeans 导入成功")
    except ImportError as e:
        print(f"❌ sklearn 导入失败: {e}")
        return False
    
    # 2. 测试 matplotlib 导入  
    try:
        import matplotlib.pyplot as plt
        print("✅ matplotlib.pyplot 导入成功")
    except ImportError as e:
        print(f"❌ matplotlib 导入失败: {e}")
        return False
    
    # 3. 检查已修复的文件
    import os
    kl8_files = [
        'src/analysis/kl8_analysis_plus.py',
        'src/analysis/kl8_analysis.py', 
        'src/analysis/kl8_cash_plus.py',
        'src/analysis/kl8_cash.py'
    ]
    
    print("\n📝 检查修复的文件:")
    for file_path in kl8_files:
        if os.path.exists(file_path):
            print(f"   ✅ {file_path} 存在")
        else:
            print(f"   ❌ {file_path} 不存在")
    
    print("\n🔧 已完成的修复:")
    print("   • 安装了 scikit-learn>=1.1.0")
    print("   • 安装了 matplotlib>=3.5.0") 
    print("   • 修复了 kl8_analysis_plus.py 中的 'from common import' 为 'from ..common import'")
    print("   • 修复了 kl8_analysis.py 中的导入路径")
    print("   • 修复了 kl8_cash_plus.py 中的导入路径")
    print("   • 修复了 kl8_cash.py 中的导入路径")
    print("   • 所有文件的 'from config import' 已更新为 'from ..config import'")
    
    print("\n🎉 kl8_analysis_plus.py 的导入错误已修复!")
    print("   现在可以正常使用 sklearn.cluster 等依赖了。")
    
    return True

if __name__ == "__main__":
    main()

Download .txt

gitextract_e5o09tfl/

├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── AGENTS.md
├── ASSUMPTIONS.md
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── README_KL8.md
├── agent_report.md
├── config/
│   └── config.yaml
├── docker-compose.yml
├── docs/
│   ├── api.md
│   ├── architecture.md
│   ├── decision_record.md
│   ├── environment.md
│   ├── ops.md
│   └── verify.md
├── environment.yml
├── examples/
│   ├── analysis_example.py
│   └── quick_start.py
├── requirements.lock.txt
├── requirements.txt
├── scripts/
│   ├── get_data.py
│   ├── predict.py
│   └── train.py
├── src/
│   ├── __init__.py
│   ├── analysis.py
│   ├── bootstrap.py
│   ├── common.py
│   ├── config.py
│   ├── data_fetcher.py
│   ├── modeling.py
│   ├── pipeline.py
│   └── preprocessing.py
├── tests/
│   ├── conftest.py
│   ├── test_config.py
│   ├── test_modeling.py
│   ├── test_pipeline.py
│   └── test_preprocessing.py
└── validation_report.py

Download .txt

SYMBOL INDEX (68 symbols across 19 files)

FILE: examples/analysis_example.py
  function analysis_example (line 18) | def analysis_example():

FILE: examples/quick_start.py
  function quick_start_example (line 19) | def quick_start_example():

FILE: scripts/get_data.py
  function parse_args (line 26) | def parse_args() -> argparse.Namespace:
  function main (line 43) | def main() -> None:

FILE: scripts/predict.py
  function parse_args (line 27) | def parse_args() -> argparse.Namespace:
  function save_prediction (line 36) | def save_prediction(code: str, data: dict) -> Path:
  function main (line 53) | def main() -> None:

FILE: scripts/train.py
  function parse_args (line 34) | def parse_args() -> argparse.Namespace:
  function main (line 49) | def main() -> None:

FILE: src/analysis.py
  function BasicAnalysis (line 13) | def BasicAnalysis(oridata):
  function sortcnt (line 37) | def sortcnt(datacnt, dataori, rangenum=81):
  function getdata (line 49) | def getdata():
  function dfs (line 63) | def dfs(oridata, booldata, getnums, dep, ans, cur):
  function shrink (line 76) | def shrink(oridata, booldata):
  function sumanalyusis (line 91) | def sumanalyusis(limit=-1):

FILE: src/bootstrap.py
  function _ensure_ragged_compat (line 14) | def _ensure_ragged_compat() -> None:
  function _ensure_keras_shim (line 66) | def _ensure_keras_shim() -> None:

FILE: src/common.py
  function _load_pipeline (line 28) | def _load_pipeline():
  function get_data_run (line 37) | def get_data_run(
  function get_current_number (line 53) | def get_current_number(name: str) -> str:
  function train_pipeline (line 62) | def train_pipeline(
  function predict_latest (line 85) | def predict_latest(name: str, window_size: Optional[int] = None) -> Dict...

FILE: src/config.py
  class SequenceModelSpec (line 26) | class SequenceModelSpec:
  class LotteryModelConfig (line 37) | class LotteryModelConfig:
  function _load_yaml_config (line 52) | def _load_yaml_config() -> Dict[str, object]:
  function ensure_runtime_directories (line 205) | def ensure_runtime_directories() -> None:
  function get_lottery_config (line 212) | def get_lottery_config(code: str) -> LotteryModelConfig:

FILE: src/data_fetcher.py
  class DownloadResult (line 38) | class DownloadResult:
  class LotteryHttpClient (line 47) | class LotteryHttpClient:
    method __init__ (line 50) | def __init__(
    method get_text (line 74) | def get_text(self, url: str) -> str:
  function _build_history_url (line 85) | def _build_history_url(config: LotteryModelConfig, start: Optional[int],...
  function _parse_issue_list (line 104) | def _parse_issue_list(config: LotteryModelConfig, html: str) -> pd.DataF...
  function get_current_issue (line 154) | def get_current_issue(code: str, client: Optional[LotteryHttpClient] = N...
  function download_history (line 182) | def download_history(
  function load_history (line 225) | def load_history(code: str) -> pd.DataFrame:

FILE: src/modeling.py
  function _time_distributed_lstm (line 47) | def _time_distributed_lstm(
  function build_sequence_model (line 61) | def build_sequence_model(
  function build_models_for_lottery (line 125) | def build_models_for_lottery(

FILE: src/pipeline.py
  class ComponentTrainingSummary (line 37) | class ComponentTrainingSummary:
  class TrainingSummary (line 46) | class TrainingSummary:
  function _ensure_enough_samples (line 55) | def _ensure_enough_samples(dataset: ComponentDataset, window_size: int, ...
  function _build_tf_dataset (line 62) | def _build_tf_dataset(
  function _denormalize (line 75) | def _denormalize(pred: np.ndarray, spec_classes: int) -> np.ndarray:
  function _get_latest_window (line 81) | def _get_latest_window(arr: np.ndarray, window_size: int) -> np.ndarray:
  function train_lottery_models (line 87) | def train_lottery_models(
  function load_trained_models (line 211) | def load_trained_models(code: str, window_size: Optional[int] = None) ->...
  function predict_next_draw (line 236) | def predict_next_draw(

FILE: src/preprocessing.py
  class ComponentDataset (line 13) | class ComponentDataset:
  function _select_number_columns (line 21) | def _select_number_columns(df: pd.DataFrame, prefix: str, count: int) ->...
  function _needs_offset (line 29) | def _needs_offset(values: np.ndarray, spec: SequenceModelSpec) -> bool:
  function _to_zero_based (line 36) | def _to_zero_based(values: np.ndarray, shift: bool) -> np.ndarray:
  function _build_windows (line 43) | def _build_windows(array: np.ndarray, window_size: int) -> Tuple[np.ndar...
  function prepare_training_arrays (line 51) | def prepare_training_arrays(
  function train_validation_split (line 77) | def train_validation_split(

FILE: tests/conftest.py
  function isolate_paths (line 12) | def isolate_paths(tmp_path) -> Iterator[None]:
  function set_random_seed (line 38) | def set_random_seed() -> None:

FILE: tests/test_config.py
  function test_get_lottery_config_returns_dataclass (line 13) | def test_get_lottery_config_returns_dataclass():
  function test_ensure_runtime_directories (line 20) | def test_ensure_runtime_directories(tmp_path):
  function test_name_path_backward_compatibility (line 29) | def test_name_path_backward_compatibility():

FILE: tests/test_modeling.py
  function test_build_sequence_model_trains_and_predicts (line 8) | def test_build_sequence_model_trains_and_predicts():

FILE: tests/test_pipeline.py
  function build_tiny_config (line 12) | def build_tiny_config() -> LotteryModelConfig:
  function create_fake_dataset (line 38) | def create_fake_dataset(path: Path, rows: int = 12) -> None:
  function test_train_and_predict_pipeline (line 55) | def test_train_and_predict_pipeline(monkeypatch, tmp_path):

FILE: tests/test_preprocessing.py
  function tiny_config (line 10) | def tiny_config() -> LotteryModelConfig:
  function test_prepare_training_arrays_handles_offset (line 31) | def test_prepare_training_arrays_handles_offset(tiny_config):
  function test_train_validation_split_minimum_samples (line 51) | def test_train_validation_split_minimum_samples():

FILE: validation_report.py
  function main (line 8) | def main():

Download .json

Condensed preview — 43 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (201K chars).

[
  {
    "path": ".github/workflows/ci.yml",
    "chars": 2154,
    "preview": "# 彩票AI预测系统 GitHub Actions CI/CD\nname: CI/CD Pipeline\n\non:\n  push:\n    branches: [ main, develop ]\n  pull_request:\n    br"
  },
  {
    "path": ".gitignore",
    "chars": 1560,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\nmodel/\ndata/\ndata_cache/\nresults/\n*.py[cod]\n*$py.class\n*.csv\n\n# C e"
  },
  {
    "path": "AGENTS.md",
    "chars": 11874,
    "preview": "# agent.md | Codex 执行代理规范（中英双语版）\n\n> 目标：让 Codex（或同类代码生成模型）在最少追问的前提下自动完成需求、自测并保证可运行，交付可维护成果。<br>> Goal: Enable Codex (or s"
  },
  {
    "path": "ASSUMPTIONS.md",
    "chars": 4293,
    "preview": "6. KL8（快乐8）相关内容已于2025年10月迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-lottery-analyzer)。本仓库不再维护KL8相关功能"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 3892,
    "preview": "# Changelog\n\nAll notable changes to this project should be documented in this file.\n\n\n## [Unreleased]\n- ❌ **KL8（快乐8）相关内容"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 5202,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participa"
  },
  {
    "path": "Dockerfile",
    "chars": 775,
    "preview": "# 彩票AI预测系统 Docker 镜像\nFROM python:3.11-slim\n\n# 设置工作目录\nWORKDIR /app\n\n# 设置环境变量\nENV PYTHONPATH=/app\nENV PYTHONUNBUFFERED=1\nE"
  },
  {
    "path": "LICENSE",
    "chars": 35149,
    "preview": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free "
  },
  {
    "path": "Makefile",
    "chars": 2588,
    "preview": "# 彩票AI预测系统 Makefile\n# 提供一键任务命令\n\n# 默认目标\n.DEFAULT_GOAL := help\n\n# 变量定义\nPYTHON := python\nPIP := pip\nVENV := python311\nSRC_D"
  },
  {
    "path": "README.md",
    "chars": 4825,
    "preview": "# 彩票AI预测系统 v2.0\n\n> **重要声明**：彩票理论上属于完全随机事件，任何一种单一算法都不可能精确预测彩票结果！本项目仅供学习和研究使用，请合理投资，切勿沉迷！\n\n基于深度学习技术的彩票号码预测系统，支持多种彩票类型的数据分析"
  },
  {
    "path": "README_KL8.md",
    "chars": 6065,
    "preview": "# KL8 (快乐8) 彩票数据分析预测工具 🚀\n\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/d"
  },
  {
    "path": "agent_report.md",
    "chars": 975,
    "preview": "\n# 自动执行报告 | Automation Execution Report\n\n## 需求摘要 | Requirement Summary\n- 背景与目标 | Background & objectives:\n  - KL8（快乐8）相关"
  },
  {
    "path": "config/config.yaml",
    "chars": 728,
    "preview": "# 彩票AI预测系统配置文件\n# 用于存放应用程序配置\n\n# 应用信息\napp:\n  name: \"彩票AI预测系统\"\n  version: \"2.0.0\"\n  author: \"KittenCN\"\n\n# 路径配置\npaths:\n  dat"
  },
  {
    "path": "docker-compose.yml",
    "chars": 1170,
    "preview": "# 彩票AI预测系统 Docker Compose 配置\nversion: '3.8'\n\nservices:\n  lottery-predict:\n    build:\n      context: .\n      dockerfile: "
  },
  {
    "path": "docs/api.md",
    "chars": 5198,
    "preview": "# API 接口文档（TensorFlow 2.15 版）\n\n## 概述\n\n新版系统围绕以下核心模块组织：\n\n- `src.config`：集中管理路径、彩票玩法配置、超参数；\n- `src.data_fetcher`：负责网络访问、历史数"
  },
  {
    "path": "docs/architecture.md",
    "chars": 3657,
    "preview": "# 系统架构文档\n\n## 概述\n\n彩票AI预测系统是一个基于深度学习的彩票号码预测系统，采用模块化设计，支持多种彩票类型的数据获取、模型训练和预测分析。\n\n## 系统架构图\n\n```mermaid\ngraph TB\n    A[用户接口层]"
  },
  {
    "path": "docs/decision_record.md",
    "chars": 2419,
    "preview": "## 2025-10：KL8（快乐8）玩法迁移决策\n\n### 背景\nKL8（快乐8）相关源码、数据、分析与脚本已迁移至独立项目 [KL8-Lottery-Analyzer](https://github.com/KittenCN/kl8-l"
  },
  {
    "path": "docs/environment.md",
    "chars": 643,
    "preview": "# 环境复现与依赖管理\n\n建议使用 conda 管理二进制依赖（如 numpy、tensorflow-intel、pytorch 等），并使用 pip 安装剩余纯 Python 包。\n\n推荐流程（在仓库根目录）：\n\n1. 使用 conda "
  },
  {
    "path": "docs/ops.md",
    "chars": 2662,
    "preview": "# 运维手册（Ops Guide）\n\n## 1. 运行环境要求\n- Python 3.11（推荐通过 `conda create -n python311 python=3.11` 创建环境）；\n- 依赖锁定在 `requirements."
  },
  {
    "path": "docs/verify.md",
    "chars": 868,
    "preview": "# 验证步骤与示例输出\n\n在新环境中执行下面的命令以验证 TensorFlow / Keras 导入与我们添加的 shim 行为：\n\n1. 导入 TensorFlow 与 Keras：\n\n```bash\npython -c \"import "
  },
  {
    "path": "environment.yml",
    "chars": 779,
    "preview": "name: predict_lottery\nchannels:\n  - defaults\n  - conda-forge\n  - pytorch\ndependencies:\n  - python=3.11\n  - pip\n  - numpy"
  },
  {
    "path": "examples/analysis_example.py",
    "chars": 916,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n数据分析示例\n\n展示如何使用数据分析功能分析彩票数据\n\nAuthor: KittenCN\n\"\"\"\nimport sys\nimport os\n\n# 添加src目录到Python路径\nsy"
  },
  {
    "path": "examples/quick_start.py",
    "chars": 1197,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n彩票预测系统快速开始示例\n\n本示例展示如何使用彩票预测系统进行数据获取、模型训练和预测\n\nAuthor: KittenCN\n\"\"\"\nimport sys\nimport os\n\n# 添加"
  },
  {
    "path": "requirements.lock.txt",
    "chars": 315,
    "preview": "beautifulsoup4==4.12.3\nblack==24.4.2\nbuild==0.10.0\nmatplotlib==3.9.0\nnumpy==1.24.3\npandas==2.2.2\npyyaml==6.0.1\npython-do"
  },
  {
    "path": "requirements.txt",
    "chars": 313,
    "preview": "tensorflow==2.15.1\nkeras==2.15.0\nrequests==2.32.3\nbeautifulsoup4==4.12.3\npandas==2.2.2\nnumpy>=1.24,<2.0\nlxml==5.2.2\nlogu"
  },
  {
    "path": "scripts/get_data.py",
    "chars": 1405,
    "preview": "# -*- coding:utf-8 -*-\n\"\"\"\n历史数据下载脚本。\n\n示例：\n    python scripts/get_data.py --name ssq --start 2024001 --end 2024350\n\"\"\"\n\nf"
  },
  {
    "path": "scripts/predict.py",
    "chars": 3469,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n预测脚本，基于最新训练好的模型输出下一期号码。\n\n示例：\n    python scripts/predict.py --name ssq --window-size 5 --save"
  },
  {
    "path": "scripts/train.py",
    "chars": 2151,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n模型训练脚本（TensorFlow 2.15+ 版本）。\n\n示例：\n    python scripts/train.py --name ssq --window-size 5 --r"
  },
  {
    "path": "src/__init__.py",
    "chars": 171,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n彩票AI预测系统核心模块\n\n该模块包含了彩票AI预测系统的核心功能，包括：\n- 数据获取和处理\n- 模型训练和预测\n- 数据分析工具\n- 配置管理\n\nAuthor: KittenCN\n"
  },
  {
    "path": "src/analysis.py",
    "chars": 6833,
    "preview": "# -*- coding:utf-8 -*-\n\"\"\"\nAuthor: KittenCN\n\"\"\"\n\nimport pandas as pd\nfrom .config import *\n\ndatacnt = [0] * 81\ndataori ="
  },
  {
    "path": "src/bootstrap.py",
    "chars": 4302,
    "preview": "\"\"\"Bootstrap helpers run early to prepare runtime compatibility (e.g. TensorFlow shims).\n\nThis module should be imported"
  },
  {
    "path": "src/common.py",
    "chars": 2704,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n公共接口封装。\n\n为脚本层提供以下能力：\n1. 下载历史数据：`get_data_run`\n2. 查询最新期号：`get_current_number`\n3. 训练模型：`train_"
  },
  {
    "path": "src/config.py",
    "chars": 6266,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n项目全局配置模块。\n\n该模块负责：\n1. 读取 `config/config.yaml` 获取运行时配置；\n2. 定义彩票玩法的模型超参数与默认训练设置；\n3. 提供路径常量与工具函数"
  },
  {
    "path": "src/data_fetcher.py",
    "chars": 7799,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n数据抓取模块，负责从 500.com 拉取彩票历史数据并保存到本地。\n\n特点：\n1. 使用带重试的 requests.Session，满足网络安全要求；\n2. 输出 Pandas Da"
  },
  {
    "path": "src/modeling.py",
    "chars": 4390,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n模型构建模块。\n\n提供基于 TensorFlow 2.15.1 的多层 LSTM 序列模型，并针对红球/蓝球输出\n逐位置的类别概率。\n\"\"\"\n\nfrom __future__ impo"
  },
  {
    "path": "src/pipeline.py",
    "chars": 9382,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n训练与预测流程封装。\n\n暴露的核心函数：\n- train_lottery_models：基于历史数据训练模型并写入本地；\n- load_trained_models：从磁盘加载已训练模"
  },
  {
    "path": "src/preprocessing.py",
    "chars": 3021,
    "preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass\nfrom typing import Dict, Tuple\n\nimport numpy as np"
  },
  {
    "path": "tests/conftest.py",
    "chars": 1176,
    "preview": "import os\nimport random\nfrom typing import Iterator\n\nimport numpy as np\nimport pytest\n\nfrom src import config as project"
  },
  {
    "path": "tests/test_config.py",
    "chars": 787,
    "preview": "# -*- coding: utf-8 -*-\nimport pytest\n\nfrom src.config import (\n    LOTTERY_CONFIGS,\n    PATHS,\n    ensure_runtime_direc"
  },
  {
    "path": "tests/test_modeling.py",
    "chars": 806,
    "preview": "# -*- coding: utf-8 -*-\nimport numpy as np\n\nfrom src.modeling import build_sequence_model\nfrom src.config import Sequenc"
  },
  {
    "path": "tests/test_pipeline.py",
    "chars": 2621,
    "preview": "# -*- coding: utf-8 -*-\nimport json\nfrom pathlib import Path\n\nimport numpy as np\nimport pandas as pd\n\nfrom src.config im"
  },
  {
    "path": "tests/test_preprocessing.py",
    "chars": 1721,
    "preview": "# -*- coding: utf-8 -*-\nimport pandas as pd\nimport pytest\n\nfrom src.config import LotteryModelConfig, SequenceModelSpec\n"
  },
  {
    "path": "validation_report.py",
    "chars": 1566,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n验证 kl8_analysis_plus.py 修复结果\n\nAuthor: KittenCN\n\"\"\"\n\ndef main():\n    print(\"=== 验证 kl8_analys"
  }
]

About this extraction

This page contains the full source code of the KittenCN/predict_Lottery_ticket GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 43 files (157.0 KB), approximately 50.4k tokens, and a symbol index with 68 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo